# Home 5: Build a seq2seq model for machine translation.

### Name: [Amir Morcos]

### Task: Translate English to [Dutch]

## 0. You will do the following:

1. Read and run my code.
2. Complete the code in Section 1.1 and Section 4.2.

    * Translating **English** to **German** is not acceptable! Try another pair of languages.
    
3. **Make improvements.** Directly modify the code in Section 3. Do at least one of the following two. Doing both correctly earns up to 1 bonus point toward the total.

    * Bi-LSTM instead of LSTM.
        
    * Attention. (You are allowed to use existing code.)
    
4. Evaluate the translation using the BLEU score. 

    * Optional. Up to 1 bonus point toward the total.
    
5. Convert the notebook to .HTML file. 

    * The HTML file must contain the code and the output after execution.

6. Put the .HTML file in your Google Drive, Dropbox, or Github repo. (If you submit the file to Google Drive or Dropbox, you must make the file "open-access". A delay caused by "denial of access" may result in a late penalty.)

7. Submit the link to the HTML file to Canvas.    


### Hint: 

To implement ```Bi-LSTM```, you will need the following code to build the encoder. Do NOT use Bi-LSTM for the decoder.

In [1]:
#from keras.layers import Bidirectional, Concatenate

#encoder_bilstm = Bidirectional(LSTM(latent_dim, return_state=True, 
#                                  dropout=0.5, name='encoder_lstm'))
#_, forward_h, forward_c, backward_h, backward_c = encoder_bilstm(encoder_inputs)

#state_h = Concatenate()([forward_h, backward_h])
#state_c = Concatenate()([forward_c, backward_c])

## 1. Data preparation

1. Download data (e.g., "deu-eng.zip") from http://www.manythings.org/anki/
2. Unzip the .ZIP file.
3. Put the .TXT file (e.g., "deu.txt") in the directory "./Data/".

### 1.1. Load and clean text


In [2]:
import re
import string
from unicodedata import normalize
import numpy

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, mode='rt', encoding='utf-8')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text


# split a loaded document into sentences
def to_pairs(doc):
    lines = doc.strip().split('\n')
    pairs = [line.split('\t') for line in  lines]
    return pairs

def clean_data(lines):
    cleaned = list()
    # prepare regex for char filtering
    re_print = re.compile('[^%s]' % re.escape(string.printable))
    # prepare translation table for removing punctuation
    table = str.maketrans('', '', string.punctuation)
    for pair in lines:
        clean_pair = list()
        for line in pair:
            # normalize unicode characters
            line = normalize('NFD', line).encode('ascii', 'ignore')
            line = line.decode('UTF-8')
            # tokenize on white space
            line = line.split()
            # convert to lowercase
            line = [word.lower() for word in line]
            # remove punctuation from each token
            line = [word.translate(table) for word in line]
            # remove non-printable chars from each token
            line = [re_print.sub('', w) for w in line]
            # remove tokens with numbers in them
            line = [word for word in line if word.isalpha()]
            # store as string
            clean_pair.append(' '.join(line))
        cleaned.append(clean_pair)
    return numpy.array(cleaned)
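
To see what the cleaning steps do to a single sentence, here is a minimal sketch. The helper `clean_line` is hypothetical (it is not part of the notebook) and simply applies the same steps as `clean_data` to one string, using only the standard library.

```python
import re
import string
from unicodedata import normalize

def clean_line(line):
    """Apply the same steps as clean_data to one sentence (illustrative helper)."""
    re_print = re.compile('[^%s]' % re.escape(string.printable))
    table = str.maketrans('', '', string.punctuation)
    # strip accents: decompose characters, then drop the non-ASCII combining marks
    line = normalize('NFD', line).encode('ascii', 'ignore').decode('UTF-8')
    # lowercase each token and remove punctuation
    words = [w.lower().translate(table) for w in line.split()]
    # remove non-printable characters
    words = [re_print.sub('', w) for w in words]
    # keep purely alphabetic tokens only (drops numbers and empty tokens)
    return ' '.join(w for w in words if w.isalpha())

print(clean_line("Tom lived in Australi\u00eb."))  # tom lived in australie
print(clean_line("I'm 25 years old!"))             # im years old
```

Note that accents and apostrophes are removed rather than replaced, which is why the printed pairs contain forms such as "im" and "australie".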

#### Fill the following blanks:

In [3]:
# e.g., filename = 'Data/deu.txt'
filename = 'Data/nld.txt'

# e.g., n_train = 20000

n_train = 35000

In [4]:
# load dataset
TestSize = 100
doc = load_doc(filename)

# split into Language1-Language2 pairs
pairs = to_pairs(doc)
rand_indices = numpy.random.permutation(n_train+TestSize)


# clean sentences once, then split into training and test pairs
cleaned_all = clean_data(pairs)
clean_pairs = cleaned_all[rand_indices[0:n_train], :]
Test_pairs = cleaned_all[rand_indices[n_train:n_train+TestSize], :]
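
The cell above permutes only the first n_train+TestSize lines of the file and then slices the permutation into a training part and a test part. The sketch below shows the same idea in isolation with the standard library; `split_indices` and its arguments are hypothetical names, and the sizes are made up for illustration.

```python
import random

def split_indices(n_total, n_train, n_test, seed=0):
    """Shuffle range(n_total) and slice it into disjoint train/test index lists."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_test]

train_idx, test_idx = split_indices(n_total=1000, n_train=800, n_test=100)
print(len(train_idx), len(test_idx))          # 800 100
print(set(train_idx).isdisjoint(test_idx))    # True
```

Because both slices come from one permutation, no sentence index can land in both the training and the test set.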

In [5]:
for i in range(3000, 3010):
    print('[' + clean_pairs[i, 0] + '] => [' + clean_pairs[i, 1] + ']')

[im a girl] => [ik ben een meisje]
[the birds wing was broken] => [de vogel had een gebroken vleugel]
[turtles hibernate] => [schildpadden overwinteren]
[you stay where you are tom] => [blijf waar je bent tom]
[he worked hard] => [hij heeft hard gewerkt]
[tom lived in australia] => [tom woonde in australie]
[tom cant help you now] => [tom kan u nu niet helpen]
[that will cost thirty euros] => [dat wordt dan dertig euro]
[see you soon] => [tot weerziens]
[im studying french] => [ik ben frans aan het studeren]


In [6]:
input_texts = clean_pairs[:, 0]
target_texts = ['\t' + text + '\n' for text in clean_pairs[:, 1]]

print('Length of input_texts:  ' + str(input_texts.shape))
print('Length of target_texts: ' + str(numpy.shape(target_texts)))

Length of input_texts:  (30000,)
Length of target_texts: (30000,)


In [7]:
max_encoder_seq_length = max(len(line) for line in input_texts)
max_decoder_seq_length = max(len(line) for line in target_texts)

print('max length of input  sentences: %d' % (max_encoder_seq_length))
print('max length of target sentences: %d' % (max_decoder_seq_length))

max length of input  sentences: 29
max length of target sentences: 58


**Remark:** At this point, you have two lists of sentences: input_texts and target_texts

## 2. Text processing

### 2.1. Convert texts to sequences

- Input: A list of $n$ sentences (with max length $t$).
- It is represented by a $n\times t$ matrix after the tokenization and zero-padding.

In [8]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# encode and pad sequences
def text2sequences(max_len, lines):
    tokenizer = Tokenizer(char_level=True, filters='')
    tokenizer.fit_on_texts(lines)
    seqs = tokenizer.texts_to_sequences(lines)
    seqs_pad = pad_sequences(seqs, maxlen=max_len, padding='post')
    return seqs_pad, tokenizer.word_index


encoder_input_seq, input_token_index = text2sequences(max_encoder_seq_length, 
                                                      input_texts)
decoder_input_seq, target_token_index = text2sequences(max_decoder_seq_length, 
                                                       target_texts)

print('shape of encoder_input_seq: ' + str(encoder_input_seq.shape))
print('shape of input_token_index: ' + str(len(input_token_index)))
print('shape of decoder_input_seq: ' + str(decoder_input_seq.shape))
print('shape of target_token_index: ' + str(len(target_token_index)))

Using TensorFlow backend.


shape of encoder_input_seq: (30000, 29)
shape of input_token_index: 27
shape of decoder_input_seq: (30000, 58)
shape of target_token_index: 29
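
As a rough illustration of what character-level tokenization and zero-padding produce, here is a pure-Python sketch on a toy corpus. The helpers `build_char_index` and `to_padded_sequences` are hypothetical stand-ins, not the Keras implementation; they assume indices are assigned by descending character frequency, with 0 reserved for padding, and that no sentence exceeds `max_len`.

```python
def build_char_index(lines):
    """Assign indices 1..v by descending character frequency; 0 is kept for padding."""
    counts = {}
    for line in lines:
        for ch in line:
            counts[ch] = counts.get(ch, 0) + 1
    # stable sort: ties keep their order of first appearance
    ordered = sorted(counts, key=lambda ch: -counts[ch])
    return {ch: i + 1 for i, ch in enumerate(ordered)}

def to_padded_sequences(lines, char_index, max_len):
    """Map characters to indices and right-pad each sequence with zeros."""
    seqs = []
    for line in lines:
        seq = [char_index[ch] for ch in line]
        seqs.append(seq + [0] * (max_len - len(seq)))
    return seqs

lines = ['ab', 'abc', 'a']
index = build_char_index(lines)
padded = to_padded_sequences(lines, index, max_len=3)
print(index)   # {'a': 1, 'b': 2, 'c': 3}
print(padded)  # [[1, 2, 0], [1, 2, 3], [1, 0, 0]]
```

The result is an $n\times t$ matrix (here $3\times 3$), matching the shapes printed above for the real data.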


In [9]:
num_encoder_tokens = len(input_token_index) + 1
num_decoder_tokens = len(target_token_index) + 1

print('num_encoder_tokens: ' + str(num_encoder_tokens))
print('num_decoder_tokens: ' + str(num_decoder_tokens))

num_encoder_tokens: 28
num_decoder_tokens: 30


In [10]:
print(input_token_index)

{' ': 1, 'e': 2, 'o': 3, 't': 4, 'i': 5, 'a': 6, 's': 7, 'n': 8, 'h': 9, 'r': 10, 'l': 11, 'd': 12, 'y': 13, 'm': 14, 'u': 15, 'w': 16, 'c': 17, 'g': 18, 'p': 19, 'k': 20, 'f': 21, 'b': 22, 'v': 23, 'j': 24, 'x': 25, 'z': 26, 'q': 27}


In [11]:
print(target_token_index)

{' ': 1, 'e': 2, 'n': 3, 'i': 4, 't': 5, 'a': 6, 'o': 7, '\t': 8, '\n': 9, 'r': 10, 'd': 11, 's': 12, 'k': 13, 'l': 14, 'h': 15, 'm': 16, 'g': 17, 'j': 18, 'u': 19, 'w': 20, 'b': 21, 'v': 22, 'z': 23, 'p': 24, 'f': 25, 'c': 26, 'y': 27, 'x': 28, 'q': 29}


**Remark:** At this point, the input-language and target-language texts have been converted to 2 matrices.

- Both have n_train rows.
- They have max_encoder_seq_length and max_decoder_seq_length columns, respectively.

The following cells print a sentence and its representation as a sequence.

In [12]:
target_texts[100]

'\ter is telefoon voor je\n'

In [13]:
decoder_input_seq[100, :]

array([ 8,  2, 10,  1,  4, 12,  1,  5,  2, 14,  2, 25,  7,  7,  3,  1, 22,
        7,  7, 10,  1, 18,  2,  9,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0])

### 2.2. One-hot encode

- Input: A list of $n$ sentences (with max length $t$).
- It is represented by a $n\times t$ matrix after the tokenization and zero-padding.
- It is represented by an $n\times t \times v$ tensor ($v$ is the number of unique chars, plus one for the padding index 0) after the one-hot encoding.
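
The shape change from $n\times t$ to $n\times t\times v$ can be checked on a toy example. This is a plain-Python sketch with a hypothetical helper `one_hot`; the notebook itself uses `to_categorical` for this step.

```python
def one_hot(sequences, vocab_size):
    """Turn an n x t list of integer tokens into an n x t x v nested list."""
    return [[[1.0 if k == token else 0.0 for k in range(vocab_size)]
             for token in seq]
            for seq in sequences]

padded = [[1, 2, 0], [3, 0, 0]]           # n = 2 sentences, t = 3 positions
encoded = one_hot(padded, vocab_size=4)   # v = 4: three chars plus padding index 0
print(len(encoded), len(encoded[0]), len(encoded[0][0]))  # 2 3 4
print(encoded[0])
# [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
```

Padding positions are encoded too: index 0 becomes a one-hot vector with its 1 in the first slot.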

In [14]:
from keras.utils import to_categorical

# one hot encode target sequence
def onehot_encode(sequences, max_len, vocab_size):
    n = len(sequences)
    data = numpy.zeros((n, max_len, vocab_size))
    for i in range(n):
        data[i, :, :] = to_categorical(sequences[i], num_classes=vocab_size)
    return data

encoder_input_data = onehot_encode(encoder_input_seq, max_encoder_seq_length, num_encoder_tokens)
decoder_input_data = onehot_encode(decoder_input_seq, max_decoder_seq_length, num_decoder_tokens)

decoder_target_seq = numpy.zeros(decoder_input_seq.shape)
decoder_target_seq[:, 0:-1] = decoder_input_seq[:, 1:]
decoder_target_data = onehot_encode(decoder_target_seq, 
                                    max_decoder_seq_length, 
                                    num_decoder_tokens)

print(encoder_input_data.shape)
print(decoder_input_data.shape)

(30000, 29, 28)
(30000, 58, 30)
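
The decoder target is the decoder input shifted left by one step, so at every position the network is trained to predict the next character. A minimal sketch of that shift on one short sequence (the helper `shift_left` is a hypothetical name; the indices follow target_token_index, where '\t' is 8, 'e' is 2, 'r' is 10 and '\n' is 9):

```python
def shift_left(seq):
    """Target at step i is the input at step i+1; the last slot is filled with 0."""
    return seq[1:] + [0]

decoder_input = [8, 2, 10, 9, 0, 0]        # "\ter\n" followed by padding
decoder_target = shift_left(decoder_input)
print(decoder_target)                      # [2, 10, 9, 0, 0, 0]
```

Given the start sign '\t' the target is 'e', given 'e' it is 'r', and given 'r' it is the stop sign '\n'.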


## 3. Build the networks (for training)

- Build encoder, decoder, and connect the two modules to get "model". 

- Fit the model on the bilingual data to train the parameters in the encoder and decoder.

### 3.1. Encoder network

- Input:  one-hot encode of the input language

- Return: 

    -- output (all the hidden states   $h_1, \cdots , h_t$) are always discarded
    
    -- the final hidden state  $h_t$
    
    -- the final conveyor belt $c_t$

In [15]:
from keras.layers import Input, LSTM, Bidirectional, Concatenate
from keras.models import Model

latent_dim = 128*3 #was 256

# inputs of the encoder network

encoder_inputs = Input(shape=(None,num_encoder_tokens), name='encoder_inputs')
#if I need to go back to a single layer, return state NOT sequences

encoder_bilstm1 = Bidirectional(LSTM(latent_dim, return_sequences=True, 
                                     dropout=0.25, name='encoder_bilstm1'))(encoder_inputs)

_, forward_h, forward_c, backward_h, backward_c  = Bidirectional(LSTM(latent_dim, return_state=True, dropout=0.25,
                                                                      name='encoder_bilstm2'))(encoder_bilstm1)
state_Eh = Concatenate()([forward_h, backward_h])
state_Ec = Concatenate()([forward_c, backward_c])

#encoder_states = [state_Eh, state_Ec] # encoder_output discarded

# build the encoder network model
encoder_model = Model(inputs=encoder_inputs, 
                      outputs=[state_Eh, state_Ec],
                      name='encoder')

After struggling with adding a 2nd layer to the decoder, I found this link and was able to better follow how the model is constructed:
https://github.com/google/seq2seq/issues/320

Print a summary and save the encoder network structure to "./encoder.pdf"

In [16]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(encoder_model, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=encoder_model, show_shapes=False,
    to_file='encoder.pdf'
)

encoder_model.summary()

Model: "encoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_inputs (InputLayer)     (None, None, 28)     0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 768)    1268736     encoder_inputs[0][0]             
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) [(None, 768), (None, 3542016     bidirectional_1[0][0]            
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 768)          0           bidirectional_2[0][1]            
                                                                 bidirectional_2[0][3]      

### 3.2. Decoder network

- Inputs:  

    -- one-hot encode of the target language
    
    -- the initial hidden state (the encoder's final $h_t$)
    
    -- the initial conveyor belt (the encoder's final $c_t$)

- Return: 

    -- output (all the hidden states) $h_1, \cdots , h_t$

    -- the final hidden state  $h_t$ (discarded in the training and used in the prediction)
    
    -- the final conveyor belt $c_t$ (discarded in the training and used in the prediction)

In [18]:
from keras.layers import Input, LSTM, Dense
from keras.models import Model

# inputs of the decoder network
decoder_input_h = Input(shape=(latent_dim*2,), name='decoder_input_h')  # *2 because the encoder is bidirectional
decoder_input_c = Input(shape=(latent_dim*2,), name='decoder_input_c')
decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')

encoder_state = [decoder_input_h,decoder_input_c]

# set the LSTM layer
decoder_lstm1_layer = LSTM(latent_dim*2, return_sequences=True, 
                                    return_state=True, dropout=0.25, name='decoder_lstm1')
decoder_lstm1_layer_output, state_hd1,state_cd1= decoder_lstm1_layer(decoder_input_x, initial_state=encoder_state)


#decoder_lstm2_layer = LSTM(latent_dim*2,return_sequences=True,  
#                                    return_state=True, dropout=0.25, name='decoder_lstm2')
#decoder_lstm2_layer_output, state_hd2,state_cd2 = decoder_lstm2_layer(decoder_lstm1_layer_output)
                                                #May need to set initial state 

#Skip_A_Few = Concatenate()([decoder_lstm2_layer_output, decoder_input_x])

#Fully_Connected1 = Dense(512, activation='relu', name='Fully_Connected1')
#Fully_Connected_Out1 = Fully_Connected1(decoder_lstm2_layer_output) 

#Fully_Connected2 = Dense(512, activation='relu', name='Fully_Connected2')
#Fully_Connected_Out2 = Fully_Connected2(Fully_Connected_Out1) 


decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_lstm1_layer_output)

# build the decoder network model
decoder_model = Model(inputs=[decoder_input_x, decoder_input_h, decoder_input_c],
                      outputs=[decoder_outputs, state_hd1, state_cd1],
                      #outputs=[decoder_outputs],
                      name='decoder_model')


Print a summary and save the decoder network structure to "./decoder.pdf"

In [19]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(decoder_model, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=decoder_model, show_shapes=False,
    to_file='decoder.pdf'
)

decoder_model.summary()

Model: "decoder_model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
decoder_input_x (InputLayer)    (None, None, 30)     0                                            
__________________________________________________________________________________________________
decoder_input_h (InputLayer)    (None, 768)          0                                            
__________________________________________________________________________________________________
decoder_input_c (InputLayer)    (None, 768)          0                                            
__________________________________________________________________________________________________
decoder_lstm1 (LSTM)            [(None, None, 768),  2454528     decoder_input_x[0][0]            
                                                                 decoder_input_h[0][0]

### 3.3. Connect the encoder and decoder

In [20]:
# input layers
encoder_input_x = Input(shape=(None, num_encoder_tokens), name='encoder_input_x')
decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')

# connect encoder to decoder
encoder_final_states_h,encoder_final_states_c  = encoder_model([encoder_input_x])
decoder_pred,_,_ = decoder_model([decoder_input_x,encoder_final_states_h,encoder_final_states_c])


model = Model(inputs=[encoder_input_x, decoder_input_x], 
              outputs=decoder_pred, 
              name='model_training')

In [21]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot, plot_model

SVG(model_to_dot(model, show_shapes=False).create(prog='dot', format='svg'))

plot_model(
    model=model, show_shapes=False,
    to_file='model_training.pdf'
)

model.summary()

Model: "model_training"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input_x (InputLayer)    (None, None, 28)     0                                            
__________________________________________________________________________________________________
decoder_input_x (InputLayer)    (None, None, 30)     0                                            
__________________________________________________________________________________________________
encoder (Model)                 [(None, 768), (None, 4810752     encoder_input_x[0][0]            
__________________________________________________________________________________________________
decoder_model (Model)           [(None, None, 30), ( 2477598     decoder_input_x[0][0]            
                                                                 encoder[1][0]       
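
The parameter counts in the summaries can be reproduced by hand. The sketch below assumes the standard LSTM layout (4 gates, each with input weights, recurrent weights and a bias) and a plain dense layer; `lstm_params` and `dense_params` are hypothetical helper names.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus bias
    return input_dim * units + units

latent_dim = 384                                             # 128*3, as set in Section 3.1
enc_layer1 = 2 * lstm_params(28, latent_dim)                 # bidirectional: two LSTMs
enc_layer2 = 2 * lstm_params(2 * latent_dim, latent_dim)     # input is the 768-dim output of layer 1
dec_lstm = lstm_params(30, 2 * latent_dim)
dec_dense = dense_params(2 * latent_dim, 30)

print(enc_layer1, enc_layer2, enc_layer1 + enc_layer2)  # 1268736 3542016 4810752
print(dec_lstm, dec_lstm + dec_dense)                   # 2454528 2477598
```

These agree with the Param # column reported for the encoder (4810752) and the decoder (2477598).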

### 3.5. Fit the model on the bilingual dataset

- encoder_input_data: one-hot encode of the input language

- decoder_input_data: one-hot encode of the target language

- decoder_target_data: labels (left shift of decoder_input_data)

- tune the hyper-parameters

- stop when the validation loss stops decreasing.
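
One common way to decide when the validation loss has stopped decreasing is a patience rule: stop once the loss has failed to improve for a fixed number of consecutive epochs. The sketch below shows only that logic in plain Python; `epochs_to_run` is a hypothetical helper and the loss values are made up for illustration. In Keras, the `EarlyStopping` callback (with `monitor='val_loss'` and a `patience` argument) applies the same kind of rule during `fit`.

```python
def epochs_to_run(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop under a patience rule."""
    best = float('inf')
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0      # improvement: reset the counter
        else:
            waited += 1                 # no improvement this epoch
            if waited >= patience:
                return epoch
    return len(val_losses)

losses = [1.20, 0.95, 0.90, 0.91, 0.93, 0.92]   # made-up validation losses
print(epochs_to_run(losses, patience=2))        # 5
```

Here the best loss is reached at epoch 3, and two epochs without improvement end training at epoch 5.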

In [22]:
print('shape of encoder_input_data' + str(encoder_input_data.shape))
print('shape of decoder_input_data' + str(decoder_input_data.shape))
print('shape of decoder_target_data' + str(decoder_target_data.shape))

shape of encoder_input_data(30000, 29, 28)
shape of decoder_input_data(30000, 58, 30)
shape of decoder_target_data(30000, 58, 30)


In [46]:
# Reverse-lookup token index to decode sequences back to something readable.
reverse_input_char_index = dict((i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict((i, char) for char, i in target_token_index.items())

temperature = 0.5
def decode_sequence(input_seq):
    states_value = encoder_model.predict(input_seq)

    target_seq = numpy.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index['\t']] = 1.

    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

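        # temperature rescaling sharpens (T < 1) or flattens (T > 1) the distribution;
        # note that the argmax below is unaffected by it, so selection is still greedy.
        # To sample instead, draw the index from the rescaled distribution.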
        output_tokens = output_tokens ** (1 / temperature)        
        output_tokens = output_tokens / numpy.sum(output_tokens)
     
        sampled_token_index = numpy.argmax(output_tokens[0, -1, :])
        if(sampled_token_index == 0):      
            sampled_token_index=target_token_index['\n']#replacing 0 with end of string
            print("Encountered Zero sampled_token_index")

        sampled_char = reverse_target_char_index[sampled_token_index]

        decoded_sentence += sampled_char

        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True


        target_seq = numpy.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        states_value = [h, c]

    return decoded_sentence
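
In the cell above the argmax is taken after the temperature rescaling, so the most probable character is always chosen. A minimal sketch of actual sampling with temperature follows, using only the standard library; `apply_temperature` and `sample_index` are hypothetical helpers, and the probability vector is made up. Inside `decode_sequence`, something similar could be applied to `output_tokens[0, -1, :]`.

```python
import random

def apply_temperature(probs, temperature):
    """Re-weight a probability vector as p_i ** (1/T), then renormalise."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_index(probs, temperature=0.5, rng=random):
    """Draw one index at random according to the re-weighted distribution."""
    weights = apply_temperature(probs, temperature)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

probs = [0.1, 0.6, 0.3]                       # made-up distribution over 3 chars
print(apply_temperature(probs, 0.5))          # most mass moves to index 1
rng = random.Random(0)                        # seeded for repeatability
print([sample_index(probs, 0.5, rng) for _ in range(10)])
```

A temperature below 1 concentrates probability on the likeliest characters, while a temperature above 1 makes the choice more random.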






In [53]:
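def testThething(input_sentence):
    # map each character to its index; every character must be in the training vocabulary
    input_sequence = [input_token_index[n] for n in list(input_sentence.lower())]
    # zero-pad up to the encoder's maximum length
    while len(input_sequence)<max_encoder_seq_length:
        input_sequence.append(0)
        
    # one-hot encode to shape (t, v), then add a batch axis to get (1, t, v)
    input_x = to_categorical(input_sequence, num_classes=num_encoder_tokens)
    return decode_sequence(input_x[numpy.newaxis, :, :])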

In [25]:
from keras import optimizers
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.0005), loss='categorical_crossentropy')
EPOCHS = 30
#model.fit([encoder_input_data, decoder_input_data],  # training data
#          decoder_target_data,                       # labels (left shift of the target sequences)
#          batch_size=64, epochs=20, validation_split=0.2)

for i in range(EPOCHS):
    model.fit([encoder_input_data, decoder_input_data],  # training data
          decoder_target_data,                       # labels (left shift of the target sequences)
          batch_size=32, validation_split=0.2)

    print("Epoch  = ",i+1)
    print(testThething('Thank you'))
    

model.save('seq2seq_2b.h5')

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  1
dat je het geen

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  2
deek je het niet

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  3
deek je het

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  4
dank u tom

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  5
bedankt

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  6
dank u

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  7
bedankt

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  8
dank u

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  9
dank u

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  10
dank u

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  11
dank u

Train on 24000 samples, validate on 6000 samples
Epoch 1/1
Epoch  =  12
dank u

Train on 24000 samp

## 4. Make predictions


### 4.1. Translate English to Dutch

1. The encoder reads a sentence (source language) and outputs its final states, $h_t$ and $c_t$.
2. Take the [start] sign "\t" and the final states $h_t$ and $c_t$ as input and run the decoder.
3. Get the new states and the predicted probability distribution.
4. Sample a char from the predicted probability distribution.
5. Take the sampled char and the new states as input and repeat the process (stop on reaching the [stop] sign "\n").

In [54]:
for seq_index in range(2100, 2120):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    
    input_seq = encoder_input_data[seq_index: seq_index + 1]    
    decoded_sentence = decode_sequence(input_seq)
    print(seq_index)
    print('English:       ', input_texts[seq_index])
    print('Dutch (true): ', target_texts[seq_index][1:-1])
    print('Dutch (pred): ', decoded_sentence[0:-1])


2100
English:        they want to talk
Dutch (true):  ze willen praten
Dutch (pred):  ze willen praten
2101
English:        frogs eat flies
Dutch (true):  kikkers eten vliegen
Dutch (pred):  kikkers eten van pizza
2102
English:        i need a tissue
Dutch (true):  ik heb een zakdoek nodig
Dutch (pred):  ik heb een taken nodig
2103
English:        shouldnt we help tom
Dutch (true):  moeten we tom niet helpen
Dutch (pred):  moeten we tom niet helpen
2104
English:        tom wont succeed
Dutch (true):  tom zal geen succes hebben
Dutch (pred):  tom zal niet succeren
2105
English:        tom respects your opinion
Dutch (true):  tom respecteert jouw mening
Dutch (pred):  tom respecteert jouw maniaan
2106
English:        you owe me a kiss
Dutch (true):  jullie zijn mij een kus verschuldigd
Dutch (pred):  je bent me een kus verschuldigd
2107
English:        i cant explain it to you now
Dutch (true):  ik kan het je nu niet uitleggen
Dutch (pred):  ik kan dat niet u helpen of wel
2108
English: 

### 4.2. Translate an English sentence to the target language

1. Tokenization
2. One-hot encode
3. Translate

In [55]:
input_sentence = 'I love you'

translated_sentence = testThething(input_sentence)

print('source sentence is: ' + input_sentence)
print('translated sentence is: ' + translated_sentence[0:-1])

source sentence is: I love you
translated sentence is: ik hou van je


## 5. Evaluate the translation using BLEU score

Reference: 
- https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
- https://en.wikipedia.org/wiki/BLEU


**Hint:** 

- Randomly partition the dataset into training, validation, and test sets.

- Evaluate the BLEU score using the test set. Report the average.

- A reasonable BLEU score should be 0.1 ~ 0.5.
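
To make the metric concrete, here is a pure-Python sketch of sentence-level BLEU for a single reference, with uniform weights over 1- to 4-grams and no smoothing. `simple_bleu` is a hypothetical helper written for illustration; it is not the NLTK implementation, which additionally supports multiple references, custom weights and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(reference, candidate, max_n=4):
    """Single-reference sentence BLEU with uniform weights and no smoothing."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        total = sum(cand.values())
        # clipped matches: an n-gram counts at most as often as it occurs in the reference
        matches = sum(min(c, ref[g]) for g, c in cand.items())
        if total == 0 or matches == 0:
            return 0.0
        log_sum += math.log(matches / total) / max_n
    # brevity penalty for candidates shorter than the reference
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_sum)

ref = 'van wie heb je dit gekregen'.split()
cand = 'van wie heb je dit gebroken'.split()
print(round(simple_bleu(ref, cand), 6))   # 0.759836
```

For this pair the n-gram precisions are 5/6, 4/5, 3/4 and 2/3, whose geometric mean is $(1/3)^{1/4} \approx 0.7598$. A single wrong word in a short sentence can drive the score to 0, because it may leave no matching 4-gram at all.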

In [56]:
from nltk.translate.bleu_score import sentence_bleu


# Test sentences

#print("Printing The first 10 Test Samples:\n")
#for i in range(10):
    #print('[' + Test_pairs[i, 0] + '] => [' + Test_pairs[i, 1] + ']')

MyBleuScore = 0.0

for i in range(TestSize):
    weights=(0.25, 0.25, 0.25, 0.25)
    reference = [Test_pairs[i, 1].split(' ')]
    
    #We do the translation here
    translated_sentence = testThething(Test_pairs[i, 0])            
    candidate = translated_sentence[0:-1].split(' ')#so we can drop the \n
    
    #Borrowed section on dealing with short grams from https://github.com/nltk/nltk/issues/1554
    if len(candidate)<4:
        weights = ( 1 / (len(candidate)) ,) * (len(candidate))
        LastBleuScore = sentence_bleu(reference, candidate,weights)
    else:
        LastBleuScore = sentence_bleu(reference, candidate)
    
    print("Compared::",Test_pairs[i, 1], "::to::", translated_sentence, "::BLEU Score = %f" %LastBleuScore)
    MyBleuScore += LastBleuScore

MyBleuScore = MyBleuScore/TestSize

print("The Avaerage BLEU Score for the test set is: %f" %MyBleuScore )

Compared:: tom loopt rond ::to:: tom wordt aan het lieften
 ::BLEU Score = 0.000000
Compared:: hoe laat is het nu ::to:: hoe laat is het nu
 ::BLEU Score = 1.000000
Compared:: ik haat mijn zus ::to:: ik haat mijn suisen
 ::BLEU Score = 0.000000
Compared:: geef me een sinaasappel ::to:: geef me een fans
 ::BLEU Score = 0.000000
Compared:: van wie heb je dit gekregen ::to:: van wie heb je dit gebroken
 ::BLEU Score = 0.759836
Compared:: je rijdt te snel ::to:: je dertreken te schrijven
 ::BLEU Score = 0.000000
Compared:: ik ben wie ik ben ::to:: ik ben in ik ook
 ::BLEU Score = 0.000000
Compared:: ik denk dat tom een student is ::to:: ik denk dat tom een stima is
 ::BLEU Score = 0.643459
Compared:: tom is een vriendelijk persoon ::to:: tom is een vriendelijk persoon
 ::BLEU Score = 1.000000
Compared:: ze wonen aan de overkant van de rivier ::to:: zij wonen dicht bij de rivier
 ::BLEU Score = 0.000000
Compared:: waarom is de hemel blauw ::to:: waarom is de bus bestugeng
 ::BLEU Score = 0.

Compared:: ik geloof stellig daarin ::to:: ik heb dat gelekend dat gelogen
 ::BLEU Score = 0.000000
Compared:: ik wil iets zoets ::to:: ik wil iets te enee
 ::BLEU Score = 0.000000
Compared:: waar is er tandpasta ::to:: waar kan ik niets voorstellen
 ::BLEU Score = 0.000000
Compared:: ik vertrouw niemand hier ::to:: ik vertrouw hier niemand
 ::BLEU Score = 0.000000
Compared:: begrijp je wat ik wil zeggen ::to:: vergeten jullie dat ik wie
 ::BLEU Score = 0.000000
Compared:: wat is uw bloedgroep ::to:: wat is jullie bliddoom
 ::BLEU Score = 0.000000
Compared:: ik denk dat tom zal winnen ::to:: ik denk dat tom niet gaan weggen
 ::BLEU Score = 0.411134
The Avaerage BLEU Score for the test set is: 0.130349
