# Home 5: Build a seq2seq model for machine translation.

### Name: [Huayi Xu]

### Task: Translate English to [French]

## 0. You will do the following:

1. Read and run my code.
2. Complete the code in Section 1.1 and Section 4.2.

    * Translating **English** to **German** is not acceptable! Choose another pair of languages.
    
3. **Make improvements.** Directly modify the code in Section 3. Do at least one of the following two. Doing both correctly earns up to 1 bonus point toward the total.

    * Bi-LSTM instead of LSTM.
        
    * Attention. (You are allowed to use existing code.)
    
4. Evaluate the translation using the BLEU score. 

    * Optional. Up to 1 bonus point toward the total.
    
5. Convert the notebook to an .HTML file. 

    * The HTML file must contain the code and the output after execution.

6. Put the .HTML file in your Google Drive, Dropbox, or Github repo. (If you submit the file via Google Drive or Dropbox, you must make the file "open-access". A delay caused by denied access may result in a late penalty.)

7. Submit the link to the HTML file to Canvas.    


## 1. Data preparation

1. Download data (e.g., "deu-eng.zip") from http://www.manythings.org/anki/
2. Unzip the .ZIP file.
3. Put the .TXT file (e.g., "deu.txt") in the directory "./Data/".

### 1.1. Load and clean text


In [1]:
import re
import string
from unicodedata import normalize
import numpy

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, mode='rt', encoding='utf-8')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text


# split a loaded document into sentences
def to_pairs(doc):
    lines = doc.strip().split('\n')
    pairs = [line.split('\t') for line in  lines]
    return pairs

def clean_data(lines):
    cleaned = list()
    # prepare regex for char filtering
    re_print = re.compile('[^%s]' % re.escape(string.printable))
    # prepare translation table for removing punctuation
    table = str.maketrans('', '', string.punctuation)
    for pair in lines:
        clean_pair = list()
        for line in pair:
            # normalize unicode characters
            line = normalize('NFD', line).encode('ascii', 'ignore')
            line = line.decode('UTF-8')
            # tokenize on white space
            line = line.split()
            # convert to lowercase
            line = [word.lower() for word in line]
            # remove punctuation from each token
            line = [word.translate(table) for word in line]
            # remove non-printable chars from each token
            line = [re_print.sub('', w) for w in line]
            # remove tokens with numbers in them
            line = [word for word in line if word.isalpha()]
            # store as string
            clean_pair.append(' '.join(line))
        cleaned.append(clean_pair)
    return numpy.array(cleaned)

#### Fill the following blanks:

In [2]:
# e.g., filename = 'Data/deu.txt'
filename = 'Data/fra.txt'

# e.g., n_train = 20000
n_train = 20000

In [3]:
# load dataset
doc = load_doc(filename)

# split into Language1-Language2 pairs
pairs = to_pairs(doc)

# clean sentences
clean_pairs = clean_data(pairs)[0:n_train, :]

In [4]:
for i in range(3000, 3010):
    print('[' + clean_pairs[i, 0] + '] => [' + clean_pairs[i, 1] + ']')

[i messed up] => [jai foire]
[i messed up] => [jai degueulasse]
[i messed up] => [jai sali]
[i messed up] => [jai mis la pagaille]
[i messed up] => [jai mis le souk]
[i must hide] => [je dois me cacher]
[i must obey] => [il me faut obeir]
[i must obey] => [je dois obeir]
[i nailed it] => [cest dans la poche]
[i nailed it] => [je lai cloue]
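
The printed pairs have lost their accents and apostrophes (for example "jai" rather than "j'ai"). The following is a minimal sketch of the same cleaning steps applied to a single sentence, which may make that effect easier to trace. `clean_sentence` is a hypothetical helper name that does not appear in the assignment code; it only mirrors the steps of `clean_data`.

```python
import re
import string
from unicodedata import normalize

def clean_sentence(sentence):
    # hypothetical helper: the same steps as clean_data, for one string
    re_print = re.compile('[^%s]' % re.escape(string.printable))
    table = str.maketrans('', '', string.punctuation)
    # decompose accented characters, then drop the non-ASCII combining marks
    sentence = normalize('NFD', sentence).encode('ascii', 'ignore').decode('UTF-8')
    # lowercase each token and strip punctuation (this removes the apostrophe)
    words = [word.lower().translate(table) for word in sentence.split()]
    words = [re_print.sub('', word) for word in words]
    # keep purely alphabetic tokens only
    return ' '.join(word for word in words if word.isalpha())

print(clean_sentence("J'ai foiré !"))  # prints: jai foire
```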


In [5]:
input_texts = clean_pairs[:, 0]
target_texts = ['\t' + text + '\n' for text in clean_pairs[:, 1]]

print('Length of input_texts:  ' + str(input_texts.shape))
print('Length of target_texts: ' + str(numpy.shape(target_texts)))

Length of input_texts:  (20000,)
Length of target_texts: (20000,)


In [6]:
max_encoder_seq_length = max(len(line) for line in input_texts)
max_decoder_seq_length = max(len(line) for line in target_texts)

print('max length of input  sentences: %d' % (max_encoder_seq_length))
print('max length of target sentences: %d' % (max_decoder_seq_length))

max length of input  sentences: 16
max length of target sentences: 56


**Remark:** At this point, you have two lists of sentences: input_texts and target_texts.

## 2. Text processing

### 2.1. Convert texts to sequences

- Input: A list of $n$ sentences (with max length $t$).
- It is represented by a $n\times t$ matrix after the tokenization and zero-padding.

In [7]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# encode and pad sequences
def text2sequences(max_len, lines):
    tokenizer = Tokenizer(char_level=True, filters='')
    tokenizer.fit_on_texts(lines)
    seqs = tokenizer.texts_to_sequences(lines)
    seqs_pad = pad_sequences(seqs, maxlen=max_len, padding='post')
    return seqs_pad, tokenizer.word_index


encoder_input_seq, input_token_index = text2sequences(max_encoder_seq_length, 
                                                      input_texts)
decoder_input_seq, target_token_index = text2sequences(max_decoder_seq_length, 
                                                       target_texts)

print('shape of encoder_input_seq: ' + str(encoder_input_seq.shape))
print('shape of input_token_index: ' + str(len(input_token_index)))
print('shape of decoder_input_seq: ' + str(decoder_input_seq.shape))
print('shape of target_token_index: ' + str(len(target_token_index)))

Using TensorFlow backend.


shape of encoder_input_seq: (20000, 16)
shape of input_token_index: 27
shape of decoder_input_seq: (20000, 56)
shape of target_token_index: 29


In [8]:
num_encoder_tokens = len(input_token_index) + 1
num_decoder_tokens = len(target_token_index) + 1

print('num_encoder_tokens: ' + str(num_encoder_tokens))
print('num_decoder_tokens: ' + str(num_decoder_tokens))

num_encoder_tokens: 28
num_decoder_tokens: 30


**Remark:** At this point, the input language and target language texts have been converted to 2 matrices. 

- Both matrices have n_train rows.
- Their numbers of columns are max_encoder_seq_length and max_decoder_seq_length, respectively.

The following cells print a sentence and its representation as a sequence.

In [9]:
target_texts[100]

'\tje payai\n'

In [10]:
decoder_input_seq[100, :]

array([ 7, 19,  1,  2, 16,  4, 27,  4,  5,  8,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0])
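
To see how a sentence turns into such a zero-padded row, here is a minimal sketch of character-level tokenization and post-padding without Keras. `build_char_index` and `encode_and_pad` are hypothetical helper names, and the index assignment (more frequent characters get smaller indices, 0 is reserved for padding) only imitates the idea; the exact indices from the Keras `Tokenizer` may differ.

```python
from collections import Counter
import numpy

def build_char_index(lines):
    # count every character; index 0 stays reserved for padding
    counts = Counter(ch for line in lines for ch in line)
    return {ch: i + 1 for i, (ch, _) in enumerate(counts.most_common())}

def encode_and_pad(lines, char_index, max_len):
    # one row per sentence, zeros appended on the right ("post" padding)
    seqs = numpy.zeros((len(lines), max_len), dtype='int64')
    for row, line in enumerate(lines):
        ids = [char_index[ch] for ch in line][:max_len]
        seqs[row, :len(ids)] = ids
    return seqs

toy_lines = ['\tje payai\n', '\tsalut\n']
toy_index = build_char_index(toy_lines)
toy_seqs = encode_and_pad(toy_lines, toy_index, max_len=12)
print(toy_seqs.shape)  # (2, 12)
print(toy_seqs)
```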

### 2.2. One-hot encode

- Input: A list of $n$ sentences (with max length $t$).
- It is represented by a $n\times t$ matrix after the tokenization and zero-padding.
- It is represented by a $n\times t \times v$ tensor ($v$ is the number of unique chars) after the one-hot encoding.
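
As a small illustration of the $n\times t \times v$ shape, the sketch below one-hot encodes a toy padded matrix by indexing into an identity matrix. `onehot_with_eye` is a hypothetical helper, not the function used in this notebook. Note that the padding index 0 also receives a one-hot vector (in column 0).

```python
import numpy

def onehot_with_eye(sequences, vocab_size):
    # row i of the identity matrix is the one-hot vector of token index i
    return numpy.eye(vocab_size)[numpy.asarray(sequences, dtype='int64')]

# n = 2 sentences, t = 4 positions, already zero-padded
toy_seqs = numpy.array([[3, 1, 2, 0],
                        [2, 2, 0, 0]])
toy_onehot = onehot_with_eye(toy_seqs, vocab_size=5)
print(toy_onehot.shape)  # (2, 4, 5)
```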

In [11]:
from keras.utils import to_categorical

# one hot encode target sequence
def onehot_encode(sequences, max_len, vocab_size):
    n = len(sequences)
    data = numpy.zeros((n, max_len, vocab_size))
    for i in range(n):
        data[i, :, :] = to_categorical(sequences[i], num_classes=vocab_size)
    return data

encoder_input_data = onehot_encode(encoder_input_seq, max_encoder_seq_length, num_encoder_tokens)
decoder_input_data = onehot_encode(decoder_input_seq, max_decoder_seq_length, num_decoder_tokens)

decoder_target_seq = numpy.zeros(decoder_input_seq.shape)
decoder_target_seq[:, 0:-1] = decoder_input_seq[:, 1:]
decoder_target_data = onehot_encode(decoder_target_seq, 
                                    max_decoder_seq_length, 
                                    num_decoder_tokens)

print(encoder_input_data.shape)
print(decoder_input_data.shape)

(20000, 16, 28)
(20000, 56, 30)
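
The decoder target is the decoder input shifted one step to the left, so that at every position the label is the next character. A toy sketch of that shift, using the indices shown earlier for "\t" (7), "j" (19), "e" (1), and "\n" (8):

```python
import numpy

# toy decoder input: '\t', 'j', 'e', '\n', followed by zero padding
toy_decoder_input = numpy.array([[7, 19, 1, 8, 0, 0]])

# shift left by one position; the last column stays zero
toy_decoder_target = numpy.zeros(toy_decoder_input.shape, dtype=toy_decoder_input.dtype)
toy_decoder_target[:, 0:-1] = toy_decoder_input[:, 1:]

print(toy_decoder_input[0].tolist())   # [7, 19, 1, 8, 0, 0]
print(toy_decoder_target[0].tolist())  # [19, 1, 8, 0, 0, 0]
```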


## 3. Build the networks (for training)

- Build encoder, decoder, and connect the two modules to get "model". 

- Fit the model on the bilingual data to train the parameters in the encoder and decoder.

# BiLSTM

### 3.1. Encoder network

- Input:  one-hot encoding of the input language

- Return: 

    -- output (all the hidden states   $h_1, \cdots , h_t$), which is always discarded
    
    -- the final hidden state  $h_t$
    
    -- the final conveyor belt $c_t$

In [12]:
from keras.layers import Input, LSTM
from keras.models import Model
from keras.layers import Bidirectional, concatenate

latent_dim = 256

# inputs of the encoder network
encoder_inputs = Input(shape=(None, num_encoder_tokens), name='encoder_inputs')

encoder_bilstm = Bidirectional(LSTM(latent_dim, return_state=True, 
                                  dropout=0.5, name='encoder_bilstm'))
_, forward_h, forward_c, backward_h, backward_c = encoder_bilstm(encoder_inputs)

state_h = concatenate([forward_h, backward_h],axis=1)
state_c = concatenate([forward_c, backward_c],axis=1)
encoder_final_states = [state_h,state_c]

# build the encoder network model
encoder_model = Model(inputs=encoder_inputs, 
                      outputs=encoder_final_states,
                      name='encoder')

In [13]:
encoder_model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_inputs (InputLayer)     (None, None, 28)     0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) [(None, 512), (None, 583680      encoder_inputs[0][0]             
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 512)          0           bidirectional_1[0][1]            
                                                                 bidirectional_1[0][3]            
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 512)          0           bidirectional_1[0][2]            
          

### 3.2. Decoder network

- Inputs:  

    -- one-hot encoding of the target language
    
    -- the initial hidden state (the encoder's final hidden state $h_t$) 
    
    -- the initial conveyor belt (the encoder's final conveyor belt $c_t$) 

- Return: 

    -- output (all the hidden states) $h_1, \cdots , h_t$

    -- the final hidden state  $h_t$ (discarded in the training and used in the prediction)
    
    -- the final conveyor belt $c_t$ (discarded in the training and used in the prediction)

In [14]:
from keras.layers import Input, LSTM, Dense
from keras.models import Model

# inputs of the decoder network
decoder_input_h = Input(shape=(latent_dim*2,), name='decoder_input_h')
decoder_input_c = Input(shape=(latent_dim*2,), name='decoder_input_c')
decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')

# set the LSTM layer
decoder_lstm = LSTM(latent_dim*2, return_sequences=True, 
                    return_state=True, dropout=0.5, name='decoder_lstm')
decoder_lstm_outputs, state_h, state_c = decoder_lstm(decoder_input_x, initial_state=[decoder_input_h, decoder_input_c])

# set the dense layer
decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_lstm_outputs)

# build the decoder network model
decoder_model = Model(inputs=[decoder_input_x, decoder_input_h, decoder_input_c],
                      outputs=[decoder_outputs, state_h, state_c],
                      name='decoder')

In [15]:
decoder_model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
decoder_input_x (InputLayer)    (None, None, 30)     0                                            
__________________________________________________________________________________________________
decoder_input_h (InputLayer)    (None, 512)          0                                            
__________________________________________________________________________________________________
decoder_input_c (InputLayer)    (None, 512)          0                                            
__________________________________________________________________________________________________
decoder_lstm (LSTM)             [(None, None, 512),  1112064     decoder_input_x[0][0]            
                                                                 decoder_input_h[0][0]            
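
The "Param #" values in the encoder and decoder summaries can be checked by hand. An LSTM has 4 gates, each with an input kernel, a recurrent kernel, and a bias. The sketch below (with a hypothetical helper `lstm_param_count`) reproduces both numbers from the sizes used in this notebook.

```python
def lstm_param_count(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias
    return 4 * (units * (input_dim + units) + units)

# encoder: two LSTMs (forward and backward), 28 input tokens, 256 units each
encoder_params = 2 * lstm_param_count(input_dim=28, units=256)
# decoder: one LSTM, 30 input tokens, 512 units
decoder_params = lstm_param_count(input_dim=30, units=512)

print(encoder_params, decoder_params)  # 583680 1112064
```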
          

### 3.3. Connect the encoder and decoder

In [16]:
decoder_input_x = Input(shape=(None, num_decoder_tokens), name='decoder_input_x')

# connect encoder to decoder
decoder_lstm_output, _, _ = decoder_lstm(decoder_input_x, initial_state=encoder_final_states)
decoder_pred = decoder_dense(decoder_lstm_output)

model = Model(inputs=[encoder_inputs, decoder_input_x], 
              outputs=decoder_pred, 
              name='model_training')

In [17]:
print(state_h)
print(decoder_input_h)

Tensor("decoder_lstm/while/Exit_2:0", shape=(?, 512), dtype=float32)
Tensor("decoder_input_h:0", shape=(?, 512), dtype=float32)


In [18]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_inputs (InputLayer)     (None, None, 28)     0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) [(None, 512), (None, 583680      encoder_inputs[0][0]             
__________________________________________________________________________________________________
decoder_input_x (InputLayer)    (None, None, 30)     0                                            
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 512)          0           bidirectional_1[0][1]            
                                                                 bidirectional_1[0][3]            
__________

### 3.5. Fit the model on the bilingual dataset

- encoder_input_data: one-hot encoding of the input language

- decoder_input_data: one-hot encoding of the target language

- decoder_target_data: labels (left shift of decoder_input_data)

- tune the hyper-parameters

- stop when the validation loss stops decreasing.
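
The stopping rule in the last bullet can be sketched in plain Python. This is only an illustration of the logic with made-up validation losses; `early_stopping_epoch` and the `patience` value are assumptions, not part of the assignment code. In Keras, the `EarlyStopping` callback (with its `monitor` and `patience` arguments) serves the same purpose.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has not improved
    for `patience` consecutive epochs.
    """
    best_loss = float('inf')
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch

# made-up validation losses, for illustration only
toy_val_losses = [1.20, 0.95, 0.80, 0.78, 0.79, 0.81, 0.80, 0.85]
print(early_stopping_epoch(toy_val_losses, patience=3))  # (7, 4)
```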

In [20]:
print('shape of encoder_input_data' + str(encoder_input_data.shape))
print('shape of decoder_input_data' + str(decoder_input_data.shape))
print('shape of decoder_target_data' + str(decoder_target_data.shape))

shape of encoder_input_data(20000, 16, 28)
shape of decoder_input_data(20000, 56, 30)
shape of decoder_target_data(20000, 56, 30)


In [21]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.save_weights('model_pretrain.h5')

In [20]:
model.load_weights('model_pretrain.h5')
model.fit([encoder_input_data, decoder_input_data],  # training data
          decoder_target_data,                       # labels (left shift of the target sequences)
          batch_size=64, epochs=29, validation_split=0.2)

model.save('seq2seq.h5')

Train on 16000 samples, validate on 4000 samples
Epoch 1/29
Epoch 2/29
Epoch 3/29
Epoch 4/29
Epoch 5/29
Epoch 6/29
Epoch 7/29
Epoch 8/29
Epoch 9/29
Epoch 10/29
Epoch 11/29
Epoch 12/29
Epoch 13/29
Epoch 14/29
Epoch 15/29
Epoch 16/29
Epoch 17/29
Epoch 18/29
Epoch 19/29
Epoch 20/29
Epoch 21/29
Epoch 22/29
Epoch 23/29
Epoch 24/29
Epoch 25/29
Epoch 26/29
Epoch 27/29
Epoch 28/29
Epoch 29/29




# BiLSTM + Attention

In [41]:
import tensorflow as tf
import os
from tensorflow.python.keras.layers import Layer
from tensorflow.python.keras import backend as K


class AttentionLayer(Layer):
    """
    This class implements Bahdanau attention (https://arxiv.org/pdf/1409.0473.pdf).
    There are three sets of weights introduced W_a, U_a, and V_a
     """

    def __init__(self, **kwargs):
        super(AttentionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert isinstance(input_shape, list)
        # Create a trainable weight variable for this layer.

        self.W_a = self.add_weight(name='W_a',
                                   shape=tf.TensorShape((input_shape[0][2], input_shape[0][2])),
                                   initializer='uniform',
                                   trainable=True)
        self.U_a = self.add_weight(name='U_a',
                                   shape=tf.TensorShape((input_shape[1][2], input_shape[0][2])),
                                   initializer='uniform',
                                   trainable=True)
        self.V_a = self.add_weight(name='V_a',
                                   shape=tf.TensorShape((input_shape[0][2], 1)),
                                   initializer='uniform',
                                   trainable=True)

        super(AttentionLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, inputs, verbose=False):
        """
        inputs: [encoder_output_sequence, decoder_output_sequence]
        """
        assert type(inputs) == list
        encoder_out_seq, decoder_out_seq = inputs
        if verbose:
            print('encoder_out_seq>', encoder_out_seq.shape)
            print('decoder_out_seq>', decoder_out_seq.shape)

        def energy_step(inputs, states):
            """ Step function for computing energy for a single decoder state
            inputs: (batchsize * 1 * de_in_dim)
            states: (batchsize * 1 * de_latent_dim)
            """

            assert_msg = "States must be an iterable. Got {} of type {}".format(states, type(states))
            assert isinstance(states, list) or isinstance(states, tuple), assert_msg

            """ Some parameters required for shaping tensors"""
            en_seq_len, en_hidden = encoder_out_seq.shape[1], encoder_out_seq.shape[2]
            de_hidden = inputs.shape[-1]

            """ Computing S.Wa where S=[s0, s1, ..., si]"""
            # <= batch size * en_seq_len * latent_dim
            W_a_dot_s = K.dot(encoder_out_seq, self.W_a)

            """ Computing hj.Ua """
            U_a_dot_h = K.expand_dims(K.dot(inputs, self.U_a), 1)  # <= batch_size, 1, latent_dim
            if verbose:
                print('Ua.h>', U_a_dot_h.shape)

            """ tanh(S.Wa + hj.Ua) """
            # <= batch_size*en_seq_len, latent_dim
            Ws_plus_Uh = K.tanh(W_a_dot_s + U_a_dot_h)
            if verbose:
                print('Ws+Uh>', Ws_plus_Uh.shape)

            """ softmax(va.tanh(S.Wa + hj.Ua)) """
            # <= batch_size, en_seq_len
            e_i = K.squeeze(K.dot(Ws_plus_Uh, self.V_a), axis=-1)
            # <= batch_size, en_seq_len
            e_i = K.softmax(e_i)

            if verbose:
                print('ei>', e_i.shape)

            return e_i, [e_i]

        def context_step(inputs, states):
            """ Step function for computing ci using ei """

            assert_msg = "States must be an iterable. Got {} of type {}".format(states, type(states))
            assert isinstance(states, list) or isinstance(states, tuple), assert_msg

            # <= batch_size, hidden_size
            c_i = K.sum(encoder_out_seq * K.expand_dims(inputs, -1), axis=1)
            if verbose:
                print('ci>', c_i.shape)
            return c_i, [c_i]

        # initial states required by K.rnn; only their shapes matter
        fake_state_c = K.sum(encoder_out_seq, axis=1)  # <= (batch_size, latent_dim)
        fake_state_e = K.sum(encoder_out_seq, axis=2)  # <= (batch_size, enc_seq_len)

        """ Computing energy outputs """
        # e_outputs => (batch_size, de_seq_len, en_seq_len)
        last_out, e_outputs, _ = K.rnn(
            energy_step, decoder_out_seq, [fake_state_e],
        )

        """ Computing context vectors """
        last_out, c_outputs, _ = K.rnn(
            context_step, e_outputs, [fake_state_c],
        )

        return c_outputs, e_outputs

    def compute_output_shape(self, input_shape):
        """ Outputs produced by the layer """
        return [
            tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[1][2])),
            tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
        ]
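
For intuition, the computation inside the layer can be sketched for a single decoder step in NumPy: an energy per encoder position, a softmax over positions, and a weighted sum of the encoder states. This is a simplified illustration with random toy weights, not a replacement for the layer; `additive_attention_step` is a hypothetical name and `v_a` is a vector here rather than a column matrix.

```python
import numpy

def additive_attention_step(encoder_states, decoder_state, W_a, U_a, v_a):
    # encoder_states: (en_seq_len, hidden), decoder_state: (hidden,)
    # energy of each encoder position: v_a . tanh(s W_a + h U_a)
    energies = numpy.tanh(encoder_states @ W_a + decoder_state @ U_a) @ v_a
    energies = energies - energies.max()          # for numerical stability
    weights = numpy.exp(energies) / numpy.exp(energies).sum()
    # context vector: attention-weighted sum of the encoder states
    context = weights @ encoder_states            # (hidden,)
    return context, weights

rng = numpy.random.default_rng(0)
hidden, en_seq_len = 8, 5                         # toy sizes
encoder_states = rng.normal(size=(en_seq_len, hidden))
decoder_state = rng.normal(size=hidden)
W_a = rng.normal(size=(hidden, hidden))
U_a = rng.normal(size=(hidden, hidden))
v_a = rng.normal(size=hidden)

context, weights = additive_attention_step(encoder_states, decoder_state, W_a, U_a, v_a)
print(weights.shape, context.shape)  # (5,) (8,)
```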

In [None]:
from keras.layers import Input, LSTM, Dense
from keras.models import Model
from keras.layers import Bidirectional, concatenate, Concatenate

latent_dim = 256

# inputs of the encoder network
encoder_inputs = Input(shape=(None, num_encoder_tokens), name='encoder_inputs')

# return_sequences=True: the attention layer needs every encoder hidden state,
# not only the last one
encoder_bilstm = Bidirectional(LSTM(latent_dim, return_sequences=True, return_state=True,
                                    dropout=0.5, name='encoder_bilstm'))

encoder_outputs, forward_h, forward_c, backward_h, backward_c = encoder_bilstm(encoder_inputs)

# concatenate the forward and backward final states to initialize the decoder
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
encoder_final_states = [state_h, state_c]

decoder_input = Input(shape=(None, num_decoder_tokens), name='decoder_input')

decoder_lstm = LSTM(latent_dim*2, return_sequences=True, return_state=True, dropout=0.5, name='decoder_lstm')

decoder_lstm_outputs, state_h, state_c = decoder_lstm(decoder_input, initial_state=encoder_final_states)

# attention over the encoder outputs: one context vector per decoder step
attn_layer = AttentionLayer()
attn_op, attn_state = attn_layer([encoder_outputs, decoder_lstm_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_lstm_outputs, attn_op])

decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_concat_input)

# the training model maps [encoder input, decoder input] to the attention-based prediction
model = Model(inputs=[encoder_inputs, decoder_input], 
              outputs=decoder_outputs, 
              name='model_training')

## 4. Make predictions


### 4.1. Translate English to XXX

1. The encoder reads a sentence (source language) and outputs its final states, $h_t$ and $c_t$.
2. Take the [start] sign "\t" and the final states $h_t$ and $c_t$ as input and run the decoder.
3. Get the new states and the predicted probability distribution.
4. Sample a char from the predicted probability distribution.
5. Take the sampled char and the new states as input and repeat the process (stop when the [stop] sign "\n" is reached).
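
Step 4 can be done greedily (argmax) or by sampling. Below is a minimal sketch of multinomial sampling with a temperature, written in NumPy. `sample_with_temperature` is a hypothetical helper and the probabilities are made up; a temperature below 1 sharpens the distribution toward the argmax, and a temperature above 1 flattens it.

```python
import numpy

def sample_with_temperature(probs, temperature=1.0, rng=None):
    rng = numpy.random.default_rng() if rng is None else rng
    probs = numpy.asarray(probs, dtype='float64')
    # rescale the log-probabilities, then renormalize with a stable softmax
    logits = numpy.log(probs + 1e-12) / temperature
    logits = logits - logits.max()
    scaled = numpy.exp(logits)
    scaled = scaled / scaled.sum()
    # draw one token index from the rescaled distribution
    return int(rng.choice(len(scaled), p=scaled))

toy_probs = [0.1, 0.7, 0.2]   # made-up softmax output over 3 tokens
rng = numpy.random.default_rng(0)
print([sample_with_temperature(toy_probs, temperature=1.0, rng=rng) for _ in range(10)])
```

In `decode_sequence`, such a function could take the place of the `numpy.argmax` call, applied to `output_tokens[0, -1, :]`.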

In [21]:
# Reverse-lookup token index to decode sequences back to something readable.
reverse_input_char_index = dict((i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict((i, char) for char, i in target_token_index.items())

In [22]:
def decode_sequence(input_seq):
    states_value = encoder_model.predict(input_seq)
    
    target_seq = numpy.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index['\t']] = 1.

    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # this line of code is greedy selection
        # try to use multinomial sampling instead (with temperature)
        sampled_token_index = numpy.argmax(output_tokens[0, -1, :])
        
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        target_seq = numpy.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        states_value = [h, c]

    return decoded_sentence


In [29]:
for seq_index in range(10100, 10120):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('English:       ', input_texts[seq_index])
    print('French (true): ', target_texts[seq_index][1:-1])
    print('French (pred): ', decoded_sentence[0:-1])


-
English:        i want to swim
French (true):  je veux nager
French (pred):  je veux des ander
-
English:        i want to talk
French (true):  je veux parler
French (pred):  je veux parler
-
English:        i want to talk
French (true):  je veux discuter
French (pred):  je veux parler
-
English:        i want to wait
French (true):  je veux attendre
French (pred):  je veux des atiles
-
English:        i want to walk
French (true):  je veux marcher
French (pred):  je veux parler
-
English:        i want to work
French (true):  je veux travailler
French (pred):  je veux parler
-
English:        i wanted to go
French (true):  je voulais y aller
French (pred):  je voulais partir
-
English:        i wanted to go
French (true):  je voulais partir
French (pred):  je voulais partir
-
English:        i was a doctor
French (true):  jetais medecin
French (pred):  jetais malheureux
-
English:        i was a member
French (true):  jetais membre
French (pred):  jetais malheureux
-
English:       

### 4.2. Translate an English sentence to the target language

1. Tokenization
2. One-hot encode
3. Translate
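
A direct lookup such as `input_token_index[char]` raises a `KeyError` for any character the tokenizer never saw (uppercase letters, punctuation). The sketch below shows one possible way to make steps 1 and 2 more forgiving by lowercasing and skipping unknown characters. `encode_sentence` and the toy vocabulary are hypothetical and only illustrate the idea.

```python
import numpy

def encode_sentence(sentence, token_index, max_len):
    # lowercase, then skip characters that are not in the vocabulary
    ids = [token_index[ch] for ch in sentence.lower() if ch in token_index][:max_len]
    # zero-pad on the right
    padded = numpy.zeros((1, max_len), dtype='int64')
    padded[0, :len(ids)] = ids
    # one-hot encode; index 0 is the padding token
    onehot = numpy.eye(len(token_index) + 1)[padded]
    return padded, onehot

# hypothetical toy vocabulary, for illustration only
toy_index = {' ': 1, 'i': 2, 'l': 3, 'o': 4, 'v': 5, 'e': 6, 'y': 7, 'u': 8}
padded, onehot = encode_sentence('I love you!', toy_index, max_len=16)
print(padded)
print(onehot.shape)  # (1, 16, 9)
```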

In [28]:
input_sentence = 'i love you'

input_sequence = []
for char in input_sentence:
    input_sequence.append(input_token_index[char])

input_sequence = [input_sequence]
seqs_pad = pad_sequences(input_sequence, maxlen=max_encoder_seq_length, padding='post')
print(seqs_pad)

input_x = onehot_encode(seqs_pad, max_encoder_seq_length, num_encoder_tokens)
print(input_x.shape)

translated_sentence = decode_sequence(input_x)

print('source sentence is: ' + input_sentence)
print('translated sentence is: ' + translated_sentence)

[[ 5  1 11  4 23  2  1 15  4 14  0  0  0  0  0  0]]
(1, 16, 28)
source sentence is: i love you
translated sentence is: je taime



## 5. Evaluate the translation using BLEU score

Reference: 
- https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
- https://en.wikipedia.org/wiki/BLEU


**Hint:** 

- Randomly partition the dataset into training, validation, and test sets. 

- Evaluate the BLEU score using the test set. Report the average.

- A reasonable BLEU score should be 0.1 ~ 0.5.
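
Short sentences often share no 3-gram or 4-gram with the reference, which drives unsmoothed sentence-level BLEU to (almost) zero. The sketch below is a self-contained, single-reference BLEU with a simple add-one smoothing for orders above 1. It is an illustration under that assumption, not the NLTK implementation; NLTK's `SmoothingFunction` offers several smoothing variants and may give different numbers.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def smoothed_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU (single reference), add-one smoothing for n > 1."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped matches: an n-gram counts at most as often as in the reference
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:
            matches, total = matches + 1, total + 1   # add-one smoothing
        if matches == 0:
            return 0.0                                # no unigram overlap at all
        log_precisions.append(math.log(matches / total))
    # brevity penalty for hypotheses shorter than the reference
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / max_n)

print(smoothed_bleu(['tom', 'est', 'blesse'], ['tom', 'est', 'pret']))
```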

In [33]:
import numpy
rand_indices = numpy.random.permutation(20000)
train_indices = rand_indices[0:16000]
test_indices = rand_indices[16000:20000]

input_train = input_texts[train_indices]
input_test = input_texts[test_indices]

target_train = numpy.asarray(target_texts)[train_indices]
target_test = numpy.asarray(target_texts)[test_indices]
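
The split above produces training and test sets only. If a separate validation set is also wanted, a three-way partition could be sketched as follows. `three_way_split` and the set sizes are assumptions chosen for illustration.

```python
import numpy

def three_way_split(n, n_valid, n_test, seed=0):
    # shuffle all indices once, then cut the permutation into three disjoint parts
    rng = numpy.random.default_rng(seed)
    indices = rng.permutation(n)
    test = indices[:n_test]
    valid = indices[n_test:n_test + n_valid]
    train = indices[n_test + n_valid:]
    return train, valid, test

train_idx, valid_idx, test_idx = three_way_split(20000, n_valid=2000, n_test=2000)
print(len(train_idx), len(valid_idx), len(test_idx))  # 16000 2000 2000
```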

In [34]:
encoder_input_seq, input_token_index = text2sequences(max_encoder_seq_length, input_train)
decoder_input_seq, target_token_index = text2sequences(max_decoder_seq_length, target_train)
encoder_input_data = onehot_encode(encoder_input_seq, max_encoder_seq_length, num_encoder_tokens)
decoder_input_data = onehot_encode(decoder_input_seq, max_decoder_seq_length, num_decoder_tokens)

decoder_target_seq = numpy.zeros(decoder_input_seq.shape)
decoder_target_seq[:, 0:-1] = decoder_input_seq[:, 1:]
decoder_target_data = onehot_encode(decoder_target_seq, max_decoder_seq_length, num_decoder_tokens)

print(encoder_input_data.shape)
print(decoder_input_data.shape)

(16000, 16, 28)
(16000, 56, 30)


In [25]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.load_weights('model_pretrain.h5')
model.fit([encoder_input_data, decoder_input_data],
          decoder_target_data,                       
          batch_size=64, epochs=29)

model.save('seq2seq_new.h5')

Epoch 1/29
Epoch 2/29
Epoch 3/29
Epoch 4/29
Epoch 5/29
Epoch 6/29
Epoch 7/29
Epoch 8/29
Epoch 9/29
Epoch 10/29
Epoch 11/29
Epoch 12/29
Epoch 13/29
Epoch 14/29
Epoch 15/29
Epoch 16/29
Epoch 17/29
Epoch 18/29
Epoch 19/29
Epoch 20/29
Epoch 21/29
Epoch 22/29
Epoch 23/29
Epoch 24/29
Epoch 25/29
Epoch 26/29
Epoch 27/29
Epoch 28/29
Epoch 29/29




In [35]:
reverse_input_char_index = dict((i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict((i, char) for char, i in target_token_index.items())


In [40]:
from nltk.translate.bleu_score import sentence_bleu

def BLEU(reference, translation):
    translation_words = str(translation).split()
    reference_words = str(reference).split()
    print("The reference is ", reference_words)
    print("The translation is ", translation_words)
    score = sentence_bleu([reference_words], translation_words)
    return score


sentence_idx = 0
scores_list = []
for input_sentence in input_test[0:1000]:
    input_sequence = []

    for char in input_sentence:
        input_sequence.append(input_token_index[char])
    input_sequence = [input_sequence]
    input_sequence = numpy.array(input_sequence).reshape(1, len(input_sequence[0]))
    seqs_pad = pad_sequences(input_sequence, maxlen=max_encoder_seq_length, padding='post')
    input_x = onehot_encode(seqs_pad, max_encoder_seq_length, num_encoder_tokens)

    translated_sentence = decode_sequence(input_x)

    print('\n' + 'English sentence is: ' + input_sentence)

    score = BLEU(target_test[sentence_idx], translated_sentence)
    scores_list.append(score)
    sentence_idx += 1
    print("Sentence number ", sentence_idx, "has a score of ", score)


print("the average of the bleu scores is:", numpy.mean(scores_list))


English sentence is: im between jobs
The reference is  ['je', 'suis', 'entre', 'deux', 'postes']
The translation is  ['je', 'suis', 'encore', 'ici']
Sentence number  1 has a score of  7.422680762211792e-155

English sentence is: i cant sing
The reference is  ['je', 'suis', 'incapable', 'de', 'chanter']
The translation is  ['je', 'ne', 'peux', 'pas', 'magnter']
Sentence number  2 has a score of  1.2183324802375697e-231

English sentence is: tom is hurt
The reference is  ['tom', 'est', 'blesse']
The translation is  ['tom', 'est', 'pret']
Sentence number  3 has a score of  1.133422688662942e-154

English sentence is: ill come by
The reference is  ['je', 'viendrai', 'a', 'dix', 'heures']
The translation is  ['je', 'vais', 'prendre', 'ma', 'chercher']
Sentence number  4 has a score of  1.2183324802375697e-231


The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()



English sentence is: its just a game
The reference is  ['ce', 'nest', 'rien', 'quun', 'jeu']
The translation is  ['ce', 'nest', 'que', 'de', 'sager']
Sentence number  5 has a score of  8.38826642100846e-155

English sentence is: the door opened
The reference is  ['la', 'porte', 'sest', 'ouverte']
The translation is  ['le', 'piege', 'est', 'normale']
Sentence number  6 has a score of  0

English sentence is: sing with us
The reference is  ['chante', 'avec', 'nous']
The translation is  ['de', 'quitte', 'pas', 'confiance']
Sentence number  7 has a score of  0

English sentence is: tom was sincere
The reference is  ['tom', 'etait', 'direct']
The translation is  ['tom', 'etait', 'nerveux']
Sentence number  8 has a score of  1.133422688662942e-154

English sentence is: tom kept reading
The reference is  ['tom', 'continua', 'de', 'lire']
The translation is  ['tom', 'continua', 'a', 'courir']
Sentence number  9 has a score of  9.53091075863908e-155

English sentence is: get going
The referenc


English sentence is: tom stopped it
The reference is  ['tom', 'la', 'arretee']
The translation is  ['tom', 'a', 'tire', 'marie']
Sentence number  52 has a score of  1.2882297539194154e-231

English sentence is: do you feel sick
The reference is  ['te', 'senstu', 'malade']
The translation is  ['avezvous', 'des', 'bonnes']
Sentence number  53 has a score of  0

English sentence is: now its my turn
The reference is  ['cest', 'mon', 'tour', 'maintenant']
The translation is  ['de', 'manger', 'ne', 'sortire', 'pas', 'de', 'marcher']
Sentence number  54 has a score of  0

English sentence is: youre not dead
The reference is  ['tu', 'nes', 'pas', 'mort']
The translation is  ['vous', 'netes', 'pas', 'mort']
Sentence number  55 has a score of  9.53091075863908e-155

English sentence is: open the doors
The reference is  ['ouvrez', 'les', 'portes']
The translation is  ['ouvre', 'les', 'foits']
Sentence number  56 has a score of  1.384292958842266e-231

English sentence is: tom just called
The ref


English sentence is: let me check
The reference is  ['laissemoi', 'verifier']
The translation is  ['laissemoi', 'voir', 'fait']
Sentence number  99 has a score of  1.384292958842266e-231

English sentence is: whats inside
The reference is  ['quy', 'atil', 'a', 'linterieur']
The translation is  ['questce', 'qui', 'se', 'pourrait']
Sentence number  100 has a score of  0

English sentence is: brush your hair
The reference is  ['brossetoi', 'les', 'cheveux']
The translation is  ['ferme', 'les', 'mers']
Sentence number  101 has a score of  1.384292958842266e-231

English sentence is: we smiled
The reference is  ['nous', 'avons', 'souri']
The translation is  ['nous', 'avons', 'serieux']
Sentence number  102 has a score of  1.133422688662942e-154

English sentence is: tom felt lonely
The reference is  ['tom', 'sest', 'senti', 'seul']
The translation is  ['tom', 'sest', 'senti', 'mal']
Sentence number  103 has a score of  8.636168555094496e-78

English sentence is: did she like it
The referen


English sentence is: have a nice day
The reference is  ['bonne', 'journee']
The translation is  ['faites', 'un', 'bon', 'voyage']
Sentence number  144 has a score of  0

English sentence is: dont stop him
The reference is  ['ne', 'larrete', 'pas']
The translation is  ['ne', 'te', 'pas', 'pas', 'ca']
Sentence number  145 has a score of  1.4488496539373276e-231

English sentence is: who helps her
The reference is  ['qui', 'laide']
The translation is  ['qui', 'la', 'detende']
Sentence number  146 has a score of  1.384292958842266e-231

English sentence is: tom talked
The reference is  ['tom', 'a', 'parle']
The translation is  ['tom', 'a', 'fait', 'une', 'contravante']
Sentence number  147 has a score of  8.38826642100846e-155

English sentence is: tom sat quietly
The reference is  ['tom', 'sassit', 'tranquillement']
The translation is  ['tom', 'est', 'terti']
Sentence number  148 has a score of  1.384292958842266e-231

English sentence is: i looked down
The reference is  ['je', 'baissai'


English sentence is: that works
The reference is  ['ca', 'fonctionne']
The translation is  ['cest', 'portigle']
Sentence number  190 has a score of  0

English sentence is: are you serious
The reference is  ['tu', 'es', 'serieux']
The translation is  ['etesvous', 'serieuses']
Sentence number  191 has a score of  0

English sentence is: i felt scared
The reference is  ['je', 'me', 'sentis', 'apeuree']
The translation is  ['je', 'me', 'suis', 'senti', 'trompe']
Sentence number  192 has a score of  8.38826642100846e-155

English sentence is: im on the roof
The reference is  ['je', 'suis', 'sur', 'le', 'toit']
The translation is  ['me', 'ne', 'sais', 'pas', 'chez', 'lui']
Sentence number  193 has a score of  0

English sentence is: we overslept
The reference is  ['nous', 'navons', 'pas', 'entendu', 'le', 'reveil']
The translation is  ['nous', 'nous', 'sommes', 'sences']
Sentence number  194 has a score of  7.813508425061864e-232

English sentence is: go ahead
The reference is  ['en', 'ava


English sentence is: its cheap
The reference is  ['cest', 'bon', 'marche']
The translation is  ['cest', 'chaud']
Sentence number  236 has a score of  9.291879812217675e-232

English sentence is: shes hot
The reference is  ['elle', 'est', 'chaude']
The translation is  ['elle', 'est', 'la']
Sentence number  237 has a score of  1.133422688662942e-154

English sentence is: tom got beat
The reference is  ['tom', 'a', 'ete', 'battu']
The translation is  ['tom', 'a', 'des', 'servettes']
Sentence number  238 has a score of  9.53091075863908e-155

English sentence is: this one is ours
The reference is  ['celuici', 'est', 'le', 'notre']
The translation is  ['ce', 'sonne', 'est', 'neuve']
Sentence number  239 has a score of  1.2882297539194154e-231

English sentence is: can i go now
The reference is  ['estce', 'que', 'je', 'pourrais', 'y', 'aller', 'maintenant']
The translation is  ['puisje', 'men', 'aller']
Sentence number  240 has a score of  3.648956622645728e-232

English sentence is: the do


English sentence is: toms clueless
The reference is  ['tom', 'est', 'perdu']
The translation is  ['tom', 'est', 'plein', 'de', 'chance']
Sentence number  279 has a score of  8.38826642100846e-155

English sentence is: itll snow today
The reference is  ['il', 'va', 'neiger', 'aujourdhui']
The translation is  ['il', 'va', 'tent', 'frie']
Sentence number  280 has a score of  9.53091075863908e-155

English sentence is: here i am
The reference is  ['me', 'voici']
The translation is  ['voila', 'monche']
Sentence number  281 has a score of  0

English sentence is: ill do it now
The reference is  ['je', 'vais', 'le', 'faire', 'maintenant']
The translation is  ['ouvrezmoi', 'ca']
Sentence number  282 has a score of  0

English sentence is: he came by car
The reference is  ['il', 'est', 'venu', 'en', 'voiture']
The translation is  ['il', 'a', 'fait', 'mon', 'chapeau']
Sentence number  283 has a score of  1.2183324802375697e-231

English sentence is: theyre cute
The reference is  ['ils', 'sont',


English sentence is: he is in trouble
The reference is  ['il', 'a', 'des', 'ennuis']
The translation is  ['il', 'est', 'accore', 'a', 'la', 'maison']
Sentence number  323 has a score of  1.384292958842266e-231

English sentence is: save me a seat
The reference is  ['reservemoi', 'un', 'fauteuil']
The translation is  ['reservemoi', 'une', 'seconde']
Sentence number  324 has a score of  1.384292958842266e-231

English sentence is: mary helped tom
The reference is  ['mary', 'a', 'aide', 'tom']
The translation is  ['mery', 'a', 'la', 'parle']
Sentence number  325 has a score of  1.2882297539194154e-231

English sentence is: the milk is sour
The reference is  ['le', 'lait', 'est', 'aigre']
The translation is  ['le', 'livre', 'est', 'tout', 'a', 'la', 'maison']
Sentence number  326 has a score of  1.331960397810445e-231

English sentence is: thats a joke
The reference is  ['cest', 'une', 'blague']
The translation is  ['cest', 'une', 'blague']
Sentence number  327 has a score of  1.221338669


English sentence is: we made him cry
The reference is  ['nous', 'lavons', 'fait', 'pleurer']
The translation is  ['nous', 'devons', 'arreter']
Sentence number  369 has a score of  9.918892480173173e-232

English sentence is: i read the book
The reference is  ['jai', 'lu', 'le', 'livre']
The translation is  ['jai', 'ett', 'ne', 'la', 'chanson']
Sentence number  370 has a score of  1.2183324802375697e-231

English sentence is: my tv is broken
The reference is  ['mon', 'televiseur', 'est', 'casse']
The translation is  ['man', 'est', 'ma', 'nage']
Sentence number  371 has a score of  1.2882297539194154e-231

English sentence is: they want more
The reference is  ['ils', 'veulent', 'davantage']
The translation is  ['elles', 'veulent', 'la', 'vair']
Sentence number  372 has a score of  1.2882297539194154e-231

English sentence is: i am truly sorry
The reference is  ['je', 'suis', 'vraiment', 'desole']
The translation is  ['je', 'suis', 'tres', 'ficile']
Sentence number  373 has a score of  9


English sentence is: youre patient
The reference is  ['vous', 'etes', 'patient']
The translation is  ['vous', 'etes', 'invitees']
Sentence number  412 has a score of  1.133422688662942e-154

English sentence is: who hired you
The reference is  ['qui', 'ta', 'recrute']
The translation is  ['qui', 'vous', 'a', 'embrassees']
Sentence number  413 has a score of  1.2882297539194154e-231

English sentence is: feel this
The reference is  ['sentez', 'ca']
The translation is  ['continuez', 'ca']
Sentence number  414 has a score of  1.5319719891192393e-231

English sentence is: you must leave
The reference is  ['vous', 'devez', 'partir']
The translation is  ['il', 'faut', 'que', 'je', 'ten', 'pres']
Sentence number  415 has a score of  0

English sentence is: dont kill me
The reference is  ['ne', 'me', 'tue', 'pas']
The translation is  ['ne', 'me', 'taquenez', 'pas']
Sentence number  416 has a score of  1.0547686614863434e-154

English sentence is: get away
The reference is  ['pars', 'dici']
Th


English sentence is: tom is the owner
The reference is  ['tom', 'est', 'le', 'proprietaire']
The translation is  ['tom', 'est', 'le', 'mielleur']
Sentence number  458 has a score of  8.636168555094496e-78

English sentence is: youre mature
The reference is  ['tu', 'es', 'mature']
The translation is  ['vous', 'etes', 'marie']
Sentence number  459 has a score of  0

English sentence is: im in training
The reference is  ['je', 'suis', 'en', 'formation']
The translation is  ['je', 'suis', 'en', 'train', 'de', 'parler']
Sentence number  460 has a score of  5.775353993361614e-78

English sentence is: are you jealous
The reference is  ['estu', 'jaloux']
The translation is  ['etesvous', 'jelouses']
Sentence number  461 has a score of  0

English sentence is: he lied to me
The reference is  ['il', 'ma', 'menti']
The translation is  ['il', 'la', 'laissee', 'tomber']
Sentence number  462 has a score of  1.2882297539194154e-231

English sentence is: take mine
The reference is  ['prenez', 'la', 'm


English sentence is: are you drinking
The reference is  ['etesvous', 'en', 'train', 'de', 'boire']
The translation is  ['etesvous', 'jelouses']
Sentence number  504 has a score of  3.418291552750845e-232

English sentence is: im nearly blind
The reference is  ['je', 'suis', 'presque', 'aveugle']
The translation is  ['me', 'suis', 'plutot', 'content']
Sentence number  505 has a score of  1.2882297539194154e-231

English sentence is: are we all here
The reference is  ['sommesnous', 'tous', 'la']
The translation is  ['sommesnous', 'tous', 'la']
Sentence number  506 has a score of  1.2213386697554703e-77

English sentence is: im amused
The reference is  ['je', 'suis', 'amusee']
The translation is  ['je', 'suis', 'confus']
Sentence number  507 has a score of  1.133422688662942e-154

English sentence is: tom chose well
The reference is  ['tom', 'a', 'fait', 'un', 'bon', 'choix']
The translation is  ['tom', 'a', 'des', 'premestionne']
Sentence number  508 has a score of  5.780789590099596e-1


English sentence is: i nearly starved
The reference is  ['je', 'suis', 'presque', 'mort', 'de', 'faim']
The translation is  ['je', 'prefere', 'le', 'francais']
Sentence number  551 has a score of  7.813508425061864e-232

English sentence is: lets do our job
The reference is  ['laisseznous', 'faire', 'notre', 'boulot']
The translation is  ['vontenons', 'ton', 'chapeau']
Sentence number  552 has a score of  0

English sentence is: i talk to myself
The reference is  ['je', 'parle', 'tout', 'seul']
The translation is  ['je', 'lai', 'parle', 'ma', 'nager']
Sentence number  553 has a score of  1.4488496539373276e-231

English sentence is: we can try
The reference is  ['nous', 'pouvons', 'tenter']
The translation is  ['nous', 'pouvons', 'y', 'aller']
Sentence number  554 has a score of  9.53091075863908e-155

English sentence is: im finished
The reference is  ['jen', 'ai', 'termine']
The translation is  ['me', 'ne', 'fais', 'pas', 'de', 'chance']
Sentence number  555 has a score of  0

Engli


English sentence is: im humble
The reference is  ['je', 'suis', 'humble']
The translation is  ['je', 'suis', 'confus']
Sentence number  596 has a score of  1.133422688662942e-154

English sentence is: ask him
The reference is  ['demandelui']
The translation is  ['demandez', 'a', 'tom']
Sentence number  597 has a score of  0

English sentence is: i volunteered
The reference is  ['je', 'me', 'suis', 'porte', 'volontaire']
The translation is  ['jai', 'compris']
Sentence number  598 has a score of  0

English sentence is: it gets worse
The reference is  ['ca', 'empire']
The translation is  ['ca', 'a', 'des', 'enfants']
Sentence number  599 has a score of  1.2882297539194154e-231

English sentence is: check these out
The reference is  ['regardez', 'ceuxci']
The translation is  ['verifie', 'ceci']
Sentence number  600 has a score of  0

English sentence is: just who are you
The reference is  ['qui', 'estu', 'exactement']
The translation is  ['contentetoi', 'de', 'confiance', 'a', 'toi']
Sen

The translation is  ['agardez', 'ca']
Sentence number  640 has a score of  1.5319719891192393e-231

English sentence is: thanks so much
The reference is  ['mille', 'mercis']
The translation is  ['merci', 'bien', 'me', 'contratie']
Sentence number  641 has a score of  0

English sentence is: is that so bad
The reference is  ['estce', 'si', 'mauvais']
The translation is  ['estce', 'le', 'mien']
Sentence number  642 has a score of  1.384292958842266e-231

English sentence is: come over
The reference is  ['venez', 'ici']
The translation is  ['viens', 'chez', 'nous']
Sentence number  643 has a score of  0

English sentence is: were trying
The reference is  ['nous', 'essayons']
The translation is  ['nous', 'sommes', 'connains']
Sentence number  644 has a score of  1.384292958842266e-231

English sentence is: i hate mondays
The reference is  ['je', 'deteste', 'les', 'lundis']
The translation is  ['je', 'deteste', 'les', 'armessantes']
Sentence number  645 has a score of  8.636168555094496e-78


English sentence is: it was ok
The reference is  ['cetait', 'correct']
The translation is  ['cetait', 'singu']
Sentence number  684 has a score of  1.5319719891192393e-231

English sentence is: is the bank open
The reference is  ['estce', 'que', 'la', 'banque', 'est', 'ouverte']
The translation is  ['estce', 'que', 'le', 'songe', 'bien']
Sentence number  685 has a score of  6.867731683891005e-155

English sentence is: crime is down
The reference is  ['la', 'criminalite', 'est', 'en', 'baisse']
The translation is  ['fimez', 'ce', 'compte']
Sentence number  686 has a score of  0

English sentence is: is tom doing ok
The reference is  ['estce', 'que', 'tom', 'va', 'bien']
The translation is  ['estce', 'que', 'tom', 'va', 'bien']
Sentence number  687 has a score of  1.0

English sentence is: tom winced
The reference is  ['tom', 'a', 'grimace']
The translation is  ['tom', 'a', 'connait']
Sentence number  688 has a score of  1.133422688662942e-154

English sentence is: i dont hate you
The r


English sentence is: im not tall
The reference is  ['je', 'ne', 'suis', 'pas', 'grand']
The translation is  ['je', 'ne', 'suis', 'pas', 'simple']
Sentence number  730 has a score of  0.668740304976422

English sentence is: are you sure
The reference is  ['etesvous', 'surs']
The translation is  ['etesvous', 'serieux']
Sentence number  731 has a score of  1.5319719891192393e-231

English sentence is: stop being nosy
The reference is  ['ne', 'te', 'mele', 'pas', 'de', 'ce', 'qui', 'ne', 'te', 'regarde', 'pas']
The translation is  ['cessez', 'detre', 'cruelle']
Sentence number  732 has a score of  0

English sentence is: tom is merciful
The reference is  ['tom', 'est', 'clement']
The translation is  ['tom', 'est', 'mortrarie']
Sentence number  733 has a score of  1.133422688662942e-154

English sentence is: get started
The reference is  ['commencez']
The translation is  ['deste', 'assis']
Sentence number  734 has a score of  0

English sentence is: who is absent
The reference is  ['qui', 


English sentence is: its his
The reference is  ['cest', 'le', 'sien']
The translation is  ['cest', 'le', 'mien']
Sentence number  776 has a score of  1.133422688662942e-154

English sentence is: hows school
The reference is  ['comment', 'ca', 'se', 'passe', 'a', 'lecole']
The translation is  ['cest', 'chan', 'aux']
Sentence number  777 has a score of  0

English sentence is: are you an idiot
The reference is  ['estu', 'idiote']
The translation is  ['etesvous', 'jolouse']
Sentence number  778 has a score of  0

English sentence is: have you slept
The reference is  ['astu', 'dormi']
The translation is  ['avezvous', 'ete', 'blessees']
Sentence number  779 has a score of  0

English sentence is: theyll call
The reference is  ['ils', 'vont', 'telephoner']
The translation is  ['elles', 'vont', 'va']
Sentence number  780 has a score of  1.384292958842266e-231

English sentence is: now dont move
The reference is  ['ne', 'bougez', 'plus']
The translation is  ['dinque', 'te', 'se', 'beurte']
Se


English sentence is: youre so mean
The reference is  ['tu', 'es', 'si', 'mechante']
The translation is  ['vous', 'etes', 'si', 'mechantes']
Sentence number  823 has a score of  1.2882297539194154e-231

English sentence is: i am taller
The reference is  ['je', 'suis', 'plus', 'grand']
The translation is  ['je', 'suis', 'confus']
Sentence number  824 has a score of  8.121328445417258e-155

English sentence is: today was fun
The reference is  ['aujourdhui', 'cetait', 'sympa']
The translation is  ['les', 'cheveux', 'sont', 'moursses']
Sentence number  825 has a score of  0

English sentence is: hes gone senile
The reference is  ['il', 'est', 'devenu', 'gateux']
The translation is  ['il', 'est', 'accupe', 'a', 'maige']
Sentence number  826 has a score of  8.38826642100846e-155

English sentence is: they left
The reference is  ['ils', 'sont', 'partis']
The translation is  ['elles', 'ont', 'rentre']
Sentence number  827 has a score of  0

English sentence is: thats a pencil
The reference is 


English sentence is: tom got confused
The reference is  ['tom', 'sest', 'embrouille']
The translation is  ['tom', 'a', 'des', 'primessantes']
Sentence number  868 has a score of  1.2882297539194154e-231

English sentence is: tom has sheep
The reference is  ['tom', 'a', 'des', 'moutons']
The translation is  ['tom', 'a', 'des', 'choussures']
Sentence number  869 has a score of  8.636168555094496e-78

English sentence is: i feel stupid
The reference is  ['je', 'me', 'sens', 'stupide']
The translation is  ['je', 'me', 'sens', 'senti', 'temmine']
Sentence number  870 has a score of  6.86809206056511e-78

English sentence is: dont resist
The reference is  ['ne', 'resiste', 'pas']
The translation is  ['ne', 'soyez', 'pas', 'fache']
Sentence number  871 has a score of  1.5319719891192393e-231

English sentence is: she is mad at me
The reference is  ['elle', 'est', 'en', 'colere', 'contre', 'moi']
The translation is  ['elle', 'est', 'mon', 'agne']
Sentence number  872 has a score of  5.7807895


English sentence is: hes out of town
The reference is  ['il', 'nest', 'pas', 'en', 'ville']
The translation is  ['il', 'est', 'en', 'train', 'de', 'pleurer']
Sentence number  914 has a score of  1.384292958842266e-231

English sentence is: that looks easy
The reference is  ['ca', 'semble', 'facile']
The translation is  ['ca', 'semble', 'ecrit']
Sentence number  915 has a score of  1.133422688662942e-154

English sentence is: were even
The reference is  ['nous', 'sommes', 'a', 'egalite']
The translation is  ['nous', 'sommes', 'serieuses']
Sentence number  916 has a score of  8.121328445417258e-155

English sentence is: i feel well
The reference is  ['je', 'me', 'sens', 'bien']
The translation is  ['je', 'me', 'sens', 'seul']
Sentence number  917 has a score of  8.636168555094496e-78

English sentence is: im drunk
The reference is  ['je', 'suis', 'ivre']
The translation is  ['je', 'suis', 'confus']
Sentence number  918 has a score of  1.133422688662942e-154

English sentence is: thats d


English sentence is: we were alone
The reference is  ['nous', 'etions', 'seules']
The translation is  ['nous', 'etions', 'en', 'train', 'de', 'mentir']
Sentence number  961 has a score of  7.57965434483665e-155

English sentence is: they found us
The reference is  ['elles', 'nous', 'ont', 'trouvees']
The translation is  ['elles', 'ont', 'defonde']
Sentence number  962 has a score of  1.1795617510369435e-231

English sentence is: i totally agree
The reference is  ['je', 'suis', 'totalement', 'daccord']
The translation is  ['je', 'porle', 'des', 'consaits']
Sentence number  963 has a score of  1.2882297539194154e-231

English sentence is: give me a drink
The reference is  ['donnezmoi', 'quelque', 'chose', 'a', 'boire']
The translation is  ['donnemoi', 'une', 'seconde']
Sentence number  964 has a score of  0

English sentence is: its all wrong
The reference is  ['cest', 'completement', 'faux']
The translation is  ['cest', 'tout', 'ca', 'que', 'je', 'demande']
Sentence number  965 has a s