# **Machine Translation System**

## **Translating from French to Fongbe**

### This project seeks to create a machine translation system from French to Fonge and from French to Ewe.

> A Machine translation (MT) is an automatic translation from one language to another.


In [5]:
## importing libraries
import pandas as pd
import numpy as np
import tensorflow as tf
import os
import re
import string
from string import digits
from sklearn.utils import shuffle
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model
import warnings
warnings.filterwarnings('ignore')

#### **Data Loading and Preprocessing**

In [6]:
test=pd.read_csv('AITest.csv')
train=pd.read_csv('AITrain.csv')
submission=pd.read_csv('AISampleSubmission (1).csv')
test.head()

Unnamed: 0,ID,French,Target_Language
0,ID_AAAAhgRX,Très fière d’elle,Ewe
1,ID_AAGuzGzi,Tous ces grands artistes viendront au Benin po...,Fon
2,ID_AAuiTPkQ,Ce programme va travailler à améliorer les con...,Fon
3,ID_ACYgGXTq,Quels sont les questions récurrentes de ceux ...,Fon
4,ID_AChdWHyF,Grosse bagnolle,Ewe


In [7]:
print(train.shape)
train.head()

(75487, 4)


Unnamed: 0,ID,French,Target_Language,Target
0,ID_AADNDxdl,Mon père,Fon,Tɔ ce
1,ID_AAFQhmDr,Mettez-vous en rang.,Fon,Mi tò miɖéé
2,ID_AAHVDMdq,"Sénégal, Côte d'Ivoire, Guinée, Ghana, on déco...",Ewe,"Sénégal, Côte d'Ivoire, Guinée, Ghana, siwo ƒe..."
3,ID_AAJfVHEH,Son doigt lui fait mal.,Fon,Alɔvi tɔn ɖo vivɛ wɛ
4,ID_AAOJuhzN,La pluie a commencé.,Fon,Jì bɛ́


We have a total of 75487 training examples.

In [8]:
## Cleaning Data
train['French']=train['French'].apply(lambda x: x.lower())
train['French']=train['French'].apply(lambda x: re.sub("'", '', x))
exclude = set(string.punctuation) 
train['French']=train['French'].apply(lambda x: ''.join(ch for ch in x if ch not in exclude))
remove_digits = str.maketrans('', '', digits)
train['French']=train['French'].apply(lambda x: x.translate(remove_digits))
train['French']=train['French'].apply(lambda x: x.strip())
train['French']=train['French'].apply(lambda x: re.sub(" +", " ", x))

# **FON**

In [9]:
## We take out Fon and work with it first.
train_fon=train[train['Target_Language']=='Fon']
train_fon.drop('Target_Language',inplace=True,axis=1)

In [10]:
#Adding start and end tokens
train_fon['Target'] = train_fon['Target'].apply(lambda x : 'START_ '+ x + ' _END')

In [11]:
### Get English and Fon Vocabulary
french_vocab=set()
for french_words in train_fon['French']:
    for word in french_words.split():
        if word not in french_vocab:
            french_vocab.add(word)

fon_vocab=set()
for fon_words in train_fon['Target']:
    for word in fon_words.split():
        if word not in fon_vocab:
            fon_vocab.add(word)

In [12]:
# fon_vocab

In [13]:
#Tokenization
train_fon['length_french_sentence']=train_fon['French'].apply(lambda x:len(x.split(" ")))
train_fon['length_fon_sentence']=train_fon['Target'].apply(lambda x:len(x.split(" ")))

In [14]:
max_length_tar=max(train_fon["length_fon_sentence"])
max_length_src=max(train_fon["length_french_sentence"])
max_length_tar,max_length_src

(71, 52)

In [15]:
train_fon[train_fon['length_fon_sentence']>30].shape

(105, 5)

In [16]:
train_fon.head()

Unnamed: 0,ID,French,Target,length_french_sentence,length_fon_sentence
0,ID_AADNDxdl,mon père,START_ Tɔ ce _END,2,4
1,ID_AAFQhmDr,mettezvous en rang,START_ Mi tò miɖéé _END,3,5
3,ID_AAJfVHEH,son doigt lui fait mal,START_ Alɔvi tɔn ɖo vivɛ wɛ _END,5,7
4,ID_AAOJuhzN,la pluie a commencé,START_ Jì bɛ́ _END,4,4
5,ID_AAOZhyDe,les garçons et les filles sont au cours élémen...,START_ Nyɔnu lε kpo sunnu lε kpo ɖo wemaxomε a...,9,14


In [17]:
#We take sentences with length less than or equal to 20.
train_fon=train_fon[train_fon['length_fon_sentence']<=20]
train_fon=train_fon[train_fon['length_french_sentence']<=20]

In [18]:
## We create encoder and decoder tokens
input_words = sorted(list(french_vocab))
target_words = sorted(list(fon_vocab))
num_encoder_tokens = len(french_vocab)
num_decoder_tokens = len(fon_vocab)
num_encoder_tokens, num_decoder_tokens

(12842, 15520)

In [19]:
num_decoder_tokens += 1 #for zero padding
num_encoder_tokens+=1

In [20]:
#Create index for the tokens
input_token_index = dict([(word, i+1) for i, word in enumerate(input_words)])
target_token_index = dict([(word, i+1) for i, word in enumerate(target_words)])

In [21]:
reverse_input_char_index = dict((i, word) for word, i in input_token_index.items())
reverse_target_char_index = dict((i, word) for word, i in target_token_index.items())

In [22]:
train_fon=shuffle(train_fon)#Shuffle the data
train_fon.head()

Unnamed: 0,ID,French,Target,length_french_sentence,length_fon_sentence
6772,ID_EiTelMfr,fais donc ainsi,START_ Bo bló lĕe _END,3,5
66013,ID_taOUEVMh,cet enfant a une grosse bouche tombante,START_ Vi elɔ ɖo nu gɛjɛɛ _END,7,7
17346,ID_LyORKIaq,alors jallais à pied,START_ Un ɖo yiyi wɛ kpodo afɔ kpo ɖayi _END,4,10
29641,ID_UUmwFxjk,ce qu’il a dit m’a bouleversé,START_ Xó é ɖɔ́ ɔ́ é zɛ̀ hùn nú mì _END,6,11
31493,ID_VjeicDYB,il fait des éclairs,START_ Xɛbyoso kɛ wùn _END,4,5


In [23]:
##Splitting the Data into train and test
from sklearn.model_selection import train_test_split
X, y = train_fon['French'], train_fon['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,random_state=42)
X_train.shape, X_test.shape

((41958,), (10490,))

In [25]:
def generate_batch(X = X_train, y = y_train, batch_size = 128):
    ''' Generate a batch of data '''
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_length_src),dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_length_tar),dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_length_tar, num_decoder_tokens),dtype='float32')
            for i, (input_text, target_text) in enumerate(zip(X[j:j+batch_size], y[j:j+batch_size])):
                for t, word in enumerate(input_text.split()):
                    encoder_input_data[i, t] = input_token_index[word] # encoder input seq
                for t, word in enumerate(target_text.split()):
                    if t<len(target_text.split())-1:
                        decoder_input_data[i, t] = target_token_index[word] # decoder input seq
                    if t>0:
                        decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)

In [26]:
latent_dim=50

In [27]:
# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb =  Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

In [28]:
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

In [29]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.summary()


Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, None, 50)     642150      input_1[0][0]                    
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 50)     776050      input_2[0][0]                    
______________________________________________________________________________________________

In [30]:
train_samples = len(X_train)
val_samples = len(X_test)
batch_size = 8
epochs = 20

In [31]:
model.fit_generator(generator = generate_batch(X_train, y_train, batch_size = batch_size),
                    steps_per_epoch = train_samples//batch_size,
                    epochs=epochs,
                    validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
                    validation_steps = val_samples//batch_size)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f3977cfdc50>

In [50]:
model.save_weights('nmt_weights_fongbe.h5')

In [None]:
# model.load_weights('nmt_weights_fongbe.h5')

In [33]:
encoder_model = Model(encoder_inputs, encoder_states)

# Decoder setup
# Below tensors will hold the states of the previous time step
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

dec_emb2= dec_emb_layer(decoder_inputs) #embeddings of the decoder sequence
decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2)

# Final decoder model
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs2] + decoder_states2)

In [34]:
def decode_sequence(input_seq):
    states_value = encoder_model.predict(input_seq)
    target_seq = np.zeros((1,1))
    target_seq[0, 0] = target_token_index['START_']

   
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += ' '+sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '_END' or
           len(decoded_sentence) > 50):
            stop_condition = True

        target_seq = np.zeros((1,1))
        target_seq[0, 0] = sampled_token_index

        states_value = [h, c]

    return decoded_sentence

## **Prediction on Train Set**

In [35]:
train_gen = generate_batch(X_test, y_train, batch_size = 1)
k=-1

In [47]:
k+=1
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input French sentence:', X_train[k:k+1].values[0])
print('Predicted Fon Translation:', decoded_sentence[:-4])

Input French sentence: il coupe l’herbe
Predicted Fon Translation:  Un ná sɔ́ ɖó 


## **Ewe**

In [None]:
train['French']=train['French'].apply(lambda x: x.lower())
train['French']=train['French'].apply(lambda x: re.sub("'", '', x))
exclude = set(string.punctuation) 
train['French']=train['French'].apply(lambda x: ''.join(ch for ch in x if ch not in exclude))
remove_digits = str.maketrans('', '', digits)
train['French']=train['French'].apply(lambda x: x.translate(remove_digits))
train['French']=train['French'].apply(lambda x: x.strip())
train['French']=train['French'].apply(lambda x: re.sub(" +", " ", x))



train_ewe=train[train['Target_Language']=='Ewe']
train_ewe.drop('Target_Language',inplace=True,axis=1)
train_ewe.head()

Unnamed: 0,ID,French,Target
2,ID_AAHVDMdq,sénégal côte divoire guinée ghana on découvre ...,"Sénégal, Côte d'Ivoire, Guinée, Ghana, siwo ƒe..."
6,ID_AARXSjjg,janot se prit à grelotter dès que le soleil se...,Yano dze ƒoƒo esi me ɣe gbe ɖo to eye ya dze ƒ...
13,ID_AAmSrrNh,et cela en une journée sinon rien à manger,"Nawɔe le ŋkekea me. Nemenyυo oa, atdi adɔ"
17,ID_AAsKjVJM,l’idée est partie de deux incidents survenus l...,"nua dze eg,ome tso masɔmasɔ aɖe siwo do mo ɖa ..."
19,ID_ABAUPxlf,mais je souris quand même parce que ça fait pa...,kehã meko nu elabe esia hã le dɔa wɔwɔ me


In [None]:
train_ewe['Target'] = train_ewe['Target'].apply(lambda x : 'START_ '+ x + ' _END')
train_ewe.head()

Unnamed: 0,ID,French,Target
2,ID_AAHVDMdq,sénégal côte divoire guinée ghana on découvre ...,"START_ Sénégal, Côte d'Ivoire, Guinée, Ghana, ..."
6,ID_AARXSjjg,janot se prit à grelotter dès que le soleil se...,START_ Yano dze ƒoƒo esi me ɣe gbe ɖo to eye y...
13,ID_AAmSrrNh,et cela en une journée sinon rien à manger,"START_ Nawɔe le ŋkekea me. Nemenyυo oa, atdi a..."
17,ID_AAsKjVJM,l’idée est partie de deux incidents survenus l...,"START_ nua dze eg,ome tso masɔmasɔ aɖe siwo do..."
19,ID_ABAUPxlf,mais je souris quand même parce que ça fait pa...,START_ kehã meko nu elabe esia hã le dɔa wɔw...


In [None]:
### Get English and Ewe Vocabulary
french_vocab=set()
for french_words in train_ewe['French']:
    for word in french_words.split():
        if word not in french_vocab:
            french_vocab.add(word)

ewe_vocab=set()
for ewe_words in train_ewe['Target']:
    for word in ewe_words.split():
        if word not in ewe_vocab:
            ewe_vocab.add(word)

In [None]:
train_ewe['length_french_sentence']=train_ewe['French'].apply(lambda x:len(x.split(" ")))
train_ewe['length_ewe_sentence']=train_ewe['Target'].apply(lambda x:len(x.split(" ")))
train_ewe.head()

Unnamed: 0,ID,French,Target,length_french_sentence,length_ewe_sentence
2,ID_AAHVDMdq,sénégal côte divoire guinée ghana on découvre ...,"START_ Sénégal, Côte d'Ivoire, Guinée, Ghana, ...",28,18
6,ID_AARXSjjg,janot se prit à grelotter dès que le soleil se...,START_ Yano dze ƒoƒo esi me ɣe gbe ɖo to eye y...,23,21
13,ID_AAmSrrNh,et cela en une journée sinon rien à manger,"START_ Nawɔe le ŋkekea me. Nemenyυo oa, atdi a...",9,10
17,ID_AAsKjVJM,l’idée est partie de deux incidents survenus l...,"START_ nua dze eg,ome tso masɔmasɔ aɖe siwo do...",26,21
19,ID_ABAUPxlf,mais je souris quand même parce que ça fait pa...,START_ kehã meko nu elabe esia hã le dɔa wɔw...,15,12


In [None]:
max_length_tar=max(train_ewe["length_ewe_sentence"])
max_length_src=max(train_ewe["length_french_sentence"])
max_length_src,max_length_tar

(104, 69)

In [None]:
train_ewe=train_ewe[train_ewe['length_ewe_sentence']<=30]
train_ewe=train_ewe[train_ewe['length_french_sentence']<=30]



input_words = sorted(list(french_vocab))
target_words = sorted(list(ewe_vocab))
num_encoder_tokens = len(french_vocab)
num_decoder_tokens = len(ewe_vocab)
num_encoder_tokens, num_decoder_tokens


(30142, 30180)

In [None]:
num_decoder_tokens += 1 #for zero padding
# num_encoder_tokens+=1


In [None]:
input_token_index = dict([(word, i+1) for i, word in enumerate(input_words)])
target_token_index = dict([(word, i+1) for i, word in enumerate(target_words)])


reverse_input_char_index = dict((i, word) for word, i in input_token_index.items())
reverse_target_char_index = dict((i, word) for word, i in target_token_index.items())


train_ewe=shuffle(train_ewe)
train_ewe.head()


Unnamed: 0,ID,French,Target,length_french_sentence,length_ewe_sentence
57700,ID_njkyALVW,et encore…il y a également beaucoup de noms qu...,START_ eye ŋkɔ geɖewo hã li siwoe fluaa ame _END,13,10
65877,ID_tVlRdnDX,et je lui ferai rendre ma rivière,START_ Eye mana woa trɔ nye tɔsisia nam _END,7,9
62748,ID_rLYozhHO,com la pauvreté est la principale cause de la ...,START_ ahedada enye zãnuɖuɖu suetɔ ƒe dzotsoƒ...,11,8
37182,ID_ZXdXJeIU,il ne reviendra plus à moins qu’il décide fini...,START_ wobe magagbnɔ gbeɖe o ne mele didim be ...,9,16
69473,ID_vqjYFwtY,ces trois religions semblaient très distinctes...,START_ xɔseha etɔ̃ siawo dze abe woto vovo neg...,17,20


In [None]:
from sklearn.model_selection import train_test_split

X, y = train_ewe['French'], train_ewe['Target']
X_train_ewe, X_test_ewe, y_train_ewe, y_test_ewe = train_test_split(X, y, test_size = 0.2,random_state=42)
X_train_ewe.shape, X_test_ewe.shape


In [None]:
def generate_batch(X = X_train_ewe, y = y_train_ewe, batch_size = 128):
    ''' Generate a batch of data '''
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_length_src),dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_length_tar),dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_length_tar, num_decoder_tokens),dtype='float32')
            for i, (input_text, target_text) in enumerate(zip(X[j:j+batch_size], y[j:j+batch_size])):
                for t, word in enumerate(input_text.split()):
                    encoder_input_data[i, t] = input_token_index[word] # encoder input seq
                for t, word in enumerate(target_text.split()):
                    if t<len(target_text.split())-1:
                        decoder_input_data[i, t] = target_token_index[word] # decoder input seq
                    if t>0:                        
                        decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)


In [None]:
latent_dim=50
# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb =  Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
encoder_states = [state_h, state_c]


decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)


model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.summary()


Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, None, 50)     1507100     input_1[0][0]                    
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 50)     1509050     input_2[0][0]                    
______________________________________________________________________________________________

In [None]:
train_samples = len(X_train_ewe)
val_samples = len(X_test_ewe)
batch_size = 8
epochs = 20

In [None]:
model.fit_generator(generator = generate_batch(X_train_ewe, y_train_ewe, batch_size = batch_size),
                    steps_per_epoch = train_samples//batch_size,
                    epochs=epochs,
                    validation_data = generate_batch(X_test_ewe, y_test_ewe, batch_size = batch_size),
                    validation_steps = val_samples//batch_size)



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f8127715a50>

In [None]:
##We save the weights so that we can use them later
model.save_weights('nmt_weights2.h5')

In [None]:
#We load the weights
# model.load_weights('nmt_weights2.h5')

In [None]:
encoder_model = Model(encoder_inputs, encoder_states)

# Decoder setup
# Below tensors will hold the states of the previous time step
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

dec_emb2= dec_emb_layer(decoder_inputs) # Get the embeddings of the decoder sequence

# To predict the next word in the sequence, set the initial states to the states from the previous time step
decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2) 

# Final decoder model
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs2] + decoder_states2)


def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1,1))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0] = target_token_index['START_']

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += ' '+sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '_END' or
           len(decoded_sentence) > 50):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1,1))
        target_seq[0, 0] = sampled_token_index

        # Update states
        states_value = [h, c]

    return decoded_sentence

## **Prediction on Train Set**

In [None]:

train_gen = generate_batch(X_train_ewe, y_train_ewe, batch_size = 1)
k=-1

In [None]:
#predict on the train set
k+=1
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input French sentence:', X_train_ewe[k:k+1].values[0])
print('Actual Ewe Translation:', y_train_ewe[k:k+1].values[0][6:-4])
print('Predicted Ewe Translation:', decoded_sentence[:-4])

Input French sentence: dans une interview au daily express des aliments l’expert a indiqué que changer ses habitudes alimentaires en prenant un avocat par jour permettrait de lutter contre la chute des cheveux
Actual Ewe Translation:  le nyanyanana aɖe si wona Dayli express la, egblɔ be peyaɖuɖu gbe sia gbe la nana wotoa ɖa 
Predicted Ewe Translation:  le na la na be la, ame siwo wole ƒe be la le eƒe


The model was able to get some words right.

## **Translation to Ewe**

In [None]:
test.head()

Unnamed: 0,ID,French,Target_Language
0,ID_AAAAhgRX,Très fière d’elle,Ewe
1,ID_AAGuzGzi,Tous ces grands artistes viendront au Benin po...,Fon
2,ID_AAuiTPkQ,Ce programme va travailler à améliorer les con...,Fon
3,ID_ACYgGXTq,Quels sont les questions récurrentes de ceux ...,Fon
4,ID_AChdWHyF,Grosse bagnolle,Ewe


In [None]:
#Getting Ewe from the test data
test_ewe=test[test['Target_Language']=='Ewe']


In [None]:
test_ewe.head()

Unnamed: 0,ID,French,Target_Language
0,ID_AAAAhgRX,Très fière d’elle,Ewe
4,ID_AChdWHyF,Grosse bagnolle,Ewe
11,ID_AHBSoUNL,Les seins comme ça… » Basta,Ewe
14,ID_AHycIkQv,Lire aussi Pensées Nocturnes,Ewe
16,ID_AIWTdKBT,"voir même de la positivité, de la gaieté et po...",Ewe


In [None]:
len(test_ewe),len(test_fon)

(2964, 2929)

In [None]:
translated_text=[]
french_language=[]
for i in range(len(test_ewe['French'])):
  k+=1
  (input_seq, actual_output), _ = next(train_gen)
  decoded_sentence= decode_sequence(input_seq)
  french_sent=test_ewe.French[k:k+1].values[0]
  translation=decoded_sentence[:-4]
  french_language.append(french_sent)
  translated_text.append(translation)

In [None]:
test_ewe['Target']=translated_text
test_ewe.head()

Unnamed: 0,ID,French,Target_Language,Target
0,ID_AAAAhgRX,Très fière d’elle,Ewe,nye be nye me le wo nu sia me
4,ID_AChdWHyF,Grosse bagnolle,Ewe,la le esi menɔ esi menɔ esi menɔ me o
11,ID_AHBSoUNL,Les seins comme ça… » Basta,Ewe,ele be nu si le eƒe me be nye la le eƒe gbe me ɖ
14,ID_AHycIkQv,Lire aussi Pensées Nocturnes,Ewe,nye le eƒe me la ame o eye be ame siwo wole la w
16,ID_AIWTdKBT,"voir même de la positivité, de la gaieté et po...",Ewe,esi nye be be ame aɖe si le eƒe le eƒe me be l


In [None]:
test_ewe.drop(['Target_Language'],inplace=True,axis=1)
test_ewe.head()

Unnamed: 0,ID,French,Target
0,ID_AAAAhgRX,Très fière d’elle,nye be nye me le wo nu sia me
4,ID_AChdWHyF,Grosse bagnolle,la le esi menɔ esi menɔ esi menɔ me o
11,ID_AHBSoUNL,Les seins comme ça… » Basta,ele be nu si le eƒe me be nye la le eƒe gbe me ɖ
14,ID_AHycIkQv,Lire aussi Pensées Nocturnes,nye le eƒe me la ame o eye be ame siwo wole la w
16,ID_AIWTdKBT,"voir même de la positivité, de la gaieté et po...",esi nye be be ame aɖe si le eƒe le eƒe me be l


In [None]:
print(test_ewe.shape)

(2964, 3)


In [None]:
test_ewe[["ID","Target"]].to_csv("ewe1.csv",index=False)

## **Conclusion**


>References:https://www.kaggle.com/aiswaryaramachandran/english-to-hindi-neural-machine-translation