## English To Hindi Translator Model 

This project is an attempt to build an English-to-Hindi translator using an encoder-decoder architecture, at a time when the capabilities of LLMs and LLM-based architectures are scaling rapidly.

#### Dataset : 
The dataset has been developed since 2016 at the Centre for Indian Language Technology, IITB. Several derivative corpora are available; the version hosted on HuggingFace consists of 1,662,110 rows. Due to computational constraints, I have restricted my dataset to only 2,500 rows, consisting of shuffled, mid-length to long sentences.

#### Encoder-Decoder Model :
Encoder-decoder models are neural network architectures, typically built from recurrent layers such as RNNs and LSTMs, that are used for tasks like machine translation. The encoder takes the input sequence in one language and compresses it into a context vector. The decoder takes that context vector as input and generates the output sequence in the other language.

#### Possibilities :
While I have restricted this project to a plain encoder-decoder architecture, attention layers could also be added to make the translator more context-aware, moving it toward a more transformer-like architecture.
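
As a rough illustration of what an attention layer would add, the sketch below computes dot-product attention weights over a few made-up encoder states for a single decoder state. It is a pure-Python toy under assumed values, not part of this notebook's model; the names `dot_product_attention`, `encoder_states_demo`, and `decoder_state_demo` are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(decoder_state, encoder_states):
    """Return (weights, context): one weight per encoder step and their weighted sum."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * enc[j] for w, enc in zip(weights, encoder_states))
               for j in range(dim)]
    return weights, context

# Hypothetical 2-dimensional states for a 3-word source sentence.
encoder_states_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state_demo = [2.0, 0.0]

weights, context = dot_product_attention(decoder_state_demo, encoder_states_demo)
print(weights)   # the first and third source positions receive the largest weights
print(context)
```

Instead of relying on one fixed context vector, the decoder would recompute such a weighted context at every output step.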

In [268]:
import gc

In [269]:
gc.collect()

4183

### Importing Libraries

In [270]:
import numpy as np
import pandas as pd

In [271]:
import re
import nltk
from nltk.tokenize import word_tokenize,sent_tokenize
from nltk.corpus import indian
import matplotlib.pyplot as plt
from keras.models import Sequential,Model
from keras.layers import Input,Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.sequence import pad_sequences
import string
import contractions
from nltk.corpus import stopwords
import unicodedata  # needed by get_hindi_punctuations below

### Analysing Dataset

In [272]:
data = pd.read_csv('trans_data.csv')

In [273]:
data.sample(5)

Unnamed: 0,English,Hindi
5882,13 . The person must be additional to your nor...,13 व्यक्ति आप के आम स्टॉफ की ज़रुरतों के अतिरि...
2094,A river runs down through the valley.,वादी में से एक नदी बहती है।
5777,"If , after an enquiry , the Speaker is satisfi...",यदि जांच के पश्चात अध्यक्ष का समाधान हो जाता ह...
716,Don't say such a thing.,ऐसी बात मत बोलो।
1552,He walks his dog every morning.,वह हर सुबह अपने कुत्ते को सैर पर ले जाता है।


In [274]:
new_data = data.sample(2500)

In [275]:
del data

In [276]:
new_data.describe()

Unnamed: 0,English,Hindi
count,2500,2494
unique,2470,2464
top,(Laughter),(हंसी)
freq,10,5


In [277]:
new_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2500 entries, 3779 to 5426
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   English  2500 non-null   object
 1   Hindi    2494 non-null   object
dtypes: object(2)
memory usage: 58.6+ KB


In [278]:
new_data.dropna(inplace = True)

### Data Cleaning and Preprocessing

In [279]:
def remove_html(text):
    if isinstance(text,str):
        pattern = re.compile('<.*?>')
        return pattern.sub(r'',text)
    else:
        return text

In [280]:
def remove_url(text):
    if isinstance(text,str):
        pattern = re.compile(r'https?://\S+|www\.\S+')
        return pattern.sub(r'',text)
    else:
        return text  # pass non-string values through unchanged instead of returning None

In [281]:
def preprocess_text(text, language='english'):
    if not isinstance(text, str):
        return text
    if language == 'english':
        pattern = re.compile(r'[^a-zA-Z0-9\s]')
        return pattern.sub(r'', text)
    elif language == 'hindi':
        pattern = re.compile(r'[^\u0900-\u097F\s]')
        return pattern.sub(r'', text)
    else:
        raise ValueError("Unsupported Language, Supported languages are 'english' and 'hindi'")
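
To make the effect of these two patterns concrete, here is a small sketch that applies them to made-up sentences. The example strings are hypothetical; the regular expressions are copied from `preprocess_text`.

```python
import re

# The same two patterns used by preprocess_text.
english_pattern = re.compile(r'[^a-zA-Z0-9\s]')
hindi_pattern = re.compile(r'[^\u0900-\u097F\s]')

# Apostrophes and hyphens are deleted, so the pieces on either side are joined.
english_clean = english_pattern.sub('', "Don't open the idol-temple!")
print(english_clean)  # Dont open the idoltemple

# ASCII digits fall outside the Devanagari block and are deleted;
# the danda (U+0964) lies inside the block and is kept.
hindi_clean = hindi_pattern.sub('', 'सन् 712 में\u0964')
print(repr(hindi_clean))
```

This is consistent with entries such as `idoltemple` in the English vocabulary and with Hindi sentences that lose their numerals.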

In [282]:
new_data.rename(columns = {'English' : 'english', 'Hindi' : 'hindi'}, inplace = True)

In [283]:
new_data['english'] = new_data['english'].apply(remove_html)
new_data["hindi"] = new_data["hindi"].apply(remove_html)

In [284]:
new_data['english'] = new_data['english'].apply(remove_url)
new_data["hindi"] = new_data["hindi"].apply(remove_url)

In [285]:
new_data['english'] = new_data['english'].apply(lambda x: preprocess_text(x, language='english'))
new_data['hindi'] = new_data['hindi'].apply(lambda x: preprocess_text(x, language='hindi'))

In [286]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [287]:
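def get_hindi_punctuations():
    # NOTE: U+2000-U+206F is the Unicode "General Punctuation" block, not Devanagari.
    # The danda '।' (U+0964) lies outside this range, so it is not collected here.
    hindi_punctuations = []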
    for i in range(0x2000, 0x206f + 1):
        char = chr(i)
        if unicodedata.category(char) == 'Po':
            hindi_punctuations.append(char)
    return ''.join(hindi_punctuations)

In [288]:
hindi_punctuation = get_hindi_punctuations()

In [289]:
hindi_punctuation

'‖‗†‡•‣․‥…‧‰‱′″‴‵‶‷‸※‼‽‾⁁⁂⁃⁇⁈⁉⁊⁋⁌⁍⁎⁏⁐⁑⁓⁕⁖⁗⁘⁙⁚⁛⁜⁝⁞'
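
The characters listed above come from the General Punctuation block. A minimal sketch, assuming one wanted to strip Devanagari punctuation as well, is to scan the Devanagari block (U+0900 to U+097F) with the same category test. The helper `punctuation_in_range` is hypothetical and is not used elsewhere in this notebook.

```python
import unicodedata

def punctuation_in_range(start, end):
    """Collect characters in [start, end] whose Unicode category is 'Po'."""
    return ''.join(chr(i) for i in range(start, end + 1)
                   if unicodedata.category(chr(i)) == 'Po')

general_punctuation = punctuation_in_range(0x2000, 0x206F)     # block scanned above
devanagari_punctuation = punctuation_in_range(0x0900, 0x097F)  # Devanagari block

danda = '\u0964'  # the Hindi sentence-ending mark
print(danda in general_punctuation)     # False
print(danda in devanagari_punctuation)  # True
```

Stripping the danda would change the Hindi vocabulary, since words such as `पहुँचाई।` currently keep it attached.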

In [290]:
def remove_punctuation(text, language = 'english'):
    if language == 'english':
        exclude_english = set(string.punctuation)
        return ''.join(char for char in text if char not in exclude_english)
    elif language == 'hindi':
        return ''.join(char for char in text if char not in hindi_punctuation)
    
    else:
        raise ValueError("Unsupported Language, Supported languages are 'english' and 'hindi'")

In [291]:
new_data['english'] = new_data['english'].apply(lambda x: remove_punctuation(x,language = 'english'))
new_data['hindi'] = new_data['hindi'].apply(lambda x: remove_punctuation(x,language = 'hindi'))

In [292]:
def expand_contractions(text):
    expanded_text = contractions.fix(text)
    return expanded_text

In [293]:
new_data["english"] = new_data["english"].apply(expand_contractions)

In [294]:
new_data.sample(5)

Unnamed: 0,english,hindi
4921,But the difficulties in the way of modernisati...,लेकिन आधुनिकीकरण करने में भी अनेक बाधाएं थीं ज...
4054,His lack of political ambition also makes it e...,उनमें राजनैतिक महत्वाकांक्षाएं न होना भी युद्ध...
522,Do Not open your book,अपनी किताब मत खोलो।
9398,In the year 712 Mohammad Bin Kasim the command...,सन् में फारस के सेनापति मुहम्मद बिन क़ासिम ने...
4587,Akkamahadevi came to Kalyana and met people in...,अक़्कमहादेवी कल्याण पहुंची तथा बसव के घर में ल...


In [295]:
new_data['hindi'] = new_data['hindi'].apply(lambda x : 'start_ ' + x + ' _end')

In [296]:
new_data.sample(3)

Unnamed: 0,english,hindi
4609,When the headlines rolled what happened was,start_ जब सुर्खियों में आती हैतो क्या होता है ...
2520,Everyone could easily see his disappointment,start_ उसकी निराशा सभी आसानी से दिख सकते थे। _end
879,We may not win tomorrow,start_ हम कल शायद नहीं जीतेंगे। _end


### Dictionary and Vocabulary

In [297]:
eng_words = set()
hindi_words = set()

In [298]:
# set.update ignores duplicates, so no explicit membership check is needed
for eng in new_data['english']:
    eng_words.update(eng.split())

for hindi in new_data['hindi']:
    hindi_words.update(hindi.split())

In [299]:
eng_words

{'partymen',
 'attributes',
 'Reporting',
 'Pakistan',
 'wrinkles',
 'utensils',
 'transport',
 'spines',
 'fourth',
 'Sanders',
 'stubborn',
 'likely',
 'RamakrishnaVivekananda',
 'trail',
 'metric',
 'shade',
 'wonders',
 'farmers',
 'scare',
 'them',
 '1945',
 'negatived',
 'idoltemple',
 'declare',
 'young',
 'counsel',
 'As',
 'wall',
 'Bharata',
 'process',
 'Surdas',
 'maintain',
 'joke',
 'disappointed',
 'prefer',
 'stop',
 'Mohan',
 'transliteration',
 'leading',
 'extension',
 'lotteries',
 'Satyajit',
 'Resignation',
 'Age',
 'uncle',
 'Elysium',
 'range',
 '261',
 'Please',
 'talk',
 'Adhirath',
 'organize',
 'Allahabad',
 'starting',
 'Legislative',
 'rewards',
 'interests',
 'idiom',
 'Bahadur',
 'passing',
 'Birds',
 'legislative',
 'Simla',
 'travel',
 'wisdom',
 'prime',
 'decribe',
 'arches',
 'commtment',
 'then',
 'V',
 'wave',
 'magnify',
 'acquiring',
 'considerations',
 'Premchand',
 'Tudi',
 'before',
 'IF',
 'farm',
 'notions',
 'nothing',
 'coordinating',
 'b

In [300]:
hindi_words

{'संकोच',
 'प्रेरणा',
 'पासपोर्ट',
 'एक',
 'किशोरों',
 'बेल',
 'पडा',
 'सांसदों',
 'भविष्यवाणी',
 'खंडन',
 'उतारचढाव',
 'अगस्त',
 'आजाद',
 'अलक्ष्य',
 'मुझसे',
 'मार्टिन',
 'संख़्या',
 'प्रतिभा',
 'फैट',
 'सुरक्षा',
 'छिपा',
 'नासा',
 'नज़र',
 'खुशी',
 'स्त्रीपुरुषों',
 'परिपत्र',
 'सऋऊण्श्छ्ष्थान',
 'अंडे',
 'दोहरे',
 'नियुक्त',
 'आंखे',
 'भगत',
 'हिस्से',
 'प्रकीर्णन',
 'आँसुओं',
 'विस्तृत',
 'चुनौतियां',
 'गहरे',
 'स्थानस्थान',
 'स्थापना',
 'संसऋऊण्श्छ्ष्ऋतियों',
 'मुयालय',
 'जरिए',
 'दवाई',
 'शाक्यों',
 'बजे',
 'फांसी',
 'दिलानेवाला',
 'अम्पायर',
 'बुद्ध',
 'वाष्पशील',
 'पहुँचाई।',
 'प्रसन्न',
 'कब्जे',
 'आदिवराह',
 'दोहराया',
 'डेटिंग',
 'देखे',
 'स्वभाविक',
 'फरवरी',
 'अनोखे',
 'संस्कृति',
 'दस्ते',
 'कैंसर',
 'सीन',
 'ड्रग्स',
 'युगों',
 'फारस',
 'अर्नेस्ट',
 'संकरण',
 'करकों',
 'एवं',
 'ढली',
 'अतिशय',
 'बनाते',
 'बढ़कर',
 'कारोबार',
 'हुलिया',
 'चक्रवर्ती',
 'कारखाने',
 'मॉडलों',
 'पांचवीं',
 'एस',
 'शीर्ष',
 'समय',
 'नियंत्रित',
 'दावे',
 'ग़्लूकोज',
 'गैरहाजिर',
 'भूख',
 'उड

In [301]:
print("english vocabulary size = ", len(eng_words))
print("hindi vocabulary size = ", len(hindi_words))

english vocabulary size =  7457
hindi vocabulary size =  7768


In [302]:
new_data['length_eng_sentence']=new_data['english'].apply(lambda x:len(x.split(" ")))
new_data['length_hin_sentence']=new_data['hindi'].apply(lambda x:len(x.split(" ")))

In [303]:
new_data.sample(5)

Unnamed: 0,english,hindi,length_eng_sentence,length_hin_sentence
9323,ZealWednesday,start_ जोश बुद्धवार _end,1,4
5381,Kathmandu court ancient templegroup of Palaces...,start_ यूनेस्को की आठ सांस्कृतिक विश्व धरोहरों...,20,21
336,Tell me the truth,start_ मुझे सच्चाई बताओ। _end,4,5
7755,the typical way that ordinary matter does,start_ जैसी सामान्य पदार्थ करते हैं _end,7,7
2203,The train is ten minutes behind today,start_ आज ट्रेन दस मिनट लेट है। _end,7,8


In [304]:
print("maximum length of english language = ", max(new_data['length_eng_sentence']))

maximum length of english language =  121


In [305]:
print("maximum length of hindi language = ", max(new_data['length_hin_sentence']))

maximum length of hindi language =  141
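
Note that the lengths above are computed with `split(" ")`, whereas the vocabulary is built with `split()`. The two differ when a sentence contains consecutive spaces, which the cleaning steps can leave behind. A short sketch with a made-up sentence:

```python
sentence = "Stock prices  rose sharply"  # hypothetical, note the double space

print(len(sentence.split(" ")))  # 5: the double space yields one empty string
print(len(sentence.split()))     # 4: runs of whitespace are collapsed
```

Because `split(" ")` never returns fewer pieces than `split()`, the maxima above remain safe upper bounds for the padded array widths.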


### Model Prerequisites

In [342]:
max_len_src = max(new_data['length_eng_sentence'])
max_len_tar = max(new_data['length_hin_sentence'])

In [307]:
input_words = sorted(list(eng_words))
target_words = sorted(list(hindi_words))

In [308]:
num_encoder_tokens = len(eng_words)
num_decoder_tokens = len(hindi_words)

num_encoder_tokens, num_decoder_tokens

(7457, 7768)

In [309]:
# reserve index 0 for zero padding
num_decoder_tokens += 1
num_encoder_tokens += 1

In [310]:
input_token_index = dict([(word, i+1) for i, word in enumerate(input_words)])
target_token_index = dict([(word, i+1) for i, word in enumerate(target_words)])

In [311]:
reverse_input_char_index = dict((i, word) for word, i in input_token_index.items())
reverse_target_char_index = dict((i, word) for word, i in target_token_index.items())
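
The following toy example shows how such an index maps words to ids and back, with id 0 left free for padding. The three-word vocabulary and the names `demo_index` and `demo_reverse` are assumptions for illustration only.

```python
demo_words = sorted({'help', 'we', 'you'})  # hypothetical three-word vocabulary
demo_index = {word: i + 1 for i, word in enumerate(demo_words)}  # ids start at 1
demo_reverse = {i: word for word, i in demo_index.items()}

max_len = 5
encoded = [demo_index[w] for w in 'we help you'.split()]
padded = encoded + [0] * (max_len - len(encoded))  # 0 is reserved for padding

print(demo_index)  # {'help': 1, 'we': 2, 'you': 3}
print(padded)      # [2, 1, 3, 0, 0]
print(' '.join(demo_reverse[i] for i in padded if i != 0))  # we help you
```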

In [312]:
from sklearn.utils import shuffle

In [313]:
new_data = shuffle(new_data)

In [314]:
new_data.head()

Unnamed: 0,english,hindi,length_eng_sentence,length_hin_sentence
6724,The devout who have set up ashrams by the doze...,start_ जिन साधुओं ने इस पर्वतीय इलके में करीब ...,31,32
2020,All of my kids want to learn French,start_ मेरे सभी बच्चें फ़्रेंच सीखना चाहते हैं...,8,9
2226,He embezzled the money from his office,start_ उसने अपने दफ़तर के पैसों को गबन किया। _end,7,10
5403,done in collaboration with Danish artist Soren...,start_ जो डेनिश सहयोगी कलाकार सोरेन पोर्स के स...,8,13
5138,It is doubtful if Vajpayee can return Wahid s ...,start_ अब इसमें संदेह है कि वाजपेयी अपने सरकार...,15,20


In [315]:
x, y = new_data['english'], new_data['hindi']

In [316]:
type(x), type(y)

(pandas.core.series.Series, pandas.core.series.Series)

In [317]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 42)

In [318]:
print(x_train.shape)
print(x_test.shape)

(1995,)
(499,)


In [319]:
print(y_train.shape)
print(y_test.shape)

(1995,)
(499,)


In [321]:
def preprocess_data1(X, y, max_len_src, max_len_tar, num_decoder_tokens, input_token_index, target_token_index):
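    ''' Preprocess the data for encoder-decoder model '''
    # NOTE: float16 stores integers exactly only up to 2048, so larger token ids are
    # rounded (to multiples of 4 above 4096); an integer dtype such as int32 avoids this.
    encoder_input_data = np.zeros((len(X), max_len_src), dtype='float16')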
    decoder_input_data = np.zeros((len(y), max_len_tar), dtype='float16')
    decoder_target_data = np.zeros((len(y), max_len_tar, num_decoder_tokens), dtype='float16')

    for i, (input_text, target_text) in enumerate(zip(X, y)):
        for t, word in enumerate(input_text.split()):
            if word in input_token_index:
                encoder_input_data[i, t] = input_token_index[word]  # encoder input seq
            else:
                # no '<UNK>' entry is ever added to the vocabulary, so fall back to padding index 0
                encoder_input_data[i, t] = input_token_index.get('<UNK>', 0)
            
        for t, word in enumerate(target_text.split()):
            if word in target_token_index:
                decoder_input_data[i, t] = target_token_index[word]  # decoder input seq
                if t > 0:
                    decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            else:
                # same fallback for out-of-vocabulary target words
                decoder_input_data[i, t] = target_token_index.get('<UNK>', 0)
    
    return [encoder_input_data, decoder_input_data], decoder_target_data
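
The one-step offset between decoder input and decoder target (teacher forcing) can be sketched on a single tokenised sentence. The sentence below is made up; only the `start_` and `_end` markers come from this notebook.

```python
target_text = 'start_ हम जीतेंगे _end'  # hypothetical tokenised target sentence
tokens = target_text.split()

decoder_input = tokens        # what the decoder is fed at each step
decoder_target = tokens[1:]   # what it should predict: the following token

for seen, expected in zip(decoder_input, decoder_target):
    print(f'{seen!r} -> {expected!r}')
```

In `preprocess_data1` the final `_end` token is also written into the decoder input, where it has no target; the `generate_batch` function further down leaves it out.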

In [322]:
[x_train_new, x_train_decoder_input], y_train_new = preprocess_data1(x_train, y_train, max_len_src, max_len_tar, num_decoder_tokens, input_token_index, target_token_index)

In [323]:
[x_test_new, x_test_decoder_input], y_test_new = preprocess_data1(x_test, y_test, max_len_src, max_len_tar, num_decoder_tokens, input_token_index, target_token_index)
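
The arrays built by `preprocess_data1` store token ids as `float16`, a format that cannot represent every integer in a vocabulary of this size. The sketch below uses the standard-library `struct` module (format `'e'`, IEEE 754 half precision, the same format as NumPy's `float16`) as a stand-in to show the rounding. The helper `to_float16` and the sample ids are assumptions chosen for illustration.

```python
import struct

def to_float16(value):
    """Round-trip a number through IEEE 754 half precision."""
    return struct.unpack('e', struct.pack('e', float(value)))[0]

for token_id in (911, 2047, 4097, 6537, 7457):
    print(token_id, '->', int(to_float16(token_id)))
```

Ids up to 2048 survive unchanged, while larger ids collapse onto a neighbouring representable value, so distinct words can end up sharing an id. Storing the ids in an integer dtype would avoid this.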

In [324]:
x_train_new, x_train

(array([[1991., 2346., 6536., ...,    0.,    0.,    0.],
        [ 911., 4720., 2048., ...,    0.,    0.,    0.],
        [1854., 7228., 3230., ...,    0.,    0.,    0.],
        ...,
        [1780., 5328., 6868., ...,    0.,    0.,    0.],
        [1071.,  564., 3000., ...,    0.,    0.,    0.],
        [1638.,    0.,    0., ...,    0.,    0.,    0.]], dtype=float16),
 1618                      We are sorry we cannot help you
 907                              He knows a lot of people
 7112    The virtuous cycle of rising stock prices  whi...
 4540    The share of consumer electronics was to decli...
 7731                                  is about the future
                               ...                        
 4609          When the headlines rolled what happened was
 5401    Their food habits are simple and change from r...
 6952                             Structure of the Council
 6189    Kalpana Chaval completed her primary education...
 3314                                 

In [325]:
latent_dim = 300

### Model Architecture

In [326]:
#encoder
encoder_inputs = Input(shape=(None,))
enc_emb =  Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
encoder_states = [state_h, state_c]

In [327]:
#decoder with encoder-states as initial states
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
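
Since the notebook does not show the output of `model.summary()`, the size of this model can be estimated by hand. This is a sketch using the standard parameter-count formulas for embedding, LSTM, and dense layers with the vocabulary sizes reported earlier; the helper functions are hypothetical.

```python
num_encoder_tokens = 7458  # 7457 English words + 1 padding index
num_decoder_tokens = 7769  # 7768 Hindi words + 1 padding index
latent_dim = 300

def embedding_params(vocab_size, dim):
    return vocab_size * dim

def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, output_dim):
    return input_dim * output_dim + output_dim

total = (embedding_params(num_encoder_tokens, latent_dim)
         + embedding_params(num_decoder_tokens, latent_dim)
         + 2 * lstm_params(latent_dim, latent_dim)
         + dense_params(latent_dim, num_decoder_tokens))
print(total)  # 8348969
```

Most of the parameters sit in the two embeddings and the output layer, all of which scale with vocabulary size.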

In [328]:
from keras.losses import SparseCategoricalCrossentropy
from keras.losses import CategoricalCrossentropy

In [329]:
model.compile(optimizer='rmsprop', loss=CategoricalCrossentropy(from_logits=False), metrics = ['accuracy'])
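
`CategoricalCrossentropy` needs one-hot targets, which is what makes `decoder_target_data` so large. The arithmetic below estimates that cost from the shapes reported above (1995 training rows, target length 141, 7769 decoder tokens) and compares it with plain integer targets, as the imported but unused `SparseCategoricalCrossentropy` would accept. This is an estimate, not a measurement, and switching losses is an untested assumption here.

```python
train_rows, max_len_tar, num_decoder_tokens = 1995, 141, 7769

one_hot_bytes = train_rows * max_len_tar * num_decoder_tokens * 2  # float16 = 2 bytes
sparse_bytes = train_rows * max_len_tar * 4                        # int32 ids = 4 bytes

print(f'one-hot targets: {one_hot_bytes / 1e9:.2f} GB')  # 4.37 GB
print(f'integer targets: {sparse_bytes / 1e6:.2f} MB')   # 1.13 MB
```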

In [330]:
model.summary()

In [331]:
from keras.callbacks import EarlyStopping, ModelCheckpoint

In [332]:
model.fit(
    [x_train_new, x_train_decoder_input], y_train_new,
    batch_size=64,
    epochs=25,
    validation_data=([x_test_new, x_test_decoder_input], y_test_new)
)

Epoch 1/25
32/32 ━━━━━━━━━━━━━━━━━━━━ 112s 3s/step - accuracy: 0.0060 - loss: 8.1738 - val_accuracy: 0.0040 - val_loss: 6.7417
Epoch 2/25
32/32 ━━━━━━━━━━━━━━━━━━━━ 85s 3s/step - accuracy: 0.0068 - loss: 6.6219 - val_accuracy: 0.0071 - val_loss: 6.6464
Epoch 3/25
32/32 ━━━━━━━━━━━━━━━━━━━━ 90s 3s/step - accuracy: 0.0075 - loss: 6.5042 - val_accuracy: 0.0096 - val_loss: 6.5945
Epoch 4/25
32/32 ━━━━━━━━━━━━━━━━━━━━ 80s 2s/step - accuracy: 0.0090 - loss: 6.3927 - val_accuracy: 0.0098 - val_loss: 6.5677
Epoch 5/25
32/32 ━━━━━━━━━━━━━━━━━━━━ 81s 3s/step - accuracy: 0.0093 - loss: 6.3590 - val_accuracy: 0.0097 - val_loss: 6.5304
Epoch 6/25
32/32 ━━━━━━━━━━━━━━━━━━━━ 89s 3s/step - accuracy: 0.0093 - loss: 6.3201 - val_accuracy: 0.0094 - val_loss: 6.5222
Epoch 7/25
32/32 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x196e2b27250>

In [333]:
model.save_weights('translation_model.weights.h5')

### Testing the Model 

In [343]:
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
dec_emb2= dec_emb_layer(decoder_inputs) 

# To predict the next word in the sequence, set the initial states to the states from the previous time step
decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2) # A dense softmax layer to generate prob dist. over the target vocabulary

# Final decoder model
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs2] + decoder_states2)

In [344]:
def decode_sequence(input_seq):
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1,1))
    target_seq[0, 0] = target_token_index['start_']

    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += ' '+sampled_char

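        # Stop at the '_end' token or once the output exceeds 50 characters
        # (len() of a string counts characters, not words).
        if (sampled_char == '_end' or
           len(decoded_sentence) > 50):
            stop_condition = True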
        target_seq = np.zeros((1,1))
        target_seq[0, 0] = sampled_token_index
        states_value = [h, c]

    return decoded_sentence
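
The greedy decoding loop can be illustrated without a trained model. In this sketch a lookup table stands in for `decoder_model.predict`, and the length cap counts words rather than characters. The function `greedy_decode`, its parameters, and `demo_table` are hypothetical.

```python
def greedy_decode(next_token, start_token='start_', end_token='_end', max_words=50):
    """Ask next_token(previous) for one word at a time until end_token or max_words."""
    words = []
    previous = start_token
    while len(words) < max_words:
        previous = next_token(previous)
        if previous == end_token:
            break
        words.append(previous)
    return ' '.join(words)

# Hypothetical stand-in for the decoder: a fixed next-word table.
demo_table = {'start_': 'हम', 'हम': 'कल', 'कल': 'जीतेंगे', 'जीतेंगे': '_end'}
print(greedy_decode(demo_table.get))

# A "model" that never emits _end is cut off after max_words words.
stuck = greedy_decode(lambda prev: 'में', max_words=5)
print(len(stuck.split()))  # 5
```

Because the end marker is never appended in this version, the caller would not need to slice it off afterwards.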

In [336]:
def generate_batch(X = x_train, y = y_train, batch_size = 128):
    ''' Generate a batch of data '''
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_len_src),dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_len_tar),dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_len_tar, num_decoder_tokens),dtype='float32')
            for i, (input_text, target_text) in enumerate(zip(X[j:j+batch_size], y[j:j+batch_size])):
                for t, word in enumerate(input_text.split()):
                    encoder_input_data[i, t] = input_token_index[word] # encoder input seq
                for t, word in enumerate(target_text.split()):
                    if t<len(target_text.split())-1:
                        decoder_input_data[i, t] = target_token_index[word] # decoder input seq
                    if t>0:
                        # decoder target sequence (one hot encoded)
                        # does not include the START_ token
                        # Offset by one timestep
                        decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)

In [337]:
gen = generate_batch(x_train, y_train, batch_size = 1)
k = -1

In [339]:
x_train, x_train_new

(1618                      We are sorry we cannot help you
 907                              He knows a lot of people
 7112    The virtuous cycle of rising stock prices  whi...
 4540    The share of consumer electronics was to decli...
 7731                                  is about the future
                               ...                        
 4609          When the headlines rolled what happened was
 5401    Their food habits are simple and change from r...
 6952                             Structure of the Council
 6189    Kalpana Chaval completed her primary education...
 3314                                      SanginiSaturday
 Name: english, Length: 1995, dtype: object,
 array([[1991., 2346., 6536., ...,    0.,    0.,    0.],
        [ 911., 4720., 2048., ...,    0.,    0.,    0.],
        [1854., 7228., 3230., ...,    0.,    0.,    0.],
        ...,
        [1780., 5328., 6868., ...,    0.,    0.,    0.],
        [1071.,  564., 3000., ...,    0.,    0.,    0.],
        

In [340]:
k+=1
(input_seq, actual_output), _ = next(gen)
decoded_sentence = decode_sequence(input_seq)
print('Input English sentence:', x_train[k:k+1].values[0])
print('Actual Hindi Translation:', y_train[k:k+1].values[0][6:-4])
print('Predicted Hindi Translation:', decoded_sentence[:-4])

1/1 ━━━━━━━━━━━━━━━━━━━━ 3s 3s/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 752ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step
Input English sentence: We are sorry we cannot help you
Actual Hindi Translation:  माफ़ कीजिए पर हम आपकी मदद नहीं कर सकते। 
Predicted Hindi Translation:  मैं में में में 
