# **English To Hindi Neural Translation**

## **Description:**

English to Hindi neural translation refers to the use of neural machine translation (NMT) models to automatically translate text from the English language to the Hindi language. It is a type of artificial intelligence (AI) technology that leverages deep learning techniques, particularly neural networks, to perform high-quality and context-aware translation between these two languages.

Here's how English to Hindi neural translation typically works:

1. **Neural Networks**: NMT models use neural networks, particularly recurrent neural networks (RNNs) or transformer models, which have shown significant improvements in machine translation tasks.

2. **Training Data**: The NMT model is trained on a large dataset containing parallel text from English and Hindi. These parallel texts consist of sentences or phrases in both languages with corresponding translations.

3. **Word Embeddings**: Words in both languages are represented as vectors (word embeddings) in a shared vector space. This allows the model to learn relationships between words and their translations.

4. **Encoder-Decoder Architecture**: The neural network architecture consists of an encoder and a decoder. The encoder takes the input sentence in English and encodes it into a fixed-length representation (context vector). The decoder then uses this context vector to generate the corresponding sentence in Hindi.

5. **Attention Mechanism**: Many modern NMT models use attention mechanisms, such as the one found in the transformer architecture. Attention helps the model focus on different parts of the input sentence when generating the output, improving translation quality.

6. **Training Objective**: During training, the model learns to minimize the translation loss, which measures the dissimilarity between the predicted translation and the ground truth translation in the training data.

7. **Inference**: In the inference phase, the trained model is used to translate English text into Hindi. The encoder processes the input sentence, and the decoder generates the corresponding Hindi translation.

English to Hindi neural translation, like other neural machine translation tasks, has made significant strides in recent years due to advancements in deep learning and the availability of large parallel datasets. These models are capable of producing more fluent and contextually accurate translations compared to earlier statistical machine translation methods.

These neural translation models have numerous practical applications, including language translation services, localization of software and content, cross-lingual information retrieval, and more. They have made it easier for people to access information and communicate across language barriers.

# Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import string
import os
import time
import warnings
warnings.filterwarnings("ignore")


# Reading Datasets

In [2]:
data=pd.read_csv("Hindi_English_Truncated_Corpus.csv",encoding='utf-8')
data.head(20)

Unnamed: 0,source,english_sentence,hindi_sentence
0,ted,politicians do not have permission to do what ...,"राजनीतिज्ञों के पास जो कार्य करना चाहिए, वह कर..."
1,ted,"I'd like to tell you about one such child,",मई आपको ऐसे ही एक बच्चे के बारे में बताना चाहू...
2,indic2012,This percentage is even greater than the perce...,यह प्रतिशत भारत में हिन्दुओं प्रतिशत से अधिक है।
3,ted,what we really mean is that they're bad at not...,हम ये नहीं कहना चाहते कि वो ध्यान नहीं दे पाते
4,indic2012,.The ending portion of these Vedas is called U...,इन्हीं वेदों का अंतिम भाग उपनिषद कहलाता है।
5,tides,The then Governor of Kashmir resisted transfer...,कश्मीर के तत्कालीन गवर्नर ने इस हस्तांतरण का व...
6,indic2012,In this lies the circumstances of people befor...,इसमें तुमसे पूर्व गुज़रे हुए लोगों के हालात हैं।
7,ted,"And who are we to say, even, that they are wrong",और हम होते कौन हैं यह कहने भी वाले कि वे गलत हैं
8,indic2012,“”Global Warming“” refer to warming caused in ...,ग्लोबल वॉर्मिंग से आशय हाल ही के दशकों में हुई...
9,tides,You may want your child to go to a school that...,हो सकता है कि आप चाहते हों कि आप का नऋर्नमेनटे...


# Data Exploration

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 127607 entries, 0 to 127606
Data columns (total 3 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   source            127607 non-null  object
 1   english_sentence  127605 non-null  object
 2   hindi_sentence    127607 non-null  object
dtypes: object(3)
memory usage: 2.9+ MB


In [4]:
data.shape

(127607, 3)

* We can see Only 3 features avialable

In [5]:
data['source'].value_counts()

source
tides        50000
ted          39881
indic2012    37726
Name: count, dtype: int64

* We need to make one constant source 

In [6]:
data=data[data['source']=='ted']
data.head(11)

Unnamed: 0,source,english_sentence,hindi_sentence
0,ted,politicians do not have permission to do what ...,"राजनीतिज्ञों के पास जो कार्य करना चाहिए, वह कर..."
1,ted,"I'd like to tell you about one such child,",मई आपको ऐसे ही एक बच्चे के बारे में बताना चाहू...
3,ted,what we really mean is that they're bad at not...,हम ये नहीं कहना चाहते कि वो ध्यान नहीं दे पाते
7,ted,"And who are we to say, even, that they are wrong",और हम होते कौन हैं यह कहने भी वाले कि वे गलत हैं
13,ted,So there is some sort of justice,तो वहाँ न्याय है
23,ted,This changed slowly,धीरे धीरे ये सब बदला
26,ted,were being produced.,उत्पन्न नहीं कि जाती थी.
30,ted,"And you can see, this LED is going to glow.","और जैसा आप देख रहे है, ये एल.ई.डी. जल उठेगी।"
32,ted,to turn on the lights or to bring him a glass ...,"लाईट जलाने के लिए या उनके लिए पानी लाने के लिए,"
35,ted,Can you imagine saying that?,क्या आप ये कल्पना कर सकते है


* Let's check null value in datasets

In [7]:
data.isnull().sum()

source              0
english_sentence    0
hindi_sentence      0
dtype: int64

In [8]:
data.isnull().sum().sum()

0

* No null value present

In [9]:
# Remove all rows where the 'english_sentence' column is null
data = data[~pd.isnull(data['english_sentence'])]

In [10]:
data.duplicated().sum()

1078

* Using duplicate data not make any sense so remove it

In [11]:
data.drop_duplicates(inplace=True)
data.shape

(38803, 3)

### Taking traing sample data from data

In [12]:
data=data.sample(n=25000,random_state=42)
data.shape

(25000, 3)

### Making all sentences in lowercase to avoid error

In [13]:
data['english_sentence']=data['english_sentence'].apply(lambda x: x.lower())
data['hindi_sentence']=data['hindi_sentence'].apply(lambda x: x.lower())

### Remove quotes

In [14]:
data['english_sentence']=data['english_sentence'].apply(lambda x: re.sub("'",'',x))
data['hindi_sentence']=data['hindi_sentence'].apply(lambda x: re.sub("'",'',x))

In [15]:
exclude = set(string.punctuation) # Set of all special characters
# Remove all the special characters
data['english_sentence']=data['english_sentence'].apply(lambda x: ''.join(ch for ch in x if ch not in exclude))
data['hindi_sentence']=data['hindi_sentence'].apply(lambda x: ''.join(ch for ch in x if ch not in exclude))
    

In [16]:
# Remove all numbers from text
remove_digits = str.maketrans('', '', string.digits)
data['english_sentence']=data['english_sentence'].apply(lambda x: x.translate(remove_digits))
data['hindi_sentence']=data['hindi_sentence'].apply(lambda x: x.translate(remove_digits))
data['hindi_sentence']=data['hindi_sentence'].apply(lambda x: re.sub("[२३०८१५७९४६]", "", x))


In [17]:
# Remove extra spaces
data['english_sentence']=data['english_sentence'].apply(lambda x: x.strip())
data['hindi_sentence']=data['hindi_sentence'].apply(lambda x: x.strip())
data['english_sentence']=data['english_sentence'].apply(lambda x: re.sub(" +", " ", x))
data['hindi_sentence']=data['hindi_sentence'].apply(lambda x: re.sub(" +", " ", x))


### Add start and end tokens to target sequences

In [18]:
data['hindi_sentence'] = data['hindi_sentence'].apply(lambda x : 'START_ '+ x + ' _END')
data.head(11)

Unnamed: 0,source,english_sentence,hindi_sentence
82040,ted,we still dont know who her parents are who she is,START_ हम अभी तक नहीं जानते हैं कि उसके मातापि...
85038,ted,no keyboard,START_ कोई कुंजीपटल नहीं _END
58018,ted,but as far as being a performer,START_ लेकिन एक कलाकार होने के साथ _END
74470,ted,and this particular balloon,START_ और यह खास गुब्बारा _END
122330,ted,and its not as hard as you think integrate cli...,START_ और जितना आपको लगता है यह उतना कठिन नहीं...
79517,ted,and saw the demo by jeff han,START_ और जेफ़ हान द्वारा प्रदर्शन देखा होगा _END
23089,ted,this baby is fully electric,START_ यह बच्चा पूरी तरह से बिजली से चलता है _END
116699,ted,kids have no or very little say in making the ...,START_ फिर भी बच्चों को नियम बनाने का बिलकुल न...
64132,ted,im going to add a little bit to my description...,START_ मै अपने द्वारा दिए गए उम्र के बढ़्ने के...
74513,ted,expands and cools until it gets to the point w...,START_ फैलने लगता है फिर ये ठंडा होकर उस अवस्थ...


### Get English and Hindi Vocabulary

In [19]:
all_eng_words=set()
for eng in data['english_sentence']:
    for word in eng.split():
        if word not in all_eng_words:
            all_eng_words.add(word)
            
            
all_hindi_words=set()
for hin in data['hindi_sentence']:
    for word in hin.split():
        if word not in all_hindi_words:
            all_hindi_words.add(word)

In [20]:
len(all_eng_words)

14030

In [21]:
len(all_hindi_words)

17540

In [22]:
data['length_eng_sentence']=data['english_sentence'].apply(lambda x:len(x.split(" ")))
data['length_hin_sentence']=data['hindi_sentence'].apply(lambda x:len(x.split(" ")))


In [23]:
data.head(11)

Unnamed: 0,source,english_sentence,hindi_sentence,length_eng_sentence,length_hin_sentence
82040,ted,we still dont know who her parents are who she is,START_ हम अभी तक नहीं जानते हैं कि उसके मातापि...,11,16
85038,ted,no keyboard,START_ कोई कुंजीपटल नहीं _END,2,5
58018,ted,but as far as being a performer,START_ लेकिन एक कलाकार होने के साथ _END,7,8
74470,ted,and this particular balloon,START_ और यह खास गुब्बारा _END,4,6
122330,ted,and its not as hard as you think integrate cli...,START_ और जितना आपको लगता है यह उतना कठिन नहीं...,16,20
79517,ted,and saw the demo by jeff han,START_ और जेफ़ हान द्वारा प्रदर्शन देखा होगा _END,7,9
23089,ted,this baby is fully electric,START_ यह बच्चा पूरी तरह से बिजली से चलता है _END,5,11
116699,ted,kids have no or very little say in making the ...,START_ फिर भी बच्चों को नियम बनाने का बिलकुल न...,11,17
64132,ted,im going to add a little bit to my description...,START_ मै अपने द्वारा दिए गए उम्र के बढ़्ने के...,12,16
74513,ted,expands and cools until it gets to the point w...,START_ फैलने लगता है फिर ये ठंडा होकर उस अवस्थ...,13,21


In [24]:
data[data['length_eng_sentence']>30].shape

(0, 5)

In [25]:
data[data['length_eng_sentence']<=20].shape
data[data['length_hin_sentence']<=20].shape

(24774, 5)

In [26]:
data.shape

(25000, 5)

In [27]:
print("maximum length of Hindi Sentence ",max(data['length_hin_sentence']))
print("maximum length of English Sentence ",max(data['length_eng_sentence']))

maximum length of Hindi Sentence  30
maximum length of English Sentence  20


In [28]:
max_length_src=max(data['length_hin_sentence'])
max_length_tar=max(data['length_eng_sentence'])

In [29]:
input_words = sorted(list(all_eng_words))
target_words = sorted(list(all_hindi_words))
num_encoder_tokens = len(all_eng_words)
num_decoder_tokens = len(all_hindi_words)
num_encoder_tokens, num_decoder_tokens

(14030, 17540)

In [30]:
num_decoder_tokens += 1 # For zero padding

In [31]:
input_token_index = dict([(word, i+1) for i, word in enumerate(input_words)])
target_token_index = dict([(word, i+1) for i, word in enumerate(target_words)])

In [32]:
reverse_input_char_index = dict((i, word) for word, i in input_token_index.items())
reverse_target_char_index = dict((i, word) for word, i in target_token_index.items())

In [33]:
from sklearn.utils import shuffle
data= shuffle(data)
data.head(10)

Unnamed: 0,source,english_sentence,hindi_sentence,length_eng_sentence,length_hin_sentence
77943,ted,just to get the permits,START_ केवल परमिट लेने में ही _END,5,7
391,ted,chris anderson so pranav,START_ क्रिस एंडर्सन तो प्रणव _END,4,6
105815,ted,i can go here and i can split subsaharan afric...,START_ मैं यहाँ आ सकता हूँ और मैं उपसहारन अफ्र...,13,18
18292,ted,we think that the race is on to do something d...,START_ हम कुछ अलग करने की दौड़ पर हैं _END,11,10
397,ted,and they said “no problem we probably hit some...,START_ और उन्होंने कहा “कोई दिक्कत नहीं है। शा...,10,15
5973,ted,its a uniquely human achievement,START_ यह एक एक विशिष्ट मानव उपलब्धि है _END,5,9
73281,ted,notice that we dont do this in science,START_ गौर कीजिए कि हम विज्ञान में ऐसा नहीं कर...,8,11
81515,ted,and a woman who worked there as a messenger ca...,START_ और वहां मैसेंजर का काम करनेवाली एक महिल...,15,16
55008,ted,when i do my work,START_ जब मैं अपना काम करता हूँ _END,5,8
2820,ted,now i was a little scared,START_ अब मुझे थोड़ा डर लग रहा था _END,6,9


# Model Training

In [34]:
from sklearn.model_selection import train_test_split
X, y = data['english_sentence'], data['hindi_sentence']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,random_state=42)
X_train.shape, X_test.shape

((20000,), (5000,))

In [35]:
X_train.to_pickle('X_train.pkl')
X_test.to_pickle('X_test.pkl')

In [36]:
def generate_batch(X = X_train, y = y_train, batch_size = 128):
    ''' Generate a batch of data '''
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_length_src),dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_length_tar),dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_length_tar, num_decoder_tokens),dtype='float32')
            for i, (input_text, target_text) in enumerate(zip(X[j:j+batch_size], y[j:j+batch_size])):
                for t, word in enumerate(input_text.split()):
                    encoder_input_data[i, t] = input_token_index[word] # encoder input seq
                for t, word in enumerate(target_text.split()):
                    if t<len(target_text.split())-1:
                        decoder_input_data[i, t] = target_token_index[word] # decoder input seq
                    if t>0:
                        # decoder target sequence (one hot encoded)
                        # does not include the START_ token
                        # Offset by one timestep
                        decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)

### Encoder-Decoder Architecture

In [37]:
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model

2023-12-01 19:36:53.410013: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-01 19:36:54.296101: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-01 19:36:54.299939: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [38]:
latent_dim=300

# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb =  Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

In [39]:
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

In [40]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

In [41]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_1 (InputLayer)        [(None, None)]               0         []                            
                                                                                                  
 input_2 (InputLayer)        [(None, None)]               0         []                            
                                                                                                  
 embedding (Embedding)       (None, None, 300)            4209000   ['input_1[0][0]']             
                                                                                                  
 embedding_1 (Embedding)     (None, None, 300)            5262300   ['input_2[0][0]']             
                                                                                              

In [42]:
train_samples = len(X_train)
val_samples = len(X_test)
batch_size = 128
epochs = 100

In [43]:
model.fit_generator(generator = generate_batch(X_train, y_train, batch_size = batch_size),
                    steps_per_epoch = train_samples//batch_size,
                    epochs=epochs,
                    validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
                    validation_steps = val_samples//batch_size)



2023-12-01 19:37:15.579597: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 179619840 exceeds 10% of free system memory.


Epoch 1/100


2023-12-01 19:37:38.125297: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 179619840 exceeds 10% of free system memory.
2023-12-01 19:37:38.462574: W tensorflow/core/framework/op_kernel.cc:1816] UNKNOWN: IndexError: index 20 is out of bounds for axis 1 with size 20
Traceback (most recent call last):

  File "/home/blackheart/.local/lib/python3.10/site-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/home/blackheart/.local/lib/python3.10/site-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/home/blackheart/.local/lib/python3.10/site-packages/tensorflow/python/data/ops/from_generator_op.py", line 198, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/home/blackheart/.local/lib/python3.10/site-packages/keras/src/engine/data_adapter.py", line 917, in wrapped_generator
    for data in generator_fn():

  File "/tmp

  1/156 [..............................] - ETA: 1:12:41 - loss: 9.7723

UnknownError: Graph execution error:

IndexError: index 20 is out of bounds for axis 1 with size 20
Traceback (most recent call last):

  File "/home/blackheart/.local/lib/python3.10/site-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/home/blackheart/.local/lib/python3.10/site-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/home/blackheart/.local/lib/python3.10/site-packages/tensorflow/python/data/ops/from_generator_op.py", line 198, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/home/blackheart/.local/lib/python3.10/site-packages/keras/src/engine/data_adapter.py", line 917, in wrapped_generator
    for data in generator_fn():

  File "/tmp/ipykernel_48230/1785600609.py", line 13, in generate_batch
    decoder_input_data[i, t] = target_token_index[word] # decoder input seq

IndexError: index 20 is out of bounds for axis 1 with size 20


	 [[{{node PyFunc}}]]
	 [[IteratorGetNext]] [Op:__inference_train_function_14019]

In [None]:
model.save_weights('nmt_weights.h5')

In [None]:
# Encode the input sequence to get the "thought vectors"
encoder_model = Model(encoder_inputs, encoder_states)

# Decoder setup
# Below tensors will hold the states of the previous time step
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

dec_emb2= dec_emb_layer(decoder_inputs) # Get the embeddings of the decoder sequence

# To predict the next word in the sequence, set the initial states to the states from the previous time step
decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2) # A dense softmax layer to generate prob dist. over the target vocabulary

# Final decoder model
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs2] + decoder_states2)


In [None]:
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1,1))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0] = target_token_index['START_']

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += ' '+sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '_END' or
           len(decoded_sentence) > 50):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1,1))
        target_seq[0, 0] = sampled_token_index

        # Update states
        states_value = [h, c]

    return decoded_sentence

In [None]:
train_gen = generate_batch(X_train, y_train, batch_size = 1)
k=-1


In [None]:
k+=1
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Hindi Translation:', y_train[k:k+1].values[0][6:-4])
print('Predicted Hindi Translation:', decoded_sentence[:-4])

In [None]:
k+=1
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Hindi Translation:', y_train[k:k+1].values[0][6:-4])
print('Predicted Hindi Translation:', decoded_sentence[:-4])

In [None]:
k+=1
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Hindi Translation:', y_train[k:k+1].values[0][6:-4])
print('Predicted Hindi Translation:', decoded_sentence[:-4])

In [None]:
k+=1
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Hindi Translation:', y_train[k:k+1].values[0][6:-4])
print('Predicted Hindi Translation:', decoded_sentence[:-4])

In [None]:
k+=1
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Hindi Translation:', y_train[k:k+1].values[0][6:-4])
print('Predicted Hindi Translation:', decoded_sentence[:-4])