# Exercise 10: Word Prediction using RNN


## Objective: Learn how an RNN can be applied to word prediction

## Problem: Consider the sentence: "India is ready to produce over five billion Covid Vaccine doeses next year to help the world in the fight against the pandamic".
## Construct an RNN using the above sentence as training data to predict a word and evaluate its performance

# Step 1: Preprocessing of the data

**Tokenizing**

In [None]:
from keras.preprocessing.text import Tokenizer

In [None]:
t=Tokenizer()

In [None]:
data=("India is ready to produce over five billion Covid Vaccine doeses next " +
"year to help the world in the fight against the pandamic")

In [None]:
t.fit_on_texts([data])

In [None]:
wo_indx=t.word_index
print(wo_indx)

{'the': 1, 'to': 2, 'india': 3, 'is': 4, 'ready': 5, 'produce': 6, 'over': 7, 'five': 8, 'billion': 9, 'covid': 10, 'vaccine': 11, 'doeses': 12, 'next': 13, 'year': 14, 'help': 15, 'world': 16, 'in': 17, 'fight': 18, 'against': 19, 'pandamic': 20}
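The index above reflects a frequency ranking: more frequent words get lower indices, and words with equal counts keep their order of first appearance. The following is a minimal pure-Python sketch of that ranking, not the Keras implementation itself. `build_word_index` is a hypothetical helper, and it assumes that lowercasing and whitespace splitting are sufficient for this punctuation-free sentence.

```python
from collections import Counter

def build_word_index(text):
    """Hypothetical helper: rank words by frequency, starting at index 1."""
    words = text.lower().split()
    counts = Counter(words)  # keeps words in order of first appearance
    # sorted() is stable, so ties stay in first-appearance order
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {w: i for i, (w, _) in enumerate(ranked, start=1)}

data = ("India is ready to produce over five billion Covid Vaccine doeses next "
        "year to help the world in the fight against the pandamic")
word_index = build_word_index(data)
print(word_index)
```

Index 0 is never assigned to a word, which is why the vocabulary size used later is the number of words plus one.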


**Numeric encoding of the data**

In [None]:
encoded_data=t.texts_to_sequences([data])[0] 

**Creating feature and target values for predicting a word when a single word is given**

In [None]:
n=len(encoded_data) # number of tokens in the encoded sequence
dt_seq=list()
for i in range(1,n):
  seq=encoded_data[i-1:i+1] # pair of (current word, next word)
  dt_seq.append(seq)

In [None]:
import numpy as np
seq=np.array(dt_seq)
X,y=seq[:,0],seq[:,1] # column 0: input word, column 1: the word that follows it

In [None]:
y

array([ 4,  5,  2,  6,  7,  8,  9, 10, 11, 12, 13, 14,  2, 15,  1, 16, 17,
        1, 18, 19,  1, 20])
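The same feature and target arrays can also be obtained by shifting the encoded sequence by one position. This is a sketch of an equivalent alternative, not the approach used in the notebook; the `encoded` list is written out by hand from the word index printed earlier, and the names `X_alt` and `y_alt` are illustrative.

```python
import numpy as np

# Encoded sentence, derived by hand from the word index shown earlier
encoded = [3, 4, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13,
           14, 2, 15, 1, 16, 17, 1, 18, 19, 1, 20]

enc = np.array(encoded)
X_alt = enc[:-1]  # every word except the last is an input
y_alt = enc[1:]   # every word except the first is a target

print(X_alt)
print(y_alt)
```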

**Converting y values into categorical**

In [None]:
from keras.utils import to_categorical
voc_size=len(t.word_index)+1 # +1 because word indices start at 1 and index 0 is unused
y_cat=to_categorical(y,num_classes=voc_size)
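To see what the categorical conversion produces, the sketch below builds one-hot rows with plain NumPy for the first three targets. It only illustrates the encoding and is not how Keras implements it; `y_demo` and `voc_size_demo` are illustrative names.

```python
import numpy as np

y_demo = np.array([4, 5, 2])  # first three target indices from y
voc_size_demo = 21            # 20 words + 1 for the unused index 0

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(voc_size_demo)[y_demo]

print(one_hot.shape)
print(one_hot[0])
```

Each row has a single 1 at the position of the target word's index and 0 everywhere else.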

# Step 2: Construction of RNN

In [None]:
from keras.models import Sequential
from keras.layers import Embedding,Flatten
from keras.layers import Dense
from keras.layers import SimpleRNN
from keras.metrics import TopKCategoricalAccuracy

In [None]:
model=Sequential()
model.add(Embedding(voc_size,10,input_length=1)) # input is one word; the model predicts the word that follows it
#model.add(Flatten())
model.add(SimpleRNN(50))
model.add(Dense(voc_size,activation='softmax')) # one probability per vocabulary index
print(model.summary())

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 1, 10)             210       
                                                                 
 simple_rnn_1 (SimpleRNN)    (None, 50)                3050      
                                                                 
 dense (Dense)               (None, 21)                1071      
                                                                 
Total params: 4,331
Trainable params: 4,331
Non-trainable params: 0
_________________________________________________________________
None
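The parameter counts in the summary can be checked by hand. The sketch below reproduces them from the layer sizes used above; the variable names are illustrative.

```python
voc_size, emb_dim, rnn_units = 21, 10, 50

# Embedding: one emb_dim-sized vector per vocabulary index
embedding_params = voc_size * emb_dim

# SimpleRNN: input weights + recurrent weights + bias for each unit
rnn_params = rnn_units * (emb_dim + rnn_units + 1)

# Dense: weights from each RNN unit to each output class, plus biases
dense_params = rnn_units * voc_size + voc_size

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
```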


# Step 3: Compilation, Fitting and Prediction

In [None]:
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [None]:
model.fit(X,y_cat,epochs=500)

In [None]:
pred=np.argmax(model.predict(X),axis=1)



In [None]:
pred

array([ 4,  5,  2, 15,  7,  8,  9, 10, 11, 12, 13, 14,  2, 15,  1, 20, 17,
        1, 20, 19,  1, 20])
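The predictions can be compared with the targets to evaluate the model on the training pairs. The sketch below uses the `y` and `pred` arrays exactly as printed above, copied by hand, and also computes the best score any model could reach when it sees only one input word. The names `X_obs`, `y_obs`, `pred_obs` and `best_possible` are illustrative.

```python
import numpy as np
from collections import Counter, defaultdict

X_obs = [3, 4, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 2, 15, 1, 16, 17, 1, 18, 19, 1]
y_obs = [4, 5, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 2, 15, 1, 16, 17, 1, 18, 19, 1, 20]
pred_obs = [4, 5, 2, 15, 7, 8, 9, 10, 11, 12, 13, 14, 2, 15, 1, 20, 17, 1, 20, 19, 1, 20]

correct = int(np.sum(np.array(y_obs) == np.array(pred_obs)))
accuracy = correct / len(y_obs)

# A given input word always yields the same prediction, so for an input
# with several different followers only one follower can be predicted.
followers = defaultdict(Counter)
for x, target in zip(X_obs, y_obs):
    followers[x][target] += 1
best_possible = sum(max(c.values()) for c in followers.values())

print(correct, len(y_obs), round(accuracy, 3), best_possible)
```

The three errors come from the inputs 'to' and 'the', which are each followed by more than one word in the sentence. With a single word of context, 19 correct pairs out of 22 is the maximum achievable on this data.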

# Step 4: Decoding the predicted values

In [None]:
word=list(wo_indx.keys())  # collect the keys and store them as a list; word[k-1] is the word with index k
word[4]

'ready'

In [None]:
pr_words=''
for i in range(1,len(pred)+1):
  pr_words=pr_words+' '+word[pred[i-1]-1] # subtract 1: word indices start at 1, list positions at 0
pr_words

' is ready to help over five billion covid vaccine doeses next year to help the pandamic in the pandamic against the pandamic'
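Decoding relies on list positions being one less than the word indices. A sketch of an alternative that avoids the off-by-one arithmetic is to invert the word index into an index-to-word dictionary. The dictionary below is copied from the output printed earlier, and `index_to_word` and `pred_demo` are illustrative names.

```python
wo_indx = {'the': 1, 'to': 2, 'india': 3, 'is': 4, 'ready': 5, 'produce': 6,
           'over': 7, 'five': 8, 'billion': 9, 'covid': 10, 'vaccine': 11,
           'doeses': 12, 'next': 13, 'year': 14, 'help': 15, 'world': 16,
           'in': 17, 'fight': 18, 'against': 19, 'pandamic': 20}

# Swap keys and values so that an index looks up its word directly
index_to_word = {i: w for w, i in wo_indx.items()}

pred_demo = [4, 5, 2, 15]  # first four predicted indices
decoded = ' '.join(index_to_word[i] for i in pred_demo)
print(decoded)
```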

# Step 5: Single-Word Prediction: We shall predict the word which follows 'five'

In [None]:
# To predict the word which follows 'five', we use its index, namely 8

print('Predicted word is', word[np.argmax(model.predict(np.array([8])))-1])


Predicted word is billion


# Conclusion: Thus, we constructed an RNN model to predict the next word from the given sentence