<a href="https://colab.research.google.com/github/Kshitez-Pratap-Singh/Next-Word-Prediction/blob/main/Prediction_of_the_Next_Word.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **NEXT WORD PREDICTION**

### **Description**  
Next-word prediction is the Natural Language Processing (NLP) task of identifying the word most likely to follow a given sequence of words. It underpins applications such as text auto-completion, speech recognition, and machine translation. Deep learning has transformed NLP, achieving strong results on many language tasks, including next-word prediction. This notebook trains a word-level LSTM model on the text of *Romeo and Juliet* to predict the next word from the previous three.

![picture](https://drive.google.com/uc?export=view&id=14jlJ-bdtHoSl0fMI-HCfT2QvNU-hC1Fc)
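To make the task concrete, here is a minimal count-based sketch. It uses a different technique from this notebook (an n-gram frequency table rather than an LSTM), and the helper names `train_ngram_counts` and `predict_next` are hypothetical. For each context of a few words, it records which words followed that context and predicts the most frequent one.

```python
from collections import Counter, defaultdict

def train_ngram_counts(words, context_size=3):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        counts[context][words[i]] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequent continuation of `context`, or None if it was never seen."""
    followers = counts.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy text; a 2-word context is used here only to keep the example small
text = "to be or not to be that is the question to be or not"
counts = train_ngram_counts(text.split(), context_size=2)
print(predict_next(counts, ["to", "be"]))
```

A table like this cannot predict anything for a context it has never seen. This is one motivation for learned models, such as the LSTM used in this notebook, that represent words as dense vectors.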

### **Importing the Essential Libraries**

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.layers import Embedding,LSTM,Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
import pickle
import os

### **Uploading the File from the Local System**

In [None]:
from google.colab import files
uploaded=files.upload()

Saving Romeo and Juliet.txt to Romeo and Juliet.txt


### **Preprocessing the Uploaded File**

In [None]:
# Read the whole file; 'utf-8-sig' also strips a leading byte-order mark (BOM)
with open("Romeo and Juliet.txt",'r',encoding='utf-8-sig') as file:
  data=file.read()

# Remove any stray BOM characters and curly double quotes
data=data.replace('\ufeff','').replace('“','').replace('”','')
# split() breaks on any whitespace (spaces, \n, \r, tabs); joining with ' ' normalizes it
data=' '.join(data.split())
data[:1000]

'The Project Gutenberg eBook of Romeo and Juliet This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook. Title: Romeo and Juliet Author: William Shakespeare Release date: November 1, 1998 [eBook #1513] Most recently updated: June 27, 2023 Language: English *** START OF THE PROJECT GUTENBERG EBOOK ROMEO AND JULIET *** THE TRAGEDY OF ROMEO AND JULIET by William Shakespeare Contents THE PROLOGUE. ACT I Scene I. A public place. Scene II. A Street. Scene III. Room in Capulet’s House. Scene IV. A Street. Scene V. A Hall in Capulet’s House. ACT II CHORUS. Scene I. An open place adjoining Capulet’s Garden. Scene I
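The cleaning step amounts to removing a few unwanted characters (the byte-order mark `\ufeff` and curly double quotes) and collapsing all whitespace into single spaces. Here is a small, file-free sketch of that logic that can be tested on a string. The `clean_text` helper is hypothetical and not part of the notebook.

```python
def clean_text(raw: str) -> str:
    """Strip the BOM and curly double quotes, then collapse whitespace."""
    # Remove the Unicode byte-order mark and left/right curly double quotes
    for ch in ("\ufeff", "\u201c", "\u201d"):
        raw = raw.replace(ch, "")
    # str.split() with no argument splits on any run of whitespace,
    # including \n and \r, so joining with single spaces normalizes it all
    return " ".join(raw.split())

sample = "\ufeffThe Project Gutenberg eBook\r\n of  \u201cRomeo and Juliet\u201d\n"
print(clean_text(sample))
```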

### **Tokenizing the Text**

In [None]:
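# Build the word index: words are numbered by descending frequency, starting at 1
tokenizer=Tokenizer()
tokenizer.fit_on_texts([data])
# Save the fitted tokenizer so it can be reloaded later for prediction
with open('token.pkl','wb') as f:
  pickle.dump(tokenizer,f)
# Convert the whole text into a single list of word indices
sequence_data=tokenizer.texts_to_sequences([data])[0]
sequence_data[:15]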

[1, 54, 129, 306, 6, 12, 2, 22, 16, 306, 8, 18, 1, 150, 6]

In [None]:
len(sequence_data)

29285

### **Vocabulary Creation**

In [None]:
# +1 because Keras reserves index 0 (used for padding); real words are numbered from 1
vocab_size=len(tokenizer.word_index)+1
print(vocab_size)

4296
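By default, `Tokenizer.fit_on_texts` lowercases the text, replaces punctuation with spaces, counts the words, and numbers them by descending frequency starting at 1. That is why the first word, "The", became index `1` in the sequence above, and why `vocab_size` adds one slot for the reserved index 0. Below is a pure-Python sketch of that default behaviour, for illustration only. The helper names are hypothetical, and options of the real class such as `oov_token` and `num_words` are ignored.

```python
from collections import Counter

# Default characters that Keras' Tokenizer replaces with spaces (its `filters` argument)
KERAS_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def split_words(text):
    """Lowercase, turn filter characters into spaces, split, and drop empty strings."""
    table = str.maketrans({c: " " for c in KERAS_FILTERS})
    return [w for w in text.lower().translate(table).split(" ") if w]

def build_word_index(text):
    """Rank words by frequency (ties keep first-seen order); indices start at 1."""
    counts = Counter(split_words(text))
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}

def to_sequence(text, word_index):
    """Map each known word to its index, silently dropping unknown words."""
    return [word_index[w] for w in split_words(text) if w in word_index]

word_index = build_word_index("The cat, the hat. THE END")
print(word_index)
print(to_sequence("the hat and the cat", word_index))
print(len(word_index) + 1)  # vocabulary size including the reserved index 0
```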


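### **Creating Training Sequences**

In [None]:
# Slide a window of 4 consecutive word indices over the text:
# the first 3 are the context, the 4th is the word to predict
sequences=[]
for i in range(3,len(sequence_data)):
  words=sequence_data[i-3:i+1]
  sequences.append(words)
sequences[:15]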

[[1, 54, 129, 306],
 [54, 129, 306, 6],
 [129, 306, 6, 12],
 [306, 6, 12, 2],
 [6, 12, 2, 22],
 [12, 2, 22, 16],
 [2, 22, 16, 306],
 [22, 16, 306, 8],
 [16, 306, 8, 18],
 [306, 8, 18, 1],
 [8, 18, 1, 150],
 [18, 1, 150, 6],
 [1, 150, 6, 653],
 [150, 6, 653, 969],
 [6, 653, 969, 7]]
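Each row is a window of four consecutive token IDs, so a text of N tokens yields N - 3 rows (29,285 - 3 = 29,282 here). The same windows can be built without an explicit Python loop. The sketch below uses NumPy's `sliding_window_view` (available in NumPy 1.20 and later) on the first six token IDs printed earlier. The `make_windows` helper is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(seq, context=3):
    """Return every run of `context + 1` consecutive tokens as one row."""
    # sliding_window_view returns a read-only view; no data is copied
    return sliding_window_view(np.asarray(seq), context + 1)

windows = make_windows([1, 54, 129, 306, 6, 12], context=3)
print(windows)
```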

### **Splitting the Dataset into Independent (x) and Dependent (y) Variables**

In [None]:
# x: the 3 context words of each window; y: the word that follows them
x=[]
y=[]
for i in sequences:
  x.append(i[0:3])
  y.append(i[3])

x=np.array(x)
y=np.array(y)

In [None]:
print('Data: ',x)
print('Response: ',y)

Data:  [[   1   54  129]
 [  54  129  306]
 [ 129  306    6]
 ...
 [4295    3  183]
 [   3  183  226]
 [ 183  226  234]]
Response:  [306   6  12 ... 226 234 564]
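Once the windows are stored as a 2-D NumPy array, the same split can be done with column slicing instead of a loop. Here is a sketch on the first three windows:

```python
import numpy as np

windows = np.array([[1, 54, 129, 306],
                    [54, 129, 306, 6],
                    [129, 306, 6, 12]])

# Every column except the last is the 3-word context (input)
x = windows[:, :-1]
# The last column is the word to predict (target)
y = windows[:, -1]
print(x)
print(y)
```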


In [None]:
# One-hot encode the targets: one row per example, with a single 1 at the target word's index
y=to_categorical(y,num_classes=vocab_size)
y[:10]

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.]], dtype=float32)
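`to_categorical` stores each target as a row of `vocab_size` floats containing a single 1. For this corpus that is 29,282 rows × 4,296 columns × 4 bytes, roughly 503 MB. Keras also provides a `sparse_categorical_crossentropy` loss, which takes integer indices such as `y` directly and computes the same quantity without building this matrix. The NumPy sketch below is illustrative; `one_hot` is a hypothetical helper. It reproduces the one-hot layout and checks that the one-hot and integer-label forms of cross-entropy agree.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Same layout as to_categorical: one float32 row per label with a 1 at that index."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out

labels = np.array([306, 6, 12])            # first three targets printed above
encoded = one_hot(labels, num_classes=4296)

# With uniform predictions, both loss formulations should give log(4296)
probs = np.full((len(labels), 4296), 1.0 / 4296)
dense_loss = -(encoded * np.log(probs)).sum(axis=1).mean()            # one-hot targets
sparse_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()   # integer targets

# Size of the full one-hot target matrix in this notebook (float32 = 4 bytes)
full_bytes = 29282 * 4296 * 4
print(encoded.shape, dense_loss, sparse_loss, full_bytes / 1e6, "MB")
```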

### **Creating the Model**

In [None]:
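model=Sequential()
# Map each word index to a trainable 10-dimensional vector
model.add(Embedding(vocab_size,10,input_length=3))
# Two stacked LSTM layers; the first returns the full sequence to feed the second
model.add(LSTM(1000,return_sequences=True))
model.add(LSTM(1000))
model.add(Dense(1000,activation='relu'))
# Softmax output: a probability for every word in the vocabulary
model.add(Dense(vocab_size,activation='softmax'))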

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 3, 10)             42960     
                                                                 
 lstm (LSTM)                 (None, 3, 1000)           4044000   
                                                                 
 lstm_1 (LSTM)               (None, 1000)              8004000   
                                                                 
 dense (Dense)               (None, 1000)              1001000   
                                                                 
 dense_1 (Dense)             (None, 4296)              4300296   
                                                                 
Total params: 17392256 (66.35 MB)
Trainable params: 17392256 (66.35 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
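The parameter counts in the summary follow from standard formulas:

- An embedding layer has `vocab_size × output_dim` weights.
- A dense layer has `inputs × units + units` weights.
- An LSTM layer has four gates, each with input weights, recurrent weights, and a bias, giving `4 × (units × (input_dim + units) + units)`.

The short sketch below recomputes the summary's numbers. The helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit
    return input_dim * units + units

vocab_size, embed_dim, units = 4296, 10, 1000
counts = {
    "embedding": vocab_size * embed_dim,      # one 10-d vector per word index
    "lstm": lstm_params(embed_dim, units),
    "lstm_1": lstm_params(units, units),
    "dense": dense_params(units, units),
    "dense_1": dense_params(units, vocab_size),
}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:10s}{n:>12,}")
print("total", total, "| MiB as float32:", round(total * 4 / 2**20, 2))
```

Most of the 17.4 million parameters sit in the two 1000-unit LSTM layers and the output layer, while the embedding is tiny by comparison. The "66.35 MB" in the summary is the float32 size of those weights (69,569,024 bytes, about 66.35 MiB).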


In [None]:
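from tensorflow.keras.callbacks import ModelCheckpoint
# Save the model to next_words.h5 whenever the training loss reaches a new minimum
checkpoint=ModelCheckpoint('next_words.h5',monitor='loss',verbose=1,save_best_only=True)
# Categorical cross-entropy matches the one-hot targets and the softmax output
model.compile(loss='categorical_crossentropy',optimizer=Adam(learning_rate=0.001))
model.fit(x,y,epochs=200,batch_size=64,callbacks=[checkpoint])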

Epoch 1/200
Epoch 1: loss improved from inf to 0.17040, saving model to next_words.h5
Epoch 2/200
Epoch 2: loss improved from 0.17040 to 0.16060, saving model to next_words.h5
Epoch 3/200
Epoch 3: loss improved from 0.16060 to 0.15155, saving model to next_words.h5
Epoch 4/200
Epoch 4: loss did not improve from 0.15155
Epoch 5/200
Epoch 5: loss did not improve from 0.15155
Epoch 6/200
Epoch 6: loss did not improve from 0.15155
Epoch 7/200
Epoch 7: loss improved from 0.15155 to 0.14897, saving model to next_words.h5
Epoch 8/200
Epoch 8: loss improved from 0.14897 to 0.14340, saving model to next_words.h5
Epoch 9/200
Epoch 9: loss improved from 0.14340 to 0.14183, saving model to next_words.h5
Epoch 10/200
Epoch 10: loss improved from 0.14183 to 0.13812, saving model to next_words.h5
Epoch 11/200
Epoch 11: loss improved from 0.13812 to 0.13529, saving model to next_words.h5
Epoch 12/200
Epoch 12: loss did not improve from 0.13529
Epoch 13/200
Epoch 13: loss did not improve from 0.13529
Epoch 14/200
Epoch 14: loss did not improve from 0.13529
Epoch 15/200
Epoch 15: loss di

<keras.src.callbacks.History at 0x7e60175eea70>
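With `monitor='loss'` and `save_best_only=True`, `ModelCheckpoint` overwrites `next_words.h5` only when an epoch's training loss is lower than the best seen so far. That is why some epochs above report "did not improve". The plain-Python sketch below shows that decision rule. The `checkpoint_epochs` helper is hypothetical, and the loss values are illustrative, not taken from the run above.

```python
def checkpoint_epochs(losses):
    """Return the 1-based epochs at which a save_best_only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:  # for a loss, lower counts as an improvement
            best = loss
            saved.append(epoch)
    return saved

# Illustrative loss values only
print(checkpoint_epochs([0.50, 0.42, 0.45, 0.40, 0.41]))
```

Because only the training loss is monitored (there is no validation split), the saved file holds the model that fits the training text best. That is not necessarily the model that generalizes best.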

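### **Evaluation**

The model was compiled without `metrics=['accuracy']`, so `model.evaluate` returns only the categorical cross-entropy loss. The value printed below is therefore a loss, not an accuracy. It is also computed on the training data (`x`, `y`), so it measures how well the model fits the text rather than how well it generalizes.

In [None]:
# evaluate() returns only the loss because no metrics were passed to compile()
loss=model.evaluate(x,y)
print('Loss: ',loss)

Loss:  0.07498759776353836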
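Categorical cross-entropy is the mean of -log p(true next word) over the examples. Top-1 accuracy instead counts how often the highest-probability word is the true one. Keras can report accuracy alongside the loss if `metrics=['accuracy']` is passed to `model.compile`. The NumPy sketch below computes both quantities on a toy 3-word vocabulary with made-up probabilities; the helper names are hypothetical. Read as a loss, 0.075 corresponds to a geometric-mean probability of about e^(-0.075) ≈ 0.93 for the true next word on the training data.

```python
import numpy as np

def cross_entropy(probs, targets):
    """Mean of -log p(true word) over the batch."""
    return float(-np.mean(np.log(probs[np.arange(len(targets)), targets])))

def top1_accuracy(probs, targets):
    """Fraction of rows whose most probable word is the true word."""
    return float(np.mean(np.argmax(probs, axis=1) == targets))

# Made-up softmax outputs for 3 examples over a 3-word vocabulary
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.5, 0.3, 0.2]])
targets = np.array([0, 1, 2])
print(cross_entropy(probs, targets), top1_accuracy(probs, targets))
```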


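### **Loading the Model and Defining the Prediction Function**

In [None]:
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Reload the best checkpoint and the fitted tokenizer
model=load_model('next_words.h5')
with open('token.pkl','rb') as f:
  tokenizer=pickle.load(f)

def Predict_Next_Word(model,tokenizer,text):
  # Words -> indices; words not in the vocabulary are dropped
  sequence=tokenizer.texts_to_sequences([text])
  # The model expects exactly 3 indices: keep the last 3 and left-pad shorter input with 0
  sequence=pad_sequences(sequence,maxlen=3)
  # Index of the most probable next word
  preds=int(np.argmax(model.predict(sequence,verbose=0)))
  # Map the index back to its word; index 0 is padding and has no word
  predicted_word=tokenizer.index_word.get(preds,"")

  print(predicted_word)
  return predicted_word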
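The model always expects exactly 3 word indices. The pure-Python sketch below shows how such a context can be prepared: keep the last 3 indices and left-pad shorter inputs with 0. This mirrors `pad_sequences(..., maxlen=3)` with its default `padding='pre'` and `truncating='pre'`. The predicted index is then mapped back to a word through an inverted dictionary, as `tokenizer.index_word` provides. The helper names and the tiny `word_index` are hypothetical. Note that the model never saw index 0 during training, so predictions from padded contexts are likely to be less reliable.

```python
def prepare_context(indices, length=3, pad_value=0):
    """Keep the last `length` indices and left-pad with `pad_value` if too short."""
    tail = list(indices)[-length:]
    return [pad_value] * (length - len(tail)) + tail

def index_to_word(word_index):
    """Invert {word: index} once, instead of scanning the dict on every lookup."""
    return {i: w for w, i in word_index.items()}

word_index = {"the": 1, "and": 2, "he": 3, "then": 4}   # toy vocabulary
print(prepare_context([2, 4, 3, 1]))   # longer input: only the last three are kept
print(prepare_context([4, 3]))         # shorter input: padded on the left
print(index_to_word(word_index)[3])
```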

### **Prediction**

In [None]:
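while True:
  text=input('Enter your lines: ')

  # Enter 0 to stop
  if text=='0':
    print('Execution Completed...')
    break

  try:
    # split() ignores repeated and trailing spaces, so no empty "words" are produced
    text=text.split()
    # The model was trained on 3-word contexts, so keep only the last 3 words
    text=text[-3:]
    print(text)
    Predict_Next_Word(model,tokenizer,' '.join(text))
  except Exception as e:
    print('Error occurred: ',e)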

Enter your lines: how that thing
['how', 'that', 'thing']
he
Enter your lines: the day was
['the', 'day', 'was']
broke
Enter your lines: and then he 
['then', 'he', '']
is
Enter your lines: 0
Execution Completed...
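The loop above predicts one word at a time. To generate several words, each prediction can be appended to the context before predicting again (greedy decoding). The sketch below uses a stand-in `predict_fn` so the logic runs without the trained model. Both `generate` and `predict_fn` are hypothetical; in this notebook `predict_fn` could wrap `Predict_Next_Word(model, tokenizer, text)`. The lookup table is made up, apart from "the day was" → "broke" from the session above.

```python
from typing import Callable, List

def generate(seed: str, predict_fn: Callable[[str], str],
             n_words: int = 5, context: int = 3) -> str:
    """Greedily extend `seed` by repeatedly predicting from the last `context` words."""
    words: List[str] = seed.split()
    for _ in range(n_words):
        next_word = predict_fn(" ".join(words[-context:]))
        if not next_word:  # empty string means nothing was predicted: stop early
            break
        words.append(next_word)
    return " ".join(words)

# Toy stand-in predictor: a fixed lookup table instead of the LSTM
table = {"the day was": "broke", "day was broke": "and"}
print(generate("the day was", lambda ctx: table.get(ctx, ""), n_words=4))
```

Greedy decoding always takes the single most probable word, so it tends to repeat itself over longer outputs. Sampling from the softmax probabilities is a common alternative.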
