Task 39-> Implement with TensorFlow/Keras (RNN)

A simple application of RNNs is predicting the next word in a sentence.
The workflow has four steps:
Collect Data: Use a small set of text, like sentences from a book or articles.
Prepare Data: Convert sentences into sequences of words or characters.
Build Model: Create an RNN that learns to predict the next word based on the previous ones.
Generate Text: Input a few words, and the model predicts the next word, continuing to generate a sequence.
This basic task shows how RNNs handle sequential data and make predictions based on learned patterns.

Import the necessary libraries and dataset

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from google.colab import files
uploaded = files.upload()

file_name = next(iter(uploaded))
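# The file is plain text; delimiter='\t' keeps each line in one column as long as the text contains no tabs.
# read_csv treats the first line of the file as the header row (visible in the output below).
data = pd.read_csv(file_name, delimiter='\t')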
print(data.head())


Saving NextWordPrediction.txt to NextWordPrediction.txt
  inclined to think neither. Women are naturally secretive, and they like
0  to do their own secreting. Why should she hand...                     
1  She could trust her own guardianship, but she ...                     
2  indirect or political influence might be broug...                     
3  business man. Besides, remember that she had r...                     
4  a few days. It must be where she can lay her h...                     


Tokenize the text

In [3]:
data.columns = ['text']  # Rename the single column to 'text' (this discards the first line of the file, which read_csv used as the header)
text = ' '.join(data['text'])  # Join all rows into one string
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1
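
To make the `word_index` mapping concrete, here is a rough pure-Python sketch of what the tokenizer does: lowercase the text, drop punctuation, and give the most frequent word index 1. The helper `build_word_index` and the sample sentence are illustrative assumptions, not the Keras implementation, and edge cases may differ from the real `Tokenizer`.

```python
import re
from collections import Counter

def build_word_index(text):
    """Approximate a frequency-ranked word index (hypothetical helper)."""
    # Lowercase, replace punctuation with spaces, then split on whitespace
    words = re.sub(r"[^a-z0-9'\s]", " ", text.lower()).split()
    counts = Counter(words)
    # Most frequent word first; ties keep first-seen order because sorted() is stable
    ranked = sorted(counts, key=lambda w: -counts[w])
    # Indices start at 1 because 0 is reserved for padding
    return {word: i for i, word in enumerate(ranked, start=1)}

sample = "She could trust her own guardianship, but she had her reasons."
word_index = build_word_index(sample)
vocab_size = len(word_index) + 1  # +1 accounts for the reserved padding index 0

print(word_index)
print(vocab_size)
```

This is why `total_words` adds 1 to `len(tokenizer.word_index)`: the output layer needs a slot for index 0 even though no word maps to it.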

Create sequences of words

In [4]:
input_sequences = []
for line in text.split('.'):  # Split the text into sentences on '.'
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):  # Build every prefix of length 2 or more (n-gram sequences)
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)
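
The inner loop above turns one tokenized sentence into several training examples. A minimal sketch with made-up token ids (the helper name `prefix_sequences` and the ids are assumptions for illustration):

```python
def prefix_sequences(token_list):
    """Return every prefix of token_list that has at least two tokens."""
    return [token_list[:i + 1] for i in range(1, len(token_list))]

tokens = [4, 9, 2, 7]  # hypothetical ids for a four-word sentence
sequences = prefix_sequences(tokens)
for seq in sequences:
    print(seq)
```

Each prefix later becomes one example: all tokens but the last are the input, and the last token is the word to predict. A one-word sentence produces no examples.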

Pad sequences and prepare predictors and labels

In [5]:
max_sequence_len = max(len(x) for x in input_sequences)  # Length of the longest sequence
# Pre-pad with zeros so that all sequences have the same length
input_sequences = np.array(tf.keras.preprocessing.sequence.pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
# Split the padded sequences into features X (all but the last token) and labels y (the last token)
X, y = input_sequences[:,:-1], input_sequences[:,-1]
y = to_categorical(y, num_classes=total_words)
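
The effect of pre-padding and the X/y split can be checked on a toy example. The sketch below reimplements the padding in plain NumPy so the zeros are easy to see; `pad_pre`, the token ids, and `vocab_size = 10` are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def pad_pre(sequences, maxlen):
    """Left-pad each sequence with zeros to maxlen (hypothetical helper)."""
    padded = np.zeros((len(sequences), maxlen), dtype=np.int32)
    for row, seq in enumerate(sequences):
        seq = seq[-maxlen:]  # keep the most recent tokens if a sequence is too long
        padded[row, maxlen - len(seq):] = seq
    return padded

sequences = [[4, 9], [4, 9, 2], [4, 9, 2, 7]]  # made-up token ids
padded = pad_pre(sequences, maxlen=4)

# Features are all columns but the last; the label is the last column
X, y = padded[:, :-1], padded[:, -1]

vocab_size = 10  # assumed vocabulary size for this toy example
y_one_hot = np.eye(vocab_size, dtype=np.float32)[y]  # one row per label

print(padded)
print(X)
print(y)
print(y_one_hot.shape)
```

Padding on the left keeps the label in the final column of every row, which is what makes the single slice `[:, -1]` work.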

Define the RNN model

In [6]:
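model = Sequential()
# Map each word index to a 100-dimensional vector (Keras 3 deprecates input_length; it can be dropped there)
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
# return_sequences=True passes the hidden state at every timestep to the next recurrent layer
model.add(SimpleRNN(150, return_sequences=True))
# The second RNN layer returns only its final hidden state
model.add(SimpleRNN(100))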
model.add(Dense(total_words, activation='softmax'))
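
To see what a `SimpleRNN` layer computes and what `return_sequences` changes, here is a small NumPy sketch of the recurrence h_t = tanh(x_t W + h_{t-1} U + b). The sizes and random weights are assumptions chosen to keep the example small; this is an illustration of the recurrence, not the Keras code.

```python
import numpy as np

def simple_rnn_forward(inputs, W, U, b, return_sequences=False):
    """Run a tanh RNN over inputs of shape (timesteps, features)."""
    h = np.zeros(U.shape[0])  # initial hidden state
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state
        h = np.tanh(x_t @ W + h @ U + b)
        states.append(h)
    # Either every hidden state, or only the last one
    return np.stack(states) if return_sequences else h

rng = np.random.default_rng(0)
timesteps, embed_dim, units = 5, 8, 4  # toy sizes, much smaller than the model above
x = rng.normal(size=(timesteps, embed_dim))
W = rng.normal(scale=0.1, size=(embed_dim, units))
U = rng.normal(scale=0.1, size=(units, units))
b = np.zeros(units)

all_states = simple_rnn_forward(x, W, U, b, return_sequences=True)
last_state = simple_rnn_forward(x, W, U, b)

print(all_states.shape)
print(last_state.shape)
```

The first recurrent layer needs `return_sequences=True` so the second one receives a full sequence; the second layer returns only the final state, which the `Dense` layer turns into a probability for each word.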



Compile and train the model

In [7]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=30, verbose=1)  # verbose=1 prints a progress bar for each epoch

Epoch 1/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 61ms/step - accuracy: 0.0188 - loss: 5.9553
Epoch 2/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 89ms/step - accuracy: 0.0598 - loss: 5.5408
Epoch 3/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 84ms/step - accuracy: 0.0737 - loss: 5.3777
Epoch 4/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 63ms/step - accuracy: 0.0810 - loss: 5.3040
Epoch 5/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 61ms/step - accuracy: 0.0450 - loss: 6.0058
Epoch 6/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 62ms/step - accuracy: 0.0804 - loss: 5.3022
Epoch 7/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 59ms/step - accuracy: 0.0872 - loss: 5.3052
Epoch 8/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 102ms/step - accuracy: 0.0792 - loss: 5.2245
Epoch 9/30
[1m28/28[0m [32m━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7adb7193d2d0>
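
The `loss` values in the training log are categorical cross-entropy: the average of -log(p) over the examples, where p is the probability the model gave to the correct word. A small NumPy sketch with a made-up 5-word vocabulary (all numbers here are assumptions for illustration):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

vocab_size = 5  # toy vocabulary size
y_true = np.eye(vocab_size)[[2, 0]]  # correct words are index 2 and index 0

# A model that spreads probability evenly over all words
uniform = np.full((2, vocab_size), 1.0 / vocab_size)
# A model that puts 0.9 on the correct word each time
confident = np.array([[0.02, 0.02, 0.90, 0.03, 0.03],
                      [0.90, 0.02, 0.02, 0.03, 0.03]])

loss_uniform = categorical_crossentropy(y_true, uniform)
loss_confident = categorical_crossentropy(y_true, confident)
print(loss_uniform)
print(loss_confident)
```

A uniform guess over V words always scores ln(V), so that value is a useful baseline when reading the early epochs of a training log.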

Generate text

In [11]:
seed_text = "I will"
next_words = 50

# Generate the requested number of words, one at a time
for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = tf.keras.preprocessing.sequence.pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
    # argmax returns an array of shape (1,); take its single element as a plain int
    predicted = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
    # Look up the word for the predicted index (empty string for the padding index 0)
    output_word = tokenizer.index_word.get(predicted, "")
    seed_text += " " + output_word

print(seed_text)

I will my heart and in the another of who had had stepped with the carriage of the loungers of the sidelights of the carriage and with the corner of the avenue of the loungers of the my man with the corner but i saw the blood running freely down the face
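
The loop above uses greedy decoding: it always takes the single most probable word, which may contribute to the repeated phrases in the generated text. One common alternative, not used in this notebook, is to sample from the predicted distribution with a temperature. The sketch below assumes a made-up probability vector; `sample_with_temperature` is a hypothetical helper.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Sample an index after rescaling the distribution by a temperature."""
    rng = rng or np.random.default_rng()
    # Work in log space; the small constant avoids log(0)
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

probs = [0.05, 0.60, 0.25, 0.10]  # hypothetical model output over four words
rng = np.random.default_rng(42)

greedy = int(np.argmax(probs))  # what the notebook's loop would pick
samples = [sample_with_temperature(probs, temperature=0.8, rng=rng) for _ in range(10)]
near_greedy = sample_with_temperature(probs, temperature=0.01, rng=rng)

print(greedy)
print(samples)
```

Temperatures below 1 sharpen the distribution toward the greedy choice, and temperatures above 1 flatten it, trading coherence for variety.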
