### End-to-End Deep Learning Project Using a Simple RNN

**Problem Statement**

Develop a simple RNN model that predicts whether a movie review is positive or negative, using the IMDB dataset. The model should process the review text, learn sequential patterns with RNN layers, and classify the sentiment effectively.

In [5]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, SimpleRNN
from tensorflow.keras.datasets import imdb

Preprocessing steps
- Step1: Set the vocabulary size (max_features).
- Step2: Load the IMDB dataset.
- Step3: Pad the sequences so that every review has the same length.
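Setting the vocabulary size matters because `imdb.load_data(num_words=max_features)` keeps only the most frequent words. In Keras' IMDB encoding, indices 0, 1 and 2 are reserved (padding, start-of-sequence, out-of-vocabulary), real words start at index 3, and any index at or above `num_words` is replaced by the out-of-vocabulary index 2. That is why the sample review printed further down contains several 2s. The sketch below mimics that rule in plain Python; `limit_vocab` is a hypothetical helper for illustration, not a Keras function, and the toy review is made up.

```python
def limit_vocab(sequence, num_words, oov_char=2, skip_top=0):
    """Hypothetical helper mirroring how imdb.load_data applies num_words:
    any index outside [skip_top, num_words) is replaced by the OOV index."""
    return [w if skip_top <= w < num_words else oov_char for w in sequence]


# Toy encoded review: 1 is the start token, the rest are (already shifted) word indices
review = [1, 14, 22, 12345, 43, 99999]
print(limit_vocab(review, num_words=10000))
# → [1, 14, 22, 2, 43, 2]
```

With `max_features=10000`, rare words therefore do not disappear from the review; they collapse into the single OOV index, so sequence lengths are unchanged.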

In [6]:
# load the imdb dataset
max_features=10000  # vocabulary size
(X_train,y_train),(X_test,y_test)=imdb.load_data(num_words=max_features)

# shape is (25000,) because each review is a variable-length list of word indices
print(f'training data shape: {X_train.shape}, training labels shape: {y_train.shape}')
print(f'testing data shape: {X_test.shape}, testing labels shape: {y_test.shape}')

training data shape: (25000,), training labels shape: (25000,)
testing data shape: (25000,), testing labels shape: (25000,)


### Understanding the dataset

In [7]:
## Inspect a sample review and its label
sample_review=X_train[0]
sample_label=y_train[0]

print(f"Sample review (as integers):{sample_review}")
print(f'Sample label: {sample_label}')

Sample review (as integers):[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
Sample label: 1


To understand the dataset better, we map the integer indices back to words. This lets us read the sample review and check that its label (1, i.e. positive) actually matches the text.
- Step1: Get the word-index dictionary for the dataset.
- Step2: Reverse the key-value pairs so that the index becomes the key.
- Step3: Decode the sample review.
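Decoding needs one extra detail: `load_data` shifts every word's rank by 3 so that indices 0, 1 and 2 stay free for padding, the start token and out-of-vocabulary words. The sketch below reproduces the three steps on a tiny, hypothetical stand-in for `imdb.get_word_index()`; its five entries are chosen to be consistent with the decoded sample review shown below, and the encoded list is the start of that review plus a trailing OOV index.

```python
# Toy stand-in for imdb.get_word_index(): word -> frequency rank.
# The real dictionary has tens of thousands of entries.
word_index = {"this": 11, "was": 13, "film": 19, "just": 40, "brilliant": 527}

# Step 2: reverse the mapping so the rank becomes the key
reverse_word_index = {rank: word for word, rank in word_index.items()}

# Encoded as load_data returns it: 1 = start token, 2 = out-of-vocabulary,
# every real word shifted by 3 (e.g. "film" has rank 19 and appears as 22)
encoded = [1, 14, 22, 16, 43, 530, 2]

# Step 3: undo the shift; reserved indices have no word and become '?'
decoded = " ".join(reverse_word_index.get(i - 3, "?") for i in encoded)
print(decoded)
# → ? this film was just brilliant ?
```

This also explains why the real decoded review starts with "?": its first index is the start token 1, and 1 - 3 = -2 has no entry in the dictionary.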

In [None]:
word_index=imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}
reverse_word_index

In [9]:
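# load_data shifts every word index by 3 (0 = padding, 1 = start, 2 = out-of-vocabulary),
# so subtract 3 before the lookup; indices with no matching word are shown as '?'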
decoded_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in sample_review])
decoded_review

"? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you th

### Padding the sequences

In [None]:
# Make all reviews the same length; by default pad_sequences pads and truncates at the start (padding='pre', truncating='pre')
max_len=500
X_train=pad_sequences(X_train,maxlen=max_len)
X_test=pad_sequences(X_test,maxlen=max_len)
X_train[0]
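The defaults matter in both directions: shorter reviews get zeros prepended, and reviews longer than `max_len` lose their beginning rather than their end. A plain-Python sketch of that behaviour follows; `pad_pre` is a hypothetical helper (the real `pad_sequences` returns a NumPy `int32` array and supports more options), and the toy sequences use `maxlen=5` instead of 500.

```python
def pad_pre(sequences, maxlen, value=0):
    """Sketch of pad_sequences' defaults (padding='pre', truncating='pre'):
    short sequences get `value` prepended, long ones keep their LAST maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                              # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)     # pad at the front
    return padded


print(pad_pre([[5, 6], [1, 2, 3, 4, 5, 6, 7]], maxlen=5))
# → [[0, 0, 0, 5, 6], [3, 4, 5, 6, 7]]
```

Pre-padding keeps the real tokens at the end of each sequence, right before the RNN's final hidden state is read out, which is a common reason for preferring it with recurrent layers.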

### Train RNN model
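- Step1: Initialize the model.
- Step2: Add an Embedding layer that maps each word index to a dense vector.
- Step3: Add a SimpleRNN layer, which reads the sequence one token at a time and feeds its hidden state back into itself.
- Step4: Add a Dense output layer. You can stack more than one Dense layer, but unlike the RNN layer they have no feedback loop.
- Step5: Compile the model.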
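The difference between the recurrent layer and a Dense layer is the feedback loop. A SimpleRNN computes h_t = activation(x_t W_x + h_{t-1} W_h + b) at every time step and carries h forward, while a Dense layer applies one fixed transformation with no state. The NumPy sketch below follows that update rule with tiny, made-up sizes and random weights; it is an illustration of the technique, not Keras' implementation.

```python
import numpy as np


def simple_rnn_forward(x, W_x, W_h, b, activation=np.tanh):
    """Toy SimpleRNN over one sequence x of shape (timesteps, features).
    Returns the last hidden state, which is what SimpleRNN outputs
    when return_sequences=False (the Keras default)."""
    h = np.zeros(W_h.shape[0])
    for x_t in x:
        h = activation(x_t @ W_x + h @ W_h + b)  # feedback: new h depends on previous h
    return h


def dense_forward(h, W, b):
    """Dense output layer with sigmoid: no state, no loop."""
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))


rng = np.random.default_rng(0)
timesteps, emb_dim, units = 6, 4, 3          # tiny stand-ins for max_len=500, 128, 128
x = rng.normal(size=(timesteps, emb_dim))    # pretend these are embedding vectors
h_last = simple_rnn_forward(
    x,
    rng.normal(size=(emb_dim, units)),       # input weights W_x
    rng.normal(size=(units, units)) * 0.5,   # recurrent weights W_h
    np.zeros(units),                         # bias b
)
prob = dense_forward(h_last, rng.normal(size=(units, 1)), np.zeros(1))
print(h_last.shape, prob.shape)  # → (3,) (1,)
```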

Once the model is compiled:

- Set up callbacks such as EarlyStopping.
- Train the model.
- Save the model to an .h5 file.

In [14]:
## Train Simple RNN
model=Sequential()
model.add(Embedding(max_features,128)) # Embedding layer: maps each of the 10,000 word indices to a 128-dimensional vector
model.add(SimpleRNN(128,activation='relu')) # 128 recurrent units; Keras' default activation is 'tanh', ReLU leaves the hidden state unbounded
model.add(Dense(1,activation="sigmoid")) # output layer for binary classification
model.build(input_shape=(None, max_len)) # (batch size, sequence length): each input is one review padded to max_len tokens
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
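The recurrent layer above uses `activation='relu'`, whereas Keras' `SimpleRNN` defaults to `tanh`. One plausible reason for the enormous training loss in the first two epochs of the log below (over 1,000,000 before it settles under 1) is that ReLU does not bound the hidden state, so over 500 time steps it can grow without limit whenever the recurrent weights amplify it, while tanh saturates. The toy recurrence below only illustrates that difference: the 2-unit state and the hand-picked weight matrix are hypothetical, not taken from the trained model, and whether switching to tanh helps here would need to be checked by retraining.

```python
import numpy as np

# Hypothetical recurrence with no inputs: h_t = act(W_h @ h_{t-1}),
# where the recurrent weights amplify the state by 1.5x per step.
W_h = 1.5 * np.eye(2)
h_relu = np.ones(2)
h_tanh = np.ones(2)

for _ in range(50):                              # 50 steps; the model above unrolls 500
    h_relu = np.maximum(0.0, W_h @ h_relu)       # ReLU: unbounded above
    h_tanh = np.tanh(W_h @ h_tanh)               # tanh: always within [-1, 1]

print(h_relu[0])   # equals 1.5**50, roughly 6.4e8
print(h_tanh[0])   # settles near 0.86, the fixed point of x = tanh(1.5 * x)
```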

In [15]:
model.summary()
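The summary's output is not reproduced here, but the parameter counts can be worked out by hand from the layer sizes used above (bias terms included). If the model was built as shown, these totals should match the "Param #" column of `model.summary()`.

```python
# Hand-computed parameter counts for Embedding -> SimpleRNN -> Dense
max_features, embed_dim, rnn_units = 10000, 128, 128

embedding_params = max_features * embed_dim                              # one 128-d vector per word index
rnn_params = embed_dim * rnn_units + rnn_units * rnn_units + rnn_units   # W_x, W_h, bias
dense_params = rnn_units * 1 + 1                                         # weights + bias

total = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total)
# → 1280000 32896 129 1313025
```

The Embedding layer holds about 97% of the weights, so the vocabulary size (max_features) drives model size far more than the RNN width does.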

In [16]:
# Setting up callbacks
from tensorflow.keras.callbacks import EarlyStopping
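# Stop once val_loss has not improved for 5 consecutive epochs, and restore the weights from the best epoch
early_stopping_callback = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)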
early_stopping_callback

<keras.src.callbacks.early_stopping.EarlyStopping at 0x2628d0d51c0>
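The patience rule can be traced by hand. The function below is a simplified, pure-Python sketch of what `EarlyStopping(monitor='val_loss', patience=5)` checks after each epoch (with the default `min_delta=0`); `early_stopping_epoch` is a hypothetical name, and Keras' real callback tracks more state. The input values are the `val_loss` numbers from the first six epochs of the training log in this notebook.

```python
def early_stopping_epoch(val_losses, patience):
    """Simplified EarlyStopping on a minimized metric.
    Returns (stopped_epoch or None, best_epoch), both 0-based."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:                 # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:        # `patience` epochs in a row without improvement
                return epoch, best_epoch
    return None, best_epoch


logged = [0.6016, 0.9714, 0.8749, 0.8287, 0.7961, 0.5805]
print(early_stopping_epoch(logged, patience=5))
# → (None, 5)
```

Epochs 2 to 5 are four non-improving epochs in a row, one short of the patience of 5, and epoch 6 sets a new best, so the counter resets and training continues.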

In [17]:
# Training the model
history = model.fit(
    X_train,y_train,
    epochs=10,batch_size=32,
    validation_split=0.2,
    callbacks=[early_stopping_callback] 
)

Epoch 1/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 58s 89ms/step - accuracy: 0.5823 - loss: 1252735.8750 - val_accuracy: 0.6730 - val_loss: 0.6016
Epoch 2/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 63s 101ms/step - accuracy: 0.5987 - loss: 558879.0000 - val_accuracy: 0.5938 - val_loss: 0.9714
Epoch 3/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 62s 99ms/step - accuracy: 0.6459 - loss: 0.9158 - val_accuracy: 0.6106 - val_loss: 0.8749
Epoch 4/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 64s 103ms/step - accuracy: 0.6851 - loss: 0.7959 - val_accuracy: 0.6280 - val_loss: 0.8287
Epoch 5/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 63s 101ms/step - accuracy: 0.7232 - loss: 0.7344 - val_accuracy: 0.6396 - val_loss: 0.7961
Epoch 6/10
625/625 ━━━━━━━━━━━━━━━━━━━━ 65s 104ms/step - accuracy: 0.7489 - loss: 0.6757 - val_accuracy: 0.6936 - val_loss: 0.5805
E
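The "625/625" on every epoch line follows from `validation_split=0.2` and `batch_size=32`. Keras takes the validation samples from the end of the training arrays (before shuffling), so only the remaining 80% is used for the gradient steps. A quick arithmetic check, assuming the 25,000 training reviews reported earlier:

```python
import math

n_train_total = 25000        # X_train.shape[0], from the dataset shapes printed earlier
validation_split = 0.2
batch_size = 32

n_val = int(n_train_total * validation_split)     # held-out validation samples
n_fit = n_train_total - n_val                     # samples actually trained on
steps_per_epoch = math.ceil(n_fit / batch_size)   # a final partial batch would count as a step

print(n_val, n_fit, steps_per_epoch)
# → 5000 20000 625
```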

In [18]:
# save model (Keras 3 treats .h5 as a legacy format and recommends the native .keras format)
model.save('Simple_RNN_movie_review_prediction.h5')

