# Recurrent Neural Network

**Name - Mitul Srivastava**

**ID - C00313606**

## **LOG** : Introduction to dataset
### **DATASET** : IMDB Sentiment Analysis Dataset
### **DETAIL** : The IMDB dataset contains 50,000 movie reviews labeled as positive (1) or negative (0).
### **AIM** : To train and fine-tune a recurrent neural network to correctly classify review sentiment.

## **LOG** : Importing required packages.

In [2]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

## **LOG** : Importing IMDB dataset.

In [3]:
imdb = keras.datasets.imdb
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 5s 0us/step


## **LOG** : Padding sequences to a uniform length of 250 tokens. The vocabulary was already limited to the 10,000 most common words by `num_words=10000` when loading the dataset.

In [4]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_length = 250
x_train = pad_sequences(x_train, maxlen=max_length)
x_test = pad_sequences(x_test, maxlen=max_length)
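
As a rough illustration of what the padding step does, the sketch below mimics the default behaviour of `pad_sequences` in plain Python: zeros are added at the front of short sequences, and over-long sequences are cut from the front so that the last tokens are kept. The helper name `pad_pre` and the toy reviews are made up for this example; the notebook itself relies on the Keras function.

```python
def pad_pre(sequences, maxlen, value=0):
    """Pad or truncate each sequence to `maxlen`, both at the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                 # keep only the last `maxlen` tokens
        padding = [value] * (maxlen - len(seq))   # zeros fill the missing positions
        padded.append(padding + seq)
    return padded


# Toy encoded reviews (hypothetical word indices, not taken from IMDB).
toy_reviews = [[11, 12, 13], [21, 22, 23, 24, 25, 26]]
toy_padded = pad_pre(toy_reviews, maxlen=4)
print(toy_padded)  # [[0, 11, 12, 13], [23, 24, 25, 26]]
```

With `max_length = 250`, every review therefore becomes a row of exactly 250 integers, which is what the Embedding layer expects.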

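## **LOG** : Defining the RNN model
## Embedding: Converts word indices into dense 64-dimensional vectors.
## SimpleRNN: Two stacked layers capture sequential dependencies; the first returns the full sequence so the second can consume it.
## Dense: A single sigmoid unit outputs the probability that a review is positive.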

In [6]:
model = keras.Sequential([
    keras.layers.Embedding(10000, 64, input_length=max_length),
    keras.layers.SimpleRNN(64, return_sequences=True),
    keras.layers.SimpleRNN(32),
    keras.layers.Dense(1, activation='sigmoid')
])
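
To make the recurrence behind `SimpleRNN` more concrete, here is a minimal NumPy sketch of the update `h_t = tanh(x_t W + h_{t-1} U + b)` applied to one sequence. The function name `simple_rnn_forward`, the random weights, and the toy sizes are assumptions for illustration only; this is not the Keras implementation.

```python
import numpy as np


def simple_rnn_forward(x, W, U, b):
    """Run h_t = tanh(x_t @ W + h_{t-1} @ U + b) over one sequence.

    x has shape (timesteps, input_dim); the result holds every hidden
    state and has shape (timesteps, units).
    """
    units = U.shape[0]
    h = np.zeros(units)          # initial hidden state
    states = []
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U + b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
timesteps, input_dim, units = 5, 8, 4        # toy sizes, far smaller than the notebook's
x = rng.normal(size=(timesteps, input_dim))  # stands in for embedded word vectors
W = rng.normal(size=(input_dim, units)) * 0.1
U = rng.normal(size=(units, units)) * 0.1
b = np.zeros(units)

all_states = simple_rnn_forward(x, W, U, b)  # analogous to return_sequences=True
last_state = all_states[-1]                  # analogous to return_sequences=False
print(all_states.shape, last_state.shape)    # (5, 4) (4,)
```

The first `SimpleRNN` layer in the model passes the full sequence of states on, while the second keeps only the last state, which then feeds the Dense layer.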

## **LOG** : Compiling the model with the "adam" optimizer, the "binary_crossentropy" loss (suited to binary classification), and accuracy as the metric.

In [7]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
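
For reference, the loss and metric chosen above can be written out by hand. The sketch below computes binary cross-entropy and accuracy in plain Python on four made-up predictions; the function names, the clipping constant `eps`, and the sample values are assumptions for this example rather than Keras internals.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Hypothetical labels and sigmoid outputs for four reviews.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.1]

bce = binary_crossentropy(labels, probs)
acc = accuracy(labels, probs)
print(f"loss={bce:.4f} accuracy={acc:.2f}")  # loss=0.3375 accuracy=0.75
```

The third review is the only one misclassified (probability 0.4 for a positive label), and it also contributes the largest share of the loss.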

## **LOG** : Training the model for 5 epochs, with the test set passed as validation data.

In [8]:
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))


Epoch 1/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 150s 183ms/step - accuracy: 0.5372 - loss: 0.6862 - val_accuracy: 0.6360 - val_loss: 0.6206
Epoch 2/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 142s 181ms/step - accuracy: 0.7361 - loss: 0.5266 - val_accuracy: 0.6622 - val_loss: 0.6041
Epoch 3/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 143s 183ms/step - accuracy: 0.7941 - loss: 0.4502 - val_accuracy: 0.8209 - val_loss: 0.4173
Epoch 4/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 145s 185ms/step - accuracy: 0.8167 - loss: 0.4136 - val_accuracy: 0.7654 - val_loss: 0.4976
Epoch 5/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 143s 183ms/step - accuracy: 0.8584 - loss: 0.3424 - val_accuracy: 0.8050 - val_loss: 0.4724


<keras.src.callbacks.history.History at 0x1fe044694d0>
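
The `782/782` shown for each epoch is the number of batches processed. The short calculation below reproduces it, assuming the Keras default batch size of 32 (the `fit` call does not set `batch_size`) and the 25,000 training reviews of the Keras IMDB split; the per-step time is read off the log and is only approximate.

```python
import math

train_examples = 25_000   # size of the Keras IMDB training split
batch_size = 32           # Keras default when fit() is given no batch_size
steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch)    # 782

# Rough time per epoch from the logged ~183 ms/step.
seconds_per_epoch = steps_per_epoch * 0.183
print(round(seconds_per_epoch))  # 143
```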

## **LOG** : Model evaluation

In [9]:
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")

782/782 ━━━━━━━━━━━━━━━━━━━━ 34s 43ms/step - accuracy: 0.7987 - loss: 0.4869
Test accuracy: 0.8050


## **LOG:** Enhancing model performance by:
## Replacing SimpleRNN with LSTM for better long-term memory.
## Increasing the number of LSTM units.
## Adding Dropout to reduce overfitting.

In [10]:
model_improved = keras.Sequential([
    keras.layers.Embedding(10000, 64, input_length=max_length),
    keras.layers.LSTM(128, return_sequences=True),
    keras.layers.LSTM(64),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation='sigmoid')
])
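
One way to see how much larger the LSTM model is than the SimpleRNN model is to count trainable parameters by hand. The sketch below uses the standard per-layer formulas with the layer sizes from both model definitions; the helper names are made up for this example, and the totals are computed here rather than taken from `model.summary()`.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim


def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + bias
    return input_dim * units + units * units + units


def lstm_params(input_dim, units):
    # an LSTM holds four such weight sets (input, forget, cell, output)
    return 4 * simple_rnn_params(input_dim, units)


def dense_params(input_dim, units):
    return input_dim * units + units


baseline = (embedding_params(10000, 64) + simple_rnn_params(64, 64)
            + simple_rnn_params(64, 32) + dense_params(32, 1))
improved = (embedding_params(10000, 64) + lstm_params(64, 128)
            + lstm_params(128, 64) + dense_params(64, 1))

print(baseline)  # 651393
print(improved)  # 788289
```

Leaving the shared embedding aside, the recurrent and dense weights grow from 11,393 to 148,289, which is at least consistent with the longer per-step training times of the LSTM model, although other factors also play a role.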

## **LOG** : Compiling the improved model and training it for 10 epochs.

In [11]:
model_improved.compile(optimizer='adam',
                        loss='binary_crossentropy',
                        metrics=['accuracy'])

model_improved.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 560s 705ms/step - accuracy: 0.6865 - loss: 0.5794 - val_accuracy: 0.6311 - val_loss: 0.6842
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 32023s 41s/step - accuracy: 0.6737 - loss: 0.5630 - val_accuracy: 0.6449 - val_loss: 0.6262
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 534s 683ms/step - accuracy: 0.7829 - loss: 0.4704 - val_accuracy: 0.8441 - val_loss: 0.3680
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 598s 765ms/step - accuracy: 0.8971 - loss: 0.2647 - val_accuracy: 0.8628 - val_loss: 0.3133
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 571s 731ms/step - accuracy: 0.9257 - loss: 0.2021 - val_accuracy: 0.8694 - val_loss: 0.3426
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 598s 765ms/step - accuracy: 0.9442 - loss: 0.1594 - val_accuracy: 0.8686 - val_loss: 0.3290
Epoc

<keras.src.callbacks.history.History at 0x1fe0f603690>
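
A quick way to read the training log is to look for the epoch with the best validation figures. The sketch below does this for the six epochs that are visible in the output (the log is cut off after epoch 6, so later epochs are not covered); the values are copied from the log and the variable names are chosen for this example.

```python
# val_loss and val_accuracy for epochs 1-6, copied from the training log.
val_loss = [0.6842, 0.6262, 0.3680, 0.3133, 0.3426, 0.3290]
val_accuracy = [0.6311, 0.6449, 0.8441, 0.8628, 0.8694, 0.8686]

best_loss_epoch = val_loss.index(min(val_loss)) + 1          # epochs are 1-based
best_acc_epoch = val_accuracy.index(max(val_accuracy)) + 1

print(best_loss_epoch, min(val_loss))       # 4 0.3133
print(best_acc_epoch, max(val_accuracy))    # 5 0.8694
```

Among the visible epochs, validation loss is lowest at epoch 4 while training accuracy keeps rising to 0.9442 by epoch 6, a pattern that often points to the onset of overfitting.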

## **LOG** : Evaluating the improved model.
## We observe that after the modification, the new model's test accuracy increased from 0.8050 to 0.8546.

In [12]:
test_loss, test_acc = model_improved.evaluate(x_test, y_test)
print(f"Improved Test accuracy: {test_acc:.4f}")

782/782 ━━━━━━━━━━━━━━━━━━━━ 197s 252ms/step - accuracy: 0.8528 - loss: 0.5099
Improved Test accuracy: 0.8546
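
To put the two test accuracies side by side, the small calculation below expresses the improvement both in percentage points and as a relative reduction in misclassified reviews. Only the two accuracy values come from the notebook; the variable names are chosen for this example.

```python
baseline_acc = 0.8050   # SimpleRNN model
improved_acc = 0.8546   # LSTM model

gain_points = (improved_acc - baseline_acc) * 100
error_reduction = 1 - (1 - improved_acc) / (1 - baseline_acc)

print(f"{gain_points:.2f} percentage points")         # 4.96 percentage points
print(f"{error_reduction:.1%} fewer misclassified")   # 25.4% fewer misclassified
```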


### **REFERENCES** :
### https://chatgpt.com/
### https://www.perplexity.ai/
### https://github.com/trekhleb/machine-learning-experiments

## **END**