# **Artificial Intelligence**

# **Module 8: Recurrent Neural Networks - Assignment**

### **Problem Statement:**

Build a sentiment analysis model using Recurrent Neural Networks (RNNs) to
classify movie reviews from the IMDB dataset into positive or negative
sentiments.

### **Dataset:**

The Keras IMDB dataset provides 25,000 training reviews and 25,000 test
reviews, each labeled by sentiment (positive/negative). The reviews have been
preprocessed, and each one is encoded as a sequence of word indices (integers).
Words are indexed by overall frequency in the dataset, which allows quick
filtering operations such as: "only consider the top 10,000 most common words,
but eliminate the top 20 most common words".
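
The frequency-based filtering described above can be sketched in plain Python. This is a minimal illustration, assuming the behavior Keras documents for the `num_words` and `skip_top` arguments of `imdb.load_data`: indices outside the allowed range are replaced by an out-of-vocabulary marker (index 2 by default) rather than dropped. The helper name `filter_by_frequency` and the sample review are hypothetical.

```python
def filter_by_frequency(seq, num_words=10000, skip_top=20, oov_char=2):
    """Replace every index outside [skip_top, num_words) with oov_char."""
    return [w if skip_top <= w < num_words else oov_char for w in seq]


# Hypothetical encoded review: small indices are very frequent words,
# 12000 is a rare word outside the 10,000-word vocabulary.
review = [1, 14, 22, 16, 43, 530, 12000, 973]
filtered = filter_by_frequency(review)
print(filtered)
```

Because filtered words are replaced instead of removed, the review keeps its original length and word positions. Note that this notebook only uses `num_words`; `skip_top` is shown here to illustrate the quoted example.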

### **Tasks to be Performed:**

**Data Preprocessing:**

● Load the IMDB dataset, keeping only the top 10,000 most frequently
occurring words.

● Pad the sequences so that they all have the same length.

**Model Building:**

● Create a Sequential RNN model using TensorFlow and Keras.

● The model should consist of an Embedding layer, a SimpleRNN layer, and
a Dense output layer.

● Compile the model, specifying the appropriate optimizer, loss function, and
metrics.

**Training:**

● Train the model on the preprocessed movie reviews, using a batch size of
128 and validating on 20% of the training data.

● Run the training for 10 epochs.

**Evaluation:**

● Evaluate the model on the test set and report the accuracy.

**Expected Outcome:**

A trained RNN model that can classify movie reviews into positive or negative
sentiments, with an accuracy metric provided at the end of the training process.


### **Step 1: Import Libraries**

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

### **Step 2: Data Preprocessing**

In [2]:
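# Hyperparameters for data loading, padding, and training
max_features = 10000  # Vocabulary size: keep only the 10,000 most frequent words
maxlen = 500  # Pad or truncate every review to exactly 500 tokens
batch_size = 128  # Number of reviews per gradient update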


In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=max_features)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

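Each loaded review is a list of integers rather than text. The sketch below shows how such a list could be mapped back to words, assuming the Keras defaults: index 0 is padding, 1 marks the start of a review, 2 stands for out-of-vocabulary words, and real word ranks are shifted by `index_from=3`. The tiny `toy_word_index` is hypothetical; in the notebook the real mapping would come from `keras.datasets.imdb.get_word_index()`.

```python
INDEX_FROM = 3  # Keras default offset applied to word ranks
SPECIAL = {0: "<pad>", 1: "<start>", 2: "<oov>"}


def decode_review(encoded, word_index):
    """Turn a list of word indices back into a readable string."""
    rank_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    words = []
    for i in encoded:
        if i in SPECIAL:
            words.append(SPECIAL[i])
        else:
            words.append(rank_to_word.get(i, "<oov>"))
    return " ".join(words)


# Hypothetical word -> frequency-rank mapping (rank 1 = most frequent word)
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}
decoded = decode_review([1, 4, 5, 6, 7, 2], toy_word_index)
print(decoded)
```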

In [4]:
# Pad sequences so they all have the same length
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
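
To make the effect of padding concrete, here is a plain-Python sketch of what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`, `value=0`): long sequences keep their last `maxlen` items, and short ones are left-padded with zeros. The helper `pad_pre` is a hypothetical stand-in for illustration, not the library implementation.

```python
def pad_pre(seq, maxlen, value=0):
    """Keep the last `maxlen` items, then left-pad with `value`."""
    truncated = seq[-maxlen:]
    return [value] * (maxlen - len(truncated)) + truncated


short = pad_pre([5, 8, 13], maxlen=5)             # shorter than maxlen
long_ = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)  # longer than maxlen
print(short)
print(long_)
```

With pre-padding, the real tokens sit at the end of each sequence, closest to the final hidden state that the output layer reads.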

### **Step 3: Model Building**

In [5]:
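# Build the RNN model: Embedding -> SimpleRNN -> Dense (sigmoid)
model = Sequential([
    # Map each of the 10,000 word indices to a 128-dimensional vector.
    # input_length is omitted: Keras 3 deprecates it and infers the length.
    Embedding(max_features, 128),
    # 64 recurrent units; only the final hidden state is passed on
    SimpleRNN(64, activation='relu'),
    # Single sigmoid unit: probability that the review is positive
    Dense(1, activation='sigmoid')
])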
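
The recurrence inside the `SimpleRNN` layer can be sketched in plain Python. This is a minimal illustration under the assumption that the layer computes `h_t = relu(x_t·W + h_{t-1}·U + b)` and, with the default `return_sequences=False`, hands only the last hidden state to the next layer. The function name and the toy weights are hypothetical, and the sizes are shrunk to 2 inputs and 2 units (the notebook uses 128 and 64).

```python
def relu(v):
    return [max(0.0, x) for x in v]


def simple_rnn_last_state(inputs, W, U, b):
    """inputs: list of timestep vectors; W: input-to-hidden weights,
    U: hidden-to-hidden weights, b: bias. Returns the final hidden state."""
    units = len(b)
    h = [0.0] * units  # initial hidden state is all zeros
    for x in inputs:
        pre = [
            sum(x[i] * W[i][j] for i in range(len(x)))
            + sum(h[k] * U[k][j] for k in range(units))
            + b[j]
            for j in range(units)
        ]
        h = relu(pre)
    return h


# Toy weights chosen so the arithmetic is easy to follow by hand
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, -1.0]
inputs = [[1.0, 2.0], [0.0, 1.0]]  # two timesteps of 2-dimensional embeddings
h_last = simple_rnn_last_state(inputs, W, U, b)
print(h_last)
```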


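As a sanity check on the architecture, the trainable parameter count can be worked out by hand. The arithmetic below is a sketch that assumes each layer uses its default bias term; `model.summary()` remains the authoritative source for the real numbers.

```python
vocab_size, embed_dim, rnn_units = 10000, 128, 64

# Embedding: one 128-dimensional vector per vocabulary entry
embedding_params = vocab_size * embed_dim

# SimpleRNN: input weights + recurrent weights + bias, per unit
rnn_params = (embed_dim + rnn_units + 1) * rnn_units

# Dense(1): one weight per RNN unit plus one bias
dense_params = rnn_units + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
```

Under these assumptions the embedding table dominates the model: it holds 1,280,000 of the 1,292,417 parameters.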

In [6]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
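
To show what the compiled loss and metric measure, here is a plain-Python sketch of binary cross-entropy and thresholded accuracy. It assumes predictions are sigmoid probabilities, clips them with a small `eps` to avoid `log(0)`, and uses a 0.5 decision threshold; the helper names are hypothetical and this is not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_pred))
    return hits / len(y_true)


confident_loss = binary_crossentropy([1, 0], [0.9, 0.1])
chance_loss = binary_crossentropy([1, 0], [0.5, 0.5])
acc = accuracy([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.7])
print(confident_loss, chance_loss, acc)
```

A model that always outputs 0.5 scores a loss of ln 2 (about 0.693), which is a useful reference point when reading the first-epoch loss.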

### **Step 4: Model Training**

In [7]:
# Train the model
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=batch_size,
                    validation_split=0.2)

Epoch 1/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 57s 348ms/step - accuracy: 0.5540 - loss: 0.6851 - val_accuracy: 0.6742 - val_loss: 0.5965
Epoch 2/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 85s 367ms/step - accuracy: 0.7423 - loss: 0.5345 - val_accuracy: 0.6914 - val_loss: 0.5696
Epoch 3/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 81s 359ms/step - accuracy: 0.8223 - loss: 0.4069 - val_accuracy: 0.7770 - val_loss: 0.4755
Epoch 4/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 82s 359ms/step - accuracy: 0.8055 - loss: 0.4101 - val_accuracy: 0.6782 - val_loss: 0.6204
Epoch 5/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 81s 353ms/step - accuracy: 0.8433 - loss: 0.3557 - val_accuracy: 0.7554 - val_loss: 0.5518
Epoch 6/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 57s 364ms/step - accuracy: 0.9219 - loss: 0.2105 - val_accuracy: 0.7668 - val_loss: 0.5351
Epoch 7/10
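
The `157/157` step count in the log follows from the split and batch size. The arithmetic below is a sketch, assuming the 25,000-review training set and the Keras convention that `validation_split` holds out the trailing fraction of the arrays (taken before any shuffling).

```python
import math

n_samples = 25000        # size of the IMDB training set
validation_split = 0.2
batch_size = 128

# Reviews left for gradient updates after holding out the validation slice
n_train = int(n_samples * (1 - validation_split))
n_val = n_samples - n_train

# One step per batch; the final batch may be smaller than batch_size
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size
print(n_train, n_val, steps_per_epoch, last_batch)
```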

### **Step 5: Model Evaluation**

In [8]:
# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")

782/782 ━━━━━━━━━━━━━━━━━━━━ 30s 38ms/step - accuracy: 0.7913 - loss: 0.7523
Test Accuracy: 0.7959
