# Homework Assignment #6: LSTM and Recurrent Neural Networks (RNNs)

# Part 1: Recurrent Neural Networks

**Implementing an RNN for Sentiment Analysis**

In this assignment, you will implement a simple Recurrent Neural Network (RNN) for sentiment analysis using the IMDB movie reviews dataset. You will build an RNN model using TensorFlow/Keras to classify movie reviews as positive or negative.


Fill in the code wherever you see the comment "PUT YOUR CODE HERE", and follow all the steps in this document.

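In this section, run the provided Python code, add the code needed to complete the tasks described below, and use the results to answer the questions in the HW assignment. To speed up training, change your runtime type to GPU. A TPU runtime only accelerates Keras models that are built inside a `tf.distribute.TPUStrategy` scope, which this notebook does not do.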



**Task 1: Data Preparation**

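- Load the IMDB movie reviews dataset. Keras ships it already tokenized: each review is a list of integer word indices, restricted here to the `num_words` most frequent words.
- Preprocess the data by padding (or truncating) every review to the same length, `max_len`.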

(Run this code)


In [None]:
import numpy as np
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load IMDB dataset
num_words = 10000
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_words)

# Pad (or truncate) every review to exactly max_len word indices
max_len = 200
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)
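By default, `pad_sequences` pads and truncates at the *start* of each sequence (`padding='pre'`, `truncating='pre'`), so a review longer than `max_len` keeps its final `max_len` words and a shorter one is left-padded with zeros. The NumPy sketch below re-implements only that default behaviour to make it concrete; `pad_pre` is a hypothetical helper for illustration, not part of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical re-implementation of pad_sequences' defaults
    (padding='pre', truncating='pre', dtype='int32')."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        trimmed = seq[-maxlen:]  # keep the LAST maxlen tokens (truncating='pre')
        if len(trimmed):
            out[row, -len(trimmed):] = trimmed  # left-pad with zeros (padding='pre')
    return out

# The third "review" is longer than maxlen, so only its last 4 indices survive.
print(pad_pre([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]], maxlen=4))
```

The rows come out as `[0, 1, 2, 3]`, `[0, 0, 4, 5]` and `[7, 8, 9, 10]`. Because 0 is reserved for padding in the IMDB encoding, the padded positions never collide with a real word index.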


**Task 2: Build the RNN Model**

**Coding Exercise**: Put your code where the comment "PUT YOUR CODE HERE" appears.

Build a sequential model with an Embedding layer, an RNN layer, and a Dense layer.
Compile the model with an appropriate loss and optimizer (the provided code uses binary cross-entropy and Adam).

Define the model as follows:
- Create a `Sequential` model, which is a linear stack of layers.
- Add an `Embedding` layer. This layer converts each input sequence of integers (each integer is a word index) into a sequence of dense vectors of fixed size. Set the vocabulary size to `num_words`, the embedding dimension to 32, and the input sequence length to `input_length=max_len`.
- Add a `SimpleRNN` layer, which implements the basic RNN cell. Give it 32 units (the size of its hidden state).
- Add a `Dense` layer with a single unit and a sigmoid activation function. This is the output layer; it produces the probability that a review is positive.
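To see what the `SimpleRNN` layer computes, here is a minimal NumPy sketch (not Keras's actual implementation) of the recurrence h_t = tanh(x_t W + h_{t-1} U + b), starting from a zero hidden state and returning only the last state, as `SimpleRNN` does by default. It also works out the parameter counts you should see in `model.summary()` once the model is built with the layer sizes listed above. The names `simple_rnn_forward` and `count_params` and the random weights are illustrative assumptions.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b):
    """Final hidden state of a SimpleRNN-style layer (sketch).

    x: (timesteps, input_dim); W: (input_dim, units); U: (units, units); b: (units,)
    """
    h = np.zeros(U.shape[0])                 # h_0 = 0
    for x_t in x:                            # walk through the review one word at a time
        h = np.tanh(x_t @ W + h @ U + b)     # mix the current input with the previous state
    return h                                 # only the last state feeds the Dense layer

def count_params(num_words=10000, embed_dim=32, units=32):
    embedding = num_words * embed_dim                  # one vector per vocabulary entry
    rnn = embed_dim * units + units * units + units    # kernel + recurrent kernel + bias
    dense = units * 1 + 1                              # weights + bias of the output unit
    return embedding, rnn, dense

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 32))               # stand-in for one embedded review of length 200
h = simple_rnn_forward(x, rng.normal(size=(32, 32)) * 0.1,
                       rng.normal(size=(32, 32)) * 0.1, np.zeros(32))
print(h.shape)          # (32,)
print(count_params())   # (320000, 2080, 33)
```

Almost all of the roughly 322k parameters live in the embedding table, while the recurrent layer itself is small.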

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense


# PUT YOUR CODE HERE vvvvv




# PUT YOUR CODE HERE ^^^^^

#---------------------------------------------
# Compile model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
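Binary cross-entropy is the natural loss here because the sigmoid output is a single probability p that the review is positive. For a label y in {0, 1} the per-example loss is -[y log p + (1 - y) log(1 - p)], averaged over the batch. The NumPy sketch below computes that formula; the clipping constant mirrors the small epsilon Keras uses to avoid log(0), and the helper name is an illustrative assumption.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)] (sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: ~0.105
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong:   ~2.303
```

Confidently wrong predictions are penalized far more heavily than confidently correct ones are rewarded, which is what pushes the network toward calibrated probabilities.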

**Task 3: Train the Model**

Train the model on the training data.
Evaluate the model on the test data.

Run this code once you have completed Task 2 above.

In [None]:
# Train model
history = model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)
print("Test Accuracy:", accuracy)
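With `validation_split=0.2`, Keras holds out the *last* 20% of `x_train`/`y_train` (taken before shuffling) as validation data and trains on the rest. The sketch below approximates that arithmetic for the 25,000 IMDB training reviews and also counts the mini-batches per epoch at `batch_size=128`. `split_sizes` is a hypothetical helper, and the floor-based split point is an assumption about how Keras rounds.

```python
import math

def split_sizes(n_samples, validation_split, batch_size):
    """Approximate train/validation sizes and steps per epoch for model.fit (sketch)."""
    split_at = math.floor(n_samples * (1.0 - validation_split))  # first split_at rows train
    n_train, n_val = split_at, n_samples - split_at              # last n_val rows validate
    steps_per_epoch = math.ceil(n_train / batch_size)            # last batch may be partial
    return n_train, n_val, steps_per_epoch

print(split_sizes(25000, 0.2, 128))  # (20000, 5000, 157)
```

Because the split is positional, it is only a fair validation set when the training data are not sorted by label.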

**Task 4: Fill in the Code (Coding Exercise)**

Fill in the code marked with # PUT YOUR CODE HERE to make predictions on new reviews.

- Create a new variable called "prediction".
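- Set that variable to the result of calling `model.predict` on `review_pad`.
- Write an if/else statement that implements the following logic:
  - If the predicted probability is greater than 0.5, return "Positive". Because `model.predict` returns a 2-D array of shape (number of reviews, 1), the probability for this single review is `prediction[0][0]`.
  - Otherwise, return "Negative".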
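The same 0.5 threshold can be applied to a whole batch of predictions at once. The sketch below assumes `model.predict` has returned an array of shape `(batch, 1)` of sigmoid probabilities; the probabilities used here are invented inputs for illustration, and `probabilities_to_labels` is a hypothetical helper.

```python
import numpy as np

def probabilities_to_labels(probs, threshold=0.5):
    """Map sigmoid outputs of shape (batch, 1) to 'Positive'/'Negative' strings."""
    probs = np.asarray(probs).reshape(-1)  # flatten (batch, 1) -> (batch,)
    return np.where(probs > threshold, "Positive", "Negative").tolist()

# Invented probabilities standing in for model.predict output on two reviews
fake_predictions = np.array([[0.93], [0.12]])
print(probabilities_to_labels(fake_predictions))  # ['Positive', 'Negative']
```

A probability of exactly 0.5 falls into the "Negative" branch, matching the "greater than 0.5" rule above.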

In [None]:
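# Rebuild the word-to-index mapping that imdb.load_data() used, so new reviews
# are encoded the same way as the training data. load_data() shifts every raw
# index by index_from=3 and reserves 1 (start of review) and 2 (out-of-vocabulary).
import re
from tensorflow.keras.datasets import imdb

word_index = imdb.get_word_index()
start_char, oov_char, index_from = 1, 2, 3

def encode_review(text):
    # Approximate the dataset's tokenization: lowercase, drop punctuation, split on spaces
    words = re.sub(r"[^a-z0-9' ]", " ", text.lower()).split()
    seq = [start_char]
    for word in words:
        idx = word_index.get(word)
        # Unknown words, and words outside the top num_words, map to the OOV index
        if idx is None or idx + index_from >= num_words:
            seq.append(oov_char)
        else:
            seq.append(idx + index_from)
    return seq

def predict_sentiment(review):
    # Encode and pad the input sequence exactly like the training data
    review_seq = [encode_review(review)]
    review_pad = pad_sequences(review_seq, maxlen=max_len)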

    # PUT YOUR CODE HERE vvvvv
    # (indent your code so it stays inside predict_sentiment)




    # PUT YOUR CODE HERE ^^^^^

#*********************

# Example usage
review = "This movie was fantastic! I loved every moment of it."
#review = "This movie was terrible! I was bored out of my mind."
#review = "Two thumbs down, unoriginal and pedantic! Numbingly predictable."
#review = "Absolutely spellbinding and thrilling! Oppenheimer is a movie for the generations!"
print(predict_sentiment(review))

# Part 2: LSTM Implementation
Fill in the code marked with # PUT YOUR CODE HERE.

- Add an `LSTM` layer with 32 units.
- Add a `Dense` layer with 256 units and the "relu" activation function.
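Unlike `SimpleRNN`, an LSTM carries a separate cell state c alongside the hidden state h, and uses sigmoid gates to decide what to forget, what to write, and what to expose. The NumPy sketch below runs one LSTM over a random stand-in for an embedded review, stacking the gates in the order Keras uses (input, forget, cell candidate, output). It is an illustration under those assumptions, not Keras's implementation; the helper names and random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)               # input, forget, candidate, output pre-activations
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g                    # forget part of the old memory, write new content
    h = o * np.tanh(c)                        # expose a gated view of the memory
    return h, c

def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (input_dim * units + units * units + units)

units, embed_dim = 32, 32
rng = np.random.default_rng(0)
h = c = np.zeros(units)
W = rng.normal(size=(embed_dim, 4 * units)) * 0.1
U = rng.normal(size=(units, 4 * units)) * 0.1
b = np.zeros(4 * units)
for x_t in rng.normal(size=(250, embed_dim)):  # stand-in for one review of length 250
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, lstm_params(embed_dim, units))  # (32,) 8320
```

The 8,320 LSTM parameters are four times those of a `SimpleRNN` of the same size, one set per gate. If the `Dense(256)` layer follows the LSTM, it adds 32 × 256 + 256 = 8,448 more.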


In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding, Flatten, LSTM
from tensorflow.keras.preprocessing.sequence import pad_sequences
import matplotlib.pyplot as plt

num_words = 2000
(X_train, y_train), (X_test, y_test) = imdb.load_data(path="imdb.npz",
                                                      num_words=num_words,
                                                      skip_top=0,
                                                      maxlen=None,
                                                      seed=113,
                                                      start_char=1,
                                                      oov_char=2,
                                                      index_from=3)
max_review_length = 250
X_train = pad_sequences(X_train, maxlen=max_review_length)
X_test = pad_sequences(X_test, maxlen=max_review_length)

embedding_vector_length = 32
model = Sequential()
model.add(Embedding(input_dim=num_words, output_dim=embedding_vector_length, input_length=max_review_length))
model.add(Dropout(0.2))

# PUT YOUR CODE HERE vvvvv




# PUT YOUR CODE HERE ^^^^^

model.add(Dropout(0.2))
model.add(Dense(units=1, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

train_history = model.fit(X_train, y_train, batch_size=32,
                          epochs=10, verbose=2,
                          validation_split=0.2)
scores = model.evaluate(X_test, y_test, verbose=1)  # scores is [test loss, test accuracy]
print("Test Accuracy:", scores[1])
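`train_history.history` is a dictionary of per-epoch lists; with `metrics=['accuracy']` and a validation split it includes `'accuracy'` and `'val_accuracy'`. Over 10 epochs the LSTM may start to overfit, so it helps to locate the epoch with the best validation accuracy and check the train/validation gap there. The helper below is a hypothetical sketch, and the input dictionary is invented purely to exercise it; those numbers are not results from this model. With `matplotlib.pyplot` already imported as `plt`, the same two lists can also be passed to `plt.plot` to draw learning curves.

```python
def summarize_history(history):
    """Report the epoch with the best validation accuracy and the train/val gap there.

    `history` is a dict shaped like `train_history.history`: lists keyed by
    'accuracy' and 'val_accuracy', one value per epoch.
    """
    val_acc = history["val_accuracy"]
    best = max(range(len(val_acc)), key=val_acc.__getitem__)
    gap = history["accuracy"][best] - val_acc[best]
    return best + 1, val_acc[best], round(gap, 4)  # epochs reported 1-based

# Invented values for illustration only, not real training results
example = {"accuracy": [0.70, 0.84, 0.88, 0.91, 0.94],
           "val_accuracy": [0.80, 0.85, 0.86, 0.85, 0.84]}
print(summarize_history(example))  # (3, 0.86, 0.02)
```

A training accuracy that keeps rising while validation accuracy falls after the best epoch is the usual sign of overfitting.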