# Sentiment Analysis Using Simple RNN

__Objective:__ Build a sentiment analysis model that classifies text as positive or negative using a simple recurrent neural network (RNN).

__Data:__ Use a publicly available dataset, such as the IMDB movie reviews dataset.

__Technology:__ Python, TensorFlow, and Keras for modeling.

## Importing Libraries and Modules

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, SimpleRNN
from tensorflow.keras.datasets import imdb

## Setting Model Parameters
This code sets the parameters for text processing: the size of the vocabulary, counted in most frequent words (max_features), the fixed length to which every review is padded or truncated (maxlen), and the number of samples per training batch (batch_size).

In [None]:
# Parameters
max_features = 10000  # Number of words to consider as features
maxlen = 500          # Reviews are padded or truncated to this number of words
batch_size = 32       # Number of reviews per training batch

## Loading Dataset
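This code loads the IMDb dataset, restricting word indices to values below max_features so that only the most frequent words are kept; rarer words are replaced by a reserved out-of-vocabulary index. It returns training data (input_train, y_train) and test data (input_test, y_test), where each review is a list of integer word indices and each label is 0 (negative) or 1 (positive).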

In [None]:
# Loading the data
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)

## Preprocess Data: Sequence Padding
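This code pads the sequences in input_train and input_test to a uniform length of maxlen so that every input to the model has the same size. With the default settings of pad_sequences, shorter reviews are padded with zeros at the start, and longer reviews are truncated from the start, keeping their last maxlen words.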

In [None]:
# Padding sequences for uniform input size
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)
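
To make the padding step concrete, the following is a minimal pure-Python sketch of the default behavior of pad_sequences (zeros added at the front, long sequences cut from the front). The helper name pad_pre is hypothetical and only for illustration; it is not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Illustrative stand-in for pad_sequences with its default 'pre' settings."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last maxlen items
        padded.append([value] * (maxlen - len(seq)) + seq)  # pad at the front
    return padded


short_and_long = pad_pre([[1, 2, 3], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(short_and_long)
```

The short sequence gains a leading zero, while the long one loses its first two items.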

## Building the RNN Model
This code constructs a recurrent neural network (RNN) model using Keras. It starts with an embedding layer to convert word indices to dense vectors of size 32, adds a simple RNN layer with 32 units, and includes a dense output layer with a sigmoid activation function for binary classification (e.g., positive vs. negative sentiment).

In [None]:
# Building the RNN model
model = Sequential()
model.add(Embedding(max_features, 32))
model.add(SimpleRNN(32))  # 32 units in the RNN layer
model.add(Dense(1, activation='sigmoid'))  # Binary classification (positive/negative)
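
As a rough illustration of what the SimpleRNN layer computes, the sketch below implements the recurrence h_t = tanh(x_t W + h_(t-1) U + b) in plain Python, starting from a zero state and returning only the final hidden state. The function name and the tiny weights are assumptions made for this example; the real layer learns its weights and runs on tensors.

```python
import math


def simple_rnn_forward(inputs, W, U, b):
    """Return the final hidden state after reading every time step.

    inputs: list of input vectors, one per time step
    W: input-to-hidden weights, shape (input_dim, units)
    U: hidden-to-hidden weights, shape (units, units)
    b: bias vector, shape (units,)
    """
    units = len(b)
    h = [0.0] * units  # initial hidden state is all zeros
    for x in inputs:
        h = [
            math.tanh(
                sum(x[i] * W[i][j] for i in range(len(x)))
                + sum(h[k] * U[k][j] for k in range(units))
                + b[j]
            )
            for j in range(units)
        ]
    return h


# Two time steps, one input feature, one hidden unit
final_state = simple_rnn_forward([[1.0], [1.0]], W=[[1.0]], U=[[1.0]], b=[0.0])
print(final_state)
```

Because each step feeds the previous hidden state back in, the final state depends on the whole sequence, which is what lets the dense layer classify an entire review from a single 32-dimensional vector.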

## Compiling the Model

This code compiles the RNN model, setting 'rmsprop' as the optimizer, using 'binary_crossentropy' as the loss function, and tracking accuracy ('acc') as the metric for evaluation.

In [None]:
# Compiling the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
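
For intuition about the loss, here is a small pure-Python sketch of binary cross-entropy averaged over a batch. The clipping constant eps is an assumption used here to avoid log(0); this is an illustration of the formula, not the Keras implementation itself.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

Confident correct predictions give a small loss, while confident wrong predictions are penalized heavily.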

## Displaying Model Summary

In [None]:
# Model summary
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, None, 32)          320000    
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, 32)                2080      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
Total params: 322113 (1.23 MB)
Trainable params: 322113 (1.23 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
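
The parameter counts in the summary can be reproduced by hand. The sketch below recomputes them from the layer sizes used above; the variable names are chosen for this example only.

```python
vocab_size = 10000   # max_features
embedding_dim = 32   # size of each word vector
rnn_units = 32       # units in the SimpleRNN layer

# Embedding: one vector of size embedding_dim per vocabulary entry
embedding_params = vocab_size * embedding_dim

# SimpleRNN: input weights + recurrent weights + bias
rnn_params = embedding_dim * rnn_units + rnn_units * rnn_units + rnn_units

# Dense: one weight per RNN unit plus one bias
dense_params = rnn_units * 1 + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
```

Almost all of the parameters sit in the embedding layer; the recurrent layer itself is small.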


## Training the Model

This code trains the RNN model on the training data (input_train and y_train) for 10 epochs with a batch size of batch_size (32). Setting validation_split=0.2 holds out the last 20% of the training samples as a validation set, which is not trained on and is used to monitor the model's performance after each epoch.

In [None]:
# Training the model
history = model.fit(input_train, y_train,
                    epochs=10,
                    batch_size=batch_size,
                    validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
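
The effect of validation_split on the amount of data per epoch can be worked out with simple arithmetic. The sketch below assumes the standard IMDB training set of 25,000 reviews; the variable names are illustrative.

```python
import math

n_samples = 25000        # reviews in the IMDB training set
validation_split = 0.2
batch_size = 32

# The last 20% of the samples are held out for validation
n_train = int(n_samples * (1 - validation_split))
n_val = n_samples - n_train

# Number of batches processed per epoch for each subset
train_steps = math.ceil(n_train / batch_size)
val_steps = math.ceil(n_val / batch_size)
print(n_train, n_val, train_steps, val_steps)
```

Under these assumptions, each epoch trains on 20,000 reviews in 625 batches and validates on the remaining 5,000.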


## Evaluating the Model
This code evaluates the trained RNN model on the test dataset (input_test and y_test), computing the test loss and accuracy. It then prints the test accuracy.

In [None]:
# Evaluating the model
test_loss, test_acc = model.evaluate(input_test, y_test)
print(f'Test accuracy: {test_acc}')

Test accuracy: 0.7914400100708008
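
The accuracy reported here is the fraction of reviews whose thresholded prediction matches the label. A minimal sketch of that computation, assuming a threshold of 0.5 and using made-up probabilities, could look like this:

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples where (probability > threshold) equals the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


# Hypothetical labels and predicted probabilities, for illustration only
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.6]
print(binary_accuracy(labels, probabilities))
```

In this toy example the first two predictions are correct and the last two are not, so the accuracy is 0.5.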


## Text Encoding
This code begins by importing the components of the TensorFlow Keras library needed for handling the IMDb dataset and sequence padding. It then loads the IMDb word index dictionary, which maps each word to its frequency rank (1 for the most frequent word).

The function encode_text encodes a given text as an integer sequence in the same format that imdb.load_data produces. That function reserves index 0 for padding, 1 for the start of a sequence, and 2 for unknown words, and shifts every word rank by 3. encode_text therefore splits the text into tokens, prepends the start index, maps each token to its rank plus 3 (falling back to 2 for words that are unknown or fall outside the max_features vocabulary), and then pads the sequence to maxlen so that it matches the input size used during training.

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Loading the word index dictionary (word -> frequency rank, starting at 1)
word_index = imdb.get_word_index()

def encode_text(text):
    # Encoding text with the same convention as imdb.load_data:
    # 0 = padding, 1 = start of sequence, 2 = unknown word, word ranks shifted by 3
    words = tf.keras.preprocessing.text.text_to_word_sequence(text)
    tokens = [1]
    for word in words:
        index = word_index.get(word, -1) + 3
        # Words outside the training vocabulary map to the unknown-word index
        tokens.append(index if 3 <= index < max_features else 2)
    return pad_sequences([tokens], maxlen=maxlen)  # Using the same maxlen as during training
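
Going the other way, from integers back to words, is a useful sanity check on any encoding. The sketch below decodes a sequence under the default convention of imdb.load_data (0 for padding, 1 for start, 2 for unknown, word ranks shifted by 3). The miniature dictionary toy_word_index and the helper decode_review are hypothetical and stand in for the full word index so that the example runs without downloading anything.

```python
# Hypothetical miniature word index (word -> frequency rank), standing in for
# the full dictionary returned by imdb.get_word_index()
toy_word_index = {"the": 1, "movie": 2, "good": 3}


def decode_review(encoded, word_index, index_from=3):
    """Map integer indices back to words; reserved indices become placeholders."""
    reverse_index = {rank + index_from: word for word, rank in word_index.items()}
    reserved = {0: "<pad>", 1: "<start>", 2: "<unk>"}
    return " ".join(
        reserved[i] if i in reserved else reverse_index.get(i, "<unk>")
        for i in encoded
    )


decoded = decode_review([0, 0, 1, 4, 5, 2, 6], toy_word_index)
print(decoded)
```

Decoding a few encoded reviews this way helps confirm that new text is mapped to the same indices the model saw during training.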

## Defining Sentiment Prediction Function
This function predict_sentiment takes a text input, encodes and pads it using the encode_text function to be compatible with the trained RNN model, then predicts the sentiment using the model. The output is interpreted as "Positive" if the predicted sigmoid probability is greater than 0.5, otherwise it is "Negative".

In [None]:
def predict_sentiment(text):
    # Encoding and padding the text to create a compatible input
    encoded_text = encode_text(text)

    # Prediction
    prediction = model.predict(encoded_text)[0][0]  # Output is a sigmoid probability

    # Interpreting the result
    if prediction > 0.5:
        return "Positive"
    else:
        return "Negative"

## Testing Sentiment Predictions on Sample Texts
This code sets up a list of sample text reviews about movies and iterates over them. For each review, it prints the review text and uses the predict_sentiment function to predict and display the sentiment as either "Positive" or "Negative".

In [None]:
# Samples to test
sample_texts = [
    "This was a good movie.",
    "This movie was an excellent portrayal of a very important story."
]

for text in sample_texts:
    print(f'Review: "{text}" - Sentiment: {predict_sentiment(text)}')

Review: "This was a good movie." - Sentiment: Positive
Review: "This movie was an excellent portrayal of a very important story." - Sentiment: Positive
