
Load and Preprocess Data:

* The code loads the IMDB movie review dataset using imdb.load_data(). The task is binary sentiment classification: each review is labeled 0 (negative) or 1 (positive). Passing num_words=10000 keeps only the 10,000 most frequent words.
* The data is split into training and testing sets: (x_train, y_train) contains the training data, and (x_test, y_test) contains the testing data.
* tf.keras.preprocessing.sequence.pad_sequences() then brings every review to exactly max_len tokens. Shorter reviews are padded with zeros and longer ones are truncated, both at the start of the sequence by default, so the model receives uniformly shaped input.
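
To make the padding step concrete, here is a minimal pure-Python sketch of what pad_sequences() does with its default settings (padding and truncating at the front, fill value 0). The helper name pad_or_truncate is hypothetical and only mimics that default behavior for a single sequence; it is not the Keras implementation.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence (assumes maxlen > 0)."""
    seq = list(seq)
    if len(seq) >= maxlen:
        # Too long: keep only the last maxlen tokens ("pre" truncation)
        return seq[-maxlen:]
    # Too short: add the fill value at the front ("pre" padding)
    return [value] * (maxlen - len(seq)) + seq

short = pad_or_truncate([5, 6, 7], maxlen=5)
long_ = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)  # [0, 0, 5, 6, 7]
print(long_)  # [3, 4, 5, 6, 7]
```

With max_len = 200, a review longer than 200 tokens therefore keeps only its final 200 tokens.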

In [4]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [5]:
# Load the IMDB movie review dataset (binary sentiment classification)
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)


In [6]:

# Preprocess the data
max_len = 200  # Maximum sequence length
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

Build the RNN Model:

* The RNN model is constructed using the Sequential() class from Keras.
* An embedding layer is added as the first layer, which maps the word indices to dense word embeddings of a specified dimension (embedding_dim).
* An LSTM layer is added, which processes the embedded sequences and captures the context information.
* Finally, a dense layer with a sigmoid activation function is added to produce a single output representing the sentiment prediction.
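
As a rough size check on this architecture, the trainable parameter count can be worked out by hand. The sketch below assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights, and one bias vector) and the layer sizes used in this notebook; the variable names are illustrative only.

```python
vocab_size = 10000     # matches num_words in imdb.load_data
embedding_dim = 128
lstm_units = 128

# Embedding: one embedding_dim-sized vector per vocabulary entry
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)

# Dense: one weight per LSTM unit plus a single bias
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 1280000 131584 129 1411713
```

Most of the parameters sit in the embedding table, so the vocabulary size and embedding_dim dominate the model's size.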

In [7]:
# Build the RNN model
embedding_dim = 128  # Dimensionality of word embeddings
model = Sequential()
model.add(Embedding(10000, embedding_dim, input_length=max_len))
model.add(LSTM(128))
model.add(Dense(1, activation='sigmoid'))


Compile the Model:

* The model is compiled using model.compile().
* The loss function is set to 'binary_crossentropy' since it is a binary classification problem.
* The optimizer is 'adam' (Adam), a variant of stochastic gradient descent that adapts the step size of each parameter using running averages of the gradient and of its square.
* The metric chosen is 'accuracy' to monitor the accuracy of the model during training.
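
To illustrate what the loss measures, here is a small pure-Python sketch of mean binary cross-entropy. The function name and the clipping constant eps are assumptions made for the example; Keras computes the loss internally and may differ in numerical details.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over (label, predicted probability) pairs."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Same labels, once with confident correct predictions, once with confident wrong ones
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(confident_wrong, 4))  # 0.1054 2.3026
```

The loss grows sharply when the model is confidently wrong, which is what pushes the sigmoid output toward the correct label during training.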


In [8]:
# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])


Train the Model:

* The model is trained using model.fit().
* The training data (x_train and y_train) are provided, along with the batch size and number of epochs.
* During training, gradient-based optimization (here Adam) updates the model's parameters iteratively on mini-batches, minimizing the loss and improving accuracy.
* The validation data (x_test and y_test) are used to report loss and accuracy after each epoch. Note that this is the same test set used for the final evaluation, so no separate held-out split remains.
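
The parameter update itself can be sketched for a single scalar parameter. The code below follows the textbook Adam update rule with the commonly cited default hyperparameters; adam_step and the toy quadratic objective are hypothetical and are not taken from the Keras source, which may differ in detail.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a scalar parameter (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective: f(theta) = (theta - 3)^2, with gradient 2 * (theta - 3)
theta, m, v = 0.0, 0.0, 0.0
first_step = None
for t in range(1, 5001):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
    if t == 1:
        first_step = theta  # the first update has magnitude close to lr

print(first_step, theta)  # theta ends up close to the minimizer 3.0
```

In model.fit() the same kind of update is applied to every weight once per mini-batch, with gradients obtained by backpropagation through the network.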


In [9]:
# Train the model
batch_size = 64
epochs = 5
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9f8f5980d0>

Evaluate the Model:

* After training, the model's performance is evaluated using model.evaluate().
* The testing data (x_test and y_test) are provided to calculate the loss and accuracy of the model on unseen data.
* The calculated loss and accuracy are printed to assess the model's performance.
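
For a sigmoid output, the reported accuracy is understood here as the fraction of reviews whose predicted probability falls on the correct side of 0.5. The sketch below shows that calculation on made-up toy values; accuracy_at_threshold is a hypothetical helper, not a Keras function.

```python
def accuracy_at_threshold(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    predicted = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(p == t) for p, t in zip(predicted, y_true))
    return correct / len(y_true)

# Toy values for illustration only (not model output)
toy_labels = [1, 0, 1, 1]
toy_probs = [0.8, 0.3, 0.4, 0.9]
acc = accuracy_at_threshold(toy_labels, toy_probs)
print(acc)  # 0.75, since the third example is misclassified
```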

In [10]:
# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Test Loss: 0.3920823931694031
Test Accuracy: 0.8479999899864197


Extract Word Index:

* This segment retrieves the IMDB word index with imdb.get_word_index(), which maps each word to an integer rank (1 for the most frequent word).
* The mapping is inverted into a dictionary (index_to_word) for easy lookup, where the index is the key and the word is the value.
* The segment then tries to convert a sample review back into text using this dictionary. By default imdb.load_data() shifts every word index by 3 (index_from=3) and reserves 0, 1, and 2 for padding, start-of-sequence, and out-of-vocabulary tokens. The lookup below does not undo that shift, so the printed words do not match the original review.
* The sample review and its corresponding sentiment label are printed for inspection.


In [11]:
# Word -> frequency rank (1 = most frequent word)
word_index = imdb.get_word_index()
# Invert the mapping: rank -> word
index_to_word = {index: word for word, index in word_index.items()}

# Print a sample review and its corresponding sentiment
sample_review = ' '.join(index_to_word[index] for index in x_train[0])
print("Sample Review:", sample_review)
print("Sentiment:", y_train[0])


Sample Review: to have after out atmosphere never more room and it so heart shows to years of every never going and help moments or of every chest visual movie except her was several of enough more with is now current film as you of mine potentially unfortunately of you than him that with out themselves her get for was camp of you movie sometimes movie that with scary but and to story wonderful that in seeing in character to of 70s musicians with heart had shadows they of here that with her serious to have does when from why what have critics they is you that isn't one will very to as itself with other and in of seen over landed for anyone of and br show's to whether from than out themselves history he name half some br of and odd was two most of mean for 1 any an boat she he should is thought frog but of script you not while history he heart to real at barrel but when from one bit then have two of script their with her nobody most that with wasn't to with armed acting watch an for wit
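
The sample review above reads as scrambled, which is consistent with the lookup not accounting for the index offset that imdb.load_data() applies by default (index_from=3, with 0, 1, and 2 reserved for padding, start, and unknown tokens). A minimal sketch of a decoder that undoes the offset follows. decode_review, the placeholder strings, and the toy word index are all hypothetical; in the notebook the real dictionary from imdb.get_word_index() would be passed instead.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map encoded indices back to words, undoing the load_data offset."""
    # Shift each rank by index_from so keys match the encoded reviews
    index_to_word = {rank + index_from: word for word, rank in word_index.items()}
    # Reserved indices, shown with placeholder strings chosen for this example
    specials = {0: "<pad>", 1: "<start>", 2: "<unk>"}
    return " ".join(
        specials.get(i) or index_to_word.get(i, "<unk>") for i in encoded
    )

# Toy word index for illustration (the real one has tens of thousands of entries)
toy_word_index = {"the": 1, "movie": 2, "great": 3}
decoded = decode_review([0, 0, 1, 4, 5, 2, 6], toy_word_index)
print(decoded)  # <pad> <pad> <start> the movie <unk> great
```

In the notebook this would be called as decode_review(x_train[0], word_index). Because x_train has already been padded and truncated to max_len, the result covers at most the last 200 tokens of the review.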

Make Predictions:

* This segment demonstrates how to use the trained RNN model to make predictions on new reviews.
* It starts with a list of new reviews (new_reviews) given as strings.
* Each review is tokenized into a sequence of lowercase words using text_to_word_sequence() from Keras.
* The word sequences are then converted into sequences of word indices by looking up each word in the word index dictionary; words missing from the dictionary are dropped. As written, the lookup neither applies the offset of 3 used by imdb.load_data() nor caps indices at the 10,000-word vocabulary, so the indices do not line up with those the model was trained on.
* Padding is applied to ensure that all sequences have the same length as the training data.
* The model's predict() function is used to obtain the predicted sentiment probability for each review.
* A threshold of 0.5 is used to determine the sentiment label ('Positive' if the probability is above 0.5, 'Negative' otherwise).
* The new reviews, along with their predicted sentiments, are printed for inspection.

In [12]:
# Make predictions on new reviews
new_reviews = ['This movie is fantastic!', 'I did not like the acting in this film.']
sequences = [tf.keras.preprocessing.text.text_to_word_sequence(review) for review in new_reviews]
sequences = [[word_index[word] for word in sequence if word in word_index] for sequence in sequences]
sequences = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=max_len)

predictions = model.predict(sequences)
sentiments = ['Positive' if pred > 0.5 else 'Negative' for pred in predictions]

for review, sentiment in zip(new_reviews, sentiments):
    print("Review:", review)
    print("Sentiment:", sentiment)
    print()


Review: This movie is fantastic!
Sentiment: Positive

Review: I did not like the acting in this film.
Sentiment: Positive
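
The second review above is labeled Positive despite its negative wording. One possible contributor is that the indices fed to the model do not follow the encoding imdb.load_data() uses for the training data (offset of 3, start token 1, unknown token 2, vocabulary capped at num_words). The sketch below shows an encoder under those assumptions. encode_review and the toy word index are hypothetical, and the regular-expression tokenizer is a simplified stand-in for text_to_word_sequence(). Whether this changes the prediction has not been verified here.

```python
import re

def encode_review(text, word_index, num_words=10000, index_from=3,
                  start_char=1, oov_char=2):
    """Encode raw text the way the training reviews are assumed to be encoded."""
    # Simplified tokenizer: lowercase, keep letters, digits, and apostrophes
    words = re.findall(r"[a-z0-9']+", text.lower())
    encoded = [start_char]
    for word in words:
        rank = word_index.get(word)
        # Known words are shifted by index_from; unknown words become oov_char
        idx = rank + index_from if rank is not None else oov_char
        # Indices outside the vocabulary kept at training time also become oov_char
        encoded.append(idx if idx < num_words else oov_char)
    return encoded

# Toy word index with made-up ranks, for illustration only
toy_word_index = {"this": 11, "movie": 17, "is": 6, "fantastic": 774}
encoded = encode_review("This movie is fantastic!", toy_word_index, num_words=100)
print(encoded)  # [1, 14, 20, 9, 2]
```

In the notebook, the encoded lists would then go through pad_sequences(..., maxlen=max_len) before model.predict(), exactly as the original cell does.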

