This code performs sentiment analysis on movie reviews from the IMDb dataset using an LSTM (Long Short-Term Memory) model. It predicts whether a review is positive or negative from its text.

    Keras Modules:

    Sequential: For building the model layer by layer.
    Dense: Fully connected layer for binary classification.
    LSTM: Recurrent layer for sequence processing.
    Embedding: Embedding layer to represent words as dense vectors.
    imdb: Provides the IMDb dataset for sentiment analysis.
    pad_sequences: Ensures all input sequences are of the same length.

In [1]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import tokenizer_from_json
import json


Setting parameters
max_features: Limits the vocabulary to the top 5000 most frequent words.
max_len: Maximum number of words in each review (longer reviews are truncated, shorter ones are padded).

In [2]:
max_features = 5000  # Vocabulary size
max_len = 100  # Max sequence length

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
X_train = pad_sequences(X_train, maxlen=max_len)
X_test = pad_sequences(X_test, maxlen=max_len)

X_train, X_test: Sequences of word indices representing reviews.
y_train, y_test: Sentiment labels (0 for negative, 1 for positive).
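pad_sequences: Pads or truncates each sequence to a uniform length (max_len). By default both the padding zeros and the truncation are applied at the start of the sequence.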
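
To make the padding behaviour concrete, here is a minimal pure-Python sketch of what pad_sequences does to a single review when it is called with only maxlen, as above. It assumes the Keras defaults (padding='pre', truncating='pre', value=0); the helper name pad_or_truncate is hypothetical and not part of Keras.

```python
def pad_or_truncate(sequence, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence (hypothetical helper)."""
    sequence = list(sequence)
    if len(sequence) >= maxlen:
        # truncating='pre': drop tokens from the start, keep the last maxlen tokens
        return sequence[len(sequence) - maxlen:]
    # padding='pre': fill the front of the sequence with the padding value
    return [value] * (maxlen - len(sequence)) + sequence


print(pad_or_truncate([11, 12, 13], 5))              # [0, 0, 11, 12, 13]
print(pad_or_truncate([11, 12, 13, 14, 15, 16], 4))  # [13, 14, 15, 16]
```

With max_len = 100, a review longer than 100 tokens keeps only its final 100, so the opening of the review is discarded.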

word_index: Maps words to their indices.
reverse_word_index: Inverts the mapping (indices → words).

In [3]:
word_index = imdb.get_word_index()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1641221/1641221 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [5]:
reverse_word_index = {value: key for key, value in word_index.items()}

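decode_review: Converts a sequence of word indices back into human-readable text.
Adjusts for the special tokens: indices 0, 1 and 2 are reserved (padding, start of review, unknown word), so every word is stored at its word_index value plus 3 and must be shifted back by 3 when decoding.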

In [6]:
def decode_review(sequence):
    """Convert a sequence of integers into a human-readable review."""
    return ' '.join([reverse_word_index.get(i - 3, '?') for i in sequence if i > 2])  # Offset for special tokens
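
A toy example may help show why the offset is 3. The vocabulary and the encoded review below are invented for illustration (toy_word_index, decode_toy and encoded are hypothetical, not taken from the IMDb files); the reserved indices follow the imdb.load_data defaults, where 0 is padding, 1 marks the start of a review, and 2 replaces out-of-vocabulary words.

```python
# Hypothetical three-word vocabulary, ranked by frequency like imdb.get_word_index()
toy_word_index = {"the": 1, "film": 2, "great": 3}
toy_reverse_index = {index: word for word, index in toy_word_index.items()}


def decode_toy(sequence):
    """Same logic as decode_review, but against the toy vocabulary."""
    return " ".join(toy_reverse_index.get(i - 3, "?") for i in sequence if i > 2)


# 0 = padding, 1 = start marker, 2 = unknown word; real words start at 4 (rank 1 + 3)
encoded = [0, 0, 1, 4, 5, 2, 6]
print(decode_toy(encoded))  # the film great
```

Because index 2 is skipped rather than printed, out-of-vocabulary words vanish from the decoded text, which is why the sample reviews near the end of this notebook read as if words were missing.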


Embedding Layer:

    Maps each word index to a dense vector of size 128.
    Handles a vocabulary size of max_features.

LSTM Layer:

    64 units for processing sequential input.
    Dropout (20% on the inputs) and recurrent dropout (20% on the recurrent state) help reduce overfitting.

Dense Layer:

    Single neuron with sigmoid activation for binary classification.
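
As a rough check on the model's size, the trainable parameter count of each layer can be worked out by hand. The sketch below assumes the standard Keras layouts: an Embedding layer stores one vector per vocabulary entry, an LSTM has four gates that each hold input weights, recurrent weights and a bias, and a Dense layer holds one weight per input plus a bias. The function names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input plus one bias, for each output unit."""
    return units * (input_dim + 1)


counts = {
    "embedding": embedding_params(5000, 128),  # max_features x 128
    "lstm": lstm_params(128, 64),              # 128-dim input, 64 units
    "dense": dense_params(64, 1),              # 64 LSTM outputs to 1 neuron
}
print(counts)                # {'embedding': 640000, 'lstm': 49408, 'dense': 65}
print(sum(counts.values()))  # 689473
```

Almost all of the parameters sit in the embedding table, so max_features and the embedding size dominate the model's memory footprint. model.summary() should report the same figures once the model has been built.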

In [7]:
model = Sequential([
    Embedding(max_features, 128),  # input_length omitted: it is deprecated in Keras 3 and not needed here
    LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    Dense(1, activation='sigmoid')
])



Optimizer: Adam, which adapts the step size for each parameter during training.
Loss Function: Binary cross-entropy for binary classification.
Metrics: Accuracy to measure performance.
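
To show what the loss measures, here is a small pure-Python sketch of binary cross-entropy averaged over a batch. The function name, the clipping constant eps and the example probabilities are assumptions made for illustration; Keras computes the same quantity internally with its own numerical safeguards.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])  # mostly right
undecided = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])  # always 0.5

print(confident < undecided)  # True
print(round(undecided, 4))    # 0.6931, i.e. ln 2
```

A loss close to 0.693 is therefore the signature of a model that outputs roughly 0.5 for every review, which matches the loss of 0.6933 in the evaluation output of this notebook.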

In [8]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
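
The cells shown here go from compile directly to evaluate, so a training step would normally sit between them. A minimal sketch follows; the helper name train_model and the values of epochs, batch_size and validation_split are assumptions chosen for illustration, not settings taken from this notebook.

```python
def train_model(model, X_train, y_train, epochs=3, batch_size=64, validation_split=0.2):
    """Fit a compiled Keras model and return whatever model.fit returns.

    The default hyperparameters are illustrative assumptions, not tuned values.
    """
    return model.fit(
        X_train,
        y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=validation_split,  # hold out part of the training data
    )


# Usage with the objects defined in the cells above:
# history = train_model(model, X_train, y_train)
```

Because recurrent_dropout is set on the LSTM layer, training may run noticeably slower than the evaluation timings shown in this notebook suggest.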


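Evaluate: Tests the model on held-out data (X_test, y_test).
Outputs: Test loss and accuracy. No executed cell above calls model.fit, so the weights are still at their random initialization; the accuracy of about 0.50 and loss of about 0.693 reported below are what chance-level guessing produces on a balanced two-class test set.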

In [9]:
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy}")


782/782 ━━━━━━━━━━━━━━━━━━━━ 9s 11ms/step - accuracy: 0.4917 - loss: 0.6933
Test Accuracy: 0.495959997177124


Generates a probability (0 to 1) for each review's sentiment:

    >0.5 → Positive sentiment.
    ≤0.5 → Negative sentiment.
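
The thresholding rule can be sketched in a few lines of plain Python. The probabilities and labels below are made up for illustration, and the helper names to_labels and accuracy are hypothetical; with real model output, which has shape (n, 1), the probabilities would first be flattened, for example with predictions.ravel().

```python
def to_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs to 0/1 labels: strictly above the threshold is positive."""
    return [1 if p > threshold else 0 for p in probabilities]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


probs = [0.91, 0.50, 0.12, 0.73]  # made-up probabilities
actual = [1, 1, 0, 0]             # made-up true labels

predicted = to_labels(probs)
print(predicted)                    # [1, 0, 0, 1]
print(accuracy(actual, predicted))  # 0.5
```

Note that a probability of exactly 0.5 falls on the negative side, consistent with the rule above.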

In [10]:
predictions = model.predict(X_test)

782/782 ━━━━━━━━━━━━━━━━━━━━ 9s 11ms/step


Loop: Processes the first 5 reviews in the test set.
decode_review: Converts the numerical sequence into text.
Compare: Shows predicted vs. actual sentiment.

In [11]:
print("\nSample Reviews with Predictions:")
for i in range(5):
    decoded_review = decode_review(X_test[i])  # Decode the review
    predicted_sentiment = "Positive" if predictions[i][0] > 0.5 else "Negative"  # predictions has shape (n, 1)
    actual_sentiment = "Positive" if y_test[i] == 1 else "Negative"
    print(f"\nReview {i + 1}: {decoded_review}")
    print(f"Predicted Sentiment: {predicted_sentiment}")
    print(f"Actual Sentiment: {actual_sentiment}")



Sample Reviews with Predictions:

Review 1: please give this one a miss br br and the rest of the cast terrible performances the show is flat flat flat br br i don't know how michael could have allowed this one on his he almost seemed to know this wasn't going to work out and his performance was quite so all you fans give this a miss
Predicted Sentiment: Negative
Actual Sentiment: Negative

Review 2: a powerful study of sexual and desperation be patient up the atmosphere and pay attention to the wonderfully written script br br i praise robert this is one of his many films that deals with fascinating subject matter this film is disturbing but it's sincere and it's sure to a strong emotional response from the viewer if you want to see an unusual film some might even say bizarre this is worth the time br br unfortunately it's very difficult to find in video you may have to buy it off the internet
Predicted Sentiment: Positive
Actual Sentiment: Positive

Review 3: the of that civil war i