This project is "Sentiment Analysis on Movie Reviews using Deep Learning".

We will use the IMDB dataset, which contains 50,000 movie reviews labeled as positive or negative. To avoid the error you faced in the previous conversation (with the Lambda layer), we will load the pre-trained TensorFlow Hub embedding through a small custom Keras layer instead of a Lambda wrapper.

Cell 1: Install & Import Libraries
We need tensorflow-hub for the pre-trained model and tensorflow-datasets to easily download the IMDB data.

In [1]:
!pip install -q tensorflow-hub tensorflow-datasets

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

Version:  2.19.0
Eager mode:  True
Hub version:  0.16.1
GPU is NOT AVAILABLE


Cell 2: Download and Split Data
We will load the IMDB reviews dataset. It comes pre-split into 25,000 training and 25,000 testing examples. We will further split the training data to create a validation set (60% training, 40% validation).

In [2]:
# Load the IMDB reviews dataset
# split: 15k for training, 10k for validation, 25k for testing
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True
)

print("✅ Data downloaded and split successfully.")

# Inspect the first 3 examples
train_examples_batch, train_labels_batch = next(iter(train_data.batch(3)))
print("\n--- Example Review ---")
print(train_examples_batch[0].numpy().decode('utf-8'))
print("\n--- Label (0=Negative, 1=Positive) ---")
print(train_labels_batch[0].numpy())



Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...


Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.
✅ Data downloaded and split successfully.

--- Example Review ---
This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.

--- Label (0=Negative, 1=Positive) ---
0
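
The split sizes in Cell 2 follow directly from the percent slices. As a quick sanity check, here is a minimal sketch in plain Python. The helper `percent_slice_size` is a hypothetical name introduced for illustration, not a TFDS function, and TFDS may round differently when a boundary does not fall on a whole example (here it does).

```python
def percent_slice_size(total: int, start_pct: int, end_pct: int) -> int:
    """Number of examples in the [start_pct%, end_pct%) slice of `total` examples.

    Hypothetical helper for illustration only; not part of tensorflow-datasets.
    """
    start = total * start_pct // 100
    end = total * end_pct // 100
    return end - start


TRAIN_TOTAL = 25_000  # size of the IMDB 'train' split, as stated in Cell 2

train_size = percent_slice_size(TRAIN_TOTAL, 0, 60)         # 'train[:60%]'
validation_size = percent_slice_size(TRAIN_TOTAL, 60, 100)  # 'train[60%:]'

print("train:", train_size, "validation:", validation_size)
```

This matches the "15k for training, 10k for validation" comment in the cell.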


Cell 3: Build the Deep Learning Model
Here is where we fix your previous issue. We use a pre-trained text embedding from Google (nnlm-en-dim50), which turns each review into a 50-dimensional vector.

Layer 1 (Hub Layer): Takes raw text and converts it to vectors. Instead of a Lambda wrapper, hub.KerasLayer is wrapped in a small custom Keras layer.

Layer 2 (Dense): 16 neurons with ReLU activation to learn patterns.

Layer 3 (Output): 1 neuron with no activation. It outputs a raw score (logit): values above 0 lean positive, values below 0 lean negative.

In [8]:
# The pre-trained model URL
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"

# Define a custom Keras layer that wraps hub.KerasLayer
class HubEmbeddingLayer(tf.keras.layers.Layer):
    def __init__(self, handle, **kwargs):
        super().__init__(**kwargs)
        # Instantiate the hub.KerasLayer internally
        self.hub_layer = hub.KerasLayer(handle, input_shape=[], dtype=tf.string, trainable=True)

    def call(self, inputs):
        # Pass inputs directly to the internal hub_layer
        return self.hub_layer(inputs)

    def compute_output_shape(self, input_shape):
        # The output dimension for 'nnlm-en-dim50' is 50
        return (input_shape[0], 50)

# Build the Sequential model using our custom HubEmbeddingLayer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(), dtype=tf.string, name='text_input'), # Explicitly define string input
    HubEmbeddingLayer(embedding), # Our custom layer handling the TF Hub embedding
    tf.keras.layers.Dense(16, activation='relu'), # Hidden layer
    tf.keras.layers.Dense(1)  # Output layer
])

model.summary()
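
To make the shapes concrete, here is a rough NumPy sketch of what the two Dense layers do on top of the 50-dimensional embedding. This is an illustration under stated assumptions, not the trained model: the embedding vectors and all weights below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM, HIDDEN, OUT = 50, 16, 1

# Placeholder for the hub layer's output: 3 reviews, each a 50-dim vector.
embedded = rng.normal(size=(3, EMBED_DIM))

# Random weights standing in for the trained Dense parameters (assumption).
w1, b1 = rng.normal(size=(EMBED_DIM, HIDDEN)), np.zeros(HIDDEN)
w2, b2 = rng.normal(size=(HIDDEN, OUT)), np.zeros(OUT)

hidden = np.maximum(embedded @ w1 + b1, 0.0)  # Dense(16, activation='relu')
logits = hidden @ w2 + b2                     # Dense(1), no activation

# Trainable parameters in the two Dense layers only (embedding excluded).
dense_params = w1.size + b1.size + w2.size + b2.size

print("logits shape:", logits.shape)
print("dense parameters:", dense_params)
```

The two Dense layers contribute 50*16 + 16 + 16*1 + 1 parameters; almost all of the model's weights live in the embedding layer.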

Cell 4: Compile and Train
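We use BinaryCrossentropy because this is a binary (positive/negative) classification task. We set from_logits=True because the output layer has no sigmoid activation, so the model emits raw logits rather than probabilities.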
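
For intuition, binary cross-entropy on logits can be sketched in NumPy as below. This is a minimal illustration of the standard numerically stable formula; Keras's internal implementation may differ in details, and the function names here are made up for the example.

```python
import numpy as np


def bce_from_logits(labels, logits):
    """Mean binary cross-entropy computed directly from raw logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] without overflow.
    """
    labels = np.asarray(labels, dtype=float)
    logits = np.asarray(logits, dtype=float)
    per_example = (
        np.maximum(logits, 0.0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return per_example.mean()


def bce_naive(labels, logits):
    """Same loss via an explicit sigmoid; fine for small logits only."""
    labels = np.asarray(labels, dtype=float)
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)).mean()


labels = [1, 0, 1, 0]
logits = [0.95, -0.15, -0.32, 2.0]
print(bce_from_logits(labels, logits), bce_naive(labels, logits))
```

A logit of 0 corresponds to a 50/50 prediction, which gives a loss of log(2), about 0.693. That is roughly where the training loss starts in the first epoch.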

In [9]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

print("Training model... (This may take 2-3 minutes)")

# Train for 10 epochs (passes through the dataset)
history = model.fit(
    train_data.shuffle(10000).batch(512),
    epochs=10,
    validation_data=validation_data.batch(512),
    verbose=1
)

Training model... (This may take 2-3 minutes)
Epoch 1/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 6s 102ms/step - accuracy: 0.5058 - loss: 0.7321 - val_accuracy: 0.5533 - val_loss: 0.6576
Epoch 2/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 55ms/step - accuracy: 0.5738 - loss: 0.6506 - val_accuracy: 0.5892 - val_loss: 0.6294
Epoch 3/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 54ms/step - accuracy: 0.6066 - loss: 0.6253 - val_accuracy: 0.6272 - val_loss: 0.6080
Epoch 4/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 52ms/step - accuracy: 0.6369 - loss: 0.6037 - val_accuracy: 0.6473 - val_loss: 0.5891
Epoch 5/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 3s 55ms/step - accuracy: 0.6660 - loss: 0.5846 - val_accuracy: 0.6627 - val_loss: 0.5740
Epoch 6/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 4s 90ms/step - accuracy: 0.6733 - loss: 0.5745 - val_accuracy: 0.6886 - val_

Cell 5: Evaluate Accuracy
Now we check how well the model performs on data it has never seen before (the test set).

In [10]:
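results = model.evaluate(test_data.batch(512), verbose=2, return_dict=True)

# In Keras 3, model.metrics_names reports "compile_metrics" rather than the
# individual metric names, so read the values from the returned dict instead.
print("loss: %.3f" % results["loss"])
print("accuracy: %.3f" % results["accuracy"])

49/49 - 3s - 54ms/step - accuracy: 0.7044 - loss: 0.5399
loss: 0.540
accuracy: 0.704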
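
The 49/49 in the evaluation output, like the 30/30 in the training log, counts batches rather than examples. A small sketch of the arithmetic, assuming the final partial batch is kept (which is what `.batch(512)` does by default); `steps_per_epoch` is a name made up for this example:

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches needed to cover the data, keeping the last partial batch."""
    return math.ceil(num_examples / batch_size)


train_steps = steps_per_epoch(15_000, 512)  # training split from Cell 2
test_steps = steps_per_epoch(25_000, 512)   # full test split

print("train steps:", train_steps, "test steps:", test_steps)
```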


Cell 6: Test on Your Own Reviews
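You can now write your own reviews and see whether the model classifies them correctly. The model outputs a raw score (logit) for each review: positive numbers indicate "Positive Sentiment", and negative numbers indicate "Negative Sentiment". The further the score is from 0, the more confident the prediction.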

In [12]:
# Create some sample reviews
my_reviews = [
    "The movie was absolutely wonderful and the acting was great!",
    "I wasted two hours of my life, this was terrible.",
    "It was okay, not the best but not the worst.",
    "The cinematography was stunning, but the plot was boring."
]

# Convert the Python list of strings to a TensorFlow constant of strings
input_tensor = tf.constant(my_reviews, dtype=tf.string)

# Get predictions
predictions = model.predict(input_tensor)

# Print results
for review, score in zip(my_reviews, predictions):
    sentiment = "POSITIVE" if score[0] > 0 else "NEGATIVE"
    print(f"Review: {review[:50]}... \nPrediction: {sentiment} (Score: {score[0]:.2f})\n")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 97ms/step
Review: The movie was absolutely wonderful and the acting ... 
Prediction: POSITIVE (Score: 0.95)

Review: I wasted two hours of my life, this was terrible.... 
Prediction: NEGATIVE (Score: -0.15)

Review: It was okay, not the best but not the worst.... 
Prediction: NEGATIVE (Score: -0.32)

Review: The cinematography was stunning, but the plot was ... 
Prediction: NEGATIVE (Score: -0.37)
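
The scores printed above are logits, not probabilities. If you prefer a probability of positive sentiment, a sigmoid converts one to the other. Here is a minimal sketch in plain Python using the four scores from the output; `logit_to_probability` is a name introduced for this example.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Map a raw logit to a probability in (0, 1) with the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-logit))


scores = [0.95, -0.15, -0.32, -0.37]  # logits copied from the output above

for score in scores:
    prob = logit_to_probability(score)
    print(f"logit {score:+.2f} -> P(positive) = {prob:.2f}")
```

Logits close to 0 map to probabilities close to 0.5, so the three "NEGATIVE" predictions here are fairly low-confidence calls.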

