Binary classification using Deep Neural Networks. Example: classify movie reviews into
"positive" and "negative" reviews, based only on the text content of the reviews.
Use the IMDB dataset.

In [53]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam


In [54]:
df = pd.read_csv('IMDB Dataset.csv')

In [55]:
# Map sentiment labels to integers
df['sentiment'] = df['sentiment'].map({'negative': 0, 'positive': 1})

# Extract data and labels
texts = df['review'].astype(str).values
labels = df['sentiment'].values

In [57]:
# Split into train and test sets
x_train_texts, x_test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42)

In [58]:
# Tokenization
vocab_size = 20000
max_length = 300
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(x_train_texts)

In [59]:
x_train = tokenizer.texts_to_sequences(x_train_texts)
x_test = tokenizer.texts_to_sequences(x_test_texts)
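
The two cells above fit a word index on the training reviews only and then map every review to a list of integers. The following is a simplified pure-Python sketch of that idea, not the Keras implementation: the helper names and the regex-based word splitting are assumptions made for illustration. It shows the two behaviours that matter here: index 1 is reserved for the OOV token, and any word that is unseen or ranked at or beyond `num_words` is replaced by it.

```python
import re
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency; index 1 is reserved for the OOV token."""
    counts = Counter()
    for text in texts:
        # Simplified splitting (assumption): lowercase, keep letters, digits, apostrophes
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    word_index = {oov_token: 1}
    # most_common() sorts by descending count; ties keep first-seen order
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index

def texts_to_sequences(texts, word_index, num_words, oov_token="<OOV>"):
    """Map each text to integer ids, sending rare or unseen words to the OOV id."""
    oov_id = word_index[oov_token]
    sequences = []
    for text in texts:
        ids = []
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            idx = word_index.get(word)
            ids.append(idx if idx is not None and idx < num_words else oov_id)
        sequences.append(ids)
    return sequences

# Toy data, made up for illustration
train = ["a great movie", "a bad movie", "a great cast"]
word_index = build_word_index(train)
seqs = texts_to_sequences(["a great plot", "bad cast"], word_index, num_words=4)
print(seqs)  # [[2, 3, 1], [1, 1]]
```

Because the index is built from the training split alone, words that occur only in test reviews come out as `<OOV>`, which is why the decoded test reviews further down contain that token.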

In [60]:
# Padding
x_train = pad_sequences(x_train, maxlen=max_length, padding='post')
x_test = pad_sequences(x_test, maxlen=max_length, padding='post')
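
A detail worth keeping in mind: `padding='post'` only controls where zeros are added to short sequences. Since `truncating` is not passed, the Keras default of `'pre'` applies, so reviews longer than `max_length` keep their last 300 tokens and lose the beginning. The sketch below mimics that combination in plain Python; `pad_post` is a hypothetical helper written for illustration, not the library function, and it assumes `maxlen > 0`.

```python
def pad_post(sequences, maxlen, value=0):
    """Post-pad short sequences with `value`; pre-truncate long ones (assumes maxlen > 0)."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # pre-truncation: keep the last maxlen tokens
        padded.append(kept + [value] * (maxlen - len(kept)))
    return padded

result = pad_post([[5, 8, 2], [7, 1, 4, 9, 3, 6]], maxlen=4)
print(result)  # [[5, 8, 2, 0], [4, 9, 3, 6]]
```

If the opening of a review should be kept instead, passing `truncating='post'` to `pad_sequences` is the usual way to do it.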


In [61]:
# Build the model
model = Sequential([
    # input_length is omitted: it is deprecated in Keras 3, and the sequence length is inferred from the input
    Embedding(input_dim=vocab_size, output_dim=256),
    Conv1D(128, 5, activation='relu'),
    GlobalMaxPooling1D(),
    Dropout(0.4),
    Dense(128, activation='relu'),
    Dropout(0.4),
    Dense(1, activation='sigmoid')
])
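
As a sanity check on the architecture, the layer sizes can be worked out by hand. The sketch below assumes the `Conv1D` defaults (`padding='valid'`, stride 1) and uses variable names invented for this example; it is arithmetic only and does not build the model.

```python
vocab_size, max_length = 20000, 300
embed_dim, filters, kernel_size, dense_units = 256, 128, 5, 128

# With 'valid' padding and stride 1, the convolution shortens the sequence
conv_steps = max_length - kernel_size + 1

params = {
    "embedding": vocab_size * embed_dim,                     # one vector per word id
    "conv1d": kernel_size * embed_dim * filters + filters,   # kernels + biases
    "dense_hidden": filters * dense_units + dense_units,     # pooled features -> 128 units
    "dense_output": dense_units * 1 + 1,                     # single sigmoid unit
}
total = sum(params.values())

print(conv_steps)  # 296
for name, count in params.items():
    print(f"{name:>13}: {count:>9,}")
print(f"{'total':>13}: {total:>9,}")  # 5,300,609
```

Almost all of the weights sit in the embedding table (5,120,000 of about 5.3 million). `GlobalMaxPooling1D` and `Dropout` add no parameters; pooling reduces the 296 x 128 convolution output to a single 128-dimensional vector per review.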




In [62]:
# Compile
optimizer = Adam(learning_rate=0.001)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

In [63]:
# Callback
early_stop = EarlyStopping(monitor='val_accuracy', patience=2, restore_best_weights=True)


In [64]:
# Train
print("Training model...")
history = model.fit(
    x_train, y_train,
    epochs=6,
    batch_size=512,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=2
)

Training model...
Epoch 1/6
63/63 - 82s - 1s/step - accuracy: 0.6332 - loss: 0.6275 - val_accuracy: 0.8071 - val_loss: 0.4388
Epoch 2/6
63/63 - 81s - 1s/step - accuracy: 0.8417 - loss: 0.3651 - val_accuracy: 0.8831 - val_loss: 0.2791
Epoch 3/6
63/63 - 81s - 1s/step - accuracy: 0.9154 - loss: 0.2179 - val_accuracy: 0.8949 - val_loss: 0.2590
Epoch 4/6
63/63 - 80s - 1s/step - accuracy: 0.9636 - loss: 0.1121 - val_accuracy: 0.8979 - val_loss: 0.2831
Epoch 5/6
63/63 - 86s - 1s/step - accuracy: 0.9838 - loss: 0.0549 - val_accuracy: 0.8905 - val_loss: 0.3355
Epoch 6/6
63/63 - 87s - 1s/step - accuracy: 0.9925 - loss: 0.0285 - val_accuracy: 0.8901 - val_loss: 0.3862
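
The log above can be read more easily by pulling out the best validation epochs and the gap between training and validation accuracy. The numbers in this sketch are copied from the log; the step-count check at the end assumes the CSV holds 50,000 reviews, which is not stated in the notebook but is consistent with the 63 steps per epoch shown.

```python
import math

# Values copied from the training log above
train_accuracy = [0.6332, 0.8417, 0.9154, 0.9636, 0.9838, 0.9925]
val_accuracy = [0.8071, 0.8831, 0.8949, 0.8979, 0.8905, 0.8901]
val_loss = [0.4388, 0.2791, 0.2590, 0.2831, 0.3355, 0.3862]

epochs = range(len(val_accuracy))
best_acc_epoch = max(epochs, key=val_accuracy.__getitem__) + 1   # 1-based epoch number
best_loss_epoch = min(epochs, key=val_loss.__getitem__) + 1
gaps = [round(t - v, 4) for t, v in zip(train_accuracy, val_accuracy)]

print(best_acc_epoch, best_loss_epoch)  # 4 3
print(gaps)

# Step counts, ASSUMING 50,000 rows in the CSV (not stated in the notebook)
n_rows = 50_000
n_train = int(n_rows * 0.8)            # after train_test_split(test_size=0.2)
n_fit = int(n_train * 0.8)             # after validation_split=0.2
fit_steps = math.ceil(n_fit / 512)     # batch_size=512
print(fit_steps)  # 63
```

Validation accuracy peaks at epoch 4 and validation loss bottoms out at epoch 3, while training accuracy keeps climbing to 99%. The widening gap from epoch 3 onward is the usual sign of overfitting, and it is the situation the `EarlyStopping` callback with `restore_best_weights=True` is meant to handle.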


In [65]:
# Predict
predictions = model.predict(x_test)
predicted_labels = (predictions > 0.5).astype(int).flatten()

[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 23ms/step


In [66]:
# Accuracy
test_accuracy = accuracy_score(y_test, predicted_labels)
print(f"\nTest Accuracy: {test_accuracy * 100:.2f}%")


Test Accuracy: 90.14%
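
A single accuracy figure hides how the errors are split between the two classes. The sketch below shows, in plain Python, what the 0.5 threshold and the accuracy computation amount to, and adds the confusion counts. The probabilities and labels are made up for illustration and are not outputs of the model above; `threshold` and `confusion_counts` are hypothetical helpers.

```python
def threshold(probabilities, cutoff=0.5):
    """Turn sigmoid outputs into 0/1 labels; values exactly at the cutoff map to 0."""
    return [1 if p > cutoff else 0 for p in probabilities]

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

# Made-up example values, not model output
probs = [0.91, 0.12, 0.55, 0.40, 0.73, 0.08]
y_true = [1, 0, 0, 1, 1, 0]

y_pred = threshold(probs)
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)

print(y_pred)          # [1, 0, 1, 0, 1, 0]
print(tp, tn, fp, fn)  # 2 2 1 1
```

On the real test set, the same counts are available from `sklearn.metrics.confusion_matrix(y_test, predicted_labels)`.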


In [67]:
# Optional: decode a review
word_index = tokenizer.word_index
reverse_word_index = {v: k for k, v in word_index.items()}

def decode_review(encoded_review):
    return ' '.join([reverse_word_index.get(i, '?') for i in encoded_review if i != 0])

# Show sample predictions
print("\nPredicted Results on Test Set:")
for i in range(10):
    print(f"\nReview {i+1}:")
    print(decode_review(x_test[i]))
    print(f"Predicted: {'Positive' if predicted_labels[i] == 1 else 'Negative'} | Actual: {'Positive' if y_test[i] == 1 else 'Negative'}")


Predicted Results on Test Set:

Review 1:
i really liked this <OOV> due to the look of the arena the curtains and just the look overall was interesting to me for some reason anyways this could have been one of the best <OOV> ever if the wwf didn't have lex <OOV> in the main event against <OOV> now for it's time it was ok to have a huge fat man vs a strong man but i'm glad times have changed it was a terrible main event just like every match <OOV> is in is terrible other matches on the card were razor ramon vs ted dibiase steiner brothers vs heavenly bodies shawn michaels vs curt <OOV> this was the event where shawn named his big monster of a body guard diesel irs vs 1 2 3 kid bret hart first takes on <OOV> then takes on jerry <OOV> and stuff with the <OOV> and <OOV> was always very interesting then <OOV> <OOV> destroyed marty <OOV> undertaker took on giant <OOV> in another terrible match the smoking <OOV> and <OOV> took on bam bam bigelow and the <OOV> and <OOV> defended the world tit