# Word Embedding Using IMDB Dataset
1. What is the IMDB Dataset?

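A text dataset of 50,000 movie reviews, split evenly into 25,000 for training and 25,000 for testing

Each review is labeled as positive (1) or negative (0)

Available directly in Keras (`tensorflow.keras.datasets.imdb`), with each review already encoded as a list of integer word indices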
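
Because Keras stores each review as integer indices rather than text, it can help to turn a review back into words. The following is a minimal sketch, not part of the original notebook: `decode_review` is a hypothetical helper, and the toy `toy_word_index` stands in for the dictionary returned by `imdb.get_word_index()`. It assumes the `imdb.load_data` defaults, where indices 0, 1, and 2 are reserved for padding, start-of-review, and out-of-vocabulary words, so real word ranks are shifted by 3.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map a list of integer indices back to a readable string."""
    # Reserved indices under the imdb.load_data defaults (assumed here).
    reverse = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    # Word ranks are shifted by index_from to make room for the reserved ones.
    for word, rank in word_index.items():
        reverse[rank + index_from] = word
    # Any index not in the table is treated as out-of-vocabulary.
    return " ".join(reverse.get(i, "<oov>") for i in encoded)


# Toy vocabulary (hypothetical); rank 1 is the most frequent word.
toy_word_index = {"the": 1, "movie": 2, "great": 3}
decoded = decode_review([0, 0, 1, 4, 5, 2, 6], toy_word_index)
print(decoded)  # <pad> <pad> <start> the movie <oov> great
```

With the real dataset, the same helper could be called as `decode_review(x_train[0], imdb.get_word_index())`, which downloads the vocabulary file on first use.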

2. What Is Word Embedding?

A method to convert words into dense numerical vectors

Each word is represented by a learned vector (e.g., 100-dimensional)

Unlike one-hot encoding, embedding vectors capture:

Meaning

Context (the kinds of words a word tends to appear alongside)

Similarity between words
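
To make the contrast with one-hot encoding concrete, here is a small sketch using plain Python. The three-dimensional vectors are hand-picked for illustration only; real embeddings are learned during training and are much wider. The names `vocab`, `embedding_table`, `embed`, and `cosine` are hypothetical.

```python
import math

# Hypothetical embedding table: row i is the vector for word index i.
vocab = {"good": 0, "great": 1, "terrible": 2}
embedding_table = [
    [0.9, 0.8, 0.1],    # good
    [0.8, 0.9, 0.2],    # great
    [-0.7, -0.9, 0.1],  # terrible
]


def one_hot(index, size):
    """Sparse representation: a single 1.0 at the word's index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def embed(word):
    """An embedding lookup is just row indexing into the table."""
    return embedding_table[vocab[word]]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Distinct one-hot vectors are always orthogonal, so similarity is 0.
sim_onehot = cosine(one_hot(vocab["good"], 3), one_hot(vocab["great"], 3))
# Dense vectors can place related words close together.
sim_close = cosine(embed("good"), embed("great"))
sim_far = cosine(embed("good"), embed("terrible"))
print(sim_onehot, round(sim_close, 3), round(sim_far, 3))
```

In this toy table, "good" and "great" point in nearly the same direction while "good" and "terrible" point in opposite ones, which is the kind of structure a trained Embedding layer is meant to discover on its own.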

3. Why Use Word Embeddings?

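Reduces dimensionality (a word becomes a vector of, say, 128 values instead of a one-hot vector as long as the vocabulary)

Captures semantic relationships

Gives neural networks a numeric input in which related words lie close together

Typically performs better than one-hot encoding on text tasks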
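
The dimensionality claim can be checked with simple arithmetic. The sketch below uses the sizes from the code in this notebook (a 10,000-word vocabulary, reviews of 200 tokens, 128-dimensional vectors); the variable names are illustrative.

```python
vocab_size = 10_000   # matches num_words in the notebook
embed_dim = 128       # matches output_dim of the Embedding layer
review_len = 200      # matches maxlen

# Values needed to represent one review in each encoding.
one_hot_values = review_len * vocab_size    # one 10,000-wide vector per token
embedded_values = review_len * embed_dim    # one 128-wide vector per token
reduction = one_hot_values / embedded_values

# The price of the dense encoding: a learned lookup table.
embedding_params = vocab_size * embed_dim

print(one_hot_values)    # 2000000
print(embedded_values)   # 25600
print(reduction)         # 78.125
print(embedding_params)  # 1280000
```

Each review shrinks by a factor of about 78, in exchange for roughly 1.28 million trainable weights in the embedding table.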

4. Workflow: IMDB + Word Embedding

Load the IMDB dataset

Pad sequences (so all reviews have equal length)

Use an Embedding layer to convert word indices → dense vectors

Feed these vectors into a neural network (RNN / LSTM / CNN / Dense)

Train the model for sentiment classification
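
The padding step is worth seeing on a tiny input. The sketch below re-implements it in plain Python rather than calling Keras; `pad_pre` is a hypothetical helper that assumes the `pad_sequences` defaults (`padding='pre'`, `truncating='pre'`, `value=0`), so short reviews get zeros at the front and long reviews keep only their last `maxlen` tokens.

```python
def pad_pre(sequences, maxlen, value=0):
    """Pad or truncate each sequence at the front to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                 # keep the last maxlen tokens
        fill = [value] * (maxlen - len(seq))      # zeros go in front
        padded.append(fill + seq)
    return padded


batch = pad_pre([[5, 8], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(batch)  # [[0, 0, 5, 8], [3, 4, 5, 6]]
```

Padding at the front keeps the real words next to the end of the sequence, which is the part an LSTM sees last before producing its output.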

import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, LSTM

# ---------------------------
# 1. Load dataset
# ---------------------------
num_words = 10000   # Keep only the 10,000 most frequent words; rarer words become the out-of-vocabulary index
maxlen = 200        # Cut/pad every review to exactly 200 tokens

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_words)

# Pad or truncate every review to maxlen tokens (by default both happen at the front)
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

# ---------------------------
# 2. Build model with Embedding
# ---------------------------
model = Sequential([
    Embedding(input_dim=num_words, output_dim=128),  # Word embeddings (input_length is deprecated in Keras 3; length is inferred)
    LSTM(64),                # Recurrent layer to capture sequence info
    Dense(1, activation='sigmoid')   # Binary classification (positive/negative)
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# ---------------------------
# 3. Train model
# ---------------------------
model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.2, verbose=1)

# ---------------------------
# 4. Evaluate model
# ---------------------------
loss, acc = model.evaluate(x_test, y_test, verbose=1)
print(f"Test Accuracy: {acc:.4f}")


Epoch 1/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 62s 377ms/step - accuracy: 0.6861 - loss: 0.5669 - val_accuracy: 0.8606 - val_loss: 0.3418
Epoch 2/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 59s 373ms/step - accuracy: 0.8915 - loss: 0.2748 - val_accuracy: 0.8760 - val_loss: 0.3056
Epoch 3/3
157/157 ━━━━━━━━━━━━━━━━━━━━ 63s 404ms/step - accuracy: 0.9374 - loss: 0.1782 - val_accuracy: 0.8626 - val_loss: 0.3810
782/782 ━━━━━━━━━━━━━━━━━━━━ 25s 31ms/step - accuracy: 0.8518 - loss: 0.4065
Test Accuracy: 0.8504
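
The step counts in the log (157 per training epoch, 782 for evaluation) follow directly from the dataset and batch sizes, and the same kind of arithmetic gives the model's parameter count. The sketch below is a sanity check under two assumptions: `model.evaluate` uses the Keras default batch size of 32, and the LSTM uses the standard four-gate layout with one bias vector per gate. Variable names are illustrative.

```python
import math

train_size, test_size = 25_000, 25_000
validation_split = 0.2
batch_size = 128
eval_batch_size = 32          # assumed Keras default for model.evaluate

# Samples left for training after holding out the validation split.
val_samples = round(train_size * validation_split)
fit_samples = train_size - val_samples

steps_per_epoch = math.ceil(fit_samples / batch_size)
eval_steps = math.ceil(test_size / eval_batch_size)

# Trainable parameters, layer by layer.
vocab_size, embed_dim, lstm_units = 10_000, 128, 64
embedding_params = vocab_size * embed_dim
# Four gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)
dense_params = lstm_units * 1 + 1
total_params = embedding_params + lstm_params + dense_params

print(steps_per_epoch, eval_steps)  # 157 782
print(total_params)                 # 1329473
```

Almost all of the weights sit in the embedding table, so `num_words` and `output_dim` are the settings that most affect model size.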
