# Sentiment Analysis of Movie Reviews

## 1. Introduction
This notebook performs sentiment analysis on the IMDb movie review dataset: a small neural network classifies each review as positive or negative. It covers loading the pre-tokenized data that ships with Keras, building a model around an embedding layer, training it, and evaluating it on the held-out test set.

## 2. Data Loading and Preparation

In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

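# Data preparation settings
vocab_size = 10000  # Keep only the 10,000 most frequent words
max_len = 256  # Pad or truncate every review to 256 tokens

# Load the IMDb dataset; each review is a list of integer word indices
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)

# Pad short reviews with zeros at the end so all sequences share one length
X_train = pad_sequences(X_train, maxlen=max_len, padding='post')
X_test = pad_sequences(X_test, maxlen=max_len, padding='post')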

print(f'X_train shape: {X_train.shape}')
print(f'X_test shape: {X_test.shape}')
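
To make the padding step concrete, the sketch below re-implements its behaviour on two toy sequences in plain NumPy. It is an illustration, not the Keras source: `pad_post` is a hypothetical helper, and it assumes the Keras default `truncating='pre'`, under which a review longer than `max_len` loses tokens from its start rather than its end.

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Illustrative stand-in for pad_sequences(..., padding='post').

    Assumes the Keras default truncating='pre': sequences longer than
    maxlen keep their LAST maxlen tokens.
    """
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        if len(seq) == 0:
            continue                   # An empty review stays all padding
        kept = seq[-maxlen:]           # 'pre' truncation: drop leading tokens
        out[i, :len(kept)] = kept      # 'post' padding: zeros fill the tail
    return out

toy = [[5, 8, 2], [7, 1, 4, 9, 3, 6]]  # Made-up word indices
padded = pad_post(toy, maxlen=4)
print(padded)
# [[5 8 2 0]
#  [4 9 3 6]]
```

Index 0 is the padding value, and the `Embedding` layer in the next section is not created with `mask_zero=True`, so these zeros are embedded and averaged like any other token.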

## 3. Model Building

In [None]:
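from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

embedding_dim = 16  # Size of each learned word vector

model = Sequential([
    # Declare the input shape here; the Embedding argument `input_length`
    # is deprecated in recent Keras releases
    Input(shape=(max_len,)),
    Embedding(vocab_size, embedding_dim),  # Word index -> 16-dim vector
    GlobalAveragePooling1D(),  # Average the word vectors over the sequence
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')  # Probability that the review is positive
])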

model.summary()
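
As a rough guide to what the summary reports, the following NumPy sketch traces one forward pass through the same four layers and counts the trainable parameters. The weights are random stand-ins rather than trained values, and the names `E`, `W1`, `b1`, `W2`, `b2`, and `forward` are introduced here for illustration only.

```python
import numpy as np

vocab_size, embedding_dim, max_len, hidden = 10000, 16, 256, 16
rng = np.random.default_rng(0)

# Random stand-ins for the trainable weights (illustration only)
E = rng.normal(size=(vocab_size, embedding_dim))  # Embedding table
W1 = rng.normal(size=(embedding_dim, hidden))     # Dense(16) kernel
b1 = np.zeros(hidden)                             # Dense(16) bias
W2 = rng.normal(size=(hidden, 1))                 # Dense(1) kernel
b2 = np.zeros(1)                                  # Dense(1) bias

def forward(token_ids):
    """token_ids: integer array of shape (batch, max_len)."""
    x = E[token_ids]                      # (batch, max_len, embedding_dim)
    x = x.mean(axis=1)                    # Global average pooling -> (batch, embedding_dim)
    x = np.maximum(x @ W1 + b1, 0.0)      # ReLU -> (batch, hidden)
    logits = x @ W2 + b2                  # (batch, 1)
    return 1.0 / (1.0 + np.exp(-logits))  # Sigmoid -> value in (0, 1)

batch = rng.integers(0, vocab_size, size=(4, max_len))
probs = forward(batch)
n_params = E.size + W1.size + b1.size + W2.size + b2.size
print(probs.shape, n_params)  # (4, 1) 160289
```

The embedding table accounts for 160,000 of the 160,289 parameters (10,000 x 16); the two dense layers add only 272 and 17.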

## 4. Model Compilation and Training

In [None]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(
    X_train,
    y_train,
    epochs=30,
    batch_size=512,
    validation_split=0.2
)
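
The arithmetic behind these training settings can be sketched as follows. The sketch assumes the Keras IMDb training set holds 25,000 reviews and relies on the documented Keras behaviour that `validation_split` takes its samples from the end of the arrays, before any shuffling; the variable names are illustrative.

```python
import math

n_train_total = 25000   # Reviews in the Keras IMDb training set
validation_split = 0.2
batch_size = 512
epochs = 30

# The last 20% of the arrays is held out for validation
split_at = int(math.floor(n_train_total * (1.0 - validation_split)))
n_fit = split_at                    # Samples used for gradient updates
n_val = n_train_total - split_at    # Samples used only for validation

steps_per_epoch = math.ceil(n_fit / batch_size)  # The final batch is smaller
total_updates = steps_per_epoch * epochs

print(n_fit, n_val, steps_per_epoch, total_updates)  # 20000 5000 40 1200
```

With only 40 weight updates per epoch, a larger number of epochs is what gives the model enough updates to converge at this batch size.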

## 5. Model Evaluation

In [None]:
import matplotlib.pyplot as plt

# Plot training & validation accuracy values
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
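
Alongside the plots, it can help to read off the epoch at which the validation loss is lowest. The helper below is a minimal sketch: `best_epoch` is a hypothetical function, and the numbers in `example_history` are made up for illustration; in the notebook, pass `history.history` instead.

```python
def best_epoch(history_dict, monitor="val_loss"):
    """Return (1-based epoch, value) at which `monitor` is lowest."""
    values = history_dict[monitor]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Made-up values for illustration only, not results from this notebook
example_history = {
    "loss":     [0.69, 0.55, 0.40, 0.30, 0.24, 0.20],
    "val_loss": [0.68, 0.57, 0.45, 0.38, 0.36, 0.37],
}

epoch, value = best_epoch(example_history)
print(f"Lowest val_loss {value:.2f} at epoch {epoch}")
# Lowest val_loss 0.36 at epoch 5
```

If the lowest validation loss falls well before the final epoch, training for fewer epochs or adding the `tf.keras.callbacks.EarlyStopping` callback (monitoring `val_loss`) is a common remedy.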

In [None]:
# Evaluate on the test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Test Accuracy: {accuracy:.2f}')
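
Accuracy alone hides how the errors split between the two classes. The sketch below derives accuracy, precision, and recall from predicted probabilities, assuming the usual 0.5 decision threshold. `binary_report` is a hypothetical helper and the toy values are made up; in the notebook the probabilities would come from `model.predict(X_test)`.

```python
import numpy as np

def binary_report(y_true, y_prob, threshold=0.5):
    """Confusion counts and summary metrics for a binary classifier."""
    y_true = np.asarray(y_true).ravel().astype(int)
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # Positive, predicted positive
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # Negative, predicted negative
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # Negative, predicted positive
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # Positive, predicted negative

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy labels and probabilities for illustration only
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
report = binary_report(y_true, y_prob)
print(report)
```

On the toy data, four of the six predictions are correct, with one false positive and one false negative.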

## 6. Conclusion
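A small neural network built from an embedding layer, global average pooling, and two dense layers performs well on the IMDb sentiment classification task. The accuracy on the test set, printed in Section 5, measures how well the model generalizes, since those reviews are never seen during training. The training history plots show that the model learns effectively over the epochs without significant overfitting: the validation curves track the training curves. A validation loss that turned upward while the training loss kept falling would instead signal overfitting and argue for training over fewer epochs.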