# Sentiment Analysis with LSTM on IMDB Dataset
This notebook builds and trains a stacked LSTM network that classifies movie reviews from the IMDB dataset as positive or negative.

## Steps:
1. Load and preprocess the IMDB dataset.
2. Build an LSTM-based deep learning model.
3. Train and evaluate the model.
4. Save the model and configurations for future inference.


## 1-Import Libraries

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
import pickle

## 2-Define Hyperparameters

In [None]:
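MAX_FEATURES = 20000  # Vocabulary size: token indices range over 0..MAX_FEATURES-1
MAX_LEN = 200         # Every review is padded or truncated to this many tokens
EMBEDDING_DIM = 128   # Length of each word embedding vector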
BATCH_SIZE = 64
EPOCHS = 10

## 3-Load and Explore the Dataset

In [None]:
print("Loading IMDB dataset...")
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=MAX_FEATURES)

print(f"Training samples: {len(x_train)}")
print(f"Testing samples: {len(x_test)}")
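
Each loaded review is a list of integers, not text. As a minimal sketch of how those integers map back to words, the snippet below assumes the `imdb.load_data` defaults (0 for padding, 1 for the start marker, 2 for out-of-vocabulary words, and real words offset by 3). The `toy_word_index` dictionary and its ranks are hypothetical stand-ins for `imdb.get_word_index()`.

```python
# Toy word index standing in for imdb.get_word_index(); these ranks are made up.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

INDEX_FROM = 3  # default offset applied to real words by imdb.load_data
SPECIAL_TOKENS = {0: "<pad>", 1: "<start>", 2: "<oov>"}


def decode_review(encoded, word_index):
    """Map a list of token indices back to a readable string."""
    index_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    index_to_word.update(SPECIAL_TOKENS)
    # Any index missing from the mapping is shown as out-of-vocabulary.
    return " ".join(index_to_word.get(idx, "<oov>") for idx in encoded)


decoded = decode_review([1, 4, 5, 6, 2, 7], toy_word_index)
print(decoded)
```

With the real word index, the same function could be called as `decode_review(x_train[0], imdb.get_word_index())` before padding.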

## 4-Preprocess Data (Filter & Pad Sequences)

In [None]:
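# imdb.load_data(num_words=MAX_FEATURES) already maps rarer words to the
# out-of-vocabulary index 2, so this filter is a defensive no-op. It uses 2
# rather than 0 because 0 is reserved for padding.
print("Filtering out-of-bounds indices...")
x_train = [[idx if idx < MAX_FEATURES else 2 for idx in seq] for seq in x_train]
x_test = [[idx if idx < MAX_FEATURES else 2 for idx in seq] for seq in x_test]

# Pad or truncate every review to MAX_LEN tokens (both happen at the front by default)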
print("Padding sequences...")
x_train = sequence.pad_sequences(x_train, maxlen=MAX_LEN)
x_test = sequence.pad_sequences(x_test, maxlen=MAX_LEN)

print(f"Training data shape: {x_train.shape}")
print(f"Testing data shape: {x_test.shape}")
print(f"Max value in data: {max(x_train.max(), x_test.max())}")
print("Data preprocessing complete!")
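
To make the padding step concrete, here is a pure-Python sketch of what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`, `value=0`). The helper `pad_pre` is a hypothetical name used only for illustration, and it assumes `maxlen` is at least 1.

```python
def pad_pre(seq, maxlen, value=0):
    """Keep the last `maxlen` tokens, then left-pad with `value` up to `maxlen`."""
    trimmed = seq[-maxlen:]  # long reviews lose their earliest tokens
    return [value] * (maxlen - len(trimmed)) + trimmed


short_review = pad_pre([1, 14, 22], maxlen=5)
long_review = pad_pre([1, 14, 22, 16, 43, 530, 973], maxlen=5)
print(short_review)
print(long_review)
```

Short reviews gain leading zeros, while long reviews keep only their final tokens, so the end of a review always sits next to the last LSTM timestep.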

## 5-Build the LSTM Model

In [None]:
print("Building LSTM model...")

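model = Sequential([
    keras.Input(shape=(MAX_LEN,)),           # fixed-length sequences of token indices
    Embedding(MAX_FEATURES, EMBEDDING_DIM),  # input_length is deprecated in recent Keras
    LSTM(128, return_sequences=True),        # emit every timestep for the next LSTM
    LSTM(64),                                # keep only the final hidden state
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')           # probability that the review is positive
])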

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.summary()
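
As a sanity check on `model.summary()`, the parameter counts can be reproduced by hand. The sketch below assumes the layer sizes used in this notebook and the standard Keras LSTM layout (four gates, each with input weights, recurrent weights, and one bias vector). The helper function names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    """One `dim`-sized vector per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


layer_params = {
    "embedding": embedding_params(20000, 128),
    "lstm_128": lstm_params(128, 128),  # input is the 128-dim embedding
    "lstm_64": lstm_params(128, 64),    # input is the first LSTM's 128-dim output
    "dense_64": dense_params(64, 64),
    "dense_1": dense_params(64, 1),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name:>10}: {count:,}")
print(f"{'total':>10}: {total_params:,}")
```

Dropout adds no parameters. Most of the weights sit in the embedding table, so `MAX_FEATURES` and `EMBEDDING_DIM` dominate the model size.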

## 6-Train the Model

In [None]:
print("\nGPU ready:", len(tf.config.list_physical_devices('GPU')) > 0)
print("Training model...")

history = model.fit(
    x_train, y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_split=0.2,
    verbose=1
)
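
The progress bar printed by `fit` counts batches, not samples. The arithmetic below is a small sketch of where that number comes from, assuming the 25,000-review IMDB training split and the settings used above. Keras takes the validation set from the last samples of the training data, before shuffling.

```python
import math

N_TRAIN_TOTAL = 25000    # size of the IMDB training split
VALIDATION_SPLIT = 0.2   # fraction held out by model.fit
BATCH_SIZE = 64

n_val = round(N_TRAIN_TOTAL * VALIDATION_SPLIT)   # samples held out for validation
n_fit = N_TRAIN_TOTAL - n_val                     # samples actually used for training
steps_per_epoch = math.ceil(n_fit / BATCH_SIZE)   # the last batch may be smaller
val_steps = math.ceil(n_val / BATCH_SIZE)

print(f"Fit on {n_fit} samples, validate on {n_val}")
print(f"Steps per epoch: {steps_per_epoch}, validation steps: {val_steps}")
```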

## 7-Evaluate the Model

In [None]:
print("\nEvaluating model...")
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"\nTest Accuracy: {test_accuracy*100:.2f}%")
print(f"Test Loss: {test_loss:.4f}")
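
The reported accuracy comes from thresholding the sigmoid output at 0.5. The following is a pure-Python sketch of that computation on made-up probabilities and labels. `binary_accuracy` here is an illustrative helper, not the Keras implementation.

```python
def binary_accuracy(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Toy values, not real model output.
toy_probs = [0.91, 0.12, 0.55, 0.40]
toy_labels = [1, 0, 0, 1]

acc = binary_accuracy(toy_probs, toy_labels)
print(f"Accuracy: {acc:.2f}")
```

With the trained model, the probabilities would come from `model.predict(x_test)`, with label 1 meaning a positive review.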

## 8-Save Model and Configurations

In [None]:
print("Saving model...")

# Save full model
model.save('sentiment_lstm_model.keras')

# Save weights & architecture separately
# (Keras 3 requires weight file names to end in '.weights.h5')
model.save_weights('model_weights.weights.h5')
model_json = model.to_json()
with open('model_architecture.json', 'w') as json_file:
    json_file.write(model_json)

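# Save word index (raw word ranks; the training data offsets these by 3,
# so apply the same offset when encoding new text for inference)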
word_index = imdb.get_word_index()
with open('word_index.pkl', 'wb') as f:
    pickle.dump(word_index, f)

# Save config
config = {
    'MAX_FEATURES': MAX_FEATURES,
    'MAX_LEN': MAX_LEN,
    'EMBEDDING_DIM': EMBEDDING_DIM
}
with open('config.pkl', 'wb') as f:
    pickle.dump(config, f)

print("All files saved successfully!")
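
At inference time, new text has to be encoded the same way as the training data. The sketch below shows one way this might be done with the saved word index and config, assuming the `imdb.load_data` defaults (start marker 1, out-of-vocabulary index 2, word ranks offset by 3). The `encode_review` helper, the whitespace tokenization, and the `toy_word_index` ranks are all simplifying assumptions for illustration.

```python
MAX_FEATURES = 20000
MAX_LEN = 200
START_CHAR, OOV_CHAR, INDEX_FROM = 1, 2, 3  # imdb.load_data defaults


def encode_review(text, word_index, max_features=MAX_FEATURES, max_len=MAX_LEN):
    """Turn raw text into a fixed-length list of token indices."""
    tokens = [START_CHAR]
    for word in text.lower().split():  # naive whitespace tokenization
        rank = word_index.get(word)
        idx = rank + INDEX_FROM if rank is not None else OOV_CHAR
        # Words outside the training vocabulary become out-of-vocabulary.
        tokens.append(idx if idx < max_features else OOV_CHAR)
    tokens = tokens[-max_len:]                       # truncate at the front
    return [0] * (max_len - len(tokens)) + tokens    # pad at the front


# Toy word index standing in for the pickled one; these ranks are made up.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4, "zyzzyva": 50000}

encoded = encode_review("The movie was great zyzzyva unseen", toy_word_index, max_len=8)
print(encoded)
```

The resulting list can be wrapped in a batch of one and passed to `model.predict`. Punctuation handling is left out here and would need to match however the real input text is cleaned.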

## Conclusion
- We trained a stacked LSTM model on the IMDB dataset; test accuracy is typically around **85–90%**, depending on hyperparameters.
- The model and configurations were saved for future use.
- This notebook can serve as a template for other sequence classification tasks.