# Long Short-Term Memory (LSTM) Basics

## 1. Introduction
- LSTMs are a special type of recurrent neural network (RNN) designed to mitigate the **vanishing gradient problem**, which makes vanilla RNNs hard to train on long sequences.
- They use **gates** (input, forget, output) to control the flow of information.
- This allows them to capture **long-term dependencies** in sequential data.
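The vanishing gradient problem can be illustrated with a deliberately simplified scalar sketch (not full backpropagation through time). For a vanilla RNN step h_t = tanh(w * h_{t-1} + ...), the gradient of a late state with respect to an early one is a product of per-step factors w * tanh'(a). Along the LSTM's direct cell-state path, where c_t = f_t * c_{t-1} + i_t * g_t, each step contributes only the forget-gate value f_t, which the network can learn to keep close to 1. The weight `w = 0.9`, the fixed pre-activation `a = 0.5`, and the forget-gate value `0.97` are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def rnn_gradient_factor(w, pre_activation, steps):
    """Product of per-step Jacobians d h_t / d h_{t-1} = w * tanh'(a) for a scalar
    vanilla RNN, assuming the same pre-activation a at every step (a simplification)."""
    tanh_grad = 1.0 - np.tanh(pre_activation) ** 2
    return (w * tanh_grad) ** steps

def lstm_cell_path_factor(forget_gate, steps):
    """Product of d c_t / d c_{t-1} = f_t along the direct cell-state path,
    ignoring the indirect paths through h_{t-1} (a simplification)."""
    return forget_gate ** steps

for steps in (10, 50, 100):
    rnn = rnn_gradient_factor(w=0.9, pre_activation=0.5, steps=steps)
    lstm = lstm_cell_path_factor(forget_gate=0.97, steps=steps)
    print(f"{steps:>3} steps  vanilla RNN: {rnn:.2e}   LSTM cell path: {lstm:.2e}")
```

With these numbers the vanilla RNN factor collapses toward zero far faster than the cell-state factor as the number of steps grows.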

### Applications:
- Text classification (sentiment analysis)
- Machine translation
- Time-series forecasting
- Speech recognition

## 2. LSTM Cell Structure
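- **Forget gate**: Decides which information to discard from the cell state.
- **Input gate**: Decides which new candidate values to write into the cell state.
- **Cell state**: Carries long-term memory across time steps.
- **Output gate**: Decides which parts of the (squashed) cell state are exposed as the hidden state, which is the cell's output at each step.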

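Because the cell state is updated additively (the old state scaled by the forget gate, plus gated new content) rather than being pushed through a weight matrix and a squashing nonlinearity at every step, gradients can flow back across many time steps. This is what helps LSTMs retain important information for longer periods.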
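A single LSTM time step can be sketched in NumPy to make the gate roles concrete. This is a minimal sketch of the standard LSTM equations, assuming one stacked weight matrix `W` and bias `b` for the four blocks (forget, input, candidate, output) applied to the concatenation of the current input and the previous hidden state. That gate ordering is an arbitrary choice here and need not match how a particular library stores its weights. The sizes (8 inputs, 16 units, 6 steps) mirror the Keras example in section 3; the random weights are placeholders, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    x_t:    (input_dim,)                  current input
    h_prev: (units,)                      previous hidden state
    c_prev: (units,)                      previous cell state
    W:      (4*units, input_dim + units)  stacked weights: forget, input, candidate, output
    b:      (4*units,)                    stacked biases, same order
    """
    units = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b   # all four pre-activations at once
    f = sigmoid(z[0 * units:1 * units])         # forget gate: how much of c_prev to keep
    i = sigmoid(z[1 * units:2 * units])         # input gate: how much new content to write
    g = np.tanh(z[2 * units:3 * units])         # candidate values to write
    o = sigmoid(z[3 * units:4 * units])         # output gate: what to expose as h_t
    c_t = f * c_prev + i * g                    # additive cell-state update
    h_t = o * np.tanh(c_t)                      # hidden state (output) at this step
    return h_t, c_t

# Run a random toy sequence through the cell
rng = np.random.default_rng(0)
input_dim, units, seq_len = 8, 16, 6
W = rng.normal(scale=0.1, size=(4 * units, input_dim + units))
b = np.zeros(4 * units)
h = np.zeros(units)
c = np.zeros(units)
for x_t in rng.normal(size=(seq_len, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (16,) (16,)
```

The line `c_t = f * c_prev + i * g` is the additive update: the forget gate alone decides how much of the old memory survives each step.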

## 3. Example: LSTM for Sequence Classification

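```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy dataset: integer token sequences (ids 1-7; id 0 is used for padding)
X = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6],
    [4, 5, 6, 7]
]
y = [0, 1, 0, 1]  # Arbitrary binary labels

# Left-pad every sequence with zeros to length 6 -> array of shape (4, 6)
X = pad_sequences(X, maxlen=6)
y = np.array(y)

# Define the LSTM model. An explicit Input replaces the deprecated
# `input_length` argument of Embedding and lets summary() show shapes.
model = Sequential([
    tf.keras.Input(shape=(6,)),              # sequences of 6 token ids
    Embedding(input_dim=10, output_dim=8),   # ids 0-9 -> 8-dim vectors
    LSTM(16),                                # final hidden state, shape (batch, 16)
    Dense(1, activation='sigmoid')           # probability of class 1
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# Train the model (4 samples: this demonstrates the API, not real learning)
history = model.fit(X, y, epochs=10, verbose=1)
```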
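The shapes flowing through the model above can be traced without TensorFlow. The sketch below re-implements, for illustration only, what `pad_sequences(X, maxlen=6)` does with its defaults (`padding='pre'`, `truncating='pre'`, `value=0`): zeros are prepended, and sequences longer than `maxlen` lose their leading elements. It then treats the `Embedding` layer as a row lookup into a `(10, 8)` table. The helper `pad_pre` is hypothetical, and the random table values are an assumption (the real table is learned).

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) each sequence to length maxlen,
    mimicking the default behaviour of Keras pad_sequences."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        trimmed = seq[-maxlen:]                   # keep the last maxlen tokens
        out[row, maxlen - len(trimmed):] = trimmed
    return out

X = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
X_padded = pad_pre(X, maxlen=6)
print(X_padded[0])                       # [0 0 1 2 3 4]

# Embedding as a lookup table: one 8-dim row per token id in [0, 10)
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(10, 8))
embedded = embedding_table[X_padded]
print(X_padded.shape, embedded.shape)    # (4, 6) (4, 6, 8)
```

From there, `LSTM(16)` reduces each `(6, 8)` sequence to its final 16-dim hidden state, giving `(4, 16)`, and `Dense(1, activation='sigmoid')` maps that to `(4, 1)` probabilities.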

## 4. Key Notes
- LSTMs handle **long-term dependencies** better than vanilla RNNs.
- Can model sequences like text paragraphs or long time-series.
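- Heavier than simple RNNs in computation: an LSTM layer has four weight blocks (three gates plus the candidate update), so four times the parameters of a `SimpleRNN` with the same number of units.
- Alternatives: **GRU**, which uses two gates (update and reset) and no separate cell state, so it has fewer parameters and is often faster to train.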
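The cost difference comes from the number of weight blocks per layer. A minimal sketch, assuming the standard formulation in which each block has an input kernel, a recurrent kernel, and one bias vector: SimpleRNN has one block, GRU three (update gate, reset gate, candidate), and LSTM four (forget, input, output gates plus candidate). The helper `recurrent_params` is hypothetical, written for this illustration.

```python
def recurrent_params(input_dim, units, weight_sets):
    """Parameters of a recurrent layer with `weight_sets` stacked blocks, each holding
    an input kernel (input_dim x units), a recurrent kernel (units x units) and a bias (units)."""
    return weight_sets * (units * (input_dim + units) + units)

input_dim, units = 8, 16   # matches Embedding(output_dim=8) -> LSTM(16) in section 3
simple_rnn = recurrent_params(input_dim, units, weight_sets=1)  # one tanh block
gru = recurrent_params(input_dim, units, weight_sets=3)         # update, reset, candidate
lstm = recurrent_params(input_dim, units, weight_sets=4)        # forget, input, candidate, output
print(simple_rnn, gru, lstm)  # 400 1200 1600
```

For the `LSTM(16)` layer in section 3 this gives 1600 parameters, which is what `model.summary()` should report for that layer. Keras's `GRU` defaults to `reset_after=True`, which adds a second bias per block, so it reports 3 × 16 = 48 more parameters than this formula.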