# 04. Long Short-Term Memory (LSTM)

## Why LSTM?
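Standard RNNs struggle to retain information over many time steps: gradients backpropagated through time are multiplied repeatedly and tend to vanish, so early inputs have little influence on learning. LSTMs were designed to learn long-term dependencies.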
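To make the vanishing-gradient effect concrete, here is a minimal sketch using a hypothetical scalar RNN, $h_t = \tanh(w \, h_{t-1} + x_t)$, with random inputs. The function name `gradient_through_time`, the weight `w = 0.9`, and the toy model itself are assumptions chosen for illustration, not something taken from a library.

```python
import numpy as np

def gradient_through_time(w, steps, seed=0):
    """Return |d h_t / d h_0| for t = 1..steps in a scalar tanh RNN.

    Toy model (an assumption for illustration): h_t = tanh(w * h_{t-1} + x_t),
    with x_t drawn from a standard normal distribution.
    """
    rng = np.random.default_rng(seed)
    h = 0.0
    grad = 1.0
    history = []
    for _ in range(steps):
        h = np.tanh(w * h + rng.normal())
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2), which is below 1 in magnitude when |w| < 1
        grad *= w * (1.0 - h ** 2)
        history.append(abs(grad))
    return history

grads = gradient_through_time(w=0.9, steps=50)
print(f"after  1 step : {grads[0]:.3e}")
print(f"after 10 steps: {grads[9]:.3e}")
print(f"after 50 steps: {grads[49]:.3e}")
```

Each factor in the product has magnitude at most `w`, so after 50 steps the gradient is bounded by $0.9^{50}$ and is usually far smaller once the tanh saturates.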

## The Core Idea
The key to LSTMs is the **Cell State** ($C_t$), which runs straight down the entire chain with only minor linear interactions. It's like a conveyor belt.

LSTMs remove or add information to the cell state using **Gates**:
1. **Forget Gate**: Decides what information to throw away from the cell state.
2. **Input Gate**: Decides what new information to store in the cell state.
3. **Output Gate**: Decides what to output based on the cell state.

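## Mathematical Equations

Notation: $\sigma$ is the logistic sigmoid, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and $*$ denotes element-wise multiplication.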

Forget Gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ 

Input Gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ 

Candidate Cell: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ 

Cell Update: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$ 

Output Gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ 

Hidden State: $h_t = o_t * \tanh(C_t)$
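The six equations above can be sketched as a single time step in plain NumPy. This is a minimal illustration under stated assumptions: the function `lstm_step`, the dictionary layout of `W` and `b`, and the random initialisation are all hypothetical choices, and the code is not a reproduction of how `tf.keras.layers.LSTM` stores or orders its weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the equations above.

    W maps a gate name to a matrix of shape (hidden, hidden + features);
    b maps a gate name to a bias vector of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell
    c_t = f_t * c_prev + i_t * c_tilde       # cell update (element-wise)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
features, hidden = 10, 20
W = {g: rng.normal(scale=0.1, size=(hidden, hidden + features)) for g in "fico"}
b = {g: np.zeros(hidden) for g in "fico"}

# Run five time steps, starting from zero hidden and cell states
h = np.zeros(hidden)
c = np.zeros(hidden)
for x_t in rng.normal(size=(5, features)):
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)
```

Note that the direct path from $C_{t-1}$ to $C_t$ is only scaled by $f_t$, so when the forget gate stays close to 1 the cell state (and its gradient) passes through largely unchanged.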

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.models import Sequential

# Let's verify dimensions of an LSTM layer manually

# Input: Batch=1, TimeSteps=5, Features=10
x = np.random.randn(1, 5, 10).astype(np.float32)

# LSTM Layer with 20 hidden units
lstm_layer = LSTM(20, return_sequences=True, return_state=True)

# Forward pass
outputs, h_state, c_state = lstm_layer(x)

print("Input shape:", x.shape)
print("Output shape (sequences):", outputs.shape)   # (1, 5, 20)
print("Hidden State h shape:", h_state.shape)       # (1, 20)
print("Cell State c shape:", c_state.shape)         # (1, 20)

# Observe: Output at the last time step equals the hidden state
print("Difference between last output and h_state:", np.sum(np.abs(outputs[0, -1, :] - h_state)))



Input shape: (1, 5, 10)
Output shape (sequences): (1, 5, 20)
Hidden State h shape: (1, 20)
Cell State c shape: (1, 20)
Difference between last output and h_state: 0.0
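The shapes above also determine the number of trainable parameters. As a rough check, assuming the standard formulation with four weight matrices (forget, input, candidate, output) and one bias vector per gate, the count can be computed by hand. The helper name `lstm_param_count` is hypothetical.

```python
def lstm_param_count(features, units):
    """Parameters in one LSTM layer under the standard four-gate formulation.

    Each gate has a weight matrix of shape (units, units + features)
    and a bias vector of shape (units,).
    """
    per_gate = units * (units + features) + units
    return 4 * per_gate

# Layer from the cell above: 10 input features, 20 hidden units
print(lstm_param_count(features=10, units=20))  # 4 * (20 * 30 + 20) = 2480
```

Comparing this number with the layer's entry in `model.summary()` is a quick way to confirm that the input dimension is what you expect.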


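## Task: Square/Cube Prediction using LSTM

The cell below sets up a simpler warm-up version of the task: predicting the next integer ($i \to i + 1$).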

In [2]:
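# Simple task: predict the next number in a sequence (i -> i + 1)
# X has shape (100, 1, 1): 100 samples, 1 time step, 1 feature
X = np.array([[[i]] for i in range(100)], dtype=np.float32)
y = np.array([i + 1 for i in range(100)], dtype=np.float32)

model = Sequential()
# An explicit Input object avoids the warning Keras raises when input_shape is passed to a layer
model.add(tf.keras.Input(shape=(1, 1)))
model.add(LSTM(64))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()

# Train (mock run: left commented out, so the model above stays untrained)
# model.fit(X, y, epochs=100, verbose=0)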
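For the square or cube variant named in the heading, the targets grow quickly (up to $99^2 = 9801$ for squares), which makes unscaled training difficult. One possible way to prepare the data is sketched below, with inputs and targets scaled to $[0, 1]$. The helper `make_power_dataset` and the choice of max-scaling are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def make_power_dataset(n=100, power=2):
    """Build (X, y) for predicting i**power from i, both scaled to [0, 1].

    Returns the scale factors as well so predictions can be mapped back.
    """
    i = np.arange(n, dtype=np.float32)
    x_scale = float(n - 1)
    y_scale = float((n - 1) ** power)
    X = (i / x_scale).reshape(n, 1, 1)   # (samples, time steps, features)
    y = (i ** power) / y_scale
    return X, y, x_scale, y_scale

X_sq, y_sq, x_scale, y_scale = make_power_dataset(n=100, power=2)
print(X_sq.shape, y_sq.shape)
```

A model trained on `X_sq` and `y_sq` would output scaled values; multiplying a prediction by `y_scale` recovers the original units.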

