# Week 14: Recurrent Networks - LSTM, GRU for Sequences

## 🎯 Learning Objectives

By the end of this week, you will understand:
- **RNN Fundamentals**: Sequential processing
- **LSTM**: Long Short-Term Memory gates
- **GRU**: Gated Recurrent Units (simplified LSTM)
- **Finance Applications**: Time series forecasting

---

## Why Recurrent Networks?

Financial data is sequential - today depends on yesterday:
- Price history matters
- Patterns repeat at different time scales
- Order of events is crucial

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)
print("✅ Libraries loaded!")
print("📚 Week 14: Recurrent Networks")

---

## Part 1: RNN Fundamentals

### The Idea

RNNs maintain a hidden state that carries information through time:

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$$
$$y_t = W_{hy} h_t + b_y$$

### 🤔 Simple Explanation

An RNN has "memory" - it processes sequences one step at a time, updating its internal state. Think of it as reading a book word by word while remembering context.

In [None]:
# Simple RNN cell (conceptual)
class SimpleRNNCell:
    def __init__(self, input_size, hidden_size):
        self.Wxh = np.random.randn(hidden_size, input_size) * 0.01
        self.Whh = np.random.randn(hidden_size, hidden_size) * 0.01
        self.bh = np.zeros((hidden_size, 1))
        self.hidden_size = hidden_size
    
    def forward(self, x, h_prev):
        """Single step forward pass"""
        h_new = np.tanh(self.Wxh @ x + self.Whh @ h_prev + self.bh)
        return h_new

# Demonstrate RNN processing
rnn = SimpleRNNCell(input_size=3, hidden_size=4)
sequence = np.random.randn(5, 3, 1)  # 5 time steps, 3 features
h = np.zeros((4, 1))  # Initial hidden state

print("RNN Forward Pass")
print("="*50)
for t, x in enumerate(sequence):
    h = rnn.forward(x, h)
    print(f"Time {t}: Hidden state shape = {h.shape}, values = {h.ravel()[:2]}...")

---

## Part 2: LSTM - Long Short-Term Memory

### The Problem with Vanilla RNN

- Vanishing gradients → can't learn long-term dependencies
- Exploding gradients → unstable training (usually handled with gradient clipping)
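
A small numerical sketch can make the two failure modes concrete. Backpropagation through time multiplies the gradient by $W_{hh}^\top$ (times the tanh derivative, which is at most 1) once per step, so its norm shrinks or grows geometrically. The example below is an illustration under simplifying assumptions, not a full BPTT implementation: it ignores the tanh derivative and uses a scaled orthogonal matrix as a hypothetical $W_{hh}$, so every singular value equals `scale`.

```python
import numpy as np

def backprop_norms(scale, hidden_size=4, steps=50, seed=0):
    """Norm of a gradient vector after repeated multiplication by W_hh^T.

    Assumption for illustration: W_hh is a scaled orthogonal matrix, so the
    norm changes by exactly a factor of `scale` at every step.
    """
    rng = np.random.default_rng(seed)
    # QR decomposition of a random matrix yields an orthogonal matrix q
    q, _ = np.linalg.qr(rng.standard_normal((hidden_size, hidden_size)))
    W_hh = scale * q
    grad = np.ones(hidden_size)  # gradient arriving at the last time step
    norms = []
    for _ in range(steps):
        grad = W_hh.T @ grad     # one step backwards in time
        norms.append(np.linalg.norm(grad))
    return np.array(norms)

vanishing = backprop_norms(scale=0.5)
exploding = backprop_norms(scale=1.5)
print(f"scale=0.5: gradient norm after 50 steps = {vanishing[-1]:.3e}")
print(f"scale=1.5: gradient norm after 50 steps = {exploding[-1]:.3e}")
```

With `scale=0.5` the norm collapses toward zero within a few dozen steps; with `scale=1.5` it blows up. Both effects get worse as the sequence gets longer.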

### LSTM Gates

1. **Forget Gate**: What to forget from cell state
2. **Input Gate**: What new info to add
3. **Output Gate**: What to output

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$
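
The six equations map almost line for line onto code. Below is a minimal NumPy sketch of a single LSTM step, in the same spirit as the conceptual `SimpleRNNCell` from Part 1. The function name `lstm_step`, the parameter dictionary, the sizes, and the random weights are all hypothetical and chosen only for illustration; real libraries fuse the four weight matrices for speed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; each line mirrors one of the equations above."""
    concat = np.vstack([h_prev, x_t])                           # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                          # new cell state
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                                    # new hidden state
    return h_t, c_t

# Hypothetical sizes and small random weights, for illustration only
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
lstm_params = {}
for gate in ("f", "i", "C", "o"):
    lstm_params[f"W_{gate}"] = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
    lstm_params[f"b_{gate}"] = np.zeros((hidden_size, 1))

h_lstm = np.zeros((hidden_size, 1))
c_lstm = np.zeros((hidden_size, 1))
for x_t in rng.standard_normal((5, input_size, 1)):  # 5 time steps, 3 features
    h_lstm, c_lstm = lstm_step(x_t, h_lstm, c_lstm, lstm_params)

print("h_t shape:", h_lstm.shape, "| C_t shape:", c_lstm.shape)
```

Note that the cell state is updated additively (`f_t * c_prev + i_t * c_tilde`) rather than being passed through a squashing nonlinearity at every step. This is what lets gradients survive over longer spans than in a vanilla RNN.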

### ü§î Simple Explanation

LSTM has gates that control information flow:
- Forget gate: "Should I forget yesterday's news?"
- Input gate: "Is today's info worth remembering?"
- Output gate: "What should I report now?"

In [None]:
# Visualize LSTM gates
fig, ax = plt.subplots(figsize=(12, 6))

# Create a diagram
ax.text(0.1, 0.5, 'x_t\n(input)', ha='center', fontsize=12, bbox=dict(boxstyle='round', facecolor='lightblue'))
ax.text(0.3, 0.8, 'Forget\nGate', ha='center', fontsize=10, bbox=dict(boxstyle='round', facecolor='lightyellow'))
ax.text(0.5, 0.8, 'Input\nGate', ha='center', fontsize=10, bbox=dict(boxstyle='round', facecolor='lightgreen'))
ax.text(0.7, 0.8, 'Output\nGate', ha='center', fontsize=10, bbox=dict(boxstyle='round', facecolor='lightcoral'))
ax.text(0.5, 0.5, 'Cell State\nC_t', ha='center', fontsize=12, bbox=dict(boxstyle='round', facecolor='lightgray'))
ax.text(0.9, 0.5, 'h_t\n(output)', ha='center', fontsize=12, bbox=dict(boxstyle='round', facecolor='lightblue'))

# Arrows
ax.annotate('', xy=(0.3, 0.6), xytext=(0.15, 0.5), arrowprops=dict(arrowstyle='->'))
ax.annotate('', xy=(0.5, 0.6), xytext=(0.35, 0.75), arrowprops=dict(arrowstyle='->'))
ax.annotate('', xy=(0.65, 0.5), xytext=(0.55, 0.5), arrowprops=dict(arrowstyle='->'))
ax.annotate('', xy=(0.85, 0.5), xytext=(0.7, 0.65), arrowprops=dict(arrowstyle='->'))

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axis('off')
ax.set_title('LSTM Cell Structure (Simplified)')
plt.tight_layout()
plt.show()

---

## Part 3: LSTM for Price Prediction

### Data Preparation

For sequence models, we need to create sliding windows of historical data.
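
Before applying this to prices, a toy example shows what a sliding window produces. This is a sketch on a made-up series (`toy`), using NumPy's `sliding_window_view` (available in NumPy 1.20 and later) as a vectorized alternative to an explicit loop. The variable names are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

toy = np.arange(6, dtype=float)   # stand-in for a price series: 0, 1, ..., 5
window_len = 3

windows = sliding_window_view(toy, window_len)  # all windows of 3 consecutive values
X_toy = windows[:-1]        # drop the last window: no next value exists for it
y_toy = toy[window_len:]    # the value that immediately follows each window

for window, target in zip(X_toy, y_toy):
    print(window, "->", target)
```

Each row of `X_toy` is one input sequence and the matching entry of `y_toy` is the next value to predict, so a series of length `n` yields `n - window_len` training pairs.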

In [None]:
# Generate synthetic price data
n = 1000
np.random.seed(42)

# Geometric random walk with a small positive drift (no seasonality term)
returns = np.random.randn(n) * 0.02 + 0.0001
prices = 100 * np.cumprod(1 + returns)

# Create sequences
def create_sequences(data, seq_length):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])
        y.append(data[i+seq_length])
    return np.array(X), np.array(y)

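# Scale data: fit the scaler on the first 80% only (the training period),
# so test-period prices do not leak into the scaling; test values may fall outside [0, 1]
scaler = MinMaxScaler()
n_fit = int(len(prices) * 0.8)
scaler.fit(prices[:n_fit].reshape(-1, 1))
prices_scaled = scaler.transform(prices.reshape(-1, 1)).ravel()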

# Create sequences
seq_length = 20
X, y = create_sequences(prices_scaled, seq_length)

# Reshape for LSTM: (samples, time steps, features)
X = X.reshape(-1, seq_length, 1)

# Train/test split (no shuffle for time series!)
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Sequence shape: {X_train.shape}")

In [None]:
try:
    import torch
    import torch.nn as nn
    
    class LSTMPredictor(nn.Module):
        def __init__(self, input_size=1, hidden_size=50, num_layers=2):
            super().__init__()
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, 1)
        
        def forward(self, x):
            lstm_out, _ = self.lstm(x)
            return self.fc(lstm_out[:, -1, :])
    
    # Convert to tensors
    X_train_t = torch.FloatTensor(X_train)
    y_train_t = torch.FloatTensor(y_train).unsqueeze(1)
    X_test_t = torch.FloatTensor(X_test)
    y_test_t = torch.FloatTensor(y_test).unsqueeze(1)
    
    # Train
    model = LSTMPredictor()
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    
    epochs = 50
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        outputs = model(X_train_t)
        loss = criterion(outputs, y_train_t)
        loss.backward()
        optimizer.step()
        
        if (epoch + 1) % 10 == 0:
            print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.6f}")
    
    # Evaluate
    model.eval()
    with torch.no_grad():
        train_pred = model(X_train_t).numpy()
        test_pred = model(X_test_t).numpy()
    
    print(f"\nTest MSE (scaled units): {np.mean((test_pred.ravel() - y_test)**2):.6f}")
    
except ImportError:
    print("⚠️ PyTorch not installed. Skipping LSTM implementation.")
    print("Install with: pip install torch")

---

## Part 4: GRU - Gated Recurrent Unit

### Simplified LSTM

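GRU has only 2 gates (vs LSTM's 3) and no separate cell state:
- **Reset Gate**: How much of the previous hidden state to use when forming the new candidate state
- **Update Gate**: How much of the candidate state replaces the old hidden state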
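
One common way to write the GRU update, using the same notation as the LSTM equations in Part 2, is given below. This is one convention among several; some references, including the PyTorch `nn.GRU` documentation, swap the roles of $z_t$ and $1 - z_t$.

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$$
$$\tilde{h}_t = \tanh(W_h \cdot [r_t * h_{t-1}, x_t] + b_h)$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$

A minimal NumPy sketch of a single step under that convention follows. The function name `gru_step`, the sizes, and the random weights are hypothetical and for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU time step under the convention h_t = (1 - z) * h_prev + z * h_tilde."""
    concat = np.vstack([h_prev, x_t])                            # [h_{t-1}, x_t]
    z_t = sigmoid(params["W_z"] @ concat + params["b_z"])        # update gate
    r_t = sigmoid(params["W_r"] @ concat + params["b_r"])        # reset gate
    concat_reset = np.vstack([r_t * h_prev, x_t])                # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(params["W_h"] @ concat_reset + params["b_h"])  # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                  # blend old and new

# Hypothetical sizes and small random weights, for illustration only
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
gru_params = {}
for name in ("z", "r", "h"):
    gru_params[f"W_{name}"] = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
    gru_params[f"b_{name}"] = np.zeros((hidden_size, 1))

h_gru = np.zeros((hidden_size, 1))
for x_t in rng.standard_normal((5, input_size, 1)):  # 5 time steps, 3 features
    h_gru = gru_step(x_t, h_gru, gru_params)

print("h_t shape:", h_gru.shape)
```

Compared with the LSTM, there is one state vector instead of two and three weight matrices instead of four.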

### When to Use GRU vs LSTM

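- GRU: Fewer parameters and faster training; often a reasonable default for shorter sequences or smaller datasets
- LSTM: The separate cell state can help with longer sequences and more complex patterns; when in doubt, compare both on validation data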
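
The "fewer parameters" claim can be checked with simple arithmetic. The sketch below counts parameters for a single layer, assuming the textbook formulation with one weight matrix and one bias vector per gate (as in the LSTM equations in Part 2). The helper names are hypothetical, and library implementations may differ slightly: PyTorch, for instance, keeps two bias vectors per gate, so its counts come out a little higher.

```python
def lstm_param_count(input_size, hidden_size):
    # 4 blocks (forget, input, candidate, output), each with a
    # hidden x (hidden + input) weight matrix and a bias of length hidden
    return 4 * (hidden_size * (hidden_size + input_size) + hidden_size)

def gru_param_count(input_size, hidden_size):
    # 3 blocks (update, reset, candidate) with the same shapes
    return 3 * (hidden_size * (hidden_size + input_size) + hidden_size)

n_lstm = lstm_param_count(input_size=1, hidden_size=50)
n_gru = gru_param_count(input_size=1, hidden_size=50)
print(f"LSTM: {n_lstm} parameters")
print(f"GRU:  {n_gru} parameters ({n_gru / n_lstm:.0%} of LSTM)")
```

For the same input and hidden sizes, a GRU layer has three quarters of the parameters of an LSTM layer under this formulation.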

---

## Interview Questions

### Conceptual
1. What problem does LSTM solve that vanilla RNN cannot?
2. Explain the purpose of each gate in LSTM.
3. Why use sequences for financial forecasting?

### Technical
1. How do you choose sequence length?
2. What's the difference between stateful and stateless LSTM?
3. How do you handle variable length sequences?

### Finance-Specific
1. Can LSTM capture mean reversion?
2. How would you use LSTM for volatility forecasting?
3. What are the risks of using LSTM for trading?

---

## Key Takeaways

| Model | Complexity | Best For |
|-------|------------|----------|
| RNN | Simple | Short sequences |
| LSTM | Complex | Long-term dependencies |
| GRU | Medium | Balance of speed/performance |