# Week 14 - Day 4: Sequence-to-Sequence Models

## Learning Objectives
- Understand the **encoder-decoder architecture** for sequence modeling
- Implement **multi-step forecasting** approaches
- Master the **teacher forcing** technique for training
- Build a **multi-horizon return prediction** system

---

## Why Seq2Seq for Finance?

Many standard forecasting setups predict only one step ahead. In trading, we often need:
- **Multi-day forecasts** for portfolio rebalancing
- **Term structure predictions** (volatility curves)
- **Scenario generation** for risk management

Seq2Seq models map an **input sequence → output sequence**, and the two lengths need not be equal.

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Set seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

---

## 1. Data Preparation

We'll download stock data and prepare it for multi-horizon prediction.

In [None]:
# Download data
ticker = 'SPY'
data = yf.download(ticker, start='2015-01-01', end='2024-01-01', progress=False)

# Extract Close prices and compute returns
close = data['Close'].values.flatten()
returns = np.diff(np.log(close))  # Log returns

print(f"Total trading days: {len(close)}")
print(f"Returns shape: {returns.shape}")
print(f"Returns statistics:")
print(f"  Mean: {returns.mean():.6f}")
print(f"  Std:  {returns.std():.6f}")
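
The notebook uses log returns rather than simple returns. One property matters later: log returns add across days, so summing a horizon's daily log returns gives the log return of the whole holding period. Here is a quick check on made-up prices:

```python
import numpy as np

# Made-up prices, for illustration only
prices = np.array([100.0, 102.0, 99.0, 101.0])
log_ret = np.diff(np.log(prices))

# Sum of daily log returns equals the log of the total price ratio
total_log_ret = log_ret.sum()
print(np.isclose(total_log_ret, np.log(prices[-1] / prices[0])))  # True

# Convert back to a simple return over the whole period: exp(r) - 1
simple_ret = np.expm1(total_log_ret)
print(round(float(simple_ret), 4))  # 0.01
```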

In [None]:
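# Normalize returns. Fit the scaler on the training period only (roughly the
# first 70% of days, matching the split below) so test-period statistics don't leak in.
train_cutoff = int(0.7 * len(returns))
scaler = StandardScaler()
scaler.fit(returns[:train_cutoff].reshape(-1, 1))
returns_scaled = scaler.transform(returns.reshape(-1, 1)).flatten()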

# Visualization
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

axes[0].plot(close, color='blue', linewidth=0.8)
axes[0].set_title(f'{ticker} Close Prices', fontsize=12)
axes[0].set_xlabel('Days')
axes[0].set_ylabel('Price ($)')
axes[0].grid(True, alpha=0.3)

axes[1].plot(returns_scaled, color='green', linewidth=0.5, alpha=0.7)
axes[1].axhline(y=0, color='red', linestyle='--', linewidth=0.5)
axes[1].set_title('Normalized Log Returns', fontsize=12)
axes[1].set_xlabel('Days')
axes[1].set_ylabel('Scaled Return')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## 2. Sequence Dataset Creation

For Seq2Seq, we need:
- **Encoder input**: Past `seq_len` observations
- **Decoder target**: Future `forecast_horizon` observations
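
To show exactly which values land in each array, the sketch below applies the same windowing to a toy series 0..9 with `seq_len=3` and `forecast_horizon=2`. It uses NumPy's `sliding_window_view` (NumPy 1.20+) instead of an explicit loop. `make_windows` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, seq_len, horizon):
    # Each window holds seq_len past values followed by horizon future values
    windows = sliding_window_view(series, seq_len + horizon)
    X = windows[:, :seq_len]
    Y = windows[:, seq_len:]
    return X[..., None], Y[..., None]  # add a feature dim -> (n, length, 1)

series = np.arange(10, dtype=float)
X_toy, Y_toy = make_windows(series, seq_len=3, horizon=2)
print(X_toy.shape, Y_toy.shape)        # (6, 3, 1) (6, 2, 1)
print(X_toy[0, :, 0], Y_toy[0, :, 0])  # [0. 1. 2.] [3. 4.]
```

There are 10 - 3 - 2 + 1 = 6 windows, which matches the loop bound in `create_seq2seq_dataset`.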

In [None]:
def create_seq2seq_dataset(data, seq_len, forecast_horizon):
    """
    Create sequences for Seq2Seq model.
    
    Args:
        data: 1D array of values
        seq_len: Length of encoder input sequence
        forecast_horizon: Length of decoder output sequence
    
    Returns:
        X: Encoder inputs (batch, seq_len, 1)
        Y: Decoder targets (batch, forecast_horizon, 1)
    """
    X, Y = [], []
    
    for i in range(len(data) - seq_len - forecast_horizon + 1):
        X.append(data[i:i + seq_len])
        Y.append(data[i + seq_len:i + seq_len + forecast_horizon])
    
    X = np.array(X).reshape(-1, seq_len, 1)
    Y = np.array(Y).reshape(-1, forecast_horizon, 1)
    
    return X, Y


# Parameters
SEQ_LEN = 30           # Look back 30 days
FORECAST_HORIZON = 5   # Predict 5 days ahead

# Create dataset
X, Y = create_seq2seq_dataset(returns_scaled, SEQ_LEN, FORECAST_HORIZON)

print(f"Encoder input shape (X): {X.shape}")
print(f"Decoder target shape (Y): {Y.shape}")

In [None]:
# Train/Validation/Test split (70/15/15)
n = len(X)
train_size = int(0.7 * n)
val_size = int(0.15 * n)

X_train, Y_train = X[:train_size], Y[:train_size]
X_val, Y_val = X[train_size:train_size + val_size], Y[train_size:train_size + val_size]
X_test, Y_test = X[train_size + val_size:], Y[train_size + val_size:]

print(f"Training samples:   {len(X_train)}")
print(f"Validation samples: {len(X_val)}")
print(f"Test samples:       {len(X_test)}")

# Convert to PyTorch tensors
X_train_t = torch.FloatTensor(X_train).to(device)
Y_train_t = torch.FloatTensor(Y_train).to(device)
X_val_t = torch.FloatTensor(X_val).to(device)
Y_val_t = torch.FloatTensor(Y_val).to(device)
X_test_t = torch.FloatTensor(X_test).to(device)
Y_test_t = torch.FloatTensor(Y_test).to(device)

# Create DataLoaders
BATCH_SIZE = 64

train_dataset = TensorDataset(X_train_t, Y_train_t)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)

val_dataset = TensorDataset(X_val_t, Y_val_t)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
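
Consecutive samples overlap, so the last training targets cover some of the same days as the first validation targets. This slightly flatters validation scores. A common remedy is to drop a few samples at each boundary. The sketch below assumes a gap of `forecast_horizon - 1` samples, which is enough that no day appears in both a training target and a validation target. `purged_split` is a hypothetical helper and is not used elsewhere in the notebook.

```python
import numpy as np

def purged_split(n_samples, forecast_horizon, train_frac=0.7, val_frac=0.15):
    """Chronological train/val/test index split with a gap at each boundary.

    Sample i's target covers days i+seq_len ... i+seq_len+forecast_horizon-1,
    so skipping forecast_horizon-1 samples after a boundary means no target
    day is shared across splits.
    """
    gap = forecast_horizon - 1
    train_end = int(train_frac * n_samples)
    val_end = train_end + int(val_frac * n_samples)
    train_idx = np.arange(0, train_end)
    val_idx = np.arange(train_end + gap, val_end)
    test_idx = np.arange(val_end + gap, n_samples)
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = purged_split(100, forecast_horizon=5)
print(train_idx[-1], val_idx[0], val_idx[-1], test_idx[0])  # 69 74 84 89
```

To apply it, index `X` and `Y` with these arrays instead of the plain slices.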

---

## 3. Encoder-Decoder Architecture

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                    ENCODER-DECODER ARCHITECTURE                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ENCODER                          DECODER                      │
│   ┌───────────┐                    ┌───────────┐                │
│   │   LSTM    │──── context ───▶   │   LSTM    │                │
│   │  Layers   │    (h_n, c_n)      │  Layers   │                │
│   └───────────┘                    └───────────┘                │
│        ▲                                │                       │
│        │                                ▼                       │
│   [x₁, x₂, ..., xₜ]              [ŷ₁, ŷ₂, ..., ŷₕ]            │
│   (Input Sequence)               (Predicted Sequence)          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Key Components:
1. **Encoder**: Processes input sequence → produces context vector (hidden state)
2. **Context Vector**: Compressed representation of input sequence
3. **Decoder**: Uses context to generate output sequence step-by-step
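
For a plain `nn.LSTM`, the "context vector" is the pair `(h_n, c_n)`, with one slice per layer. Here is a quick shape check with arbitrary sizes (batch 8, 30 steps, 16 hidden units, 2 layers):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=16, num_layers=2, batch_first=True)
x = torch.randn(8, 30, 1)  # (batch, seq_len, input_size)

outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)         # torch.Size([8, 30, 16]): top layer, every step
print(h_n.shape, c_n.shape)  # torch.Size([2, 8, 16]) torch.Size([2, 8, 16])

# For a unidirectional LSTM, the top layer's final hidden state is the last output step
print(torch.allclose(outputs[:, -1, :], h_n[-1]))  # True
```

The last check is why the Direct model can use `lstm_out[:, -1, :]` as its summary of the input window.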

In [None]:
class Encoder(nn.Module):
    """LSTM Encoder for Seq2Seq model."""
    
    def __init__(self, input_size, hidden_size, num_layers, dropout=0.2):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )
    
    def forward(self, x):
        """
        Args:
            x: (batch, seq_len, input_size)
        Returns:
            hidden: tuple of (h_n, c_n), each (num_layers, batch, hidden_size)
        """
        # outputs: (batch, seq_len, hidden_size)
        # h_n, c_n: (num_layers, batch, hidden_size)
        outputs, (h_n, c_n) = self.lstm(x)
        return (h_n, c_n)


class Decoder(nn.Module):
    """LSTM Decoder for Seq2Seq model."""
    
    def __init__(self, input_size, hidden_size, output_size, num_layers, dropout=0.2):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.output_size = output_size
        
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )
        
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x, hidden):
        """
        Args:
            x: (batch, 1, input_size) - single timestep input
            hidden: tuple of (h, c)
        Returns:
            output: (batch, 1, output_size)
            hidden: updated hidden state
        """
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output)
        return output, hidden


print("Encoder-Decoder classes defined!")

---

## 4. Teacher Forcing Technique

### What is Teacher Forcing?

During training, the decoder can use:
- **Own predictions** (autoregressive) - errors compound
- **Ground truth** (teacher forcing) - faster convergence, but exposure bias

```
WITHOUT Teacher Forcing:          WITH Teacher Forcing:
                                  
  Decoder input = ŷₜ₋₁            Decoder input = yₜ₋₁ (true)
       │                               │
       ▼                               ▼
  [DECODER] ──▶ ŷₜ               [DECODER] ──▶ ŷₜ
       │                               │
       ▼                               ▼
  (Error accumulates)            (Faster learning)
```

### Teacher Forcing Ratio
- Start with high ratio (0.5-1.0) for fast initial learning
- Gradually decrease to 0 (scheduled sampling)
- Helps bridge train-test gap
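
The `Seq2Seq` class below flips one coin per decoding step for the whole batch. A variant some implementations use is to decide per sample, so each batch mixes teacher-forced and free-running sequences. The sketch below shows that single step. `next_decoder_input` is a hypothetical helper.

```python
import torch

def next_decoder_input(prediction, ground_truth, tf_ratio, generator=None):
    """Pick the next decoder input per sample.

    prediction, ground_truth: (batch, 1, features)
    Each sample uses the ground truth with probability tf_ratio,
    otherwise the model's own prediction.
    """
    batch_size = prediction.shape[0]
    coin = torch.rand(batch_size, 1, 1, generator=generator, device=prediction.device)
    use_truth = coin < tf_ratio  # (batch, 1, 1), broadcasts over features
    return torch.where(use_truth, ground_truth, prediction)


gen = torch.Generator().manual_seed(0)
pred = torch.zeros(4, 1, 1)
truth = torch.ones(4, 1, 1)
print(next_decoder_input(pred, truth, tf_ratio=1.0, generator=gen).flatten())  # tensor([1., 1., 1., 1.])
print(next_decoder_input(pred, truth, tf_ratio=0.0, generator=gen).flatten())  # tensor([0., 0., 0., 0.])
```

Inside `Seq2Seq.forward`, this would replace the `np.random.random() < teacher_forcing_ratio` branch when `trg` is given.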

In [None]:
class Seq2Seq(nn.Module):
    """Sequence-to-Sequence model with teacher forcing support."""
    
    def __init__(self, encoder, decoder, forecast_horizon, device):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.forecast_horizon = forecast_horizon
        self.device = device
    
    def forward(self, src, trg=None, teacher_forcing_ratio=0.5):
        """
        Args:
            src: Encoder input (batch, seq_len, input_size)
            trg: Decoder target (batch, forecast_horizon, output_size) - optional
            teacher_forcing_ratio: Probability of using ground truth as input
        
        Returns:
            outputs: (batch, forecast_horizon, output_size)
        """
        batch_size = src.shape[0]
        output_size = self.decoder.output_size
        
        # Store outputs
        outputs = torch.zeros(batch_size, self.forecast_horizon, output_size).to(self.device)
        
        # Encode input sequence
        hidden = self.encoder(src)
        
        # First decoder input: last value of encoder input
        decoder_input = src[:, -1:, :]  # (batch, 1, input_size)
        
        # Decode step by step
        for t in range(self.forecast_horizon):
            output, hidden = self.decoder(decoder_input, hidden)
            outputs[:, t:t+1, :] = output
            
            # Teacher forcing decision
            if trg is not None and np.random.random() < teacher_forcing_ratio:
                # Use ground truth as next input
                decoder_input = trg[:, t:t+1, :]
            else:
                # Use own prediction as next input
                decoder_input = output
        
        return outputs


print("Seq2Seq model with teacher forcing defined!")

In [None]:
# Model hyperparameters
INPUT_SIZE = 1
HIDDEN_SIZE = 64
NUM_LAYERS = 2
OUTPUT_SIZE = 1
DROPOUT = 0.2

# Initialize model components
encoder = Encoder(INPUT_SIZE, HIDDEN_SIZE, NUM_LAYERS, DROPOUT).to(device)
decoder = Decoder(INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE, NUM_LAYERS, DROPOUT).to(device)
model = Seq2Seq(encoder, decoder, FORECAST_HORIZON, device).to(device)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Model Architecture:")
print(f"  Hidden Size: {HIDDEN_SIZE}")
print(f"  Num Layers: {NUM_LAYERS}")
print(f"  Forecast Horizon: {FORECAST_HORIZON}")
print(f"\nTotal Parameters: {total_params:,}")
print(f"Trainable Parameters: {trainable_params:,}")
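
The printed count can be checked by hand. PyTorch's LSTM has four gates per layer. Each gate has an input weight matrix, a recurrent weight matrix, and two bias vectors (`b_ih` and `b_hh`). The sketch below applies that formula to the notebook's sizes and compares it against `nn.LSTM` itself.

```python
import torch.nn as nn

def lstm_param_count(input_size, hidden_size, num_layers):
    # Per layer: 4 gates x (W_ih + W_hh + b_ih + b_hh)
    total = 0
    for layer in range(num_layers):
        in_dim = input_size if layer == 0 else hidden_size
        total += 4 * (hidden_size * in_dim + hidden_size * hidden_size + 2 * hidden_size)
    return total

enc = lstm_param_count(1, 64, 2)
dec = lstm_param_count(1, 64, 2) + (64 * 1 + 1)  # plus the Linear head (weights + bias)
print(enc, dec, enc + dec)  # 50432 50497 100929

ref = sum(p.numel() for p in nn.LSTM(1, 64, num_layers=2).parameters())
print(ref == enc)  # True
```

With `HIDDEN_SIZE = 64` and `NUM_LAYERS = 2`, this should match the total printed above.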

---

## 5. Training with Scheduled Teacher Forcing

We'll implement **scheduled sampling** where teacher forcing ratio decreases over epochs.
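
With `INITIAL_TF_RATIO = 0.5` and `TF_DECAY = 0.95`, the ratio in epoch e (counting from 0) is 0.5 · 0.95^e. It shrinks quickly but never reaches exactly 0; it is about 0.003 in the last of 100 epochs. The sketch below compares that exponential schedule with a linear one that does reach 0. Both function names are hypothetical.

```python
def exponential_tf(epoch, initial=0.5, decay=0.95):
    # Geometric decay, as in the training loop: initial * decay**epoch
    return initial * decay ** epoch

def linear_tf(epoch, initial=0.5, total_epochs=100):
    # Straight line from `initial` down to 0 at `total_epochs`, clipped at 0 afterwards
    return max(0.0, initial * (1 - epoch / total_epochs))

for e in (0, 10, 50, 99):
    print(e, round(exponential_tf(e), 4), round(linear_tf(e), 4))
```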

In [None]:
def train_epoch(model, loader, optimizer, criterion, teacher_forcing_ratio):
    """Train for one epoch."""
    model.train()
    total_loss = 0
    
    for X_batch, Y_batch in loader:
        optimizer.zero_grad()
        
        # Forward pass with teacher forcing
        outputs = model(X_batch, Y_batch, teacher_forcing_ratio)
        
        # Compute loss
        loss = criterion(outputs, Y_batch)
        
        # Backward pass
        loss.backward()
        
        # Gradient clipping to prevent exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        
        optimizer.step()
        total_loss += loss.item()
    
    return total_loss / len(loader)


def evaluate(model, loader, criterion):
    """Evaluate model (no teacher forcing)."""
    model.eval()
    total_loss = 0
    
    with torch.no_grad():
        for X_batch, Y_batch in loader:
            # Forward pass without teacher forcing
            outputs = model(X_batch, None, teacher_forcing_ratio=0.0)
            loss = criterion(outputs, Y_batch)
            total_loss += loss.item()
    
    return total_loss / len(loader)


print("Training and evaluation functions defined!")

In [None]:
# Training configuration
EPOCHS = 100
LEARNING_RATE = 0.001
INITIAL_TF_RATIO = 0.5  # Initial teacher forcing ratio
TF_DECAY = 0.95         # Decay rate per epoch

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)

# Training history
train_losses = []
val_losses = []
tf_ratios = []

best_val_loss = float('inf')
best_model_state = None

print("Starting training with scheduled teacher forcing...\n")

tf_ratio = INITIAL_TF_RATIO

for epoch in range(EPOCHS):
    # Train
    train_loss = train_epoch(model, train_loader, optimizer, criterion, tf_ratio)
    
    # Validate
    val_loss = evaluate(model, val_loader, criterion)
    
    # Learning rate scheduling
    scheduler.step(val_loss)
    
    # Store history
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    tf_ratios.append(tf_ratio)
    
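    # Save best model (clone the tensors: state_dict() holds references to the live weights)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_model_state = {k: v.detach().clone() for k, v in model.state_dict().items()}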
    
    # Decay teacher forcing ratio
    tf_ratio = max(0.0, tf_ratio * TF_DECAY)
    
    # Print progress
    if (epoch + 1) % 10 == 0:
        current_lr = optimizer.param_groups[0]['lr']
        print(f"Epoch {epoch+1:3d}/{EPOCHS} | "
              f"Train Loss: {train_loss:.6f} | "
              f"Val Loss: {val_loss:.6f} | "
              f"TF Ratio: {tf_ratios[-1]:.3f} | "  # ratio used during this epoch
              f"LR: {current_lr:.6f}")

# Load best model
model.load_state_dict(best_model_state)
print(f"\nBest validation loss: {best_val_loss:.6f}")
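
Snapshotting the best weights requires a real copy. `state_dict()` returns tensors that share storage with the model's parameters, and `dict.copy()` copies only the dict, not the tensors. As a result, later optimizer steps also overwrite the "saved" weights. Here is a small check on a toy layer:

```python
import copy
import torch
import torch.nn as nn

net = nn.Linear(2, 1)
shallow = net.state_dict().copy()       # new dict, same tensors
deep = copy.deepcopy(net.state_dict())  # independent tensors

with torch.no_grad():
    net.weight.add_(1.0)                # stands in for an optimizer step

print(torch.equal(shallow['weight'], net.weight))  # True: snapshot changed with the model
print(torch.equal(deep['weight'], net.weight))     # False: snapshot preserved
```

Either `copy.deepcopy(model.state_dict())` or cloning each tensor gives a safe snapshot.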

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss curves
axes[0].plot(train_losses, label='Train Loss', color='blue', linewidth=1.5)
axes[0].plot(val_losses, label='Val Loss', color='red', linewidth=1.5)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('MSE Loss')
axes[0].set_title('Training and Validation Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].set_yscale('log')

# Teacher forcing ratio decay
axes[1].plot(tf_ratios, color='green', linewidth=2)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Teacher Forcing Ratio')
axes[1].set_title('Scheduled Sampling (TF Ratio Decay)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## 6. Multi-Step Forecasting Approaches

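Let's compare different strategies for multi-step prediction. The baseline trained below is the **Multi-Output** variant (one LSTM whose head emits all horizons at once), which the code calls the "Direct" model: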

| Approach | Description | Pros | Cons |
|----------|-------------|------|------|
| **Direct** | Separate model per horizon | No error accumulation | Many models to train |
| **Recursive** | Feed predictions back | One model | Error compounds |
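| **Seq2Seq** | Encoder-decoder; decoder emits one step at a time | One model, end-to-end | More complex; decoding can still compound errors |
| **Multi-Output** | One network predicts all horizons in a single forward pass | Fast inference | Horizon fixed at training time |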

In [None]:
# Multi-Output Direct Prediction Model (for comparison)
class DirectMultiStepLSTM(nn.Module):
    """Direct multi-step prediction: outputs all horizons at once."""
    
    def __init__(self, input_size, hidden_size, num_layers, output_horizon):
        super(DirectMultiStepLSTM, self).__init__()
        
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.2 if num_layers > 1 else 0  # dropout only acts between stacked layers
        )
        
        # Output all horizons in one shot
        self.fc = nn.Linear(hidden_size, output_horizon)
    
    def forward(self, x):
        # x: (batch, seq_len, input_size)
        lstm_out, _ = self.lstm(x)
        # Use last hidden state
        last_hidden = lstm_out[:, -1, :]  # (batch, hidden_size)
        output = self.fc(last_hidden)     # (batch, output_horizon)
        return output.unsqueeze(-1)       # (batch, output_horizon, 1)


# Initialize direct model
direct_model = DirectMultiStepLSTM(INPUT_SIZE, HIDDEN_SIZE, NUM_LAYERS, FORECAST_HORIZON).to(device)

print(f"Direct Multi-Step Model Parameters: {sum(p.numel() for p in direct_model.parameters()):,}")

In [None]:
# Train direct model for comparison
direct_optimizer = optim.Adam(direct_model.parameters(), lr=LEARNING_RATE)
direct_train_losses = []
direct_val_losses = []

print("Training Direct Multi-Step Model...\n")

for epoch in range(EPOCHS):
    # Train
    direct_model.train()
    train_loss = 0
    for X_batch, Y_batch in train_loader:
        direct_optimizer.zero_grad()
        outputs = direct_model(X_batch)
        loss = criterion(outputs, Y_batch)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(direct_model.parameters(), max_norm=1.0)
        direct_optimizer.step()
        train_loss += loss.item()
    
    # Validate
    direct_model.eval()
    val_loss = 0
    with torch.no_grad():
        for X_batch, Y_batch in val_loader:
            outputs = direct_model(X_batch)
            val_loss += criterion(outputs, Y_batch).item()
    
    direct_train_losses.append(train_loss / len(train_loader))
    direct_val_losses.append(val_loss / len(val_loader))
    
    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1:3d}/{EPOCHS} | "
              f"Train: {direct_train_losses[-1]:.6f} | "
              f"Val: {direct_val_losses[-1]:.6f}")
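
This comparison leaves out the **Recursive** row of the table. For reference, the sketch below shows that strategy: a single one-step model is applied repeatedly, and each prediction is appended to the input window. The `one_step` model here is a made-up stand-in for a trained network.

```python
import numpy as np

def recursive_forecast(one_step, history, horizon):
    """Roll a one-step model forward `horizon` times, feeding predictions back."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        y_hat = float(one_step(np.asarray(window)))
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the prediction
    return np.array(preds)

def one_step(window):
    # Hypothetical toy model: next value is half of the latest value
    return 0.5 * window[-1]

print(recursive_forecast(one_step, [0.0, 0.0, 8.0], horizon=3))  # [4. 2. 1.]
```

Any error in the first prediction is carried into every later input. This is the error accumulation the table refers to.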

---

## 7. Multi-Horizon Return Prediction - Evaluation

Now let's evaluate both models on the test set and analyze per-horizon performance. All errors are reported in scaled (standardized) units.

In [None]:
def predict_and_evaluate(model, X, Y, model_name, is_seq2seq=True):
    """
    Generate predictions and compute metrics.
    
    Returns:
        predictions: numpy array of shape (n_samples, horizon, 1)
        metrics: DataFrame with MSE, MAE and RMSE per horizon
    """
    model.eval()
    
    with torch.no_grad():
        if is_seq2seq:
            predictions = model(X, None, teacher_forcing_ratio=0.0)
        else:
            predictions = model(X)
    
    predictions = predictions.cpu().numpy()
    Y_np = Y.cpu().numpy()
    
    # Per-horizon metrics
    metrics = {'horizon': [], 'mse': [], 'mae': [], 'rmse': []}
    
    for h in range(predictions.shape[1]):
        pred_h = predictions[:, h, 0]
        true_h = Y_np[:, h, 0]
        
        mse = mean_squared_error(true_h, pred_h)
        mae = mean_absolute_error(true_h, pred_h)
        
        metrics['horizon'].append(h + 1)
        metrics['mse'].append(mse)
        metrics['mae'].append(mae)
        metrics['rmse'].append(np.sqrt(mse))
    
    return predictions, pd.DataFrame(metrics)


# Evaluate Seq2Seq model
seq2seq_preds, seq2seq_metrics = predict_and_evaluate(
    model, X_test_t, Y_test_t, "Seq2Seq", is_seq2seq=True
)

# Evaluate Direct model
direct_preds, direct_metrics = predict_and_evaluate(
    direct_model, X_test_t, Y_test_t, "Direct", is_seq2seq=False
)

print("="*60)
print("SEQ2SEQ MODEL - Per-Horizon Metrics")
print("="*60)
print(seq2seq_metrics.to_string(index=False))

print("\n" + "="*60)
print("DIRECT MODEL - Per-Horizon Metrics")
print("="*60)
print(direct_metrics.to_string(index=False))
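
Raw MSE on standardized returns is hard to judge without a reference point. A simple baseline is the forecast that always predicts 0, which corresponds to the mean return the scaler was fitted on. The sketch below reports 1 - MSE_model / MSE_zero per horizon, so positive values mean the model beats that baseline. `skill_vs_zero` is a hypothetical helper.

```python
import numpy as np

def skill_vs_zero(predictions, actuals):
    """Per-horizon skill score relative to an always-zero forecast.

    predictions, actuals: arrays of shape (n_samples, horizon, 1) in scaled units.
    """
    mse_model = ((predictions - actuals) ** 2).mean(axis=(0, 2))
    mse_zero = (actuals ** 2).mean(axis=(0, 2))
    return 1.0 - mse_model / mse_zero


actuals = np.ones((4, 2, 1))
print(skill_vs_zero(np.full((4, 2, 1), 0.5), actuals))  # [0.75 0.75]
```

In the notebook, this would be called as `skill_vs_zero(seq2seq_preds, Y_test_t.cpu().numpy())`.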

In [None]:
# Visualization: Per-horizon comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

horizons = seq2seq_metrics['horizon'].values
x = np.arange(len(horizons))
width = 0.35

# MSE comparison
axes[0].bar(x - width/2, seq2seq_metrics['mse'], width, label='Seq2Seq', color='steelblue')
axes[0].bar(x + width/2, direct_metrics['mse'], width, label='Direct', color='coral')
axes[0].set_xlabel('Forecast Horizon (days)')
axes[0].set_ylabel('MSE')
axes[0].set_title('MSE by Forecast Horizon')
axes[0].set_xticks(x)
axes[0].set_xticklabels(horizons)
axes[0].legend()
axes[0].grid(True, alpha=0.3, axis='y')

# MAE comparison
axes[1].bar(x - width/2, seq2seq_metrics['mae'], width, label='Seq2Seq', color='steelblue')
axes[1].bar(x + width/2, direct_metrics['mae'], width, label='Direct', color='coral')
axes[1].set_xlabel('Forecast Horizon (days)')
axes[1].set_ylabel('MAE')
axes[1].set_title('MAE by Forecast Horizon')
axes[1].set_xticks(x)
axes[1].set_xticklabels(horizons)
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

In [None]:
# Sample predictions visualization
Y_test_np = Y_test_t.cpu().numpy()

# Select sample indices to visualize
sample_indices = [0, 50, 100, 150]

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

for idx, ax in zip(sample_indices, axes):
    horizons = range(1, FORECAST_HORIZON + 1)
    
    ax.plot(horizons, Y_test_np[idx, :, 0], 'ko-', label='Actual', markersize=8, linewidth=2)
    ax.plot(horizons, seq2seq_preds[idx, :, 0], 'bs--', label='Seq2Seq', markersize=6, linewidth=1.5)
    ax.plot(horizons, direct_preds[idx, :, 0], 'r^--', label='Direct', markersize=6, linewidth=1.5)
    
    ax.axhline(y=0, color='gray', linestyle=':', alpha=0.5)
    ax.set_xlabel('Forecast Horizon (days)')
    ax.set_ylabel('Scaled Return')
    ax.set_title(f'Sample {idx}: Multi-Horizon Prediction')
    ax.legend(loc='best')
    ax.grid(True, alpha=0.3)
    ax.set_xticks(horizons)

plt.tight_layout()
plt.show()

---

## 8. Directional Accuracy Analysis

For trading, the **direction** of a forecast often matters more than its exact value. Let's analyze directional accuracy.

In [None]:
def compute_directional_accuracy(predictions, actuals):
    """
    Compute directional accuracy per horizon.
    
    Returns:
        DataFrame with horizon and accuracy
    """
    results = {'horizon': [], 'accuracy': [], 'up_precision': [], 'down_precision': []}
    
    for h in range(predictions.shape[1]):
        pred_dir = np.sign(predictions[:, h, 0])
        true_dir = np.sign(actuals[:, h, 0])
        
        # Overall accuracy
        accuracy = np.mean(pred_dir == true_dir)
        
        # Precision for up/down predictions
        up_mask = pred_dir > 0
        down_mask = pred_dir < 0
        
        # NaN (not 0) when the model never predicts that direction: precision is undefined
        up_precision = np.mean(true_dir[up_mask] > 0) if up_mask.sum() > 0 else np.nan
        down_precision = np.mean(true_dir[down_mask] < 0) if down_mask.sum() > 0 else np.nan
        
        results['horizon'].append(h + 1)
        results['accuracy'].append(accuracy)
        results['up_precision'].append(up_precision)
        results['down_precision'].append(down_precision)
    
    return pd.DataFrame(results)


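# Compute directional accuracy on unscaled returns: after standardization,
# 0 corresponds to the mean return the scaler was fitted on, not to a flat day
def to_raw(arr):
    return scaler.inverse_transform(arr.reshape(-1, 1)).reshape(arr.shape)

seq2seq_dir = compute_directional_accuracy(to_raw(seq2seq_preds), to_raw(Y_test_np))
direct_dir = compute_directional_accuracy(to_raw(direct_preds), to_raw(Y_test_np))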

print("SEQ2SEQ - Directional Accuracy")
print(seq2seq_dir.round(4).to_string(index=False))

print("\nDIRECT - Directional Accuracy")
print(direct_dir.round(4).to_string(index=False))

In [None]:
# Visualization: Directional accuracy
fig, ax = plt.subplots(figsize=(10, 6))

x = np.arange(FORECAST_HORIZON)
width = 0.35

ax.bar(x - width/2, seq2seq_dir['accuracy'] * 100, width, label='Seq2Seq', color='steelblue')
ax.bar(x + width/2, direct_dir['accuracy'] * 100, width, label='Direct', color='coral')

ax.axhline(y=50, color='red', linestyle='--', linewidth=1.5, label='Random (50%)')

ax.set_xlabel('Forecast Horizon (days)', fontsize=12)
ax.set_ylabel('Directional Accuracy (%)', fontsize=12)
ax.set_title('Directional Accuracy by Horizon', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels([f'Day {i+1}' for i in range(FORECAST_HORIZON)])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
ax.set_ylim(40, 60)

plt.tight_layout()
plt.show()
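
Whether a bar above the 50% line means anything depends on how many test days there are. It also depends on the base rate: if most days are up, always predicting "up" already beats 50%. The sketch below gives a rough check using the normal approximation to a binomial test. It treats days as independent, which return series satisfy only approximately. `accuracy_z_score` is a hypothetical helper.

```python
import math

def accuracy_z_score(accuracy, n, p0=0.5):
    """Standard errors by which `accuracy` exceeds a null hit rate `p0`,
    using the normal approximation to Binomial(n, p0)."""
    return (accuracy - p0) / math.sqrt(p0 * (1 - p0) / n)

# 55% over 400 days vs. a coin flip: about 2 standard errors
print(round(accuracy_z_score(0.55, 400), 2))  # 2.0
# Same accuracy vs. an "always up" baseline that is right 55% of the time
print(round(accuracy_z_score(0.55, 400, p0=0.55), 2))  # 0.0
```

For the notebook, `n` is the number of test samples, and `p0` could be the share of positive actual returns in the test set.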

---

## 9. Practical Trading Application

Let's simulate a simple multi-horizon strategy using Seq2Seq predictions.
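
Test samples are built with stride 1, so each sample's 5-day target overlaps the targets of its neighbours. Summing returns across all samples therefore counts most days several times. The toy series below, with a 3-day horizon, shows the effect. Taking every third window instead counts each day exactly once.

```python
import numpy as np

daily = np.array([1.0, -1.0, 2.0, 0.5, 0.0, 1.5, -0.5, 1.0])  # made-up daily returns
h = 3

# Stride-1 windows, like consecutive test samples
windows = np.array([daily[i:i + h].sum() for i in range(len(daily) - h + 1)])

print(daily.sum())         # 4.5: each day counted once
print(windows.sum())       # 11.0: most days counted up to h times
print(windows[::h].sum())  # 4.0: non-overlapping windows cover days 0-5 once
```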

In [None]:
def backtest_multi_horizon(predictions, actuals, threshold=0.0, step=None):
    """
    Backtest using multi-horizon predictions.
    
    Strategy:
    - If average predicted return over horizon > threshold: Long
    - If average predicted return over horizon < -threshold: Short
    - Hold position for 'horizon' days
    
    `predictions` and `actuals` must be in the same units (unscaled log returns
    here). Consecutive samples overlap (stride 1), so only every `step`-th
    sample is traded (default: the horizon) to count each day's return once.
    """
    horizon = predictions.shape[1]
    step = horizon if step is None else step
    
    # Non-overlapping holding periods
    preds = predictions[::step]
    acts = actuals[::step]
    
    # Average prediction across horizons
    avg_pred = preds.mean(axis=1).flatten()
    
    # Generate signals
    signals = np.zeros(len(avg_pred))
    signals[avg_pred > threshold] = 1    # Long
    signals[avg_pred < -threshold] = -1  # Short
    
    # Actual log return over each holding period (log returns add up)
    period_returns = acts.sum(axis=1).flatten()
    
    # Strategy returns
    strategy_returns = signals * period_returns
    
    # Metrics (hit rate is measured over periods with an open position only)
    traded = signals != 0
    num_trades = int(traded.sum())
    hit_rate = np.mean(strategy_returns[traded] > 0) if num_trades > 0 else np.nan
    
    return {
        'signals': signals,
        'strategy_returns': strategy_returns,
        'benchmark_returns': period_returns,  # buy & hold over the same periods
        'total_return': np.sum(strategy_returns),
        'hit_rate': hit_rate,
        'num_trades': num_trades,
        # Annualized assuming ~252 trading days per year, one period = `horizon` days
        'sharpe': np.mean(strategy_returns) / (np.std(strategy_returns) + 1e-8) * np.sqrt(252 / horizon)
    }


# Convert scaled values back to log returns so signs and thresholds are meaningful
def to_raw(arr):
    return scaler.inverse_transform(arr.reshape(-1, 1)).reshape(arr.shape)

# Threshold of 0.1 daily standard deviations (the same 0.1 as before, in raw units)
threshold = 0.1 * scaler.scale_[0]

# Backtest both models
seq2seq_bt = backtest_multi_horizon(to_raw(seq2seq_preds), to_raw(Y_test_np), threshold=threshold)
direct_bt = backtest_multi_horizon(to_raw(direct_preds), to_raw(Y_test_np), threshold=threshold)

print("="*50)
print("MULTI-HORIZON BACKTEST RESULTS")
print("="*50)
print(f"\n{'Metric':<20} {'Seq2Seq':>12} {'Direct':>12}")
print("-"*44)
print(f"{'Total Log Return':.<20} {seq2seq_bt['total_return']:>12.4f} {direct_bt['total_return']:>12.4f}")
print(f"{'Hit Rate':.<20} {seq2seq_bt['hit_rate']:>12.2%} {direct_bt['hit_rate']:>12.2%}")
print(f"{'Num Trades':.<20} {seq2seq_bt['num_trades']:>12} {direct_bt['num_trades']:>12}")
print(f"{'Sharpe Ratio':.<20} {seq2seq_bt['sharpe']:>12.2f} {direct_bt['sharpe']:>12.2f}")

In [None]:
# Cumulative returns plot (non-overlapping holding periods)
fig, ax = plt.subplots(figsize=(12, 6))

seq2seq_cumret = np.cumsum(seq2seq_bt['strategy_returns'])
direct_cumret = np.cumsum(direct_bt['strategy_returns'])
buy_hold = np.cumsum(seq2seq_bt['benchmark_returns'])  # Buy and hold over the same periods

ax.plot(seq2seq_cumret, label='Seq2Seq Strategy', color='steelblue', linewidth=2)
ax.plot(direct_cumret, label='Direct Strategy', color='coral', linewidth=2)
ax.plot(buy_hold, label='Buy & Hold', color='gray', linestyle='--', linewidth=1.5)

ax.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax.set_xlabel(f'Holding Period ({FORECAST_HORIZON}-day, non-overlapping)')
ax.set_ylabel('Cumulative Log Return')
ax.set_title('Multi-Horizon Strategy Performance')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
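
The backtest ignores trading costs. The sketch below shows a minimal way to include them. It assumes a fixed cost per unit of position change, so flipping from long to short costs two units. The 5 bp default is an assumption, not an estimate for SPY. `apply_costs` is a hypothetical helper.

```python
import numpy as np

def apply_costs(signals, period_returns, cost_per_unit=0.0005):
    """Net strategy returns after charging a cost each time the position changes.

    signals: positions per holding period (+1 long, -1 short, 0 flat)
    period_returns: asset return over each holding period
    """
    positions = np.asarray(signals, dtype=float)
    # Position change vs. the previous period; the first trade starts from flat
    turnover = np.abs(np.diff(positions, prepend=0.0))
    return positions * np.asarray(period_returns) - cost_per_unit * turnover


net = apply_costs([1, 1, -1, 0], [0.01, 0.02, 0.01, 0.03], cost_per_unit=0.001)
print(np.round(net, 4))
```

It can be applied to a backtest's `signals` together with the actual return of each holding period.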

---

## 10. Key Takeaways

### Encoder-Decoder Architecture
- **Encoder** compresses input sequence into context vector
- **Decoder** generates output sequence from context
- Suitable for variable-length input/output sequences

### Teacher Forcing
- Speeds up training by using ground truth as decoder input
- **Scheduled sampling** gradually reduces TF ratio
- Helps bridge train-test distribution gap

### Multi-Step Forecasting
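- **Direct**: Train a separate model per horizon
- **Recursive**: Feed predictions back (error accumulation)
- **Multi-Output**: One forward pass for all horizons (the "Direct" baseline in this notebook)
- **Seq2Seq**: End-to-end encoder-decoder; whether it beats the simpler baselines is an empirical question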

### Financial Applications
- Multi-day return forecasting
- Volatility term structure prediction
- Risk scenario generation

---

## Next Steps
- Add an **attention mechanism**, which may help the decoder at longer horizons
- Experiment with a **bidirectional encoder**
- Incorporate **multivariate inputs** (OHLCV, technical indicators)
- Produce **probabilistic forecasts** (e.g. quantile outputs or a predicted mean and variance)

In [None]:
# Summary statistics
print("="*60)
print("SESSION SUMMARY")
print("="*60)
print(f"\nData: {ticker}")
print(f"Input Sequence Length: {SEQ_LEN} days")
print(f"Forecast Horizon: {FORECAST_HORIZON} days")
print(f"\nSeq2Seq Model:")
print(f"  - Encoder: {NUM_LAYERS}-layer LSTM ({HIDDEN_SIZE} hidden units)")
print(f"  - Decoder: {NUM_LAYERS}-layer LSTM ({HIDDEN_SIZE} hidden units)")
print(f"  - Teacher Forcing: Scheduled ({INITIAL_TF_RATIO:.0%} → {tf_ratios[-1]:.1%})")
print(f"  - Best Val Loss: {best_val_loss:.6f}")
print(f"\nDirect Model:")
print(f"  - {NUM_LAYERS}-layer LSTM with multi-output head")
print(f"  - Final Val Loss: {direct_val_losses[-1]:.6f}")
print(f"\nTest Performance:")
print(f"  - Seq2Seq Avg MSE: {seq2seq_metrics['mse'].mean():.6f}")
print(f"  - Direct Avg MSE: {direct_metrics['mse'].mean():.6f}")