# 3. Model Comparison

**Objective:** Compare several volatility prediction models and select the best one.

**Main input:**
- data/processed/timestep_*.pt (processed dataset)

**Main output:**
- Model comparison plots and metrics
- Conclusion on the best model

**Workflow:**
1) Load the dataset
2) Train several models
3) Compare metrics and select a model

> Run the cells in order to reproduce the results.

# Model Comparison for Volatility Prediction

This notebook compares different algorithms for predicting stock volatility:

| Group | Algorithm | Role |
|-------|-----------|------|
| Baseline | ARIMA | Statistical benchmark |
| ML | Random Forest | Non-linear baseline |
| DL | LSTM | Temporal modeling |
| (Optional) | GRU | Lighter than LSTM |

## Evaluation Metrics
- **MSE** (Mean Squared Error): Average squared prediction error
- **RMSE** (Root Mean Squared Error): Square root of MSE
- **MAE** (Mean Absolute Error): Average absolute prediction error
- **MAPE** (Mean Absolute Percentage Error): Average absolute error as a percentage of the true value
- **R²** (Coefficient of Determination): Proportion of variance explained
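
The notebook computes these metrics with `calculate_metrics` from `src.utils`, whose implementation is not shown here. As a rough illustration of the definitions above, the following sketch computes all five in plain Python. The function name `regression_metrics`, the `eps` guard, and the dictionary keys other than `RMSE` and `R2` are assumptions made for this example, not the project's API.

```python
import math

def regression_metrics(y_true, y_pred, eps=1e-8):
    """Compute MSE, RMSE, MAE, MAPE (in %), and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # eps (an assumption of this sketch) avoids division by zero when a true value is 0
    mape = 100.0 * sum(abs(e) / max(abs(t), eps) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

# Toy values for illustration only, not results from this dataset
metrics = regression_metrics([0.10, 0.20, 0.30, 0.40], [0.12, 0.18, 0.33, 0.37])
print(metrics)
```

MAPE becomes unstable when true volatility is close to zero, so it is probably best read alongside MAE and RMSE rather than on its own.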

In [None]:
import sys
sys.path.append('../')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.optim as optim
from torch_geometric.loader import DataLoader
import warnings
import json
import os

# Import custom modules
from src.datasets import VNStocksDataset
from src.datasets.VNStocksDataset import VNStocksVolatilityDataset
from src.models import LSTMModel, GRUModel, RandomForestModel, ARIMAModel
from src.utils import (
    train, calculate_metrics, evaluate_model, evaluate_sklearn_model,
    plot_predictions, compare_models, print_metrics_table, count_parameters
)

warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-darkgrid')

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

%matplotlib inline

# Device configuration
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

## 1. Load Data

Load the volatility dataset and create train/test splits.
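
The split in the following cell slices the dataset by index instead of sampling at random. Assuming the samples are stored in time order (the `timestep_*.pt` file names suggest this, but it is not confirmed here), every test sample then comes after every training sample. A minimal sketch of that idea, using a hypothetical helper `chronological_split` and a plain list as a stand-in for the dataset:

```python
def chronological_split(samples, train_ratio=0.8):
    """Split an ordered sequence into train/test parts without shuffling."""
    train_size = int(len(samples) * train_ratio)
    return samples[:train_size], samples[train_size:]

# Stand-in for time-ordered dataset samples (hypothetical data)
timesteps = list(range(10))
train_part, test_part = chronological_split(timesteps)
print(train_part, test_part)
```

A random split would mix future samples into the training set, which tends to make forecasting metrics look better than they would be in practice.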

In [None]:
# Load dataset
print("Loading dataset...")
dataset = VNStocksVolatilityDataset(
    root='../data/',
    past_window=25,
    future_window=5,
    volatility_window=20,
    force_reload=False
)

print(f"Dataset size: {len(dataset)}")
print(f"Sample shape: {dataset[0].x.shape}")
print(f"Target shape: {dataset[0].y.shape}")

# Train/test split (80/20)
train_ratio = 0.8
train_size = int(len(dataset) * train_ratio)

train_dataset = dataset[:train_size]
test_dataset = dataset[train_size:]

print(f"\nTrain size: {len(train_dataset)}")
print(f"Test size: {len(test_dataset)}")

In [None]:
# Create data loaders for DL models
batch_size = 32

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print(f"Train batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")

## 2. Baseline: ARIMA Model

Classical statistical model for time series forecasting.
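
The cell below passes `order=(2, 1, 1)` to `ARIMAModel`. Assuming the wrapper follows the usual (p, d, q) convention, this means two autoregressive lags, one round of differencing, and one moving-average term. The sketch below illustrates only the `d=1` step, first-order differencing and its inverse, in plain Python. The helper names are hypothetical and the numbers are made up.

```python
from itertools import accumulate

def difference(series):
    """First-order differencing: d[t] = x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

def undifference(first_value, diffs):
    """Invert first-order differencing by cumulative summation."""
    return list(accumulate(diffs, initial=first_value))

series = [100, 102, 101, 105, 110]   # illustrative values only
diffs = difference(series)
restored = undifference(series[0], diffs)
print(diffs, restored)
```

Differencing removes a slowly moving level from the series, so the AR and MA terms are fitted to the changes and forecasts are integrated back to the original scale.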

In [None]:
print("="*80)
print("TRAINING ARIMA MODEL")
print("="*80)

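# Prepare data for ARIMA: fit on the first training sample only.
# ARIMA fits each stock's time series independently.
# NOTE: ARIMA is scored on a single test sample (test_dataset[0]), while the other
# models are scored on the full test set, so its metrics are not directly comparable.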
X_train_arima = train_dataset[0].x  # (nodes, features, timesteps)
X_test_arima = test_dataset[0].x
y_test_arima = test_dataset[0].y

# Train ARIMA
arima_model = ARIMAModel(order=(2, 1, 1))
print("Fitting ARIMA models...")
arima_model.fit(X_train_arima)

# Predict volatility
print("Making predictions...")
y_pred_arima = arima_model.predict_volatility(
    X_test_arima,
    volatility_window=20,
    future_window=5
)

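# Evaluate
# NOTE: evaluate_sklearn_model returns its own predictions, replacing the
# predict_volatility output assigned to y_pred_arima above.
y_true_arima, y_pred_arima = evaluate_sklearn_model(arima_model, X_test_arima, y_test_arima)
arima_metrics = calculate_metrics(y_true_arima, y_pred_arima)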

print("\nARIMA Results:")
for metric, value in arima_metrics.items():
    print(f"  {metric}: {value:.6f}")

## 3. ML Model: Random Forest

Non-linear model that doesn't explicitly model temporal dependencies.

In [None]:
print("="*80)
print("TRAINING RANDOM FOREST MODEL")
print("="*80)

# Prepare data
X_train_rf = torch.stack([data.x for data in train_dataset])
y_train_rf = torch.cat([data.y for data in train_dataset])
X_test_rf = torch.stack([data.x for data in test_dataset])
y_test_rf = torch.cat([data.y for data in test_dataset])

print(f"Train shape: {X_train_rf.shape}, {y_train_rf.shape}")
print(f"Test shape: {X_test_rf.shape}, {y_test_rf.shape}")

# Train Random Forest
rf_model = RandomForestModel(
    n_estimators=100,
    max_depth=20,
    min_samples_split=5,
    random_state=42
)

print("\nFitting Random Forest...")
rf_model.fit(X_train_rf, y_train_rf)

# Evaluate
print("Evaluating...")
y_true_rf, y_pred_rf = evaluate_sklearn_model(rf_model, X_test_rf, y_test_rf)
rf_metrics = calculate_metrics(y_true_rf, y_pred_rf)

print("\nRandom Forest Results:")
for metric, value in rf_metrics.items():
    print(f"  {metric}: {value:.6f}")

# Feature importance
importance = rf_model.feature_importance()
print(f"\nTop 10 important features (indices): {np.argsort(importance)[-10:][::-1]}")

## 4. Deep Learning: LSTM Model

Recurrent neural network for modeling temporal dependencies.

In [None]:
print("="*80)
print("TRAINING LSTM MODEL")
print("="*80)

# Create model
in_features = dataset[0].x.shape[1]  # Number of features
lstm_model = LSTMModel(
    in_features=in_features,
    hidden_size=64,
    num_layers=2,
    dropout=0.2,
    bidirectional=False
)

print(f"Model parameters: {count_parameters(lstm_model):,}")

# Training configuration
optimizer = optim.Adam(lstm_model.parameters(), lr=0.001)
criterion = nn.MSELoss()
num_epochs = 50

# Train
os.makedirs('../models', exist_ok=True)
print("\nTraining...")
lstm_history = train(
    model=lstm_model,
    optimizer=optimizer,
    criterion=criterion,
    train_loader=train_loader,
    test_loader=test_loader,
    num_epochs=num_epochs,
    device=device,
    task_title="volatility_LSTM",
    early_stopping_patience=10
)

# Evaluate
print("\nEvaluating...")
y_true_lstm, y_pred_lstm = evaluate_model(lstm_model, test_loader, device)
lstm_metrics = calculate_metrics(y_true_lstm, y_pred_lstm)

print("\nLSTM Results:")
for metric, value in lstm_metrics.items():
    print(f"  {metric}: {value:.6f}")

## 5. Deep Learning: GRU Model (Optional)

Lighter alternative to LSTM with fewer parameters.
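
The parameter saving comes from the gate count: an LSTM layer has four gate blocks and a GRU layer has three. The sketch below counts the weights of a stacked unidirectional recurrent layer laid out like `torch.nn.LSTM` and `torch.nn.GRU`. The value `in_features = 8` is a hypothetical placeholder, and the notebook's `LSTMModel` and `GRUModel` may contain extra layers (such as an output head) that this count ignores, so the figure printed by `count_parameters` can differ.

```python
def recurrent_param_count(in_features, hidden_size, num_layers, gates):
    """Parameters of a stacked unidirectional recurrent layer.

    gates=4 matches the torch.nn.LSTM layout, gates=3 matches torch.nn.GRU.
    """
    total = 0
    layer_input = in_features
    for _ in range(num_layers):
        # input-to-hidden weights + hidden-to-hidden weights + two bias vectors
        total += gates * (hidden_size * layer_input
                          + hidden_size * hidden_size
                          + 2 * hidden_size)
        layer_input = hidden_size  # deeper layers consume the previous hidden state
    return total

in_features = 8  # hypothetical; the notebook reads this from the dataset
lstm_params = recurrent_param_count(in_features, hidden_size=64, num_layers=2, gates=4)
gru_params = recurrent_param_count(in_features, hidden_size=64, num_layers=2, gates=3)
reduction = 1 - gru_params / lstm_params
print(lstm_params, gru_params, f"{reduction:.1%}")
```

For the recurrent layers alone the reduction is always 25%, regardless of `in_features` or `hidden_size`, because every term scales with the gate count.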

In [None]:
print("="*80)
print("TRAINING GRU MODEL")
print("="*80)

# Create model
gru_model = GRUModel(
    in_features=in_features,
    hidden_size=64,
    num_layers=2,
    dropout=0.2,
    bidirectional=False
)

print(f"Model parameters: {count_parameters(gru_model):,}")
print(f"LSTM parameters: {count_parameters(lstm_model):,}")
print(f"Parameter reduction: {(1 - count_parameters(gru_model)/count_parameters(lstm_model))*100:.1f}%")

# Training configuration
optimizer = optim.Adam(gru_model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# Train
print("\nTraining...")
gru_history = train(
    model=gru_model,
    optimizer=optimizer,
    criterion=criterion,
    train_loader=train_loader,
    test_loader=test_loader,
    num_epochs=num_epochs,
    device=device,
    task_title="volatility_GRU",
    early_stopping_patience=10
)

# Evaluate
print("\nEvaluating...")
y_true_gru, y_pred_gru = evaluate_model(gru_model, test_loader, device)
gru_metrics = calculate_metrics(y_true_gru, y_pred_gru)

print("\nGRU Results:")
for metric, value in gru_metrics.items():
    print(f"  {metric}: {value:.6f}")

## 6. Model Comparison

Compare all models side-by-side.

In [None]:
# Compile results
results = {
    'ARIMA': arima_metrics,
    'Random Forest': rf_metrics,
    'LSTM': lstm_metrics,
    'GRU': gru_metrics
}

# Print comparison table
print_metrics_table(results)

# Plot comparison
os.makedirs('../data/analysis', exist_ok=True)  # ensure the output directory exists
fig = compare_models(results)
plt.savefig('../data/analysis/model_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("Comparison plot saved to '../data/analysis/model_comparison.png'")

In [None]:
# Save results to JSON
results_json = {}
for model_name, metrics in results.items():
    results_json[model_name] = {k: float(v) for k, v in metrics.items()}

with open('../data/analysis/model_comparison_results.json', 'w') as f:
    json.dump(results_json, f, indent=2)

print("Results saved to '../data/analysis/model_comparison_results.json'")

## 7. Visualization of Predictions

Visual comparison of model predictions.

In [None]:
# Plot predictions for each model
models_to_plot = [
    ('ARIMA', y_test_arima.numpy(), y_pred_arima),
    ('Random Forest', y_true_rf, y_pred_rf),
    ('LSTM', y_true_lstm, y_pred_lstm),
    ('GRU', y_true_gru, y_pred_gru)
]

fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()

for i, (model_name, y_true, y_pred) in enumerate(models_to_plot):
    y_true_flat = y_true.flatten()
    y_pred_flat = y_pred.flatten()
    
    # Scatter plot
    axes[i].scatter(y_true_flat[:500], y_pred_flat[:500], alpha=0.5, s=20)
    axes[i].plot([y_true_flat.min(), y_true_flat.max()], 
                [y_true_flat.min(), y_true_flat.max()], 'r--', lw=2)
    axes[i].set_xlabel('True Volatility')
    axes[i].set_ylabel('Predicted Volatility')
    axes[i].set_title(f'{model_name} - Predictions vs True', fontweight='bold')
    axes[i].grid(True, alpha=0.3)
    
    # Add R² score
    r2 = results[model_name]['R2']
    axes[i].text(0.05, 0.95, f'R² = {r2:.4f}',
                transform=axes[i].transAxes, 
                verticalalignment='top',
                bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.tight_layout()
plt.savefig('../data/analysis/predictions_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("Predictions plot saved to '../data/analysis/predictions_comparison.png'")

## 8. Training History (DL Models)

Visualize training progress for LSTM and GRU.
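
The `best_epoch` marker in these plots comes from the history returned by `train`, which was called with `early_stopping_patience=10`. How `train` implements early stopping is not shown in this notebook. The following is a minimal sketch of one common patience-based scheme, with a hypothetical function name and made-up loss values.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a list of per-epoch monitored losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience window
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop here
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.64]  # illustrative values only
best_epoch, stop_epoch = early_stopping_epoch(losses, patience=3)
print(best_epoch, stop_epoch)
```

Under this scheme the curves can extend up to `patience` epochs past the dashed best-epoch line, which is why the plotted loss may keep going after the marker.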

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# LSTM history
axes[0].plot(lstm_history['train_loss'], label='Train Loss', linewidth=2)
axes[0].plot(lstm_history['test_loss'], label='Test Loss', linewidth=2)
axes[0].axvline(lstm_history['best_epoch'], color='r', linestyle='--', 
               label=f"Best Epoch ({lstm_history['best_epoch']})")
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss (MSE)')
axes[0].set_title('LSTM Training History', fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# GRU history
axes[1].plot(gru_history['train_loss'], label='Train Loss', linewidth=2)
axes[1].plot(gru_history['test_loss'], label='Test Loss', linewidth=2)
axes[1].axvline(gru_history['best_epoch'], color='r', linestyle='--',
               label=f"Best Epoch ({gru_history['best_epoch']})")
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss (MSE)')
axes[1].set_title('GRU Training History', fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../data/analysis/training_history.png', dpi=300, bbox_inches='tight')
plt.show()

print("Training history plot saved to '../data/analysis/training_history.png'")

## 9. Summary and Conclusions

### Best Performing Model

The best model is the one with the lowest test RMSE; its R² is reported alongside for reference.

In [None]:
# Find best model by RMSE (lower is better)
rmse_scores = {name: metrics['RMSE'] for name, metrics in results.items()}
best_model = min(rmse_scores, key=rmse_scores.get)

print("="*80)
print("FINAL SUMMARY")
print("="*80)
print(f"\n🏆 Best Model: {best_model}")
print(f"   RMSE: {rmse_scores[best_model]:.6f}")
print(f"   R²:   {results[best_model]['R2']:.6f}")

print("\n📊 Model Rankings (by RMSE):")
for i, (model, rmse) in enumerate(sorted(rmse_scores.items(), key=lambda x: x[1]), 1):
    print(f"   {i}. {model:<20} RMSE: {rmse:.6f}")

# The notes below are expectations to check against the rankings, not measured results.
param_reduction = (1 - count_parameters(gru_model) / count_parameters(lstm_model)) * 100
print("\n💡 Expectations to verify against the rankings above:")
print("   - DL models (LSTM/GRU) are expected to outperform classical methods")
print(f"   - GRU has {param_reduction:.1f}% fewer parameters than LSTM "
      f"(RMSE: GRU {rmse_scores['GRU']:.6f} vs LSTM {rmse_scores['LSTM']:.6f})")
print("   - Random Forest is a non-linear baseline without temporal modeling")
print("   - ARIMA serves as the statistical benchmark")

print("\n📁 All results saved to '../data/analysis/' directory")
print("="*80)

In [None]:
# Create summary report
summary = {
    'dataset': {
        'total_samples': len(dataset),
        'train_samples': len(train_dataset),
        'test_samples': len(test_dataset),
        'num_stocks': dataset[0].x.shape[0],  # node dimension of x: (nodes, features, timesteps)
        'num_features': in_features,
        'past_window': 25,
        'future_window': 5
    },
    'models': {
        model: {
            'metrics': {k: float(v) for k, v in metrics.items()},
            # int() handles plain ints as well as NumPy or tensor scalars
            'parameters': int(count_parameters(lstm_model)) if model == 'LSTM'
                         else int(count_parameters(gru_model)) if model == 'GRU'
                         else 'N/A'
        }
        for model, metrics in results.items()
    },
    'best_model': best_model,
    'timestamp': pd.Timestamp.now().isoformat()
}

with open('../data/analysis/experiment_summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

print("Experiment summary saved to '../data/analysis/experiment_summary.json'")