# LSTM with Walk-Forward Validation

**Objective**: Implement LSTM deep learning model with proper walk-forward validation

**Key Differences from Standard LSTM**:
- ✅ **Walk-forward validation**: Re-train model incrementally on expanding window
- ✅ **Realistic forecasting**: Simulates real-world deployment scenario
- ✅ **No future data leakage**: Model only sees past data at each prediction step

**Training Strategy**:
1. Start with initial training window (e.g., 80% of data)
2. Predict next 5 days
3. Add actual observed data to training set
4. Re-train model (or update with new data)
5. Repeat until end of dataset

**Trade-offs**:
- ⚠️ **Computationally expensive**: Multiple model trainings required
- ⚠️ **Time-consuming**: Can take hours depending on dataset size
- ✅ **More realistic**: Better reflects production performance

## 1. Import Libraries

In [None]:
# Data manipulation
import os
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.notebook import tqdm

# Deep Learning
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.preprocessing import MinMaxScaler

# Metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Warnings
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Plot settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
plt.rcParams['figure.figsize'] = (14, 6)

print("✓ Libraries imported successfully")
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")

✓ Libraries imported successfully
TensorFlow version: 2.16.1
GPU Available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


## 2. Load and Prepare Data

In [27]:
# Load dataset
df = pd.read_csv('../data/gold_silver.csv')

# Convert to datetime
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.sort_values('DATE')
df.set_index('DATE', inplace=True)

# Calculate log returns
df['GOLD_LOG_RETURN'] = np.log(df['GOLD_PRICE']) - np.log(df['GOLD_PRICE'].shift(1))
df = df.dropna(subset=['GOLD_LOG_RETURN'])

print(f"Dataset: {len(df)} observations")
print(f"Date range: {df.index.min()} to {df.index.max()}")

Dataset: 10570 observations
Date range: 1985-01-03 00:00:00 to 2025-09-10 00:00:00


## 3. Configuration Parameters

In [28]:
# Walk-forward validation parameters
lookback = 20  # Past days to use for prediction
forecast_horizon = 5  # Days ahead to forecast
initial_train_size = int(len(df) * 0.7)  # Start with 70% for initial training
retrain_frequency = 20  # Re-train model every N observations

# Model parameters
lstm_units_1 = 64
lstm_units_2 = 32
dense_units = 16
dropout_rate = 0.2
learning_rate = 0.001
epochs = 50  # Reduced for faster iteration
batch_size = 32

print("Walk-Forward Configuration:")
print(f"  Initial training size: {initial_train_size} observations")
print(f"  Lookback window: {lookback} days")
print(f"  Forecast horizon: {forecast_horizon} days")
print(f"  Retrain frequency: every {retrain_frequency} observations")
print(f"\nTest set size: {len(df) - initial_train_size - lookback} observations")
print(f"Expected retraining cycles: ~{(len(df) - initial_train_size - lookback) // retrain_frequency}")

Walk-Forward Configuration:
  Initial training size: 7398 observations
  Lookback window: 20 days
  Forecast horizon: 5 days
  Retrain frequency: every 20 observations

Test set size: 3152 observations
Expected retraining cycles: ~157


## 4. Helper Functions

In [29]:
def create_sequences(data, lookback, forecast_horizon):
    """
    Create sequences for LSTM
    X: [samples, lookback, features]
    y: [samples, forecast_horizon]
    """
    X, y = [], []
    for i in range(len(data) - lookback - forecast_horizon + 1):
        X.append(data[i:i+lookback])
        y.append(data[i+lookback:i+lookback+forecast_horizon])
    return np.array(X), np.array(y)

def build_lstm_model(lookback, forecast_horizon, lstm_units_1, lstm_units_2, dense_units, dropout_rate, learning_rate):
    """
    Build and compile LSTM model
    """
    model = Sequential([
        LSTM(lstm_units_1, activation='tanh', return_sequences=True, input_shape=(lookback, 1)),
        Dropout(dropout_rate),
        LSTM(lstm_units_2, activation='tanh', return_sequences=False),
        Dropout(dropout_rate),
        Dense(dense_units, activation='relu'),
        Dropout(dropout_rate/2),
        Dense(forecast_horizon, activation='linear')
    ])
    
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='mse',
        metrics=['mae']
    )
    
    return model

def scale_data(X_train, y_train, X_val=None, y_val=None):
    """
    Scale data using MinMaxScaler fitted on training data
    """
    scaler_X = MinMaxScaler(feature_range=(-1, 1))
    scaler_y = MinMaxScaler(feature_range=(-1, 1))
    
    X_train_scaled = scaler_X.fit_transform(X_train.reshape(-1, 1)).reshape(X_train.shape)
    y_train_scaled = scaler_y.fit_transform(y_train)
    
    if X_val is not None and y_val is not None:
        X_val_scaled = scaler_X.transform(X_val.reshape(-1, 1)).reshape(X_val.shape)
        y_val_scaled = scaler_y.transform(y_val)
        return X_train_scaled, y_train_scaled, X_val_scaled, y_val_scaled, scaler_X, scaler_y
    
    return X_train_scaled, y_train_scaled, scaler_X, scaler_y

print("✓ Helper functions defined")

✓ Helper functions defined


## 5. Walk-Forward Validation Implementation

In [30]:
# Extract log returns
log_returns = df['GOLD_LOG_RETURN'].values
prices = df['GOLD_PRICE'].values

# Storage for predictions and actuals
all_predictions = []
all_actuals = []
forecast_dates = []

# Initialize model (will be retrained during walk-forward)
model = None
scaler_X = None
scaler_y = None

# Walk-forward loop
print("\n" + "="*80)
print("STARTING WALK-FORWARD VALIDATION")
print("="*80)
print("⚠️  This process may take 30-60 minutes depending on your hardware\n")

steps_completed = 0
total_steps = len(df) - initial_train_size - lookback - forecast_horizon + 1

for i in tqdm(range(initial_train_size, len(df) - lookback - forecast_horizon + 1), 
              desc="Walk-Forward Progress"):
    
    # 1. Extract training data (expanding window)
    train_data = log_returns[:i]
    
    # 2. Create sequences
    X_train, y_train = create_sequences(train_data, lookback, forecast_horizon)
    
    # 3. Check if we need to retrain
    should_retrain = (model is None) or (steps_completed % retrain_frequency == 0)
    
    if should_retrain:
        # Scale data
        X_train_scaled, y_train_scaled, scaler_X, scaler_y = scale_data(X_train, y_train)
        
        # Build new model
        model = build_lstm_model(lookback, forecast_horizon, lstm_units_1, 
                                lstm_units_2, dense_units, dropout_rate, learning_rate)
        
        # Train with early stopping
        early_stop = EarlyStopping(monitor='loss', patience=10, restore_best_weights=True, verbose=0)
        
        model.fit(
            X_train_scaled,
            y_train_scaled,
            epochs=epochs,
            batch_size=batch_size,
            callbacks=[early_stop],
            verbose=0
        )
    
    # 4. Prepare input for prediction (last 'lookback' days)
    X_pred = log_returns[i-lookback:i].reshape(1, lookback, 1)
    X_pred_scaled = scaler_X.transform(X_pred.reshape(-1, 1)).reshape(X_pred.shape)
    
    # 5. Make prediction
    y_pred_scaled = model.predict(X_pred_scaled, verbose=0)
    y_pred = scaler_y.inverse_transform(y_pred_scaled)
    
    # 6. Get actual values
    y_actual = log_returns[i:i+forecast_horizon]
    
    # 7. Convert log returns to prices
    last_known_price = prices[i-1]
    
    for j in range(forecast_horizon):
        # Predict price using log return
        pred_price = last_known_price * np.exp(y_pred[0, j])
        actual_price = prices[i + j]
        
        all_predictions.append(pred_price)
        all_actuals.append(actual_price)
        forecast_dates.append(df.index[i + j])
        
        # Use ACTUAL price for next step (no look-ahead bias in evaluation)
        if j < forecast_horizon - 1:
            last_known_price = actual_price
    
    steps_completed += 1

print(f"\n✓ Walk-forward validation completed!")
print(f"  Total forecasts generated: {len(all_predictions)}")
print(f"  Models trained: {steps_completed // retrain_frequency + 1}")


STARTING WALK-FORWARD VALIDATION
⚠️  This process may take 30-60 minutes depending on your hardware



ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html

## 6. Evaluate Performance

In [None]:
# Calculate metrics
rmse = np.sqrt(mean_squared_error(all_actuals, all_predictions))
mae = mean_absolute_error(all_actuals, all_predictions)

# Calculate naive baseline for comparison
naive_predictions = []
for i in range(len(forecast_dates)):
    # Find the corresponding date in original dataframe
    date_idx = df.index.get_loc(forecast_dates[i])
    naive_predictions.append(df['GOLD_PRICE'].iloc[date_idx - 1])  # Previous day's price

rmse_naive = np.sqrt(mean_squared_error(all_actuals, naive_predictions))
mae_naive = mean_absolute_error(all_actuals, naive_predictions)

print("="*80)
print("WALK-FORWARD LSTM RESULTS")
print("="*80)
print(f"\nLSTM Walk-Forward:")
print(f"  RMSE: ${rmse:.2f}")
print(f"  MAE:  ${mae:.2f}")
print(f"\nNaive Baseline:")
print(f"  RMSE: ${rmse_naive:.2f}")
print(f"  MAE:  ${mae_naive:.2f}")
print(f"\nImprovement vs Naive:")
print(f"  RMSE: {(1 - rmse/rmse_naive)*100:+.2f}%")
print(f"  MAE:  {(1 - mae/mae_naive)*100:+.2f}%")
print("="*80)

# Load previous model results for comparison
try:
    lstm_basic_results = pd.read_csv('../models/lstm-deep-learning/results.csv')
    arima_results = pd.read_csv('../models/arima-baseline/results.csv')
    arima_garch_results = pd.read_csv('../models/arima-garch-hybrid/results.csv')
    
    print("\n" + "="*80)
    print("COMPARISON WITH OTHER MODELS")
    print("="*80)
    print(f"\nLSTM Walk-Forward:  RMSE=${rmse:.2f}, MAE=${mae:.2f}")
    print(f"LSTM Basic:         RMSE=${lstm_basic_results['rmse'].values[0]:.2f}, MAE=${lstm_basic_results['mae'].values[0]:.2f}")
    print(f"ARIMA-GARCH:        RMSE=${arima_garch_results['rmse'].values[0]:.2f}, MAE=${arima_garch_results['mae'].values[0]:.2f}")
    print(f"ARIMA Baseline:     RMSE=${arima_results['rmse'].values[0]:.2f}, MAE=${arima_results['mae'].values[0]:.2f}")
    print("="*80)
except:
    print("\n⚠ Could not load previous model results for comparison")

## 7. Visualize Forecasts

In [None]:
# Plot predictions vs actuals
fig, axes = plt.subplots(2, 1, figsize=(16, 10))

# Price forecasts over time
axes[0].plot(forecast_dates, all_actuals, label='Actual Price', color='black', linewidth=1.5, alpha=0.8)
axes[0].plot(forecast_dates, all_predictions, label='LSTM Walk-Forward Forecast', color='blue', linewidth=1, alpha=0.7)
axes[0].set_title('LSTM Walk-Forward: Gold Price Forecasts (5-Day Ahead)', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Price (USD)', fontsize=11)
axes[0].set_xlabel('Date', fontsize=11)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Forecast errors
errors = np.array(all_actuals) - np.array(all_predictions)
axes[1].plot(forecast_dates, errors, color='red', linewidth=1, alpha=0.7)
axes[1].axhline(y=0, color='black', linestyle='--', linewidth=1.5)
axes[1].fill_between(forecast_dates, errors, 0, alpha=0.3, color='red')
axes[1].set_title('Forecast Errors Over Time', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Error (USD)', fontsize=11)
axes[1].set_xlabel('Date', fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nForecast Period: {forecast_dates[0].strftime('%Y-%m-%d')} to {forecast_dates[-1].strftime('%Y-%m-%d')}")

## 8. Error Analysis

In [None]:
# Error distribution
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Histogram
axes[0].hist(errors, bins=60, edgecolor='black', alpha=0.7, color='blue')
axes[0].axvline(x=0, color='red', linestyle='--', linewidth=2)
axes[0].set_title('Distribution of Forecast Errors', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Error (USD)', fontsize=11)
axes[0].set_ylabel('Frequency', fontsize=11)
axes[0].grid(True, alpha=0.3)

# Q-Q plot
from scipy import stats
stats.probplot(errors, dist="norm", plot=axes[1])
axes[1].set_title('Q-Q Plot (Normality Check)', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)

# Absolute errors over time
abs_errors = np.abs(errors)
axes[2].plot(forecast_dates, abs_errors, color='orange', linewidth=1, alpha=0.7)
axes[2].axhline(y=mae, color='red', linestyle='--', linewidth=2, label=f'Mean: ${mae:.2f}')
axes[2].set_title('Absolute Errors Over Time', fontsize=12, fontweight='bold')
axes[2].set_xlabel('Date', fontsize=11)
axes[2].set_ylabel('Absolute Error (USD)', fontsize=11)
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nError Statistics:")
print(f"  Mean:     ${np.mean(errors):.2f}")
print(f"  Std Dev:  ${np.std(errors):.2f}")
print(f"  Median:   ${np.median(errors):.2f}")
print(f"  Min:      ${np.min(errors):.2f}")
print(f"  Max:      ${np.max(errors):.2f}")
print(f"  Skewness: {stats.skew(errors):.4f}")
print(f"  Kurtosis: {stats.kurtosis(errors):.4f}")

## 9. Save Results

In [None]:
# Create directory for saving results
model_dir = '../models/lstm-walk-forward'
os.makedirs(model_dir, exist_ok=True)

# Save results
results = {
    'model': 'LSTM-WalkForward',
    'rmse': rmse,
    'mae': mae,
    'rmse_naive': rmse_naive,
    'mae_naive': mae_naive,
    'n_predictions': len(all_predictions),
    'lookback': lookback,
    'forecast_horizon': forecast_horizon,
    'initial_train_size': initial_train_size,
    'retrain_frequency': retrain_frequency,
    'lstm_units_1': lstm_units_1,
    'lstm_units_2': lstm_units_2,
    'epochs': epochs
}

results_df = pd.DataFrame([results])
results_df.to_csv(f'{model_dir}/results.csv', index=False)

# Save detailed predictions
predictions_df = pd.DataFrame({
    'date': forecast_dates,
    'actual': all_actuals,
    'predicted': all_predictions,
    'error': errors,
    'abs_error': abs_errors
})
predictions_df.to_csv(f'{model_dir}/predictions.csv', index=False)

# Save final model
model.save(f'{model_dir}/model_final.h5')

print(f"✓ Results saved to '{model_dir}/'")
print("\nFiles created:")
print("  - results.csv (summary metrics)")
print("  - predictions.csv (detailed forecasts)")
print("  - model_final.h5 (final trained model)")

## 10. Final Comparison with All Models

In [None]:
# Create comprehensive comparison table
try:
    # Load all model results
    lstm_basic = pd.read_csv('../models/lstm-deep-learning/results.csv')
    arima_baseline = pd.read_csv('../models/arima-baseline/results.csv')
    arima_garch = pd.read_csv('../models/arima-garch-hybrid/results.csv')
    
    comparison_data = {
        'Model': ['Naive Baseline', 'ARIMA', 'ARIMA-GARCH', 'LSTM (Basic)', 'LSTM (Walk-Forward)'],
        'RMSE ($)': [
            rmse_naive,
            arima_baseline['rmse'].values[0],
            arima_garch['rmse'].values[0],
            lstm_basic['rmse'].values[0],
            rmse
        ],
        'MAE ($)': [
            mae_naive,
            arima_baseline['mae'].values[0],
            arima_garch['mae'].values[0],
            lstm_basic['mae'].values[0],
            mae
        ],
        'Type': ['Statistical', 'Statistical', 'Statistical', 'Deep Learning', 'Deep Learning'],
        'Validation': ['N/A', 'Walk-Forward', 'Walk-Forward', 'Train-Test Split', 'Walk-Forward'],
        'Volatility': ['No', 'No', 'Yes', 'No', 'No'],
        'Interpretability': ['High', 'High', 'Medium', 'Low', 'Low']
    }
    
    comparison_df = pd.DataFrame(comparison_data)
    comparison_df = comparison_df.sort_values('RMSE ($)')
    
    print("="*90)
    print("FINAL MODEL COMPARISON - GOLDENHOUR PROJECT")
    print("="*90)
    print(comparison_df.to_string(index=False))
    print("="*90)
    
    # Save comparison
    comparison_dir = '../models/comparison'
    os.makedirs(comparison_dir, exist_ok=True)
    comparison_df.to_csv(f'{comparison_dir}/model_comparison_full.csv', index=False)
    print(f"\n✓ Comparison saved to {comparison_dir}/model_comparison_full.csv")
    
except Exception as e:
    print(f"⚠ Could not create full comparison: {e}")
    print("\nLSTM Walk-Forward Results:")
    print(f"  RMSE: ${rmse:.2f}")
    print(f"  MAE:  ${mae:.2f}")

## 11. Key Findings

### Walk-Forward Validation Benefits

**Advantages**:
1. ✅ **Realistic Performance**: Simulates real-world deployment scenario
2. ✅ **No Future Data Leakage**: Model only sees past data at each step
3. ✅ **Adapts to Regime Changes**: Re-training captures market dynamics
4. ✅ **More Reliable Estimates**: Better reflects production performance

**Disadvantages**:
1. ⚠️ **Computationally Expensive**: Multiple model trainings required
2. ⚠️ **Time-Consuming**: Can take hours for large datasets
3. ⚠️ **Complex Implementation**: More difficult to debug and maintain

### Performance Comparison

**Expected Outcome**: Walk-forward LSTM should show:
- **Higher errors** than basic train-test split (more realistic)
- **More stable predictions** over time
- **Better generalization** to unseen market conditions

### Recommendations for Academic Project

**For thesis/paper**:
1. **Primary Model**: ARIMA-GARCH (theoretical foundation, interpretability)
2. **Comparison Model**: LSTM Walk-Forward (demonstrate modern techniques)
3. **Baseline**: Naive random walk (industry standard benchmark)

**Justification**:
- ARIMA-GARCH: Established econometric methodology, widely accepted in academia
- LSTM Walk-Forward: Shows awareness of ML best practices and realistic evaluation
- Both use walk-forward validation for fair comparison