# ML Model Factory - Complete Pipeline & Training

This notebook runs the **complete ML pipeline** from raw data to trained models.

## What This Notebook Does
1. **Setup** - Mount Drive (for results), clone GitHub repo (for code & data)
2. **Phase 1** - Data pipeline (clean -> features -> labels -> splits)
3. **Phase 2** - Model training (single or multiple models)
4. **Phase 3** - Cross-validation (optional)
5. **Phase 4** - Ensemble training (optional)

## Data Flow
- **Data Source:** `/content/research/` (cloned from GitHub)
- **Results Saved:** `/content/drive/MyDrive/research/` (Google Drive for persistence)
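
As a minimal sketch of this layout, the two roots can be grouped in one small helper so that later cells do not repeat string literals. The `ProjectPaths` class and its property names are hypothetical and not part of the repository; only the root paths and subdirectories are taken from this notebook.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectPaths:
    """Hypothetical helper grouping the read and write locations."""
    repo_root: Path = Path("/content/research")                 # GitHub clone: code and data
    drive_root: Path = Path("/content/drive/MyDrive/research")  # Google Drive: results

    @property
    def scaled_splits(self) -> Path:
        # Processed train/val/test parquet files are read from the clone.
        return self.repo_root / "data" / "splits" / "scaled"

    @property
    def runs(self) -> Path:
        # Training runs are written to Drive so they survive a runtime reset.
        return self.drive_root / "experiments" / "runs"


paths = ProjectPaths()
print(paths.scaled_splits)
print(paths.runs)
```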

## Quick Start
1. Run cells in order (or use Runtime -> Run all)
2. Data is loaded from the GitHub clone
3. Results are saved to Google Drive for persistence

---

## 1. Environment Setup

In [None]:
#@title 1.1 Mount Google Drive & Clone Repository { display-mode: "form" }
#@markdown Run this cell to mount your Google Drive and set up the project.

import os
import sys
from pathlib import Path

# Mount Google Drive (for saving results only)
from google.colab import drive
drive.mount('/content/drive')

# Clone or pull repository
if not Path('/content/research').exists():
    print("Cloning repository...")
    !git clone https://github.com/Snehpatel101/research.git /content/research
else:
    print("Pulling latest changes...")
    !cd /content/research && git pull

# Change to project directory
os.chdir('/content/research')

# Create Drive directories for saving results
for d in ["experiments/runs", "results"]:
    Path('/content/drive/MyDrive/research', d).mkdir(parents=True, exist_ok=True)

print("\n" + "=" * 60)
print(" PATH CONFIGURATION")
print("=" * 60)
print(f"\nProject directory: {os.getcwd()}")
print(f"Data source: /content/research (from GitHub)")
print(f"Results saved to: /content/drive/MyDrive/research (Google Drive)")
print("=" * 60)

In [None]:
#@title 1.2 Install Dependencies { display-mode: "form" }
#@markdown Installs all required packages for the ML pipeline.

import sys

# Add project to Python path
sys.path.insert(0, '/content/research')

# Install required packages
!pip install xgboost lightgbm catboost optuna ta pywavelets scikit-learn pandas numpy -q

# Verify PyTorch with CUDA
import torch
if torch.cuda.is_available():
    print(f"PyTorch: {torch.__version__} with CUDA {torch.version.cuda}")
else:
    print(f"PyTorch: {torch.__version__} (CPU only)")

print(f"\nProject path added: /content/research")
print("Dependencies installed!")

In [None]:
#@title 1.3 Detect Hardware & Configure { display-mode: "form" }
#@markdown Detects GPU and configures optimal settings.

import torch
import platform

print("=" * 60)
print(" HARDWARE DETECTION")
print("=" * 60)

# System info
print(f"\nSystem: {platform.system()} {platform.release()}")
print(f"Python: {sys.version.split()[0]}")

# GPU detection
GPU_AVAILABLE = torch.cuda.is_available()
GPU_NAME = None
GPU_MEMORY = 0
RECOMMENDED_BATCH_SIZE = 256
MIXED_PRECISION = False

if GPU_AVAILABLE:
    props = torch.cuda.get_device_properties(0)
    GPU_NAME = props.name
    GPU_MEMORY = props.total_memory / (1024**3)
    
    print(f"\nGPU: {GPU_NAME}")
    print(f"Memory: {GPU_MEMORY:.1f} GB")
    print(f"Compute Capability: {props.major}.{props.minor}")
    
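    # Reported memory is typically a little below nominal capacity (a 40 GB A100
    # reports about 39.6 GiB, a 16 GB T4 about 14.7 GiB), so the thresholds sit
    # slightly below the nominal sizes.
    if GPU_MEMORY >= 38:  # A100 (40 GB or larger)
        RECOMMENDED_BATCH_SIZE = 1024
        MIXED_PRECISION = True
    elif GPU_MEMORY >= 14:  # T4/V100 (16 GB)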
        RECOMMENDED_BATCH_SIZE = 512
        MIXED_PRECISION = True
    else:
        RECOMMENDED_BATCH_SIZE = 256
        MIXED_PRECISION = props.major >= 7
    
    print(f"\nRecommended batch size: {RECOMMENDED_BATCH_SIZE}")
    print(f"Mixed precision: {'Enabled' if MIXED_PRECISION else 'Disabled'}")
else:
    print("\nNo GPU detected - will use CPU")
    print("Tip: Runtime -> Change runtime type -> GPU")

# Verify model registry
print("\n" + "=" * 60)
print(" AVAILABLE MODELS")
print("=" * 60)

try:
    from src.models import ModelRegistry
    models = ModelRegistry.list_models()
    for family, model_list in models.items():
        print(f"\n{family.upper()}:")
        for m in model_list:
            gpu_req = "GPU" if m in ['lstm', 'gru', 'tcn'] else "CPU"
            print(f"  - {m} ({gpu_req})")
except Exception as e:
    print(f"Error loading models: {e}")

print("\n" + "=" * 60)
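
The tiering logic in cell 1.3 can also be written as a pure function, which makes it testable without a GPU. This is a sketch only: `TrainSettings`, `recommend_settings`, and the default threshold values are assumptions introduced here, with the thresholds exposed as parameters so they can be adjusted.

```python
from typing import NamedTuple


class TrainSettings(NamedTuple):
    batch_size: int
    mixed_precision: bool


def recommend_settings(memory_gib: float, compute_major: int,
                       high_tier_gib: float = 38.0,
                       mid_tier_gib: float = 14.0) -> TrainSettings:
    """Map GPU memory (GiB) and compute capability to training settings.

    The default thresholds are assumptions for this sketch, chosen slightly
    below nominal 40 GB and 16 GB capacities.
    """
    if memory_gib >= high_tier_gib:
        return TrainSettings(batch_size=1024, mixed_precision=True)
    if memory_gib >= mid_tier_gib:
        return TrainSettings(batch_size=512, mixed_precision=True)
    # Smaller GPUs keep the default batch size; mixed precision is enabled
    # only from compute capability 7.0 upward, as in the cell above.
    return TrainSettings(batch_size=256, mixed_precision=compute_major >= 7)


print(recommend_settings(39.6, 8))
print(recommend_settings(14.7, 7))
print(recommend_settings(11.2, 6))
```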

---
## 3. Phase 1: Data Pipeline

In [None]:
#@title 3.1 Configure Pipeline { display-mode: "form" }
#@markdown Configure the data processing pipeline.

#@markdown ### Symbol Selection
symbols = "MES"  #@param {type: "string"}
#@markdown Comma-separated symbols (e.g., "MES,MGC")

#@markdown ### Label Horizons
horizons = "5,10,15,20"  #@param {type: "string"}
#@markdown Comma-separated horizons (bars ahead)

#@markdown ### Train/Val/Test Split
train_ratio = 0.70  #@param {type: "slider", min: 0.5, max: 0.8, step: 0.05}
val_ratio = 0.15  #@param {type: "slider", min: 0.1, max: 0.25, step: 0.05}

# Parse inputs
SYMBOLS = [s.strip().upper() for s in symbols.split(',')]
HORIZONS = [int(h.strip()) for h in horizons.split(',')]
TRAIN_RATIO = train_ratio
VAL_RATIO = val_ratio
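TEST_RATIO = round(1.0 - train_ratio - val_ratio, 2)
if TEST_RATIO <= 0:
    # The sliders allow combinations such as 0.80 + 0.25 that leave no test data.
    raise ValueError(
        f"train_ratio ({train_ratio}) + val_ratio ({val_ratio}) must be below 1.0 "
        "so that a test split remains."
    )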

print("Pipeline Configuration:")
print(f"  Symbols: {SYMBOLS}")
print(f"  Horizons: {HORIZONS}")
print(f"  Train/Val/Test: {TRAIN_RATIO}/{VAL_RATIO}/{TEST_RATIO}")
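
The parsing and ratio handling in cell 3.1 can be sketched as two small functions. How the repository's pipeline actually cuts the splits is not shown in this notebook, so `split_bounds` below is only an illustration of a plain chronological split. Both function names are hypothetical.

```python
def parse_int_list(text: str) -> list[int]:
    """Parse a string such as "5, 10,15" into [5, 10, 15], skipping empty items."""
    return [int(part) for part in (p.strip() for p in text.split(",")) if part]


def split_bounds(n_rows: int, train_ratio: float, val_ratio: float) -> tuple[int, int]:
    """Return (train_end, val_end) row indices for a chronological split.

    Rows [0, train_end) are train, [train_end, val_end) are validation,
    and [val_end, n_rows) are test.
    """
    test_ratio = 1.0 - train_ratio - val_ratio
    if train_ratio <= 0 or val_ratio <= 0 or test_ratio <= 1e-9:
        raise ValueError("ratios must be positive and leave room for a test split")
    train_end = round(n_rows * train_ratio)
    val_end = train_end + round(n_rows * val_ratio)
    return train_end, val_end


print(parse_int_list("5,10,15,20"))
print(split_bounds(1000, 0.70, 0.15))
```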

In [None]:
#@title 3.2 Run Data Pipeline OR Use Existing Data { display-mode: "form" }
#@markdown Choose whether to run the full pipeline or use existing processed data.

data_source = "Use existing processed data"  #@param ["Run full pipeline (requires raw data)", "Use existing processed data"]

from pathlib import Path
import time

# Data is read from the GitHub clone, not from Google Drive
splits_dir = Path('/content/research/data/splits/scaled')
train_file = splits_dir / "train_scaled.parquet"

if data_source == "Use existing processed data":
    if train_file.exists():
        import pandas as pd
        train_df = pd.read_parquet(train_file)
        val_df = pd.read_parquet(splits_dir / "val_scaled.parquet")
        test_df = pd.read_parquet(splits_dir / "test_scaled.parquet")
        
        print("Found existing processed data!")
        print(f"  Location: {splits_dir}")
        print(f"  Train: {len(train_df):,} samples")
        print(f"  Val: {len(val_df):,} samples")
        print(f"  Test: {len(test_df):,} samples")
        print("\nSkipping pipeline - proceeding to model training!")
    else:
        print("ERROR: Processed data not found!")
        print(f"  Expected: {splits_dir}/")
        print("\nMake sure the GitHub repo contains processed data files:")
        print("  - train_scaled.parquet")
        print("  - val_scaled.parquet")
        print("  - test_scaled.parquet")
else:
    raw_dir = Path('/content/research/data/raw')
    raw_files = (
        list(raw_dir.glob("*.parquet")) + list(raw_dir.glob("*.csv"))
        if raw_dir.exists() else []
    )
    
    if not raw_files:
        print("ERROR: No raw data files found!")
        print(f"  Expected: {raw_dir}/MES_1m.parquet or .csv")
    else:
        print("Running Phase 1 Data Pipeline...")
        print("=" * 60)
        start_time = time.time()
        
        try:
            from src.phase1.pipeline_config import PipelineConfig
            from src.pipeline.runner import PipelineRunner
            
            config = PipelineConfig(
                symbols=SYMBOLS,
                project_root=Path('/content/research'),
                label_horizons=HORIZONS,
                train_ratio=TRAIN_RATIO,
                val_ratio=VAL_RATIO,
                test_ratio=TEST_RATIO,
            )
            
            runner = PipelineRunner(config)
            success = runner.run()
            
            elapsed = time.time() - start_time
            print("\n" + "=" * 60)
            if success:
                print(f"Pipeline completed in {elapsed/60:.1f} minutes!")
            else:
                print("Pipeline failed. Check errors above.")
        except Exception as e:
            print(f"\nError: {e}")
            import traceback
            traceback.print_exc()

In [None]:
#@title 3.3 Verify Processed Data { display-mode: "form" }
#@markdown Loads and displays the processed datasets.

import pandas as pd
from pathlib import Path

# Data is read from the GitHub clone, not from Google Drive
splits_dir = Path('/content/research/data/splits/scaled')

print("Loading processed datasets...")
print("=" * 60)

try:
    train_df = pd.read_parquet(splits_dir / "train_scaled.parquet")
    val_df = pd.read_parquet(splits_dir / "val_scaled.parquet")
    test_df = pd.read_parquet(splits_dir / "test_scaled.parquet")
    
    print(f"\nDataset sizes:")
    print(f"  Train: {len(train_df):,} samples")
    print(f"  Val:   {len(val_df):,} samples")
    print(f"  Test:  {len(test_df):,} samples")
    print(f"  Total: {len(train_df) + len(val_df) + len(test_df):,} samples")
    
    feature_cols = [c for c in train_df.columns if not c.startswith(('label_', 'sample_weight', 'quality_score', 'datetime', 'symbol'))]
    label_cols = [c for c in train_df.columns if c.startswith('label_')]
    
    print(f"\nFeatures: {len(feature_cols)}")
    print(f"Labels: {label_cols}")
    
    print(f"\nLabel distribution (train):")
    for col in label_cols:
        dist = train_df[col].value_counts().sort_index()
        print(f"  {col}: Long={dist.get(1, 0):,} | Neutral={dist.get(0, 0):,} | Short={dist.get(-1, 0):,}")
    
    TRAIN_DF = train_df
    VAL_DF = val_df
    TEST_DF = test_df
    FEATURE_COLS = feature_cols
    
    print("\nData ready for model training!")
    
except FileNotFoundError:
    print("Processed data not found. Run Section 3.2 first.")

In [None]:
#@title 3.4 Analyze Label Balance & Recommend Horizon { display-mode: "form" }
#@markdown Analyzes class distribution across all horizons and recommends the best horizon for training.

import matplotlib.pyplot as plt
import numpy as np

print("=" * 60)
print(" LABEL BALANCE ANALYSIS")
print("=" * 60)

# Analyze each horizon
horizon_stats = {}
for h in [5, 10, 15, 20]:
    col = f'label_h{h}'
    if col in TRAIN_DF.columns:
        counts = TRAIN_DF[col].value_counts().sort_index()
        short = counts.get(-1, 0)
        neutral = counts.get(0, 0)
        long = counts.get(1, 0)
        total = short + neutral + long
        
        # Imbalance ratio (max/min)
        min_class = min(short, neutral, long)
        max_class = max(short, neutral, long)
        imbalance = max_class / min_class if min_class > 0 else float('inf')
        
        horizon_stats[h] = {
            'short': short, 'neutral': neutral, 'long': long,
            'imbalance': imbalance
        }
        
        print(f"\nH{h}:")
        print(f"  Short: {short:,} ({short/total*100:.1f}%)")
        print(f"  Neutral: {neutral:,} ({neutral/total*100:.1f}%)")
        print(f"  Long: {long:,} ({long/total*100:.1f}%)")
        print(f"  Imbalance Ratio: {imbalance:.1f}x")

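# Find best horizon (lowest imbalance)
if not horizon_stats:
    raise RuntimeError("No label_h* columns found in TRAIN_DF. Run Section 3.3 first and check the label horizons.")
best_h = min(horizon_stats, key=lambda h: horizon_stats[h]['imbalance'])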
print(f"\n{'='*60}")
print(f" RECOMMENDATION: Use H{best_h} (lowest imbalance: {horizon_stats[best_h]['imbalance']:.1f}x)")
print(f"{'='*60}")

# Warning for H20
if 20 in horizon_stats and horizon_stats[20]['imbalance'] > 100:
    print("\n[WARNING] H20 has severe class imbalance!")
    print("  The Short class is extremely underrepresented.")
    print("  This will cause poor Macro F1 scores.")
    print("  Consider using a shorter horizon or applying class balancing.")

# Plot distributions
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for i, h in enumerate([5, 10, 15, 20]):
    if h in horizon_stats:
        stats = horizon_stats[h]
        colors = ['#d62728', '#7f7f7f', '#2ca02c']  # red, gray, green
        bars = axes[i].bar(['Short', 'Neutral', 'Long'], 
                    [stats['short'], stats['neutral'], stats['long']], 
                    color=colors)
        axes[i].set_title(f'H{h} (imbalance: {stats["imbalance"]:.1f}x)')
        axes[i].set_ylabel('Count')
        
        # Add count labels on bars
        for bar, count in zip(bars, [stats['short'], stats['neutral'], stats['long']]):
            height = bar.get_height()
            axes[i].annotate(f'{count:,}',
                           xy=(bar.get_x() + bar.get_width() / 2, height),
                           xytext=(0, 3),
                           textcoords="offset points",
                           ha='center', va='bottom', fontsize=8)

plt.suptitle('Label Distribution by Horizon', fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

# Store recommendation for downstream cells
RECOMMENDED_HORIZON = best_h
print(f"\nRecommended horizon stored in: RECOMMENDED_HORIZON = {RECOMMENDED_HORIZON}")
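
The imbalance ratio above, and one common form of the class balancing mentioned in the H20 warning, can be sketched with the standard library. The weighting formula `n_samples / (n_classes * class_count)` is the heuristic scikit-learn uses for `class_weight='balanced'`. Whether the repository's trainers apply it is not shown here, and the function names and toy label counts are illustrative only.

```python
from collections import Counter


def imbalance_ratio(labels, classes=(-1, 0, 1)) -> float:
    """Largest class count divided by smallest; inf if any class is absent."""
    counts = Counter(labels)
    smallest = min(counts[c] for c in classes)
    largest = max(counts[c] for c in classes)
    return largest / smallest if smallest > 0 else float("inf")


def balanced_class_weights(labels) -> dict:
    """Weight each observed class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels for illustration: 10 short, 60 neutral, 30 long.
labels = [-1] * 10 + [0] * 60 + [1] * 30
print(imbalance_ratio(labels))
print(balanced_class_weights(labels))
```

Rare classes receive weights above 1 and frequent classes below 1, so each class contributes roughly equally to a weighted loss.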

---
## 4. Phase 2: Model Training

In [None]:
#@title 4.0 Check Previous Runs & Recovery { display-mode: "form" }
#@markdown Check for previous training runs and recover if needed.

from pathlib import Path
import json

experiments_dir = Path('/content/drive/MyDrive/research/experiments/runs')
results_cache = Path('/content/drive/MyDrive/research/experiments/.training_results_cache.json')

print("=" * 60)
print(" PREVIOUS TRAINING RUNS")
print("=" * 60)

# Try to recover TRAINING_RESULTS from cache
TRAINING_RESULTS = {}
if results_cache.exists():
    try:
        with open(results_cache) as f:
            TRAINING_RESULTS = json.load(f)
        print(f"\n[RECOVERED] Loaded {len(TRAINING_RESULTS)} model(s) from cache:")
        for model, data in TRAINING_RESULTS.items():
            metrics = data.get('metrics', {})
            acc = metrics.get('accuracy', 0)
            f1 = metrics.get('macro_f1', 0)
            print(f"  - {model}: Accuracy={acc:.2%}, Macro F1={f1:.4f}")
        print("\nYou can skip to section 4.3 to compare results!")
    except Exception as e:
        print(f"\n[ERROR] Could not load cache: {e}")
        TRAINING_RESULTS = {}

# List all training runs
if experiments_dir.exists():
    runs = sorted([d for d in experiments_dir.iterdir() if d.is_dir()], 
                  key=lambda x: x.name, reverse=True)
    
    if runs:
        print(f"\nFound {len(runs)} previous run(s) on disk:\n")
        for i, run in enumerate(runs[:10]):  # Show last 10
            # Try to load metrics
            metrics_file = run / 'metrics' / 'evaluation_metrics.json'
            if metrics_file.exists():
                with open(metrics_file) as f:
                    metrics = json.load(f)
                acc = metrics.get('accuracy', 0)
                f1 = metrics.get('macro_f1', 0)
                print(f"  {i+1}. {run.name}")
                print(f"      Accuracy: {acc:.2%}, Macro F1: {f1:.4f}")
            else:
                print(f"  {i+1}. {run.name} (no metrics)")
        
        # Recovery option
        print(f"\nTo manually load a previous model:")
        print(f"  from src.models import ModelRegistry")
        print(f"  model = ModelRegistry.create('xgboost')")
        print(f"  model.load('{runs[0]}/checkpoints/best_model')")
    else:
        print("\nNo previous runs found.")
else:
    print("\nExperiments directory not found.")
    print("Train your first model in section 4.2!")

print("=" * 60)

In [None]:
#@title 4.1 Training Mode Selection { display-mode: "form" }
#@markdown Choose your training mode and models.

training_mode = "Single Model"  #@param ["Single Model", "Multi-Model (Sequential)"]

#@markdown ---
#@markdown ### Single Model Options
single_model = "xgboost"  #@param ["xgboost", "lightgbm", "catboost", "random_forest", "logistic", "svm", "lstm", "gru", "tcn"]

#@markdown ---
#@markdown ### Multi-Model Options
train_boosting = True  #@param {type: "boolean"}
#@markdown XGBoost, LightGBM, CatBoost
train_classical = False  #@param {type: "boolean"}
#@markdown Random Forest, Logistic, SVM
train_neural = False  #@param {type: "boolean"}
#@markdown LSTM, GRU, TCN (requires GPU)

#@markdown ---
#@markdown ### Training Parameters
horizon = 20  #@param [5, 10, 15, 20]
sequence_length = 60  #@param {type: "slider", min: 30, max: 120, step: 10}

# Build model list
if training_mode == "Single Model":
    MODELS_TO_TRAIN = [single_model]
else:
    MODELS_TO_TRAIN = []
    if train_boosting:
        MODELS_TO_TRAIN.extend(['xgboost', 'lightgbm', 'catboost'])
    if train_classical:
        MODELS_TO_TRAIN.extend(['random_forest', 'logistic', 'svm'])
    if train_neural and GPU_AVAILABLE:
        MODELS_TO_TRAIN.extend(['lstm', 'gru', 'tcn'])
    elif train_neural and not GPU_AVAILABLE:
        print("WARNING: Neural models skipped (no GPU)")

HORIZON = horizon
SEQ_LEN = sequence_length

print(f"Training Mode: {training_mode}")
print(f"Models to train: {MODELS_TO_TRAIN}")
print(f"Horizon: H{HORIZON}")
if any(m in ['lstm', 'gru', 'tcn'] for m in MODELS_TO_TRAIN):
    print(f"Sequence length: {SEQ_LEN}")

In [None]:
#@title 4.2 Train Models { display-mode: "form" }
#@markdown Execute model training based on your selections.

import time
from pathlib import Path
import json

print("=" * 60)
print(" MODEL TRAINING")
print("=" * 60)

TRAINING_RESULTS = {}

try:
    from src.models import ModelRegistry, Trainer, TrainerConfig
    from src.phase1.stages.datasets.container import TimeSeriesDataContainer
    
    # Load data from the GitHub clone, not from Google Drive
    print(f"\nLoading data for horizon H{HORIZON}...")
    container = TimeSeriesDataContainer.from_parquet_dir(
        path=Path('/content/research/data/splits/scaled'),
        horizon=HORIZON
    )
    print(f"  Train samples: {container.splits['train'].n_samples:,}")
    print(f"  Val samples: {container.splits['val'].n_samples:,}")
    print(f"  Features: {container.n_features}")
    
    for i, model_name in enumerate(MODELS_TO_TRAIN, 1):
        print(f"\n{'='*60}")
        print(f" [{i}/{len(MODELS_TO_TRAIN)}] Training: {model_name.upper()}")
        print("=" * 60)
        
        start_time = time.time()
        
        # Configure - save results to Google Drive
        if model_name in ['lstm', 'gru', 'tcn']:
            config = TrainerConfig(
                model_name=model_name,
                horizon=HORIZON,
                sequence_length=SEQ_LEN,
                batch_size=RECOMMENDED_BATCH_SIZE,
                max_epochs=50,
                early_stopping_patience=10,
                output_dir=Path('/content/drive/MyDrive/research/experiments/runs'),
                device="cuda" if GPU_AVAILABLE else "cpu",
                mixed_precision=MIXED_PRECISION,
            )
        else:
            config = TrainerConfig(
                model_name=model_name,
                horizon=HORIZON,
                output_dir=Path('/content/drive/MyDrive/research/experiments/runs'),
            )
        
        trainer = Trainer(config)
        results = trainer.run(container)
        
        elapsed = time.time() - start_time
        
        TRAINING_RESULTS[model_name] = {
            'metrics': results.get('evaluation_metrics', {}),
            'time': elapsed,
            'run_id': results.get('run_id', 'unknown'),
        }
        
        metrics = results.get('evaluation_metrics', {})
        print(f"\n  Results:")
        print(f"    Accuracy: {metrics.get('accuracy', 0):.2%}")
        print(f"    Macro F1: {metrics.get('macro_f1', 0):.4f}")
        print(f"    Time: {elapsed:.1f}s")
        
        # Save cache after each model (in case of kernel restart)
        results_cache = Path('/content/drive/MyDrive/research/experiments/.training_results_cache.json')
        with open(results_cache, 'w') as f:
            json.dump(TRAINING_RESULTS, f, indent=2)
        
except Exception as e:
    print(f"\nError during training: {e}")
    import traceback
    traceback.print_exc()

# Final save
if TRAINING_RESULTS:
    results_cache = Path('/content/drive/MyDrive/research/experiments/.training_results_cache.json')
    with open(results_cache, 'w') as f:
        json.dump(TRAINING_RESULTS, f, indent=2)
    print(f"\nResults cached to: {results_cache}")

print("\n" + "=" * 60)
print(" TRAINING COMPLETE")
print("=" * 60)
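
Because the results cache is rewritten after every model, an interrupted write could leave a truncated JSON file. A minimal sketch of a more defensive variant writes to a temporary file and then renames it into place. The helper names are hypothetical. `os.replace` is atomic on a local POSIX filesystem, but whether that guarantee holds on the mounted Drive folder is an assumption that has not been verified here.

```python
import json
import os
import tempfile
from pathlib import Path


def save_results_cache(results: dict, cache_path: Path) -> None:
    """Write results as JSON to a temporary file, then rename it into place."""
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            # default=float converts numeric objects the json module does not
            # handle natively (for example NumPy scalars) to plain floats.
            json.dump(results, f, indent=2, default=float)
        os.replace(tmp_name, cache_path)
    except BaseException:
        # Clean up the partial temporary file, then re-raise the error.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_results_cache(cache_path: Path) -> dict:
    """Return the cached results, or an empty dict if no cache exists."""
    if not cache_path.exists():
        return {}
    with open(cache_path) as f:
        return json.load(f)


# Demonstration in a temporary directory with placeholder values (not real results).
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "experiments" / ".training_results_cache.json"
    save_results_cache({"xgboost": {"metrics": {"macro_f1": 0.5}, "time": 1.5}}, cache)
    print(load_results_cache(cache))
```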

In [None]:
#@title 4.3 Compare Results { display-mode: "form" }
#@markdown Display comparison of all trained models.

import pandas as pd
import matplotlib.pyplot as plt

if TRAINING_RESULTS:
    print("Model Comparison")
    print("=" * 60)
    
    rows = []
    for model, data in TRAINING_RESULTS.items():
        metrics = data['metrics']
        rows.append({
            'Model': model,
            'Accuracy': metrics.get('accuracy', 0),
            'Macro F1': metrics.get('macro_f1', 0),
            'Weighted F1': metrics.get('weighted_f1', 0),
            'Time (s)': data['time'],
        })
    
    comparison_df = pd.DataFrame(rows)
    comparison_df = comparison_df.sort_values('Macro F1', ascending=False)
    print(comparison_df.to_string(index=False))
    
    if len(TRAINING_RESULTS) > 1:
        fig, axes = plt.subplots(1, 2, figsize=(12, 4))
        
        comparison_df_sorted = comparison_df.sort_values('Accuracy', ascending=True)
        axes[0].barh(comparison_df_sorted['Model'], comparison_df_sorted['Accuracy'])
        axes[0].set_xlabel('Accuracy')
        axes[0].set_title('Model Accuracy Comparison')
        axes[0].set_xlim(0, 1)
        
        comparison_df_sorted = comparison_df.sort_values('Time (s)', ascending=True)
        axes[1].barh(comparison_df_sorted['Model'], comparison_df_sorted['Time (s)'])
        axes[1].set_xlabel('Training Time (seconds)')
        axes[1].set_title('Training Time Comparison')
        
        plt.tight_layout()
        plt.show()
    
    best_model = comparison_df.iloc[0]['Model']
    print(f"\nBest model: {best_model}")
else:
    print("No training results yet. Run Section 4.2 first.")

In [None]:
#@title 4.4 Evaluate on Test Set { display-mode: "form" }
#@markdown Evaluate the best model on the held-out test set.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score
from pathlib import Path

if not TRAINING_RESULTS:
    print("No trained models. Run section 4.2 first.")
else:
    # Find best model
    best_model_name = max(TRAINING_RESULTS, key=lambda m: TRAINING_RESULTS[m]['metrics'].get('macro_f1', 0))
    best_run_id = TRAINING_RESULTS[best_model_name].get('run_id', 'unknown')
    
    print("=" * 60)
    print(f" TEST SET EVALUATION: {best_model_name.upper()}")
    print("=" * 60)
    
    # Load test data
    from src.phase1.stages.datasets.container import TimeSeriesDataContainer
    container = TimeSeriesDataContainer.from_parquet_dir(
        path=Path('/content/research/data/splits/scaled'),
        horizon=HORIZON
    )
    X_test, y_test, _ = container.get_sklearn_arrays('test')
    
    # Load model and predict
    from src.models import ModelRegistry
    model = ModelRegistry.create(best_model_name)
    model_path = Path(f'/content/drive/MyDrive/research/experiments/runs/{best_run_id}/checkpoints/best_model')
    model.load(model_path)
    
    predictions = model.predict(X_test)
    y_pred = predictions.class_predictions
    
    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    macro_f1 = f1_score(y_test, y_pred, average='macro')
    
    print(f"\nTest Set Results:")
    print(f"  Samples: {len(y_test):,}")
    print(f"  Accuracy: {accuracy:.2%}")
    print(f"  Macro F1: {macro_f1:.4f}")
    
    # Classification report
    print(f"\nClassification Report:")
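    class_names = ['Short', 'Neutral', 'Long']
    class_labels = [-1, 0, 1]  # fixed order keeps names aligned even if a class is absent
    print(classification_report(y_test, y_pred, labels=class_labels,
                                target_names=class_names, zero_division=0))
    
    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred, labels=class_labels)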
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                xticklabels=class_names, yticklabels=class_names, ax=ax)
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
    ax.set_title(f'Test Set Confusion Matrix - {best_model_name.upper()}')
    plt.tight_layout()
    plt.show()
    
    print(f"\nModel loaded from: {model_path}")
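
To make the Macro F1 figure concrete, the sketch below computes per-class F1 by hand on a toy example, using F1 = 2·TP / (2·TP + FP + FN) and an unweighted mean over the three classes. It is an illustration rather than a replacement for `sklearn.metrics.f1_score`. The function names and toy labels are made up for this example.

```python
def per_class_f1(y_true, y_pred, classes=(-1, 0, 1)) -> dict:
    """F1 per class from true-positive, false-positive and false-negative counts."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, classes=(-1, 0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)


# Toy example: the single Short sample is missed entirely.
y_true = [-1, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, -1]
print(per_class_f1(y_true, y_pred))  # Short 0.0, Neutral 0.8, Long 0.8
print(macro_f1(y_true, y_pred))      # about 0.533, although accuracy is 4/6
```

One class with an F1 of zero pulls the macro average down by a third, which is why a severely underrepresented class hurts Macro F1 much more than it hurts accuracy.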

In [None]:
#@title 4.5 Trading Performance Metrics { display-mode: "form" }
#@markdown Calculate trading-relevant performance metrics.

import numpy as np
import pandas as pd

print("=" * 60)
print(" TRADING PERFORMANCE ANALYSIS")
print("=" * 60)

if 'y_pred' in dir() and 'y_test' in dir():
    # Win rate per class
    print("\n1. WIN RATE BY PREDICTED CLASS:")
    for pred_class, name in [(-1, 'Short'), (0, 'Neutral'), (1, 'Long')]:
        mask = y_pred == pred_class
        if mask.sum() > 0:
            correct = (y_test[mask] == pred_class).sum()
            total = mask.sum()
            win_rate = correct / total * 100
            print(f"   {name}: {correct}/{total} = {win_rate:.1f}%")
    
    # Directional accuracy (ignoring neutral)
    print("\n2. DIRECTIONAL ACCURACY (Long/Short only):")
    directional_mask = (y_pred != 0) & (y_test != 0)
    if directional_mask.sum() > 0:
        # Correct direction: both positive or both negative
        correct_direction = ((y_pred[directional_mask] > 0) == (y_test[directional_mask] > 0)).sum()
        total_directional = directional_mask.sum()
        dir_acc = correct_direction / total_directional * 100
        print(f"   Correct: {correct_direction}/{total_directional} = {dir_acc:.1f}%")
    
    # Win/loss counts (simplified: a correct prediction counts as a win)
    print("\n3. TRADING STATISTICS:")
    wins = (y_pred == y_test).sum()
    losses = (y_pred != y_test).sum()
    print(f"   Wins: {wins:,}")
    print(f"   Losses: {losses:,}")
    print(f"   Win Rate: {wins/(wins+losses)*100:.1f}%")
    
    # Consecutive losses
    results = (y_pred == y_test).astype(int)
    max_consecutive_losses = 0
    current_losses = 0
    for r in results:
        if r == 0:
            current_losses += 1
            max_consecutive_losses = max(max_consecutive_losses, current_losses)
        else:
            current_losses = 0
    print(f"   Max Consecutive Losses: {max_consecutive_losses}")
    
    # Expected value (simplified)
    print("\n4. EXPECTED VALUE ANALYSIS:")
    print("   (Assuming +1 unit for correct, -1 for incorrect)")
    ev = (wins - losses) / len(y_test)
    print(f"   Expected Value per Trade: {ev:+.4f} units")
    print(f"   After 100 trades: {ev * 100:+.2f} units")
    
    # Profit factor
    if losses > 0:
        profit_factor = wins / losses
        print(f"   Profit Factor: {profit_factor:.2f}")
    else:
        print("   Profit Factor: Infinite (no losses)")
    
else:
    print("Run section 4.4 (Test Set Evaluation) first.")

print("\n" + "=" * 60)
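
The streak and expected-value calculations in cell 4.5 can be sketched as standalone functions over a sequence of win/loss flags. This uses the same simplification as the cell (+1 unit for a correct prediction, -1 for an incorrect one). It does not model real trade sizes or costs. The function names and the toy sequence are illustrative.

```python
from itertools import groupby


def max_consecutive_losses(outcomes) -> int:
    """Length of the longest run of losses (falsy values) in the sequence."""
    return max(
        (sum(1 for _ in run) for is_win, run in groupby(outcomes, key=bool) if not is_win),
        default=0,
    )


def unit_stake_summary(outcomes) -> dict:
    """Win rate, expected value per trade, and profit factor for +1/-1 payoffs.

    Expects a non-empty sequence of truthy (win) and falsy (loss) values.
    """
    wins = sum(1 for o in outcomes if o)
    losses = len(outcomes) - wins
    return {
        "win_rate": wins / len(outcomes),
        "expected_value": (wins - losses) / len(outcomes),
        "profit_factor": wins / losses if losses else float("inf"),
    }


# Toy sequence: 1 = correct prediction, 0 = incorrect prediction.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1]
print(max_consecutive_losses(outcomes))  # 3
print(unit_stake_summary(outcomes))
```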

---
## 5. Phase 3: Cross-Validation (Optional)

---
## 6. Phase 4: Ensemble Training (Optional)
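
This phase offers voting, stacking, and blending ensembles. The repository's implementations are not shown in this notebook, so the sketch below illustrates only the general idea of soft voting: average the class probabilities of several models and pick the class with the highest mean. The function name, the probability layout, and the toy numbers are assumptions made for this example.

```python
def soft_vote(prob_sets, classes=(-1, 0, 1)):
    """Average per-model class probabilities and return the winning class per sample.

    prob_sets: one entry per model, each a list of rows (one per sample) holding
    probabilities in the same order as `classes`.
    """
    n_models = len(prob_sets)
    predictions = []
    for rows in zip(*prob_sets):  # rows: the same sample as seen by each model
        avg = [sum(col) / n_models for col in zip(*rows)]
        predictions.append(classes[avg.index(max(avg))])
    return predictions


# Toy probabilities for two samples from three hypothetical models.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.3, 0.5], [0.1, 0.3, 0.6]]
model_c = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
print(soft_vote([model_a, model_b, model_c]))  # [-1, 1]
```

Stacking and blending differ in that a meta-learner is trained on the base models' predictions instead of taking a plain average.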

In [None]:
#@title 6.1 Train Ensemble { display-mode: "form" }
#@markdown Combine multiple models into an ensemble for improved predictions.

train_ensemble = False  #@param {type: "boolean"}
ensemble_type = "blending"  #@param ["voting", "stacking", "blending"]
base_models = "xgboost,lightgbm,catboost"  #@param {type: "string"}
meta_learner = "logistic"  #@param ["logistic", "random_forest", "xgboost"]

if train_ensemble:
    import time
    from pathlib import Path
    
    # Parse base models
    base_model_list = [m.strip() for m in base_models.split(',')]
    
    print("=" * 60)
    print(f" {ensemble_type.upper()} ENSEMBLE TRAINING")
    print("=" * 60)
    print(f"Base models: {', '.join(base_model_list)}")
    print(f"Meta-learner: {meta_learner}")
    print()
    
    # Store results for comparison
    base_model_results = {}
    ensemble_start_time = time.time()
    
    try:
        from src.models import ModelRegistry, Trainer, TrainerConfig
        from src.phase1.stages.datasets.container import TimeSeriesDataContainer
        
        # Load data from GitHub clone
        print("Loading data...")
        container = TimeSeriesDataContainer.from_parquet_dir(
            path=Path('/content/research/data/splits/scaled'),
            horizon=HORIZON
        )
        print(f"  Samples: train={container.splits['train'].n_samples:,}, "
              f"val={container.splits['val'].n_samples:,}")
        print()
        
        # Train each base model with progress
        print("-" * 60)
        print(" Training Base Models")
        print("-" * 60)
        
        for i, model_name in enumerate(base_model_list, 1):
            print(f"[{i}/{len(base_model_list)}] Training {model_name}...", end=" ", flush=True)
            model_start = time.time()
            
            try:
                # Configure base model
                if model_name in ['lstm', 'gru', 'tcn']:
                    config = TrainerConfig(
                        model_name=model_name,
                        horizon=HORIZON,
                        sequence_length=SEQ_LEN if 'SEQ_LEN' in dir() else 60,
                        batch_size=RECOMMENDED_BATCH_SIZE if 'RECOMMENDED_BATCH_SIZE' in dir() else 256,
                        max_epochs=50,
                        early_stopping_patience=10,
                        output_dir=Path('/content/drive/MyDrive/research/experiments/runs'),
                        device="cuda" if ('GPU_AVAILABLE' in dir() and GPU_AVAILABLE) else "cpu",
                        mixed_precision=MIXED_PRECISION if 'MIXED_PRECISION' in dir() else False,
                    )
                else:
                    config = TrainerConfig(
                        model_name=model_name,
                        horizon=HORIZON,
                        output_dir=Path('/content/drive/MyDrive/research/experiments/runs'),
                    )
                
                trainer = Trainer(config)
                results = trainer.run(container)
                
                model_elapsed = time.time() - model_start
                metrics = results.get('evaluation_metrics', {})
                
                base_model_results[model_name] = {
                    'accuracy': metrics.get('accuracy', 0),
                    'macro_f1': metrics.get('macro_f1', 0),
                    'weighted_f1': metrics.get('weighted_f1', 0),
                    'time': model_elapsed,
                    'run_id': results.get('run_id', 'unknown'),
                }
                
                print(f"done ({model_elapsed:.1f}s) - Acc: {metrics.get('accuracy', 0):.1%}")
                
            except Exception as e:
                print(f"FAILED: {e}")
                base_model_results[model_name] = {
                    'accuracy': 0, 'macro_f1': 0, 'weighted_f1': 0,
                    'time': time.time() - model_start, 'error': str(e)
                }
        
        print()
        
        # Train ensemble (meta-learner)
        print("-" * 60)
        print(" Training Ensemble Meta-Learner")
        print("-" * 60)
        print(f"Training {ensemble_type} with {meta_learner} meta-learner...", end=" ", flush=True)
        
        meta_start = time.time()
        
        # Configure ensemble
        ensemble_config = TrainerConfig(
            model_name=ensemble_type,
            horizon=HORIZON,
            output_dir=Path('/content/drive/MyDrive/research/experiments/runs'),
            model_config={
                "base_model_names": base_model_list,
                "meta_learner": meta_learner,
            }
        )
        
        ensemble_trainer = Trainer(ensemble_config)
        ensemble_results = ensemble_trainer.run(container)
        
        meta_elapsed = time.time() - meta_start
        ensemble_metrics = ensemble_results.get('evaluation_metrics', {})
        
        print(f"done ({meta_elapsed:.1f}s)")
        
        # Store ensemble results
        ensemble_accuracy = ensemble_metrics.get('accuracy', 0)
        ensemble_macro_f1 = ensemble_metrics.get('macro_f1', 0)
        ensemble_weighted_f1 = ensemble_metrics.get('weighted_f1', 0)
        
        total_elapsed = time.time() - ensemble_start_time
        
        # Display results
        print()
        print("=" * 60)
        print(" ENSEMBLE RESULTS")
        print("=" * 60)
        print(f"Accuracy:     {ensemble_accuracy:.2%}")
        print(f"Macro F1:     {ensemble_macro_f1:.4f}")
        print(f"Weighted F1:  {ensemble_weighted_f1:.4f}")
        print()
        
        # Comparison table
        print("-" * 60)
        print(" Comparison: Ensemble vs Base Models")
        print("-" * 60)
        print(f"{'Model':<15} {'Accuracy':>10} {'Macro F1':>10} {'Time':>10}")
        print("-" * 45)
        
        # Find best accuracy
        all_accuracies = {k: v['accuracy'] for k, v in base_model_results.items()}
        all_accuracies[ensemble_type.upper()] = ensemble_accuracy
        best_model = max(all_accuracies, key=all_accuracies.get)
        
        # Print base model results
        for model_name, data in base_model_results.items():
            # Flag only runs that actually raised; a genuine 0 score is still printed
            acc_str = "ERROR" if 'error' in data else f"{data['accuracy']:.2%}"
            f1_str = "N/A" if 'error' in data else f"{data['macro_f1']:.4f}"
            time_str = f"{data['time']:.1f}s"
            marker = " <-- Best!" if model_name == best_model else ""
            print(f"{model_name:<15} {acc_str:>10} {f1_str:>10} {time_str:>10}{marker}")
        
        # Print ensemble result (highlighted)
        marker = " <-- Best!" if ensemble_type.upper() == best_model else ""
        print(f"{ensemble_type.upper():<15} {ensemble_accuracy:>9.2%} {ensemble_macro_f1:>10.4f} {meta_elapsed:>9.1f}s{marker}")
        print("-" * 45)
        
        # Calculate improvement (default=0.0 avoids a ValueError when every base model failed)
        best_base_acc = max(
            (v['accuracy'] for v in base_model_results.values() if v['accuracy'] > 0),
            default=0.0,
        )
        improvement = ensemble_accuracy - best_base_acc
        
        print()
        if improvement > 0:
            print(f"Ensemble improvement: +{improvement:.2%} over best base model")
        elif improvement < 0:
            print(f"Ensemble underperformed by: {abs(improvement):.2%}")
        else:
            print("Ensemble matched best base model performance")
        
        print(f"Total training time: {total_elapsed:.1f}s")
        print()
        print("=" * 60)
        
        # Store in TRAINING_RESULTS for later comparison (create it if no earlier cell did)
        if 'TRAINING_RESULTS' not in globals():
            TRAINING_RESULTS = {}
        
        # Add base models to training results
        for model_name, data in base_model_results.items():
            if 'error' not in data:
                TRAINING_RESULTS[model_name] = {
                    'metrics': {
                        'accuracy': data['accuracy'],
                        'macro_f1': data['macro_f1'],
                        'weighted_f1': data['weighted_f1'],
                    },
                    'time': data['time'],
                    'run_id': data.get('run_id', 'unknown'),
                }
        
        # Add ensemble to training results
        TRAINING_RESULTS[f"{ensemble_type}_ensemble"] = {
            'metrics': {
                'accuracy': ensemble_accuracy,
                'macro_f1': ensemble_macro_f1,
                'weighted_f1': ensemble_weighted_f1,
            },
            'time': total_elapsed,
            'run_id': ensemble_results.get('run_id', 'unknown'),
            'base_models': base_model_list,
            'meta_learner': meta_learner,
        }
        
        print(f"Results stored in TRAINING_RESULTS['{ensemble_type}_ensemble']")
        
    except ImportError as e:
        print(f"\nImport Error: {e}")
        print("Make sure all required modules are available.")
        print("Try running: !pip install xgboost lightgbm catboost scikit-learn")
        
    except FileNotFoundError as e:
        print(f"\nData Error: {e}")
        print("Processed data not found. Run Section 3.2 first to prepare data.")
        
    except Exception as e:
        print(f"\nUnexpected Error: {e}")
        import traceback
        traceback.print_exc()
        print()
        print("Troubleshooting tips:")
        print("  1. Verify data exists: !ls /content/research/data/splits/scaled/")
        print("  2. Check model registry: from src.models import ModelRegistry; print(ModelRegistry.list_all())")
        print("  3. Try training base models individually first (Section 4.2)")
        
else:
    print("Ensemble training skipped.")
    print("Enable 'train_ensemble' checkbox above to run.")
    print()
    print("Available ensemble types:")
    print("  - voting: Weighted average of base model predictions")
    print("  - stacking: Train meta-learner on out-of-fold predictions")
    print("  - blending: Train meta-learner on holdout set predictions")
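
The three ensemble types listed above differ in how base-model predictions are combined. The sketch below is a minimal, dependency-free illustration, not the repository's implementation: `weighted_vote`, `argmax_labels`, and `meta_features` are hypothetical helper names, and the probabilities are toy values. Voting averages class probabilities directly. Stacking and blending instead build a feature matrix from base-model predictions and train a meta-learner on it; as described above, the difference is that stacking uses out-of-fold predictions while blending uses predictions on a holdout set.

```python
from typing import List, Sequence

Probas = Sequence[Sequence[float]]  # probas[i][c]: probability of class c for sample i


def weighted_vote(model_probas: Sequence[Probas], weights: Sequence[float]) -> List[List[float]]:
    """Voting: weighted average of each model's class probabilities."""
    total = sum(weights)
    n_samples = len(model_probas[0])
    n_classes = len(model_probas[0][0])
    return [
        [
            sum(w * probas[i][c] for probas, w in zip(model_probas, weights)) / total
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def argmax_labels(probas: Probas) -> List[int]:
    """Pick the most probable class index for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probas]


def meta_features(model_probas: Sequence[Probas]) -> List[List[float]]:
    """Stacking/blending input: concatenate every model's probabilities per sample."""
    return [
        [p for probas in model_probas for p in probas[i]]
        for i in range(len(model_probas[0]))
    ]


# Toy probabilities for two hypothetical base models, two samples, three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.2, 0.6], [0.1, 0.7, 0.2]]

equal_labels = argmax_labels(weighted_vote([model_a, model_b], [1.0, 1.0]))
weighted_labels = argmax_labels(weighted_vote([model_a, model_b], [1.0, 3.0]))
features = meta_features([model_a, model_b])

print(equal_labels)     # [0, 1]
print(weighted_labels)  # [2, 1]
```

Raising the weight of `model_b` flips the first sample from class 0 to class 2, which is why the voting weights matter. The rows of `features` are what a meta-learner would be trained on in the stacking and blending cases.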

---
## 7. Save Results & Next Steps

In [None]:
#@title 7.1 Summary & Saved Artifacts { display-mode: "form" }
#@markdown Display summary and location of all saved files.

from pathlib import Path

print("=" * 60)
print(" SESSION SUMMARY")
print("=" * 60)

# Data summary
print("\n DATA (from GitHub clone):")
splits_dir = Path('/content/research/data/splits/scaled')
if splits_dir.exists():
    for f in sorted(splits_dir.glob("*.parquet")):
        size_mb = f.stat().st_size / 1e6
        print(f"  {f.name}: {size_mb:.1f} MB")
else:
    print(f"  Not found: {splits_dir}")

# Training results
print("\n TRAINED MODELS (saved to Google Drive):")
experiments_dir = Path('/content/drive/MyDrive/research/experiments/runs')
if experiments_dir.exists():
    # Filter to directories first so stray files do not use up the five slots
    runs = sorted(p for p in experiments_dir.iterdir() if p.is_dir())
    for run_dir in runs[-5:]:  # last five in name order
        print(f"  {run_dir.name}")

# Next steps
print("\n NEXT STEPS:")
print("  1. Review model metrics in Google Drive: experiments/runs/")
print("  2. Try different model configurations")
print("  3. Run cross-validation for robust evaluation")
print("  4. Train an ensemble and compare it against the best base model")
print("  5. Export best model for production")

print("\n" + "=" * 60)
print(" Data loaded from: /content/research")
print(" Results saved to: /content/drive/MyDrive/research")
print("=" * 60)
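
The summary cell lists run directories, but `TRAINING_RESULTS` itself lives only in notebook memory and is lost when the Colab runtime disconnects. The following is a minimal sketch of persisting it as JSON and ranking models by a metric. It assumes the dict layout used in the ensemble cell (`metrics`, `time`, `run_id`); the helper names `save_training_results` and `rank_models` and the file name `training_results.json` are hypothetical, and the numbers are placeholders rather than real results.

```python
import json
import tempfile
from pathlib import Path


def save_training_results(results: dict, out_dir: Path,
                          filename: str = "training_results.json") -> Path:
    """Write the results dict to out_dir/filename as JSON and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    with out_path.open("w") as f:
        # default=str stringifies values json cannot encode (e.g. Path objects)
        json.dump(results, f, indent=2, default=str)
    return out_path


def rank_models(results: dict, metric: str = "macro_f1") -> list:
    """Return model names sorted from best to worst on the given metric."""
    return sorted(
        results,
        key=lambda name: results[name]["metrics"].get(metric, 0.0),
        reverse=True,
    )


# Placeholder values for illustration only, not real results.
example_results = {
    "model_a": {
        "metrics": {"accuracy": 0.50, "macro_f1": 0.40, "weighted_f1": 0.45},
        "time": 1.0,
        "run_id": "demo_a",
    },
    "model_b": {
        "metrics": {"accuracy": 0.55, "macro_f1": 0.48, "weighted_f1": 0.50},
        "time": 2.0,
        "run_id": "demo_b",
    },
}

# A temporary directory keeps the demo runnable outside Colab.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_training_results(example_results, Path(tmp))
    reloaded = json.loads(saved.read_text())

ranking = rank_models(example_results)
print(ranking)  # ['model_b', 'model_a']
```

In this notebook the call would likely be `save_training_results(TRAINING_RESULTS, Path('/content/drive/MyDrive/research/results'))`, since that Drive directory is created in cell 1.1.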

---
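## Appendix: Quick Commands

Run these from a notebook cell with `/content/research` as the working directory. The leading `!` is Colab/IPython shell syntax; drop it when running in a regular terminal.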

```bash
# Train single model
!python scripts/train_model.py --model xgboost --horizon 20

# Train neural model
!python scripts/train_model.py --model lstm --horizon 20 --seq-len 60

# Run cross-validation
!python scripts/run_cv.py --models xgboost,lightgbm --horizons 20 --n-splits 5

# Train ensemble
!python scripts/train_model.py --model voting --horizon 20

# List all models
!python scripts/train_model.py --list-models
```