# Notebook 5: Random Forest 

---

## Table of Contents

1. [Setup and Imports](#1-setup-and-imports)
2. [Data Loading Functions](#2-data-loading-functions)
3. [Evaluation Metrics](#3-evaluation-metrics)
4. [Hyperparameter Tuning on AAPL](#4-hyperparameter-tuning-on-aapl)
5. [Training on All Assets with Tuned Parameters](#5-training-on-all-assets-with-tuned-parameters)
6. [Results Summary by Horizon](#6-results-summary-by-horizon)
7. [Cross-Asset Generalization Experiment](#7-cross-asset-generalization-experiment)
8. [Generalization Results Analysis](#8-generalization-results-analysis)

---

## 1. Setup and Imports

In [78]:
import os
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    classification_report, confusion_matrix, roc_auc_score
)

# Directory containing preprocessed sequences
SEQUENCES_DIR = "../data_new/sequences/"


## 2. Data Loading Functions

In [79]:
SEQUENCES_DIR = "../data_new/sequences/"

def load_sequences(asset, horizon, sequences_dir=SEQUENCES_DIR):
    """
    Load preprocessed sequences stored as .npz files.
    Each file contains train/val/test splits and metadata.
    """
    filename = os.path.join(sequences_dir, f"{asset}_{horizon}_sequences.npz")
    if not os.path.exists(filename):
        raise FileNotFoundError(filename)

    # The context manager closes the .npz archive once the arrays are read
    with np.load(filename) as data:
        return {
            "X_train": data["X_train"],
            "y_train": data["y_train"].astype(int),
            "X_val": data["X_val"],
            "y_val": data["y_val"].astype(int),
            "X_test": data["X_test"],
            "y_test": data["y_test"].astype(int),
            "sequence_length": int(data["sequence_length"]),
            "n_features": int(data["n_features"]),
        }

def load_class_weights(sequences_dir=SEQUENCES_DIR):
    """
    Load class weights computed during preprocessing to handle class imbalance.
    """
    filename = f"{sequences_dir}class_weights.pkl"
    with open(filename, "rb") as f:
        return pickle.load(f)

def flatten_sequences(X):
    """
    Flatten a 3D tensor (n_samples, seq_len, n_features)
    into a 2D matrix suitable for classical ML models.
    """
    n, seq_len, n_feat = X.shape
    return X.reshape(n, seq_len * n_feat)
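
As a quick sanity check on the layout produced by `flatten_sequences`, the sketch below applies the same reshape to a toy tensor. The shape `(2, 3, 4)` is purely illustrative and is not the shape of the real data. With NumPy's default row-major order, the value for time step `t` and feature `f` ends up in column `t * n_feat + f`, so each flattened row is the time steps laid end to end.

```python
import numpy as np

def flatten_sequences(X):
    """Same reshape as in the notebook: (n, seq_len, n_feat) -> (n, seq_len * n_feat)."""
    n, seq_len, n_feat = X.shape
    return X.reshape(n, seq_len * n_feat)

# Toy tensor: 2 samples, 3 time steps, 4 features (illustrative values only)
X_toy = np.arange(2 * 3 * 4).reshape(2, 3, 4)
X_flat = flatten_sequences(X_toy)

# Column index of (time step t, feature f) in the flattened row
t, f, n_feat = 2, 1, X_toy.shape[2]
col = t * n_feat + f

print(X_flat.shape)                           # (2, 12)
print(X_flat[0, col] == X_toy[0, t, f])       # True
```

A tree split on column `col` therefore tests one feature at one specific lag, which is how the forest sees temporal position after flattening.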


## 3. Evaluation Metrics

In [80]:
def evaluate(y_true, y_pred, y_proba, title):
    """
    Print all key classification metrics for a given model.
    """
    print(f"\n=== {title} ===")
    print("Accuracy :", accuracy_score(y_true, y_pred))
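    print("Precision:", precision_score(y_true, y_pred, zero_division=0))
    print("Recall   :", recall_score(y_true, y_pred, zero_division=0))
    print("F1-score :", f1_score(y_true, y_pred, zero_division=0))

    try:
        print("AUC-ROC  :", roc_auc_score(y_true, y_proba))
    except ValueError:
        # Many scikit-learn versions raise ValueError when y_true has a single class
        print("AUC-ROC  : N/A")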

    print("\nClassification report:")
    print(classification_report(y_true, y_pred))

    print("Confusion matrix:")
    print(confusion_matrix(y_true, y_pred))
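
For intuition about the AUC-ROC figure printed by `evaluate`, here is a minimal NumPy sketch of its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The helper name `pairwise_auc` is hypothetical, and the quadratic pair matrix is only practical for small inputs. `roc_auc_score` remains the function the notebook actually uses.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        # Undefined when only one class is present
        return float("nan")
    diff = pos[:, None] - neg[None, :]          # all positive-negative score gaps
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

auc_ranked = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc_constant = pairwise_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
print(auc_ranked, auc_constant)
```

A constant score yields 0.5, which is why values close to 0.5 indicate that the predicted probabilities rank positives and negatives no better than chance.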


## 4. Hyperparameter Tuning on AAPL

In [82]:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

ASSETS = ["AAPL", "AMZN", "NVDA", "SPY", "BTC-USD"]
HORIZONS = ["1day", "1week", "1month"]

def tune_rf_on_AAPL_for_horizon(horizon):
    print("\n" + "="*70)
    print(f"Tuning Random Forest for horizon={horizon} (AAPL as reference asset)")
    print("="*70)

    # Load AAPL sequence data
    seq = load_sequences("AAPL", horizon)

    X_train = seq["X_train"]; y_train = seq["y_train"]
    X_val   = seq["X_val"];   y_val   = seq["y_val"]

    # Load class weights for AAPL for this horizon
    weights = load_class_weights()["AAPL"][horizon]

    # Flatten the 3D tensors into 2D arrays
    X_train_flat = flatten_sequences(X_train)
    X_val_flat   = flatten_sequences(X_val)

    # Hyperparameters to test
    n_estimators_list = [200, 400, 800]
    max_features_list = ["sqrt", "log2"]

    results = []

    for n in n_estimators_list:
        for feature in max_features_list:
            print(f"Test: n_estimators={n}, max_features={feature}")

            rf = RandomForestClassifier(
                n_estimators=n,
                max_depth=None,               
                min_samples_split=2,
                min_samples_leaf=1,
                max_features=feature,
                class_weight=weights,
                n_jobs=-1,
                random_state=42
            )

            # Train on AAPL (train split)
            rf.fit(X_train_flat, y_train)

            # Evaluate on AAPL validation set
            y_val_proba = rf.predict_proba(X_val_flat)[:, 1]
            y_val_pred  = (y_val_proba >= 0.5).astype(int)

            acc  = accuracy_score(y_val, y_val_pred)
            prec = precision_score(y_val, y_val_pred, zero_division=0)
            rec  = recall_score(y_val, y_val_pred, zero_division=0)
            f1   = f1_score(y_val, y_val_pred, zero_division=0)
            try:
                auc = roc_auc_score(y_val, y_val_proba)
            except ValueError:
                # AUC is undefined when y_val contains a single class
                auc = float("nan")

            results.append({
                "horizon": horizon,
                "n_estimators": n,
                "max_features": feature,
                "val_accuracy": acc,
                "val_precision": prec,
                "val_recall": rec,
                "val_f1": f1,
                "val_auc_roc": auc
            })

    # Convert results to DataFrame
    df_tuning = pd.DataFrame(results)

    # Select best configuration based on validation F1-score
    best = df_tuning.sort_values(by="val_f1", ascending=False).iloc[0]

    print("\n>>> Best parameters for horizon", horizon)
    print(best)

    return best, df_tuning


# Dictionary to store best parameters per horizon
best_params_by_horizon = {}

for horizon in HORIZONS:
    best, df_tuning = tune_rf_on_AAPL_for_horizon(horizon)

    # Store best parameters for this horizon
    best_params_by_horizon[horizon] = {
        "n_estimators": int(best["n_estimators"]),
        "max_features": best["max_features"],
        "max_depth": None   # We keep max_depth = None as specified
    }

best_params_by_horizon



Tuning Random Forest for horizon=1day (AAPL as reference asset)
Test: n_estimators=200, max_features=sqrt
Test: n_estimators=200, max_features=log2
Test: n_estimators=400, max_features=sqrt
Test: n_estimators=400, max_features=log2
Test: n_estimators=800, max_features=sqrt
Test: n_estimators=800, max_features=log2

>>> Best parameters for horizon 1day
horizon              1day
n_estimators          200
max_features         sqrt
val_accuracy     0.455859
val_precision    0.507812
val_recall       0.379009
val_f1           0.434057
val_auc_roc       0.48107
Name: 0, dtype: object

Tuning Random Forest for horizon=1week (AAPL as reference asset)
Test: n_estimators=200, max_features=sqrt
Test: n_estimators=200, max_features=log2
Test: n_estimators=400, max_features=sqrt
Test: n_estimators=400, max_features=log2
Test: n_estimators=800, max_features=sqrt
Test: n_estimators=800, max_features=log2

>>> Best parameters for horizon 1week
horizon             1week
n_estimators          800
max_f

{'1day': {'n_estimators': 200, 'max_features': 'sqrt', 'max_depth': None},
 '1week': {'n_estimators': 800, 'max_features': 'sqrt', 'max_depth': None},
 '1month': {'n_estimators': 800, 'max_features': 'sqrt', 'max_depth': None}}
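
To put the two `max_features` options in perspective: per scikit-learn's documented behavior, `"sqrt"` and `"log2"` set the number of candidate features examined at each split to the square root or base-2 logarithm of the input width, truncated to an integer. The sketch below evaluates both for the flattened widths 273 and 280 that appear in the Section 8 log. The helper `features_per_split` is hypothetical and only mirrors that rule; it is not part of scikit-learn.

```python
import math

def features_per_split(n_features, rule):
    """Approximate number of candidate features per split for a given rule."""
    if rule == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if rule == "log2":
        return max(1, int(math.log2(n_features)))
    raise ValueError(f"unknown rule: {rule}")

# Flattened input widths taken from the feature-mismatch messages in the log
for width in (273, 280):
    print(width, features_per_split(width, "sqrt"), features_per_split(width, "log2"))
```

Under this rule each split considers about 16 columns with `"sqrt"` and about 8 with `"log2"`, out of several hundred lagged features.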

## 5. Training on All Assets with Tuned Parameters

In [84]:
rf_records = []
class_weights_dict = load_class_weights()

for horizon in HORIZONS:
    params = best_params_by_horizon[horizon]

    print("\n" + "="*70)
    print(f"Training Random Forest for horizon={horizon} using parameters optimized on AAPL")
    print("Parameters:", params)
    print("="*70)

    for asset in ASSETS:
        print(f"\n>>> Asset = {asset}, Horizon = {horizon}")

        try:
            seq = load_sequences(asset, horizon)
        except FileNotFoundError as e:
            print(f"[SKIP] No data available for {asset}-{horizon}: {e}")
            continue

        # Load train / validation / test splits
        X_train = seq["X_train"]; y_train = seq["y_train"]
        X_val   = seq["X_val"];   y_val   = seq["y_val"]
        X_test  = seq["X_test"];  y_test  = seq["y_test"]

        # Load class weights for this asset/horizon
        weights = class_weights_dict[asset][horizon]

        # Flatten sequences
        X_train_flat = flatten_sequences(X_train)
        X_val_flat   = flatten_sequences(X_val)
        X_test_flat  = flatten_sequences(X_test)

        # Merge train and validation sets
        X_trainval_flat = np.vstack([X_train_flat, X_val_flat])
        y_trainval      = np.concatenate([y_train, y_val])

        # Initialize Random Forest
        rf = RandomForestClassifier(
            n_estimators=params["n_estimators"],
            max_features=params["max_features"],
            max_depth=params["max_depth"],
            min_samples_split=2,
            min_samples_leaf=1,
            class_weight=weights,
            n_jobs=-1,
            random_state=42
        )

        # Fit model on combined train + validation set
        rf.fit(X_trainval_flat, y_trainval)

        # Predictions on test set
        y_test_proba = rf.predict_proba(X_test_flat)[:, 1]
        y_test_pred  = (y_test_proba >= 0.5).astype(int)

        # Compute performance metrics
        acc  = accuracy_score(y_test, y_test_pred)
        prec = precision_score(y_test, y_test_pred, zero_division=0)
        rec  = recall_score(y_test, y_test_pred, zero_division=0)
        f1   = f1_score(y_test, y_test_pred, zero_division=0)
        try:
            auc = roc_auc_score(y_test, y_test_proba)
        except ValueError:
            # AUC is undefined when y_test contains a single class
            auc = float("nan")

        # Store results
        rf_records.append({
            "asset": asset,
            "horizon": horizon,
            "n_estimators": params["n_estimators"],
            "max_features": params["max_features"],
            "max_depth": params["max_depth"],
            "accuracy": acc,
            "precision": prec,
            "recall": rec,
            "f1": f1,
            "auc_roc": auc
        })

# Convert results to DataFrame
df_rf_tuned = pd.DataFrame(rf_records)
df_rf_tuned



Training Random Forest for horizon=1day using parameters optimized on AAPL
Parameters: {'n_estimators': 200, 'max_features': 'sqrt', 'max_depth': None}

>>> Asset = AAPL, Horizon = 1day

>>> Asset = AMZN, Horizon = 1day

>>> Asset = NVDA, Horizon = 1day

>>> Asset = SPY, Horizon = 1day

>>> Asset = BTC-USD, Horizon = 1day

Training Random Forest for horizon=1week using parameters optimized on AAPL
Parameters: {'n_estimators': 800, 'max_features': 'sqrt', 'max_depth': None}

>>> Asset = AAPL, Horizon = 1week

>>> Asset = AMZN, Horizon = 1week

>>> Asset = NVDA, Horizon = 1week

>>> Asset = SPY, Horizon = 1week

>>> Asset = BTC-USD, Horizon = 1week

Training Random Forest for horizon=1month using parameters optimized on AAPL
Parameters: {'n_estimators': 800, 'max_features': 'sqrt', 'max_depth': None}

>>> Asset = AAPL, Horizon = 1month

>>> Asset = AMZN, Horizon = 1month

>>> Asset = NVDA, Horizon = 1month

>>> Asset = SPY, Horizon = 1month

>>> Asset = BTC-USD, Horizon = 1month


|   | asset | horizon | n_estimators | max_features | max_depth | accuracy | precision | recall | f1 | auc_roc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAPL | 1day | 200 | sqrt | None | 0.505582 | 0.527512 | 0.662162 | 0.587217 | 0.498696 |
| 1 | AMZN | 1day | 200 | sqrt | None | 0.500797 | 0.513475 | 0.56124 | 0.536296 | 0.511292 |
| 2 | NVDA | 1day | 200 | sqrt | None | 0.50319 | 0.53835 | 0.550296 | 0.544257 | 0.502454 |
| 3 | SPY | 1day | 200 | sqrt | None | 0.53748 | 0.547597 | 0.881752 | 0.675615 | 0.494545 |
| 4 | BTC-USD | 1day | 200 | sqrt | None | 0.518229 | 0.537129 | 0.5425 | 0.539801 | 0.510577 |
| 5 | AAPL | 1week | 800 | sqrt | None | 0.552396 | 0.556417 | 0.947214 | 0.701031 | 0.517359 |
| 6 | AMZN | 1week | 800 | sqrt | None | 0.536962 | 0.539154 | 0.91172 | 0.677602 | 0.557481 |
| 7 | NVDA | 1week | 800 | sqrt | None | 0.482535 | 0.563077 | 0.50904 | 0.534697 | 0.473808 |
| 8 | SPY | 1week | 800 | sqrt | None | 0.547522 | 0.60396 | 0.735925 | 0.663444 | 0.484101 |
| 9 | BTC-USD | 1week | 800 | sqrt | None | 0.50604 | 0.552063 | 0.667458 | 0.604301 | 0.456072 |
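
The `class_weight` argument in the training loop above receives a dictionary loaded from `class_weights.pkl`. How those weights were computed is not shown in this notebook. One common choice, assumed here purely for illustration, is the "balanced" heuristic `n_samples / (n_classes * count)`, sketched below with a hypothetical helper `balanced_class_weights`.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class inversely to its frequency (assumed heuristic, not the
    notebook's confirmed preprocessing)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    # Keys are class labels, values are weights, matching the dict form
    # accepted by RandomForestClassifier(class_weight=...)
    return {int(c): float(w) for c, w in zip(classes, weights)}

y_toy = np.array([0, 0, 0, 1])       # toy labels: three negatives, one positive
toy_weights = balanced_class_weights(y_toy)
print(toy_weights)
```

With this heuristic the rarer class gets the larger weight, so the weighted class totals are equal.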


## 6. Results Summary by Horizon

In [85]:
tables_by_horizon_rf_tuned = {}

for horizon in HORIZONS:
    df_h = df_rf_tuned[df_rf_tuned["horizon"] == horizon].set_index("asset")
    df_h = df_h[["accuracy", "precision", "recall", "f1", "auc_roc"]]
    tables_by_horizon_rf_tuned[horizon] = df_h

    print("\n" + "="*60)
    print(f"TUNED RANDOM FOREST TABLE — HORIZON: {horizon}")
    print("="*60)
    display(df_h)



TUNED RANDOM FOREST TABLE — HORIZON: 1day


| asset | accuracy | precision | recall | f1 | auc_roc |
|---|---|---|---|---|---|
| AAPL | 0.505582 | 0.527512 | 0.662162 | 0.587217 | 0.498696 |
| AMZN | 0.500797 | 0.513475 | 0.56124 | 0.536296 | 0.511292 |
| NVDA | 0.50319 | 0.53835 | 0.550296 | 0.544257 | 0.502454 |
| SPY | 0.53748 | 0.547597 | 0.881752 | 0.675615 | 0.494545 |
| BTC-USD | 0.518229 | 0.537129 | 0.5425 | 0.539801 | 0.510577 |



TUNED RANDOM FOREST TABLE — HORIZON: 1week


| asset | accuracy | precision | recall | f1 | auc_roc |
|---|---|---|---|---|---|
| AAPL | 0.552396 | 0.556417 | 0.947214 | 0.701031 | 0.517359 |
| AMZN | 0.536962 | 0.539154 | 0.91172 | 0.677602 | 0.557481 |
| NVDA | 0.482535 | 0.563077 | 0.50904 | 0.534697 | 0.473808 |
| SPY | 0.547522 | 0.60396 | 0.735925 | 0.663444 | 0.484101 |
| BTC-USD | 0.50604 | 0.552063 | 0.667458 | 0.604301 | 0.456072 |



TUNED RANDOM FOREST TABLE — HORIZON: 1month


| asset | accuracy | precision | recall | f1 | auc_roc |
|---|---|---|---|---|---|
| AAPL | 0.614005 | 0.614005 | 1.0 | 0.760847 | 0.546065 |
| AMZN | 0.598634 | 0.600531 | 0.97274 | 0.742607 | 0.509 |
| NVDA | 0.631939 | 0.643761 | 0.950601 | 0.767655 | 0.528735 |
| SPY | 0.666951 | 0.666951 | 1.0 | 0.800205 | 0.459127 |
| BTC-USD | 0.551825 | 0.580087 | 0.703412 | 0.635824 | 0.520946 |
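
One way to read the 1-month table: for AAPL and SPY, recall is 1.0 and accuracy equals precision, which is exactly what a classifier that labels every sample positive would produce. The sketch below, with a hypothetical helper `always_positive_metrics`, computes the metrics of such a baseline from the positive-class rate alone. Using the precision values from the table as the positive rate is an assumption that follows from recall being 1.0.

```python
def always_positive_metrics(positive_rate):
    """Metrics of a classifier that predicts the positive class for every sample."""
    precision = positive_rate          # every prediction is positive
    recall = 1.0                       # no positive is ever missed
    accuracy = positive_rate           # correct exactly on the positives
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Positive rates read off the 1-month table (precision column, recall = 1.0 rows)
baseline = {
    asset: always_positive_metrics(rate)
    for asset, rate in (("AAPL", 0.614005), ("SPY", 0.666951))
}
for asset, m in baseline.items():
    print(asset, round(m["f1"], 4))
```

The baseline F1 values agree with the table's 0.760847 and 0.800205 to four decimals, so for these two assets the reported F1 is what an always-positive predictor would score.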


## 7. Cross-Asset Generalization Experiment

In [86]:
def train_on_asset_test_on_other(train_asset, test_asset, horizon, params):
    """
    Train a Random Forest on `train_asset` and evaluate it on `test_asset`.
    Uses the hyperparameters optimized for the given horizon.
    Skips evaluation if the number of features does not match between assets.
    """
    class_weights_dict = load_class_weights()

    print(f"\n>>> Train: {train_asset} | Test: {test_asset} | Horizon: {horizon}")

    # Load training asset data
    seq_train = load_sequences(train_asset, horizon)
    X_train = seq_train["X_train"]; y_train = seq_train["y_train"]
    X_val   = seq_train["X_val"];   y_val   = seq_train["y_val"]

    # Load testing asset data
    seq_test = load_sequences(test_asset, horizon)
    X_test = seq_test["X_test"]; y_test = seq_test["y_test"]

    # Flatten sequential data
    X_train_flat = flatten_sequences(X_train)
    X_val_flat   = flatten_sequences(X_val)
    X_test_flat  = flatten_sequences(X_test)

    # Check feature dimension consistency
    n_feat_train = X_train_flat.shape[1]
    n_feat_test  = X_test_flat.shape[1]

    if n_feat_train != n_feat_test:
        print(f"[SKIP] Feature dimension mismatch: train={n_feat_train}, test={n_feat_test}")
        return None

    # Merge train and validation sets
    X_trainval_flat = np.vstack([X_train_flat, X_val_flat])
    y_trainval      = np.concatenate([y_train, y_val])

    # Class weights for the training asset
    weights = class_weights_dict[train_asset][horizon]

    # Random Forest model
    rf = RandomForestClassifier(
        n_estimators=params["n_estimators"],
        max_features=params["max_features"],
        max_depth=params["max_depth"],
        min_samples_split=2,
        min_samples_leaf=1,
        class_weight=weights,
        n_jobs=-1,
        random_state=42
    )

    # Train model
    rf.fit(X_trainval_flat, y_trainval)

    # Predictions on the test asset
    y_test_proba = rf.predict_proba(X_test_flat)[:, 1]
    y_test_pred  = (y_test_proba >= 0.5).astype(int)

    # Compute evaluation metrics
    acc  = accuracy_score(y_test, y_test_pred)
    prec = precision_score(y_test, y_test_pred, zero_division=0)
    rec  = recall_score(y_test, y_test_pred, zero_division=0)
    f1   = f1_score(y_test, y_test_pred, zero_division=0)
    try:
        auc = roc_auc_score(y_test, y_test_proba)
    except ValueError:
        # AUC is undefined when y_test contains a single class
        auc = float("nan")

    return {
        "train_asset": train_asset,
        "test_asset": test_asset,
        "horizon": horizon,
        "n_estimators": params["n_estimators"],
        "max_features": params["max_features"],
        "accuracy": acc,
        "precision": prec,
        "recall": rec,
        "f1": f1,
        "auc_roc": auc
    }


## 8. Generalization Results Analysis

In [87]:
generalization_results = []

for horizon in HORIZONS:
    params = best_params_by_horizon[horizon]

    print("\n" + "="*90)
    print(f"        CROSS-ASSET GENERALIZATION — HORIZON = {horizon}")
    print("="*90)

    for train_asset in ASSETS:
        for test_asset in ASSETS:
            # Skip if train and test assets are the same
            if test_asset == train_asset:
                continue

            res = train_on_asset_test_on_other(
                train_asset=train_asset,
                test_asset=test_asset,
                horizon=horizon,
                params=params
            )

            if res is not None:
                generalization_results.append(res)

# Convert list of results into a DataFrame
df_generalization = pd.DataFrame(generalization_results)
df_generalization



        CROSS-ASSET GENERALIZATION — HORIZON = 1day

>>> Train: AAPL | Test: AMZN | Horizon: 1day

>>> Train: AAPL | Test: NVDA | Horizon: 1day

>>> Train: AAPL | Test: SPY | Horizon: 1day
[SKIP] Feature dimension mismatch: train=273, test=280

>>> Train: AAPL | Test: BTC-USD | Horizon: 1day

>>> Train: AMZN | Test: AAPL | Horizon: 1day

>>> Train: AMZN | Test: NVDA | Horizon: 1day

>>> Train: AMZN | Test: SPY | Horizon: 1day
[SKIP] Feature dimension mismatch: train=273, test=280

>>> Train: AMZN | Test: BTC-USD | Horizon: 1day

>>> Train: NVDA | Test: AAPL | Horizon: 1day

>>> Train: NVDA | Test: AMZN | Horizon: 1day

>>> Train: NVDA | Test: SPY | Horizon: 1day
[SKIP] Feature dimension mismatch: train=273, test=280

>>> Train: NVDA | Test: BTC-USD | Horizon: 1day

>>> Train: SPY | Test: AAPL | Horizon: 1day
[SKIP] Feature dimension mismatch: train=280, test=273

>>> Train: SPY | Test: AMZN | Horizon: 1day
[SKIP] Feature dimension mismatch: train=280, test=273

>>> Train: SPY | Test: 

|   | train_asset | test_asset | horizon | n_estimators | max_features | accuracy | precision | recall | f1 | auc_roc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAPL | AMZN | 1day | 200 | sqrt | 0.523126 | 0.531292 | 0.618605 | 0.571633 | 0.51327 |
| 1 | AAPL | NVDA | 1day | 200 | sqrt | 0.512759 | 0.544952 | 0.58284 | 0.563259 | 0.499872 |
| 2 | AAPL | BTC-USD | 1day | 200 | sqrt | 0.532552 | 0.550122 | 0.5625 | 0.556242 | 0.542911 |
| 3 | AMZN | AAPL | 1day | 200 | sqrt | 0.516746 | 0.530928 | 0.773273 | 0.629584 | 0.487565 |
| 4 | AMZN | NVDA | 1day | 200 | sqrt | 0.488836 | 0.544987 | 0.313609 | 0.398122 | 0.514944 |
| 5 | AMZN | BTC-USD | 1day | 200 | sqrt | 0.516927 | 0.525573 | 0.745 | 0.616339 | 0.513122 |
| 6 | NVDA | AAPL | 1day | 200 | sqrt | 0.49362 | 0.516649 | 0.722222 | 0.602379 | 0.496786 |
| 7 | NVDA | AMZN | 1day | 200 | sqrt | 0.520734 | 0.521956 | 0.810853 | 0.635094 | 0.507669 |
| 8 | NVDA | BTC-USD | 1day | 200 | sqrt | 0.527344 | 0.533821 | 0.73 | 0.616684 | 0.528312 |
| 9 | BTC-USD | AAPL | 1day | 200 | sqrt | 0.489633 | 0.519637 | 0.516517 | 0.518072 | 0.500245 |
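
The skip messages in the log follow from the flattened widths: SPY has 280 columns, while the assets it was paired with have 273. Assuming, as the log suggests, that SPY is the only asset with a different width, the sketch below enumerates the ordered train/test pairs and counts how many pass the dimension check. The `flat_width` dictionary is reconstructed from the log, and BTC-USD's width is inferred from the fact that AAPL to BTC-USD was not skipped.

```python
from itertools import permutations

ASSETS = ["AAPL", "AMZN", "NVDA", "SPY", "BTC-USD"]

# Flattened feature widths inferred from the log output (assumption for BTC-USD)
flat_width = {"AAPL": 273, "AMZN": 273, "NVDA": 273, "SPY": 280, "BTC-USD": 273}

# All ordered (train, test) pairs with train != test
pairs = list(permutations(ASSETS, 2))

# Pairs that pass the same check as train_on_asset_test_on_other
usable = [(tr, te) for tr, te in pairs if flat_width[tr] == flat_width[te]]
skipped = [p for p in pairs if p not in usable]

print(len(pairs), len(usable), len(skipped))
```

Under these assumptions, 12 of the 20 ordered pairs per horizon are evaluated and the 8 pairs involving SPY are skipped, which is consistent with SPY being absent from the results table above.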


In [None]:
# Display generalization results summary
print("\n" + "="*60)
print("CROSS-ASSET GENERALIZATION SUMMARY")
print("="*60)

# Average performance by horizon
print("\nAverage F1-Score by Horizon:")
for horizon in HORIZONS:
    avg_f1 = df_generalization[df_generalization["horizon"] == horizon]["f1"].mean()
    print(f"  {horizon}: {avg_f1:.4f}")

# Best generalization pairs
print("\nTop 5 Best Generalization Results (by F1-score):")
top_5 = df_generalization.nlargest(5, "f1")[["train_asset", "test_asset", "horizon", "f1", "accuracy"]]
print(top_5.to_string(index=False))

# Worst generalization pairs
print("\nTop 5 Worst Generalization Results (by F1-score):")
bottom_5 = df_generalization.nsmallest(5, "f1")[["train_asset", "test_asset", "horizon", "f1", "accuracy"]]
print(bottom_5.to_string(index=False))

## Key Findings and Interpretation

### Overall Performance
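Random Forest performance was modest across assets and horizons. Test accuracy ranged from roughly 0.48 to 0.67, and AUC-ROC stayed between about 0.46 and 0.56, close to the 0.5 expected from random ranking. F1-scores rose with the horizon, from 0.54-0.68 at 1 day to 0.64-0.80 at 1 month. Part of that gain comes from the models predicting the positive class almost exclusively at 1 month: recall is 1.0 for AAPL and SPY, where accuracy equals precision, so the higher F1 largely reflects the class balance rather than better discrimination between up and down moves.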

### Cross-Asset Generalization
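When we trained a model on one asset and tested it on another, the results were mixed. Some transfers between tech stocks held up: in the 1-day rows shown above, NVDA to AMZN and AMZN to AAPL reached F1-scores of 0.635 and 0.630, at or above the same-asset models. Others did not, with AMZN to NVDA dropping to an F1 of 0.398. Any shared behavior among these stocks is therefore uneven, and since AUC-ROC stays near 0.5 for all of these pairs, it should be read cautiously.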

Bitcoin (BTC-USD) was a different story. Models trained on traditional stocks struggled when tested on Bitcoin, and vice versa, which is not surprising since cryptocurrencies move differently from equities. The effect is small in the 1-day rows shown above, where stock to BTC-USD transfers reach F1-scores of 0.556 to 0.617, while BTC-USD to AAPL falls to 0.518.

### Specific Observations

**Best performing combinations:**
- The 1-month horizon delivered the highest F1-scores, exceeding 0.75 for AAPL, NVDA, and SPY in the same-asset runs
- Training on tech stocks and testing on other tech stocks worked reasonably well
- The NVDA to BTC-USD transfer showed better results than expected for the 1-month horizon

**Challenges observed:**
- 1-day predictions were tough across the board, with F1-scores between roughly 0.4 and 0.68 and accuracy close to 0.5
- Cross-market generalization (stocks to crypto) showed significant performance drops
- Pairs involving SPY could not be tested at all: its flattened inputs have 280 features versus 273 for the other assets, so the dimension check skipped them

### Practical Implications
These results tell us that Random Forest can capture some market patterns, but it's no crystal ball. The models score higher on F1 at longer horizons, where trends have time to develop, although AUC-ROC values near 0.5 mean that advantage should be checked against a majority-class baseline. If you're trying to predict tomorrow's movements, you're essentially gambling.

The cross-asset experiments highlight something important: while there's some transferability between similar assets, you can't just train on one market and expect it to work everywhere. Each asset class has its own characteristics that need to be learned.

For actual trading applications, you'd want to train separate models for each asset and focus on longer time horizons where the predictions are more reliable. The 1-day models might not be worth the computational effort given their mediocre performance.