# Factor Allocation Strategy

This notebook uses **Point-in-Time FRED-MD** macroeconomic data to predict factor allocation.

**Model**: MICRO Transformer (12k params)

## Comparison: 16 Combinations

| Strategy | Allocation | Horizons |
|----------|------------|----------|
| E2E (3-phase) | Binary (2F) | 1M, 3M, 6M, 12M |
| E2E (3-phase) | Multi (6F) | 1M, 3M, 6M, 12M |
| Supervised | Binary (2F) | 1M, 3M, 6M, 12M |
| Supervised | Multi (6F) | 1M, 3M, 6M, 12M |

**Key**: All models rebalance monthly. The horizon only sets the Sharpe optimization target used in training; it does not change the rebalancing frequency.
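As a point of reference for the Sharpe figures reported later, here is a minimal sketch of an annualized Sharpe ratio computed from monthly returns. The function name `annualized_sharpe`, the zero risk-free rate, and the use of the sample standard deviation are assumptions made for illustration; the project's own metric and training loss may be defined differently.

```python
import math
import statistics


def annualized_sharpe(monthly_returns, periods_per_year=12):
    """Annualized Sharpe ratio of a monthly return series.

    Assumptions (not taken from the project code): the risk-free rate is 0
    and volatility is the sample standard deviation.
    """
    mean = statistics.fmean(monthly_returns)
    std = statistics.stdev(monthly_returns)
    if std == 0:
        # A constant return series has no volatility; the ratio is undefined.
        return float("nan")
    return mean / std * math.sqrt(periods_per_year)


# Hypothetical monthly strategy returns, for illustration only.
example_returns = [0.01, 0.02, 0.03]
print(f"Sharpe: {annualized_sharpe(example_returns):+.3f}")
```

Because every model rebalances monthly, the reported Sharpe ratios are comparable across horizons even though the training targets differ.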

In [1]:
# ============================================================
# SETUP
# ============================================================

import sys
import warnings
from pathlib import Path

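# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# The notebook runs one level below the project root; make src/ importable
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / "src"))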

import numpy as np
import pandas as pd
import torch

# Fix random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print("Setup complete")

Setup complete


In [2]:
# ============================================================
# DATA LOADING
# ============================================================

from data.point_in_time_loader import PointInTimeFREDMDLoader, PointInTimeConfig
from data.factor_data_loader import FactorDataLoader, FactorDataConfig
from data.data_loader import Region
from features.feature_engineering import FeatureEngineer, FeatureConfig

# Load Point-in-Time FRED-MD
vintages_dir = project_root / "data_cache" / "vintages"

pit_loader = PointInTimeFREDMDLoader(
    PointInTimeConfig(
        vintages_dir=vintages_dir,
        publication_lag=1,
        apply_transformations=True
    )
)

macro_data = pit_loader.create_pit_macro_dataframe(
    start_date="2000-01-01",
    end_date="2024-12-31"
)

market_data = pit_loader.create_pit_market_context(
    start_date="2000-01-01",
    end_date="2024-12-31"
)

indicators = pit_loader.get_indicators()

# Load factor returns
factor_loader = FactorDataLoader(
    FactorDataConfig(
        start_date="2000-01-01",
        end_date="2024-12-31",
        cache_dir=project_root / "data_cache" / "factors",
        use_cache=True
    )
)

factor_data = factor_loader.load_all_factors()

print(f"\nData loaded: {len(macro_data)} macro obs, {len(factor_data)} factor obs")

Indexed 305 vintage files
  Coverage: 1999-08 to 2024-12
Loading cached all factors from /Users/mathis/Finance-Quant-thinking/Strategies/factor_allocation_strategy_macro/data_cache/factors/all_factors.parquet

Data loaded: 32799 macro obs, 288 factor obs
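The 305 indexed vintage files are consistent with one vintage per month from 1999-08 through 2024-12. The sketch below illustrates the general point-in-time idea: for each decision month, use only the newest vintage that was already available. The helpers `month_index` and `latest_vintage_as_of` are hypothetical, and reading `publication_lag=1` as "a vintage becomes usable one month after its release month" is an assumption; `PointInTimeFREDMDLoader` may implement the lag differently.

```python
from bisect import bisect_right


def month_index(year, month):
    """Map a (year, month) pair to a single increasing integer."""
    return year * 12 + (month - 1)


def latest_vintage_as_of(vintage_months, as_of_month, publication_lag=1):
    """Return the newest vintage usable at `as_of_month`, or None.

    `vintage_months` must be sorted in ascending order. A vintage released
    in month m is assumed usable from month m + publication_lag onward.
    """
    cutoff = as_of_month - publication_lag
    idx = bisect_right(vintage_months, cutoff)
    return vintage_months[idx - 1] if idx else None


# One vintage per month over the coverage window reported by the loader.
vintages = list(range(month_index(1999, 8), month_index(2024, 12) + 1))
print(len(vintages))  # 305

# A decision made in 2000-01 sees the 1999-12 vintage under a one-month lag.
chosen = latest_vintage_as_of(vintages, month_index(2000, 1))
print(chosen == month_index(1999, 12))  # True
```

Selecting the vintage by availability date rather than by reference date is what keeps later data revisions out of the training features.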


---
## Model Configuration

In [3]:
# ============================================================
# CONFIGURATION
# ============================================================

# MICRO Transformer (12k params - reduced to prevent overfitting)
CONFIG = {
    # Architecture
    "sequence_length": 12,
    "num_factors": 6,
    "d_model": 32,
    "num_heads": 1,
    "num_layers": 1,
    "d_ff": 64,
    "dropout": 0.6,
    
    # Training
    "learning_rate": 0.001,
    "batch_size": 32,
    "weight_decay": 0.02,
    "epochs_phase1": 20,
    "epochs_phase2": 15,
    "epochs_phase3": 15,
    
    # Backtest
    "execution_threshold": 0.05,
    "transaction_cost": 0.001,
    "val_split": 0.2,
}

# Feature engineering
feature_config = FeatureConfig(
    sequence_length=12,
    include_momentum=True,
    include_market_context=True,
    use_fred_md=True,
    aggregation_windows=[1, 3, 6, 12]
)

feature_engineer = FeatureEngineer(
    config=feature_config,
    region=Region.US,
    fred_md_indicators=indicators
)

print(f"Model: d_model={CONFIG['d_model']}, layers={CONFIG['num_layers']}, heads={CONFIG['num_heads']}")
print(f"Features: {feature_engineer.get_num_indicators()} indicators")

Model: d_model=32, layers=1, heads=1
Features: 112 indicators
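To put the "12k params" figure in context, the sketch below counts the parameters of a single standard Transformer encoder layer (self-attention, feed-forward block, two layer norms) for the `d_model` and `d_ff` values in `CONFIG`. It assumes the conventional layout with biases on every linear layer; the actual MICRO Transformer may differ, and its input embeddings and output heads are not counted here.

```python
def encoder_layer_params(d_model, d_ff):
    """Parameter count of one standard Transformer encoder layer.

    Assumes biases on all linear layers and two LayerNorms with a scale
    and shift each. The number of attention heads does not change the
    count, because the heads split the same d_model-wide projections.
    """
    attn_in = 3 * (d_model * d_model + d_model)   # Q, K, V projections
    attn_out = d_model * d_model + d_model        # attention output projection
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    norms = 2 * (2 * d_model)                     # two LayerNorms
    return attn_in + attn_out + ffn + norms


# Values taken from CONFIG above.
print(encoder_layer_params(d_model=32, d_ff=64))  # 8544
```

Under these assumptions the single encoder layer accounts for roughly 8.5k parameters, so the remainder of the stated 12k would sit in components outside the encoder layer.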


---
## Training & Evaluation

**16 combinations** to compare (2 strategies × 2 allocations × 4 horizons):

| Strategy | Allocation | Horizons |
|----------|------------|----------|
| End-to-End (3-phase) | Binary (2F) | 1M, 3M, 6M, 12M |
| End-to-End (3-phase) | Multi-factor (6F) | 1M, 3M, 6M, 12M |
| Supervised | Binary (2F) | 1M, 3M, 6M, 12M |
| Supervised | Multi-factor (6F) | 1M, 3M, 6M, 12M |

All models rebalance monthly. Horizons differ only in the Phase 3 Sharpe optimization target (cumulative returns).

In [4]:
# ============================================================
# TRAIN ALL 16 COMBINATIONS
# ============================================================

from comparison_runner import prepare_data, run_all_combinations

# Prepare multi-horizon data
targets, cumulative_returns = prepare_data(factor_loader, factor_data)

# Run all combinations
all_results = run_all_combinations(
    macro_data=macro_data,
    factor_data=factor_data,
    market_data=market_data,
    targets=targets,
    cumulative_returns=cumulative_returns,
    indicators=indicators,
    feature_engineer=feature_engineer,
    config=CONFIG,
    verbose=True,
)


Horizon 1M: 287 observations
  Target distribution: {1: np.int64(149), 0: np.int64(138)}

Horizon 3M: 285 observations
  Target distribution: {1: np.int64(154), 0: np.int64(131)}

Horizon 6M: 282 observations
  Target distribution: {1: np.int64(144), 0: np.int64(138)}

Horizon 12M: 276 observations
  Target distribution: {1: np.int64(139), 0: np.int64(137)}
[1/16] E2E Binary 1M...
[2/16] E2E Binary 3M...
[3/16] E2E Binary 6M...
[4/16] E2E Binary 12M...
[5/16] E2E Multi 1M...
[6/16] E2E Multi 3M...
[7/16] E2E Multi 6M...
[8/16] E2E Multi 12M...
[9/16] Sup Binary 1M...
[10/16] Sup Binary 3M...
[11/16] Sup Binary 6M...
[12/16] Sup Binary 12M...
[13/16] Sup Multi 1M...
[14/16] Sup Multi 3M...
[15/16] Sup Multi 6M...
[16/16] Sup Multi 12M...

Completed: 16 combinations
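The observation counts in the log (287, 285, 282, 276) equal the 288 monthly factor observations minus the horizon length, which is what a forward-looking cumulative return produces. The sketch below reproduces that arithmetic. The function `forward_cumulative_returns` is hypothetical, and geometric compounding over months t+1 to t+h is an assumption; `prepare_data` may build its targets differently.

```python
import math


def forward_cumulative_returns(returns, horizon):
    """Compounded return over the `horizon` months following each month t.

    Months without a full forward window are dropped, so the result has
    len(returns) - horizon entries.
    """
    n = len(returns)
    return [
        math.prod(1.0 + r for r in returns[t + 1 : t + 1 + horizon]) - 1.0
        for t in range(n - horizon)
    ]


# Placeholder series with the same length as the factor data (288 months).
placeholder_returns = [0.0] * 288
for h in (1, 3, 6, 12):
    n_obs = len(forward_cumulative_returns(placeholder_returns, h))
    print(f"Horizon {h}M: {n_obs} observations")
```

Longer horizons therefore leave slightly fewer training samples, which is worth keeping in mind when comparing results across horizons.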


In [7]:
# ============================================================
# RESULTS
# ============================================================

# Reload module if cached
import importlib
import comparison_runner
importlib.reload(comparison_runner)

from comparison_runner import (
    format_results_table,
    compute_summary_stats,
    format_ranking_table,
)

# Display results table
print(format_results_table(all_results))

# Model ranking (composite score)
print("\n")
print(format_ranking_table(all_results, top_n=5))

# Summary statistics
summary = compute_summary_stats(all_results)

print("\n" + "-" * 70)
print("SUMMARY BY DIMENSION:")
print("-" * 70)

print(f"Strategy:   E2E={summary['strategy']['E2E_avg']:+.3f}, Sup={summary['strategy']['Sup_avg']:+.3f}")
print(f"Allocation: Binary={summary['allocation']['Binary_avg']:+.3f}, Multi={summary['allocation']['Multi_avg']:+.3f}")

print("\nHorizon breakdown:")
for h, sharpe in summary['horizon'].items():
    print(f"  {h}: avg Sharpe={sharpe:+.3f}")

                                UNIFIED RESULTS: 16 COMBINATIONS
Strategy   Allocation   Horizon        Sharpe         IC     Max DD   Accuracy
----------------------------------------------------------------------------------------------------
E2E        Binary       1M            +0.5209    +0.1023    -0.3448      51.9%
E2E        Binary       3M            +0.7806    +0.2601    -0.3537      60.4%
E2E        Binary       6M            +0.7286    +0.2673    -0.3537      50.0%
E2E        Binary       12M            +0.7607    +0.2645    -0.3537      49.6%
E2E        Multi        1M            +0.9287    +0.0199    -0.1057          -
E2E        Multi        3M            +0.9259    -0.1250    -0.0920          -
E2E        Multi        6M            +0.6722    +0.1939    -0.1099          -
E2E        Multi        12M            +0.8500    -0.0564    -0.1575          -
Sup        Binary       1M            +0.7246    +0.0508    -0.3537      48.1%
Sup        Binary       3M            +0.7