# AlphaForge Research Notebook 🚀

**Interactive environment for systematic alpha research and factor model validation**

This notebook walks through an end-to-end AlphaForge workflow for quantitative factor research: loading data, analyzing factors, constructing portfolios, backtesting, and validating results out of sample.

## Table of Contents
1. [Setup and Data Loading](#setup)
2. [Factor Analysis](#factors)
3. [Portfolio Construction](#portfolio)
4. [Backtesting](#backtest)
5. [Walk-Forward Analysis](#walkforward)
6. [Summary and Conclusions](#summary)

## 1. Setup and Data Loading

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Import AlphaForge framework
from factor_backtester import (
    Backtester, BacktestConfig, DataProvider, 
    FactorCalculator, PortfolioConstructor, 
    PerformanceAnalyzer
)

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
%matplotlib inline

print("🚀 AlphaForge Research Environment Ready!")
print("📊 Systematic alpha research toolkit loaded")

In [None]:
# AlphaForge Configuration
config = BacktestConfig(
    start_date="2015-01-01",
    end_date="2023-12-31",
    rebalance_freq="M",
    transaction_cost=0.001,
    max_weight=0.05,
    min_weight=-0.05,
    leverage=1.0
)

# Initialize AlphaForge components
data_provider = DataProvider()
backtester = Backtester(config)

print(f"⚙️ AlphaForge Configuration:")
print(f"   📅 Period: {config.start_date} to {config.end_date}")
print(f"   🔄 Rebalancing: {config.rebalance_freq}")
print(f"   💰 Transaction Cost: {config.transaction_cost:.1%}")
print(f"   📊 Position Limits: {config.min_weight:.1%} to {config.max_weight:.1%}")
print(f"   🎯 Leverage: {config.leverage}x")

In [None]:
# Load market universe
print("📈 Loading S&P 500 universe...")
tickers = data_provider.get_universe("SP500")
print(f"🌟 Universe: {len(tickers)} stocks")
print(f"📋 Sample tickers: {tickers[:15]}")

# Fetch market data with caching
print("\n🔄 Fetching market data...")
raw_data = data_provider.fetch_yahoo_data(tickers, config.start_date, config.end_date)
print(f"✅ Loaded {len(raw_data):,} observations")
print(f"📊 {len(raw_data['ticker'].unique())} unique tickers")
print(f"📅 Date range: {raw_data['Date'].min()} to {raw_data['Date'].max()}")

# Display sample data
print("\n📋 Sample Data:")
display(raw_data.head(10))

## 2. Factor Analysis

Calculate and analyze classic risk factors using AlphaForge's factor engineering capabilities.
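
As a rough illustration of what a factor score and its cross-sectional rank can look like, here is a minimal pandas sketch of a momentum factor. It is not AlphaForge's `FactorCalculator` implementation: the function name `momentum_factor`, the `Close` column, and the `lookback`/`skip` parameters are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def momentum_factor(prices: pd.DataFrame, lookback: int = 252, skip: int = 21) -> pd.DataFrame:
    """Trailing return over `lookback` rows, excluding the most recent `skip` rows.

    Assumes a long-format frame with Date, ticker, and Close columns
    (the Close column name is hypothetical).
    """
    out = prices.sort_values(["ticker", "Date"]).copy()
    close = out.groupby("ticker")["Close"]
    # Price `skip` rows ago over price `lookback` rows ago, per ticker.
    out["momentum"] = close.shift(skip) / close.shift(lookback) - 1.0
    # Percentile rank within each date, so scores are comparable across dates.
    out["momentum_rank"] = out.groupby("Date")["momentum"].rank(pct=True)
    return out


# Tiny synthetic example: AAA trends up, BBB trends down.
dates = pd.bdate_range("2023-01-02", periods=6)
demo = pd.DataFrame({
    "Date": np.tile(dates, 2),
    "ticker": np.repeat(["AAA", "BBB"], len(dates)),
    "Close": [10, 11, 12, 13, 14, 15, 20, 19, 18, 17, 16, 15],
})
scored = momentum_factor(demo, lookback=3, skip=1)
print(scored[scored["Date"] == dates[-1]][["ticker", "momentum", "momentum_rank"]])
```

Ranking within each date rather than across the whole sample keeps the score free of market-wide level effects, which is why the `_rank` columns are usually the inputs to portfolio construction.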

In [None]:
# Calculate factors using AlphaForge
print("🔬 Calculating systematic risk factors...")
factor_calculator = FactorCalculator(raw_data)
factor_data = factor_calculator.calculate_all_factors()

print(f"📊 Factor data shape: {factor_data.shape}")
print(f"📅 Factor coverage: {factor_data['Date'].min()} to {factor_data['Date'].max()}")
print(f"🎯 {len(factor_data['ticker'].unique())} stocks with factor scores")

# Display factor summary statistics
factor_cols = ['momentum', 'value', 'quality', 'size', 'low_vol']
print("\n📈 Factor Summary Statistics:")
factor_summary = factor_data[factor_cols].describe()
display(factor_summary)

In [None]:
# Factor correlation and distribution analysis
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Factor correlation heatmap
corr_matrix = factor_data[factor_cols].corr()
sns.heatmap(corr_matrix, annot=True, cmap='RdBu_r', center=0, 
            square=True, ax=axes[0, 0], fmt='.3f')
axes[0, 0].set_title('📊 Factor Correlation Matrix')

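# Factor rank distributions
# DataFrame.hist() with a single `ax` and several columns clears the whole figure
# (the warning is hidden by filterwarnings above), so overlay the columns with plot.hist().
factor_ranks = [col + '_rank' for col in factor_cols if col + '_rank' in factor_data.columns]
if factor_ranks:
    factor_data[factor_ranks].plot.hist(bins=50, ax=axes[0, 1], alpha=0.7)
    axes[0, 1].set_title('📈 Factor Rank Distributions')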

# Factor stability over time
factor_ts = factor_data.groupby('Date')[factor_cols].mean()
for factor in factor_cols:
    axes[1, 0].plot(factor_ts.index, factor_ts[factor], label=factor.title(), alpha=0.8)
axes[1, 0].set_title('🔄 Factor Evolution Over Time')
axes[1, 0].set_ylabel('Cross-Sectional Mean')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Factor volatility
factor_vol = factor_data.groupby('Date')[factor_cols].std().mean()
factor_vol.plot(kind='bar', ax=axes[1, 1], alpha=0.7)
axes[1, 1].set_title('📊 Average Factor Volatility')
axes[1, 1].set_ylabel('Cross-Sectional Std Dev')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

## 3. Portfolio Construction

Demonstrate systematic portfolio construction with factor-based signals.
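
To make the idea of factor-based long-short weights concrete, the following sketch turns a cross-section of scores into dollar-neutral weights and applies position limits. It is only an illustration under simple assumptions: `long_short_weights` is a hypothetical helper, not the `PortfolioConstructor` logic, and the default limits simply mirror the `max_weight`, `min_weight`, and `leverage` values configured earlier.

```python
import numpy as np
import pandas as pd


def long_short_weights(scores: pd.Series, max_weight: float = 0.05,
                       min_weight: float = -0.05, leverage: float = 1.0) -> pd.Series:
    """Weights proportional to demeaned scores, scaled to gross `leverage`, then clipped."""
    s = scores.dropna()
    raw = s - s.mean()          # above-average scores go long, below-average go short
    gross = raw.abs().sum()
    if gross == 0:
        return pd.Series(0.0, index=s.index)
    w = raw / gross * leverage  # now sum(|w|) == leverage and sum(w) == 0
    # Clipping enforces the limits; it can lower gross exposure and
    # break exact dollar neutrality when many names hit a bound.
    return w.clip(lower=min_weight, upper=max_weight)


# 40 evenly spaced synthetic scores; no weight hits the 5% cap here.
scores = pd.Series(np.linspace(-1.0, 1.0, 40), index=[f"S{i:02d}" for i in range(40)])
weights = long_short_weights(scores)
print(f"net={weights.sum():.4f} gross={weights.abs().sum():.4f} max={weights.max():.4f}")
```

With a 5% cap and gross leverage of 1.0, at least 20 positions are needed to be fully invested, so a small universe will end up under-levered after clipping.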

In [None]:
# Portfolio construction example
portfolio_constructor = PortfolioConstructor(config)

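# Select a sample date near the middle of the period
# (use sorted unique dates so the choice does not depend on row order)
unique_dates = pd.DatetimeIndex(factor_data['Date'].unique()).sort_values()
sample_date = unique_dates[len(unique_dates) // 2]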
print(f"📅 Portfolio construction date: {sample_date.strftime('%Y-%m-%d')}")

# Construct portfolio with AlphaForge
weights = portfolio_constructor.construct_portfolio(
    factor_data, sample_date, use_shrinkage=True, use_lasso=True
)

print(f"\n📊 Portfolio Statistics:")
print(f"   🎯 Total positions: {len(weights)}")
print(f"   📈 Long positions: {(weights > 0).sum()}")
print(f"   📉 Short positions: {(weights < 0).sum()}")
print(f"   💰 Total long weight: {weights[weights > 0].sum():.2%}")
print(f"   💸 Total short weight: {weights[weights < 0].sum():.2%}")
print(f"   🎪 Net exposure: {weights.sum():.2%}")
print(f"   🌐 Gross exposure: {weights.abs().sum():.2%}")

# Display top holdings
if len(weights) > 0:
    print("\n🔝 Top 10 Long Positions:")
    display(weights.nlargest(10).to_frame('Weight'))
    
    print("\n🔻 Top 10 Short Positions:")
    display(weights.nsmallest(10).to_frame('Weight'))

## 4. Backtesting

Execute comprehensive backtesting with transaction costs and performance analytics.
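
One common way to account for transaction costs is to charge a proportional fee on turnover at each rebalance. The sketch below shows that arithmetic on toy data. It is an assumption about how such a cost model might work, not a description of `Backtester` internals, and `net_returns` is a hypothetical name.

```python
import pandas as pd


def net_returns(weights: pd.DataFrame, asset_returns: pd.DataFrame,
                cost_rate: float = 0.001) -> pd.DataFrame:
    """Gross and net portfolio returns with a proportional cost on turnover.

    Simplification: weights are assumed to be reset to target every period,
    so drift between rebalances is ignored.
    """
    # Weights chosen at t earn the return of t+1; shifting avoids look-ahead.
    held = weights.shift(1).fillna(0.0)
    gross = (held * asset_returns).sum(axis=1)
    # Turnover is the sum of absolute weight changes; the first trade starts from cash.
    turnover = held.diff().fillna(0.0).abs().sum(axis=1)
    costs = turnover * cost_rate
    return pd.DataFrame({"gross": gross, "costs": costs, "net": gross - costs})


idx = pd.bdate_range("2023-01-02", periods=3)
w = pd.DataFrame([[0.5, -0.5], [0.5, -0.5], [-0.5, 0.5]], index=idx, columns=["A", "B"])
r = pd.DataFrame([[0.01, 0.00], [0.02, -0.01], [0.00, 0.01]], index=idx, columns=["A", "B"])
result = net_returns(w, r)
print(result)
```

On the second day the portfolio moves from cash to 100% gross exposure, so it pays 0.001 in costs against a gross return of 0.015.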

In [None]:
# Run comprehensive backtest
print("🚀 Running AlphaForge backtest...")
print("   🔬 Applying Bayesian shrinkage and Lasso regularization")
print("   💰 Including realistic transaction costs")
print("   📊 Computing comprehensive performance metrics")

results = backtester.run_backtest(tickers=tickers, use_shrinkage=True, use_lasso=True)

if results:
    print("\n✅ Backtest completed successfully!")
    print("\n📊 Performance Summary:")
    print("=" * 40)
    
    metrics = results['metrics']
    
    print(f"📈 Total Return:        {metrics['total_return']:>8.2%}")
    print(f"📊 Annualized Return:   {metrics['annualized_return']:>8.2%}")
    print(f"📉 Volatility:          {metrics['volatility']:>8.2%}")
    print(f"⚡ Sharpe Ratio:        {metrics['sharpe_ratio']:>8.2f}")
    print(f"🔻 Maximum Drawdown:    {metrics['max_drawdown']:>8.2%}")
    print(f"🎯 Win Rate:            {metrics['win_rate']:>8.2%}")
    print(f"📐 Skewness:            {metrics['skewness']:>8.2f}")
    print(f"📊 Kurtosis:            {metrics['kurtosis']:>8.2f}")
    print(f"📋 Observations:        {metrics['num_observations']:>8d}")
    
else:
    print("❌ Backtest failed - check data availability and configuration")
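
For reference, the summary metrics printed above can be computed from a return series along the following lines. This is a sketch with assumed definitions (zero risk-free rate, 252 periods per year, sample standard deviation). The key names mirror those in `results['metrics']`, but the exact formulas AlphaForge uses may differ, and `summarize` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def summarize(returns: pd.Series, periods_per_year: int = 252) -> dict:
    """Basic performance metrics from periodic simple returns."""
    r = returns.dropna()
    cum = (1 + r).cumprod()
    total = cum.iloc[-1] - 1
    # Running peak, floored at the starting capital of 1.0 so that a loss
    # in the very first period counts as a drawdown.
    peak = cum.cummax().clip(lower=1.0)
    return {
        "total_return": total,
        "annualized_return": (1 + total) ** (periods_per_year / len(r)) - 1,
        "volatility": r.std() * np.sqrt(periods_per_year),
        "sharpe_ratio": r.mean() / r.std() * np.sqrt(periods_per_year),  # risk-free rate assumed 0
        "max_drawdown": (cum / peak - 1).min(),
        "win_rate": (r > 0).mean(),
        "num_observations": len(r),
    }


# Toy series: +10%, -50%, +20% leaves the portfolio at 0.66 of its start.
stats = summarize(pd.Series([0.10, -0.50, 0.20]))
print(stats)
```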

In [None]:
# Comprehensive performance visualization
if results:
    returns = results['returns']
    gross_returns = results['gross_returns']
    transaction_costs = results['transaction_costs']
    
    # Create performance dashboard
    fig, axes = plt.subplots(2, 3, figsize=(20, 12))
    fig.suptitle('📊 AlphaForge Performance Dashboard', fontsize=16, fontweight='bold')
    
    # 1. Cumulative returns
    cum_returns = (1 + returns).cumprod()
    cum_gross_returns = (1 + gross_returns).cumprod()
    
    axes[0, 0].plot(cum_returns.index, cum_returns.values, 
                    label='Net Returns', linewidth=2.5, color='steelblue')
    axes[0, 0].plot(cum_gross_returns.index, cum_gross_returns.values, 
                    label='Gross Returns', linewidth=2, alpha=0.7, color='orange')
    axes[0, 0].set_title('📈 Cumulative Returns')
    axes[0, 0].set_ylabel('Cumulative Return')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # 2. Rolling Sharpe ratio
    rolling_sharpe = returns.rolling(60).mean() / returns.rolling(60).std() * np.sqrt(252)
    axes[0, 1].plot(rolling_sharpe.index, rolling_sharpe.values, 
                    color='green', linewidth=2)
    axes[0, 1].set_title('⚡ Rolling Sharpe Ratio (60-day)')
    axes[0, 1].set_ylabel('Sharpe Ratio')
    axes[0, 1].axhline(y=0, color='red', linestyle='--', alpha=0.7)
    axes[0, 1].grid(True, alpha=0.3)
    
    # 3. Drawdown analysis
    rolling_max = cum_returns.expanding().max()
    drawdown = (cum_returns / rolling_max) - 1
    axes[0, 2].fill_between(drawdown.index, drawdown.values, 0, 
                           alpha=0.4, color='red')
    axes[0, 2].set_title('🔻 Drawdown Analysis')
    axes[0, 2].set_ylabel('Drawdown')
    axes[0, 2].grid(True, alpha=0.3)
    
    # 4. Return distribution
    axes[1, 0].hist(returns, bins=50, alpha=0.7, color='lightblue', 
                    edgecolor='black', density=True)
    axes[1, 0].axvline(returns.mean(), color='red', linestyle='--', 
                       alpha=0.8, linewidth=2, label=f'Mean: {returns.mean():.3f}')
    axes[1, 0].set_title('📊 Return Distribution')
    axes[1, 0].set_xlabel('Daily Return')
    axes[1, 0].set_ylabel('Density')
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)
    
    # 5. Transaction costs
    axes[1, 1].plot(transaction_costs.index, transaction_costs.cumsum(), 
                    color='purple', linewidth=2)
    axes[1, 1].set_title('💸 Cumulative Transaction Costs')
    axes[1, 1].set_ylabel('Cumulative Costs')
    axes[1, 1].grid(True, alpha=0.3)
    
    # 6. Rolling volatility
    rolling_vol = returns.rolling(60).std() * np.sqrt(252)
    axes[1, 2].plot(rolling_vol.index, rolling_vol.values, 
                    color='orange', linewidth=2)
    axes[1, 2].set_title('📉 Rolling Volatility (60-day)')
    axes[1, 2].set_ylabel('Annualized Volatility')
    axes[1, 2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("📊 Performance dashboard generated successfully!")

## 5. Walk-Forward Analysis

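Out-of-sample testing with an expanding window: the model is first fit on an initial window (`initial_window=504` in the cell below) and then re-evaluated in steps (`step_size=21`), so each test period uses only data that was available beforehand. If the units are trading days, these values correspond to roughly two years and one month.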
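
The expanding-window scheme can be sketched as a generator of train/test index ranges. This is a generic illustration, not the code behind `walk_forward_analysis`. The helper name `expanding_window_splits` is hypothetical, and the defaults simply reuse the `initial_window` and `step_size` values from the cell below.

```python
from typing import Iterator, Tuple


def expanding_window_splits(n_obs: int, initial_window: int = 504,
                            step_size: int = 21) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; train always starts at 0 and grows each step."""
    train_end = initial_window
    while train_end < n_obs:
        test_end = min(train_end + step_size, n_obs)
        # Train on everything before train_end, test on the next block only.
        yield range(0, train_end), range(train_end, test_end)
        train_end = test_end


# Small example: 10 observations, 4 in the first training window, steps of 3.
splits = list(expanding_window_splits(n_obs=10, initial_window=4, step_size=3))
for train, test in splits:
    print(f"train 0..{train.stop - 1}, test {test.start}..{test.stop - 1}")
```

Because every test block starts where its training window ends, concatenating the test-block returns gives one continuous out-of-sample series with no overlap between fitting and evaluation data.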

In [None]:
# Walk-forward out-of-sample analysis
print("🔄 Running walk-forward analysis...")
print("   📈 Expanding window approach for robust validation")
print("   🎯 Out-of-sample performance estimation")
print("   ⚡ Statistical significance testing")

oos_results = backtester.walk_forward_analysis(
    tickers=tickers,
    initial_window=504,
    step_size=21
)

if oos_results:
    print("\n✅ Walk-forward analysis completed!")
    print("\n📊 Out-of-Sample Results:")
    print("=" * 45)
    
    oos_metrics = oos_results['oos_metrics']
    
    print(f"📈 OOS Total Return:      {oos_metrics['total_return']:>8.2%}")
    print(f"📊 OOS Annualized Return: {oos_metrics['annualized_return']:>8.2%}")
    print(f"📉 OOS Volatility:        {oos_metrics['volatility']:>8.2%}")
    print(f"⚡ OOS Sharpe Ratio:      {oos_metrics['sharpe_ratio']:>8.2f}")
    print(f"🔻 OOS Maximum Drawdown:  {oos_metrics['max_drawdown']:>8.2%}")
    print(f"🎯 OOS Win Rate:          {oos_metrics['win_rate']:>8.2%}")
    print(f"📋 OOS Observations:      {oos_metrics['num_observations']:>8d}")
    
else:
    print("❌ Walk-forward analysis failed")

## Summary and Conclusions

This notebook demonstrates the comprehensive capabilities of **AlphaForge** - a systematic alpha research platform:

### 🎯 Key Features Demonstrated:
1. **🔗 Data Integration**: Universe lookup and Yahoo market data fetching with caching
2. **⚙️ Factor Engineering**: Classic factors (momentum, value, quality, size, low-vol) with summary statistics, correlations, and stability checks
3. **🎪 Portfolio Construction**: Long-short portfolios with per-position weight limits and a leverage setting
4. **🧠 Advanced Techniques**: Bayesian shrinkage and Lasso regularization to reduce overfitting
5. **🔬 Rigorous Testing**: Expanding-window walk-forward analysis for out-of-sample validation
6. **📊 Comprehensive Analytics**: Return, risk, and drawdown metrics, plus a dashboard of rolling Sharpe ratio, volatility, and transaction costs
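
To give a feel for the two regularization ideas named in the list, here is a small NumPy sketch. It shows a textbook normal-normal shrinkage of noisy estimates toward a prior mean, and the soft-thresholding operator that gives the Lasso solution when predictors are orthonormal. Both functions and their parameters are assumptions for illustration only. How `use_shrinkage` and `use_lasso` are implemented inside AlphaForge is not shown in this notebook.

```python
import numpy as np


def shrink_toward_prior(estimates, std_errors, prior_mean: float = 0.0,
                        prior_std: float = 0.02) -> np.ndarray:
    """Posterior mean under a normal prior: noisier estimates are pulled harder to the prior."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    data_weight = prior_std**2 / (prior_std**2 + std_errors**2)
    return data_weight * estimates + (1 - data_weight) * prior_mean


def soft_threshold(coefs, penalty: float) -> np.ndarray:
    """Lasso-style shrinkage: reduce magnitudes by `penalty` and zero out anything smaller."""
    coefs = np.asarray(coefs, dtype=float)
    return np.sign(coefs) * np.maximum(np.abs(coefs) - penalty, 0.0)


# Two hypothetical factor premia with equal standard errors.
shrunk = shrink_toward_prior([0.04, -0.01], std_errors=[0.02, 0.02])
sparse = soft_threshold(shrunk, penalty=0.01)
print(shrunk, sparse)
```

With the standard error equal to the prior standard deviation, each estimate is halved. The threshold then removes the smaller signal entirely, which is how Lasso produces sparse factor selections.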

### 🚀 Next Steps for Research:
- **Custom Factors**: Develop proprietary signals using the extensible framework
- **ML Integration**: Implement ensemble methods and deep learning models
- **Alternative Data**: Incorporate sentiment, satellite, and patent data
- **Multi-Asset**: Extend to fixed income, commodities, and currencies
- **Regime Models**: Add market regime detection and adaptive strategies

### 💼 Framework Benefits:
- **🏗️ Modular Design**: Easy to extend and customize for specific research needs
- **⚡ Performance**: Parallel processing and caching for institutional-scale research
- **🛡️ Robust**: Transaction cost modeling, survivorship bias handling, and statistical validation
- **📈 Research-Ready**: Publication-quality analytics and visualizations

### 🔥 Production Applications:
- **Hedge Funds**: Systematic strategy development and risk management
- **Asset Managers**: Portfolio optimization and performance attribution
- **Risk Teams**: Factor exposure monitoring and stress testing
- **Academic Research**: Empirical asset pricing and factor model validation

**AlphaForge** provides a professional-grade foundation for systematic alpha research that scales from academic studies to institutional trading strategies.

---

🚀 **Ready to forge alpha?** Explore the examples directory for advanced use cases and custom factor development patterns.