# Complete Backtesting Framework

This notebook demonstrates how to use the HFT Simulator's comprehensive backtesting framework to evaluate trading strategies with realistic market conditions, transaction costs, and risk management.

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
- Set up a complete backtesting environment
- Generate and process synthetic market data
- Run multi-strategy backtests
- Analyze performance with advanced metrics
- Generate professional reports
- Optimize strategy parameters

## 📚 Table of Contents

1. [Environment Setup](#setup)
2. [Data Preparation](#data)
3. [Strategy Configuration](#strategies)
4. [Backtesting Execution](#backtesting)
5. [Performance Analysis](#analysis)
6. [Risk Assessment](#risk)
7. [Report Generation](#reports)
8. [Parameter Optimization](#optimization)

---

In [None]:
# Import all required libraries
import sys
import os
sys.path.append('..')  # Project root, so the `src` package imported below is found

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# HFT Simulator imports
from src.data.ingestion import DataIngestionPipeline, DataSource
from src.engine.order_book import OrderBook
from src.engine.market_data import MarketDataProcessor, BookSnapshot, MarketDataPoint
from src.execution.simulator import ExecutionSimulator, SimulationConfig
from src.strategies.market_making import MarketMakingStrategy, MarketMakingConfig
from src.strategies.liquidity_taking import LiquidityTakingStrategy, LiquidityTakingConfig
from src.performance.portfolio import Portfolio
from src.performance.risk_manager import RiskManager
from src.performance.metrics import PerformanceAnalyzer
from src.visualization.charts import ChartGenerator
from src.visualization.reports import ReportGenerator, ReportType, ReportFormat
from src.visualization.dashboard import Dashboard, DashboardConfig

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (14, 8)

print("🚀 HFT Simulator Backtesting Framework Loaded!")
print(f"📅 Session started: {datetime.now()}")
print("=" * 60)

## 1. Environment Setup {#setup}

Let's set up a comprehensive backtesting environment with multiple strategies and realistic market conditions.

In [None]:
# Configure backtesting parameters
class BacktestConfig:
    """Configuration for backtesting framework"""
    
    def __init__(self):
        # Time period
        self.start_date = pd.Timestamp('2024-01-01 09:30:00')
        self.end_date = pd.Timestamp('2024-01-01 16:00:00')
        
        # Symbols to trade
        self.symbols = ['AAPL', 'MSFT', 'GOOGL']
        
        # Capital allocation
        self.initial_capital = 1000000.0  # $1M
        self.capital_per_symbol = self.initial_capital / len(self.symbols)
        
        # Transaction costs
        self.commission_rate = 0.0005  # 5 bps
        self.slippage_bps = 1.0        # 1 bp average slippage
        
        # Risk parameters
        self.max_portfolio_risk = 0.02  # 2% portfolio risk
        self.max_position_size = 0.1    # 10% per position
        
        # Reporting
        self.benchmark_symbol = 'SPY'
        self.risk_free_rate = 0.05  # 5% annual

# Initialize configuration
config = BacktestConfig()

print(f"📊 Backtesting Configuration:")
print(f"   Period: {config.start_date} to {config.end_date}")
print(f"   Symbols: {config.symbols}")
print(f"   Initial Capital: ${config.initial_capital:,.0f}")
print(f"   Commission Rate: {config.commission_rate:.2%}")
print(f"   Max Portfolio Risk: {config.max_portfolio_risk:.1%}")
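
To get a feel for what the cost parameters above imply, the sketch below works through the arithmetic for one round trip (a buy followed by a sell). The helper `round_trip_cost` is hypothetical and not part of the HFT Simulator; it assumes commission is charged on the notional of each leg and that each leg suffers the configured slippage as an adverse move.

```python
# Hypothetical helper for illustration; defaults mirror BacktestConfig above.
def round_trip_cost(price, quantity, commission_rate=0.0005, slippage_bps=1.0):
    """Estimated cost of buying and then selling `quantity` shares at `price`."""
    notional = price * quantity
    commission = commission_rate * notional        # charged on each leg
    slippage = notional * slippage_bps / 10_000    # assumed adverse move per leg
    return 2 * (commission + slippage)

cost = round_trip_cost(price=150.0, quantity=100)
spread_capture = 0.03 * 100   # a 3-cent spread earned on 100 shares

print(f"Round-trip cost: ${cost:.2f}")
print(f"Spread capture:  ${spread_capture:.2f}")
```

Under these assumptions, a 100-share round trip at $150 costs $18.00, while a 3-cent spread on the same size earns $3.00, so cost settings of this magnitude can dominate the results of a spread-capture strategy.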

## 2. Data Preparation {#data}

Generate synthetic market data for the backtest: one-second price paths, intraday volume profiles, and three-level order book snapshots for each symbol.

In [None]:
# Generate comprehensive market data
def generate_comprehensive_market_data(config):
    """Generate realistic multi-symbol market data"""
    
    print("📈 Generating Market Data...")
    
    # Time series
    timestamps = pd.date_range(
        config.start_date, 
        config.end_date, 
        freq='1s'  # 1-second intervals (the uppercase 'S' alias is deprecated in recent pandas)
    )
    
    market_data = {}
    
    # Base prices for different symbols
    base_prices = {
        'AAPL': 150.0,
        'MSFT': 300.0,
        'GOOGL': 2500.0
    }
    
    # Generate data for each symbol
    for symbol in config.symbols:
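        # Derive a seed that is stable across runs; the built-in hash() of a str
        # is randomized per process, so it would not be reproducible
        np.random.seed(sum(symbol.encode()) % 1000)  # Different seed per symbol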
        
        base_price = base_prices[symbol]
        n_points = len(timestamps)
        
        # Draw price dynamics for this symbol (independent across symbols)
        volatility = np.random.uniform(0.15, 0.25)  # 15-25% annual volatility
        drift = np.random.uniform(-0.05, 0.05)      # -5% to +5% annual drift
        
        # Convert to per-second parameters, measuring time in trading seconds
        # (252 trading days of 6.5 hours each)
        dt = 1 / (252 * 6.5 * 60 * 60)  # 1 second in trading years
        vol_per_second = volatility * np.sqrt(dt)
        drift_per_second = drift * dt
        
        # Generate price path
        returns = np.random.normal(drift_per_second, vol_per_second, n_points)
        prices = base_price * np.exp(np.cumsum(returns))
        
        # Generate volumes (higher during market open/close)
        hour_of_day = timestamps.hour + timestamps.minute / 60
        volume_multiplier = 1 + 0.5 * np.exp(-((hour_of_day - 9.5) ** 2) / 2) + \
                           0.3 * np.exp(-((hour_of_day - 15.5) ** 2) / 2)
        
        base_volume = np.random.exponential(1000, n_points)
        volumes = (base_volume * volume_multiplier).astype(int)
        
        # Create market data points
        symbol_data = []
        
        for timestamp, price, volume in zip(timestamps, prices, volumes):
            # Generate realistic spread
            spread_bps = np.random.uniform(1, 5)  # 1-5 bps spread
            spread = price * spread_bps / 10000
            
            best_bid = price - spread / 2
            best_ask = price + spread / 2
            
            # Create book snapshot
            book_snapshot = BookSnapshot(
                symbol=symbol,
                timestamp=timestamp,
                bids=[
                    (best_bid, np.random.randint(500, 2000)),
                    (best_bid - spread, np.random.randint(300, 1500)),
                    (best_bid - 2*spread, np.random.randint(200, 1000))
                ],
                asks=[
                    (best_ask, np.random.randint(500, 2000)),
                    (best_ask + spread, np.random.randint(300, 1500)),
                    (best_ask + 2*spread, np.random.randint(200, 1000))
                ]
            )
            
            # Create market data point
            data_point = MarketDataPoint(
                symbol=symbol,
                timestamp=timestamp,
                price=price,
                volume=volume,
                book_snapshot=book_snapshot
            )
            
            symbol_data.append(data_point)
        
        market_data[symbol] = symbol_data
        
        print(f"   {symbol}: {len(symbol_data)} data points, "
              f"price range ${min(prices):.2f}-${max(prices):.2f}")
    
    print(f"✅ Market data generation complete!")
    return market_data

# Generate market data
market_data = generate_comprehensive_market_data(config)
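
The price model inside the generator can be isolated and checked on its own. The sketch below is a minimal, vectorized version that assumes time is measured in trading seconds (252 days of 6.5 hours) and uses a local `numpy.random.Generator` instead of the global seed. The names `simulate_prices` and `SECONDS_PER_YEAR` are illustrative only. It then verifies that the annualized volatility recovered from the simulated path is close to the input.

```python
import numpy as np

SECONDS_PER_YEAR = 252 * 6.5 * 60 * 60   # assumption: 252 trading days of 6.5 hours

def simulate_prices(base_price, annual_vol, annual_drift, n_steps, seed):
    """Simulate a per-second price path with i.i.d. normal log returns."""
    rng = np.random.default_rng(seed)    # local generator, reproducible across runs
    dt = 1.0 / SECONDS_PER_YEAR
    log_returns = rng.normal(annual_drift * dt, annual_vol * np.sqrt(dt), n_steps)
    return base_price * np.exp(np.cumsum(log_returns))

# One trading day of one-second steps
prices = simulate_prices(150.0, annual_vol=0.20, annual_drift=0.0,
                         n_steps=23_400, seed=42)

# Scale the per-second standard deviation back up to an annual figure
realized_vol = np.std(np.diff(np.log(prices))) * np.sqrt(SECONDS_PER_YEAR)
print(f"Realized annualized volatility: {realized_vol:.3f}")
```

A round-trip check like this is a cheap way to catch unit mistakes in the time scaling before the data feeds a backtest.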

## 3. Strategy Configuration {#strategies}

Set up multiple trading strategies for comparison.

In [None]:
# Set up multiple strategies for comparison
def setup_strategies(config):
    """Set up multiple trading strategies"""
    
    print("⚙️  Setting up Trading Strategies...")
    
    strategies = {}
    
    # Strategy 1: Conservative Market Making
    conservative_mm_config = MarketMakingConfig(
        spread_target=0.03,      # 3 cent target spread
        position_limit=500,      # Conservative position limit
        inventory_target=0,      # Neutral inventory target
        risk_aversion=0.2,       # Higher risk aversion
        min_spread=0.02,
        max_spread=0.08,
        quote_size=100
    )
    
    strategies['Conservative_MM'] = {
        'type': 'MarketMaking',
        'config': conservative_mm_config,
        'description': 'Conservative market making with tight risk controls'
    }
    
    # Strategy 2: Aggressive Market Making
    aggressive_mm_config = MarketMakingConfig(
        spread_target=0.015,     # 1.5 cent target spread
        position_limit=1000,     # Higher position limit
        inventory_target=0,
        risk_aversion=0.05,      # Lower risk aversion
        min_spread=0.01,
        max_spread=0.05,
        quote_size=200           # Larger quote size
    )
    
    strategies['Aggressive_MM'] = {
        'type': 'MarketMaking',
        'config': aggressive_mm_config,
        'description': 'Aggressive market making with tighter spreads'
    }
    
    # Strategy 3: Momentum Liquidity Taking
    momentum_lt_config = LiquidityTakingConfig(
        momentum_threshold=0.005,    # 0.5% momentum threshold
        mean_reversion_threshold=0.02,
        volume_threshold=1000,
        position_limit=800,
        signal_decay=0.95,
        min_signal_strength=0.3
    )
    
    strategies['Momentum_LT'] = {
        'type': 'LiquidityTaking',
        'config': momentum_lt_config,
        'description': 'Momentum-based liquidity taking strategy'
    }
    
    # Strategy 4: Mean Reversion Liquidity Taking
    mean_reversion_lt_config = LiquidityTakingConfig(
        momentum_threshold=0.01,
        mean_reversion_threshold=0.01,   # Lower threshold for mean reversion
        volume_threshold=800,
        position_limit=600,
        signal_decay=0.9,
        min_signal_strength=0.4
    )
    
    strategies['MeanReversion_LT'] = {
        'type': 'LiquidityTaking',
        'config': mean_reversion_lt_config,
        'description': 'Mean reversion liquidity taking strategy'
    }
    
    print(f"✅ {len(strategies)} strategies configured:")
    for name, strategy in strategies.items():
        print(f"   {name}: {strategy['description']}")
    
    return strategies

# Set up strategies
strategies = setup_strategies(config)

## 4. Backtesting Execution {#backtesting}

Run comprehensive backtests for all strategies.

In [None]:
# Execute comprehensive backtesting
def run_comprehensive_backtest(strategies, market_data, config):
    """Run backtests for all strategies"""
    
    print("🔄 Running Comprehensive Backtests...")
    print("=" * 60)
    
    results = {}
    
    for strategy_name, strategy_config in strategies.items():
        print(f"\n📊 Testing {strategy_name}...")
        
        # Create portfolio and risk manager for this strategy
        portfolio = Portfolio(
            initial_cash=config.initial_capital,
            name=f"{strategy_name}_Portfolio"
        )
        
        risk_manager = RiskManager(
            initial_capital=config.initial_capital,
            max_portfolio_risk=config.max_portfolio_risk
        )
        
        # Create strategy instance
        if strategy_config['type'] == 'MarketMaking':
            strategy = MarketMakingStrategy(
                symbols=config.symbols,
                portfolio=portfolio,
                config=strategy_config['config']
            )
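        elif strategy_config['type'] == 'LiquidityTaking':
            strategy = LiquidityTakingStrategy(
                symbols=config.symbols,
                portfolio=portfolio,
                config=strategy_config['config']
            )
        else:
            # Fail fast instead of reusing a strategy from a previous iteration
            raise ValueError(f"Unknown strategy type: {strategy_config['type']}")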
        
        # Start strategy
        strategy.start()
        
        # Track results
        strategy_results = {
            'timestamps': [],
            'portfolio_values': [],
            'positions': {symbol: [] for symbol in config.symbols},
            'pnl': [],
            'trades': [],
            'signals_generated': 0,
            'orders_executed': 0
        }
        
        # Import once, outside the processing loops
        from src.engine.order_types import Trade

        # Process market data (simplified - process every 10th point for speed)
        total_points = len(market_data[config.symbols[0]])
        sample_indices = range(0, total_points, 10)  # Every 10th point

        for i in sample_indices:
            timestamp = market_data[config.symbols[0]][i].timestamp

            # Process each symbol
            for symbol in config.symbols:
                if i < len(market_data[symbol]):
                    data_point = market_data[symbol][i]

                    # Generate signals
                    signals = strategy.generate_signals(data_point)
                    strategy_results['signals_generated'] += len(signals)

                    # Simulate order execution (simplified)
                    for signal in signals:
                        if np.random.random() < 0.05:  # 5% execution probability
                            # Simulate trade with zero-mean slippage noise
                            # (standard deviation of slippage_bps basis points)
                            slippage = np.random.normal(0, config.slippage_bps / 10000)
                            execution_price = signal.price * (1 + slippage)

                            # Create trade
                            trade = Trade(
                                trade_id=f"T{strategy_results['orders_executed']}",
                                symbol=signal.symbol,
                                volume=signal.quantity,
                                price=execution_price,
                                timestamp=timestamp,
                                buy_order_id=signal.order_id if signal.side.value == 'buy' else None,
                                sell_order_id=signal.order_id if signal.side.value == 'sell' else None
                            )
                            
                            # Add to portfolio
                            portfolio.add_trade(trade, commission=config.commission_rate * signal.quantity * execution_price)
                            strategy_results['trades'].append(trade)
                            strategy_results['orders_executed'] += 1
            
            # Update portfolio with current prices
            current_prices = {}
            for symbol in config.symbols:
                if i < len(market_data[symbol]):
                    current_prices[symbol] = market_data[symbol][i].price
            
            if current_prices:
                portfolio.update_prices(current_prices, timestamp)
                risk_manager.update_portfolio_value(portfolio.total_value, timestamp)
            
            # Record results
            strategy_results['timestamps'].append(timestamp)
            strategy_results['portfolio_values'].append(portfolio.total_value)
            strategy_results['pnl'].append(portfolio.total_pnl)
            
            for symbol in config.symbols:
                strategy_results['positions'][symbol].append(strategy.get_position(symbol))
            
            # Progress update
            if len(strategy_results['timestamps']) % 500 == 0:
                print(f"   Processed {len(strategy_results['timestamps'])} points...")
        
        # Calculate final metrics
        performance_metrics = portfolio.calculate_performance_metrics()
        risk_summary = risk_manager.get_risk_summary()
        
        # Store results
        results[strategy_name] = {
            'strategy_results': strategy_results,
            'portfolio': portfolio,
            'risk_manager': risk_manager,
            'performance_metrics': performance_metrics,
            'risk_summary': risk_summary
        }
        
        print(f"   ✅ {strategy_name} completed:")
        print(f"      Final Value: ${portfolio.total_value:,.2f}")
        print(f"      Total Return: {performance_metrics.total_return:.2%}")
        print(f"      Sharpe Ratio: {performance_metrics.sharpe_ratio:.2f}")
        print(f"      Max Drawdown: {performance_metrics.max_drawdown:.2%}")
        print(f"      Trades: {strategy_results['orders_executed']}")
    
    print(f"\n🎉 All backtests completed!")
    return results

# Run backtests
backtest_results = run_comprehensive_backtest(strategies, market_data, config)
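
The execution loop above draws slippage from a zero-mean normal distribution, so roughly half of the simulated fills receive a better price than the signal price. A more conservative variant, sketched below, always applies slippage against the trader and uses a seeded generator so that repeated runs produce identical fills. The function `simulate_fill` and its parameters are hypothetical and not part of the simulator's API.

```python
import numpy as np

def simulate_fill(side, price, quantity, slippage_bps, commission_rate, rng):
    """Return (execution_price, commission) for one simulated fill.

    Slippage is drawn as a non-negative amount and applied against the trader:
    buys fill at or above `price`, sells at or below it.
    """
    slip = abs(rng.normal(0.0, slippage_bps / 10_000))
    direction = 1.0 if side == "buy" else -1.0
    execution_price = price * (1.0 + direction * slip)
    commission = commission_rate * quantity * execution_price
    return execution_price, commission

rng = np.random.default_rng(7)   # seeded so repeated runs give identical fills
buy_price, buy_fee = simulate_fill("buy", 150.0, 100, 1.0, 0.0005, rng)
sell_price, sell_fee = simulate_fill("sell", 150.0, 100, 1.0, 0.0005, rng)

print(f"buy  @ {buy_price:.4f}, commission ${buy_fee:.2f}")
print(f"sell @ {sell_price:.4f}, commission ${sell_fee:.2f}")
```

Passing the generator in explicitly keeps the randomness of execution separate from the randomness of data generation, which makes individual runs easier to reproduce and compare.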

## 5. Performance Analysis {#analysis}

Analyze and compare strategy performance.

In [None]:
# Comprehensive performance analysis
def analyze_strategy_performance(backtest_results):
    """Analyze and compare strategy performance"""
    
    print("📊 Strategy Performance Analysis")
    print("=" * 60)
    
    # Create performance comparison table
    performance_data = []
    
    for strategy_name, results in backtest_results.items():
        metrics = results['performance_metrics']
        portfolio = results['portfolio']
        
        performance_data.append({
            'Strategy': strategy_name,
            'Final Value': portfolio.total_value,
            'Total Return': metrics.total_return,
            'Sharpe Ratio': metrics.sharpe_ratio,
            'Max Drawdown': metrics.max_drawdown,
            'Volatility': metrics.annualized_volatility,
            'Win Rate': metrics.win_rate,
            'Profit Factor': metrics.profit_factor,
            'Total Trades': results['strategy_results']['orders_executed']
        })
    
    performance_df = pd.DataFrame(performance_data)
    
    # Display performance table
    print("\n📈 Performance Summary:")
    print(performance_df.round(4))
    
    # Create comprehensive visualization
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # Portfolio value evolution
    for strategy_name, results in backtest_results.items():
        strategy_results = results['strategy_results']
        axes[0, 0].plot(strategy_results['timestamps'], 
                       strategy_results['portfolio_values'], 
                       linewidth=2, label=strategy_name)
    
    axes[0, 0].axhline(y=config.initial_capital, color='black', 
                      linestyle='--', alpha=0.5, label='Initial Capital')
    axes[0, 0].set_title('Portfolio Value Evolution')
    axes[0, 0].set_ylabel('Portfolio Value ($)')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # Return comparison
    returns = performance_df['Total Return'].values
    strategy_names = performance_df['Strategy'].values
    
    colors = ['green' if r > 0 else 'red' for r in returns]
    axes[0, 1].bar(strategy_names, returns * 100, color=colors, alpha=0.7)
    axes[0, 1].set_title('Total Returns Comparison')
    axes[0, 1].set_ylabel('Return (%)')
    axes[0, 1].tick_params(axis='x', rotation=45)
    axes[0, 1].grid(True, alpha=0.3)
    
    # Risk-Return scatter
    volatilities = performance_df['Volatility'].values
    axes[1, 0].scatter(volatilities * 100, returns * 100, 
                      s=100, alpha=0.7, c=range(len(strategy_names)), cmap='viridis')
    
    for i, name in enumerate(strategy_names):
        axes[1, 0].annotate(name, (volatilities[i] * 100, returns[i] * 100))
    
    axes[1, 0].set_title('Risk-Return Profile')
    axes[1, 0].set_xlabel('Volatility (%)')
    axes[1, 0].set_ylabel('Return (%)')
    axes[1, 0].grid(True, alpha=0.3)
    
    # Sharpe ratio comparison
    sharpe_ratios = performance_df['Sharpe Ratio'].values
    colors = ['green' if s > 1 else 'orange' if s > 0 else 'red' for s in sharpe_ratios]
    axes[1, 1].bar(strategy_names, sharpe_ratios, color=colors, alpha=0.7)
    axes[1, 1].axhline(y=1, color='black', linestyle='--', alpha=0.5, label='Sharpe = 1')
    axes[1, 1].set_title('Sharpe Ratio Comparison')
    axes[1, 1].set_ylabel('Sharpe Ratio')
    axes[1, 1].tick_params(axis='x', rotation=45)
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return performance_df

# Analyze performance
performance_comparison = analyze_strategy_performance(backtest_results)
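
The Sharpe ratio and maximum drawdown in the table come from `portfolio.calculate_performance_metrics()`, whose internals are not shown here. As a point of comparison, the sketch below computes both from a plain series of portfolio values using the conventional definitions. It assumes values are sampled every 10 seconds of a 6.5-hour trading day and reports drawdown as a positive fraction; the simulator's own conventions may differ.

```python
import numpy as np

# Assumption: one observation every 10 seconds, 252 trading days of 6.5 hours
PERIODS_PER_YEAR = int(252 * 6.5 * 3600 / 10)

def sharpe_ratio(values, periods_per_year, risk_free_rate=0.0):
    """Annualized Sharpe ratio of a series of portfolio values."""
    values = np.asarray(values, dtype=float)
    returns = np.diff(values) / values[:-1]            # simple period returns
    excess = returns - risk_free_rate / periods_per_year
    std = excess.std(ddof=1)
    if std == 0:
        return 0.0                                     # flat series: no risk, no ratio
    return float(np.sqrt(periods_per_year) * excess.mean() / std)

def max_drawdown(values):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    return float(((running_peak - values) / running_peak).max())

values = np.array([100.0, 110.0, 99.0, 120.0, 90.0])
print(f"Sharpe ratio: {sharpe_ratio(values, PERIODS_PER_YEAR, risk_free_rate=0.05):.2f}")
print(f"Max drawdown: {max_drawdown(values):.2%}")
```

For this toy series the maximum drawdown is 25%, the fall from the peak of 120 to 90. Annualizing a Sharpe ratio from a single day of 10-second returns multiplies a noisy estimate by a large factor, so figures from an intraday backtest like this one are best read as relative rankings.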

## 🎓 Summary and Conclusions

Congratulations! You've completed a comprehensive backtesting framework demonstration. Here's what you've accomplished:

### ✅ Key Achievements

1. **Complete Framework**: Built an end-to-end backtesting system
2. **Multi-Strategy Testing**: Compared different HFT strategies
3. **Realistic Simulation**: Included commissions, slippage, and bid-ask spreads
4. **Comprehensive Analysis**: Generated detailed performance and risk metrics
5. **Professional Reporting**: Produced a performance summary table and comparison charts

### 📊 Key Insights

From our backtesting framework, we learned:
- **Strategy Diversification**: Different strategies perform better in different market conditions
- **Risk Management**: Proper risk controls are essential for consistent performance
- **Transaction Costs**: Costs significantly impact HFT strategy profitability
- **Parameter Sensitivity**: Small parameter changes can have large performance impacts
- **Market Conditions**: Strategy performance varies with volatility and liquidity

### 🔧 Framework Features

Our backtesting framework includes:
- **Multi-asset support** for portfolio strategies
- **Realistic market simulation** with spreads and volumes
- **Transaction cost modeling** including commissions and slippage
- **Risk management integration** with real-time monitoring
- **Performance attribution** across strategies and assets
- **Professional reporting** with charts and metrics

### 🚀 Next Steps

To further enhance your HFT knowledge:
1. **Experiment with real data** from financial data providers
2. **Implement additional strategies** like statistical arbitrage
3. **Add machine learning** for signal generation
4. **Optimize parameters** using systematic approaches
5. **Deploy strategies** in paper trading environments

### 💡 Best Practices

Remember these key principles:
- **Always validate** on out-of-sample data
- **Include realistic costs** in all backtests
- **Monitor risk metrics** continuously
- **Document assumptions** and limitations
- **Review strategies regularly** and re-optimize when conditions change
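
The first principle, out-of-sample validation, can be sketched with a chronological split: parameters are chosen on the earlier part of the data and then scored once on the later part. Everything below is illustrative. `chronological_split` and `toy_score` are hypothetical helpers, the prices are a synthetic random walk, and the momentum rule is a stand-in for a real strategy backtest.

```python
import numpy as np

def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered sequence without shuffling, so the test set lies in the future."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]

def toy_score(prices, lookback):
    """Placeholder objective: P&L of a naive momentum rule with the given lookback."""
    prices = np.asarray(prices, dtype=float)
    momentum = prices[lookback:-1] - prices[:-lookback - 1]   # move over the lookback window
    next_move = prices[lookback + 1:] - prices[lookback:-1]   # move over the following step
    return float(np.sum(np.sign(momentum) * next_move))

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 0.05, 2_000))   # synthetic random walk
train, test = chronological_split(prices)

# Pick the parameter on the training window only
lookbacks = [1, 5, 20, 60]
best = max(lookbacks, key=lambda lb: toy_score(train, lb))

print(f"Best lookback in-sample: {best}")
print(f"In-sample score:     {toy_score(train, best):.2f}")
print(f"Out-of-sample score: {toy_score(test, best):.2f}")
```

Because the input is a random walk, any in-sample edge found by the search is noise, and comparing the two scores shows how much of an optimized result can survive on unseen data.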

### 📚 Additional Resources

Continue learning with:
- Academic papers on market microstructure
- Industry reports on HFT trends
- Open source trading frameworks
- Professional trading courses
- Financial data science resources

---

*You've now mastered the complete HFT simulation and backtesting workflow! 🎉*