# Opening Range Breakout (ORB) Strategy Backtest

This notebook demonstrates the Opening Range Breakout strategy based on the research paper:
"Can Day Trading Really Be Profitable?" by Carlo Zarattini and Andrew Aziz (2023)

## Strategy Overview
- Identify the opening range during the first 5 minutes of trading
- Enter long on breakout above the range, short on breakout below
- Stop loss at the opposite side of the range
- Profit target at 10x risk (10R)
- Exit all positions at market close
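
The rules above can be sketched for a single session in plain pandas. This is a simplified illustration, not the notebook's `OpeningRangeBreakout` class. It makes several assumptions that the class may handle differently:

- one-minute bars start at the open;
- only the first close outside the range triggers an entry, filled at that bar's close;
- when a bar touches both levels, the stop is assumed to hit first.

`simulate_orb_day` is a hypothetical helper.

```python
from typing import Optional

import pandas as pd


def simulate_orb_day(bars: pd.DataFrame, range_minutes: int = 5,
                     target_r: float = 10.0) -> Optional[dict]:
    """Simulate at most one ORB trade on one session of 1-minute OHLC bars."""
    opening = bars.iloc[:range_minutes]
    range_high, range_low = opening["high"].max(), opening["low"].min()
    rest = bars.iloc[range_minutes:]

    # First bar whose close breaks out of the opening range
    breakout = rest[(rest["close"] > range_high) | (rest["close"] < range_low)]
    if breakout.empty:
        return None  # no trade today
    entry_time = breakout.index[0]
    entry = breakout["close"].iloc[0]
    direction = 1 if entry > range_high else -1

    # Stop at the opposite side of the range; target at target_r times the risk
    stop = range_low if direction == 1 else range_high
    risk = abs(entry - stop)
    target = entry + direction * target_r * risk

    for _, bar in rest.loc[rest.index > entry_time].iterrows():
        hit_stop = bar["low"] <= stop if direction == 1 else bar["high"] >= stop
        hit_target = bar["high"] >= target if direction == 1 else bar["low"] <= target
        if hit_stop:  # checked first: conservative tie-break (assumption)
            return {"direction": direction, "entry": entry, "exit": stop, "r": -1.0}
        if hit_target:
            return {"direction": direction, "entry": entry, "exit": target, "r": target_r}

    # Neither level hit: exit at the last bar's close (market close)
    exit_price = bars["close"].iloc[-1]
    return {"direction": direction, "entry": entry, "exit": exit_price,
            "r": direction * (exit_price - entry) / risk}


# Synthetic session: range 100-101, upside breakout at 09:35, exit at the close
idx = pd.date_range("2024-01-02 09:30", periods=10, freq="min")
bars = pd.DataFrame({
    "high": [101, 101, 101, 101, 101, 102, 103, 104, 105, 106],
    "low": [100, 100, 100, 100, 100, 101, 102, 103, 104, 105],
    "close": [100.5] * 5 + [101.5, 102.5, 103.5, 104.5, 105.5],
}, index=idx)
result = simulate_orb_day(bars)
```

In this example the long entry at 101.5 risks 1.5 per share and closes the day at 105.5, about +2.67R. Neither the stop nor the 10R target is reached.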

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import glob
import sys
from pathlib import Path
from datetime import datetime, time

# Add parent directory to path for imports
sys.path.append('..')

# Import our modules
from src.data.preprocessor import DataPreprocessor
from src.strategies.examples.orb import OpeningRangeBreakout
from src.backtesting.engines.vectorbt_engine import VectorBTEngine
from src.backtesting.costs import TransactionCostEngine, CommissionModel

# Configure display
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_rows', 100)

# Plot settings
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (14, 8)

## 1. Load and Prepare Data

We'll use QQQ data to replicate the paper's approach. If QQQ is not available, we'll use SPY as a proxy.

In [None]:
# Try to load QQQ data first, fall back to SPY
qqq_files = sorted(glob.glob('../data/raw/minute_aggs/by_symbol/QQQ/*.csv.gz'))
spy_files = sorted(glob.glob('../data/raw/minute_aggs/by_symbol/SPY/*.csv.gz'))

if qqq_files:
    data_files = qqq_files
    symbol = 'QQQ'
    print(f"Using QQQ data ({len(data_files)} months available)")
elif spy_files:
    data_files = spy_files
    symbol = 'SPY'
    print(f"QQQ not found, using SPY data ({len(data_files)} months available)")
else:
    print("No data found. Please ensure data has been downloaded.")
    data_files = []

# Load first 3 months for demo
if data_files:
    dfs = []
    for file in data_files[:3]:  # First 3 months
        df = pd.read_csv(file, compression='gzip')
        dfs.append(df)
    
    raw_data = pd.concat(dfs, ignore_index=True)
    print(f"\nLoaded {len(raw_data)} minute bars")
    print(f"Columns: {raw_data.columns.tolist()}")

In [None]:
# Preprocess the data
if data_files:
    preprocessor = DataPreprocessor()
    
    # Process data
    clean_data = preprocessor.process_polygon_data(
        raw_data,
        symbol=symbol,
        cache_key=f'{symbol.lower()}_3months_orb'
    )
    
    print(f"Processed data shape: {clean_data.shape}")
    print(f"Date range: {clean_data.index[0]} to {clean_data.index[-1]}")
    
    # Show sample intraday data
    sample_day = clean_data.index.date[0]
    sample_data = clean_data[clean_data.index.date == sample_day]
    print(f"\nSample day ({sample_day}): {len(sample_data)} bars")
    print(f"Market hours: {sample_data.index[0].time()} to {sample_data.index[-1].time()}")

## 2. ORB Strategy Configuration

Following the paper's parameters:
- 5-minute opening range
- Stop loss at opposite side of range
- Profit target at 10R
- 1% risk per trade
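
With 1% risk per trade, the share count follows from the distance between entry and stop: shares = floor(equity × risk_per_trade / |entry − stop|). The sketch below assumes that sizing rule. It also adds an optional leverage cap so that a very narrow range cannot produce an outsized position. The cap, `max_leverage`, is a hypothetical parameter and not part of `OpeningRangeBreakout`.

```python
import math


def position_size(equity: float, entry: float, stop: float,
                  risk_per_trade: float = 0.01, max_leverage: float = 4.0) -> int:
    """Shares such that hitting the stop loses about risk_per_trade * equity.

    max_leverage caps notional exposure at max_leverage * equity (illustrative assumption).
    """
    risk_per_share = abs(entry - stop)
    if risk_per_share == 0:
        return 0  # degenerate range: no trade
    risk_shares = math.floor(equity * risk_per_trade / risk_per_share)
    leverage_shares = math.floor(equity * max_leverage / entry)
    return min(risk_shares, leverage_shares)


# $25,000 account, entry $400.00, stop $398.00 -> $250 risk / $2.00 per share = 125 shares
shares = position_size(25_000, 400.0, 398.0)
```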

In [None]:
# Create ORB strategy with paper's parameters
orb_strategy = OpeningRangeBreakout(parameters={
    'range_minutes': 5,           # 5-minute opening range
    'range_type': 'high_low',     # Use high/low of range
    'stop_type': 'range',         # Stop at opposite side of range
    'profit_target_r': 10.0,      # 10R profit target
    'exit_at_close': True,        # Exit at market close
    'risk_per_trade': 0.01,       # 1% risk per trade
    'position_sizing': 'fixed',   # Fixed position sizing
    'use_volume_filter': False,   # No volume filter in paper
    'trade_both_directions': True # Trade both long and short
})

# Show strategy metadata
metadata = orb_strategy.get_metadata()
print(f"Strategy: {metadata.name}")
print(f"Version: {metadata.version}")
print(f"\nParameters:")
for key, value in orb_strategy.parameters.items():
    print(f"  {key}: {value}")

## 3. Generate Trading Signals

In [None]:
if data_files:
    # Generate signals
    signals = orb_strategy.generate_signals(clean_data)
    
    # Analyze signals by day
    signal_dates = signals[signals != 0].index.date
    unique_signal_days = np.unique(signal_dates)
    
    print(f"Total trading days: {len(np.unique(clean_data.index.date))}")
    print(f"Days with signals: {len(unique_signal_days)}")
    print(f"\nSignal Statistics:")
    print(f"  Total signals: {(signals != 0).sum()}")
    print(f"  Long signals: {(signals > 0).sum()}")
    print(f"  Short signals: {(signals < 0).sum()}")
    print(f"  Long/short ratio: {(signals > 0).sum() / max(1, (signals < 0).sum()):.2f}")
    
    # Show first few signals
    print("\nFirst 5 signals:")
    first_signals = signals[signals != 0].head(5)
    for idx, signal in first_signals.items():
        direction = 'LONG' if signal > 0 else 'SHORT'
        print(f"  {idx}: {direction}")
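
To see how many signals fire on a typical day, the nonzero entries of `signals` can be counted per calendar date. This is a minimal sketch. It assumes `signals` is a pandas Series indexed by bar timestamp with values in {-1, 0, 1}, as the cell above implies. `signals_per_day` is a hypothetical helper.

```python
import pandas as pd


def signals_per_day(signals: pd.Series) -> pd.Series:
    """Count nonzero signals on each date present in the index (0 if none)."""
    nonzero = signals != 0
    # Group the boolean mask by calendar date and count the True values
    return nonzero.groupby(signals.index.date).sum()


idx = pd.date_range("2024-01-02 09:30", periods=4, freq="min").append(
    pd.date_range("2024-01-03 09:30", periods=4, freq="min"))
demo = pd.Series([0, 1, 0, -1, 0, 0, 0, 0], index=idx)
counts = signals_per_day(demo)
```

On the notebook's data, `signals_per_day(signals).describe()` would summarize the distribution.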

## 4. Visualize Opening Range Pattern

In [None]:
if data_files:
    # Find a day with a signal to visualize
    signal_days = signals[signals != 0].index.date
    if len(signal_days) > 0:
        example_day = signal_days[0]
        day_data = clean_data[clean_data.index.date == example_day]
        day_signals = signals[signals.index.date == example_day]
        
        # Calculate opening range (first `range_minutes` one-minute bars of the session)
        range_minutes = orb_strategy.parameters['range_minutes']
        opening_range = day_data.iloc[:range_minutes]
        range_high = opening_range['high'].max()
        range_low = opening_range['low'].min()
        
        # Create visualization
        fig, ax = plt.subplots(figsize=(14, 8))
        
        # Plot price
        ax.plot(day_data.index, day_data['close'], 'b-', linewidth=1, label='Close Price')
        
        # Plot opening range
        ax.axhline(range_high, color='green', linestyle='--', alpha=0.7, label=f'Range High: ${range_high:.2f}')
        ax.axhline(range_low, color='red', linestyle='--', alpha=0.7, label=f'Range Low: ${range_low:.2f}')
        
        # Shade the opening range period, ending at the open of the first bar after the range
        range_end = day_data.index[min(range_minutes, len(day_data) - 1)]
        ax.axvspan(day_data.index[0], range_end, alpha=0.2, color='gray', label=f'Opening Range ({range_minutes} min)')
        
        # Mark signals
        for idx, signal in day_signals[day_signals != 0].items():
            if signal > 0:
                ax.scatter(idx, day_data.loc[idx, 'close'], color='green', marker='^', s=200, zorder=5)
                ax.annotate('LONG', (idx, day_data.loc[idx, 'close']), xytext=(0, 20), 
                           textcoords='offset points', ha='center', color='green', fontweight='bold')
            else:
                ax.scatter(idx, day_data.loc[idx, 'close'], color='red', marker='v', s=200, zorder=5)
                ax.annotate('SHORT', (idx, day_data.loc[idx, 'close']), xytext=(0, -30), 
                           textcoords='offset points', ha='center', color='red', fontweight='bold')
        
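        # Format x-axis in the data's own timezone (matplotlib otherwise formats tz-aware times in UTC)
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M', tz=day_data.index.tz))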
        ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
        
        ax.set_title(f'ORB Strategy Example - {example_day}')
        ax.set_xlabel('Time')
        ax.set_ylabel('Price ($)')
        ax.legend(loc='best')
        ax.grid(True, alpha=0.3)
        
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
        
        # Print range statistics
        range_size = range_high - range_low
        range_pct = (range_size / range_low) * 100
        print(f"\nOpening Range Statistics for {example_day}:")
        print(f"  Range Size: ${range_size:.2f} ({range_pct:.2f}%)")
        print(f"  Range High: ${range_high:.2f}")
        print(f"  Range Low: ${range_low:.2f}")

## 5. Backtest the Strategy

In [None]:
if data_files:
    # Configure transaction costs (as per paper: $0.0005/share)
    commission_model = CommissionModel(model_type='per_share', rate=0.0005)
    cost_engine = TransactionCostEngine(commission_model=commission_model)
    
    # Create backtesting engine
    engine = VectorBTEngine(transaction_costs=cost_engine)
    
    # Run backtest with paper's initial capital
    initial_capital = 25000  # $25,000 as per paper
    
    backtest_result = engine.run_backtest(
        strategy=orb_strategy,
        data=clean_data,
        initial_capital=initial_capital,
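        commission=0.0005,  # Per share; check that VectorBTEngine does not also apply cost_engine (double counting)
        slippage=0.0001     # Minimal slippage (units depend on VectorBTEngine)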
    )
    
    portfolio = backtest_result['portfolio']
    metrics = backtest_result['metrics']
    
    print("Backtest completed successfully!")

## 6. Performance Analysis

Compare our results with the paper's findings.

In [None]:
if data_files:
    print("=== ORB Strategy Performance Summary ===")
    print(f"\nPeriod: {clean_data.index[0].date()} to {clean_data.index[-1].date()}")
    print(f"Trading Days: {len(np.unique(clean_data.index.date))}")
    print(f"Initial Capital: ${initial_capital:,}")
    print(f"Final Value: ${portfolio.value().iloc[-1]:,.2f}")
    
    print(f"\nReturns:")
    print(f"  Total Return: {metrics['total_return']:.2%}")
    print(f"  Annual Return: {metrics['annual_return']:.2%}")
    print(f"  Daily Avg Return: {metrics.get('daily_return', 0):.3%}")
    
    print(f"\nRisk Metrics:")
    print(f"  Sharpe Ratio: {metrics['sharpe_ratio']:.2f}")
    print(f"  Sortino Ratio: {metrics['sortino_ratio']:.2f}")
    print(f"  Max Drawdown: {metrics['max_drawdown']:.2%}")
    print(f"  Volatility (Annual): {metrics.get('annual_volatility', 0):.2%}")
    
    print(f"\nTrading Statistics:")
    print(f"  Total Trades: {metrics['total_trades']}")
    print(f"  Win Rate: {metrics['win_rate']:.2%}")
    print(f"  Profit Factor: {metrics['profit_factor']:.2f}")
    print(f"  Avg Win: ${metrics['avg_win']:.2f}")
    print(f"  Avg Loss: ${metrics['avg_loss']:.2f}")
    if metrics['avg_loss'] != 0:
        print(f"  Avg Win/Loss Ratio: {abs(metrics['avg_win'] / metrics['avg_loss']):.2f}")
    
    # Compare with paper's findings
    print(f"\n=== Paper Comparison ===")
    print(f"Paper's Win Rate: 24%")
    print(f"Our Win Rate: {metrics['win_rate']:.2%}")
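    print(f"\nPaper's Avg PnL per trade: 0.13R")
    if metrics['total_trades'] > 0:
        # Approximation: treats 1R as a fixed dollar amount (risk_per_trade * initial capital)
        risk_dollars = initial_capital * orb_strategy.parameters['risk_per_trade']
        total_pnl = metrics['total_return'] * initial_capital
        avg_r = total_pnl / (risk_dollars * metrics['total_trades'])
        print(f"Our Avg PnL per trade: {avg_r:.2f}R")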

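The average-R comparison divides total dollar PnL by the dollar value of 1R times the number of trades. This sketch assumes 1R stays fixed at `risk_per_trade × initial_capital`. That is an approximation, since the strategy may size positions off current equity. `average_r` is a hypothetical helper.

```python
def average_r(total_return: float, initial_capital: float,
              risk_per_trade: float, n_trades: int) -> float:
    """Average PnL per trade in R, with 1R = risk_per_trade * initial_capital."""
    if n_trades == 0:
        return 0.0
    total_pnl = total_return * initial_capital
    one_r = risk_per_trade * initial_capital
    return total_pnl / (one_r * n_trades)


# Hypothetical: +5% on $25,000 over 40 trades at 1% risk: $1,250 / ($250 * 40) = 0.125R
avg = average_r(0.05, 25_000, 0.01, 40)
```
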
## 7. Equity Curve and Drawdown Analysis

In [None]:
if data_files:
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10), sharex=True)
    
    # Equity curve
    portfolio_value = portfolio.value()
    portfolio_value.plot(ax=ax1, label='ORB Strategy', color='blue', linewidth=2)
    
    # Add benchmark (buy and hold)
    benchmark_value = initial_capital * (clean_data['close'] / clean_data['close'].iloc[0])
    benchmark_value.plot(ax=ax1, label=f'Buy & Hold {symbol}', color='gray', linewidth=1, alpha=0.7)
    
    ax1.set_ylabel('Portfolio Value ($)')
    ax1.set_title('ORB Strategy Performance vs Buy & Hold')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.set_yscale('log')  # Log scale to better show percentage changes
    
    # Drawdown
    drawdown = portfolio.drawdown() * 100
    drawdown.plot(ax=ax2, label='Strategy Drawdown', color='red', linewidth=1)
    ax2.fill_between(drawdown.index, 0, drawdown, color='red', alpha=0.3)
    
    ax2.set_ylabel('Drawdown (%)')
    ax2.set_xlabel('Date')
    ax2.set_title('Drawdown Analysis')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print comparison stats
    benchmark_return = (benchmark_value.iloc[-1] / initial_capital - 1) * 100
    strategy_return = metrics['total_return'] * 100
    outperformance = strategy_return - benchmark_return
    
    print(f"\nPerformance Comparison:")
    print(f"  ORB Strategy Return: {strategy_return:.2f}%")
    print(f"  Buy & Hold Return: {benchmark_return:.2f}%")
    print(f"  Outperformance: {outperformance:+.2f} percentage points")

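`portfolio.drawdown()` comes from the backtest engine. For reference, drawdown is how far the equity curve sits below its running peak, in percent. The sketch below applies that definition to a plain pandas Series. It should agree with the engine's output if the engine uses the same convention, which is an assumption. `drawdown_pct` is a hypothetical helper.

```python
import pandas as pd


def drawdown_pct(equity: pd.Series) -> pd.Series:
    """Drawdown in percent: equity relative to its running maximum, minus 1."""
    running_peak = equity.cummax()
    return (equity / running_peak - 1) * 100


equity = pd.Series([100.0, 110.0, 99.0, 121.0])
dd = drawdown_pct(equity)
max_dd = dd.min()  # deepest drawdown, about -10% (99 vs. the 110 peak)
```
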
## 8. Daily PnL Analysis (in R-multiples)

In [None]:
if data_files:
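    # Aggregate the intraday equity curve to one value per trading day;
    # portfolio.returns() on minute bars gives per-bar, not daily, returns
    daily_value = portfolio.value().resample('D').last().dropna()
    daily_pnl = daily_value.diff()
    daily_pnl.iloc[0] = daily_value.iloc[0] - initial_capital
    
    # Convert to R-multiples (1R = risk_per_trade * initial capital)
    daily_pnl_r = daily_pnl / (initial_capital * orb_strategy.parameters['risk_per_trade'])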
    
    # Create histogram similar to paper's Figure 3
    fig, ax = plt.subplots(figsize=(12, 6))
    
    # Filter out zero days
    non_zero_pnl = daily_pnl_r[daily_pnl_r != 0]
    
    # Create bins
    bins = np.arange(-1.5, 11, 0.5)
    
    # Plot histogram
    n, bins, patches = ax.hist(non_zero_pnl, bins=bins, alpha=0.7, edgecolor='black')
    
    # Color negative bars red, positive bars green
    for i, patch in enumerate(patches):
        if bins[i] < 0:
            patch.set_facecolor('red')
        else:
            patch.set_facecolor('green')
    
    ax.axvline(0, color='black', linestyle='--', linewidth=2)
    ax.set_xlabel('Daily PnL (in R)')
    ax.set_ylabel('Frequency')
    ax.set_title('Daily PnL Distribution (R-multiples)')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Statistics
    print(f"Daily PnL Statistics (R-multiples):")
    print(f"  Days with trades: {len(non_zero_pnl)}")
    print(f"  Average: {non_zero_pnl.mean():.3f}R")
    print(f"  Median: {non_zero_pnl.median():.3f}R")
    print(f"  Std Dev: {non_zero_pnl.std():.3f}R")
    print(f"  Max Win: {non_zero_pnl.max():.2f}R")
    print(f"  Max Loss: {non_zero_pnl.min():.2f}R")

## 9. Trade Analysis

In [None]:
if data_files and 'trades' in backtest_result:
    trades_df = backtest_result['trades']
    
    if len(trades_df) > 0:
        # Analyze trade distribution
        # (assumes `size` is signed: positive for long trades, negative for short trades)
        long_trades = trades_df[trades_df['size'] > 0]
        short_trades = trades_df[trades_df['size'] < 0]
        
        print(f"Trade Direction Analysis:")
        print(f"  Long trades: {len(long_trades)} ({len(long_trades)/len(trades_df)*100:.1f}%)")
        print(f"  Short trades: {len(short_trades)} ({len(short_trades)/len(trades_df)*100:.1f}%)")
        
        print(f"\nLong Trade Performance:")
        if len(long_trades) > 0:
            long_wins = long_trades[long_trades['pnl'] > 0]
            print(f"  Win rate: {len(long_wins)/len(long_trades)*100:.1f}%")
            print(f"  Avg PnL: ${long_trades['pnl'].mean():.2f}")
            print(f"  Total PnL: ${long_trades['pnl'].sum():.2f}")
        
        print(f"\nShort Trade Performance:")
        if len(short_trades) > 0:
            short_wins = short_trades[short_trades['pnl'] > 0]
            print(f"  Win rate: {len(short_wins)/len(short_trades)*100:.1f}%")
            print(f"  Avg PnL: ${short_trades['pnl'].mean():.2f}")
            print(f"  Total PnL: ${short_trades['pnl'].sum():.2f}")
        
        # Time of day analysis
        trades_df['hour'] = pd.to_datetime(trades_df['entry_time']).dt.hour
        hourly_trades = trades_df.groupby('hour').agg({
            'pnl': ['count', 'sum', 'mean'],
            'return': 'mean'
        })
        
        print(f"\nTrades by Hour of Day:")
        print(hourly_trades)
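
The per-direction statistics above can also be produced with a single `groupby`. This sketch makes the same assumption as the cell above: `size` is signed, positive for longs and negative for shorts. It also assumes `pnl` is in dollars. `direction_summary` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def direction_summary(trades: pd.DataFrame) -> pd.DataFrame:
    """Trade count, win rate, average and total PnL by direction."""
    direction = np.where(trades["size"] > 0, "long", "short")
    pnl_by_dir = trades["pnl"].groupby(direction)
    return pd.DataFrame({
        "trades": pnl_by_dir.size(),
        "win_rate": (trades["pnl"] > 0).groupby(direction).mean(),
        "avg_pnl": pnl_by_dir.mean(),
        "total_pnl": pnl_by_dir.sum(),
    })


demo = pd.DataFrame({"size": [100, -50, 80, -40], "pnl": [500.0, -25.0, -20.0, 30.0]})
summary = direction_summary(demo)
```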

## 10. Parameter Sensitivity Analysis

Test different opening range periods and profit targets.

In [None]:
if data_files:
    # Define parameter grid
    param_grid = {
        'range_minutes': [5, 10, 15],
        'profit_target_r': [5.0, 10.0, 15.0],
        'stop_type': ['range']
    }
    
    print("Running parameter optimization...")
    
    # Run optimization
    optimization_result = engine.optimize_parameters(
        strategy_class=OpeningRangeBreakout,
        data=clean_data,
        param_grid=param_grid,
        metric='sharpe_ratio',
        initial_capital=initial_capital
    )
    
    # Display results
    print("\nTop 5 Parameter Combinations:")
    print("="*80)
    # Use a separate name so the backtest's `metrics` from Section 6 is not overwritten
    for i, (params, run_metrics) in enumerate(optimization_result['results'][:5]):
        print(f"\n{i+1}. Range: {params['range_minutes']}min, Target: {params['profit_target_r']}R")
        print(f"   Sharpe: {run_metrics['sharpe_ratio']:.2f}, Return: {run_metrics['total_return']:.2%}, " 
              f"Trades: {run_metrics['total_trades']}, Win Rate: {run_metrics['win_rate']:.1%}")

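`optimize_parameters` handles the search internally. Conceptually, a grid search evaluates every combination in the Cartesian product of the parameter lists. The sketch below shows that with `itertools.product`. How the engine actually enumerates and ranks combinations is not shown here and is an assumption. `expand_grid` is a hypothetical helper.

```python
from itertools import product


def expand_grid(param_grid: dict) -> list[dict]:
    """All parameter combinations in a grid, as a list of dicts."""
    keys = list(param_grid)
    return [dict(zip(keys, values)) for values in product(*param_grid.values())]


grid = {
    "range_minutes": [5, 10, 15],
    "profit_target_r": [5.0, 10.0, 15.0],
    "stop_type": ["range"],
}
combos = expand_grid(grid)  # 3 * 3 * 1 = 9 combinations
```

With nine combinations ranked on only three months of data, the best Sharpe ratio is an in-sample result. The walk-forward item in the next steps addresses this.
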
## 11. Conclusion

### Summary
This notebook demonstrated the Opening Range Breakout (ORB) strategy as described in the research paper. Key findings:

1. **Signal Generation**: The strategy trades only when price clearly breaks out of the opening range, so it produces few signals per day. The exact counts for this dataset are in the signal statistics in Section 3.

2. **Win Rate**: The paper reported a 24% win rate. A low win rate is expected for a strategy with a distant (10R) profit target: most trades take small losses, and profitability depends on occasional large winners.

3. **Risk Management**: The 10R profit target creates an asymmetric risk/reward profile essential for profitability with a low win rate.
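
Point 3 can be made concrete. Suppose every winner earned the full target of T R and every loser cost 1R. The expected PnL per trade would then be p·T − (1 − p), which is zero at p = 1/(1 + T). With T = 10, that breakeven win rate is about 9.1%. In practice, trades closed at the market close earn roughly between −1R and the target, so the real breakeven depends on the average winner, not just the target. The helpers below are hypothetical.

```python
def expectancy_r(win_rate: float, avg_win_r: float, avg_loss_r: float = 1.0) -> float:
    """Expected PnL per trade in R: p * avg_win - (1 - p) * avg_loss."""
    return win_rate * avg_win_r - (1 - win_rate) * avg_loss_r


def breakeven_win_rate(avg_win_r: float, avg_loss_r: float = 1.0) -> float:
    """Win rate at which expectancy is zero."""
    return avg_loss_r / (avg_win_r + avg_loss_r)


be = breakeven_win_rate(10.0)  # 1 / 11, about 0.0909
```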

### Next Steps
- Use QQQ data (if the SPY fallback was used) and the full history rather than the first 3 months, to better replicate the paper
- Implement ATR-based stops as suggested in the paper's optimization
- Test with leveraged ETFs (TQQQ) to explore the leverage effect
- Add walk-forward analysis to validate out-of-sample performance
- Compare different market regimes (bull vs bear markets)
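
A walk-forward analysis re-fits parameters on a rolling in-sample window and evaluates them on the following out-of-sample window. Below is a minimal sketch of generating such windows over trading days. The window lengths are arbitrary assumptions, and `walk_forward_windows` is a hypothetical helper. Each window's days could then select bars with, for example, `clean_data[np.isin(clean_data.index.date, test)]`.

```python
from datetime import date, timedelta


def walk_forward_windows(trading_days, train_days: int = 60, test_days: int = 20):
    """Yield (train, test) lists of days; test windows are consecutive and non-overlapping."""
    days = sorted(set(trading_days))
    start = 0
    while start + train_days + test_days <= len(days):
        train = days[start:start + train_days]
        test = days[start + train_days:start + train_days + test_days]
        yield train, test
        start += test_days  # roll forward by one test window


days = [date(2024, 1, 1) + timedelta(days=i) for i in range(100)]
windows = list(walk_forward_windows(days, train_days=60, test_days=20))
```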