![QuantConnect Logo](https://cdn.quantconnect.com/web/i/icon.png)
<hr>

# Markov Chain Analysis for LEAP Option Price Prediction

This notebook explores whether Markov Chains can predict the day-to-day price movements of LEAP options (LEAPS, Long-Term Equity Anticipation Securities), using **real option data from QuantConnect**.

## LEAP Option Definition
A LEAP option is defined as:
- An **at-the-money (ATM)** option (within 5% of underlying price)
- With **more than one year (>365 days)** until expiration
- The option **closest to 365 days** to expiration in the option chain
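
The three criteria above can be sketched in plain Python. This is a minimal illustration only: the `Contract` class, the sample chain, and the tie-break on strike distance are assumptions made for the example and are not part of the QuantConnect API or of the notebook's own selection code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Contract:
    """Hypothetical stand-in for an option contract (not a QuantConnect type)."""
    strike: float
    expiry: date


def pick_leap(contracts, underlying_price, today, target_dte=365, atm_tolerance=0.05):
    """Return the ATM contract with more than target_dte days left that is closest to target_dte."""
    candidates = [
        c for c in contracts
        if (c.expiry - today).days > target_dte
        and abs(c.strike - underlying_price) / underlying_price <= atm_tolerance
    ]
    if not candidates:
        return None
    # Closest expiry first; ties are broken by distance to the money (an assumption of this sketch)
    return min(
        candidates,
        key=lambda c: ((c.expiry - today).days - target_dte, abs(c.strike - underlying_price)),
    )


today = date(2023, 1, 3)
chain = [
    Contract(400.0, date(2023, 12, 15)),  # less than one year out: rejected
    Contract(400.0, date(2024, 1, 19)),   # just over one year, at the money
    Contract(390.0, date(2024, 1, 19)),   # same expiry, 2.5% from the money
    Contract(450.0, date(2024, 1, 19)),   # 12.5% from the money: not ATM
    Contract(400.0, date(2024, 6, 21)),   # valid, but further from 365 days
]
selected = pick_leap(chain, underlying_price=400.0, today=today)
print(selected)
```

With this sample chain, the January 2024 contract struck at 400 is the one selected.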

## Data Sources
- **Option Prices**: QuantConnect historical option chain data
- **Greeks**: Delta, Gamma, Vega, Theta, Rho provided by QuantConnect API (not calculated)
- **Implied Volatility**: From QuantConnect option chain
- **Contract Filtering**: Limited to LEAP options matching our criteria to minimize data overhead

## Methodology
1. **Data Collection**: Pull historical LEAP option data from QuantConnect for selected symbols
2. **LEAP Selection**: Filter option chains to identify LEAP options matching our criteria
3. **State Definition**: Define discrete states based on option price movements (Up, Down, Flat)
4. **Transition Matrix**: Build a transition probability matrix from historical data
5. **Prediction & Alpha Analysis**: Evaluate if Markov Chain predictions have any predictive power (alpha)
6. **Greeks Analysis**: Examine how Greeks correlate with state transitions
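
Steps 3 and 4 can be illustrated on a toy return series. The sketch below uses only the standard library and made-up returns; the helper names `to_state` and `transition_matrix` are illustrative and differ from the pandas-based class defined later in the notebook.

```python
from collections import Counter

STATES = ["Down", "Flat", "Up"]


def to_state(ret, threshold=0.01):
    """Map a daily return to a discrete state using a symmetric threshold."""
    if ret > threshold:
        return "Up"
    if ret < -threshold:
        return "Down"
    return "Flat"


def transition_matrix(states):
    """Estimate P(next state | current state) from consecutive pairs of states."""
    counts = Counter(zip(states[:-1], states[1:]))
    matrix = {}
    for cur in STATES:
        row_total = sum(counts[(cur, nxt)] for nxt in STATES)
        matrix[cur] = {
            nxt: (counts[(cur, nxt)] / row_total if row_total else 0.0)
            for nxt in STATES
        }
    return matrix


# Made-up daily returns, for illustration only
returns = [0.02, 0.03, -0.02, 0.005, 0.015, 0.02, -0.03]
states = [to_state(r) for r in returns]
matrix = transition_matrix(states)

for cur in STATES:
    print(cur, matrix[cur])
```

Each row of the matrix is the empirical distribution of the next state given the current one, so every row with at least one observation sums to 1.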

---

In [1]:
# import packages
%load_ext autoreload
%autoreload 2

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns
from datetime import datetime, timedelta
from io import StringIO

sns.set_style('darkgrid')
pd.plotting.register_matplotlib_converters()

qb = QuantBook()

## Part 1: Markov Chain Helper Functions
---

In [2]:
class MarkovChainAnalyzer:
    """
    A class to analyze price movements using Markov Chains.
    
    States are defined based on daily returns:
    - 'Up': Daily return > threshold
    - 'Down': Daily return < -threshold  
    - 'Flat': Daily return within [-threshold, threshold]
    """
    
    def __init__(self, threshold=0.01):
        """
        Initialize the Markov Chain Analyzer.
        
        Args:
            threshold: The return threshold to define Up/Down/Flat states (default 1%)
        """
        self.threshold = threshold
        self.states = ['Down', 'Flat', 'Up']
        self.transition_matrix = None
        self.state_counts = None
        
    def returns_to_states(self, returns):
        """
        Convert a series of returns to discrete states.
        
        Args:
            returns: pd.Series of daily returns
            
        Returns:
            pd.Series of state labels
        """
        conditions = [
            returns < -self.threshold,
            returns > self.threshold
        ]
        choices = ['Down', 'Up']
        states = np.select(conditions, choices, default='Flat')
        return pd.Series(states, index=returns.index)
    
    def build_transition_matrix(self, states):
        """
        Build a transition probability matrix from a sequence of states.
        
        Args:
            states: pd.Series of state labels
            
        Returns:
            pd.DataFrame representing the transition probability matrix
        """
        # Initialize counts matrix
        state_list = self.states
        n_states = len(state_list)
        counts = pd.DataFrame(
            np.zeros((n_states, n_states)),
            index=state_list,
            columns=state_list
        )
        
        # Count transitions between each state and the state that follows it
        current_states = states.iloc[:-1].values
        next_states = states.iloc[1:].values
        
        for curr, next_s in zip(current_states, next_states):
            counts.loc[curr, next_s] += 1
        
        # Convert to probabilities
        self.state_counts = counts.copy()
        row_sums = counts.sum(axis=1)
        self.transition_matrix = counts.div(row_sums, axis=0).fillna(0)
        
        return self.transition_matrix
    
    def predict_next_state(self, current_state):
        """
        Predict the most likely next state given the current state.
        
        Args:
            current_state: The current state ('Up', 'Down', or 'Flat')
            
        Returns:
            tuple: (predicted_state, probability)
        """
        if self.transition_matrix is None:
            raise ValueError("Transition matrix not built. Call build_transition_matrix first.")
        
        probs = self.transition_matrix.loc[current_state]
        predicted_state = probs.idxmax()
        probability = probs.max()
        
        return predicted_state, probability
    
    def calculate_stationary_distribution(self):
        """
        Calculate the stationary distribution of the Markov Chain.
        
        Returns:
            pd.Series representing the stationary distribution
        """
        if self.transition_matrix is None:
            raise ValueError("Transition matrix not built. Call build_transition_matrix first.")
        
        # Solve for stationary distribution: pi * P = pi
        P = self.transition_matrix.values.T
        n = P.shape[0]
        
        # Set up the system (P^T - I)pi = 0 with sum(pi) = 1
        A = np.vstack([P - np.eye(n), np.ones(n)])
        b = np.zeros(n + 1)
        b[-1] = 1
        
        # Solve using least squares
        pi, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
        
        return pd.Series(pi, index=self.states)
    
    def backtest_predictions(self, returns, train_ratio=0.7):
        """
        Backtest the Markov Chain predictions.
        
        Args:
            returns: pd.Series of daily returns
            train_ratio: Ratio of data to use for training (default 0.7)
            
        Returns:
            dict: Backtest results including accuracy, predictions DataFrame
        """
        # Convert to states
        states = self.returns_to_states(returns)
        
        # Split into train/test
        split_idx = int(len(states) * train_ratio)
        train_states = states.iloc[:split_idx]
        test_states = states.iloc[split_idx:]
        
        # Build transition matrix on training data
        self.build_transition_matrix(train_states)
        
        # Make predictions on test data
        predictions = []
        actual = []
        
        for i in range(len(test_states) - 1):
            current = test_states.iloc[i]
            predicted, prob = self.predict_next_state(current)
            actual_next = test_states.iloc[i + 1]
            
            predictions.append({
                'date': test_states.index[i + 1],
                'current_state': current,
                'predicted_state': predicted,
                'actual_state': actual_next,
                'probability': prob,
                'correct': predicted == actual_next
            })
        
        results_df = pd.DataFrame(predictions)
        
        # Calculate metrics
        accuracy = results_df['correct'].mean()
        
        # Calculate accuracy by state
        accuracy_by_state = results_df.groupby('current_state')['correct'].mean()
        
        return {
            'accuracy': accuracy,
            'accuracy_by_state': accuracy_by_state,
            'predictions_df': results_df,
            'transition_matrix': self.transition_matrix,
            'n_train': len(train_states),
            'n_test': len(test_states)
        }

## Part 2: LEAP Option Selection Functions
---

In [3]:
def calculate_dte(expiry_date, reference_date):
    """
    Calculate days to expiration.
    
    Args:
        expiry_date: Option expiration date
        reference_date: Reference date for calculation
        
    Returns:
        int: Days to expiration
    """
    return (expiry_date - reference_date).days


def select_leap_option(option_chain, underlying_price, reference_date, target_dte=365):
    """
    Select the LEAP option from an option chain based on our criteria:
    - At-the-money (ATM)
    - More than 365 days to expiration
    - Closest to 365 days to expiration
    
    Args:
        option_chain: QuantConnect OptionChain object from the option chain API
        underlying_price: Current price of the underlying asset
        reference_date: The reference date for calculating days to expiration
        target_dte: Target days to expiration (default 365)
        
    Returns:
        Selected option contract or None if no valid LEAP found
    """
    # Materialize the chain once so a one-shot iterator is not exhausted by the emptiness check
    contracts = list(option_chain) if option_chain is not None else []
    
    if len(contracts) == 0:
        return None
    
    # Filter for options with DTE > 365 (LEAP requirement)
    leap_options = [x for x in contracts if calculate_dte(x.Expiry, reference_date) > target_dte]
    
    if len(leap_options) == 0:
        return None
    
    # Filter for at-the-money options (within 5% of underlying price)
    atm_tolerance = 0.05
    atm_options = [
        x for x in leap_options 
        if abs(x.Strike - underlying_price) / underlying_price <= atm_tolerance
    ]
    
    if len(atm_options) == 0:
        # If no options within 5%, find closest to ATM
        atm_options = sorted(leap_options, key=lambda x: abs(x.Strike - underlying_price))[:5]
    
    # Select the option closest to target_dte days to expiration
    selected = min(
        atm_options,
        key=lambda x: abs(calculate_dte(x.Expiry, reference_date) - target_dte)
    )
    
    return selected


def get_leap_option_history(qb, symbol, start_date, end_date, option_type='call', target_dte=365):
    """
    Get historical data for LEAP options using QuantConnect's option data.
    
    Args:
        qb: QuantBook instance
        symbol: Underlying symbol string (e.g., 'SPY')
        start_date: Start date for analysis
        end_date: End date for analysis
        option_type: 'call' or 'put'
        target_dte: Target days to expiration for LEAP (default 365)
        
    Returns:
        pd.DataFrame with LEAP option price and Greeks history
    """
    # Add underlying equity
    equity = qb.AddEquity(symbol, Resolution.Daily)
    equity_symbol = equity.Symbol
    
    # Add option universe for this equity
    option = qb.AddOption(symbol, Resolution.Daily)
    option.SetFilter(lambda universe: universe.Strikes(-5, +5)
                                              .Expiration(timedelta(365), timedelta(730)))
    
    # Get option chain history
    option_history = qb.OptionHistory(equity_symbol, start_date, end_date, Resolution.Daily)
    
    if option_history.empty:
        print(f"No option history found for {symbol}")
        return None
    
    # Get equity price history for ATM calculation
    equity_history = qb.History(equity_symbol, start_date, end_date, Resolution.Daily)
    
    if equity_history.empty:
        print(f"No equity history found for {symbol}")
        return None
    
    # Process each date to find LEAP options
    leap_data = []
    
    for date in option_history.index.get_level_values('time').unique():
        # Get equity price for this date
        if isinstance(equity_history.index, pd.MultiIndex):
            equity_price_data = equity_history.loc[equity_symbol].loc[date]
        else:
            equity_price_data = equity_history.loc[date]
        
        underlying_price = equity_price_data['close']
        
        # Get option chain for this date
        chain = option_history.loc[date]
        
        # Filter by option type (call or put)
        if option_type.lower() == 'call':
            chain = chain[chain.index.get_level_values('right') == 'Call']
        else:
            chain = chain[chain.index.get_level_values('right') == 'Put']
        
        if chain.empty:
            continue
        
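        # Find the LEAP option for this date, applying the LEAP criteria directly to the DataFrame
        # Calculate DTE for all options; work on copies so new columns are not written to a slice
        chain = chain.copy()
        chain['dte'] = chain.index.get_level_values('expiry').map(lambda exp: calculate_dte(pd.to_datetime(exp), date))
        leap_chain = chain[chain['dte'] > target_dte].copy()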
        
        if leap_chain.empty:
            continue
        
        # Filter for ATM (within 5% of underlying)
        leap_chain['strike'] = leap_chain.index.get_level_values('strike')
        leap_chain['atm_distance'] = abs(leap_chain['strike'] - underlying_price) / underlying_price
        atm_leap = leap_chain[leap_chain['atm_distance'] <= 0.05].copy()
        
        if atm_leap.empty:
            # Take closest to ATM
            atm_leap = leap_chain.nsmallest(1, 'atm_distance')
        
        # Select closest to target_dte
        atm_leap['dte_distance'] = abs(atm_leap['dte'] - target_dte)
        selected = atm_leap.nsmallest(1, 'dte_distance')
        
        if not selected.empty:
            row = selected.iloc[0]
            leap_data.append({
                'date': date,
                'price': row.get('close', np.nan),
                'underlying_price': underlying_price,
                'strike': row['strike'],
                'dte': row['dte'],
                'delta': row.get('delta', np.nan),
                'gamma': row.get('gamma', np.nan),
                'vega': row.get('vega', np.nan),
                'theta': row.get('theta', np.nan),
                'rho': row.get('rho', np.nan),
                'implied_volatility': row.get('impliedvolatility', np.nan),
                'open_interest': row.get('openinterest', 0),
                'volume': row.get('volume', 0)
            })
    
    if len(leap_data) == 0:
        print(f"No LEAP options found for {symbol}")
        return None
    
    df = pd.DataFrame(leap_data)
    df.set_index('date', inplace=True)
    
    return df


## Part 3: Data Setup and Analysis
---

In [4]:
# Define analysis parameters
symbols = ['SPY', 'QQQ', 'IWM', 'AAPL', 'MSFT']  # Major ETFs and stocks with liquid LEAP options
start_date = datetime(2018, 1, 1)
end_date = datetime(2023, 12, 31)

# Markov Chain parameters
threshold = 0.01  # 1% threshold for Up/Down classification
train_ratio = 0.7  # 70% for training, 30% for testing

In [5]:
# Collect LEAP option price data for each symbol
# This uses QuantConnect's option data API with Greeks provided by the platform

price_data = {}
option_data_full = {}  # Store full data including Greeks

for symbol in symbols:
    try:
        print(f"Loading LEAP option data for {symbol}...")
        
        # Get LEAP option history with Greeks from QuantConnect API
        leap_df = get_leap_option_history(qb, symbol, start_date, end_date, 
                                          option_type='call', target_dte=365)
        
        if leap_df is not None and not leap_df.empty:
            # Store prices for Markov Chain analysis
            price_data[symbol] = leap_df['price']
            
            # Store full data including Greeks
            option_data_full[symbol] = leap_df
            
            print(f"  ✓ Loaded {len(leap_df)} days of LEAP option data for {symbol}")
            print(f"    - Avg DTE: {leap_df['dte'].mean():.0f} days")
            print(f"    - Avg Delta: {leap_df['delta'].mean():.3f}")
            print(f"    - Avg IV: {leap_df['implied_volatility'].mean():.3f}")
        else:
            print(f"  ✗ No LEAP option data found for {symbol}")
            
    except Exception as e:
        print(f"  ✗ Error loading {symbol}: {str(e)}")

print(f"\nSuccessfully loaded {len(price_data)} symbols with LEAP option data")


In [10]:
# Display Greeks and Option Characteristics
print("\n" + "="*80)
print("LEAP OPTION CHARACTERISTICS AND GREEKS (from QuantConnect API)")
print("="*80)

for symbol, opt_data in option_data_full.items():
    print(f"\n{symbol}:")
    print(f"  Sample Size: {len(opt_data)} trading days")
    print(f"  \nPrice Statistics:")
    print(f"    Mean Option Price: ${opt_data['price'].mean():.2f}")
    print(f"    Std Dev: ${opt_data['price'].std():.2f}")
    print(f"    Min/Max: ${opt_data['price'].min():.2f} / ${opt_data['price'].max():.2f}")
    print(f"  \nGreeks (from QuantConnect):")
    print(f"    Delta   - Mean: {opt_data['delta'].mean():.4f}, Std: {opt_data['delta'].std():.4f}")
    print(f"    Gamma   - Mean: {opt_data['gamma'].mean():.4f}, Std: {opt_data['gamma'].std():.4f}")
    print(f"    Vega    - Mean: {opt_data['vega'].mean():.4f}, Std: {opt_data['vega'].std():.4f}")
    print(f"    Theta   - Mean: {opt_data['theta'].mean():.4f}, Std: {opt_data['theta'].std():.4f}")
    print(f"    Rho     - Mean: {opt_data['rho'].mean():.4f}, Std: {opt_data['rho'].std():.4f}")
    print(f"  \nOther Characteristics:")
    print(f"    Implied Volatility - Mean: {opt_data['implied_volatility'].mean():.4f}")
    print(f"    Days to Expiration - Mean: {opt_data['dte'].mean():.0f}, Range: [{opt_data['dte'].min():.0f}, {opt_data['dte'].max():.0f}]")
    print(f"    Avg Strike: ${opt_data['strike'].mean():.2f}")
    print(f"    Avg Volume: {opt_data['volume'].mean():.0f}")
    print(f"    Avg Open Interest: {opt_data['open_interest'].mean():.0f}")


## Part 4: Markov Chain Analysis
---

In [7]:
# Run Markov Chain analysis for each symbol
results = {}

for symbol, prices in price_data.items():
    print(f"\n{'='*60}")
    print(f"Analyzing {symbol}")
    print(f"{'='*60}")
    
    # Calculate daily returns
    returns = prices.pct_change().dropna()
    
    # Initialize and run Markov Chain analysis
    analyzer = MarkovChainAnalyzer(threshold=threshold)
    backtest_results = analyzer.backtest_predictions(returns, train_ratio=train_ratio)
    
    results[symbol] = backtest_results
    
    # Print results
    print(f"\nTraining samples: {backtest_results['n_train']}")
    print(f"Testing samples: {backtest_results['n_test']}")
    print(f"\nOverall Prediction Accuracy: {backtest_results['accuracy']:.2%}")
    print(f"\nAccuracy by Current State:")
    print(backtest_results['accuracy_by_state'])
    print(f"\nTransition Matrix:")
    print(backtest_results['transition_matrix'].round(3))

In [8]:
# Visualize Transition Matrices
n_symbols = len(results)
fig, axes = plt.subplots(1, n_symbols, figsize=(5*n_symbols, 4))

if n_symbols == 1:
    axes = [axes]

for ax, (symbol, res) in zip(axes, results.items()):
    tm = res['transition_matrix']
    sns.heatmap(tm, annot=True, fmt='.2f', cmap='RdYlGn', ax=ax, vmin=0, vmax=1)
    ax.set_title(f'{symbol} Transition Probabilities')
    ax.set_xlabel('Next State')
    ax.set_ylabel('Current State')

plt.tight_layout()
plt.show()

## Part 5: Alpha Analysis
---
Evaluate whether Markov Chain predictions provide any predictive edge (alpha) over random chance and over a baseline that always predicts the most common state.
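
As a minimal sketch of the comparison, the example below computes accuracy against a uniform random guess and against always predicting the most common state. The predicted and actual state lists are made up for illustration, and the helper name `accuracy_vs_baselines` is hypothetical.

```python
from collections import Counter


def accuracy_vs_baselines(predicted, actual, n_states=3):
    """Compare prediction accuracy with a uniform random guess and a majority-class guess."""
    accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
    random_baseline = 1 / n_states
    # Accuracy obtained by always predicting the most frequent actual state
    majority_baseline = Counter(actual).most_common(1)[0][1] / len(actual)
    return {
        "accuracy": accuracy,
        "alpha_vs_random": accuracy - random_baseline,
        "alpha_vs_majority": accuracy - majority_baseline,
    }


# Made-up states, for illustration only
actual = ["Up", "Up", "Flat", "Down", "Up", "Flat", "Up", "Down"]
predicted = ["Up", "Flat", "Flat", "Up", "Up", "Up", "Up", "Down"]
metrics = accuracy_vs_baselines(predicted, actual)
print(metrics)
```

The majority-class baseline is usually the harder one to beat: when one state dominates, a model can clear 33.3% accuracy simply by predicting that state every day.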

In [9]:
def calculate_alpha_metrics(results, threshold):
    """
    Calculate alpha metrics for the Markov Chain predictions.
    
    Compares prediction accuracy against:
    1. Random baseline (33.33% for 3 states)
    2. Historical frequency baseline
    
    Args:
        results: Dictionary of backtest results by symbol
        threshold: The return threshold used for state classification
        
    Returns:
        pd.DataFrame with alpha metrics
    """
    metrics = []
    
    for symbol, res in results.items():
        predictions_df = res['predictions_df']
        
        # Calculate state frequencies observed in the test period
        state_freq = predictions_df['actual_state'].value_counts(normalize=True)
        
        # Frequency baseline accuracy
        # (predicting the most common test-period state every time)
        baseline_accuracy = state_freq.max()
        most_common_state = state_freq.idxmax()
        
        # Markov Chain accuracy
        mc_accuracy = res['accuracy']
        
        # Random baseline (1/3 for 3 states)
        random_baseline = 1/3
        
        # Alpha calculations
        alpha_vs_random = mc_accuracy - random_baseline
        alpha_vs_baseline = mc_accuracy - baseline_accuracy
        
        # Information Ratio (simplified)
        # Measure consistency of predictions
        accuracy_std = predictions_df.groupby('current_state')['correct'].std().mean()
        ir = alpha_vs_random / accuracy_std if accuracy_std > 0 else 0
        
        metrics.append({
            'Symbol': symbol,
            'MC_Accuracy': mc_accuracy,
            'Random_Baseline': random_baseline,
            'Freq_Baseline': baseline_accuracy,
            'Most_Common_State': most_common_state,
            'Alpha_vs_Random': alpha_vs_random,
            'Alpha_vs_Baseline': alpha_vs_baseline,
            'Information_Ratio': ir
        })
    
    return pd.DataFrame(metrics)

In [10]:
# Calculate alpha metrics
alpha_df = calculate_alpha_metrics(results, threshold)
print("\nAlpha Analysis Results:")
print("=" * 80)
print(alpha_df.to_string(index=False))

In [11]:
# Visualize Alpha Metrics
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Accuracy Comparison
ax1 = axes[0]
x = np.arange(len(alpha_df))
width = 0.25

bars1 = ax1.bar(x - width, alpha_df['MC_Accuracy'], width, label='Markov Chain', color='blue')
bars2 = ax1.bar(x, alpha_df['Freq_Baseline'], width, label='Frequency Baseline', color='orange')
bars3 = ax1.bar(x + width, alpha_df['Random_Baseline'], width, label='Random (33.3%)', color='gray')

ax1.set_ylabel('Accuracy')
ax1.set_title('Prediction Accuracy Comparison')
ax1.set_xticks(x)
ax1.set_xticklabels(alpha_df['Symbol'])
ax1.legend()
ax1.yaxis.set_major_formatter(mtick.PercentFormatter(1.0))
ax1.axhline(y=0.333, color='gray', linestyle='--', alpha=0.5)

# Plot 2: Alpha vs Baselines
ax2 = axes[1]
x = np.arange(len(alpha_df))
width = 0.35

bars1 = ax2.bar(x - width/2, alpha_df['Alpha_vs_Random'], width, label='Alpha vs Random', color='green')
bars2 = ax2.bar(x + width/2, alpha_df['Alpha_vs_Baseline'], width, label='Alpha vs Freq Baseline', color='purple')

ax2.set_ylabel('Alpha (Excess Accuracy)')
ax2.set_title('Predictive Alpha')
ax2.set_xticks(x)
ax2.set_xticklabels(alpha_df['Symbol'])
ax2.legend()
ax2.yaxis.set_major_formatter(mtick.PercentFormatter(1.0))
ax2.axhline(y=0, color='black', linestyle='-', alpha=0.5)

plt.tight_layout()
plt.show()

## Part 6: State Persistence Analysis
---
Analyze if certain states tend to persist (momentum) or mean-revert.
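
The momentum score used in this part is the average self-transition probability of the Up and Down states minus the average probability of flipping directly between them. A small sketch on two hand-written transition matrices (hypothetical numbers, not estimated from data) shows how the sign behaves:

```python
def momentum_score(tm):
    """Average Up/Down persistence minus average Up<->Down reversal probability."""
    persistence = (tm["Up"]["Up"] + tm["Down"]["Down"]) / 2
    reversal = (tm["Up"]["Down"] + tm["Down"]["Up"]) / 2
    return persistence - reversal


# Hypothetical matrices; each row holds P(next state | current state)
trending = {
    "Up": {"Down": 0.2, "Flat": 0.2, "Up": 0.6},
    "Flat": {"Down": 0.3, "Flat": 0.4, "Up": 0.3},
    "Down": {"Down": 0.5, "Flat": 0.3, "Up": 0.2},
}
reverting = {
    "Up": {"Down": 0.5, "Flat": 0.3, "Up": 0.2},
    "Flat": {"Down": 0.3, "Flat": 0.4, "Up": 0.3},
    "Down": {"Down": 0.1, "Flat": 0.3, "Up": 0.6},
}

trending_score = momentum_score(trending)
reverting_score = momentum_score(reverting)
print(round(trending_score, 2), round(reverting_score, 2))
```

The Flat row does not enter the score, so a chain that spends most of its time in Flat can still score strongly in either direction.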

In [12]:
def analyze_state_persistence(results):
    """
    Analyze state persistence characteristics.
    
    Args:
        results: Dictionary of backtest results
        
    Returns:
        pd.DataFrame with persistence metrics
    """
    persistence_metrics = []
    
    for symbol, res in results.items():
        tm = res['transition_matrix']
        
        # Self-transition probabilities (persistence)
        up_persistence = tm.loc['Up', 'Up']
        down_persistence = tm.loc['Down', 'Down']
        flat_persistence = tm.loc['Flat', 'Flat']
        
        # Mean reversion indicators
        up_to_down = tm.loc['Up', 'Down']
        down_to_up = tm.loc['Down', 'Up']
        
        # Momentum vs Mean Reversion score
        # Positive = momentum, Negative = mean reversion
        momentum_score = (up_persistence + down_persistence) / 2 - (up_to_down + down_to_up) / 2
        
        persistence_metrics.append({
            'Symbol': symbol,
            'Up_Persistence': up_persistence,
            'Down_Persistence': down_persistence,
            'Flat_Persistence': flat_persistence,
            'Up_to_Down': up_to_down,
            'Down_to_Up': down_to_up,
            'Momentum_Score': momentum_score
        })
    
    return pd.DataFrame(persistence_metrics)

In [13]:
# Analyze state persistence
persistence_df = analyze_state_persistence(results)
print("\nState Persistence Analysis:")
print("=" * 80)
print("(Positive Momentum Score = trending behavior, Negative = mean reversion)")
print()
print(persistence_df.to_string(index=False))

In [14]:
# Visualize persistence
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(persistence_df))
width = 0.25

bars1 = ax.bar(x - width, persistence_df['Up_Persistence'], width, label='Up Persistence', color='green')
bars2 = ax.bar(x, persistence_df['Flat_Persistence'], width, label='Flat Persistence', color='gray')
bars3 = ax.bar(x + width, persistence_df['Down_Persistence'], width, label='Down Persistence', color='red')

ax.set_ylabel('Probability')
ax.set_title('State Persistence Probabilities by Symbol')
ax.set_xticks(x)
ax.set_xticklabels(persistence_df['Symbol'])
ax.legend()
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1.0))
ax.axhline(y=0.333, color='black', linestyle='--', alpha=0.5, label='Random (33.3%)')

plt.tight_layout()
plt.show()

## Part 7: Trading Strategy Simulation
---
Simulate a simple trading strategy based on Markov Chain predictions.
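
The core of the simulation is a mapping from predicted state to position (+1, 0, or -1) applied to the realized return, then compounded. A minimal sketch with made-up predictions and returns, using hypothetical helper names:

```python
# Position taken for each predicted state: long, flat, or short
POSITION = {"Up": 1, "Flat": 0, "Down": -1}


def strategy_returns(predicted_states, realized_returns):
    """Daily strategy return = position implied by the prediction times the realized return."""
    return [POSITION[s] * r for s, r in zip(predicted_states, realized_returns)]


def cumulative_return(returns):
    """Compound a sequence of simple returns into a total return."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth - 1


# Made-up inputs, for illustration only
predicted = ["Up", "Down", "Flat", "Up"]
realized = [0.10, -0.05, 0.20, -0.10]

strat = strategy_returns(predicted, realized)
total_strategy = cumulative_return(strat)
total_buy_hold = cumulative_return(realized)
print(strat, round(total_strategy, 4), round(total_buy_hold, 4))
```

In this toy case the strategy profits from the correctly predicted down day but misses the large gain on the day it stays flat, so it trails buy and hold.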

In [15]:
def simulate_trading_strategy(predictions_df, returns):
    """
    Simulate a trading strategy based on Markov Chain predictions.
    
    Strategy:
    - Go long if predicted state is 'Up'
    - Go short if predicted state is 'Down'
    - Stay flat if predicted state is 'Flat'
    
    Args:
        predictions_df: DataFrame with predictions
        returns: Original returns series
        
    Returns:
        dict with strategy performance metrics
    """
    # Calculate strategy returns, matching each prediction to its realized return by date
    strategy_returns = []
    
    for _, row in predictions_df.iterrows():
        date = row['date']
        predicted = row['predicted_state']
        
        if date in returns.index:
            actual_return = returns.loc[date]
            
            if predicted == 'Up':
                strat_return = actual_return  # Long
            elif predicted == 'Down':
                strat_return = -actual_return  # Short
            else:
                strat_return = 0  # Flat
            
            strategy_returns.append({
                'date': date,
                'predicted': predicted,
                'actual_return': actual_return,
                'strategy_return': strat_return
            })
    
    strat_df = pd.DataFrame(strategy_returns)
    
    if len(strat_df) == 0:
        return None
    
    strat_df.set_index('date', inplace=True)
    
    # Calculate cumulative returns
    strat_df['cum_strategy'] = (1 + strat_df['strategy_return']).cumprod() - 1
    strat_df['cum_buy_hold'] = (1 + strat_df['actual_return']).cumprod() - 1
    
    # Calculate performance metrics
    total_strategy_return = strat_df['cum_strategy'].iloc[-1]
    total_buy_hold_return = strat_df['cum_buy_hold'].iloc[-1]
    
    # Annualized metrics (assuming 252 trading days)
    n_days = len(strat_df)
    years = n_days / 252
    
    annualized_strategy = (1 + total_strategy_return) ** (1/years) - 1 if years > 0 else 0
    annualized_buy_hold = (1 + total_buy_hold_return) ** (1/years) - 1 if years > 0 else 0
    
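    # Sharpe ratio (assuming risk-free rate = 0); NaN when a return series has zero variance
    strategy_std = strat_df['strategy_return'].std()
    buy_hold_std = strat_df['actual_return'].std()
    sharpe_strategy = strat_df['strategy_return'].mean() / strategy_std * np.sqrt(252) if strategy_std > 0 else np.nan
    sharpe_buy_hold = strat_df['actual_return'].mean() / buy_hold_std * np.sqrt(252) if buy_hold_std > 0 else np.nan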
    
    return {
        'returns_df': strat_df,
        'total_strategy_return': total_strategy_return,
        'total_buy_hold_return': total_buy_hold_return,
        'annualized_strategy': annualized_strategy,
        'annualized_buy_hold': annualized_buy_hold,
        'sharpe_strategy': sharpe_strategy,
        'sharpe_buy_hold': sharpe_buy_hold,
        'alpha': annualized_strategy - annualized_buy_hold
    }

In [16]:
# Run trading strategy simulation for each symbol
strategy_results = {}

for symbol, res in results.items():
    if symbol in price_data:
        returns = price_data[symbol].pct_change().dropna()
        strat_perf = simulate_trading_strategy(res['predictions_df'], returns)
        
        if strat_perf is not None:
            strategy_results[symbol] = strat_perf
            
            print(f"\n{'='*60}")
            print(f"{symbol} Trading Strategy Performance")
            print(f"{'='*60}")
            print(f"Total Strategy Return: {strat_perf['total_strategy_return']:.2%}")
            print(f"Total Buy & Hold Return: {strat_perf['total_buy_hold_return']:.2%}")
            print(f"Annualized Strategy Return: {strat_perf['annualized_strategy']:.2%}")
            print(f"Annualized Buy & Hold Return: {strat_perf['annualized_buy_hold']:.2%}")
            print(f"Strategy Sharpe Ratio: {strat_perf['sharpe_strategy']:.2f}")
            print(f"Buy & Hold Sharpe Ratio: {strat_perf['sharpe_buy_hold']:.2f}")
            print(f"Alpha (Strategy - B&H): {strat_perf['alpha']:.2%}")

In [17]:
# Plot cumulative returns comparison
n_plots = len(strategy_results)
if n_plots > 0:
    # squeeze=False always returns a 2D array of axes, so flatten() works for any n_plots
    fig, axes = plt.subplots(2, (n_plots + 1) // 2, figsize=(14, 10), squeeze=False)
    axes = axes.flatten()
    
    for ax, (symbol, strat_perf) in zip(axes, strategy_results.items()):
        df = strat_perf['returns_df']
        
        ax.plot(df.index, df['cum_strategy'], label='Markov Strategy', color='blue')
        ax.plot(df.index, df['cum_buy_hold'], label='Buy & Hold', color='gray', alpha=0.7)
        
        ax.set_title(f'{symbol} Cumulative Returns')
        ax.set_xlabel('Date')
        ax.set_ylabel('Cumulative Return')
        ax.yaxis.set_major_formatter(mtick.PercentFormatter(1.0))
        ax.legend()
        ax.axhline(y=0, color='black', linestyle='--', alpha=0.3)
    
    # Hide unused subplots
    for ax in axes[n_plots:]:
        ax.set_visible(False)
    
    plt.tight_layout()
    plt.show()

## Part 8: Summary and Conclusions
---

In [18]:
# Create summary table
summary_data = []

for symbol in results.keys():
    if symbol in strategy_results:
        alpha_row = alpha_df[alpha_df['Symbol'] == symbol].iloc[0]
        strat_row = strategy_results[symbol]
        persist_row = persistence_df[persistence_df['Symbol'] == symbol].iloc[0]
        
        summary_data.append({
            'Symbol': symbol,
            'Prediction_Accuracy': alpha_row['MC_Accuracy'],
            'Alpha_vs_Random': alpha_row['Alpha_vs_Random'],
            'Strategy_Return': strat_row['annualized_strategy'],
            'Buy_Hold_Return': strat_row['annualized_buy_hold'],
            'Strategy_Alpha': strat_row['alpha'],
            'Sharpe_Ratio': strat_row['sharpe_strategy'],
            'Momentum_Score': persist_row['Momentum_Score']
        })

summary_df = pd.DataFrame(summary_data)

print("\n" + "="*100)
print("SUMMARY: Markov Chain Analysis for LEAP Option Price Prediction")
print("="*100)
print()
print(summary_df.to_string(index=False))

In [19]:
# Final analysis
print("\n" + "="*100)
print("KEY FINDINGS")
print("="*100)

avg_accuracy = alpha_df['MC_Accuracy'].mean()
avg_alpha_vs_random = alpha_df['Alpha_vs_Random'].mean()
avg_strategy_alpha = summary_df['Strategy_Alpha'].mean() if len(summary_df) > 0 else 0

print(f"\n1. PREDICTION ACCURACY:")
print(f"   - Average Markov Chain Accuracy: {avg_accuracy:.2%}")
print(f"   - Average Alpha vs Random: {avg_alpha_vs_random:.2%}")
print(f"   - Interpretation: {'Markov Chains provide some predictive edge' if avg_alpha_vs_random > 0 else 'Limited predictive value'}")

print(f"\n2. TRADING STRATEGY PERFORMANCE:")
print(f"   - Average Strategy Alpha: {avg_strategy_alpha:.2%}")
print(f"   - Interpretation: {'Strategy outperforms buy & hold' if avg_strategy_alpha > 0 else 'Strategy underperforms buy & hold'}")

print(f"\n3. STATE PERSISTENCE:")
avg_momentum = persistence_df['Momentum_Score'].mean()
print(f"   - Average Momentum Score: {avg_momentum:.3f}")
print(f"   - Interpretation: {'Market shows momentum tendencies' if avg_momentum > 0 else 'Market shows mean-reversion tendencies'}")

print(f"\n4. CONCLUSION:")
print(f"   Markov Chains {'show potential' if avg_alpha_vs_random > 0.05 else 'have limited alpha'} for predicting LEAP option price movements.")
print(f"   The transition probabilities reveal {'persistent' if avg_momentum > 0 else 'mean-reverting'} price behavior.")
print(f"   Further analysis with different state definitions and thresholds is recommended.")

---
## Notes on Implementation

### Data Sources:
1. **Option Data**: Uses QuantConnect's historical option chain data with actual LEAP option prices
2. **Greeks**: Greeks (Delta, Gamma, Vega, Theta, Rho) are provided directly by QuantConnect API, not calculated
3. **Implied Volatility**: IV data comes from QuantConnect's option chain
4. **Contract Selection**: Limited to ATM LEAP options with >365 DTE to minimize data overhead

### LEAP Selection Criteria:
- **At-the-Money (ATM)**: Strike price within 5% of underlying price
- **Days to Expiration (DTE)**: More than 365 days
- **Target Selection**: Closest to 365 DTE in the option chain
- **Option Type**: Call options (can be changed to puts via parameter)

### Limitations of This Analysis:
1. **Transaction Costs**: The trading simulation does not account for transaction costs, bid-ask spreads, or slippage
2. **Liquidity**: LEAP options may have lower liquidity than near-term options, affecting execution
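3. **Rolling Strategy**: The selected contract can change from one day to the next, so daily returns may mix prices from different contracts, and the analysis does not model rolling positions as expiration approaches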
4. **Data Quality**: Historical option data quality depends on QuantConnect's data coverage for the period
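
To get a rough sense of the first limitation, a cost can be charged whenever the position changes. The sketch below is illustrative only: the 0.5% cost per unit of turnover is an assumed placeholder, not a measured LEAP spread, and the positions and returns are made up.

```python
def net_returns(positions, gross_returns, cost_per_unit=0.005):
    """Subtract a proportional cost each time the position changes.

    cost_per_unit is an assumed placeholder, not a measured bid-ask spread.
    """
    net = []
    previous = 0  # start with no position
    for pos, gross in zip(positions, gross_returns):
        turnover = abs(pos - previous)  # e.g. long -> short is 2 units
        net.append(gross - turnover * cost_per_unit)
        previous = pos
    return net


# Made-up positions (+1 long, 0 flat, -1 short) and gross strategy returns
positions = [1, 1, -1, 0]
gross = [0.02, 0.01, 0.03, 0.0]
net = net_returns(positions, gross)
print([round(r, 4) for r in net])
```

A strategy that flips between long and short often pays this cost twice per flip, which matters for LEAP options where spreads tend to be wide.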

### Potential Improvements:
1. Implement higher-order Markov Chains (considering multiple previous states)
2. Add more states (e.g., Strong Up, Weak Up, Flat, Weak Down, Strong Down)
3. Combine with Greek-based features (Delta, Gamma, Vega) for enhanced predictions
4. Incorporate implied volatility changes as a state dimension
5. Test different threshold values for state classification
6. Implement walk-forward optimization for more robust backtesting
7. Add rolling strategy that maintains LEAP positions as they approach expiration
8. Compare call vs put LEAP options performance
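
As a starting point for the first improvement, a second-order chain conditions on the previous two states instead of one. This is a minimal standard-library sketch on a made-up state sequence; `second_order_matrix` is a hypothetical helper and not part of `MarkovChainAnalyzer`.

```python
from collections import Counter, defaultdict


def second_order_matrix(states):
    """Estimate P(next state | previous two states) from consecutive triples."""
    counts = defaultdict(Counter)
    for a, b, c in zip(states, states[1:], states[2:]):
        counts[(a, b)][c] += 1
    return {
        history: {nxt: n / sum(ctr.values()) for nxt, n in ctr.items()}
        for history, ctr in counts.items()
    }


# Made-up state sequence, for illustration only
states = ["Up", "Up", "Down", "Up", "Up", "Down", "Up", "Up", "Flat"]
matrix2 = second_order_matrix(states)

for history, row in matrix2.items():
    print(history, row)
```

With three states there are nine possible two-day histories, so each row is estimated from far fewer observations than in the first-order chain; sparse rows are a practical concern for short option histories.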
