# Volatility Estimators Comparison

This notebook explores and compares the four historical volatility estimators used in the Volts strategy:

1. **Parkinson (1980)** - Uses the high-low range
2. **Garman-Klass (1980)** - Uses open, high, low, and close (OHLC) prices
3. **Rogers-Satchell (1991)** - Accounts for drift
4. **Yang-Zhang (2000)** - Combines overnight, open-to-close, and Rogers-Satchell terms; handles both drift and opening jumps

Use this notebook to understand the differences between estimators and choose the best one for your data.
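
For reference, the textbook forms of the four estimators over a window of $n$ bars are sketched below. The notation ($O_i$, $H_i$, $L_i$, $C_i$ for the open, high, low, and close of bar $i$) is introduced here for illustration. The project's `VolatilityEstimator` may differ in detail, for example in its degrees-of-freedom corrections, so read these as the standard published definitions rather than a description of that module.

```latex
\begin{align}
\sigma_{P}^2  &= \frac{1}{4 n \ln 2} \sum_{i=1}^{n} \left(\ln\frac{H_i}{L_i}\right)^2 \\
\sigma_{GK}^2 &= \frac{1}{n} \sum_{i=1}^{n} \left[ \tfrac{1}{2}\left(\ln\frac{H_i}{L_i}\right)^2
                 - (2\ln 2 - 1)\left(\ln\frac{C_i}{O_i}\right)^2 \right] \\
\sigma_{RS}^2 &= \frac{1}{n} \sum_{i=1}^{n} \left[ \ln\frac{H_i}{C_i}\,\ln\frac{H_i}{O_i}
                 + \ln\frac{L_i}{C_i}\,\ln\frac{L_i}{O_i} \right] \\
\sigma_{YZ}^2 &= \sigma_{o}^2 + k\,\sigma_{c}^2 + (1-k)\,\sigma_{RS}^2,
                 \qquad k = \frac{0.34}{1.34 + \frac{n+1}{n-1}}
\end{align}
```

Here $\sigma_o^2$ is the sample variance of the overnight log returns $\ln(O_i / C_{i-1})$ and $\sigma_c^2$ is the sample variance of the open-to-close log returns $\ln(C_i / O_i)$. Each formula gives a per-bar variance; with daily bars and the annualization factor of 252 used in this notebook, the annualized volatility is $\sqrt{252\,\sigma^2}$.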

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
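import warnings
# Silence only FutureWarning noise from pandas/yfinance; other warnings stay visible
warnings.filterwarnings('ignore', category=FutureWarning)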

from volatility_estimators import VolatilityEstimator

plt.style.use('seaborn-v0_8-darkgrid')
print("Libraries loaded successfully!")

## Download Sample Data

Let's download price data for a few stocks to compare the estimators.

In [None]:
# Download data for multiple stocks
tickers = ['AAPL', 'MSFT', 'NVDA', 'GOOGL']
start_date = '2023-01-01'
end_date = '2024-01-01'

print(f"Downloading data for {len(tickers)} tickers...\n")

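data_dict = {}
for ticker in tickers:
    df = yf.download(ticker, start=start_date, end=end_date, progress=False)
    # Newer yfinance releases may return MultiIndex columns (field, ticker)
    # even for a single ticker; keep only the field level (Open, High, ...).
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.get_level_values(0)
    data_dict[ticker] = df
    print(f"{ticker}: {len(df)} days")

print("\nData downloaded successfully!")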

## Calculate All Volatility Estimators

Let's calculate all four estimators for each stock.

In [None]:
# Initialize volatility estimator
estimator = VolatilityEstimator(annualization_factor=252)
rolling_window = 20

# Calculate all estimators for each ticker
all_volatilities = {}

for ticker, df in data_dict.items():
    print(f"\nCalculating volatility for {ticker}...")
    all_vols = estimator.calculate_all(df, rolling_window=rolling_window)
    all_volatilities[ticker] = all_vols
    
    print("Mean volatility by estimator:")
    for method in all_vols.columns:
        mean_vol = all_vols[method].mean()
        print(f"  {method:20s}: {mean_vol:.2%}")

print("\nVolatility calculation complete!")

## Visualize Estimator Comparison

Compare all four estimators for each stock.

In [None]:
# Plot all estimators for each ticker
fig, axes = plt.subplots(len(tickers), 1, figsize=(14, 4*len(tickers)))

if len(tickers) == 1:
    axes = [axes]

colors = ['blue', 'green', 'red', 'purple']

for idx, ticker in enumerate(tickers):
    vol_df = all_volatilities[ticker]
    
    for col, color in zip(vol_df.columns, colors):
        axes[idx].plot(vol_df.index, vol_df[col], 
                      label=col.replace('_', ' ').title(), 
                      color=color, alpha=0.7, linewidth=1.5)
    
    axes[idx].set_title(f'{ticker} - Volatility Estimator Comparison', 
                       fontsize=12, fontweight='bold')
    axes[idx].set_ylabel('Annualized Volatility')
    axes[idx].legend(loc='best')
    axes[idx].grid(True, alpha=0.3)

axes[-1].set_xlabel('Date')
plt.tight_layout()
plt.show()

## Statistical Comparison

Compare the statistical properties of each estimator.

In [None]:
# Create comparison statistics
comparison_stats = []

for ticker in tickers:
    vol_df = all_volatilities[ticker]
    
    for method in vol_df.columns:
        series = vol_df[method].dropna()
        
        comparison_stats.append({
            'Ticker': ticker,
            'Estimator': method.replace('_', ' ').title(),
            'Mean': series.mean(),
            'Std': series.std(),
            'Min': series.min(),
            'Max': series.max(),
            'Median': series.median()
        })

stats_df = pd.DataFrame(comparison_stats)

print("\nVolatility Estimator Statistics:")
print("="*80)
display(stats_df)

In [None]:
# Visualize mean volatility by estimator
fig, ax = plt.subplots(figsize=(12, 6))

pivot_df = stats_df.pivot(index='Ticker', columns='Estimator', values='Mean')
pivot_df.plot(kind='bar', ax=ax, width=0.8)

ax.set_title('Mean Volatility by Estimator and Stock', fontsize=14, fontweight='bold')
ax.set_ylabel('Mean Annualized Volatility')
ax.set_xlabel('Stock')
ax.legend(title='Estimator', bbox_to_anchor=(1.05, 1), loc='upper left')
ax.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

## Correlation Between Estimators

Examine how closely the different estimators track each other.

In [None]:
# Calculate correlation between estimators for each stock
for ticker in tickers:
    vol_df = all_volatilities[ticker].dropna()
    
    print(f"\nCorrelation Matrix for {ticker}:")
    print("="*80)
    corr_matrix = vol_df.corr()
    display(corr_matrix)
    
    # Visualize correlation
    plt.figure(figsize=(8, 6))
    sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='coolwarm', 
                center=0.95, vmin=0.9, vmax=1.0,
                square=True, linewidths=1)
    plt.title(f'{ticker} - Correlation Between Volatility Estimators', 
             fontsize=12, fontweight='bold')
    plt.tight_layout()
    plt.show()

## Distribution Analysis

Compare the distributions of volatility estimates.

In [None]:
# Plot distribution of volatility estimates
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

estimators = ['parkinson', 'garman_klass', 'rogers_satchell', 'yang_zhang']

for idx, estimator_name in enumerate(estimators):
    for ticker in tickers:
        vol_series = all_volatilities[ticker][estimator_name].dropna()
        axes[idx].hist(vol_series, bins=30, alpha=0.5, label=ticker, density=True)
    
    axes[idx].set_title(f'{estimator_name.replace("_", " ").title()} Distribution', 
                       fontsize=12, fontweight='bold')
    axes[idx].set_xlabel('Volatility')
    axes[idx].set_ylabel('Density')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Key Insights

### Parkinson Estimator
- **Pros**: Simple; needs only the high and low of each bar
- **Cons**: Assumes zero drift and ignores overnight gaps, so it can understate volatility when prices jump between sessions
- **Best for**: Intraday data with continuous trading
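
As a rough illustration of the points above, here is a minimal sketch of a rolling Parkinson estimate using the standard formula. The function name `parkinson_vol` and its parameters are hypothetical and are not taken from the `volatility_estimators` module, which may implement the estimator differently.

```python
import numpy as np
import pandas as pd

def parkinson_vol(high, low, window=20, periods_per_year=252):
    """Rolling annualized Parkinson volatility (hypothetical helper)."""
    log_hl_sq = np.log(high / low) ** 2
    # Parkinson variance: mean squared log range scaled by 1 / (4 ln 2)
    daily_var = log_hl_sq.rolling(window).mean() / (4.0 * np.log(2.0))
    return np.sqrt(daily_var * periods_per_year)

# Toy data: every bar has a high 2% above its low
high = pd.Series([102.0] * 25)
low = pd.Series([100.0] * 25)
vol = parkinson_vol(high, low, window=20)
print(vol.iloc[-1])
```

Only the high and low enter the calculation, which is why a gap between one close and the next open leaves the estimate unchanged.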

### Garman-Klass Estimator
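- **Pros**: More statistically efficient than Parkinson because it also uses the open and close
- **Cons**: Assumes zero drift (so it is biased when prices trend) and, in its common form, ignores overnight gaps
- **Best for**: Assets without strong trends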
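
A minimal sketch of the commonly used Garman-Klass form follows. The helper name `garman_klass_vol` is an assumption made for this example and is not the project's API.

```python
import numpy as np
import pandas as pd

def garman_klass_vol(open_, high, low, close, window=20, periods_per_year=252):
    """Rolling annualized Garman-Klass volatility (hypothetical helper)."""
    log_hl = np.log(high / low)
    log_co = np.log(close / open_)
    # Per-bar variance: half the squared log range minus a correction
    # based on the open-to-close move
    per_bar = 0.5 * log_hl ** 2 - (2.0 * np.log(2.0) - 1.0) * log_co ** 2
    daily_var = per_bar.rolling(window).mean()
    return np.sqrt(daily_var * periods_per_year)

# Toy data: each bar ranges from 100 to 102 and closes where it opened
n = 25
open_ = pd.Series([101.0] * n)
high = pd.Series([102.0] * n)
low = pd.Series([100.0] * n)
close = pd.Series([101.0] * n)
gk = garman_klass_vol(open_, high, low, close)
print(gk.iloc[-1])
```

Because the close equals the open in this toy data, the correction term vanishes and the result depends on the range alone.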

### Rogers-Satchell Estimator
- **Pros**: Accounts for drift in prices
- **Cons**: Still misses overnight gaps
- **Best for**: Trending markets with continuous trading
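
The drift handling can be seen in a small sketch of the standard Rogers-Satchell formula. The name `rogers_satchell_vol` is hypothetical and the project's implementation may differ. A bar that moves straight from its low (the open) to its high (the close) contributes zero variance, while a bar with the same range that closes where it opened does not.

```python
import numpy as np
import pandas as pd

def rogers_satchell_vol(open_, high, low, close, window=20, periods_per_year=252):
    """Rolling annualized Rogers-Satchell volatility (hypothetical helper)."""
    per_bar = (np.log(high / close) * np.log(high / open_)
               + np.log(low / close) * np.log(low / open_))
    # Clip guards against tiny negative values from floating-point rounding
    daily_var = per_bar.rolling(window).mean().clip(lower=0.0)
    return np.sqrt(daily_var * periods_per_year)

n = 25
high = pd.Series([102.0] * n)
low = pd.Series([100.0] * n)

# Pure trend: open at the low, close at the high
trend_vol = rogers_satchell_vol(low, high, low, high)

# Same range, but the bar closes where it opened
mid = pd.Series([101.0] * n)
range_vol = rogers_satchell_vol(mid, high, low, mid)

print(trend_vol.iloc[-1], range_vol.iloc[-1])
```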

### Yang-Zhang Estimator (RECOMMENDED)
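- **Pros**: Handles both opening jumps and drift by combining overnight, open-to-close, and Rogers-Satchell variances
- **Cons**: More complex to calculate, and it needs the previous close for every bar
- **Best for**: General use, especially with overnight gaps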
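
The sketch below shows one way the standard Yang-Zhang combination could be written. The name `yang_zhang_vol` is hypothetical, and using pandas' default sample variance (`ddof=1`) is an assumption; the project's `VolatilityEstimator` may make different choices. The toy data has no intraday movement at all, only overnight gaps, so a purely range-based estimator would report zero.

```python
import numpy as np
import pandas as pd

def yang_zhang_vol(open_, high, low, close, window=20, periods_per_year=252):
    """Rolling annualized Yang-Zhang volatility (hypothetical helper)."""
    overnight = np.log(open_ / close.shift(1))   # close-to-open return
    intraday = np.log(close / open_)             # open-to-close return
    rs = (np.log(high / close) * np.log(high / open_)
          + np.log(low / close) * np.log(low / open_))

    overnight_var = overnight.rolling(window).var()   # sample variance
    intraday_var = intraday.rolling(window).var()
    rs_var = rs.rolling(window).mean()

    # Weight from Yang and Zhang (2000)
    k = 0.34 / (1.34 + (window + 1) / (window - 1))
    daily_var = overnight_var + k * intraday_var + (1.0 - k) * rs_var
    return np.sqrt(daily_var * periods_per_year)

# Toy data: price gaps between 100 and 102 overnight, then stays flat all day
close = pd.Series([100.0, 102.0] * 15)
open_ = close.copy()
high = close.copy()
low = close.copy()

yz = yang_zhang_vol(open_, high, low, close, window=20)
print(yz.iloc[-1])
```

Here the whole estimate comes from the overnight term, which is the component the range-based estimators leave out.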

### Observations from Analysis

1. **High Correlation**: All estimators are highly correlated (typically > 0.95)
2. **Relative Levels**: Parkinson often sits at a slightly different level from the others, since it uses only the high-low range and ignores the open, the close, and overnight gaps
3. **Stability**: Yang-Zhang is generally the most stable across different market conditions
4. **Recommendation**: Use Yang-Zhang as the default for the Volts strategy

## Test on Custom Data

You can test the estimators on your own ticker and date range.

In [None]:
# Customize these parameters
custom_ticker = 'TSLA'
custom_start = '2023-06-01'
custom_end = '2024-01-01'
custom_window = 20

print(f"Downloading data for {custom_ticker}...")
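custom_df = yf.download(custom_ticker, start=custom_start, end=custom_end, progress=False)
# Newer yfinance releases may return MultiIndex columns; keep only the field level
if isinstance(custom_df.columns, pd.MultiIndex):
    custom_df.columns = custom_df.columns.get_level_values(0)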

print(f"Calculating volatility with {custom_window}-day window...")
custom_vols = estimator.calculate_all(custom_df, rolling_window=custom_window)

# Plot
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Price chart
axes[0].plot(custom_df.index, custom_df['Close'], linewidth=2, color='black')
axes[0].set_title(f'{custom_ticker} - Price', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Price ($)')
axes[0].grid(True, alpha=0.3)

# Volatility chart
for col, color in zip(custom_vols.columns, colors):
    axes[1].plot(custom_vols.index, custom_vols[col], 
                label=col.replace('_', ' ').title(), 
                color=color, alpha=0.7, linewidth=1.5)

axes[1].set_title(f'{custom_ticker} - Volatility Comparison', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Annualized Volatility')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nMean volatility for {custom_ticker}:")
for method in custom_vols.columns:
    print(f"  {method:20s}: {custom_vols[method].mean():.2%}")