# ⭐ Tutorial: Microstructural Features with RiskLabAI

This notebook is a tutorial for the microstructural feature estimators in the `RiskLabAI` library, based on Chapter 19 of 'Advances in Financial Machine Learning' by Marcos López de Prado.

These features are considered "first-generation" methods because they are derived from price sequences alone (here, daily High and Low prices) and estimate market properties such as illiquidity (the bid-ask spread) and volatility.

We will demonstrate:
1.  **Data Preparation:** Load daily High, Low, and Close (HLC) data for SPY.
2.  **Corwin-Schultz Spread:** Use `corwin_schultz_estimator` to estimate the bid-ask spread, a measure of market liquidity.
3.  **Bekker-Parkinson Volatility:** Use `bekker_parkinson_volatility_estimates` to estimate volatility adjusted for the bid-ask spread.
4.  **Visualization:** Plot the price, the estimated spread, and the estimated volatility to see how they relate.
5.  **Conclusion:** Summarize the applications of these estimators.

## 0. Setup and Imports

First, we import our libraries and the necessary modules from `RiskLabAI`.

In [None]:
# Standard Imports
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

# RiskLabAI Imports
import RiskLabAI.features.microstructural_features as msf
import RiskLabAI.utils.publication_plots as pub_plots

# --- Notebook Configuration ---
pub_plots.setup_publication_style()

## 1. Load and Prepare Data

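We'll load daily OHLCV data for SPY (an ETF tracking the S&P 500) from the start of 2010 through the end of 2022 (the `end` date passed to yfinance is exclusive). The estimators below take only the `High` and `Low` series; `Close` is kept for the price plot.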

In [None]:
# Load SPY data
data = yf.Ticker("SPY").history(start="2010-01-01", end="2023-01-01")

# Prepare the required price series
high_prices = data['High']
low_prices = data['Low']
close_prices = data['Close']

print("Loaded OHLCV data:")
data.head()
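
Both estimators take the logarithm of `High / Low`, so missing values, non-positive prices, or rows where `Low` exceeds `High` would silently distort the results. The sketch below shows one way to screen for such rows before estimation. The helper name `validate_hlc` and the toy frame are illustrative assumptions and are not part of `RiskLabAI` or `yfinance`.

```python
import numpy as np
import pandas as pd


def validate_hlc(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only rows that are safe for high-low estimators."""
    hlc = frame[["High", "Low", "Close"]]
    ok = (
        hlc.notna().all(axis=1)          # no missing prices
        & (hlc > 0).all(axis=1)          # logs require strictly positive prices
        & (hlc["High"] >= hlc["Low"])    # the daily range must be non-negative
    )
    return frame.loc[ok]


# Toy data (made up for illustration): one row has a NaN, one has Low > High.
toy = pd.DataFrame(
    {
        "High": [101.0, 102.5, np.nan, 99.0],
        "Low": [99.5, 100.0, 98.0, 100.0],
        "Close": [100.2, 101.7, 99.1, 99.5],
    },
    index=pd.date_range("2022-01-03", periods=4, freq="B"),
)
clean = validate_hlc(toy)
print(f"Kept {len(clean)} of {len(toy)} rows")
```

On the real data, `validate_hlc(data)` could be called right after loading; if it drops rows, that is worth investigating before trusting the estimates.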

## 2. Corwin-Schultz Spread Estimator

The Corwin-Schultz (2012) estimator provides a way to estimate the bid-ask spread using only daily High and Low prices. A higher spread implies lower liquidity.

We will calculate it using a 20-day rolling window (`window_span=20`).

In [None]:
print("Calculating Corwin-Schultz Spread...")
spread_cs = msf.corwin_schultz_estimator(
    high_prices,
    low_prices,
    window_span=20
)

spread_cs.name = "Corwin-Schultz Spread"
print("Calculation complete.")
spread_cs.tail()
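
To make the estimator less of a black box, here is a minimal from-scratch sketch of the Corwin-Schultz calculation as it is usually presented: `beta` averages the two-day sum of squared log high-low ranges, `gamma` is the squared log range of the two-day high and low, `alpha` combines them, and the spread follows from `alpha`. This is an assumption about the general method and is not confirmed to match the internals of `msf.corwin_schultz_estimator`; the `cs_*` function names and the synthetic prices are hypothetical.

```python
import numpy as np
import pandas as pd


def cs_beta(high: pd.Series, low: pd.Series, window_span: int = 20) -> pd.Series:
    # Two-day sum of squared log ranges, averaged over the rolling window.
    log_range_sq = np.log(high / low) ** 2
    return log_range_sq.rolling(window=2).sum().rolling(window=window_span).mean()


def cs_gamma(high: pd.Series, low: pd.Series) -> pd.Series:
    # Squared log range of the two-day high and two-day low.
    high_2d = high.rolling(window=2).max()
    low_2d = low.rolling(window=2).min()
    return np.log(high_2d / low_2d) ** 2


def cs_alpha(beta: pd.Series, gamma: pd.Series) -> pd.Series:
    den = 3 - 2 * np.sqrt(2)
    alpha = (np.sqrt(2) - 1) * np.sqrt(beta) / den - np.sqrt(gamma / den)
    return alpha.clip(lower=0)  # negative alphas are floored at zero


def cs_spread(high: pd.Series, low: pd.Series, window_span: int = 20) -> pd.Series:
    alpha = cs_alpha(cs_beta(high, low, window_span), cs_gamma(high, low))
    return 2 * (np.exp(alpha) - 1) / (1 + np.exp(alpha))


# Synthetic prices (illustrative only, not market data).
rng = np.random.default_rng(0)
mid = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250))))
toy_high = mid * (1 + rng.uniform(0.001, 0.02, 250))
toy_low = mid * (1 - rng.uniform(0.001, 0.02, 250))

toy_spread = cs_spread(toy_high, toy_low, window_span=20)
print(toy_spread.dropna().describe())
```

The first `window_span` values are `NaN` because the rolling mean needs a full window of two-day sums. Flooring `alpha` at zero means the estimated spread is often exactly zero on calm days.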

## 3. Bekker-Parkinson Volatility Estimator

The Bekker-Parkinson estimator adjusts the classic Parkinson high-low volatility for the bid-ask spread. It is built from the same `beta` and `gamma` components used in the Corwin-Schultz calculation.

Because it uses the daily high-low range rather than closing prices alone, it draws on intraday information that a simple close-to-close calculation ignores.

In [None]:
print("Calculating Bekker-Parkinson Volatility...")
vol_bp = msf.bekker_parkinson_volatility_estimates(
    high_prices,
    low_prices,
    window_span=20
)

vol_bp.name = "Bekker-Parkinson Volatility"
print("Calculation complete.")
vol_bp.tail()
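
For intuition, the sketch below shows how a spread-adjusted volatility can be formed from the same `beta` and `gamma` quantities, following the formula as it is commonly given alongside the Corwin-Schultz estimator. It is an assumption about the general method, not a confirmed copy of what `msf.bekker_parkinson_volatility_estimates` does internally; `bp_sigma`, `rolling_beta`, `two_day_gamma`, and the synthetic prices are hypothetical names and data.

```python
import numpy as np
import pandas as pd


def rolling_beta(high: pd.Series, low: pd.Series, window_span: int = 20) -> pd.Series:
    # Rolling mean of the two-day sum of squared log high-low ranges.
    log_range_sq = np.log(high / low) ** 2
    return log_range_sq.rolling(window=2).sum().rolling(window=window_span).mean()


def two_day_gamma(high: pd.Series, low: pd.Series) -> pd.Series:
    # Squared log range of the two-day high and two-day low.
    return np.log(high.rolling(window=2).max() / low.rolling(window=2).min()) ** 2


def bp_sigma(beta: pd.Series, gamma: pd.Series) -> pd.Series:
    k2 = np.sqrt(8 / np.pi)
    den = 3 - 2 * np.sqrt(2)
    sigma = (2 ** -0.5 - 1) * np.sqrt(beta) / (k2 * den)
    sigma = sigma + np.sqrt(gamma / (k2 ** 2 * den))
    return sigma.clip(lower=0)  # negative estimates are floored at zero


# Synthetic prices (illustrative only, not market data).
rng = np.random.default_rng(1)
mid = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250))))
toy_high = mid * (1 + rng.uniform(0.001, 0.02, 250))
toy_low = mid * (1 - rng.uniform(0.001, 0.02, 250))

toy_sigma = bp_sigma(rolling_beta(toy_high, toy_low, 20), two_day_gamma(toy_high, toy_low))
print(toy_sigma.dropna().describe())
```

Note that `beta` enters with a negative coefficient and `gamma` with a positive one, so the estimate rises when the two-day range is wide relative to the average single-day ranges.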

## 4. Visualization

Let's plot the SPY price alongside our two new features. We'll use three stacked plots to see the relationships clearly.

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(
    3, 1, 
    figsize=(14, 15), 
    sharex=True, 
    gridspec_kw={'height_ratios': [2, 1, 1]}
)
fig.suptitle('Microstructural Features for SPY (2010-2022)', fontsize=16)

# --- Panel 1: Price ---
ax1.plot(close_prices.index, close_prices, label='SPY Close Price', color='C0')
pub_plots.apply_plot_style(ax1, 'SPY Close Price', '', 'Price ($)')
ax1.legend(loc='upper left')

# --- Panel 2: Spread ---
ax2.plot(spread_cs.index, spread_cs, label='Corwin-Schultz Spread (20d)', color='C1')
pub_plots.apply_plot_style(ax2, 'Estimated Bid-Ask Spread', '', 'Spread (S)')
ax2.legend(loc='upper left')

# --- Panel 3: Volatility ---
ax3.plot(vol_bp.index, vol_bp, label='Bekker-Parkinson Vol (20d)', color='C2')
pub_plots.apply_plot_style(ax3, 'Estimated Volatility', 'Date', 'Volatility (σ)')
ax3.legend(loc='upper left')

plt.tight_layout(rect=[0, 0.03, 1, 0.97])
plt.show()

**Analysis:** Both estimated series vary substantially over time and tend to rise together when markets are under stress.

* **Spread (Plot 2):** The estimated bid-ask spread, a proxy for illiquidity, is not constant. It rises sharply in high-stress periods such as the May 2010 Flash Crash, the February 2018 "Volmageddon," and the 2020 COVID-19 crash, which makes it a useful indicator of market stress and illiquidity.
* **Volatility (Plot 3):** The Bekker-Parkinson volatility estimate rises in the same periods of market stress. Because it is built from daily high-low ranges, it can react to intraday moves that a rolling close-to-close standard deviation does not see.
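
The comparison with close-to-close volatility is not plotted in this notebook. The sketch below shows one way to compute that baseline, together with the classic (unadjusted) Parkinson estimator, so the claim can be checked by overlaying the series. The function names and the simulated intraday paths are illustrative assumptions and are not part of `RiskLabAI`.

```python
import numpy as np
import pandas as pd


def close_to_close_vol(close: pd.Series, window_span: int = 20) -> pd.Series:
    # Rolling standard deviation of daily log returns.
    return np.log(close).diff().rolling(window=window_span).std()


def parkinson_vol(high: pd.Series, low: pd.Series, window_span: int = 20) -> pd.Series:
    # Classic Parkinson estimator: sqrt(mean(ln(H/L)^2) / (4 ln 2)).
    log_range_sq = np.log(high / low) ** 2
    return np.sqrt(log_range_sq.rolling(window=window_span).mean() / (4 * np.log(2)))


# Simulate a driftless log-price path with a known daily volatility of 1%.
rng = np.random.default_rng(42)
n_days, n_steps, daily_sigma = 500, 390, 0.01
steps = rng.normal(0.0, daily_sigma / np.sqrt(n_steps), size=(n_days, n_steps))
log_path = np.cumsum(steps.ravel()).reshape(n_days, n_steps)

toy_close = pd.Series(100 * np.exp(log_path[:, -1]))
toy_high = pd.Series(100 * np.exp(log_path.max(axis=1)))
toy_low = pd.Series(100 * np.exp(log_path.min(axis=1)))

cc = close_to_close_vol(toy_close)
pk = parkinson_vol(toy_high, toy_low)
print(f"close-to-close mean: {cc.mean():.4f}, Parkinson mean: {pk.mean():.4f}")
```

To run the same comparison on the notebook's data, pass `close_prices`, `high_prices`, and `low_prices` instead of the simulated series and plot the results next to `vol_bp`.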

## 5. Conclusion

This notebook demonstrated how to use the `RiskLabAI.features.microstructural_features` module to extract valuable, first-generation features from simple HLC data.

1.  **`corwin_schultz_estimator`:** Provides a rolling estimate of the bid-ask spread, which serves as a proxy for market illiquidity (a wider spread means lower liquidity).
2.  **`bekker_parkinson_volatility_estimates`:** Provides a rolling estimate of high-low volatility that is adjusted for the bid-ask spread, which can make it a more robust input for risk models.

These features can be used directly in machine learning models to help predict market regimes or forecast risk.
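
As a starting point for that use, the sketch below combines the two series into a single feature table. It lags the features by one day, on the assumption that a value computed from day *t*'s high and low should only be used to predict day *t+1* onward. The helper `build_feature_matrix`, the column names, and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def build_feature_matrix(
    spread: pd.Series, volatility: pd.Series, lag: int = 1
) -> pd.DataFrame:
    """Hypothetical helper: align the two features and lag them to avoid look-ahead."""
    features = pd.concat({"cs_spread": spread, "bp_volatility": volatility}, axis=1)
    return features.shift(lag).dropna()


# Toy values (made up for illustration); the leading NaN mimics the warm-up window.
idx = pd.date_range("2022-01-03", periods=6, freq="B")
toy_spread = pd.Series([np.nan, 0.001, 0.002, 0.0015, 0.003, 0.0025], index=idx)
toy_vol = pd.Series([np.nan, 0.010, 0.012, 0.011, 0.015, 0.013], index=idx)

X = build_feature_matrix(toy_spread, toy_vol)
print(X)
```

With the notebook's outputs, the call would be `build_feature_matrix(spread_cs, vol_bp)`; the resulting rows can then be joined to whatever labels the model is meant to predict.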