In [1]:
# ---
# title: 06. Alpha, Beta, and Sigma
# tags: [Finance, Statistics, Scikit-Learn]
# difficulty: Intermediate
# ---

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from pathlib import Path

# Setup
processed_path = Path("../data/silver")
# Load the most recently modified returns file
returns_file = max(processed_path.glob("market_returns_*.parquet"), key=lambda f: f.stat().st_mtime)
df_returns = pd.read_parquet(returns_file)

# Calculate Market Return (Equal Weight Proxy)
market_return = df_returns.mean(axis=1)
df_returns['MARKET'] = market_return

target_stock = 'NVDA'

# Decomposing Returns

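A stock's return can be split into a market-driven part and a stock-specific part:
1. **Beta** ($\beta$): Sensitivity to the market. The term $\beta R_m$ is the return due to the overall market moving.
2. **Alpha** ($\alpha$): Average return specific to the stock that the market does not explain (The "Edge").

Formula: $R_i = \alpha + \beta R_m + \epsilon$, where $R_i$ is the stock's return, $R_m$ is the market return, and $\epsilon$ is the residual.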
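
As a sanity check on the formula, the sketch below simulates returns with a known alpha and beta and recovers them with the closed-form OLS estimates for a single regressor, $\beta = \mathrm{Cov}(R_i, R_m) / \mathrm{Var}(R_m)$ and $\alpha = \bar{R}_i - \beta \bar{R}_m$. All numbers here (`true_alpha`, `true_beta`, the volatilities, the seed) are made up for illustration and are not taken from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, illustrative only

# Hypothetical "true" parameters (assumptions, not estimates from real data)
true_alpha = 0.0004   # daily alpha
true_beta = 1.5
n_days = 5000

r_market = rng.normal(0.0005, 0.01, n_days)   # simulated market returns
epsilon = rng.normal(0.0, 0.02, n_days)       # stock-specific noise
r_stock = true_alpha + true_beta * r_market + epsilon

# Closed-form OLS estimates for a single regressor
beta_hat = np.cov(r_stock, r_market, ddof=1)[0, 1] / np.var(r_market, ddof=1)
alpha_hat = r_stock.mean() - beta_hat * r_market.mean()

print(f"beta_hat:  {beta_hat:.3f} (true {true_beta})")
print(f"alpha_hat: {alpha_hat:.5f} per day (true {true_alpha})")
```

With noise of this size, the slope is typically pinned down much more tightly than the intercept, so an estimated alpha is best read with caution.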

In [2]:
# Linear Regression (CAPM Style)
# LinearRegression rejects NaN inputs, so keep only the rows where both
# the market and the target stock have a return.
reg_data = df_returns[['MARKET', target_stock]].dropna()
X = reg_data['MARKET'].values.reshape(-1, 1)
y = reg_data[target_stock].values

model = LinearRegression()
model.fit(X, y)

beta = model.coef_[0]
alpha = model.intercept_ * 252 # Annualized Alpha

print(f"{target_stock} Beta: {beta:.2f}")
print(f"{target_stock} Annualized Alpha: {alpha*100:.2f}%")

# Plot
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.5, label='Daily Returns')
plt.plot(X, model.predict(X), color='red', linewidth=2, label=f'Beta Line (Slope={beta:.2f})')
plt.xlabel('Market Return')
plt.ylabel(f'{target_stock} Return')
plt.title(f'Beta Calculation: {target_stock} vs Market')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

### Sigma (Idiosyncratic Risk)

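The residuals (errors) of this regression are the part of the stock's return that the market does not explain. Their volatility is the stock-specific (idiosyncratic) risk: it is NOT correlated with the market, and it is the risk we can diversify away by holding many stocks.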

In [None]:
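# Residuals: the part of each daily return the fitted line does not explain
residuals = y - model.predict(X)
# Annualize the daily residual volatility (252 trading days)
sigma_idiosyncratic = np.std(residuals) * np.sqrt(252)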

print(f"Idiosyncratic Risk (Sigma): {sigma_idiosyncratic*100:.2f}%")
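
To illustrate why this risk is called diversifiable, here is a minimal sketch on simulated data rather than the notebook's dataset. It builds equal-weight portfolios of `n_stocks` hypothetical stocks that all have beta 1 and independent residuals, then measures the portfolio's residual volatility with the same regression approach. Under those assumptions the residual volatility should shrink roughly like $\sigma / \sqrt{N}$, while the market component does not shrink. The function name `portfolio_idio_vol` and every parameter value are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, illustrative only

n_days = 2520
sigma_idio_daily = 0.02          # assumed per-stock residual volatility (daily)
r_market = rng.normal(0.0, 0.01, n_days)  # simulated market returns


def portfolio_idio_vol(n_stocks):
    """Annualized residual volatility of an equal-weight portfolio.

    Assumes every stock has beta = 1 and independent residuals.
    """
    eps = rng.normal(0.0, sigma_idio_daily, (n_days, n_stocks))
    r_stocks = r_market[:, None] + eps          # R_i = 1.0 * R_m + eps_i
    r_port = r_stocks.mean(axis=1)              # equal-weight portfolio
    beta_p, alpha_p = np.polyfit(r_market, r_port, 1)
    resid = r_port - (alpha_p + beta_p * r_market)
    return np.std(resid, ddof=2) * np.sqrt(252)


vols = {n: portfolio_idio_vol(n) for n in (1, 10, 100)}
for n, vol in vols.items():
    print(f"{n:>3} stocks: idiosyncratic vol = {vol*100:.2f}%")
```

Real stocks have correlated residuals (sector effects, for example), so the decline in practice would likely be slower than this idealized case suggests.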