## CAPM Regression of Returns to Find Alpha & Beta ##
 **The Capital Asset Pricing Model (CAPM):**
 
 $$
 \mathbb{E}[R_i] = r_f + \beta_i \big(\mathbb{E}[R_m] - r_f\big)
 $$
 
 - $\mathbb{E}[R_i]$ : Expected return of asset $i$
 - $r_f$ : Risk-free rate   
 - $\beta_i$ : Sensitivity (“beta”) of $i$ to market returns
 - $\mathbb{E}[R_m]$ : Expected market return
 - $\big(\mathbb{E}[R_m] - r_f\big)$ : Market risk premium
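
As a quick worked example of the formula above, the sketch below plugs in illustrative numbers. The function name `capm_expected_return` and all input values are assumptions made for this example, not figures from the project's data.

```python
def capm_expected_return(risk_free: float, beta: float, market_return: float) -> float:
    """Expected return under CAPM: r_f + beta * (E[R_m] - r_f)."""
    market_risk_premium = market_return - risk_free
    return risk_free + beta * market_risk_premium


# Hypothetical annual inputs: 3% risk-free rate, 8% expected market return
for beta in (0.0, 0.5, 1.0, 1.5):
    expected = capm_expected_return(risk_free=0.03, beta=beta, market_return=0.08)
    print(f"beta = {beta:.1f} -> expected return = {expected:.2%}")
```

With these inputs, a beta of 0 earns the risk-free rate and a beta of 1 earns the market return; everything else scales linearly with beta.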
 
Goal of this project: regress the fund's returns on the market's returns to see how much of the fund's performance is explained by market exposure.

Reference: Sharpe, W.F. (1964). "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk." Journal of Finance.

If CAPM held exactly, every asset's expected return would be determined entirely by its degree of market exposure (its beta). In reality, this is not the case.
 
 The standard CAPM equation (above) has no "alpha" term. CAPM assumes the market is efficient, so in expectation every asset plots exactly on the security market line:
 
 $$\mathbb{E}[R_i] = r_f + \beta_i (\mathbb{E}[R_m] - r_f)$$
 
 In real data, we often observe persistent deviations from this line. 
 
 We capture these deviations with an **"alpha"** term by amending the regression equation:
 
 $$
 R_i = \alpha_i + r_f + \beta_i (R_m - r_f) + \varepsilon
 $$
 
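 Here, $\alpha_i$ (alpha) is the average return of asset $i$ beyond what its beta and the market's excess return predict, and $\varepsilon$ is a zero-mean residual. Equivalently, in excess-return form, $R_i - r_f = \alpha_i + \beta_i (R_m - r_f) + \varepsilon$, so $\alpha_i$ is the intercept and $\beta_i$ the slope of a regression of the asset's excess returns on the market's excess returns. If $\alpha_i>0$, the asset or portfolio is "beating CAPM"; if $\alpha_i<0$, it is underperforming.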
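
A minimal sketch of how this regression recovers alpha and beta, using synthetic data rather than the fund's returns. The "true" values (`true_alpha`, `true_beta`), the noise levels, and the random seed are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic monthly excess returns (assumed values, not real data)
true_alpha, true_beta = 0.002, 0.8
market_excess = rng.normal(loc=0.006, scale=0.04, size=240)
noise = rng.normal(loc=0.0, scale=0.01, size=240)
fund_excess = true_alpha + true_beta * market_excess + noise

# OLS fit of fund_excess = alpha + beta * market_excess.
# np.polyfit returns coefficients from highest degree down: [slope, intercept]
beta_hat, alpha_hat = np.polyfit(market_excess, fund_excess, 1)

# R^2: share of the variance in fund excess returns explained by the market
fitted = alpha_hat + beta_hat * market_excess
ss_res = np.sum((fund_excess - fitted) ** 2)
ss_tot = np.sum((fund_excess - fund_excess.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"alpha = {alpha_hat:.5f}, beta = {beta_hat:.3f}, R^2 = {r_squared:.3f}")
```

The estimates will not match the assumed values exactly because of the noise term; with real fund data the same gap shows up as estimation uncertainty around alpha and beta.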



In [129]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf

In [130]:
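# Request daily price history from 2000 onward; yfinance only returns rows from when the ticker has data (the output below starts in March 2008)
market_data = yf.download("ACWI", start="2000-01-01", multi_level_index=False) # iShares MSCI ACWI ETF, used as the market proxy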
market_data['market_rets'] = market_data['Close'].pct_change()
market_data = market_data.dropna()
display(market_data.head())
display(market_data.tail())



| Date | Close | High | Low | Open | Volume | market_rets |
|---|---|---|---|---|---|---|
| 2008-03-31 | 35.11364 | 35.59777 | 35.11364 | 35.59777 | 400 | -0.015569 |
| 2008-04-01 | 36.060551 | 36.060551 | 35.711691 | 35.775767 | 600 | 0.026967 |
| 2008-04-02 | 36.459236 | 36.594509 | 36.345323 | 36.39516 | 10700 | 0.011056 |
| 2008-04-03 | 36.66571 | 42.503746 | 36.274135 | 42.503746 | 29100 | 0.005663 |
| 2008-04-04 | 36.644341 | 39.121945 | 36.644341 | 39.121945 | 9900 | -0.000583 |


| Date | Close | High | Low | Open | Volume | market_rets |
|---|---|---|---|---|---|---|
| 2026-02-05 | 143.210007 | 144.539993 | 142.919998 | 143.809998 | 4062200 | -0.012277 |
| 2026-02-06 | 146.279999 | 146.470001 | 144.419998 | 144.460007 | 4312200 | 0.021437 |
| 2026-02-09 | 147.410004 | 147.679993 | 146.039993 | 146.279999 | 3577700 | 0.007725 |
| 2026-02-10 | 147.309998 | 148.009995 | 147.240005 | 147.820007 | 6799600 | -0.000678 |
| 2026-02-11 | 147.710007 | 148.410004 | 146.880005 | 148.25 | 7414900 | 0.002715 |


In [131]:
#Get returns data (depends on fund)
fund_data = pd.read_csv("C:/Users/admin/Desktop/Python Projects/parkman-healthcare-partners_returns.csv")
fund_data = fund_data[['Date', 'Return']].dropna() #The column names must be "Date" and "Return" 

# Prepare fund returns
fund_data['Date'] = pd.to_datetime(fund_data['Date'])
fund_data = fund_data.set_index('Date').sort_index()
fund_data['Return'] = fund_data['Return'].str.rstrip('%').astype('float') / 100

display(fund_data.head())
display(fund_data.tail())



| Date | Return |
|---|---|
| 2019-06-30 | 8.5e-05 |
| 2019-07-31 | -0.0154 |
| 2019-08-31 | -0.0289 |
| 2019-09-30 | -0.0367 |
| 2019-10-31 | 0.028 |


| Date | Return |
|---|---|
| 2025-09-30 | 0.000246 |
| 2025-10-31 | 0.000187 |
| 2025-11-30 | 0.000344 |
| 2025-12-31 | -8.8e-05 |
| 2026-01-31 | -0.00012 |


In [132]:
#Get monthly 10-year Treasury yield data (used here as the risk-free rate)
rf = pd.read_csv("C:/Users/admin/Desktop/Python Projects/10_yr_treasury_monthly.csv") #columns are "Date" and "Yield"
rf['Date'] = pd.to_datetime(rf['Date'])
rf = rf.set_index('Date')

#De-annualize to a monthly rate by simple division by 12, then convert from percent to decimal
rf['Yield'] = rf['Yield'] / 12 / 100
rf

| Date | Yield |
|---|---|
| 2026-01-30 | 0.003531 |
| 2025-12-31 | 0.003474 |
| 2025-11-28 | 0.003346 |
| 2025-10-31 | 0.003398 |
| 2025-09-30 | 0.003459 |
| ... | ... |
| 2000-05-31 | 0.005227 |
| 2000-04-28 | 0.005177 |
| 2000-03-31 | 0.005003 |
| 2000-02-29 | 0.005341 |
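
Dividing an annual yield by 12 is a simple (non-compounded) approximation of the monthly rate. The sketch below compares it with the compounded equivalent for a hypothetical 4.2% annual yield; that figure is an assumption for illustration, not a value taken from the CSV.

```python
annual_yield_pct = 4.2  # hypothetical annualized yield, in percent

annual_rate = annual_yield_pct / 100

# Simple de-annualization, as in the cell above
monthly_simple = annual_rate / 12

# Compounded de-annualization: the monthly rate that compounds to the annual rate
monthly_compounded = (1 + annual_rate) ** (1 / 12) - 1

print(f"simple:     {monthly_simple:.6f}")
print(f"compounded: {monthly_compounded:.6f}")
```

At this yield level the two differ by less than one basis point per month, so the simpler form is unlikely to change the regression materially.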


In [133]:
#get fund returns and market returns
fund_rets = fund_data['Return']

#Change this depending on the hedge fund returns data frequency
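market_rets = market_data['market_rets'].resample('ME').apply(lambda x: (1 + x).prod() - 1) #monthly: compound daily returns within each month ('ME' = month end; 'M' is deprecated as of pandas 2.2)
# market_rets = market_data['market_rets'] #daily: already at daily frequency, no resampling needed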

# force all dates to the calendar month end
fund_rets.index = fund_rets.index.to_period('M').to_timestamp('M')
market_rets.index = market_rets.index.to_period('M').to_timestamp('M')
rf.index = rf.index.to_period('M').to_timestamp('M')

#SANITY CHECKS
market_rets.to_csv('market_returns.csv')
rf.to_csv('monthly_yield.csv')
fund_rets.to_csv('fund_returns.csv')

# Align series on common dates
df = pd.concat([fund_rets, market_rets, rf], axis = 1, join='inner')
df.columns = ['Return', 'Market Return', 'Risk Free']
df



| Date | Return | Market Return | Risk Free |
|---|---|---|---|
| 2019-06-30 | 0.000085 | 0.064904 | 0.001672 |
| 2019-07-31 | -0.015400 | 0.000678 | 0.001679 |
| 2019-08-31 | -0.028900 | -0.022099 | 0.001248 |
| 2019-09-30 | -0.036700 | 0.022459 | 0.001388 |
| 2019-10-31 | 0.028000 | 0.027390 | 0.001410 |
| ... | ... | ... | ... |
| 2025-09-30 | 0.000246 | 0.035971 | 0.003459 |
| 2025-10-31 | 0.000187 | 0.022931 | 0.003398 |
| 2025-11-30 | 0.000344 | 0.000424 | 0.003346 |
| 2025-12-31 | -0.000088 | 0.008929 | 0.003474 |
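
To illustrate the two steps in the cell above (compounding daily returns into monthly returns, then snapping every index to the calendar month end so the inner join can match rows), here is a small sketch on made-up data. The helper `to_month_end`, the series, and all values are hypothetical. It groups by month-end date instead of calling `resample`, which should give the same monthly figures while sidestepping frequency-alias differences between pandas versions.

```python
import pandas as pd


def to_month_end(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Map each date to midnight on the last calendar day of its month."""
    return index.to_period("M").to_timestamp(how="end").normalize()


# Hypothetical daily market returns spanning two months (made-up values)
daily = pd.Series(
    [0.01, -0.02, 0.03, 0.005, -0.01],
    index=pd.to_datetime(
        ["2024-08-29", "2024-08-30", "2024-09-03", "2024-09-16", "2024-09-30"]
    ),
    name="Market Return",
)

# Compound within each calendar month: prod(1 + r) - 1
monthly = daily.groupby(to_month_end(daily.index)).agg(lambda r: (1 + r).prod() - 1)

# Hypothetical monthly risk-free rates stamped on the last business day
rf_demo = pd.Series(
    [0.0035, 0.0034],
    index=pd.to_datetime(["2024-08-30", "2024-09-30"]),
    name="Risk Free",
)
rf_demo.index = to_month_end(rf_demo.index)

# Without the month-end step, 2024-08-30 would not match 2024-08-31
aligned = pd.concat([monthly, rf_demo], axis=1, join="inner")
print(aligned)
```

An inner join silently drops any month that is missing from one of the inputs, so it is worth checking the row count of the aligned frame against the fund's return history.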


### Returns with risk-free rate ###

In [None]:
alpha_color = "#f90e0eff"
beta_color = '#009efb'

market_excess = df['Market Return'] - df['Risk Free']

# fund_excess = df['Return'] - df['Risk Free']

# # CAPM regression on excess returns: fund_excess = alpha + beta * market_excess
# beta, alpha = np.polyfit(market_excess, fund_excess, 1)

# # Mean components (no residual/noise term)
# alpha_component = float(alpha)
# beta_component = float(beta) * float(market_excess.mean())

# print(f'beta = {beta:.4f}, alpha = {alpha:.6f}')
# print(f'mean alpha component = {alpha_component:.6f}')
# print(f'mean beta component  = {beta_component:.6f}')

# # Normalize components (0-100%)
# components = [alpha_component, beta_component]
# components_nonneg = [max(0.0, float(x)) for x in components] #Set negative components to zero (can't plot)
# components_sum = float(sum(components_nonneg))
# if components_sum > 0:
#     components_pct = [x / components_sum * 100.0 for x in components_nonneg]
# else:
#     components_pct = [0.0, 0.0]

# # Scatter plot to display the fund's return correlation with the market
# x = market_excess.dropna() # x is a series
# y = fund_excess.reindex(x.index).dropna()
# x = x.reindex(y.index)

# #find r squared for style points
# y_hat = alpha + beta * x  
# ss_res = ((y - y_hat) ** 2).sum()
# ss_tot = ((y - y.mean()) ** 2).sum()
# r2 = 1 - ss_res / ss_tot
# print(f"R^2 = {r2:.6f}")

# # Fitted line: y = alpha + beta * x
# x_sorted = x.sort_values()
# y_fit = alpha + beta * x_sorted

# plt.figure(figsize=(8, 6))
# plt.scatter(x, y, alpha=0.6, color=beta_color)
# plt.plot(
#     x_sorted,
#     y_fit,
#     color=alpha_color,
#     linewidth=2,
#     label=f"Fit: y={beta:.2f}x{alpha:+.5f}",  # '+' format flag prints the sign, so a negative alpha does not render as "+-"
# )
# plt.xlabel("Market excess monthly return")
# plt.ylabel("Fund excess monthly return")
# plt.title("Fund vs Market (E(R) - Rf) excess returns")
# plt.grid(True, alpha=0.3)
# plt.legend()

# # ax = plt.gca()

# # # Same bounds on both axes
# # lo = float(min(x.min(), y.min()))
# # hi = float(max(x.max(), y.max()))
# # pad = (hi - lo) * 0.05 if hi > lo else 0.01
# # lo -= pad
# # hi += pad

# # ax.set_xlim(lo, hi)
# # ax.set_ylim(lo, hi)

# # # Equal scaling (45° is truly 45°)
# # ax.set_aspect("equal", adjustable="box")

# plt.show()


# # Plot normalized alpha and beta (0-100%)

# labels = ['Alpha', 'Beta']
# colors = [alpha_color, beta_color]

# plt.figure(figsize=(6, 4))
# bars = plt.bar(labels, components_pct, color=colors)
# plt.ylabel("Component weight (%)")
# plt.title("CAPM Component Weights (Normalized)")
# plt.ylim(0, 110)
# plt.grid(axis="y", alpha=0.25)
# for b in bars:
#     h = b.get_height()
#     plt.text(
#         b.get_x() + b.get_width() / 2,
#         h + 2,
#         f"{h:.1f}%",
#         ha="center",
#         va="bottom",
#     )
# plt.show()



Date
2019-06-30    0.063233
2019-07-31   -0.001001
2019-08-31   -0.023347
2019-09-30    0.021071
2019-10-31    0.025980
                ...   
2025-09-30    0.032512
2025-10-31    0.019533
2025-11-30   -0.002922
2025-12-31    0.005454
2026-01-31    0.024810
Freq: ME, Length: 80, dtype: float64
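
The commented-out block above splits the fund's mean excess return into an alpha component and a beta component. Below is a minimal sketch of why that split is exact for an OLS fit with an intercept (the residuals average to zero), using synthetic data. The parameter values and seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic monthly excess returns (assumed values, not the fund's data)
market_excess = rng.normal(loc=0.005, scale=0.04, size=80)
fund_excess = 0.001 + 0.3 * market_excess + rng.normal(0.0, 0.01, size=80)

# np.polyfit returns [slope, intercept] for a degree-1 fit
beta, alpha = np.polyfit(market_excess, fund_excess, 1)

alpha_component = alpha
beta_component = beta * market_excess.mean()

# OLS with an intercept passes through the sample means, so the two
# components add up to the mean fund excess return
print(np.isclose(alpha_component + beta_component, fund_excess.mean()))

# Share of the mean excess return attributed to each component,
# flooring negatives at zero as in the commented-out plotting code
floored = np.maximum([alpha_component, beta_component], 0.0)
total = floored.sum()
shares_pct = floored / total * 100 if total > 0 else np.zeros(2)
print(dict(zip(["alpha", "beta"], shares_pct.round(1))))
```

Flooring a negative component at zero makes the bar chart plottable but hides underperformance, so the raw `alpha_component` is worth reporting alongside the normalized weights.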