# Homework 4

## Team Members:
### 1. Dongtong Zhong (8124193969)
### 2. Liwen Dai (5282656931)
### 3. Feifan Gu (8135699631)

## Start

In [17]:
import pandas as pd
import numpy as np
from pandas.tseries.offsets import MonthEnd

# load data
# CRSP monthly (columns used below: PERMNO, DATE, RET, PRC, SHROUT)
crsp = pd.read_feather("crsp_monthly_stocks_HW4.feather")
# Compustat annual (columns used below: LPERMNO, DATADATE, SEQ, IB, DVT, SALE)
comp = pd.read_feather("compustat_annual_HW4.feather")

In [18]:
crsp['DATE'] = crsp['DATE'] + MonthEnd(0) # ensure month-end dates
comp['DATE'] = comp['DATADATE'] + MonthEnd(0) # ensure month-end dates
comp['AVAIL_DATE'] = comp['DATE'] + MonthEnd(6) # assume 6-month lag for annual data availability
# rename for merging
comp.rename(columns={"LPERMNO":"PERMNO"}, inplace=True)
crsp['PRC'] = crsp['PRC'].abs()  # use absolute price
crsp['Market Cap'] = crsp['PRC'] * crsp['SHROUT']
crsp = crsp[crsp['Market Cap'] > 0]

In [19]:
vars_cols = ["SEQ", "IB", "DVT", "SALE"]
comp_use = comp[["PERMNO", "AVAIL_DATE", *vars_cols]].dropna(subset=["PERMNO", "AVAIL_DATE"])
calendar = crsp[["PERMNO", "DATE"]].drop_duplicates()
tmp = calendar.merge(comp_use, on="PERMNO", how="left")
tmp = tmp[tmp["AVAIL_DATE"] <= tmp["DATE"]]

# For each (PERMNO, DATE), keep the most recent available fundamentals (max AVAIL_DATE)
tmp = (
    tmp.sort_values(["PERMNO", "DATE", "AVAIL_DATE"])
       .groupby(["PERMNO", "DATE"], as_index=False)
       .tail(1)
)
monthly_comp = (
    calendar.merge(tmp[["PERMNO", "DATE", *vars_cols]], on=["PERMNO", "DATE"], how="left")
            .sort_values(["PERMNO", "DATE"])
)
monthly_comp[vars_cols] = monthly_comp.groupby("PERMNO", group_keys=False)[vars_cols].ffill()
merged_panel = crsp.merge(monthly_comp, on=["PERMNO", "DATE"], how="left")
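
The merge-then-filter step above pairs every month with every fiscal year of the same firm before discarding most rows, which can be memory-hungry on a panel of this size. A minimal sketch of an alternative, assuming `pd.merge_asof` is acceptable for the assignment, is shown below on toy data. The frames `calendar_toy` and `comp_toy` and all their values are made up for illustration and are not taken from the homework files.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Toy monthly calendar for one hypothetical PERMNO (illustrative values only).
calendar_toy = pd.DataFrame({
    "PERMNO": [1, 1, 1, 1],
    "DATE": pd.to_datetime(["2021-05-31", "2021-06-30", "2022-05-31", "2022-06-30"]),
})

# Toy annual fundamentals with December fiscal year-ends.
comp_toy = pd.DataFrame({
    "PERMNO": [1, 1],
    "DATADATE": pd.to_datetime(["2020-12-31", "2021-12-31"]),
    "SALE": [100.0, 120.0],
})
# Same 6-month availability lag as in the notebook.
comp_toy["AVAIL_DATE"] = comp_toy["DATADATE"] + MonthEnd(6)

# merge_asof requires both frames to be sorted by their "on" keys.
asof = pd.merge_asof(
    calendar_toy.sort_values("DATE"),
    comp_toy.sort_values("AVAIL_DATE")[["PERMNO", "AVAIL_DATE", "SALE"]],
    left_on="DATE",
    right_on="AVAIL_DATE",
    by="PERMNO",
    direction="backward",  # latest row with AVAIL_DATE <= DATE
)
print(asof[["DATE", "SALE"]])
```

In this toy case the May 2021 row has no fundamentals yet, June 2021 picks up the fiscal-2020 value, and the fiscal-2021 value only appears from June 2022 onward, which is the same point-in-time logic as the `AVAIL_DATE <= DATE` filter.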

In [20]:
merged_panel

| | PERMNO | DATE | SHRCD | EXCHCD | SICCD | PRC | VOL | RET | SPREAD | RETX | SHROUT | Market Cap | SEQ | IB | DVT | SALE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10000 | 1986-01-31 | 10 | 3 | 3990 | 4.37500 | 1771.0 | NaN | 0.25000 | NaN | 3680.0 | 1.610000e+04 | NaN | NaN | NaN | NaN |
| 1 | 10000 | 1986-02-28 | 10 | 3 | 3990 | 3.25000 | 828.0 | -0.257143 | 0.25000 | -0.257143 | 3680.0 | 1.196000e+04 | NaN | NaN | NaN | NaN |
| 2 | 10000 | 1986-03-31 | 10 | 3 | 3990 | 4.43750 | 1078.0 | 0.365385 | 0.12500 | 0.365385 | 3680.0 | 1.633000e+04 | NaN | NaN | NaN | NaN |
| 3 | 10000 | 1986-04-30 | 10 | 3 | 3990 | 4.00000 | 957.0 | -0.098592 | 0.25000 | -0.098592 | 3793.0 | 1.517200e+04 | NaN | NaN | NaN | NaN |
| 4 | 10000 | 1986-05-31 | 10 | 3 | 3990 | 3.10938 | 1074.0 | -0.222656 | 0.09375 | -0.222656 | 3793.0 | 1.179388e+04 | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3788850 | 93436 | 2024-08-31 | 11 | 3 | 9999 | 214.11000 | 16108365.0 | -0.077390 | NaN | -0.077390 | 3194640.0 | 6.840044e+08 | 62634.0 | 14997.0 | 0.0 | 96773.0 |
| 3788851 | 93436 | 2024-09-30 | 11 | 3 | 9999 | 261.63000 | 16042065.0 | 0.221942 | NaN | 0.221942 | 3207000.0 | 8.390474e+08 | 62634.0 | 14997.0 | 0.0 | 96773.0 |
| 3788852 | 93436 | 2024-10-31 | 11 | 3 | 9999 | 249.85001 | 19014312.0 | -0.045025 | NaN | -0.045025 | 3210060.0 | 8.020335e+08 | 62634.0 | 14997.0 | 0.0 | 96773.0 |
| 3788853 | 93436 | 2024-11-30 | 11 | 3 | 9999 | 345.16000 | 20821313.0 | 0.381469 | NaN | 0.381469 | 3210060.0 | 1.107984e+09 | 62634.0 | 14997.0 | 0.0 | 96773.0 |


In [21]:
columns_use = ["SEQ", "IB", "DVT", "SALE", "Market Cap", "RET", "PERMNO", "DATE"]
merged = merged_panel[columns_use].copy()  # copy so later in-place index changes do not act on a slice

In [22]:
merged.set_index(["PERMNO", "DATE"], inplace=True)
merged.sort_index(inplace=True)

In [23]:
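# define helper function: each month, hold the top_n stocks ranked by the
# previous month's value of metric_col, weighted in proportion to that value
def make_portfolio(df, metric_col, top_n, ascending, label=None):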
    out_name = label or metric_col
    df = df.sort_values(["DATE", "PERMNO"]).copy()
    df["metric_prev"] = df.groupby("PERMNO")[metric_col].shift(1)
    def one_month(g):
        # For each month, drop missing metric
        g = g.dropna(subset=["metric_prev"]).copy()
        if g.empty:
            return pd.Series({out_name: np.nan})
        # Select top_n stocks based on metric_prev
        g = g.assign(rank=g["metric_prev"].rank(method="first", ascending = ascending))
        g = g.nsmallest(top_n, "rank")  # select top_n stocks
        # calculate weights for each stock
        tot = g["metric_prev"].sum()
        if pd.isna(tot) or tot == 0:
            w = np.repeat(1.0 / len(g), len(g))
        else:
            w = g["metric_prev"] / tot
        # calculate portfolio return
        rp = np.dot(w, g["RET"].values)
        return pd.Series({out_name: rp})

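    # Select the needed columns explicitly so the grouping column is not passed
    # to one_month; this avoids the pandas groupby.apply deprecation warning.
    ret_series = df.groupby("DATE")[["metric_prev", "RET"]].apply(one_month)
    return ret_series  # index=DATE, col=out_name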

In [24]:
# Create portfolios
mcap_ret = make_portfolio(merged_panel, "Market Cap", top_n=1000, ascending=False, label="MktCap_Portfolio")
ib_ret = make_portfolio(merged_panel, "IB", top_n=1000, ascending=False, label="Income_Portfolio")
dvt_ret = make_portfolio(merged_panel, "DVT", top_n=1000, ascending=False, label="DVT_Portfolio")
sale_ret = make_portfolio(merged_panel, "SALE", top_n=1000, ascending=False, label="Sale_Portfolio")
seq_ret = make_portfolio(merged_panel, "SEQ", top_n=1000, ascending=False, label="SEQ_Portfolio")



## Conclusion for start

From the above analysis, we constructed five portfolios based on market capitalization, income (IB), dividends (DVT), sales (SALE), and shareholders' equity (SEQ). Each month, a portfolio holds the top 1000 stocks ranked by the previous month's value of its metric in the merged CRSP and Compustat dataset, and each stock is weighted in proportion to that lagged metric. Portfolio returns are calculated monthly. This approach allows us to compare portfolios built on different fundamental characteristics of stocks.
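
To make the weighting rule concrete, here is a small sketch of a single month's calculation on four made-up stocks. The helper `top_n_weighted_return` is hypothetical and only mirrors the idea of `make_portfolio` for one date; it is not the notebook's function.

```python
import numpy as np

def top_n_weighted_return(metric_prev, ret, top_n):
    """Return of a portfolio that holds the top_n names by lagged metric,
    weighted by that same lagged metric (hypothetical helper)."""
    metric_prev = np.asarray(metric_prev, dtype=float)
    ret = np.asarray(ret, dtype=float)
    # Indices of the top_n largest lagged metric values.
    top = np.argsort(-metric_prev, kind="stable")[:top_n]
    # Weights are proportional to the lagged metric and sum to one.
    w = metric_prev[top] / metric_prev[top].sum()
    return float(np.dot(w, ret[top]))

# Four hypothetical stocks: lagged market caps and this month's returns.
mcap_prev = [50.0, 30.0, 20.0, 5.0]
ret = [0.10, 0.00, -0.05, 0.50]
r_p = top_n_weighted_return(mcap_prev, ret, top_n=3)
print(round(r_p, 4))
```

With `top_n=3` the weights are 0.5, 0.3, and 0.2, so the portfolio return is 0.5 × 0.10 + 0.3 × 0.00 + 0.2 × (−0.05) = 0.04; the smallest stock's 50% return is excluded. Note that a missing return among the selected stocks would propagate as NaN through `np.dot`.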

## Question 1

In [25]:
# concatenate all portfolio returns
port_rets = pd.concat([mcap_ret, seq_ret, ib_ret, sale_ret, dvt_ret], axis=1)
port_rets

| DATE | MktCap_Portfolio | SEQ_Portfolio | Income_Portfolio | Sale_Portfolio | DVT_Portfolio |
|---|---|---|---|---|---|
| 1925-12-31 | NaN | NaN | NaN | NaN | NaN |
| 1926-01-31 | 0.000278 | NaN | NaN | NaN | NaN |
| 1926-02-28 | -0.033871 | NaN | NaN | NaN | NaN |
| 1926-03-31 | -0.065039 | NaN | NaN | NaN | NaN |
| 1926-04-30 | 0.036794 | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... |
| 2024-08-31 | 0.022608 | 0.013710 | 0.019780 | 0.013239 | 0.024805 |
| 2024-09-30 | 0.022140 | 0.009024 | 0.012360 | 0.012077 | 0.018167 |
| 2024-10-31 | -0.005914 | -0.001479 | -0.003591 | -0.013663 | -0.007174 |
| 2024-11-30 | 0.067186 | 0.075411 | 0.066063 | 0.076097 | 0.056620 |


In [26]:
# load Fama-French monthly factors (RF and Mkt-RF are used below)
# Date is parsed explicitly with format="%Y%m", so parse_dates is not needed
ff = pd.read_csv("FamaFrenchMonthly_HW4.csv")
ff['Date'] = pd.to_datetime(ff['Date'].astype(str), format="%Y%m") + MonthEnd(0)
ff.set_index('Date', inplace=True)
port_rets = port_rets.merge(ff[['RF', 'Mkt-RF']], how='left', left_index=True, right_index=True)
port_rets['Mkt'] = port_rets['Mkt-RF'] / 100.0 + port_rets['RF'] / 100.0
port_rets['RF'] = port_rets['RF'] / 100.0



In [None]:
from scipy import stats
col_map = {
    "Reference": "MktCap_Portfolio",
    "Book":   "SEQ_Portfolio",
    "Income":    "Income_Portfolio",
    "Sales":  "Sale_Portfolio",
    "Dividends":   "DVT_Portfolio",
}

# filter date range 1962-01 to 2004-12
use_cols = list(col_map.values()) + ['RF']
mask = (port_rets.index >= pd.Timestamp("1962-01-01")) & (port_rets.index <= pd.Timestamp("2004-12-31"))
prt = port_rets.loc[mask, use_cols].copy().dropna(how="all")

# Annualization constants
K_mu  = 12.0
K_vol = np.sqrt(12.0)

def table_row(series, ref_series, rf_series):
    # align on dates and drop NaNs
    df = pd.concat([series, ref_series, rf_series], axis=1).dropna()
    df.columns = ['r', 'r_ref', 'rf']

    r   = df['r'].to_numpy(float)
    r0  = df['r_ref'].to_numpy(float)
    rf  = df['rf'].to_numpy(float)      

    # 1) annualized return
    mu_ann  = np.nanmean(r) * K_mu

    # 2) annualized vol
    vol_ann = np.nanstd(r, ddof=1) * K_vol

    # 3) Sharpe ratio
    ex      = r - rf
    mu_ex_a = np.nanmean(ex) * K_mu
    vol_ex_a= np.nanstd(ex, ddof=1) * K_vol
    sharpe  = np.nan if vol_ex_a == 0 else mu_ex_a / vol_ex_a

    # 4) Excess return vs. reference & t-stat
    diff = r - r0
    mu_excess_ann = np.nanmean(diff) * K_mu
    t_stat = stats.ttest_1samp(diff, popmean=0.0, nan_policy="omit").statistic

    return pd.Series({
        "Annualized Return (%)": 100 * mu_ann,
        "Annualized Vol (%)":    100 * vol_ann,
        "Annualized Sharpe":     sharpe,
        "Excess Return vs. Reference (%)":     100 * mu_excess_ann,
        "t-stat on Excess Return vs. Reference":  t_stat,
    })

# reference portfolio (first row)
ref_col = col_map["Reference"]
ref = prt[ref_col]
rf = prt['RF']

rows = []
for row_name, col_name in col_map.items():
    s = prt[col_name]
    met = table_row(s, ref, rf)
    if row_name == "Reference":
        met["Excess Return vs. Reference (%)"] = np.nan
        met["t-stat on Excess Return vs. Reference"] = np.nan
    met.name = row_name
    rows.append(met)

table1 = pd.DataFrame(rows).round(2)
table1

| Portfolio | Annualized Return (%) | Annualized Vol (%) | Annualized Sharpe | Excess Return vs. Reference (%) | t-stat on Excess Return vs. Reference |
|---|---|---|---|---|---|
| Reference | 10.63 | 15.18 | 0.33 | NaN | NaN |
| Book | 12.06 | 15.05 | 0.43 | 1.2 | 1.6 |
| Income | 11.79 | 14.67 | 0.42 | 1.17 | 1.86 |
| Sales | 12.32 | 15.74 | 0.43 | 2.02 | 2.43 |
| Dividends | 11.43 | 13.6 | 0.43 | 1.01 | 1.15 |


## <mark>Problem solution summary:</mark>

For **Question 1**, we constructed five portfolios based on market capitalization, income, dividends, sales, and shareholders' equity. Each portfolio was formed by selecting the top 1000 stocks according to the respective metric from the merged CRSP and Compustat dataset, and returns were calculated on a monthly basis. We then evaluated each portfolio over 1962–2004 using the annualized return, annualized volatility, annualized Sharpe ratio, excess return over the market-cap-weighted Reference portfolio, and the t-statistic of that excess return. This analysis shows how weighting by different fundamental characteristics affects portfolio performance.
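
As a worked illustration of the annualization conventions used in the table (mean × 12, volatility × √12, Sharpe ratio on excess returns), the sketch below applies them to a four-month toy series. The helper `annualized_stats` and the numbers are assumptions for illustration; the notebook's own `table_row` remains the reference implementation.

```python
import math
import statistics

def annualized_stats(monthly_ret, monthly_rf):
    """Annualized mean, volatility, and Sharpe ratio from monthly decimal
    returns (hypothetical helper; sample standard deviation, ddof=1)."""
    excess = [r - f for r, f in zip(monthly_ret, monthly_rf)]
    mean_ann = statistics.mean(monthly_ret) * 12
    vol_ann = statistics.stdev(monthly_ret) * math.sqrt(12)
    sharpe = (statistics.mean(excess) * 12) / (statistics.stdev(excess) * math.sqrt(12))
    return mean_ann, vol_ann, sharpe

# Made-up monthly returns and a constant monthly risk-free rate.
r = [0.02, -0.01, 0.03, 0.00]
rf = [0.002, 0.002, 0.002, 0.002]
mean_ann, vol_ann, sharpe = annualized_stats(r, rf)
print(f"return {mean_ann:.2%}, vol {vol_ann:.2%}, Sharpe {sharpe:.2f}")
```

Because the Sharpe ratio divides an annualized mean (× 12) by an annualized volatility (× √12), it equals the monthly Sharpe ratio multiplied by √12.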

## Question 2

In [28]:
# filter date range 2005-01 to 2024-12
use_cols = list(col_map.values()) + ['RF']
mask = (port_rets.index >= pd.Timestamp("2005-01-01")) & (port_rets.index <= pd.Timestamp("2024-12-31"))
prt = port_rets.loc[mask, use_cols].copy().dropna(how="all")

# reference portfolio (first row)
ref_col = col_map["Reference"]
ref = prt[ref_col]
rf = prt['RF']

rows = []
for row_name, col_name in col_map.items():
    s = prt[col_name]
    met = table_row(s, ref, rf)
    if row_name == "Reference":
        met["Excess Return vs. Reference (%)"] = np.nan
        met["t-stat on Excess Return vs. Reference"] = np.nan
    met.name = row_name
    rows.append(met)

table1_oos = pd.DataFrame(rows).round(2)
table1_oos

| Portfolio | Annualized Return (%) | Annualized Vol (%) | Annualized Sharpe | Excess Return vs. Reference (%) | t-stat on Excess Return vs. Reference |
|---|---|---|---|---|---|
| Reference | 12.84 | 14.43 | 0.78 | NaN | NaN |
| Book | 13.03 | 16.83 | 0.68 | -0.29 | -0.22 |
| Income | 12.66 | 15.51 | 0.71 | -0.06 | -0.07 |
| Sales | 13.21 | 16.68 | 0.7 | 0.51 | 0.42 |
| Dividends | 11.92 | 14.92 | 0.69 | -0.88 | -0.67 |


## Out-of-Sample Performance (2005–2024) vs. In-Sample (1962–2004)
### 1. Overall returns and volatilities
In the in-sample period (1962–2004), all four characteristic-weighted portfolios (Book, Income, Sales, Dividends) outperformed the Reference portfolio, with annualized excess returns of about 1–2% and positive t-statistics, although only Sales clears the conventional 5% threshold (t = 2.43). In the out-of-sample period (2005–2024), the picture changes. Annualized returns for the characteristic portfolios remain comparable to the Reference (around 12–13%), but the excess returns over Reference shrink toward zero or turn slightly negative. For example, the Book and Income portfolios deliver almost no excess return relative to Reference (–0.29% and –0.06% respectively), and Dividends underperforms (–0.88%). Volatilities remain broadly similar, in the range of 14–17%.

### 2. Sharpe ratios
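The Reference portfolio's Sharpe ratio more than doubles in the out-of-sample period (0.78 vs. 0.33 in-sample). The characteristic portfolios also improve in absolute terms (0.68–0.71 vs. 0.42–0.43), but their ranking against the Reference flips: in-sample each of them had a higher Sharpe ratio than the Reference, while out-of-sample each of them has a lower one. This indicates that the risk-adjusted advantage of fundamental weighting over cap weighting did not carry over to the post-2004 era.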

### 3. Excess returns vs. Reference and statistical significance
In the in-sample test, the excess returns of Book, Income, Sales, and Dividends are all positive (about 1–2%), but their statistical strength varies: Sales is significant at the 5% level (t = 2.43), Income and Book are marginal (t = 1.86 and 1.60), and Dividends is not significant (t = 1.15). This offers some support for the idea that fundamentals added value beyond simple market-cap weighting. In contrast, in the out-of-sample test, none of the characteristic portfolios achieves a statistically significant excess return relative to Reference. The t-statistics all lie between –0.67 and 0.42, and Sales is the only portfolio with a positive excess return (0.51%, t = 0.42). The in-sample advantage of these characteristics therefore did not hold up out-of-sample.
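
The t-statistics discussed here are one-sample t-tests on the monthly return difference between a portfolio and the Reference. A minimal sketch of that calculation, using made-up returns and a hypothetical helper `paired_t_stat`, is:

```python
import math
import statistics

def paired_t_stat(port_ret, ref_ret):
    """t-statistic for the mean of (portfolio - reference) being zero:
    mean(diff) / (sample_std(diff) / sqrt(n)). Hypothetical helper."""
    diff = [p - q for p, q in zip(port_ret, ref_ret)]
    n = len(diff)
    return statistics.mean(diff) / (statistics.stdev(diff) / math.sqrt(n))

# Made-up monthly returns for a portfolio and its reference.
port = [0.03, 0.01, 0.02, 0.04]
ref = [0.02, 0.01, 0.00, 0.01]
t = paired_t_stat(port, ref)
print(round(t, 2))
```

This is the same formula that `scipy.stats.ttest_1samp(diff, popmean=0.0)` evaluates. With several hundred monthly observations, an absolute t-statistic of roughly 1.96 corresponds to the two-sided 5% level.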

### Conclusion
The evidence suggests that while characteristic-based strategies (Book, Income, Sales, Dividends) appeared profitable in the historical 1962–2004 sample, their predictive power largely disappears after 2005. Returns and volatilities remain similar, but excess performance relative to the Reference portfolio is no longer significant. This highlights the risk of data-snooping or structural changes in markets: what worked in one era may not persist in the future.

## Question 3

In [29]:
import statsmodels.api as sm
from scipy import stats

# Prepare data 
# We already have port_rets containing all portfolio returns
use_cols = ['MktCap_Portfolio', 'SEQ_Portfolio', 'Income_Portfolio', 
           'Sale_Portfolio', 'DVT_Portfolio', 'RF', 'Mkt-RF']

# Define portfolio name mapping
portfolio_names = {
    'MktCap_Portfolio': 'Reference',
    'SEQ_Portfolio': 'Book', 
    'Income_Portfolio': 'Income',
    'Sale_Portfolio': 'Sales',
    'DVT_Portfolio': 'Dividends'
}

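# Mkt-RF is reported in percent (e.g., 0.5 means 0.5%); convert to decimals once.
# Percent-unit monthly factors contain values above 1 in absolute terms, while
# decimal returns do not, so this check is safe if the cell is re-run.
if port_rets['Mkt-RF'].abs().max() > 1:
    port_rets['Mkt-RF'] = port_rets['Mkt-RF'] / 100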

# Define CAPM regression function
def run_capm_regression(portfolio_returns, ff_data, start_date, end_date):
    # Filter time period
    mask = (portfolio_returns.index >= pd.Timestamp(start_date)) & \
           (portfolio_returns.index <= pd.Timestamp(end_date))
    data = portfolio_returns.loc[mask, use_cols].copy().dropna(how='all')
    
    results = {}
    
    for port_col, port_name in portfolio_names.items():
        # Prepare data
        df_temp = data[[port_col, 'RF', 'Mkt-RF']].dropna()
        
        if len(df_temp) == 0:
            print(f"Warning: {port_name} has no data from {start_date} to {end_date}")
            continue
            
        # Calculate excess returns
        y = df_temp[port_col] - df_temp['RF']  # Portfolio excess return
        X = df_temp['Mkt-RF']  # Market excess return (already in excess return form)
        X = sm.add_constant(X)  # Add constant term
        
        # Run OLS regression
        try:
            model = sm.OLS(y, X, missing='drop').fit()
            
            # Extract results
            alpha = model.params['const']  # Monthly alpha
            beta = model.params['Mkt-RF']  # CAPM beta
            t_alpha = model.tvalues['const']  # t-statistic for alpha
            r_squared = model.rsquared  # R-squared
            
            # Annualized returns and alpha
            ann_return = df_temp[port_col].mean() * 12 * 100  # Annualized return (%)
            ann_alpha = alpha * 12 * 100  # Annualized alpha (%)
            
            results[port_name] = {
                'Ann. Return': ann_return,
                'Beta_MKT': beta,
                'Ann. Alpha': ann_alpha,
                't(Alpha)': t_alpha,
                'R2': r_squared
            }
            
            print(f"{port_name:12} | Beta: {beta:6.2f} | Alpha: {ann_alpha:6.2f}% (t={t_alpha:5.2f})")
            
        except Exception as e:
            print(f"Regression error for {port_name}: {e}")
            continue
    
    return pd.DataFrame(results).T

In [30]:
# Run CAPM regression for both sample periods
# 1962-2004 sample period
capm_62_04 = run_capm_regression(port_rets, ff, '1962-01-01', '2004-12-31')

# 2005-2024 sample period  
capm_05_24 = run_capm_regression(port_rets, ff, '2005-01-01', '2024-12-31')

Reference    | Beta:   0.99 | Alpha:  -0.05% (t=-0.34)
Book         | Beta:   0.93 | Alpha:   1.39% (t= 1.87)
Income       | Beta:   0.92 | Alpha:   1.50% (t= 2.37)
Sales        | Beta:   0.98 | Alpha:   2.09% (t= 2.58)
Dividends    | Beta:   0.83 | Alpha:   1.82% (t= 2.23)
Reference    | Beta:   0.98 | Alpha:   0.20% (t= 1.41)
Book         | Beta:   1.10 | Alpha:  -1.30% (t=-0.98)
Income       | Beta:   1.03 | Alpha:  -0.18% (t=-0.22)
Sales        | Beta:   1.08 | Alpha:  -0.21% (t=-0.18)
Dividends    | Beta:   0.98 | Alpha:  -0.27% (t=-0.20)


In [31]:
# Format and display results
# Select required columns and rename
display_cols = ['Ann. Return', 'Beta_MKT', 'Ann. Alpha', 't(Alpha)']

print("CAPM Regression Results (1962-2004)")
table_capm_62_04 = capm_62_04[display_cols].round(2)
display(table_capm_62_04)

print("CAPM Regression Results (2005-2024)")
table_capm_05_24 = capm_05_24[display_cols].round(2)
display(table_capm_05_24)

CAPM Regression Results (1962-2004)


| Portfolio | Ann. Return | Beta_MKT | Ann. Alpha | t(Alpha) |
|---|---|---|---|---|
| Reference | 10.63 | 0.99 | -0.05 | -0.34 |
| Book | 12.38 | 0.93 | 1.39 | 1.87 |
| Income | 12.18 | 0.92 | 1.5 | 2.37 |
| Sales | 12.79 | 0.98 | 2.09 | 2.58 |
| Dividends | 11.82 | 0.83 | 1.82 | 2.23 |


CAPM Regression Results (2005-2024)


| Portfolio | Ann. Return | Beta_MKT | Ann. Alpha | t(Alpha) |
|---|---|---|---|---|
| Reference | 12.84 | 0.98 | 0.2 | 1.41 |
| Book | 13.13 | 1.1 | -1.3 | -0.98 |
| Income | 13.02 | 1.03 | -0.18 | -0.22 |
| Sales | 13.35 | 1.08 | -0.21 | -0.18 |
| Dividends | 11.73 | 0.98 | -0.27 | -0.2 |


In [32]:
# Calculate Alpha changes
if not capm_62_04.empty and not capm_05_24.empty:
    alpha_change = {}
    for port in portfolio_names.values():
        if port in capm_62_04.index and port in capm_05_24.index:
            alpha_old = capm_62_04.loc[port, 'Ann. Alpha']
            alpha_new = capm_05_24.loc[port, 'Ann. Alpha']
            alpha_change[port] = alpha_new - alpha_old
    
    print(f"\nAlpha changes (2005-2024 vs 1962-2004):")
    for port, change in alpha_change.items():
        print(f"   - {port}: {change:+.2f}%")




Alpha changes (2005-2024 vs 1962-2004):
   - Reference: +0.25%
   - Book: -2.69%
   - Income: -1.68%
   - Sales: -2.29%
   - Dividends: -2.09%


## CAPM Regression Analysis Summary

This analysis evaluates the performance of fundamental indexing strategies through CAPM regression, examining both in-sample (1962-2004) and out-of-sample (2005-2024) periods with a focus on beta coefficients relative to the MKT-RF factor and alpha generation.
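
For readers who want to see the regression mechanics without `statsmodels`, the following sketch estimates a CAPM alpha and beta by ordinary least squares with NumPy on noise-free synthetic data. The helper `capm_alpha_beta` and the numbers are illustrative assumptions, not results from the homework data.

```python
import numpy as np

def capm_alpha_beta(excess_port, excess_mkt):
    """OLS of portfolio excess return on market excess return.
    Returns (monthly alpha, beta). Hypothetical helper."""
    y = np.asarray(excess_port, dtype=float)
    x = np.asarray(excess_mkt, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + market factor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Synthetic excess returns: beta of 0.9 and a 0.1% monthly alpha, no noise.
mkt = np.array([0.02, -0.03, 0.01, 0.04, -0.01])
port = 0.001 + 0.9 * mkt
alpha, beta = capm_alpha_beta(port, mkt)
print(f"annualized alpha: {alpha * 12 * 100:.2f}%, beta: {beta:.2f}")
```

The monthly intercept is annualized by multiplying by 12, which is the same convention as the `Ann. Alpha` column in the tables above.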

### In-Sample Period Performance (1962-2004)
During the original sample period, all four fundamental portfolios earned positive CAPM alphas, and three of the four are significant at the 5% level. The sales-weighted portfolio led with a 2.09% annualized alpha (t = 2.58, close to the 1% threshold), followed by the dividends and income portfolios with alphas of 1.82% (t = 2.23) and 1.50% (t = 2.37), both significant at the 5% level. The book value portfolio produced a 1.39% alpha that is significant only at the 10% level (t = 1.87). These portfolios also had betas below 1, with the dividends portfolio showing the lowest market sensitivity at 0.83.

### Out-of-Sample Performance Deterioration (2005-2024)
The picture reversed in the out-of-sample period: all four fundamental strategies show negative alphas, and none is statistically significant. The book portfolio experienced the most severe deterioration, declining from +1.39% to –1.30% alpha, while the sales portfolio fell from +2.09% to –0.21%. Risk characteristics also shifted: the Book, Income, and Sales portfolios now have betas above 1 (1.10, 1.03, and 1.08), and the Dividends beta rose from 0.83 to 0.98, indicating heightened market sensitivity without corresponding return compensation.

### Evolution of Risk-Return Profiles
The comparison between periods reveals important patterns in risk-return dynamics. During the in-sample period, fundamental portfolios combined positive alphas with betas below 1. In the out-of-sample period, betas rose while alphas turned negative, which implies a substantial deterioration in risk-adjusted performance. This pattern is consistent with improving market efficiency, where previously existing pricing anomalies may have been identified and arbitraged away by market participants, although the regressions alone cannot rule out other explanations such as a change in factor premia.

### Practical Implications and Investment Recommendations
The findings carry significant implications for investment practice. First, strategy evaluation relying solely on historical backtests may be limited, emphasizing the critical importance of out-of-sample validation. Second, continuing improvements in market efficiency require investors to maintain healthy skepticism toward any strategy claiming persistent excess returns. Finally, traditional market-cap weighting demonstrated better stability in the out-of-sample period, offering investors a relatively reliable benchmark option.

In conclusion, the superior performance of fundamental indexing strategies documented in the original sample period failed to persist out-of-sample. This finding underscores the importance of out-of-sample validation in financial empirical research and provides valuable economic insights into the evolution of market efficiency.

## Question 4

We now run a Fama-French 3-factor regression for each of the five portfolios
(Reference, Book, Income, Sales, Dividends). For each portfolio $i$ we estimate, using monthly observations $t$:
$$
(R_{i,t} - R_{F,t}) = \alpha_i + \beta_{MKT}(MKT_t - R_{F,t}) + \beta_{SMB}SMB_t + \beta_{HML}HML_t + \varepsilon_{i,t}
$$
From this regression we report:
- Arithmetic annualized return of the raw portfolio (not excess), in %
- $\beta_{MKT}, \beta_{SMB}, \beta_{HML}$
- FF3 $\alpha$ (annualized), in %
- t-stat of the $\alpha$

We compute results separately for:
1. 1962-01 through 2004-12
2. 2005-01 through 2024-12


In [None]:
# FF3 regressions (MKT-RF, SMB, HML) in the same style as our CAPM block
import statsmodels.api as sm
from scipy import stats

def run_ff3_regression(port_rets, ff, start_date, end_date):
  
    # Subset the date range (month-end index)
    mask = (port_rets.index >= pd.Timestamp(start_date)) & (port_rets.index <= pd.Timestamp(end_date))
    pr = port_rets.loc[mask, list(portfolio_names.keys())].copy()

    # Pull factors from ff and convert to decimals
    fac = ff.loc[pr.index, ['Mkt-RF', 'SMB', 'HML', 'RF']].copy()
    fac[['Mkt-RF', 'SMB', 'HML', 'RF']] = fac[['Mkt-RF', 'SMB', 'HML', 'RF']] / 100.0

    results = {}
    for col in portfolio_names.keys():
        # Align and drop NaNs
        df = pd.concat([pr[col], fac], axis=1).dropna()
        if df.empty:
            continue

        rp = df[col]            # raw portfolio return (decimal)
        rf = df['RF']
        y  = rp - rf            # excess return of the portfolio
        X  = sm.add_constant(df[['Mkt-RF', 'SMB', 'HML']])

        model = sm.OLS(y, X).fit()

        # Betas and alpha (monthly), then annualize alpha *12 and report in %
        beta_mkt = model.params['Mkt-RF']
        beta_smb = model.params['SMB']
        beta_hml = model.params['HML']
        alpha_m  = model.params['const']             # monthly alpha in decimal
        alpha_y_pct = alpha_m * 12 * 100             # annualized alpha in %

        # t-stat on alpha: monthly t; scaling both coef & se by 12 keeps t the same
        t_alpha = model.tvalues['const']

        # Arithmetic annualized return (not geometric), in %
        ann_return_pct = rp.mean() * 12 * 100

        results[portfolio_names[col]] = {
            'Ann. Return (%)': ann_return_pct,
            'Beta_MKT': beta_mkt,
            'Beta_SMB': beta_smb,
            'Beta_HML': beta_hml,
            'FF3 Ann. Alpha (%)': alpha_y_pct,
            't(Alpha)': t_alpha,
            'R2': model.rsquared
        }

    return pd.DataFrame(results).T



In [59]:
# Run for both periods
ff3_62_04 = run_ff3_regression(port_rets, ff, '1962-01-01', '2004-12-31')
ff3_05_24 = run_ff3_regression(port_rets, ff, '2005-01-01', '2024-12-31')

# Format and display (two decimals, like Q1–Q3)
display_cols_q4 = ['Ann. Return (%)', 'Beta_MKT', 'Beta_SMB', 'Beta_HML', 'FF3 Ann. Alpha (%)', 't(Alpha)']

print("FF3 Regression Results (1962–2004)")
display(ff3_62_04[display_cols_q4].round(2))

print("FF3 Regression Results (2005–2024)")
display(ff3_05_24[display_cols_q4].round(2))

FF3 Regression Results (1962–2004)


| Portfolio | Ann. Return (%) | Beta_MKT | Beta_SMB | Beta_HML | FF3 Ann. Alpha (%) | t(Alpha) |
|---|---|---|---|---|---|---|
| Reference | 10.63 | 1.0 | -0.08 | -0.01 | 0.18 | 2.58 |
| Book | 12.38 | 1.03 | -0.07 | 0.34 | -1.01 | -1.77 |
| Income | 12.18 | 1.01 | -0.12 | 0.26 | -0.19 | -0.41 |
| Sales | 12.79 | 1.07 | 0.06 | 0.4 | -1.0 | -1.66 |
| Dividends | 11.82 | 0.95 | -0.14 | 0.39 | -0.82 | -1.51 |


FF3 Regression Results (2005–2024)


| Portfolio | Ann. Return (%) | Beta_MKT | Beta_SMB | Beta_HML | FF3 Ann. Alpha (%) | t(Alpha) |
|---|---|---|---|---|---|---|
| Reference | 12.84 | 1.0 | -0.06 | -0.01 | 0.03 | 0.56 |
| Book | 13.13 | 1.06 | 0.0 | 0.42 | -0.43 | -0.58 |
| Income | 13.02 | 1.02 | -0.06 | 0.22 | 0.26 | 0.44 |
| Sales | 13.35 | 1.03 | 0.08 | 0.3 | 0.74 | 0.91 |
| Dividends | 11.73 | 0.95 | -0.11 | 0.39 | 0.2 | 0.22 |


## Question 5

**Out-of-sample CAPM performance.**  
When we move from the original 1962–2004 sample to 2005–2024, the CAPM alphas that were positive in-sample (and significant at the 5% level for Income, Sales, and Dividends) turn slightly negative and lose statistical significance. This pattern suggests that the earlier “abnormal” returns of the fundamental portfolios were not persistent and may have reflected exposure to risk factors omitted by the one-factor model rather than genuine mispricing.

**Adding FF3 factors.**  
Within the 1962–2004 period, introducing SMB and HML more than absorbs the CAPM alpha. The Book, Income, Sales, and Dividends portfolios load positively on HML (0.26 to 0.40) and have SMB loadings close to zero (–0.14 to 0.06). Once these exposures are controlled for, their alphas turn negative (–0.19% to –1.01% per year), and none is significant at the 5% level. This is in line with Arnott, Hsu & Moore (2005), who reported that fundamental-weighted indexes earned roughly 2% annual CAPM alpha that disappeared under FF3 regression. The incremental returns were largely compensation for value-style risk, not unexplained alpha.
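
The mechanism described here, a CAPM alpha that is really an omitted value exposure, can be reproduced on synthetic data. The sketch below uses made-up factor series (not the Fama-French data) and, for brevity, a two-factor model with only the market and HML; SMB is left out. The helper `ols_coefs` is hypothetical.

```python
import numpy as np

def ols_coefs(y, *factors):
    """OLS coefficients [intercept, loadings...] of y on the given factors."""
    X = np.column_stack([np.ones(len(y)), *factors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic monthly factors (made-up numbers), built so that the value
# factor is exactly uncorrelated with the market in this sample.
mkt = np.tile([0.03, -0.01, 0.03, -0.01], 60)        # mean 1.0% per month
hml = np.tile([0.010, 0.010, -0.002, -0.002], 60)    # mean 0.4% per month

# Portfolio with market beta 1.0, value loading 0.4, and zero true alpha.
port = 1.0 * mkt + 0.4 * hml

capm_alpha = ols_coefs(port, mkt)[0]
two_factor_alpha = ols_coefs(port, mkt, hml)[0]
print(f"CAPM alpha: {capm_alpha * 12 * 100:.2f}% per year")
print(f"Two-factor alpha: {abs(two_factor_alpha) * 12 * 100:.2f}% per year")
```

In this construction the one-factor regression attributes the value premium (0.4 × 0.4% per month, about 1.9% per year) to alpha, while the regression that includes HML recovers an alpha of zero. Real data add noise and correlated factors, so the effect is less clean than in this toy case.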

**Implications for return sources.**  
The extra performance of portfolios weighted by fundamentals (book value, income, sales, dividends) arises from tilting toward companies whose fundamentals are large relative to their price. In FF3 terms, this means a positive loading on value (HML); the SMB loadings are small and mostly negative, so the portfolios remain large-cap rather than tilting toward small stocks. The returns are therefore systematic rather than skill-based; they capture risk premia associated with value-biased holdings rather than pure market inefficiency.

**Lessons on “Fundamental Indexation.”**  
The analysis teaches that “Fundamental Indexation” mainly re-expresses the classic value factor in a transparent, rules-based way. Its historical outperformance over cap-weighted indexes stems from rebalancing away from overpriced growth stocks toward underpriced stocks with large fundamentals. Once FF3 controls are added, no positive abnormal return remains, which indicates that fundamental weighting changes factor exposure, not efficiency. In essence, these portfolios are value-tilted passive strategies, not free lunches.
