# 1. READING - DFA’s Strategy

## 1. Investment philosophy.

In 100 words or less, describe DFA’s belief about how to find premium in the market.

<span style="color: blue; font-size:0.85em">


DFA believes in passive investing and market efficiency: no one can consistently beat the market over the long run by picking stocks. DFA pursues premia in the market by (1) cooperating and sharing profits with academic researchers and (2) expanding its client base by working with registered investment advisors (RIAs) to reach retail investors.

To what degree does their strategy rely on individual equity analysis? Macroeconomic fundamentals? Efficient markets?

<span style="color: blue; font-size:0.85em">

• <em>Individual equity analysis:</em> Minimal. DFA does not do discretionary fundamental stock selection.

• <em>Macroeconomic views:</em> Minimal. No top‑down timing/forecast overlays.

• <em>Efficient markets:</em> Central. Portfolios target systematic factors (size, value) documented by research, while skilled trading supplies price‑sensitive execution and liquidity to reduce costs. Following the Fama-French three-factor model, DFA holds that value stocks earn higher returns than growth stocks because they are riskier, which is consistent with a rational, efficient market.

Are DFA’s funds active or passive?


<span style="color: blue; font-size:0.85em">

Passive.

What do DFA and others mean by a “value” stock? And a “growth” stock?


<span style="color: blue; font-size:0.85em">

Value stocks are those with a high book-to-market (BE/ME) ratio. Growth stocks are those with a low BE/ME ratio.
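A minimal sketch of how a BE/ME sort could label stocks. The firms, their book and market equity figures, and the 30%/70% breakpoints are all assumptions made for illustration; none of them come from the case.

```python
import pandas as pd

# Hypothetical firms with made-up book equity (BE) and market equity (ME), in $ millions.
firms = pd.DataFrame(
    {
        "book_equity": [800.0, 150.0, 300.0, 50.0, 900.0, 120.0],
        "market_equity": [1000.0, 1500.0, 500.0, 1000.0, 1000.0, 400.0],
    },
    index=["A", "B", "C", "D", "E", "F"],
)

# Book-to-market ratio: BE / ME.
firms["be_me"] = firms["book_equity"] / firms["market_equity"]

# Assumed breakpoints: bottom 30% = growth, top 30% = value, middle = neutral.
lo, hi = firms["be_me"].quantile([0.3, 0.7])


def classify(be_me: float) -> str:
    """Label a stock by where its BE/ME falls relative to the breakpoints."""
    if be_me >= hi:
        return "value"
    if be_me <= lo:
        return "growth"
    return "neutral"


firms["style"] = firms["be_me"].map(classify)
print(firms.sort_values("be_me"))
```

In this toy sample the highest BE/ME firms are tagged as value and the lowest as growth; a real sort would use a much larger universe and published breakpoints.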

## 2. Challenges for DFA’s view.
What challenge did DFA’s model see in the 1980’s?


<span style="color: blue; font-size:0.85em">

After the deep recession, large-cap stocks were the main drivers of the S&P boom, while small stocks continued to lag. Fama and French also wrote a paper explaining that the profitability of small stocks was poor from the 1980s to the early 1990s.

Although DFA's small-stock funds beat their benchmarks and their competitors in that segment, their performance lagged behind investors in large stocks and behind the S&P itself.

And in the 1990’s?


<span style="color: blue; font-size:0.85em">

Value stocks rose steadily in this decade. However, asset-light tech stocks with high market values also performed very well during that period. DFA avoided growth stocks and missed this growth entirely. It faced pressure from investors who wanted to benefit from the growth of tech stocks.

## 3. The market.
Exhibit 3 has data regarding a universe of 5,020 firms. How many are considered “large cap”? What percent of the market value do they account for?



<span style="color: blue; font-size:0.85em">

“Large cap” in Exhibit 3 comprises <strong>207</strong> companies (out of 5,020) and accounts for about <strong>70%</strong> of total market value.

Exhibit 6 shows that the U.S. value factor (HML) has underperformed the broader U.S. equity market in 1926-2001, including every subsample except 1963-1981. So why should an investor be interested in this value factor?

<span style="color: blue; font-size:0.85em">

Even if U.S. HML underperforms the market in some subperiods, it has delivered a positive long‑run average with lower volatility than the market and low correlation to it. Adding HML exposure can improve diversification and the expected Sharpe ratio; globally, the value effect appears robust across many countries. Because value is a risk premium, periods of underperformance are exactly what allow the premium to persist.
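The diversification argument can be illustrated with a small sketch. The means, volatilities, and correlation below are hypothetical round numbers, not values from Exhibit 6; the point is only that a factor with a lower stand-alone Sharpe ratio can still raise the Sharpe ratio of a combined portfolio when its correlation with the market is low.

```python
import numpy as np

# Hypothetical annualized inputs (NOT taken from Exhibit 6).
mu = np.array([0.08, 0.04])   # mean excess returns: [market, HML]
vol = np.array([0.16, 0.10])  # volatilities: [market, HML]
rho = -0.2                    # assumed market/HML correlation

# Covariance matrix implied by the volatilities and the correlation.
cov = np.array([
    [vol[0] ** 2, rho * vol[0] * vol[1]],
    [rho * vol[0] * vol[1], vol[1] ** 2],
])


def sharpe(weights) -> float:
    """Sharpe ratio of a portfolio of excess returns with the given weights."""
    w = np.asarray(weights, dtype=float)
    return float(w @ mu / np.sqrt(w @ cov @ w))


# Tangency (max-Sharpe) weights are proportional to inv(cov) @ mu.
w_tan = np.linalg.solve(cov, mu)
w_tan = w_tan / w_tan.sum()

print(f"Market only : {sharpe([1.0, 0.0]):.3f}")
print(f"HML only    : {sharpe([0.0, 1.0]):.3f}")
print(f"Tangency mix: {sharpe(w_tan):.3f}  weights={np.round(w_tan, 3)}")
```

Under these assumed inputs the mix has a higher Sharpe ratio than either component alone, even though HML's stand-alone mean is lower than the market's.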

# 3. CAPM

DFA believes that premia in stocks and stock portfolios are related to three factors (market, size, and value).

Let's test `25` equity portfolios that span a wide range of size and value measures.

#### Footnote
For more on the portfolio construction, see the description at Ken French's data library. 
https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/tw_5_ports.html

#### Portfolios
Monthly **total** return data on `25` equity portfolios sorted by their size-value characteristics. Denote these as $\vec{r}^{i}$, for $i=1, \ldots, n$, where $n=25$.
- Note that while the factors were given as excess returns, the portfolios are total returns.
- For this entire problem, focus on the 1981-Present subsample.

### 1. Summary Statistics. 

For each portfolio, 
- Use the Risk-Free rate column in the factors tab to convert these total returns to excess returns.
- Calculate the (annualized) univariate statistics from `1.1`.

In [15]:
# Libraries

import pandas as pd
import numpy as np
import statsmodels.api as sm

In [10]:
# Utility functions
def to_decimal_if_percent(s: pd.Series) -> pd.Series:
    """
    If values look like percentages (e.g., 1.2 = 1.2%), convert to decimals (0.012).
    Heuristic: if median absolute value > 0.5, assume percent.
    """
    s_num = pd.to_numeric(s, errors="coerce")
    if s_num.notna().any() and s_num.abs().median() > 0.5:
        return s_num / 100.0
    return s_num

def annualize_stats(monthly_mean: pd.Series, monthly_std: pd.Series) -> pd.DataFrame:
    """
    Given monthly mean/vol (std), return a DataFrame with annualized stats and Sharpe.
    """
    mean_ann = monthly_mean * 12
    vol_ann = monthly_std * np.sqrt(12)
    sharpe_ann = mean_ann / vol_ann
    out = pd.DataFrame({
        "mean_ann": mean_ann,
        "vol_ann": vol_ann,
        "sharpe_ann": sharpe_ann,
        "mean_monthly": monthly_mean,
        "vol_monthly": monthly_std,
        "n_obs": monthly_mean.index.map(lambda c: np.nan)  # placeholder; we'll fill actual counts later
    })
    return out

In [18]:
# Read data from Excel file
DATA_PATH = "data/dfa_analysis_data.xlsx"
FACTORS_SHEET = "factors"
PORTFOLIOS_SHEET = "portfolios (total returns)"

# Subset start date
SUB_START = "1981-01-01"

# Load data
factors = pd.read_excel(DATA_PATH, sheet_name=FACTORS_SHEET)
ports = pd.read_excel(DATA_PATH, sheet_name=PORTFOLIOS_SHEET)

# Ensure datetime index
factors["Date"] = pd.to_datetime(factors["Date"])
factors = factors.set_index("Date").sort_index()

ports["Date"] = pd.to_datetime(ports["Date"])
ports = ports.set_index("Date").sort_index()

# Get portfolio column names
portfolio_cols = list(ports.columns)

# Combine datasets and subset to desired date range
data = ports.join(factors[["RF"]], how="inner")
data = data.loc[data.index >= SUB_START].copy()

# Convert to excess returns
excess_ports = data[portfolio_cols].sub(data["RF"], axis=0)

# Summary statistics
mean_m = excess_ports.mean(skipna=True)
std_m = excess_ports.std(ddof=1, skipna=True)


# Annualize stats
summary = annualize_stats(mean_m, std_m)


# Display results
print("\nSummary statistics for 25 Size–Value portfolios (excess returns), 1981–present:")
summary.round(4)



Summary statistics for 25 Size–Value portfolios (excess returns), 1981–present:


|  | mean_ann | vol_ann | sharpe_ann | mean_monthly | vol_monthly | n_obs |
|---|---|---|---|---|---|---|
| SMALL LoBM | 0.0117 | 0.2717 | 0.0431 | 0.001 | 0.0784 |  |
| ME1 BM2 | 0.0884 | 0.2354 | 0.3756 | 0.0074 | 0.068 |  |
| ME1 BM3 | 0.0902 | 0.2008 | 0.4493 | 0.0075 | 0.058 |  |
| ME1 BM4 | 0.1125 | 0.194 | 0.58 | 0.0094 | 0.056 |  |
| SMALL HiBM | 0.1273 | 0.2084 | 0.611 | 0.0106 | 0.0601 |  |
| ME2 BM1 | 0.0609 | 0.2447 | 0.249 | 0.0051 | 0.0706 |  |
| ME2 BM2 | 0.0984 | 0.2054 | 0.479 | 0.0082 | 0.0593 |  |
| ME2 BM3 | 0.1052 | 0.1864 | 0.564 | 0.0088 | 0.0538 |  |
| ME2 BM4 | 0.1081 | 0.1819 | 0.5942 | 0.009 | 0.0525 |  |
| ME2 BM5 | 0.1132 | 0.2137 | 0.5298 | 0.0094 | 0.0617 |  |


### 2. CAPM

The Capital Asset Pricing Model (CAPM) asserts that an asset's (or portfolio's) expected excess return is entirely determined by its beta to the equity market index (`SPY`, or in this case, `MKT`).

Specifically, it asserts that, for any excess return, $\tilde{r}^{i}$, its mean is proportional to the mean excess return of the market, $\tilde{r}^{\text{mkt}}$, where the proportionality is the regression beta of $\tilde{r}^{i}$ on $\tilde{r}^{\text{mkt}}$.

$$
\mathbb{E}\left[\tilde{r}_{t}^{i}\right] = \beta^{i,\text{mkt}}\; \mathbb{E}\left[\tilde{r}_{t}^{\text{mkt}}\right]
$$

Let's examine whether that seems plausible.

For each of the $n=25$ test portfolios, run the CAPM time-series regression:

$$
\tilde{r}_{t}^{i} = \alpha^i + \beta^{i,\text{mkt}}\; \tilde{r}_{t}^{\text{mkt}} + \epsilon_{t}^{i}
$$

So you are running 25 separate regressions, each using the $T$-sized sample of time-series data.

* Report the betas and alphas for each test asset.

* Report the mean-absolute-error of the CAPM:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^n \left|\alpha^i\right|$$

If the CAPM were true, what would we expect of the MAE?

- Report the estimated $\beta^{i,\text{mkt}}$, Treynor Ratio, $\alpha^i$, and Information Ratio for each of the $n$ regressions.

- If the CAPM model were true, what would be true of the Treynor Ratios, alphas, and Information Ratios?
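
For reference, the two ratios requested above are taken here to have their usual definitions. This is an assumption, since the problem statement does not restate them:

$$
\text{Treynor}^{i} = \frac{\mathbb{E}\left[\tilde{r}_{t}^{i}\right]}{\beta^{i,\text{mkt}}}, \qquad
\text{IR}^{i} = \frac{\alpha^{i}}{\sigma\left(\epsilon_{t}^{i}\right)}
$$

Taking expectations of the time-series regression gives $\mathbb{E}\left[\tilde{r}_{t}^{i}\right] = \alpha^i + \beta^{i,\text{mkt}}\,\mathbb{E}\left[\tilde{r}_{t}^{\text{mkt}}\right]$, so under these definitions

$$
\text{Treynor}^{i} = \frac{\alpha^{i}}{\beta^{i,\text{mkt}}} + \mathbb{E}\left[\tilde{r}_{t}^{\text{mkt}}\right]
$$

which shows how a nonzero $\alpha^i$ moves a portfolio's Treynor ratio away from the market's mean excess return.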

In [None]:
results = []
for col in excess_ports.columns:
    # y = Monthly excess returns
    y = excess_ports[col].dropna()

    # x = (Mkt-RF)
    x_raw = factors.loc[:, "Mkt-RF"]
    x = x_raw.reindex(y.index).dropna()
    y = y.reindex(x.index) # align y with x

    # OLS: y_t = alpha + beta * (Mkt-RF)_t + e_t
    X = sm.add_constant(x)
    model = sm.OLS(y, X).fit()

    alpha_m = model.params["const"] 
    beta = model.params["Mkt-RF"]
    r2 = model.rsquared

    # Fitted values and residuals
    y_hat = model.fittedvalues
    resid = model.resid

    # Mean Absolute Error
    mae_m = (y - y_hat).abs().mean()

    # Annualized metrics
    alpha_ann = alpha_m * 12.0
    resid_std_ann = resid.std(ddof=1) * np.sqrt(12.0)
    info_ratio_ann = (alpha_ann / resid_std_ann) if resid_std_ann > 0 else np.nan

    mean_excess_ann = y.mean() * 12.0
    treynor_ann = (mean_excess_ann / beta) if beta != 0 else np.nan

    results.append({
        "portfolio": col,
        "alpha_m": alpha_m,
        "alpha_ann": alpha_ann,
        "beta": beta,
        "R2": r2,
        "MAE_m": mae_m,
        "Treynor_ann": treynor_ann,
        "InfoRatio_ann": info_ratio_ann,
        "Nobs": int(model.nobs),  # number of monthly observations used
    })

capm_summary = pd.DataFrame(results).set_index("portfolio").sort_index()
print("\nCAPM results (using existing `excess_ports`):")
capm_summary.round(4)


CAPM results (using existing `excess_ports`):


| portfolio | alpha_m | alpha_ann | beta | R2 | MAE_m | Treynor_ann | InfoRatio_ann | Nobs |
|---|---|---|---|---|---|---|---|---|
| BIG HiBM | 0.0014 | 0.0164 | 1.026 | 0.6087 | 0.0267 | 0.1009 | 0.1282 | 536 |
| BIG LoBM | 0.0009 | 0.0103 | 0.9955 | 0.8935 | 0.012 | 0.0953 | 0.1926 | 536 |
| ME1 BM2 | -0.0009 | -0.0106 | 1.1658 | 0.59 | 0.0297 | 0.0759 | -0.0705 | 536 |
| ME1 BM3 | 0.0001 | 0.001 | 1.0495 | 0.6571 | 0.0257 | 0.086 | 0.0088 | 536 |
| ME1 BM4 | 0.0025 | 0.0295 | 0.9773 | 0.6105 | 0.0252 | 0.1151 | 0.2435 | 536 |
| ME2 BM1 | -0.0044 | -0.0524 | 1.3341 | 0.7154 | 0.0279 | 0.0457 | -0.4018 | 536 |
| ME2 BM2 | 0.0001 | 0.0016 | 1.139 | 0.7401 | 0.0227 | 0.0864 | 0.0151 | 536 |
| ME2 BM3 | 0.0014 | 0.0171 | 1.0357 | 0.7426 | 0.0208 | 0.1015 | 0.1812 | 536 |
| ME2 BM4 | 0.0021 | 0.0251 | 0.9765 | 0.6937 | 0.0217 | 0.1107 | 0.2493 | 536 |
| ME2 BM5 | 0.0016 | 0.0188 | 1.1108 | 0.6505 | 0.0268 | 0.1019 | 0.1488 | 536 |


<span style="color: blue; font-size:0.85em">

<em>If CAPM were True:</em>

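•  We expect every portfolio to have the same Treynor ratio, equal to the market's mean excess return; alphas ≈ 0; information ratios ≈ 0.

•  With MAE defined as the cross-sectional average of $\left|\alpha^i\right|$, we expect MAE ≈ 0 if the CAPM is true (zero in population, small in sample because of estimation error). The `MAE_m` column in the table is a different quantity: the mean absolute residual of each time-series regression. It reflects idiosyncratic (residual) volatility rather than the credibility of the model, so it stays above zero even if the CAPM is true.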
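
A minimal sketch on simulated data contrasting the MAE of alphas, as defined in the question, with the mean absolute regression residual. Every number here (sample size, market mean and volatility, beta range, noise level) is a hypothetical choice, and the data are built so that the CAPM holds by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 600, 25  # months and portfolios (illustrative sizes)

# Simulated data in which the CAPM holds by construction: true alpha = 0.
mkt = rng.normal(0.007, 0.045, size=T)      # market excess return
true_beta = rng.uniform(0.8, 1.4, size=n)   # one beta per portfolio
eps = rng.normal(0.0, 0.03, size=(T, n))    # idiosyncratic noise
excess = mkt[:, None] * true_beta + eps     # T x n excess returns

# Time-series OLS for all portfolios at once.
# Row 0 of coef holds the alphas, row 1 holds the betas.
X = np.column_stack([np.ones(T), mkt])
coef, *_ = np.linalg.lstsq(X, excess, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1]
resid = excess - X @ coef

mae_alpha = np.abs(alpha_hat).mean()   # MAE as defined: average of |alpha^i|
mean_abs_resid = np.abs(resid).mean()  # residual-based measure

print(f"MAE of alphas (monthly):          {mae_alpha:.5f}")
print(f"Mean absolute residual (monthly): {mean_abs_resid:.5f}")
```

In this simulation the alpha-based MAE is small and reflects only estimation error, while the residual-based measure tracks the assumed idiosyncratic volatility regardless of whether the pricing model is right.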

### 3. Cross-sectional Estimation

Let's test the CAPM directly. We already have what we need:

- The dependent variable, (y): mean excess returns from each of the $n=25$ portfolios.
- The regressor, (x): the market beta from each of the $n=25$ time-series regressions.

Then we can estimate the following equation:

$$
\underbrace{\mathbb{E}\left[\tilde{r}^{i}\right]}_{n\times 1\text{ data}} = \textcolor{ForestGreen}{\underbrace{\eta}_{\text{regression intercept}}} + \underbrace{\beta^{i,\text{mkt}}}_{n\times 1\text{ data}}\; \textcolor{ForestGreen}{\underbrace{\lambda_{\text{mkt}}}_{\text{regression estimate}}} + \textcolor{ForestGreen}{\underbrace{\upsilon}_{n\times 1\text{ residuals}}}
$$

Note that
- we use sample means as estimates of $\mathbb{E}\left[\tilde{r}^{i}\right]$. 
- this is a weird regression! The regressors are the betas from the time-series regressions we already ran!
- this is a single regression, where we are combining evidence across all $n=25$ series. Thus, it is a cross-sectional regression!
- the notation is trying to emphasize that the intercept is different than the time-series $\alpha$ and that the regressor coefficient is different than the time-series betas.

Report
- the R-squared of this regression.
- the intercept, $\eta$. 
- the regression coefficient, $\lambda_{\text{mkt}}$.

What would these three statistics be if the CAPM were completely accurate?

In [23]:
# 1) Get portfolio betas 
betas = capm_summary['beta'].copy()

# 2) Dependent variable: mean monthly excess return for each portfolio
mean_m_cs = excess_ports.mean(skipna=True)
y_cs = mean_m_cs.reindex(betas.index)  # align to same portfolios

# 3) Cross-sectional OLS:  E[r_i] = a + b * beta_i + error_i
X_cs = sm.add_constant(betas.rename('beta'))
cs_model = sm.OLS(y_cs, X_cs).fit()

a_hat_m = cs_model.params['const']     # intercept (monthly)
b_hat_m = cs_model.params['beta']      # slope (monthly)
R2_cs   = cs_model.rsquared

# 4) For reference: sample mean market excess return (monthly)
mkt_m = factors.loc[excess_ports.index, "Mkt-RF"].dropna()
mkt_mean_m = mkt_m.mean()

# 5) Annualized versions (×12 for means/slopes)
a_hat_ann = a_hat_m * 12.0
b_hat_ann = b_hat_m * 12.0
mkt_mean_ann = mkt_mean_m * 12.0

print("\nCross-sectional CAPM regression (monthly units):  E[r_i] = a + b * beta_i")
print(f"R^2 (cross-section): {R2_cs:.4f}")
print(f"Intercept a (monthly): {a_hat_m:.6f}   | annualized: {a_hat_ann:.6f}")
print(f"Slope b (monthly):     {b_hat_m:.6f}   | annualized: {b_hat_ann:.6f}")
print(f"Reference: sample mean market excess (monthly): {mkt_mean_m:.6f}  | annualized: {mkt_mean_ann:.6f}")



Cross-sectional CAPM regression (monthly units):  E[r_i] = a + b * beta_i
R^2 (cross-section): 0.3132
Intercept a (monthly): 0.017153   | annualized: 0.205842
Slope b (monthly):     -0.008826   | annualized: -0.105912
Reference: sample mean market excess (monthly): 0.007082  | annualized: 0.084981
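
As a benchmark for reading the estimates above, the following sketch runs the same kind of cross-sectional regression on inputs where the CAPM holds exactly. The betas and the market premium are made-up values, and plain NumPy least squares is used here in place of `statsmodels`.

```python
import numpy as np

# Hypothetical inputs: made-up betas and an assumed market premium.
betas = np.array([0.8, 0.9, 1.0, 1.1, 1.2, 1.3])
mkt_premium = 0.007  # assumed monthly mean market excess return

# If the CAPM held exactly, each mean excess return would equal beta * premium.
mean_excess = betas * mkt_premium

# Cross-sectional OLS of mean excess returns on betas, with an intercept.
X = np.column_stack([np.ones_like(betas), betas])
(eta, lam), *_ = np.linalg.lstsq(X, mean_excess, rcond=None)

# R-squared computed from the residual and total sums of squares.
fitted = X @ np.array([eta, lam])
ss_res = np.sum((mean_excess - fitted) ** 2)
ss_tot = np.sum((mean_excess - mean_excess.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"eta = {eta:.6f}, lambda = {lam:.6f}, R^2 = {r2:.4f}")
```

Up to floating-point error, the intercept is zero, the slope equals the assumed market premium, and the R-squared is one. Those are the three values a completely accurate CAPM would imply.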


### 4. Conclusion

Broadly speaking, do these results support DFA's belief in size and value portfolios containing premia unrelated to the market premium?

<span style="color: blue; font-size:0.85em">

Yes. The cross-sectional CAPM on the 25 size-value portfolios (1981–present) shows a weak fit (R² ≈ 0.31), a large positive intercept (≈ 1.7% per month, ~20.6%/yr), and a negative slope (≈ −0.9% per month) despite the market’s positive mean excess return (~0.7% per month). 

This contradicts the CAPM Security Market Line (which requires a = 0 and a positive slope equal to the market mean), indicating that market beta alone cannot explain the cross-section of returns. The evidence is therefore consistent with distinct size and value-growth effects, in line with DFA’s view.