# Risk, Return, and Equilibrium: Empirical Tests

*Author: Dacian Peng (彭德鑫)*

*Scripted in: 2022-07*

Data source: *Wind Financial Terminal*

Formulae comply with $\LaTeX$ standards

Timings were measured on an AMD Ryzen™ 7 6800H

Reference:

Fama, E. F. and J. D. MacBeth (1973). "Risk, Return, and Equilibrium: Empirical Tests." Journal of Political Economy 81(3): 607-636.


# Part I. Paper

### CAPM

If we accept the two-parameter portfolio model behind the CAPM (Tobin 1958, Markowitz 1959, Fama 1965b), we accept the following assumptions:

**A perfect capital market**

"The capital market is assumed to be perfect in the sense that investors are price takers and there are neither transactions costs nor information costs."

**Two-parameter return distributions (now usually called the mean-variance model)**

"Distribution of one-period percentage returns on all assets and portfolios are assumed to be normal or to conform to some other two-parameter member of the symmetric stable class."

**Investor risk aversion**

"Investors are assumed to be risk averse and to behave as if they choose among portfolios on the basis of maximum expected utility."

**Efficient set theorem**

From the three conditions above we obtain the *efficient set theorem*: "The optimal portfolio for any investor must be efficient in the sense that no other portfolio with the same or higher expected return has lower dispersion of return." (That is, the investor chooses a mean-variance efficient portfolio.)


### If CAPM holds

According to the **efficient set theorem**, we can infer that

"In the portfolio model the investor looks at individual assets only in terms of their contributions to the expected value and dispersion, or risk, of his portfolio return. With normal return distributions the risk of portfolio $p$ is measured by the standard deviation, $\sigma\left(\widetilde{R}_{p}\right)$, of its return, $\widetilde{R}_{p}$, and the risk of an asset for an investor who holds $p$ is the contribution of the asset to $\sigma\left(\widetilde{R}_{p}\right)$."

We can state this quote mathematically. Let $x_{i p}$ be the proportion of portfolio $p$ invested in asset $i$, and let $\sigma_{i j}$ be the covariance between the returns on assets $i$ and $j$. An efficient portfolio $m$ (with weights $x_{i m}$) solves

$$\max_{x_{1 p}, \ldots, x_{N p}} E\left(\widetilde{R}_{p}\right)=\sum_{i=1}^{N} x_{i p} E\left(\widetilde{R}_{i}\right) \quad \text{s.t.}\ \ \sigma\left(\widetilde{R}_{p}\right)=\sigma\left(\widetilde{R}_{m}\right) \ \text{ and } \ \sum_{i=1}^{N} x_{i p}=1$$





Using Lagrangian methods, we get $$E\left(\widetilde{R}_{i}\right)-E\left(\widetilde{R}_{m}\right)=S_{m}\left[\frac{\sum_{j=1}^{N} x_{j m} \sigma_{i j}}{\sigma\left(\widetilde{R}_{m}\right)}-\sigma\left(\widetilde{R}_{m}\right)\right]\tag{1}$$

where $S_{m}$ is the rate of change of $E\left(\widetilde{R}_{p}\right)$ with respect to a change in $\sigma\left(\widetilde{R}_{p}\right)$ at the point on the efficient set corresponding to portfolio $m$.

"The equation says that the difference between the expected return on the asset and the expected return on the portfolio is proportional to the difference between the risk of the asset and the risk of the portfolio. The proportionality factor is $S_{m}$, the slope of the efficient set at the point corresponding to the portfolio $m$. And the risk of the asset is its contribution to total portfolio risk, $\sigma\left(\widetilde{R}_{m}\right)$."



Since $\sum_{j=1}^{N} x_{j m} \sigma_{i j}=\operatorname{cov}\left(\widetilde{R}_{i}, \widetilde{R}_{m}\right)$, rearranging terms gives $$E\left(\widetilde{R}_{i}\right)=\left[E\left(\widetilde{R}_{m}\right)-S_{m} \sigma\left(\widetilde{R}_{m}\right)\right]+S_{m}\frac{\operatorname{cov}\left(\widetilde{R}_{i},\widetilde{R}_{m}\right)}{\sigma\left(\widetilde{R}_{m}\right)}$$

where we have $$\beta_{i} \equiv \frac{\operatorname{cov}\left(\widetilde{R}_{i}, \widetilde{R}_{m}\right)}{\sigma^{2}\left(\widetilde{R}_{m}\right)}=\frac{\sum_{j=1}^{N} x_{j m} \sigma_{i j}}{\sigma^{2}\left(\widetilde{R}_{m}\right)}=\frac{\operatorname{cov}\left(\widetilde{R}_{i}, \widetilde{R}_{m}\right) / \sigma\left(\widetilde{R}_{m}\right)}{\sigma\left(\widetilde{R}_{m}\right)}\tag{3}$$

**NOTE**
1. Here we write (3) before (2).

2. $β$ can be obtained in two ways: from the covariance formula above, or as the slope of a linear regression. The short argument below shows that the two are equivalent.

------

Suppose we have sample points $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$ and want to fit $y=a+bx$ by least squares.

The sample variance and covariance are

$$\sigma^{2}(x)=\frac{1}{n-1} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2} \text { and } \operatorname{cov}(x, y)=\frac{1}{n-1} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)$$

Minimizing $\sum_{i=1}^{n}\left(y_{i}-a-b x_{i}\right)^{2}$ over $a$ and $b$ gives

$$\hat{b}=\frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}=\frac{\operatorname{cov}(x, y)}{\sigma^{2}(x)}$$

which has exactly the form of (3) with $x=\widetilde{R}_{m}$ and $y=\widetilde{R}_{i}$.
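The equivalence between the covariance formula and the regression slope can be checked numerically. The snippet below is a minimal sketch on simulated data (the series `x` and `y` and all parameter values are made up for illustration); it also shows that the two estimates agree only when variance and covariance use the same degrees-of-freedom convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                                # stand-in for market returns
y = 0.5 + 1.3 * x + rng.normal(scale=0.3, size=n)     # stand-in for one stock's returns

# Moment-based estimate: sample covariance over sample variance (both ddof=1).
beta_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Regression-based estimate: slope of the least-squares line y = a + b x.
beta_ols, intercept = np.polyfit(x, y, deg=1)

# Mixing conventions (ddof=1 covariance, ddof=0 variance) rescales the
# estimate by n / (n - 1).
beta_mixed = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=0)

print(beta_cov, beta_ols, beta_mixed)
```

Note that `np.var` defaults to `ddof=0` while `np.cov` defaults to `ddof=1`, so the `ddof` argument is worth setting explicitly whenever the two are combined.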

------

Using equations (1) and (3), we get $$E\left(\widetilde{R}_{i}\right)=\left[E\left(\widetilde{R}_{m}\right)-S_{m} \sigma\left(\widetilde{R}_{m}\right)\right]+S_{m} \sigma\left(\widetilde{R}_{m}\right) \beta_{i}\tag{2}$$


### Test the Expected Return

Consider an asset with no systematic risk in $m$, i.e. $β_0\equiv0$. Substituting into (2), we get $$E\left(\widetilde{R}_{0}\right) \equiv E\left(\widetilde{R}_{m}\right)-S_{m} \sigma\left(\widetilde{R}_{m}\right)\tag{4}$$

Rearranging terms gives

$$S_{m}=\frac{E\left(\widetilde{R}_{m}\right)-E\left(\widetilde{R}_{0}\right)}{\sigma\left(\widetilde{R}_{m}\right)}\tag{5}$$

so that (2) can be rewritten

$$E\left(\widetilde{R}_{i}\right)=E\left(\widetilde{R}_{0}\right)+\left[E\left(\widetilde{R}_{m}\right)-E\left(\widetilde{R}_{0}\right)\right] \beta_{i}\tag{6}$$
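As a small worked example of equation (6), the sketch below evaluates the expected return for a few values of $β_i$. The function name `expected_return` and the two expected returns are hypothetical numbers chosen only for illustration.

```python
def expected_return(beta, exp_r0, exp_rm):
    """Equation (6): E(R_i) = E(R_0) + [E(R_m) - E(R_0)] * beta_i."""
    return exp_r0 + (exp_rm - exp_r0) * beta


# Hypothetical monthly expected returns on the zero-beta asset and on m.
exp_r0, exp_rm = 0.002, 0.010

for beta in (0.0, 0.5, 1.0, 1.5):
    # beta = 0 reproduces E(R_0); beta = 1 reproduces E(R_m).
    print(beta, expected_return(beta, exp_r0, exp_rm))
```

The line is fully determined by two points: the zero-beta asset and the efficient portfolio $m$ itself.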

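This gives three testable implications:

C1:   The relationship between the expected return on a security and its risk in any efficient portfolio $m$ is linear.

C2:   $β_i$ is a complete measure of the risk of security $i$ in the efficient portfolio $m$; no other measure of risk appears in (6).

C3:   Higher risk is associated with higher expected return, i.e. $E\left(\widetilde{R}_{m}\right)-E\left(\widetilde{R}_{0}\right)>0$.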


### In Market Equilibrium

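To test C1 to C3 we must identify an efficient portfolio $m$. Without market equilibrium, we cannot say that the conditions hold for the market portfolio.

We therefore also assume

1. an efficient (perfect) capital market

2. homogeneous expectations

3. short selling allowed

Under these assumptions the market portfolio is efficient (Black 1972), so we can treat the whole market as one big investor and apply the methods that were previously available only for an individual investor's portfolio.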

### A Stochastic Model for Returns (it looks daunting, but it's actually easy)

The famous Fama-MacBeth two-pass regression is here!

A stochastic generalization of (6):

$$\widetilde{R}_{i t}=\widetilde{\gamma}_{0 t}+\widetilde{\gamma}_{1 t} \beta_{i}+\widetilde{\gamma}_{2 t} \beta_{i}{ }^{2}+\widetilde{\gamma}_{3 t} s_{i}+\widetilde{\eta}_{i t}\tag{7}$$

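The two-parameter model implies the following conditions on the coefficients of (7):

For C1, $E\left(\widetilde\gamma_{2 t}\right)=0$

(In fact, the equation could be generalized with other nonlinear terms such as $β_{i}^{3}$, $β_{i}^{\frac{1}{2}}$, etc.)

For C2, $E\left(\widetilde\gamma_{3 t}\right)=0$

For C3, see (6): $E\left(\widetilde\gamma_{1 t}\right)=E\left(\widetilde{R}_{m t}\right)-E\left(\widetilde{R}_{0 t}\right)>0$

Market efficiency further requires that $\widetilde\gamma_{2 t}$, $\widetilde\gamma_{3 t}$, $\widetilde\gamma_{1 t}-\left[E\left(\widetilde{R}_{m t}\right)-E\left(\widetilde{R}_{0 t}\right)\right]$, $\widetilde\gamma_{0 t}-E\left(\widetilde{R}_{0 t}\right)$ and the disturbances $\widetilde\eta_{i t}$ are fair games.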
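The two-pass logic behind (7) can be sketched on simulated data: run one cross-sectional regression per month, then test whether the time-series mean of each coefficient differs from zero. Everything below (the number of months, the portfolio betas `beta`, the non-β risk measure `s`, and the "true" coefficients) is a made-up assumption for illustration, not the data used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
n_months, n_portfolios = 120, 20

beta = np.linspace(0.4, 1.6, n_portfolios)      # hypothetical portfolio betas
s = rng.uniform(0.02, 0.10, n_portfolios)       # hypothetical non-beta risk

# Simulate returns in which only beta is priced (true gamma_2 = gamma_3 = 0).
gamma0_true, gamma1_true = 0.002, 0.008
market_shock = rng.normal(0.0, 0.04, n_months)
noise = rng.normal(0.0, 0.02, (n_months, n_portfolios))
returns = gamma0_true + (gamma1_true + market_shock)[:, None] * beta + noise

# Step 1: one cross-sectional regression per month on [1, beta, beta^2, s].
X = np.column_stack([np.ones(n_portfolios), beta, beta**2, s])
gammas = np.array([np.linalg.lstsq(X, r, rcond=None)[0] for r in returns])

# Step 2: time-series mean and t-statistic of each monthly coefficient.
gamma_bar = gammas.mean(axis=0)
t_stats = gamma_bar / (gammas.std(axis=0, ddof=1) / np.sqrt(n_months))

print(gamma_bar)
print(t_stats)
```

Each row of `gammas` is one month's estimate of $(\gamma_{0t}, \gamma_{1t}, \gamma_{2t}, \gamma_{3t})$; conditions C1 to C3 are statements about the means of those columns.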

### Sharpe (1964) and Lintner (1965)

The original two-parameter "CAPM" of Sharpe (1964) and Lintner (1965) additionally assumes unrestricted riskless borrowing and lending at a known rate $R_{f t}$.

Since $\beta_{f}=0$, (7) then implies $E\left(\widetilde{\gamma}_{0 t}\right)=R_{f t}$, and market efficiency requires that $\widetilde{\gamma}_{0 t}-R_{f t}$ be a fair game.

This gives one more testable hypothesis:

$$E\left(\widetilde{\gamma}_{0 t}-R_{f t}\right)=0$$

### Summary for Part I.

$$\widetilde{R}_{i t}=\widetilde{\gamma}_{0 t}+\widetilde{\gamma}_{1 t} \beta_{i}+\widetilde{\gamma}_{2 t} \beta_{i}{ }^{2}+\widetilde{\gamma}_{3 t} s_{i}+\widetilde{\eta}_{i t}\tag{7}$$


C1 (linearity) $\cdots E\left(\widetilde{\gamma}_{2 t}\right)=\mathbf{0}$

C2 (no systematic effects of non- $\beta$ risk) $\cdots E\left(\widetilde{\gamma}_{3 t}\right)=\mathbf{0}$

C3 (positive expected return-risk tradeoff) $$E\left(\widetilde{\gamma}_{1 t}\right)=E\left(\widetilde{R}_{m t}\right)-E\left(\widetilde{R}_{0 t}\right)>\mathbf{0}$$

Sharpe - Lintner (S-L) Hypothesis $\cdots E\left(\widetilde{\gamma}_{0 t}\right)=R_{ft}$

### Previous Work

| Study | Finding |
| --- | --- |
| Douglas (1969) | Refutes condition C2 |
| Miller and Scholes (1972) | Support Douglas’s test |
| Friend and Blume (1970) | Average $\tilde{\gamma}_{0 t}$ is systematically greater than $R_{f t}$ |
| Black, Jensen, and Scholes (1972) | Same as Friend and Blume |

# Part II. Empirical test on China stock market

The risk-free rate is taken from the 1-year fixed-term deposit rate
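Because the regressions use monthly returns, the annual deposit rate has to be expressed per month. The sketch below assumes a 1.5% annual rate and a simple division by 12; both are assumptions made here for illustration, chosen because they reproduce the `r_f` value of 0.00125 that appears in the data tables of this notebook.

```python
# Assumed 1-year fixed-term deposit rate (1.5% per year); not read from the data files.
annual_deposit_rate = 0.015

# Simple (non-compounded) conversion to a monthly rate.
monthly_simple = annual_deposit_rate / 12

# Compounded conversion, shown for comparison only.
monthly_compound = (1 + annual_deposit_rate) ** (1 / 12) - 1

print(monthly_simple, monthly_compound)
```

At rates this low the two conventions differ only in the fifth decimal place, so the choice has little effect on the tests.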

In [1]:
import time
import platform
import statsmodels
import scipy

import pandas as pd
import numpy as np
import pickle
import statsmodels.api as sm

from scipy import stats
from functools import reduce
from itertools import combinations
from pandas import to_datetime as dt


pd.DataFrame(index=[''], columns=['Last Run Time', 'Python', 'pandas', 'numpy', 'scipy', 'statsmodels'], data=[
             [time.asctime(), platform.python_version(), pd.__version__, np.__version__, scipy.__version__, statsmodels.__version__]])

Last Run Time,Python,pandas,numpy,scipy,statsmodels
Thu Jul 7 13:46:24 2022,3.10.4,1.4.2,1.22.4,1.8.1,0.13.2


## Data Preprocessing

The raw-data loading step is skipped; the processed data are loaded directly with pickle

In [2]:
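%%time
# Load the preprocessed data; the context managers close each file after reading.
with open('data/Full_frames', 'rb') as f:
    data = pickle.load(f)
close = data.close.unstack().T
with open('data/CSI300', 'rb') as f:
    CSI300 = pickle.load(f)
with open('data/r_f', 'rb') as f:
    r_f = pickle.load(f)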

CPU times: total: 1.11 s
Wall time: 1.78 s


In [3]:
filter_num = 10
# Note: the original method adjusted β_{p,t} monthly to exclude delisted stocks.
# Here such stocks are removed during preprocessing instead: a stock is kept only if
# fewer than 1/filter_num of its prices are missing (i.e. it traded in more than 90% of the full time span).
close = close.loc[:,pd.isna(close).sum()<len(close)/filter_num]
# Back-fill the remaining gaps, then keep the last available close of each month.
close = close.bfill().groupby(pd.Grouper(freq='M')).apply(lambda x: x.dropna(axis=1).iloc[-1]).unstack().to_period('M')
R_i = close.pct_change().dropna()
R_m = CSI300.pct_change().dropna()
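The step from daily closing prices to monthly simple returns can be illustrated on synthetic data. This is a simplified sketch, not the exact pipeline above: the tickers `AAA` and `BBB` and the price paths are invented, and grouping by monthly period is used here as one possible way to pick the last close of each month.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices for two tickers.
days = pd.date_range("2021-01-01", "2021-04-30", freq="B")
rng = np.random.default_rng(1)
daily_close = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, (len(days), 2)), axis=0)),
    index=days,
    columns=["AAA", "BBB"],
)

# Last available close in each calendar month, indexed by monthly period.
monthly_close = daily_close.groupby(daily_close.index.to_period("M")).last()

# Simple monthly return P_t / P_{t-1} - 1; the first month has no predecessor.
monthly_return = monthly_close.pct_change().dropna()

print(monthly_return)
```

Four months of prices yield three monthly returns, which is why the return series always starts one period after the price series.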

In [4]:
co_time = reduce(np.intersect1d,[R_m.index,R_i.index,r_f.index])

In [5]:
R_m = R_m.loc[co_time]
R_i = R_i.loc[co_time]
r_f = r_f.loc[co_time]

In [6]:
start_time = '2015-1'
time_spliter_1 = '2017-1'
time_spliter_2 = '2019-1'
end_time = '2022-1'

# Boolean masks for the three consecutive sub-periods (each upper bound is inclusive).
month_start = R_m.index.to_timestamp()
first_time_span = np.logical_and(month_start >= dt(start_time), month_start <= dt(time_spliter_1))
second_time_span = np.logical_and(month_start > dt(time_spliter_1), month_start <= dt(time_spliter_2))
third_time_span = np.logical_and(month_start > dt(time_spliter_2), month_start <= dt(end_time))


In [7]:
first_time_span_R_m = R_m.loc[first_time_span]
first_time_span_R_i = R_i.loc[first_time_span]
first_time_span_r_f = r_f.loc[first_time_span]

second_time_span_R_m = R_m.loc[second_time_span]
second_time_span_R_i = R_i.loc[second_time_span]
second_time_span_r_f = r_f.loc[second_time_span]

third_time_span_R_m = R_m.loc[third_time_span]
third_time_span_R_i = R_i.loc[third_time_span]
third_time_span_r_f = r_f.loc[third_time_span]

## Divide into groups

Estimate $β_i$ in the first time_span

In [8]:
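# Note: np.var defaults to ddof=0 while np.cov (next cell) defaults to ddof=1,
# so the resulting β is scaled by n/(n-1); pass ddof=1 here for matching conventions.
first_time_span_σ_R_m_squared = np.var(first_time_span_R_m)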

In [9]:
first_time_span_β_i = first_time_span_R_i.apply(lambda one_stock:np.cov(first_time_span_R_m, one_stock)[0][1])
first_time_span_β_i /= first_time_span_σ_R_m_squared

In [10]:
# remove pass to inspect
first_time_span_β_i
pass

Rank into 20 portfolios

In [11]:
initial_ranks = pd.qcut(first_time_span_β_i, 20, labels=False)
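To make the ranking step concrete, the sketch below applies `pd.qcut` to simulated β estimates and forms equal-weighted portfolio betas. The 200 stock labels and the β values are hypothetical and only illustrate the mechanics.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical β estimates for 200 stocks.
betas = pd.Series(rng.normal(1.0, 0.4, 200), index=[f"S{i:03d}" for i in range(200)])

# labels=False returns integer bin codes 0..19; bin 0 holds the lowest β's.
ranks = pd.qcut(betas, 20, labels=False)

# Portfolio β is the equal-weighted mean of the member β's.
portfolio_beta = betas.groupby(ranks).mean()

print(portfolio_beta)
```

Because the bins are ordered quantile ranges, the portfolio betas increase with the portfolio number, which is the spread in β that the cross-sectional regressions rely on.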

Get the portfolio $β_{p,t}$ (written $β_{p,t-1}$ in the paper; the variable in the code has the same meaning)

In [12]:
β_i = pd.DataFrame(first_time_span_β_i, columns=['β_i'])
β_i['rank'] = initial_ranks
β_p = β_i.groupby('rank').mean().β_i

In [13]:
# remove pass to inspect
β_p
pass

Note: in the paper, the portfolio $β_{p,t}$ is adjusted monthly to allow for delisted stocks, while the individual $β_i$ estimates are updated yearly. Here such stocks are removed in the data preprocessing step (with `filter_num = 10`, only stocks with prices for more than 90% of the full time span are kept), so both the portfolio assignments and $β_{p,t}$ are updated yearly.

Non-β risk

Let's review the market model

$$\widetilde{R}_{i t}=a_{i}+\beta_{i} \widetilde{R}_{m t}+\widetilde{\epsilon}_{i t}\tag{8}$$

Then we get

$$
\sigma^{2}\left(\widetilde{R}_{i}\right)=\beta_{i}^{2} \sigma^{2}\left(\widetilde{R}_{m}\right)+\sigma^{2}\left(\widetilde{\epsilon}_{i}\right)+2 \beta_{i} \operatorname{cov}\left(\widetilde{R}_{m}, \widetilde{\epsilon}_{i}\right)
\tag{9}$$

where,

$\sigma^{2}\left(\widetilde{R}_{i}\right)$ measures total risk

$\beta_{i}^{2} \sigma^{2}\left(\widetilde{R}_{m}\right)$ measures β risk

$\operatorname{cov}\left(\widetilde{R}_{m}, \widetilde{\epsilon}_{i}\right) \equiv 0$

so,

$\sigma\left(\widetilde{\epsilon}_{i}\right)$ measures non-β risk
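The decomposition in (9) can be verified numerically. The sketch below fits the market model (8) by least squares on simulated returns (all numbers are invented for illustration) and checks that the cross term vanishes in sample, so total variance splits into a β part and a residual part.

```python
import numpy as np

rng = np.random.default_rng(3)
r_m = rng.normal(0.01, 0.05, 240)                       # hypothetical market returns
r_i = 0.002 + 1.2 * r_m + rng.normal(0.0, 0.03, 240)    # hypothetical stock returns

# Market-model OLS fit, equation (8).
beta_hat, alpha_hat = np.polyfit(r_m, r_i, deg=1)
resid = r_i - (alpha_hat + beta_hat * r_m)

# Terms of equation (9), all with the same ddof convention.
total_var = np.var(r_i, ddof=1)
beta_var = beta_hat**2 * np.var(r_m, ddof=1)
resid_var = np.var(resid, ddof=1)
cross_term = 2 * beta_hat * np.cov(r_m, resid, ddof=1)[0, 1]

print(total_var, beta_var + resid_var, cross_term)
```

The cross term is zero (up to floating-point error) because least-squares residuals are uncorrelated with the regressor by construction.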

$\bar{s}_{p, t-1}\left(\hat{\epsilon}_{i}\right)$ is likewise the average of $s\left(\hat{\epsilon}_{i}\right)$ for securities in portfolio $p$. The $s\left(\hat{\epsilon}_{i}\right)$ are computed from data for the same time_span as the component $\hat{\beta}_{i}$ of $\hat{\beta}_{p, t-1}$, and like these $\hat{\beta}_{i}$, they are updated annually.

Also, the $\bar{s}_{p, t}\left(\hat{\epsilon}_{i}\right)$ in code has the same meaning as $\bar{s}_{p, t-1}\left(\hat{\epsilon}_{i}\right)$

In [14]:
portfolio_nums = 20
# Containers indexed by month (rows) and portfolio number (columns).
β_p_t = pd.DataFrame(index=third_time_span_R_i.index, columns=np.arange(portfolio_nums))
s_ε_p_t = β_p_t.copy()
s_ε_i_t_std = β_p_t.copy()
R_p_t = s_ε_p_t.copy()

# Re-estimation dates: every 12th month of the third span, plus 2022-02 as the final cut-off.
estimate_time_spot = third_time_span_R_i.index[::12].append(pd.Index([dt('2022-02').to_period('M')]))
last_end_time = third_time_span_R_i.index[0]

for end_time in estimate_time_spot:

    # Estimation window: the whole second span plus the third span up to end_time.
    rolling_time_slicer = third_time_span_R_i.index <= end_time
    rolling_R_m = pd.concat([second_time_span_R_m, third_time_span_R_m[rolling_time_slicer]])
    rolling_R_i = pd.concat([second_time_span_R_i, third_time_span_R_i[rolling_time_slicer]])
    rolling_r_f = pd.concat([second_time_span_r_f, third_time_span_r_f[rolling_time_slicer]])

    # β_i = cov(R_i, R_m) / var(R_m) for every stock over the window.
    rolling_σ_R_m_squared = np.var(rolling_R_m)

    rolling_β_i = rolling_R_i.apply(lambda one_stock: np.cov(rolling_R_m, one_stock)[0][1])
    rolling_β_i /= rolling_σ_R_m_squared

    # Re-rank the stocks into portfolios by their updated β.
    rolling_ranks = pd.qcut(rolling_β_i, portfolio_nums, duplicates='drop', labels=False)

    rolling_β_p_t = rolling_β_i.groupby(rolling_ranks.values).mean()

    # Residuals of each stock from a regression on the market excess return.
    rolling_ε_i_t = rolling_R_i.apply(lambda one_stock: sm.OLS(one_stock,sm.add_constant(rolling_R_m-rolling_r_f)).fit().resid)
    # Per month: mean and standard deviation of residuals, and mean return, across the stocks in each portfolio.
    rolling_ε_p_t = rolling_ε_i_t.groupby(rolling_ranks.values,axis=1).mean()
    rolling_ε_i_t_std = rolling_ε_i_t.groupby(rolling_ranks.values,axis=1).std()
    rolling_R_p_t = rolling_R_i.groupby(rolling_ranks.values,axis=1).mean()

    # Months between the previous and the current estimation date (both ends inclusive).
    time_slicer = np.logical_and(β_p_t.index >= last_end_time, β_p_t.index <= end_time)
    rolling_time_slicer = np.logical_and(rolling_ε_p_t.index >= last_end_time, rolling_ε_p_t.index <= end_time)
    last_end_time = end_time

    # Store the estimates for those months.
    β_p_t[time_slicer] = rolling_β_p_t
    s_ε_p_t[time_slicer] = rolling_ε_p_t[rolling_time_slicer]
    s_ε_i_t_std[time_slicer] = rolling_ε_i_t_std[rolling_time_slicer]
    R_p_t[time_slicer] = rolling_R_p_t[rolling_time_slicer]

# Reshape to long format with a (time, portfolio) index.
β_p_t = β_p_t.stack().astype(float)
s_ε_p_t = s_ε_p_t.stack().astype(float)
s_ε_i_t_std = s_ε_i_t_std.stack().astype(float)
R_p_t = R_p_t.stack().astype(float)

In [15]:
analysis_co_time = np.intersect1d(R_m.index,β_p_t.unstack().index)

analysis_R_m = pd.Series(np.repeat(R_m[analysis_co_time], 20).values, index=β_p_t.index)
analysis_r_f = pd.Series(np.repeat(r_f[analysis_co_time], 20).values, index=β_p_t.index)
data = pd.concat([R_p_t, β_p_t, β_p_t**2, s_ε_p_t, s_ε_i_t_std, analysis_R_m, analysis_r_f], axis=1)
data['const'] = 1
data.columns = [ 'R_p', 'β', 'β_squared', 'ε_p', 'ε_i_std', 'R_m', 'r_f', 'const']
data.index.names = ['time', 'portfolio']

In [16]:
data

time,portfolio,R_p,β,β_squared,ε_p,ε_i_std,R_m,r_f,const
2019-02,0,0.159332,0.195633,0.038272,0.126680,0.089176,0.146093,0.00125,1
2019-02,1,0.160791,0.477769,0.228263,0.102116,0.066019,0.146093,0.00125,1
2019-02,2,0.160404,0.581532,0.338179,0.087495,0.080543,0.146093,0.00125,1
2019-02,3,0.171511,0.645488,0.416654,0.087987,0.075783,0.146093,0.00125,1
2019-02,4,0.166387,0.715000,0.511224,0.079047,0.062872,0.146093,0.00125,1
...,...,...,...,...,...,...,...,...,...
2022-01,15,-0.123984,1.110933,1.234171,-0.038073,0.087064,-0.076229,0.00125,1
2022-01,16,-0.101567,1.176856,1.384991,-0.007439,0.084498,-0.076229,0.00125,1
2022-01,17,-0.112727,1.272809,1.620043,-0.013147,0.087585,-0.076229,0.00125,1
2022-01,18,-0.104992,1.446125,2.091277,0.005371,0.082343,-0.076229,0.00125,1


## Table 2

**$r\left(R_{p}, R_{m}\right)^{2}$ not included**

In [17]:
def get_table_2(start_time,end_time):
    # Select the months in [start_time, end_time] and pivot portfolios into columns.
    full_time = data.index.get_level_values(0)
    time_span = np.logical_and(full_time >= dt(start_time).to_period('M'), full_time <= dt(end_time).to_period('M'))
    panel = data[time_span].unstack()
    table = pd.DataFrame(index=['β_p', 's(β_p)', 's(R_p)', 's(ε_p)', 'bar_s_p(ε_i)', 's(ε_p)/bar_s_p(ε_i)'], columns=np.arange(portfolio_nums))
    # Time-series mean or standard deviation of each series, per portfolio.
    table.loc['β_p'] = panel.β.mean()
    table.loc['s(β_p)'] = panel.β.std()
    table.loc['s(R_p)'] = panel.R_p.std()
    table.loc['s(ε_p)'] = panel.ε_p.std()
    table.loc['bar_s_p(ε_i)'] = panel.ε_i_std.mean()
    table.loc['s(ε_p)/bar_s_p(ε_i)'] = table.loc['s(ε_p)']/table.loc['bar_s_p(ε_i)']
    return table

Before Pandemic

In [18]:
get_table_2('2019-2','2020-2')

statistic,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
β_p,0.196167,0.475872,0.58053,0.645178,0.713277,0.762017,0.808582,0.857068,0.911636,0.965895,1.011405,1.049687,1.09809,1.155596,1.210375,1.271478,1.339382,1.429548,1.578825,2.084402
s(β_p),0.001928,0.00684,0.003611,0.001117,0.00621,0.006494,0.006441,0.008321,0.013802,0.017809,0.019734,0.019646,0.021178,0.021055,0.016639,0.015843,0.01596,0.00658,0.006498,0.038564
s(R_p),0.065079,0.054367,0.056578,0.059052,0.058046,0.062882,0.063555,0.065931,0.07234,0.070696,0.077146,0.071656,0.07728,0.084359,0.081188,0.084695,0.086093,0.10638,0.111771,0.159233
s(ε_p),0.062195,0.041324,0.038,0.040129,0.033959,0.0351,0.035233,0.036109,0.043211,0.036131,0.040409,0.03575,0.034928,0.035283,0.030451,0.036169,0.038985,0.046499,0.051826,0.079707
bar_s_p(ε_i),0.133241,0.090136,0.094315,0.099123,0.080942,0.082387,0.073155,0.074073,0.101111,0.086554,0.076565,0.081147,0.086652,0.080877,0.097004,0.100802,0.084202,0.093169,0.105858,0.159879
s(ε_p)/bar_s_p(ε_i),0.46679,0.458463,0.402901,0.40484,0.419549,0.426043,0.48162,0.487483,0.427365,0.417436,0.527772,0.440555,0.40308,0.436257,0.313914,0.35881,0.462993,0.499087,0.48958,0.498545


In Pandemic

In [19]:
get_table_2('2020-3','2021-2')

statistic,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
β_p,0.194826,0.446751,0.559939,0.632746,0.683968,0.731443,0.77748,0.819872,0.857038,0.897583,0.936588,0.975236,1.019207,1.07766,1.145687,1.209765,1.277146,1.396411,1.547943,1.9441
s(β_p),0.026876,0.022018,0.029699,0.030183,0.029937,0.031044,0.033479,0.032912,0.030011,0.031319,0.03166,0.031398,0.029091,0.027228,0.032253,0.031124,0.031583,0.038925,0.032064,0.041402
s(R_p),0.051377,0.047197,0.045574,0.043222,0.044935,0.056464,0.050679,0.060602,0.056461,0.05952,0.057292,0.067274,0.062367,0.06722,0.075359,0.063678,0.078962,0.089962,0.104908,0.151189
s(ε_p),0.053429,0.048896,0.039396,0.037404,0.035688,0.045802,0.040782,0.045308,0.044215,0.044986,0.044527,0.041804,0.038321,0.038992,0.042063,0.033541,0.036525,0.044448,0.051266,0.08722
bar_s_p(ε_i),0.143287,0.102715,0.108051,0.110087,0.085355,0.099634,0.098511,0.096831,0.109813,0.096763,0.117155,0.095737,0.101396,0.108286,0.102265,0.109889,0.114125,0.131715,0.138711,0.16123
s(ε_p)/bar_s_p(ε_i),0.372879,0.476035,0.364607,0.339772,0.418117,0.4597,0.413987,0.467911,0.402635,0.464914,0.38007,0.436651,0.37793,0.360086,0.411318,0.305228,0.320044,0.337452,0.369587,0.540968


Post Pandemic

In [20]:
get_table_2('2021-1','2022-2')

statistic,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
β_p,0.116644,0.382702,0.473544,0.544945,0.596882,0.641138,0.680091,0.724133,0.769739,0.806477,0.844489,0.8839,0.934582,0.998454,1.051866,1.119226,1.185272,1.283181,1.454669,1.823662
s(β_p),0.025822,0.021154,0.028534,0.028999,0.028763,0.029826,0.032166,0.031621,0.028833,0.03009,0.030418,0.030166,0.02795,0.02616,0.030987,0.029903,0.030344,0.037397,0.030806,0.039778
s(R_p),0.074007,0.063552,0.070523,0.053331,0.06729,0.054949,0.050531,0.054558,0.056302,0.055331,0.047582,0.055551,0.050471,0.059531,0.055331,0.061947,0.053077,0.059869,0.054927,0.062975
s(ε_p),0.075099,0.066519,0.075399,0.057313,0.06603,0.055575,0.054489,0.054206,0.055351,0.053899,0.049527,0.052452,0.049125,0.050023,0.050363,0.055521,0.043858,0.047036,0.034081,0.044875
bar_s_p(ε_i),0.141859,0.168146,0.178299,0.105935,0.138619,0.126495,0.099412,0.117143,0.138849,0.11818,0.111883,0.105498,0.115805,0.123159,0.115846,0.117774,0.12095,0.123477,0.138807,0.127004
s(ε_p)/bar_s_p(ε_i),0.529391,0.395601,0.422883,0.541022,0.476342,0.439345,0.548114,0.462731,0.39864,0.456072,0.442669,0.497185,0.424206,0.40617,0.434743,0.471422,0.362611,0.38093,0.245528,0.353337


## Table 3 

**(panel D, and $r^2, ρ$ not included)**

The famous Fama-MacBeth (1973) regression

$R_{p t}=\widehat{\gamma}_{0 t}+\widehat{\gamma}_{1 t} \widehat{\beta}_{p, t-1}+\widehat{\gamma}_{2 t} \widehat{\beta}_{p, t-1}^{2}+\widehat{\gamma}_{3 t} \bar{s}_{p, t-1}\left(\widehat{\epsilon}_{i}\right)+\widehat{\eta}_{p t}$,
$p=1,2, \ldots, 20 .$

In [21]:
R_p_advanced = data.groupby('portfolio').R_p.shift(-1)
# advancing R_p one time_span means lagging other variables one time_span
data['R_p'] = R_p_advanced
data = data.dropna()
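The lagging trick can be seen on a tiny panel. The sketch below uses a made-up two-portfolio, three-month frame (`panel` and its values are hypothetical) to show how a group-wise `shift(-1)` places next month's return beside this month's β.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.period_range("2021-01", periods=3, freq="M"), [0, 1]],
    names=["time", "portfolio"],
)
panel = pd.DataFrame(
    {"R_p": [0.01, 0.02, 0.03, 0.04, 0.05, 0.06],
     "β": [0.8, 1.2, 0.8, 1.2, 0.9, 1.1]},
    index=idx,
)

# Within each portfolio, pull next month's return onto this month's row,
# so β observed at t sits next to the return realized at t + 1.
panel["R_p_next"] = panel.groupby("portfolio").R_p.shift(-1)

# The last month has no following return and is dropped.
aligned = panel.dropna()

print(aligned)
```

Grouping by portfolio matters: a plain `shift(-1)` on the stacked frame would pair each row with the next portfolio in the same month rather than with the same portfolio in the next month.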

In [22]:
data

time,portfolio,R_p,β,β_squared,ε_p,ε_i_std,R_m,r_f,const
2019-02,0,0.125723,0.195633,0.038272,0.126680,0.089176,0.146093,0.00125,1
2019-02,1,0.082270,0.477769,0.228263,0.102116,0.066019,0.146093,0.00125,1
2019-02,2,0.074955,0.581532,0.338179,0.087495,0.080543,0.146093,0.00125,1
2019-02,3,0.069643,0.645488,0.416654,0.087987,0.075783,0.146093,0.00125,1
2019-02,4,0.068787,0.715000,0.511224,0.079047,0.062872,0.146093,0.00125,1
...,...,...,...,...,...,...,...,...,...
2021-12,15,-0.123984,1.110933,1.234171,0.023402,0.112066,0.022423,0.00125,1
2021-12,16,-0.101567,1.176856,1.384991,0.057329,0.169822,0.022423,0.00125,1
2021-12,17,-0.112727,1.272809,1.620043,0.049047,0.155949,0.022423,0.00125,1
2021-12,18,-0.104992,1.446125,2.091277,0.047225,0.110740,0.022423,0.00125,1


In [23]:
def fmreg(data, formula):
    result = sm.formula.ols(formula, data=data).fit().params[:]
    result.index = ['γ_0','γ_1','γ_2','γ_3']
    return result

params = data.groupby('time').apply(fmreg, 'R_p ~ β + β_squared + ε_p')
params.head()

time,γ_0,γ_1,γ_2,γ_3
2019-02,0.288287,-0.280493,0.167974,-1.032684
2019-03,-0.042432,0.013517,0.000113,-0.141174
2019-04,0.08218,-0.220565,0.060703,-0.227822
2019-05,-0.039406,0.056425,-0.017638,-0.223604
2019-06,-0.003664,-0.022087,0.009152,0.078605


Recall the testable hypotheses

$$\widetilde{R}_{i t}=\widetilde{\gamma}_{0 t}+\widetilde{\gamma}_{1 t} \beta_{i}+\widetilde{\gamma}_{2 t} \beta_{i}{ }^{2}+\widetilde{\gamma}_{3 t} s_{i}+\widetilde{\eta}_{i t}\tag{7}$$


C1 (linearity) $\cdots E\left(\widetilde{\gamma}_{2 t}\right)=\mathbf{0}$

C2 (no systematic effects of non- $\beta$ risk) $\cdots E\left(\widetilde{\gamma}_{3 t}\right)=\mathbf{0}$

C3 (positive expected return-risk tradeoff) $$E\left(\widetilde{\gamma}_{1 t}\right)=E\left(\widetilde{R}_{m t}\right)-E\left(\widetilde{R}_{0 t}\right)>\mathbf{0}$$

Sharpe - Lintner (S-L) Hypothesis $\cdots E\left(\widetilde{\gamma}_{0,t}\right)=R_{ft}$
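Each hypothesis is tested with a $t$-statistic on the time series of monthly coefficients, $t(\bar{\gamma}_j)=\bar{\gamma}_j /\left(s(\hat{\gamma}_j) / \sqrt{n}\right)$. The sketch below computes it by hand on a simulated coefficient series (the 36 values of `gamma_1` are invented) and compares it with `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
gamma_1 = rng.normal(0.005, 0.06, 36)   # hypothetical monthly γ_1 estimates

# Manual statistic: mean divided by its standard error (sample std with ddof=1).
n = len(gamma_1)
t_manual = gamma_1.mean() / (gamma_1.std(ddof=1) / np.sqrt(n))

# scipy's one-sample t-test against zero gives the same statistic.
t_scipy, p_two_sided = stats.ttest_1samp(gamma_1, 0.0)

print(t_manual, t_scipy, p_two_sided)
```

For a one-sided hypothesis such as C3, the statistic is the same and only the p-value changes.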

In [24]:
def get_table_3(start_time, end_time):
    full_time = params.index
    time_span = np.logical_and(full_time >= dt(start_time).to_period(
        'M'), full_time <= dt(end_time).to_period('M'))
    table = pd.DataFrame(columns=['γ_0', 'γ_1', 'γ_2', 'γ_3', 'γ_0-r_f', 's(γ_0)', 's(γ_1)', 's(γ_2)', 's(γ_3)',
                         't(γ_0)', 't(γ_1)', 't(γ_2)', 't(γ_3)', 't(γ_0-r_f)'], index=['0'])

    table['γ_0'] = params.loc[time_span].γ_0.mean()
    table['γ_1'] = params.loc[time_span].γ_1.mean()
    table['γ_2'] = params.loc[time_span].γ_2.mean()
    table['γ_3'] = params.loc[time_span].γ_3.mean()
    r_f_in_time_span = r_f.loc[np.intersect1d(r_f.index, full_time[time_span])]
    γ_0_minus_r_f = params.loc[time_span].γ_0 - r_f_in_time_span
    table['γ_0-r_f'] = γ_0_minus_r_f.mean()

    table['s(γ_0)'] = params.loc[time_span].γ_0.std()
    table['s(γ_1)'] = params.loc[time_span].γ_1.std()
    table['s(γ_2)'] = params.loc[time_span].γ_2.std()
    table['s(γ_3)'] = params.loc[time_span].γ_3.std()

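    # Note: r_f is constant in this sample, so this two-sample test returns the same
    # statistic as t(γ_0-r_f) below; the paper's t(γ_0) tests γ_0 against zero instead.
    table['t(γ_0)'] = stats.ttest_ind(params.loc[time_span].γ_0, r_f_in_time_span.values)[0]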
    table['t(γ_1)'] = stats.ttest_1samp(params.loc[time_span].γ_1, 0, alternative='greater')[0]
    table['t(γ_2)'] = stats.ttest_1samp(params.loc[time_span].γ_2, 0)[0]
    table['t(γ_3)'] = stats.ttest_1samp(params.loc[time_span].γ_3, 0)[0]
    table['t(γ_0-r_f)'] = stats.ttest_1samp(params.loc[time_span].γ_0 - r_f_in_time_span.values, 0)[0]
    
    return table

get_table_3('2019-2', '2022-2')


,γ_0,γ_1,γ_2,γ_3,γ_0-r_f,s(γ_0),s(γ_1),s(γ_2),s(γ_3),t(γ_0),t(γ_1),t(γ_2),t(γ_3),t(γ_0-r_f)
0,0.030207,-0.041984,0.020351,-0.049683,0.028957,0.07899,0.110017,0.046806,0.35566,2.168785,-2.257664,2.572257,-0.826427,2.168785


In [25]:
time_span_set = list(combinations(third_time_span_R_i.index[::12].append(pd.Index([dt('2022-02').to_period('M')])),2))

table = pd.DataFrame(columns=['γ_0', 'γ_1', 'γ_2', 'γ_3', 'γ_0-r_f', 's(γ_0)', 's(γ_1)', 's(γ_2)', 's(γ_3)',
                         't(γ_0)', 't(γ_1)', 't(γ_2)', 't(γ_3)', 't(γ_0-r_f)'], index=time_span_set)
                         
for rank, time_span in enumerate(time_span_set):
    result = get_table_3(str(time_span[0]), str(time_span[1]))
    table.iloc[rank] = result.values

table

period,γ_0,γ_1,γ_2,γ_3,γ_0-r_f,s(γ_0),s(γ_1),s(γ_2),s(γ_3),t(γ_0),t(γ_1),t(γ_2),t(γ_3),t(γ_0-r_f)
"(2019-02, 2020-02)",0.050366,-0.081624,0.034198,-0.162286,0.049116,0.086969,0.118171,0.050788,0.495918,2.036233,-2.490433,2.427805,-1.179893,2.036233
"(2019-02, 2021-02)",0.02726,-0.042797,0.02466,-0.082055,0.02601,0.080944,0.113317,0.05187,0.403979,1.606639,-1.888401,2.377125,-1.015587,1.606639
"(2019-02, 2022-02)",0.030207,-0.041984,0.020351,-0.049683,0.028957,0.07899,0.110017,0.046806,0.35566,2.168785,-2.257664,2.572257,-0.826427,2.168785
"(2020-02, 2021-02)",0.011235,-0.018075,0.018459,-0.025848,0.009985,0.073417,0.110558,0.053085,0.279634,0.49039,-0.58947,1.253709,-0.333273,0.49039
"(2020-02, 2022-02)",0.022687,-0.027586,0.014598,-0.001023,0.021437,0.074691,0.107169,0.043981,0.238279,1.376473,-1.23449,1.591767,-0.020596,1.376473
"(2021-02, 2022-02)",0.0408,-0.045201,0.009967,0.068027,0.03955,0.074346,0.103115,0.028826,0.210027,1.764343,-1.453878,1.146748,1.074246,1.764343


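C1 does not hold over the full sample. The mean of $\hat{\gamma}_2$ is positive in every window and $t(\hat{\gamma}_2)$ exceeds 2 in all three windows that start in 2019-02, so linearity is rejected there. In the windows that start in 2020-02 or later the $t$-statistics (1.15 to 1.59) are too small to reject it.

C2 holds: $t(\hat{\gamma}_3)$ is below 1.2 in absolute value in every window, and it is closest to zero in the pandemic and post-pandemic windows, especially 2020-02 to 2022-02. In those windows $β$ is more likely to be the only risk measure that matters for asset returns.

C3 does not hold. The mean of $\hat{\gamma}_1$ is negative in every window, so riskier portfolios generally earned lower returns, and $t(\hat{\gamma}_1)$ is below $-2$ for 2019-02 to 2020-02 and for the full sample. (In the pandemic window 2020-02 to 2021-02 the negative estimate is not significant under the normality assumption, so a positive risk-return trade-off cannot be ruled out there.)

The S-L hypothesis is supported only for the pandemic window (2020-02 to 2021-02), where $t(\hat{\gamma}_0-R_f)=0.49$ and the intercept $\hat{\gamma}_0$ is closest to the risk-free rate. In the other windows the $t$-statistic ranges from 1.38 to 2.17, so $\hat{\gamma}_0$ tends to sit above $R_f$.

However, Fama (1965a) and Blume (1970) suggest that distributions of common stock returns are “thick-tailed” relative to the normal distribution.

Large $t$-statistics interpreted under the assumption of normality therefore overstate significance. A hypothesis that cannot be rejected under normality is on even firmer ground once the thick tails are taken into account.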


## Table 4

**$ρ$ not included**

In [26]:
def fmreg_table_4(data, formula):
    result = sm.formula.ols(formula, data=data).fit().params[:]
    result.index = ['γ_0','γ_1']
    return result

params_table_4 = data.groupby('time').apply(fmreg_table_4, 'R_p ~ β')
params_table_4.head()

time,γ_0,γ_1
2019-02,0.048065,0.063002
2019-03,-0.050463,0.012001
2019-04,0.027166,-0.086083
2019-05,-0.028463,0.019756
2019-06,-0.015028,-0.003947


In [27]:
def get_table_4(start_time, end_time):
    full_time = R_m.index
    time_span = np.logical_and(full_time >= dt(start_time).to_period(
        'M'), full_time <= dt(end_time).to_period('M'))

    params_time = params_table_4.index
    params_time_span = np.logical_and(params_time >= dt(start_time).to_period(
        'M'), params_time <= dt(end_time).to_period('M'))

    table = pd.DataFrame(columns=['R_m', 'R_m-R_f', 'γ_1', 'γ_0', 'R_f', '(R_m-R_f)/s(R_m)', 'γ_1/s(R_m)', 's(R_m)', 's(γ_0)',
                         's(γ_1)', 's(R_f)', 't(R_m)', 't(R_m-R_f)', 't(γ_1)', 't(γ_0)'], index=['0'])

    table['R_m'] = R_m.loc[time_span].mean()
    table['R_m-R_f'] = (R_m.loc[time_span] - r_f.loc[time_span]).mean()
    table['γ_1'] = params_table_4.loc[params_time_span].γ_1.mean()
    table['γ_0'] = params_table_4.loc[params_time_span].γ_0.mean()
    table['R_f'] = r_f.loc[time_span].mean()
    table['(R_m-R_f)/s(R_m)'] = table['R_m-R_f']/R_m.loc[time_span].std()
    table['γ_1/s(R_m)'] = table['γ_1']/R_m.loc[time_span].std()
    table['s(R_m)'] = R_m.loc[time_span].std()
    table['s(γ_0)'] = params_table_4.loc[params_time_span].γ_0.std()
    table['s(γ_1)'] = params_table_4.loc[params_time_span].γ_1.std()
    table['s(R_f)'] = r_f.loc[time_span].std()
    table['t(R_m)'] = stats.ttest_1samp(R_m.loc[time_span], 0)[0]
    table['t(R_m-R_f)'] = stats.ttest_1samp(R_m.loc[time_span] - r_f.loc[time_span], 0)[0]
    table['t(γ_1)'] = stats.ttest_1samp(params_table_4.loc[params_time_span].γ_1, 0)[0]
    table['t(γ_0)'] = stats.ttest_1samp(params_table_4.loc[params_time_span].γ_0, 0)[0]
    
    return table

get_table_4('2019-2', '2022-2')


,R_m,R_m-R_f,γ_1,γ_0,R_f,(R_m-R_f)/s(R_m),γ_1/s(R_m),s(R_m),s(γ_0),s(γ_1),s(R_f),t(R_m),t(R_m-R_f),t(γ_1),t(γ_0)
0,0.010963,0.009713,0.000283,0.009637,0.00125,0.191095,0.00556,0.05083,0.053842,0.063187,2.1983149999999996e-19,1.311972,1.162387,0.026462,1.058853


In [28]:
time_span_set = list(combinations(third_time_span_R_i.index[::12].append(pd.Index([dt('2022-02').to_period('M')])),2))

table = pd.DataFrame(columns=['R_m', 'R_m-R_f', 'γ_1', 'γ_0', 'R_f', '(R_m-R_f)/s(R_m)', 'γ_1/s(R_m)', 's(R_m)', 's(γ_0)',
                         's(γ_1)', 's(R_f)', 't(R_m)', 't(R_m-R_f)', 't(γ_1)', 't(γ_0)'], index=time_span_set)
                         
for rank, time_span in enumerate(time_span_set):
    result = get_table_4(str(time_span[0]), str(time_span[1]))
    table.iloc[rank] = result.values

table

period,R_m,R_m-R_f,γ_1,γ_0,R_f,(R_m-R_f)/s(R_m),γ_1/s(R_m),s(R_m),s(γ_0),s(γ_1),s(R_f),t(R_m),t(R_m-R_f),t(γ_1),t(γ_0)
"(2019-02, 2020-02)",0.017387,0.016137,-0.008712,0.010387,0.00125,0.298403,-0.1611,0.054078,0.032814,0.044407,0.0,1.159247,1.075906,-0.707362,1.141297
"(2019-02, 2021-02)",0.021949,0.020699,0.008854,0.000884,0.00125,0.391277,0.167371,0.0529,0.047832,0.065503,0.0,2.074531,1.956384,0.675847,0.092446
"(2019-02, 2022-02)",0.010963,0.009713,0.000283,0.009637,0.00125,0.191095,0.00556,0.05083,0.053842,0.063187,0.0,1.311972,1.162387,0.026462,1.058853
"(2020-02, 2021-02)",0.023595,0.022345,0.019914,-0.006073,0.00125,0.424823,0.378605,0.052599,0.058652,0.082089,0.0,1.617405,1.53172,0.874684,-0.373334
"(2020-02, 2022-02)",0.006547,0.005297,0.002062,0.010271,0.00125,0.108099,0.042084,0.048999,0.062603,0.072844,0.0,0.66805,0.540496,0.13576,0.786799
"(2021-02, 2022-02)",-0.011224,-0.012474,-0.026272,0.036667,0.00125,-0.335128,-0.70582,0.037221,0.063143,0.05407,0.0,-1.087235,-1.208319,-1.611493,1.925961


The table above gives some perspective on the behavior of the market during different periods and on the interpretation of the coefficients in the risk-return regressions,

e.g. the average market return is negative in the post-pandemic year (2021-02 to 2022-02).

Fama and MacBeth (1973) found that the "Trade-off of average return for risk between common stocks and short-term bonds has been more consistently large through time than the trade-off of average return for risk among common stocks."

We cannot draw the same conclusion here. The pandemic narrowed the gap between the two trade-offs: in the 2020-02 to 2021-02 window, $(R_m-R_f)/s(R_m)$ and $γ_1/s(R_m)$ are close (0.42 versus 0.38).