In [38]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import warnings
import statsmodels.api as sm

# Import the data: one sheet each for factors, portfolios, and the risk-free rate
data_path = '/Users/luxueqi/Desktop/factor_pricing_data.xlsx'
df_f = pd.read_excel(data_path, sheet_name='factors (excess returns)', index_col=0)
df_p = pd.read_excel(data_path, sheet_name='portfolios (excess returns)', index_col=0)
df_rf = pd.read_excel(data_path, sheet_name='risk-free rate', index_col=0)


## Part 3

### Question 1

In [39]:
# Equity market index (the single CAPM factor)
mkt = df_f[["MKT"]]
n = 12  # not used here; all portfolios are regressed and displayed instead
names = df_p.columns.tolist()

# CAPM time-series regression (adapted from the TA session)
def reg(name):
    y = df_p[name]
    model = sm.OLS(y, sm.add_constant(mkt)).fit()

    summary = dict()
    # Annualize: means scale by 12 and volatilities by sqrt(12)
    summary["alpha"] = model.params["const"] * 12
    summary["beta"] = model.params["MKT"]
    summary["information ratio"] = (model.params["const"] / model.resid.std()) * np.sqrt(12)
    summary["treynor ratio"] = (y.mean() / summary["beta"]) * 12

    return pd.DataFrame(summary, index=[name])

# Run one regression per portfolio and stack the results
df = pd.concat([reg(name) for name in names], axis=0)

### Question 2

In [40]:
# report the data:
display(df)

portfolio,alpha,beta,information ratio,treynor ratio
Agric,0.020689,0.798755,0.115872,0.110571
Food,0.04677,0.583783,0.390816,0.164785
Soda,0.046911,0.705939,0.246519,0.151122
Beer,0.061657,0.627174,0.445491,0.18298
Smoke,0.077661,0.618298,0.386822,0.210275
Toys,-0.036077,1.120257,-0.204076,0.052466
Fun,0.006761,1.303659,0.040342,0.089856
Books,-0.015862,1.061521,-0.130391,0.069727
Hshld,0.022189,0.687313,0.203497,0.116954
Clths,0.004032,1.072981,0.027323,0.088428


### Question 3

If the CAPM holds, each asset's expected excess return equals its beta times the expected excess return of the market. The Treynor ratio divides an asset's mean excess return by its beta, so it should equal the (annualized) mean excess return of the market, which is the same number in every regression. Thus, the Treynor ratios should be identical across all assets.

Alpha is the mean excess return beyond what the CAPM predicts, so it vanishes when the CAPM holds. Mathematically, alpha is the difference between an asset's mean excess return and the product of its beta and the market's mean excess return. Under the CAPM, that difference is 0 for every asset.

The information ratios should also be 0. The information ratio equals alpha divided by the volatility of the regression residual (the tracking error), so zero alphas imply zero information ratios.
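
The three predictions can be written out using the definitions in `reg` (monthly estimates annualized by 12 and $\sqrt{12}$). The notation below, with $\tilde r^i$ for an asset's excess return and $\tilde r^m$ for the market's, is introduced here for illustration and is an assumption rather than the assignment's own notation:

$$
\begin{aligned}
\tilde r^i_t &= \alpha^i + \beta^i\,\tilde r^m_t + \epsilon^i_t \\
\text{CAPM:}\quad \mathbb{E}[\tilde r^i] &= \beta^i\,\mathbb{E}[\tilde r^m]
  \;\Longrightarrow\; \alpha^i = \mathbb{E}[\tilde r^i] - \beta^i\,\mathbb{E}[\tilde r^m] = 0 \\
\text{Treynor}^i &= 12\,\frac{\mathbb{E}[\tilde r^i]}{\beta^i} = 12\,\mathbb{E}[\tilde r^m] \\
\text{IR}^i &= \sqrt{12}\,\frac{\alpha^i}{\sigma(\epsilon^i)} = 0
\end{aligned}
$$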

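However, the estimates in Question 2 satisfy none of these predictions: the alphas and information ratios are nonzero, and the Treynor ratios vary widely across portfolios (among the rows shown, from about 0.05 for Toys to about 0.21 for Smoke).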
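
To see what these predictions look like in a finite sample, here is a minimal sketch on simulated data where the CAPM holds by construction. Every parameter (sample length, market mean and volatility, residual volatility, betas) is a made-up assumption, not an estimate from `factor_pricing_data.xlsx`, and the regression uses `numpy` least squares instead of `statsmodels` only to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical monthly parameters (assumptions, not taken from the data)
T = 600
true_betas = np.linspace(0.5, 1.5, 10)
mkt_sim = rng.normal(0.007, 0.045, size=T)            # market excess returns
eps = rng.normal(0.0, 0.03, size=(T, true_betas.size))
r_sim = np.outer(mkt_sim, true_betas) + eps           # true alpha is 0 for every asset

# Time-series OLS of each asset on a constant and the market
X_sim = np.column_stack([np.ones(T), mkt_sim])
coef, *_ = np.linalg.lstsq(X_sim, r_sim, rcond=None)
alpha_m, beta_hat = coef                              # monthly intercepts, slopes
resid = r_sim - X_sim @ coef

# Same annualization as in reg()
alpha_ann = alpha_m * 12
treynor = r_sim.mean(axis=0) / beta_hat * 12
info_ratio = alpha_m / resid.std(axis=0, ddof=1) * np.sqrt(12)

print("annualized market mean:", 12 * mkt_sim.mean())
print("alphas:", alpha_ann.round(4))
print("Treynor ratios:", treynor.round(4))
print("information ratios:", info_ratio.round(4))
```

Even with true alphas of zero, the estimates scatter around zero because of sampling error. In-sample, each Treynor ratio differs from the annualized market mean by exactly alpha divided by beta, so dispersion in the Treynor ratios carries the same evidence as nonzero alphas.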

### Question 4

In [41]:
# mean-absolute-error of the estimated alphas
df['absolute alpha'] = df["alpha"].abs()
MAE=df['absolute alpha'].mean()
print(MAE)

0.020171975258631777


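According to Question 3, if the CAPM is true, the alphas are zero (or close to zero in a finite sample), so the MAE should be small. Our MAE is 0.0202, i.e. an average pricing error of about 2% per year. No formal significance test is run here, but an error of this size is not negligible, so the result does not support the CAPM as a complete pricing model. It may indicate the presence of other priced risk factors.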
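
Whether an alpha is distinguishable from zero can be checked with its t-statistic. The helper below is a minimal sketch, not part of the original analysis: `alpha_tstats` is a hypothetical name, it uses classical (homoskedastic) OLS standard errors, and the demo data are simulated under assumed parameters.

```python
import numpy as np

def alpha_tstats(returns, factors):
    """t-statistics of the OLS intercepts, one per column of `returns`.

    Uses classical (homoskedastic) OLS standard errors.
    """
    returns = np.asarray(returns, dtype=float)
    factors = np.asarray(factors, dtype=float)
    if returns.ndim == 1:
        returns = returns[:, None]
    if factors.ndim == 1:
        factors = factors[:, None]
    T = returns.shape[0]
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    resid = returns - X @ coef
    dof = T - X.shape[1]                       # observations minus regressors
    sigma2 = (resid ** 2).sum(axis=0) / dof    # residual variance per asset
    xtx_inv = np.linalg.inv(X.T @ X)
    se_alpha = np.sqrt(sigma2 * xtx_inv[0, 0])
    return coef[0] / se_alpha

# Simulated demo (assumed parameters): asset 0 has zero alpha, asset 1 has 1% per month
rng = np.random.default_rng(1)
T = 600
mkt_sim = rng.normal(0.007, 0.045, size=T)
noise = rng.normal(0.0, 0.03, size=(T, 2))
true_alpha = np.array([0.0, 0.01])
r_sim = true_alpha + np.outer(mkt_sim, [0.8, 1.2]) + noise

t_alpha = alpha_tstats(r_sim, mkt_sim)
print(t_alpha)
```

With the notebook's data, a call along the lines of `alpha_tstats(df_p.values, df_f[["MKT"]].values)` would give one t-statistic per portfolio, assuming the sheets contain no missing values.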

## Part 4

### Question 1

In [49]:
# X now includes multiple factors
X = df_f[["MKT", "SMB", "HML", "UMD"]]

def multi_reg(name):
    y = df_p[name]
    model = sm.OLS(y, sm.add_constant(X)).fit()

    summary = dict()
    # Report the annualized alpha and the R-squared
    summary["Alpha"] = model.params["const"] * 12
    summary["R-Squared"] = model.rsquared

    return pd.DataFrame(summary, index=[name])

# Run one regression per portfolio and stack the results
df_m = pd.concat([multi_reg(name) for name in names], axis=0)
display(df_m)

portfolio,Alpha,R-Squared
Agric,0.00911,0.359935
Food,0.031202,0.425322
Soda,0.040552,0.282455
Beer,0.046922,0.377426
Smoke,0.062947,0.236922
Toys,-0.023239,0.536379
Fun,0.026688,0.627993
Books,-0.020611,0.692769
Hshld,0.015642,0.516465
Clths,0.013322,0.595241


### Question 2

In [50]:
# mean-absolute-error of the estimated alphas
df_m['absolute alpha'] = df_m["Alpha"].abs()
MAE_multi=df_m['absolute alpha'].mean()
print(MAE_multi)

0.021471995808584488


Similarly, the alphas should be 0 or close to 0 if this pricing model works. The MAE here is 0.0215 (about 2.1% per year), so the multi-factor model does not fully price these portfolios either. Moreover, its MAE is slightly larger than the CAPM's (0.0215 versus 0.0202), which suggests that at least one of the added factors (SMB, HML, UMD) worsens the pricing fit for these portfolios rather than improving it.
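
One way to look into which factor is responsible would be to recompute the MAE for different factor sets. The sketch below assumes a hypothetical helper `alpha_mae` and runs on simulated two-factor data with made-up parameters. In this simulation the second factor is priced by construction, so adding it lowers the MAE, which is the opposite of what the notebook's data show.

```python
import numpy as np

def alpha_mae(returns, factors, periods_per_year=12):
    """Mean absolute annualized intercept from regressing each column of
    `returns` on a constant and `factors`."""
    returns = np.asarray(returns, dtype=float)
    factors = np.asarray(factors, dtype=float)
    if factors.ndim == 1:
        factors = factors[:, None]
    X = np.column_stack([np.ones(len(returns)), factors])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return float(np.mean(np.abs(coef[0] * periods_per_year)))

# Simulated data (assumed parameters): returns load on two factors, true alpha is 0
rng = np.random.default_rng(2)
T, N = 600, 10
f1 = rng.normal(0.007, 0.045, size=T)
f2 = rng.normal(0.010, 0.030, size=T)
b1 = np.linspace(0.5, 1.5, N)
b2 = np.linspace(-1.0, 1.0, N)
r_sim = np.outer(f1, b1) + np.outer(f2, b2) + rng.normal(0.0, 0.02, size=(T, N))

mae_one = alpha_mae(r_sim, f1)                          # omits the second factor
mae_two = alpha_mae(r_sim, np.column_stack([f1, f2]))   # includes both factors
print("one-factor MAE:", mae_one)
print("two-factor MAE:", mae_two)
```

Applied to the notebook's data, the same helper could be called on subsets of the columns of `df_f` (for example `df_f[["MKT", "SMB"]].values`) with `df_p.values` as the returns, assuming no missing values, to see which addition raises the MAE.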