# Exercises of Week 4

*These exercises will not be graded. Solutions will be made available, but it is strongly advised that you try on your own first.*

*Make sure that any function you write has a docstring, and comments where appropriate.*

## Question 1

In this exercise you will download returns on 49 industry portfolios and create sorted portfolios from them to analyze whether there is an anomaly or mispricing. The exercises ask you to form $P=7$ portfolios sorted on momentum and to compute abnormal returns based on the Fama-French three-factor model.

Feel free to explore other options and see how they impact the results! In particular, you can consider the following choices:
* *Characteristic*: size, value, momentum, reversal, or ...
* *Number of sorts*: recommended to start with univariate sort.
* *Number of portfolios $P$*.
* *Value weighted or equal weighted portfolio returns*.
* *Asset pricing model*: consider for example CAPM or Fama-French three factor model (FF3)

**1.1.** Import the relevant packages:

In [1]:
import pandas as pd
import pandas_datareader.data as web
import pandas_datareader 
import numpy as np
import matplotlib.pyplot as plt
import time
%matplotlib inline

**1.2.** Download monthly value weighted returns on the 49 industry portfolios from [Kenneth French's website](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) for the period January 2010 to September 2021: 
download the data set `'49_industry_portfolios'` from source `'famafrench'` via `pandas-datareader`. The output of DataReader will be a `dict` object ([documentation](https://docs.python.org/3/library/stdtypes.html?highlight=dict#dict)). Use `dict['DESCR']` to see the data description. Index into the `dict` object to select the monthly value weighted returns and store it in a new DataFrame: use `dict[key]`, where `key` will be the index number as given in the data description.

*Note:* The column headers of the returns DataFrame are all strings of length 5. Trailing spaces are added if the industry name has fewer than 5 letters, e.g. 'Food ' instead of 'Food'. The spaces can be removed using the following code: `R.columns = [x.strip(' ') for x in R.columns]`, assuming that your DataFrame is called `R`. The method `strip(' ')` removes any leading and trailing spaces. It is not necessary to do this for the exercise, but it avoids unexpected behavior when trying to select a column.
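
A small illustration of the note above, on a made-up DataFrame. The name `R_demo` and its values are invented for the demo and are not the downloaded data:

```python
import pandas as pd

# Toy DataFrame with padded column names, mimicking the downloaded headers
R_demo = pd.DataFrame([[1.0, 2.0, 3.0]], columns=['Agric', 'Food ', 'Soda '])

print('Food' in R_demo.columns)   # False: the stored name is 'Food '
R_demo.columns = [x.strip(' ') for x in R_demo.columns]
print('Food' in R_demo.columns)   # True after stripping
```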

In [2]:
# Check available datasets
# pandas_datareader.famafrench.get_available_datasets()

In [3]:
startdate = pd.Timestamp(2000, 1, 1)   # pd.datetime is deprecated and removed in pandas 2.0
enddate = pd.Timestamp(2021, 10, 30)   # Up to the end of last month

# DataReader returns a dict of DataFrames plus a 'DESCR' entry
dict1 = web.DataReader('49_industry_portfolios', 'famafrench', start=startdate, end=enddate)
dict1
# 2nd way
# pandas_datareader.famafrench.FamaFrenchReader(symbols, start=None, end=None, retry_count=3, pause=0.1, timeout=30, session=None, freq=None)


{0:          Agric  Food   Soda   Beer   Smoke  Toys   Fun    Books  Hshld  Clths  \
 Date                                                                            
 2000-01  -4.50 -10.33  19.70  -2.28  -8.62 -13.81   3.76  -0.62  -6.51 -11.50   
 2000-02   8.16  -7.08  -8.28 -11.64  -4.01   0.42  -2.31  -0.61 -11.57 -12.73   
 2000-03   4.26  10.76  -0.29   0.13   5.11   7.69  10.70  13.12 -14.26  25.38   
 2000-04  -7.61  -4.08  -0.80   3.68   3.79   0.27   2.35  -7.54   4.29   4.30   
 2000-05  -2.47  18.06  -7.24  11.87  19.51   1.15   1.26  -6.34   2.79  -5.04   
 ...        ...    ...    ...    ...    ...    ...    ...    ...    ...    ...   
 2021-06  -2.35  -3.06  -1.45  -0.98   1.99  12.01   1.56  -0.23  -0.26   8.82   
 2021-07  -3.06  -2.29   4.53   1.09   0.96  -3.07  -7.45  -2.55   3.91   4.54   
 2021-08   2.62  -0.48  -0.09  -1.67   3.47  -9.69  10.28   0.47   0.03  -0.63   
 2021-09  -4.14  -2.21  -5.95  -2.97  -7.05  -7.82   2.17   1.27  -4.18  -9.11   
 2021-10   2.

In [4]:
dict1.keys()

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 'DESCR'])

In [5]:
dict1['DESCR']

'49 industry portfolios\n----------------------\n\nThis file was created by CMPT_IND_RETS using the 202110 CRSP database. It contains value- and equal-weighted returns for 49 industry portfolios. The portfolios are constructed at the end of June. The annual returns are from January to December. Missing data are indicated by -99.99 or -999. Copyright 2021 Kenneth R. French\n\n  0 : Average Value Weighted Returns -- Monthly (262 rows x 49 cols)\n  1 : Average Equal Weighted Returns -- Monthly (262 rows x 49 cols)\n  2 : Average Value Weighted Returns -- Annual (21 rows x 49 cols)\n  3 : Average Equal Weighted Returns -- Annual (21 rows x 49 cols)\n  4 : Number of Firms in Portfolios (262 rows x 49 cols)\n  5 : Average Firm Size (262 rows x 49 cols)\n  6 : Sum of BE / Sum of ME (22 rows x 49 cols)\n  7 : Value-Weighted Average of BE/ME (22 rows x 49 cols)'

**1.3.** We will need size information to compute value weighted portfolio returns. Calculate size per industry as the average firm size in the industry multiplied by the number of firms in the industry. Both DataFrames (average firm size and number of firms) are available in the same `dict` object as the returns. Store the resulting size in a new DataFrame.

In [6]:
# Industry size = average firm size (key 5) x number of firms (key 4)
size_per_industry = dict1[5] * dict1[4]

**1.4.** Before sorting and grouping the assets into portfolios, we need to compute or read the characteristic(s). Momentum is defined as the return from month $t-12$ to $t-2$.
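
The following sketch checks the momentum definition on synthetic data. It assumes momentum is measured as the sum of monthly log returns over months $t-12$ through $t-2$; the random returns and the names `R_demo`, `r_demo` and `mom_demo` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic monthly simple returns in percent (random numbers, illustration only)
rng = np.random.default_rng(0)
idx = pd.period_range('2000-01', periods=24, freq='M')
R_demo = pd.DataFrame(rng.normal(1.0, 5.0, size=(24, 3)), index=idx, columns=['A', 'B', 'C'])

# Log returns add up over time, so a multi-month return is a rolling sum
r_demo = np.log(1 + R_demo / 100) * 100

# shift(2) skips months t and t-1; rolling(11) then sums months t-12, ..., t-2
mom_demo = r_demo.shift(2).rolling(11).sum()

# Check one date by hand: for row 12 the window covers rows 0..10
t = 12
manual = r_demo.iloc[t - 12:t - 1].sum()
print(np.allclose(mom_demo.iloc[t], manual))
```

The first 12 rows of `mom_demo` are missing because a full 11-month window, lagged by two months, is not yet available.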

In [7]:
r = np.log(1+ dict1[0]/100)*100

In [8]:
r.rolling(2).sum()

Date,Agric,Food,Soda,Beer,Smoke,Toys,Fun,Books,Hshld,Clths,...,Boxes,Trans,Whlsl,Rtail,Meals,Banks,Insur,RlEst,Fin,Other
2000-01,,,,,,,,,,,...,,,,,,,,,,
2000-02,3.239749,-18.246520,9.338870,-14.681475,-13.106972,-14.442482,1.353937,-1.233798,-19.027461,-25.833106,...,-27.566260,-15.546785,9.060443,-19.686387,-20.494740,-14.574106,-21.198539,1.552227,-0.697829,-2.133457
2000-03,12.015902,2.876423,-8.933394,-12.245165,0.891107,7.827775,7.828267,11.716034,-27.680963,9.001552,...,-1.751428,6.608222,8.538590,9.426296,2.007324,2.019941,6.944946,5.471975,20.068119,-4.193720
2000-04,-3.743384,6.053984,-1.093638,3.743820,8.703668,7.678290,12.488178,4.488495,-11.184543,26.828012,...,3.149379,15.527470,4.159501,7.672694,18.573602,11.119570,20.194868,-0.553215,1.912893,6.817440
2000-05,-10.416160,12.436711,-8.318685,14.830634,21.542930,1.413074,3.574941,-14.389305,6.952318,-0.961326,...,-10.934495,-0.276706,-1.342775,-8.850078,-0.552345,4.758396,6.654363,-6.527969,-18.863785,18.776985
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-06,-8.693334,0.756567,0.509843,0.985625,3.989929,14.239408,0.684238,2.628481,0.029242,7.880865,...,-7.249836,-2.267316,-0.772450,0.801593,-2.030858,-0.249481,-0.569979,0.276944,6.066382,1.036007
2021-07,-5.485848,-5.424423,2.969777,0.099269,2.925880,8.223685,-6.194158,-2.813341,3.575157,12.892454,...,-5.086204,-9.408862,0.515354,2.868981,3.628709,-3.193194,-3.328503,3.176826,3.312402,-3.848731
2021-08,-0.521530,-2.797784,4.340352,-0.599999,4.366574,-13.310311,2.043125,-2.114177,3.865491,3.807966,...,8.064469,-5.384771,3.791708,2.153034,3.880332,1.464893,1.992115,7.088797,7.616730,2.085427
2021-09,-1.641873,-2.715942,-6.224404,-4.699099,-3.899694,-18.334899,11.932031,1.730902,-4.239880,-10.184013,...,2.548921,-4.151607,-3.584534,-1.417675,-0.661455,2.561614,-1.960727,1.821054,0.115893,-2.692519


In [9]:
# Monthly value weighted returns, converted to log returns (in percent)
dict1_avwr = dict1[0]
r = np.log(1 + dict1_avwr / 100) * 100

# shift(2) moves every observation two months forward; the last two months drop off the end
mom = r.shift(2)
mom.head(12)

Date,Agric,Food,Soda,Beer,Smoke,Toys,Fun,Books,Hshld,Clths,...,Boxes,Trans,Whlsl,Rtail,Meals,Banks,Insur,RlEst,Fin,Other
2000-01,,,,,,,,,,,...,,,,,,,,,,
2000-02,,,,,,,,,,,...,,,,,,,,,,
2000-03,-4.604394,-10.903392,17.981843,-2.306394,-9.014355,-14.861602,3.691035,-0.62193,-6.731571,-12.216763,...,-14.098769,-10.048354,7.315752,-15.677078,-7.677306,-2.583076,-7.547814,-0.692392,-7.042246,0.039992
2000-04,7.844143,-7.343128,-8.642973,-12.375081,-4.092617,0.41912,-2.337099,-0.611868,-12.295891,-13.616342,...,-13.46749,-5.498431,1.744691,-4.009309,-12.817434,-11.99103,-13.650724,2.244619,6.344417,-2.173449
2000-05,4.171759,10.219551,-0.290421,0.129916,4.983723,7.408654,10.165365,12.327902,-15.385072,22.617894,...,11.716062,12.106653,6.793898,13.435605,14.824758,14.010971,20.59567,3.227356,13.723702,-2.020271
2000-06,-7.915144,-4.165568,-0.803217,3.613905,3.719944,0.269636,2.322813,-7.839407,4.200529,4.210118,...,-8.566683,3.420817,-2.634398,-5.762911,3.748844,-2.891401,-0.400802,-3.780571,-11.810809,8.837711
2000-07,-2.501016,16.602278,-7.515467,11.21673,17.822986,1.143438,1.252128,-6.549898,2.751789,-5.171444,...,-2.367813,-3.697524,1.291623,-3.087166,-4.301189,7.649797,7.055166,-2.747398,-7.052977,9.939274
2000-08,-0.944446,2.459505,1.054421,5.87405,2.917038,-1.694272,0.28958,-0.030005,-2.562555,-7.010063,...,-3.23166,-1.969263,-2.091725,-2.16323,-3.832511,-10.402807,-2.490764,-1.065658,13.549197,2.273949
2000-09,-3.138747,-1.602776,12.398598,6.363186,-4.572985,-10.56939,-0.702461,-1.126319,-2.675474,5.808021,...,-0.020002,6.391333,3.9413,-2.11215,-1.979463,8.06579,9.830569,3.004412,7.973497,-1.653597
2000-10,-1.959065,-2.747398,-2.245013,-11.473768,17.49612,2.088048,5.760831,2.488772,3.440143,2.498525,...,1.567648,-3.635283,9.866817,-5.403389,-0.692392,9.721745,4.171759,0.069976,17.689017,12.795336


In [10]:
# mom = r.shift(2).rolling(2).sum() ; mom.head(20)

In [11]:
# mom = r.shift(2).rolling(11).sum() ; mom.head(20)

In [12]:
# Sum of log returns over months t-12, ..., t-2; the first 12 rows have no full window
mom = r.shift(2).rolling(11).sum().iloc[12:]
mom.head(20)

Date,Agric,Food,Soda,Beer,Smoke,Toys,Fun,Books,Hshld,Clths,...,Boxes,Trans,Whlsl,Rtail,Meals,Banks,Insur,RlEst,Fin,Other
2001-01,-17.637308,18.036377,28.297054,19.081487,54.787961,-15.918909,-10.789728,-9.578373,-21.350889,4.053025,...,-56.105607,4.006354,20.134188,-24.709199,-1.936516,10.589405,27.921869,0.707385,-1.409818,12.593869
2001-02,-1.575124,34.879804,5.385682,18.78428,79.519783,1.265506,-9.945193,0.683606,-11.517913,30.982395,...,-24.98194,16.260207,24.978233,-4.812416,8.102676,22.676222,41.964152,5.023326,21.392614,11.295999
2001-03,-3.450966,37.19881,16.243936,26.292843,83.71235,10.322843,8.075185,4.076444,-3.272976,48.174055,...,-4.664508,27.264258,21.335645,8.682446,17.087599,36.598483,44.611066,7.189964,23.509882,9.543377
2001-04,-13.544662,28.063361,23.505618,19.709075,87.785796,7.821773,-7.07225,-8.481722,11.026222,13.113153,...,-22.048188,8.254747,9.748694,-11.164354,2.222833,17.920272,23.302864,5.756423,-6.819279,12.805899
2001-05,-2.99454,29.430125,10.094859,6.23461,83.955791,6.728756,-19.299778,-7.438085,-2.451647,7.695777,...,-17.01314,3.687382,6.62018,-7.45233,-6.665868,19.056357,22.65822,8.005324,-8.865735,-4.380843
2001-06,-2.289556,10.797371,19.894049,-7.20668,71.572153,5.455234,-5.615123,4.047956,-3.448918,19.502159,...,-7.870117,10.447525,13.181945,0.494803,4.251541,14.129158,18.306188,14.530458,10.267312,-1.709729
2001-07,6.55449,10.94362,18.639428,-8.478286,70.899735,16.471214,3.717183,7.014422,4.353905,30.106834,...,-0.351667,16.914125,14.520843,4.28473,12.18865,28.895369,23.129535,20.093452,0.264481,-2.830354
2001-08,13.13338,11.843935,0.784651,-19.907663,74.901089,29.62687,2.184858,8.828371,7.875786,21.818301,...,-1.124802,7.363405,8.467394,3.865099,10.314818,21.646235,15.37722,20.422846,-9.984714,-1.899362
2001-09,14.972373,18.195591,2.054929,-7.637079,47.764891,24.348468,-11.718673,5.868492,8.684105,19.908042,...,3.529885,10.818526,-1.079935,14.53722,14.823455,12.760982,9.052448,20.292853,-32.278125,-24.78729
2001-10,15.987514,15.592075,16.469806,-7.032048,51.453336,25.787199,-2.132876,1.681977,11.464431,17.624394,...,15.348547,7.700187,-9.765354,3.278045,14.4289,2.810935,-1.562599,21.822058,-32.288929,-29.397834


**1.5.** Now that we have a characteristic, we can form the sorted portfolios. Create $P=7$ portfolios sorted on momentum. Compute the value weighted portfolio returns and store them in a new DataFrame `R_portf`.

*Note*: If you have portfolios sorted on momentum or reversal, these are re-sorted and regrouped each month. If you have portfolios based on accounting numbers, these are re-sorted annually.
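
As a hedged sketch of the sorting step for a single month, the snippet below groups made-up assets into $P=7$ portfolios with `pd.qcut` and computes value weighted and equal weighted returns. All numbers and the names `mom_t`, `ret_t` and `size_t` are illustrative assumptions, not the exercise data:

```python
import numpy as np
import pandas as pd

# One month of made-up data for 14 assets (all numbers are illustrative)
assets = [f'A{i}' for i in range(14)]
mom_t = pd.Series(np.arange(14, dtype=float), index=assets)       # characteristic
ret_t = pd.Series(np.linspace(-3.0, 3.5, 14), index=assets)       # return in month t
size_t = pd.Series(np.arange(1, 15, dtype=float), index=assets)   # lagged market cap

P = 7
# qcut with labels=False returns group codes 0..P-1; add 1 for labels 1..P
group = pd.qcut(mom_t, P, labels=False) + 1

# Value weighted return per group: sum(w * R) / sum(w)
vw = (ret_t * size_t).groupby(group).sum() / size_t.groupby(group).sum()
# Equal weighted alternative for comparison
ew = ret_t.groupby(group).mean()
print(vw)
```

In the full exercise the same steps would be repeated for every month, since momentum portfolios are re-sorted monthly.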

In [13]:
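P = 7
R_portf = pd.DataFrame(columns=range(1, P + 1), index=mom.index, dtype=float)
# Assumption: weight by the previous month's industry size (from 1.3)
size_lag = size_per_industry.shift(1)

for t in mom.index:
    mom_t = mom.loc[t].dropna()
    # Group codes 1..P, where 1 holds the lowest-momentum industries
    g = pd.qcut(mom_t, P, labels=False) + 1
    w = size_lag.loc[t, mom_t.index]
    # Value weighted return per group: sum(w * R) / sum(w)
    R_portf.loc[t] = (dict1[0].loc[t, mom_t.index] * w).groupby(g).sum() / w.groupby(g).sum()

R_portf.head()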

**1.6.** Compute the average return over time for each portfolio. Is there a pattern?
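
A minimal sketch of this step, using randomly generated portfolio returns in place of `R_portf` (the data and the names `R_demo`, `avg` and `spread` are assumptions for illustration). With real momentum portfolios, one would look for average returns that rise or fall across the seven groups:

```python
import numpy as np
import pandas as pd

# Made-up portfolio returns (percent per month), for illustration only
rng = np.random.default_rng(1)
idx = pd.period_range('2001-01', periods=60, freq='M')
R_demo = pd.DataFrame(rng.normal(0.8, 4.0, size=(60, 7)), index=idx, columns=range(1, 8))

avg = R_demo.mean()               # time-series mean per portfolio
spread = R_demo[7] - R_demo[1]    # high-minus-low momentum portfolio
print(avg.round(2))
print('High-minus-low mean:', round(spread.mean(), 2))
```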

**1.7.** Now that we have our portfolios, we can set up the factor model. The monthly factors of the Fama-French three factor model (FF3) can be obtained from [Kenneth French's website](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html). 

Download the data set `'F-F_Research_Data_Factors'` from source `'famafrench'` via `pandas-datareader`. The output of DataReader is again a `dict` object.

In [None]:
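import pandas_datareader.data as web

# Note: with start=None and end=None the reader falls back to its default window;
# the output below starts in 2016-11, which is shorter than the portfolio sample.
# Passing start=startdate, end=enddate would match the period used in 1.2.
factors = web.DataReader('F-F_Research_Data_Factors', 'famafrench', start=None, end=None)
factors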

{0:          Mkt-RF   SMB    HML    RF
 Date                              
 2016-11    4.86  5.67   8.21  0.01
 2016-12    1.82  0.09   3.60  0.03
 2017-01    1.94 -1.13  -2.74  0.04
 2017-02    3.57 -2.04  -1.67  0.04
 2017-03    0.17  1.13  -3.33  0.03
 2017-04    1.09  0.72  -2.13  0.05
 2017-05    1.06 -2.52  -3.75  0.06
 2017-06    0.78  2.23   1.49  0.06
 2017-07    1.87 -1.46  -0.22  0.07
 2017-08    0.16 -1.65  -2.07  0.09
 2017-09    2.51  4.45   3.09  0.09
 2017-10    2.25 -1.93   0.22  0.09
 2017-11    3.12 -0.56  -0.05  0.08
 2017-12    1.06 -1.32   0.03  0.09
 2018-01    5.58 -3.18  -1.36  0.11
 2018-02   -3.65  0.26  -1.03  0.11
 2018-03   -2.35  4.06  -0.23  0.12
 2018-04    0.29  1.10   0.48  0.14
 2018-05    2.65  5.31  -3.13  0.14
 2018-06    0.48  1.13  -2.33  0.14
 2018-07    3.19 -2.25   0.45  0.16
 2018-08    3.44  1.14  -3.98  0.16
 2018-09    0.06 -2.30  -1.70  0.15
 2018-10   -7.68 -4.82   3.43  0.19
 2018-11    1.69 -0.68   0.26  0.18
 2018-12   -9.55 -2.42  -

**1.8.** Obtain the portfolios' excess returns by deducting the risk-free rate ('RF' in the DataFrame of the FF3 factors) from the portfolio returns.
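
The subtraction has to align on dates rather than on column names. A small sketch with made-up numbers (the names `R_demo` and `rf_demo` are hypothetical stand-ins for the portfolio returns and the 'RF' column):

```python
import pandas as pd

idx = pd.period_range('2020-01', periods=3, freq='M')
# Made-up portfolio returns and risk-free rate, both in percent per month
R_demo = pd.DataFrame({1: [1.0, -2.0, 0.5], 2: [2.0, 0.0, -1.0]}, index=idx)
rf_demo = pd.Series([0.1, 0.1, 0.2], index=idx, name='RF')

# axis=0 aligns the risk-free rate on the dates (rows), not on the columns
R_excess = R_demo.sub(rf_demo, axis=0)
print(R_excess)
```

Writing `R_demo - rf_demo` instead would try to match the dates of `rf_demo` against the column labels of `R_demo` and produce only missing values.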

In [None]:
factors['DESCR']

'F-F Research Data Factors\n-------------------------\n\nThis file was created by CMPT_ME_BEME_RETS using the 202109 CRSP database. The 1-month TBill return is from Ibbotson and Associates, Inc. Copyright 2021 Kenneth R. French\n\n  0 : (59 rows x 4 cols)\n  1 : Annual Factors: January-December (5 rows x 4 cols)'

In [None]:
factors[0]['Mkt-RF']

Date
2016-11     4.86
2016-12     1.82
2017-01     1.94
2017-02     3.57
2017-03     0.17
2017-04     1.09
2017-05     1.06
2017-06     0.78
2017-07     1.87
2017-08     0.16
2017-09     2.51
2017-10     2.25
2017-11     3.12
2017-12     1.06
2018-01     5.58
2018-02    -3.65
2018-03    -2.35
2018-04     0.29
2018-05     2.65
2018-06     0.48
2018-07     3.19
2018-08     3.44
2018-09     0.06
2018-10    -7.68
2018-11     1.69
2018-12    -9.55
2019-01     8.41
2019-02     3.40
2019-03     1.10
2019-04     3.96
2019-05    -6.94
2019-06     6.93
2019-07     1.19
2019-08    -2.58
2019-09     1.43
2019-10     2.06
2019-11     3.87
2019-12     2.77
2020-01    -0.11
2020-02    -8.13
2020-03   -13.38
2020-04    13.65
2020-05     5.58
2020-06     2.46
2020-07     5.77
2020-08     7.63
2020-09    -3.63
2020-10    -2.10
2020-11    12.47
2020-12     4.63
2021-01    -0.03
2021-02     2.78
2021-03     3.08
2021-04     4.93
2021-05     0.29
2021-06     2.75
2021-07     1.27
2021-08     2.90
2021-09  

**1.9.** Estimate the $\alpha_p$ and $\beta_p$ by using the following $P$ time series regressions:

$$
R_{pt}-R_{f,t} = \alpha_p + \beta_p' f_t + \varepsilon_{pt},
$$

where $\varepsilon_{pt} \sim N(0,\sigma^2_p)$ for $t = 1,\ldots,T$ and $p = 1,\ldots,P$, and $f_t$ denotes the vector of risk factors. In the case of the FF3 model, we have $f_t = (R_{mt}-R_{f,t},SMB_{t},HML_{t})'$.

Store the alpha estimates and $t$ statistics in a new DataFrame.
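
To make the estimation step concrete, here is a sketch of one such time series regression on simulated data, using plain NumPy and classical (homoskedastic) OLS standard errors. The simulated factors, the true parameter values and all variable names are assumptions chosen for the illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
T, K = 240, 3
# Simulated factors and one portfolio's excess return (true alpha = 0.2)
f = rng.normal(0.5, 3.0, size=(T, K))
beta_true = np.array([1.0, 0.3, -0.2])
y = 0.2 + f @ beta_true + rng.normal(0.0, 1.0, size=T)

X = np.column_stack([np.ones(T), f])           # constant first, so coef[0] is alpha
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates (alpha, beta)
resid = y - X @ coef
sigma2 = resid @ resid / (T - K - 1)           # residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)          # classical OLS covariance matrix
t_stats = coef / np.sqrt(np.diag(cov))
print('alpha:', coef[0], 't-stat:', t_stats[0])
```

In the exercise the same regression would be run once per portfolio, collecting `coef[0]` and `t_stats[0]` in a DataFrame.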

In [None]:
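# statsmodels is used here instead of scikit-learn because it reports t statistics
import statsmodels.api as sm

ff = factors[0]
# Excess returns on the common sample (R_portf from 1.5, RF from 1.7)
R_ex = R_portf.sub(ff['RF'], axis=0).dropna()
X = sm.add_constant(ff.loc[R_ex.index, ['Mkt-RF', 'SMB', 'HML']])
# One time series regression per portfolio
fits = {p: sm.OLS(R_ex[p].astype(float), X).fit() for p in R_ex.columns}
alphas = pd.DataFrame({'alpha': {p: m.params['const'] for p, m in fits.items()},
                       't_stat': {p: m.tvalues['const'] for p, m in fits.items()}})
alphas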


**1.10.** Inspect the alphas and $t$ statistics. Are there significant abnormal returns?

**1.11.** Make a scatter plot of the average return against the expected return (as implied by the asset pricing model).
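
One way to read "expected return as implied by the model", assuming the factor premia are estimated by the sample means $\bar{f}$ of the factors: because each regression in 1.9 includes an intercept, the OLS residuals average to zero, so the sample averages satisfy

$$
\bar{R}_{p} - \bar{R}_{f} = \hat{\alpha}_p + \hat{\beta}_p' \bar{f},
\qquad
\widehat{E}[R_{pt}-R_{f,t}] = \hat{\beta}_p' \bar{f}.
$$

Under this reading, plotting the average excess return against $\hat{\beta}_p' \bar{f}$ places each portfolio at a vertical distance of $\hat{\alpha}_p$ from the 45-degree line.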