# Empirical Asset Pricing - PS3

Maximilian Huber

## Task 1
Let me load the data into an array of DataFrames:

In [61]:
using CSV, DataFrames, Distributions, Plots, GLM; gr();

In [65]:
files = ["Global", "Europe", "Japan", "Asia_Pacific_ex_Japan", "North_America"]
abbrev = ['G', 'E', 'J', 'A', 'N']

data = [CSV.read("./Data/" * file * "_5_Factors.csv", delim=',', 
        types=[Date, Float64, Float64, Float64, Float64, Float64, Float64], 
        dateformat = DateFormat("yyyymm"), nullable=false) for file in files];

In [267]:
factor_names = [:MKT, :SMB, :HML, :RMW, :CMA, :RF]
data = DataFrame()
input = DataFrame()

for (i, file) in enumerate(files)
    
    input = CSV.read("./Data/" * file * "_5_Factors.csv", delim=',', 
        types=[Date, Float64, Float64, Float64, Float64, Float64, Float64], 
        dateformat = DateFormat("yyyymm"), nullable=false)
    
    names!(input, vcat(:Date, Symbol.(abbrev[i] .* '_' .* string.(factor_names))))
    
    data = hcat(data, input[:, 2:end])
end

data = hcat(input[[:Date]], data)
head(data)

Unnamed: 0,Date,G_MKT,G_SMB,G_HML,G_RMW,G_CMA,G_RF,E_MKT,E_SMB,E_HML,E_RMW,E_CMA,E_RF,J_MKT,J_SMB,J_HML,J_RMW,J_CMA,J_RF,A_MKT,A_SMB,A_HML,A_RMW,A_CMA,A_RF,N_MKT,N_SMB,N_HML,N_RMW,N_CMA,N_RF
1,1990-07-01,0.86,0.82,-0.25,0.17,1.56,0.68,4.52,0.41,-1.43,0.22,1.21,0.68,0.1,6.32,3.69,1.06,0.24,0.68,4.2,-2.93,-1.36,1.42,0.9,0.68,-1.51,-2.5,-0.9,0.53,2.59,0.68
2,1990-08-01,-10.82,-1.57,0.6,-0.22,0.99,0.66,-11.03,0.02,0.25,-1.06,1.46,0.66,-11.88,-5.0,0.26,1.28,-0.96,0.66,-8.68,3.76,1.71,1.12,0.67,0.66,-9.63,-2.56,0.49,-2.01,3.28,0.66
3,1990-09-01,-11.97,1.16,0.8,0.03,2.12,0.6,-12.28,1.71,0.84,-0.28,1.72,0.6,-17.38,0.67,-0.11,-1.29,-0.11,0.6,-8.8,3.7,-0.14,0.87,4.17,0.6,-6.02,-2.73,-0.13,1.28,4.23,0.6
4,1990-10-01,9.56,-7.58,-4.24,2.6,1.22,0.68,6.49,-2.61,-0.67,1.06,-0.79,0.68,24.9,0.8,-3.87,0.39,4.75,0.68,-1.95,-4.76,-1.52,0.17,-2.78,0.68,-2.0,-4.62,-1.67,4.18,0.87,0.68
5,1990-11-01,-3.86,1.37,1.14,1.47,-2.35,0.57,-0.43,-2.74,0.87,0.13,-0.47,0.57,-14.12,-5.34,-0.18,3.05,-2.18,0.57,-2.98,-1.59,-0.82,3.04,0.56,0.57,5.9,0.01,-1.42,0.26,-4.57,0.57
6,1990-12-01,1.1,-0.95,-1.6,1.17,-0.33,0.6,-1.55,0.93,0.0,0.92,0.27,0.6,1.93,-6.16,-3.65,0.85,1.87,0.6,-1.1,-2.77,-1.25,-0.19,-1.49,0.6,2.54,1.58,-0.92,1.24,-2.77,0.6


### (a)
The factors from Kenneth French's [website](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) are not excess returns, as described [here](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_factors.html). So I deduct the region interest rate from the four factors in each region. This mimics a strategy where exchange rate risk is hedged.

In [268]:
data_excess = copy(data)

for (i, region) in enumerate(abbrev)
    for factor in factor_names[2:end-1]
        data_excess[Symbol(region * '_' * string(factor))] -= data_excess[Symbol(region * '_' * string(:RF))]
    end
end

#### (i)
Now I regress the returns of the 20 test assets on the five factor model:
$$R_t^{ei}=\alpha_i + \beta'_i f_t + \epsilon_t^i$$

In [296]:
test_assets = Symbol.([region * '_' * string(factor) for factor in factor_names[1:end-1], region in abbrev[2:end]][:])

function run_regression(field)
    formula = @formula($(field) ~ G_MKT + G_SMB + G_HML + G_RMW + G_CMA)
    model = lm(formula, data_excess)
    return DataFrame(hcat(field, coef(model)[1], stderr(model)[1], coef(model)[1]/stderr(model)[1], model))
end

table = vcat(run_regression.(test_assets)...)
names!(table, [:test_asset, :α, :stderr_α, :t_α, :model])
table[[:test_asset, :α, :stderr_α, :t_α]]

Unnamed: 0,test_asset,α,stderr_α,t_α
1,E_MKT,-0.0527406,0.103399,-0.51007
2,E_SMB,-0.0542852,0.0856373,-0.633897
3,E_HML,-0.0074165,0.0732932,-0.10119
4,E_RMW,0.135247,0.0619364,2.18364
5,E_CMA,-0.0326146,0.0624436,-0.522306
6,J_MKT,-0.338468,0.226448,-1.49468
7,J_SMB,-0.0342311,0.155456,-0.220198
8,J_HML,0.115394,0.127051,0.908249
9,J_RMW,-0.172486,0.104072,-1.65737
10,J_CMA,-0.107845,0.11338,-0.951178


The question is whether a model of global risk factors is sufficient, or there are regional differences. If the former were true, there should be no significant alphas. But there are quite a few. For example, the European profitability factor has a highly positive alpha, indicating that there is an unexplained component of risk.
#### (ii)

In [297]:
gl_factor_names = [:G_MKT, :G_SMB, :G_HML, :G_RMW, :G_CMA]

T = size(data_excess, 1)
N = 20
K = 5
Ωhat = cov(Matrix(data_excess[gl_factor_names])) * (T-1) / T
Σhat = cov(hcat([residuals(table[:model][i]) for (i, asset) in enumerate(test_assets)]...))

F = ((T - N - K)/N * 
    (1 + mean(Matrix(data_excess[gl_factor_names]), 1) * Ωhat^-1 * mean(Matrix(data_excess[gl_factor_names]), 1)')^(-1) *
    table[:α]' * Σhat^-1 * table[:α])[1]

2.6894189576509064

In [302]:
Σhat

20×20 Array{Float64,2}:
  3.35236    -0.275107     0.266992   …  -0.0741044   -0.466731    0.526357 
 -0.275107    2.29956     -0.0568875      0.351297    -0.0354403   0.0906464
  0.266992   -0.0568875    1.6844        -0.836264     0.391716   -0.299323 
 -0.279696   -0.267391    -0.744031       0.268297    -0.45946     0.0817957
 -0.221894   -0.220436     0.524989      -0.515375     0.165598   -0.702777 
 -3.07408     0.759712    -0.202249   …  -0.358532     1.18144    -0.723059 
 -0.827339   -0.15525      0.342935      -0.510536     0.556562   -0.645266 
 -0.197058    0.0870831   -0.534456      -1.22348      0.157795   -0.798429 
  0.732467   -0.0495245    0.183722       0.314644    -0.976024    0.724148 
 -0.428955   -0.00696507  -0.244895      -0.614252     0.617787   -1.31399  
  0.142322   -0.758483    -0.295289   …  -0.302268     0.388499    0.28283  
 -0.267575    0.941168    -0.244362       0.32537      0.0404561   0.0748535
 -0.871485   -0.408756    -0.34289       -0.623701  

Its p-value is:

In [298]:
1-cdf(FDist(N, T-N-1), F)

0.00015880338622564771

This is a very strong rejection of the proposed global risk factor model.
### (b)
I run the cross-sectional regression by GLS:
$$E_T(R^{ei}) = \alpha_i + \lambda' \beta_i + \epsilon_i$$
Let me retrieve the betas and calculate the left hand side:

In [299]:
data_cross = DataFrame(vcat(mean(Matrix(data_excess[test_assets]), 1), hcat([coef(table[:model][i])[2:end] for (i, asset) in enumerate(test_assets)]...))')
names!(data_cross, vcat(:avg_ret, Symbol.("β_" .* string.(gl_factor_names))))

head(data_cross)

Unnamed: 0,avg_ret,β_G_MKT,β_G_SMB,β_G_HML,β_G_RMW,β_G_CMA
1,0.513455,1.08426,0.0924975,0.256601,0.250233,-0.188875
2,-0.145,-0.0384141,0.803325,-0.0279854,0.0789456,0.0918722
3,0.117636,0.109139,-0.0336364,0.925307,-0.114759,-0.0739904
4,0.173636,0.000485881,0.157154,-0.325776,0.669801,0.0974019
5,-0.00966667,-0.00572113,0.0301252,0.0845757,0.0686101,0.681279
6,0.0978788,1.04622,0.352398,-0.546137,-0.0176344,0.914165


In [300]:
β = Matrix(data_cross[:, 2:end])

λhat = (β'*Σhat^-1*β)^-1 * β'*Σhat^-1*data_cross[:avg_ret]
αhat = data_cross[:avg_ret] .- β * λhat
σsq_λ = 1/T * (β'*Σhat^-1*β)^-1
cov_α = 1/T * (Σhat - β*(β'*Σhat^-1*β)^(-1)*β')

table = DataFrame(hcat(test_assets, αhat, diag(cov_α), αhat ./ diag(cov_α)))
names!(table, [:test_asset, :α, :stderror_α, :t_α])

Unnamed: 0,test_asset,α,stderror_α,t_α
1,E_MKT,-0.084158,0.00957295,-8.79122
2,E_SMB,-0.0412104,0.0061533,-6.69729
3,E_HML,0.00865785,0.00485403,1.78364
4,E_RMW,0.153296,0.00344613,44.4834
5,E_CMA,-0.0152562,0.0035571,-4.28895
6,J_MKT,-0.37898,0.0474877,-7.98058
7,J_SMB,-0.0156652,0.0222042,-0.705505
8,J_HML,0.137443,0.0151016,9.1012
9,J_RMW,-0.157819,0.0100869,-15.646
10,J_CMA,-0.108423,0.0119172,-9.09804


The GLS regression residuals show some massive alphas. And the joint test without Shanken correction rejects clearly:

In [301]:
J = T * α'*Σhat^-1*α
1-cdf(Chisq(N - 1), J)

3.0877074230772905e-9

The Shanken corrected version rejects too:

In [309]:
Σfhat = cov(Matrix(data_excess[[:G_MKT, :G_SMB, :G_HML, :G_RMW, :G_CMA]]))
J = T * (1 + λhat'*Σfhat^-1*λhat) * αhat'*Σhat^-1*αhat
1-cdf(Chisq(N - K), J)

8.110014933093712e-5

The Shanken correction alleviates the earlier raised issue of generated regressors. The rejection is less stark, because the uncertainty about the regressors widens the confidence intervals.