# Empirical Asset Pricing - PS4

Maximilian Huber

## Task 1
Let me load the data into an array of DataFrames:

In [1]:
using CSV, DataFrames, Query, Plots, Optim, ForwardDiff, Missings; gr();

I load the data, drop rows with a missing value, and split the sample into the two managers.

In [2]:
data = (CSV.read("./Data/PS4data.csv", delim=',', nullable=true,
        types=[String, String, Int64, Int64, Int64, Float64, Float64, Int64, Float64, Float64, Float64, Float64, Int64, Float64, Float64, Float64, Float64]))
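dropmissing!(data)

# no missing values remain after dropmissing!, so coalesce only strips
# the Missing type from each column
for col in names(data)
   data[col] = Missings.coalesce.(data[col], 0)
end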

DFA = data[data[:mgrno] .== 23000, 4:end]
VAN = data[data[:mgrno] .== 90457, 4:end];

### Preliminaries for (a)
Let me try to understand the data set. Since $\sum_{n=0}^N w_i(n) = 1$, we have $\sum_{n=1}^N w_i(n) = 1 - w_i(0)$, and hence $w_i(0) \Big(1 + \frac{\sum_{n=1}^N w_i(n)}{w_i(0)}\Big) = 1$. Treating the column `rweight` as the relative weight $w_i(n)/w_i(0)$, the cash holdings (or whatever the outside asset is) are:

In [3]:
w_DFA0 = 1 / (1 + sum(DFA[:rweight]))

0.10284923721109153

In [4]:
w_VAN0 = 1 / (1 + sum(VAN[:rweight]))

0.1383314470907214

In [5]:
DFA[:weight] = DFA[:rweight] * w_DFA0
VAN[:weight] = VAN[:rweight] * w_VAN0;
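
As a sanity check on this normalization, here is a minimal sketch with made-up relative weights (the numbers are hypothetical and not taken from the data set). The outside weight and the inside weights should add up to one.

```julia
# Hypothetical relative weights w(n)/w(0) for three inside assets
rweight = [2.0, 1.0, 1.0]

w0 = 1 / (1 + sum(rweight))   # weight of the outside asset
w  = rweight .* w0            # weights of the inside assets

total = w0 + sum(w)           # adds up to 1 up to floating-point rounding
```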

Do both funds hold the same set of assets, i.e. is $\mathcal{N}_{VAN} = \mathcal{N}_{DFA}$?

In [6]:
[length(DFA[:permno]), length(VAN[:permno])] .- length(union(DFA[:permno], VAN[:permno]))

2-element Array{Int64,1}:
 -229
  -78

No, and neither is a superset of the other!

I assume that I can infer $\mathcal{N}_{i}$ just by looking at the current portfolio.

### (a)

In [7]:
active_share(W) = 1/2 * sum(abs.(W[:weight] .- exp.(W[:LNme]) ./ sum(exp.(W[:LNme]))))

active_share(DFA)

0.4535664834349477

In [8]:
active_share(VAN)

0.09184197495623975

DFA is much more active than Vanguard! The latter offers more products that track market-cap-weighted indices.
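
Written out, `active_share` as defined above computes the following, where the benchmark is the market-cap-weighted portfolio over the fund's own holdings $\mathcal{N}_i$ (an assumption of this implementation, since only held stocks enter the sums):

$$AS_i = \frac{1}{2}\sum_{n\in\mathcal{N}_i}\left| w_i(n) - \frac{\exp(LNme(n))}{\sum_{m\in\mathcal{N}_i}\exp(LNme(m))}\right|$$

Note that the $w_i(n)$ sum to $1 - w_i(0)$ rather than to one, so part of the measured active share reflects the position in the outside asset.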

### (b) - (e)
I loosely follow Bruce Hansen's [textbook](https://www.ssc.wisc.edu/~bhansen/econometrics/Econometrics.pdf) for the GMM estimation, except that I alter some of the notation. GMM solves:
$$\underset{\theta}{\min}\,Q_n(\theta)$$
where $Q_n(\theta) = \frac{1}{2} \, g_n(\theta)' \, \mathcal{W} \, g_n(\theta)$ and $g_n(\theta) = \frac{1}{n}\sum_{t=1}^n g(w_t;\theta)$.

In the just-identified IV-GMM case $g(w;\theta)=\Big(w_{lhs} - w_{reg}'\cdot\theta\Big)\cdot w_{iv}\in\mathbb{R}^k$, where $k$ is the number of regressors, which equals the number of instruments.

In my application the difference between regressors and instruments is just swapping out $LNme$ with $IVme$.
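
For the linear moment above, and as long as the bound on the $LNme$ coefficient does not bind, the just-identified estimator has the closed form $\hat\theta = (Z'X)^{-1}Z'y$, which can serve as a check on the numerical optimizer. The following is a minimal sketch on made-up data (the matrices `X`, `Z` and the vector `θ_true` are hypothetical and not taken from the problem set), written for Julia 1.x.

```julia
using LinearAlgebra

# Toy data (hypothetical numbers): one regressor plus a constant,
# and one instrument plus a constant
X = [1.0 1.0; 2.0 1.0; 3.0 1.0; 4.0 1.0]
Z = [0.5 1.0; 2.5 1.0; 2.0 1.0; 4.5 1.0]
θ_true = [0.7, -1.0]
y = X * θ_true                 # no noise, so IV recovers θ_true

# Just-identified IV: solve (Z'X) θ = Z'y
θ_iv = (Z' * X) \ (Z' * y)

# The sample moment g_n(θ) vanishes at the solution
g_bar = Z' * (y - X * θ_iv) / size(X, 1)
```

In the just-identified case the weighting matrix does not affect this unconstrained solution, because the sample moments can be set exactly to zero.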

Since I have to do a nonlinear GMM estimation in (f), I do not follow Hansen's simple derivations of GMM in linear models, but set up the optimization problem generically, using a barrier-method optimizer with L-BFGS in the inner problem and automatic differentiation.

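The main issue is estimating the covariance of $\hat\theta$ correctly. With $\tilde\theta$ denoting a preliminary (first-stage) estimate, the covariance of the moments is estimated by

$$\hat\Omega=\frac{1}{n}\sum_t g(w_t;\tilde\theta)\,g(w_t;\tilde\theta)'$$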

No matter whether the constraint is binding or not, the efficient weighting matrix is $\mathcal{W} = \hat\Omega^{-1}$, see BH chapter 12.14 "Restricted GMM".

If $\hat \theta$ is unconstrained, then $\hat V_\theta = \Big(\hat G'\hat\Omega^{-1}\hat G\Big)^{-1}$, where $\hat G = \frac{1}{n}\sum_t \frac{\partial g}{\partial\theta'} (w_t;\hat \theta)$.

But if $\hat \theta$ satisfies the restriction with equality, I need to correct for that: $\hat V_{\theta,constr} = \hat V_\theta - \hat V_\theta R \Big(R'\hat V_\theta R\Big)^{-1} R' \hat V_\theta$, where $R$ is the Jacobian of the restriction $r(\theta) = c$.
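
To see what the correction does, here is a minimal sketch with a made-up $3\times 3$ covariance matrix (hypothetical numbers) and a restriction on the first coefficient, written for Julia 1.x. The row and column of the constrained coefficient are projected out, which is why a coefficient sitting at its bound ends up with a standard error of zero.

```julia
using LinearAlgebra

# Hypothetical unconstrained covariance matrix (symmetric, positive definite)
V = [0.50 0.10 0.00;
     0.10 0.30 0.05;
     0.00 0.05 0.20]

# Jacobian of the restriction θ[1] = c with respect to θ
R = [1.0, 0.0, 0.0]

s = R' * V * R                          # scalar R'VR
V_constr = V - (V * R) * (R' * V) / s   # V - V R (R'VR)^{-1} R'V

se = sqrt.(max.(diag(V_constr), 0.0))   # clamp tiny negative rounding errors
```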

In [9]:
# operates on a single observation: w[1] = lhs, w[2:k] = regressors, w[k+1:end] = instruments
function g(w, θ)
    k = ceil(Int64, length(w)/2)
    (w[1] - w[2:k]' * θ) * w[k+1:end]
end

function gn(w, θ)
    avg_g = zeros(length(θ))
    for t in 1:size(w, 1)
        avg_g += g(w[t, :], θ)
    end
    
    return avg_g/size(w, 1)
end

function gn_wrapper(W)
    N = size(W, 1)
    w = hcat(W[:LNrweight], 
        Matrix(W[[:LNme, :LNbe, :profit, :Gat, :divA_be, :beta]]), ones(N),
        Matrix(W[[:IVme, :LNbe, :profit, :Gat, :divA_be, :beta]]), ones(N))
    return θ -> gn(w, θ)
end

function Qn(θ, gn_wrapped)
    1/2 * (gn_wrapped(θ)' * gn_wrapped(θ))[1]
end

function Qn(θ, gn_wrapped, W)
    1/2 * (gn_wrapped(θ)' * W * gn_wrapped(θ))[1]
end

function Ωhat(W, θ)
    N = size(W, 1)
    w = hcat(W[:LNrweight], 
        
        Matrix(W[[:LNme, :LNbe, :profit, :Gat, :divA_be, :beta]]), ones(N),
        Matrix(W[[:IVme, :LNbe, :profit, :Gat, :divA_be, :beta]]), ones(N))
    
    result = zeros(Float64, length(θ), length(θ))
    for i in 1:N
        result .+= g(w[i,:], θ) * g(w[i,:], θ)'
    end
    return result/N
end

Ωhat (generic function with 1 method)

In [10]:
function eff_IV_GMM(W)
    #first stage
    initial_θ = 0.5 * ones(7)
    lower = -Inf * ones(length(initial_θ))
    upper = vcat(1, Inf * ones(length(initial_θ) - 1))
    
    gn_wrapped = gn_wrapper(W)
    obj = OnceDifferentiable(θ -> Qn(θ, gn_wrapped), initial_θ; autodiff = :forward)
    first_θ = Optim.minimizer(optimize(obj, initial_θ, lower, upper, Fminbox{LBFGS}()))

    #second stage
    obj = OnceDifferentiable(θ -> Qn(θ, gn_wrapped, Ωhat(W, first_θ)^(-1)), first_θ; autodiff = :forward)
    #obj.f(first_θ)
    second_θ = Optim.minimizer(optimize(obj, first_θ, lower, upper, 
        Fminbox{LBFGS}(), optimizer_o = Optim.Options(iterations = 100)))
    
    #asymptotic variance estimation
    G = ForwardDiff.jacobian(gn_wrapped, second_θ)
    Vhat = (G' * Ωhat(W, second_θ)^(-1) * G)^(-1)

    #correction in case of binding constraint
    if second_θ[1] ≈ 1
        R = vcat(1, zeros(length(initial_θ) - 1))
        Vhat = Vhat - Vhat * R * (R'*Vhat*R)^(-1) * R' * Vhat
    end
    
    #return point estimates and sample std errors
    return [second_θ, Vhat/size(W, 1)] 
end

eff_IV_GMM (generic function with 1 method)

In [11]:
table = DataFrame(); table[:coefficient] = [:LNme, :LNbe, :profit, :Gat, :divA_be, :beta, :constant]; 
result = eff_IV_GMM(VAN)
table[:VAN_θ] = result[1]; table[:VAN_θ_stderr] = sqrt.(diag(result[2]))
result = eff_IV_GMM(DFA)
table[:DFA_θ] = result[1]; table[:DFA_θ_stderr] = sqrt.(diag(result[2]))

table

| | coefficient | VAN_θ | VAN_θ_stderr | DFA_θ | DFA_θ_stderr |
|---|---|---|---|---|---|
| 1 | LNme | 1.0 | 0.0 | 0.459983 | 0.0672507 |
| 2 | LNbe | 0.171374 | 0.00898222 | 0.29349 | 0.0643391 |
| 3 | profit | -0.231178 | 0.0567067 | 1.47977 | 0.152321 |
| 4 | Gat | 0.472081 | 0.0644015 | -1.02469 | 0.197912 |
| 5 | divA_be | 10.3605 | 0.723394 | -3.16116 | 0.783266 |
| 6 | beta | 0.192432 | 0.0237746 | 0.360018 | 0.0503243 |
| 7 | constant | -16.7798 | 0.0645728 | -12.5886 | 0.12422 |


<b>Vanguard</b>'s coefficient on market equity of $1.0$ implies that it is an index fund. The only caveat is that the unconstrained estimate is around $1.5$. But other characteristics seem important too; in fact, all of their coefficients are significantly different from zero. Vanguard buys stocks of firms that invest and pay out dividends but are not necessarily profitable.

A price elasticity of one would correspond to a coefficient of $-1$ on $LNme$; Vanguard has a negative price elasticity.

<b>DFA</b> has a much lower coefficient on market equity. Its demand seems rather inelastic (although more elastic than Vanguard's), and it does not look much like an index fund. The coefficients on profit, investment, and beta are much higher in magnitude, while the coefficient on dividends is smaller in magnitude and of opposite sign. DFA buys stocks of profitable firms that do not invest and do not pay high dividends (presumably because they buy back shares).

DFA does not seem like an index fund, but also not like a value (book-to-market equity) fund, because then the coefficients on market equity and book equity should have the same magnitude with opposite signs (positive on $LNbe$).

### (f)

The moment condition is $\mathbb{E}\Big[\epsilon \bigm \lvert w_{iv}\Big] = 1$. Following [Hansen and Singleton (1982)](https://www.jstor.org/stable/1911873?seq=1#page_scan_tab_contents), this implies $\mathbb{E}\Big[(\epsilon - 1) \otimes w_{iv}\Big] = 0$.
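
Spelled out in the notation from (b) - (e), and assuming $\epsilon = \exp\big(w_{lhs} - w_{reg}'\theta\big)$ (this is my reading of the specification; it matches the function `g` defined next), the per-observation moment and its Jacobian are:

$$g(w;\theta) = \Big(\exp\big(w_{lhs} - w_{reg}'\theta\big) - 1\Big)\, w_{iv}, \qquad \frac{\partial g}{\partial\theta'}(w;\theta) = -\exp\big(w_{lhs} - w_{reg}'\theta\big)\, w_{iv}\, w_{reg}'$$

Averaging the second expression over observations gives $\hat G$, which the code obtains by automatic differentiation rather than from this formula.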

In [12]:
function g(w, θ)
    k = ceil(Int64, length(w)/2)
    
    (exp(w[1]) / exp(w[2:k]' * θ) - 1) * w[k+1:end]
end

g (generic function with 1 method)

In [13]:
table = DataFrame(); table[:coefficient] = [:LNme, :LNbe, :profit, :Gat, :divA_be, :beta, :constant]; 
result = eff_IV_GMM(VAN)
table[:VAN_θ] = result[1]; table[:VAN_θ_stderr] = sqrt.(diag(result[2]))
result = eff_IV_GMM(DFA)
table[:DFA_θ] = result[1]; table[:DFA_θ_stderr] = sqrt.(diag(result[2]))

table

| | coefficient | VAN_θ | VAN_θ_stderr | DFA_θ | DFA_θ_stderr |
|---|---|---|---|---|---|
| 1 | LNme | 1.0 | 0.0 | 0.42585 | 0.0492237 |
| 2 | LNbe | 0.181452 | 0.00658487 | 0.280827 | 0.043454 |
| 3 | profit | -0.0781669 | 0.0346332 | 0.781778 | 0.112201 |
| 4 | Gat | 0.270234 | 0.0441748 | -0.546936 | 0.121708 |
| 5 | divA_be | 5.33577 | 0.362254 | -2.73919 | 0.495627 |
| 6 | beta | 0.128455 | 0.0155075 | 0.344674 | 0.032936 |
| 7 | constant | -16.6013 | 0.0475195 | -11.5816 | 0.10159 |


Yes, the estimates differ. <b>Vanguard</b>'s coefficients on profit, investment, dividends, and beta decreased in magnitude, while the coefficient on book equity stayed almost constant.

<b>DFA</b> shows a similar picture of shrinking magnitudes, with two exceptions: the coefficients on book equity and beta barely change.

## Task 2

## Abstract:
