# Empirical IO PS 2
Maximilian Huber

This code is stored at: https://github.com/MaximilianJHuber/NYU/blob/master/EmpIO/PS2.ipynb.
The notation follows Berry, Levinsohn and Pakes (1995): http://www.tcd.ie/Economics/staff/ppwalsh/papers/BLP.pdf

In [1]:
using DataFrames
using GLM
using Optim
using LaTeXStrings

data = readtable("data_ps2.txt", header=false, separator=',')
rename!(data, names(data), [:car, :year, :firm, :price, :quantity, :weight, :hp, :ac, :nest3, :nest4]);

A car can change characteristics over the years:

In [2]:
data[data[:car] .== 91, :]

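| Row | car | year | firm | price | quantity | weight | hp | ac | nest3 | nest4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 91 | 1990 | 19 | 5995 | 52409.0 | 1620 | 55 | 0 | 1 | 1 |
| 2 | 91 | 1991 | 19 | 6807 | 55056.0 | 1620 | 55 | 1 | 1 | 1 |
| 3 | 91 | 1992 | 19 | 8219 | 66046.0 | 1576 | 55 | 1 | 1 | 1 |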


Therefore, one can argue that the product and time dimensions collapse into one. For the time being, however, let's treat good $j$ as a _car_, time $t$ as a _year_, the characteristics $x$ as _weight_, _horse power_ and _air conditioning_, and the price $p$ as _price_.

## Part A: Logit

This part follows BLP (1995) section 6.1.

### 1
Agent $i$ derives utility from good $j$ at time $t$ in the following manner:
$$u_{ijt} = \delta_{jt}^* + \epsilon_{ijt} \quad \text{and} \quad \delta_{jt}^* = x'_{jt}\beta_t - \alpha_t p_{jt}+ \xi_{jt} \quad \forall t \in T$$
where $\epsilon_{ijt}$ is i.i.d. type I extreme value distributed.
This logit model has market shares:
$$s_{jt} = \frac{e^{\delta_{jt}^*}}{1 + \sum_{k=1}^J e^{\delta_{kt}^*}}$$
Taking logs yields:
$$\log s_{jt} - \log s_{0t} = \delta_{jt}^* = x'_{jt}\beta_t - \alpha_t p_{jt} + \xi_{jt} $$
where $s_{0t} = 1 - \sum_{k=1}^J s_{kt}$ is the market share of the outside good and $\xi_{jt}$ is the unobserved good- and time-specific utility.

This equation will be estimated year-by-year.
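
The inversion behind this equation can be checked numerically. The following is a minimal sketch with made-up mean utilities (the values in `delta` are purely illustrative, not taken from the car data), written in Julia 1.x broadcasting syntax (`exp.`, `log.`), which differs from the older syntax used in the cells of this notebook:

```julia
# Hypothetical mean utilities for three goods in one market (illustrative values only).
delta = [-1.0, 0.5, -2.3]

# Logit market shares: s_j = exp(δ_j) / (1 + Σ_k exp(δ_k)); the outside good has δ_0 = 0.
denom = 1 + sum(exp.(delta))
shares = exp.(delta) ./ denom
s0 = 1 - sum(shares)              # outside-good share, equal to 1 / denom

# Inversion: log s_j - log s_0 recovers δ_j.
delta_recovered = log.(shares) .- log(s0)

# Largest deviation from the original δ; should be at the level of rounding error.
println(maximum(abs.(delta_recovered .- delta)))
```

Because the denominator is common to all shares, including that of the outside good, it cancels in the log difference, which is why the left-hand side is linear in the parameters.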

### 2 
If I had instead pooled the years, the derivation would not change, but the pooled (panel) logit relies on $\xi_{jt}$ being independent over time, which is very implausible.

### 3 
Now I estimate the model by GMM, using the BLP instruments for $p$.

#### Data Preparation
Market shares are calculated with an assumed market size of 100 million:

In [3]:
data[:share] = data[:quantity] / 1e8;

Instruments for the price are constructed from each good's competitors, defined as the goods produced by other firms that are available in the same year. I average the characteristics of those competing goods.

In [4]:
function competitor_mean_characteristic(good, char::Symbol)
    mean(data[(data[:firm] .!= good[:firm]) .* (data[:year] .== good[:year]), char]) #same year, different firm
end

competitor_mean_characteristic (generic function with 1 method)

In [5]:
data[:comp_weight] = [competitor_mean_characteristic(good, :weight) for good in eachrow(data)]
data[:comp_hp]     = [competitor_mean_characteristic(good, :hp) for good in eachrow(data)]
data[:comp_ac]     = [competitor_mean_characteristic(good, :ac) for good in eachrow(data)];
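
To make the construction concrete, here is a small self-contained sketch on a made-up four-good market. It uses plain vectors instead of the DataFrame, and the helper `competitor_mean` is a hypothetical stand-in for the function above, not the notebook's own code. Returning `NaN` for a good without rivals is my assumption for the toy case:

```julia
using Statistics

# Hypothetical toy market (illustrative values only).
firm   = [1, 1, 2, 3]
year   = [1990, 1990, 1990, 1991]
weight = [1000.0, 1200.0, 1500.0, 2000.0]

# Mean of `x` over goods sold in the same year by a different firm than good `i`.
function competitor_mean(i, x, firm, year)
    rivals = (firm .!= firm[i]) .& (year .== year[i])
    return any(rivals) ? mean(x[rivals]) : NaN   # NaN when there is no rival in that year
end

comp_weight = [competitor_mean(i, weight, firm, year) for i in eachindex(weight)]
# → [1500.0, 1500.0, 1100.0, NaN]
```

By construction the instrument takes the same value for all goods of one firm within a year, since they share the same set of rivals.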

I normalize weight, horse power and price by their sample means after creating the instruments:

In [6]:
data[:weight] = data[:weight] / mean(data[:weight])
data[:hp]     = data[:hp] / mean(data[:hp])
data[:price]  = data[:price] / mean(data[:price]);

The left-hand side of the model, $\log s_{jt} - \log s_{0t}$, is:

In [7]:
data[:LHS] = zeros(size(data)[1])

for y in [1990, 1991, 1992]
    data[data[:year] .== y, :LHS] = 
        log(data[data[:year] .== y, :share]) - log(1 - sum(data[data[:year] .== y, :share]))
end

In [8]:
head(data)

| Row | car | year | firm | price | quantity | weight | hp | ac | nest3 | nest4 | share | comp_weight | comp_hp | comp_ac | LHS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 91 | 1990 | 19 | 0.2970233932704021 | 52409.0 | 0.5559956020578457 | 0.4081228050300215 | 0 | 1 | 1 | 0.00052409 | 2884.78125 | 131.59375 | 0.4270833333333333 | -7.462877776838659 |
| 2 | 35 | 1990 | 7 | 0.4307953969117842 | 17122.0 | 0.7619198991163071 | 0.7420414636909483 | 0 | 2 | 2 | 0.00017122 | 2936.9590163934427 | 134.9262295081967 | 0.4508196721311475 | -8.581591922822227 |
| 3 | 61 | 1990 | 16 | 0.4382271748918609 | 65590.0 | 0.8974867280131275 | 0.6900985612325818 | 0 | 1 | 1 | 0.0006559 | 2934.913043478261 | 136.40869565217392 | 0.5043478260869565 | -7.238532863836876 |
| 4 | 52 | 1990 | 10 | 0.5360293731096715 | 49877.0 | 0.8662548762925941 | 0.6826781465956724 | 0 | 1 | 1 | 0.00049877 | 2938.9133858267714 | 135.03149606299212 | 0.4724409448818897 | -7.51239613448586 |
| 5 | 26 | 1990 | 4 | 0.6837235741670641 | 35944.0 | 0.7488780269692712 | 0.8607680978814999 | 0 | 2 | 3 | 0.00035944 | 2935.769841269841 | 134.31746031746033 | 0.4682539682539682 | -7.839993937374661 |
| 6 | 54 | 1990 | 11 | 0.8420204451426995 | 4640.0 | 0.9376419659395274 | 0.9498130735244136 | 1 | 2 | 2 | 4.64e-05 | 2924.186046511628 | 133.50387596899225 | 0.4496124031007752 | -9.887241742904356 |


#### GMM Estimation
I estimate $\hat{\theta}$ using the following GMM procedure:
$$\max_\theta Q_n\big(\theta \big)$$
where $Q_n\big(\theta\big) = -\frac{1}{2}g_n\big(\theta\big)'\,\hat{W}\,g_n\big(\theta\big)$ and $g_n\big(\theta\big) = \frac{1}{n}\sum_{r=1}^n g\big(w_r; \theta\big)$. Here $w_r$ is a row of data and $g\big(w_r; \theta\big) = z_r\,\xi_r\big(\theta\big)$ is the instrument vector $z_r$ times the residual $\xi_{jt}$ from the model above, evaluated at $w_r$.
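
For the linear model this problem has a closed-form solution. The short derivation below is a sketch under assumptions of my own notation: the moment function is the instrument vector times the residual, $y$ stacks the left-hand side $\log s_{jt} - \log s_{0t}$, $X$ stacks the characteristics and the price, $Z$ stacks the instruments, and $\theta = (\beta', -\alpha)'$.

$$g_n\big(\theta\big) = \frac{1}{n} Z'\big(y - X\theta\big)$$
$$\frac{\partial Q_n\big(\theta\big)}{\partial \theta} = \frac{1}{n^2}\, X'Z\,\hat{W}\,Z'\big(y - X\theta\big) = 0$$
$$\hat{\theta} = \big(X'Z\,\hat{W}\,Z'X\big)^{-1} X'Z\,\hat{W}\,Z'y$$

With the weighting matrix $\hat{W} = \big(Z'Z/n\big)^{-1}$ the factors of $n$ cancel and $\hat{\theta} = \big(X'Z (Z'Z)^{-1} Z'X\big)^{-1} X'Z (Z'Z)^{-1} Z'y$, which is the expression computed in the code below. The last element of $\hat{\theta}$ is the coefficient on price, so its sign has to be flipped to obtain $\hat{\alpha}$.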

The 2SLS estimator is a GMM estimator for the linear model and is efficient under conditional homoskedasticity; see Hansen, chapter 11.
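
The equivalence can be illustrated numerically. The following is a minimal sketch on synthetic data (not the car data). It assumes moments of the form instrument times residual and the weighting matrix $(Z'Z/n)^{-1}$; all names (`Zfull`, `g`, `Q`, `θ_2sls`) and the simulated coefficients are hypothetical, and it is written for Julia 1.x:

```julia
using LinearAlgebra, Random

Random.seed!(1)                      # reproducible synthetic data
n = 500
Z = randn(n, 3)                      # hypothetical excluded instruments
v = randn(n)                         # shock that makes the regressor endogenous
x = Z * [1.0, 0.5, -0.5] .+ v        # endogenous regressor
X = hcat(ones(n), x)                 # regressors: constant and x
Zfull = hcat(ones(n), Z)             # instruments: constant and Z
y = X * [1.0, -2.0] .+ 0.8 .* v .+ randn(n)

# GMM objective Q_n(θ) = -1/2 g_n(θ)' W g_n(θ) with g_n(θ) = Z'(y - Xθ)/n
W = inv(Zfull' * Zfull / n)
g(θ) = Zfull' * (y - X * θ) / n
Q(θ) = -0.5 * dot(g(θ), W * g(θ))

# Closed-form 2SLS estimator
A = X' * Zfull * W
θ_2sls = (A * Zfull' * X) \ (A * Zfull' * y)

# First-order condition of the GMM problem: X'Z W g_n(θ) = 0 at the 2SLS estimate
foc = X' * Zfull * W * g(θ_2sls)
println(norm(foc))                                  # should be close to zero
println(Q(θ_2sls) >= Q(θ_2sls .+ [0.05, -0.05]))    # objective falls after a perturbation
```

Since the objective is a concave quadratic in $\theta$, the first-order condition is sufficient, so no numerical optimizer is needed in the linear case.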

In [9]:
table3 = DataFrame()
table3[:year] = [1990, 1991, 1992]

table3 = hcat(table3, convert(DataFrame, hcat(
    [begin
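        # regressors: characteristics and the (endogenous) price
        X = convert(Array{Float64}, data[data[:year] .== year, [:weight, :hp, :ac, :price]])
        # instruments: own characteristics and competitors' mean characteristics
        Z = convert(Array{Float64}, data[data[:year] .== year, [:weight, :hp, :ac, :comp_weight, :comp_hp, :comp_ac]])
        y = convert(Array{Float64}, data[data[:year] .== year, [:LHS]])
        # 2SLS: (X'Z (Z'Z)^(-1) Z'X)^(-1) X'Z (Z'Z)^(-1) Z'y
        ((X'Z * (Z'Z)^(-1) * Z'X) \ (X'Z * (Z'Z)^(-1) * Z'y))[:,1]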
    end for year in [1990, 1991, 1992]]...)')
)

rename!(table3, names(table3)[2:end], [:βw, :βhp, :βac, :α])
table3[:α] = - table3[:α]

table3

| Row | year | βw | βhp | βac | α |
|---|---|---|---|---|---|
| 1 | 1990 | -0.0840589642371107 | -21.290509729697305 | -4.129767308892385 | -16.714196999654447 |
| 2 | 1991 | -1.4700778232438163 | -14.870662851337608 | -1.2811553140174523 | -9.323690458487931 |
| 3 | 1992 | -1.6934071890953748 | -13.299647960191333 | -1.1806367646892164 | -7.481351947986721 |
