# Fin 704: Econometric Theory and Applications
We will use Jupyter notebooks for some examples, and this is one of them. A notebook is made of cells, and the two main types are Markdown cells, which contain text (like this one), and code cells, which contain Python code. The next rectangle is a code cell, as indicated by the square brackets to its left. To execute a code cell, place the cursor in it and press Shift+Enter.

We start by importing libraries that are useful for mathematical calculations and graphics. NumPy provides multi-dimensional arrays and matrices, together with a large collection of mathematical functions that operate on them. Matplotlib is a plotting library.

In [40]:
import numpy as np
import matplotlib.pyplot as plt
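As a quick illustration of the NumPy operations that matter later (the Newton-Raphson update in the logit section amounts to solving a 2x2 linear system), here is a short sketch. The matrix `H` and vector `g` are arbitrary values chosen only for this example.

```python
import numpy as np

# A 2x2 matrix and a length-2 vector (arbitrary illustrative values)
H = np.array([[-4.0, 1.0],
              [1.0, -3.0]])
g = np.array([2.0, 1.0])

print(H.shape)                 # number of rows and columns
print(H @ g)                   # matrix-vector product
print(np.linalg.det(H))        # determinant
step = np.linalg.solve(H, g)   # solves H x = g without forming the inverse
print(step)
```

`np.linalg.solve` is generally preferred to computing `np.linalg.inv(H) @ g`, since it is cheaper and numerically more stable.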

Next, we download a zip archive of the data sets and extract the one data file we need from it.

In [41]:
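from urllib.request import urlretrieve
from zipfile import ZipFile

url = "https://www.ssc.wisc.edu/~bhansen/econometrics/Econometrics%20Data.zip"
filename = "Econometrics_Data.zip"
urlretrieve(url, filename)  # download the archive to the working directory

# Extract only the CPS file; extract() recreates the cps09mar/ subfolder
# and returns the path of the extracted file
with ZipFile(filename) as zf:
    path = zf.extract("cps09mar/cps09mar.txt")
path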

'/mnt/c/Users/Anand/Documents/Stevens/Teaching/Fin 704/Code/cps09mar/cps09mar.txt'
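`ZipFile.extract` keeps the member's directory path inside the archive, which is why the file lands in a `cps09mar/` folder and why the returned string is the full path to it. The sketch below reproduces this offline with a tiny archive built in a temporary directory; the archive name `example.zip` and the file contents are made up for illustration.

```python
import os
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "example.zip")   # hypothetical archive name
    # Build a tiny archive with one member stored under a subfolder
    with ZipFile(archive, "w") as zf:
        zf.writestr("cps09mar/cps09mar.txt", "52\t0\t0\n38\t0\t0\n")
    # List the members, then extract one into tmp (the subfolder is recreated)
    with ZipFile(archive) as zf:
        members = zf.namelist()
        extracted = zf.extract("cps09mar/cps09mar.txt", path=tmp)
    rel = os.path.relpath(extracted, tmp)        # path relative to the target folder
    with open(extracted) as f:
        n_lines = len(f.read().splitlines())
    print(members, rel, n_lines)
```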

Next, we import the pandas library for data manipulation and analysis and use it to read the data set, which is tab-separated and has no header row. We then assign the variable names and use the `head` method to display the first five rows.

In [42]:
import pandas as pd
df = pd.read_csv("cps09mar/cps09mar.txt",sep = "\t", header=None)
df.columns = ["age", "female", "hisp", "education", "earnings", "hours", "week", "union", "uncov", "region", "race", "marital"]
df.head()

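|   | age | female | hisp | education | earnings | hours | week | union | uncov | region | race | marital |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 0 | 0 | 12 | 146000 | 45 | 52 | 0 | 0 | 1 | 1 | 1 |
| 1 | 38 | 0 | 0 | 18 | 50000 | 45 | 52 | 0 | 0 | 1 | 1 | 1 |
| 2 | 38 | 0 | 0 | 14 | 32000 | 40 | 51 | 0 | 0 | 1 | 1 | 1 |
| 3 | 41 | 1 | 0 | 13 | 47000 | 40 | 52 | 0 | 0 | 1 | 1 | 1 |
| 4 | 42 | 0 | 0 | 13 | 161525 | 50 | 52 | 1 | 0 | 1 | 1 | 1 |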
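Because the file has no header row, `header=None` makes pandas label the columns 0, 1, 2, and so on, and the names are attached afterwards. The names can equivalently be passed through the `names` argument of `pd.read_csv`. A minimal offline sketch, using an in-memory string with three made-up columns instead of the real file:

```python
import io
import pandas as pd

# Three tab-separated rows with no header line (illustrative values only)
raw = "52\t0\t12\n38\t0\t18\n41\t1\t13\n"

# Option 1: read without a header, then assign the column names
df1 = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
print(list(df1.columns))          # default integer labels
df1.columns = ["age", "female", "education"]

# Option 2: supply the names while reading
df2 = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                  names=["age", "female", "education"])

print(df2.head())
```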


The `shape` attribute gives the number of observations (rows) and the number of variables (columns).

In [43]:
df.shape

(50742, 12)
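Since `shape` is an attribute (no parentheses) holding a `(rows, columns)` tuple, the two numbers can be unpacked directly. A tiny sketch on a made-up DataFrame:

```python
import pandas as pd

df_small = pd.DataFrame({"age": [52, 38, 41], "female": [0, 0, 1]})  # illustrative data
n_obs, n_vars = df_small.shape   # tuple unpacking: (rows, columns)
print(n_obs, n_vars)
print(len(df_small))             # len() also gives the number of rows
```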

The `describe` method reports summary statistics for each variable: the count, mean, standard deviation, minimum, quartiles, and maximum.

In [44]:
df.describe()

|   | age | female | hisp | education | earnings | hours | week | union | uncov | region | race | marital |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 50742.0 | 50742.0 | 50742.0 | 50742.0 | 50742.0 | 50742.0 | 50742.0 | 50742.0 | 50742.0 | 50742.0 | 50742.0 | 50742.0 |
| mean | 42.131725 | 0.425722 | 0.148792 | 13.924619 | 55091.530685 | 43.827244 | 51.879272 | 0.021521 | 0.002207 | 2.635627 | 1.433507 | 2.763174 |
| std | 11.48762 | 0.494457 | 0.355887 | 2.744447 | 52222.071166 | 7.704467 | 0.598646 | 0.145113 | 0.04693 | 1.060051 | 1.31743 | 2.503158 |
| min | 15.0 | 0.0 | 0.0 | 0.0 | 1.0 | 36.0 | 48.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 25% | 33.0 | 0.0 | 0.0 | 12.0 | 28000.0 | 40.0 | 52.0 | 0.0 | 0.0 | 2.0 | 1.0 | 1.0 |
| 50% | 42.0 | 0.0 | 0.0 | 13.0 | 42000.0 | 40.0 | 52.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.0 |
| 75% | 51.0 | 1.0 | 0.0 | 16.0 | 65000.0 | 45.0 | 52.0 | 0.0 | 0.0 | 4.0 | 1.0 | 5.0 |
| max | 85.0 | 1.0 | 1.0 | 20.0 | 561087.0 | 99.0 | 52.0 | 1.0 | 1.0 | 4.0 | 21.0 | 7.0 |
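Two details of the `describe` output are easy to miss: `std` is the sample standard deviation (divisor n - 1), and `50%` is the median, with the quartiles obtained by linear interpolation. The sketch below checks both on four made-up earnings values.

```python
import numpy as np
import pandas as pd

x = pd.Series([28000.0, 42000.0, 65000.0, 146000.0], name="earnings")  # illustrative values
summary = x.describe()

sample_std = np.std(x.to_numpy(), ddof=1)   # divisor n - 1, as in describe()
population_std = np.std(x.to_numpy())       # divisor n (NumPy's default)
print(summary)
print(sample_std, population_std)
```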


We restrict attention to the subsample of men aged 19 to 80 with exactly 16 years of education, and create a binary variable `married` that is True when `marital` is coded 1, 2, 3, or 4.

In [45]:
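# Men (female == 0) aged 19-80 with exactly 16 years of education;
# .copy() makes df an independent frame, so adding a column does not raise SettingWithCopyWarning
df = df[(df.female == 0) & (df.age >= 19) & (df.age <= 80) & (df.education == 16)].copy()
# married is True for marital codes 1-4
df['married'] = df.marital.isin([1, 2, 3, 4])
df.shape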

(6441, 13)
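The same filter and indicator can also be written with `Series.between`, which is inclusive on both ends by default, and `Series.isin`. The sketch below applies them to a made-up frame with the same column names so the effect of each condition is easy to trace.

```python
import pandas as pd

toy = pd.DataFrame({
    "age":       [25, 18, 45, 81, 60, 33],
    "female":    [0,  0,  0,  0,  1,  0],
    "education": [16, 16, 16, 16, 16, 12],
    "marital":   [1,  7,  5,  1,  2,  4],
})  # illustrative values only

# Keep men aged 19-80 (inclusive) with 16 years of education
sub = toy[(toy.female == 0)
          & toy.age.between(19, 80)
          & (toy.education == 16)].copy()
# married = marital code 1, 2, 3 or 4
sub["married"] = sub.marital.isin([1, 2, 3, 4])
print(sub)
```

Rows 1 and 3 fail the age restriction, row 4 is female and row 5 has 12 years of education, so only rows 0 and 2 remain.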

## Logit with Maximum Likelihood Estimation
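The following loop estimates, by maximum likelihood, the intercept a and the slope b on age in a logit model for `married`. Starting from initial values, it applies Newton-Raphson updates (current parameters minus the inverse Hessian times the gradient of the log likelihood) until the parameters and the log likelihood each change by less than 0.00001 between iterations. The expressions for the log likelihood and its first and second derivatives are written out explicitly for this two-parameter case.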
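For reference, the formulas coded in the loop can be written out as follows. Let $y_i$ be the `married` indicator, $x_i$ be age, $q_i = 2y_i - 1 \in \{-1, 1\}$, and $\Lambda(z) = 1/(1+e^{-z})$ the logistic CDF. Since $1 - \Lambda(z) = \Lambda(-z)$, the logit log likelihood $\sum_i [y_i \ln \Lambda(a + b x_i) + (1 - y_i)\ln(1 - \Lambda(a + b x_i))]$ collapses to a single term per observation, and $q_i^2 = 1$ simplifies the second derivatives:

$$
\begin{aligned}
z_i &= q_i\,(a + b x_i), \qquad \ln L(a,b) = \sum_{i=1}^{n} \ln \Lambda(z_i),\\
g_a &= \frac{\partial \ln L}{\partial a} = \sum_{i} \frac{q_i}{1 + e^{z_i}}, \qquad
g_b = \frac{\partial \ln L}{\partial b} = \sum_{i} \frac{q_i x_i}{1 + e^{z_i}},\\
H_{aa} &= -\sum_{i} \frac{e^{z_i}}{(1 + e^{z_i})^2}, \qquad
H_{ab} = -\sum_{i} \frac{e^{z_i} x_i}{(1 + e^{z_i})^2}, \qquad
H_{bb} = -\sum_{i} \frac{e^{z_i} x_i^2}{(1 + e^{z_i})^2},\\
\begin{pmatrix} a_{\text{new}} \\ b_{\text{new}} \end{pmatrix}
&= \begin{pmatrix} a \\ b \end{pmatrix} - H^{-1} g
= \begin{pmatrix} a \\ b \end{pmatrix}
- \frac{1}{H_{aa} H_{bb} - H_{ab}^2}
\begin{pmatrix} H_{bb}\, g_a - H_{ab}\, g_b \\ H_{aa}\, g_b - H_{ab}\, g_a \end{pmatrix}.
\end{aligned}
$$

The last line is the explicit inverse of the 2x2 Hessian, which is exactly what the `newa` and `newb` lines compute.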

In [46]:
it = 0                       # iteration counter (avoids shadowing the built-in iter)
a = df['married'].mean()     # starting value for the intercept
#a = 0                       # alternative starting value
b = 0                        # starting value for the slope on age
tol = 0.00001                # convergence tolerance
# z_i = q_i * (a + b * age_i), where q_i = 2 * married_i - 1 is -1 or +1
df['z'] = pd.eval('(a + b * df.age) * (2 * df.married - 1)')
# log likelihood = sum of log Lambda(z_i), Lambda = logistic CDF
LL = pd.eval('log(1 /(1 + exp(-df.z)))').sum()
while True:
    print('Iteration ',it,': a = ',a,', b = ',b,', Log Likelihood = ',LL)
    # gradient of the log likelihood
    dLL_da = pd.eval('(2 * df.married - 1) /(1 + exp(df.z))').sum()
    dLL_db = pd.eval('(2 * df.married - 1) * df.age /(1 + exp(df.z))').sum()
    # Hessian entries
    d2LL_da2 = pd.eval('- exp(df.z) /(1 + exp(df.z)) ** 2').sum()
    d2LL_db2 = pd.eval('- exp(df.z) * df.age ** 2 /(1 + exp(df.z)) ** 2').sum()
    d2LL_dadb = pd.eval('- exp(df.z) * df.age /(1 + exp(df.z)) ** 2').sum()
    # Newton-Raphson step (a, b) - H^{-1} g, with the 2x2 inverse written out
    det = d2LL_da2 * d2LL_db2 - d2LL_dadb ** 2
    newa = a - (d2LL_db2 * dLL_da - d2LL_dadb * dLL_db) / det
    newb = b - (d2LL_da2 * dLL_db - d2LL_dadb * dLL_da) / det
    df['z'] = pd.eval('(newa + newb * df.age) * (2 * df.married - 1)')
    newLL = pd.eval('log(1 /(1 + exp(-df.z)))').sum()
    converged = abs(newa - a) < tol and abs(newb - b) < tol and abs(newLL - LL) < tol
    a, b, LL = newa, newb, newLL
    it += 1
    if converged or it >= 100:   # stop on convergence, or after 100 iterations as a safeguard
        break
df = df.drop('z', axis=1)

Iteration  0 : a =  0.759819903741655 , b =  0 , Log Likelihood =  -3647.0531829162856
Iteration  1 : a =  -0.8105948053138902 , b =  0.04558019729979336 , Log Likelihood =  -3354.2026116182747
Iteration  2 : a =  -1.2580144914461049 , b =  0.05904278567450624 , Log Likelihood =  -3337.731477769431
Iteration  3 : a =  -1.3030094665018335 , b =  0.06041337682892486 , Log Likelihood =  -3337.5912701515545
Iteration  4 : a =  -1.3034425884927716 , b =  0.06042666673485541 , Log Likelihood =  -3337.5912570116316
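The same Newton-Raphson iteration can be written in matrix form with NumPy. Stacking a column of ones and the regressor into `X`, the gradient is `X.T @ (y - p)` and the Hessian is `-X.T @ diag(p * (1 - p)) @ X`, where `p` holds the fitted probabilities; these equal the sums in the loop above. The function name `logit_newton`, the simulated sample size, the seed and the "true" coefficients are all assumptions made for this sketch, so it does not reproduce the CPS estimates. Unlike the loop above, its stopping rule checks only the change in the parameters.

```python
import numpy as np

def logit_newton(X, y, theta0, tol=1e-5, max_iter=100):
    """Hypothetical helper: maximize the logit log likelihood by Newton-Raphson.

    X: (n, k) regressors, first column of ones for the intercept
    y: (n,) array of 0/1 outcomes
    Returns the estimate, the log likelihood there, and the iteration count.
    """
    theta = np.asarray(theta0, dtype=float)
    for it in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))       # Lambda(x_i' theta)
        g = X.T @ (y - p)                             # gradient
        H = -(X * (p * (1 - p))[:, None]).T @ X       # Hessian
        theta_new = theta - np.linalg.solve(H, g)     # Newton-Raphson step
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    # log likelihood as sum of log Lambda(q_i x_i' theta), q_i = 2 y_i - 1
    loglik = -np.sum(np.log1p(np.exp(-(2 * y - 1) * (X @ theta))))
    return theta, loglik, it

# Simulated data (illustrative values, not the CPS sample)
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(19, 80, size=n)
X = np.column_stack([np.ones(n), age])
true_theta = np.array([-1.3, 0.06])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_theta)))
y = (rng.uniform(size=n) < p_true).astype(float)

# Same starting values as the notebook loop: intercept = mean of y, slope = 0
theta_hat, loglik, n_iter = logit_newton(X, y, theta0=[y.mean(), 0.0])
print(theta_hat, loglik, n_iter)
```

Applied to the CPS subsample with `X = [1, age]` and `y = married`, this should give the same iterates as the loop up to floating-point rounding, since the algebra is identical. If statsmodels is installed, `statsmodels.api.Logit(y, X).fit()` fits the same model and can serve as a cross-check.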
