# Fin 704: Econometric Theory and Applications
We will use Jupyter notebooks for some examples, and this is one of them. A notebook has two main types of cells: this one is a Markdown cell containing text, and the next rectangle is a code cell (marked by the square brackets to its left). To execute a code cell, place the cursor in it and press Shift+Enter.

We start by importing libraries for mathematical calculations and graphics. NumPy provides multi-dimensional arrays and matrices, along with a large collection of mathematical functions that operate on them. Matplotlib is a plotting library.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
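As a small illustration of NumPy's elementwise array arithmetic (an addition, not used by the rest of the notebook), the sketch below evaluates the logistic function, which appears later in the logit log likelihood, on a whole array at once. The function name `logistic` is our own.

```python
import numpy as np

# Hypothetical helper: logistic (sigmoid) function, applied elementwise to an array
def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-4, 4, 9)   # 9 evenly spaced points: -4, -3, ..., 4
p = logistic(u)             # no explicit loop; NumPy applies the formula to every element
print(p.round(3))
```

The output is symmetric around 0.5 because 1 - logistic(u) equals logistic(-u), the property the logit code below relies on when it multiplies the index by 2 * married - 1.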

Next, we download a zip archive and extract the required data file from it.

In [None]:
from urllib.request import urlretrieve
from zipfile import ZipFile
url = "https://www.ssc.wisc.edu/~bhansen/econometrics/Econometrics%20Data.zip"
filename = "Econometrics_Data.zip"
urlretrieve(url, filename)          # download the archive into the working directory
with ZipFile(filename) as zf:       # the context manager closes the archive when done
    zf.extract("cps09mar/cps09mar.txt")
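When the notebook is re-run, the archive is downloaded again every time. A minimal sketch of a cached variant follows; `fetch_member` is a hypothetical helper, not part of the original notebook, and it assumes that an existing file at the same path is the correct one.

```python
import os
from urllib.request import urlretrieve
from zipfile import ZipFile

def fetch_member(url, zip_path, member):
    """Hypothetical helper: download zip_path from url only if it is missing,
    then extract `member` only if it has not been extracted yet.
    Returns the relative path of the extracted file."""
    if not os.path.exists(zip_path):
        urlretrieve(url, zip_path)
    if not os.path.exists(member):
        with ZipFile(zip_path) as zf:
            zf.extract(member)      # extracts into the current working directory
    return member
```

With the names used above, this would be called as `fetch_member(url, filename, "cps09mar/cps09mar.txt")`.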

Next, we import the pandas library for data manipulation and analysis and use it to read the data set. The head method then shows the first few rows of the data.

In [None]:
import pandas as pd
# The file is tab-separated and has no header row, so the column names are supplied here
df = pd.read_csv("cps09mar/cps09mar.txt", sep="\t", header=None,
                 names=["age", "female", "hisp", "education", "earnings", "hours",
                        "week", "union", "uncov", "region", "race", "marital"])
df.head()

The shape attribute gives the number of observations (rows) and the number of variables (columns).

In [None]:
df.shape

The describe method reports summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each variable in the data.

In [None]:
df.describe()

We restrict the sample to men aged 19 to 80 with 16 years of education and create a binary variable, married, equal to 1 when marital is coded 1, 2, 3, or 4 and 0 otherwise.

In [None]:
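# Men aged 19-80 with 16 years of education; .copy() avoids pandas' SettingWithCopyWarning below
df = df[(df.female == 0) & (df.age >= 19) & (df.age <= 80) & (df.education == 16)].copy()
# married = 1 if marital is coded 1, 2, 3 or 4, and 0 otherwise
df['married'] = df.marital.isin([1, 2, 3, 4]).astype(int)
df.shape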
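The same sample restriction and coding can be checked on a tiny made-up DataFrame. The sketch below uses hypothetical toy values (not CPS data) with the notebook's column names. It also computes the raw share married at each age, the empirical proportions that a logit in age summarizes with a smooth curve.

```python
import pandas as pd

# Hypothetical toy data with the CPS column names; all values are made up
toy = pd.DataFrame({
    "age":       [18, 25, 25, 25, 40, 40, 85],
    "female":    [0,  0,  0,  1,  0,  0,  0],
    "education": [16, 16, 16, 16, 16, 12, 16],
    "marital":   [7,  1,  7,  2,  4,  1,  1],
})

# Men aged 19-80 (between() is inclusive at both ends) with 16 years of education
sub = toy[(toy.female == 0) & toy.age.between(19, 80) & (toy.education == 16)].copy()

# isin() is equivalent to (marital == 1) | (marital == 2) | (marital == 3) | (marital == 4)
sub["married"] = sub.marital.isin([1, 2, 3, 4]).astype(int)

# Share married at each age in the toy subsample
share_by_age = sub.groupby("age")["married"].mean()
print(share_by_age)
```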

## Logit with Maximum Likelihood Estimation
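The following loop estimates the intercept a and the slope b on age in the logit model P(married = 1 | age) = 1/(1 + exp(-(a + b·age))) by Newton-Raphson. Each pass updates a and b using the first and second derivatives of the log likelihood, and the loop stops once a, b, and the log likelihood all change by less than 0.00001. The log likelihood and derivative expressions are written out explicitly for this two-parameter case.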
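The expressions coded in the loop can be derived as follows (our reconstruction, consistent with the code). Let $y_i \in \{0,1\}$ be married, $x_i$ be age, $s_i = 2y_i - 1 \in \{-1, 1\}$, and $z_i = s_i(a + b x_i)$. Because $1 - \Lambda(u) = \Lambda(-u)$ for $\Lambda(u) = 1/(1+e^{-u})$, each observation contributes $\log \Lambda(z_i)$, so

$$\ell(a,b) = \sum_i \log \frac{1}{1 + e^{-z_i}}$$

$$\ell_a = \frac{\partial \ell}{\partial a} = \sum_i \frac{s_i}{1 + e^{z_i}}, \qquad \ell_b = \frac{\partial \ell}{\partial b} = \sum_i \frac{s_i x_i}{1 + e^{z_i}}$$

Using $s_i^2 = 1$, the second derivatives are

$$\ell_{aa} = -\sum_i \frac{e^{z_i}}{(1 + e^{z_i})^2}, \qquad \ell_{bb} = -\sum_i \frac{e^{z_i} x_i^2}{(1 + e^{z_i})^2}, \qquad \ell_{ab} = -\sum_i \frac{e^{z_i} x_i}{(1 + e^{z_i})^2}$$

The Newton-Raphson step $\theta_{\text{new}} = \theta - H^{-1} g$, with the $2 \times 2$ Hessian inverted explicitly, gives

$$a_{\text{new}} = a - \frac{\ell_{bb}\,\ell_a - \ell_{ab}\,\ell_b}{\ell_{aa}\,\ell_{bb} - \ell_{ab}^2}, \qquad b_{\text{new}} = b - \frac{\ell_{aa}\,\ell_b - \ell_{ab}\,\ell_a}{\ell_{aa}\,\ell_{bb} - \ell_{ab}^2}$$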

In [None]:
it = 0                          # iteration counter (avoids shadowing the built-in iter)
a = df['married'].mean()        # starting value for the intercept
#a = 0                          # alternative starting value
b = 0                           # starting value for the slope on age
# z_i = (a + b*age_i) * (2*married_i - 1); each observation adds log(1/(1 + exp(-z_i)))
df['z'] = pd.eval('(a + b * df.age) * (2 * df.married - 1)')
LL = pd.eval('log(1 /(1 + exp(-df.z)))').sum()
while True:
    print('Iteration ', it, ': a = ', a, ', b = ', b, ', Log Likelihood = ', LL)
    # First derivatives (gradient) of the log likelihood
    dLL_da = pd.eval('(2 * df.married - 1) /(1 + exp(df.z))').sum()
    dLL_db = pd.eval('(2 * df.married - 1) * df.age /(1 + exp(df.z))').sum()
    # Second derivatives (Hessian)
    d2LL_da2 = pd.eval('- exp(df.z) /(1 + exp(df.z)) ** 2').sum()
    d2LL_db2 = pd.eval('- exp(df.z) * df.age ** 2 /(1 + exp(df.z)) ** 2').sum()
    d2LL_dadb = pd.eval('- exp(df.z) * df.age /(1 + exp(df.z)) ** 2').sum()
    # Newton-Raphson step [a, b] - H^{-1} g, with the 2x2 inverse written out
    det = d2LL_da2 * d2LL_db2 - d2LL_dadb ** 2
    newa = a - (d2LL_db2 * dLL_da - d2LL_dadb * dLL_db) / det
    newb = b - (d2LL_da2 * dLL_db - d2LL_dadb * dLL_da) / det
    df['z'] = pd.eval('(newa + newb * df.age) * (2 * df.married - 1)')
    newLL = pd.eval('log(1 /(1 + exp(-df.z)))').sum()
    # Converged when a, b and the log likelihood all change by less than 0.00001
    converged = (abs(newa - a) < 0.00001) and (abs(newb - b) < 0.00001) and (abs(newLL - LL) < 0.00001)
    a, b, LL = newa, newb, newLL
    it = it + 1
    if converged:
        break
df = df.drop('z', axis=1)       # remove the helper column
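The loop above writes the two-parameter derivatives out by hand. As a cross-check, here is a minimal sketch of the same Newton-Raphson iteration in matrix form, which works for any number of regressors. `logit_newton` is a hypothetical name, and the tolerance, iteration cap, and zero starting values are our assumptions. In this form the score is $X'(y - p)$ and the Hessian is $-X' \operatorname{diag}(p(1-p)) X$, which equal the summed expressions in the loop.

```python
import numpy as np

def logit_newton(X, y, tol=1e-10, max_iter=100):
    """Sketch of Newton-Raphson for the logit MLE in matrix form.
    X: (n, k) regressors including a column of ones; y: (n,) array of 0/1."""
    beta = np.zeros(X.shape[1])                    # assumed starting values
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted P(y = 1 | x)
        grad = X.T @ (y - p)                       # score vector
        hess = -(X * (p * (1 - p))[:, None]).T @ X # Hessian (negative definite)
        step = np.linalg.solve(hess, grad)         # H^{-1} g without forming the inverse
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Check on a binary regressor, where the MLE equals the log odds in each cell
x = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
y = np.array([1, 0, 0, 0, 1, 1, 1, 0, 0], dtype=float)
X = np.column_stack([np.ones_like(x), x])
beta = logit_newton(X, y)
print(beta)   # intercept = log(0.25/0.75), intercept + slope = log(0.6/0.4)
```

Applied to the notebook's data as `logit_newton(np.column_stack([np.ones(len(df)), df['age']]), df['married'].to_numpy(dtype=float))`, it should agree with a and b from the loop up to the convergence tolerance.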