### Problem Set 3:
   Tom Curran
   
   MAC30100 Winter 2018
   
   January 24, 2018
#### Question 2

Linear regression and MLE (4 points). You can do maximum likelihood estimation as a way to estimate parameters in regression analysis. Assume the following linear regression model for determining what affects the number of weeks that an individual $i$ is sick during the year ($sick_i$).

$$
sick_{i} = \beta_{0} + \beta_{1}age_i + \beta_2children_i + \beta_3tempwinter_i + \epsilon_i, \quad \text{where } \epsilon_i \sim N(0, \sigma^2)
$$

The parameters $(\beta_0, \beta_1, \beta_2, \beta_3, \sigma^2)$ are the parameters of the model that we want to estimate. The variable $age_i$ gives the age of individual $i$ at the end of 2016 (including fractions of a year). The variable $children_i$ states how many children individual $i$ had at the end of 2016. The variable $tempwinter_i$ is the average temperature during the months of January, February, and December 2016 for individual $i$. The data for this model are in the file sick.txt, which contains comma-separated values of 200 individuals for four variables ($sick_i$, $age_i$, $children_i$, $tempwinter_i$) with variable labels in the first row.
***

Estimate the parameters of the model $(\beta_0, \beta_1, \beta_2, \beta_3)$ by GMM by solving the minimization problem of the GMM criterion function. Use the identity matrix as the estimator for the optimal weighting matrix. Treat each of the 200 values of the variable $sick_i$ as your data moments $m(x_i)$ (200 data moments). Treat the predicted or expected sick values from your model as your model moments (200 model moments),

$$
m(x_{i} | \beta_{0}, \beta_{1}, \beta_{2}, \beta_{3}) = \beta_{0} + \beta_{1}age_i + \beta_2children_i + \beta_3tempwinter_i
$$


where $x_i$ is shorthand for the data. Let the error function of the moments be the simple difference (not percent difference) of the data moments from the model moments:

$$
e(x_{i} | \beta_{0}, \beta_{1}, \beta_{2}, \beta_{3}) = sick_i - \beta_{0} - \beta_{1}age_i - \beta_2children_i - \beta_3tempwinter_i = \epsilon_{i}
$$

Use these error functions in your criterion function to estimate the model parameters $(\beta_0, \beta_1, \beta_2, \beta_3)$ by GMM. This is a more general version of what OLS does: it minimizes the distance between the model moments and the data moments, which here is the sum of squared error terms. Report your estimates and report the value of your GMM criterion function. In this case, the GMM criterion function value evaluated at the optimal parameter values is simply the sum of squared errors.
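
Written out with the identity weighting matrix $W = I_{200}$, the criterion reduces to the sum of squared errors. Here $e(x|\beta)$ denotes the stacked $200 \times 1$ vector of the individual errors $e(x_i|\beta)$, a notation introduced only for this sketch:

$$
\hat{\beta}_{GMM} = \arg\min_{\beta} \; e(x|\beta)^{T} \, W \, e(x|\beta), \qquad e(x|\beta)^{T} \, I_{200} \, e(x|\beta) = \sum_{i=1}^{200} e(x_i|\beta)^2
$$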

In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt
import scipy.stats as sts
import statsmodels.formula.api as smf
%matplotlib inline

sickdf = pd.read_csv("sick.txt")
sickdf.info()
sickdf.describe()
sick_ols = smf.ols('sick ~ age + children + avgtemp_winter', data=sickdf).fit().summary()
sick_ols

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
sick              200 non-null float64
age               200 non-null float64
children          200 non-null float64
avgtemp_winter    200 non-null float64
dtypes: float64(4)
memory usage: 6.3 KB


| | | | |
|---|---|---|---|
| Dep. Variable: | sick | R-squared: | 1.0 |
| Model: | OLS | Adj. R-squared: | 1.0 |
| Method: | Least Squares | F-statistic: | 1815000.0 |
| Date: | Wed, 24 Jan 2018 | Prob (F-statistic): | 0.0 |
| Time: | 01:58:49 | Log-Likelihood: | 876.87 |
| No. Observations: | 200 | AIC: | -1746.0 |
| Df Residuals: | 196 | BIC: | -1733.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.2516 | 0.001 | 254.032 | 0.000 | 0.250 | 0.254 |
| age | 0.0129 | 6.49e-05 | 199.257 | 0.000 | 0.013 | 0.013 |
| children | 0.4005 | 0.001 | 643.790 | 0.000 | 0.399 | 0.402 |
| avgtemp_winter | -0.0100 | 4.51e-05 | -221.388 | 0.000 | -0.010 | -0.010 |

| | | | |
|---|---|---|---|
| Omnibus: | 24.095 | Durbin-Watson: | 1.997 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 7.115 |
| Skew: | -0.002 | Prob(JB): | 0.0285 |
| Kurtosis: | 2.076 | Cond. No. | 290.0 |


In [19]:
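def reg_moments(coef, variables):
    # Model moments: predicted weeks sick for each individual.
    # The fifth entry of coef (the `simple` flag) is not used here.
    b0, b1, b2, b3, _ = coef
    age, children, wintertemp = variables
    pred_sick = b0 + b1 * age + b2 * children + b3 * wintertemp
    return pred_sick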

In [20]:
def reg_errors(coef, variables):
    # The last entry of coef selects the error type: a truthy (nonzero)
    # value gives the simple difference, zero gives the percent difference.
    simple = coef[-1]

    actual_sick, age, children, wintertemp = variables
    pass_to_reg = age, children, wintertemp

    data_moments = actual_sick
    model_moments = reg_moments(coef, pass_to_reg)

    if simple:
        errorVector = model_moments - data_moments
    else:
        errorVector = (model_moments - data_moments) / data_moments

    return errorVector

In [27]:
def reg_criteria(params, *args):
    # GMM criterion e' W e, where e is the vector of moment errors
    actual_sick, age, children, wintertemp, w = args
    variables = actual_sick, age, children, wintertemp
    error = reg_errors(params, variables)
    crit_val = np.dot(np.dot(error.T, w), error)
    return crit_val

In [29]:
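# Guess initial parameter values for the beta coefficients

init_b0 = .5
init_b1 = .1
init_b2 = .25
init_b3 = .01
# NOTE: `simple` is appended to the parameter vector below, where False is
# cast to 0.0, so opt.minimize treats it as a fifth free parameter.
simple = False

sickvals = sickdf.sick
age = sickdf.age
children = sickdf.children
wintertemp = sickdf.avgtemp_winter

init_params = np.array([init_b0, init_b1, init_b2, init_b3, simple])

# Identity weighting matrix, one row and column per observation
w_hat = np.eye(200)

gmm_reg_args = sickvals, age, children, wintertemp, w_hat

# Defined but not passed to opt.minimize, so the search is unbounded
bounds = ((None, None), (None, None), (None, None), (None, None))

results_reg = opt.minimize(reg_criteria, init_params, args=gmm_reg_args)

results_reg.x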

array([  2.51640676e-01,   1.29335332e-02,   4.00500811e-01,
        -9.99166694e-03,   5.12291907e-05])
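
The fifth entry of `results_reg.x` above is the `simple` flag after optimization, because the flag was packed into `init_params`. The following is a minimal sketch of why that matters, relying only on NumPy casting and Python truthiness; the numeric values are copied from the cells above, and the names `flag_dtype`, `flag_at_start`, and `flag_at_end` are introduced here for illustration.

```python
import numpy as np

# Same construction as init_params above: a bool mixed with floats
init_params = np.array([0.5, 0.1, 0.25, 0.01, False])

flag_dtype = init_params.dtype           # the whole array is stored as float64
flag_at_start = bool(init_params[-1])    # False becomes 0.0, which is falsy
flag_at_end = bool(5.12291907e-05)       # any nonzero float is truthy

print(flag_dtype, flag_at_start, flag_at_end)
```

As a result, `reg_errors` takes the percent-difference branch at the initial guess and the simple-difference branch at any point where the fifth entry is nonzero.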

In [32]:
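# The fifth entry of results_reg.x is the optimized `simple` entry of the
# parameter vector, not the criterion value. The minimized criterion value
# is stored in results_reg.fun, which was not printed in this run.
gmm_b0, gmm_b1, gmm_b2, gmm_b3, simple_hat = results_reg.x

print("Estimates of Model Parameters: ")
print("Beta 0: ", gmm_b0)
print("Beta 1: ", gmm_b1)
print("Beta 2: ", gmm_b2)
print("Beta 3: ", gmm_b3)
print("Fifth optimizer parameter (simple flag): ", simple_hat)

Estimates of Model Parameters: 
Beta 0:  0.251640676039
Beta 1:  0.0129335332464
Beta 2:  0.400500810529
Beta 3:  -0.00999166694281
Fifth optimizer parameter (simple flag):  5.12291907469e-05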
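
Because `simple` is the fifth entry of `init_params` in the cells above, `opt.minimize` treats it as a free parameter, and the fifth entry of `results_reg.x` is its optimized value rather than the criterion value. One possible way to avoid this is sketched below: the flag is passed through `args`, only the four betas are optimized, and the criterion value is read from the `fun` attribute of the result. This is a sketch under stated assumptions, not the notebook's method: the data are synthetic stand-ins for sick.txt with standardized regressors, and the names `err_vec`, `criterion`, and `true_beta` are hypothetical.

```python
import numpy as np
import scipy.optimize as opt

# Synthetic stand-in for sick.txt (assumption: the real file is not read here)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 regressors
true_beta = np.array([0.25, 0.013, 0.4, -0.01])             # hypothetical values
y = X @ true_beta + rng.normal(scale=0.1, size=n)

def err_vec(beta, y, X, simple=True):
    """Moment errors: data moments minus model moments (optionally in percent)."""
    diff = y - X @ beta
    return diff if simple else diff / y

def criterion(beta, y, X, W, simple):
    """GMM criterion e' W e; `simple` arrives through args, not through beta."""
    e = err_vec(beta, y, X, simple)
    return e @ W @ e

W = np.eye(n)  # identity weighting matrix
res = opt.minimize(criterion, np.zeros(4), args=(y, X, W, True))

# Least-squares benchmark for comparison
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
sse_ols = np.sum((y - X @ beta_ols) ** 2)

print("GMM estimates:", res.x)
print("Criterion value at the optimum (res.fun):", res.fun)
print("OLS estimates:", beta_ols)
print("OLS sum of squared errors:", sse_ols)
```

With the identity weighting matrix and simple differences, the GMM estimates and `res.fun` should agree with the least-squares coefficients and sum of squared errors up to optimizer tolerance.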
