# **Econometrics 2**

### PC-Lab Session 3: ML and GMM Estimation

**Author:** [Anthony Strittmatter](http://www.anthonystrittmatter.com)

We evaluate a voter mobilization experiment. The data contain 473,857 registered voters from Iowa and Michigan. A randomly selected group of 14,599 voters received get-out-the-vote phone calls before the 2002 midterm elections, but only 6,088 of them answered the call.


We investigate the effect of answering the get-out-the-vote (GOTV) call on the probability of voting. We use the data set "*gotv.csv*", which contains the following variables:

- *vote02*: Voted in 2002
- *call*: Received a GOTV call
- *contact*: Answered a GOTV call
- *vote00*: Voted in 2000
- *vote98*: Voted in 1998
- *female*: Female dummy
- *newreg*: First-time voter
- *age*: Age (in years)


# Preparation of Data and Packages

## Load Packages

```r
########################  Load Packages  ########################

# List of required packages
pkgs <- c('psych', 'ggplot2', 'dplyr', 'corrplot', 'gmm', 'sandwich', 'lmtest', 'sem')

# Install missing packages, then load all of them
for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
    library(pkg, character.only = TRUE)
}

set.seed(1001) # set the seed of the random number generator for reproducibility

print('All packages successfully installed and loaded.')
```


## Load Data

```r
############## Load Data ##############
# Set the working directory to the folder that contains gotv.csv (adjust to your machine)
# setwd("C:/Users/user/Dropbox/Emetrics2/Exercise 3")

# Load data frame
df <- read.csv("gotv.csv", header = TRUE, sep = ",")

# Outcome: voted in 2002
vote02 <- as.matrix(df[, "vote02"])

# Called voter by phone (randomised)
call <- as.matrix(df[, "call"])
# Reached voter by phone (answered the call)
contact <- as.matrix(df[, "contact"])

# Covariates
covariates <- as.matrix(df[, c("vote00", "vote98", "female", "newreg", "age")])

print('Data is loaded.')
```


# **Exercises:**

## Descriptive Statistics

Report the descriptive statistics of the variables in the data set.

```r
############## Descriptive Statistics ##############

# Use the describe() function


#####################################################
```

## Correlation Matrix

Report the correlations between the variables in the data set.

```r
############## Correlation ##############

# Use the cor() and corrplot() functions


#########################################
```

# Linear OLS Model

Estimate the effect of *contact* on *vote02* using a univariate OLS regression. Compute both homoskedasticity-only and heteroskedasticity-robust standard errors. Interpret the estimated coefficients.

```r
############## Univariate OLS estimator ##############

# Use the lm() and coeftest( ,vcov. = vcovHC) functions


######################################################
```
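A minimal sketch of what the two kinds of standard errors compute, using simulated data rather than *gotv.csv* (the data-generating process and the names `y` and `d` are hypothetical stand-ins for *vote02* and *contact*). It computes the homoskedastic variance $s^2(X'X)^{-1}$ and the White sandwich $(X'X)^{-1}\left(\sum_i \hat u_i^2 x_i x_i'\right)(X'X)^{-1}$ by hand. Note that `vcovHC()` from the sandwich package uses the HC3 small-sample correction by default. This sketch uses the uncorrected HC0 version, so its robust numbers will differ slightly from the default.

```r
# Simulated stand-in data (hypothetical): binary treatment d, binary outcome y
set.seed(1001)
n <- 5000
d <- rbinom(n, 1, 0.3)
y <- rbinom(n, 1, 0.4 + 0.1 * d)

fit <- lm(y ~ d)
X <- model.matrix(fit)          # n x k design matrix (intercept, d)
e <- residuals(fit)
k <- ncol(X)
XtX_inv <- solve(crossprod(X))  # (X'X)^{-1}

# Homoskedastic variance: s^2 (X'X)^{-1}
s2 <- sum(e^2) / (n - k)
V_homo <- s2 * XtX_inv

# Heteroskedasticity-robust HC0 sandwich: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}
meat <- crossprod(X * e)        # row i of X is scaled by e_i before the cross product
V_hc0 <- XtX_inv %*% meat %*% XtX_inv

se <- cbind(homoskedastic = sqrt(diag(V_homo)), HC0 = sqrt(diag(V_hc0)))
print(se)
```

Because *vote02* is binary, the error of the linear probability model has conditional variance $p(x)(1-p(x))$, which depends on the regressors. Heteroskedasticity is therefore present by construction.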

Run a multivariate OLS regression controlling for *vote00*, *vote98*, *female*, *newreg*, and *age*. How does the effect of *contact* on *vote02* change?

```r
############## Multivariate OLS estimator ##############


########################################################
```

# Maximum Likelihood (ML)

## Probit

Estimate the effect of *contact* on *vote02* using a Probit model without control variables, based on the latent model
\begin{equation*}
vote02^* =\beta_0+\beta_1 contact + u,
\end{equation*}
where $vote02 = 1$ if $vote02^* > 0$ and $u \sim N(0,1)$.

```r
############## Probit without Covariates ##############

# Use the glm() function with family = binomial(link = "probit")


#######################################################
```
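This sketch illustrates what the Probit fit maximises. It maximises the log-likelihood $\sum_i [y_i \log\Phi(x_i'\beta) + (1-y_i)\log(1-\Phi(x_i'\beta))]$ directly with `optim()` and compares the result with `glm()`, which reaches the same maximum via iteratively reweighted least squares. The data are simulated from the latent model with hypothetical parameter values, and `y` and `d` stand in for *vote02* and *contact*.

```r
# Simulated stand-in data (hypothetical), generated from the latent model
set.seed(1001)
n <- 5000
d <- rbinom(n, 1, 0.3)
y <- as.integer(-0.2 + 0.3 * d + rnorm(n) > 0)   # y = 1{y* > 0}, u ~ N(0, 1)

X <- cbind(1, d)

# Probit log-likelihood: sum_i [ y_i log Phi(x_i'b) + (1 - y_i) log(1 - Phi(x_i'b)) ]
probit_loglik <- function(b) {
  xb <- drop(X %*% b)
  sum(y * pnorm(xb, log.p = TRUE) +
      (1 - y) * pnorm(xb, lower.tail = FALSE, log.p = TRUE))
}

# optim() minimises by default; fnscale = -1 turns it into a maximiser
ml <- optim(c(0, 0), probit_loglik, method = "BFGS", control = list(fnscale = -1))

# Compare with the glm() implementation
fit_glm <- glm(y ~ d, family = binomial(link = "probit"))
print(cbind(optim = ml$par, glm = coef(fit_glm)))
```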

Estimate the average marginal effect
\begin{equation*}
AME = \frac{1}{N} \sum_{i=1}^{N} ( \Phi(\beta_0+\beta_1) - \Phi(\beta_0)).
\end{equation*}
How do the results differ from OLS?

```r
############## Average Marginal Effects ##############

# The function pnorm() returns the cdf of the normal distribution


######################################################
```
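This sketch computes the AME for the Probit model without covariates on simulated data. The parameters are hypothetical, and `y` and `d` stand in for *vote02* and *contact*. Since the summand does not depend on $i$, the average reduces to $\Phi(\beta_0+\beta_1) - \Phi(\beta_0)$ evaluated at the estimates.

```r
# Simulated stand-in data (hypothetical)
set.seed(1001)
n <- 5000
d <- rbinom(n, 1, 0.3)
y <- as.integer(-0.2 + 0.3 * d + rnorm(n) > 0)

probit <- glm(y ~ d, family = binomial(link = "probit"))
b <- coef(probit)

# AME without covariates: every summand equals Phi(b0 + b1) - Phi(b0)
ame <- unname(pnorm(b[1] + b[2]) - pnorm(b[1]))

# OLS slope of the linear probability model (= difference in means)
ols <- unname(coef(lm(y ~ d))[2])
print(c(AME = ame, OLS = ols))
```

With a single binary regressor the Probit model is saturated. Its fitted probabilities therefore equal the two group means, and the AME coincides with the OLS coefficient up to numerical precision.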

Estimate a multivariate Probit model and the AME. How do the results differ from OLS?
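Once control variables enter the latent model, the effect of switching *contact* from 0 to 1 varies across voters, so the AME has to be averaged over the sample. Write the controls of voter $i$ as a row vector $X_i$ with coefficient vector $\gamma$ (notation introduced here for illustration). The latent model and the AME then become
\begin{equation*}
vote02_i^* =\beta_0+\beta_1 contact_i + X_i\gamma + u_i,
\end{equation*}
\begin{equation*}
AME = \frac{1}{N} \sum_{i=1}^{N} \left( \Phi(\beta_0+\beta_1+X_i\gamma) - \Phi(\beta_0+X_i\gamma) \right).
\end{equation*}
For the Logit model, $\Phi$ is replaced by the logistic cdf $\Lambda(z) = 1/(1+e^{-z})$.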

```r
############## Multivariate Probit Model ##############

## Probit with Covariates


########################################################
```
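This sketch computes the AME with covariates using `pnorm()` and the estimated coefficients. The simulated data, the controls `x1` and `x2`, and all parameter values are hypothetical stand-ins for the GOTV variables. With the real data, the linear index would include *vote00*, *vote98*, *female*, *newreg*, and *age*.

```r
# Simulated stand-in data (hypothetical): treatment d and two controls
set.seed(1001)
n  <- 5000
x1 <- rbinom(n, 1, 0.5)              # e.g. a past-turnout dummy
x2 <- rnorm(n)                       # e.g. standardised age
d  <- rbinom(n, 1, plogis(-1 + x1))
y  <- as.integer(-0.5 + 0.3 * d + 0.8 * x1 + 0.2 * x2 + rnorm(n) > 0)
dat <- data.frame(y, d, x1, x2)

probit <- glm(y ~ d + x1 + x2, family = binomial(link = "probit"), data = dat)
b <- coef(probit)

# Linear index without the treatment term: b0 + X_i * gamma
index0 <- b["(Intercept)"] + b["x1"] * dat$x1 + b["x2"] * dat$x2

# AME: average over all voters of Phi(index0 + b1) - Phi(index0)
ame <- mean(pnorm(index0 + b["d"]) - pnorm(index0))
print(ame)
```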

## Logit

Estimate a multivariate Logit model and the AME. How do the results differ from Probit?

```r
############## Multivariate Logit Model ##############

# The function plogis() returns the cdf of the logistic distribution


########################################################
```
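This sketch compares Logit and Probit AMEs on the same simulated data, using the hypothetical variables `y`, `d`, `x1`, and `x2`. The helper `ame_binary()` is introduced here for illustration. It predicts each voter's probability with the treatment set to 1 and to 0 and averages the difference, which works for any binary `glm()` model. The `plogis()` lines reproduce the Logit AME by hand.

```r
# Simulated stand-in data (hypothetical)
set.seed(1001)
n  <- 5000
x1 <- rbinom(n, 1, 0.5)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(-1 + x1))
y  <- as.integer(-0.5 + 0.3 * d + 0.8 * x1 + 0.2 * x2 + rnorm(n) > 0)
dat <- data.frame(y, d, x1, x2)

# Hypothetical helper: counterfactual AME of the treatment column d
ame_binary <- function(fit, data) {
  p1 <- predict(fit, newdata = transform(data, d = 1), type = "response")
  p0 <- predict(fit, newdata = transform(data, d = 0), type = "response")
  mean(p1 - p0)
}

logit  <- glm(y ~ d + x1 + x2, family = binomial(link = "logit"),  data = dat)
probit <- glm(y ~ d + x1 + x2, family = binomial(link = "probit"), data = dat)

# Logit AME by hand with the logistic cdf plogis()
b    <- coef(logit)
idx0 <- b["(Intercept)"] + b["x1"] * dat$x1 + b["x2"] * dat$x2
ame_logit_manual <- mean(plogis(idx0 + b["d"]) - plogis(idx0))

print(c(AME_logit = ame_binary(logit, dat), AME_probit = ame_binary(probit, dat),
        coef_logit = unname(b["d"]), coef_probit = unname(coef(probit)["d"])))
```

The raw Logit coefficients are larger than the Probit ones because the standard logistic distribution has variance $\pi^2/3$ rather than 1. A common rule of thumb is a factor of about 1.6. The AMEs are measured on the probability scale and are therefore directly comparable.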

# GMM Estimator

The probability of answering the (landline) phone depends on the probability of being at home, which might be correlated with the probability of voting. For example, employed voters might be more difficult to reach by phone. At the same time, they might have a lower probability of voting, because the election takes place on a working day. Since *contact* is then negatively correlated with employment, and employment is negatively correlated with voting, the OLS estimate of the effect of *contact* would suffer from a positive omitted variable bias.
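A small simulation can make the sign of the bias concrete. In the hypothetical data-generating process below, an unobserved `employed` dummy lowers both the chance of being reached and the chance of voting, and the true effect of `contact` is fixed at 0.05. All names and numbers are assumptions chosen for illustration, not features of the GOTV data.

```r
set.seed(1001)
n <- 50000
employed <- rbinom(n, 1, 0.6)                     # unobserved confounder (hypothetical)
called   <- rbinom(n, 1, 0.5)                     # randomised call
# Employed voters are harder to reach ...
contact  <- called * rbinom(n, 1, ifelse(employed == 1, 0.3, 0.6))
# ... and less likely to vote
true_effect <- 0.05
vote     <- rbinom(n, 1, 0.55 - 0.15 * employed + true_effect * contact)

ols_short <- coef(lm(vote ~ contact))["contact"]             # employment omitted
ols_long  <- coef(lm(vote ~ contact + employed))["contact"]  # infeasible: employment observed
print(c(true = true_effect, short = unname(ols_short), long = unname(ols_long)))
```

The regression that omits `employed` should overstate the effect. The infeasible regression that controls for it should come close to 0.05.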


To address this potential endogeneity problem, we estimate a 2SLS model. The instrument is the randomised *call*. By construction, only voters who were called can answer a call, so being called has a positive effect on the probability of answering it. However, being called is unlikely to affect the probability of voting unless the call is answered (exclusion restriction).


First, investigate the strength of the instrument in the first stage, with and without covariates.

```r
############## First Stage ##############

# Univariate OLS


# Multivariate OLS


##########################################
```
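This sketch runs the first-stage check on simulated data. The names and parameters are hypothetical: `called` and `contact` stand in for *call* and *contact*, and `x` stands in for a covariate. It mimics one-sided noncompliance, where voters who were not called cannot be contacted. With a single instrument, the first-stage F statistic equals the squared t statistic of the instrument. A common rule of thumb treats an F statistic below about 10 as a sign of a weak instrument.

```r
set.seed(1001)
n <- 50000
called  <- rbinom(n, 1, 0.05)                              # randomised instrument (hypothetical share)
x       <- rnorm(n)                                        # a control variable
contact <- called * rbinom(n, 1, plogis(-0.2 + 0.5 * x))   # contact only possible if called

first_stage   <- lm(contact ~ called)        # without covariates
first_stage_x <- lm(contact ~ called + x)    # with covariates

# Single instrument: first-stage F statistic = (t statistic of called)^2
t_called <- coef(summary(first_stage))["called", "t value"]
F_called <- t_called^2
print(c(coef = unname(coef(first_stage)["called"]),
        coef_with_x = unname(coef(first_stage_x)["called"]),
        F = F_called))
```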

Replicate the OLS results using the GMM estimator. OLS is the just-identified GMM estimator that uses the regressors as their own instruments.

```r
############## GMM estimator ##############

# Replicate OLS using the gmm() function from the gmm package

############################################
```
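In the OLS case, the moment conditions are $E[x_i(y_i - x_i'\beta)] = 0$. The base-R sketch below uses simulated data with the hypothetical names `y` and `d`. It first solves the sample moments exactly, then minimises the GMM criterion $\bar g(\beta)'W\bar g(\beta)$ with $W = I$, which gives the same estimate. Assuming the formula interface of the gmm package, the corresponding call on the real data would be along the lines of `gmm(vote02 ~ contact, x = ~ contact)`.

```r
set.seed(1001)
n <- 5000
d <- rbinom(n, 1, 0.3)
y <- rbinom(n, 1, 0.4 + 0.1 * d)
X <- cbind(1, d)

# Sample moment conditions g_bar(b) = (1/n) X'(y - Xb)
g_bar <- function(b) drop(crossprod(X, y - X %*% b)) / n

# Just identified (as many moments as parameters): solve g_bar(b) = 0 exactly
b_gmm <- solve(crossprod(X), crossprod(X, y))

# Equivalently, minimise the GMM criterion g_bar(b)' W g_bar(b) with W = identity
obj   <- function(b) sum(g_bar(b)^2)
b_num <- optim(c(0, 0), obj, method = "BFGS")$par

print(cbind(closed_form = drop(b_gmm), numerical = b_num, lm = coef(lm(y ~ d))))
```

With as many moment conditions as parameters, the criterion reaches zero at the same point for any positive definite weighting matrix. This is why the GMM and OLS estimates coincide.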

Estimate the 2SLS model using the gmm package. What is the policy conclusion?

```r
############## 2SLS Estimator ##############


#############################################
```
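This sketch computes the 2SLS estimator by hand, $\hat\beta_{2SLS} = (X'P_Z X)^{-1}X'P_Z y$ with $P_Z = Z(Z'Z)^{-1}Z'$. The simulated data are hypothetical: an unobserved `employed` confounder, a randomised `called` instrument, and a true effect of 0.05. Because the model is just identified, 2SLS coincides with the Wald ratio of the reduced-form coefficient to the first-stage coefficient. Assuming the formula interface of the gmm package, the real-data analogue would be along the lines of `gmm(vote02 ~ contact, x = ~ call)`.

```r
set.seed(1001)
n <- 50000
employed <- rbinom(n, 1, 0.6)                     # unobserved confounder (hypothetical)
called   <- rbinom(n, 1, 0.5)                     # randomised instrument
contact  <- called * rbinom(n, 1, ifelse(employed == 1, 0.3, 0.6))
true_effect <- 0.05
vote     <- rbinom(n, 1, 0.55 - 0.15 * employed + true_effect * contact)

X <- cbind(1, contact)   # regressors (contact is endogenous)
Z <- cbind(1, called)    # instruments

# First-stage fitted values: P_Z X = Z (Z'Z)^{-1} Z'X
X_hat  <- Z %*% solve(crossprod(Z), crossprod(Z, X))
# 2SLS: (X' P_Z X)^{-1} X' P_Z y
b_2sls <- solve(crossprod(X_hat, X), crossprod(X_hat, vote))

# Just identified: 2SLS equals reduced form / first stage (Wald estimator)
wald <- coef(lm(vote ~ called))[2] / coef(lm(contact ~ called))[2]
ols  <- coef(lm(vote ~ contact))[2]
print(c(true = true_effect, OLS = unname(ols), TSLS = b_2sls[2], Wald = unname(wald)))
```

The by-hand version does not compute standard errors. The `gmm()` function, or `tsls()` from the sem package, reports them.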