Regression with Binary Variables
===
---
* Linear Probability Model
* Probit Regression
* Logit Regression
---

In [1]:
library(tidyverse)

─ Attaching packages ──────────────────── tidyverse 1.2.1 ─
✔ ggplot2 2.2.1     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.4.0
✔ readr   1.1.1     ✔ forcats 0.3.0
“package ‘stringr’ was built under R version 3.5.2”
─ Conflicts ───────────────────── tidyverse_conflicts() ─
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


In [2]:
charity <- read.csv("/Users/tino/Desktop/TA-Econometrics-II/datasets/0507/charity.csv")

In [4]:
library(stargazer)
library(sandwich)
library(lmtest)

Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric



## Questions
---
1. Does a person who responded to the most recent mailing have a higher probability of responding with a gift? Specify a regression model and test the hypothesis.
2. Controlling for other variables, answer question (1) again and report the results from the LPM, probit, and logit models in one table.
3. Using the results from the LPM, what is the predicted probability of responding with a gift for a person who ...
4. Calculate the predicted probability of responding with a gift for the same person considered in (3), using the probit and logit models.
5. **(PS4) For the logit model only, test the null hypothesis that black, white, and "other" respondents are equally likely to be insured (after controlling for the other variables in the model). Specify the null and alternative hypotheses and use an F test as well as an LR test. Can you reject this null hypothesis at the 5% significance level?**
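For reference, the three models in the outline differ only in how they map the linear index to a response probability. The notation below ($\beta_j$, $x_j$, $\Phi$, $\Lambda$) is standard textbook notation added here for orientation; it is not taken from the assignment itself.

$$
\begin{aligned}
\text{LPM:}\quad & P(y = 1 \mid x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
\text{Probit:}\quad & P(y = 1 \mid x) = \Phi(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) \\
\text{Logit:}\quad & P(y = 1 \mid x) = \Lambda(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k), \qquad \Lambda(z) = \frac{e^{z}}{1 + e^{z}}
\end{aligned}
$$

Here $\Phi$ is the standard normal CDF and $\Lambda$ is the logistic CDF. Only the LPM coefficients can be read directly as changes in probability. Probit and logit coefficients act on the index inside the CDF, so their magnitudes are not directly comparable with the LPM column.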

### Question 1

In [6]:
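q1 <- lm(respond ~ resplast, data = charity)
# Heteroskedasticity-robust (HC1) standard errors
cov <- vcovHC(q1, type = "HC1")
robust_se <- sqrt(diag(cov))
# One model in the table, so the se list needs exactly one element
stargazer(q1, type = "text",
          se = list(robust_se))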


                        Dependent variable:    
                    ---------------------------
                              respond          
-----------------------------------------------
resplast                     0.343***          
                              (0.015)          
                                               
Constant                     0.285***          
                              (0.009)          
                                               
-----------------------------------------------
Observations                   4,268           
R2                             0.109           
Adjusted R2                    0.109           
Residual Std. Error      0.462 (df = 4266)     
F Statistic          524.394*** (df = 1; 4266) 
Note:               *p<0.1; **p<0.05; ***p<0.01


In [9]:
t.test(charity$respond ~ charity$resplast, var.equal = TRUE)


	Two Sample t-test

data:  charity$respond by charity$resplast
t = -22.9, df = 4266, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3728561 -0.3140478
sample estimates:
mean in group 0 mean in group 1 
      0.2849595       0.6284115 


### Question 2

In [17]:
LPM <- lm(respond ~ resplast + weekslast + propresp + mailsyear + giftlast, data = charity)
probit <- glm(respond ~ resplast + weekslast + propresp + mailsyear + giftlast, family = binomial(link = "probit"), data = charity)
logit <- glm(respond ~ resplast + weekslast + propresp + mailsyear + giftlast, family = binomial(link = "logit"), data = charity)

cov2 <- vcovHC(LPM, type = "HC1")
robust_se2 <- sqrt(diag(cov2))

stargazer(LPM, probit, logit, type = "text", title = "Binary Dependent Variable", se = list(robust_se2, NULL, NULL)) 


Binary Dependent Variable
                                  Dependent variable:              
                    -----------------------------------------------
                                        respond                    
                               OLS              probit    logistic 
                               (1)               (2)        (3)    
-------------------------------------------------------------------
resplast                    0.067***           0.127**    0.192**  
                             (0.021)           (0.057)    (0.094)  
                                                                   
weekslast                   -0.001***         -0.005***  -0.008*** 
                            (0.0002)           (0.001)    (0.001)  
                                                                   
propresp                    0.650***           1.848***   3.034*** 
                             (0.039)           (0.114)    (0.191)  
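Questions 3 and 4 ask for predicted probabilities from the three models. Because the person's characteristics are not given here, the sketch below uses simulated data and a made-up person. All names (`sim`, `x1`, `x2`, `person`) and values are hypothetical, and it only shows one possible way to get the predictions with `predict()` and by hand.

```r
# Simulated data with hypothetical names (NOT the charity data set)
set.seed(42)
n  <- 2000
x1 <- rbinom(n, 1, 0.5)
x2 <- runif(n)
y  <- rbinom(n, 1, pnorm(-0.5 + 0.4 * x1 + 1.0 * x2))
sim <- data.frame(y, x1, x2)

lpm_sim    <- lm(y ~ x1 + x2, data = sim)
probit_sim <- glm(y ~ x1 + x2, family = binomial(link = "probit"), data = sim)
logit_sim  <- glm(y ~ x1 + x2, family = binomial(link = "logit"),  data = sim)

# One hypothetical person
person <- data.frame(x1 = 1, x2 = 0.5)

# LPM: the fitted value is the predicted probability
p_lpm <- predict(lpm_sim, newdata = person)
# Probit/logit: type = "response" returns probabilities, not the linear index
p_probit <- predict(probit_sim, newdata = person, type = "response")
p_logit  <- predict(logit_sim,  newdata = person, type = "response")

# By hand: linear index first, then the matching CDF
xb_probit <- sum(coef(probit_sim) * c(1, 1, 0.5))
xb_logit  <- sum(coef(logit_sim)  * c(1, 1, 0.5))
c(LPM = unname(p_lpm), probit = pnorm(xb_probit), logit = plogis(xb_logit))
```

For the actual questions, the same calls would be applied to the `LPM`, `probit`, and `logit` objects fitted above, with `newdata` holding the person's values for all five regressors.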
                     

### Question 5

In [21]:
insurance <- read.csv("/Users/tino/Desktop/TA-Econometrics-II/problem set/PS4/insurance.csv")
library(car)

In [22]:
# Wald test of the joint restriction (linearHypothesis() reports the chi-square form for a glm)
logit <- glm(insured~selfemp+age+deg_nd+deg_ged+deg_hs+deg_ba+deg_ma+deg_phd+familysz+race_bl+race_wht+male+married, family = binomial(link = "logit"), data = insurance)
linearHypothesis(logit, c("race_bl = 0", "race_wht = 0"))

Res.Df  Df     Chisq    Pr(>Chisq)
  8790
  8788   2  13.85698  0.0009794776


In [24]:
# LR test
logit <- glm(insured~selfemp+age+deg_nd+deg_ged+deg_hs+deg_ba+deg_ma+deg_phd+familysz+race_bl+race_wht+male+married, family = binomial(link = "logit"), data = insurance)
logit_H0 <- glm(insured~selfemp+age+deg_nd+deg_ged+deg_hs+deg_ba+deg_ma+deg_phd+familysz+male+married, family = binomial(link = "logit"), data = insurance)
lrtest(logit_H0, logit)

#Df     LogLik  Df     Chisq   Pr(>Chisq)
 12  -3785.859
 14  -3779.132   2  13.45389  0.001198189
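The LR statistic can be reproduced by hand from the two log-likelihoods reported by `lrtest()`: it is twice the gap between the unrestricted and restricted log-likelihoods, compared against a chi-square distribution with as many degrees of freedom as there are restrictions. A short check follows; the variable names are chosen here for illustration.

```r
# Log-likelihoods copied from the lrtest() output above
loglik_restricted   <- -3785.859   # model without race_bl and race_wht
loglik_unrestricted <- -3779.132   # full model
q <- 2                             # number of restrictions under H0

lr_stat <- 2 * (loglik_unrestricted - loglik_restricted)
p_value <- pchisq(lr_stat, df = q, lower.tail = FALSE)
crit_5  <- qchisq(0.95, df = q)    # 5% critical value, about 5.99

lr_stat            # about 13.454
p_value            # about 0.0012
lr_stat > crit_5   # TRUE: reject H0 at the 5% level
```

Both the Wald and the LR test therefore reject the null that the race coefficients are jointly zero at the 5% level.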


---
## 1. Linear Probability Model

In [7]:
LPM <- lm(respond ~ resplast + weekslast + propresp + mailsyear + giftlast, data = charity)
stargazer(LPM, type = "text", title = "LPM")


LPM
                        Dependent variable:    
                    ---------------------------
                              respond          
-----------------------------------------------
resplast                     0.067***          
                              (0.019)          
                                               
weekslast                    -0.001***         
                             (0.0002)          
                                               
propresp                     0.650***          
                              (0.037)          
                                               
mailsyear                    0.052***          
                              (0.010)          
                                               
giftlast                     0.0001**          
                             (0.00004)         
                                               
Constant                       0.017           
                              (0.03

---
## 2. Probit Regression

In [9]:
probit <- glm(respond ~ resplast + weekslast + propresp + mailsyear + giftlast, family = binomial(link = "probit"), data = charity)
stargazer(probit, type = "text", title = "Probit")

“glm.fit: fitted probabilities numerically 0 or 1 occurred”


Probit
                      Dependent variable:    
                  ---------------------------
                            respond          
---------------------------------------------
resplast                    0.127**          
                            (0.057)          
                                             
weekslast                  -0.005***         
                            (0.001)          
                                             
propresp                   1.848***          
                            (0.114)          
                                             
mailsyear                  0.146***          
                            (0.032)          
                                             
giftlast                     0.001           
                            (0.001)          
                                             
Constant                   -1.296***         
                            (0.114)          
                          

---
## 3. Logit Regression

In [17]:
logit <- glm(respond ~ resplast + weekslast + propresp + mailsyear + giftlast, family = binomial(link = "logit"), data = charity)
stargazer(logit, type = "text", title = "Logit")


Logit
                      Dependent variable:    
                  ---------------------------
                            respond          
---------------------------------------------
resplast                    0.192**          
                            (0.094)          
                                             
weekslast                  -0.008***         
                            (0.001)          
                                             
propresp                   3.034***          
                            (0.191)          
                                             
mailsyear                  0.244***          
                            (0.054)          
                                             
giftlast                     0.002           
                            (0.002)          
                                             
Constant                   -2.117***         
                            (0.193)          
                           

In [15]:
stargazer(LPM, probit, logit, type = "text", title = "Binary Dependent Variable")


Binary Dependent Variable
                                  Dependent variable:              
                    -----------------------------------------------
                                        respond                    
                               OLS              probit    logistic 
                               (1)               (2)        (3)    
-------------------------------------------------------------------
resplast                    0.067***           0.127**    0.192**  
                             (0.019)           (0.057)    (0.094)  
                                                                   
weekslast                   -0.001***         -0.005***  -0.008*** 
                            (0.0002)           (0.001)    (0.001)  
                                                                   
propresp                    0.650***           1.848***   3.034*** 
                             (0.037)           (0.114)    (0.191)  
                      

In [19]:
library(car)
linearHypothesis(logit, c("resplast = 0", "giftlast = 0"))

Loading required package: carData

Attaching package: ‘car’

The following object is masked from ‘package:dplyr’:

    recode

The following object is masked from ‘package:purrr’:

    some



Res.Df  Df     Chisq  Pr(>Chisq)
  4264
  4262   2  5.804156    0.054909


In [20]:
logit <- glm(respond ~ resplast + weekslast + propresp + mailsyear + giftlast, family = binomial(link = "logit"), data = charity)
logit_H0 <- glm(respond ~ weekslast + propresp + mailsyear, family = binomial(link = "logit"), data = charity)

In [23]:
library(lmtest)
lrtest(logit, logit_H0)


#Df     LogLik  Df     Chisq   Pr(>Chisq)
  6  -2378.337
  4  -2383.267  -2  9.861033  0.007222773
