# Why Do People Vote?

### The Study
"In August 2006 three researchers (Alan Gerber and Donald Green of Yale University, and Christopher Larimer of the University of Northern Iowa) carried out a large scale field experiment in Michigan, USA to test the hypothesis that one of the reasons people vote is social, or extrinsic, pressure." 

"The researchers grouped about 344,000 voters into different groups randomly; about 191,000 voters were a 'control' group, and the rest were categorized into one of four 'treatment' groups. These five groups correspond to five binary variables in the dataset."

(source: MITx)

### The Variables

**"Civic Duty" (variable civicduty)** group members were sent a note saying "DO YOUR CIVIC DUTY - VOTE!"

**"Hawthorne Effect" (variable hawthorne)** group members were sent the "Civic Duty" message plus the message "YOU ARE BEING STUDIED", and were told their voting behavior would be examined via public records.

**"Self" (variable self)** group members were sent the "Civic Duty" message along with the recent voting records of everyone in their household, plus a note that updated household voting records would be sent after the election.

**"Neighbors" (variable neighbors)** group members received the same message as the "Self" group, but with the voting records of their neighbors in addition to those of their own household. This was done to maximize social pressure.

**"Control" (variable control)** group members received no mailing and represent the typical voter.

Additional variables include **sex** (0 for male, 1 for female), **yob (year of birth)**, and **the dependent variable voting** (1 if they voted, 0 otherwise).

(source: MITx)
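Because the five group variables are one-hot indicators (each voter should have a 1 in exactly one of them), it can be convenient to collapse them into a single factor. The sketch below is a minimal illustration, assuming the indicators are the only columns passed in; `one_hot_to_factor` is a hypothetical helper, not part of the original analysis, and the toy rows are the first ten values shown by `str(gerber)` in the Exploratory Data Analysis section.

```r
# First ten rows of the five group indicators, copied from the str(gerber) output
groups <- data.frame(
  hawthorne = c(0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  civicduty = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  neighbors = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  self      = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  control   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Hypothetical helper: collapse one-hot group indicators into one factor
one_hot_to_factor <- function(df) {
  # Every voter must be assigned to exactly one group
  stopifnot(all(rowSums(df) == 1))
  # max.col() returns, for each row, the index of the column holding the 1
  idx <- max.col(as.matrix(df), ties.method = "first")
  factor(colnames(df)[idx], levels = colnames(df))
}

group <- one_hot_to_factor(groups)
print(table(group))
```

On the full dataset, the same call could presumably be applied to `gerber[, 4:8]`; the `stopifnot` check would then confirm that no voter was assigned to zero or several groups.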

```r
gerber = read.csv("gerber.csv")
```

### Exploratory Data Analysis

```r
str(gerber)
```

```
'data.frame':	344084 obs. of  8 variables:
 $ sex      : int  0 1 1 1 0 1 0 0 1 0 ...
 $ yob      : int  1941 1947 1982 1950 1951 1959 1956 1981 1968 1967 ...
 $ voting   : int  0 0 1 1 1 1 1 0 0 0 ...
 $ hawthorne: int  0 0 1 1 1 0 0 0 0 0 ...
 $ civicduty: int  1 1 0 0 0 0 0 0 0 0 ...
 $ neighbors: int  0 0 0 0 0 0 0 0 0 0 ...
 $ self     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ control  : int  0 0 0 0 0 1 1 1 1 1 ...
```


```r
summary(gerber)
```

```
      sex              yob           voting         hawthorne    
 Min.   :0.0000   Min.   :1900   Min.   :0.0000   Min.   :0.000  
 1st Qu.:0.0000   1st Qu.:1947   1st Qu.:0.0000   1st Qu.:0.000  
 Median :0.0000   Median :1956   Median :0.0000   Median :0.000  
 Mean   :0.4993   Mean   :1956   Mean   :0.3159   Mean   :0.111  
 3rd Qu.:1.0000   3rd Qu.:1965   3rd Qu.:1.0000   3rd Qu.:0.000  
 Max.   :1.0000   Max.   :1986   Max.   :1.0000   Max.   :1.000  
   civicduty        neighbors          self           control      
 Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.0000   Median :0.000   Median :0.0000   Median :1.0000  
 Mean   :0.1111   Mean   :0.111   Mean   :0.1111   Mean   :0.5558  
 3rd Qu.:0.0000   3rd Qu.:0.000   3rd Qu.:0.0000   3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :1.000   Max.   :1.0000   Max.   :1.0000  
```

Only 31.59% of individuals in the data set voted.
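The 31.59% figure comes straight from the `Mean` row of `summary()`: because `voting` is coded 0/1, its mean is the share of 1s. A minimal sketch of that arithmetic, using only the first ten `voting` values shown by `str(gerber)` as toy data:

```r
# First ten values of `voting`, copied from the str(gerber) output
voting <- c(0, 0, 1, 1, 1, 1, 1, 0, 0, 0)

# For a 0/1 variable, mean() equals (number of 1s) / (number of rows)
turnout_pct <- round(100 * mean(voting), 2)
stopifnot(turnout_pct == round(100 * sum(voting) / length(voting), 2))
print(turnout_pct)
```

Applied to the full column, `100 * mean(gerber$voting)` is the same quantity that `summary()` reports (rounded) as `Mean :0.3159`.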

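```r
voter_percent = function(df, x, y, table_cell) {
    # Returns one cell of the row-proportion table of df[[x]] by df[[y]],
    # as a percentage rounded to two decimals.
    #
    # df:         a data frame
    # x, y:       column names (strings); both columns are 0/1 variables
    # table_cell: linear index (1 to 4) of the cell of interest in the
    #             2 x 2 table. R stores tables column by column, so cell 4
    #             is row x = 1, column y = 1: the share of group members
    #             (x == 1) who voted (y == 1).

    base_table = table(df[[x]], df[[y]])
    # margin = 1 divides each count by its row total, so each row sums to 1
    cross_table = prop.table(base_table, margin = 1)
    return(round(cross_table[[table_cell]] * 100, 2))
}
```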
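To see why `table_cell = 4` picks out turnout within a group, the sketch below builds the same row-proportion table on toy data (the first ten `control` and `voting` values from `str(gerber)`) and compares three equivalent lookups: the linear index used by `voter_percent`, indexing by the dimnames `"1"`, and the group mean of `voting`. This is an illustration under those toy-data assumptions, not part of the original notebook.

```r
# First ten values of `control` and `voting`, copied from the str(gerber) output
df <- data.frame(
  control = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  voting  = c(0, 0, 1, 1, 1, 1, 1, 0, 0, 0)
)

base_table  <- table(df$control, df$voting)        # rows: control 0/1, cols: voting 0/1
cross_table <- prop.table(base_table, margin = 1)  # each row now sums to 1

# Linear cell 4 of a 2 x 2 table (column-major) is row "1", column "1"
cell4   <- cross_table[[4]]
by_name <- cross_table["1", "1"]
# The same number is simply turnout among members of the group
by_mean <- mean(df$voting[df$control == 1])

print(c(cell4 = cell4, by_name = by_name, by_mean = by_mean))
```

Indexing by name (`cross_table["1", "1"]`) is arguably more robust than a linear index, since it states the row and column explicitly.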

```r
columns = colnames(gerber)[4:8]  # hawthorne, civicduty, neighbors, self, control
for (i in columns) {
    print(paste0(voter_percent(gerber, i, "voting", 4), "% of the ", i, " group voted."))
}
```

```
[1] "32.24% of the hawthorne group voted."
[1] "31.45% of the civicduty group voted."
[1] "37.79% of the neighbors group voted."
[1] "34.52% of the self group voted."
[1] "29.66% of the control group voted."
```

The "neighbors" group had the highest turnout (37.79%), about 8 percentage points above the control group (29.66%), followed by the "self" group (34.52%). This suggests that heightened social pressure may increase one's propensity to vote.
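The comparison with the control group can be made explicit by subtracting the control turnout from each treatment group's turnout. This sketch simply reuses the five percentages printed by the `voter_percent()` loop; no new data is involved.

```r
# Turnout percentages copied from the voter_percent() output
turnout <- c(hawthorne = 32.24, civicduty = 31.45, neighbors = 37.79,
             self = 34.52, control = 29.66)

# Percentage-point difference between each treatment group and the control group
lift <- round(turnout[names(turnout) != "control"] - turnout[["control"]], 2)

# Largest effect first
ranked <- sort(lift, decreasing = TRUE)
print(ranked)
```

Every treatment group votes at a higher rate than the control group, and the ordering runs neighbors, self, hawthorne, civicduty.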

```r
logit1 = glm(voting ~ civicduty + hawthorne + self + neighbors, data=gerber, family='binomial')
```

```r
summary(logit1)
```

```
Call:
glm(formula = voting ~ civicduty + hawthorne + self + neighbors, 
    family = "binomial", data = gerber)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9744  -0.8691  -0.8389   1.4586   1.5590  

Coefficients:
             Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -0.863358   0.005006 -172.459  < 2e-16 ***
civicduty    0.084368   0.012100    6.972 3.12e-12 ***
hawthorne    0.120477   0.012037   10.009  < 2e-16 ***
self         0.222937   0.011867   18.786  < 2e-16 ***
neighbors    0.365092   0.011679   31.260  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 429238  on 344083  degrees of freedom
Residual deviance: 428090  on 344079  degrees of freedom
AIC: 428100

Number of Fisher Scoring iterations: 4
```

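We fit a logistic regression model with voting as the dependent variable and the four treatment indicators (civicduty, hawthorne, self, neighbors) as independent variables. Because every voter belongs to exactly one of the five groups, the control indicator is left out and serves as the baseline: each coefficient is the change in the log-odds of voting for that group relative to the control group. All four coefficients are positive and highly significant (every p-value is below 1e-11), so each treatment is associated with higher turnout than the control group. The effects differ in size, however: the neighbors coefficient (0.365) is more than four times the civicduty coefficient (0.084), and the coefficients follow the same order as the group turnout percentages (neighbors > self > hawthorne > civicduty).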
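Log-odds coefficients are easier to read once converted back to probabilities and odds ratios. The sketch below does this by hand from the estimates printed by `summary(logit1)`; it uses base R's `plogis()` (the inverse logit) and `exp()`, and the vector name `b` is just an illustrative choice.

```r
# Coefficient estimates copied from summary(logit1)
b <- c(intercept = -0.863358, civicduty = 0.084368, hawthorne = 0.120477,
       self = 0.222937, neighbors = 0.365092)

# Control is the baseline, so its log-odds is the intercept alone;
# each treatment group's log-odds is intercept + its own coefficient
log_odds <- c(control = b[["intercept"]], b[["intercept"]] + b[-1])

# plogis(x) = 1 / (1 + exp(-x)) turns log-odds into a probability of voting
probs <- round(plogis(log_odds), 4)

# exp(coefficient) = odds of voting in that group relative to the control group
odds_ratios <- round(exp(b[-1]), 3)

print(probs)
print(odds_ratios)
```

The fitted probabilities reproduce the per-group turnout percentages computed earlier (about 0.297 for control and 0.378 for neighbors). This is expected: with one indicator per group and control as the baseline, the model has one free parameter per group, so its fitted probabilities equal the observed group means. The neighbors odds ratio of roughly 1.44 means the odds of voting were about 44% higher in that group than in the control group.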