First Differences and Monte Carlo Simulations

Constanza F. Schibber 2020-10-1

1 Data: Voting Intentions in the 1988 Chilean Plebiscite
2 Model, Logit
3 Using the sim function in the arm library
4 First Differences: Sex & SQ
5 First Differences: Sex & Income

1 Data: Voting Intentions in the 1988 Chilean Plebiscite

The Chile data frame has 2700 rows and 8 columns. The data are from a national survey conducted in April and May of 1988 by FLACSO/Chile. There are some missing data.

The survey wanted to capture Chileans attitudes and whether they would vote in favor/against the authoritarian regime during the Chilean plebiscite/referendum. In this referendum, all citizens would vote on whether Chile would continue as a military regime or it would transition to open and free elections towards democracy.

library(car)

## Loading required package: carData

# help(Chile) # for more information on the dataset
library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

stargazer(Chile, type = "text")

## 
## =======================================================
## Statistic    N      Mean      St. Dev.    Min     Max  
## -------------------------------------------------------
## population 2,700 152,222.200 102,198.000 3,750  250,000
## age        2,699   38.549      14.756      18     70   
## income     2,602 33,875.860  39,502.870  2,500  200,000
## statusquo  2,683   -0.000       1.000    -1.803  2.049 
## -------------------------------------------------------

# Recode yes/no We recode 'undecided' and 'abstain' as NA
Chile$yes <- with(Chile, ifelse(vote == "Y", 1, ifelse(vote ==
    "N", 0, NA)))
table(Chile$vote)

## 
##   A   N   U   Y 
## 187 889 588 868

table(Chile$yes)

## 
##   0   1 
## 889 868

Then omit observations with missing data - Issue is that if you don’t omit missing the predicted values will have a different length than the data and it can cause problems. We will cover in a few weeks how to handle missing values using multiple imputation. Do not omit missing data in your own work!

Chile <- na.omit(Chile)

2 Model, Logit

Chile.out <- glm(yes ~ statusquo + age + income + sex, family = binomial(link = logit),
    data = Chile)

stargazer(Chile.out, type = "text")

## 
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                               yes            
## ---------------------------------------------
## statusquo                  3.206***          
##                             (0.147)          
##                                              
## age                          0.009           
##                             (0.007)          
##                                              
## income                    -0.00001**         
##                            (0.00000)         
##                                              
## sexM                       -0.619***         
##                             (0.201)          
##                                              
## Constant                     0.436           
##                             (0.315)          
##                                              
## ---------------------------------------------
## Observations                 1,703           
## Log Likelihood             -358.736          
## Akaike Inf. Crit.           727.473          
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01

3 Using the `sim` function in the `arm` library

We simulate predictions for sex = Female and sex = Male. Remember that we are setting other covariates at their mean, but in a few weeks we will cover the observed value approach to produce better predictions.

library(arm)

## Loading required package: MASS

## Loading required package: Matrix

## Loading required package: lme4

## 
## arm (Version 1.13-1, built: 2022-8-25)

## Working directory is /Users/connie/Generalized-Linear-Models/Labs/03.week

## 
## Attaching package: 'arm'

## The following object is masked from 'package:car':
## 
##     logit

library(faraway)

## 
## Attaching package: 'faraway'

## The following objects are masked from 'package:arm':
## 
##     fround, logit, pfround

## The following objects are masked from 'package:car':
## 
##     logit, vif

# new data for prediction
sex.F <- cbind(1, mean(na.omit(Chile$statusquo)), mean(na.omit(Chile$age)),
    mean(na.omit(Chile$income)), 0)  #0 for female
sex.M <- cbind(1, mean(na.omit(Chile$statusquo)), mean(na.omit(Chile$age)),
    mean(na.omit(Chile$income)), 1)  # 1 for male

# sim

sim.model <- coef(sim(Chile.out, 1000))

# use sim

predict.F <- ilogit(sex.F %*% t(sim.model))
predict.M <- ilogit(sex.M %*% t(sim.model))

We can now calculate first differences: P(vote|M) - P(vote|F)

# first difference P(vote|M)-P(vote|F)

first.difference <- predict.M - predict.F

FD.M.F <- quantile(first.difference, 0.5)

# For a 95% CI (CHECK THAT FOR SOME FIGURES I DID 90% CI
# w/quantile 0.05 and 0.95
CI.FD.M.F <- quantile(first.difference, probs = c(0.025, 0.975))

(c(CI.FD.M.F[1], FD.M.F, CI.FD.M.F[2]))

##        2.5%         50%       97.5% 
## -0.24242150 -0.15285521 -0.05949068

4 First Differences: Sex & SQ

Let’s add another dimension: we can calculate predictions varying two covariates – status quo and sex.

Below I am using the same figure and calculations like in the predicted probability figure. We have to specify “1” because we have a matrix in which the rows refer to each value of sq and the columns refer to simulated coefficient. Basically, we have a prediction for each coefficient for each sq. We need to do the calculations for each value of sq. We have to slice a distribution of predictions to get the median and the bounds for the confidence intervals.

# Here I will show an example when status quo varies

sq <- seq(min(Chile$statusquo), max(Chile$statusquo), length.out = 100)

sex.F.sq <- cbind(1, sq, mean(na.omit(Chile$age)), mean(na.omit(Chile$income)),
    0)  #0 for female
sex.M.sq <- cbind(1, sq, mean(na.omit(Chile$age)), mean(na.omit(Chile$income)),
    1)  # 1 for male

# sim

sim.model <- coef(sim(Chile.out, 1000))

# use sim

predict.F.sq <- ilogit(sex.F.sq %*% t(sim.model))
predict.M.sq <- ilogit(sex.M.sq %*% t(sim.model))

# first difference P(vote|F)-P(vote|M)

first.difference <- predict.F.sq - predict.M.sq


par(mfrow = c(1, 2))
plot(NULL, xlab = "Support for the Status Quo", ylab = "P(Vote in favor of the military regime)",
    main = "Predicted Probabilities", ylim = c(-0.01, 1), xlim = c(min(na.omit(Chile$statusquo)),
        max(na.omit(Chile$statusquo))))
lines(sq, apply(predict.F.sq, 1, quantile, 0.5), lwd = 3)
lines(sq, apply(predict.F.sq, 1, quantile, 0.025), lty = 2)
lines(sq, apply(predict.F.sq, 1, quantile, 0.97), lty = 2)
lines(sq, apply(predict.M.sq, 1, quantile, 0.5), lwd = 3, col = "blue")
lines(sq, apply(predict.M.sq, 1, quantile, 0.025), lty = 2, col = "blue")
lines(sq, apply(predict.M.sq, 1, quantile, 0.97), lty = 2, col = "blue")
legend("bottomright", legend = c("Female", "Male"), col = c("black",
    "blue"), lty = c(1, 1), cex = 0.5)

plot(NULL, xlab = "Support for the Status Quo", ylab = "P(Vote|Female)-P(Vote|Male)",
    main = "First Differences", ylim = c(-0.01, 1), xlim = c(min(na.omit(Chile$statusquo)),
        max(na.omit(Chile$statusquo))))
lines(sq, apply(first.difference, 1, quantile, 0.5), lwd = 3)
lines(sq, apply(first.difference, 1, quantile, 0.025), lty = 2)
lines(sq, apply(first.difference, 1, quantile, 0.97), lty = 2)
abline(h = 0, lty = 3)

Important for interpretation:

Overlapping CI for predictions DOES NOT mean there is no difference between the predictions. This is a very common error. You have to calculate first difference and the SE for the first difference! By doing simulation the calculation of the CI for the difference is trivial.
We see on the first difference plot that there is a difference between the probability of voting in favor of the Pinochet regime for men and women and it is different from zero for values of status quo from -1 to 1. Because the first difference is positive, it means women were more likely than men to support the Pinochet regime.

5 First Differences: Sex & Income

Now, we calculate predictions for another two varying covariates – income and sex.

# Here I will show an example when status quo varies

income <- seq(min(Chile$income), max(Chile$income), length.out = 100)

sex.F.income <- cbind(1, mean(na.omit(Chile$statusquo)), mean(na.omit(Chile$age)),
    income, 0)  #0 for female
sex.M.income <- cbind(1, mean(na.omit(Chile$statusquo)), mean(na.omit(Chile$age)),
    income, 1)  # 1 for male

# sim

sim.model <- coef(sim(Chile.out, 1000))

# use sim

predict.F.income <- ilogit(sex.F.income %*% t(sim.model))
predict.M.income <- ilogit(sex.M.income %*% t(sim.model))

# first difference P(vote|F)-P(vote|M)

first.difference <- predict.F.income - predict.M.income


par(mfrow = c(1, 2))
plot(NULL, xlab = "Income", ylab = "P(Vote in favor of the military regime)",
    main = "Predicted Probabilities", ylim = c(-0.01, 1), xlim = c(min(na.omit(Chile$income)),
        max(na.omit(Chile$income))))
lines(income, apply(predict.F.income, 1, quantile, 0.5), lwd = 3)
lines(income, apply(predict.F.income, 1, quantile, 0.025), lty = 2)
lines(income, apply(predict.F.income, 1, quantile, 0.97), lty = 2)
lines(income, apply(predict.M.income, 1, quantile, 0.5), lwd = 3,
    col = "blue")
lines(income, apply(predict.M.income, 1, quantile, 0.025), lty = 2,
    col = "blue")
lines(income, apply(predict.M.income, 1, quantile, 0.97), lty = 2,
    col = "blue")
legend("bottomright", legend = c("Female", "Male"), col = c("black",
    "blue"), lty = c(1, 1), cex = 0.5)

plot(NULL, xlab = "Income", ylab = "P(Vote|Female)-P(Vote|Male)",
    main = "First Differences", ylim = c(-0.01, 1), xlim = c(min(na.omit(Chile$income)),
        max(na.omit(Chile$income))))
lines(income, apply(first.difference, 1, quantile, 0.5), lwd = 3)
lines(income, apply(first.difference, 1, quantile, 0.025), lty = 2)
lines(income, apply(first.difference, 1, quantile, 0.97), lty = 2)
abline(h = 0, lty = 3)

Discussion:

Do we observe a difference between the vote choice of women and men?
Do we observe a difference between the voice choice of women and men as income increases? How would you calculate a second difference?
Do we observe a difference between the vote choice of women of lower income and women of high income? How would you calculate the first difference?
Do we observe a difference between the vote choice of women of lower income and men of lower income? How would you calculate the first difference?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FirstDifferencesMCSimulation.md

FirstDifferencesMCSimulation.md

First Differences and Monte Carlo Simulations

1 Data: Voting Intentions in the 1988 Chilean Plebiscite

2 Model, Logit

3 Using the `sim` function in the `arm` library

4 First Differences: Sex & SQ

5 First Differences: Sex & Income

Files

FirstDifferencesMCSimulation.md

Latest commit

History

FirstDifferencesMCSimulation.md

File metadata and controls

First Differences and Monte Carlo Simulations

1 Data: Voting Intentions in the 1988 Chilean Plebiscite

2 Model, Logit

3 Using the sim function in the arm library

4 First Differences: Sex & SQ

5 First Differences: Sex & Income

3 Using the `sim` function in the `arm` library