# Westwood et al. (2022) Replication in R - Part 3: Partial Identification Bounds

**IMPORTANT:** Change runtime to R: Runtime -> Change runtime type -> R

**The Problem:**
We observe that disengaged respondents report higher support for violence.
But what would they say if they WERE engaged?

This is a **counterfactual** question - we can't observe it directly.
But we can **bound** it using partial identification methods.

Code follows **partial_id_bounds.R** from the original replication materials.

## Step 1: Setup

In [None]:
# Install and load packages
install.packages(c("dplyr", "readr"), quiet=TRUE)

suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
})

cat("Setup complete\n")

## Step 2: The Partial ID Model

**Notation:**
- $Y$ = observed outcome (1 = supports violence, 0 = doesn't)
- $C$ = passed engagement check (1 = engaged, 0 = disengaged)
- $T$ = truly engaged (latent, unobserved)
- $g$ = guess rate on comprehension check
- $Y^*$ = counterfactual: what would they answer if truly engaged?

**Goal:** Bound $E[Y^*]$ - the population mean if everyone were truly engaged.

**Key Assumptions:**
1. Truly engaged always pass: $P(C=1|T=1) = 1$
2. Disengaged pass by guessing: $P(C=1|T=0) = g$
3. Engaged answer truthfully: $Y = Y^*$ when $T=1$

## Step 3: Mathematical Derivation

From assumption 1, everyone who fails ($C=0$) is truly disengaged. Assumption 2 then says a disengaged respondent fails with probability $1-g$.

**Step 1:** Link $T$ to observable $C$:
$$P(C=0) = (1-g) \cdot P(T=0) \implies P(T=0) = \frac{P(C=0)}{1-g}$$
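
As a quick numeric check of Step 1, the sketch below plugs a hypothetical failure rate of 20% into the formula together with a 1/7 guess rate. The 20% figure and the variable names are illustrative assumptions, not Study 3 estimates.

```r
# Hypothetical inputs, for illustration only (not estimates from Study 3)
g      <- 1 / 7   # guess rate on a 7-option check
p_fail <- 0.20    # assumed share failing the check, P(C = 0)

# Step 1: back out the share of truly disengaged respondents, P(T = 0)
p_disengaged <- p_fail / (1 - g)

# Disengaged respondents who pass by a lucky guess: g * P(T = 0)
p_lucky <- g * p_disengaged

cat(sprintf("P(T=0) = %.4f, lucky passers = %.4f\n", p_disengaged, p_lucky))
# prints: P(T=0) = 0.2333, lucky passers = 0.0333
```

The implied disengaged share is larger than the observed failure rate because some disengaged respondents pass the check by luck.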

**Step 2:** Decompose the counterfactual mean:
$$E[Y^*] = E[Y^*|T=1] \cdot P(T=1) + E[Y^*|T=0] \cdot P(T=0)$$

For engaged: $E[Y^*|T=1] = E[Y|T=1]$ (truthful)

For disengaged: $E[Y^*|T=0] \in [a, b]$ (unknown, bounded)

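**Step 3:** Final bounds formula. Write $\theta = E[Y^*]$. Subtracting the same decomposition of the observed mean $E[Y]$ gives $\theta = E[Y] + P(T=0) \cdot (E[Y^*|T=0] - E[Y|T=0])$. Substitute $P(T=0)$ from Step 1, use the responses of those who fail the check for all disengaged respondents ($E[Y|T=0] = E[Y|C=0]$, which the formula requires), and let $E[Y^*|T=0]$ range over $[a, b]$:
$$\theta \in \left[E[Y] + \frac{P(C=0)(a - E[Y|C=0])}{1-g}, \; E[Y] + \frac{P(C=0)(b - E[Y|C=0])}{1-g}\right]$$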
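
To make the formula concrete, here is a minimal sketch that simulates data under the three assumptions and computes plug-in bounds. All simulated rates (80% engaged, 5% and 40% support) and the helper name `plug_in_bounds` are assumptions made for illustration. The last lines check that the lower bound equals a linear combination of the sample means of $Y$, $1-C$, and $Y(1-C)$.

```r
set.seed(42)
n <- 20000
g <- 1 / 7

# Simulate under the stated assumptions (all rates below are hypothetical)
t_eng  <- rbinom(n, 1, 0.80)                      # T: truly engaged
c_pass <- ifelse(t_eng == 1, 1, rbinom(n, 1, g))  # C: engaged always pass
y      <- ifelse(t_eng == 1,
                 rbinom(n, 1, 0.05),              # engaged answer truthfully
                 rbinom(n, 1, 0.40))              # disengaged report more support

# Plug-in version of the Step 3 formula
plug_in_bounds <- function(y, c_pass, g, a = 0, b = 1) {
  p_fail <- mean(c_pass == 0)      # P(C = 0)
  y_fail <- mean(y[c_pass == 0])   # E[Y | C = 0]
  shift  <- function(v) mean(y) + p_fail * (v - y_fail) / (1 - g)
  c(lower = shift(a), upper = shift(b))
}

bounds <- plug_in_bounds(y, c_pass, g)

# Same lower bound (a = 0) as a linear combination of sample means
xbar      <- colMeans(cbind(y, 1 - c_pass, y * (1 - c_pass)))
lower_lin <- sum(c(1, 0 / (1 - g), -1 / (1 - g)) * xbar)

print(round(bounds, 3))
```

In this simulation the engaged contribute 0.80 × 0.05 = 0.04 to the counterfactual mean and the disengaged share is 0.20, so with $a=0$, $b=1$ the population bounds are $[0.04, 0.24]$. The estimates should land close to those values.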

## Step 4: The `partial_bounds()` Function

This is directly from **partial_id_bounds.R** lines 1-30.

In [None]:
# partial_id_bounds.R lines 1-30
partial_bounds <- function(outcome, check, guess_rate,
                           a = 0, b = 1, conf_level = 0.95) {
  # Coefficients for linear combination (Delta method)
  Delta_coef_lo <- c(1, a/(1-guess_rate), -1/(1-guess_rate))
  Delta_coef_hi <- c(1, b/(1-guess_rate), -1/(1-guess_rate))
  
  # Sufficient statistics: [Y, 1-C, Y*(1-C)]
  X <- cbind(outcome, (1 - check), outcome * (1 - check))
  Xbar <- colMeans(X)
  root_N <- sqrt(nrow(X))
  
  # Point estimates for lower and upper bounds
  theta_lo <- c(Delta_coef_lo %*% Xbar)
  theta_hi <- c(Delta_coef_hi %*% Xbar)
  
  # Delta method variance estimation
  V_X <- cov(X)
  sd_lo <- sqrt(c(t(Delta_coef_lo) %*% V_X %*% Delta_coef_lo))
  sd_hi <- sqrt(c(t(Delta_coef_hi) %*% V_X %*% Delta_coef_hi))
  sd_max <- max(sd_hi, sd_lo)
  delta_hat <- theta_hi - theta_lo
  
  # Imbens-Manski critical value optimization
  dist <- function(crit) {
    left <- pnorm(crit + root_N * delta_hat / sd_max) - pnorm(-crit)
    return(abs(left - conf_level))
  }
  res <- optim(1.0, dist, method = "Brent", lower = 0, upper = 10)
  crit <- res$par
  
  # Confidence interval (bounded to [0,1])
  ci_lo <- max(min(theta_lo - crit * sd_lo / root_N, 1), 0)
  ci_hi <- max(min(theta_hi + crit * sd_hi / root_N, 1), 0)
  
  return(c(ci_lo, ci_hi))
}

# Helper function for mean and SE
mean_se <- function(x, mult = 1) {
  m <- mean(x, na.rm = TRUE)
  se <- sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
  return(c(mean = m, lower = m - mult*se, upper = m + mult*se))
}

cat("partial_bounds() function defined\n")
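
The critical-value step inside `partial_bounds()` can be examined in isolation. The sketch below solves the same equation with `uniroot()` rather than `optim()`. That substitution is made here for illustration, and `im_crit` and its inputs are hypothetical.

```r
# Solve pnorm(cv + sqrt(n) * delta / sd_max) - pnorm(-cv) = conf_level for cv.
# uniroot() is used here for illustration; partial_bounds() instead minimizes
# the absolute gap with optim(method = "Brent").
im_crit <- function(delta_hat, sd_max, n, conf_level = 0.95) {
  gap <- function(cv) {
    pnorm(cv + sqrt(n) * delta_hat / sd_max) - pnorm(-cv) - conf_level
  }
  uniroot(gap, lower = 0, upper = 10, tol = 1e-10)$root
}

# Hypothetical inputs: zero-width bounds vs. clearly separated bounds
cv_point <- im_crit(delta_hat = 0,   sd_max = 0.5, n = 1000)
cv_wide  <- im_crit(delta_hat = 0.2, sd_max = 0.5, n = 1000)

cat(sprintf("width 0:   cv = %.3f\n", cv_point))  # close to qnorm(0.975)
cat(sprintf("width 0.2: cv = %.3f\n", cv_wide))   # close to qnorm(0.95)
```

When the estimated bounds coincide, the critical value matches the usual two-sided value. As the bounds separate relative to sampling error, it moves toward the one-sided value, since the true parameter can be near only one end of the identified set at a time.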

## Step 5: Download and Preprocess Study 3 Data

Study 3 was fielded by YouGov (a higher-quality panel) and supplies the data for the partial identification analysis.

In [None]:
# Download Study 3 from Google Drive
download_gdrive <- function(file_id, destfile) {
  url <- paste0("https://drive.google.com/uc?export=download&id=", file_id)
  download.file(url, destfile, quiet = TRUE, mode = "wb")
}

download_gdrive("1OYlDc-TgzqNa9iFRgcUa1XRLH-uQHomO", "/tmp/study3.csv")
study3 <- read_csv("/tmp/study3.csv", show_col_types = FALSE)

cat("Study 3 loaded: n =", nrow(study3), "\n")

In [None]:
# Preprocess Study 3 (preprocess3.R lines 6-21)

# Engagement check - Q12 asks about state mentioned in vignette
study3$passed <- "Disengaged Respondent"
study3$passed[study3$Q12 == "Iowa"] <- "Engaged Respondent"

# Recode DVs
study3$supportactions <- recode(study3$Q13, 
  "Strongly support" = 5, "Support" = 4, 
  "Neither support nor oppose" = 3, 
  "Oppose" = 2, "Strongly oppose" = 1)

study3$justified <- recode(study3$Q14, "Justified" = 1, "Unjustified" = 0)
study3$charged <- recode(study3$Q15, "Yes" = 1, "No" = 0)

# Alignment based on experiment_randomization2 and pid3
# Note: Original code has a leading space in alignment values
study3$alignment <- NA
study3$alignment[study3$experiment_randomization2 == "Republican/Republican focus" & 
                 study3$pid3 == "Republican"] <- " Out-Party Shooter"
study3$alignment[study3$experiment_randomization2 == "Republican/Republican focus" & 
                 study3$pid3 == "Democrat"] <- " In-Party Shooter"
study3$alignment[study3$experiment_randomization2 == "Democrat/Democratic focus" & 
                 study3$pid3 == "Democrat"] <- " Out-Party Shooter"
study3$alignment[study3$experiment_randomization2 == "Democrat/Democratic focus" & 
                 study3$pid3 == "Republican"] <- " In-Party Shooter"

study3$alignment <- as.factor(study3$alignment)

cat("Study 3 preprocessed\n")
cat("\nEngagement status:\n")
print(table(study3$passed))
cat("\nAlignment:\n")
print(table(study3$alignment, useNA = "ifany"))

## Step 6: Prepare Data for Partial ID Analysis

From **partial_id_bounds.R** lines 33-39.

Focus on the In-Party Shooter condition (where the respondent's party matches the shooter's party). The filter string keeps the leading space used in the alignment labels.

In [None]:
# partial_id_bounds.R lines 33-39
study3partialID <- study3 %>% 
  filter(alignment == " In-Party Shooter") %>% 
  select(passed, alignment, justified, supportactions, charged) %>%
  mutate(y1 = justified,
         y2 = if_else(supportactions >= 4, 1, 0),  # Support or Strongly support
         y3 = charged,
         c = if_else(passed == 'Engaged Respondent', 1, 0))

cat("Partial ID sample: n =", nrow(study3partialID), "\n")
cat("Engagement rate:", round(mean(study3partialID$c), 3), "\n")

## Step 7: Calculate Partial ID Bounds

The guess rate is 1/7 because the comprehension check has 7 options.

From **partial_id_bounds.R** lines 41-60.

In [None]:
# Guess rate for 7-option comprehension check
g <- 1.0 / 7.0

cat("=", rep("=", 59), "\n", sep="")
cat("OUTCOME 1: JUSTIFIED\n")
cat("=", rep("=", 59), "\n", sep="")

# Engaged-only estimate (partial_id_bounds.R line 42)
engaged_est <- mean_se(study3partialID$y1[study3partialID$c==1], mult=2) * 100
cat("\nEngaged respondents only (observed):")
cat(sprintf("\n  Mean: %.2f%% [%.2f%%, %.2f%%]\n", 
            engaged_est[1], engaged_est[2], engaged_est[3]))

# Agnostic bounds (a=0, b=1) - partial_id_bounds.R lines 43-44
bounds_agnostic <- partial_bounds(outcome = study3partialID$y1, 
                                  check = study3partialID$c,
                                  guess_rate = g, a = 0, b = 1) * 100
cat(sprintf("\nAgnostic bounds (a=0, b=1): [%.2f%%, %.2f%%]\n", 
            bounds_agnostic[1], bounds_agnostic[2]))

# Conservative bounds (a=0, b=0.25) - partial_id_bounds.R lines 45-46
bounds_conservative <- partial_bounds(outcome = study3partialID$y1,
                                      check = study3partialID$c,
                                      guess_rate = g, a = 0, b = 0.25) * 100
cat(sprintf("Conservative bounds (a=0, b=0.25): [%.2f%%, %.2f%%]\n", 
            bounds_conservative[1], bounds_conservative[2]))

In [None]:
cat("=", rep("=", 59), "\n", sep="")
cat("OUTCOME 2: SUPPORT (Support or Strongly Support)\n")
cat("=", rep("=", 59), "\n", sep="")

# Engaged-only estimate (partial_id_bounds.R line 49)
engaged_est2 <- mean_se(study3partialID$y2[study3partialID$c==1], mult=2) * 100
cat("\nEngaged respondents only (observed):")
cat(sprintf("\n  Mean: %.2f%% [%.2f%%, %.2f%%]\n", 
            engaged_est2[1], engaged_est2[2], engaged_est2[3]))

# Agnostic bounds - partial_id_bounds.R lines 50-51
bounds_agnostic2 <- partial_bounds(outcome = study3partialID$y2, 
                                   check = study3partialID$c,
                                   guess_rate = g, a = 0, b = 1) * 100
cat(sprintf("\nAgnostic bounds (a=0, b=1): [%.2f%%, %.2f%%]\n", 
            bounds_agnostic2[1], bounds_agnostic2[2]))

# Conservative bounds - partial_id_bounds.R lines 52-53
bounds_conservative2 <- partial_bounds(outcome = study3partialID$y2,
                                       check = study3partialID$c,
                                       guess_rate = g, a = 0, b = 0.2) * 100
cat(sprintf("Conservative bounds (a=0, b=0.20): [%.2f%%, %.2f%%]\n", 
            bounds_conservative2[1], bounds_conservative2[2]))

In [None]:
cat("=", rep("=", 59), "\n", sep="")
cat("OUTCOME 3: CHARGED (Should suspect be charged?)\n")
cat("=", rep("=", 59), "\n", sep="")

# Engaged-only estimate (partial_id_bounds.R line 56)
engaged_est3 <- mean_se(study3partialID$y3[study3partialID$c==1], mult=2) * 100
cat("\nEngaged respondents only (observed):")
cat(sprintf("\n  Mean: %.2f%% [%.2f%%, %.2f%%]\n", 
            engaged_est3[1], engaged_est3[2], engaged_est3[3]))

# Agnostic bounds - partial_id_bounds.R lines 57-58
bounds_agnostic3 <- partial_bounds(outcome = study3partialID$y3, 
                                   check = study3partialID$c,
                                   guess_rate = g, a = 0, b = 1) * 100
cat(sprintf("\nAgnostic bounds (a=0, b=1): [%.2f%%, %.2f%%]\n", 
            bounds_agnostic3[1], bounds_agnostic3[2]))

# High assumption bounds - partial_id_bounds.R lines 59-60
# For charged, we expect most people would say yes (high a)
bounds_high3 <- partial_bounds(outcome = study3partialID$y3,
                               check = study3partialID$c,
                               guess_rate = g, a = 0.8, b = 1.0) * 100
cat(sprintf("High bounds (a=0.80, b=1.0): [%.2f%%, %.2f%%]\n", 
            bounds_high3[1], bounds_high3[2]))

## Step 8: Summary Table

In [None]:
cat("\n")
cat("=", rep("=", 70), "\n", sep="")
cat("SUMMARY: PARTIAL IDENTIFICATION BOUNDS (Study 3, In-Party Shooter)\n")
cat("=", rep("=", 70), "\n", sep="")

cat(sprintf("\n%-20s %15s %25s\n", "Outcome", "Engaged Only", "95% CI Bounds"))
cat(rep("-", 70), "\n", sep="")

cat(sprintf("%-20s %14.1f%% [%5.2f%%, %5.2f%%] (agnostic)\n", 
            "Justified", engaged_est[1], bounds_agnostic[1], bounds_agnostic[2]))
cat(sprintf("%-20s %14.1f%% [%5.2f%%, %5.2f%%] (conservative)\n", 
            "", engaged_est[1], bounds_conservative[1], bounds_conservative[2]))

cat(sprintf("\n%-20s %14.1f%% [%5.2f%%, %5.2f%%] (agnostic)\n", 
            "Support", engaged_est2[1], bounds_agnostic2[1], bounds_agnostic2[2]))
cat(sprintf("%-20s %14.1f%% [%5.2f%%, %5.2f%%] (conservative)\n", 
            "", engaged_est2[1], bounds_conservative2[1], bounds_conservative2[2]))

cat(sprintf("\n%-20s %14.1f%% [%5.2f%%, %5.2f%%] (agnostic)\n", 
            "Charged", engaged_est3[1], bounds_agnostic3[1], bounds_agnostic3[2]))
cat(sprintf("%-20s %14.1f%% [%5.2f%%, %5.2f%%] (high)\n", 
            "", engaged_est3[1], bounds_high3[1], bounds_high3[2]))

## Interpretation

**Key Results:**

1. **Justified**: Even under agnostic assumptions (the disengaged respondents' counterfactual rate could lie anywhere from 0% to 100%), 
   the upper bound on the true share calling the violence justified is well below 20%.

2. **Support**: Similar pattern - conservative bounds suggest ~1-7% true support.

3. **Charged**: Most people (engaged or not) would say the suspect should be charged.

**The Bottom Line:**

Prior research reported ~20% support for political violence. These partial ID bounds
show that even under conservative assumptions about disengaged respondents' true preferences,
actual support is likely **only 1-7%**.

The 3-8x inflation observed in the main analysis is confirmed by these bounds -
you cannot get to 20% true support under any reasonable assumption about
what disengaged respondents would say if they were truly engaged.
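
One way to probe this claim is to invert the upper-bound formula and ask how large $b$ would have to be for the bound to reach 20%. The sketch below does so with made-up sample moments. Every number in it is a hypothetical placeholder, not a Study 3 estimate, and should be replaced with the values computed above.

```r
# Hypothetical sample moments, for illustration only (not Study 3 estimates)
g       <- 1 / 7
ey      <- 0.08   # E[Y]: observed share supporting violence
p_fail  <- 0.15   # P(C = 0): share failing the engagement check
ey_fail <- 0.25   # E[Y | C = 0]: observed support among those who fail
target  <- 0.20   # level of true support to be reached

# Upper bound on E[Y*] as a function of the assumed ceiling b
upper_bound <- function(b) ey + p_fail * (b - ey_fail) / (1 - g)

# Invert the formula: the b at which the upper bound equals the target
b_needed <- ey_fail + (target - ey) * (1 - g) / p_fail

cat(sprintf("b needed to reach %.0f%%: %.3f\n", 100 * target, b_needed))
```

With these placeholder inputs, disengaged respondents would need a counterfactual support rate of roughly 94% for the population figure to reach 20%. The calculation ignores sampling uncertainty, so it is a rough sensitivity check rather than a substitute for the confidence intervals.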