# Westwood et al. (2022) Replication in R - Part 1: Data & Figure 1

**IMPORTANT:** Change runtime to R: Runtime -> Change runtime type -> R

This notebook replicates the paper using **the original R code** as closely as possible.

Paper: Westwood et al. (2022) "Current research overstates American support for political violence." *PNAS*

## Step 1: Install and Load Packages

In [None]:
# Install required packages
install.packages(c("dplyr", "readr", "ggplot2", "cowplot", "forcats"), quiet=TRUE)

# Load packages (matching original code)
suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
  library(ggplot2)
  library(cowplot)
  library(forcats)
})

cat("Packages loaded successfully\n")

## Step 2: Download Data from Google Drive

The data files are hosted on Google Drive; they were originally published on Harvard Dataverse.

In [None]:
# Google Drive file IDs
file_ids <- list(
  study14 = "1gKIY11FaM5RmhhXTKx3wVcwGkMoTyTUM",
  study25 = "1VfZM3hSDzIIIVp2AUGC-RwOy-Fk2t_Fm",
  study3 = "1OYlDc-TgzqNa9iFRgcUa1XRLH-uQHomO",
  priorestimates = "1__z-IhvnRPgRqkyfG7rZlss7cIcR_kyn"
)

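# Download one file from Google Drive by its file ID and save it to destfile
download_gdrive <- function(file_id, destfile) {
  url <- paste0("https://drive.google.com/uc?export=download&id=", file_id)
  # download.file() returns 0 on success; stop early on any other status
  status <- download.file(url, destfile, quiet = TRUE, mode = "wb")
  if (status != 0) stop("Download failed for file ID ", file_id)
  invisible(destfile)
}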

# Download all files
for (name in names(file_ids)) {
  cat(paste0("Downloading ", name, "...\n"))
  download_gdrive(file_ids[[name]], paste0("/tmp/", name, ".csv"))
}

cat("\nAll files downloaded!\n")
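
A download from Google Drive can succeed while saving an HTML page (for example a sign-in or confirmation page) instead of the CSV. The sketch below shows one way to detect that case. `looks_like_html` is a hypothetical helper, not part of the original replication code, and the two temporary files are made up so the example runs without network access.

```r
# Hypothetical helper (not in the original code):
# TRUE if the first line of the file starts with HTML markup rather than CSV text
looks_like_html <- function(path) {
  first_line <- readLines(path, n = 1, warn = FALSE)
  length(first_line) > 0 &&
    grepl("^\\s*<(!doctype|html)", first_line, ignore.case = TRUE)
}

# Two temporary files standing in for a good and a bad download
csv_file <- tempfile(fileext = ".csv")
html_file <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10"), csv_file)
writeLines("<!DOCTYPE html><html><body>Sign in</body></html>", html_file)

looks_like_html(csv_file)   # FALSE
looks_like_html(html_file)  # TRUE
```

In this notebook the same check could be applied to each path built as `paste0("/tmp/", name, ".csv")` before moving on to Step 3.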

## Step 3: Load Data

In [None]:
# Load datasets (matching original preprocess scripts)
study14_raw <- read_csv("/tmp/study14.csv", show_col_types = FALSE)
study25_raw <- read_csv("/tmp/study25.csv", show_col_types = FALSE)
study3_raw <- read_csv("/tmp/study3.csv", show_col_types = FALSE)

# Kalmoe-Mason (2019) derived estimates from media coverage
# These are the percentages that journalists reported based on prior survey research
K_M_estimates <- read_csv("/tmp/priorestimates.csv", show_col_types = FALSE)

cat("Study 1/4 raw: n =", nrow(study14_raw), "\n")
cat("Study 2/5 raw: n =", nrow(study25_raw), "\n")
cat("Study 3 raw: n =", nrow(study3_raw), "\n")
cat("Kalmoe-Mason estimates: n =", nrow(K_M_estimates), "\n")
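
The Figure 1 code in Step 4 reads three columns from `K_M_estimates`: `PartisansSupport`, `RepublicanSupport`, and `DemocratSupport`. A minimal sketch of a check that fails early if any of them is absent follows. `check_columns` is a hypothetical helper, and `toy` is a made-up stand-in for the real table.

```r
# Columns that the Figure 1 code reads from the prior-estimates table
required_cols <- c("PartisansSupport", "RepublicanSupport", "DemocratSupport")

# Hypothetical helper: stop with a clear message if any column is absent
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy stand-in for K_M_estimates (made-up values, for illustration only)
toy <- data.frame(
  PartisansSupport  = c(15, 20, NA),
  RepublicanSupport = c(12, NA, 18),
  DemocratSupport   = c(NA, 22, 17)
)

check_columns(toy, required_cols)

# Number of non-missing values available to each panel
colSums(!is.na(toy[required_cols]))
```

With the real data, `check_columns(K_M_estimates, required_cols)` would play the same role.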

## Step 4: Reproduce FIGURE 1

From **figure1.R** in the original code.

Figure 1 shows the distribution of percentages that media coverage derived
from **Kalmoe-Mason (2019)**. These are the "1 in 5 Americans support violence"
claims that appeared in news articles.

The vertical lines show Westwood et al.'s estimates:
- **Orange**: Disengaged respondents
- **Blue**: Engaged respondents

In [None]:
# FIGURE 1 - Histogram of prior estimates
# Adapted from figure1.R lines 22-36

# Panel A: All partisans
all <- ggplot(data=K_M_estimates[!is.na(K_M_estimates$PartisansSupport),], aes(PartisansSupport)) + 
  geom_histogram(bins=30, aes(y=after_stat(density))) + 
  scale_x_continuous(breaks=seq(0,50, by=10)) +
  theme_bw() +
  theme(axis.text.y = element_text(size = 10),
        axis.text.x = element_text(size = 10),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  # Orange line: disengaged estimate (~12.6%)
  geom_vline(xintercept=c(12.6), linetype="solid", linewidth=2, color="#D55E00") + 
  # Blue line: engaged estimate (~2.2%)
  geom_vline(xintercept=c(2.2), linetype="solid", linewidth=2, color="#0072B2") + 
  xlab("Percent of Americans Supporting Violence") +
  ylab("Density") + 
  coord_cartesian(xlim = c(0, 50), ylim = c(0,.3))

# Panel B: Republicans
rep <- ggplot(data=K_M_estimates[!is.na(K_M_estimates$RepublicanSupport),], aes(RepublicanSupport)) + 
  geom_histogram(bins=30, aes(y=after_stat(density))) + 
  scale_x_continuous(breaks=seq(0,50, by=10)) +
  theme_bw() +
  theme(axis.text.y = element_text(size = 10),
        axis.text.x = element_text(size = 10),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  geom_vline(xintercept=c(13.2), linetype="solid", linewidth=2, color="#D55E00") + 
  geom_vline(xintercept=c(1.1), linetype="solid", linewidth=2, color="#0072B2") + 
  xlab("Percent of Republicans Supporting Violence") +
  ylab("Density") + 
  coord_cartesian(xlim = c(0, 50), ylim = c(0,.3))

# Panel C: Democrats  
dem <- ggplot(data=K_M_estimates[!is.na(K_M_estimates$DemocratSupport),], aes(DemocratSupport)) + 
  geom_histogram(bins=30, aes(y=after_stat(density))) + 
  scale_x_continuous(breaks=seq(0,50, by=10)) +
  theme_bw() +
  theme(axis.text.y = element_text(size = 10),
        axis.text.x = element_text(size = 10),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  geom_vline(xintercept=c(12.2), linetype="solid", linewidth=2, color="#D55E00") + 
  geom_vline(xintercept=c(3.2), linetype="solid", linewidth=2, color="#0072B2") + 
  xlab("Percent of Democrats Supporting Violence") +
  ylab("Density") + 
  coord_cartesian(xlim = c(0, 50), ylim = c(0,.3))

# Combine panels (from figure1.R line 88)
title <- ggdraw() + 
  draw_label("Distribution of Kalmoe-Mason (2019) Derived\nPercentages of Support for Political Violence", 
             fontface='bold')

hist <- plot_grid(title, all, rep, dem, 
                  label_size = 12, ncol=1,
                  rel_heights=c(.4,1,1,1), 
                  labels = c("","A","B","C"))

print(hist)
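
The three panels above differ only in the column plotted, the two reference-line positions, and the x-axis label. The original figure1.R writes each panel out in full, and the cell above follows it. As an alternative, the shared code could be factored into one function, sketched below. `make_panel` is a hypothetical helper that is not in the original code, and `toy` holds made-up values.

```r
library(ggplot2)

# Hypothetical helper: build one Figure 1 panel from a column name,
# the two reference-line positions, and an x-axis label
make_panel <- function(data, column, disengaged, engaged, x_label) {
  # Drop rows where the plotted column is missing, as the original subsetting does
  data <- data[!is.na(data[[column]]), , drop = FALSE]
  ggplot(data, aes(x = .data[[column]])) +
    geom_histogram(bins = 30, aes(y = after_stat(density))) +
    scale_x_continuous(breaks = seq(0, 50, by = 10)) +
    theme_bw() +
    theme(axis.text.y = element_text(size = 10),
          axis.text.x = element_text(size = 10),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank()) +
    geom_vline(xintercept = disengaged, linewidth = 2, color = "#D55E00") +
    geom_vline(xintercept = engaged, linewidth = 2, color = "#0072B2") +
    xlab(x_label) +
    ylab("Density") +
    coord_cartesian(xlim = c(0, 50), ylim = c(0, 0.3))
}

# Toy stand-in for K_M_estimates (made-up values, for illustration only)
toy <- data.frame(PartisansSupport = c(10, 15, 18, 20, 22, 25, NA))
panel_a <- make_panel(toy, "PartisansSupport", 12.6, 2.2,
                      "Percent of Americans Supporting Violence")
```

Keeping the panels written out stays closer to the original script. The helper trades that fidelity for less repetition.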

## Step 5: Summary Statistics

In [None]:
cat("\n=== FIGURE 1 SUMMARY ===")
cat("\n\nKalmoe-Mason (2019) derived estimates from media coverage:\n")

cat("\nAll Partisans:")
cat("\n  Mean:", round(mean(K_M_estimates$PartisansSupport, na.rm=TRUE), 1), "%")
cat("\n  Median:", round(median(K_M_estimates$PartisansSupport, na.rm=TRUE), 1), "%")

cat("\n\nKey insight:")
cat("\n  Grey histogram = what media reported (~15-25%)")
cat("\n  Orange line (disengaged): ~12.6%")
cat("\n  Blue line (engaged): ~2.2%")
cat("\n  Both lines are LEFT of the grey mass - showing inflation\n")

## Interpretation

Figure 1 shows the support percentages that media coverage reported on the
basis of Kalmoe-Mason (2019) survey research. Most cluster around 15-25%.

The vertical lines show Westwood et al.'s estimates:
- **Orange (#D55E00)**: Disengaged respondents (~12.6% in Panel A)
- **Blue (#0072B2)**: Engaged respondents (~2.2% in Panel A)

Both lines are **left of the grey histogram**: even among disengaged
respondents, estimated support is lower than what the media reported, and
among engaged respondents it is much lower still.
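
The "left of the histogram" reading can be checked numerically instead of by eye. One way is to compute the share of prior estimates that exceed each reference line, as in the sketch below. `share_above` is a hypothetical helper and `toy_estimates` holds made-up values. With the real data, the input would be `K_M_estimates$PartisansSupport`.

```r
# Hypothetical helper: share of non-missing values strictly above a reference
share_above <- function(x, reference) {
  mean(x > reference, na.rm = TRUE)
}

# Toy stand-in for K_M_estimates$PartisansSupport (made-up values)
toy_estimates <- c(8, 14, 16, 18, 20, 22, 25, NA)

share_above(toy_estimates, 12.6)  # share above the disengaged line
share_above(toy_estimates, 2.2)   # share above the engaged line
```

A share close to 1 for both lines would support the claim that the reported percentages sit to the right of both estimates.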

**Next:** Run notebook 02 for the core analysis (Figure 2)