# Westwood et al. (2022) Replication - Part 2: Core Analysis

This notebook replicates the **KEY FINDING** of the paper:

> Disengaged survey respondents report 3-8x higher support for
> political violence than engaged respondents.

**Learning Objectives:**
1. Understand survey "satisficing" (respondents who don't engage)
2. Learn how engagement checks work
3. Calculate group means with confidence intervals
4. Understand why this matters for survey research

**The Core Insight:**
Prior research (Kalmoe & Mason, 2019) reported ~20% of Americans
support political violence. Westwood et al. show this is inflated
because many survey respondents don't actually read the questions.

## Step 1: Setup and Load Data

In [None]:
!pip install -q gdown

import pandas as pd
import numpy as np
from scipy import stats
import gdown

# Load the combined Study 1 + Study 4 data file (Study 1 is selected in Step 2)
url = "https://drive.google.com/uc?id=1gKIY11FaM5RmhhXTKx3wVcwGkMoTyTUM"
gdown.download(url, "/tmp/study14.csv", quiet=True)
df = pd.read_csv("/tmp/study14.csv")
print(f"Loaded {len(df):,} rows")
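Every later cell assumes the downloaded file contains a specific set of columns; all of the names below come from the code in this notebook. A small guard like the following (a minimal sketch; the helper name `check_columns` is hypothetical) fails early with a readable message if the download returned an unexpected file. The toy frame stands in for the real data.

```python
import pandas as pd

# Columns used by the later cells of this notebook.
REQUIRED_COLUMNS = ['experiment', 'partisantreatment', 'Q43', 'Q49', 'Q45', 'Q51']

def check_columns(frame, required=REQUIRED_COLUMNS):
    """Raise a KeyError listing any required columns missing from `frame`."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return True

# Hypothetical one-row toy frame standing in for the real download.
toy = pd.DataFrame({col: [None] for col in REQUIRED_COLUMNS})
print(check_columns(toy))  # True
```

On the real data you would call `check_columns(df)` right after `pd.read_csv`.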

## Step 2: Filter to Study 1 (Vignette Experiment)

The data file contains both Study 1 and Study 4.
- Study 1: Vignette about car-ramming attack (experiment=1)
- Study 4: Sentencing task (experiment=2)

In [None]:
# Filter to Study 1 (vignette experiment)
study1 = df[df['experiment'] == 1].copy()
print(f"Study 1 sample size: n = {len(study1):,}")
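The filter relies on the experiment codes listed above (1 = Study 1, 2 = Study 4). A quick check that no other codes appear protects against rows being silently dropped. This is a sketch on hypothetical toy data; `STUDY_LABELS` and `split_by_study` are illustrative names, not part of the replication files.

```python
import pandas as pd

# Experiment codes as described above.
STUDY_LABELS = {1: 'Study 1 (vignette)', 2: 'Study 4 (sentencing)'}

def split_by_study(frame):
    """Return row counts per study, failing loudly on unexpected experiment codes."""
    unexpected = set(frame['experiment'].dropna().unique()) - set(STUDY_LABELS)
    if unexpected:
        raise ValueError(f"Unexpected experiment codes: {sorted(unexpected)}")
    return frame['experiment'].map(STUDY_LABELS).value_counts()

# Hypothetical toy data, not the real file.
toy = pd.DataFrame({'experiment': [1, 1, 2, 1, 2]})
print(split_by_study(toy))
```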

## Step 3: Classify Engagement

This is the **KEY METHODOLOGICAL INNOVATION** of the paper.

The comprehension check asks: "In what state did this incident occur?"

- **Engaged**: Answered correctly (Florida for vignette 1, Oregon for vignette 2)
- **Disengaged**: Answered incorrectly or left the question blank (likely did not read the vignette)

**Why this matters:** Disengaged respondents might say "violence is justified"
not because they actually believe it, but because they're clicking randomly.
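The "clicking randomly" argument can be made concrete with a small simulation. It rests on assumptions that are not taken from the paper: the question has two options, disengaged respondents pick each with probability 0.5, and engaged respondents hold a hypothetical true support rate of 10%. Under those assumptions the disengaged group reports roughly 50% support no matter what anyone actually believes.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n = 10_000             # hypothetical respondents per group
true_support = 0.10    # hypothetical support rate among people who read the vignette

# Engaged respondents answer according to their (hypothetical) true views.
engaged_answers = rng.random(n) < true_support

# Disengaged respondents are assumed to click one of two options at random,
# so "Justified" comes up about half the time regardless of their views.
disengaged_answers = rng.random(n) < 0.5

print(f"Engaged:    {engaged_answers.mean():.1%} say justified")
print(f"Disengaged: {disengaged_answers.mean():.1%} say justified")
```

Pure random clicking is an extreme case; real disengaged respondents may mix random and genuine answers, which would put their rate somewhere between the two.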

In [None]:
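# Create engagement indicator.
# Everyone starts as 'Disengaged'; only a correct answer flips the label,
# so wrong answers and skipped checks both count as disengaged.
study1['engaged'] = 'Disengaged'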

# Vignette 1 (partisantreatment=1): Correct answer is "Florida"
mask1 = (study1['partisantreatment'] == 1) & (study1['Q43'] == 'Florida')
study1.loc[mask1, 'engaged'] = 'Engaged'

# Vignette 2 (partisantreatment=2): Correct answer is "Oregon"
mask2 = (study1['partisantreatment'] == 2) & (study1['Q49'] == 'Oregon')
study1.loc[mask2, 'engaged'] = 'Engaged'

# Print engagement breakdown
print("Engagement breakdown:")
print(study1['engaged'].value_counts())
print(f"\nEngagement rate: {(study1['engaged'] == 'Engaged').mean():.1%}")
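The two masks above can also be driven by a small lookup table, which makes the check-per-vignette rule explicit and easy to extend. This is a sketch (the names `CHECKS` and `classify_engagement` are hypothetical) applied to toy rows that cover a pass, a wrong answer, a skipped question, and a pass on vignette 2.

```python
import pandas as pd

# Column holding the comprehension check and its correct answer, per vignette
# (same rule as the cell above: vignette 1 -> Q43 == "Florida", vignette 2 -> Q49 == "Oregon").
CHECKS = {1: ('Q43', 'Florida'), 2: ('Q49', 'Oregon')}

def classify_engagement(frame):
    """Label each row 'Engaged' if it passed its vignette's check, else 'Disengaged'."""
    engaged = pd.Series(False, index=frame.index)
    for vignette, (column, answer) in CHECKS.items():
        engaged |= (frame['partisantreatment'] == vignette) & (frame[column] == answer)
    return engaged.map({True: 'Engaged', False: 'Disengaged'})

# Hypothetical toy rows.
toy = pd.DataFrame({
    'partisantreatment': [1, 1, 1, 2],
    'Q43': ['Florida', 'Texas', None, None],
    'Q49': [None, None, None, 'Oregon'],
})
print(classify_engagement(toy).tolist())
# ['Engaged', 'Disengaged', 'Disengaged', 'Engaged']
```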

## Step 4: Recode Outcome Variable

The main outcome is: **"Was the driver's action justified?"**
- 1 = Justified
- 0 = Unjustified
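One way to express this coding is an explicit mapping: with `Series.map` and a dict, any value not in the mapping (for example a skipped question) becomes NaN rather than 0, so it is excluded from proportions later. The exact label strings in the raw data are an assumption here; check them with `value_counts()` before relying on this sketch.

```python
import pandas as pd

# Explicit coding: 1 = Justified, 0 = Unjustified (label strings assumed).
JUSTIFIED_CODES = {'Justified': 1.0, 'Unjustified': 0.0}

responses = pd.Series(['Justified', 'Unjustified', None, 'Unjustified'])  # hypothetical
coded = responses.map(JUSTIFIED_CODES)
print(coded.tolist())  # [1.0, 0.0, nan, 0.0]
print(coded.mean())    # mean() skips NaN, so the skipped answer is not counted as 0
```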

In [None]:
# Recode "justified" to binary (1 = Justified, 0 = any other answer).
# Q45 holds the answer for vignette 1, Q51 for vignette 2.
# Skipped questions stay NaN instead of being counted as 0.
study1['justified'] = np.nan

mask1 = study1['partisantreatment'] == 1
q45 = study1.loc[mask1, 'Q45']
study1.loc[mask1, 'justified'] = (q45 == 'Justified').astype(float).where(q45.notna())

mask2 = study1['partisantreatment'] == 2
q51 = study1.loc[mask2, 'Q51']
study1.loc[mask2, 'justified'] = (q51 == 'Justified').astype(float).where(q51.notna())

print("Outcome variable 'justified' created")
print(f"Overall proportion saying justified: {study1['justified'].mean():.1%}")
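Before computing confidence intervals, a row-normalized cross-tab of engagement against the outcome is a quick sanity check that both columns were built as intended. Sketch on hypothetical toy data with the two columns from Steps 3 and 4.

```python
import pandas as pd

# Hypothetical toy data with the columns built in Steps 3 and 4.
toy = pd.DataFrame({
    'engaged':   ['Engaged', 'Engaged', 'Engaged', 'Disengaged', 'Disengaged'],
    'justified': [0.0, 0.0, 1.0, 1.0, float('nan')],
})

# Share of each engagement group giving each answer (rows sum to 1).
# Rows with a missing value in either column are left out of the counts.
table = pd.crosstab(toy['engaged'], toy['justified'], normalize='index')
print(table)
```

On the real data, `pd.crosstab(study1['engaged'], study1['justified'], normalize='index')` gives the same view.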

## Step 5: Calculate the KEY RESULT

Compare proportion saying "justified" between engaged and disengaged respondents.
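Comparing two group proportions can also be framed as a formal test. A standard option is the pooled two-proportion z-test, sketched below; this is a textbook method, not necessarily the inference Westwood et al. report, and the counts are hypothetical.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sample z-test for H0: p1 == p2. Returns (difference, z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test undefined when every answer is the same")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value

# Hypothetical counts for illustration only (not the study's data).
diff, z, p = two_proportion_ztest(x1=40, n1=100, x2=10, n2=100)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p:.2g}")
```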

In [None]:
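def calculate_stats(data, outcome, group):
    """Mean, SE, and t-based 95% CI of `outcome` within each level of `group`.

    For a 0/1 outcome the mean is the proportion of 1s. Missing outcomes are dropped.
    """
    results = []
    
    for group_name, group_df in data.groupby(group):
        y = group_df[outcome].dropna()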
        n = len(y)
        if n == 0:
            continue
        
        mean = y.mean()
        se = y.std(ddof=1) / np.sqrt(n)
        t_crit = stats.t.ppf(0.975, df=n-1)
        ci_lower = mean - t_crit * se
        ci_upper = mean + t_crit * se
        
        results.append({
            'group': group_name,
            'mean': mean,
            'se': se,
            'ci_lower': ci_lower,
            'ci_upper': ci_upper,
            'n': n
        })
    
    return pd.DataFrame(results)

# Calculate statistics
results = calculate_stats(study1, 'justified', 'engaged')
print(results.to_string(index=False))
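`calculate_stats` uses a t-based interval, which for a 0/1 outcome can extend below 0 when a group's proportion is small and its sample modest (for example, 1 "justified" out of 50). A common alternative for proportions is the Wilson score interval, sketched here as a comparison point; it is not necessarily the interval reported in the paper.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

print(wilson_ci(1, 50))    # lower bound stays above 0 for these counts
print(wilson_ci(50, 100))  # roughly (0.404, 0.596)
```

For a group with `k` respondents saying justified out of `n`, call `wilson_ci(k, n)`.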

## Step 6: Display the Key Finding

In [None]:
print("="*60)
print("KEY RESULT: Support for Violence by Engagement Status")
print("="*60)

for _, row in results.iterrows():
    print(f"\n{row['group']}:")
    print(f"  Proportion saying violence justified: {row['mean']:.1%}")
    print(f"  95% CI: [{row['ci_lower']:.1%}, {row['ci_upper']:.1%}]")
    print(f"  Sample size: n = {row['n']:,}")

# Calculate ratio
engaged = results[results['group'] == 'Engaged']['mean'].values[0]
disengaged = results[results['group'] == 'Disengaged']['mean'].values[0]
ratio = disengaged / engaged if engaged > 0 else float('inf')

print(f"\n{'='*60}")
print(f"INFLATION RATIO: {ratio:.1f}x")
print(f"{'='*60}")
print(f"\nDisengaged respondents are {ratio:.1f}x as likely to say")
print("violence is justified as engaged respondents.")
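The ratio printed above is a point estimate. A standard large-sample confidence interval for a ratio of two proportions is built on the log scale, with SE of log(ratio) equal to sqrt(1/x_d - 1/n_d + 1/x_e - 1/n_e). The sketch below applies that textbook formula to hypothetical counts; it is an illustration, not the paper's reported interval.

```python
import math

def ratio_ci(x_d, n_d, x_e, n_e, z=1.96):
    """Ratio of proportions (disengaged / engaged) with a log-scale Wald CI.

    x_d, x_e: counts saying "justified"; n_d, n_e: group sizes.
    """
    if x_d == 0 or x_e == 0:
        raise ValueError("log-ratio CI needs at least one 'justified' answer per group")
    rr = (x_d / n_d) / (x_e / n_e)
    se_log = math.sqrt(1 / x_d - 1 / n_d + 1 / x_e - 1 / n_e)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Hypothetical counts for illustration only.
rr, lo, hi = ratio_ci(x_d=40, n_d=100, x_e=10, n_e=100)
print(f"ratio = {rr:.1f}x, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Because the interval is symmetric on the log scale, it is asymmetric around the ratio itself, which suits a quantity bounded below by 0.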

## Interpretation

This result shows that survey "satisficing" (respondents not engaging
with survey content) dramatically inflates estimates of support for
political violence.

Prior research (Kalmoe & Mason, 2019) reported that ~20% of Americans
support political violence. But when we separate engaged from disengaged
respondents, we see that:

- **ENGAGED respondents:** ~10-12% say violence is justified
- **DISENGAGED respondents:** ~35-40% say violence is justified

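The disengaged responses inflate the overall estimate: the pooled
proportion is a weighted average of the two groups' proportions, so any
sizable disengaged group answering "justified" at a much higher rate
pulls it upward. When we focus only on engaged respondents (those who
passed the comprehension check), support for violence is much lower
than previously reported.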
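The inflation mechanism is just a weighted average: overall = (1 - s) * p_engaged + s * p_disengaged, where s is the disengaged share of the sample. The sketch below uses hypothetical inputs (10% vs 40% support, loosely following the ranges above) and varies s to show how the pooled figure rises with the share of disengaged respondents.

```python
def pooled_proportion(p_engaged, p_disengaged, share_disengaged):
    """Overall proportion as a mixture of the two groups' proportions."""
    if not 0 <= share_disengaged <= 1:
        raise ValueError("share_disengaged must be between 0 and 1")
    return (1 - share_disengaged) * p_engaged + share_disengaged * p_disengaged

# Hypothetical inputs: 10% vs 40% support, disengaged shares of 0%, 20%, 40%.
for share in (0.0, 0.2, 0.4):
    overall = pooled_proportion(0.10, 0.40, share)
    print(f"disengaged share {share:.0%}: overall {overall:.1%}")
```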

**Implications for survey methodology:**
1. Always include engagement/comprehension checks
2. Report results separately for engaged vs. disengaged
3. Be skeptical of surveys that don't address this issue

**Next:** Run notebook 03 for partial identification bounds analysis