# Westwood et al. (2022) Replication - Part 4: OLS & Correlates Inflation

## What This Notebook Does

This notebook uses **OLS regression** to show how disengaged survey respondents inflate the apparent relationship between personality traits (aggression) and support for political violence.

---

## Key Concepts

### What is a "vignette"?
A **vignette** is a short story or scenario presented to survey respondents. Instead of asking abstract questions like "Do you support political violence?", the researchers describe a concrete situation:

> *"A man drove his car into a group of protesters in Iowa, injuring several people. The driver was a [Democrat/Republican]. Do you think his actions were justified?"*

This forces respondents to think about a specific scenario rather than give a knee-jerk response.

### Two Ways to Measure Violence Support

| Measure | Question Type | Example |
|---------|--------------|---------|
| **Our Measure** (Vignette) | Concrete scenario | "Was the driver's action justified?" |
| **Kalmoe-Mason Measure** | Abstract question | "How much do you feel violence is justified to advance your party's goals?" (1-5 scale) |
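
In this notebook's preprocessing (Step 2), the two measures also end up on different scales: the vignette response is a five-point support scale rescaled to 0-1, while the Kalmoe-Mason item is collapsed to a binary indicator of any support. A minimal standalone sketch of that coding, using the response labels from the preprocessing code. The function names `code_support` and `code_km` are hypothetical, and returning `None` for an unanswered item is an assumption made for illustration:

```python
# Response labels copied from the preprocessing step; helper names are illustrative.
SUPPORT_MAP = {'Strongly oppose': 1, 'Oppose': 2, 'Neither support nor oppose': 3,
               'Support': 4, 'Strongly support': 5}
KM_MAP = {'1 - Not at all': 1, '2': 2, '3': 3, '4': 4, '5 - A great deal': 5}

def code_support(label):
    """Map a vignette response to the 0-1 range: (score - 1) / 4."""
    score = SUPPORT_MAP.get(label)
    return None if score is None else (score - 1) / 4

def code_km(label):
    """Map a Kalmoe-Mason response to 1 (any support) or 0 (none)."""
    score = KM_MAP.get(label)
    return None if score is None else int(score > 1)

print(code_support('Oppose'))      # 0.25
print(code_km('1 - Not at all'))   # 0
print(code_km('3'))                # 1
```

Because one outcome is continuous and the other binary, the coefficients for the two measures are best compared within a measure (engaged vs. all) rather than across measures.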

### What is the Buss-Perry Aggression Scale?
A validated 12-item psychological scale measuring trait aggression. Examples:
- "I have trouble controlling my temper"
- "Given enough provocation, I may hit another person"

Higher scores = more aggressive personality.
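
In this notebook the scale score is the mean of the 12 item responses (each coded 1-5), rescaled to 0-1. A minimal sketch of that arithmetic on made-up responses. The function name `buss_perry_01` is hypothetical, and averaging over whichever items were answered mirrors pandas `mean(axis=1)`, which skips missing values by default:

```python
def buss_perry_01(items):
    """Average the answered items (1-5 each) and rescale to the 0-1 range."""
    answered = [x for x in items if x is not None]
    if not answered:
        return None  # no items answered: the score is undefined
    mean_score = sum(answered) / len(answered)
    return (mean_score - 1) / 4

# Made-up respondent: mostly low aggression, one skipped item
responses = [1, 2, 1, 1, 3, 2, 1, None, 1, 2, 1, 1]
print(round(buss_perry_01(responses), 3))  # 0.114
```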

### What is "correlates inflation"?
Prior research found that aggressive people support political violence more. But Westwood et al. show this correlation is **inflated by disengaged respondents** who click randomly. When you include only engaged respondents, the aggression-violence relationship is much weaker.

---

## The Analysis

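We run four OLS regressions predicting violence support from aggression. The code in Step 3 names the fitted models `m1`, `m3`, `m4`, and `m6` (the numbering skips 2 and 5), so the table lists both the row number and the code name:

| Model | Code name | Outcome | Sample |
|-------|-----------|---------|--------|
| 1 | `m1` | Our measure (vignette), In-Party Shooter condition | Engaged only |
| 2 | `m3` | Our measure (vignette), In-Party Shooter condition | All respondents |
| 3 | `m4` | Kalmoe-Mason (abstract), all conditions | Engaged only |
| 4 | `m6` | Kalmoe-Mason (abstract), all conditions | All respondents |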

**Key finding:** Including disengaged respondents inflates the aggression coefficient by 2-4x.
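
The "Nx" figure is the all-respondents coefficient divided by the engaged-only coefficient. A minimal sketch of that calculation follows. The function name `inflation_ratio` and the example coefficients are hypothetical (they are not results from the data), and returning `None` when the engaged-only coefficient is zero or negative is a choice made here, since a ratio is hard to interpret in that case:

```python
def inflation_ratio(coef_engaged, coef_all):
    """Ratio of the pooled coefficient to the engaged-only coefficient."""
    if coef_engaged <= 0:
        return None  # ratio is not meaningful for a zero or negative baseline
    return coef_all / coef_engaged

# Hypothetical coefficients, for illustration only
print(inflation_ratio(0.25, 0.5))   # 2.0
print(inflation_ratio(0.0, 0.5))    # None
```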

## Step 1: Setup

In [None]:
!pip install -q gdown statsmodels

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats
import gdown

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

ORANGE = '#D55E00'  # Disengaged / All
BLUE = '#0072B2'    # Engaged

## Step 2: Load and Preprocess Data

In [None]:
# Download Study 2 data (used for correlates analysis)
gdown.download("https://drive.google.com/uc?id=1VfZM3hSDzIIIVp2AUGC-RwOy-Fk2t_Fm", 
               "/tmp/study25.csv", quiet=True)
data = pd.read_csv("/tmp/study25.csv")
print(f"Loaded: {len(data):,} rows")

In [None]:
def preprocess_study2_full(df):
    """Preprocess Study 2 with all variables needed for correlates analysis."""
    # Filter by gc and experiment
    study = df[(df['gc'] == 1) & (df['experiment'] == 1)].copy()
    
    # Party ID
    study['pid'] = study['Q10']
    study.loc[study['Q11'] == 'Democratic Party', 'pid'] = 'Democrat'
    study.loc[study['Q11'] == 'Republican Party', 'pid'] = 'Republican'
    
    # Engagement check
    study['engaged'] = 'Disengaged'
    study.loc[study['Q43'] == 'Iowa', 'engaged'] = 'Engaged'
    study['passed01'] = (study['engaged'] == 'Engaged').astype(int)
    
    # Support actions (1-5 scale)
    support_map = {'Strongly support': 5, 'Support': 4, 'Neither support nor oppose': 3,
                   'Oppose': 2, 'Strongly oppose': 1}
    study['supportactions'] = study['Q44'].map(support_map)
    
    # Rescale support to 0-1
    study['supportactions01'] = (study['supportactions'] - 1) / 4
    
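    # Alignment (start as an object column so the string labels below
    # can be assigned without pandas dtype-mismatch warnings)
    study['alignment'] = None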
    study.loc[(study['version'] == 1) & (study['pid'] == 'Democrat'), 'alignment'] = 'Out-Party Shooter'
    study.loc[(study['version'] == 1) & (study['pid'] == 'Republican'), 'alignment'] = 'In-Party Shooter'
    study.loc[(study['version'] == 2) & (study['pid'] == 'Democrat'), 'alignment'] = 'In-Party Shooter'
    study.loc[(study['version'] == 2) & (study['pid'] == 'Republican'), 'alignment'] = 'Out-Party Shooter'
    study.loc[study['version'] == 3, 'alignment'] = 'Apolitical Shooter'
    
    # Buss-Perry Aggression Scale (Q63-Q73, Q75)
    # Recode from text to numeric
    bp_map = {'1- Very unlike me': 1, '2': 2, '3': 3, '4': 4, '5- Very like me': 5}
    bp_cols = ['Q63', 'Q64', 'Q65', 'Q66', 'Q67', 'Q68', 'Q69', 'Q70', 'Q71', 'Q72', 'Q73', 'Q75']
    
    for col in bp_cols:
        study[col + '_num'] = study[col].map(bp_map)
    
    # Average Buss-Perry score
    bp_num_cols = [c + '_num' for c in bp_cols]
    study['bussperry'] = study[bp_num_cols].mean(axis=1)
    
    # Rescale to 0-1
    study['bussperryc01'] = (study['bussperry'] - 1) / 4
    
    # Kalmoe-Mason measure (Q77: "How much do you feel violence is justified...")
    km_map = {'1 - Not at all': 1, '2': 2, '3': 3, '4': 4, '5 - A great deal': 5}
    study['Q77_num'] = study['Q77'].map(km_map)
    
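    # Binary K-M measure (any support = 1). Keep unanswered Q77 as NaN:
    # a plain (x > 1).astype(int) would silently code missing responses as 0,
    # and the later dropna(subset=['km', ...]) would never remove them.
    answered = study['Q77_num'].notna()
    study['km'] = (study['Q77_num'] > 1).astype(float).where(answered)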
    
    return study

study2 = preprocess_study2_full(data)
print(f"Study 2 preprocessed: n = {len(study2):,}")
print(f"Engagement: {study2['engaged'].value_counts().to_dict()}")

## Step 3: OLS Regressions (Figure 7)

Four regressions comparing:
1. **Our measure** (vignette-based support) vs **Kalmoe-Mason measure** (abstract survey question)
2. **Engaged only** vs **All respondents**

From `figure7.R` lines 1-4.

In [None]:
# Filter to In-Party Shooter condition (most theoretically relevant)
inparty = study2[study2['alignment'] == 'In-Party Shooter'].copy()

print("OLS REGRESSIONS: Aggression -> Violence Support")
print("="*70)

# Model 1: Our measure, Engaged only (figure7.R line 1)
m1_data = inparty[inparty['engaged'] == 'Engaged'].dropna(subset=['supportactions01', 'bussperryc01'])
m1 = smf.ols('supportactions01 ~ bussperryc01', data=m1_data).fit()
print(f"\nModel 1: Our Measure, Engaged Only (n={len(m1_data)})")
print(f"  Coefficient: {m1.params['bussperryc01']:.3f}")
print(f"  95% CI: [{m1.conf_int().loc['bussperryc01', 0]:.3f}, {m1.conf_int().loc['bussperryc01', 1]:.3f}]")
print(f"  p-value: {m1.pvalues['bussperryc01']:.4f}")

# Model 3: Our measure, All respondents (figure7.R line 2)
m3_data = inparty.dropna(subset=['supportactions01', 'bussperryc01'])
m3 = smf.ols('supportactions01 ~ bussperryc01', data=m3_data).fit()
print(f"\nModel 3: Our Measure, All Respondents (n={len(m3_data)})")
print(f"  Coefficient: {m3.params['bussperryc01']:.3f}")
print(f"  95% CI: [{m3.conf_int().loc['bussperryc01', 0]:.3f}, {m3.conf_int().loc['bussperryc01', 1]:.3f}]")
print(f"  p-value: {m3.pvalues['bussperryc01']:.4f}")

# Model 4: K-M measure, Engaged only (figure7.R line 3)
m4_data = study2[study2['engaged'] == 'Engaged'].dropna(subset=['km', 'bussperryc01'])
m4 = smf.ols('km ~ bussperryc01', data=m4_data).fit()
print(f"\nModel 4: Kalmoe-Mason Measure, Engaged Only (n={len(m4_data)})")
print(f"  Coefficient: {m4.params['bussperryc01']:.3f}")
print(f"  95% CI: [{m4.conf_int().loc['bussperryc01', 0]:.3f}, {m4.conf_int().loc['bussperryc01', 1]:.3f}]")
print(f"  p-value: {m4.pvalues['bussperryc01']:.4f}")

# Model 6: K-M measure, All respondents (figure7.R line 4)
m6_data = study2.dropna(subset=['km', 'bussperryc01'])
m6 = smf.ols('km ~ bussperryc01', data=m6_data).fit()
print(f"\nModel 6: Kalmoe-Mason Measure, All Respondents (n={len(m6_data)})")
print(f"  Coefficient: {m6.params['bussperryc01']:.3f}")
print(f"  95% CI: [{m6.conf_int().loc['bussperryc01', 0]:.3f}, {m6.conf_int().loc['bussperryc01', 1]:.3f}]")
print(f"  p-value: {m6.pvalues['bussperryc01']:.4f}")

## Step 4: Summary Table

In [None]:
# Create summary table
results = pd.DataFrame({
    'DV': ['Our Measure', 'Our Measure', 'Kalmoe-Mason', 'Kalmoe-Mason'],
    'Sample': ['Engaged Only', 'All Respondents', 'Engaged Only', 'All Respondents'],
    'Coefficient': [m1.params['bussperryc01'], m3.params['bussperryc01'],
                    m4.params['bussperryc01'], m6.params['bussperryc01']],
    'CI_lower': [m1.conf_int().loc['bussperryc01', 0], m3.conf_int().loc['bussperryc01', 0],
                 m4.conf_int().loc['bussperryc01', 0], m6.conf_int().loc['bussperryc01', 0]],
    'CI_upper': [m1.conf_int().loc['bussperryc01', 1], m3.conf_int().loc['bussperryc01', 1],
                 m4.conf_int().loc['bussperryc01', 1], m6.conf_int().loc['bussperryc01', 1]],
    'p_value': [m1.pvalues['bussperryc01'], m3.pvalues['bussperryc01'],
                m4.pvalues['bussperryc01'], m6.pvalues['bussperryc01']]
})

print("\nSUMMARY: OLS Coefficients for Aggression -> Violence")
print("="*70)
display(results.round(3))

## Step 5: Reproduce FIGURE 7

In [None]:
fig, axes = plt.subplots(2, 1, figsize=(8, 6), sharex=True)

# Data for plotting
plot_data = [
    {'ax': axes[0], 'title': 'Our Measure (Vignette)',
     'engaged': (m1.params['bussperryc01'], m1.conf_int().loc['bussperryc01', 0], m1.conf_int().loc['bussperryc01', 1]),
     'all': (m3.params['bussperryc01'], m3.conf_int().loc['bussperryc01', 0], m3.conf_int().loc['bussperryc01', 1])},
    {'ax': axes[1], 'title': "Kalmoe-Mason's Measure (Abstract)",
     'engaged': (m4.params['bussperryc01'], m4.conf_int().loc['bussperryc01', 0], m4.conf_int().loc['bussperryc01', 1]),
     'all': (m6.params['bussperryc01'], m6.conf_int().loc['bussperryc01', 0], m6.conf_int().loc['bussperryc01', 1])}
]

for d in plot_data:
    ax = d['ax']
    
    # Engaged respondents (blue)
    ax.errorbar(d['engaged'][0], 1.15, 
                xerr=[[d['engaged'][0] - d['engaged'][1]], [d['engaged'][2] - d['engaged'][0]]],
                fmt='o', markersize=12, color=BLUE, capsize=5,
                markerfacecolor='white', markeredgewidth=2, label='Engaged Respondents')
    ax.annotate(f"{d['engaged'][0]:.2f}", (d['engaged'][0], 1.15), 
                xytext=(0, 15), textcoords='offset points', ha='center', color=BLUE, fontsize=11)
    
    # All respondents (orange)
    ax.errorbar(d['all'][0], 0.85,
                xerr=[[d['all'][0] - d['all'][1]], [d['all'][2] - d['all'][0]]],
                fmt='s', markersize=12, color=ORANGE, capsize=5,
                markerfacecolor='white', markeredgewidth=2, label='All Respondents')
    ax.annotate(f"{d['all'][0]:.2f}", (d['all'][0], 0.85),
                xytext=(0, -20), textcoords='offset points', ha='center', color=ORANGE, fontsize=11)
    
    ax.set_yticks([0.85, 1.15])
    ax.set_yticklabels(['All Respondents', 'Engaged Respondents'])
    ax.set_title(d['title'], fontweight='bold', fontsize=12)
    ax.set_xlim(-0.1, 1.0)
    ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

axes[1].set_xlabel('Estimated Relationship Between Aggression and Violence (95% CI)', fontsize=11)
fig.suptitle('Figure 7: Disengaged Respondents Inflate Correlates of Violence', 
             fontsize=13, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

## Step 6: Scatterplots with Regression Lines

Each plot shows **two separate OLS regressions**:
- **Blue line**: Regression fit to **Engaged respondents only**
- **Orange line**: Regression fit to **Disengaged respondents only**

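Note that this is a different comparison from Figure 7, which contrasts engaged respondents with *all* respondents. A steeper orange slope means the aggression-violence relationship is stronger among disengaged respondents. The notebook treats that extra strength as spurious: an artifact of inattentive answering rather than a real link between aggression and violence support.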

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: Our measure (In-Party Shooter only)
ax = axes[0]
for engaged_status, color, label in [('Engaged', BLUE, 'Engaged'), ('Disengaged', ORANGE, 'Disengaged')]:
    subset = inparty[inparty['engaged'] == engaged_status].dropna(subset=['supportactions01', 'bussperryc01'])
    ax.scatter(subset['bussperryc01'], subset['supportactions01'], 
               alpha=0.3, color=color, label=label, s=30)
    # Regression line
    x = np.linspace(0, 1, 100)
    model = smf.ols('supportactions01 ~ bussperryc01', data=subset).fit()
    ax.plot(x, model.params['Intercept'] + model.params['bussperryc01'] * x, 
            color=color, linewidth=2)

ax.set_xlabel('Buss-Perry Aggression (0-1)')
ax.set_ylabel('Support for Violence (0-1)')
ax.set_title('Our Measure (In-Party Shooter)', fontweight='bold')
ax.legend()
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

# Right: Kalmoe-Mason measure (all conditions)
ax = axes[1]
for engaged_status, color, label in [('Engaged', BLUE, 'Engaged'), ('Disengaged', ORANGE, 'Disengaged')]:
    subset = study2[study2['engaged'] == engaged_status].dropna(subset=['km', 'bussperryc01'])
    # Jitter for binary outcome (seeded so the figure is reproducible across runs)
    jitter = np.random.default_rng(0).normal(0, 0.02, len(subset))
    ax.scatter(subset['bussperryc01'], subset['km'] + jitter,
               alpha=0.3, color=color, label=label, s=30)
    # Regression line
    x = np.linspace(0, 1, 100)
    model = smf.ols('km ~ bussperryc01', data=subset).fit()
    ax.plot(x, model.params['Intercept'] + model.params['bussperryc01'] * x,
            color=color, linewidth=2)

ax.set_xlabel('Buss-Perry Aggression (0-1)')
ax.set_ylabel('Kalmoe-Mason Violence Support (0/1)')
ax.set_title("Kalmoe-Mason's Measure (All Conditions)", fontweight='bold')
ax.legend()
ax.set_xlim(0, 1)
ax.set_ylim(-0.1, 1.1)

plt.suptitle('Scatterplots: Aggression vs Violence Support by Engagement', 
             fontsize=13, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

## Step 7: Inflation Ratio

In [None]:
print("CORRELATES INFLATION")
print("="*50)

# Our measure
our_engaged = m1.params['bussperryc01']
our_all = m3.params['bussperryc01']
our_ratio = our_all / our_engaged if our_engaged != 0 else float('inf')

print(f"\nOur Measure (Vignette):")
print(f"  Engaged only: {our_engaged:.3f}")
print(f"  All respondents: {our_all:.3f}")
print(f"  Inflation ratio: {our_ratio:.1f}x")

# K-M measure
km_engaged = m4.params['bussperryc01']
km_all = m6.params['bussperryc01']
km_ratio = km_all / km_engaged if km_engaged != 0 else float('inf')

print(f"\nKalmoe-Mason Measure (Abstract):")
print(f"  Engaged only: {km_engaged:.3f}")
print(f"  All respondents: {km_all:.3f}")
print(f"  Inflation ratio: {km_ratio:.1f}x")

## Step 8: Full Regression Tables

In [None]:
print("FULL REGRESSION OUTPUT: Model 1 (Our Measure, Engaged Only)")
print("="*70)
print(m1.summary())

In [None]:
print("FULL REGRESSION OUTPUT: Model 6 (K-M Measure, All Respondents)")
print("="*70)
print(m6.summary())

## Interpretation

**Figure 7 shows correlates inflation:**

The relationship between aggression (Buss-Perry scale) and violence support appears **much stronger** when you include disengaged respondents.

- **Our measure (vignette)**: Including disengaged inflates the coefficient by ~2-3x
- **Kalmoe-Mason measure (abstract)**: Including disengaged inflates the coefficient by ~4x

**Why this matters:**

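Prior research found strong correlations between personality traits (like aggression) and support for political violence. But these correlations are partly artifacts of disengaged respondents clicking randomly. One plausible mechanism: engaged respondents cluster near the low end of both scales, while random clicks land anywhere on them, so pooling the two groups tilts the regression line upward even if the relationship among engaged respondents is weak.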

When you look only at engaged respondents, the aggression-violence relationship is much weaker (though still positive).