# LLMs for Synthetic Data II: Interactive Experiments and Persuasion

**Learning objectives:**
- Design and implement interactive persuasion experiments with LLMs
- Generate personalized persuasive messages tailored to demographics
- Implement pre-test/post-test experimental designs
- Conduct multi-turn conversations with LLM respondents
- Measure persuasion effects and validate against human data

**How to run this notebook:**
- **Google Colab** (recommended): Works for all parts
- **OpenAI API key needed**: For generating messages and responses
- **Cost**: Approximately  for full notebook

---

## What are Interactive Persuasion Experiments?

Building on silicon sampling, **interactive persuasion experiments** use LLMs to:
1. Generate personalized persuasive messages
2. Test those messages on synthetic respondents
3. Engage in multi-turn conversations to change attitudes
4. Measure attitude change over time

**Key innovations from recent research:**
- **Argyle et al. (2025)**: Testing microtargeting and elaboration theories
- **Velez & Liu (2025)**: Personalizing both treatments and outcomes

**Potential applications:**
- Rapid testing of persuasive message variants
- Understanding mechanisms of attitude change
- Prototyping interventions before expensive field work
- Exploring counterfactual scenarios

**Key challenges:**
- **External validity**: Do LLM responses predict human behavior?
- **Effect size accuracy**: Are magnitudes realistic?
- **Mechanism validity**: Do LLMs change "beliefs" for the right reasons?

---

## Setup

In [None]:
# Install packages
!pip install -q openai pandas numpy scipy scikit-learn matplotlib seaborn

In [None]:
import os
import json
import getpass
import time
from datetime import datetime
from collections import Counter

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from openai import OpenAI

# Set API key
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

client = OpenAI()

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Setup complete!")

---

## Part 1: Pre-test/Post-test Experimental Design

The basic experimental setup:
1. **Pre-test**: Measure initial attitude
2. **Treatment**: Expose to persuasive message
3. **Post-test**: Measure attitude again
4. **Effect**: Post - Pre = attitude change

This is the foundation for testing persuasion effects.

In [None]:
def create_persona(demographics):
    """
    Create persona string for experiment
    
    Args:
        demographics: dict with demographic attributes
    
    Returns:
        str: Formatted persona prompt
    """
    persona = f"""You are a {demographics['age']}-year-old {demographics['race']} {demographics['gender']} 
from {demographics['region']} with {demographics['education']} education and an income of ${demographics['income']}."""
    
    if 'party' in demographics:
        persona += f" You identify as a {demographics['party']}."
    
    return persona


def measure_attitude(demographics, topic, model="gpt-4o-mini", temperature=1.0):
    """
    Measure attitude on a topic using 1-7 Likert scale
    
    Args:
        demographics: dict of demographic attributes
        topic: string describing the policy/issue
        model: which OpenAI model to use
        temperature: sampling temperature
    
    Returns:
        int or None: attitude score 1-7, or None if parsing fails
    """
    persona = create_persona(demographics)
    
    prompt = f"""On a scale of 1 to 7, how much do you support the following:

{topic}

Where:
1 = Strongly Oppose
4 = Neutral
7 = Strongly Support

Please respond with ONLY a number from 1 to 7."""
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": persona},
                {"role": "user", "content": prompt}
            ],
            temperature=temperature
        )
        
        # Extract number
        import re
        content = response.choices[0].message.content.strip()
        numbers = re.findall(r'\d+', content)
        
        if numbers:
            value = int(numbers[0])
            if 1 <= value <= 7:
                return value
        
        return None
        
    except Exception as e:
        print(f"Error: {e}")
        return None


# Example: measure attitudes before any treatment
test_persona = {
    'age': 35,
    'race': 'white',
    'gender': 'female',
    'education': 'college degree',
    'income': '65,000',
    'region': 'Northeast',
    'party': 'Independent'
}

topic = "Increasing government funding for renewable energy research"

print(f"Measuring baseline attitude...\n")
print(f"Persona: {test_persona['age']}yo {test_persona['party']} {test_persona['gender']}")
print(f"Topic: {topic}\n")

baseline = measure_attitude(test_persona, topic)
print(f"Baseline attitude: {baseline}/7")

**What this code does:**

Implements the **baseline measurement** for a pre-test/post-test experiment:

**Key design choices:**
- **7-point scale** (vs 5-point): More granularity for detecting change
- **Labeled midpoint** (4 = Neutral): Clearer interpretation
- **Temperature = 1.0**: Default (allows some variation)
- **System message**: Persona context
- **User message**: Attitude question

**Why this approach:**
- Matches standard survey methodology
- 7-point scales common in persuasion research (more sensitive)
- Separates persona from question (cleaner prompt structure)

**Cost:** ~ per measurement with GPT-4o-mini
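
The extraction step in `measure_attitude` takes the first run of digits in the reply, so an answer such as "On the 1 to 7 scale, I'd say 6" would be read as 1. A stricter parser is sketched below, assuming a valid answer contains exactly one number and that number lies between 1 and 7. `parse_likert` is a hypothetical helper, not part of the notebook, and its strictness is a design choice: it also rejects replies like "5/7" rather than guessing.

```python
import re

def parse_likert(text, low=1, high=7):
    """Return the rating if the reply holds exactly one number and it is in range, else None."""
    # Collect every distinct integer that appears in the reply
    candidates = {int(n) for n in re.findall(r"\d+", text)}
    in_range = {n for n in candidates if low <= n <= high}
    # Accept only unambiguous replies: one in-range value and no other numbers
    if len(in_range) == 1 and candidates == in_range:
        return in_range.pop()
    return None

print(parse_likert("5"))                   # 5
print(parse_likert("I would say 6."))      # 6
print(parse_likert("Between 1 and 7: 6"))  # None (ambiguous)
print(parse_likert("10"))                  # None (out of range)
```

Counting how often the parser returns `None` is also a cheap check on whether the model is following the "ONLY a number" instruction.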

---

## Part 2: Generating Personalized Persuasive Messages

Following **Argyle et al. (2025)**, we'll test two approaches:
1. **Generic message**: Same message for everyone
2. **Microtargeted message**: Tailored to demographics

In [None]:
def generate_generic_message(topic, position, model="gpt-4o", temperature=0.8):
    """
    Generate generic persuasive message (NOT personalized)
    
    Args:
        topic: string describing the policy/issue
        position: 'support' or 'oppose'
        model: which OpenAI model to use
        temperature: creativity level
    
    Returns:
        str: persuasive message
    """
    prompt = f"""Write a brief persuasive message (2-3 sentences) that argues people should {position} the following policy:

{topic}

The message should:
- Be concise and clear
- Use factual arguments
- Be respectful and non-manipulative
- Appeal to common values

Return only the message text, no preamble."""
    
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature
    )
    
    return response.choices[0].message.content.strip()


def generate_microtargeted_message(demographics, topic, position, model="gpt-4o", temperature=0.8):
    """
    Generate personalized persuasive message tailored to demographics
    
    Args:
        demographics: dict with demographic attributes
        topic: string describing the policy/issue
        position: 'support' or 'oppose'
        model: which OpenAI model to use
        temperature: creativity level
    
    Returns:
        str: personalized persuasive message
    """
    prompt = f"""Write a brief persuasive message (2-3 sentences) that argues people should {position} the following policy:

{topic}

Tailor the message to resonate with this specific audience:
- Age: {demographics['age']}
- Education: {demographics['education']}
- Region: {demographics['region']}
- Income: ${demographics['income']}

The message should:
- Use language and examples appropriate for this demographic
- Focus on values and concerns likely to matter to them
- Be concise, factual, and respectful
- Not be manipulative or deceptive

Return only the message text, no preamble."""
    
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature
    )
    
    return response.choices[0].message.content.strip()


# Generate both types of messages
topic = "Increasing government funding for renewable energy research"

print("GENERIC MESSAGE (same for everyone):")
print("=" * 70)
generic_msg = generate_generic_message(topic, "support")
print(generic_msg)

print("\n" + "=" * 70)
print("\nMICROTARGETED MESSAGE (tailored to a 35yo college-educated respondent in the Northeast):")
print("=" * 70)
microtargeted_msg = generate_microtargeted_message(test_persona, topic, "support")
print(microtargeted_msg)

**What this code does:**

Implements **two message generation strategies** from Argyle et al. (2025):

**Generic messages:**
- Same for all respondents
- Appeal to universal values
- Cheaper to generate (one message per topic)
- Control condition in experiments

**Microtargeted messages:**
- Tailored to specific demographics
- Uses age, education, income, region to customize
- Tests **personalization hypothesis** (more effective?)
- More expensive (one per demographic group)

**Key findings from Argyle et al. (2025):**
- Both approaches produce measurable persuasion effects
- **BUT**: Microtargeting didn't significantly outperform generic messages
- Simple approaches may be just as effective as complex ones

**Ethical considerations:**
- Prompt explicitly asks for non-manipulative messages
- In real research, screen messages for harmful content
- Consider IRB requirements for persuasion experiments

**Temperature = 0.8:**
- Allows creativity in message generation
- Higher than annotation tasks (0.0-0.2)
- Still controlled enough for consistency
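
Because microtargeted messages cost one generation call per audience, respondents who share the same targeting attributes can share a message. The sketch below shows one way to do that, assuming a hypothetical `get_cached_message` helper and any generator with the same signature as `generate_microtargeted_message`. A stub stands in for the API call so the example runs offline.

```python
# Attributes the microtargeting prompt actually uses
TARGETING_KEYS = ("age", "education", "region", "income")

def get_cached_message(cache, demographics, topic, position, generate_fn):
    """Reuse one message per unique combination of targeting attributes."""
    key = (topic, position) + tuple(demographics[k] for k in TARGETING_KEYS)
    if key not in cache:
        cache[key] = generate_fn(demographics, topic, position)
    return cache[key]

# Offline stand-in for generate_microtargeted_message (assumption: same signature)
calls = []
def fake_generate(demographics, topic, position):
    calls.append(demographics["age"])
    return f"Message for a {demographics['age']}-year-old in the {demographics['region']}"

cache = {}
people = [
    {"age": 35, "education": "college degree", "region": "West", "income": "65,000", "party": "Democrat"},
    {"age": 35, "education": "college degree", "region": "West", "income": "65,000", "party": "Republican"},
    {"age": 55, "education": "high school", "region": "South", "income": "35,000", "party": "Independent"},
]
messages = [get_cached_message(cache, p, "renewable energy funding", "support", fake_generate)
            for p in people]
print(len(calls))  # 2 generation calls for 3 respondents
```

The first two respondents differ only in party, which the prompt does not use, so they receive the same message.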

---

## Part 3: Testing Persuasive Effects

Now we implement the full **pre-test → treatment → post-test** workflow.

In [None]:
def test_persuasion_effect(demographics, topic, message, model="gpt-4o-mini", temperature=1.0):
    """
    Run complete persuasion experiment: pre-test -> message -> post-test
    
    Args:
        demographics: dict of demographic attributes
        topic: policy/issue being measured
        message: persuasive message to show
        model: which model to use for respondent
        temperature: sampling temperature
    
    Returns:
        dict: {'pre': int, 'post': int, 'effect': int, 'message': str}
    """
    # Step 1: Pre-test
    pre_attitude = measure_attitude(demographics, topic, model, temperature)
    time.sleep(0.5)  # Small delay between measurements
    
    # Step 2: Show message and measure post-test
    persona = create_persona(demographics)
    
    post_prompt = f"""Please carefully read this message:

\"{message}\"

Now, on a scale of 1 to 7, how much do you support the following:

{topic}

Where:
1 = Strongly Oppose
4 = Neutral
7 = Strongly Support

Please respond with ONLY a number from 1 to 7."""
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": persona},
                {"role": "user", "content": post_prompt}
            ],
            temperature=temperature
        )
        
        import re
        content = response.choices[0].message.content.strip()
        numbers = re.findall(r'\d+', content)
        
        post_attitude = int(numbers[0]) if numbers and 1 <= int(numbers[0]) <= 7 else None
        
    except Exception as e:
        print(f"Error in post-test: {e}")
        post_attitude = None
    
    # Step 3: Calculate effect
    if pre_attitude is not None and post_attitude is not None:
        effect = post_attitude - pre_attitude
    else:
        effect = None
    
    return {
        'pre': pre_attitude,
        'post': post_attitude,
        'effect': effect,
        'message': message
    }


# Test persuasion effect
print("Testing persuasion effect...\n")
print(f"Respondent: {test_persona['age']}yo {test_persona['party']} {test_persona['gender']}")
print(f"Topic: {topic}\n")
print("=" * 70)

result = test_persuasion_effect(test_persona, topic, microtargeted_msg)

print(f"\nMessage shown:")
print(f"\"{result['message']}\"")
print(f"\nResults:")
print(f"  Pre-test attitude:  {result['pre']}/7")
print(f"  Post-test attitude: {result['post']}/7")

# Guard against a failed measurement: formatting None with :+d raises TypeError
if result['effect'] is not None:
    print(f"  Effect:             {result['effect']:+d} points")
    if result['effect'] > 0:
        print(f"  → Message increased support")
    elif result['effect'] < 0:
        print(f"  → Message decreased support (backfire effect)")
    else:
        print(f"  → No effect")
else:
    print("  Effect:             unavailable (a measurement failed to parse)")

**What this code does:**

Implements the **complete persuasion experiment workflow**:

**Three-step process:**
1. **Pre-test**: Measure baseline attitude (no message exposure)
2. **Treatment**: Show persuasive message
3. **Post-test**: Measure attitude again

**Key design decisions:**
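- **Separate API calls** for pre and post: each call is stateless, so the post-test answer is not anchored on the pre-test answer
- **Short delay** between calls: spaces out requests to stay under rate limits (the model keeps no memory between calls, so the delay does not simulate time passing)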
- **Same persona** for both measurements (within-subjects design)

**Measuring the effect:**
- **Effect = Post - Pre**
- Positive effect: Message increased support
- Negative effect: Backfire (decreased support)
- Zero effect: No change

**Typical findings:**
- Argyle et al. (2025): ~2.5-4 percentage point effects
- On 7-point scale: ~0.2-0.4 point effects typical
- Larger effects (>1 point) may indicate stereotyping

**Limitations of single observation:**
- Need multiple respondents for statistical power
- Need control group (no message) for comparison
- Next: Scale up to experimental design
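
A single pre/post difference mixes any message effect with the sampling noise of a temperature-1.0 model. One way to separate the two is to subtract the average change in a no-message condition from the average change in the treated condition. The sketch below assumes you have collected (pre, post) pairs for both conditions; the scores are made up for illustration and `mean_change` is a hypothetical helper.

```python
def mean_change(pairs):
    """Average post - pre over pairs, skipping any pair with a failed (None) measurement."""
    diffs = [post - pre for pre, post in pairs if pre is not None and post is not None]
    return sum(diffs) / len(diffs) if diffs else None

# Illustrative (pre, post) scores, not real model output
treated = [(4, 5), (3, 4), (5, 5), (4, None), (2, 3)]
control = [(4, 4), (3, 4), (5, 4), (4, 4), (2, 2)]

treated_change = mean_change(treated)   # (1 + 1 + 0 + 1) / 4 = 0.75
control_change = mean_change(control)   # (0 + 1 - 1 + 0 + 0) / 5 = 0.0
net_effect = treated_change - control_change
print(f"Treated: {treated_change:+.2f}, control: {control_change:+.2f}, net: {net_effect:+.2f}")
```

Pairs with a `None` value are dropped rather than treated as zero, so parsing failures do not bias the estimate toward no effect.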

---

## Part 4: Experimental Design with Multiple Conditions

Now we'll implement a full experiment comparing:
1. **Control**: No message
2. **Generic message**: Same for everyone
3. **Microtargeted message**: Personalized to demographics

In [None]:
def run_persuasion_experiment(demographics_list, topic, n_per_condition=5, model="gpt-4o-mini"):
    """
    Run full experiment with control and treatment conditions
    
    Args:
        demographics_list: list of demographic dicts
        topic: policy/issue to test
        n_per_condition: sample size per condition
        model: which model for respondents
    
    Returns:
        pd.DataFrame: results with all conditions
    """
    results = []
    
    # Generate messages once
    generic_msg = generate_generic_message(topic, "support")
    
    for i, demographics in enumerate(demographics_list[:n_per_condition]):
        print(f"\rProgress: {i+1}/{n_per_condition}", end="")
        
        # Generate microtargeted message for this person
        micro_msg = generate_microtargeted_message(demographics, topic, "support")
        
        # Condition 1: Control (no message - measure twice)
        pre_control = measure_attitude(demographics, topic, model)
        time.sleep(0.5)
        post_control = measure_attitude(demographics, topic, model)
        
        results.append({
            **demographics,
            'condition': 'control',
            'pre': pre_control,
            'post': post_control,
            'effect': post_control - pre_control if (pre_control is not None and post_control is not None) else None,
            'message': 'None'
        })
        
        time.sleep(0.5)
        
        # Condition 2: Generic message
        generic_result = test_persuasion_effect(demographics, topic, generic_msg, model)
        results.append({
            **demographics,
            'condition': 'generic',
            **generic_result
        })
        
        time.sleep(0.5)
        
        # Condition 3: Microtargeted message
        micro_result = test_persuasion_effect(demographics, topic, micro_msg, model)
        results.append({
            **demographics,
            'condition': 'microtargeted',
            **micro_result
        })
        
        time.sleep(1.0)
    
    print("\n✓ Experiment complete")
    return pd.DataFrame(results)


# Create sample of respondents
np.random.seed(42)

respondents = []
parties = ['Democrat', 'Republican', 'Independent']
ages = [25, 35, 45, 55, 65]
educations = ['high school', 'some college', 'college degree', 'graduate degree']
regions = ['Northeast', 'South', 'Midwest', 'West']

for i in range(5):  # 5 respondents
    respondents.append({
        'age': np.random.choice(ages),
        'race': np.random.choice(['white', 'Black', 'Hispanic']),
        'gender': np.random.choice(['male', 'female']),
        'education': np.random.choice(educations),
        'income': np.random.choice(['35,000', '55,000', '75,000', '95,000']),
        'region': np.random.choice(regions),
        'party': np.random.choice(parties)
    })

print(f"Running experiment with {len(respondents)} respondents...")
print(f"Topic: {topic}\n")

# Run experiment (this will take a few minutes and cost ~)
# Uncomment to run:
# df_experiment = run_persuasion_experiment(respondents, topic, n_per_condition=5)
# df_experiment.to_csv('persuasion_experiment_results.csv', index=False)

print("[Commented out to save API costs - uncomment to run]")
print("Expected output: 15 rows (5 respondents × 3 conditions)")

**What this code does:**

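Implements a **three-condition experimental design**. Every persona is run through all three conditions, but each API call is stateless, so the conditions behave like independent draws and are analyzed as separate groups: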

**Three conditions:**
1. **Control**: No message exposure (test-retest)
   - Measures natural fluctuation in responses
   - Baseline for comparison
2. **Generic**: Same message for all
   - Tests basic persuasion effect
3. **Microtargeted**: Personalized message
   - Tests whether personalization adds value

**Why this design:**
- **Control group**: Essential for causal inference
  - LLM responses may naturally vary (need baseline)
  - Any "effect" in control = noise
- **Generic vs Microtargeted**: Tests personalization hypothesis
  - Argyle et al. (2025) found NO significant difference
  - Challenges common assumption about targeting

**Sample size considerations:**
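- 5 respondents = demonstration only
- Real study: at least 30-50 per condition, and more for small effects
- Run a power analysis first: reaching 80% power to detect d=0.3 with an independent two-sample t-test takes roughly 175 per condition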
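
The sample size guidance above can be checked directly. The sketch below computes the power of a two-sided independent-samples t-test from the noncentral t distribution in SciPy, assuming equal group sizes and α = 0.05. `two_sample_power` and `required_n` are hypothetical helper names.

```python
import numpy as np
from scipy import stats

def two_sample_power(n_per_group, d, alpha=0.05):
    """Power of a two-sided independent-samples t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # Probability that |t| exceeds the critical value when the true effect is d
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

def required_n(d, target_power=0.80, alpha=0.05):
    """Smallest per-group n whose power reaches the target."""
    n = 2
    while two_sample_power(n, d, alpha) < target_power:
        n += 1
    return n

for n in (5, 15, 50):
    print(f"n = {n:>3} per condition -> power = {two_sample_power(n, 0.3):.2f}")
print(f"n needed for 80% power at d = 0.3: {required_n(0.3)}")
```

Silicon samples are cheap compared with human panels, so the required sample size is mostly a question of API budget.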

**Cost management:**
- Commented out by default
- 5 respondents × 3 conditions × 2 measurements = 30 API calls
- Plus message generation: ~ total

**Next steps:** Analyze results and calculate average treatment effects

### Analyzing experimental results

After running the experiment, we'd analyze the data:

In [None]:
# Simulated results for demonstration (replace with real df_experiment)
np.random.seed(42)

simulated_results = []
for condition, true_effect in [('control', 0.0), ('generic', 0.3), ('microtargeted', 0.35)]:
    for i in range(15):  # 15 per condition
        pre = np.random.randint(3, 6)  # Initial attitudes 3-5
        # Add effect + noise
        post = pre + np.random.normal(true_effect, 0.3)
        post = np.clip(post, 1, 7)
        post = int(np.round(post))
        
        simulated_results.append({
            'condition': condition,
            'pre': pre,
            'post': post,
            'effect': post - pre
        })

df_sim = pd.DataFrame(simulated_results)

print("Experimental Results Summary")
print("=" * 70)
print("\nAverage Treatment Effects (ATE):\n")

summary = df_sim.groupby('condition')['effect'].agg(['mean', 'std', 'count'])
summary.columns = ['Mean Effect', 'Std Dev', 'N']
summary['SE'] = summary['Std Dev'] / np.sqrt(summary['N'])
summary['95% CI Lower'] = summary['Mean Effect'] - 1.96 * summary['SE']
summary['95% CI Upper'] = summary['Mean Effect'] + 1.96 * summary['SE']

print(summary[['Mean Effect', 'Std Dev', 'N', '95% CI Lower', '95% CI Upper']].round(3))

# Statistical tests
print("\n" + "=" * 70)
print("\nStatistical Comparisons (t-tests):\n")

control_effects = df_sim[df_sim['condition'] == 'control']['effect']
generic_effects = df_sim[df_sim['condition'] == 'generic']['effect']
micro_effects = df_sim[df_sim['condition'] == 'microtargeted']['effect']

# Generic vs Control
t_stat, p_val = stats.ttest_ind(generic_effects, control_effects)
print(f"Generic vs Control:")
print(f"  t = {t_stat:.3f}, p = {p_val:.3f}")
print(f"  {'✓ Significant' if p_val < 0.05 else '✗ Not significant'} at α=0.05\n")

# Microtargeted vs Control
t_stat, p_val = stats.ttest_ind(micro_effects, control_effects)
print(f"Microtargeted vs Control:")
print(f"  t = {t_stat:.3f}, p = {p_val:.3f}")
print(f"  {'✓ Significant' if p_val < 0.05 else '✗ Not significant'} at α=0.05\n")

# Microtargeted vs Generic
t_stat, p_val = stats.ttest_ind(micro_effects, generic_effects)
print(f"Microtargeted vs Generic:")
print(f"  t = {t_stat:.3f}, p = {p_val:.3f}")
print(f"  {'✓ Significant' if p_val < 0.05 else '✗ Not significant'} at α=0.05")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Mean effects with error bars
ax = axes[0]
summary_plot = summary.reset_index()
ax.barh(summary_plot['condition'], summary_plot['Mean Effect'], 
        xerr=[summary_plot['Mean Effect'] - summary_plot['95% CI Lower'],
              summary_plot['95% CI Upper'] - summary_plot['Mean Effect']],
        capsize=5, alpha=0.7, color=['lightgray', 'steelblue', 'coral'])
ax.axvline(0, color='red', linestyle='--', alpha=0.5, label='No effect')
ax.set_xlabel('Mean Attitude Change (points on 7-point scale)')
ax.set_title('Average Treatment Effects\n(with 95% Confidence Intervals)')
ax.legend()
ax.grid(axis='x', alpha=0.3)

# Plot 2: Distribution of effects
ax = axes[1]
df_sim.boxplot(column='effect', by='condition', ax=ax)
ax.axhline(0, color='red', linestyle='--', alpha=0.5)
ax.set_xlabel('Condition')
ax.set_ylabel('Attitude Change (points)')
ax.set_title('Distribution of Effects by Condition')
plt.suptitle('')  # Remove default title

plt.tight_layout()
plt.show()

print("\n" + "=" * 70)
print("\nKey Insights:")
print("=" * 70)
print("\n• Control group: Shows natural fluctuation in responses")
print("• Generic message: Simulated to produce an effect above control")
print("• Microtargeted: Simulated effect close to generic; check the t-test above")
print("\nNote: These data are simulated with effects chosen to mirror the pattern")
print("reported by Argyle et al. (2025); they illustrate the analysis, not a finding.")

**What this analysis shows:**

**Average Treatment Effects (ATE):**
- **Control**: ~0 points (baseline variation - no message shown)
- **Generic**: Small positive effect (simple message persuasion)
- **Microtargeted**: Small positive effect (similar to generic)

**Statistical significance:**
- Both treatments likely significant vs control
- Microtargeted vs generic: NOT significant
- **Key pattern**: Personalization adds little beyond the generic message (built into the simulation: true effects of 0.30 vs 0.35)

**Why this matters:**
- Challenges assumption that microtargeting is necessary
- Simple generic messages can be just as effective
- Cost-benefit: Generic is much cheaper

**Real-world implications:**
- Political campaigns: May not need complex targeting
- Public health: Generic messages can work
- BUT: This is from simulations, needs human validation

**Limitations:**
- Small sample (n=15 per condition)
- Synthetic respondents (not real humans)
- Single topic (may vary by issue)
- Short-term effects only
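
With only 15 observations per condition, the 1.96 multiplier used in the summary table is a normal approximation that makes the intervals slightly too narrow. The sketch below uses the t distribution instead and adds a standardized effect size. It assumes equal-variance groups for the pooled standard deviation; the change scores are made up for illustration, and both function names are hypothetical.

```python
import numpy as np
from scipy import stats

def t_confidence_interval(values, confidence=0.95):
    """Mean and CI half-width using the t distribution (suited to small n)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    se = values.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return values.mean(), t_crit * se

def cohens_d(a, b):
    """Standardized mean difference with a pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Illustrative change scores, not real results
generic = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
control = [0, 0, 0, 1, 0, -1, 0, 0, 0, 0, 0, 1, -1, 0, 0]

mean, half_width = t_confidence_interval(generic)
print(f"Generic: {mean:.2f} ± {half_width:.2f} (t critical value for n=15 is about 2.14, not 1.96)")
print(f"Cohen's d, generic vs control: {cohens_d(generic, control):.2f}")
```

Reporting Cohen's d alongside the raw point difference makes results comparable across scales and with human benchmarks.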

---

## Part 5: Interactive Multi-Turn Conversations

Beyond one-shot messages, **Argyle et al. (2025)** tested multi-turn conversations:
- **Direct persuasion**: AI tries to convince respondent
- **Motivational interviewing**: AI uses reflective listening

Let's implement this approach.

In [None]:
def multi_turn_persuasion(demographics, topic, strategy="direct", n_turns=3, model="gpt-4o-mini"):
    """
    Conduct multi-turn persuasion conversation
    
    Args:
        demographics: dict of demographic attributes
        topic: policy/issue for discussion
        strategy: 'direct' (persuasive) or 'motivational' (reflective)
        n_turns: number of conversation rounds
        model: which model to use for the respondent (the persuader always uses gpt-4o)
    
    Returns:
        dict: conversation history and attitude change
    """
    persona = create_persona(demographics)
    
    # Measure pre-conversation attitude
    pre_attitude = measure_attitude(demographics, topic, model)
    
    # Set up conversation system prompts
    if strategy == "direct":
        persuader_prompt = f"""You are having a conversation to persuade someone to support: {topic}

Your goal: Convince them with clear, factual arguments. Be direct but respectful.
Keep responses brief (2-3 sentences)."""
    else:  # motivational interviewing
        persuader_prompt = f"""You are using motivational interviewing techniques to discuss: {topic}

Your approach: Ask open-ended questions, reflect their concerns, explore ambivalence.
Be empathetic and non-directive. Keep responses brief (2-3 sentences)."""
    
    # Conversation history
    conversation = []
    respondent_history = []
    
    # Initial question from persuader
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": persuader_prompt},
            {"role": "user", "content": f"Start the conversation about {topic}"}
        ],
        temperature=0.8
    )
    
    persuader_msg = response.choices[0].message.content
    conversation.append({"speaker": "Persuader", "message": persuader_msg})
    respondent_history.append({"role": "user", "content": persuader_msg})
    
    # Multi-turn exchange
    for turn in range(n_turns):
        # Respondent replies
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": persona},
                *respondent_history
            ],
            temperature=1.0
        )
        
        respondent_msg = response.choices[0].message.content
        conversation.append({"speaker": "Respondent", "message": respondent_msg})
        respondent_history.append({"role": "assistant", "content": respondent_msg})
        
        # Persuader responds (if not last turn)
        if turn < n_turns - 1:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": persuader_prompt},
                    *[{"role": "user" if msg["speaker"] == "Respondent" else "assistant", 
                       "content": msg["message"]} for msg in conversation]
                ],
                temperature=0.8
            )
            
            persuader_msg = response.choices[0].message.content
            conversation.append({"speaker": "Persuader", "message": persuader_msg})
            respondent_history.append({"role": "user", "content": persuader_msg})
        
        time.sleep(0.5)
    
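    # Measure post-conversation attitude inside the same conversation.
    # A fresh measure_attitude() call is stateless and would never see the dialogue.
    import re
    post_question = f"""Thank you for the discussion. On a scale of 1 to 7, how much do you support the following:

{topic}

Where:
1 = Strongly Oppose
4 = Neutral
7 = Strongly Support

Please respond with ONLY a number from 1 to 7."""
    
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": persona},
            *respondent_history,
            {"role": "user", "content": post_question}
        ],
        temperature=1.0
    )
    numbers = re.findall(r'\d+', response.choices[0].message.content)
    post_attitude = int(numbers[0]) if numbers and 1 <= int(numbers[0]) <= 7 else None
    
    return {
        'pre': pre_attitude,
        'post': post_attitude,
        'effect': post_attitude - pre_attitude if (pre_attitude is not None and post_attitude is not None) else None,
        'conversation': conversation,
        'strategy': strategy
    }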


# Example conversation (commented to save costs)
print("Example multi-turn conversation:")
print("=" * 70)
print("\n[Commented out to save API costs - uncomment to run]")
print("\nTo run:")
print("result = multi_turn_persuasion(test_persona, topic, strategy='direct', n_turns=3)")
print("\nExample output:")
print("  Turn 1:")
print("    Persuader: [Opening argument about renewable energy]")
print("    Respondent: [Initial reaction based on persona]")
print("  Turn 2:")
print("    Persuader: [Follow-up argument]")
print("    Respondent: [Response, possibly showing attitude shift]")
print("  ...")
print("\nPre-conversation attitude: 4/7")
print("Post-conversation attitude: 5/7")
print("Effect: +1 point")

**What this code does:**

Implements **multi-turn interactive persuasion** conversations:

**Two conversation strategies:**

1. **Direct persuasion**:
   - AI presents arguments directly
   - Uses facts, logic, appeals to values
   - Traditional persuasion approach

2. **Motivational interviewing**:
   - AI uses reflective listening
   - Asks open-ended questions
   - Explores respondent's ambivalence
   - More collaborative, less confrontational

**Conversation structure:**
- Persuader (GPT-4o): Generates strategic messages
- Respondent (GPT-4o-mini): Replies based on persona
- Multiple rounds of back-and-forth
- Maintains conversation context throughout

**Key findings from Argyle et al. (2025):**
- Multi-turn conversations DID produce persuasion effects
- BUT: Didn't significantly outperform one-shot messages
- Direct vs motivational: No significant difference
- **Implication**: Elaboration may not add value (at least in short conversations)

**Why multi-turn might not help:**
- Short conversations (3-6 turns) may not allow deep engagement
- LLMs may already "elaborate" internally in one-shot
- Survey context different from real dialogue
- Missing non-verbal cues, rapport-building

**Cost considerations:**
- 3 turns × 2 messages per turn = 6 messages
- Plus pre/post measurements = 8 API calls
- Using GPT-4o for persuader: ~ per conversation
- More expensive than one-shot (but tests different theory)

**When to use:**
- Testing elaboration vs one-shot hypotheses
- Exploring conversation dynamics
- Studying resistance and counter-arguments
- Prototyping chatbot interventions
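
Keeping context across turns depends on showing each model the transcript from its own point of view: its own lines as `assistant`, the other party's lines as `user`. The sketch below isolates that role mapping in a hypothetical `to_chat_messages` helper, using the same transcript format that `multi_turn_persuasion` builds. The transcript and system prompts are placeholders.

```python
def to_chat_messages(conversation, viewpoint, system_prompt):
    """Convert a shared transcript into the message list one speaker's model should see.

    The model speaking as `viewpoint` sees its own lines as 'assistant'
    and the other party's lines as 'user'.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for turn in conversation:
        role = "assistant" if turn["speaker"] == viewpoint else "user"
        messages.append({"role": role, "content": turn["message"]})
    return messages

# Illustrative transcript, not real model output
conversation = [
    {"speaker": "Persuader", "message": "Have you considered the jobs this research creates?"},
    {"speaker": "Respondent", "message": "I worry about the cost to taxpayers."},
    {"speaker": "Persuader", "message": "That's fair. What would make the cost feel justified?"},
]

respondent_view = to_chat_messages(conversation, "Respondent", "You are a 35-year-old ...")
persuader_view = to_chat_messages(conversation, "Persuader", "You are having a conversation ...")
print([m["role"] for m in respondent_view])  # ['system', 'user', 'assistant', 'user']
print([m["role"] for m in persuader_view])   # ['system', 'assistant', 'user', 'assistant']
```

A single transcript plus one mapping function avoids keeping two parallel histories in sync by hand.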

---

## Part 6: Personalized Outcomes (Velez & Liu Approach)

**Velez & Liu (2025)** innovated by personalizing not just messages but also the **outcome measures** themselves.

Their approach:
1. Ask open-ended: "What political issue matters most to you?"
2. Use LLM to summarize their response
3. Generate personalized attitude scales for THEIR specific issue
4. Generate pro/con arguments about THEIR issue
5. Measure attitude change on personalized scales
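
Step 3 yields the scale items as free text, so they need to be split into individual statements before each one can be administered. The sketch below assumes the model follows a "numbered list" instruction and formats items as `1.` or `1)`. `parse_numbered_items` is a hypothetical helper and the sample text is made up.

```python
import re

def parse_numbered_items(text):
    """Extract item text from lines that start with '1.', '2)', and so on."""
    items = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            items.append(match.group(1))
    return items

# Illustrative model output, not a real response
raw = """Here are the items:
1. Lowering healthcare costs should be a top national priority.
2) I am certain that current healthcare costs are too high.
3. The government should act on healthcare costs this year."""

items = parse_numbered_items(raw)
print(len(items))  # 3
print(items[0])    # Lowering healthcare costs should be a top national priority.
```

If fewer than three items come back, regenerating the scale is safer than administering a partial one.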

In [None]:
def collect_core_issue(demographics, model="gpt-4o-mini"):
    """
    Simulate respondent identifying their core political issue
    (In real survey, this would be open-ended text entry)
    
    Args:
        demographics: dict of demographic attributes
        model: which model to use
    
    Returns:
        str: open-ended response about core issue
    """
    persona = create_persona(demographics)
    
    prompt = """What political or social issue matters most to you personally?

Please describe in 2-3 sentences:
- What the issue is
- Why it matters to you
- What you think should be done"""
    
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": prompt}
        ],
        temperature=1.0
    )
    
    return response.choices[0].message.content


def summarize_core_issue(core_issue_text, model="gpt-3.5-turbo"):
    """
    Summarize respondent's core issue in one sentence
    
    Args:
        core_issue_text: open-ended response
        model: which model to use
    
    Returns:
        str: one-sentence summary
    """
    prompt = f"""Summarize this person's core political concern in ONE clear sentence:

{core_issue_text}

Return only the summary sentence."""
    
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3
    )
    
    return response.choices[0].message.content.strip()


def generate_personalized_scale(summary, model="gpt-3.5-turbo"):
    """
    Generate personalized Likert scale items for their specific issue
    
    Args:
        summary: one-sentence summary of core issue
        model: which model to use
    
    Returns:
        str: raw model output containing 3 numbered Likert scale items
    """
    prompt = f"""Create 3 Likert scale items (statements) to measure someone's attitude about:

{summary}

Requirements:
- Each item should be a clear statement (not a question)
- Items should measure different aspects (strength, certainty, priority)
- Use language from their original concern
- Suitable for 1-7 scale (Strongly Disagree to Strongly Agree)

Return as numbered list."""
    
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7
    )
    
    return response.choices[0].message.content


def generate_personalized_argument(summary, stance, intensity="moderate", model="gpt-4o"):
    """
    Generate pro or con argument about their specific issue
    
    Args:
        summary: one-sentence summary of core issue
        stance: 'pro' or 'con'
        intensity: 'moderate', 'strong', or 'vitriolic'
        model: which model to use
    
    Returns:
        str: persuasive argument
    """
    if intensity == "moderate":
        tone = "respectful and factual"
    elif intensity == "strong":
        tone = "strongly worded but civil"
    else:  # vitriolic
        tone = "harsh and uncivil (for research purposes only)"
    
    prompt = f"""Write a {tone} argument that {'supports' if stance == 'pro' else 'opposes'}:

{summary}

Make it 2-3 sentences. Return only the argument."""
    
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8
    )
    
    return response.choices[0].message.content.strip()


# Demonstrate the Velez & Liu pipeline
print("Velez & Liu (2025) Personalized Outcome Approach")
print("=" * 70)

# Step 1: Collect core issue
print("\nStep 1: Collecting respondent's core issue...\n")
core_issue = collect_core_issue(test_persona)
print(f"Respondent says:\n\"{core_issue}\"\n")

# Step 2: Summarize
print("Step 2: Summarizing core issue...\n")
summary = summarize_core_issue(core_issue)
print(f"Summary: {summary}\n")

# Step 3: Generate personalized scales
print("Step 3: Generating personalized attitude scales...\n")
scale_items = generate_personalized_scale(summary)
print(f"Personalized Likert items:\n{scale_items}\n")

# Step 4: Generate pro and con arguments
print("Step 4: Generating arguments...\n")
pro_arg = generate_personalized_argument(summary, "pro", "moderate")
con_arg = generate_personalized_argument(summary, "con", "moderate")

print(f"Pro argument:\n\"{pro_arg}\"\n")
print(f"Con argument:\n\"{con_arg}\"\n")

print("=" * 70)
print("\nVelez & Liu Innovation:")
print("  • Personalizes BOTH treatment AND outcome")
print("  • Maximizes relevance to each respondent")
print("  • Tests 'easy case' for finding polarization")
print("\nKey finding: Even with this personalization, polarization")
print("was hard to produce (only emerged with vitriolic messages)")

**What this code does:**

Implements **Velez & Liu's (2025) personalized outcome approach**:

**The innovation:**
- Most persuasion research: Same topic for everyone
- Velez & Liu: Let EACH person define their own core issue
- Then personalize everything to THEIR specific concern

**Four-step pipeline:**

1. **Collect core issue** (open-ended)
   - "What matters most to you?"
   - Respondent writes in their own words
   - Could be anything: healthcare, immigration, climate, etc.

2. **Summarize** with LLM
   - An LLM (`gpt-3.5-turbo` by default in the code above) condenses the key concern into one sentence
   - Uses respondent's language
   - Creates consistent format for next steps

3. **Generate personalized scales**
   - Create Likert items about THEIR specific issue
   - Measure attitude strength, certainty, and priority
   - All tailored to what they care about

4. **Generate personalized arguments**
   - Pro and con arguments about THEIR issue
   - Different intensity levels (moderate → vitriolic)
   - Test what induces polarization
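
Step 3 in the code above returns the scale items as raw text (a numbered list) rather than a Python list. The sketch below shows one way to split that text into items and average a respondent's ratings into a single score. `parse_numbered_items` and `composite_score` are hypothetical helpers, not part of Velez & Liu's pipeline, and the sample text is made up for illustration.

```python
import re
from statistics import mean


def parse_numbered_items(text):
    """Split a numbered list ("1. ...", "2) ...") into a list of item strings."""
    items = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            items.append(match.group(1))
    return items


def composite_score(ratings, low=1, high=7):
    """Average item ratings after checking they are on the expected scale."""
    if not ratings:
        raise ValueError("no ratings supplied")
    if any(r < low or r > high for r in ratings):
        raise ValueError(f"ratings must be between {low} and {high}")
    return mean(ratings)


# Made-up model output, for illustration only
sample_text = """1. Expanding public transit should be a top priority.
2. I am certain that better transit would improve my community.
3) I feel strongly about funding public transit."""

items = parse_numbered_items(sample_text)
print(items)
print(composite_score([6, 5, 7]))
```

Lines that do not start with a number are skipped, so any preamble the model adds before the list is ignored.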

**Why this is powerful:**
- Maximum personal relevance
- "Easy test" for polarization theories
- If polarization doesn't happen under these favorable conditions, it is probably rare in general

**Key findings:**
- Even with full personalization, polarization was RARE
- Moderate arguments: No polarization
- Strong arguments: Little polarization
- Vitriolic arguments: Some attitude defense emerged
- **Implication**: Polarization harder to produce than assumed
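
One way to operationalize these outcomes is to compare each respondent's shift with the direction of the argument they read. The sketch below uses a working definition that is an assumption of this example, not Velez & Liu's coding scheme: movement toward the argument counts as persuasion, and movement away from it counts as polarization. `classify_shift` and its `min_shift` threshold are hypothetical.

```python
def classify_shift(pre, post, stance, min_shift=1):
    """Label a pre/post change on a 1-7 support scale.

    stance: 'pro' if the argument pushes toward higher support, 'con' if lower.
    min_shift: smallest change treated as real movement (an assumed threshold).
    """
    if stance not in ("pro", "con"):
        raise ValueError("stance must be 'pro' or 'con'")
    direction = 1 if stance == "pro" else -1
    # Positive values mean the respondent moved toward the argument
    aligned_change = (post - pre) * direction
    if aligned_change >= min_shift:
        return "persuasion"
    if aligned_change <= -min_shift:
        return "polarization"
    return "no change"


print(classify_shift(pre=3, post=5, stance="pro"))  # moved toward a pro argument
print(classify_shift(pre=6, post=7, stance="con"))  # moved away from a con argument
print(classify_shift(pre=4, post=4, stance="con"))  # did not move
```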

**Practical applications:**
- Survey design: Let respondents define dimensions
- Intervention testing: Personalize to real concerns
- Theory testing: Create strong tests of hypotheses

**Limitations:**
- LLM summarization may miss nuance
- Generated scales may not capture all aspects
- Still simulated respondents (need human validation)

---

## Open-Source Alternatives

The examples above use OpenAI's API. Here's how to run persuasion experiments with open-source models.

### Using Ollama

In [None]:
from ollama import Client as OllamaClient

ollama_client = OllamaClient(host='http://localhost:11434')

In [None]:
# Measure attitude with Ollama
import re


def measure_attitude_ollama(demographics, topic, model="llama3.2", message=None):
    """Ask for a 1-7 support rating; if a message is given, show it first (post-test)."""
    persona = f"""You are a {demographics['age']}-year-old {demographics['race']} {demographics['gender']}."""
    question = f"""On a scale of 1-7, how much do you support {topic}?
(1=strongly oppose, 7=strongly support)

Answer with a single number only."""
    # Post-test: prepend the persuasive message so the respondent actually sees the treatment
    if message:
        prompt = f"""Please read the following message:

{message}

{question}"""
    else:
        prompt = question

    response = ollama_client.chat(
        model=model,
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": prompt}
        ],
        options={"temperature": 1.0}
    )

    # Take the first standalone digit from 1 to 7 in the reply
    text = response['message']['content']
    match = re.search(r'\b([1-7])\b', text)
    return int(match.group(1)) if match else None


# Generate persuasive message
def generate_message_ollama(topic, position, model="llama3.2"):
    prompt = f"""Write a brief persuasive message (2-3 sentences) arguing that people should {position} {topic}."""

    response = ollama_client.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0.8}
    )

    return response['message']['content']


# Pre-test/post-test experiment
demographics = {'age': 40, 'race': 'white', 'gender': 'male'}
topic = "universal basic income"

pre = measure_attitude_ollama(demographics, topic)
message = generate_message_ollama(topic, "support")
print(f"Message: {message}\n")

# Show message to respondent (simulated by including it in the post-test prompt)
post = measure_attitude_ollama(demographics, topic, message=message)

print(f"Pre-test: {pre}")
print(f"Post-test: {post}")
print(f"Effect: {post - pre if pre is not None and post is not None else 'N/A'}")
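
The regular expression above returns the first digit from 1 to 7 that appears anywhere in the reply, so a reply that restates the scale ("On a scale of 1-7, I'd say 5") is read as 1. A stricter parser is sketched below. `parse_likert_strict` is a hypothetical helper that accepts a reply only when it contains exactly one number and that number is on the scale; anything ambiguous is returned as `None` so it can be dropped or re-asked.

```python
import re


def parse_likert_strict(text, low=1, high=7):
    """Return the rating only if the reply contains exactly one integer, and it is in range."""
    numbers = re.findall(r"\d+", text)
    if len(numbers) != 1:
        return None  # no number at all, or more than one (ambiguous)
    value = int(numbers[0])
    return value if low <= value <= high else None


print(parse_likert_strict("5"))
print(parse_likert_strict("On a scale of 1-7, I'd say 5"))
print(parse_likert_strict("10"))
```

The trade-off is more missing values in exchange for fewer silently wrong ones, which matters when effects are computed from small differences.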

### Using Hugging Face

In [None]:
# Persuasion experiment with Hugging Face
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# Half precision is intended for GPU; fall back to float32 on CPU
hf_dtype = torch.float16 if device == "cuda" else torch.float32
hf_model_name = "microsoft/Phi-3-mini-4k-instruct"
hf_tokenizer = AutoTokenizer.from_pretrained(hf_model_name, trust_remote_code=True)
hf_model = AutoModelForCausalLM.from_pretrained(hf_model_name, torch_dtype=hf_dtype, trust_remote_code=True).to(device)

import re


def measure_attitude_hf(demographics, topic, message=None):
    """Ask for a 1-7 support rating; if a message is given, show it first (post-test)."""
    persona = f"""You are a {demographics['age']}-year-old {demographics['gender']}."""
    question = f"""On a scale of 1-7, how much do you support {topic}? Answer with a single number only."""
    # Post-test: prepend the persuasive message so the respondent actually sees the treatment
    prompt = f"Please read the following message:\n\n{message}\n\n{question}" if message else question

    messages = [{"role": "system", "content": persona}, {"role": "user", "content": prompt}]
    formatted_prompt = hf_tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = hf_tokenizer(formatted_prompt, return_tensors="pt").to(device)

    outputs = hf_model.generate(**inputs, max_new_tokens=10, temperature=1.0, do_sample=True)
    # Decode only the newly generated tokens, not the prompt
    response_text = hf_tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

    match = re.search(r'\b([1-7])\b', response_text)
    return int(match.group(1)) if match else None


def generate_message_hf(topic, position):
    prompt = f"""Write a brief persuasive message (2-3 sentences) arguing that people should {position} {topic}."""
    formatted_prompt = hf_tokenizer.apply_chat_template([{"role": "user", "content": prompt}], tokenize=False, add_generation_prompt=True)
    inputs = hf_tokenizer(formatted_prompt, return_tensors="pt").to(device)
    outputs = hf_model.generate(**inputs, max_new_tokens=100, temperature=0.8, do_sample=True)
    return hf_tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)


# Run experiment (inputs redefined so this cell also runs without the Ollama cells)
demographics = {'age': 40, 'race': 'white', 'gender': 'male'}
topic = "universal basic income"

pre = measure_attitude_hf(demographics, topic)
message = generate_message_hf(topic, "support")
print(f"Message: {message}\n")

# Post-test: a fresh call that includes the message in the prompt
post = measure_attitude_hf(demographics, topic, message=message)

effect = post - pre if pre is not None and post is not None else 'N/A'
print(f"Pre: {pre}, Post: {post}, Effect: {effect}")
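
A single pre/post pair sampled at temperature 1.0 is one noisy draw, so an individual difference says little about the effect. The sketch below repeats a measurement and averages the valid responses. `run_trials` is a hypothetical helper, and `fake_measure` is a stand-in for a real call such as `measure_attitude_hf(demographics, topic)` so the example runs without a model.

```python
import random
from statistics import mean


def run_trials(measure_fn, n_trials=20):
    """Call measure_fn n_trials times and summarize the valid (non-None) ratings."""
    ratings = [measure_fn() for _ in range(n_trials)]
    valid = [r for r in ratings if r is not None]
    return {
        "n_valid": len(valid),
        "n_failed": len(ratings) - len(valid),
        "mean": mean(valid) if valid else None,
    }


# Stand-in for a real model call: random ratings with occasional parse failures
rng = random.Random(0)


def fake_measure():
    return rng.choice([3, 4, 4, 5, None])


trial_summary = run_trials(fake_measure, n_trials=50)
print(trial_summary)
```

With a real model, `measure_fn` could be `lambda: measure_attitude_hf(demographics, topic)`; reporting `n_failed` alongside the mean shows how often the reply could not be parsed.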

---

## Summary and Best Practices

**What we covered:**
1. Pre-test/post-test experimental design
2. Generic vs microtargeted message generation
3. Testing persuasion effects with synthetic respondents
4. Multi-turn interactive conversations
5. Personalized outcomes approach (Velez & Liu)

**Key findings from recent research:**
- **Argyle et al. (2025)**: Persuasion works, but simple approaches performed as well as complex ones
  - Microtargeting didn't beat generic messages
  - Multi-turn didn't beat one-shot
- **Velez & Liu (2025)**: Polarization is hard to produce
  - Even with maximum personalization
  - Only vitriolic messages showed signs of polarization

**Best practices:**
1. **Always include control group** (no message)
2. **Use 7-point scales** (more sensitive than 5-point)
3. **Validate with human data** (essential!)
4. **Report effect sizes** with confidence intervals
5. **Test simple approaches first** (may be just as good)
6. **Document all parameters** (model, temperature, prompts)
7. **Consider ethics** (IRB approval, deception protocols)
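
Practices 1 and 4 can be combined in one calculation: compare attitude change in the treatment group with change in the control group, then report the difference with a standardized effect size and an interval. The sketch below uses only the standard library and a normal-approximation 95% interval, which is an assumption that is rough for small samples (a t-based interval would be more appropriate there). `effect_summary` is a hypothetical helper, and the change scores are placeholder numbers, not results.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def effect_summary(treatment_changes, control_changes, confidence=0.95):
    """Difference in mean attitude change, Cohen's d, and a normal-approximation CI."""
    n_t, n_c = len(treatment_changes), len(control_changes)
    if n_t < 2 or n_c < 2:
        raise ValueError("each group needs at least two observations")
    diff = mean(treatment_changes) - mean(control_changes)
    sd_t, sd_c = stdev(treatment_changes), stdev(control_changes)
    # Pooled standard deviation for Cohen's d
    pooled_sd = sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    cohens_d = diff / pooled_sd if pooled_sd > 0 else float("nan")
    # Standard error of the difference in means (unequal variances allowed)
    se = sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"diff": diff, "cohens_d": cohens_d, "ci": (diff - z * se, diff + z * se)}


# Placeholder post-minus-pre change scores, for illustration only
treatment = [1, 0, 2, 1, 1, 0, 2, 1]
control = [0, 0, 1, -1, 0, 0, 1, 0]

result = effect_summary(treatment, control)
print(f"Difference in mean change: {result['diff']:.2f}")
print(f"Cohen's d: {result['cohens_d']:.2f}")
print(f"95% CI: ({result['ci'][0]:.2f}, {result['ci'][1]:.2f})")
```

Subtracting the control group's change removes drift that would occur without any message, such as run-to-run sampling noise in the model's answers.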

**Limitations to remember:**
- LLM responses may not predict human behavior
- Effect sizes may be inflated or deflated
- Missing real-world context (face-to-face, trust, etc.)
- Short-term effects only (no follow-up)
- Synthetic data is for prototyping, not a replacement for human respondents

**When to use these methods:**
- ✓ Rapid prototyping of message variants
- ✓ Testing theoretical mechanisms
- ✓ Exploring design space before expensive studies
- ✗ As sole evidence for claims about humans
- ✗ Without validation against real respondents

**Next steps:**
- Run your own experiments with real survey topics
- Compare to actual human experimental data
- Explore boundary conditions (when does it work/fail?)
- Consider ethical implications of persuasion research