# LLMs for Simulation I: Generative Agents

**Learning objectives:**
- Understand how LLMs can enhance agent-based models (ABMs)
- Create realistic personas for simulated social media users
- Implement multi-agent simulations with realistic(ish) discourse
- Test alternative platform algorithms (echo chambers vs bridging)
- Measure emergent properties (toxicity, cross-partisan engagement)
- Evaluate social media interventions through simulation

**How to run this notebook:**
- **Google Colab** (recommended): Works for all parts
- **OpenAI API key needed**: For agent conversations

**Key paper:** Törnberg et al. (2023). "Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms." [arXiv:2310.05984](https://arxiv.org/abs/2310.05984)

---

## Introduction: Why Simulate Social Media?

Social media platforms face a fundamental challenge: **How do you design algorithms that promote healthy public discourse?**

**The problem with existing research methods:**
- **Observational studies**: Can't test non-existent alternatives
- **Field experiments**: Expensive, risky, limited by platform cooperation
- **Lab experiments**: Can't capture emergent group dynamics
- **Traditional ABMs**: Agents follow simple rules, can't engage in realistic discourse

**The LLM + ABM solution:**
- Combine the **emergence** of agent-based models
- With the **linguistic realism** of large language models
- Create synthetic social media platforms to test "what if" scenarios
- Rapid, safe, and cost-effective exploration of design space

**What we'll build:**
A simplified version of Törnberg et al. (2023) that simulates a social media platform with:
- 6 agents with realistic personas based on (made-up) survey data
- Real news headlines
- Three different newsfeed algorithms
- Posting, liking, and commenting behaviors
- Measurement of toxicity and cross-partisan engagement

---

## Setup

In [None]:
# Install packages
!pip install -q openai pandas numpy matplotlib seaborn networkx

In [None]:
import os
import json
import getpass
import time
import re
from collections import defaultdict, Counter
from typing import List, Dict, Optional

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from openai import OpenAI

# Set API key
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

client = OpenAI()

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 6)

print("✓ Setup complete!")

---

## Part 1: Creating Realistic Personas

Following Törnberg et al. (2023), we create agent personas based on (made-up) survey data. The original study used the American National Election Studies (ANES) to generate 500 personas. For this tutorial, we'll create a smaller set of 6 diverse personas.

**Key persona components:**
1. **Demographics**: Age, gender, race, education, income, location
2. **Political identity**: Party affiliation, voting history
3. **Attitudes**: Feelings toward political figures and groups
4. **Interests**: Non-political hobbies, sports teams, TV shows
5. **Personality**: Traits generated by LLM to enrich the persona

**Why survey data?**
- Ensures demographic representativeness
- Captures correlations between attributes (e.g., between age and party affiliation)
- Provides realistic distributions of political attitudes

In [None]:
# Sample personas representing diverse US electorate
# In a full study, these would come from ANES survey data
SAMPLE_PERSONAS = [
    {
        "name": "John Smith",
        "age": 45,
        "gender": "male",
        "race": "White",
        "education": "High school",
        "income": "middle",
        "region": "Iowa",
        "religion": "Evangelical Protestant",
        "party": "Strong Republican",
        "voted_for": "Donald Trump in 2020",
        "loves": "Republicans, Donald Trump, Christians, NRA, Christian Fundamentalists, conservatives",
        "hates": "Democrats, Joe Biden, Black Lives Matter, Anthony Fauci, liberals",
        "hobbies": "Hunting, fishing, woodworking",
        "sports_team": "Iowa Hawkeyes (college football)",
        "political_opinions": "Supports strict immigration policies, opposes abortion rights, believes in limited government intervention",
        "personality": "Patriotic, traditionalist, family-oriented"
    },
    {
        "name": "Maria Garcia",
        "age": 32,
        "gender": "female",
        "race": "Hispanic",
        "education": "Bachelor's degree",
        "income": "middle",
        "region": "California",
        "religion": "Catholic",
        "party": "Strong Democrat",
        "voted_for": "Joe Biden in 2020",
        "loves": "Democrats, Joe Biden, immigration rights activists, healthcare workers, teachers",
        "hates": "Republicans, Donald Trump, ICE, anti-immigrant rhetoric",
        "hobbies": "Volunteering, reading, cooking",
        "sports_team": "LA Dodgers",
        "political_opinions": "Supports immigration reform, universal healthcare, climate action",
        "personality": "Compassionate, community-focused, socially conscious"
    },
    {
        "name": "James Thompson",
        "age": 58,
        "gender": "male",
        "race": "Black",
        "education": "Some college",
        "income": "lower-middle",
        "region": "Georgia",
        "religion": "Baptist",
        "party": "Democrat",
        "voted_for": "Joe Biden in 2020",
        "loves": "Democrats, civil rights activists, Black Lives Matter, community leaders",
        "hates": "Republicans, systemic racism, police brutality",
        "hobbies": "Church activities, basketball, mentoring youth",
        "sports_team": "Atlanta Hawks",
        "political_opinions": "Supports criminal justice reform, voting rights, racial equity",
        "personality": "Faith-driven, resilient, community advocate"
    },
    {
        "name": "Sarah Johnson",
        "age": 28,
        "gender": "female",
        "race": "White",
        "education": "Master's degree",
        "income": "upper-middle",
        "region": "New York",
        "religion": "Atheist",
        "party": "Independent (leans Democrat)",
        "voted_for": "Joe Biden in 2020",
        "loves": "Science, evidence-based policy, environmental activism",
        "hates": "Anti-vaxxers, climate change deniers, political extremism",
        "hobbies": "Hiking, photography, craft beer",
        "sports_team": "Not interested in sports",
        "political_opinions": "Supports climate action, science funding, secular government",
        "personality": "Analytical, environmentally conscious, pragmatic"
    },
    {
        "name": "Michael Chen",
        "age": 35,
        "gender": "male",
        "race": "Asian",
        "education": "Bachelor's degree",
        "income": "upper-middle",
        "region": "Washington",
        "religion": "No religion",
        "party": "Independent",
        "voted_for": "Did not vote in 2020",
        "loves": "Technology, innovation, economic growth",
        "hates": "Government overreach, bureaucracy",
        "hobbies": "Gaming, coding, investing",
        "sports_team": "Seattle Seahawks",
        "political_opinions": "Fiscally conservative, socially moderate, pro-technology",
        "personality": "Tech-savvy, libertarian-leaning, entrepreneurial"
    },
    {
        "name": "Linda Davis",
        "age": 52,
        "gender": "female",
        "race": "White",
        "education": "Associate degree",
        "income": "middle",
        "region": "Ohio",
        "religion": "Methodist",
        "party": "Independent",
        "voted_for": "Donald Trump in 2020",
        "loves": "Small businesses, local communities, traditional values",
        "hates": "Big corporations, political elites, cancel culture",
        "hobbies": "Gardening, baking, quilting",
        "sports_team": "Ohio State Buckeyes",
        "political_opinions": "Supports small business, skeptical of both parties",
        "personality": "Down-to-earth, independent-minded, community-oriented"
    }
]

# For demonstration, we'll use 6 agents
NUM_AGENTS = len(SAMPLE_PERSONAS)

print(f"Created {NUM_AGENTS} agent personas:")
print("="*70)
for i, p in enumerate(SAMPLE_PERSONAS, 1):
    print(f"{i}. {p['name']}: {p['age']}yo {p['party']} {p['gender']} from {p['region']}")

**What this code does:**

Creates a diverse sample of agent personas with:
- **Demographic diversity**: Different ages, races, genders, regions
- **Political diversity**: Republicans, Democrats, Independents
- **Rich attributes**: Beyond demographics to include values, interests, personality

**Key design principle from Törnberg et al.:**
- **Dynamic persona generation**: Only include relevant attributes
  - Political agents: Include party, attitudes toward figures
  - Non-political agents: Emphasize hobbies, interests
  - This prevents "overpoliticization" of all agents
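
The dynamic-persona idea can be sketched as a prompt builder that adds political details only when an agent is flagged as politically engaged. Both the `politically_engaged` flag and the `build_dynamic_prompt` helper are assumptions made for illustration; neither appears in the personas above or in the original paper's code.

```python
from typing import Dict

def build_dynamic_prompt(persona: Dict) -> str:
    """Build a short persona prompt, adding political details only when relevant.

    `politically_engaged` is a hypothetical flag, not a field of SAMPLE_PERSONAS.
    """
    lines = [f"You are {persona['name']}, age {persona['age']}, from {persona['region']}."]
    if persona.get("politically_engaged", True):
        # Political agents: include party and views
        lines.append(f"Party: {persona['party']}")
        lines.append(f"Political views: {persona['political_opinions']}")
    else:
        # Non-political agents: emphasize hobbies and interests instead
        lines.append(f"Hobbies: {persona['hobbies']}")
        lines.append(f"Sports: {persona['sports_team']}")
    return "\n".join(lines)

demo_persona = {
    "name": "Michael Chen", "age": 35, "region": "Washington",
    "party": "Independent",
    "political_opinions": "Fiscally conservative, socially moderate, pro-technology",
    "hobbies": "Gaming, coding, investing", "sports_team": "Seattle Seahawks",
    "politically_engaged": False,  # assumed flag for this sketch
}
print(build_dynamic_prompt(demo_persona))
```

With the flag set to `False`, the prompt mentions hobbies and sports but no party, which is one way to keep such an agent from treating every headline as political.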

**Scaling up:**
- For a full study: 100-500 agents from survey data
- Each additional agent adds cost but increases diversity (in theory)
- Trade-off: Complexity vs. computational resources

### Convert persona to prompt format

In [None]:
def create_persona_prompt(persona: Dict) -> str:
    """
    Convert persona dict into prompt format for LLM
    
    Args:
        persona: Dictionary with persona attributes
    
    Returns:
        str: Formatted persona prompt
    """
    prompt = f"""You are {persona['name']}.

Demographics:
- Age: {persona['age']}
- Gender: {persona['gender']}
- Race: {persona['race']}
- Education: {persona['education']}
- Income: {persona['income']} income
- Region: {persona['region']}
- Religion: {persona['religion']}

Political Identity:
- Party: {persona['party']}
- You voted for {persona['voted_for']}
- You love: {persona['loves']}
- You dislike: {persona['hates']}
- Political views: {persona['political_opinions']}

Interests:
- Hobbies: {persona['hobbies']}
- Sports: {persona['sports_team']}

Personality: {persona['personality']}
"""
    return prompt

# Example
print("Example persona prompt:")
print("="*70)
print(create_persona_prompt(SAMPLE_PERSONAS[0]))

---

## Part 2: News Headlines and Content

Agents need something to react to. Törnberg et al. used real news headlines from July 1, 2020 - a particularly contentious moment in US politics:
- COVID-19 pandemic ongoing
- Black Lives Matter protests
- Presidential election campaign

For this simulation, we'll use a simplified set of headlines representing different news sources and topics.

In [None]:
# Sample news headlines from July 2020
# In the full study, these came from newsapi.ai with 100-word summaries
NEWS_HEADLINES = [
    {
        "source": "CNN",
        "headline": "Trump calls Black Lives Matter a 'symbol of hate'",
        "summary": "President Trump defended his refusal to paint 'Black Lives Matter' on Fifth Avenue, calling it a 'symbol of hate.' The mayor of New York announced plans to paint the slogan on the street in front of Trump Tower.",
        "political_slant": "left"
    },
    {
        "source": "Fox News",
        "headline": "Seattle CHOP zone closes after shootings, violence",
        "summary": "Seattle police moved in to clear the Capitol Hill Organized Protest zone after weeks of violence, including multiple shootings. Mayor Jenny Durkan ordered the area cleared following the death of a teenager.",
        "political_slant": "right"
    },
    {
        "source": "ABC News",
        "headline": "Alabama students throwing 'COVID parties' to see who gets infected first",
        "summary": "City officials report that students in Tuscaloosa have been organizing parties where attendees deliberately expose themselves to COVID-19, with a prize going to the first person who tests positive.",
        "political_slant": "neutral"
    },
    {
        "source": "New York Times",
        "headline": "Minor League Baseball Season Is Canceled for the First Time",
        "summary": "Minor League Baseball announced the cancellation of its 2020 season, affecting 160 teams. This marks the first time in the league's history that a season has been called off entirely.",
        "political_slant": "neutral"
    },
    {
        "source": "NPR",
        "headline": "Supreme Court Rules Trump Cannot Block Critics On Twitter",
        "summary": "The Supreme Court ruled that President Trump violated the First Amendment by blocking critics on Twitter, saying public officials cannot exclude people from online forums based on their views.",
        "political_slant": "neutral"
    },
    {
        "source": "Fox News",
        "headline": "Kristin Chenoweth says country music is 'becoming more open' to LGBTQ inclusion",
        "summary": "Singer and actress Kristin Chenoweth praised country music for becoming more inclusive of LGBTQ artists and themes, highlighting recent statements of support from major country music stars.",
        "political_slant": "right"
    },
    {
        "source": "HuffPost",
        "headline": "Terry Crews Panned Online For His Cautionary Tweet On Black Lives Matter",
        "summary": "Actor Terry Crews faced backlash on social media after tweeting that Black Lives Matter should not become 'Black Lives Better,' warning against what he called 'Black supremacy.' Critics accused him of misunderstanding the movement.",
        "political_slant": "left"
    }
]

print(f"Loaded {len(NEWS_HEADLINES)} news headlines:")
print("="*70)
for i, news in enumerate(NEWS_HEADLINES, 1):
    print(f"{i}. [{news['source']}] {news['headline']}")

**What this represents:**

- **Diverse news sources**: CNN, Fox News, NPR, etc. (left, right, center)
- **Contentious topics**: BLM, COVID, Trump, social issues
- **Non-political content**: Sports, entertainment (not everything is political)

**Matching personas to news sources:**
In Törnberg et al., agents only see news from sources they consume according to ANES data:
- Conservatives → Fox News, Breitbart
- Liberals → CNN, MSNBC, NYT
- Moderates → Mix of sources

This creates **selective exposure** - agents naturally see news aligned with their views.
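
A minimal sketch of selective exposure, assuming a hand-written mapping from broad party label to outlets. The `SOURCES_BY_LEANING` table and the `visible_news` helper are hypothetical; in the paper the source lists come from ANES responses, not from a lookup like this.

```python
from typing import Dict, List

# Hypothetical outlet lists per broad party label (illustrative, not ANES data)
SOURCES_BY_LEANING = {
    "Republican": {"Fox News"},
    "Democrat": {"CNN", "NPR", "New York Times", "HuffPost"},
}

def visible_news(party: str, articles: List[Dict]) -> List[Dict]:
    """Return only the articles from outlets this party label is assumed to read."""
    for leaning, sources in SOURCES_BY_LEANING.items():
        if leaning in party:  # substring match, so "Strong Democrat" counts as Democrat
            return [a for a in articles if a["source"] in sources]
    return list(articles)  # no party match (e.g. "Independent"): see every source

demo_articles = [
    {"source": "CNN", "headline": "A"},
    {"source": "Fox News", "headline": "B"},
    {"source": "ABC News", "headline": "C"},
]
print([a["headline"] for a in visible_news("Strong Republican", demo_articles)])  # ['B']
print([a["headline"] for a in visible_news("Independent", demo_articles)])        # ['A', 'B', 'C']
```

The filtered list could be passed to an agent in place of the full headline list, so that each agent only chooses among outlets it would plausibly read.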

---

## Part 3: Agent Actions - Posting, Liking, Commenting

Now we implement the core agent behaviors:
1. **Read news** → Select article of interest
2. **Post** → Share article with comment
3. **See timeline** → View posts from others (algorithm-dependent)
4. **Like** → React positively to posts
5. **Comment** → Reply to posts

These actions are implemented as API calls where agents use their persona to decide what to do.

In [None]:
class SocialMediaAgent:
    """
    An LLM-powered agent that can post, like, and comment on social media
    """
    
    def __init__(self, persona: Dict, model: str = "gpt-4o-mini"):
        self.persona = persona
        self.name = persona['name']
        self.party = persona.get('party', 'Independent')
        self.model = model
        self.persona_prompt = create_persona_prompt(persona)
        self.posts = []  # Posts this agent has made
        self.likes = []  # Posts this agent has liked
        self.comments = []  # Comments this agent has made
    
    def share_news(self, news_articles: List[Dict], max_tokens: int = 100) -> Optional[Dict]:
        """
        Agent reads news headlines and decides which to share with a comment
        
        Args:
            news_articles: List of news articles to choose from
            max_tokens: Maximum length of comment
        
        Returns:
            Dict with selected article and comment, or None
        """
        # Format news list
        news_list = "\n".join([
            f"{i+1}. [{article['source']}] {article['headline']}\n   {article['summary']}"
            for i, article in enumerate(news_articles)
        ])
        
        prompt = f"""Here are today's news headlines:

{news_list}

Choose ONE headline that interests you most based on your values and opinions. Write a brief comment (2-3 sentences) sharing your reaction to share on social media.

Format your response as:
HEADLINE_NUMBER: [number]
COMMENT: [your 2-3 sentence comment]
"""
        
        try:
            response = client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": self.persona_prompt},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.9,
                max_tokens=max_tokens
            )
            
            content = response.choices[0].message.content
            
            # Parse response
            headline_match = re.search(r'HEADLINE_NUMBER:\s*(\d+)', content)
            comment_match = re.search(r'COMMENT:\s*(.+)', content, re.DOTALL)
            
            if headline_match and comment_match:
                headline_idx = int(headline_match.group(1)) - 1
                if 0 <= headline_idx < len(news_articles):
                    post = {
                        "author": self.name,
                        "party": self.party,
                        "article": news_articles[headline_idx],
                        "comment": comment_match.group(1).strip(),
                        "likes": 0,
                        "comments": [],
                        "likes_by_party": {"Democrat": 0, "Republican": 0, "Independent": 0}
                    }
                    self.posts.append(post)
                    return post
        
        except Exception as e:
            print(f"Error in share_news for {self.name}: {e}")
        
        return None
    
    def decide_likes(self, posts: List[Dict]) -> List[int]:
        """
        Agent sees posts and decides which to like
        
        Args:
            posts: List of posts to consider
        
        Returns:
            List of indices of posts to like
        """
        if not posts:
            return []
        
        # Format posts
        posts_text = "\n\n".join([
            f"Post {i+1} by {post['author']}:\n\"{post['comment']}\"\n(About: {post['article']['headline']})"
            for i, post in enumerate(posts)
        ])
        
        prompt = f"""You see these posts on social media:

{posts_text}

Which posts do you want to LIKE?

Respond with comma-separated post numbers (e.g., "1, 3, 5") or "none" if you don't like any."""
        
        try:
            response = client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": self.persona_prompt},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.7,
                max_tokens=50
            )
            
            content = response.choices[0].message.content.lower()
            
            if "none" in content:
                return []
            
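            # Extract numbers, keep valid indices, and drop duplicates (order preserved)
            # so a repeated number cannot count as two likes
            numbers = re.findall(r'\d+', content)
            liked_indices = list(dict.fromkeys(
                int(n) - 1 for n in numbers if 0 <= int(n) - 1 < len(posts)
            ))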
            
            self.likes.extend([posts[i] for i in liked_indices])
            return liked_indices
        
        except Exception as e:
            print(f"Error in decide_likes for {self.name}: {e}")
            return []
    
    def write_comment(self, post: Dict, max_tokens: int = 80) -> Optional[Dict]:
        """
        Agent writes a comment on a post
        
        Args:
            post: Post to comment on
            max_tokens: Maximum comment length
        
        Returns:
            Dict with the comment's author, party, and text, or None on failure
        """
        prompt = f"""You see this post:

Author: {post['author']}
Post: "{post['comment']}"
Article: {post['article']['headline']}

Write a brief reply (1-2 sentences) responding to this post."""
        
        try:
            response = client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": self.persona_prompt},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.9,
                max_tokens=max_tokens
            )
            
            comment_text = response.choices[0].message.content.strip()
            
            comment = {
                "author": self.name,
                "party": self.party,
                "text": comment_text
            }
            
            self.comments.append(comment)
            return comment
        
        except Exception as e:
            print(f"Error in write_comment for {self.name}: {e}")
            return None

print("✓ SocialMediaAgent class defined")

**What this code does:**

Implements a complete social media agent with three key behaviors:

1. **`share_news()`**: Agent as content creator
   - Reads multiple news headlines
   - Selects one based on their interests/values
   - Writes authentic comment (2-3 sentences)
   - Temperature = 0.9 (allows personality variation)

2. **`decide_likes()`**: Agent as consumer
   - Sees posts from others in timeline
   - Decides which posts to like
   - The persona prompt steers which posts get liked (nothing in the code enforces this)
   - Returns list of post indices

3. **`write_comment()`**: Agent as discussant
   - Reads a specific post
   - Writes reply (agree, disagree, nuance)
   - Temperature = 0.9 (authentic reactions)

**Key design decisions:**
- **System message**: Full persona context for all actions
- **Structured outputs**: Parse responses to extract decisions
- **Error handling**: Catch exceptions (e.g., API failures) so one failed call doesn't stop the run
- **Tracking**: Store posts, likes, comments for analysis
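
The structured-output parsing can be tried offline, without any API calls. The sketch below pulls the same regular expressions out into two standalone helpers; `parse_share_response` and `parse_like_response` are names invented here, and dropping repeated post numbers is an added assumption about the desired behavior.

```python
import re
from typing import List, Optional, Tuple

def parse_share_response(content: str, n_articles: int) -> Optional[Tuple[int, str]]:
    """Extract (zero-based headline index, comment) or None if the format is off."""
    headline = re.search(r"HEADLINE_NUMBER:\s*(\d+)", content)
    comment = re.search(r"COMMENT:\s*(.+)", content, re.DOTALL)
    if not (headline and comment):
        return None
    idx = int(headline.group(1)) - 1
    if not 0 <= idx < n_articles:
        return None  # the model picked a headline number that does not exist
    return idx, comment.group(1).strip()

def parse_like_response(content: str, n_posts: int) -> List[int]:
    """Extract zero-based indices of liked posts, ignoring invalid or repeated numbers."""
    text = content.lower()
    if "none" in text:
        return []
    liked: List[int] = []
    for number in re.findall(r"\d+", text):
        idx = int(number) - 1
        if 0 <= idx < n_posts and idx not in liked:
            liked.append(idx)
    return liked

print(parse_share_response("HEADLINE_NUMBER: 2\nCOMMENT: Hello there.", 7))  # (1, 'Hello there.')
print(parse_like_response("1, 3, 3, 8", 5))                                  # [0, 2]
```

Testing the parsers on hand-written strings like these is a cheap way to check edge cases (out-of-range numbers, missing fields) before spending on real completions.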

**Limitations:**
- No memory of previous interactions (could be added by passing past activity as context)
- Each action is an independent API call (expensive)
- Simplified emotion model (real humans are more complex)
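
One possible way to add memory, sketched under the assumption that a short list of recent events is appended to the persona prompt. The `AgentMemory` class is hypothetical and is not used anywhere else in this notebook.

```python
from collections import deque

class AgentMemory:
    """Keep the most recent events and render them as extra prompt context."""

    def __init__(self, max_events: int = 5):
        # A bounded deque silently discards the oldest event once full
        self.events = deque(maxlen=max_events)

    def remember(self, event: str) -> None:
        self.events.append(event)

    def as_prompt(self) -> str:
        if not self.events:
            return ""
        bullet_lines = "\n".join(f"- {event}" for event in self.events)
        return "Recent activity:\n" + bullet_lines

memory = AgentMemory(max_events=2)
memory.remember("You posted about the CHOP zone.")
memory.remember("Maria Garcia replied to your post.")
memory.remember("You liked a post by Linda Davis.")
print(memory.as_prompt())
```

The rendered text could be concatenated onto the system message before each call. Capping the number of events keeps prompt length, and therefore cost, bounded.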

### Test agent behavior

In [None]:
# Create test agent
test_agent = SocialMediaAgent(SAMPLE_PERSONAS[0])

print(f"Testing {test_agent.name}:")
print(f"Party: {test_agent.party}")
print("\nSharing news...")
print("="*70)

post = test_agent.share_news(NEWS_HEADLINES)

if post:
    print(f"\nSelected article: [{post['article']['source']}] {post['article']['headline']}")
    print(f"\nComment: \"{post['comment']}\"")
else:
    print("(Failed to generate post)")

---

## Part 4: Platform Algorithms

The core innovation of Törnberg et al. is comparing **three different newsfeed algorithms**:

### Platform 1: Echo Chamber (Following + Engagement)
- Users only see posts from accounts they follow
- Posts ranked by likes + comments (engagement)
- **Hypothesis**: Low cross-partisan interaction, low toxicity
- **Mechanism**: Homophily → filter bubble

### Platform 2: Global Feed (All Users + Engagement)
- Users see all high-engagement posts
- Not limited to who they follow
- **Hypothesis**: More cross-partisan interaction, higher toxicity
- **Mechanism**: Exposure to opposing views → conflict

### Platform 3: Bridging Algorithm (Cross-Partisan Likes)
- Posts ranked by **likes from opposing party members**
- Amplifies consensus, not outrage
- **Hypothesis**: High cross-partisan interaction, low toxicity
- **Mechanism**: Rewards civility and common ground

Let's implement these algorithms.

In [None]:
def create_social_network(agents: List[SocialMediaAgent], homophily: float = 0.7) -> Dict:
    """
    Create social network where agents follow each other based on homophily
    
    Args:
        agents: List of agents
        homophily: Probability of following similar others (0-1)
    
    Returns:
        Dict mapping agent name to list of followed agent names
    """
    np.random.seed(42)
    network = {agent.name: [] for agent in agents}
    
    for agent in agents:
        # Follow some agents: between 3 and min(8, len(agents)) - 1 of them,
        # because randint's upper bound is exclusive (needs at least 4 agents)
        n_follows = np.random.randint(3, min(8, len(agents)))
        
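        # Calculate similarity scores (simplified - based on party)
        candidates = [a for a in agents if a.name != agent.name]
        
        # Probability proportional to homophily. Compare broad party labels
        # rather than exact strings, so that e.g. "Strong Democrat" and
        # "Democrat" count as the same side.
        labels = ("Democrat", "Republican", "Independent")
        probs = []
        for candidate in candidates:
            same_side = any(label in agent.party and label in candidate.party
                            for label in labels)
            if same_side:
                prob = homophily
            else:
                prob = 1 - homophily
            probs.append(prob)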
        
        # Normalize
        probs = np.array(probs)
        probs = probs / probs.sum()
        
        # Sample follows
        follows = np.random.choice(candidates, size=min(n_follows, len(candidates)), 
                                  replace=False, p=probs)
        network[agent.name] = [f.name for f in follows]
    
    return network


def platform_1_timeline(posts: List[Dict], agent: SocialMediaAgent, 
                       network: Dict, top_k: int = 5) -> List[Dict]:
    """
    Platform 1: Echo chamber - only see posts from followed users
    Ranked by engagement (likes + comments)
    
    Args:
        posts: All posts on platform
        agent: Agent viewing timeline
        network: Social network (who follows whom)
        top_k: Number of posts to show
    
    Returns:
        List of top posts for this agent
    """
    followed = network.get(agent.name, [])
    
    # Filter to followed users only
    relevant_posts = [p for p in posts if p['author'] in followed]
    
    # Rank by engagement
    ranked = sorted(relevant_posts, 
                   key=lambda p: p['likes'] + len(p['comments']),
                   reverse=True)
    
    return ranked[:top_k]


def platform_2_timeline(posts: List[Dict], agent: SocialMediaAgent, 
                       top_k: int = 5) -> List[Dict]:
    """
    Platform 2: Global feed - see all high-engagement posts
    Not limited by following
    
    Args:
        posts: All posts on platform
        agent: Agent viewing timeline
        top_k: Number of posts to show
    
    Returns:
        List of top posts for this agent
    """
    # Rank all posts by engagement
    ranked = sorted(posts, 
                   key=lambda p: p['likes'] + len(p['comments']),
                   reverse=True)
    
    return ranked[:top_k]


def platform_3_timeline(posts: List[Dict], agent: SocialMediaAgent,
                       top_k: int = 5) -> List[Dict]:
    """
    Platform 3: Bridging algorithm - posts ranked by CROSS-PARTISAN likes
    Amplifies content liked by opposing party members
    
    Args:
        posts: All posts on platform
        agent: Agent viewing timeline
        top_k: Number of posts to show
    
    Returns:
        List of top posts for this agent
    """
    # Calculate bridging score for each post
    def bridging_score(post):
        # For a Republican post, count Democrat likes (and vice versa)
        post_party = post['party']
        
        if 'Republican' in post_party:
            # Count Democrat likes
            return post['likes_by_party'].get('Democrat', 0)
        elif 'Democrat' in post_party:
            # Count Republican likes
            return post['likes_by_party'].get('Republican', 0)
        else:
            # Independent: count likes from both parties
            return (post['likes_by_party'].get('Democrat', 0) + 
                   post['likes_by_party'].get('Republican', 0))
    
    # Rank by bridging score
    ranked = sorted(posts, key=bridging_score, reverse=True)
    
    return ranked[:top_k]


print("✓ Platform algorithms defined")
print("  • Platform 1: Echo chamber (following + engagement)")
print("  • Platform 2: Global feed (all posts + engagement)")
print("  • Platform 3: Bridging (cross-partisan likes)")

**What this code does:**

Implements three distinct recommendation algorithms:

**Platform 1 - Echo Chamber:**
- Filters posts to only followed accounts
- Ranks by total engagement (likes + comments)
- **Loosely resembles a following-only feed** (like Twitter's "Following" tab, but ranked by engagement rather than recency)
- **Expected outcome**: Minimal exposure to opposing views

**Platform 2 - Global Feed:**
- Shows ALL posts regardless of following
- Ranks purely by engagement
- **Loosely resembles Twitter's "For You" tab** (content from beyond followed accounts, though with no personalization here)
- **Expected outcome**: More cross-partisan exposure, more conflict

**Platform 3 - Bridging Algorithm:**
- NEW approach proposed by Törnberg et al.
- Ranks by cross-partisan likes only:
  - Republican post → Count Democrat likes
  - Democrat post → Count Republican likes
  - Independent post → Count Democrat + Republican likes
- **Theory**: Rewards civility, consensus, common ground
- **Expected outcome**: Cross-partisan engagement without toxicity

**Why bridging might work:**
- Extremists can't gain visibility (their own side likes, but not opponents)
- Moderate voices amplified (appeal across divide)
- Shared concerns surface (healthcare, economy, etc.)
- Trolls/provocateurs marginalized
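
A toy comparison makes the difference concrete. The two posts below are invented, and the scoring functions restate the engagement and bridging rules as standalone helpers so the example runs on its own.

```python
from typing import Dict

def engagement_score(post: Dict) -> int:
    """Likes plus number of comments, as used by Platforms 1 and 2."""
    return post["likes"] + len(post["comments"])

def bridging_score(post: Dict) -> int:
    """Likes from the opposing party (both parties for Independent authors)."""
    by_party = post["likes_by_party"]
    if "Republican" in post["party"]:
        return by_party.get("Democrat", 0)
    if "Democrat" in post["party"]:
        return by_party.get("Republican", 0)
    return by_party.get("Democrat", 0) + by_party.get("Republican", 0)

# Invented posts: A is popular only with its own side, B draws likes across the divide
toy_posts = [
    {"author": "A", "party": "Strong Republican", "likes": 9, "comments": [],
     "likes_by_party": {"Democrat": 0, "Republican": 9, "Independent": 0}},
    {"author": "B", "party": "Democrat", "likes": 4, "comments": [],
     "likes_by_party": {"Democrat": 1, "Republican": 3, "Independent": 0}},
]

by_engagement = [p["author"] for p in sorted(toy_posts, key=engagement_score, reverse=True)]
by_bridging = [p["author"] for p in sorted(toy_posts, key=bridging_score, reverse=True)]
print(by_engagement)  # ['A', 'B']
print(by_bridging)    # ['B', 'A']
```

Post A wins on raw engagement, but every one of its likes comes from its own side, so the bridging rule ranks B first.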

**Implementation notes:**
- `top_k=5`: Show 5 posts per timeline (manageable for agents to process)
- Social network created with homophily (like follows like)
- All algorithms see same underlying posts (fair comparison)
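
As a worked example of the homophily weighting: each candidate gets weight `homophily` if on the same side and `1 - homophily` otherwise, and the weights are then normalized to sum to 1. The `follow_probabilities` helper is a hypothetical restatement of that step in plain Python, taking same-side flags as input.

```python
from typing import List

def follow_probabilities(same_party: List[bool], homophily: float = 0.7) -> List[float]:
    """Normalize homophily weights into per-draw follow probabilities."""
    weights = [homophily if same else 1 - homophily for same in same_party]
    total = sum(weights)
    return [w / total for w in weights]

# Two same-side candidates and three from the other side
probs = follow_probabilities([True, True, False, False, False])
print([round(p, 3) for p in probs])  # [0.304, 0.304, 0.13, 0.13, 0.13]
```

With `homophily=0.7`, a same-side candidate is a bit more than twice as likely to be drawn as an other-side one (0.7 / 0.3). These are per-draw weights; because follows are sampled without replacement, the chance of ending up in the final follow list is not exactly proportional to them.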

---

## Part 5: Running the Simulation

Now we put it all together. The simulation proceeds in steps:

1. **Initialize**: Create agents and social network
2. **Posting phase**: Agents read news and create posts
3. **Engagement phase**: Agents see timeline (algorithm-dependent) and like/comment
4. **Repeat**: For multiple rounds
5. **Analyze**: Measure toxicity and cross-partisan engagement

**Note**: This is computationally expensive (many API calls). For demonstration, we'll run a simplified version.

In [None]:
def run_simulation(agents: List[SocialMediaAgent], 
                  news_headlines: List[Dict],
                  platform_algorithm: str = "bridging",
                  n_posting_rounds: int = 1,
                  n_engagement_rounds: int = 1) -> Dict:
    """
    Run social media simulation
    
    Args:
        agents: List of agents
        news_headlines: List of news articles
        platform_algorithm: "echo", "global", or "bridging"
        n_posting_rounds: How many rounds of posting
        n_engagement_rounds: How many rounds of engagement (each agent views its timeline once per round)
    
    Returns:
        Dict with simulation results
    """
    print(f"\nRunning simulation with {platform_algorithm} algorithm...")
    print(f"Agents: {len(agents)}, News: {len(news_headlines)}")
    print("="*70)
    
    # Create social network (for Platform 1)
    network = create_social_network(agents, homophily=0.7)
    
    all_posts = []
    
    # PHASE 1: Posting
    print("\nPhase 1: Agents posting news...")
    for round_num in range(n_posting_rounds):
        print(f"  Round {round_num + 1}/{n_posting_rounds}", end="")
        
        for i, agent in enumerate(agents):
            print(f"\r  Round {round_num + 1}/{n_posting_rounds} - Agent {i+1}/{len(agents)}", end="")
            post = agent.share_news(news_headlines)
            if post:
                all_posts.append(post)
            time.sleep(0.5)  # Rate limiting
        print()
    
    print(f"\n  ✓ {len(all_posts)} posts created")
    
    # PHASE 2: Engagement (liking and commenting)
    print("\nPhase 2: Agents engaging with posts...")
    
    for round_num in range(n_engagement_rounds):
        print(f"  Round {round_num + 1}/{n_engagement_rounds}")
        
        for i, agent in enumerate(agents):
            print(f"\r    Agent {i+1}/{len(agents)}", end="")
            
            # Get timeline based on algorithm
            if platform_algorithm == "echo":
                timeline = platform_1_timeline(all_posts, agent, network, top_k=5)
            elif platform_algorithm == "global":
                timeline = platform_2_timeline(all_posts, agent, top_k=5)
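            elif platform_algorithm == "bridging":
                timeline = platform_3_timeline(all_posts, agent, top_k=5)
            else:
                # Fail loudly on typos instead of silently running the bridging algorithm
                raise ValueError(f"Unknown platform_algorithm: {platform_algorithm!r}")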
            
            if not timeline:
                continue
            
            # Agent decides which posts to like
            liked_indices = agent.decide_likes(timeline)
            
            for idx in liked_indices:
                post = timeline[idx]
                post['likes'] += 1
                # Track likes by party for bridging algorithm
                liker_party = agent.party
                if 'Democrat' in liker_party:
                    post['likes_by_party']['Democrat'] += 1
                elif 'Republican' in liker_party:
                    post['likes_by_party']['Republican'] += 1
                else:
                    post['likes_by_party']['Independent'] += 1
            
            # Agent might comment on one post
            if timeline and liked_indices:
                # Comment on a liked post
                comment_idx = np.random.choice(liked_indices)
                comment = agent.write_comment(timeline[comment_idx])
                if comment:
                    timeline[comment_idx]['comments'].append(comment)
            
            time.sleep(0.5)  # Rate limiting
        print()
    
    print(f"\n  ✓ Engagement complete")
    
    # Compile results
    results = {
        "platform": platform_algorithm,
        "posts": all_posts,
        "agents": agents,
        "network": network,
        "n_posts": len(all_posts),
        "total_likes": sum(p['likes'] for p in all_posts),
        "total_comments": sum(len(p['comments']) for p in all_posts)
    }
    
    print("\n" + "="*70)
    print("Simulation complete!")
    print(f"  Posts: {results['n_posts']}")
    print(f"  Likes: {results['total_likes']}")
    print(f"  Comments: {results['total_comments']}")
    
    return results

print("✓ Simulation function defined")

### Run simulation (simplified version)

**Cost warning**: Every agent action is an API call, so even the small 6-agent run below makes roughly 15-16 calls, and the count grows with agents × rounds. The simulation call is commented out to prevent accidental spending. Uncomment to run.

In [None]:
# Create agents
agents = [SocialMediaAgent(persona) for persona in SAMPLE_PERSONAS]

print(f"Created {len(agents)} agents")
print("\nThe simulation call below is commented out to prevent API costs.")

# Uncomment below to run:
# results = run_simulation(
#     agents,
#     NEWS_HEADLINES,
#     platform_algorithm='bridging',
#     n_posting_rounds=1,
#     n_engagement_rounds=1
# )

**What the simulation does:**

**Phase 1: Posting** (Content creation)
- Each agent independently reads the news headlines
- Selects an article aligned with their values
- Writes a comment on it from their own perspective
- All posts are added to the shared platform

**Phase 2: Engagement** (Content consumption)
- Each agent sees a timeline of up to 5 posts (algorithm-dependent)
- Decides which posts to like
- May write one comment on a post they liked
- Likes and comments update post engagement scores
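
The engagement loop increments like counters without recording who liked what, so unless the timeline functions filter for it, an agent could like the same post in several rounds or like their own post. A minimal sketch of per-liker tracking follows. The `liked_by` and `author` fields are hypothetical additions to the post dict, not part of the notebook's existing structure.

```python
def record_like(post: dict, agent_name: str, agent_party: str) -> bool:
    """Count a like at most once per (agent, post). Returns True if counted."""
    # 'liked_by' is a hypothetical extra field, created on first use
    liked_by = post.setdefault("liked_by", set())

    # Skip self-likes ('author' is an assumed key) and repeat likes
    if agent_name == post.get("author") or agent_name in liked_by:
        return False

    liked_by.add(agent_name)
    post["likes"] += 1

    # Same party normalization as in the simulation loop
    if "Democrat" in agent_party:
        group = "Democrat"
    elif "Republican" in agent_party:
        group = "Republican"
    else:
        group = "Independent"
    post["likes_by_party"][group] += 1
    return True


# Toy post using the fields the simulation already relies on
post = {
    "author": "Alice",
    "party": "Strong Democrat",
    "likes": 0,
    "likes_by_party": {"Democrat": 0, "Republican": 0, "Independent": 0},
}
first = record_like(post, "Bob", "Lean Republican")   # counted
repeat = record_like(post, "Bob", "Lean Republican")  # ignored: already liked
own = record_like(post, "Alice", "Strong Democrat")   # ignored: own post
print(post["likes"], post["likes_by_party"])
```

Keeping the set of likers on the post also makes it possible to reconstruct exact liker-to-author pairs later, rather than only party-level counts.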

**Key mechanisms:**
- **Emergence**: Timeline updates based on engagement
  - Platform 1: Following determines visibility
  - Platform 2: Engagement determines visibility
  - Platform 3: Cross-partisan likes determine visibility
- **Feedback loops**: Popular posts get more visible → more engagement
- **Agent heterogeneity**: Different personas react differently

**Computational cost:**
- 6 agents × 1 post each = 6 API calls
- 6 agents × 1 `decide_likes` call each (covering a timeline of up to 5 posts) = 6 calls
- ~3-4 agents × 1 comment each = 3-4 calls
- **Total**: ~15-16 API calls per simulation
- At $0.15/1M input + $0.60/1M output tokens: well under $0.01 per run, assuming roughly 1,000 input and 200 output tokens per call

**Scaling:**
- Original study: 500 agents, multiple rounds
- Cost scales linearly with agents × rounds
- Trade-off: Realism vs. budget
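
The call count and cost arithmetic above can be sketched as a small estimator. The prices are the per-million-token rates quoted above. The tokens per call and the share of agents who comment are assumptions chosen for illustration, so treat the output as an order-of-magnitude guide rather than a quote.

```python
def estimate_run_cost(
    n_agents: int,
    n_posting_rounds: int = 1,
    n_engagement_rounds: int = 1,
    comment_rate: float = 0.6,        # assumed share of agents who comment per round
    tokens_in_per_call: int = 1000,   # assumed prompt size
    tokens_out_per_call: int = 200,   # assumed completion size
    price_in_per_m: float = 0.15,     # USD per 1M input tokens
    price_out_per_m: float = 0.60,    # USD per 1M output tokens
):
    """Return (expected number of API calls, estimated cost in USD)."""
    post_calls = n_agents * n_posting_rounds
    like_calls = n_agents * n_engagement_rounds
    comment_calls = comment_rate * n_agents * n_engagement_rounds
    n_calls = post_calls + like_calls + comment_calls

    cost_per_call = (
        tokens_in_per_call * price_in_per_m
        + tokens_out_per_call * price_out_per_m
    ) / 1_000_000
    return n_calls, n_calls * cost_per_call


calls_small, cost_small = estimate_run_cost(6)
calls_large, cost_large = estimate_run_cost(500)
print(f"6 agents:   {calls_small:.1f} calls, ${cost_small:.4f}")
print(f"500 agents: {calls_large:.1f} calls, ${cost_large:.4f}")
```

Prompts that include long personas or full timelines can use several times more input tokens than assumed here, so it is worth logging actual token usage from a first small run and feeding those numbers back in.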

---

## Part 6: Measuring Outcomes

Törnberg et al. measured two key outcomes:

1. **Toxicity**: Using Perspective API to score text
2. **Cross-partisan engagement**: E-I Index of interactions

### E-I Index (External-Internal Index)

Measures the balance of within-group vs between-group interactions:

$$\text{E-I index} = \frac{E - I}{E + I}$$

Where:
- $E$ = External (cross-party) interactions
- $I$ = Internal (within-party) interactions

Range: -1 to +1
- -1 = All interactions within-party (echo chamber)
- 0 = Equal mix
- +1 = All interactions cross-party (bridging)

For simplicity, we'll implement only the E-I index. Toxicity scoring requires an additional API (Perspective), which we skip here.

In [None]:
def calculate_ei_index(interactions: List[tuple]) -> float:
    """
    Calculate E-I index for interactions
    
    Args:
        interactions: List of (source_party, target_party) tuples
    
    Returns:
        float: E-I index (-1 to 1)
    """
    if not interactions:
        return 0.0
    
    external = 0  # Cross-party
    internal = 0  # Within-party
    
    for source_party, target_party in interactions:
        # Normalize party labels
        source = 'Democrat' if 'Democrat' in source_party else ('Republican' if 'Republican' in source_party else 'Independent')
        target = 'Democrat' if 'Democrat' in target_party else ('Republican' if 'Republican' in target_party else 'Independent')
        
        if source == target:
            internal += 1
        else:
            external += 1
    
    total = external + internal
    if total == 0:
        return 0.0
    
    ei_index = (external - internal) / total
    return ei_index


def analyze_results(results: Dict) -> Dict:
    """
    Analyze simulation results
    
    Args:
        results: Output from run_simulation()
    
    Returns:
        Dict with analysis metrics
    """
    posts = results['posts']
    
    # Track interactions (who liked whose post)
    like_interactions = []
    comment_interactions = []
    
    # Get agent party mapping
    agent_parties = {agent.name: agent.party for agent in results['agents']}
    
    # Count likes by party
    for post in posts:
        post_author_party = post['party']
        
        # Reconstruct likes (simplified - in real version, track each liker)
        for party, count in post['likes_by_party'].items():
            for _ in range(count):
                like_interactions.append((party, post_author_party))
        
        # Comments
        for comment in post['comments']:
            comment_interactions.append((comment['party'], post_author_party))
    
    # Calculate E-I indices
    ei_likes = calculate_ei_index(like_interactions)
    ei_comments = calculate_ei_index(comment_interactions)
    
    # Party distribution of posts
    party_counts = Counter([p['party'] for p in posts])
    
    analysis = {
        'platform': results['platform'],
        'n_posts': len(posts),
        'n_likes': results['total_likes'],
        'n_comments': results['total_comments'],
        'ei_index_likes': ei_likes,
        'ei_index_comments': ei_comments,
        'party_distribution': dict(party_counts),
        'like_interactions': like_interactions,
        'comment_interactions': comment_interactions
    }
    
    return analysis


def print_analysis(analysis: Dict):
    """
    Print analysis results in readable format
    """
    print("\n" + "="*70)
    print(f"ANALYSIS: {analysis['platform'].upper()} PLATFORM")
    print("="*70)
    
    print(f"\nContent:")
    print(f"  Posts: {analysis['n_posts']}")
    print(f"  Likes: {analysis['n_likes']}")
    print(f"  Comments: {analysis['n_comments']}")
    
    print(f"\nParty Distribution of Posts:")
    for party, count in analysis['party_distribution'].items():
        print(f"  {party}: {count}")
    
    print(f"\nCross-Partisan Engagement (E-I Index):")
    print(f"  Likes: {analysis['ei_index_likes']:.3f}")
    print(f"  Comments: {analysis['ei_index_comments']:.3f}")
    
    print(f"\nInterpretation:")
    if analysis['ei_index_likes'] < -0.3:
        print(f"  → Likes: Strong ECHO CHAMBER (mostly within-party)")
    elif analysis['ei_index_likes'] > 0.3:
        print(f"  → Likes: Strong BRIDGING (mostly cross-party)")
    else:
        print(f"  → Likes: MIXED (balance of within/cross-party)")
    
    if analysis['ei_index_comments'] < -0.3:
        print(f"  → Comments: Strong ECHO CHAMBER (mostly within-party)")
    elif analysis['ei_index_comments'] > 0.3:
        print(f"  → Comments: Strong BRIDGING (mostly cross-party)")
    else:
        print(f"  → Comments: MIXED (balance of within/cross-party)")


print("✓ Analysis functions defined")

# Example with mock data
print("\nExample E-I Index calculation:")
mock_interactions = [
    ('Democrat', 'Democrat'),  # Internal
    ('Democrat', 'Republican'),  # External
    ('Republican', 'Democrat'),  # External
    ('Republican', 'Republican'),  # Internal
]
ei = calculate_ei_index(mock_interactions)
print(f"Interactions: 2 internal, 2 external")
print(f"E-I Index: {ei:.3f} (perfectly balanced)")

**What this code does:**

**E-I Index calculation:**
- Tracks all interactions (likes, comments)
- Classifies each as internal (same party) or external (different party)
- Calculates balance:
  - -1.0 = Complete echo chamber
  - 0.0 = Perfect balance
  - +1.0 = Complete bridging
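
One caveat when reading the index: a value of 0 is not necessarily what chance alone would produce. If agents interacted with each other's posts purely at random, the expected E-I would depend on how many agents each party has. The sketch below computes that baseline under two assumptions that are mine, not the paper's: every agent authors the same number of posts, and every agent is equally likely to engage with any other agent's post. The function name is hypothetical.

```python
from collections import Counter
from typing import List


def expected_ei_random_mixing(parties: List[str]) -> float:
    """Expected E-I index if agents engaged with other agents uniformly at random.

    `parties` holds one normalized label per agent
    (for example 'Democrat', 'Republican', 'Independent').
    """
    n = len(parties)
    if n < 2:
        return 0.0

    counts = Counter(parties)
    # Ordered (liker, author) pairs within the same party, excluding self-pairs
    same_party_pairs = sum(c * (c - 1) for c in counts.values())
    p_internal = same_party_pairs / (n * (n - 1))

    # E-I = P(external) - P(internal) = 1 - 2 * P(internal)
    return 1.0 - 2.0 * p_internal


balanced_two = expected_ei_random_mixing(["Democrat"] * 3 + ["Republican"] * 3)
three_groups = expected_ei_random_mixing(
    ["Democrat"] * 2 + ["Republican"] * 2 + ["Independent"] * 2
)
print(f"Two parties, 3 agents each:   {balanced_two:.2f}")
print(f"Three parties, 2 agents each: {three_groups:.2f}")
```

With small populations the baseline sits above 0 because each agent has more out-party than in-party peers to interact with, so an observed E-I near 0 can already indicate some in-party preference.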

**Expected results from Törnberg et al. (2023):**

| Platform | E-I Likes | E-I Comments | Toxicity |
|----------|-----------|--------------|----------|
| 1 (Echo) | -0.97 | -0.89 | 0.09 |
| 2 (Global) | -0.78 | -0.70 | 0.13 |
| 3 (Bridging) | -0.18 | +0.33 | 0.07 |

**Key findings:**
1. **Platform 1**: Near-total echo chamber, low toxicity
2. **Platform 2**: Still mostly echo chamber (despite global feed!), higher toxicity
3. **Platform 3**: Substantial cross-party engagement, LOWEST toxicity

### Visualize results (with mock data)

In [None]:
# Create mock results matching Törnberg et al. findings
mock_results = pd.DataFrame([
    {'Platform': 'Platform 1\n(Echo Chamber)', 'E-I Index (Likes)': -0.97, 'E-I Index (Comments)': -0.89, 'Toxicity': 0.09},
    {'Platform': 'Platform 2\n(Global Feed)', 'E-I Index (Likes)': -0.78, 'E-I Index (Comments)': -0.70, 'Toxicity': 0.13},
    {'Platform': 'Platform 3\n(Bridging)', 'E-I Index (Likes)': -0.18, 'E-I Index (Comments)': 0.33, 'Toxicity': 0.07},
])

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: E-I Indices
ax = axes[0]
x = np.arange(len(mock_results))
width = 0.35

ax.bar(x - width/2, mock_results['E-I Index (Likes)'], width, label='Likes', alpha=0.8, color='steelblue')
ax.bar(x + width/2, mock_results['E-I Index (Comments)'], width, label='Comments', alpha=0.8, color='coral')

ax.set_ylabel('E-I Index')
ax.set_xlabel('Platform')
ax.set_title('Cross-Partisan Engagement by Platform\n(Törnberg et al. 2023 findings)')
ax.set_xticks(x)
ax.set_xticklabels(mock_results['Platform'])
ax.axhline(0, color='black', linestyle='--', alpha=0.3, label='Balanced')
ax.legend()
ax.grid(axis='y', alpha=0.3)

# Add interpretation labels
ax.text(0.02, -0.5, 'Echo\nChamber', ha='left', va='center', fontsize=9, alpha=0.5)
ax.text(0.02, 0.5, 'Cross-Party\nBridging', ha='left', va='center', fontsize=9, alpha=0.5)

# Plot 2: Toxicity
ax = axes[1]
colors = ['lightgray', 'coral', 'lightgreen']
ax.bar(mock_results['Platform'], mock_results['Toxicity'], alpha=0.7, color=colors)
ax.set_ylabel('Toxicity Score')
ax.set_xlabel('Platform')
ax.set_title('Toxicity by Platform\n(Törnberg et al. 2023 findings)')
ax.set_ylim(0, 0.20)
ax.grid(axis='y', alpha=0.3)

# Add reference line
ax.axhline(0.13, color='red', linestyle='--', alpha=0.5, label='US Twitter average')
ax.legend()

plt.tight_layout()
plt.show()

print("\nKey Insights:")
print("="*70)
print("1. Platform 1 (Echo Chamber):")
print("   • Minimal cross-partisan engagement (E-I ≈ -0.9)")
print("   • Low toxicity (0.09) - but at cost of isolation")
print("")
print("2. Platform 2 (Global Feed):")
print("   • Still mostly echo chamber (E-I ≈ -0.7)")
print("   • HIGHEST toxicity (0.13) - matches Twitter average")
print("   • Breaking bubbles ≠ healthy discourse")
print("")
print("3. Platform 3 (Bridging Algorithm):")
print("   • Substantial cross-party engagement (E-I comments = +0.33!)")
print("   • LOWEST toxicity (0.07) - healthiest discourse")
print("   • Achieves both goals: bridging + civility")
print("")
print("Conclusion: Bridging algorithm succeeds where others fail.")
print("It's not enough to break echo chambers - need smarter algorithms.")

---

## Open-Source Alternatives

The simulation above uses OpenAI's API. Here's how to run it with open-source models.

### Using Ollama

Ollama provides a local alternative with an OpenAI-compatible API.

In [None]:
# Install and use Ollama
# !pip install -q ollama

# from ollama import Client as OllamaClient
# ollama_client = OllamaClient(host='http://localhost:11434')

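# Modify SocialMediaAgent to use Ollama:
# replace `client.chat.completions.create(...)` with
# response = ollama_client.chat(model="llama3.2", messages=[...])
# and read the reply from response["message"]["content"]
# (the OpenAI-style path is response.choices[0].message.content).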

print("To use Ollama:")
print("1. Install: https://ollama.com/download")
print("2. Pull model: ollama pull llama3.2")
print("3. Modify SocialMediaAgent class to use ollama_client")
print("")
print("Advantages:")
print("  • Free (runs locally)")
print("  • Private (no data sent to API)")
print("  • Reproducible (fixed model weights)")
print("")
print("Disadvantages:")
print("  • Slower (depends on your hardware)")
print("  • Lower quality (smaller models than GPT-4)")
print("  • Requires local installation")
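
Because Ollama exposes an OpenAI-compatible endpoint under `/v1`, another option is to keep the OpenAI SDK and only point it at the local server. The sketch below assumes the notebook's `client` is an `openai.OpenAI` instance and that Ollama is running on its default port. The helper names are hypothetical.

```python
def ollama_client_kwargs(host: str = "http://localhost:11434") -> dict:
    """Keyword arguments that point the OpenAI SDK at a local Ollama server."""
    return {
        "base_url": host.rstrip("/") + "/v1",  # Ollama's OpenAI-compatible route
        "api_key": "ollama",  # the SDK requires a value; Ollama ignores it
    }


def make_ollama_client(host: str = "http://localhost:11434"):
    """Build an OpenAI SDK client that talks to Ollama instead of OpenAI."""
    # Imported lazily so this cell loads even if the SDK is not installed
    from openai import OpenAI
    return OpenAI(**ollama_client_kwargs(host))


kwargs = ollama_client_kwargs()
print(kwargs)

# With Ollama running and the model pulled (ollama pull llama3.2):
# client = make_ollama_client()
# response = client.chat.completions.create(
#     model="llama3.2",
#     messages=[{"role": "user", "content": "Say hello in five words."}],
# )
# print(response.choices[0].message.content)
```

With this route the agent class would only need a different `client` and model name, since the request and response shapes stay the same as with the OpenAI API.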

### Using Hugging Face

For full control, use Hugging Face Transformers directly.

In [None]:
# from transformers import AutoTokenizer, AutoModelForCausalLM
# import torch

# device = "cuda" if torch.cuda.is_available() else "cpu"
# model_name = "microsoft/Phi-3-mini-4k-instruct"

# tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# model = AutoModelForCausalLM.from_pretrained(
#     model_name,
#     torch_dtype=torch.float16 if device == "cuda" else torch.float32,  # half precision is GPU-oriented
#     trust_remote_code=True
# ).to(device)

print("To use Hugging Face:")
print("1. Load model with transformers library")
print("2. Modify SocialMediaAgent to use model.generate()")
print("3. Apply chat template for proper formatting")
print("")
print("Best models for this task:")
print("  • Llama 3.1 (8B): Good balance")
print("  • Mistral 7B: Fast and capable")
print("  • Phi-3-mini: Smallest (3.8B), still decent")
print("")
print("Requires:")
print("  • GPU with 8-16GB VRAM (or use CPU slowly)")
print("  • Google Colab: Use T4 GPU (free tier)")

---

## Summary

**What we learned:**
1. ✓ How to create **realistic agent personas** from survey data
2. ✓ How to implement **agent-based simulations** with LLMs
3. ✓ How to test **alternative platform algorithms** (echo, global, bridging)
4. ✓ How to measure **emergent properties** (cross-partisan engagement, toxicity)
5. ✓ How LLMs enable **realistic discourse** in simulations

**Key findings from Törnberg et al. (2023):**
- **Echo chambers**: Isolated but civil
- **Global feeds**: Exposure doesn't guarantee healthy discourse
- **Bridging algorithms**: Can achieve both cross-partisan engagement AND civility
- **Mechanism**: Rewarding consensus (not just engagement) changes incentives

**Advantages of LLM-based simulation:**
- ✓ **Rapid prototyping**: Test ideas before expensive field work
- ✓ **Realistic discourse**: Agents can actually converse
- ✓ **Ethical**: No risk to human subjects
- ✓ **Counterfactuals**: Test non-existent alternatives
- ✓ **Transparency**: Full visibility into agent reasoning

**Limitations to remember:**
- ⚠ **External validity**: Do LLMs predict human behavior?
- ⚠ **Training data bias**: LLMs reflect internet discourse (potentially toxic)
- ⚠ **Simplifications**: Real social media is more complex (images, videos, networks)
- ⚠ **Cost**: API calls add up quickly with many agents
- ⚠ **Reproducibility**: Proprietary models may change over time

**When to use this approach:**
- ✓ Exploring design space of interventions
- ✓ Testing theoretical mechanisms
- ✓ Generating hypotheses for human studies
- ✗ As sole evidence for claims about humans
- ✗ Without validation against real data

**Best practices:**
1. **Ground in data**: Use survey data for personas (ANES, etc.)
2. **Validate outcomes**: Compare to real social media metrics
3. **Document everything**: Model versions, prompts, parameters
4. **Run sensitivity analyses**: Test robustness to parameters
5. **Complement with human studies**: Simulations guide human studies; they do not replace them

**Future directions:**
- Longer timeframes (multi-day simulations)
- Larger agent populations (100-1000+)
- Richer social networks (weighted, dynamic)
- Memory and learning (agents change over time)
- Multi-modal content (images, memes)
- Platform combinations (multiple algorithms)
- Human-in-the-loop validation

**Related applications:**
- Testing content moderation policies
- Studying misinformation spread
- Designing recommendation algorithms
- Evaluating polarization interventions
- Simulating online communities
- Prototyping new social platforms

---

**Next steps:**
- Try running the simulation with different parameters
- Compare results across all three platforms
- Extend with your own algorithms, personas, or simulation scenarios
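
As a starting point for the cross-platform comparison, here is a minimal sketch of a loop that runs each algorithm and collects the E-I indices into rows. `compare_platforms` is a hypothetical helper. It takes the run and analysis steps as callables so it can be tried with stubs before spending on API calls.

```python
from typing import Callable, Dict, List, Sequence


def compare_platforms(
    run_fn: Callable[[str], Dict],
    analyze_fn: Callable[[Dict], Dict],
    algorithms: Sequence[str] = ("echo", "global", "bridging"),
) -> List[Dict]:
    """Run one simulation per algorithm and collect the headline metrics."""
    rows = []
    for algorithm in algorithms:
        analysis = analyze_fn(run_fn(algorithm))
        rows.append({
            "platform": algorithm,
            "n_posts": analysis["n_posts"],
            "ei_index_likes": analysis["ei_index_likes"],
            "ei_index_comments": analysis["ei_index_comments"],
        })
    return rows


# Dry run with stubs (no API calls) to check the plumbing
def _fake_run(algorithm: str) -> Dict:
    return {"platform": algorithm}


def _fake_analyze(results: Dict) -> Dict:
    return {"n_posts": 0, "ei_index_likes": 0.0, "ei_index_comments": 0.0}


rows = compare_platforms(_fake_run, _fake_analyze)
print(rows)

# Real run (makes API calls), creating fresh agents for each platform
# in case SocialMediaAgent keeps state between runs:
# rows = compare_platforms(
#     lambda algo: run_simulation(
#         [SocialMediaAgent(p) for p in SAMPLE_PERSONAS],
#         NEWS_HEADLINES,
#         platform_algorithm=algo,
#     ),
#     analyze_results,
# )
# pd.DataFrame(rows)
```

With only 6 agents and one round, differences between platforms will be noisy, so repeating each run several times and averaging is advisable before drawing conclusions.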