# Political Bias Opinion Dynamics Dataset Demo

This notebook demonstrates the **cajcodes/political-bias** dataset, which labels political statements on a 5-point bias scale. Each label maps to a numeric opinion score, so the data can seed the initial opinions of agents in an agent-based opinion dynamics simulation.

## Dataset Overview
- **Source**: cajcodes/political-bias
- **Scale**: 5-point (0=far_right to 4=far_left)
- **Normalized Scale**: -1.0 to +1.0 (right to left)
- **Categories**: far_right, right, center, left, far_left
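
The relationship between the two scales can be sketched as a linear map. This is an assumption: the notebook states only the endpoints (0 = far_right at -1.0, 4 = far_left at +1.0), and `normalize_label` is a hypothetical helper, not part of the dataset.

```python
# Category names in label order, as listed in the overview above.
LABEL_NAMES = ["far_right", "right", "center", "left", "far_left"]


def normalize_label(label: int) -> float:
    """Map a 5-point label (0..4) onto [-1.0, +1.0].

    Assumes evenly spaced categories, which the source does not confirm.
    """
    if label not in range(5):
        raise ValueError(f"label must be in 0..4, got {label}")
    return (label - 2) / 2


for label, name in enumerate(LABEL_NAMES):
    print(f"{label} ({name}): {normalize_label(label):+.1f}")
```

Under this assumption the five categories land on -1.0, -0.5, 0.0, +0.5, and +1.0.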

## 1. Load Data

Load the dataset from GitHub, falling back to a local `demo_data.json` file if the download fails.

In [None]:
import json
import urllib.request
from pathlib import Path

# GitHub raw URL for the demo data
GITHUB_DATA_URL = "https://raw.githubusercontent.com/AMGrobelnik/ai-invention-e59d7b-perception-asymmetry-feedback-loop-how-d/main/data_exec_iter1_idx3/demo/demo_data.json"
LOCAL_DATA_PATH = Path("demo_data.json")

def load_data():
    """Load data from GitHub URL with local fallback."""
    # Try GitHub URL first
    try:
        with urllib.request.urlopen(GITHUB_DATA_URL, timeout=10) as response:
            data = json.loads(response.read().decode('utf-8'))
            print(f"Loaded data from GitHub: {GITHUB_DATA_URL}")
            return data
    except Exception as e:
        print(f"GitHub fetch failed ({e}), trying local file...")
    
    # Fallback to local file
    if LOCAL_DATA_PATH.exists():
        with open(LOCAL_DATA_PATH, 'r', encoding='utf-8') as f:
            data = json.load(f)
            print(f"Loaded data from local file: {LOCAL_DATA_PATH}")
            return data
    
    raise FileNotFoundError("Could not load data from GitHub or local file")

# Load the dataset
data = load_data()
examples = data['examples']
print(f"\nLoaded {len(examples)} examples")

## 2. Explore Data Structure

Examine the structure of individual examples.

In [None]:
# Display first example structure
print("Example data structure:")
print(json.dumps(examples[0], indent=2))

In [None]:
# Extract key fields for analysis
statements = []
opinion_scores = []
bias_categories = []

for ex in examples:
    # Strip the prompt prefix and the single closing quote (Python 3.9+).
    # rstrip('"') would also eat a quote that belongs to the statement itself.
    statement = ex['input'].removeprefix('Classify the political bias of this statement: "').removesuffix('"')
    statements.append(statement)
    opinion_scores.append(ex['context']['opinion_score'])
    bias_categories.append(ex['context']['bias_category'])

print(f"Extracted {len(statements)} statements with opinion scores")

## 3. Analyze Distribution

Examine the distribution of political bias categories and opinion scores.

In [None]:
from collections import Counter

# Count bias categories
category_counts = Counter(bias_categories)

print("Bias Category Distribution:")
print("=" * 40)
for category in ['far_right', 'right', 'center', 'left', 'far_left']:
    count = category_counts.get(category, 0)
    bar = '*' * count
    print(f"{category:12} | {bar} ({count})")

In [None]:
# Opinion score statistics
print("\nOpinion Score Statistics:")
print("=" * 40)
print(f"Min score:  {min(opinion_scores):.1f} (most right-leaning)")
print(f"Max score:  {max(opinion_scores):.1f} (most left-leaning)")
print(f"Mean score: {sum(opinion_scores)/len(opinion_scores):.2f}")

# Score distribution
score_counts = Counter(opinion_scores)
print("\nScore Distribution:")
for score in sorted(score_counts.keys()):
    count = score_counts[score]
    bar = '*' * count
    print(f"{score:5.1f} | {bar} ({count})")

## 4. Sample Statements by Category

View example statements from each political bias category.

In [None]:
# Group statements by category
statements_by_category = {}
for statement, category in zip(statements, bias_categories):
    if category not in statements_by_category:
        statements_by_category[category] = []
    statements_by_category[category].append(statement)

# Display one example from each category
print("Sample Statements by Political Bias:")
print("=" * 60)
for category in ['far_right', 'right', 'center', 'left', 'far_left']:
    if category in statements_by_category:
        print(f"\n[{category.upper()}]")
        print(f"  \"{statements_by_category[category][0]}\"")

## 5. Opinion Dynamics Simulation Potential

This dataset suits agent-based opinion dynamics models for three reasons:

1. **Bounded Numeric Scale**: Opinion scores lie between -1.0 and +1.0, so they can serve directly as initial agent opinions that a model then updates in small steps
2. **Five Discrete Positions**: Each score corresponds to one of the 5 bias categories, so the initial opinions form 5 distinct clusters
3. **Real-World Grounding**: Based on actual political statements with known biases
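
As an illustration of how such scores might seed a simulation, here is a minimal sketch of the Deffuant bounded-confidence model. The model choice, the parameters `epsilon` and `mu`, and the evenly spread initial population are all assumptions for demonstration; the notebook itself does not prescribe a model.

```python
import random


def deffuant_step(opinions, epsilon, mu, rng):
    """Apply one pairwise Deffuant update to `opinions` in place.

    Two distinct agents are drawn at random. If their opinions differ by
    less than `epsilon`, each moves a fraction `mu` toward the other.
    """
    i, j = rng.sample(range(len(opinions)), 2)
    diff = opinions[j] - opinions[i]
    if abs(diff) < epsilon:
        opinions[i] += mu * diff
        opinions[j] -= mu * diff


def simulate(initial_opinions, steps=2000, epsilon=0.6, mu=0.3, seed=0):
    """Run `steps` pairwise updates on a copy of the initial opinions."""
    rng = random.Random(seed)
    opinions = list(initial_opinions)
    for _ in range(steps):
        deffuant_step(opinions, epsilon, mu, rng)
    return opinions


# Hypothetical population: four agents at each of the five score levels.
initial = [-1.0, -0.5, 0.0, 0.5, 1.0] * 4
final = simulate(initial)

print(f"Initial spread: {max(initial) - min(initial):.2f}")
print(f"Final spread:   {max(final) - min(final):.2f}")
print(f"Mean opinion:   {sum(final) / len(final):+.3f}")
```

In the notebook, `opinion_scores` could replace the hypothetical `initial` list. Each update shifts the two agents by equal and opposite amounts, so the mean opinion is conserved up to floating-point error.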

In [None]:
# Print a plain-text table of score, category, and truncated statement
print("Opinion Dynamics Data Summary:")
print("=" * 70)
print(f"{'Index':<6} {'Score':>6} {'Category':<12} {'Statement (truncated)':<40}")
print("-" * 70)

for i, (score, cat, stmt) in enumerate(zip(opinion_scores, bias_categories, statements)):
    truncated = (stmt[:37] + "...") if len(stmt) > 40 else stmt
    print(f"{i:<6} {score:>6.1f} {cat:<12} {truncated:<40}")

## 6. Export for Further Analysis

Collect the extracted fields into a list of flat records that an opinion dynamics simulation can consume directly.

In [None]:
# Create a simplified dataset for opinion dynamics modeling
opinion_data = [
    {
        'id': i,
        'opinion_score': score,
        'bias_category': cat,
        'statement': stmt
    }
    for i, (score, cat, stmt) in enumerate(zip(opinion_scores, bias_categories, statements))
]

print(f"Prepared {len(opinion_data)} records for opinion dynamics simulation")
print("\nFirst 3 records:")
for record in opinion_data[:3]:
    print(f"  ID {record['id']}: score={record['opinion_score']:.1f}, category={record['bias_category']}")
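
The records above stay in memory. One way to write them to disk is sketched below; the helper `export_records`, the file names, and the placeholder records are hypothetical and chosen only for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path

FIELDS = ["id", "opinion_score", "bias_category", "statement"]


def export_records(records, out_dir):
    """Write records to JSON and CSV files in `out_dir`; return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / "opinion_data.json"
    csv_path = out_dir / "opinion_data.csv"

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)

    return json_path, csv_path


# Placeholder records with the same keys as `opinion_data` (not real dataset rows).
sample_records = [
    {"id": 0, "opinion_score": -1.0, "bias_category": "far_right", "statement": "Placeholder statement A"},
    {"id": 1, "opinion_score": 0.0, "bias_category": "center", "statement": "Placeholder statement B"},
]

# A temporary directory keeps this demo from touching the working directory.
out_dir = tempfile.mkdtemp()
json_path, csv_path = export_records(sample_records, out_dir)
print(f"Wrote {json_path.name} and {csv_path.name} to {out_dir}")
```

In the notebook, passing `opinion_data` and a directory of your choice in place of the placeholders would export the full set.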