# Political Bias Opinion Distribution Dataset

This notebook demonstrates how to work with the **cajcodes/political-bias** dataset, whose 5-point political bias scale is well suited to agent-based opinion dynamics simulations.

## Dataset Overview
- **Source**: cajcodes/political-bias
- **Scale**: 5-point (0=far_right to 4=far_left)
- **Normalized Scale**: -1.0 to +1.0 (right to left)
- **Use Case**: Opinion dynamics, agent-based modeling, political bias classification
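
The two scales listed above line up if the normalized score is a linear rescaling of the 5-point label. The sketch below assumes that mapping, `(label - 2) / 2`; the helper name `normalize_label` and the `LABEL_NAMES` list are illustrative and not part of the dataset.

```python
# Assumption: the normalized score is a linear rescaling of the 0..4 label.
LABEL_NAMES = ["far_right", "right", "center", "left", "far_left"]


def normalize_label(label: int) -> float:
    """Map a 5-point label (0=far_right .. 4=far_left) to [-1.0, +1.0]."""
    if label not in range(5):
        raise ValueError(f"label must be an integer from 0 to 4, got {label!r}")
    return (label - 2) / 2


# Show the full mapping, one line per category
for label, name in enumerate(LABEL_NAMES):
    print(f"{label} ({name}) -> {normalize_label(label):+.1f}")
```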

## 1. Load the Dataset

In [None]:
import json
import urllib.request
import os

# Configuration
GITHUB_RAW_URL = "https://raw.githubusercontent.com/AMGrobelnik/ai-invention-ab58ab-perception-asymmetry-feedback-loop-how-d/main/political_bias_opinion_distribution_dataset_for_agent_simula/demo/demo_data.json"
LOCAL_FILE = "demo_data.json"

def load_data():
    """Load data from GitHub URL (for Colab) or local file (for local development)."""
    # Try GitHub URL first (works in Colab)
    try:
        print(f"Attempting to load from GitHub: {GITHUB_RAW_URL}")
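        # Time out rather than hang indefinitely if the network is unreachable
        with urllib.request.urlopen(GITHUB_RAW_URL, timeout=30) as response:
            data = json.loads(response.read().decode("utf-8"))
        # Report success only after the payload has been read and parsed
        print("Successfully loaded from GitHub")
        return data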
    except Exception as e:
        print(f"GitHub fetch failed: {e}")
    
    # Fallback to local file
    if os.path.exists(LOCAL_FILE):
        print(f"Loading from local file: {LOCAL_FILE}")
        with open(LOCAL_FILE, "r", encoding="utf-8") as f:
            return json.load(f)
    
    raise FileNotFoundError("Could not load data from GitHub or local file")

# Load the dataset
data = load_data()
examples = data["examples"]
print(f"Loaded {len(examples)} examples")

## 2. Explore the Data Structure

In [None]:
# View the structure of a single example
example = examples[0]
print("Example structure:")
print(json.dumps(example, indent=2))

In [None]:
# Extract key fields for analysis
for i, ex in enumerate(examples[:5]):
    print(f"\n--- Example {i+1} ---")
    print(f"Bias Category: {ex['context']['bias_category']}")
    print(f"Opinion Score: {ex['context']['opinion_score']}")
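    # Extract the quoted statement from the input; fall back to the full input if it has no quotes
    parts = ex['input'].split('"')
    statement = parts[1] if len(parts) > 1 else ex['input']
    # Truncate long statements for display
    print(f"Statement: {statement[:80]}..." if len(statement) > 80 else f"Statement: {statement}")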

## 3. Analyze Opinion Distribution

In [None]:
from collections import Counter

# Count examples by bias category
categories = [ex["context"]["bias_category"] for ex in examples]
category_counts = Counter(categories)

print("Distribution by Bias Category:")
print("=" * 40)
# Order from far_right to far_left
order = ["far_right", "right", "center", "left", "far_left"]
for cat in order:
    count = category_counts.get(cat, 0)
    bar = "█" * count
    print(f"{cat:12} | {bar} ({count})")

In [None]:
# Extract opinion scores for numerical analysis
opinion_scores = [ex["context"]["opinion_score"] for ex in examples]

print("Opinion Score Statistics:")
print("=" * 40)
print(f"Min:  {min(opinion_scores)}")
print(f"Max:  {max(opinion_scores)}")
print(f"Mean: {sum(opinion_scores) / len(opinion_scores):.2f}")
print("\nScore mapping:")
print("  -1.0 = far_right")
print("  -0.5 = right")
print("   0.0 = center")
print("   0.5 = left")
print("   1.0 = far_left")
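
Because each example carries both a category and a score, it can be worth checking that the two agree with the mapping printed above before using the scores in a simulation. The following is a minimal sketch: `find_mismatches` and the two `sample` records are hypothetical, and the records only mimic the `context` fields used in this notebook.

```python
# Expected score per category, taken from the mapping printed above
EXPECTED_SCORE = {
    "far_right": -1.0,
    "right": -0.5,
    "center": 0.0,
    "left": 0.5,
    "far_left": 1.0,
}


def find_mismatches(records):
    """Return indices of records whose score disagrees with their category."""
    mismatched = []
    for i, rec in enumerate(records):
        ctx = rec["context"]
        expected = EXPECTED_SCORE.get(ctx["bias_category"])
        # Unknown categories and off-scale scores are both reported
        if expected is None or abs(ctx["opinion_score"] - expected) > 1e-9:
            mismatched.append(i)
    return mismatched


# Hypothetical records for illustration; the second is deliberately inconsistent
sample = [
    {"context": {"bias_category": "left", "opinion_score": 0.5}},
    {"context": {"bias_category": "center", "opinion_score": 0.5}},
]
print(find_mismatches(sample))  # prints [1]
```

In the notebook itself, the same function could be called as `find_mismatches(examples)`.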

## 4. Prepare Data for Agent-Based Simulations

In [None]:
def extract_statement(input_text):
    """Extract the statement from the classification input."""
    try:
        return input_text.split('"')[1]
    except IndexError:
        return input_text

# Create a simplified dataset for agent simulations
agent_data = []
for ex in examples:
    agent_data.append({
        "statement": extract_statement(ex["input"]),
        "opinion_score": ex["context"]["opinion_score"],
        "bias_category": ex["context"]["bias_category"]
    })

print("Agent-ready data format:")
print(json.dumps(agent_data[0], indent=2))

In [None]:
# Group statements by opinion score for simulation scenarios
by_opinion = {}
for item in agent_data:
    score = item["opinion_score"]
    if score not in by_opinion:
        by_opinion[score] = []
    by_opinion[score].append(item["statement"])

print("Statements grouped by opinion score:")
print("=" * 50)
for score in sorted(by_opinion.keys()):
    statements = by_opinion[score]
    print(f"\nScore {score:+.1f} ({len(statements)} statements):")
    for stmt in statements[:2]:  # Show first 2
        print(f"  • {stmt[:60]}..." if len(stmt) > 60 else f"  • {stmt}")

## 5. Visualization (Optional)

In [None]:
try:
    import matplotlib.pyplot as plt
    
    # Create histogram of opinion scores
    fig, ax = plt.subplots(figsize=(10, 5))
    
    # Plot histogram
    ax.hist(opinion_scores, bins=5, range=(-1.25, 1.25), 
            color='steelblue', edgecolor='white', alpha=0.8)
    
    # Customize plot
    ax.set_xlabel('Opinion Score', fontsize=12)
    ax.set_ylabel('Count', fontsize=12)
    ax.set_title('Distribution of Political Opinion Scores', fontsize=14)
    ax.set_xticks([-1.0, -0.5, 0.0, 0.5, 1.0])
    ax.set_xticklabels(['Far Right\n(-1.0)', 'Right\n(-0.5)', 'Center\n(0.0)', 
                       'Left\n(0.5)', 'Far Left\n(1.0)'])
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
except ImportError:
    print("matplotlib not available. Install with: pip install matplotlib")
    print("Skipping visualization.")

## Summary

This dataset provides:
- **15 examples** spanning the full political spectrum
- **5-point scale** from far_right (-1.0) to far_left (+1.0)
- **Ready-to-use format** for opinion dynamics simulations
- **Structured context** with bias category and normalized scores

### Use Cases
1. **Agent-Based Modeling**: Use opinion scores to initialize agent beliefs
2. **Polarization Studies**: Analyze how extreme vs. moderate opinions interact
3. **Feedback Loop Analysis**: Study perception asymmetry in political discourse
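
The first use case can be sketched with a bounded-confidence update in the style of the Deffuant model. That model is an assumption of this example, not something the dataset prescribes. The parameters `epsilon`, `mu`, `steps`, and `seed` are illustrative, and the `initial` list is synthetic (three agents per scale point) rather than loaded from the data.

```python
import random


def deffuant_step(opinions, epsilon, mu, rng):
    """Pick two distinct agents; if their opinions are close enough, move them together."""
    i, j = rng.sample(range(len(opinions)), 2)
    diff = opinions[j] - opinions[i]
    if abs(diff) < epsilon:
        # Symmetric update: the pair's mean opinion is preserved
        opinions[i] += mu * diff
        opinions[j] -= mu * diff


def simulate(initial, steps=2000, epsilon=0.6, mu=0.5, seed=0):
    """Run a fixed number of pairwise interactions and return the final opinions."""
    rng = random.Random(seed)  # seeded for reproducibility
    opinions = list(initial)   # copy so the caller's list is untouched
    for _ in range(steps):
        deffuant_step(opinions, epsilon, mu, rng)
    return opinions


# Synthetic initial beliefs on the notebook's normalized scale (not real data)
initial = [-1.0, -0.5, 0.0, 0.5, 1.0] * 3
final = simulate(initial)
print("Initial range:", min(initial), "to", max(initial))
print("Final opinions:", [round(x, 2) for x in final])
```

To seed the agents from the dataset instead, `initial` could be replaced with the `opinion_scores` list built earlier in this notebook. A smaller `epsilon` makes agents ignore more distant opinions, which tends to leave separate opinion clusters rather than a single consensus.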