# Process Topics

Filter the political topics by partisan leaning and overall support, then summarize the final political and emotional topic sets used in the main studies.

**Prerequisites**: Run the processing scripts first:
- `python scripts/process_political_issues.py` → produces `yougov_survey_stances_processed.csv`
- `python scripts/process_emotional_topics.py` → produces `emotchat_topics.csv`

In [1]:
from ast import literal_eval
from pathlib import Path

import pandas as pd

# Paths
PROJECT_ROOT = Path("..").resolve()
STIMULI_DIR = PROJECT_ROOT / "stimuli" / "main_studies"
INPUT_DIR = STIMULI_DIR / "inputs"

# Input files (from processing scripts)
POLITICAL_PROCESSED_PATH = INPUT_DIR / "yougov_survey_stances_processed.csv"
EMOTIONAL_PROCESSED_PATH = INPUT_DIR / "emotchat_topics.csv"

# Output files
POLCHAT_OUTPUT_PATH = INPUT_DIR / "polchat_topics.csv"
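
Because both input files are produced by the prerequisite scripts, a quick existence check can fail earlier and more clearly than `pd.read_csv` would. The following is a minimal sketch: `missing_inputs` is a hypothetical helper, not part of the project, and the temporary directory only stands in for `INPUT_DIR`.

```python
import tempfile
from pathlib import Path


def missing_inputs(paths):
    """Return the subset of `paths` that do not exist on disk (hypothetical helper)."""
    return [Path(p) for p in paths if not Path(p).exists()]


# Demonstration: a temporary directory stands in for INPUT_DIR.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "yougov_survey_stances_processed.csv"
    absent = Path(tmp) / "emotchat_topics.csv"
    present.write_text("statement\n")  # simulate one script having been run

    missing = missing_inputs([present, absent])
    missing_names = [p.name for p in missing]

print(missing_names)
```

In the notebook itself, the same helper could be called with `[POLITICAL_PROCESSED_PATH, EMOTIONAL_PROCESSED_PATH]` and raise a `FileNotFoundError` if the returned list is non-empty.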

## 1. Political Topics: Compute Partisan Leaning

In [2]:
processed_df = pd.read_csv(POLITICAL_PROCESSED_PATH)
print(f"Loaded {len(processed_df)} processed political survey items")

Loaded 60 processed political survey items


In [3]:
def analyze_poll(agg_results_pct, threshold=15.0):
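    """
    Classify a poll item as "Neutral" or "Partisan".

    `agg_results_pct` is the string form of a dict keyed by group ("Con",
    "Lab", "All"). An item is "Partisan" when Labour and Conservative
    support differ by at least `threshold` percentage points.
    """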
    agg_results_pct = literal_eval(agg_results_pct)
    con_support = agg_results_pct["Con"].get("support", 0)
    lab_support = agg_results_pct["Lab"].get("support", 0)
    overall_support = agg_results_pct["All"].get("support", 0)

    partisan_diff = lab_support - con_support

    if abs(partisan_diff) < threshold:
        leaning = "Neutral"
        partisan_leaning = None
    else:
        leaning = "Partisan"
        partisan_leaning = "Labour" if partisan_diff > 0 else "Conservative"

    return {
        "leaning": leaning,
        "partisan_diff": partisan_diff,
        "partisan_leaning": partisan_leaning,
        "con_support": con_support,
        "lab_support": lab_support,
        "overall_support": overall_support,
    }
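
To make the threshold rule concrete, here is a small worked example. It is a sketch under stated assumptions: the cell value is invented for illustration (real rows may carry other keys), and `classify` is a hypothetical condensed version of the rule in `analyze_poll`, not a replacement for it.

```python
from ast import literal_eval

# Hypothetical cell value in the same shape analyze_poll expects.
example = "{'Con': {'support': 30}, 'Lab': {'support': 55}, 'All': {'support': 45}}"


def classify(agg_results_pct, threshold):
    """Condensed form of the partisan-difference rule (illustrative only)."""
    parsed = literal_eval(agg_results_pct)
    diff = parsed["Lab"].get("support", 0) - parsed["Con"].get("support", 0)
    if abs(diff) < threshold:
        return diff, "Neutral", None
    return diff, "Partisan", "Labour" if diff > 0 else "Conservative"


result_20 = classify(example, threshold=20.0)  # gap of 25 is not below 20
result_25 = classify(example, threshold=25.0)  # gap equal to threshold is still partisan
result_30 = classify(example, threshold=30.0)  # gap of 25 is below 30

print(result_20)
print(result_25)
print(result_30)
```

Because the comparison is a strict `<`, a gap exactly equal to the threshold counts as partisan.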

In [4]:
partisan_rows = []
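# iterrows() yields a copy of each row, so adding fields here does not modify processed_df.
for _, row in processed_df.iterrows():
    # threshold=20.0 overrides the function default of 15.0
    pol_stance_results = analyze_poll(row["agg_results_pct"], threshold=20.0)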
    row["leaning"] = pol_stance_results["leaning"]
    row["partisan_diff"] = pol_stance_results["partisan_diff"]
    row["partisan_leaning"] = pol_stance_results["partisan_leaning"]
    row["con_support"] = pol_stance_results["con_support"]
    row["lab_support"] = pol_stance_results["lab_support"]
    row["overall_support"] = pol_stance_results["overall_support"]
    partisan_rows.append(row)

candidate_df = pd.DataFrame(partisan_rows)
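
The row-wise loop above could also be written with `Series.apply` and `DataFrame.join`. The sketch below shows that alternative on a two-row toy frame; the data and the `support_columns` helper are hypothetical, and only the support columns and the difference are reproduced.

```python
from ast import literal_eval

import pandas as pd

# Toy stand-in for processed_df; values are invented for illustration.
toy = pd.DataFrame(
    {
        "statement": ["A", "B"],
        "agg_results_pct": [
            "{'Con': {'support': 40}, 'Lab': {'support': 50}, 'All': {'support': 46}}",
            "{'Con': {'support': 20}, 'Lab': {'support': 60}, 'All': {'support': 41}}",
        ],
    }
)


def support_columns(cell):
    """Parse one cell and return the three support figures as a Series."""
    parsed = literal_eval(cell)
    return pd.Series(
        {
            "con_support": parsed["Con"].get("support", 0),
            "lab_support": parsed["Lab"].get("support", 0),
            "overall_support": parsed["All"].get("support", 0),
        }
    )


# apply() with a Series-returning function yields a DataFrame, joined on the index.
expanded = toy.join(toy["agg_results_pct"].apply(support_columns))
expanded["partisan_diff"] = expanded["lab_support"] - expanded["con_support"]

print(expanded[["statement", "partisan_diff"]])
```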

### Filter Political Candidates

In [5]:
# Keep items whose overall support is strictly between 30% and 70%
# (items at exactly 30% or 70% are removed as well)
candidate_df = candidate_df[
    (candidate_df["overall_support"] < 70) & (candidate_df["overall_support"] > 30)
]
print(f"Remaining after removing extreme items: {len(candidate_df)}")

Remaining after removing extreme items: 41
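
The same strict range filter can be expressed with `Series.between`, which makes the boundary handling explicit. This is a sketch on invented values, assuming pandas 1.3 or later for the string form of `inclusive`.

```python
import pandas as pd

# Toy data: two boundary values (30, 70) and three others.
toy = pd.DataFrame({"overall_support": [25, 30, 45, 70, 82]})

# inclusive="neither" excludes both endpoints, matching the `> 30` and `< 70` checks.
kept = toy[toy["overall_support"].between(30, 70, inclusive="neither")]
kept_values = kept["overall_support"].tolist()

print(kept_values)
```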


In [6]:
print("Leaning distribution:")
print(candidate_df["leaning"].value_counts())
print()
print("Partisan leaning breakdown:")
print(candidate_df[["leaning", "partisan_leaning"]].value_counts())

Leaning distribution:
leaning
Neutral     31
Partisan    10
Name: count, dtype: int64

Partisan leaning breakdown:
leaning   partisan_leaning
Partisan  Labour              7
          Conservative        3
Name: count, dtype: int64


In [7]:
# Keep only neutral statements
candidate_df = candidate_df[candidate_df["leaning"] == "Neutral"]
print(f"Remaining after removing partisan items: {len(candidate_df)}")

Remaining after removing partisan items: 31


In [8]:
candidate_df = candidate_df.reset_index(drop=True)
for i, row in candidate_df.iterrows():
    print(f"{i}: {row['statement']}")

0: The U.K. SHOULD allow the construction of nuclear power stations in local communities across the country.
1: The U.K. SHOULD ban new oil and gas developments in Britain's North Sea territory.
2: The U.K. SHOULD allow the NHS to provide weight-loss jabs on prescription from high-street pharmacies without requiring patients to see a doctor first.
3: The U.K. SHOULD nationalize British Steel, bringing it under government ownership and control.
4: The U.K. SHOULD sign a defence and security partnership with the European Union.
5: The U.K. SHOULD participate in the creation of a European army that would include British forces.
6: The U.K. SHOULD nationalise British Steel to protect the industry and its workers.
7: The U.K. SHOULD give judges the option of sentencing criminals to house arrest.
8: The U.K. SHOULD increase its NATO defence spending commitment from 2% of GDP to 3% of GDP.
9: The U.K. SHOULD exempt parents with children under the age of five from the two-child benefit cap.
10

### Manual Review

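Remove topics that are too similar to each other, keeping one item from each group of near-duplicates (for example, items 3 and 6 in the list above both concern nationalising British Steel).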

In [9]:
# Indices of topics that are too similar (manual review)
too_similar = [3, 4, 18, 24, 21, 23]

polchat_df = candidate_df.drop(too_similar)
print(f"Final political topics after removing similar items: {len(polchat_df)}")

Final political topics after removing similar items: 25
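
The indices above were chosen by manual review. As an optional aid, candidate pairs could be flagged automatically before reviewing them by hand. The sketch below is one possible approach, not the method used here: it compares word sequences with `difflib.SequenceMatcher` after stripping the shared "The U.K. SHOULD" prefix. The `similar_pairs` helper and the 0.25 cut-off are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

PREFIX = "The U.K. SHOULD "

# Three statements taken from the candidate list printed earlier.
statements = [
    "The U.K. SHOULD nationalize British Steel, bringing it under government ownership and control.",
    "The U.K. SHOULD nationalise British Steel to protect the industry and its workers.",
    "The U.K. SHOULD give judges the option of sentencing criminals to house arrest.",
]


def tokens(text):
    """Drop the shared prefix, then lowercase and strip trailing punctuation."""
    body = text[len(PREFIX):] if text.startswith(PREFIX) else text
    return [word.strip(".,").lower() for word in body.split()]


def similar_pairs(texts, min_ratio=0.25):
    """Return (i, j, ratio) for pairs whose word-level similarity meets min_ratio."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        ratio = SequenceMatcher(None, tokens(a), tokens(b)).ratio()
        if ratio >= min_ratio:
            pairs.append((i, j, round(ratio, 2)))
    return pairs


flagged = similar_pairs(statements)
print(flagged)
```

Word-level matching misses spelling variants such as "nationalize" and "nationalise", so a flagged list like this would only be a starting point for the manual pass.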


In [10]:
polchat_df.to_csv(POLCHAT_OUTPUT_PATH, index=False)
rel_path = POLCHAT_OUTPUT_PATH.relative_to(PROJECT_ROOT)
print(f"Saved political topics to {rel_path}")

Saved political topics to stimuli/main_studies/inputs/polchat_topics.csv


---

## 2. Topic Summaries

Load final topic files and summarize.

In [11]:
# Reload final topic files
polchat_topics = pd.read_csv(POLCHAT_OUTPUT_PATH)
emotchat_topics = pd.read_csv(EMOTIONAL_PROCESSED_PATH)

print(f"Political topics: {len(polchat_topics)}")
print(f"Emotional topics: {len(emotchat_topics)}")

Political topics: 25
Emotional topics: 25


### Political Topics Summary

In [12]:
print("Category distribution:")
print(polchat_topics["category"].value_counts())
print()
print("Sample statements:")
for _, row in polchat_topics.head(5).iterrows():
    print(f"  - [{row['description']}] {row['statement']}")

Category distribution:
category
Health              5
Defence             3
Energy              2
Legal               2
Transport           2
Education           2
Environment         2
Economy             1
Welfare             1
Employment          1
Science             1
Alcohol             1
Housing/Taxation    1
Time                1
Name: count, dtype: int64

Sample statements:
  - [Nuclear power plants] The U.K. SHOULD allow the construction of nuclear power stations in local communities across the country.
  - [North Sea oil and gas development] The U.K. SHOULD ban new oil and gas developments in Britain's North Sea territory.
  - [NHS weight-loss medication access] The U.K. SHOULD allow the NHS to provide weight-loss jabs on prescription from high-street pharmacies without requiring patients to see a doctor first.
  - [European army including the UK] The U.K. SHOULD participate in the creation of a European army that would include British forces.
  - [Nationalisation of British

### Emotional Topics Summary

In [13]:
print("Domain distribution:")
print(emotchat_topics["pathway_name"].value_counts())
print()
print("Sample topics:")
for _, row in emotchat_topics.head(5).iterrows():
    print(f"  - [{row['description']}] {row['neutral_question_text']}")

Domain distribution:
pathway_name
Careers          10
Health            8
Relationships     7
Name: count, dtype: int64

Sample topics:
  - [Sleep difficulties] How would you describe your sleep quality? Do you ever have difficulty sleeping?
  - [Mental health and mood] How would you describe your daily moods and emotional well-being? Do you suffer from low mood, or ever feel anxious?
  - [Cognitive function and concentration] How would you rate your ability to focus and concentrate throughout the day? Do you ever have any "brain fog" or difficulty concentrating?
  - [Energy levels and fatigue] How would you describe your energy levels on a typical day? Do you often feel tired?
  - [Physical activity levels] How satisfied are you with your current level of physical activity? Do you feel like you get too little exercise?


In [14]:
print("\n" + "=" * 50)
print("FINAL TOPIC COUNTS")
print("=" * 50)
print(f"Political topics (polchat): {len(polchat_topics)}")
print(f"Emotional topics (emotchat): {len(emotchat_topics)}")
print(f"Total: {len(polchat_topics) + len(emotchat_topics)}")


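==================================================
FINAL TOPIC COUNTS
==================================================
Political topics (polchat): 25
Emotional topics (emotchat): 25
Total: 50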
