# How Swiss Parties Mobilize Voters on Instagram and TikTok
## Reproducible Analysis Notebook

**Group**: Nick Eichmann, Marc Eggenberger, Sarah Häusermann, David Rothschild

This notebook contains all hypothesis tests from the paper. Run all cells to reproduce the results.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DavidSimonRothschild/Introduction-to-Computational-Media-Research/blob/main/analysis_notebook.ipynb)

## Setup

In [None]:
# Clone repo if running in Colab
import os
if 'COLAB_GPU' in os.environ or 'google.colab' in str(get_ipython()):
    !git clone https://github.com/DavidSimonRothschild/Introduction-to-Computational-Media-Research.git repo
    os.chdir('repo')
    PROJECT_ROOT = os.getcwd()
else:
    # Notebooks have no __file__, so fall back to the current working directory
    PROJECT_ROOT = os.getcwd()

In [None]:
# Install dependencies
!pip install -q pandas numpy scipy statsmodels

In [None]:
# Imports
import pandas as pd
import numpy as np
import glob
import os
from scipy.stats import mannwhitneyu, spearmanr
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Paths
INSTAGRAM_FOLDER = os.path.join(PROJECT_ROOT, "A_Data", "2_Instagram", "2_CLEAN")
TIKTOK_FOLDER = os.path.join(PROJECT_ROOT, "A_Data", "1_Tiktok", "2_CLEAN")
NETWORK_INSTAGRAM = os.path.join(PROJECT_ROOT, "1_Processing", "2_Analysis", "2_Network Analysis", "party_mentions_edges_instagram.csv")
NETWORK_TIKTOK = os.path.join(PROJECT_ROOT, "1_Processing", "2_Analysis", "2_Network Analysis", "party_mentions_edges_tiktok.csv")

print(f"Project root: {PROJECT_ROOT}")

In [None]:
# Helper function to load platform data
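def load_platform(path):
    """Load every CSV in `path` into one DataFrame, tagging rows with the source file name."""
    files = sorted(glob.glob(os.path.join(path, "*.csv")))
    if not files:
        raise FileNotFoundError(f"No CSV files found in {path}")
    dfs = []
    for f in files:
        df = pd.read_csv(f)
        df["party_file"] = os.path.basename(f)
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)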

# Load data
tiktok = load_platform(TIKTOK_FOLDER)
instagram = load_platform(INSTAGRAM_FOLDER)

tiktok["platform"] = "tiktok"
instagram["platform"] = "instagram"

df = pd.concat([tiktok, instagram], ignore_index=True)
print(f"Total posts loaded: {len(df)}")
print(f"Instagram: {len(instagram)}, TikTok: {len(tiktok)}")

---
## H1: Voting Issues and Engagement

**Hypothesis**: Posts that address the current voting issues receive higher engagement than posts on non-voting issues.

**Test**: Mann-Whitney U test comparing engagement scores between voting-related and non-voting posts.
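
A one-sided Mann-Whitney U test with `alternative="greater"` returns a p-value close to 1 when the first sample tends to be *lower* than the second, so a very large p-value can be read as a hint of an effect in the opposite direction. The sketch below illustrates this on synthetic scores (not the study data; the variable names and distributions are assumptions made for illustration). It also derives a common-language effect size from the U statistic, which SciPy 1.7 and later report for the first sample.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic engagement scores, for illustration only
rng = np.random.default_rng(0)
voting_demo = rng.normal(loc=1.0, scale=1.0, size=200)      # lower on average
non_voting_demo = rng.normal(loc=2.0, scale=1.0, size=200)  # higher on average

# One-sided test of "voting > non-voting"
u_stat, p_greater = mannwhitneyu(voting_demo, non_voting_demo, alternative="greater")
# Mirrored one-sided test of "voting < non-voting"
_, p_less = mannwhitneyu(voting_demo, non_voting_demo, alternative="less")

# Common-language effect size: estimated probability that a random voting post
# outscores a random non-voting post (0.5 would mean no difference)
cles = u_stat / (len(voting_demo) * len(non_voting_demo))

print(f"p (greater): {p_greater:.4f}, p (less): {p_less:.4f}, CLES: {cles:.2f}")
```

Reporting an effect size next to the p-value helps judge whether a significant difference is also large enough to matter.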

In [None]:
# H1: Voting vs Non-Voting Engagement

# .copy() avoids pandas' SettingWithCopyWarning when new columns are added below
df_h1 = df.dropna(subset=["engagement_score", "voting.topic"]).copy()

# Create voting categories
df_h1["voting_cat"] = df_h1["voting.topic"].map({
    0: "none",
    1: "eid_only",
    2: "eigenmietwert_only",
    3: "both"
})

print("=" * 60)
print("H1: VOTING ISSUES AND ENGAGEMENT")
print("=" * 60)

print("\n--- Descriptive statistics by voting category ---")
print(df_h1.groupby("voting_cat")["engagement_score"].describe())

In [None]:
# Binary comparison: any voting vs none
df_h1["is_any_voting"] = (df_h1["voting.topic"] > 0).astype(int)

voting_all = df_h1.loc[df_h1.is_any_voting == 1, "engagement_score"]
non_voting = df_h1.loc[df_h1.is_any_voting == 0, "engagement_score"]

u, pval = mannwhitneyu(voting_all, non_voting, alternative="greater")

print("\n--- Combined voting (1,2,3) vs non-voting (0) ---")
print(f"n voting: {len(voting_all)}, n non-voting: {len(non_voting)}")
print(f"Mann-Whitney U p-value (voting > non-voting): {pval:.4f}")

med_v = voting_all.median()
med_nv = non_voting.median()

print(f"\nMedian engagement – voting: {med_v:.2f}")
print(f"Median engagement – non-voting: {med_nv:.2f}")
print(f"Median shift (voting – non-voting): {med_v - med_nv:.2f}")

# With alternative="greater", a p-value near 1 points to a shift in the opposite direction
if pval > 0.95:
    print("\n>>> RESULT: H1 REJECTED - Voting posts have LOWER engagement!")
elif pval < 0.05:
    print("\n>>> RESULT: H1 supported")
else:
    print("\n>>> RESULT: No significant difference")

In [None]:
# Platform comparison
print("\n--- Platform comparison: voting vs non-voting ---")

results_h1 = []
for p in ["instagram", "tiktok"]:
    sub = df_h1[df_h1.platform == p]
    voting = sub.loc[sub.is_any_voting == 1, "engagement_score"]
    non_voting = sub.loc[sub.is_any_voting == 0, "engagement_score"]
    
    u, pval = mannwhitneyu(voting, non_voting, alternative="greater")
    
    results_h1.append({
        "Platform": p,
        "Median (voting)": voting.median(),
        "Median (non-voting)": non_voting.median(),
        "Median shift": voting.median() - non_voting.median(),
        "p-value": pval
    })

pd.DataFrame(results_h1)

---
## H2: Temporal Proximity to Voting Day

**Hypothesis**: Posts that are closer to the voting date generate higher user engagement.

**Test**: Spearman rank correlation between days-to-vote and engagement score.
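
Because H2 predicts *higher* engagement for *fewer* days to the vote, the expected correlation is negative, which is why a one-sided test with `alternative="less"` fits. The sketch below shows the idea on synthetic data (the slope, noise level, and variable names are assumptions for illustration). The `alternative` argument of `spearmanr` requires SciPy 1.7 or later.

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic data: engagement declines as the vote gets further away
rng = np.random.default_rng(1)
days_demo = rng.integers(0, 181, size=300)
engagement_demo = 10.0 - 0.03 * days_demo + rng.normal(0.0, 1.0, size=300)

# One-sided test: is the rank correlation negative?
rho_demo, p_demo = spearmanr(days_demo, engagement_demo, alternative="less")

print(f"rho = {rho_demo:.3f}, one-sided p = {p_demo:.4g}")
```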

In [None]:
# H2: Temporal Proximity

VOTING_DAY = pd.Timestamp("2025-09-28")

# .copy() avoids pandas' SettingWithCopyWarning when new columns are added below
df_h2 = df.dropna(subset=["engagement_score", "voting.topic", "data.createTime"]).copy()
df_h2["post_date"] = pd.to_datetime(df_h2["data.createTime"])
df_h2["days_to_vote"] = (VOTING_DAY - df_h2["post_date"]).dt.days

# Keep only pre-vote period
pre_vote = df_h2[df_h2["days_to_vote"] >= 0]

print("=" * 60)
print("H2: TEMPORAL PROXIMITY TO VOTING DAY")
print("=" * 60)
print(f"\nPosts before voting day: {len(pre_vote)}")

In [None]:
# Spearman correlation
rho, pval = spearmanr(
    pre_vote["days_to_vote"],
    pre_vote["engagement_score"],
    alternative="less"
)

print("\n--- Spearman correlation: days_to_vote vs engagement ---")
print(f"Spearman rho: {rho:.3f}")
print(f"p-value (negative correlation): {pval:.4f}")

if pval > 0.95:
    print("\n>>> RESULT: H2 REJECTED - Posts FURTHER from vote have higher engagement!")
elif pval < 0.05:
    print("\n>>> RESULT: H2 supported")
else:
    print("\n>>> RESULT: No significant relationship")

In [None]:
# Binned medians
pre_vote_copy = pre_vote.copy()
pre_vote_copy["time_bin"] = pd.cut(
    pre_vote_copy["days_to_vote"],
    bins=[0, 7, 30, 90, 180],
    labels=["0-7d", "8-30d", "31-90d", "91-180d"],
    include_lowest=True  # keep posts published on voting day (days_to_vote == 0)
)

print("\n--- Median engagement by time bin (pre-vote) ---")
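# observed=False keeps empty bins in the table and avoids a FutureWarning in recent pandas
binned = pre_vote_copy.groupby("time_bin", observed=False)["engagement_score"].agg(["median", "count"])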
print(binned)
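
A note on bin edges: `pd.cut` builds right-closed intervals by default, so a value equal to the lowest edge falls into no bin unless `include_lowest=True` is passed, and values above the highest edge are always left unbinned. The sketch below checks this on a handful of made-up day counts (the `_demo` names are hypothetical).

```python
import pandas as pd

# Made-up day counts that sit on or near the bin edges
days_demo = pd.Series([0, 1, 7, 8, 30, 31, 180, 181])
bin_edges_demo = [0, 7, 30, 90, 180]
labels_demo = ["0-7d", "8-30d", "31-90d", "91-180d"]

# Default: intervals are (0, 7], (7, 30], ... so 0 is not assigned to any bin
default_bins = pd.cut(days_demo, bins=bin_edges_demo, labels=labels_demo)

# include_lowest=True closes the first interval on the left, so 0 lands in "0-7d"
closed_bins = pd.cut(days_demo, bins=bin_edges_demo, labels=labels_demo, include_lowest=True)

print(pd.DataFrame({"days": days_demo, "default": default_bins, "include_lowest": closed_bins}))
```

Counting the missing values in the bin column is a quick way to see how many posts fall outside the chosen range.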

---
## H3: Negativity and Engagement

**Hypothesis**: The higher the negativity score, the higher the engagement of a post.

**Test**: OLS regression of engagement on the sentiment score with HC3 robust standard errors, run per party and pooled per platform. A negative sentiment coefficient would support H3.

In [None]:
# H3: Sentiment and Engagement

SENTIMENT_COL = "sentiment_rulebased"
Y_COL = "engagement_score"

def run_party_regression(df, x_col, y_col):
    d = df[[x_col, y_col]].copy()
    d[x_col] = pd.to_numeric(d[x_col], errors="coerce")
    d[y_col] = pd.to_numeric(d[y_col], errors="coerce")
    d = d.dropna()
    
    if len(d) < 10:
        return None, d
    
    X = sm.add_constant(d[x_col])
    y = d[y_col]
    model = sm.OLS(y, X).fit(cov_type="HC3")
    
    return {
        "n": int(model.nobs),
        "beta_sentiment": float(model.params[x_col]),
        "se_hc3": float(model.bse[x_col]),
        "p_hc3": float(model.pvalues[x_col]),
        "r2": float(model.rsquared),
    }, d

print("=" * 60)
print("H3: NEGATIVITY AND ENGAGEMENT")
print("=" * 60)

In [None]:
# Run for both platforms
results_h3 = []

for platform, folder in [("Instagram", INSTAGRAM_FOLDER), ("TikTok", TIKTOK_FOLDER)]:
    print(f"\n--- {platform} ---")
    
    pooled_rows = []
    party_results = []
    
    for csv_path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        party_df = pd.read_csv(csv_path)
        
        if SENTIMENT_COL not in party_df.columns or Y_COL not in party_df.columns:
            continue
        
        res, cleaned = run_party_regression(party_df, SENTIMENT_COL, Y_COL)
        
        if res is not None:
            party_results.append({"party": os.path.basename(csv_path), **res})
            pooled_rows.append(cleaned)
    
    # Pooled regression
    if pooled_rows:
        pooled_df = pd.concat(pooled_rows, ignore_index=True)
        Xp = sm.add_constant(pooled_df[SENTIMENT_COL])
        yp = pooled_df[Y_COL]
        pooled_model = sm.OLS(yp, Xp).fit(cov_type="HC3")
        
        print(f"Pooled: n={int(pooled_model.nobs)}, beta={pooled_model.params[SENTIMENT_COL]:.3f}, p={pooled_model.pvalues[SENTIMENT_COL]:.3f}, R²={pooled_model.rsquared:.4f}")
        
        results_h3.append({
            "Platform": platform,
            "n": int(pooled_model.nobs),
            "beta_sentiment": pooled_model.params[SENTIMENT_COL],
            "p-value": pooled_model.pvalues[SENTIMENT_COL],
            "R²": pooled_model.rsquared
        })

print("\n--- Summary ---")
pd.DataFrame(results_h3)

In [None]:
# Hard-coded conclusion; revisit it if the regression results above change
print("\n>>> RESULT: H3 REJECTED - No systematic relationship between sentiment and engagement")

---
## H4: Ideological Distance and Engagement

**Hypothesis**: When parties engage with each other, posts receive higher engagement the further apart the parties are ideologically.

**Test**: Spearman correlation between ideological distance and mean engagement of cross-party mentions.
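
Ideological distance is the absolute difference between the scores of the mentioning and the mentioned party. The sketch below computes it on a small made-up edge list and, as one possible design choice, leaves the distance missing when a party name is not in the mapping, so that unmatched names become visible. The edge rows, the engagement values, and the `"Unlisted"` party are hypothetical. Only the three scores are taken from this notebook's mapping.

```python
import pandas as pd

# Subset of the ideology scores used in this notebook
ideology_demo = {"SP": 4.4, "GLP": 53.8, "SVP": 91.9}

# Made-up edge list; "Unlisted" is deliberately absent from the mapping
edges_demo = pd.DataFrame({
    "source_party": ["SP", "SVP", "GLP", "SP"],
    "target_party": ["SVP", "GLP", "SP", "Unlisted"],
    "mean_engagement": [3.2, 1.1, 2.4, 0.9],
})

# Map party names to scores; unknown names become NaN instead of a default value
src_score = edges_demo["source_party"].map(ideology_demo)
tgt_score = edges_demo["target_party"].map(ideology_demo)
edges_demo["ideo_dist"] = (src_score - tgt_score).abs()

# Rows whose distance could not be computed
unmatched_demo = edges_demo[edges_demo["ideo_dist"].isna()]
print(edges_demo)
print(f"Unmatched edges: {len(unmatched_demo)}")
```

Listing the unmatched rows before running the correlation shows whether spelling variants in the party names affect the result.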

In [None]:
# H4: Ideological Distance

# Ideology scores from SmartVote (https://smartmonitor.ch/de/issues/9)
# Youth wings are assigned the score of their parent party
IDEOLOGY = {
    "JUSO": 4.4, "SP": 4.4,
    "Junge Grüne": 5.6, "Grüne": 5.6,
    "EVP": 30.9, "Junge EVP": 30.9,
    "Junge GLP": 53.8, "GLP": 53.8,
    "Junge Mitte": 62.4, "Mitte": 62.4,
    "JF": 79.8, "FDP": 79.8,
    "JSVP": 91.9, "SVP": 91.9
}

print("=" * 60)
print("H4: IDEOLOGICAL DISTANCE AND ENGAGEMENT")
print("=" * 60)

In [None]:
# Load network edges
ig = pd.read_csv(NETWORK_INSTAGRAM)
tt = pd.read_csv(NETWORK_TIKTOK)

ig["platform"] = "instagram"
tt["platform"] = "tiktok"

edges = pd.concat([ig, tt], ignore_index=True)

# Calculate ideological distance
# Note: parties missing from IDEOLOGY silently fall back to a score of 50
edges["ideo_dist"] = edges.apply(
    lambda r: abs(IDEOLOGY.get(r.source_party, 50) - IDEOLOGY.get(r.target_party, 50)),
    axis=1
)

print(f"\nTotal edges: {len(edges)}")
print(f"Instagram: {len(ig)}, TikTok: {len(tt)}")
print(f"\nIdeological distance range: {edges['ideo_dist'].min():.1f} - {edges['ideo_dist'].max():.1f}")
print(f"Mean engagement range: {edges['mean_engagement'].min():.2f} - {edges['mean_engagement'].max():.2f}")

In [None]:
# Overall Spearman correlation
rho, pval = spearmanr(
    edges["ideo_dist"],
    edges["mean_engagement"],
    alternative="greater"
)

print("\n--- Overall: ideological distance vs engagement ---")
print(f"Spearman rho: {rho:.3f}")
print(f"p-value: {pval:.3f}")

In [None]:
# By platform
print("\n--- By platform ---")
results_h4 = []

for p in ["instagram", "tiktok"]:
    sub = edges[edges.platform == p]
    rho, pval = spearmanr(
        sub["ideo_dist"],
        sub["mean_engagement"],
        alternative="greater"
    )
    print(f"{p.upper()}: rho={rho:.3f}, p={pval:.3f}")
    results_h4.append({"Platform": p, "Spearman rho": rho, "p-value": pval})

pd.DataFrame(results_h4)

In [None]:
# Hard-coded conclusion; revisit it if the correlations above change
print("\n>>> RESULT: H4 REJECTED - No correlation between ideological distance and engagement")

---
## Summary

| Hypothesis | Result |
|------------|--------|
| **H1**: Voting posts → higher engagement | ❌ REJECTED (opposite effect) |
| **H2**: Closer to vote → higher engagement | ❌ REJECTED (no effect) |
| **H3**: Negativity → higher engagement | ❌ REJECTED (no effect) |
| **H4**: Ideological distance → higher engagement | ❌ REJECTED (no correlation) |