# 04 — Topic Labeling

In the previous notebook, I fit an LDA topic model that produced 5 topics per event.
This notebook adds **human-readable labels** to those topics so they can be used in:

- the report (clear interpretation)
- visualizations (plots that people understand)
- the Streamlit dashboard (dropdown labels)


In [2]:
import os
import pandas as pd

topic_dir = "../data/processed/topics"

event_files = {
    "event1_kyiv": "event1_kyiv_topics.csv",
    "event2_kherson": "event2_kherson_topics.csv",
    "event3_stalemate": "event3_stalemate_topics.csv",
    "event4_trump_election": "event4_trump_election_topics.csv",
    "event5_white_house_meeting": "event5_white_house_meeting_topics.csv",
}


In [4]:
topics = {}

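# Load the topic table for each event (one row per topic).
for event, fname in event_files.items():
    path = os.path.join(topic_dir, fname)
    df = pd.read_csv(path)
    topics[event] = df

    # Expect 5 rows (topics) and 2 columns (topic_id, top_words).
    print(event, "loaded:", df.shape)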


event1_kyiv loaded: (5, 2)
event2_kherson loaded: (5, 2)
event3_stalemate loaded: (5, 2)
event4_trump_election loaded: (5, 2)
event5_white_house_meeting loaded: (5, 2)


## Labeling Method

Topic labels were assigned manually based on:

- the top words produced by the model
- the event context
- recurring themes across events

The goal is **interpretability**, not perfect automation.
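
One of the inputs above, recurring themes across events, can be made a little easier to spot with a small helper. The sketch below is illustrative only and is not part of the labeling pipeline: `recurring_words` is a hypothetical helper name, and the input uses only the first few top words visible in this notebook's truncated output, not the full word lists.

```python
from collections import Counter

# Illustrative input: leading top words for two topics per event, copied from
# the truncated tables in this notebook (the full lists are not shown there).
top_words_by_event = {
    "event1_kyiv": [
        "used, use, weapons, children, look, troops",
        "invasion, yes, countries, did, military, putin",
    ],
    "event3_stalemate": [
        "mod, military, deal, way, coup, wagner, power",
        "force, tank, forces, probably, army, troops",
    ],
    "event4_trump_election": [
        "used, tanks, soldiers, hit, air, use, missiles",
        "china, support, weapons, military, north",
    ],
}


def recurring_words(top_words_by_event, min_events=2):
    """Return words that appear in the top words of at least `min_events` events."""
    counts = Counter()
    for topic_strings in top_words_by_event.values():
        # Collect a set per event so a word counts once per event,
        # no matter how many of that event's topics contain it.
        event_words = set()
        for s in topic_strings:
            event_words.update(w.strip() for w in s.split(",") if w.strip())
        counts.update(event_words)
    return sorted(w for w, n in counts.items() if n >= min_events)


print(recurring_words(top_words_by_event))
```

Words that recur across events (here, mostly military vocabulary) hint at themes that may deserve consistent label wording from one event to the next. The final labels are still a manual judgment.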


In [6]:
labels = {
    "event1_kyiv": {
        0: "Battlefield updates (weapons, troops, visuals)",
        1: "Propaganda + moral outrage / reactions",
        2: "Poland border + visas + refugee movement",
        3: "Power balance + escalation talk",
        4: "Invasion framing + military response",
    },
    "event2_kherson": {
        0: "Leadership + war framing (Putin / narratives)",
        1: "War timeline + personal reactions",
        2: "Military shock + civilian harm",
        3: "Help / logistics / questions",
        4: "General reactions + humor / chatter",
    },
    "event3_stalemate": {
        0: "Wagner / coup / power struggle",
        1: "War costs + money + volunteering / work",
        2: "Combat + fighting (frontline talk)",
        3: "Reactions to footage + hot takes",
        4: "Forces + tanks + troop movement",
    },
    "event4_trump_election": {
        0: "News cycle + media framing",
        1: "Memes / anger / online banter",
        2: "Weapons + battlefield tech (tanks/missiles/air)",
        3: "War fatigue / endgame talk",
        4: "Geopolitics (China/North Korea/Europe/support)",
    },
    "event5_white_house_meeting": {
        0: "U.S. elections + partisan/media framing",
        1: "Big-picture opinions + world powers (incl. China)",
        2: "Troops + casualties + war duration",
        3: "Zelensky/president meeting drama reactions",
        4: "Peace deal + Europe + Trump/Putin diplomacy framing",
    },
}


In [8]:
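for event, df in topics.items():
    # Work on a copy so the frame loaded from disk is not modified in place.
    df = df.copy()
    # Map each topic_id to its manual label; ids missing from the dict become NaN.
    df["label"] = df["topic_id"].map(labels[event])
    topics[event] = df

    print("\n===", event, "===")
    display(df)  # `display` is provided by the Jupyter/IPython environment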



=== event1_kyiv ===


,topic_id,top_words,label
0,0,"used, use, weapons, children, look, troops, se...","Battlefield updates (weapons, troops, visuals)"
1,1,"shit, did, fuck, way, propaganda, world, right...",Propaganda + moral outrage / reactions
2,2,"pl, visa, ua, news, ready, help, polish, borde...",Poland border + visas + refugee movement
3,3,"sure, power, doing, better, mean, bad, point, ...",Power balance + escalation talk
4,4,"invasion, yes, countries, did, military, putin...",Invasion framing + military response



=== event2_kherson ===


,topic_id,top_words,label
0,0,"way, shit, training, country, real, putin, lov...",Leadership + war framing (Putin / narratives)
1,1,"months, person, fucking, case, great, getting,...",War timeline + personal reactions
2,2,"fuck, military, eject, insane, children, belie...",Military shock + civilian harm
3,3,"doing, questions, sure, want, line, use, help,...",Help / logistics / questions
4,4,"old, big, thanks, got, ve, sure, work, lol, gu...",General reactions + humor / chatter



=== event3_stalemate ===


,topic_id,top_words,label
0,0,"mod, military, deal, way, coup, wagner, power,...",Wagner / coup / power struggle
1,1,"money, experience, war, work, ukrainian, milit...",War costs + money + volunteering / work
2,2,"video, yeah, fight, did, ukrainian, russians, ...",Combat + fighting (frontline talk)
3,3,"guys, sure, thing, video, got, shit, right, ru...",Reactions to footage + hot takes
4,4,"force, tank, forces, probably, army, troops, m...",Forces + tanks + troop movement



=== event4_trump_election ===


,topic_id,top_words,label
0,0,"need, news, lot, new, year, years, time, media...",News cycle + media framing
1,1,"thing, fucking, got, time, fuck, guy, shit, ri...",Memes / anger / online banter
2,2,"used, tanks, soldiers, hit, air, use, missiles...",Weapons + battlefield tech (tanks/missiles/air)
3,3,"years, way, ukrainian, ukrainians, putin, want...",War fatigue / endgame talk
4,4,"china, support, weapons, military, north, euro...",Geopolitics (China/North Korea/Europe/support)



=== event5_white_house_meeting ===


,topic_id,top_words,label
0,0,"pro, election, american, media, elections, par...",U.S. elections + partisan/media framing
1,1,"ll, want, maybe, right, shit, make, good, chin...",Big-picture opinions + world powers (incl. China)
2,2,"soldiers, army, year, russians, time, years, g...",Troops + casualties + war duration
3,3,"right, fuck, good, president, time, zelensky, ...",Zelensky/president meeting drama reactions
4,4,"european, putin, countries, deal, peace, trump...",Peace deal + Europe + Trump/Putin diplomacy fr...
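
Because `Series.map` with a plain dictionary leaves `NaN` for any `topic_id` that has no entry, a typo in the `labels` dictionary would pass silently. A minimal sketch of a coverage check follows; `find_label_gaps` is a hypothetical helper, not something the original notebook defines.

```python
def find_label_gaps(topic_ids, label_map):
    """Compare the topic ids in the data with the keys of a label dictionary.

    Returns (missing, unused):
      missing: ids present in the data but absent from label_map
      unused:  keys in label_map that never occur in the data
    """
    ids = set(topic_ids)
    keys = set(label_map)
    return sorted(ids - keys), sorted(keys - ids)


# Toy example: id 5 has no label, and the label for id 2 is never used.
missing, unused = find_label_gaps([0, 1, 5], {0: "a", 1: "b", 2: "c"})
print("missing:", missing, "unused:", unused)
```

In this notebook it could be called as `find_label_gaps(df["topic_id"].tolist(), labels[event])` inside the labeling loop, raising or printing a warning whenever either list is non-empty.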


## Notes on Interpretation

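Topic quality varies by event.

For example, Event 2 (Kherson) has fewer comments than Events 4–5, so its topics look more general. With less text, LDA likely has fewer word co-occurrences to separate themes, and generic conversational words (e.g. "sure", "got", "lol") end up among the top words.
This is a common limitation of topic models on small corpora and will be noted as a limitation in the report.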
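
One rough way to put a number on how "general" an event's topics look is the average overlap between the top-word lists of its topics: blurrier topics tend to share more words. The sketch below is an assumption-laden illustration, not a metric this notebook computes. `mean_topic_overlap` is a hypothetical helper, and the two example inputs are toy lists rather than real model output.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_topic_overlap(top_words_list):
    """Average pairwise Jaccard overlap between comma-separated top-word strings."""
    word_sets = [
        {w.strip() for w in s.split(",") if w.strip()} for s in top_words_list
    ]
    pairs = list(combinations(word_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy inputs: two topics with no shared words vs. two topics sharing "sure".
distinct = ["tank, army, troops", "visa, border, polish"]
blurry = ["sure, want, help", "sure, got, work"]

print(mean_topic_overlap(distinct))
print(mean_topic_overlap(blurry))
```

A higher value for one event than another would be consistent with less distinct topics, though it depends heavily on how many top words are compared and on stop-word filtering.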


In [10]:
out_dir = "../data/processed/topics_labeled"
os.makedirs(out_dir, exist_ok=True)

all_labeled = []

for event, df in topics.items():
    # Save one labeled file per event.
    out_path = os.path.join(out_dir, f"{event}_topics_labeled.csv")
    df.to_csv(out_path, index=False)
    print("Saved:", out_path)

    # Tag each row with its event so the frames can be stacked into one table.
    df2 = df.copy()
    df2["event"] = event
    all_labeled.append(df2)

# Combine all events into a single master file.
master = pd.concat(all_labeled, ignore_index=True)
master_path = os.path.join(out_dir, "topics_labeled_master.csv")
master.to_csv(master_path, index=False)
print("Saved:", master_path)

master


Saved: ../data/processed/topics_labeled/event1_kyiv_topics_labeled.csv
Saved: ../data/processed/topics_labeled/event2_kherson_topics_labeled.csv
Saved: ../data/processed/topics_labeled/event3_stalemate_topics_labeled.csv
Saved: ../data/processed/topics_labeled/event4_trump_election_topics_labeled.csv
Saved: ../data/processed/topics_labeled/event5_white_house_meeting_topics_labeled.csv
Saved: ../data/processed/topics_labeled/topics_labeled_master.csv


,topic_id,top_words,label,event
0,0,"used, use, weapons, children, look, troops, se...","Battlefield updates (weapons, troops, visuals)",event1_kyiv
1,1,"shit, did, fuck, way, propaganda, world, right...",Propaganda + moral outrage / reactions,event1_kyiv
2,2,"pl, visa, ua, news, ready, help, polish, borde...",Poland border + visas + refugee movement,event1_kyiv
3,3,"sure, power, doing, better, mean, bad, point, ...",Power balance + escalation talk,event1_kyiv
4,4,"invasion, yes, countries, did, military, putin...",Invasion framing + military response,event1_kyiv
5,0,"way, shit, training, country, real, putin, lov...",Leadership + war framing (Putin / narratives),event2_kherson
6,1,"months, person, fucking, case, great, getting,...",War timeline + personal reactions,event2_kherson
7,2,"fuck, military, eject, insane, children, belie...",Military shock + civilian harm,event2_kherson
8,3,"doing, questions, sure, want, line, use, help,...",Help / logistics / questions,event2_kherson
9,4,"old, big, thanks, got, ve, sure, work, lol, gu...",General reactions + humor / chatter,event2_kherson
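
For the dashboard use case mentioned at the top, the master file can be turned into a per-event mapping from label to `topic_id`. This is a minimal sketch under stated assumptions: `dropdown_options` is a hypothetical helper, and the sample CSV below is a shortened stand-in for `topics_labeled_master.csv` with abbreviated `top_words`.

```python
import csv
import io

# Shortened stand-in for topics_labeled_master.csv (same column names).
sample_csv = """\
topic_id,top_words,label,event
0,"used, use, weapons","Battlefield updates (weapons, troops, visuals)",event1_kyiv
2,"pl, visa, ua",Poland border + visas + refugee movement,event1_kyiv
3,"doing, questions, sure",Help / logistics / questions,event2_kherson
"""


def dropdown_options(csv_file, event):
    """Return {label: topic_id} for one event from a master-style CSV file object."""
    reader = csv.DictReader(csv_file)
    return {
        row["label"]: int(row["topic_id"])
        for row in reader
        if row["event"] == event
    }


options = dropdown_options(io.StringIO(sample_csv), "event1_kyiv")
print(options)
```

The keys of `options` could then be passed to a dropdown widget (for example Streamlit's `st.selectbox`), with the selected label looked up in the dictionary to recover the `topic_id`.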
