# Algospeak Topic Modeling
This notebook contains some exploratory topic modeling for my algospeak project.  
It is a new script that continues the project.

In [1]:
import pandas as pd
import numpy as np
import sklearn
%pprint
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)

Pretty printing has been turned OFF


## Data reshaping
I'm only interested in the algospeak usage of terms, so I'll keep the posts with mention_code 'a' (algospeak) or 'm' (mention).

In [2]:
algo_df = pd.read_csv('algospeak_top_posts.csv')
algo_df.head()

Unnamed: 0,text,author,display_name,date,likes,quotes,replies,reposts,uri,query,mention_code
0,"""Orange Man Bad"" and ""Unalive the Boer"" are universal truths that all living beings are innately attuned to. Like a message from the Creator",maiamindel.bsky.social,Maia,2025-02-23T17:07:01.256Z,92,1,1,18,at://did:plc:ur77nun2q74loi34r2e6r43u/app.bsky.feed.post/3liud2ctyus2b,unalive,a
1,"please for the love of god can we NOT normalise using ""sw"" and ""unalive"" on this app? use real words.\nsex work.\nkill.\ncunt.\n\nthere's no algorithm here.",adrierising.bsky.social,adrie rose 🇯🇲,2025-02-25T03:25:40.832Z,83,0,1,10,at://did:plc:uajcdhsabyf4t7a3qgclm55x/app.bsky.feed.post/3lixw3icgps2u,unalive,m
2,Good morning to everyone except Asian Doll who openly admitted to trying to unalive Kash Doll over a damn name during BHM. 🤦🏿‍♀️,authorreneeb.bsky.social,ReneeB,2025-02-24T12:32:05.544Z,19,1,1,2,at://did:plc:ng6mdz23xa3jae4yr2crgocy/app.bsky.feed.post/3liwe5mro2k2k,unalive,a
3,"Why did I marry someone who picks the WORST FUCKING MOVIES\n\nMy god I either want to die of boredom, want to unalive myself, or am too confused to decide",vitaminpac1.bsky.social,Vitamin Bee 🐝,2025-02-24T01:18:42.171Z,22,0,7,1,at://did:plc:37nxbbnnrozlllwyfngsalya/app.bsky.feed.post/3liv6jj4j5s2y,unalive,a
4,postponing my unalive,baratiddys.bsky.social,Luis 🤍,2025-02-20T16:47:35.943Z,21,0,1,1,at://did:plc:dluxclbmnsh3bt6wyih5l6ds/app.bsky.feed.post/3limqkt5ctc24,unalive,a


In [3]:
am_df = algo_df[algo_df['mention_code'].isin(['a', 'm'])]
am_df[['text', 'mention_code']].sample(15)

Unnamed: 0,text,mention_code
48,"Damn, so much seggs on my TL, y'all are horny! \n\nKeep going. 👀",a
9,I watched something and it used the word 'unalive' which made me so mad that I was shaken back to lucidity,m
32,Lucille and Aurelia get a very interesting message while out >.>\n\nSpider Seggs for my Triple scoop patrons over on Patreon. Come join the fun :) www.patreon.com/c/cslucaris,a
46,Toji car seggs,a
39,"This is very true, we deliver high quality seggs at affordable prices no matter what the state of things is like",a
10,"wait, people actually say unalive? we’re all so cooked",m
51,"I've been reading this one for a while, there's some reeeeal good seggs in there.",a
29,"Two fishies, one seggs 🐟🐟",a
22,Had we known we would live this long we would’ve taken better care of ourselves😝😆\nI’ve gotta live long enough to see this shit stain unalive.,a
5,Love it when you're drinking and suddenly your body forgets the correct procedure for swallowing and you attempt to accidentally unalive yourself.,a


In [4]:
am_df.mention_code.value_counts()
# only down to 59 posts.... I think I'll need to get a lot more!

mention_code
a    49
m    10
Name: count, dtype: int64

In [5]:
#let's just see how topic modeling works with the 'unalive' portion

In [6]:
unalive_df = am_df[am_df['query'] == 'unalive']
unalive_docs = unalive_df.text
unalive_docs.head()

0                    "Orange Man Bad" and "Unalive the Boer" are universal truths that all living beings are innately attuned to. Like a message from the Creator
1    please for the love of god can we NOT normalise using "sw" and "unalive" on this app? use real words.\nsex work.\nkill.\ncunt.\n\nthere's no algorithm here.
2                                Good morning to everyone except Asian Doll who openly admitted to trying to unalive Kash Doll over a damn name during BHM. 🤦🏿‍♀️
3      Why did I marry someone who picks the WORST FUCKING MOVIES\n\nMy god I either want to die of boredom, want to unalive myself, or am too confused to decide
4                                                                                                                                           postponing my unalive
Name: text, dtype: object

In [7]:
len(unalive_docs)

29

## Topic Modeling Unalive (small version)
Since I'm only looking at 29 posts for this, I'm not going to put much stock in the results here. This is more of a proof of concept, looking at how topic modeling can be used to aid qualitative sociolinguistic work.

In [8]:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation

In [9]:
# Na-rae's function
def display_topics(model, feature_names, num_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print("Topic %d:" % (topic_idx))
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-num_top_words - 1:-1]]))


In [10]:
#let's just look at 3 topics since there are only 29 posts
num_feats = 1000
num_topics = 3

In [11]:
tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=num_feats, stop_words='english')
tfidf_docs = tfidf_vectorizer.fit_transform(unalive_docs)

# note: l1_ratio only matters when alpha_W / alpha_H are set; they default to no regularization
nmf_model = NMF(n_components=num_topics, random_state=1, l1_ratio=.5,
                init='nndsvd').fit(tfidf_docs)

display_topics(nmf_model, tfidf_vectorizer.get_feature_names_out(), 10)

Topic 0:
people say vacation wanna work wait cooked actually don necessary
Topic 1:
app love physician snuffies assisted god sw normalise cunt real
Topic 2:
saying self delete ideation suicidal word ve helpful muted euphemisms


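Topic 2 is very clear here: it comes from the couple of posts doing meta-discourse about the word. Topic 1 is driven by individual posts rather than a theme. Most of its terms (app, love, god, sw, normalise, cunt, real) come from the single post asking people not to normalise "sw" and "unalive", while the rest (physician, assisted, snuffies) do not appear in that post, so they must come from at least one other post, which mentions snuffies.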
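
One way to check which posts drive a given topic is to look at the document-topic weights that `transform` returns, rather than reading the top words alone. The block below is a minimal sketch of that idea. The posts in `toy_docs` are made up for illustration and stand in for `unalive_docs`; nothing here is output from the real data.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy posts standing in for unalive_docs
toy_docs = [
    "people say unalive instead of kill",
    "why do people say unalive now",
    "creators use the word on the app",
    "the app has no algorithm so use real words",
    "that word is a euphemism for suicidal ideation",
    "muted the word because euphemisms are not helpful",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(toy_docs)

model = NMF(n_components=3, random_state=1, init="nndsvd").fit(tfidf)

# Rows are posts, columns are topics; a larger value means a stronger loading
doc_topic = model.transform(tfidf)

# Index of the strongest topic for each post
dominant = doc_topic.argmax(axis=1)

for topic_idx in range(doc_topic.shape[1]):
    members = np.where(dominant == topic_idx)[0]
    print(f"Topic {topic_idx}: posts {members.tolist()}")
```

If a topic turns out to have only one member post, its top words are really just that post's vocabulary, which is worth knowing before reading anything into it.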

In [15]:
# topic 2 is reflected here (this cell ran after In [13], so the output is from unalive_top_posts.csv)
unalive_df[unalive_df['mention_code'] == 'm']

Unnamed: 0,text,author,display_name,date,likes,quotes,replies,reposts,uri,url,reply_to,reply_to_uri,reply_to_url,query,mention_code
4,"That happened the moment we let these companies forced creators and content-makers to use the word ""unalive"". FUCK THAT! People die, people get killed, people commit suicide. Use words. Don't pander to that nonsense.",jayej330.bsky.social,"Jayleigh ""Jaye"" Jimenez",2025-04-14T08:57:07.765Z,29,0,1,0,at://did:plc:r6coggh5qqmqrqvrpitb7khu/app.bsky.feed.post/3lmr6ye3esc2m,https://bsky.app/profile/did:plc:r6coggh5qqmqrqvrpitb7khu/post/3lmr6ye3esc2m,Yes,at://did:plc:uqppyrcon566pkrszusjonav/app.bsky.feed.post/3lmr67dl25c2s,https://bsky.app/profile/did:plc:uqppyrcon566pkrszusjonav/post/3lmr67dl25c2s,unalive,m
8,">the minecraft movie features a character unironically saying ""unalive""",voicesbyzane.bsky.social,Zane Schacht - Voice Goblin,2025-04-09T13:46:56.708Z,294,1,12,18,at://did:plc:wiw2flcb762ybvhg7gj3bf2k/app.bsky.feed.post/3lmf4ty6oxc2a,https://bsky.app/profile/did:plc:wiw2flcb762ybvhg7gj3bf2k/post/3lmf4ty6oxc2a,No,,,unalive,m
13,"I'm pretty sure someone says ""unalive"" instead of ""kill"" in the Minecraft movies without a hint of irony. \n\nWhat happened.",finlopes.bgay.social,Finlopes97,2025-04-14T08:51:45.000Z,6,0,2,0,at://did:plc:dzb433oi6jkcqcbgxit7s4vl/app.bsky.feed.post/3lmr6oqbjic2v,https://bsky.app/profile/did:plc:dzb433oi6jkcqcbgxit7s4vl/post/3lmr6oqbjic2v,Yes,at://did:plc:uqppyrcon566pkrszusjonav/app.bsky.feed.post/3lmr67dl25c2s,https://bsky.app/profile/did:plc:uqppyrcon566pkrszusjonav/post/3lmr67dl25c2s,unalive,m
14,"I find all of this completely ludicrous. ""Unalive"" - Jesus fucking Christ. It crops up all the time in autogenerated subtitles and it's driving me mental. We're devolving into fucking baby talk trying to adhere to guidelines we can't see and that may not exist.",travisjj.bsky.social,Travis Johnson,2025-04-14T14:56:22.163Z,11,0,1,2,at://did:plc:5pwzpepcw2qcf4xi5bgrkult/app.bsky.feed.post/3lmrt2pxemk2g,https://bsky.app/profile/did:plc:5pwzpepcw2qcf4xi5bgrkult/post/3lmrt2pxemk2g,No,,,unalive,m
15,"remembering that dipshit american girl on tiktok crying at a canadian going ""YOU CANADIANS ARE GOING TO UNALIVE MILLIONS OF PEOPLE BY CUTTING OFF ENERGY TO MICHIGAN AND PENNSYLVANIA"" lol",groverhaustenbosch.bsky.social,Groverhuis,2025-04-11T23:07:28.846Z,15,0,2,0,at://did:plc:jcdj4jhpz5c3x3kad3u6npic/app.bsky.feed.post/3lml5456x7k2j,https://bsky.app/profile/did:plc:jcdj4jhpz5c3x3kad3u6npic/post/3lml5456x7k2j,Yes,at://did:plc:44yddrxlbkmufwcl6yajflro/app.bsky.feed.post/3lmkwltc4fk2a,https://bsky.app/profile/did:plc:44yddrxlbkmufwcl6yajflro/post/3lmkwltc4fk2a,unalive,m
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
179,"gen z jack garland: ""I'm here to unalive Chaos.""",jgntaken.bsky.social,you aer dadonpachi,2025-03-17T22:29:25.307Z,19,0,1,1,at://did:plc:ctjsenqhkkcwa3ghllm32xup/app.bsky.feed.post/3lkm7d2waks2x,https://bsky.app/profile/did:plc:ctjsenqhkkcwa3ghllm32xup/post/3lkm7d2waks2x,No,,,unalive,m
181,"Millennial Moment: I hate that tiktok popularized ""unalive"" when we already had a perfectly fine, algorithm-friendly euphemism: off/offed. Less grammatical brainrot, and makes ya sound like an old-timey gangster!\n\nExample: ""Jane offed her abusive bf.""",missdiedra.bsky.social,Vishanti,2025-03-17T22:25:17.819Z,19,0,4,0,at://did:plc:l23szrmmv4je2bbrd32mghkg/app.bsky.feed.post/3lkm73ove6k2e,https://bsky.app/profile/did:plc:l23szrmmv4je2bbrd32mghkg/post/3lkm73ove6k2e,No,,,unalive,m
182,"So to avoid YouTube demonetizing videos, some content creators have to use certain terms for words. (pedophile is PDF file, suicide is unalive etc etc)\n\nReese Waters calling Nazis ""Yahtzees"" is fucking brilliant. Lol.",lif3asmaj.bsky.social,Maj,2025-03-17T21:21:21.794Z,13,0,3,2,at://did:plc:rj7sphb3g7r2bbdclow4hdut/app.bsky.feed.post/3lkm3jel5zc2d,https://bsky.app/profile/did:plc:rj7sphb3g7r2bbdclow4hdut/post/3lkm3jel5zc2d,No,,,unalive,m
186,"*After careful consideration,\n\n*We've decided to postpone our divorce,\n\n*Just until we kill your ass\n\n*Hey! Language!\n\n*Sorry, just until we UNALIVE your ass\n\nThe divorced rulers that each control 50% of their world, \n\nMeet The Ringmistress and The Ringmaster!\n\n#deltarune\n#deltaruneoc",eggstirfry.bsky.social,Eggstirfry,2025-03-17T20:12:56.712Z,10,0,0,3,at://did:plc:f7zzilh7gloj65ni5ipbnnmv/app.bsky.feed.post/3lklxozolgk2i,https://bsky.app/profile/did:plc:f7zzilh7gloj65ni5ipbnnmv/post/3lklxozolgk2i,No,,,unalive,m


In [12]:
# trying the LDA model now
tf_vectorizer = CountVectorizer(max_df=0.8, max_features=num_feats, stop_words='english')
tf_docs = tf_vectorizer.fit_transform(unalive_docs)

lda_model = LatentDirichletAllocation(n_components=num_topics, max_iter=5, learning_method='online',
                                      learning_offset=50., random_state=0).fit(tf_docs)

display_topics(lda_model, tf_vectorizer.get_feature_names_out(), 10)

Topic 0:
people saying la bastante ideation decir want el doll women
Topic 1:
work algorithm people best word need week using report offers
Topic 2:
don live like long president ve say taken plan kentucky


Topic 0 obviously draws heavily on the one Spanish post in my sample. The other two topics are less focused than the NMF ones, so NMF seems to work a bit better here.

## The same thing but on the Unalive Set

In [13]:
# note: this overwrites the earlier unalive_df (the 29-post 'unalive' subset of am_df)
unalive_df = pd.read_csv('unalive_top_posts.csv')

In [14]:
unalive_df.head()

Unnamed: 0,text,author,display_name,date,likes,quotes,replies,reposts,uri,url,reply_to,reply_to_uri,reply_to_url,query,mention_code
0,unalive,ninochuu.zip,nino,2025-04-12T20:19:42.329Z,154,0,0,29,at://did:plc:75rnvxji6taq2xyxq63ejhmm/app.bsky.feed.post/3lmne72jmws2d,https://bsky.app/profile/did:plc:75rnvxji6taq2xyxq63ejhmm/post/3lmne72jmws2d,No,,,unalive,o
1,We should unalive Caesar?,doktorslek.bsky.social,Doktor Slek 🦘☭,2025-04-15T02:50:06.393Z,10,0,0,0,at://did:plc:g7cu7736qmemcopvjip74g3b/app.bsky.feed.post/3lmt2wyd5q22a,https://bsky.app/profile/did:plc:g7cu7736qmemcopvjip74g3b/post/3lmt2wyd5q22a,Yes,at://did:plc:awzzrtrcrvpnxi3ph2sbhxwv/app.bsky.feed.post/3lmt2lhdlwk2l,https://bsky.app/profile/did:plc:awzzrtrcrvpnxi3ph2sbhxwv/post/3lmt2lhdlwk2l,unalive,a
2,You ever chill so hard that you appear unalive??,laminatedliss.bsky.social,Churro’s Mom,2025-04-11T15:57:19.608Z,40,1,7,1,at://did:plc:kvyauj6dn35wre6meqyg6ync/app.bsky.feed.post/3lmkf2xlja22o,https://bsky.app/profile/did:plc:kvyauj6dn35wre6meqyg6ync/post/3lmkf2xlja22o,No,,,unalive,a
3,Musk can unalive any of us and cut off all means of support \nIf they can unalive these 6000,iputadollarin.bsky.social,Iputadollariniwinacar,2025-04-12T23:12:34.756Z,7,0,2,0,at://did:plc:icoabg5urmotmxr3cko3qfsi/app.bsky.feed.post/3lmnnu6fchs26,https://bsky.app/profile/did:plc:icoabg5urmotmxr3cko3qfsi/post/3lmnnu6fchs26,Yes,at://did:plc:q3bbdtxch45wvfxrxpblphxn/app.bsky.feed.post/3lmnh3qmq5c2h,https://bsky.app/profile/did:plc:q3bbdtxch45wvfxrxpblphxn/post/3lmnh3qmq5c2h,unalive,a
4,"That happened the moment we let these companies forced creators and content-makers to use the word ""unalive"". FUCK THAT! People die, people get killed, people commit suicide. Use words. Don't pander to that nonsense.",jayej330.bsky.social,"Jayleigh ""Jaye"" Jimenez",2025-04-14T08:57:07.765Z,29,0,1,0,at://did:plc:r6coggh5qqmqrqvrpitb7khu/app.bsky.feed.post/3lmr6ye3esc2m,https://bsky.app/profile/did:plc:r6coggh5qqmqrqvrpitb7khu/post/3lmr6ye3esc2m,Yes,at://did:plc:uqppyrcon566pkrszusjonav/app.bsky.feed.post/3lmr67dl25c2s,https://bsky.app/profile/did:plc:uqppyrcon566pkrszusjonav/post/3lmr67dl25c2s,unalive,m


In [16]:
un_am_df = unalive_df[unalive_df['mention_code'].isin(['a', 'm'])]
un_am_df[['text', 'mention_code']].sample(10)

Unnamed: 0,text,mention_code
171,"i know this is a very high and mighty thing to say but if i found myself in a position where i was saying ""unalive"" and ""grape"" as part of my job, i would get another job",m
114,"Or harm or even unalive them \n\nAnd we can't count on any recourse for that, or any illegal injustice they are perpetrating",a
151,Can’t someone - anyone - please do the right thing??\n\nUnalive this fucker already.,a
79,"on a ""movies are over"" downer note, we took the kiddo to see ""minecraft"" and i wanted to unalive myself. and i saw it with an audience where there were MULTIPLE applause breaks, which were arcane to the point of questioning my own sanity",a
96,TRUMP most hated man in history right now; hated maybe mer than Hitler. At least Hitler had the decency to unalive himself!!!!,a
65,"Just had to close a video a minute in because the narrator kept censoring the word ""die"" with ""unalive"" and then said ""costed""",m
133,"In this episode of #Hitman Ball Z: Can Agent 47 unalive his next target?\n\nYes, the answer is yes.\n\nHe can do it on my channel:\nwww.twitch.tv/Simon_Novak\n\n#Vtuber #ENVtuber #EyeSeeYou",a
52,Rebel Girl by Bikini Unalive\nMr. Brightside by the Unalivers\nInstitutionalized by Sewer Slide All Tendencies\nC.R.E.A.M feat. Ghostface Unalivah\nMore Human Than Human by Palm-Colored Zombie\nUnhinged Unaliver by The Talking Heads,m
9,Why do tourists insist on picking up and handling the wildlife. Do that with the wrong critter in Australia and it’ll be the end of you. They have more shit that’ll unalive you than anywhere else in earth.,a
144,My fav drivers are always the looks like a Cinnabon but could unalive you type,a


In [17]:
un_am_df.mention_code.value_counts()

mention_code
a    123
m     64
Name: count, dtype: int64

In [18]:
un_docs = un_am_df.text
un_docs.head()

1                                                                                                                                                                                                   We should unalive Caesar?
2                                                                                                                                                                            You ever chill so hard that you appear unalive??
3                                                                                                                                Musk can unalive any of us and cut off all means of support \nIf they can unalive these 6000
4    That happened the moment we let these companies forced creators and content-makers to use the word "unalive". FUCK THAT! People die, people get killed, people commit suicide. Use words. Don't pander to that nonsense.
5                                                                                                               

In [21]:
num_feats = 1000
num_topics = 10

tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=num_feats, stop_words='english')
tfidf_docs = tfidf_vectorizer.fit_transform(un_docs)

nmf_model = NMF(n_components=num_topics, random_state=1, l1_ratio=.5, 
                init='nndsvd').fit(tfidf_docs)

display_topics(nmf_model, tfidf_vectorizer.get_feature_names_out(), 10)

Topic 0:
people think tell way allowed trans money twisted using just
Topic 1:
character chungus named general said movie minecraft word say tells
Topic 2:
don time wish want care reason government just salvador el
Topic 3:
gonna sweaty unaliving minecraft pew say straight movie doesn hear
Topic 4:
going somebody make hard let life ups constant pop regret
Topic 5:
like say things shit kill old saying seggs need self
Topic 6:
really know concerned sorry live actually pretty stuff sure happened
Topic 7:
ll hope wrong wildlife tourists picking critter shit shits tries
Topic 8:
use tiktok words content creators video fuck youtube hitler fucker
Topic 9:
hate fucking slide sewer sounds childish forever term completely trying




In [22]:
tf_vectorizer = CountVectorizer(max_df=0.8, max_features=num_feats, stop_words='english')
tf_docs = tf_vectorizer.fit_transform(un_docs)

lda_model = LatentDirichletAllocation(n_components=num_topics, max_iter=5, learning_method='online',
                                      learning_offset=50., random_state=0).fit(tf_docs)

display_topics(lda_model, tf_vectorizer.get_feature_names_out(), 10)

Topic 0:
meme fucking wait joke longer gen yes assume jack literally
Topic 1:
people contract word fucking social use does suicide saw creators
Topic 2:
say hate know don just government makes ll wish like
Topic 3:
make killed night sharing charles cuz doesn stage signing shots
Topic 4:
going people black like think term make pence heard just
Topic 5:
like shit gonna man led old seggs case child saying
Topic 6:
tiktok use like video youtube comments places monetizing creators don
Topic 7:
tiktok fuck way bissi da fucker fat video trying people
Topic 8:
time trying don people like need minecraft planet plan raped
Topic 9:
going just en la kill don ass folkway imo por


### Splitting up 'a' and 'm'?

In [23]:
a_df = unalive_df[unalive_df['mention_code'] == 'a']
a_docs = a_df.text
a_docs.head()

1                                                                       We should unalive Caesar?
2                                                You ever chill so hard that you appear unalive??
3    Musk can unalive any of us and cut off all means of support \nIf they can unalive these 6000
5                                                                              I’d unalive myself
6                                                        I’m really concerned that he is unalive.
Name: text, dtype: object

In [25]:
num_feats = 1000
num_topics = 7

tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=num_feats, stop_words='english')
tfidf_docs = tfidf_vectorizer.fit_transform(a_docs)

nmf_model = NMF(n_components=num_topics, random_state=1, l1_ratio=.5, 
                init='nndsvd').fit(tfidf_docs)

display_topics(nmf_model, tfidf_vectorizer.get_feature_names_out(), 10)

Topic 0:
people think trying like allowed tell hate trans fucking legally
Topic 1:
going somebody let reason trump life good hard make nootz
Topic 2:
hope ll literally book deserves hounded throw social love life
Topic 3:
fucker fuck right thing fat wants try fact bring court
Topic 4:
really concerned know sorry stuff moment actually deflect obstruct course
Topic 5:
wish don want care happens come just time government reason
Topic 6:
gonna unaliving sweaty highkey doesn minecraft believing hear blooded hot




In [26]:
m_df = unalive_df[unalive_df['mention_code'] == 'm']
m_docs = m_df.text
m_docs.head()

4                                                  That happened the moment we let these companies forced creators and content-makers to use the word "unalive". FUCK THAT! People die, people get killed, people commit suicide. Use words. Don't pander to that nonsense.
8                                                                                                                                                                                                   >the minecraft movie features a character unironically saying "unalive"
13                                                                                                                                             I'm pretty sure someone says "unalive" instead of "kill" in the Minecraft movies without a hint of irony. \n\nWhat happened.
14    I find all of this completely ludicrous. "Unalive" - Jesus fucking Christ. It crops up all the time in autogenerated subtitles and it's driving me mental. We're devolving into fucking baby t

In [29]:
num_feats = 1000
num_topics = 7

tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=num_feats, stop_words='english')
tfidf_docs = tfidf_vectorizer.fit_transform(m_docs)

nmf_model = NMF(n_components=num_topics, random_state=1, l1_ratio=.5, 
                init='nndsvd').fit(tfidf_docs)

display_topics(nmf_model, tfidf_vectorizer.get_feature_names_out(), 10)

Topic 0:
chungus character named general said say tells called word instead
Topic 1:
like say shit things pew murder kill people old self
Topic 2:
minecraft movie saying unironically features character sure movies gonna im
Topic 3:
use words content people creators tiktok youtube video suicide fucking
Topic 4:
hate slide sewer sounds childish forever fucking term im straight
Topic 5:
just word bikini saw said new dropped kill euphemism ass
Topic 6:
don live know telling omg keeps dude write censored til
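
Fitting separate models for 'a' and 'm' gives two topic sets that can't be compared directly. An alternative is to fit one model on both groups and then average each topic's weight by mention_code. The block below is a sketch of that approach under stated assumptions: `toy_df` is invented data with the same two columns, and the two-topic setting is arbitrary.

```python
import pandas as pd
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data with the same columns as un_am_df
toy_df = pd.DataFrame({
    "text": [
        "i want to unalive myself after that movie",
        "hope the spider does not unalive me",
        "that critter could unalive you",
        "people say unalive instead of kill",
        "creators use the word unalive on tiktok",
        "i hate the word unalive",
    ],
    "mention_code": ["a", "a", "a", "m", "m", "m"],
})

tfidf = TfidfVectorizer(stop_words="english").fit_transform(toy_df["text"])
doc_topic = NMF(n_components=2, random_state=1, init="nndsvd").fit_transform(tfidf)

# One column per topic, one row per post, aligned with toy_df's index
topic_cols = [f"topic_{i}" for i in range(doc_topic.shape[1])]
weights = pd.DataFrame(doc_topic, columns=topic_cols, index=toy_df.index)

# Mean topic weight within each mention_code group
by_code = weights.groupby(toy_df["mention_code"]).mean()
print(by_code)
```

A topic whose mean weight is much higher for 'm' than for 'a' would be a candidate meta-discourse topic, though with samples this small it is only suggestive.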
