# Dataset Preprocessing (Posts)


### Let's take a look at dataset

In [1]:
import pandas as pd
import numpy as np 

In [2]:
dfp = pd.read_excel("posts.xlsx")

In [3]:
dfp.head(10)

Unnamed: 0,post_id,subreddit,title,selftext,upvote_ratio,ups,downs,score,created_utc,created_date,Category
0,18dk2i9,Palestine,Ethnic Cleansing,Israel in pursuing ethnic cleansing by means o...,0.8,6,0,6,1702031694,2023-12-08 10:34:54,V/HC
1,18dk3t6,Palestine,"Scoop: Egypt warned Israel of ""a rupture"" in r...",,0.85,5,0,5,1702031846,2023-12-08 10:37:26,P/D
2,18dk3kc,Palestine,"Dozens of Palestinians captured by IDF, stripp...",,0.77,7,0,7,1702031814,2023-12-08 10:36:54,V/HC
3,18dit8x,Palestine,"Aryeh King, the Deputy Mayor of Jerusalem's Is...",,0.97,30,0,30,1702026033,2023-12-08 09:00:33,P/D
4,18dirx9,Palestine,This was a question within a test written by P...,,0.94,16,0,16,1702025860,2023-12-08 08:57:40,SM/PO
5,18diquq,Palestine,Gaza Writes Back is a famous book written in 2...,,0.93,36,0,36,1702025714,2023-12-08 08:55:14,SM/PO
6,18dijbe,Palestine,The United States’ Energy envoy has reiterated...,,0.8,3,0,3,1702024755,2023-12-08 08:39:15,P/D
7,18dgmjy,Palestine,I'm looking for more information about the Cen...,"On December 6th, [Israeli forces destroyed cen...",0.94,14,0,14,1702016749,2023-12-08 06:25:49,SM/PO
8,18dgtc2,Palestine,Reuters: Palestinian Authority working with U....,,0.96,49,0,49,1702017501,2023-12-08 06:38:21,P/D
9,18dgisj,Palestine,Families Of Israeli Hostages RAGE At Netanyahu...,,0.95,18,0,18,1702016347,2023-12-08 06:19:07,V/HC


In [4]:
dfp.shape

(7029, 11)

In [5]:
dfp.columns

Index(['post_id', 'subreddit', 'title', 'selftext', 'upvote_ratio', 'ups',
       'downs', 'score', 'created_utc', 'created_date', 'Category'],
      dtype='object')

In [6]:
#getting rid of the unnecessary columns 
dfp = dfp.drop(["Category", "ups" , "downs" , "created_utc" , "selftext"] , axis=1)

In [7]:
#check if there are duplicates
dfp.title.duplicated().sum()

1206

In [8]:
dfp.drop_duplicates(subset='title', keep='first', inplace=True)

In [9]:
dfp.shape

(5823, 6)

In [10]:
#no null values
print(dfp.isnull().sum())

post_id         0
subreddit       0
title           0
upvote_ratio    0
score           0
created_date    0
dtype: int64


In [11]:
dfp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5823 entries, 0 to 6818
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   post_id       5823 non-null   object 
 1   subreddit     5823 non-null   object 
 2   title         5823 non-null   object 
 3   upvote_ratio  5823 non-null   float64
 4   score         5823 non-null   int64  
 5   created_date  5823 non-null   object 
dtypes: float64(1), int64(1), object(4)
memory usage: 318.4+ KB


## Part of text preprocessing to label the maximum of the Data  

## Translate the non english titles 


In [15]:
from langdetect import detect
from googletrans import Translator

def detect_language_and_translate(text):
    try:
        lang = detect(text)
        if lang != 'en':
            translator = Translator()
            translation = translator.translate(text, dest='en')
            print(f"Original: {text} | Translated: {translation.text}")
            return translation.text
        else:
            return text
    except:
        return text



In [16]:
# Assuming dfp is a DataFrame containing the Reddit posts
# Replace 'title' with the actual column name containing the titles in your DataFrame
dfp['title'] = dfp['title'].apply(detect_language_and_translate)



Original: Weaponizing Anti-Semitism Allegations- BadEmpanada | Translated: Weaponizing Anti-Semitism Allegations- BadEmpanada
Original: Does Kraft Heinz have involvement or support Israel? | Translated: Does Kraft Heinz have involvement or support Israel?
Original: Israel drops leaflets quoting scripture | Translated: Israel drops leaflets quoting scripture
Original: Zionism IS antisemitism (antijudaism). | Translated: Zionism IS antisemitism (anti judaism).
Original: Israel's goal  | Translated: Israel's goal
Original: Zionism IS Antisemetism! | Translated: Zionism IS Antisemitism!
Original: Israel’s original plan | Translated: Israel’s original plan
Original: Zondag in Den Haag | Translated: Sunday in The Hague
Original: براءة ذمة  | Translated: Clearance
Original: Palestine, mon amour : RIP Bonanno | Translated: Palestine, mon amour : RIP Bonanno
Original: Never forget | Translated: Never forget
Original: Sankara knew | Translated: Sankara knew
Original: Day 60: 16,248 martyrs | Tra

Original: About cogniti e dissonance | Translated: About the known e dissonance
Original: A Post Hamas discussion | Translated: A Post Hamas discussion
Original: Leo Varadkar comments | Translated: Leo Varadkar Comments
Original: Israel creation | Translated: Israel creation
Original: More religious proofs | Translated: More religious proofs
Original: Hospital debate | Translated: Hospital debate
Original: Disturbing Incident in Gaza - Israeli Sniper Targets Palestinian | Translated: Disturbing Incident in Gaza - Israeli Sniper Targets Palestinian
Original: Muslims killing Muslims is not a big deal. | Translated: Muslims killing Muslims is not a big deal.
Original: Palestine &amp; Israeli hostage swap. | Translated: Palestine &amp; Israeli hostage swap.
Original: Tunnels in Gaza | Translated: Tunnels in Gaza
Original: Best book on Israel/Palestine? | Translated: Best book on Israel/Palestine?
Original: Arabs are from Arabia | Translated: Arabs are from Arabia
Original: Settlers do not 

Original: حسبي الله ونعم الوكيل | Translated: Allah is my suffice, and the best deputy
Original: Real | Translated: Real
Original: TikTok · Helwa | Translated: Cut · Sweet
Original: Stabbing for kids : Palestinian girl gives demo | Translated: Stabbing for kids : Palestinian girl gives demo
Original: 🇮🇷 Iran be like | Translated: 🇮🇷 Iran be like
Original: TikTok · Rae | Translated: Tiktok · Rae
Original: Do you support Hamaz free Palestine ? | Translated: Do you support Hamas free Palestine ?
Original: does Palestine deserve support?? | Translated: does Palestine deserve support??
Original: Violence begets more violence? | Translated: Violence begets more violence?
Original: 🚨JUST IN: YEMEN'S HOUTHIS RELEASE VIDEOS OF MISSILE AND DRONE ATTACKS ON ISRAEL A DAY AFTER DECLARING WAR | Translated: 🚨JUST IN: YEMEN'S HOUTHIS RELEASE VIDEOS OF MISSILE AND DRONE ATTACKS ON ISRAEL A DAY AFTER DECLARING WAR
Original: THE TRUTH | Translated: THE TRUTH
Original: Nazi Israel | Translated: Naza Israe

Original: وينك يا إنسانية - صالح الجعفراوي | Video Official Music - Saleh Aljafarawi | Translated: Where are you, Humanity - Saleh Al -JaafrawiVideo Official Music - Saleh Aljafarawi
Original: سلام لغزة - Salute to Gaza | Translated: Peace to Gaza - Salute to Gaza
Original: زعيمة إسبانية: كيف نؤمن بالقانون الدولي ولا نحاكم مجـ.ـرم الحـ.ـرب #نتنياهو؟!

#غزة #اسرائيل  | Translated: Spanish leader: How do we believe in international law and we do not judge the guy.

#Gaza #Israel
Original: Why does Israel keep the bodies of Palestinians? | Translated: Why does Israel keep the bodies of Palestinians?
Original: We Will Be Free | Translated: We Will Be Free
Original: تتميز التحركات الداعمة من ناحية تعدد الخلفيات كمشاركة منظّمة لمجموعات يهودية مناهضة للصهيونية، ومن ناحية خطاب يركّز على دور الرأسمالية في الاحتلال الصهيوني، جوهر إسرائيل الاستعماري الاستيطاني، وحل الدولة الديمقراطية الواحدة. رابط المقال في التعليقات | Translated: The supportive movements are distinguished in terms of multiple ba

Original: Israeli rabbi justifies killing of babies | Translated: Israeli rabbi justifies killing of babies
Original: No More Excuses For Gaza Silence | Translated: No More Excuses For Gaza Silence
Original: Protest in Switzerland: Not neutral on genocide 🇨🇭 | Translated: Protest in Switzerland: Not neutral on genocide 🇨🇭
Original: Israeli Police injures ‘Abdallah Abu Rahma during a protest. | Translated: Israeli Police injures ‘Abdallah Abu Rahma during a protest.
Original: Nazi vs Zionist quiz | Translated: Nazi vs Zionist quiz
Original: وفي القلب... غزة | Translated: And in the heart ... Gaza
Original: Ireland: pro-Palestine protest | Translated: Ireland: pro-Palestine protest
Original: Beautiful Palestinian dresses/thobes 🇵🇸 | Translated: Beautiful Palestinian dresses/thobes 🇵🇸
Original: "Conquer and Divide" | Translated: "Conquer and Divide"
Original: "سد و فرق" | Translated: "Dam and Teams"
Original: A cool guide to genocide | Translated: A cool guide to genocide
Original: Genuin

In [12]:
#keep only posts in english 
from langdetect import detect

def detect_language_and_filter(text):
    try:
        lang = detect(text)
        return lang == 'en'
    except:
        return False



In [17]:
# Filter posts if needed (optional)
dfp_f = dfp[dfp['title'].apply(detect_language_and_filter)]



In [19]:
print("Dataset shape after filtration: " , dfp_f.shape )

Dataset shape after filtration:  (5377, 6)


### Tokenize , remove Stop words , links and symbols 

In [20]:
#neccessary libraries for text processing 
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import string
import re
from nltk.stem import WordNetLemmatizer

#nltk.download('stopwords')
#nltk.download('punkt')
stop_words = set(stopwords.words('english'))
stop_words.update(set(string.punctuation))
lemmatizer = WordNetLemmatizer()




In [21]:
import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Download NLTK resources (uncomment if not already downloaded)
# import nltk
# nltk.download('punkt')
# nltk.download('stopwords')
# nltk.download('wordnet')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def process_text(text, words_to_keep_unchanged=None):
    if words_to_keep_unchanged is None:
        words_to_keep_unchanged = ["us" ]

    if isinstance(text, str):
        # Remove URLs
        text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)
        # Tokenize the input text into words
        words = word_tokenize(text)
        # Remove stop words, make words lowercase, and apply lemmatization
        lemmatized_words = [
            word if word.lower() in words_to_keep_unchanged else lemmatizer.lemmatize(word.lower())
            for word in words
            if word.lower() not in stop_words and word.isalpha()
        ]
        lemmatized_text = ' '.join(lemmatized_words)
        return lemmatized_text
    else:
        return text


In [22]:
dfp_f['title'] = dfp_f['title'].apply(process_text)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfp_f['title'] = dfp_f['title'].apply(process_text)


In [23]:
#titles are in lower case , don't contain stop words and punctuation 
for text in dfp_f['title']: 
    print(text, "\n")

ethnic cleansing 

scoop egypt warned israel rupture relation palestinian flee sinai 

dozen palestinian captured idf stripped paraded gaza 

aryeh king deputy mayor jerusalem israeli municipality express wish done displaced palestinian civilian rounded taken un shelter yesterday northern gaza unknown location 

question within test written refaat al areer student islamic university gaza israeli force sniped journalist yaser murtaja israel killed yaser year ago killed refaat today 

gaza writes back famous book written refaat al areer palestinian writer poet professor activist killed yesterday israeli airstrike targeted home gaza strip 

united state energy envoy reiterated country still hopeful regard normalisation talk israel saudi arabia despite current fallout israel invasion gaza strip atrocity palestinian 

looking information central archive gaza city 

reuters palestinian authority working postwar plan gaza 

family israeli hostage rage netanyahu leaked audio 

american funded 

grandson holocaust survivor fled germany happening gaza genocide 

everyone hamas 

doubt israel article nation calawag 

israel systemic oppression unraveling colonialist genocidal apartheid state 

canada ctv news fired yara jamal palestinian working ctv atlantic region 

western medium war criminal 

king hussein allenby bridge border crossing 

reddit pro israel 

graft push bangladesh 

left gaza zoo 

right 

turkey erdogan call netanyahu gaza 

ceasefire call growing israeli military launch pr blitz capitol hill 

really happened october 

sydney theatre company fire three actor staged gesture palestinian solidarity 

erdogan tell un chief israel must tried international court gaza crime 

separated palestinian use truce reunite loved one 

really serious story father told daughter died thought like death better people gaza month resistance released daughter well condition 

released palestinian child recount torture detention finger breaking amp starvation 

washington post rev


son hamas mosab hassan yousef make rather extreme statement 

merging palestine israel granting palestinian full israeli citizenship 

main difference israeli left right far palestinian concerned 

geoform negev 

name palestine 

people forget 

many deny early zionist literally colonized palestine 

bibi say stuff 

people go apesh israel palestine 

history meaning zionism explained 

matter pathetic evil people 

losing friendship 

shany ecstasy denial palestine war 

need opinion new peace symbol 

arab changed mind conflict 

settelments land theft fall palestinian authority 

israel go way ethnic cleaning 

israel achieved primary military objective 

voice gaza addressing claim israel war radicalize population 

continue take hostage 

arab leader ask palestinian leave home 

think antisemitism 

video hamas shooting killing year old boy 

quote uk parliamentary debate palestinian education 

eliminating hamas realistic short term goal 

people still denial 

discussion israe

know would turn like voted hamas 

whataboutism wrong 

way palestinian behave ironically reminds shylock merchant venice 

suspect important hamas leader may moved via ambulance sometimes 

instagram account follow 

palestinian gaza interest destroy hamas well 

come point ignorance excuse racism 

think likely died health condition 

actually interested know arab perspective israel throughout history 

think last weks tonight show week 

israel raid hospital litmus test 

level education opinion 

people keep chanting provenly wrong accusation israel 

aside political issue would someone randomly anti semitic 

idf say order evacuation khan younis 

jewish palestinian want negociate peace treaty 

israel right exist 

hamas plan noticed 

please least make effort learn history 

question anyone want freepalestine hospital 

israeli military order occupation palestine 

term martyr used 

claim land 

someone explain going 

point sorry 

black humour totally accurate 

training unbe

cut sweet 

bomb al azhar university gaza bombing hospital amp school university bombed strike 

support israel company organization donating million 

beirut child named beirut explosion killed idf gaza 

country recognize state israel israel geopolitics 

company brand support israel surprising revelation israel 

corporate news medium amp war 

israel agree commentator weigh washington post 

gaza conflict israel hamas gaza middleeastconflict ceasefire peace middleeast gaza 

anyone else completely equivocal conflict 

country publicly supporting palestine continent ove 

country supporting israel israel short 

day civilian killed 

dont understand war 

life brutally taken ongoing gaza war making horrific genocide 

nadine gaza question sanity amid onslaught 

israeli occupation force widely use internationally prohibited 

palestinian man search child amid israeli bombardment gaza 

israel alternative project suez canal 

got called hitler saying war primitive support anyone us s

proposal achieving humanitarian ceasefire gaza 

finally understood hamas named operation al aqsa flood 

clothing brand pig 

ben shapiro think palestinian would expelled jesus agree 

zionism something understand 

iranian diaspora racist agianst palestinian much 

fall bogus claim antisemitism taunt antisemitism used silence criticism israel including war crime 

arab putting pressure government end relation israel 


israeli pretending help palestinian pr photoshoots murder right 

always plan 

japan removed hamas terrorist list thought 

guy right 

big picture 

w weeknd 

lie israeli propaganda 

israeli considered human shield per israeli logic 

beware try different way argue hasbara agent 

sudan forgotten 

major indian trade union call boycott israel 

ilan pape israeli historian speaks 

using argument people recently 

thought treatment yemeni jew israel 

hmm tell gazans move south 

true 

excerpt black paper jewish agency zionist terrorism memorandum un march 

zionis


idf soldier confessing killing kid 

israel starvation used weapon war gaza 

brigade clashing iof force gaza city 

ongoing ethnic cleansing 

gaza blockade started 

uber driver massive w 

israel kill civilian regularly 

uaw president fain dsa congresspeople hold press conference falsely posture opponent gaza genocide 

gaza emergency worker break tear cradling baby short 

israeli settler attack palestinian amp soldier escorting fire south hebron hill 

like happened vietnam tet offensive public aware truth palestine 

israel keep body palestinian 

explicitly defend hitler ideology military academy idf rabbi 

israeli columnist complaining US supposedly investigating idf committing war crime israeli truly something else 

brigade attack convoy israel vehicle gaza 

new video resistance response found big tunnel israeli propaganda yesterday gaza 

new epic footage resistance hitting iof soldier footage iof soldier camera 

slice life apartheid israel 

update training adl 

david

leader hamas allied plo would support war resistance 

zionism bad thing oppression really source issue 

zionist say war war 

nova movie hunt soul 

hamas health ministry bbc 

israel operating apartheid state west bank 

indigenous thing 

destroying hamas help improve israeli relationship arab state 

israeli violence radicalized hamas 

biden say israel bombing gaza indiscriminately 

palestinian authority caught lying death toll gaza v ukraine 

israel survive current form 

thought usa west funding palestinian terrorism 

brutal killing tanzanian student joshua mollel hamas 

response israel hizbollah start full scale war 

worry israel care iran year 

israel right inflict high level human suffering depravity 

birth israel history zionism 

strategic interest war 

israel stop democracy 

netanyahu proud preventing palestinian state 

idf soldier kill two christian woman inside church 

according nearly half dropped bomb unguided 

realize constant justification israel action 

oct mini ongoing israeli crime since 

always grateful 

un report woman child main victim war killed 

israel getting dragged back court new charge 

houston continue take street demand ceasefire total liberation palestine end imperialism worldwide join us sunday pm corner post oak westheimer 

smotrich interview talk annexation vision palestinian either sing hatikva national anthem leaf fight defeat live torah chazal wrote future jerusalem expand damascus 

los angeles happening hundred car pch caravan headed lax international airport shut street business usual genocide 

israeli idf student columbia university sprayed skunk water pro palestinian protester campus college anything skunk water described cross dead animal human waste used idf palestinian go away day see slide 

recent state dept update tension 

actual conversation 

US air strike yemen mean gaza escalating conflict west asia israel take accountability prosecuted crime palestinian 

netanyahu apparently lied biden rulin


palestinian gaza thank south africa israel war gaza 

post instagram grown twenty year past day 

six operating ambulance left gaza one every thousand people let sink 

jewish house democrat grill israeli ambassador minister 

tour 

war undeniable profit 

adelaide australia jan 

israeli defence fumble day icj proceeding 

printable 

guy call sa legal team monkey belief israeli god warrior 

side 

woman gaza fighting family life 

reliable unbiased medium covering news war 

israel jewish state ethnostate 

best spend money trying help 

number 

grandmother shot waving white flag 

israel bombing across gaza strip intensively imposing telecom blackout 

gaza ministry health saying ambulance left ask international community help rebuild healthcare gaza israel targeted siege attack hospital 

many west easily ignore genocide genocide 

imagine inhumane use one video random shawarma seller city million people claim starvation 

israeli government advertising agency google ad since o

confused frustrated 

message heart hope peace 

israel bombing gaza indiscriminately attack targeted terrorist infrastructure 

significant online hatred towards jew israel 

network private city palestine russia africa syria 

icj court case 

two state solution best option one state would 

military misconception 

discussion 

future palestinian state required absorb jew west bank 

judge genocide 

honest question settlement west bank actually illegal part ii 

protester paid take part demonstration canada 

today israel killed journalist 

issue 

movement israel retake promised land 

humanitarian camp gazan civilian 

discussion hamas training child soldier 

israel control usa 

opinion hamas surrendering best thing palestinian cause 

hamas leader begs donation kill jew 

settler killed palestinian teen israeli force stop news article 

protester practically helping palestinian 

israel supporter attacked hamas supporter bay area 

much ado nothing bogus claim israel indiscri

### Let's take a look at our corpus vocabulary 

In [24]:
from nltk.probability import FreqDist

In [25]:
all_titles = dfp_f['title'].str.cat(sep=' ')
corpus_words = word_tokenize(all_titles)
corpus_freqdist = FreqDist(corpus_words)

In [26]:
print("Total number of words is : ", len(corpus_freqdist.keys()))


Total number of words is :  7297


In [27]:
print("Dictionary of vocabulary : ") 
# Sort the dictionary by values in descending order
sorted_freq_dist = dict(sorted(corpus_freqdist.items(), key=lambda item: item[1], reverse=True))

for word, frequency in sorted_freq_dist.items():
    print(f"{word}: {frequency}")

Dictionary of vocabulary : 
israel: 1119
gaza: 961
israeli: 770
palestinian: 724
palestine: 510
hamas: 454
war: 338
people: 248
genocide: 215
idf: 178
child: 170
hostage: 160
killed: 157
soldier: 156
state: 152
say: 149
conflict: 149
west: 144
US: 142
support: 139
video: 125
zionist: 123
civilian: 119
one: 118
jew: 116
bank: 114
right: 113
hospital: 109
new: 109
attack: 107
day: 103
world: 101
ceasefire: 99
jewish: 97
think: 95
military: 94
peace: 93
family: 89
south: 89
free: 86
occupation: 85
force: 83
journalist: 83
un: 80
american: 80
amp: 80
crime: 80
arab: 79
call: 78
year: 76
like: 76
solution: 75
settler: 75
want: 75
question: 74
iof: 72
woman: 72
death: 71
show: 71
would: 71
home: 69
medium: 69
country: 68
city: 68
october: 68
solidarity: 67
news: 66
stop: 65
time: 63
protest: 62
international: 62
two: 62
netanyahu: 61
footage: 61
resistance: 61
claim: 59
terrorist: 59
al: 58
bombing: 57
take: 57
killing: 57
thought: 57
make: 56
need: 56
zionism: 54
history: 54
muslim: 54
live

poet: 5
reuters: 5
rage: 5
audio: 5
scholar: 5
secretary: 5
flight: 5
trade: 5
mocking: 5
carried: 5
angry: 5
author: 5
noticed: 5
criticism: 5
popular: 5
forcing: 5
abused: 5
glasgow: 5
unfortunately: 5
equal: 5
belief: 5
review: 5
admit: 5
accurate: 5
shift: 5
graffiti: 5
wonder: 5
abby: 5
prominent: 5
defeat: 5
playing: 5
abducted: 5
returned: 5
refusal: 5
friendly: 5
port: 5
reminder: 5
boycotting: 5
announces: 5
organ: 5
poem: 5
jail: 5
keffiyeh: 5
declaration: 5
logic: 5
dog: 5
h: 5
job: 5
bus: 5
learn: 5
proud: 5
shock: 5
storm: 5
fine: 5
suspended: 5
holding: 5
accountable: 5
straight: 5
viral: 5
morning: 5
denying: 5
expelled: 5
conscience: 5
belonging: 5
greater: 5
asian: 5
chair: 5
present: 5
longer: 5
constant: 5
usual: 5
trial: 5
display: 5
resist: 5
driver: 5
toronto: 5
stage: 5
violation: 5
command: 5
partition: 5
removing: 5
grandmother: 5
demolition: 5
commander: 5
posted: 5
colonizer: 5
adl: 5
theft: 5
stealing: 5
thursday: 5
caused: 5
caught: 5
office: 5
organisation

nsfw: 3
range: 3
clown: 3
trudeau: 3
football: 3
database: 3
movent: 3
neither: 3
arweave: 3
perfect: 3
inhuman: 3
scott: 3
ritter: 3
door: 3
provides: 3
grab: 3
lone: 3
forgotten: 3
gaz: 3
service: 3
exposing: 3
hoping: 3
corridor: 3
bar: 3
laid: 3
shelling: 3
undeniable: 3
listen: 3
norman: 3
evacuating: 3
wonderful: 3
advisor: 3
gay: 3
fit: 3
alternative: 3
heavily: 3
dignity: 3
commits: 3
wound: 3
escalating: 3
per: 3
daddy: 3
boyfriend: 3
truly: 3
infant: 3
mia: 3
fly: 3
banner: 3
thats: 3
played: 3
ali: 3
gen: 3
unite: 3
followed: 3
board: 3
nuke: 3
nahal: 3
spike: 3
clearing: 3
ryanair: 3
condemning: 3
crescent: 3
donate: 3
brainwashed: 3
hamza: 3
harass: 3
buying: 3
handle: 3
temple: 3
guernica: 3
course: 3
associated: 3
protested: 3
wsj: 3
iranian: 3
bogus: 3
mock: 3
yassin: 3
advanced: 3
actress: 3
small: 3
include: 3
golani: 3
expressing: 3
indiscriminate: 3
academy: 3
suggesting: 3
diaper: 3
struck: 3
sample: 3
isolated: 3
visual: 3
mourn: 3
pflp: 3
bloodbath: 3
elderly: 3


peacefully: 2
embarrassing: 2
tattoo: 2
band: 2
tomorrow: 2
farm: 2
doviv: 2
heroic: 2
punish: 2
executed: 2
represent: 2
banning: 2
embassy: 2
nut: 2
hollywood: 2
shelf: 2
empire: 2
willing: 2
joy: 2
manufacturing: 2
vow: 2
inequality: 2
zoom: 2
anatomy: 2
gop: 2
shadow: 2
barghouti: 2
connected: 2
investigating: 2
violently: 2
mobile: 2
squad: 2
fashion: 2
repeatedly: 2
eyewitness: 2
injuring: 2
hadid: 2
feed: 2
pointing: 2
secularism: 2
yes: 2
longstanding: 2
fail: 2
pdx: 2
desperate: 2
imperialism: 2
leverage: 2
administrative: 2
locked: 2
registered: 2
contrived: 2
grayzone: 2
exists: 2
dock: 2
sanction: 2
ra: 2
reminded: 2
heartless: 2
openai: 2
tal: 2
mercy: 2
teddy: 2
bear: 2
alec: 2
iceland: 2
contest: 2
imposing: 2
flagged: 2
headed: 2
stood: 2
plo: 2
textbook: 2
train: 2
maritime: 2
alliance: 2
netherlands: 2
norway: 2
responds: 2
bobby: 2
sand: 2
joint: 2
imperialist: 2
journey: 2
pastureland: 2
eng: 2
essence: 2
escorted: 2
munther: 2
editor: 2
allowing: 2
provocation: 2
d

punt: 1
confront: 1
divest: 1
powerless: 1
largely: 1
culled: 1
intro: 1
intercept: 1
carter: 1
smith: 1
breaching: 1
hummus: 1
facility: 1
extension: 1
exploiting: 1
amazon: 1
etsy: 1
outlook: 1
takedown: 1
ari: 1
jeffrey: 1
epstein: 1
ghislaine: 1
maxwell: 1
blackmail: 1
completing: 1
insist: 1
bold: 1
beydoun: 1
pls: 1
pilipinas: 1
theater: 1
performance: 1
persecution: 1
sensible: 1
library: 1
literary: 1
hub: 1
gunner: 1
visualization: 1
cycle: 1
jamaican: 1
tahani: 1
mustafa: 1
qamar: 1
necessity: 1
limb: 1
trained: 1
container: 1
ataa: 1
jaber: 1
realty: 1
jerk: 1
reinforced: 1
laudable: 1
federated: 1
ivy: 1
problematic: 1
falsification: 1
monstrous: 1
pists: 1
nuanced: 1
quora: 1
critic: 1
confederacy: 1
crossfire: 1
celebration: 1
appearing: 1
maid: 1
irl: 1
unfollowed: 1
decent: 1
solving: 1
hinkle: 1
maniac: 1
christopher: 1
jpost: 1
righteous: 1
web: 1
smokescreen: 1
reasoning: 1
attractive: 1
collegiate: 1
desensitize: 1
calculate: 1
peep: 1
rogue: 1
motif: 1
amnesty: 1
j

unpaid: 1
exonerated: 1
understood: 1
agianst: 1
taunt: 1
khamaaas: 1
pretending: 1
photoshoots: 1
sudan: 1
pape: 1
memorandum: 1
glimpse: 1
monitoring: 1
trademark: 1
popcorn: 1
candance: 1
hated: 1
kufiya: 1
dubbed: 1
urban: 1
warfare: 1
adhan: 1
expo: 1
saga: 1
mot: 1
maria: 1
ch: 1
buck: 1
wild: 1
chancellor: 1
noo: 1
semen: 1
paslastinan: 1
appease: 1
draw: 1
spyware: 1
nso: 1
espouse: 1
convenient: 1
raf: 1
akrotiri: 1
declassified: 1
deadliest: 1
levelled: 1
intercepted: 1
passenger: 1
falfoul: 1
toon: 1
almighty: 1
ohhh: 1
antithetical: 1
wei: 1
oz: 1
gymnastics: 1
inflames: 1
civilize: 1
skidrow: 1
purchased: 1
downplay: 1
uefa: 1
aktürkoğlu: 1
twice: 1
ziyech: 1
awarded: 1
yall: 1
io: 1
crap: 1
ck: 1
akp: 1
helpless: 1
ashamed: 1
outspoken: 1
guinelly: 1
teenager: 1
hadath: 1
journal: 1
memo: 1
suck: 1
merry: 1
banned: 1
pulling: 1
manipulates: 1
desecrating: 1
fairouz: 1
bessan: 1
issam: 1
censured: 1
cbc: 1
rev: 1
jesse: 1
marketing: 1
comply: 1
finest: 1
mischaracterized: 

periodical: 1
jewry: 1
terrorize: 1
sell: 1
tobias: 1
huch: 1
applies: 1
sacco: 1
glory: 1
shitrael: 1
orthodox: 1
epiphany: 1
madafeh: 1
spy: 1
undertaking: 1
theobald: 1
wolfe: 1
tone: 1
connolly: 1
marxhing: 1
blockaded: 1
borrell: 1
financed: 1
pink: 1
dell: 1
overwhelmingly: 1
deborah: 1
harrington: 1
hizballah: 1
amal: 1
saad: 1
ei: 1
davos: 1
swiss: 1
forum: 1
recruitment: 1
brough: 1
recruit: 1
component: 1
emmaline: 1
blake: 1
poignant: 1
petach: 1
welcomed: 1
baruchin: 1
derogatory: 1
slur: 1
whore: 1
cancer: 1
slut: 1
reception: 1
expression: 1
compassion: 1
shoumanmansour: 1
expensive: 1
dismissal: 1
lobbyist: 1
orchestrating: 1
renowned: 1
samia: 1
halaby: 1
retrospective: 1
mater: 1
indiana: 1
tolerate: 1
ounce: 1
influential: 1
motorcyclist: 1
shireen: 1
aqleh: 1
unwrapping: 1
sweetness: 1
instance: 1
continuously: 1
lieb: 1
nbc: 1
ocean: 1
coordinate: 1
nurit: 1
portrayal: 1
text: 1
wished: 1
stalking: 1
adviser: 1
insists: 1
noor: 1
harazeen: 1
nonchalant: 1
suposedly:

refer: 1
separately: 1
foolishness: 1
prosecution: 1
deescalate: 1
pretend: 1
fafo: 1
liar: 1
irrelevant: 1
psychological: 1
radicalism: 1
fragility: 1
overthrow: 1
progressivism: 1
wane: 1
seed: 1
interaction: 1
costly: 1
confronted: 1
samaria: 1
pricing: 1
navy: 1
arabian: 1
unreliable: 1
wb: 1
wary: 1
intolerance: 1
answered: 1
virulent: 1
relocated: 1
deport: 1
arguements: 1
prompt: 1
axis: 1
collection: 1
haviv: 1
rettig: 1
gur: 1
involves: 1
pappé: 1
tout: 1
revival: 1
reader: 1
pursue: 1
rusila: 1
traume: 1
switched: 1
marries: 1
timely: 1
hardliner: 1
reliability: 1
incompetent: 1
harbor: 1
centered: 1
buzzword: 1
emigration: 1
baffled: 1
monumental: 1
centrist: 1
ikhwan: 1
islamism: 1
franchise: 1
barrier: 1
verifies: 1
concludes: 1
rein: 1
portmanteau: 1
philistine: 1
greek: 1
critique: 1
dive: 1
implementing: 1
winner: 1
honestly: 1
quantifying: 1
orgins: 1
successfully: 1
minimized: 1
definitely: 1
picked: 1
wafa: 1
correy: 1
greatly: 1
affect: 1
enabling: 1
grievance: 1
un

### Create lists for each category  and label rows 

In [28]:
# Lists for categories
#politics and diplomacy
P_D = ['us', 'uk', 'international', 'egypt', 'countries', 'government','relation', 'un' 
        'country' , 'president' , 'king','iran' , 'leader' , 'biden' , 'russia' , 'minister', 
       'germany' , 'company' , 'allegation' , 'corruption' , 'dictatorship' , 'lobby' , 'patronage' ,
       'abrogate' , 'accreditation', 'alliance', 'ambassador', 'diplomacy' , 'politics' , 'geopolitics', 
        'lobbies' , 'congressman' , 'democracy' , 'politician' , 'political' , 'leader' , 'democratic' , 
       'jordan', 'canada' , 'border' , 'congress' , 'china' , 'yeman' , 'netanyahu' , 'parliament', 'mayor',
       'obama'
      ]
#violence and humanitarian crisis
V_HC = ['hostage', 'children', 'genocide', 'hospital','attack', 'bombing', 'released', 'women',
         'killing', 'humanitarian', 'shot', 'home',  'death' , 'massacre' , 
         'dead', 'kids', 'crimes', 'refugee', 'rape', 'violence', 'prisoner',
         'water' , 'food' ,'medical', 'genocide' ,'abuse' , 'aid' , 'fear' , 'violent' , 'humanity',  
         'crisis' , 'kill' , 'strikes' , 'bomb' , 'massacre' , 'die' , 'emotional' , 'buried', 'hurt',
           'beatings' ,'murdering' , 'starvation' , 'trauma' , 'heartbreaking', 'slaughter']
#media and public opinion 
M_PO = ['propaganda' , 'post' ,  'footage' , 'opinion', 'message' , 'exchange' ,
        'media' , 'twitter' , 'facebook' , 'social media' , 'share' , 'follower' , 'story', 'journalist',
        'tv', 'petition' , 'donation' , 'commentary' , 'fans' ,'website', 'tweet' , 'sharing' , 'newspaper',
         'documentary' , 'interview' , 'reddit' , 'discussion'  , 'telegraph' , 'elon' , 'zuckerberg',
          'musk' , 'tiktok' , 'instagram' , 'article' , 'bbc' , 'cnn' , 'articles' , 'understand' , 'understanding', 'pr', 
           'jazeera', 'boycott', 'x' , 'tweet']

In [29]:
# Assuming dfp is your DataFrame with columns 'title' and 'category'
dfp_f['category'] = ''

# Categorize based on title
for index, row in dfp_f.iterrows(): 
    title = row['title'].lower()
    
    # Check for Politics and Diplomacy
    if any(word in title.split() for word in P_D):
        dfp_f.at[index, 'category'] = 'P_D'
    
    # Check for Media and Public Opinio
    elif any(word in title.split() for word in M_PO):
        dfp_f.at[index, 'category'] = 'M_PO'
    
    # Check for Violence and Humanitarian Crisis
    elif any(word in title.split() for word in V_HC):
        dfp_f.at[index, 'category'] = 'V_HC'





A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfp_f['category'] = ''


In [30]:
dfp_vhc = dfp_f[dfp_f['category'] == 'V_HC']
titles_vhc = dfp_vhc['title'].tolist()

for title in titles_vhc:
    print(title ,'\n')

gaza writes back famous book written refaat al areer palestinian writer poet professor activist killed yesterday israeli airstrike targeted home gaza strip 

american funded israeli bomb break child skull stomach intestine spill father grief loss 

power genocide convention 

hamas kindness hostage puppy 

real face humanity 

shocking image israel continued genocide palestinian supposed humanitarian 

bomb israeli presenter rotem achihun facing backlash joking israeli soldier mocking plight palestinian signing tank shell set bomb palestine gaza 

large number displaced palestinian killed israel bombed unrwa school near indonesian hospital north gaza 

nonstop massacre palestinian child civilian israeli angry yet world 

father searching child israeli attack find killed end angry 

israeli settler aided soldier attack resident set fire home um safa ramallah 

photo idf arresting palestinian civilian forcing strip clothes subjecting abuse beit lahia northern gaza strip 

belgium deny en

think yemen going roll face obtuse belligerent violence think support palestine religion clue well yemen know brutality inhumanity colonisation take end yemen remembers 

ocha opt palestine people eaten day child winter clothes medical care five family staying one tent olga cherevko ocha staff speaking gaza day war 

occupy cleanse settle case missed ten statement israeli official declaring ordering committing genocide 

israel pay google manipulate search result icj genocide case 

home one selected great renovation let look deserves israeli soldier randomly selecting civilian home blowing 

many hospital targeted gaza 

david cameron saying intent genocide followed load evidence proving wrong 

care ship getting delayed attempt stop genocide oppressed people happening 

solidarity life death 

bangladesh support south africa genocide case israel icj 

cat food suggestion bd list 

gaza genocide parallel armenian genocide 

entire family gaza surviving one plate food per day un warns 

In [31]:
dfp_vhc.shape

(892, 7)

In [32]:
dfp_pd = dfp_f[dfp_f['category'] == 'P_D']
titles_pd = dfp_pd['title'].tolist()

for title in titles_pd:
    print(title ,'\n')

scoop egypt warned israel rupture relation palestinian flee sinai 

aryeh king deputy mayor jerusalem israeli municipality express wish done displaced palestinian civilian rounded taken un shelter yesterday northern gaza unknown location 

family israeli hostage rage netanyahu leaked audio 

palestinian child agonizing pain bomb particle biden secretary state anthony blinken wholly responsible injured child 

son israeli war cabinet minister killed northern gaza 

israeli hostage afraid government hamas 

turkish president erdogan warns israel threat assassinate hamas member türkiye return flight qatar trip 

biden staffer turning israel support spoke dissenter 

ireland favour recognition state palestine say prime minister 

iof think US gone far propaganda 

geopolitical economy report ben norton US congress back israel plan depopulate gaza kill palestinian day US arm 

easy step control US 

strategy new international popular uprising 

US israeli mass rape propaganda without credib


celebrity endorsing genocide make us lose respect respect begin 

US government employee plan walkout biden gaza policy 

israel already feeling weight houthi justice better worse houthis political military force anything practical demand israel pay act 

effort smear penn president started well push oust liz magill began university hosted festival celebrating palestinian literature 

former german colony namibia condemns germany defending israel genocide case 

biden running patience bibi gaza war hit day 

israeli occupation leaf two family livestock homeless community wadi jordan valley 

first lady namibia monica geingos genocide namibia perpetrated germany started january absurdity germany january rejecting genocide charge 

norman finkelstein analysis icj politics 

international law expert francis boyle strongly belief south africa win case state 

palestinian kid egyptian border chanting god sake go egyptian egyptian solder border help 

cool guide new york time washington pos

In [33]:
dfp_pd.shape

(694, 7)

In [34]:
dfp_mpo = dfp_f[dfp_f['category'] == 'M_PO']
titles_mpo = dfp_mpo['title'].tolist()

for title in titles_mpo:
    print(title ,'\n')

question within test written refaat al areer student islamic university gaza israeli force sniped journalist yaser murtaja israel killed yaser year ago killed refaat today 

message received reefat death 

twitter blocking community note user write note account 

heartfelt word journalist raafat shared x occupation force murdered must die let bring hope let tale 

please sign petition end using tax money kill palestinian 

breaking report palestinian journalist gaza israeli occupation force carried mass execution civilian northern gaza strip besieging day 

telegraph militantly genocide 

dan kovalik solution impossible activist journalist author voiced opinion conflict 

journalist diya kahlot one civilian hostage iof claim threat israel 

journalist diaa bureau chief newspaper gaza appears photo 

news outlet new arab bureau chief journalist dia kahlout among kidnapped israel 

nikki haley every minute someone watch tiktok every day become 


west bank ramallah friend school choir ch

In [35]:
dfp_mpo.shape

(579, 7)

In [36]:
dfp_labeled = dfp_f[dfp_f['category'] != '']
dfp_labeled.shape

(2165, 7)

In [37]:
dfp_labeled.to_excel("labeledposts1.xlsx")