**USAID FUNDING SENTIMENT ANALYSIS**

**INTRODUCTION**

The United States Agency for International Development (USAID) plays a pivotal role in supporting global development through strategic funding initiatives. In recent years, discussions around the withdrawal or reduction of USAID funding have sparked significant public response — both within the United States and in countries that have long relied on this aid.

This project seeks to analyze the public sentiment surrounding USAID funding, particularly in relation to controversial political decisions or policy shifts. By leveraging social media data, news articles, and other textual sources, we aim to uncover how these funding changes are perceived by various stakeholders — including American citizens, government officials, development workers, and beneficiaries in aid-receiving countries such as Kenya.

Through natural language processing (NLP) techniques, we perform sentiment classification and thematic analysis to better understand the emotional tone, concerns, and priorities reflected in public discourse. The findings from this analysis can offer valuable insights into the social and geopolitical implications of foreign aid decisions and their real-world human impact.


## Data Merging

**1. News and Reddit Data**

In [3]:
import os
import pandas as pd

# Define raw data folder path
raw_folder = os.path.join("..", "data", "raw")

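# List all CSV files in the raw folder, skipping merged outputs written by earlier runs
# (otherwise "merged_news.csv" and "merged_reddit.csv" would be merged again on a re-run)
csv_files = [f for f in os.listdir(raw_folder) if f.endswith('.csv') and not f.startswith('merged_')]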

# Separate news and reddit files by filename
news_files = [f for f in csv_files if 'news' in f.lower()]
reddit_files = [f for f in csv_files if 'reddit' in f.lower()]

# Load and merge news CSVs
news_dfs = [pd.read_csv(os.path.join(raw_folder, f)) for f in news_files]
merged_news = pd.concat(news_dfs, ignore_index=True)
merged_news.to_csv(os.path.join(raw_folder, "merged_news.csv"), index=False)

# Load and merge reddit CSVs
reddit_dfs = [pd.read_csv(os.path.join(raw_folder, f)) for f in reddit_files]
merged_reddit = pd.concat(reddit_dfs, ignore_index=True)
merged_reddit.to_csv(os.path.join(raw_folder, "merged_reddit.csv"), index=False)

print(f"Merged {len(news_files)} news files and {len(reddit_files)} reddit files.")

Merged 4 news files and 4 reddit files.
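
Because the merged files are written back into the raw folder, a merge helper should skip outputs from earlier runs. The sketch below does that and also records which file each row came from. The function name `merge_csvs`, the `source_file` column, and the demo file names are illustrative assumptions, not part of the original notebook.

```python
import os
import tempfile
import pandas as pd

def merge_csvs(folder, token, skip_prefix="merged_"):
    """Concatenate CSVs in `folder` whose name contains `token`.

    Files starting with `skip_prefix` are ignored, and each row is tagged
    with the file it came from in a hypothetical `source_file` column.
    """
    files = sorted(
        f for f in os.listdir(folder)
        if f.endswith(".csv") and token in f.lower() and not f.startswith(skip_prefix)
    )
    frames = [
        pd.read_csv(os.path.join(folder, f)).assign(source_file=f)
        for f in files
    ]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

# Tiny demonstration in a temporary folder (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"title": ["a", "b"]}).to_csv(os.path.join(tmp, "news_1.csv"), index=False)
    pd.DataFrame({"title": ["c"]}).to_csv(os.path.join(tmp, "news_2.csv"), index=False)
    pd.DataFrame({"title": ["a", "b", "c"]}).to_csv(os.path.join(tmp, "merged_news.csv"), index=False)
    demo = merge_csvs(tmp, "news")

print(demo)
```

Keeping the source file name makes it easier to trace schema differences or duplicates back to a specific scrape.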


In [4]:
# Load merged datasets
merged_news = pd.read_csv(os.path.join(raw_folder, "merged_news.csv"))
merged_reddit = pd.read_csv(os.path.join(raw_folder, "merged_reddit.csv"))

# Preview the first 5 rows of each dataset
print("News Data Preview:")
display(merged_news.head())

print("\nReddit Data Preview:")
display(merged_reddit.head())

News Data Preview:


Unnamed: 0,keyword,source,author,title,publishedAt,summary,text,url,description,content,urlToImage,published_at,full_text
0,usaid kenya,Al Jazeera English,"['Madison Czopek', 'Amy Sherman']",Has DOGE really saved the US government $180bn?,2025-06-06 00:00:00,President Donald Trump and adviser Elon Musk c...,Elon Musk first claimed the Department of Gove...,https://www.aljazeera.com/news/2025/6/6/has-do...,,,,,
1,usaid kenya,Daily Signal,"['Mike Gonzalez', '.Wp-Block-Co-Authors-Plus-C...",Congress Should Quickly Approve Trump’s Rescis...,2025-06-10 00:00:00,President Donald Trump‘s rescission legislatio...,President Donald Trump‘s rescission legislatio...,https://www.dailysignal.com/2025/06/10/congres...,,,,,
2,usaid kenya,Defense One,"['Meghann Myers', 'Staff Reporter']","AFRICOM asks for help deterring terrorism, aft...",2025-05-29 21:15:17+00:00,“It is the epicenter of terrorism on the globe...,Deterring the spread of terrorism in Africa an...,https://www.defenseone.com/threats/2025/05/afr...,,,,,
3,usaid kenya,Thisamericanlife.org,[],Some Things We Don't Do Anymore,2025-06-06 09:29:47-04:00,Two Americans moved to Eswatini when that coun...,Two Americans moved to Eswatini when that coun...,https://www.thisamericanlife.org/862/some-thin...,,,,,
4,usaid kenya,Biztoc.com,[],BizToc,,"Tech stocks, led by Nvidia and Microsoft, drov...",President Trump abruptly terminated all U.S. t...,https://biztoc.com/x/6c16ca23e701790a,,,,,
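
The `publishedAt` values in the preview mix naive timestamps with offset-aware ones (`+00:00`, `-04:00`). A minimal sketch of normalizing them to UTC follows. It assumes pandas 2.0 or later (for `format="mixed"`) and assumes that naive timestamps can be treated as UTC, which the source data does not confirm. The helper name `to_utc` is hypothetical.

```python
import pandas as pd

def to_utc(series):
    """Parse mixed-format timestamp strings to UTC; unparseable values become NaT."""
    # Assumption: timestamps without an offset are interpreted as UTC.
    return pd.to_datetime(series, format="mixed", utc=True, errors="coerce")

# Sample values copied from the preview above, plus one missing entry.
sample = pd.Series([
    "2025-06-06 00:00:00",
    "2025-05-29 21:15:17+00:00",
    "2025-06-06 09:29:47-04:00",
    None,
])
parsed = to_utc(sample)
print(parsed)
```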



Reddit Data Preview:


Unnamed: 0,subreddit,keyword,title,text,date_posted,upvotes,comments,url,permalink,search_term,created_utc,created_date,score,num_comments,created,selftext
0,Kenya,"USAID, USAID Repercussions",USAID Repercussions + Economy,My neighbour’s wife was a very big shot in USA...,2025-05-14,13.0,33.0,https://www.reddit.com/r/Kenya/comments/1kmhn8...,https://reddit.com/r/Kenya/comments/1kmhn87/us...,,,,,,,
1,Kenya,Trump cuts,Bill Gates 'horrified' by Trump cuts to US aid...,,2025-05-08,1.0,1.0,https://www.semafor.com/article/05/08/2025/bil...,https://reddit.com/r/Kenya/comments/1khpakg/bi...,,,,,,,
2,Kenya,"foreign aid, foreign aid",Foreign aid/Philanthropy,I see the aid or philanthropic activities that...,2025-04-27,1.0,0.0,https://www.reddit.com/r/Kenya/comments/1k91v5...,https://reddit.com/r/Kenya/comments/1k91v58/fo...,,,,,,,
3,Kenya,USAID,"USAID left a month ago, do we have ARVs in Kenya?",Someone on a different group (different websit...,2025-04-15,3.0,5.0,https://www.reddit.com/r/Kenya/comments/1jzrn2...,https://reddit.com/r/Kenya/comments/1jzrn2s/us...,,,,,,,
4,Kenya,USAID,Classism in r/Kenya and r/nairobi,The classism I'm seeing in both subs is a good...,2025-04-07,168.0,95.0,https://www.reddit.com/r/Kenya/comments/1jtcvb...,https://reddit.com/r/Kenya/comments/1jtcvbx/cl...,,,,,,,


In [5]:
# Check the number of rows in each merged dataset
print("Number of rows in merged_news:", len(merged_news))
print("Number of rows in merged_reddit:", len(merged_reddit))

Number of rows in merged_news: 3936
Number of rows in merged_reddit: 438


In [6]:
merged_news.info()
merged_reddit.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3936 entries, 0 to 3935
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   keyword       3847 non-null   object
 1   source        3936 non-null   object
 2   author        3551 non-null   object
 3   title         3932 non-null   object
 4   publishedAt   2220 non-null   object
 5   summary       558 non-null    object
 6   text          558 non-null    object
 7   url           3936 non-null   object
 8   description   3341 non-null   object
 9   content       3374 non-null   object
 10  urlToImage    1737 non-null   object
 11  published_at  1498 non-null   object
 12  full_text     1194 non-null   object
dtypes: object(13)
memory usage: 399.9+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 438 entries, 0 to 437
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   subreddit     438 non-nu

### Missing values

In [7]:
# Check for missing values in the merged datasets
print("Missing values in merged_news:")
print(merged_news.isnull().sum())
print("\nMissing values in merged_reddit:")
print(merged_reddit.isnull().sum())



Missing values in merged_news:
keyword           89
source             0
author           385
title              4
publishedAt     1716
summary         3378
text            3378
url                0
description      595
content          562
urlToImage      2199
published_at    2438
full_text       2742
dtype: int64

Missing values in merged_reddit:
subreddit         0
keyword         168
title             0
text            165
date_posted     168
upvotes         168
comments        168
url               0
permalink        17
search_term     287
created_utc     287
created_date    287
score           270
num_comments    287
created         421
selftext        431
dtype: int64
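
Several columns look like alternative names for the same field (for example `publishedAt` and `published_at` in the news data, or `upvotes` and `score` in the Reddit data), which would explain the complementary missing-value counts. This is an inference from the counts, not something the source files confirm. Under that assumption, a minimal sketch of coalescing such columns into one could look like this; the `coalesce` helper and the `published` column name are hypothetical.

```python
import pandas as pd

def coalesce(df, columns):
    """Return, for each row, the first non-null value across `columns` (in order)."""
    present = [c for c in columns if c in df.columns]
    if not present:
        # None of the candidate columns exist: return an all-missing column.
        return pd.Series(pd.NA, index=df.index, dtype="object")
    result = df[present[0]].copy()
    for col in present[1:]:
        # Fill remaining gaps from the next candidate column, aligned by index.
        result = result.fillna(df[col])
    return result

# Small made-up frame mimicking the two date columns in the news data.
demo = pd.DataFrame({
    "publishedAt": ["2025-06-06", None, None],
    "published_at": [None, "2025-05-29", None],
})
demo["published"] = coalesce(demo, ["publishedAt", "published_at"])
print(demo)
```

The column order passed to `coalesce` decides which source wins when both are present, so it is worth checking a few rows by hand before applying it to `merged_news` or `merged_reddit`.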


In [8]:
# Check for duplicate rows in the merged datasets
print("Duplicate rows in merged_news:", merged_news.duplicated().sum())
print("Duplicate rows in merged_reddit:", merged_reddit.duplicated().sum())


Duplicate rows in merged_news: 98
Duplicate rows in merged_reddit: 36
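
`duplicated()` with no arguments only flags rows that match on every column. A possible next step, sketched below, is to drop those exact duplicates and, optionally, to deduplicate on a key such as `url`. Treating `url` as a unique key is an assumption here, since the same article may legitimately appear under several keywords. The helper name `drop_duplicate_rows` is hypothetical.

```python
import pandas as pd

def drop_duplicate_rows(df, key=None):
    """Drop duplicates (on all columns, or on `key`) and report how many rows were removed."""
    before = len(df)
    deduped = df.drop_duplicates(subset=key, keep="first").reset_index(drop=True)
    return deduped, before - len(deduped)

# Made-up frame: rows 0 and 1 are identical; rows 2 and 3 share a URL only.
demo = pd.DataFrame({
    "url": ["u1", "u1", "u2", "u2"],
    "keyword": ["usaid", "usaid", "usaid", "foreign aid"],
})
exact, n_exact = drop_duplicate_rows(demo)
by_url, n_url = drop_duplicate_rows(demo, key=["url"])
print(f"Exact duplicates removed: {n_exact}; removed when keyed on url: {n_url}")
```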
