# Data Collection Experimentation
- The announcement of the USAID funding cuts was made on March 28, 2025
## 1. Environments
- Install `tweepy`, `praw`, and `requests` with `pip install`

## 2. Reddit Data collection
- Create a connection with Reddit by registering a script app at [https://www.reddit.com/prefs/apps](https://www.reddit.com/prefs/apps)
- Get `client_id`, `client_secret`
```
reddit = praw.Reddit(
    client_id='tWd763iMgmjp8YamFF96Wg',
    client_secret='XmaO9oO_kioW-8DzCLipxz-A4hffSA',
    user_agent='usaid-sentiment-KE'
)

```
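Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the values are exported as environment variables. The variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`) and the helper `load_reddit_credentials` are hypothetical and not part of PRAW:

```python
import os


def load_reddit_credentials(env=None):
    """Read Reddit API credentials from environment variables.

    The variable names used here are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    creds = {
        # strip() guards against stray spaces or tabs from copy-pasting
        "client_id": env.get("REDDIT_CLIENT_ID", "").strip(),
        "client_secret": env.get("REDDIT_CLIENT_SECRET", "").strip(),
        "user_agent": env.get("REDDIT_USER_AGENT", "usaid-sentiment-KE"),
    }
    missing = [k for k in ("client_id", "client_secret") if not creds[k]]
    if missing:
        raise RuntimeError(f"Missing Reddit credentials: {', '.join(missing)}")
    return creds
```

The returned dictionary could then be passed on as `praw.Reddit(**load_reddit_credentials())`.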

In [None]:
import praw
import pandas as pd
from datetime import datetime, timezone

reddit = praw.Reddit(
    client_id='tWd763iMgmjp8YamFF96Wg',
    client_secret='XmaO9oO_kioW-8DzCLipxz-A4hffSA',
    user_agent='usaid-sentiment-KE'
)

# --- PARAMETERS ---
subreddits = ['Kenya', 'EastAfrica']

"""keywords = [
    "USAID", "usaid", 
    "foreign aid", "foreign assistance", 
    "donor", "donour", 
    "funding", "funds", 
    "budget cuts", "aid cuts", 
    "development aid", 
    "healthcare", "health care", 
    "NGOs", "nonprofits", "non-profits"
]"""


keywords = [
    "USAID", "usaid", 
    "foreign aid", "foreign funding"  
]

# Combine keywords and phrases
search_terms = keywords

# Earliest date (after funding cuts) → March 28, 2025
cutoff_date = datetime(2025, 3, 28, tzinfo=timezone.utc).timestamp()

# --- SCRAPING ---
data = []

for sub in subreddits:
    subreddit = reddit.subreddit(sub)
    print(f"Searching r/{sub}...")
    
    for term in search_terms:
        try:
            for post in subreddit.search(term, sort='new', limit=200):
                if post.created_utc < cutoff_date:
                    continue  # skip posts before March 28, 2025
                
                data.append({
                    'subreddit': sub,
                    'search_term': term,
                    'title': post.title,
                    'text': post.selftext,
                    'created_utc': post.created_utc,
                    'created_date': datetime.fromtimestamp(post.created_utc, tz=timezone.utc),
                    'score': post.score,
                    'num_comments': post.num_comments,
                    'permalink': f"https://reddit.com{post.permalink}",
                    'url': post.url
                })
        except Exception as e:
            print(f"Error searching term '{term}' in r/{sub}: {e}")

# --- SAVE TO CSV ---
df = pd.DataFrame(data)
df['created_date'] = pd.to_datetime(df['created_utc'], unit='s')
df.to_csv('../data/raw/leo_reddit_posts.csv', index=False)

print(f"✅ Scraped {len(df)} posts. Saved to ../data/raw/leo_reddit_posts.csv.")


Searching r/Kenya...
Searching r/EastAfrica...
✅ Scraped 24 posts. Saved to ../data/raw/leo_reddit_posts.csv.


In [2]:
df.sample(random_state=42, n=10)

index,subreddit,search_term,title,text,created_utc,created_date,score,num_comments,permalink,url
8,Kenya,usaid,Economy,For the experts in matters economy and finance...,1743959000.0,2025-04-06 17:01:41,1,15,https://reddit.com/r/Kenya/comments/1jsytyp/ec...,https://www.reddit.com/r/Kenya/comments/1jsyty...
16,Kenya,foreign funding,Be very cautious of the UAE,Kasongo has been cozying up to the UAE recentl...,1748512000.0,2025-05-29 09:51:51,32,13,https://reddit.com/r/Kenya/comments/1ky6sma/be...,https://www.reddit.com/r/Kenya/comments/1ky6sm...
0,Kenya,USAID,USAID Repercussions + Economy,My neighbour’s wife was a very big shot in USA...,1747235000.0,2025-05-14 15:11:02,12,32,https://reddit.com/r/Kenya/comments/1kmhn87/us...,https://www.reddit.com/r/Kenya/comments/1kmhn8...
18,Kenya,foreign funding,Daily Nation,,1747811000.0,2025-05-21 07:09:24,1,8,https://reddit.com/r/Kenya/comments/1krrnpb/da...,https://www.reddit.com/gallery/1krrnpb
11,Kenya,foreign aid,Is There a Better Way to Fund Africa’s Infrast...,I'm researching a fintech concept rooted in a ...,1745161000.0,2025-04-20 14:49:50,9,9,https://reddit.com/r/Kenya/comments/1k3o7to/is...,https://www.reddit.com/r/Kenya/comments/1k3o7t...
9,Kenya,usaid,EX-USAID people!! Let's talk,Are you still in contact with the organisation...,1743880000.0,2025-04-05 19:09:10,2,0,https://reddit.com/r/Kenya/comments/1jsb149/ex...,https://www.reddit.com/r/Kenya/comments/1jsb14...
13,Kenya,foreign aid,Kibaki ALSO failed us,\nThere is a tendency to over-exaggerate the p...,1743470000.0,2025-04-01 01:12:42,120,124,https://reddit.com/r/Kenya/comments/1jojl2f/ki...,https://www.reddit.com/r/Kenya/comments/1jojl2...
1,Kenya,USAID,"USAID left a month ago, do we have ARVs in Kenya?",Someone on a different group (different websit...,1744723000.0,2025-04-15 13:16:53,3,5,https://reddit.com/r/Kenya/comments/1jzrn2s/us...,https://www.reddit.com/r/Kenya/comments/1jzrn2...
21,Kenya,foreign funding,"Like It or Not, Here's Why Ruto Will Win in 20...","CALL ME WHATEVER YOU WANT, BUT HERE'S THE BARE...",1745140000.0,2025-04-20 09:06:35,0,9,https://reddit.com/r/Kenya/comments/1k3ijdu/li...,https://www.reddit.com/r/Kenya/comments/1k3ijd...
5,Kenya,usaid,USAID Repercussions + Economy,My neighbour’s wife was a very big shot in USA...,1747235000.0,2025-05-14 15:11:02,13,32,https://reddit.com/r/Kenya/comments/1kmhn87/us...,https://www.reddit.com/r/Kenya/comments/1kmhn8...
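In the sample above, "USAID Repercussions + Economy" appears at rows 0 and 5, once for the search term `USAID` and once for `usaid`, which suggests the same post is collected once per matching term. A minimal sketch of merging such rows before building the DataFrame, assuming the permalink identifies a post. The helper `dedupe_posts` and the example permalinks are hypothetical:

```python
def dedupe_posts(rows, key="permalink"):
    """Keep one row per post and collect every search term that matched it."""
    merged = {}
    for row in rows:
        post_id = row[key]
        if post_id not in merged:
            # First time we see this post: copy it and start a list of terms
            merged[post_id] = {**row, "search_terms": [row["search_term"]]}
        elif row["search_term"] not in merged[post_id]["search_terms"]:
            merged[post_id]["search_terms"].append(row["search_term"])
    return list(merged.values())


# Illustrative rows only; the permalinks are placeholders, not real posts
rows = [
    {"permalink": "/r/Kenya/comments/aaa/", "search_term": "USAID", "score": 12},
    {"permalink": "/r/Kenya/comments/aaa/", "search_term": "usaid", "score": 13},
    {"permalink": "/r/Kenya/comments/bbb/", "search_term": "foreign aid", "score": 9},
]
unique = dedupe_posts(rows)
print(len(unique))
```

The first occurrence of each post wins, so fields that change between requests (such as `score`) keep the value from the earliest match.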


In [3]:
sample = df.sample(n=5,random_state= 42)
display(sample)

index,subreddit,search_term,title,text,created_utc,created_date,score,num_comments,permalink,url
8,Kenya,usaid,Economy,For the experts in matters economy and finance...,1743959000.0,2025-04-06 17:01:41,1,15,https://reddit.com/r/Kenya/comments/1jsytyp/ec...,https://www.reddit.com/r/Kenya/comments/1jsyty...
16,Kenya,foreign funding,Be very cautious of the UAE,Kasongo has been cozying up to the UAE recentl...,1748512000.0,2025-05-29 09:51:51,32,13,https://reddit.com/r/Kenya/comments/1ky6sma/be...,https://www.reddit.com/r/Kenya/comments/1ky6sm...
0,Kenya,USAID,USAID Repercussions + Economy,My neighbour’s wife was a very big shot in USA...,1747235000.0,2025-05-14 15:11:02,12,32,https://reddit.com/r/Kenya/comments/1kmhn87/us...,https://www.reddit.com/r/Kenya/comments/1kmhn8...
18,Kenya,foreign funding,Daily Nation,,1747811000.0,2025-05-21 07:09:24,1,8,https://reddit.com/r/Kenya/comments/1krrnpb/da...,https://www.reddit.com/gallery/1krrnpb
11,Kenya,foreign aid,Is There a Better Way to Fund Africa’s Infrast...,I'm researching a fintech concept rooted in a ...,1745161000.0,2025-04-20 14:49:50,9,9,https://reddit.com/r/Kenya/comments/1k3o7to/is...,https://www.reddit.com/r/Kenya/comments/1k3o7t...


In [4]:
sample_text= sample['text'].to_list()
for x in sample_text:
    display(x)

'For the experts in matters economy and finance I ask this politely(mnielezee Kama mtoto tafadhali). How is our country still semi functional? Everyday we hear cases of billions lost here billions lost there. Sometime there was reports of I think 1.3 trillion irregularly withdrawn from the treasury, the dollar has surprisingly been stable at around 129 despite all this and there was the case where funding would be halted by the USAID. How has the economy not crashed yet? Is it normal to lose a third of the budget and still have a running country?'

"Kasongo has been cozying up to the UAE recently and as Kenyans we should be very careful here, if you look at their foreign policy they a pattern of fostering chaos and undermining democracy and legitimate governments.\n\n- In Sudan they fund and support the RSF ,in fact there are reports that they are the ones who pushed the RSF into launching the war.\n- In Somalia they support the breakaway region of Somaliland.\n- In Libya they fund and support the warlord, Khalifa Haftar.\n- In Egypt they orchestrated a coup to overthrow Morsy, the only democratically elected leader in Egypt.\n\nI don't know but who's to say that they will not try and help Kasongo in subverting the 2027 elections? After all they wouldn't wanna lose their logistics hub.  As the Swahili say 'Ukiona cha mwenzako kinanyolewa ,chako tia maji' and btw all those countries you see above all thought it couldn't happen to them."

'My neighbour’s wife was a very big shot in USAID and has now lost her job. Children have been removed from big private school. Husband is a big guy at PWC. Lifestyle changes are occurring rapidly as her income has vanished. Thousands of her USAID coworkers were sent home with no salaries. \n\nUSAID Vendors, contractors, non-profits that received funding from them have all been left in a lurch. Sasa machozi zimeanza. \n\nNext is empty apartments around “high class” areas.\n\nUN is laying people off left right and center.\n\nAdditionally, public assistance programs in the Europe and America are being slashed so remittances by a certain sector are falling.\n\nIf you think things are hard, ngojeni mpaka December. A lot of your highlife hotspots are about to close. A lot of these restaurants are about to close. \n\nCrime shall return so please rudini mashambani mulime.\n\nAvoid Mombasa, Lamu and malls.\n\n\n'

''

'I\'m researching a fintech concept rooted in a simple but powerful idea: What if African citizens could directly micro-invest in their own infrastructure and economic development — from as little as $1 — instead of relying so heavily on foreign loans or aid?\n\nThe idea is inspired by:\n\nEthiopia\'s Renaissance Dam, where despite China funding most of the $5B project, citizens contributed around $1B through bonds and mobile payments. It was a unifying act of nation-building.\n\nDenmark’s wind cooperatives, where tens of thousands of Danes co-own wind turbines, investing small amounts and earning steady returns from green energy sales.\n\nArla Foods, one of the world’s largest dairy companies, is owned by thousands of farmer-members across Europe.\n\nPark Slope Food Co-op (Brooklyn, USA) – over 17,000 members run and own this highly successful grocery store. Members contribute labor and share in decision-making and cost savings — a small-scale but high-functioning democratic economic 

- As seen above, some texts in our dataset mix languages (English and Swahili) and at least one post has no text at all, so the data will need preprocessing before analysis
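A first preprocessing pass could be sketched as follows. This is only a heuristic under stated assumptions: the marker list is hypothetical, built from Swahili words visible in the samples above, and is not a substitute for a proper language-identification step. The helper name `preprocess` is also an assumption.

```python
import re

# Hypothetical marker list, taken from Swahili words seen in the sample posts
SWAHILI_MARKERS = {"tafadhali", "sasa", "ngojeni", "mpaka", "kama", "machozi"}


def preprocess(text):
    """Collapse whitespace and flag texts containing any marker word."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    tokens = set(re.findall(r"[a-z']+", cleaned.lower()))
    return {
        "text": cleaned,
        "is_empty": not cleaned,  # e.g. link or gallery posts with no body
        "maybe_swahili": bool(tokens & SWAHILI_MARKERS),
    }
```

Rows flagged `is_empty` could be dropped or analysed by title only, while rows flagged `maybe_swahili` could be routed to translation or a multilingual model.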

## 3. NewsAPI Data Collection
- Sign up at [newsapi.org](https://newsapi.org)
- Create an account and get an API key: `bc6c52fd05ee4e63827b7cf45fa0bdb2`
- Limitations
    - The free tier only allows searches back to 2025-05-03, more than a month after the USAID funding cuts were announced
    - Paywalled websites such as Daily Nation return no content
    - Results are limited to 100 articles


In [17]:
import requests
import pandas as pd
from datetime import datetime

# --- PARAMETERS ---
api_key = 'bc6c52fd05ee4e63827b7cf45fa0bdb2'
query = 'USAID'
from_date = '2025-05-04'  # YYYY-MM-DD
country = 'ke'  # Kenya; NOTE: never added to the request below, and /v2/everything has no country filter
page_size = 100  # max per request
max_pages = 1    # you can loop through more if needed

# --- FETCH ARTICLES ---
all_articles = []

for page in range(1, max_pages + 1):
    url = (
        f'https://newsapi.org/v2/everything?'
        f'q={query}&'
        f'from={from_date}&'
        f'sortBy=publishedAt&'
        f'pageSize={page_size}&'
        f'page={page}&'
        f'apiKey={api_key}'
    )

    response = requests.get(url)
    if response.status_code != 200:
        print(f"❌ Error: {response.status_code} - {response.json()}")
        break

    articles = response.json().get('articles', [])
    if not articles:
        break  # no more results

    for article in articles:
        all_articles.append({
            'source': article['source']['name'],
            'author': article.get('author'),
            'title': article.get('title'),
            'description': article.get('description'),
            'content': article.get('content'),
            'url': article.get('url'),
            'published_at': article.get('publishedAt')
        })

# --- SAVE TO CSV ---
df = pd.DataFrame(all_articles)
df['published_at'] = pd.to_datetime(df['published_at'])
df.to_csv('../data/raw/leo_newsapi_articles.csv', index=False)

print(f"✅ Fetched {len(df)} articles. Saved to ../data/raw/leo_newsapi_articles.csv.")


✅ Fetched 98 articles. Saved to ../data/raw/leo_newsapi_articles.csv.


In [18]:
df.sample(n=10,random_state=42)

index,source,author,title,description,content,url,published_at
62,Origo.hu,Origo,Elon Musk kiosztotta Bonót: „Hazudik vagy ostoba”,"Bono, a U2 énekese komoly vitát kavart, amikor...",A beszélgetés a The Joe Rogan Experience cím m...,https://www.origo.hu/nagyvilag/2025/06/elon-mu...,2025-06-01 08:59:59+00:00
40,Eleconomista.es,elEconomista.es,Adiós a Acracia o Frumencio y hola a Aizal o U...,María Carmen y Antonio son los nombres más fre...,María Carmen y Antonio son los nombres más fre...,https://www.eleconomista.es/actualidad/noticia...,2025-06-02 10:28:32+00:00
94,Hotnews.ro,George Sîrbu,Rezultat financiar excelent al Recorder. ”Au c...,Redacția Recorder a realizat în 2024 o creșter...,Redacia Recorder a realizat în 2024 o cretere ...,http://hotnews.ro/rezultat-financiar-excelent-...,2025-05-31 08:01:06+00:00
18,Expansion.com,Sergio Saiz,Los tribunales de EEUU bloquean los grandes pl...,El presidente de EEUU se ha encontrado con una...,El presidente de EEUU se ha encontrado con una...,https://www.expansion.com/juridico/actualidad-...,2025-06-02 18:45:11+00:00
81,BBC News,https://www.facebook.com/bbcnews,Elon Musk'ın Beyaz Saray günleri bitti: Trump'...,Teknoloji milyarderi Elon Musk DOGE'un başında...,"Kaynak, WILL OLIVER/POOL/EPA-EFE/REX/Shutterst...",https://www.bbc.com/turkce/articles/cn5y6477r4do,2025-05-31 13:05:26+00:00
83,Origo.hu,,„A baloldalnak nincs joga a USAID korrupt doll...,Donald Trump amerikai elnök hivatalba lépése ó...,"László András a Fidelitas, az olasz Lega Giova...",https://www.origo.hu/itthon/2025/05/laszlo-and...,2025-05-31 11:30:55+00:00
64,Www.abc.es,(abc),República Democrática del Congo: la fractura p...,En la vasta geografía de la República Democrát...,En la vasta geografía de la República Democrát...,https://www.abc.es/internacional/republica-dem...,2025-06-01 02:37:52+00:00
42,Activistpost.com,Editor,BBC’s ‘independent’ Russian partner begged UK ...,Leaked documents show the supposedly self-reli...,"Mediazona, the self-styled independent Russian...",https://www.activistpost.com/bbcs-independent-...,2025-06-02 10:00:00+00:00
10,Erickimphotography.com,ERIC KIM,Vision.,So it looks like we have crossed the chasm in ...,So it looks like we have crossed the chasm in ...,https://erickimphotography.com/blog/2025/06/02...,2025-06-03 00:10:15+00:00
0,Al Jazeera English,Shola Lawal,Why are humanitarian crises in African countri...,The latest Most Neglected Crises report on wor...,African countries have again topped the list o...,https://www.aljazeera.com/news/2025/6/3/why-ar...,2025-06-03 08:05:37+00:00
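The sample above contains articles in Hungarian, Spanish, Romanian, and Turkish, most of them unrelated to Kenya, because the request only filters on the keyword. A minimal sketch of building the request URL with proper encoding and a language filter. The helper `build_newsapi_url` is hypothetical, and the `language` parameter reflects my understanding of the NewsAPI `/v2/everything` documentation, so it should be checked against the current docs:

```python
from urllib.parse import urlencode


def build_newsapi_url(api_key, query, from_date, language="en", page=1, page_size=100):
    """Build a /v2/everything URL with every parameter URL-encoded."""
    params = {
        "q": query,
        "from": from_date,
        "language": language,  # assumption: two-letter ISO 639-1 code
        "sortBy": "publishedAt",
        "pageSize": page_size,
        "page": page,
        "apiKey": api_key,
    }
    # urlencode escapes spaces, parentheses, and other reserved characters
    return "https://newsapi.org/v2/everything?" + urlencode(params)


url = build_newsapi_url("YOUR_API_KEY", "USAID AND Kenya", "2025-05-04")
print(url)
```

Encoding the query this way also keeps multi-word queries with spaces and parentheses intact when they are sent.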


In [None]:
"""# --- SAVE TO CSV ---
df = pd.DataFrame(all_articles)
df['published_at'] = pd.to_datetime(df['published_at'])
df.to_csv('data/newsapi_articles.csv', index=False)

print(f"✅ Fetched {len(df)} articles. Saved to data/newsapi_articles.csv.")"""


In [25]:
import requests
import pandas as pd
from datetime import datetime

# --- PARAMETERS ---
api_key = 'bc6c52fd05ee4e63827b7cf45fa0bdb2'

# More focused query
query = '(USAID OR donor aid OR foreign aid OR healthcare funding) AND Kenya'
from_date = '2025-05-04'
page_size = 100  # NewsAPI max per page
max_pages = 1    # Free tier limit is 100 articles total

# Recommended reliable/regional domains (you can expand)
domains = 'nation.africa,standardmedia.co.ke,citizen.digital,aljazeera.com,bbc.com,reuters.com,who.int,devex.com,un.org'

# --- FETCH ARTICLES ---
all_articles = []

for page in range(1, max_pages + 1):
    url = (
        f'https://newsapi.org/v2/everything?'
        f'q={query}&'
        f'from={from_date}&'
        f'sortBy=publishedAt&'
        f'domains={domains}&'
        f'pageSize={page_size}&'
        f'page={page}&'
        f'apiKey={api_key}'
    )

    response = requests.get(url)
    if response.status_code != 200:
        print(f"❌ Error: {response.status_code} - {response.json()}")
        break

    articles = response.json().get('articles', [])
    if not articles:
        break

    for article in articles:
        all_articles.append({
            'source': article['source']['name'],
            'author': article.get('author'),
            'title': article.get('title'),
            'description': article.get('description'),
            'content': article.get('content'),
            'url': article.get('url'),
            'published_at': article.get('publishedAt')
        })

# --- Convert to DataFrame ---
df = pd.DataFrame(all_articles)
#print(df[['title', 'source', 'published_at']])
df.sample(n=10,random_state=42)


index,source,author,title,description,content,url,published_at
5,Standard Digital,Mercy Kahenda,"Experts warn of polio, measles comeback as don...",Kenya’s routine childhood vaccinations are adm...,A Community Health Promoter gives polio drops ...,https://www.standardmedia.co.ke/health/health-...,2025-05-11T14:30:14Z
0,BBC News,,Frugal tech: The start-ups working on cheap in...,Indian start-ups are using local materials and...,Devina Gupta\r\nAn earthquake changed the life...,https://www.bbc.com/news/articles/c20xlqn0e5po,2025-05-26T23:03:01Z
9,Citizen.digital,Ian Omondi,Ministry of Health to transfer UHC staff payro...,The Ministry of Health has announced that Coun...,The Ministry of Health has announced that Coun...,https://www.citizen.digital/news/ministry-of-h...,2025-05-06T13:46:58Z
10,Standard Digital,Mercy Kahenda,Kenya Kwanza's sweet health promise suffers fu...,President Ruto had assured that every Kenyan w...,"Beatrice Kairu, a Health economist during with...",http://www.standardmedia.co.ke/national/articl...,2025-05-04T21:08:14Z
2,World Health Organization,,WHO Director-General's High-Level Welcome at t...,"We are here to serve not our own interests, bu...",Honourable President of the World Health Assem...,https://www.who.int/director-general/speeches/...,2025-05-19T11:34:54Z
1,Al Jazeera English,Patrick Gathara,The Nairobi family values conference: When tra...,Foreign forces continue to push conservative a...,"Across Africa, debates about cultural preserva...",https://www.aljazeera.com/opinions/2025/5/20/t...,2025-05-20T12:39:05Z
8,UN News,United Nations,Daily Press Briefing by the Office of the Spok...,"In Ukraine, the Office for the Coordination of...",The following is a near-verbatim transcript of...,https://press.un.org/en/2025/db250507.doc.htm,2025-05-07T20:30:55Z
4,UN News,United Nations,Daily Press Briefing by the Office of the Spok...,"In Haiti, the UN and its partners continue to ...",The following is a near-verbatim transcript of...,https://press.un.org/en/2025/db250514.doc.htm,2025-05-14T21:11:31Z
7,Citizen.digital,Vincent Obadha,A depressed economy? Employment opportunities ...,The government of Kenya has a broad-based impl...,"In 2024, the World Bank projected that the ove...",https://www.citizen.digital/business/a-depress...,2025-05-08T04:52:53Z
3,Al Jazeera English,Al Jazeera,"Global hunger hits new high amid conflict, ext...",More than 295 million people faced acute hunge...,Global hunger hit a new high last year with th...,https://www.aljazeera.com/news/2025/5/16/globa...,2025-05-16T13:02:12Z


In [21]:
len(df)

11

In [26]:
df_citizen = df[df['source']== 'Citizen.digital']
df_citizen.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 7 to 9
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   source        2 non-null      object
 1   author        2 non-null      object
 2   title         2 non-null      object
 3   description   2 non-null      object
 4   content       2 non-null      object
 5   url           2 non-null      object
 6   published_at  2 non-null      object
dtypes: object(7)
memory usage: 128.0+ bytes


In [34]:
x = df_citizen['content'].to_list()
display( x[0])
len(x[0])

'In 2024, the World Bank projected that the overall unemployment rate of the youth in Kenya was at 5.7 per cent. \r\nThe Federation of Kenya (FKE) says that the youth account for over 35 per cent of the… [+8544 chars]'

214
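The `content` value above ends with `… [+8544 chars]`, which suggests the API returns only a short excerpt and reports how much text was cut. A minimal sketch of detecting that marker so truncated articles can be flagged for full-text retrieval from `url`. The helper `hidden_chars` is hypothetical, and the marker format is inferred only from the output shown here:

```python
import re

# Matches a trailing marker such as "[+8544 chars]"
TRUNCATION_RE = re.compile(r"\[\+(\d+) chars\]\s*$")


def hidden_chars(content):
    """Return how many characters the marker says were cut, or 0 if none."""
    match = TRUNCATION_RE.search(content or "")
    return int(match.group(1)) if match else 0


example = "The Federation of Kenya (FKE) says that the youth account for over 35 per cent of the… [+8544 chars]"
print(hidden_chars(example))
```

A value above zero means sentiment analysis on `content` alone would see only a small fraction of the article.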