# Postamate Scraping
In this notebook we scrape news headlines, dates, article contents, and URLs from the [Postamate](https://postamate.com/page/) website. Postamate is a Kenyan news site that publishes satirical and sarcastic news for a Kenyan audience.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time



In [15]:
base_url = "https://postamate.com"
pages_to_scrape = 3  # Number of listing pages to fetch; can be increased later


In [16]:
titles = []
dates = []
contents = []
urls = []
labels = []


In [17]:
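for page in range(1, pages_to_scrape + 1):
    # Assumption: listing pages follow the WordPress-style /page/<n>/ pattern
    response = requests.get(f"{base_url}/page/{page}/", timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    # Find all article headings on the page and collect their links
    for article in soup.find_all("h2", class_="entry-title"):
        anchor = article.find("a", href=True)
        if anchor and anchor["href"] not in urls:
            urls.append(anchor["href"])

    time.sleep(1)  # Be polite to the server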


In [18]:
print(f"Found {len(urls)} article URLs")
print(urls[:5])  # Print a sample of first 5


Found 0 article URLs
[]
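
The output above shows that the `h2` / `entry-title` lookup matched nothing. Two common causes are a blocked request (a non-200 status code) and a theme that uses a different heading tag for its titles. The sketch below is one way to test the second possibility offline: it matches any element carrying the `entry-title` class, whatever its tag. The helper name `extract_article_links` and the sample markup are hypothetical; the real Postamate markup may differ and should be inspected in the browser.

```python
from bs4 import BeautifulSoup


def extract_article_links(html):
    """Return unique article links from any element carrying the entry-title class."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # CSS selector: any tag with class "entry-title" that contains a link
    for anchor in soup.select(".entry-title a[href]"):
        href = anchor["href"]
        if href not in links:
            links.append(href)
    return links


# Hypothetical listing markup, used only to illustrate the selector
sample_html = """
<h3 class="entry-title td-module-title"><a href="https://example.com/a">A</a></h3>
<h2 class="entry-title"><a href="https://example.com/b">B</a></h2>
<h3 class="entry-title"><a href="https://example.com/a">A again</a></h3>
"""
print(extract_article_links(sample_html))
```

On the live site, printing `response.status_code` before parsing helps rule out a blocked request.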


In [5]:
for link in urls:
    response = requests.get(link, timeout=10)  # Avoid hanging on a slow response
    soup = BeautifulSoup(response.text, "html.parser")
    
    # Extract title
    title_tag = soup.find("h1", class_="entry-title")
    title = title_tag.get_text(strip=True) if title_tag else "No Title"
    
    # Extract date (the CSS selector requires both classes but tolerates extra ones or a different order)
    date_tag = soup.select_one("time.entry-date.published")
    date = date_tag.get_text(strip=True) if date_tag else "No Date"
    
    # Extract article content
    content_div = soup.find("div", class_="td-post-content")
    paragraphs = content_div.find_all("p") if content_div else []
    content = " ".join(p.get_text(strip=True) for p in paragraphs)

    # Save to lists
    titles.append(title)
    dates.append(date)
    contents.append(content)
    labels.append("satire")  # Because all Postamate articles are satire
    
    time.sleep(1)  # Be polite to the server
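
The extraction logic in the loop above can be factored into a function that takes raw HTML, which makes it possible to test the selectors on a saved page without hitting the server. This is a sketch that looks for the same tags and classes as the loop; `parse_article` and the sample markup are hypothetical and do not come from the real site.

```python
from bs4 import BeautifulSoup


def parse_article(html):
    """Extract title, date, and body text from one article page."""
    soup = BeautifulSoup(html, "html.parser")

    title_tag = soup.find("h1", class_="entry-title")
    # Requires both classes on the <time> tag, in any order
    date_tag = soup.select_one("time.entry-date.published")
    content_div = soup.find("div", class_="td-post-content")
    paragraphs = content_div.find_all("p") if content_div else []

    return {
        "title": title_tag.get_text(strip=True) if title_tag else "No Title",
        "date": date_tag.get_text(strip=True) if date_tag else "No Date",
        "content": " ".join(p.get_text(strip=True) for p in paragraphs),
        "label": "satire",
    }


# Hypothetical article markup for illustration only
sample_html = """
<h1 class="entry-title">Sample headline</h1>
<time class="entry-date published">May 1, 2024</time>
<div class="td-post-content"><p>First paragraph.</p><p>Second paragraph.</p></div>
"""
record = parse_article(sample_html)
print(record["title"], "|", record["date"])
```

Returning one dictionary per article also keeps the fields of a record together, so they cannot drift out of alignment the way parallel lists can.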


In [6]:
df = pd.DataFrame({
    "title": titles,
    "date": dates,
    "url": urls,
    "content": contents,
    "label": labels
})

df.to_csv("postamate_satire_articles.csv", index=False)
print("Scraping complete! Saved as 'postamate_satire_articles.csv'")


Scraping complete! Saved as 'postamate_satire_articles.csv'
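
The placeholders `"No Title"` and `"No Date"`, as well as empty content strings, are written to the CSV like any other value, so a quick count before saving can reveal selectors that silently failed. The sketch below assumes the column names used in the DataFrame above; `summarize_scrape` is a hypothetical helper and the rows are made up for illustration.

```python
import pandas as pd


def summarize_scrape(df):
    """Count rows and rows where a field fell back to its placeholder."""
    return {
        "rows": len(df),
        "missing_title": int((df["title"] == "No Title").sum()),
        "missing_date": int((df["date"] == "No Date").sum()),
        "empty_content": int((df["content"].str.strip() == "").sum()),
    }


# Made-up rows for illustration only; not real scraped data
demo = pd.DataFrame({
    "title": ["Sample headline", "No Title"],
    "date": ["May 1, 2024", "No Date"],
    "url": ["https://example.com/a", "https://example.com/b"],
    "content": ["Some text.", ""],
    "label": ["satire", "satire"],
})
print(summarize_scrape(demo))
```

If no URLs were collected, as in the run shown earlier, `len(df)` is 0 and the saved CSV contains only the header line.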
