# **News Headlines Scraper Workflow**

For each news source:

1. Fetch its RSS feed.

2. Parse it as XML.

3. Take the first 4 news items.

4. Extract each item's title, link, and publication date.

5. Reformat the publication date as `YYYY-MM-DD HH:MM`.

6. Append each record to a list.

7. If a source fails, print the error and continue with the next source.
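
Steps 2 to 6 can be sketched offline as follows. This is a minimal illustration, not the notebook's implementation: it swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so it runs without extra installs, and the `SAMPLE_RSS` feed and the `extract_headlines` helper are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First story</title>
      <link>https://example.com/1</link>
      <pubDate>Sat, 04 Oct 2025 22:54:00 +0530</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>"""


def extract_headlines(source, xml_text, limit=4):
    """Return up to `limit` headline records from an RSS document."""
    root = ET.fromstring(xml_text)
    records = []
    for item in list(root.iter("item"))[:limit]:
        pub_date = item.findtext("pubDate", default="N/A")
        if pub_date != "N/A":
            try:
                # Drop the timezone suffix, then reformat as YYYY-MM-DD HH:MM.
                parsed = datetime.strptime(pub_date[:25], "%a, %d %b %Y %H:%M:%S")
                pub_date = parsed.strftime("%Y-%m-%d %H:%M")
            except ValueError:
                pass  # keep the raw string if it does not match the format
        records.append({
            "Source": source,
            "Title": item.findtext("title", default="N/A"),
            "Link": item.findtext("link", default="N/A"),
            "Published Date": pub_date,
        })
    return records


records = extract_headlines("Example", SAMPLE_RSS)
for record in records:
    print(record)
```

Items that lack a `<pubDate>` keep the `"N/A"` placeholder, which mirrors how the notebook handles missing fields.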

In [None]:
!pip install bs4

Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2


# **News Headline Scraper**



In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime


In [None]:
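# Define RSS feeds (display name -> feed URL)
sources = {
    # NOTE: in the recorded output below, this FeedBurner URL returns items from
    # techncruncher.blogspot.com rather than techcrunch.com; verify it before use.
    "TechCrunch": "http://feeds.feedburner.com/TechCrunch/",
    "Economic Times": "https://economictimes.indiatimes.com/rssfeedstopstories.cms"
}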

headlines = []

In [None]:
# Fetch and parse 4 headlines per source
for source, url in sources.items():
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, "xml")  # RSS is XML

        items = soup.find_all("item")[:4]  # get first 4 headlines

        for item in items:
            title = item.title.text if item.title else "N/A"
            link = item.link.text if item.link else "N/A"
            pub_date = item.pubDate.text if item.pubDate else "N/A"

            # Reformat the publication date; keep the raw string if it does not parse
            if pub_date != "N/A":
                try:
                    pub_date = datetime.strptime(pub_date[:25], "%a, %d %b %Y %H:%M:%S").strftime("%Y-%m-%d %H:%M")
                except ValueError:
                    pass

            headlines.append({
                "Source": source,
                "Title": title,
                "Link": link,
                "Published Date": pub_date
            })

    except Exception as e:
        print(f"⚠️ Error fetching {source}: {e}")
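
The loop above parses dates by slicing the string to 25 characters, which discards the timezone suffix. As a hedged alternative, the standard library's `email.utils.parsedate_to_datetime` understands the RFC 822 style dates that RSS feeds normally use, including offsets such as `+0530` and `GMT`. The `format_pub_date` helper below is a hypothetical name, not part of the notebook.

```python
from email.utils import parsedate_to_datetime


def format_pub_date(raw, out_format="%Y-%m-%d %H:%M"):
    """Format an RFC 822 date string; return the input unchanged if it cannot be parsed."""
    try:
        return parsedate_to_datetime(raw).strftime(out_format)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError for unparseable input, newer ones ValueError.
        return raw


print(format_pub_date("Sat, 04 Oct 2025 22:54:00 +0530"))
print(format_pub_date("Thu, 02 Jan 2025 09:26:00 GMT"))
print(format_pub_date("N/A"))
```

The formatted value is the local time as written in the feed, so the result matches the slicing approach while tolerating date strings of different lengths.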

In [None]:
# Save results into CSV
df = pd.DataFrame(headlines)
output_file = "news_headlines.csv"
df.to_csv(output_file, index=False, encoding="utf-8")

In [None]:
# Also save as Excel (requires the openpyxl package)
df.to_excel("news_headlines.xlsx", index=False, engine="openpyxl")

In [None]:
print(f"✅ Headlines saved to {output_file}\n")

✅ Headlines saved to news_headlines.csv



In [None]:
# Print Top 2 Headlines Summary
print("📢 Top 2 Headlines Summary:")
for i, row in df.head(2).iterrows():
    print(f"\n{i+1}. {row['Title']}\n   ({row['Source']}, {row['Published Date']})\n   Link: {row['Link']}")

📢 Top 2 Headlines Summary:

1. Top 10 AI Tools That Will Transform Your Content Creation in 2025
   (TechCrunch, 2025-01-02 09:26)
   Link: https://techncruncher.blogspot.com/2025/01/top-10-ai-tools-that-will-transform.html

2. LimeWire AI Studio Review 2023: Details, Pricing & Features
   (TechCrunch, 2023-12-12 16:10)
   Link: https://techncruncher.blogspot.com/2023/12/limewire-ai-studio-review-2023-details.html


In [None]:
df_csv = pd.read_csv(output_file)
df_csv.head(n=10)

| | Source | Title | Link | Published Date |
|---|---|---|---|---|
| 0 | TechCrunch | Top 10 AI Tools That Will Transform Your Conte... | https://techncruncher.blogspot.com/2025/01/top... | 2025-01-02 09:26 |
| 1 | TechCrunch | LimeWire AI Studio Review 2023: Details, Prici... | https://techncruncher.blogspot.com/2023/12/lim... | 2023-12-12 16:10 |
| 2 | TechCrunch | Top 10 AI Tools in 2023 That Will Make Your Li... | https://techncruncher.blogspot.com/2023/01/top... | 2023-01-25 19:52 |
| 3 | TechCrunch | Top 10 AI Content Generator & Writer Tools in ... | https://techncruncher.blogspot.com/2022/11/top... | 2022-11-15 08:58 |
| 4 | Economic Times | India Inc shifts strategy after US H-1B fee hike | https://economictimes.indiatimes.com/nri/work/... | 2025-10-04 05:30 |
| 5 | Economic Times | Shooting near Houston leaves 2 children dead | https://economictimes.indiatimes.com/news/inte... | 2025-10-04 22:54 |
| 6 | Economic Times | 'Bal Thackeray died 2 days before official date' | https://economictimes.indiatimes.com/news/poli... | 2025-10-04 22:39 |
| 7 | Economic Times | Poisoning angle surfaces in death of Zubeen Garg | https://economictimes.indiatimes.com/news/indi... | 2025-10-04 22:36 |


In [None]:
df_xl = pd.read_excel("news_headlines.xlsx")
df_xl.head(n=10)

| | Source | Title | Link | Published Date |
|---|---|---|---|---|
| 0 | TechCrunch | Top 10 AI Tools That Will Transform Your Conte... | https://techncruncher.blogspot.com/2025/01/top... | 2025-01-02 09:26 |
| 1 | TechCrunch | LimeWire AI Studio Review 2023: Details, Prici... | https://techncruncher.blogspot.com/2023/12/lim... | 2023-12-12 16:10 |
| 2 | TechCrunch | Top 10 AI Tools in 2023 That Will Make Your Li... | https://techncruncher.blogspot.com/2023/01/top... | 2023-01-25 19:52 |
| 3 | TechCrunch | Top 10 AI Content Generator & Writer Tools in ... | https://techncruncher.blogspot.com/2022/11/top... | 2022-11-15 08:58 |
| 4 | Economic Times | India Inc shifts strategy after US H-1B fee hike | https://economictimes.indiatimes.com/nri/work/... | 2025-10-04 05:30 |
| 5 | Economic Times | Shooting near Houston leaves 2 children dead | https://economictimes.indiatimes.com/news/inte... | 2025-10-04 22:54 |
| 6 | Economic Times | 'Bal Thackeray died 2 days before official date' | https://economictimes.indiatimes.com/news/poli... | 2025-10-04 22:39 |
| 7 | Economic Times | Poisoning angle surfaces in death of Zubeen Garg | https://economictimes.indiatimes.com/news/indi... | 2025-10-04 22:36 |
