## Ten Times Online Assignment

Objective: Build an application that collects news articles from various RSS feeds (e.g., those listed
below), stores them in a database, and categorizes them into predefined categories.
Each news item should fall under one of the following categories:

● Terrorism / protest / political unrest / riot

● Positive/Uplifting

● Natural Disasters

● Others

In [15]:
# importing libraries
import feedparser
import spacy
import pandas as pd

# Load the English NLP model
nlp = spacy.load("en_core_web_sm")

# List of RSS feed URLs
rss_feeds = [
    "http://rss.cnn.com/rss/cnn_topstories.rss",
    "http://qz.com/feed",
    "http://feeds.foxnews.com/foxnews/politics",
    "http://feeds.reuters.com/reuters/businessNews",
    "http://feeds.feedburner.com/NewshourWorld",
    "https://feeds.bbci.co.uk/news/world/asia/india/rss.xml"
]

# Function to fetch and parse an RSS feed
def parse_rss_feed(feed_url):
    feed = feedparser.parse(feed_url)
    parsed_articles = []

    for entry in feed.entries:
        # Some feeds omit fields; skip entries without a title or link
        title = entry.get('title')
        link = entry.get('link')
        if not title or not link:
            continue

        parsed_articles.append({
            'Title': title,
            'Link': link
        })

    return parsed_articles

# Function to categorize titles using spaCy
def categorize_title(title):
    doc = nlp(title)
    title_lower = title.lower()

    # Any ORG (organization) or GPE (country/city/state) entity assigns the
    # title to this category; because this check runs first, the keyword
    # checks below only apply to titles without such entities
    for ent in doc.ents:
        if ent.label_ in ['ORG', 'GPE']:
            return "Terrorism / Protest / Political Unrest / Riot"

    # Check for positive/uplifting keywords (substring match)
    positive_keywords = ["positive", "uplifting", "inspiring"]
    if any(keyword in title_lower for keyword in positive_keywords):
        return "Positive/Uplifting"

    # Check for natural disaster keywords (substring match)
    natural_disasters_keywords = ["natural disasters", "earthquake", "flood", "hurricane"]
    if any(keyword in title_lower for keyword in natural_disasters_keywords):
        return "Natural Disasters"

    # Fallback when no entity or keyword matched
    return "Others"

# Parse all RSS feeds
all_articles = []
for feed_url in rss_feeds:
    articles = parse_rss_feed(feed_url)
    all_articles.extend(articles)

# Categorize and create a DataFrame
df = pd.DataFrame(all_articles)
df['Category'] = df['Title'].apply(categorize_title)

# Export the DataFrame to an Excel file
excel_file_path = "parsed_articles.xlsx"
df.to_excel(excel_file_path, index=False)

print(f"Excel file '{excel_file_path}' created successfully.")


Excel file 'parsed_articles.xlsx' created successfully.
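
The objective also asks for the articles to be stored in a database, while the cell above writes them to an Excel file. Below is a minimal sketch of one way to persist the same rows, assuming SQLite through Python's built-in `sqlite3` module. The helper name `store_articles`, the `articles` table, its column names, and the default `articles.db` path are all hypothetical choices made for illustration.

```python
import sqlite3

def store_articles(articles, db_path="articles.db"):
    """Insert article dicts into SQLite and return the total row count.

    Each dict is expected to carry 'Title' and 'Link' keys, plus an
    optional 'Category' key (the column names used in the notebook).
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """CREATE TABLE IF NOT EXISTS articles (
                       link TEXT PRIMARY KEY,
                       title TEXT NOT NULL,
                       category TEXT
                   )"""
            )
            # The link is the primary key, so re-fetched articles are skipped
            conn.executemany(
                "INSERT OR IGNORE INTO articles (link, title, category) "
                "VALUES (?, ?, ?)",
                [(a["Link"], a["Title"], a.get("Category")) for a in articles],
            )
        return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    finally:
        conn.close()
```

With the DataFrame built above, a call such as `store_articles(df.to_dict("records"))` would write one row per article. Using the link as the primary key is a design assumption that keeps repeated runs from duplicating articles that are still present in a feed.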


In [17]:
data = pd.read_excel("C:/Users/soppoju narender/parsed_articles.xlsx")

In [18]:
data

| Unnamed: 0 | Title | Link | Category |
|---|---|---|---|
| 0 | Some on-air claims about Dominion Voting Syste... | https://www.cnn.com/business/live-news/fox-new... | Terrorism / Protest / Political Unrest / Riot |
| 1 | Dominion still has pending lawsuits against el... | https://www.cnn.com/business/live-news/fox-new... | Terrorism / Protest / Political Unrest / Riot |
| 2 | Here are the 20 specific Fox broadcasts and tw... | https://www.cnn.com/2023/04/17/media/dominion-... | Terrorism / Protest / Political Unrest / Riot |
| 3 | Judge in Fox News-Dominion defamation trial: '... | https://www.cnn.com/2023/04/18/media/fox-domin... | Terrorism / Protest / Political Unrest / Riot |
| 4 | 'Difficult to say with a straight face': Tappe... | https://www.cnn.com/videos/politics/2023/04/18... | Terrorism / Protest / Political Unrest / Riot |
| ... | ... | ... | ... |
| 168 | Crowds at India's new Ram temple day after ope... | https://www.bbc.co.uk/news/world-asia-india-68... | Terrorism / Protest / Political Unrest / Riot |
| 169 | Watch: Indian PM Modi inaugurates temple in Ay... | https://www.bbc.co.uk/news/world-asia-india-68... | Terrorism / Protest / Political Unrest / Riot |
| 170 | Watch: Plane gets jammed under bridge in India | https://www.bbc.co.uk/news/world-asia-india-67... | Terrorism / Protest / Political Unrest / Riot |
| 171 | Watch: Intruder jumps on table in Indian parli... | https://www.bbc.co.uk/news/world-asia-india-67... | Others |
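
Nearly every row above lands in the first category, because `categorize_title` assigns it to any title containing an `ORG` or `GPE` entity. Below is a minimal alternative sketch, assuming plain keyword matching for all three categories instead of entity labels. The function name `categorize_by_keywords` and the keyword stems are illustrative assumptions, not a vetted lexicon, and would need tuning against real headlines.

```python
import re

# Hypothetical keyword stems per category; checked in this order
CATEGORY_KEYWORDS = {
    "Terrorism / Protest / Political Unrest / Riot": [
        "terror", "protest", "riot", "unrest", "attack",
    ],
    "Natural Disasters": [
        "earthquake", "flood", "hurricane", "cyclone", "wildfire", "tsunami",
    ],
    "Positive/Uplifting": [
        "uplifting", "inspiring", "celebrat", "rescued",
    ],
}

def categorize_by_keywords(title):
    """Return the first category with a stem that prefixes a word in the title."""
    # Split into lowercase alphabetic words so stems match word starts only
    words = re.findall(r"[a-z]+", title.lower())
    for category, stems in CATEGORY_KEYWORDS.items():
        if any(word.startswith(stem) for word in words for stem in stems):
            return category
    return "Others"
```

Under these assumptions, a title such as "Watch: Plane gets jammed under bridge in India" falls through to "Others", since mentioning a place no longer decides the category. Matching on word prefixes rather than raw substrings lets "floods" or "protesters" match while avoiding hits in the middle of unrelated words.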


In [20]:
data.loc[20:30]

| Unnamed: 0 | Title | Link | Category |
|---|---|---|---|
| 20 | Two Russians claiming to be former Wagner comm... | https://www.cnn.com/2023/04/17/europe/wagner-c... | Terrorism / Protest / Political Unrest / Riot |
| 21 | 'My stomach is hurting from laughing': Hear pa... | https://www.cnn.com/videos/business/2023/04/18... | Terrorism / Protest / Political Unrest / Riot |
| 22 | GOP prepared to block vote to replace Feinstei... | https://www.cnn.com/2023/04/18/politics/schume... | Terrorism / Protest / Political Unrest / Riot |
| 23 | Oklahoma governor calls on officials to resign... | https://www.cnn.com/2023/04/18/us/mccurtain-co... | Terrorism / Protest / Political Unrest / Riot |
| 24 | McCarthy slams Biden in handling of US debt | https://www.cnn.com/videos/politics/2023/04/18... | Terrorism / Protest / Political Unrest / Riot |
| 25 | US warns Russia not to touch American nuclear ... | https://www.cnn.com/2023/04/18/politics/us-war... | Terrorism / Protest / Political Unrest / Riot |
| 26 | Repeated gunshots fired on live TV as ex-lawma... | https://www.cnn.com/videos/world/2023/04/18/in... | Others |
| 27 | FDA clears the way for additional bivalent boo... | https://www.cnn.com/2023/04/18/health/fda-biva... | Terrorism / Protest / Political Unrest / Riot |
| 28 | Maine authorities detained a person of interes... | https://www.cnn.com/2023/04/18/us/maine-shooti... | Terrorism / Protest / Political Unrest / Riot |
| 29 | Southwest says flights resumed after delays ca... | https://www.cnn.com/travel/article/southwest-a... | Terrorism / Protest / Political Unrest / Riot |
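
To put a number on how the labels are distributed, the sketch below tallies the share of each category. `category_shares` is a hypothetical helper; it accepts any iterable of labels, so a column such as `data['Category']` could be passed in directly.

```python
from collections import Counter

def category_shares(labels):
    """Return each label's fraction of the total, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.most_common()}

# The ten rows shown above: nine in the first category, one in "Others"
sample = ["Terrorism / Protest / Political Unrest / Riot"] * 9 + ["Others"]
print(category_shares(sample))
# {'Terrorism / Protest / Political Unrest / Riot': 0.9, 'Others': 0.1}
```

Running the same helper over the full column would show whether the skew seen in this slice holds across all rows.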
