In [None]:
!pip install feedparser psycopg2

In [21]:
import feedparser
import psycopg2
categories = {
    'Terrorism / protest / political unrest / riot': ['terrorism', 'protest', 'riot', 'political unrest', 'election', 'ruling', 'threat'],
    'Positive/Uplifting': ['hope', 'achievement', 'good', 'uplifting', 'positive', 'happy', 'motivated', 'medal', 'success'],
    'Natural Disasters': ['earthquake', 'flood', 'hurricane', 'natural disaster', 'tsunami', 'asteroid', 'blackhole', 'glacier'],
}

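def classify_article(text):
    """Return the first category with a keyword that occurs in the text, else 'Others'."""
    # Keywords are matched as case-insensitive substrings, in dictionary order.
    text = text.lower()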
    for category, keywords in categories.items():
        for keyword in keywords:
            if keyword.lower() in text:
                return category
    return 'Others'

feed_urls = [
    'http://rss.cnn.com/rss/cnn_topstories.rss',
    'http://qz.com/feed',
    'http://feeds.foxnews.com/foxnews/politics',
    'http://feeds.reuters.com/reuters/businessNews',
    'http://feeds.feedburner.com/NewshourWorld',
    'https://feeds.bbci.co.uk/news/world/asia/india/rss.xml'
]

conn = psycopg2.connect(
    user="postgres", password="divya", host="localhost", port="5432", database="rss"
)
cur = conn.cursor()

for j, feed_url in enumerate(feed_urls):
    print(f"Parsing feed: {feed_url}")
    feed = feedparser.parse(feed_url)
    
    for i, entry in enumerate(feed.entries):
        # Entries are dict-like, so .get() supplies a default for missing fields.
        title = entry.get('title', 'No title')
        link = entry.get('link', 'No link')
        published = entry.get('published', 'No date')
        # Named article_id to avoid shadowing the built-in id().
        article_id = '{0}_{1}'.format(j, i)

        # Classify on the title plus the summary, when one is present.
        category = classify_article(title + ' ' + entry.get('summary', ''))

        try:
            cur.execute(
                """
                INSERT INTO articles (id, "Title", "Links", "Published", "category")
                VALUES (%s, %s, %s, %s, %s)
                ON CONFLICT ("Links") DO UPDATE SET "category" = EXCLUDED."category";
                """,
                (article_id, title, link, published, category)
            )
            conn.commit()
            print(f"Inserted article: {title} with category: {category}")

        except Exception as e:
            # Roll back so the failed transaction does not block later inserts.
            conn.rollback()
            print(f"Error inserting article: {title}. Error: {e}")


cur.close()
conn.close()


Parsing feed: http://rss.cnn.com/rss/cnn_topstories.rss
Inserted article: Some on-air claims about Dominion Voting Systems were false, Fox News acknowledges in statement after deal is announced with category: Others
Inserted article: Dominion still has pending lawsuits against election deniers such as Rudy Giuliani and Sidney Powell with category: Terrorism / protest / political unrest / riot
Inserted article: Here are the 20 specific Fox broadcasts and tweets Dominion says were defamatory with category: Terrorism / protest / political unrest / riot
Inserted article: Judge in Fox News-Dominion defamation trial: 'The parties have resolved their case' with category: Others
Inserted article: 'Difficult to say with a straight face': Tapper reacts to Fox News' statement on settlement with category: Others
Inserted article: Millions in the US could face massive consequences unless McCarthy can navigate out of a debt trap he set for Biden with category: Others
Inserted article: White homeowne

### Getting Started

The script starts by importing two libraries: `feedparser` and `psycopg2`. `feedparser` downloads and parses the RSS feeds, so pulling articles out of them takes a single call. `psycopg2` is the PostgreSQL driver we use to save those articles to a database.

### Setting Up Categories

Next, we define a dictionary called `categories` that maps each category name to a list of keywords. We’ve got three categories: "Terrorism / protest / political unrest / riot," "Positive/Uplifting," and "Natural Disasters." The keywords are what decide where each article fits.

### Classifying Articles

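Now for the classification itself. The `classify_article` function takes a piece of text (in this script, the article title plus its summary), lowercases it, and checks it against the keyword lists in dictionary order. As soon as a keyword appears anywhere in the text, the function returns that keyword's category, so the first matching category wins. If nothing matches, it returns "Others." Because the check is a plain substring test, a keyword like `good` also matches inside longer words such as "goods."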
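If the substring behaviour turns out to be too loose, one option is to match whole words only. The sketch below is an illustration, not the script's method: `CATEGORIES` is a cut-down, hypothetical keyword table, and `build_patterns` and `classify_whole_words` are names made up for this example.

```python
import re

# Hypothetical, cut-down keyword table used only for this illustration.
CATEGORIES = {
    'Positive/Uplifting': ['good', 'medal'],
    'Natural Disasters': ['flood', 'natural disaster'],
}

def build_patterns(categories):
    """Compile one case-insensitive, whole-word regex per category."""
    patterns = {}
    for category, keywords in categories.items():
        alternatives = '|'.join(re.escape(keyword) for keyword in keywords)
        patterns[category] = re.compile(r'\b(?:' + alternatives + r')\b', re.IGNORECASE)
    return patterns

PATTERNS = build_patterns(CATEGORIES)

def classify_whole_words(text):
    """Return the first category whose pattern matches a whole word in text."""
    for category, pattern in PATTERNS.items():
        if pattern.search(text):
            return category
    return 'Others'
```

The trade-off is that whole-word matching no longer catches plurals such as "floods," so those would need to be listed as keywords in their own right.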

### Connecting to the Database

With the categories in place, the script connects to a local PostgreSQL database named `rss`. The username, password, host, and port are hard-coded in the call to `psycopg2.connect`. We need this connection because every article ends up in that database.
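Hard-coded credentials are easy to leak when a notebook is shared, so a common alternative is to read them from environment variables. Here is a minimal sketch, assuming the variables follow the usual PostgreSQL names (`PGUSER`, `PGPASSWORD`, and so on); `connection_params` is a hypothetical helper, and the fallback values simply mirror the ones in the script.

```python
import os

def connection_params(env=None):
    """Build keyword arguments for psycopg2.connect from environment variables."""
    env = os.environ if env is None else env
    return {
        'user': env.get('PGUSER', 'postgres'),
        'password': env.get('PGPASSWORD', ''),
        'host': env.get('PGHOST', 'localhost'),
        'port': env.get('PGPORT', '5432'),
        'dbname': env.get('PGDATABASE', 'rss'),
    }

# Usage (needs psycopg2 and a running server):
# conn = psycopg2.connect(**connection_params())
```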

### Parsing RSS Feeds

The script keeps a list of six RSS feed URLs to pull articles from. It goes through each URL and parses the feed with `feedparser.parse`. For every entry (that is, every article) in a feed, it extracts the title, link, and published date. If one of these is missing, it falls back to a placeholder such as "No title."
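The fallback logic can be pulled into a small helper so it is easy to check on its own. This is a sketch rather than part of the script: `extract_fields` is a hypothetical name, and plain dictionaries stand in for feed entries here, which works because `feedparser` entries are dict-like and support `.get()`.

```python
def extract_fields(entry):
    """Return title, link, and published date, using the script's placeholders."""
    return {
        'title': entry.get('title', 'No title'),
        'link': entry.get('link', 'No link'),
        'published': entry.get('published', 'No date'),
    }

# A complete entry and one that is missing its date.
complete = extract_fields({'title': 'A', 'link': 'http://example.com/a', 'published': 'Mon'})
partial = extract_fields({'title': 'B', 'link': 'http://example.com/b'})
```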

### Getting Unique IDs and Classifying

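To give each article an ID, the script joins the feed index and the entry index with an underscore, so the first entry of the first feed gets `0_0`. Keep in mind that this ID reflects the article's position in the feed, not the article itself, so the same article can get a different ID on a later run. The script then classifies the article with the function defined earlier, using the title and summary as input.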
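If an ID that stays the same between runs is needed, one possibility is to derive it from the article link. The sketch below is an assumption on our part, not the scheme the script uses; `stable_article_id` is a hypothetical helper, and the 16-character length is arbitrary.

```python
import hashlib

def stable_article_id(link):
    """Derive a short, repeatable ID from the article link."""
    digest = hashlib.sha256(link.encode('utf-8')).hexdigest()
    return digest[:16]

first = stable_article_id('http://example.com/a')
again = stable_article_id('http://example.com/a')
other = stable_article_id('http://example.com/b')
```

The same link always produces the same ID, which is not true of the position-based scheme.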

### Storing in the Database

Next, the script inserts the article into the `articles` table. The SQL `INSERT` statement carries the article's ID, title, link, published date, and the category we just assigned. If there’s a conflict on `"Links"`, meaning a row with the same link already exists, the `ON CONFLICT` clause updates only that row's category instead of creating a duplicate. For this to work, the `"Links"` column needs a unique constraint or unique index.
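The upsert behaviour is easy to try out without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` as a stand-in, which accepts the same `ON CONFLICT ... DO UPDATE` form (SQLite 3.24 or newer). The table layout, the lowercase column names, and the `?` placeholders are assumptions for this demo; with `psycopg2` the placeholders are `%s`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# The UNIQUE constraint on link is what makes ON CONFLICT (link) valid.
conn.execute(
    'CREATE TABLE articles '
    '(id TEXT, title TEXT, link TEXT UNIQUE, published TEXT, category TEXT)'
)

UPSERT = """
    INSERT INTO articles (id, title, link, published, category)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT (link) DO UPDATE SET category = excluded.category
"""

# Same link twice: the second call updates the category of the first row.
conn.execute(UPSERT, ('0_0', 'Flood hits town', 'http://example.com/a', 'No date', 'Others'))
conn.execute(UPSERT, ('0_5', 'Flood hits town', 'http://example.com/a', 'No date', 'Natural Disasters'))
conn.commit()

rows = conn.execute('SELECT id, category FROM articles').fetchall()
conn.close()
```

After both calls the table holds a single row: it keeps the original ID `0_0`, and only its category has changed.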

### Committing Changes and Handling Errors

After each successful insertion or update, the script commits the change, so every article is saved as soon as it is processed. The insert is wrapped in a `try`/`except` block. If something goes wrong, the script prints an error message with the article title and moves on to the next entry.
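One detail worth knowing: in PostgreSQL, a failed statement leaves the current transaction in an aborted state, and later statements on that connection fail until it is rolled back. A common pattern is to commit on success and roll back on failure. The sketch below shows that pattern with `sqlite3` standing in for the real database; `save_row` is a hypothetical helper, and with `psycopg2` the exception to catch would be `psycopg2.Error`.

```python
import sqlite3

def save_row(conn, sql, params):
    """Run one statement; commit on success, roll back and report on failure."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        conn.commit()
        return True
    except sqlite3.Error as exc:
        conn.rollback()
        print(f"Error saving row: {exc}")
        return False
    finally:
        cur.close()

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE articles (link TEXT UNIQUE NOT NULL)')

INSERT = 'INSERT INTO articles (link) VALUES (?)'
first = save_row(conn, INSERT, ('http://example.com/a',))
duplicate = save_row(conn, INSERT, ('http://example.com/a',))  # violates UNIQUE
later = save_row(conn, INSERT, ('http://example.com/b',))      # still succeeds

count = conn.execute('SELECT COUNT(*) FROM articles').fetchone()[0]
conn.close()
```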

### Wrapping Up

Finally, once every feed has been processed, the script closes the cursor and then the database connection.


