# Web Scraping and AI-Based News Summarizer

In [9]:
import os
from dotenv import load_dotenv
from groq import Groq
from bs4 import BeautifulSoup
import requests
from IPython.display import display, Markdown

In [10]:
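def getWebContent(url):
    """Fetches and extracts article titles and descriptions from a given news website."""

    # A timeout keeps the cell from hanging; raise_for_status surfaces HTTP errors early.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    articles = []
    for card in soup.find_all('article'):
        # Skip cards without a headline; a missing <p> yields an empty description.
        if card.h2 is None:
            continue
        articles.append({
            "title": card.h2.text.strip(),
            "description": card.p.text.strip() if card.p else ""
        })
    return articles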
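
A homepage may render the same story in more than one `<article>` card. That is an assumption about the target page, not something this notebook verifies. A minimal sketch of dropping repeated titles while preserving order, where `dedupe_articles` is a hypothetical helper and not part of the original code:

```python
def dedupe_articles(articles):
    """Return articles with repeated titles removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for article in articles:
        # casefold() makes the comparison case-insensitive
        key = article["title"].casefold()
        if key in seen:
            continue
        seen.add(key)
        unique.append(article)
    return unique


cards = [
    {"title": "Headline one", "description": "First summary."},
    {"title": "headline ONE", "description": "Repeated card."},
    {"title": "Headline two", "description": ""},
]
print(len(dedupe_articles(cards)))
```

If duplicates turn out to be common, the result of `getWebContent` could be passed through this helper before it is sent to the model.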

In [11]:
def messages(userPrompt, systemPrompt, webContent):
    """Formats prompts into the structure required by the Groq Chat Completion API."""
    
    return [
        {
            "role": "system",
            "content": systemPrompt
        },
        {
            "role": "user",
            "content": userPrompt + f" {webContent}"
        }
    ]
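
The `messages` helper interpolates `webContent` with an f-string, so the model receives the Python `repr` of a list of dicts. A minimal sketch of one alternative, assuming numbered plain-text lines and a cap on total length are acceptable; `format_articles` and `max_chars` are hypothetical names introduced for illustration:

```python
def format_articles(articles, max_chars=6000):
    """Render scraped articles as numbered lines, stopping before max_chars is exceeded."""
    lines = []
    used = 0
    for index, article in enumerate(articles, start=1):
        title = article.get("title", "").strip()
        description = article.get("description", "").strip()
        if not title:
            continue  # nothing useful to show the model
        line = f"{index}. {title}"
        if description:
            line += f": {description}"
        # +1 accounts for the newline that joins this line to the previous ones
        cost = len(line) + (1 if lines else 0)
        if used + cost > max_chars:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


sample = [
    {"title": "Headline one", "description": "First summary."},
    {"title": "Headline two", "description": ""},
]
print(format_articles(sample))
```

The returned string could be passed as `webContent` in place of the raw list. The character cap is a rough stand-in for a token budget, not an exact one.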

In [12]:
load_dotenv(override=True)
api_key = os.getenv('GROQ_API_KEY')

client = Groq(api_key=api_key)
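
`os.getenv` returns `None` when the variable is missing. A minimal sketch of an explicit check that fails with a message naming the missing variable, assuming a hard failure is preferable in a notebook; `require_env` is a hypothetical helper and `EXAMPLE_API_KEY` is a placeholder name:

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is missing or blank."""
    value = os.getenv(name)
    if value is None or not value.strip():
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file.")
    return value


# Demonstration with a placeholder variable so no real key is needed.
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))
```

In this notebook the call would be `require_env('GROQ_API_KEY')`, placed after `load_dotenv`.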

In [13]:
systemPrompt = """
You are a news reading assistant that analyzes articles
and provides a short summary. Respond in markdown. Do not wrap the
markdown in a code block; respond with markdown only.
"""

userPrompt = """
Here are the contents of a news page. Provide a short summary
of the articles on this website. If it includes any hot topics
or noteworthy announcements, summarize them too.
"""

In [14]:
def summarize():
    """Scrapes live news data, sends it to the AI model, and returns the summarized response."""

    webContent = getWebContent("https://www.indiatoday.in")
    response = client.chat.completions.create(
        model='llama-3.3-70b-versatile',
        messages=messages(userPrompt, systemPrompt, webContent)
    )
    return response.choices[0].message.content

In [15]:
def display_summary():
    """Displays the AI-generated summary in Markdown format inside a Jupyter Notebook."""
    summary = summarize()
    display(Markdown(summary))

In [16]:
display_summary()

## News Summary
The news articles cover a wide range of topics, including international relations, politics, and current events. 
### Key Highlights
* The US is ready to allow India to buy Venezuelan oil under a US-controlled system, which could potentially reopen oil supplies to India.
* There are ongoing protests in Iran, with the Supreme Leader, Ayatollah Ali Khamenei, slamming former US President Donald Trump and warning against foreign interference.
* Russia has launched a hypersonic missile attack on Ukraine, killing at least four people and escalating tensions with NATO allies.
* In India, there are concerns about air pollution, with 190 cities exceeding the annual PM10 safe limit and 103 cities exceeding the annual PM2.5 standard.
* The Indian government is committed to pursuing a mutually beneficial trade agreement with the US, despite recent remarks from a Trump aide suggesting that Prime Minister Modi did not call for a trade deal.
### Hot Topics
* The situation in Iran, with ongoing protests and tensions with Western countries.
* The conflict between Russia and Ukraine, with the use of hypersonic missiles and escalating tensions with NATO allies.
* The potential for a US-India trade deal, with the US ready to allow India to buy Venezuelan oil under a US-controlled system.
* The issue of air pollution in India, with many cities exceeding safe limits for PM10 and PM2.5.
### Worthy Announcements
* The US Department of Homeland Security has released a video of a thrilling operation to storm an oil tanker in the Caribbean Sea.
* The teaser trailer for the film "Maa Inti Bangaaram" has been released, starring Samantha Ruth Prabhu.
* The Royal Challengers Bengaluru have won the opening game of the Women's Premier League (WPL) against the Mumbai Indians.
* The young grandmaster Nihal Sarin has won the Tata Steel Chess Rapid title, overcoming grief after the death of his grandfather.