# Web Data Extraction and Summarization Using OpenAI's Latest Model, gpt-5-mini

#### Import Libraries

In [1]:
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI 
from scrape_website import ScrapeWebsite

#### Load API key

In [2]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')
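
A missing or malformed key usually only surfaces later as an authentication error from the API call. The check below is a minimal sketch of an early diagnostic; `describe_api_key` is a hypothetical helper that is not part of the original notebook, and it never prints the key itself.

```python
import os


def describe_api_key(key):
    """Return a short diagnostic for an API key string without revealing it."""
    if not key:
        return "No API key found; check that .env defines OPENAI_API_KEY."
    if key != key.strip():
        return "API key has leading or trailing whitespace; remove it from .env."
    return f"API key loaded ({len(key)} characters)."


# Assumes load_dotenv() has already populated the environment.
print(describe_api_key(os.getenv("OPENAI_API_KEY")))
```

Running this right after `load_dotenv` makes configuration problems visible before any request is sent.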

#### Request headers for ScrapeWebsite (BeautifulSoup)

In [3]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}
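
The `scrape_website` module is not shown in this notebook. The rest of the code only relies on a `ScrapeWebsite(url, headers)` object exposing `.title` and `.text`. The sketch below illustrates one way such an object could work. It is an assumption, not the author's implementation: the class name `SketchScrapeWebsite`, the `html` parameter, and the list of skipped tags are all hypothetical, and it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra installs.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Tags whose contents are never shown to a reader (assumed list).
SKIPPED_TAGS = {"script", "style", "noscript", "template"}


class _TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML document."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside a skipped tag
        stripped = data.strip()
        if not stripped:
            return
        if self._in_title:
            self.title_parts.append(stripped)
        else:
            self.text_parts.append(stripped)


class SketchScrapeWebsite:
    """Hypothetical stand-in exposing .title and .text like ScrapeWebsite."""

    def __init__(self, url, headers, html=None):
        self.url = url
        if html is None:
            # Fetch the page, sending the caller's headers (e.g. User-Agent).
            with urlopen(Request(url, headers=headers), timeout=30) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        parser = _TextExtractor()
        parser.feed(html)
        parser.close()
        self.title = " ".join(parser.title_parts) or "No title found"
        self.text = "\n".join(parser.text_parts)
```

Passing `html=` directly skips the network request, which is convenient for trying the parser on a saved page. Navigation text is not removed here; the system prompt asks the model to ignore it.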

#### System Prompt

In [4]:
system_prompt = (
    "You are an analyst who analyses the content of a website and provides "
    "a summary, ignoring text related to navigation. Respond in Markdown."
)

#### User Prompt

In [5]:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}.\n"
    user_prompt += (
        "The contents of this website are as follows. Please provide a short "
        "summary in Markdown, including any news and announcements.\n\n"
    )
    # The blank line above keeps the instructions separate from the page text.
    user_prompt += website.text
    return user_prompt

#### Format messages in the OpenAI chat format

In [6]:
def messages_for(website):
    return [
        {"role":"system", "content": system_prompt},
        {"role":"user", "content": user_prompt_for(website)}
    ]

#### Summarise the website content using OpenAI's gpt-5-mini model

In [7]:
def summarise(url):
    website = ScrapeWebsite(url, headers)
    # Pass the key loaded from .env explicitly instead of leaving api_key unused.
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=messages_for(website),
    )
    return response.choices[0].message.content
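
`summarise` sends whatever the scraper returns, even when the page yielded no text (for example a page rendered entirely by JavaScript). The sketch below shows one possible guard. `summarise_text` and the fake client classes are hypothetical names introduced for illustration; the function takes the client as a parameter so it can be tried without network access or an API key.

```python
from types import SimpleNamespace


def summarise_text(client, messages, page_text, model="gpt-5-mini"):
    """Return the model's summary, or a notice when there is nothing to send."""
    if not page_text.strip():
        # Skip the API call entirely when the scraper found no text.
        return "No readable text was extracted from the page."
    response = client.chat.completions.create(model=model, messages=messages)
    # message.content is optional in the response type, so fall back to "".
    return response.choices[0].message.content or ""


class FakeCompletions:
    """Test double that mimics the shape of a chat completion response."""

    def __init__(self):
        self.calls = 0

    def create(self, model, messages):
        self.calls += 1
        message = SimpleNamespace(content="stub summary")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
print(summarise_text(fake_client, [], "Some page text"))
print(summarise_text(fake_client, [], "   "))
```

With a real client, the call would look like `summarise_text(OpenAI(), messages_for(website), website.text)`.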

#### Show summary as Markdown

In [8]:
def display_summary(url):
    summary = summarise(url)
    display(Markdown(summary))

#### Output

In [9]:
display_summary("https://www.firstpost.com/world/united-states/")

# Summary — United States Of America | Firstpost (Live/Latest)

Site focus: Live updates and rundowns of US and world news with emphasis on politics, justice, economy, national security, and breaking incidents. Coverage mixes headlines, investigations, opinion and special features/web stories.

## Major news (headlines)
- Police shooting near CDC/Emory in Atlanta: a suspected shooter and a police officer were killed after reports of an active shooter near the CDC and Emory University campuses.  
- Death of astronaut Jim Lovell (97): Apollo 13 commander and former Navy pilot died in a Chicago suburb.  
- Stephen Miran named to Fed Board (short-term): Trump appointed economist Stephen Miran to the Federal Reserve Board through Jan 2026; noted for support of tariffs and rate cuts.  
- Trump fires labour statistics chief: President Trump sacked the official overseeing labor data hours after a weak jobs report.  
- House panel subpoenas Clintons over Epstein: congressional subpoenas seek documents in relation to Jeffrey Epstein amid pressure on the administration over Epstein files.  
- Ghislaine Maxwell moved to lower-security prison in Texas amid scrutiny of Epstein files and government handling.  
- FBI/administration tension on Epstein Files: Trump said he would “release everything” after reports the FBI redacted names from the Epstein Files.  
- Probe launched into attorney who investigated Trump cases: US officials began a probe targeting Special Counsel Jack Smith.  
- NTSB finds technical issues in Army helicopter crash: investigation into crash that killed 67 people identified technical problems.  
- Trump unveils modified reciprocal tariffs: new executive order introduced modified tariffs on multiple countries; effective date possibly as late as Oct 5.  
- Trump-EU trade deal announced: reported pact imposing a 15% tariff on most EU goods, with large energy and investment components but unresolved issues remain.  
- Federal Reserve holds rates steady: Fed kept rates unchanged for a fifth meeting, despite political pressure from Trump.  
- White House remodel plan: Trump pushing to build a reported $200 million ballroom at the presidential residence, funded by Trump/donors per WH.  
- US citizenship test format under review: Trump administration considers reverting to the 2020 naturalisation test format, citing concerns the current test is too easy.  
- American Airlines incident in Denver: passengers evacuated after a Boeing plane caught fire (tire/maintenance issue) before takeoff.  
- John Bolton criticizes Tulsi Gabbard: former NSA lambastes Gabbard’s report on Obama as exaggerated and lacking substance.  
- Ohio solicitor general Mathura Sridharan trolled: Indian-origin jurist faced racist online backlash after appointment; Ohio AG responded strongly.

## Announcements, features & recurring elements
- Web stories and quick-read lists: travel/animals/safety themed pieces (e.g., “10 airport codes”, “10 animals that are naturally blue”, World Tiger Day lists).  
- Regular sections and shows highlighted in coverage: Firstpost America, Firstpost Africa, First Sports, Vantage, Fast and Factual, Between The Lines, Flashback, Live TV.  
- Events and special coverage teased: Raisina Dialogue, Champions Trophy, Delhi Elections 2025, Budget 2025, US Elections 2024, Firstpost Defence Summit.  
- Trending topics emphasized: Donald Trump, Narendra Modi, Elon Musk, United States, Joe Biden.  
- Quick-links / network: cross-promotion of other Network18 properties (News18, Moneycontrol, CNBC TV18, Forbes India).

## Tone and emphasis
- Heavy focus on US politics, Trump administration actions and controversies (Epstein Files, tariffs, personnel changes), justice probes, national security incidents, and major breaking events.
- Mix of investigative/legal reporting, immediate breaking news, and light/web-story listicles.

If you want, I can produce a one-page brief of just the Trump-related items, a timeline of the Epstein/Clinton/Subpoena coverage, or extract all headlines with publication order.