## Web reader and content summarise

##### This notebook demonstrates how to read content from the web and generate concise summaries, making it easier to extract key information from online sources. 


In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI


In [3]:
# Check the key

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')


if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [None]:
openai = OpenAI()


In [5]:
message = "Hello, GPT! This is my first ever message to you! Hi!"
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hello! Welcome! I'm glad to have you here. How can I assist you today?


In [7]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)



ed = Website("https://www.bikedekho.com/best-scooters?utm_campaign=BD-DSA-Generic-Homepage-Traffic&utm_device=c&utm_term=&network=g&utm_medium=cpc&utm_source=google&agid=159291388033&ap=&aoi=&ci=19731724966&cre=688030077244&fid=&lop=9061770&ma=&mo=&network=g&pl=&ti=dsa-379266056067&gad_source=1&gad_campaignid=19731724966&gbraid=0AAAAADmoTqaNSnpr2QtLae0uINQOcCtU5&gclid=EAIaIQobChMIz8mihMeCjgMVAKVmAh2NdxV4EAAYASAAEgJKFvD_BwE")
print(ed.title)
print(ed.text)

Best Scooters/Scooty In India 2025 - Top 10 Scooters | BikeDekho
English
Login / Register
Bikes
New Bikes
Best Bikes
Hero Splendor Plus
Royal Enfield Hunter 350
Royal Enfield Classic 350
Royal Enfield Continental GT 650
Yamaha MT 15 V2
Honda SP 125
All Best Bikes
Upcoming Bikes
New Bike Launches
Compare Bikes
Popular Brands
Honda Bikes
Royal Enfield Bikes
TVS Bikes
Yamaha Bikes
Hero Bikes
Bike Offers
Showrooms
Scooters
New Scooters
Best Scooters
Honda Activa 6G
Suzuki Access 125
TVS Jupiter
TVS NTORQ 125
Bajaj Chetak
All Best Scooters
Upcoming Scooters
New Scooter Launches
Electric Zone
Electric Bikes
Electric Scooters
Bajaj Chetak
Yulu Wynn
TVS iQube
All Electric Scooters
Electric Cycles
Electric Bike Charging Stations
Bike Finance
Finance Discount Offers
EMI Calculator
Used Bikes
Search Used Bikes
Sell Bike
News & Videos
News & Top Stories
Videos
Web Stories
Select City
What's trending? Check out the most sought after scooters
Ad
Best Scooters In India
There are 15 scooty available i

In [48]:
def fetch_website_content(url):
    try:
        response = requests.get(url, timeout=10, headers=headers)
        soup = BeautifulSoup(response.text, "html.parser")
        title = soup.title.string if soup.title else "No title found"
        text = soup.get_text()
        return {"title": title.strip(), "text": text.strip()}
    except Exception as e:
        return {"title": "Error", "text": f"Could not fetch website content: {str(e)}"}
    


fetch_website_content("https://www.bikedekho.com/best-scooters?utm_campaign=BD-DSA-Generic-Homepage-Traffic&utm_device=c&utm_term=&network=g&utm_medium=cpc&utm_source=google&agid=159291388033&ap=&aoi=&ci=19731724966&cre=688030077244&fid=&lop=9061770&ma=&mo=&network=g&pl=&ti=dsa-379266056067&gad_source=1&gad_campaignid=19731724966&gbraid=0AAAAADmoTqaNSnpr2QtLae0uINQOcCtU5&gclid=EAIaIQobChMIz8mihMeCjgMVAKVmAh2NdxV4EAAYASAAEgJKFvD_BwE")

    


{'title': 'Best Scooters/Scooty In India 2025 - Top 10 Scooters | BikeDekho',
 'text': "Best Scooters/Scooty In India 2025 - Top 10 Scooters | BikeDekho                                  EnglishLogin / RegisterBikes New BikesBest Bikes Hero Splendor PlusRoyal Enfield Hunter 350Royal Enfield Classic 350Royal Enfield Continental GT 650Yamaha MT 15 V2Honda SP 125All Best BikesUpcoming BikesNew Bike LaunchesCompare BikesPopular Brands Honda BikesRoyal Enfield BikesTVS BikesYamaha BikesHero BikesBike OffersShowroomsScooters New ScootersBest Scooters Honda Activa 6GSuzuki Access 125TVS JupiterTVS NTORQ 125Bajaj ChetakAll Best ScootersUpcoming ScootersNew Scooter LaunchesElectric Zone Electric BikesElectric Scooters Bajaj ChetakYulu WynnTVS iQubeAll Electric ScootersElectric CyclesElectric Bike Charging StationsBike Finance Finance Discount OffersEMI CalculatorUsed Bikes Search Used BikesSell BikeNews & Videos News & Top StoriesVideosWeb StoriesSelect CityWhat's trending? Check out the most so

In [39]:
# Define our system prompt 

system_prompt = """ You are an assistant that analyzes the contents of a website.
Aim is to provides a short summary and suggest top three  questions that a user might ask about the website and answer them based on the content.
Respond in markdown.
"""


In [33]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [14]:
print(user_prompt_for(ed))

You are looking at a website titled Best Scooters/Scooty In India 2025 - Top 10 Scooters | BikeDekho
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

English
Login / Register
Bikes
New Bikes
Best Bikes
Hero Splendor Plus
Royal Enfield Hunter 350
Royal Enfield Classic 350
Royal Enfield Continental GT 650
Yamaha MT 15 V2
Honda SP 125
All Best Bikes
Upcoming Bikes
New Bike Launches
Compare Bikes
Popular Brands
Honda Bikes
Royal Enfield Bikes
TVS Bikes
Yamaha Bikes
Hero Bikes
Bike Offers
Showrooms
Scooters
New Scooters
Best Scooters
Honda Activa 6G
Suzuki Access 125
TVS Jupiter
TVS NTORQ 125
Bajaj Chetak
All Best Scooters
Upcoming Scooters
New Scooter Launches
Electric Zone
Electric Bikes
Electric Scooters
Bajaj Chetak
Yulu Wynn
TVS iQube
All Electric Scooters
Electric Cycles
Electric Bike Charging Stations
Bike Finance
Finance Discount Offers
EMI Calculator
Used Bikes
Se

In [30]:
messages = [
    {"role": "system", "content": "You are a cooperative assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

In [31]:
response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

2 + 2 equals 4.


In [43]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)},
        {"role": "assistant", "content": "Sure, here's the top three questions answered from the website content..."}

    ]



messages_for(ed)

[{'role': 'system',
  'content': ' You are an assistant that analyzes the contents of a website.\nAim is to provides a short summary and suggest top three  questions that a user might ask about the website and answer them based on the content.\nRespond in markdown.\n'},
 {'role': 'user',
  'content': "You are looking at a website titled Best Scooters/Scooty In India 2025 - Top 10 Scooters | BikeDekho\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nEnglish\nLogin / Register\nBikes\nNew Bikes\nBest Bikes\nHero Splendor Plus\nRoyal Enfield Hunter 350\nRoyal Enfield Classic 350\nRoyal Enfield Continental GT 650\nYamaha MT 15 V2\nHonda SP 125\nAll Best Bikes\nUpcoming Bikes\nNew Bike Launches\nCompare Bikes\nPopular Brands\nHonda Bikes\nRoyal Enfield Bikes\nTVS Bikes\nYamaha Bikes\nHero Bikes\nBike Offers\nShowrooms\nScooters\nNew Scooters\nBest Scooters\nHonda Activa 

In [44]:
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content

In [45]:
print(summarize("https://www.bikedekho.com/best-scooters?utm_campaign=BD-DSA-Generic-Homepage-Traffic&utm_device=c&utm_term=&network=g&utm_medium=cpc&utm_source=google&agid=159291388033&ap=&aoi=&ci=19731724966&cre=688030077244&fid=&lop=9061770&ma=&mo=&network=g&pl=&ti=dsa-379266056067&gad_source=1&gad_campaignid=19731724966&gbraid=0AAAAADmoTqaNSnpr2QtLae0uINQOcCtU5&gclid=EAIaIQobChMIz8mihMeCjgMVAKVmAh2NdxV4EAAYASAAEgJKFvD_BwE"))

## Summary of Best Scooters/Scooty In India 2025 - Top 10 Scooters | BikeDekho

The website provides a comprehensive guide to the best scooters available in India for 2025, featuring top models along with their prices, specifications, and features. Popular models highlighted include the **Honda Activa 6G**, **Suzuki Access 125**, **TVS Jupiter**, **TVS NTORQ 125**, and **Bajaj Chetak**. Users can explore various scooters categorized by price range, specifications, and usage scenarios. The site also offers comparisons between different models, upcoming scooter launches, and articles related to the latest news in the two-wheeler domain.

### Key Highlights:
- **Top Models**: Detailed lists of popular scooters with ex-showroom prices and specifications.
- **Comparisons**: Options to compare different scooty models based on user interests and requirements.
- **Upcoming Launches**: Information about anticipated scooter launches in 2025, including electric versions.
- **News Section**: Updat

In [46]:
print(summarize("https://github.com/francescodisalvo05/nlp-financial-summarization-rl"))

# Summary of the GitHub Repository

The GitHub repository `francescodisalvo05/nlp-financial-summarization-rl` contains code related to the research paper titled **"Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting"** by Chen and Bansal, presented at ACL 2018. The main focus of this project is on hybrid neural text summarization using reinforcement learning techniques, specifically for financial documents. It includes implementations for an extractor that identifies salient information and an abstractor that paraphrases that information. The repository provides details on setup, training processes, and performance evaluation metrics like **Rouge-L** and **BERTScore**.

## Key Features
- Hybrid approach to text summarization through reinforcement learning.
- Source code for preprocessing, training extractors, and inference evaluation.
- Uses two datasets: Financial Narrative Summarization and CNN/Daily Mail.
- Provides a comprehensive README for installation and i