In [1]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [2]:
# Constants

MODEL = "llama3.2"

In [4]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we have scraped
    """
    url: str
    title: str
    text: str

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
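        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body is None:
            # Some responses have no <body>; avoid calling methods on None
            self.text = ""
            return
        # Drop elements that carry no readable text before extracting it
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)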

In [5]:
# Let's try one out

espn = Website("https://espncricinfo.com")
print(espn.title)
print(espn.text)

Today's Cricket Match | Cricket Update | Cricket News | ESPNcricinfo
Matches
(
9
)
AUS vs IND
(
1
)
Super Smash
(
1
)
BPL 2024
(
2
)
BBL 2024
(
1
)
ZIM vs AFG
(
1
)
NZ vs SL
(
1
)
QEA Trophy
(
1
)
Women's Super Smash
(
1
)
Previous
RESULT
•
4th TEST
•
Melbourne
AUS
474
&
234
IND
(T:340)
369
&
155
Australia won by 184 runs
Schedule
Report
Videos
Series
Live
•
4th Match
•
Super Smash
•
T20
•
Alexandra
OTAGO
(5.3/20 ov)
19/3
CD
Otago chose to bat.
Schedule
Table
Series
3rd Match
•
BPL 2024
•
T20
•
Mirpur
Chittagong Kings
Khulna Tigers
Today
6:00 AM
Match starts in
2
hrs
6
mins
Schedule
Table
Series
17th Match
•
BBL 2024
•
T20
•
Adelaide
Adelaide Strikers
Perth Scorchers
Today
8:15 AM
Match starts in
4
hrs
21
mins
Schedule
Table
Series
RESULT
•
1st TEST
•
Bulawayo
ZIM
586
&
142/4
AFG
699
Match drawn
Schedule
Report
Series
RESULT
•
2nd T20I
•
Mount Maunganui
NZ
186/5
SL
(19.1/20 ov, T:187)
141
New Zealand won by 45 runs
Schedule
Report
Series
RESULT
•
Triangular Stage
•
QEA Trophy
•
FC
•
Ka
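
The printed text above splits counters such as `( 9 )` across several lines, and the whole page text is later passed to the model unchanged. A minimal sketch of one way to tidy it first, assuming a hypothetical helper `compact_text` and an arbitrary `max_chars` cap (neither is part of the notebook):

```python
def compact_text(text: str, max_chars: int = 4000) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces,
    then cap the result at max_chars characters.

    Hypothetical helper; the 4000-character default is an arbitrary assumption.
    """
    collapsed = " ".join(text.split())
    return collapsed[:max_chars]


sample = "Matches\n(\n9\n)\nAUS vs IND\n(\n1\n)"
print(compact_text(sample))  # Matches ( 9 ) AUS vs IND ( 1 )
```

Joining with spaces discards the line structure, which may or may not matter for a given page. The helper could be applied to `website.text` before the prompt is built.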

In [6]:
# Define our system prompt
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [7]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [13]:

# A function that builds the chat messages: the system prompt first, then the user prompt

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
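
To see the shape of the message list without fetching a page, the same structure can be built from a stand-in object. This is only a sketch: `FakePage`, the shortened prompts, and `build_messages` are illustrative assumptions, not the notebook's definitions.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Stand-in for a scraped page; only the fields the prompt needs."""
    title: str
    text: str


# Shortened system prompt, for illustration only
SYSTEM = "You are an assistant that summarizes websites. Respond in markdown."


def build_messages(page: FakePage) -> list[dict[str, str]]:
    # One system message that sets behaviour, then one user message with the content
    user = f"You are looking at a website titled {page.title}\n\n{page.text}"
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": user},
    ]


messages = build_messages(FakePage(title="Example", text="Hello world"))
for m in messages:
    print(m["role"], "->", len(m["content"]), "characters")
```

Inspecting the list this way makes it easier to check the prompt wording before any model call is made.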

In [9]:
# A function that scrapes the page and asks the local model for a summary

def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']

In [10]:
summarize("https://espncricinfo.com")

"The article appears to be a sports news section of an ESPN website, providing updates on various cricket tournaments and teams around the world. Here's a summary of the main points:\n\n**India's Test series against Australia**\n\n* India is struggling in the ongoing Test series against Australia, having lost three consecutive Tests at the MCG.\n* Rohit Sharma has scored only one single-digit score in 2024-25, the most by a Top 7 batter in a Test season.\n* Team India is trying to figure out their approach with Pant and Rohit's partnership.\n\n**India vs. Australia WTC final scenarios**\n\n* If India needs a win in Sydney to stay in contention for the World Test Championship title, they will need to win the last two Tests of the series against Australia.\n\n**New Zealand's 3-0 win over Afghanistan**\n\n* New Zealand has completed a convincing 3-0 win over Afghanistan in their Test series, with no losses at all.\n* Yashasvi Jaiswal dismissed as third umpire sees 'conclusive evidence' to

In [11]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [16]:
display_summary("https://espncricinfo.com")

This appears to be a sports news and information website, ESPNcricinfo, which provides coverage of cricket and other sports from around the world. The text includes various articles, stats, and updates on recent matches, tournaments, and player performances.

Here are some key points that can be gathered from this page:

* Cricket is a popular sport in India, with many top players like Rohit Sharma, Yashasvi Jaiswal, and Nitish Kumar Reddy.
* The website provides information on various international cricket tournaments, including the ICC Champions Trophy, BBL, SA20, and World Test Championship.
* There are also updates on recent matches, including test series between New Zealand and Sri Lanka, Australia and India, and South Africa and Pakistan.
* Player performances are highlighted, with statistics on runs scored, wickets taken, and other key metrics.
* The website also includes news articles, betting odds, and expert analysis from cricket commentators.

Overall, ESPNcricinfo appears to be a comprehensive online resource for cricket enthusiasts, providing up-to-date information on the latest matches, tournaments, and player performances.

In [15]:
display_summary("https://nfl.com")


**Summary of NFL.com**
==========================

The website NFL.com is the official site of the National Football League. It appears to be a comprehensive resource for football news, scores, standings, and team information.

**News and Announcements**
---------------------------

* **Browser Warning**: The website displays a warning that the current browser is no longer supported and recommends using the latest versions of Chrome, Edge (v80 or later), Firefox, or Safari for an optimal viewing experience.
* **Latest News**: Although not explicitly stated, news articles and announcements are likely to be featured on this site, covering recent football events, team updates, and league news. However, the provided content does not include any specific news articles or announcements.
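
The browser warning in the summary above suggests the page returned to the scraper was not the one a regular browser would receive. One possible cause, which the notebook does not confirm, is the client's default User-Agent header. Below is a sketch of sending browser-like headers, written with the standard library's `urllib.request` rather than `requests` so it has no extra dependencies; the header string and the helper names are assumptions.

```python
import urllib.request

# Assumed browser-like header value; any current browser string could be substituted
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw page bytes (performs a network call when invoked)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


req = build_request("https://nfl.com")
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```

With `requests`, the same dictionary can be passed as `requests.get(url, headers=BROWSER_HEADERS)`. Whether this changes what nfl.com returns is untested here.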