To run this notebook, you need a local Ollama server running, with the model you plan to use already pulled.

In [None]:
%pip install requests beautifulsoup4 ipython ollama

In [2]:
# imports
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [5]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

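class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of summarizing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Drop elements that carry no readable text for the summary
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""  # some pages (or error responses) have no <body>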
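The `Website` class mixes fetching and parsing, which makes the cleaning step hard to try offline. As a rough, dependency-free illustration of what ends up in `self.text`, the sketch below swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser`. `TextExtractor` and `extract` are hypothetical names, and this only approximates `get_text(separator="\n", strip=True)`; it does not repair malformed markup the way BeautifulSoup does.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the <title> and visible <body> text, skipping script/style content."""

    SKIP = {"script", "style"}  # img and input are void elements with no text, so they need no special case

    def __init__(self):
        super().__init__()
        self.title = None
        self.chunks = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()  # mirrors strip=True: drop whitespace-only runs
        if not text:
            return
        if self._in_title and self.title is None:
            self.title = text
        elif self._in_body and not self._skip_depth:
            self.chunks.append(text)


def extract(html):
    """Return (title, text) with text chunks joined by newlines, like separator="\n"."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.chunks)
```

For example, `extract("<title>Demo</title><body><p>Hello</p><script>x=1</script></body>")` yields the title `"Demo"` and the text `"Hello"`. Note that navigation text (menus, footers) survives this step, which is why the system prompt asks the model to ignore it.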

In [12]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [6]:
# A user prompt: the conversation starter the model replies to

# A function that writes a User Prompt that asks for summaries of websites:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    # Adjacent string literals are concatenated without the stray indentation
    # that a backslash continuation inside the string would embed
    user_prompt += (
        "\nThe contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt

In [13]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
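Pages such as cnn.com can produce a very long `website.text`, and a local model only sees as much of the prompt as fits in its context window. One possible safeguard, sketched here under the assumption of a hypothetical character budget (`max_chars=8000` is an arbitrary placeholder, and characters are only a rough proxy for tokens), is to cap the page text before it goes into the user prompt.

```python
def truncate_for_prompt(text, max_chars=8000, marker="\n\n[content truncated]"):
    """Hypothetical helper: cap text at max_chars, cutting at the last newline when possible.

    The marker is appended after the cut, so the result can be slightly longer than max_chars.
    """
    if len(text) <= max_chars:
        return text
    cut = text.rfind("\n", 0, max_chars)  # prefer ending on a whole line
    if cut <= 0:
        cut = max_chars  # no usable newline: hard cut
    return text[:cut] + marker
```

It could be wired in by replacing `user_prompt += website.text` with `user_prompt += truncate_for_prompt(website.text)`; the right budget depends on the model and its configured context length.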

In [8]:
OLLAMA_MODEL = "llama3.2" # Change this to match a model you have pulled with Ollama
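If `OLLAMA_MODEL` names a model that has not been pulled, the chat call fails. A minimal pre-flight check, assuming the server runs at Ollama's default address `http://localhost:11434`, can query the `/api/tags` endpoint, which lists locally available models. `local_model_names` and `has_model` are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # assumed default Ollama address


def local_model_names(host=OLLAMA_HOST, timeout=5):
    """Return the model names reported by the local Ollama server's /api/tags endpoint."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def has_model(names, wanted):
    """True if `wanted` matches a pulled model; a bare name is treated as its ':latest' tag."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in names
```

Usage: `has_model(local_model_names(), OLLAMA_MODEL)`. A connection error from `local_model_names` usually means the server is not running at all.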

In [14]:
def summarize(url):
    website = Website(url)
    response = ollama.chat(
        model=OLLAMA_MODEL,
        messages=messages_for(website),
    )
    return response.message.content

In [15]:
# A function to display this nicely in the Jupyter output, using markdown
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [None]:
display_summary("https://edwarddonner.com")

In [17]:
display_summary("https://cnn.com")

This appears to be a CNN news aggregator website that collects and summarizes articles from various sources around the world. The content is organized into categories such as World, Politics, Business, Technology, Health, and more.

Here are some notable topics and headlines:

1. **Politics**: Trump's decision on Yemen, Iran using drones to catch women without hijabs, and the US Supreme Court ruling on abortion.
2. **Business**: Google agrees to pay $28 million to settle claims of favoring white and Asian workers.
3. **Technology**: The world's best city for food in 2025, according to Time Out, and Google's settlement with the US government over its hiring practices.
4. **Health**: A doctor says that what your kids drink matters, and there is a new study on gender-affirming hormone therapy and depression in transgender adults.
5. **Entertainment**: The latest episode of "White Lotus" is generating buzz, and Kobe Bryant's remarkable post-NBA career was unexpected.
6. **Travel**: Australia's jaw-dropping landscapes are featured, and Germany's Merz wins vote to massively expand borrowing and super-charge military spending.
7. **Climate**: There is a new report on Iran using drones and apps to catch women who aren't wearing hijabs, according to the UN.
8. **Sports**: The US Sports section features news on football, tennis, golf, and more.

These are just some of the topics and headlines that I was able to scan from the content. If you're looking for specific information or want to explore a particular topic in more depth, feel free to ask!