## A Summarizing Web Browser Running Locally

The following notebook builds a new kind of web browser that runs locally: it uses Ollama and Llama 3.2 to summarize web pages about things to do in Seattle with kids.

Give it a list of URLs, and it responds with a summary of each page, followed by a kid-friendly version of that summary.

## Step 1 - Set up imports

In [None]:
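# imports

from IPython.display import Markdown, display  # render model output as Markdown in the notebook
from openai import OpenAI  # Ollama exposes an OpenAI-compatible API, so the OpenAI client works with it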


## Step 2 - Validate the call to Ollama

In [None]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"

# Ollama does not check the API key, but the OpenAI client requires a non-empty value
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")

message = "Hello, Llama! This is my first ever message to you! Hi!"
messages = [{"role": "user", "content": message}]
response = ollama.chat.completions.create(model="llama3.2", messages=messages)
response.choices[0].message.content
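If the call above fails with a connection error, the Ollama server may not be running, or the model may not have been pulled yet. Below is a minimal pre-flight check. It assumes Ollama's native `GET /api/tags` endpoint on the default port 11434, which lists locally pulled models. The helper names `local_model_names` and `has_model` are hypothetical and are not part of this notebook:

```python
import json
import urllib.error
import urllib.request


def local_model_names(base_url="http://localhost:11434", timeout=2.0):
    """Return the names of models pulled into a local Ollama server, or [] if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, wrong port, or a non-JSON response
        return []
    if not isinstance(payload, dict):
        return []
    return [model.get("name", "") for model in payload.get("models", [])]


def has_model(name, names):
    """True if `name` matches a pulled model, with or without a ':tag' suffix such as ':latest'."""
    return any(n == name or n.split(":")[0] == name for n in names)


# Hypothetical usage: check before making chat calls
if not has_model("llama3.2", local_model_names()):
    print("llama3.2 not found: start Ollama and run `ollama pull llama3.2`")
```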

## Step 3 - Set Up System and User Prompts

Chat models such as GPT and Llama have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to
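With the OpenAI-compatible chat API used here, both prompts are sent as a list of role/content dictionaries: the system prompt comes first, then the user prompt. The following is a minimal sketch of that shape. `build_messages` is a hypothetical helper, and the optional `history` argument (earlier user/assistant turns) is included only for illustration; this notebook does not use it:

```python
def build_messages(system_prompt, user_prompt, history=None):
    """Assemble a chat request: system prompt first, then any earlier turns, then the new user prompt."""
    messages = [{"role": "system", "content": system_prompt}]
    for turn in history or []:
        # Each earlier turn must already be a {"role": ..., "content": ...} dict
        if turn.get("role") not in {"user", "assistant"}:
            raise ValueError(f"unexpected role: {turn.get('role')!r}")
        messages.append(turn)
    messages.append({"role": "user", "content": user_prompt})
    return messages


example = build_messages("You summarize websites.", "Here are the contents of a website...")
print([m["role"] for m in example])  # ['system', 'user']
```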

In [None]:
# Define our system prompt 

system_prompt = """
You are a very helpful and verbose assistant that analyzes the contents of a website
and provides a clear, detailed summary, ignoring text that might be navigation-related.
If the user prompt includes the date and time the summary was generated, state it; never guess the current date or time.
Next, rewrite the summary so that it is kid-friendly.
Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""

In [None]:
# Define our user prompt

user_prompt_prefix = """
Here are the contents of a website.
Provide a short summary of this website. If it is a Google Search results page, summarize each of the top 5 results shown.
If it includes news or announcements, then summarize these too.

"""

## Step 4 - Make calls to Llama 3.2 via Ollama

In [None]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_prefix + website}
    ]
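The model has no clock, so it cannot know when a summary was generated unless the prompt says so. One way to supply that information is to stamp the user prompt with the local time when the request is built. The following is a hypothetical variant of `messages_for`; the name `messages_for_with_timestamp`, the `now` parameter, and the timestamp wording are assumptions. It takes the prompts as arguments so it runs on its own:

```python
from datetime import datetime


def messages_for_with_timestamp(website, system_prompt, user_prompt_prefix, now=None):
    """Like messages_for, but tells the model when the summary is being generated."""
    if now is None:
        now = datetime.now()
    # The model cannot read the clock, so pass the generation time explicitly
    stamp = f"Summary generated at: {now:%Y-%m-%d %H:%M}\n\n"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": stamp + user_prompt_prefix + website},
    ]


msgs = messages_for_with_timestamp(
    "Example page text",
    "You summarize websites.",
    "Here are the contents of a website.\n",
    now=datetime(2025, 1, 15, 9, 30),
)
print(msgs[1]["content"].splitlines()[0])  # Summary generated at: 2025-01-15 09:30
```

To try it, `summarize` could pass `messages_for_with_timestamp(website, system_prompt, user_prompt_prefix)` as `messages` instead of `messages_for(website)`.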

In [None]:
from bs4 import BeautifulSoup
import requests


# Standard headers to fetch a website
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


def fetch_website_contents(url):
    """
    Return the title and contents of the website at the given url,
    truncated to 2,000 characters to keep the prompt small
    """
    # A timeout keeps a slow site from hanging the notebook indefinitely
    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    # soup.title.string is None for an empty or nested <title>, so use get_text() instead
    title = soup.title.get_text(strip=True) if soup.title else "No title found"
    if soup.body:
        # Drop elements that carry no readable text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    return (title + "\n\n" + text)[:2_000]


def fetch_website_links(url):
    """
    Return the links on the website at the given url.
    I realize this is inefficient, as we fetch and parse the page a second time! This is to keep the code in the lab simple.
    Feel free to use a class and optimize it!
    """
    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    # Skip <a> tags that have no href
    return [link for link in links if link]
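The docstring above invites an optimization. The following minimal sketch extracts the title, visible text, and links in a single pass over already-fetched HTML. To stay dependency-free, it uses the standard library's `html.parser.HTMLParser` instead of BeautifulSoup, so its text extraction is cruder than `get_text()`. It also resolves relative `href` values against the page URL with `urllib.parse.urljoin`, which the original `fetch_website_links` does not do. The names `PageParser` and `parse_page` are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

SKIP_TAGS = {"script", "style"}  # their contents are not readable text


class PageParser(HTMLParser):
    """Collect the <title>, visible text, and absolute link URLs in one pass."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title_parts, self.text_parts, self.links = [], [], []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/calendar" against the page URL
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        data = data.strip()
        if not data or self._skip_depth:
            return
        (self.title_parts if self._in_title else self.text_parts).append(data)


def parse_page(url, html, limit=2_000):
    """Return (contents, links) from one HTML string, truncating contents like fetch_website_contents."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title_parts) or "No title found"
    contents = (title + "\n\n" + "\n".join(parser.text_parts))[:limit]
    return contents, parser.links


html = '<html><head><title>Kids</title><style>p{}</style></head><body><p>Zoo</p><a href="/calendar">Events</a></body></html>'
contents, links = parse_page("https://www.parentmap.com/", html)
print(repr(contents))  # 'Kids\n\nZoo\nEvents'
print(links)           # ['https://www.parentmap.com/calendar']
```

In the notebook, the page would then be fetched only once, for example with `response = requests.get(url, headers=headers, timeout=10)` followed by `parse_page(url, response.text)`.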


In [None]:
def summarize(url):
    website = fetch_website_contents(url)
    response = ollama.chat.completions.create(
        model="llama3.2",
        messages=messages_for(website),
    )
    return response.choices[0].message.content

In [None]:
# A function to display this nicely in the output, using markdown

def display_summary(urls):
    # Handle both single URL (string) and list of URLs
    if isinstance(urls, str):
        urls = [urls]
    
    for counter, url in enumerate(urls, start=1):
        display(Markdown(f"### {counter}. Summary for: {url}\n"))
        summary = summarize(url)
        display(Markdown(summary))
        display(Markdown("---\n"))
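`display_summary` stops at the first URL whose `summarize` call raises an exception (for example, a `requests.exceptions.ConnectionError`), so later URLs are never summarized. The following is a minimal sketch of a more forgiving loop. `summarize_each` is a hypothetical helper, and it takes the summarizer as a parameter so that it can be tried without a network or a running model:

```python
def summarize_each(urls, summarize_fn):
    """Yield (counter, url, markdown) for each URL, turning failures into a short error note."""
    if isinstance(urls, str):
        urls = [urls]
    for counter, url in enumerate(urls, start=1):
        try:
            summary = summarize_fn(url)
        except Exception as exc:  # keep going; report which URL failed and why
            summary = f"*Could not summarize this page: {type(exc).__name__}: {exc}*"
        yield counter, url, summary


def fake_summarize(url):
    """Stand-in for summarize() so the loop can be tried offline."""
    if "bad.example" in url:
        raise ConnectionError("simulated failure")
    return f"Summary of {url}"


results = list(summarize_each(["https://a.example", "https://bad.example", "https://c.example"], fake_summarize))
for counter, url, summary in results:
    print(counter, url, summary)
```

In the notebook, the body of `display_summary` could iterate over `summarize_each(urls, summarize)` and pass each `summary` to `display(Markdown(...))` as before.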

In [None]:
# Example: Using display_summary with multiple URLs stored in variables
display(Markdown("# Top things to do in Seattle with kids\n"))

url1 = "https://www.emeraldpalate.com/seattle-with-kids/"
url2 = "https://www.tripadvisor.com/Attractions-g60878-Activities-zft11306-Seattle_Washington.html"
url3 = "https://www.parentmap.com/calendar"

# Create a list of URLs from the variables
urls = [url1, url2, url3]

# Display summaries for all URLs
display_summary(urls)

# To capture the output as HTML, run this in a terminal:
# jupyter nbconvert --to html week1-day1.ipynb
# [NbConvertApp] Converting notebook week1-day1.ipynb to html
# [NbConvertApp] Writing 302468 bytes to week1-day1.html