In [14]:
!pip install ollama

Collecting ollama
  Using cached ollama-0.4.8-py3-none-any.whl.metadata (4.7 kB)
Collecting pydantic<3.0.0,>=2.9.0 (from ollama)
  Downloading pydantic-2.11.3-py3-none-any.whl.metadata (65 kB)
Collecting pydantic-core==2.33.1 (from pydantic<3.0.0,>=2.9.0->ollama)
  Downloading pydantic_core-2.33.1-cp312-cp312-macosx_11_0_arm64.whl.metadata (6.8 kB)
Collecting typing-extensions>=4.12.2 (from pydantic<3.0.0,>=2.9.0->ollama)
  Downloading typing_extensions-4.13.2-py3-none-any.whl.metadata (3.0 kB)
Collecting typing-inspection>=0.4.0 (from pydantic<3.0.0,>=2.9.0->ollama)
  Downloading typing_inspection-0.4.0-py3-none-any.whl.metadata (2.6 kB)
Using cached ollama-0.4.8-py3-none-any.whl (13 kB)
Downloading pydantic-2.11.3-py3-none-any.whl (443 kB)
Downloading pydantic_core-2.33.1-cp312-cp312-macosx_11_0_arm64.whl (1.9 MB)
Downloading typing_extensions-4.13.2-py3-none-any

In [15]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [16]:
# Constants

MODEL = "llama3.2"

In [17]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we have scraped
    """
    url: str
    title: str
    text: str

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
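        # Some responses have no <body>; fall back to empty text instead of crashing
        if soup.body:
            # Drop elements that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""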
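To see what the cleanup step in `Website` does without hitting the network, here is a minimal sketch that runs the same BeautifulSoup calls on a small, made-up HTML string (the `html` snippet is an assumption for illustration, not a real page):

```python
from bs4 import BeautifulSoup

# Made-up page used only for illustration
html = """
<html>
  <head><title>Demo Page</title></head>
  <body>
    <script>console.log("not content");</script>
    <h1>Headline</h1>
    <p>First paragraph.</p>
    <img src="logo.png" alt="logo">
    <input type="text" value="search">
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.title.string if soup.title else "No title found"

# Calling a tag like a function is shorthand for find_all(): it returns
# every matching element inside <body>, and decompose() removes each one.
for irrelevant in soup.body(["script", "style", "img", "input"]):
    irrelevant.decompose()

# strip=True trims each text fragment and skips empty ones
text = soup.body.get_text(separator="\n", strip=True)

print(title)
print(text)
```

Only the heading and paragraph text survive; the script, image, and input elements are gone before `get_text` runs.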

In [18]:
# Let's try one out

web = Website("https://www.cnn.com/")
print(web.title)
print(web.text)

Breaking News, Latest News and Videos | CNN
CNN values your feedback
1. How relevant is this ad to you?
2. Did you encounter any technical issues?
Video player was slow to load content
Video content never loaded
Ad froze or did not finish loading
Video content did not start after ad
Audio on ad was too loud
Other issues
Ad never loaded
Ad prevented/slowed the page from loading
Content moved around while ad loaded
Ad was repetitive to ads I've seen previously
Other issues
Cancel
Submit
Thank You!
Your effort and contribution in providing this feedback is much
                                        appreciated.
Close
Ad Feedback
Close icon
US
World
Politics
Business
Health
Entertainment
Style
Travel
Sports
Science
Climate
Weather
Ukraine-Russia War
Israel-Hamas War
Underscored
Games
More
US
World
Politics
Business
Health
Entertainment
Style
Travel
Sports
Science
Climate
Weather
Ukraine-Russia War
Israel-Hamas War
Underscored
Games
Watch
Listen
Live TV
Subscribe
Sign in
My Account
Settin

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [19]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [20]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    # End the first sentence with a newline so the title does not run into the next sentence
    user_prompt = f"You are looking at a website titled {website.title}\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

## Messages

The API from Ollama expects the same message format as OpenAI:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [28]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
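Before calling a model, it can help to inspect the message list without scraping anything. The sketch below is a self-contained illustration rather than part of the original notebook: it restates compact versions of `system_prompt`, `user_prompt_for`, and `messages_for` so it runs on its own, and it uses a hypothetical stand-in object (`fake_site`, built with `types.SimpleNamespace`) in place of a real `Website`.

```python
from types import SimpleNamespace

# Compact restatements of the notebook's definitions, so this block runs on its own
system_prompt = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}\n"
    user_prompt += (
        "The contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)},
    ]

# Hypothetical stand-in for a scraped page: only .title and .text are needed
fake_site = SimpleNamespace(title="Example Page", text="Hello\nWorld")

messages = messages_for(fake_site)
for message in messages:
    print(f"--- {message['role']} ---")
    print(message["content"])
```

Printing the messages this way makes prompt formatting problems, such as a missing space or newline after the title, easy to spot before any model call.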

## Time to bring it together - Ollama 

In [29]:
# And now: call the Ollama function instead of OpenAI

def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']

In [30]:
summarize("https://www.cnn.com/")

'The news articles covered a wide range of topics, including:\n\n* Politics:\n\t+ New research suggests that a distant planet has an ocean "teeming with life," but some are skeptical.\n\t+ Trump administration releases thousands of pages on RFK\'s 1968 assassination.\n\t+ SCOTUS temporarily pauses deportations under Alien Enemies Act.\n\t+ Already facing Trump administration cuts, US colleges risk losses from another revenue source.\n* Crime & Justice:\n\t+ Portrait of a wounded Palestinian boy wins Press Photo of the Year\n* World:\n\t+ Pope Francis\' Easter is going to look a little different this year. Here\'s how.\n\t+ Colombian government declares health emergency due to increase in yellow fever cases\n* Business:\n\t+ 5 savings mistakes people make when building their financial life\n* Health & Wellness:\n\t+ New research suggests that a distant planet has an ocean "teeming with life," but some are skeptical.\n\t+ Climate\n* Science:\n\t+ Space\n\t+ Ukraine-Russia War\n\t+ Israel

In [31]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [32]:
display_summary("https://www.cnn.com/")

Here are some of the top headlines from CNN and other sources:

**Politics**

* Trump administration releases thousands of pages on RFK's 1968 assassination (USA)
* SCOTUS temporarily pauses deportations under Alien Enemies Act (USA)
* Pope Francis' Easter is going to look a little different this year. Here's how (Vatican City)

**Business**

* Already facing Trump administration cuts, US colleges risk losses from another revenue source (USA)
* New research suggests a distant planet has an ocean 'teeming with life,' but some are skeptical (International)

**Health**

* New research on COVID-19 may lead to more effective treatments for the virus (Global)
* Colombian government declares health emergency due to increase in yellow fever cases (South America)

**Science**

* Russia sentences 19-year-old woman to nearly three years in a penal colony after poetic anti-war protest (Russia)
* Climate change is causing more frequent and intense wildfires, scientists warn (Global)

**Sports**

* NBA: Lakers defeat Clippers 124-121 in overtime thriller (USA)
* College Football: Alabama crushes Tennessee 38-10 (USA)

**Entertainment**

* "The Weeknd" drops surprise album 'Dawn FM' (Music)
* "The Last of Us" wins 10 Emmy Awards, including Outstanding Drama Series (TV)

**Other News**

* New York City's mayor-elect Eric Adams promises to tackle homelessness and crime (USA)
* India reports record 50,000 new COVID-19 cases in a single day (India)

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!).

Also, websites protected with CloudFront (and similar) may give 403 errors - many thanks to Andy J for pointing this out.

But many websites will work just fine!
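One common cause of 403 responses is a site rejecting the default `requests` User-Agent. The sketch below is an illustration under stated assumptions, not part of the original notebook: `fetch_html` and `BROWSER_HEADERS` are hypothetical names, the header value is just an example browser string, and sending it does not guarantee access. It also calls `raise_for_status()` so that a blocked page fails loudly instead of being summarized as if it were real content. The `get` parameter exists only so the function can be exercised without network access.

```python
# Example browser-like header; the exact string is an assumption, not a requirement
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    )
}

def fetch_html(url, get=None, timeout=10):
    """Fetch a page's raw bytes, raising on HTTP errors such as 403."""
    if get is None:
        # Imported lazily so the function can be tried with a stand-in `get`
        import requests
        get = requests.get
    response = get(url, headers=BROWSER_HEADERS, timeout=timeout)
    response.raise_for_status()  # requests raises HTTPError on 4xx/5xx responses
    return response.content
```

Inside `Website.__init__`, `requests.get(url)` could be swapped for `fetch_html(url)` and the returned bytes passed to `BeautifulSoup`. Wrapping the call in `try`/`except requests.HTTPError` would let the notebook report which sites refuse to be scraped.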

In [36]:
display_summary("https://www.disney.com/")

**Summary of Disney.com**
=====================================

### Overview

Disney.com is the official website for all things Disney, featuring a wide range of content including:

* **Parks & Travel**: information on Walt Disney World Resort, Disneyland Resort, and other Disney travel experiences
* **Movies**: news and updates on upcoming films, as well as access to Disney's film library
* **Shop**: online shopping for Disney merchandise, including clothes, accessories, toys, and more
* **News**: the latest updates from Disney, including announcements, press releases, and industry news

### News & Announcements

#### Upcoming Releases
- Andor Season 2: a live-action series set in the Star Wars universe
- Disneynature's Sea Lions of the Galapagos: an animated documentary film
- Ironheart: a superhero film based on the Marvel Comics character
- Percy Jackson and The Olympians Season 2: a live-action series based on the popular book series

#### Other News
- Disney+ and ESPN+ Subscriber Agreement: important information for subscribers to these services
- Restrictive content notices: warnings about access restrictions, location data requirements, and other terms and conditions related to Disney's streaming services.