# EXERCISE SOLUTION

Upgrade the day 1 project, which summarizes a webpage, to use an open-source model running locally via Ollama rather than OpenAI.

You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.

**Benefits:**
1. No API charges - open-source
2. Data doesn't leave your box

**Disadvantages:**
1. Significantly less powerful than a frontier model

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once complete, the ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, bring up a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`.  
Then try [http://localhost:11434/](http://localhost:11434/) again.
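
The same check can be scripted instead of done in a browser. The following is a minimal sketch, assuming the default address `http://localhost:11434/` shown above. `ollama_is_running` is a hypothetical helper name, not part of the Ollama library, and it uses only the Python standard library.

```python
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/"  # default local address from the text above


def ollama_is_running(base_url=OLLAMA_URL, timeout=2.0):
    """Return True if the server at base_url answers with the 'Ollama is running' banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a malformed URL: treat as "not running"
        return False
    return "Ollama is running" in body


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama is up")
    else:
        print("No response - try `ollama serve` in a terminal")
```

If this prints the second message, start the server as described above and run the check again.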

In [1]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [2]:
# Constants

MODEL = "llama3.2"

In [3]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we have scraped
    """
    url: str
    title: str
    text: str

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
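        # Some pages have no <body>; fall back to empty text instead of raising an error
        self.text = ""
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)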
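
To see what this kind of cleanup keeps and drops without touching the network, here is a standard-library sketch of the same idea: collect only the text inside `<body>` and skip the contents of `<script>` and `<style>`. It uses `html.parser` rather than BeautifulSoup, so it illustrates the technique and is not the `Website` class itself. `VisibleTextParser`, `visible_text` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text found inside <body>, ignoring <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._in_body = False
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._in_body and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible body text of an HTML string, one chunk per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample_html = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    "<body><script>var x = 1;</script><h1>Hello</h1>"
    "<p>World <img src='a.png'></p></body></html>"
)

print(visible_text(sample_html))
```

Only `Hello` and `World` survive: the title sits outside `<body>`, and the script and style contents are skipped.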

In [4]:
# Let's try one out

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [5]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [6]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

## Messages

The API from Ollama expects the same message format as OpenAI:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [7]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
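
A malformed message list usually only fails once it reaches the model, so it can help to check the shape first. The following is a minimal sketch of such a check. `validate_messages` is a hypothetical helper, and the set of allowed roles is an assumption covering only the roles used in this kind of chat, not an exhaustive list of what Ollama accepts.

```python
# Assumption: the roles used in a simple chat; the server may accept others
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Raise ValueError unless messages is a non-empty list of role/content dicts."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"message {index} is not a dict")
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {index} has an unexpected role")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {index} has no string content")
    return messages


example_messages = [
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"},
]

validate_messages(example_messages)  # passes silently and returns the list
```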

## Time to bring it together - now with Ollama instead of OpenAI

In [8]:
# And now: call the Ollama function instead of OpenAI

def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']
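
The `ollama` package is a client for the local Ollama server, so the same request can also be sent over plain HTTP. The following is a minimal sketch, assuming the default port `11434` and Ollama's `/api/chat` endpoint with streaming turned off. `build_chat_payload` and `chat_via_http` are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumes the default local port


def build_chat_payload(model, messages):
    """Build the JSON body for a single, non-streaming chat request."""
    return {"model": model, "messages": messages, "stream": False}


def chat_via_http(model, messages, url=OLLAMA_CHAT_URL, timeout=120):
    """POST the payload to a local Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(model, messages)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # Same shape as the dictionary-style access used with ollama.chat
    return reply["message"]["content"]
```

With the server running, `chat_via_http(MODEL, messages)` could stand in for the `ollama.chat` call inside `summarize`.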

In [9]:
summarize("https://edwarddonner.com")



In [10]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [11]:
display_summary("https://edwarddonner.com")

# Website Summary

### Overview
The website belongs to Edward Donner, a co-founder and CTO of Nebula.io. He is an enthusiast of writing code, experimenting with Large Language Models (LLMs), DJing, and amateur electronic music production.

### Projects and Ventures
- **Nebula.io**: Applying AI to help people discover their potential and pursue their reason for being.
- **Untapt**: An AI startup that was acquired in 2021. Donner served as the founder and CEO before its acquisition.

### News and Announcements

* December 21, 2024: LLM Workshop – Hands-on with Agents – resources
* November 13, 2024: Welcome, SuperDataScientists!
* October 16, 2024: Mastering AI and LLM Engineering – Resources
* From Software Engineer to AI Data Scientist – resources (no date provided)
* January 23, 2025: Upcoming event with a specific topic (not specified)

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!).

Also, websites protected with CloudFront (and similar) may give 403 errors - many thanks to Andy J for pointing this out.

But many websites will work just fine!
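
Before summarizing, it can be worth checking whether a fetch actually succeeded, so an error page is never passed to the model. The following is a rough sketch using only the standard library. `build_request` and `fetch_html` are hypothetical helpers, and the `User-Agent` value is an illustrative assumption: some sites respond differently to requests that carry a browser-like header, but this is not guaranteed to get past any protection.

```python
import urllib.error
import urllib.request

# Assumption: an illustrative header value, not one taken from this notebook
USER_AGENT = "Mozilla/5.0 (compatible; summarizer-demo)"


def build_request(url):
    """Create a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url, timeout=10):
    """Return (status, html); html is None whenever the fetch fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            html = response.read().decode("utf-8", errors="replace")
            return response.status, html
    except urllib.error.HTTPError as error:
        # The server answered, but with an error code such as 403
        return error.code, None
    except (urllib.error.URLError, OSError, ValueError):
        # No usable answer at all: DNS failure, refused connection, bad URL
        return None, None
```

A caller can then skip the summary and report the status code when `html` comes back as `None`.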

In [12]:
display_summary("https://cnn.com")

Here are the top news stories from CNN.com:

**World**

* US and Iran exchange blows in Middle East, as tensions rise over Iranian missile tests
* Trump orders "decisive" strikes against Yemen's Houthis, killing dozens
* Dozens reported killed after Trump's airstrikes target Yemen rebels
* Yemen's president calls for international intervention to stop Houthi rebels

**Politics**

* Americans think Trump's cuts will hurt (and help) the country. Here's how
* A father of a trans man voted for Trump. Now he fears an order targeting gender-affirming care will upend his son's treatment
* After a long night of partying, things get WEIRD for the Ratliff brothers in 'White Lotus' episode 5

**Business**

* Mark Zuckerberg gets big win in Utah, but other tech giants aren't so happy
* They helped start Twitter. They didn’t realize what it would become

**Health**

* Medical myths are surfacing again. 5 tips to inoculate yourself against them
* Wide-awake at night but tired in the morning? Here’s what could help, according to experts
* Forget ‘sparking joy.’ Try this for easier decluttering

**Entertainment**

* What your kids drink matters, this doctor says
* Kobe Bryant rarely shared his interests beyond the game of basketball. His post-NBA career was ‘remarkable’ and unexpected

**Sports**

* Australia's jaw-dropping landscapes, from east to west
* In pictures: The life and career of Donatella Versace

**Science**

* Climate Change
* Ukraine-Russia War

**Technology**

* Iran is using drones and apps to catch women who aren’t wearing hijabs, says UN report

These are just a few of the top news stories from CNN.com. You can access more information by visiting the website or following their social media channels.

In [13]:
display_summary("https://anthropic.com")

**Summary of Anthropic's Website**

### Overview

Anthropic is an AI safety and research company based in San Francisco, focusing on developing reliable and beneficial AI systems.

### News

* **Claude 3.7 Sonnet**: Our most intelligent AI model is now available. It's a hybrid reasoning model and the first of its kind.
* **Claude Code**: An agentic tool for coding has been launched, allowing users to create custom experiences with Claude.
* **Alignment Research**: Research on constitutional AI safety, including "Constitutional AI: Harmlessness from AI Feedback" published in 2022.

### Products

* **Claude**: A suite of AI tools and products that prioritize safety at the frontier.
* **Claude for Enterprise**: An enterprise-focused version of Claude, designed for large-scale deployments.
* **Claude apps**: Downloadable apps for accessing Claude's features.

### Research

* **Research Overview**: Anthropic conducts research across various fields, including machine learning, physics, policy, and product development.
* **Economic Index**: A research platform exploring the economic implications of AI systems.

### Commitments

* **Transparency**: Anthropic is committed to transparency in their operations and policies.
* **Security and Compliance**: The company prioritizes security and compliance with relevant regulations.

# Sharing your code

I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation), which you will find in the community-contributions folder. If you'd like to add your changes to that folder, submit a Pull Request with your new versions in that folder and I'll merge your changes.

If you're not an expert with git (and I am not!) then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it once it's pretty clear. As a pro-tip: it's best if you clear the outputs of your Jupyter notebooks (Edit >> Clear Outputs of All Cells, and then Save) for clean notebooks.

PR instructions courtesy of an AI friend: https://chatgpt.com/share/670145d5-e8a8-8012-8f93-39ee4e248b4c