# Website Summarizer
This project builds a Website Summarizer chatbot that extracts the content of a webpage and condenses it into a clear, concise summary. By combining web scraping with AI-powered summarization, the chatbot helps users save time and quickly grasp the key points of long-form online content.


### Importing libraries

In [1]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [2]:
# Constants

MODEL = "llama3.2"

### Creating a class named Website to represent a webpage

In [3]:
# A class to represent a Webpage

class Website:
    url: str
    title: str
    text: str

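    def __init__(self, url):
        self.url = url
        # Fetch the page; the timeout keeps a slow server from hanging the notebook
        response = requests.get(url, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        # Some pages have no <body>; fall back to the whole document
        body = soup.body or soup
        # Drop elements that carry no readable text
        for irrelevant in body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = body.get_text(separator="\n", strip=True)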
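
The cleanup steps in `Website.__init__` can be tried without a network request. The sketch below applies the same BeautifulSoup calls to a small inline page; `SAMPLE_HTML` and `extract_text` are hypothetical names introduced only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <script>console.log("tracking");</script>
    <style>p { color: red; }</style>
    <h1>Hello</h1>
    <p>Readable <b>text</b> stays.</p>
    <img src="logo.png" alt="logo">
  </body>
</html>
"""

def extract_text(html):
    """Return (title, text) using the same cleanup steps as Website.__init__."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    # Remove elements that carry no readable text
    for irrelevant in soup.body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    # Each text fragment is stripped and joined with a newline
    text = soup.body.get_text(separator="\n", strip=True)
    return title, text

title, text = extract_text(SAMPLE_HTML)
print(title)
print(text)
```

Because `get_text` joins every text fragment with the separator, inline tags such as `<b>` end up on their own lines. This is likely why single words and link texts appear on separate lines in the scraped output.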

In [4]:
ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec

### Defining the system prompt

In [5]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

### Defining some user prompts

In [6]:
# A function that writes a user prompt asking for a summary of a website

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    # Start the instructions on a new line so they do not run into the title
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [7]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
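
To see the shape of the message list before sending it to a model, the sketch below builds it from a stand-in object instead of a scraped page. The prompt helpers are restated in shortened form so the block runs on its own; `fake_site` is a hypothetical placeholder that only needs `title` and `text` attributes.

```python
from types import SimpleNamespace

system_prompt = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)

def user_prompt_for(website):
    # Shortened restatement of the user prompt helper, for illustration
    user_prompt = f"You are looking at a website titled {website.title}\n"
    user_prompt += "Please provide a short summary of this website in markdown.\n\n"
    user_prompt += website.text
    return user_prompt

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)},
    ]

# Hypothetical stand-in for a Website instance (no network call needed)
fake_site = SimpleNamespace(title="Sample Page", text="Hello\nWorld")

messages = messages_for(fake_site)
for message in messages:
    print(message["role"], "->", repr(message["content"][:60]))
```

The result is a two-element list: one `system` message carrying the instructions and one `user` message carrying the title and page text.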

### Creating a function that takes a URL, turns it into a Website object, formats it into a prompt, and returns the summary generated by the model (Ollama)

In [8]:
def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']
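
The model call can also be exercised without a running Ollama server. The sketch below is one possible variation, not part of the notebook: it passes the chat function in as a parameter, so a stand-in can replace `ollama.chat` while testing. `summarize_messages` and `fake_chat` are hypothetical names, and the stand-in only mimics the `response['message']['content']` shape used above.

```python
def summarize_messages(messages, chat_fn, model="llama3.2"):
    """Send messages through chat_fn and return the reply text."""
    response = chat_fn(model=model, messages=messages)
    return response["message"]["content"]

# Hypothetical stand-in that mimics the response shape read by summarize()
def fake_chat(model, messages):
    user_text = messages[-1]["content"]
    reply = f"[{model}] saw {len(user_text)} characters"
    return {"message": {"role": "assistant", "content": reply}}

messages = [
    {"role": "system", "content": "Respond in markdown."},
    {"role": "user", "content": "Hello\nWorld"},
]

result = summarize_messages(messages, fake_chat)
print(result)
```

With Ollama running locally, passing `ollama.chat` as `chat_fn` should behave like the `summarize` function above, since that function calls it with the same `model` and `messages` keyword arguments.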

In [9]:
summarize("https://edwarddonner.com")

'# Website Summary\n### Overview\n\nThe website "Home - Edward Donner" appears to be a personal blog or portfolio site of its creator, Ed. The site showcases his interests in writing code, experimenting with Large Language Models (LLMs), and DJing.\n\n### Features\n\n* An "Outsmart" section featuring an arena where LLMs compete against each other in diplomacy and deception.\n* A "Connect Four" section, although the purpose of this section is unclear without more context.\n* A "Posts" section containing various articles about Ed\'s projects, including Nebula.io and his AI-related endeavors.\n* News and announcements:\n\t+ May 28, 2025: "Connecting my courses – become an LLM expert and leader"\n\t+ May 18, 2025: "2025 AI Executive Briefing"\n\t+ April 21, 2025: "The Complete Agentic AI Engineering Course"\n\t+ January 23, 2025: "LLM Workshop – Hands-on with Agents – resources"\n\n### Social Media and Contact Information\n\n* Ed\'s contact information includes an email address and a websi

### Creating a function to display the summary in a nicer form using Markdown

In [10]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [11]:
display_summary("https://edwarddonner.com/")

**Summary of Website**

* The website is run by Edward Donner, a co-founder and CTO of Nebula.io, an AI company applying AI to help people discover their potential.
* Edward is also a DJ, amateur electronic music producer, and enjoys reading Hacker News.
* His main focus on the website appears to be promoting his own projects and expertise in Large Language Models (LLMs).

**Upcoming Events**

* **Connecting my courses – become an LLM expert and leader**: A course by Edward Donner.
* **2025 AI Executive Briefing**: An event or possibly a report, no further details provided.
* **The Complete Agentic AI Engineering Course**: Another course by Edward Donner.
* **LLM Workshop – Hands-on with Agents – resources**: Resources for an LLM workshop, possibly related to one of the courses.

**Contact Information**

* Email address: ed [at] edwarddonner [dot] com
* Website: www.edwarddonner.com
* Social media links:
	+ LinkedIn
	+ Twitter
	+ Facebook

### Trying different websites

In [12]:
display_summary("https://cnn.com")

# Summary of CNN Website
## Front Page Content

*   **Main Sections:**
    *   Breaking News
    *   Videos
    *   US Politics
    *   World
    *   Business
    *   Health
    *   Entertainment
    *   Sports
    *   Science
    *   Travel
    *   Culture
*   **Featured Stories:**
    *   "China’s leader shows rare moment of emotion during summit"
    *   "US Open drama: Why is there so much tension?"
    *   "Deadly protests in Indonesia paused but resentment remains"

## News Articles

*   **Top Stories:**
    *   "Major China summit"
    *   "India's lion population turning into a deadly problem for people in Gujarat"
    *   "Rare police shooting reveals how a US imported belief system is becoming violent in Australia"
*   **International News:**
    *   "Israel considers West Bank annexation as Palestinian statehood recognition gains momentum"
    *   "Global media outlets unite to protest Israel's killing of journalists in Gaza"
    *   "Flotilla leaves Barcelona in biggest attempt yet to break Israeli blockade of Gaza"

## Videos

*   **Featured Videos:**
    *   "Crowd watches in terror as lost child walks across monorail track at amusement park"
    *   "Plane flips over during emergency landing"
    *   "Indonesia's protests paused but deep resentment remains. What to know"

In [13]:
display_summary("https://anthropic.com")

# Anthropic Website Summary
## Mission and Goals

Anthropic aims to build AI that serves humanity's long-term well-being. The company focuses on designing powerful technologies with human benefit at their foundation.

## Products and Research

Anthropic develops products like Claude, an AI model family that includes Opus 4.1, Sonnet 4, and Haiku 3.5. These models enable coding and AI agents to handle hours of work across various applications.

## Views on AI Safety

The company has core views on AI safety, which include:

*   Designing powerful technologies with human benefit at their foundation
*   Building tools with intentional pauses to consider the effects on society
*   Focusing on responsible AI development in practice

## Responsible Scaling Policy

Anthropic has a responsible scaling policy that prioritizes human well-being and minimizes risks associated with AI development.

## Announcements

Recent announcements include:

*   Claude Opus 4.1, the most intelligent AI model available
*   Introduction of Project Vend, a policy initiative
*   Updates on Anthropic's views on agentic misalignment and alignment

## Resources and Learning

Anthropic offers resources such as Anthropic Academy, which provides access to learning materials for building with Claude.

## Company Information

The company is publicly traded as Anthropic PBC (2024) and has a range of products and services in AI development.