In [1]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [2]:
load_dotenv(override=True)

python-dotenv could not parse statement starting at line 11


True

In [34]:
openai = OpenAI()

In [25]:
class Website:
    """
    A utility class to represent a Website that we have scraped.
    """

    url: str
    title: str
    text: str

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library.
        """

        self.url = url
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)

In [26]:
# Let's try this out

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec

### Types of Prompts

Models like GPT4o have been trained to receive instructions in a particular way:  
+ `System Prompt`: tells LLms what task they are performing and what tone they should use.  
+ `User Prompt`: the conversation starter that they should reply to.

In [27]:
system_prompt = "You are an assistant that analyses the contents of a website \
    and provides a short sumamry, ignoring text that might be navigation related.\
    Respond in Markdown."

In [28]:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows: \
        please provide a short summary of this website in markdown. \
        If it includes news or announcements, then summarise these too. \n\n"
    
    user_prompt += website.text
    return user_prompt

In [29]:
print(user_prompt_for(ed))

You are looking at a website titled Home - Edward Donner
The contents of this website is as follows:         please provide a short summary of this website in markdown.         If it includes news or announcements, then summarise these too. 

Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI star

### Messages

The API from OpenAI expects to receive messages in a particular structure. Many of the other APIs share this structure:

[  
    {"role": "system", "content": "system message goes here"}  
    {"role": "user", "content": "user message goes here"}    
]

In [30]:
def messages_for(website):
    return [
        {'role': 'system', 'content': system_prompt}, 
        {'role': 'user', 'content': user_prompt_for(website)}
    ]

In [31]:
messages_for(ed)

[{'role': 'system',
  'content': 'You are an assistant that analyses the contents of a website     and provides a short sumamry, ignoring text that might be navigation related.    Respond in Markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled Home - Edward Donner\nThe contents of this website is as follows:         please provide a short summary of this website in markdown.         If it includes news or announcements, then summarise these too. \n\nHome\nConnect Four\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. We’re applying AI to a f

In [32]:
# Summarising the scraped content

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model='gpt-4.1-nano',
        messages=messages_for(website)
    )

    return response.choices[0].message.content

In [35]:
summarize("https://edwarddonner.com")

'# Summary of Edward Donner\'s Website\n\nThis website serves as a personal and professional portfolio for Ed Donner, highlighting his background, projects, and interests. Ed is passionate about coding and experimenting with Large Language Models (LLMs). He is the co-founder and CTO of Nebula.io, a company utilizing AI to enhance talent discovery and management, with proprietary, verticalized LLMs and patented matching models. \n\nThe site features sections like "Connect Four," which is an AI contest where LLMs compete in diplomacy and strategy. It also provides links to his various courses and events related to AI and LLMs, including upcoming workshops and executive briefings.\n\n### Notable Updates and Announcements:\n- **May 28, 2025:** Announces a course on becoming an LLM expert.\n- **May 18, 2025:** Announces the 2025 AI Executive Briefing.\n- **April 21, 2025:** Promotes a comprehensive AI engineering course.\n- **January 23, 2025:** Shares resources for a hands-on LLM workshop 

In [36]:
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [37]:
display_summary("https://edwarddonner.com")

# Summary of Edward Donner's Website

The website belongs to Edward Donner, a developer and AI enthusiast with a focus on large language models (LLMs). He is the co-founder and CTO of Nebula.io, a company leveraging AI to enhance talent discovery and management. Edward has a background as the founder and CEO of AI startup untapt, which was acquired in 2021. 

The site features sections on projects like **Connect Four** (an arena for LLMs competing in diplomacy and deviousness) and **Outsmart**. It also showcases his writings, courses, and workshops on AI and LLM expertise, indicating active engagement in education and community outreach.

### Recent Announcements and Courses
- **May 28, 2025:** Announced "Connecting my courses – become an LLM expert and leader"
- **May 18, 2025:** "2025 AI Executive Briefing"
- **April 21, 2025:** "The Complete Agentic AI Engineering Course"
- **January 23, 2025:** LLM Workshop resources on agents

### Additional Information
Edward enjoys coding, experimenting with LLMs, DJing, electronic music production, and engaging with Hacker News discussions. The site provides contact details, social media links, and a newsletter subscription option for visitors interested in AI and related topics.

In [39]:
# Let's try on some other website

display_summary("https://github.com/Jai-Keshav-Sharma")

# Summary of the Website "Jai-Keshav-Sharma · GitHub"

This GitHub profile belongs to Jai Keshav Sharma, a sophomore student at UTD and Chhatisgarh Swami Vivekananda Technical University Bhilai. The profile highlights his focus on developing machine learning models for social good and his interest in AI, coding, and software development. 

## Key Highlights:
- Active repositories include projects in data science, AI-powered music composition, and C programming.
- Currently learning advanced topics like machine learning, data structures, algorithms, object-oriented programming, and databases.
- Interested in collaborating on hackathons and impactful AI projects.
- Engages in coding challenges on platforms like LeetCode and gaming strategies.
- Maintains social media presence on Instagram and LinkedIn.

## Repositories:
- Notable repositories include:
  - **R-for-Data-Science**
  - **AI-powered-Music-Composer**
  - **AI-Agent-with-MCP_Servers**
  - Several C and C++ projects focusing on problem-solving and coding challenges.

The profile emphasizes his enthusiasm for AI and coding, his ongoing education, and collaborative aspirations in tech and AI development. There are no recent news or announcements on this profile.

You may notice that if you try display_summary("https://openai.com") - it doesn't work! That's because OpenAI has a fancy website that uses Javascript. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!)