Objective:

We want to build an LLM-based webpage summarizer.
We'll use Beautiful Soup (BS4) to scrape the webpage and an OpenAI chat completion call to summarize it.

In [43]:
from bs4 import BeautifulSoup
from openai import OpenAI
import requests
from dotenv import load_dotenv
from IPython.display import display, Markdown

In [2]:
url = "https://edwarddonner.com/"

In [22]:
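class WebScraper:
    """Fetch a webpage and store the text of its <body>."""
    url: str
    text: str

    def __init__(self, url):
        self.url = url
        self.text = ""

    def basic_scraper(self):
        # A timeout keeps the request from hanging indefinitely.
        response = requests.get(self.url, timeout=10)
        # Fail loudly on 4xx/5xx instead of summarizing an error page.
        response.raise_for_status()
        # The 'lxml' parser requires the lxml package to be installed.
        soup = BeautifulSoup(response.content, 'lxml')
        body = soup.body
        # soup.body is None when the parsed page has no <body> element.
        self.text = body.get_text(separator="\n", strip=True) if body is not None else ""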
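The scraper above keeps everything inside `<body>`, including inline scripts and styles, and relies on the prompt to tell the model to ignore them. One alternative is to strip those tags before extracting text. The following is a minimal sketch under stated assumptions: `extract_visible_text`, `SAMPLE_HTML`, and the list of tags to drop are hypothetical choices for illustration, and it uses the built-in `html.parser` rather than `lxml` so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only so the sketch runs offline.
SAMPLE_HTML = """
<html><head><title>Demo</title><style>p {color: red;}</style></head>
<body>
<nav>Home</nav>
<script>console.log("tracking");</script>
<p>Hello, world.</p>
</body></html>
"""

def extract_visible_text(html: str) -> str:
    """Return the body text with script, style, img and input tags removed."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.body
    if body is None:
        return ""
    # Assumed list of tags that carry no readable text; adjust as needed.
    for tag in body.find_all(["script", "style", "img", "input"]):
        tag.decompose()
    return body.get_text(separator="\n", strip=True)

clean_text = extract_visible_text(SAMPLE_HTML)
print(clean_text)
```

Navigation text such as "Home" still survives this step, so the instruction in the system prompt to ignore navigation remains useful.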

In [27]:
web_text = WebScraper(url=url)
web_text.basic_scraper()
web_text.text

'Home\nConnect Four\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,\nacquired in 2021\n.\nWe work with groundbreaking, proprietary LLMs verticalized for talent, we’ve\npatented\nour matching model, and our award-winning platform has happy customers and tons of press coverage.\nConnec

In [None]:
system_prompt = """You are a helpful assistant who can read text from a website and summarize it.
Provide only relevant information and ignore page-structure content such as navigation and scripts.
Respond in Markdown."""

user_prompt = f"You are looking at a website with the following text:\n{web_text.text}"
user_prompt += "\nProvide a short summary of the webpage in Markdown. Cover all available details concisely."
user_prompt += '\nIf any links or URLs are mentioned, output them in the format: {"Link": url}'

In [41]:
messages = [
    {"role":"system", "content": system_prompt},
    {"role":"user", "content":user_prompt}
]
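The prompt and message cells above build their strings at module level, so summarizing a different page means re-running them by hand. A minimal sketch of wrapping that step in a function follows. `build_messages`, `MAX_CHARS`, and the shortened `SYSTEM_PROMPT` are hypothetical names introduced for illustration, and the character cap is an assumption (a guard against very long pages), not something the notebook does.

```python
# Hypothetical cap on page text length; tune for the model in use.
MAX_CHARS = 20_000

# Shortened stand-in for the notebook's system prompt.
SYSTEM_PROMPT = (
    "You are a helpful assistant who reads text from a website and "
    "summarizes it. Respond in Markdown."
)

def build_messages(page_text: str, max_chars: int = MAX_CHARS) -> list[dict]:
    """Return the system/user message list for one page of scraped text."""
    # Truncate rather than fail when a page is very long.
    clipped = page_text[:max_chars]
    user_prompt = (
        f"You are looking at a website with the following text:\n{clipped}\n"
        "Provide a short summary of the webpage in Markdown."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

example_messages = build_messages("Home\nAbout\nWell, hi there.")
```

In the notebook this could be called as `build_messages(web_text.text)` once the scraper has run.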

In [None]:
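# Load OPENAI_API_KEY from a local .env file into the environment.
load_dotenv(override=True)
# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages
)
# Show the raw Markdown string returned by the model.
response.choices[0].message.content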

"# Summary of Ed's Website\n\n## About \n- **Ed's Interests**: Coding, experimenting with LLMs, DJing, and electronic music production.\n- **Professional Background**:\n  - Co-founder and CTO of [Nebula.io](https://www.nebula.io), focused on applying AI to talent discovery and management.\n  - Former founder and CEO of untapt, an AI startup acquired in 2021.\n- **Achievements**: \n  - Developed proprietary LLMs for talent management with a patented matching model.\n  - The platform has received positive media coverage and has satisfied customers.\n\n## Recent Posts\n- May 28, 2025: Connecting my courses – become an LLM expert and leader\n- May 18, 2025: 2025 AI Executive Briefing\n- April 21, 2025: The Complete Agentic AI Engineering Course\n- January 23, 2025: LLM Workshop – Hands-on with Agents – resources\n\n## Contact Information\n- Email: ed [at] edwarddonner [dot] com\n- Website: [www.edwarddonner.com](https://www.edwarddonner.com)\n\n## Social Media\n- LinkedIn, Twitter, Faceboo

In [44]:
display(Markdown(response.choices[0].message.content))

# Summary of Ed's Website

## About 
- **Ed's Interests**: Coding, experimenting with LLMs, DJing, and electronic music production.
- **Professional Background**:
  - Co-founder and CTO of [Nebula.io](https://www.nebula.io), focused on applying AI to talent discovery and management.
  - Former founder and CEO of untapt, an AI startup acquired in 2021.
- **Achievements**: 
  - Developed proprietary LLMs for talent management with a patented matching model.
  - The platform has received positive media coverage and has satisfied customers.

## Recent Posts
- May 28, 2025: Connecting my courses – become an LLM expert and leader
- May 18, 2025: 2025 AI Executive Briefing
- April 21, 2025: The Complete Agentic AI Engineering Course
- January 23, 2025: LLM Workshop – Hands-on with Agents – resources

## Contact Information
- Email: ed [at] edwarddonner [dot] com
- Website: [www.edwarddonner.com](https://www.edwarddonner.com)

## Social Media
- LinkedIn, Twitter, Facebook

## Notable Feature
- **Connect Four**: An arena featuring LLMs in a competition of diplomacy and cunning.
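The request step in the cells above can also be wrapped in a small function so it is reusable and testable. This is a sketch under stated assumptions, not the notebook's own code: `summarize_messages` is a hypothetical helper, and `fake_client` is an offline stand-in that only mimics the shape of the OpenAI client so the example runs without network access or an API key.

```python
from types import SimpleNamespace

def summarize_messages(client, messages, model="gpt-4o-mini"):
    """Send messages to a chat-completions client and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    content = response.choices[0].message.content
    # content is optional in the OpenAI SDK (for example, when the model
    # returns a tool call), so fall back to an empty string.
    return content or ""

# Offline stand-in with the same attribute path as the real client.
def _fake_create(model, messages):
    reply = SimpleNamespace(
        content=f"# Summary\n({len(messages)} messages sent to {model})"
    )
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

demo_messages = [
    {"role": "system", "content": "Summarize the page in Markdown."},
    {"role": "user", "content": "Home\nAbout\nWell, hi there."},
]
summary = summarize_messages(fake_client, demo_messages)
print(summary)
```

With a real client, passing `OpenAI()` in place of `fake_client` and handing the result to `display(Markdown(summary))` would follow the same flow as the cells above.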