In [1]:
pip install -q -U google-generativeai

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import google.generativeai as genai

In [3]:
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")

if not api_key:
    print("API key not found. Please set the GEMINI_API_KEY environment variable.")
else:
    print("API key loaded successfully.")

API key loaded successfully.


In [4]:
# Pass the loaded key to the SDK before creating the model
genai.configure(api_key=api_key)
client = genai.GenerativeModel('gemini-2.5-pro')

In [5]:
# A class to represent a scraped webpage

class Website:
    """
    Represents a webpage that we have scraped.
    """
    def __init__(self, url):
        """
        Initializes the Website object with a URL and fetches its content.
        """
        self.url = url
        # Fail fast on slow servers and HTTP errors (the 10-second timeout is an arbitrary choice)
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else 'No Title found'
        # Drop elements that carry no readable text
        for irrelevant in soup(['script', 'style', 'img', 'input']):
            irrelevant.decompose()
        self.text = soup.get_text(separator='\n', strip=True)
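
The cleaning step in the class above can be tried without a network call. The sketch below is a dependency-free approximation that uses the standard library's `html.parser` in place of BeautifulSoup, so it is not the notebook's method; the `TextExtractor` class, the `extract_text` helper, and the sample HTML are hypothetical names and data introduced only for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text while skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the readable text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><p>Hello</p><script>var x = 1;</script><p>World</p></body></html>"
)
print(extract_text(sample))
```

The title text appears in the extracted output because it is ordinary text inside the document, which is consistent with the page title showing up twice in the CNN output of the next cell.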

In [6]:
page = Website("https://cnn.com")
print(page.title)
print(page.text[:500])

Breaking News, Latest News and Videos | CNN
Breaking News, Latest News and Videos | CNN
CNN values your feedback
1. How relevant is this ad to you?
2. Did you encounter any technical issues?
Video player was slow to load content
Video content never loaded
Ad froze or did not finish loading
Video content did not start after ad
Audio on ad was too loud
Other issues
Ad never loaded
Ad prevented/slowed the page from loading
Content moved around while ad loaded
Ad was repetitive to ads I've seen previously
Other issues
Cancel
Submit
Thank You!


System prompt - Tells the model what task it is performing and what tone it should use

User prompt - The conversation starter that the model should reply to
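
As a rough illustration of how the two prompts relate, the sketch below pairs them in a plain Python dictionary. The `build_request` helper and its keys are hypothetical and do not mirror any SDK's request format; with the `google.generativeai` package, the system prompt would typically be supplied through the `system_instruction` argument of `GenerativeModel`, while the user prompt is what gets passed to `generate_content`.

```python
def build_request(system_prompt: str, user_prompt: str) -> dict:
    """Pair a system prompt with a user prompt (hypothetical structure)."""
    return {
        "system": system_prompt.strip(),  # task and tone for the model
        "user": user_prompt.strip(),      # the message the model replies to
    }


request = build_request(
    "You summarize web pages concisely. Respond in markdown.",
    "Summarize the page titled: Example Domain",
)
print(request["system"])
print(request["user"])
```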

In [7]:
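# Create a system prompt for the model.
# Note: the cells below never pass this prompt to the model; user_prompt() repeats the same instructions.

sys_prompt = (
    "You are a helpful assistant that analyzes and summarizes web pages "
    "and provides a concise summary of the content. "
    "Respond in markdown."
)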

In [8]:
# A function that builds the user prompt asking for a summary

def user_prompt(website):
    """Return a prompt that asks the model to summarize the given Website."""
    prompt = f"""You are a helpful assistant that analyzes and summarizes web pages.
    Please provide a concise summary of the content in markdown format.

    You are looking at a website titled: {website.title}.

    The content of this website is as follows:
    {website.text}
    """
    return prompt
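
Because `user_prompt` embeds the full page text, long pages can lead to very large prompts. One possible mitigation, which is not part of the original notebook, is to cap the text before building the prompt. The `truncate_text` helper, the `[truncated]` marker, and the 5,000-character default are assumptions chosen for illustration.

```python
def truncate_text(text: str, max_chars: int = 5000) -> str:
    """Cap text at roughly max_chars, cutting at a line break when possible."""
    if len(text) <= max_chars:
        return text
    # Look for the last line break at or before the limit so no line is split
    cut = text.rfind("\n", 0, max_chars + 1)
    if cut <= 0:
        cut = max_chars  # no usable line break: fall back to a hard cut
    return text[:cut].rstrip() + "\n[truncated]"


print(truncate_text("first line\nsecond line\nthird line", max_chars=22))
```

A caller could apply it as `truncate_text(website.text)` before interpolating the text into the prompt.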

In [9]:
def summarize_website(url):
    website = Website(url)

    # Create the prompt
    content = user_prompt(website)

    # Generate the content
    response = client.generate_content(content)

    # Return the text from the response
    return response.text

In [10]:
summarize_website("https://cnn.com")

'Of course, here is a concise summary of the CNN website\'s content.\n\n***\n\n### Summary of CNN Homepage\n\nThis is the main page for CNN, a major news organization. The content covers a wide range of breaking news and current events from the US and around the world.\n\n#### Top Headlines\n*   **US Politics:** The top stories focus on a potential government shutdown, its impact on travel and federal workers, and related political maneuvering. There are reports on a contentious Senate hearing for Pam Bondi and a Supreme Court decision regarding conversion therapy bans.\n*   **International Affairs:** Live updates are provided on Gaza ceasefire talks. The ongoing wars in Ukraine and Israel-Hamas are prominent navigation topics.\n*   **Trump-related News:** Multiple articles cover former President Trump, including his comments on the shutdown, National Guard deployments, and the Epstein investigation.\n\n#### Key Sections\n*   **US & World News:** Covers a variety of domestic and intern

In [11]:
# A function to display the summary as rendered markdown

def display_summary(url):
    summary = summarize_website(url)
    display(Markdown(summary))

In [12]:
display_summary("https://edwarddonner.com")

This is the personal website of **Ed Donner**, a technologist and entrepreneur specializing in AI and Large Language Models (LLMs).

### Key Information:
*   **Profession:** Co-founder and CTO of **Nebula.io**, an AI company that helps recruiters source and manage talent. He was previously the founder and CEO of AI startup **untapt**, which was acquired in 2021.
*   **Expertise:** His work involves proprietary LLMs verticalized for the talent industry and patented matching models.
*   **Interests:** Writing code, experimenting with LLMs, DJing, and electronic music production.
*   **Website Content:**
    *   **Projects:** Includes "Connect Four" and "Outsmart," an arena where LLMs compete against each other.
    *   **Posts:** Features articles and briefings on AI topics, such as "Gen AI and Agentic AI on AWS" and "The Complete Agentic AI Engineering Course."
    *   **Contact:** Provides links to his social media (LinkedIn, Twitter) and an option to subscribe to his newsletter.