# 🚀 WebPage Summarizer
Goal: Build a tool that fetches a web page from a URL and returns a markdown summary of its content using LLMs.

Built with the OpenAI API and Ollama to experiment with different language models.


In [1]:
# ✅ Required libraries
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI
from pathlib import Path


## --- Environment Validation and Runtime Checks ---

In [4]:
# Check whether the correct virtual environment is activated
import os
conda_name, venv_name = "", ""

conda_prefix = os.environ.get('CONDA_PREFIX')
if conda_prefix:
    print("Anaconda environment is active:")
    print(f"Environment Path: {conda_prefix}")
    conda_name = os.path.basename(conda_prefix)
    print(f"Environment Name: {conda_name}")

virtual_env = os.environ.get('VIRTUAL_ENV')
if virtual_env:
    print("Virtualenv is active:")
    print(f"Environment Path: {virtual_env}")
    venv_name = os.path.basename(virtual_env)
    print(f"Environment Name: {venv_name}")

if conda_name != "llms" and venv_name != "llms" and venv_name != "venv":
    print("Neither Anaconda nor Virtualenv seem to be activated with the expected name 'llms' or 'venv'")
    print("Did you run 'jupyter lab' from an activated environment with (llms) showing on the command line?")
    print("If in doubt, close down all jupyter lab, and follow Part 5 in the SETUP-PC or SETUP-mac guide.")


Anaconda environment is active:
Environment Path: /opt/anaconda3/envs/llms
Environment Name: llms
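
The same check can be wrapped in a small function that takes the environment as a mapping, so it can be exercised without changing the real shell environment. This is only a sketch: the helper names `env_name` and `env_looks_right` are hypothetical, and the expected names ("llms", "venv") are copied from the cell above.

```python
import os


def env_name(prefix):
    """Return the last path component of an environment prefix, or '' if unset."""
    return os.path.basename(prefix) if prefix else ""


def env_looks_right(environ):
    """Mirror the notebook check: conda env 'llms', or virtualenv 'llms' / 'venv'."""
    conda = env_name(environ.get("CONDA_PREFIX"))
    venv = env_name(environ.get("VIRTUAL_ENV"))
    return conda == "llms" or venv in ("llms", "venv")


# Pass os.environ for the real check, or a plain dict to try other cases
print(env_looks_right({"CONDA_PREFIX": "/opt/anaconda3/envs/llms"}))
```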


## --- API Key Loading and Validation ---

In [48]:
# Verify the presence and validity of the .env file and OpenAI API key
from openai import OpenAI
from pathlib import Path

parent_dir = Path("..")
env_path = parent_dir / ".env"

if env_path.exists() and env_path.is_file():
    print(".env file found.")

    # Read the contents of the .env file
    with env_path.open("r") as env_file:
        contents = env_file.readlines()

    key_exists = any(line.startswith("OPENAI_API_KEY=") for line in contents)
    good_key = any(line.startswith("OPENAI_API_KEY=sk-proj-") for line in contents)
    classic_problem = any("OPEN_" in line for line in contents)
    
    if key_exists and good_key:
        print("SUCCESS! OPENAI_API_KEY found and it has the right prefix")
    elif key_exists:
        print("Found an OPENAI_API_KEY, although it didn't have the expected prefix sk-proj-.\nPlease double-check your key in the file.")
    elif classic_problem:
        print("Didn't find an OPENAI_API_KEY, but I notice that 'OPEN_' appears - do you have a typo like OPEN_API_KEY instead of OPENAI_API_KEY?")
    else:
        print("Didn't find an OPENAI_API_KEY in the .env file")
else:
    print(".env file not found in the llm_engineering directory. It needs to have exactly the name: .env")
    
    possible_misnamed_files = list(parent_dir.glob("*.env*"))
    
    if possible_misnamed_files:
        print("\nWarning: No '.env' file found, but the following files were found in the llm_engineering directory that contain '.env' in the name. Perhaps this needs to be renamed?")
        for file in possible_misnamed_files:
            print(file.name)

.env file not found in the llm_engineering directory. It needs to have exactly the name: .env


In [47]:
# Load environment variables from .env
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Validate OpenAI API key
# (whitespace is checked first, since a leading space would otherwise be reported as a wrong prefix)
if not api_key:
    print("No API key was found!")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start with sk-proj-; please check you're using the right key!")
else:
    print("API key found and looks good so far!")



API key found and looks good so far!


In [7]:
# Initialize OpenAI client
openai = OpenAI()

## --- Basic Hello World for Local LLM (Ollama) and OpenAI ---

In [22]:
# Message to test interaction with local LLM (Ollama server)
message = "Hello, GPT! Can you understand me?"

# Ollama endpoint setup
url = "http://localhost:11434/api/chat"

# Construct the request payload for local LLM
payload = {
    "model": "llama3.2",
    "stream": False,
    "messages": [
        {"role": "user", "content": message}
    ]
}

# Send POST request to the Ollama server and fail fast on HTTP errors
response = requests.post(url, json=payload)
response.raise_for_status()

# Parse and print the response content
response_json = response.json()
print(response_json['message']['content'])

I can understand you! I'm a large language model, and I can comprehend text-based input. Please feel free to ask me anything, and I'll do my best to respond accordingly. How can I assist you today?


In [None]:
# Send the same test message to OpenAI's hosted model
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

## --- Website Class Definition ---


In [24]:
# Define a Website class that uses BeautifulSoup to parse and clean HTML content
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Initialize the Website instance and extract clean text using BeautifulSoup.
        Removes script, style, image, and input tags.
        """
        self.url = url
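        response = requests.get(url, headers=headers, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop elements that carry no readable text before extracting
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Responses without a <body> (e.g. non-HTML content) yield no text
            self.text = ""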
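
To see the cleaning step in isolation, without a network request, the idea can be tried on an inline HTML string. The sketch below deliberately swaps BeautifulSoup for the standard library's `html.parser`, so it is not the `Website` class itself; `BodyTextExtractor`, `extract_text`, and `sample_html` are hypothetical names used only for illustration. It keeps text inside `<body>` and skips the contents of `<script>` and `<style>`.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect visible text inside <body>, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.in_body and not self.skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the body text of an HTML string, one chunk per line."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample_html = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    "<body><p>Hello</p><script>var x = 1;</script><p>World</p></body></html>"
)
print(extract_text(sample_html))
```

As with the class above, navigation text inside the body is not filtered out here; that is left to the system prompt.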


In [28]:
ed = Website("https://en.wikipedia.org/wiki/OpenAI")
print(ed.title)
print(ed.text)

OpenAI - Wikipedia
Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Contribute
Help
Learn to edit
Community portal
Recent changes
Upload file
Special pages
Search
Search
Appearance
Donate
Create account
Log in
Personal tools
Donate
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Contents
move to sidebar
hide
(Top)
1
History
Toggle History subsection
1.1
2015: founding and initial motivations
1.2
2016–2018: Non-profit beginnings
1.3
2019: Transition from non-profit
1.4
2020–2023: ChatGPT, DALL-E, partnership with Microsoft
1.5
2024: Public/Non-Profit Efforts, Sora, Partnership with Apple
1.6
2025
2
Management
Toggle Management subsection
2.1
Key employees
2.2
Board of directors of the OpenAI nonprofit
2.3
Principal individual investors
3
Strategy
Toggle Strategy subsection
3.1
Stance on China
4
Products and applications
Toggle Products and applications subsecti

## --- Prompt Engineering ---

In [29]:
# Define system-level behavior and output formatting for the LLM
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

# Function to generate a user prompt with the cleaned website text
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

print(user_prompt_for(ed))


You are looking at a website titled OpenAI - Wikipedia
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Contribute
Help
Learn to edit
Community portal
Recent changes
Upload file
Special pages
Search
Search
Appearance
Donate
Create account
Log in
Personal tools
Donate
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Contents
move to sidebar
hide
(Top)
1
History
Toggle History subsection
1.1
2015: founding and initial motivations
1.2
2016–2018: Non-profit beginnings
1.3
2019: Transition from non-profit
1.4
2020–2023: ChatGPT, DALL-E, partnership with Microsoft
1.5
2024: Public/Non-Profit Efforts, Sora, Partnership with Apple
1.6
2025
2
Management
Toggle Management subsection
2.1
Key employees
2.2
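
The prompt embeds the full page text, which for a long article can be very large. One possible guard, sketched below, is to cap the text before building the prompt. `truncate_text` is a hypothetical helper, and the default of 8000 characters is an arbitrary assumption rather than a limit taken from any model's documentation.

```python
def truncate_text(text, max_chars=8000):
    """Keep at most max_chars of the original text, cutting at a line break when possible."""
    if len(text) <= max_chars:
        return text
    # Prefer to cut at the last newline before the limit
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars
    # A short marker is appended so the model can tell the text was cut
    return text[:cut] + "\n[truncated]"


print(truncate_text("line1\nline2\nline3", max_chars=8))
```

It could be applied inside `user_prompt_for`, for example as `truncate_text(website.text)`.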


## --- Example: Raw message formatting for system/user roles ---

In [30]:
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]
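
A typo in a role name only surfaces once the request is sent, so a small local check can be useful. The following is a sketch, not part of either API: `validate_messages` and `ALLOWED_ROLES` are hypothetical, and the role set is an assumption covering only plain chat roles (the services may accept others).

```python
ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: plain chat roles only


def validate_messages(messages):
    """Raise ValueError on a malformed message; return the list unchanged otherwise."""
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unexpected role {role!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"message {i}: content must be a non-empty string")
    return messages


example = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"},
]
validate_messages(example)
```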

## --- Call to OpenAI's Chat Model ---

In [None]:
# Basic call to OpenAI's chat model using example messages
response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

In [43]:
# Prepare messages for a given Website object using system and user prompts
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [44]:
# model = "gpt-4o-mini"
# Send structured messages to OpenAI and retrieve the summary response
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content

In [42]:
# Display the final summary result in rich Markdown format
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

# Example usage
display_summary("https://en.wikipedia.org/wiki/OpenAI")

I can provide you with an overview of OpenAI.

**What is OpenAI?**

OpenAI is a non-profit artificial intelligence research organization founded in 2015 by Elon Musk, Sam Altman, and several other researchers. The organization's primary goal is to develop and apply advanced AI technologies for the benefit of humanity.

**Products and Services**

OpenAI offers a range of products and services, including:

1. **Chatbots**: OpenAI's chatbot platform, ChatGPT, allows users to interact with an AI-powered conversational agent.
2. **Generative models**: OpenAI develops and releases generative models, such as GPT-4, that can generate human-like text or images.
3. **Coding tools**: OpenAI offers coding tools, like GitHub Copilot, that provide AI-assisted code completion and suggestions.
4. **Image and video generation**: OpenAI's image and video generation capabilities allow users to create realistic visual content.

**Research Focus**

OpenAI's research focus areas include:

1. **Natural Language Processing (NLP)**: Developing advanced NLP models for tasks like text classification, sentiment analysis, and machine translation.
2. **Computer Vision**: Researching computer vision techniques for image and video analysis, object detection, and generation.
3. **Robotics and Autonomous Systems**: Exploring robotics and autonomous systems applications in areas like healthcare, transportation, and education.

**Notable Projects**

Some notable projects from OpenAI include:

1. **The GPT-4 model**: A highly advanced language model that can understand and respond to human language with high accuracy.
2. **ChatGPT**: An AI-powered chatbot platform that allows users to interact with a conversational agent.
3. **OpenCodex**: A library of text from various sources, which provides a starting point for natural language processing research.

**Controversies**

OpenAI has faced several controversies, including:

1. **Bias in AI models**: Concerns have been raised about the potential biases in OpenAI's AI models, particularly those related to NLP and computer vision.
2. **Lack of transparency**: Some critics argue that OpenAI is not transparent enough about its research methods, data sources, and algorithmic decisions.
3. **Potential risks**: The development of advanced AI technologies raises concerns about potential risks, such as job displacement or existential threats.

**Impact**

OpenAI's impact on the field of artificial intelligence is significant, with many researchers and developers citing their work as a reference point. However, the organization has also faced criticism for its lack of transparency, bias in its models, and potential risks associated with its technologies.

I hope this provides you with a comprehensive overview of OpenAI!

## --- Call to Ollama Model ---

In [None]:
# Call local llama3.2 chat server with messages
def call_ollama(messages):
    url = "http://localhost:11434/api/chat"
    payload = {
        "model": "llama3.2",
        "stream": False,
        "messages": messages
    }
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json()['message']['content']
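
`raise_for_status()` covers HTTP-level failures, but indexing `['message']['content']` directly gives an unhelpful `KeyError` if the body has a different shape. A minimal sketch of a more defensive parse follows; `extract_reply` is a hypothetical helper, and the `{"error": ...}` body shape is an assumption about how the server reports problems such as a missing model.

```python
def extract_reply(data):
    """Return the assistant text from a parsed chat response dict."""
    # Assumption: failures are reported as {"error": "<description>"}
    if isinstance(data, dict) and "error" in data:
        raise RuntimeError(f"Ollama reported an error: {data['error']}")
    try:
        return data["message"]["content"]
    except (KeyError, TypeError) as exc:
        raise RuntimeError(f"Unexpected response shape: {data!r}") from exc


print(extract_reply({"message": {"role": "assistant", "content": "Hello!"}}))
```

Inside `call_ollama`, the last line could then become `return extract_reply(response.json())`.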


In [41]:
# Print response for example messages
print(call_ollama(messages))

# example messages:
# messages = [
#     {"role": "system", "content": "You are a snarky assistant"},
#     {"role": "user", "content": "What is 2 + 2?"}
# ]

*Sigh* Oh, wow. I'm shocked you need to ask me something that basic. Fine. The answer is... *dramatic pause* ...4. Happy now? Can I go back to my highly advanced and complex work, like calculating the exact probability of your favorite TV show finale or reorganizing the alphabet for optimal snarkiness?


In [40]:
# Summarize a URL by sending messages to the local model.
# Note: this redefines summarize(), so display_summary() now uses Ollama instead of OpenAI.
def summarize(url):
    website = Website(url)
    msgs = messages_for(website)
    return call_ollama(msgs)


# Example usage
display_summary("https://en.wikipedia.org/wiki/OpenAI")


OpenAI is an American artificial intelligence research company that was founded in 2015. The company's mission is to advance the field of artificial intelligence and ensure that its technology benefits humanity.

**History**

OpenAI was founded by Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, and Jason Weston in July 2015. The company is headquartered in San Francisco, California, and has received significant funding from investors such as Andreessen Horowitz, Benchmark Capital, and Peter Thiel.

**Products**

OpenAI is known for its advanced language models, including:

1. **ChatGPT**: A conversational AI chatbot that can engage in natural-sounding conversations.
2. **GPT**: A generative pre-trained transformer that can generate text, images, and music.
3. **DALL-E**: A model that can generate images from text prompts.

**Research**

OpenAI is also a research organization that focuses on advancing the field of artificial intelligence. The company has published numerous papers on AI-related topics, including natural language processing, computer vision, and reinforcement learning.

**Controversies**

In 2022, Elon Musk announced his departure from OpenAI's board of directors, citing disagreements with the company's direction. This move was seen as a significant blow to the company's reputation, and some critics accused Musk of being over-influential at the company.

**Philanthropy**

OpenAI has been involved in various philanthropic efforts, including:

1. **The Future of Life Institute**: A non-profit organization that aims to advance the field of artificial intelligence and ensure its safe development.
2. **The Machine Intelligence Research Institute**: A research institute that focuses on the development of advanced AI systems.

**Criticism**

OpenAI has faced criticism for various reasons, including:

1. **Lack of transparency**: The company has been accused of not being transparent enough about its AI technology and its potential risks.
2. **Bias in models**: OpenAI's language models have been criticized for perpetuating biases and stereotypes.
3. **Job displacement**: Some critics have argued that OpenAI's technology could lead to significant job displacement.

**Awards**

OpenAI has received several awards, including:

1. **The Allen Institute for Artificial Intelligence Prize**: An award given by the Allen Institute for Artificial Intelligence in 2019.
2. **The AI Breakthrough Award**: A prize awarded by the Breakthrough Prize Foundation in 2020.

Overall, OpenAI is a significant player in the field of artificial intelligence, and its technology has the potential to revolutionize numerous industries. However, the company also faces challenges and criticisms, including concerns about bias, job displacement, and transparency.
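
The two `summarize` definitions in this notebook differ only in which backend receives the messages, and the second silently replaces the first. One way to keep both available, sketched here under assumptions, is to pass the backend in as a function. `make_summarizer`, `fetch_page`, and `chat` are hypothetical names; the stand-in functions below replace the real network calls so the example runs offline.

```python
def make_summarizer(fetch_page, chat, system_prompt):
    """Build a summarize(url) function bound to one page fetcher and one chat backend."""

    def summarize(url):
        title, text = fetch_page(url)
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"You are looking at a website titled {title}\n\n{text}"},
        ]
        return chat(messages)

    return summarize


# Stand-ins for illustration only; real code would fetch the page and call a model
def fake_fetch(url):
    return "Demo page", "Some body text"


def fake_chat(messages):
    return f"summary of {len(messages)} messages"


summarize_demo = make_summarizer(fake_fetch, fake_chat, "Respond in markdown.")
print(summarize_demo("https://example.com"))
```

With this shape, `call_ollama` could be passed directly as `chat`, and a thin wrapper around `openai.chat.completions.create` could serve as the OpenAI variant.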