In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI



# Connecting to OpenAI

The next cells load the environment variables from your `.env` file, check the API key, and connect to OpenAI.

In [2]:
# Load environment variables in a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
# Check for stray whitespace first: a leading space would otherwise be misreported as a wrong prefix
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [3]:

openai = OpenAI()


## Calling a frontier model: `gpt-4o-mini`

In [4]:
message = "Hello, GPT! This is my first ever message to you! Hi!"
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hello! Welcome! I'm glad you're here. How can I assist you today?


In [5]:
# A class to represent a Webpage


# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
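        # Some pages have no <body>; guard so we never call methods on None
        if soup.body:
            # Remove elements that carry no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""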
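
To see what the tag-stripping step does without making a network request, here is a minimal offline sketch. It deliberately uses the standard library's `html.parser` rather than BeautifulSoup, so it is an approximation of the idea and not the `Website` class itself. The names `VisibleTextParser`, `visible_text`, and `sample` are hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents should not appear in the extracted text
SKIPPED_TAGS = {"script", "style"}


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects text that is not inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text found outside skipped tags
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the readable text of an HTML string, one chunk per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# A made-up page fragment, not fetched from anywhere
sample = (
    "<body><h1>Hello</h1><script>alert('x')</script>"
    "<style>p {color: red}</style><p>World</p></body>"
)
print(visible_text(sample))
```

The script and style contents are dropped and the remaining text is joined with newlines, which mirrors what `decompose()` followed by `get_text(separator="\n", strip=True)` is doing in the class above.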

In [11]:

ed = Website("https://en.wikipedia.org/wiki/Gachon_University")
print(ed.title)
print(ed.text)

Gachon University - Wikipedia
Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Contribute
Help
Learn to edit
Community portal
Recent changes
Upload file
Special pages
Search
Search
Appearance
Donate
Create account
Log in
Personal tools
Donate
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Contents
move to sidebar
hide
(Top)
1
Colleges
2
History
Toggle History subsection
2.1
Established
2.2
Progress
3
Reputation and rankings
4
Notable alumni
5
References
6
External links
Toggle the table of contents
Gachon University
6 languages
العربية
Deutsch
فارسی
한국어
日本語
中文
Edit links
Article
Talk
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
View history
General
What links here
Related changes
Upload file
Permanent link
Page information
Cite this page
Get shortened URL
Download QR code
Print/export
Download as PDF
Printable version
In o

## Prompts


Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [12]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [13]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [14]:
print(user_prompt_for(ed))

You are looking at a website titled Gachon University - Wikipedia
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Contribute
Help
Learn to edit
Community portal
Recent changes
Upload file
Special pages
Search
Search
Appearance
Donate
Create account
Log in
Personal tools
Donate
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Contents
move to sidebar
hide
(Top)
1
Colleges
2
History
Toggle History subsection
2.1
Established
2.2
Progress
3
Reputation and rankings
4
Notable alumni
5
References
6
External links
Toggle the table of contents
Gachon University
6 languages
العربية
Deutsch
فارسی
한국어
日本語
中文
Edit links
Article
Talk
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Rea

In [15]:
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

In [16]:

response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

Oh, look at you, breaking out the tough math questions! The answer is 4. Just kidding, I know you knew that.


## Messages for GPT-4o-mini, using a function

In [17]:


def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [18]:

messages_for(ed)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled Gachon University - Wikipedia\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nJump to content\nMain menu\nMain menu\nmove to sidebar\nhide\nNavigation\nMain page\nContents\nCurrent events\nRandom article\nAbout Wikipedia\nContact us\nContribute\nHelp\nLearn to edit\nCommunity portal\nRecent changes\nUpload file\nSpecial pages\nSearch\nSearch\nAppearance\nDonate\nCreate account\nLog in\nPersonal tools\nDonate\nCreate account\nLog in\nPages for logged out editors\nlearn more\nContributions\nTalk\nContents\nmove to sidebar\nhide\n(Top)\n1\nColleges\n2\nHistory\nToggle History subsection\n2.1\nEstablished\n2

In [19]:
# Call the OpenAI API to summarize a page (for comparison with a local HTML text summarizer)

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(website)
    )
    return response.choices[0].message.content

In [20]:
summarize("https://en.wikipedia.org/wiki/Gachon_University")

'# Gachon University - Wikipedia Summary\n\nGachon University (가천대학교), located in Seongnam and Incheon, South Korea, is a private university established through the merger of multiple institutions, including Gachon University of Medicine and Science and Kyungwon University. The university has a diverse student body of approximately 27,899, offering a range of programs across various colleges including business, engineering, medicine, and the arts.\n\n## Key Details:\n- **Establishment**: Originally founded as Kyungwon University in 1982, later expanded through mergers in 2007 and 2012.\n- **Campuses**: The primary Global Campus in Seongnam, with additional locations on Ganghwa Island, Yeonsu-dong, and a School of Medicine in Incheon.\n- **Motto**: Philanthropy, Service, Patriotism.\n- **Endowment**: Approximately KRW 10.7 billion.\n- **Academic Staff**: About 2,458.\n- **Reputation**: Ranked 30th among Korean universities and 852nd globally as of 2018-2019.\n\n## Notable Alumni:\nInclu

In [22]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [23]:
display_summary("https://en.wikipedia.org/wiki/Gachon_University")

# Summary of Gachon University - Wikipedia

Gachon University (가천대학교), located in South Korea, is a private university that resulted from the merger of several institutions, including Gachon University of Medicine and Science and Kyungwon University, in 2007 and 2012. The university currently has about 27,899 students and employs approximately 2,458 academic staff across its multiple campuses in Seongnam and Incheon. 

The university offers a variety of academic programs through its multiple colleges, including business, humanities, engineering, and medicine. Gachon University has also established international partnerships, notably with Hawaii Pacific University, facilitating study abroad opportunities for its students.

In terms of rankings, Gachon University is positioned 30th among Korean universities and 852nd globally according to the Center for World University Rankings for 2018-2019. 

Notable alumni include actors and public figures, such as actress Byun Jung-soo and politician Lee Jae-myung, who served as the 14th President of South Korea. 

For further information and updates, the official [Gachon University website](http://www.gachon.ac.kr) provides resources and details.

In [None]:
display_summary("https://cnn.com")

In [24]:
display_summary("https://anthropic.com")

# Summary of Anthropic Website

Anthropic is focused on developing AI technologies with an emphasis on safety and human benefit. The website highlights their flagship AI model, **Claude**, which offers various plans for different user needs, including Max, Team, and Enterprise plans. Users can interact with Claude through an API, which allows them to create AI-powered applications.

### Key Features:
- **Claude Opus 4**: The latest and most intelligent AI model available for users.
- **Claude Sonnet 4**: Another powerful model aimed at enhancing coding and AI agent capabilities.
- **Anthropic Academy**: A learning platform designed to help users build applications with Claude.

### Research and Commitments:
The company emphasizes responsible AI development, featuring a *Responsible Scaling Policy*, and actively conducts research to understand the societal impacts of AI.

### Recent Announcements:
- ISO 42001 certification has been achieved, signifying compliance with international standards for quality management.

Anthropic is dedicated to ensuring that AI technology grows responsibly, with a focus on its long-term implications for society.

## An extra exercise for those who enjoy web scraping

You may notice that `display_summary("https://openai.com")` doesn't work. That's because OpenAI's website is rendered with JavaScript, so the raw HTML returned by `requests` contains little of the visible text. There are several ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright, or similar, feel free to improve the `Website` class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!).
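
As a starting point for this exercise, here is a rough sketch of fetching rendered HTML with Selenium. It is not the student solution from the community-contributions folder. It assumes `pip install selenium` (version 4.6 or later, so the browser driver is located automatically) and a local Chrome installation. The function name `fetch_rendered_html` and the `wait_seconds` parameter are hypothetical.

```python
def fetch_rendered_html(url, wait_seconds=2.0):
    """Hypothetical helper: load `url` in headless Chrome and return the rendered HTML."""
    # Imported lazily so that defining this function does not require Selenium
    import time
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Crude fixed wait so scripts have a chance to populate the page
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails
        driver.quit()


# Example usage (requires Selenium and Chrome, so it is left commented out):
# html = fetch_rendered_html("https://openai.com")
```

The returned string could then be handed to `BeautifulSoup(html, 'html.parser')` in place of `response.content`. A fixed sleep is the simplest possible wait. Some sites may need a longer delay or an explicit wait for a particular element, and some may still block automated browsers.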