# Instant Gratification!

Let's build a useful LLM solution - in a matter of minutes.

Our goal is to code a new kind of Web Browser. Give it a URL, and it will respond with a summary. The Reader's Digest of the internet!!

Before starting, be sure to have followed the instructions in the "README" file, including creating your API key with OpenAI and adding it to the `.env` file.

## If you're new to Jupyter Lab

Welcome to the wonderful world of Data Science experimentation! Once you've used Jupyter Lab, you'll wonder how you ever lived without it. Simply click in each "cell" with code in it, like the cell immediately below this text, and hit Shift+Return to execute that cell. As you wish, you can add a cell with the + button in the toolbar, and print values of variables, or try out variations.

If you need to start again, go to Kernel menu >> Restart kernel.

In [6]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [7]:
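# Load environment variables from a file called .env

load_dotenv()
# Pass a fallback to os.getenv: assigning None to os.environ raises a TypeError when the key is missing
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')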
openai = OpenAI()
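
If the later cells fail with an authentication error, the key is usually the culprit. Here is a minimal sketch of a sanity check you could run at this point. The helper name `describe_api_key` is made up for illustration, and the `sk-` prefix check is an assumption based on how OpenAI keys look at the time of writing, so treat a warning as a hint rather than a verdict.

```python
import os

def describe_api_key(key):
    """Return a short, human-readable diagnosis of an API key string (hypothetical helper)."""
    if not key:
        return "No API key found: check that .env exists and defines OPENAI_API_KEY"
    if key != key.strip():
        return "API key has leading or trailing whitespace: remove it in .env"
    # Assumption: OpenAI keys currently start with "sk-"
    if not key.startswith("sk-"):
        return "API key does not start with 'sk-': check you copied the right value"
    return "API key found and looks plausible"

print(describe_api_key(os.getenv("OPENAI_API_KEY")))
```

This only inspects the string locally; it does not contact OpenAI, so a key that looks plausible can still be rejected by the API.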

In [8]:
# A class to represent a Webpage

class Website:
    url: str
    title: str
    text: str

    def __init__(self, url):
        self.url = url
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop tags that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body> at all; avoid an AttributeError
            self.text = ""

In [10]:
# Let's try one out

#ed = Website("https://edwarddonner.com")
nb = Website("https://www.novabase.com/en/")
print(nb.title)
print(nb.text)

Home | Novabase
Novabase's site uses Cookies to improve your navigation experience and statistics. By visiting us you are allowing their use.
Learn more
Ok, understood
Investors
ESG
Celfocus
Novabase Capital
en
pt
Search
Financial services
Celfocus
Investors
ESG
Celfocus
Novabase Capital
en
pt
en
pt
1H24 Results
See here
Discover
Capital
Discover
Novabase investors
Discover
Annual report 2023
Non official version
View report
Safehub, supported by the European Commission
Discover
Let's talk today
Find out how we can help your business
info@novabase.com
Contact
Contacts
Novabase | Head Office
	Av. D. João II, nº34,
Parque das Nações
1998-031 Lisboa
+351 213 836 300
Contact us
Novabase
Board of Directors
Code of conduct
Certifications
ESG
Communication of Irregularities
Investors
Financial Calendar
Financial Information
CMVM Press Releases
Corporate Governance
Novabase Share
Analysts
Investor Relations Office
FAQ
Our companies
Celfocus
Novabase Capital
Candidates
Applications
media
About 

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [11]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [12]:
def user_prompt_for(website):
    # End the first sentence with a newline so the title does not run into the next sentence
    user_prompt = f"You are looking at a website titled {website.title}\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [13]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
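
To see this structure without fetching a page or calling the API, here is a small sketch that builds the same kind of list for a stand-in page. Everything in it is illustrative: `demo_system_prompt`, `build_messages` and `fake_page` are hypothetical names chosen so they don't overwrite anything defined in the notebook, and the prompts are shortened.

```python
from types import SimpleNamespace

# Shortened stand-in for the notebook's system prompt (illustrative only)
demo_system_prompt = "You are an assistant that summarizes websites. Respond in markdown."

def build_messages(page):
    """Build a system + user message list for any object with .title and .text."""
    user_prompt = f"You are looking at a website titled {page.title}.\n\n{page.text}"
    return [
        {"role": "system", "content": demo_system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# A fake page, so nothing is downloaded
fake_page = SimpleNamespace(title="Example Page", text="Hello from a page that was never fetched.")
demo_messages = build_messages(fake_page)

for message in demo_messages:
    # Show each role with the start of its content
    print(message["role"], "->", message["content"][:60])
```

The list always starts with the system message and is followed by the user message; that ordering is what `messages_for` produces as well.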

## Time to bring it together - the API for OpenAI is very simple!

In [15]:
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content

In [17]:
#summarize("https://edwarddonner.com")
summarize("https://www.novabase.com/en/")

'# Summary of Novabase Website\n\nNovabase is a company focused on delivering financial services and solutions. The website features various sections aimed at investors, including recent financial results and annual reports. One key highlight is the announcement of the **1H24 Results**, which suggests recent performance metrics may be available for review. Additionally, there’s mention of **Safehub**, a project supported by the European Commission, indicating Novabase’s involvement in significant initiatives.\n\nThe site also emphasizes their commitment to **ESG (Environmental, Social, and Governance)** practices and provides a contact point for businesses seeking assistance. \n\nThe company maintains a transparent approach with resources for investors, including press releases and information on corporate governance.'

In [18]:
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [19]:
display_summary("https://edwarddonner.com")

# Summary of Edward Donner's Website

Edward Donner's website serves as a personal platform where he shares his interests and work in the field of AI, particularly with Large Language Models (LLMs). He introduces himself as a coder and LLM experimenter, also revealing a background in DJing and amateur music production. As the co-founder and CTO of Nebula.io, a company focused on using AI for talent discovery, he emphasizes the positive impact of technology in this area.

### Recent Posts
1. **August 6, 2024**: *Outsmart LLM Arena* - An introduction to an innovative platform that pits LLMs against each other in scenarios requiring diplomacy and strategic thinking.
2. **June 26, 2024**: *Choosing the Right LLM: Toolkit and Resources* - A resourceful guide for selecting the appropriate LLM for projects.
3. **February 7, 2024**: *Fine-tuning an LLM on your texts: a simulation of you* - A detailed discussion on personalizing LLMs using individual text data.
4. **January 31, 2024**: *Fine-tuning an LLM on your texts: part 4 – QLoRA* - The fourth installment in a series on techniques for fine-tuning LLMs.

The site reflects Donner's enthusiasm for AI and its applications, while also providing educational content for enthusiasts in the field.

In [20]:
display_summary("https://cnn.com")

# Summary of CNN Website

CNN provides comprehensive coverage of breaking news and features across multiple topics, including US and world news, politics, business, health, entertainment, sports, science, and weather.

## Key Highlights:

- **Conflict Coverage**: 
  - Tensions related to the Israel-Hamas War and the Ukraine-Russia War are prominently featured.
  - Recent developments include Israeli retaliation plans following a missile attack from Iran, with live updates on the situation.

- **Natural Disasters**: 
  - Coverage of Hurricane Helene, which has caused significant destruction and led to a rising death toll.

- **Political News**: 
  - Noteworthy political events include Melania Trump entering discussions on abortion rights and candidates campaigning across various states, including collaborations between political figures like Liz Cheney and Kamala Harris.

- **Global Affairs**: 
  - Articles on international issues such as North Korea's provocative actions and the ongoing economic implications of labor strikes related to port operations.

- **Science and Technology**: 
  - Features on recent scientific discoveries and innovations in technology, including advancements in AI and space exploration.

- **Health Updates**: 
  - Discussions surrounding public health legislation, particularly in the context of opioid overdoses and allergens.

CNN also emphasizes audience engagement, encouraging visitors to provide feedback on their ads and website experience.

In [21]:
display_summary("https://anthropic.com")

# Summary of Anthropic Website

Anthropic is an AI safety and research company based in San Francisco, focused on creating reliable and beneficial AI systems. The company emphasizes AI safety in its research and product development.

## Key Offerings
- **Claude**: Anthropic's AI model, with its latest version, Claude 3.5 Sonnet, introduced as their most intelligent model available.
- **API**: Services allowing businesses to integrate Claude into their operations for enhanced efficiency and revenue generation.

## Recent Announcements 
- **Claude 3.5 Sonnet Release**: Announced on June 21, 2024, highlighting advancements in their AI capabilities.
- **Research Publications**:
  - *Constitutional AI: Harmlessness from AI Feedback* (Dec 15, 2022)
  - *Core Views on AI Safety: When, Why, What, and How* (Mar 8, 2023)

The team consists of experts in machine learning, physics, policy, and product development, aspiring to drive advancements in AI safety.

## An extra exercise for those who enjoy web scraping

You may notice that if you try `display_summary("https://openai.com")` - it doesn't work! That's because OpenAI has a fancy website that uses JavaScript. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. Please push your code afterwards so I can share it with other students!
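
As a starting point for this exercise, here is one possible sketch using Selenium. It assumes you have installed the `selenium` package and have a recent Chrome on your machine; neither is part of this notebook's setup. The names `make_headless_chrome` and `fetch_rendered_html` are made up for illustration, and some sites may still block automated browsers, so this is not guaranteed to work for every URL.

```python
import time

def make_headless_chrome():
    """Create a headless Chrome driver (assumes selenium and Chrome are installed)."""
    # Imported lazily so the rest of the notebook runs without selenium
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    return webdriver.Chrome(options=options)

def fetch_rendered_html(url, driver_factory=make_headless_chrome, timeout=10.0, poll=0.25):
    """Load url in a real browser and return the HTML after JavaScript has run."""
    driver = driver_factory()
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        # Poll until the browser reports that the document has finished loading
        while time.monotonic() < deadline:
            if driver.execute_script("return document.readyState") == "complete":
                break
            time.sleep(poll)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

To use it, you could pass the returned HTML to `BeautifulSoup` inside the `Website` class in place of `response.content`. The `driver_factory` parameter is there so you can swap in a different browser, or a fake driver while testing.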

In [22]:
display_summary("https://www.novabase.com/en/")

# Novabase Website Summary

Novabase is a company that provides services and solutions in the financial sector. The website offers information on various aspects of the company, including:

- **Investor Relations**: Access to financial information, annual reports, and press releases.
- **ESG (Environmental, Social, and Governance)**: Commitment and information regarding corporate responsibility and sustainability.
- **Celfocus and Novabase Capital**: Insights into their subsidiaries and services offered under these brands.

## News and Announcements
- **1H24 Results**: A dedicated section to view financial results for the first half of 2024.
- **Annual Report 2023**: An available non-official version of the company's annual report.
- **Safehub**: Mention of being supported by the European Commission, indicating involvement in a particular initiative.

The website reflects a commitment to transparency with sections for investors and communication regarding corporate governance. Contact information and accessibility options are also provided for potential clients and partners.

In [23]:
display_summary("https://openai.com")

# Summary

The website currently displays a message informing visitors to enable JavaScript and cookies in their web browsers in order to access the site's content. No specific information or announcements are available on the page at this time.