# Instant Gratification!

Let's build a useful LLM solution - in a matter of minutes.

Our goal is to code a new kind of Web Browser. Give it a URL, and it will respond with a summary. The Reader's Digest of the internet!!

Before starting, be sure to have followed the instructions in the "README" file, including creating your API key with OpenAI and adding it to the `.env` file.

## If you're new to Jupyter Lab

Welcome to the wonderful world of Data Science experimentation! Once you've used Jupyter Lab, you'll wonder how you ever lived without it. Simply click in each "cell" with code in it, like the cell immediately below this text, and hit Shift+Return to execute that cell. As you wish, you can add a cell with the + button in the toolbar, and print values of variables, or try out variations.

If you need to start again, go to Kernel menu >> Restart kernel.

In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# Connecting to OpenAI

The next cell is where we load in the environment variables in your `.env` file and connect to OpenAI.

## Troubleshooting if you have problems:

1. OpenAI takes a few minutes to register after you set up an account. If you receive an error about being over quota, wait a few minutes and try again.
2. Also, double check you have the right kind of API token with the right permissions. You should find it on [this webpage](https://platform.openai.com/api-keys) and it should show with Permissions of "All". If not, try creating another key by:
   - Pressing "Create new secret key" on the top right
   - Selecting **Owned by:** you, **Project:** Default project, **Permissions:** All
   - Clicking Create secret key, and using that new key in the code and the `.env` file (it might take a few minutes to activate)
   - Doing a Kernel >> Restart kernel, and executing the cells in this Jupyter lab starting at the top
3. As a fallback, replace the line `openai = OpenAI()` with `openai = OpenAI(api_key="your-key-here")`. It's not recommended to hard code tokens in Jupyter lab, because then you can't share your lab with others, but it's a workaround for now.
4. Contact me! Message me or email ed@edwarddonner.com and we will get this to work.

Any concerns about API costs? See my notes in the README - costs should be minimal, and you can control it at every point.
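
If the troubleshooting steps above don't resolve things, it can help to confirm that a key is actually being picked up from the environment before calling the API. The sketch below is a format check only and makes no network call. The function name `describe_api_key` is hypothetical, and the `sk-` prefix test is an assumption based on how OpenAI keys commonly look, not a guarantee.

```python
import os

def describe_api_key(key):
    """Return a short diagnostic message for an API key value (format checks only)."""
    if not key:
        return "No API key found - check the .env file and restart the kernel"
    if key != key.strip():
        return "Key has leading or trailing whitespace - remove it in the .env file"
    # Assumption: OpenAI keys usually begin with "sk-"
    if not key.startswith("sk-"):
        return "Key does not start with 'sk-' - double check you copied the right value"
    return "Key looks plausible"

# Reads whatever is currently in the environment (call load_dotenv() first in the notebook)
print(describe_api_key(os.getenv("OPENAI_API_KEY")))
```

A "plausible" key can still be rejected by OpenAI, for example if it was revoked or lacks permissions, so treat this as a first sanity check.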

In [2]:
# Load environment variables in a file called .env

load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY','your-key-if-not-using-env')
openai = OpenAI()

In [3]:
# A class to represent a Webpage

class Website:
    url: str
    title: str
    text: str

    def __init__(self, url):
        self.url = url
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        # Fall back to the whole document when the page has no <body> tag
        root = soup.body if soup.body else soup
        # Remove elements that carry no readable text
        for irrelevant in root(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = root.get_text(separator="\n", strip=True)
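
To see what the clean-up step in the class above does without fetching a real page, here is a small dependency-free sketch. Note that it does not use BeautifulSoup: it swaps in the standard library's `html.parser` to illustrate the same idea of dropping `script` and `style` content and keeping the visible text. The names `TextExtractor` and `visible_text` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside <script> or <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)

def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

sample_html = (
    "<body><h1>Hello</h1><script>var x = 1;</script>"
    "<style>p {color: red}</style><p>World</p>"
    '<img src="a.png"><input type="text"></body>'
)
print(visible_text(sample_html))
```

The `img` and `input` tags carry no text of their own, so they simply contribute nothing to the result.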

In [4]:
# Let's try one out

ed = Website("https://www.agitavit.si/")
print(ed.title)
print(ed.text)

Vašo vizijo podpremo s tehnologijo - Agitavit
Agitavit
Kaj delamo
Storitve
Digitalizacija po meri
Razvoj mobilnih aplikacij
Umetna inteligenca
Rešitve in produkti
Digitalizacija HR z eHRM
Digital Agora
V 20 letih še nisem imel priložnosti sodelovati s tako agilnim, zanesljivim in na stranko osredotočenim podjetjem. Agitavit je res pravi partner, ko gre za doseganje zahtevnih poslovnih ciljev."
Stefan Luhede, NBS - Head Sandoz IT CC R&D, HEXAL AG
O nas
Zaposlimo
Stranke
Vsebine
Dogodki
Kontakt
sl
Slovenščina
English
Open menu
Close menu
Kaj delamo
Storitve
Digitalizacija po meri
Razvoj mobilnih aplikacij
Umetna inteligenca
Rešitve in produkti
Digitalizacija HR z eHRM
Digital Agora
O nas
Zaposlimo
Stranke
Vsebine
Dogodki
Slovenščina
English
Vašo vizijo podpremo s tehnologijo
Agitavit Solutions
Storitve
20 let izkušenj
Že dve desetletji ustvarjamo uspešne zgodbe za digitalno prihodnost.
130 sodelavcev
Smo izkušeni strokovnjaki, ki nas združuje želja stranke navdušiti z najboljšimi digital

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [5]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [6]:
def user_prompt_for(website):
    # Title first, then the instructions, then the page text
    user_prompt = f"You are looking at a website titled {website.title}.\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [7]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
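
It can be handy to inspect the message list before sending anything to the API. The sketch below builds the same two-message structure for a stand-in page object, so it runs without network access. The helper `build_messages`, the shortened prompts, and the sample page are all hypothetical and only serve to show the shape of the data.

```python
from types import SimpleNamespace

def build_messages(system_prompt, user_prompt):
    """Return the system + user message pair in the structure shown above."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Stand-in for a Website object: only the attributes the prompt needs
page = SimpleNamespace(
    title="Example Domain",
    text="This domain is for use in illustrative examples.",
)

messages = build_messages(
    "You are an assistant that summarizes websites. Respond in markdown.",
    f"Summarize the page titled {page.title}.\n\n{page.text}",
)

# Print each role with the start of its content
for message in messages:
    print(message["role"], "->", message["content"][:60])
```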

## Time to bring it together - the API for OpenAI is very simple!

In [8]:
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content
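
If you want to exercise this call path without spending API credits, one option is to pass the client in as a parameter so that a stand-in can be substituted. This is a sketch of that idea, not how the notebook itself is wired: `summarize_with`, `FakeCompletions` and `fake_client` are hypothetical names, and the fake only mimics the `response.choices[0].message.content` shape used above.

```python
from types import SimpleNamespace

def summarize_with(client, messages, model="gpt-4o-mini"):
    """Send messages through any client exposing chat.completions.create."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

class FakeCompletions:
    """Hypothetical stand-in that returns a canned reply in the same shape."""
    def create(self, model, messages):
        user_text = messages[-1]["content"]
        reply = f"(fake {model} summary of {len(user_text)} characters)"
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

demo_messages = [
    {"role": "system", "content": "Summarize the page."},
    {"role": "user", "content": "Hello world"},
]
print(summarize_with(fake_client, demo_messages))
```

Passing the real `openai` object from earlier in place of `fake_client` should follow the same path, since the function only relies on `chat.completions.create`.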

In [9]:
summarize("https://edwarddonner.com")

"# Summary of Edward Donner's Website\n\nEdward Donner is an AI enthusiast and co-founder of Nebula.io, a company focused on leveraging AI to assist individuals in discovering their potential and improving talent management for recruiters. The website features personal insights into his interests, including coding, LLM (Large Language Model) experimentation, and music. \n\n## Key Features:\n- **Outsmart**: A platform designed for LLMs to compete in a diplomatic and strategic environment.\n- **Posts**: Various articles and resources, including:\n  - **December 21, 2024**: A welcome message to SuperDataScientists.\n  - **November 13, 2024**: Resources for mastering AI and LLM engineering.\n  - **October 16, 2024**: Guidance for transitioning from a software engineer to an AI data scientist.\n  - **August 6, 2024**: Information about the Outsmart LLM Arena.\n\nEdward invites visitors to connect with him and stay updated through his posts and newsletter."

In [10]:
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [11]:
display_summary("https://edwarddonner.com")

# Summary of Edward Donner's Website

The website is dedicated to the personal and professional interests of Ed, who is passionate about coding and experimenting with large language models (LLMs). He is the co-founder and CTO of Nebula.io, an AI company focused on helping individuals discover their potential and manage talent, utilizing patented LLM technologies. Previously, he was the founder and CEO of the AI startup untapt, acquired in 2021.

## Key Topics
- **Outsmart**: A platform that features a competitive environment for LLMs where they engage in strategic diplomacy and cunning tactics.
- **Blog Posts**: Ed shares various resources and insights, specifically tailored for data scientists and AI engineers.

## Recent Announcements
- **December 21, 2024**: Welcome message for "SuperDataScientists."
- **November 13, 2024**: Sharing resources for mastering AI and LLM Engineering.
- **October 16, 2024**: Resources catered towards transitioning from software engineering to AI data science.
- **August 6, 2024**: Introduction to the Outsmart LLM Arena.

In [12]:
display_summary("https://cnn.com")

# CNN Website Summary

CNN's website serves as a comprehensive source for breaking news, covering a wide array of topics including U.S. and world politics, business, health, entertainment, sports, science, and climate news. It provides live updates on significant events such as the ongoing wildfires in Los Angeles and developments in international conflicts like the Ukraine-Russia and Israel-Hamas wars.

## Key Updates:
- **Los Angeles Wildfires**: Reports state at least 24 fatalities and numerous missing persons. Authorities indicate that assessing the situation is currently unsafe.
- **Trump Investigation**: A judge has approved the release of the special counsel's report related to former President Trump and allegations of election subversion.
- **Gaza Ceasefire**: U.S. officials have indicated that a ceasefire deal regarding Gaza may be imminent, offering a rare glimpse of optimism.
- **International News**: New prime minister named in Lebanon, and a tsunami advisory lifted post a 6.9 magnitude earthquake in Japan.

CNN continues to provide live broadcasts, video content, and in-depth coverage of significant events, fostering engagement through its platform.

In [14]:
display_summary("https://anthropic.com")

# Anthropic Overview

Anthropic is an AI safety and research company based in San Francisco, focused on developing reliable and beneficial AI systems. The company offers various AI products, prominently featuring their latest models, Claude 3.5 Sonnet and Claude 3.5 Haiku, aimed at facilitating safe and effective human-AI interactions.

## Key Updates

- **Claude 3.5 Sonnet Launch**: The latest AI model, Claude 3.5 Sonnet, has been made available for users.
- **Introduction of Computer Use**: Alongside Sonnet, a new feature allowing computer use has been introduced.
- **Product for Enterprises**: The company provides tailored solutions for enterprise needs.
- **Research Focus**: Anthropic prioritizes AI safety, engaging in interdisciplinary research to ensure systems remain beneficial and harmless.

## Recent Announcements

- **Claude 3.5 Sonnet and Haiku**: These updates were announced on October 22, 2024, highlighting improvements in the AI models.
- **Constitutional AI Research Paper**: Released on December 15, 2022, addressing AI harmlessness through feedback mechanisms.
- **Core Views on AI Safety**: A paper outlining essential perspectives on AI safety was published on March 8, 2023.

Anthropic invites collaboration and offers careers in various fields related to AI and safety.

## An extra exercise for those who enjoy web scraping

You may notice that if you try `display_summary("https://openai.com")`, it doesn't work! That's because OpenAI has a fancy website that uses JavaScript to render its content, so the raw HTML fetched by `requests` contains little readable text. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. Please push your code afterwards so I can share it with other students!

In [None]:
display_summary("https://openai.com")

In [15]:
# Parse webpages that rely heavily on JavaScript
# Download the Chrome driver matching your version of Chrome from https://developer.chrome.com/docs/chromedriver/downloads
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

PATH_TO_CHROME_DRIVER = 'c:\\projects\\llm_engineering\\chromedriver.exe'

class Website:
    url: str
    title: str
    text: str

    def __init__(self, url):
        self.url = url

        options = Options()

        #options.add_argument("--no-sandbox")
        #options.add_argument("--disable-dev-shm-usage")
        options.add_argument('--disable-blink-features=AutomationControlled')
        options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.71 Safari/537.36')

        service = Service(PATH_TO_CHROME_DRIVER)
        driver = webdriver.Chrome(service=service, options=options)
        try:
            driver.get(url)
            # input("Please complete the verification in the browser and press Enter to continue...")
            page_source = driver.page_source
        finally:
            # Always close the browser, even if loading the page fails
            driver.quit()

        soup = BeautifulSoup(page_source, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        for irrelevant in soup(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.get_text(separator="\n", strip=True)

In [18]:
display_summary("https://www.kompas-xnet.si/izobrazevanja/za-it-strokovnjake")

# Summary of Za IT strokovnjake

The website **Kompas Xnet** offers various educational programs and IT services designed for professionals and businesses. Their offerings include:

- **Software Development**
- **Educational and Examination Center**
- **Infrastructure Services**
- **SharePoint and BI Solutions**
- **Custom Training and Business Skills Workshops**

## Courses and Training
The educational center provides a wide range of courses catering to both beginners and experienced IT professionals. Key features include:

- Official Microsoft courses are a significant part of their curriculum, recognized worldwide.
- Practical and interactive training emphasizing real-world application.
- Masterclasses from internationally recognized experts in IT security and management.
- Courses in database management (SQL), programming (Java, Python), and Microsoft Dynamics CRM.

## Recent News
The website mentions an upcoming course titled **Power Excel** on **January 16, 2024**, encouraging users to register.

## User Testimonials
Valuable feedback from participants highlights the effectiveness and professionalism of their training, solidifying Kompas Xnet's reputation as a reliable partner in IT education.

## Contact Information
For inquiries and further information, users can reach out via phone or email listed on the website. 

Overall, **Kompas Xnet** stands out as a professional hub for IT education and training, responding to the evolving needs of the industry.

In [17]:
display_summary("https://www.predictiveindex.com/software/")

# Summary of Talent Optimization Platform - The Predictive Index

The Predictive Index offers a Talent Optimization Platform designed to enhance hiring, employee development, and retention through personalized HR software. It utilizes validated assessments and behavioral science to improve recruitment and team dynamics while fostering employee engagement. Key features include:

- **Validated Hiring**: Assessments to identify top candidates, reducing the risk of mishires.
- **Leadership and Employee Development**: Tools for personalized growth, effective communication, and team alignment.
- **Employee Engagement**: Science-driven surveys to understand and improve employee satisfaction.
- **People Management**: Resources for managers to develop their teams and enhance accountability.

The platform is actionable throughout the employee lifecycle, making it a comprehensive solution for optimizing talent management in organizations.

### Customer Success Stories
Notable customer success stories highlight the effectiveness of the platform:
- **Totango**: Achieved a 20% reduction in attrition.
- **Builtech**: Saved nearly $200,000 in turnover costs.

### Science and Methodology
The Predictive Index is backed by extensive behavioral science, with over 60 years of research and 300 validity studies informing its assessments. 

### Resources and Learning
The platform provides educational resources, certifications, webinars, and industry blogs to help users maximize their understanding and application of its features.

### Call to Action
Visitors to the website can request a demo to see how the platform can optimize their talent management processes.