In [3]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# If you get an error running this cell, then please head over to the troubleshooting notebook!

In [4]:
# Load environment variables in a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [5]:
openai = OpenAI()

# Let's make a quick call to a Frontier model!

In [6]:
# To give you a preview -- calling OpenAI with these messages is this easy. Any problems, head over to the Troubleshooting notebook.

message = "Hello, GPT! This is my first ever message to you! Hi!"
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hello! Welcome! I'm glad to hear from you. How can I assist you today?


## Website Scraper

In [7]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # fail early on 4xx/5xx responses such as 403
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        body = soup.body or soup  # some pages have no <body> tag
        for irrelevant in body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = body.get_text(separator="\n", strip=True)

In [9]:
ed = Website("https://cariad.technology")
print(ed.title)
print(ed.text)

CARIAD – Automotive Software for Volkswagen
Home
Company
Company
Company Overview
Our Board
Our Numbers
Our Glossary
Solutions
Solutions
Solutions Overview
Futureproof Hardware
Unified Software
Innovative Applications
News
News
News Overview
Archive
Events
Press Contacts & Downloads
Careers
Careers
Careers Overview
Our Jobs
Our Tech Teams
Our Benefits
Women in Tech
Diversity & Inclusion
Recruiting Process
FAQ Jobs
Code transforming mobility
At CARIAD, we are shaping automotive software that supports Volkswagen Group’s path to becoming a global tech driver in automotive.
Our products already power mobility experiences in millions of vehicles around the world  — making mobility safer, more sustainable, and more comfortable for everyone.
We develop synergetic tech for the Volkswagen Group
Automated Driving for enhanced road safety and driving comfort
Automated Driving for enhanced road safety and driving comfort
User-centric infotainment solutions for more personalized mobility experience

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [10]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [11]:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [12]:
print(user_prompt_for(ed))

You are looking at a website titled CARIAD – Automotive Software for Volkswagen
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Home
Company
Company
Company Overview
Our Board
Our Numbers
Our Glossary
Solutions
Solutions
Solutions Overview
Futureproof Hardware
Unified Software
Innovative Applications
News
News
News Overview
Archive
Events
Press Contacts & Downloads
Careers
Careers
Careers Overview
Our Jobs
Our Tech Teams
Our Benefits
Women in Tech
Diversity & Inclusion
Recruiting Process
FAQ Jobs
Code transforming mobility
At CARIAD, we are shaping automotive software that supports Volkswagen Group’s path to becoming a global tech driver in automotive.
Our products already power mobility experiences in millions of vehicles around the world  — making mobility safer, more sustainable, and more comfortable for everyone.
We develop synergetic tech for the Volkswagen Grou

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```
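As a minimal sketch of the structure above, the hypothetical helper `check_messages` (not part of the OpenAI library) verifies that a list has the expected shape before it is sent. The set of allowed roles is a simplifying assumption: it covers the two roles used in this notebook plus `assistant`, which carries earlier model replies in multi-turn conversations.

```python
from typing import Dict, List

# Assumption: only the roles used in this notebook, plus "assistant" for earlier replies.
ALLOWED_ROLES = {"system", "user", "assistant"}

def check_messages(messages: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems found; an empty list means the shape looks fine."""
    problems = []
    for i, message in enumerate(messages):
        if set(message) != {"role", "content"}:
            problems.append(f"message {i}: expected exactly the keys 'role' and 'content'")
            continue
        if message["role"] not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {message['role']!r}")
        if not isinstance(message["content"], str) or not message["content"].strip():
            problems.append(f"message {i}: content should be a non-empty string")
    return problems

example = [
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"},
]
print(check_messages(example))  # prints []
print(check_messages([{"role": "robot", "content": "hi"}]))
```

A check like this is optional; the API rejects malformed requests on its own, but a local check gives a clearer error while experimenting.
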
To give you a preview, the next 2 cells make a rather simple call - we won't stretch the mighty GPT (yet!)

In [13]:
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

In [14]:
response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

Oh, we're starting with the easy ones, huh? Well, the answer is 4. You might want to keep a calculator close by just in case the questions get trickier!


## And now let's build useful messages for GPT-4o-mini, using a function

In [15]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [16]:
messages_for(ed)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled CARIAD – Automotive Software for Volkswagen\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nHome\nCompany\nCompany\nCompany Overview\nOur Board\nOur Numbers\nOur Glossary\nSolutions\nSolutions\nSolutions Overview\nFutureproof Hardware\nUnified Software\nInnovative Applications\nNews\nNews\nNews Overview\nArchive\nEvents\nPress Contacts & Downloads\nCareers\nCareers\nCareers Overview\nOur Jobs\nOur Tech Teams\nOur Benefits\nWomen in Tech\nDiversity & Inclusion\nRecruiting Process\nFAQ Jobs\nCode transforming mobility\nAt CARIAD, we are shaping automotive software that supports Volkswagen Group’s path to b

## Time to bring it together - the API for OpenAI is very simple!

In [17]:
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content

In [18]:
summarize("https://cariad.technology")

"# CARIAD – Automotive Software for Volkswagen\n\nCARIAD is the Volkswagen Group's dedicated automotive software company, focused on developing innovative digital technologies to enhance mobility. The company aims to position the Volkswagen Group as a global leader in automotive technology, with solutions already integrated into millions of vehicles worldwide, aimed at making mobility safer, more sustainable, and more comfortable.\n\n## Key Offerings\n- **Automated Driving**: Enhancing road safety and comfort through advanced software solutions.\n- **User-Centric Infotainment**: Providing personalized mobility experiences for drivers and passengers.\n- **Digital Ecosystem**: Creating an integrated digital environment in and around vehicles.\n- **Purpose-Built Vehicle Platforms**: Developing specialized platforms for vehicles' energy management, body systems, and motion control.\n\n## Recent News\n- **Bosch and CARIAD**: Announced a collaboration to improve automated driving safety usin

In [19]:
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [21]:
display_summary("https://cariad.technology")

# CARIAD – Automotive Software for Volkswagen

CARIAD is the Volkswagen Group's dedicated automotive software company, focused on transforming mobility through advanced digital technologies. Their mission is to enhance the driving experience for iconic brands like Audi, Volkswagen, and Porsche by developing innovative software solutions that make mobility safer, more sustainable, and more comfortable.

## Key Highlights:

- **Automated Driving:** CARIAD is committed to improving road safety and driving comfort through advanced automated driving technologies.
- **Personalized Infotainment:** The company provides user-centric infotainment solutions that cater to individual mobility experiences.
- **Digital Ecosystem:** CARIAD is building a comprehensive digital ecosystem that integrates various vehicle systems for optimized performance.
- **Innovative Platforms:** They are developing scalable software platforms and customer functions for vehicles like the Volkswagen ID.Buzz, ID.7, Audi A5, and the electric Porsche Macan.

## Latest News:

1. **AI Collaboration with Bosch (11/08/2025):** CARIAD and Bosch are advancing automated driving safety through AI-enhanced software solutions.
   
2. **EV Routing Solutions (01/08/2025):** CARIAD has introduced tools to optimize summer travel for electric vehicle (EV) drivers, focusing on smart energy management for long-distance journeys.
   
3. **Connected Vehicle Fleets (16/07/2025):** The company highlights the evolution of modern vehicles into intelligent, connected companions that enhance driving experiences.

CARIAD is also actively recruiting, inviting talented individuals to join their mission of redefining mobility in the automotive sector.

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!)

Also, websites protected with CloudFront (and similar) may give 403 errors - many thanks to Andy J for pointing this out.

But many websites will work just fine!
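
The two failure modes noted above can be told apart with a rough check. The sketch below is an illustration rather than part of the course code: the function name `diagnose_fetch` and the 200-character threshold are hypothetical choices, and a short page is only a hint of JavaScript rendering, not proof.

```python
from typing import Optional

# Hypothetical threshold: pages with less visible text than this are treated as suspicious.
MIN_TEXT_CHARS = 200

def diagnose_fetch(status_code: int, visible_text: str) -> Optional[str]:
    """Return a hint about why scraping may have failed, or None if it looks fine."""
    if status_code in (401, 403):
        return "Blocked: the site refused the request (bot protection or login required)."
    if status_code >= 400:
        return f"HTTP error {status_code}: the page could not be fetched."
    if len(visible_text.strip()) < MIN_TEXT_CHARS:
        return "Very little text: the page may be rendered with JavaScript."
    return None

print(diagnose_fetch(403, ""))
print(diagnose_fetch(200, "Loading..."))
print(diagnose_fetch(200, "word " * 100))  # prints None
```

In practice you would pass in the status code from `requests.get(...)` and the text extracted with `get_text(...)`.
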

In [22]:
display_summary("https://cnn.com")

# Summary of CNN Website Content

The CNN website provides a comprehensive array of news articles, videos, and features covering various global, national, and local topics. The site is organized into several categories including **US**, **World**, **Politics**, **Business**, **Health**, **Entertainment**, **Sports**, **Science**, and more. 

## Notable News Reports
1. **Gaza City Famine**: A UN-backed initiative has reported that Gaza City is suffering from a "man-made" famine, which is expected to spread further.
2. **Israel-Hamas War**: Israeli Prime Minister Benjamin Netanyahu has ordered immediate negotiations for the release of hostages and an end to the ongoing conflict in Gaza.
3. **FBI Search**: Former National Security Advisor John Bolton's home was searched by the FBI.
4. **US Redistricting**: Developments regarding California's and Texas's redistricting plans are ongoing, with political implications being discussed.
5. **Crime News**: Lil Nas X has been arrested for battery against a police officer, and other high-profile criminal cases are highlighted.

## Additional Content
The site also includes analyses on various issues such as:
- Trump’s influence on judicial nominations.
- Russia's military actions in Ukraine, including the increase in drone attacks.
- Environmental stories, such as the use of drones to drop mosquitoes in Hawaii.

## Interactive Features
CNN offers interactive games and quizzes, along with live TV options and opportunities for personalized news updates through subscriptions and newsletters. 

For readers looking for timely and critical news updates, CNN serves as a robust platform for real-time information across multiple categories.

In [23]:
display_summary("https://anthropic.com")

# Anthropic Website Summary

The **Anthropic** website primarily showcases its AI product, **Claude**, and its various iterations like **Claude Opus 4.1** and **Claude Sonnet 4**. It emphasizes the importance of safety in AI development, aiming to create tools that benefit humanity's long-term well-being. 

## Key Features:

- **Claude Models**: Introduces powerful AI models designed for a variety of applications including coding, customer support, and education.
- **API Integration**: Provides resources for developers to build AI-powered applications using Claude.
- **Research & Initiatives**: Features studies related to the societal impacts of AI, responsible scaling policies, and commitments to transparency.
- **Education**: Offers resources through the **Anthropic Academy** to help users learn how to effectively utilize Claude.

## Recent Announcements:

- **ISO 42001 Certification**: Acknowledgment of achieving certification for maintaining high standards in AI development.
- **Claude Opus 4.1**: Recent release highlighting advanced capabilities for AI operations.
- **Project Vend** and **Agentic Misalignment**: Announcements focused on alignment and ethical considerations in AI policies.

The site integrates various resources for both potential users of Claude and those interested in AI safety and research.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise, you experienced calling the Cloud API of a Frontier Model (a leading model at the frontier of AI) for the first time. We will be using APIs like OpenAI at many stages in the course, in addition to building our own LLMs.

More specifically, we've applied this to Summarization - a classic Gen AI use case to make a summary. This can be applied to any business vertical - summarizing the news, summarizing financial performance, summarizing a resume in a cover letter - the applications are limitless. Consider how you could apply Summarization in your business, and try prototyping a solution.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue - now try it yourself</h2>
            <span style="color:#900;">Use the cell below to make your own simple commercial example. Stick with the summarization use case for now. Here's an idea: write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.</span>
        </td>
    </tr>
</table>
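
One possible starting point for the exercise above, as a sketch only: the prompt wording, the function names `subject_messages_for` and `suggest_subject`, and the sample email are all made up for illustration. The message list follows the same system/user structure used earlier, and the call mirrors the `chat.completions.create` usage from the cells above.

```python
from typing import Dict, List

# Hypothetical system prompt for the exercise; adjust the wording freely.
SUBJECT_SYSTEM_PROMPT = (
    "You are an assistant that reads the body of an email and suggests "
    "one short, specific subject line. Respond with the subject line only."
)

def subject_messages_for(email_body: str) -> List[Dict[str, str]]:
    """Build the system/user message pair for the subject-line task."""
    return [
        {"role": "system", "content": SUBJECT_SYSTEM_PROMPT},
        {"role": "user", "content": f"Suggest a subject line for this email:\n\n{email_body}"},
    ]

def suggest_subject(client, email_body: str, model: str = "gpt-4o-mini") -> str:
    """Call the chat API; `client` is assumed to be an OpenAI() instance as created earlier."""
    response = client.chat.completions.create(
        model=model,
        messages=subject_messages_for(email_body),
    )
    return response.choices[0].message.content

sample_email = "Hi team, the quarterly review has moved to Thursday at 3pm in room 4B."
for message in subject_messages_for(sample_email):
    print(message["role"], "->", message["content"][:60])
```

With the client created earlier in this notebook, `suggest_subject(openai, sample_email)` would return the model's suggestion.
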

## Webscraping 2.0

Extended to support:
- OpenAI and a local Ollama model
- Basic handling of JavaScript-rendered pages

In [24]:
!pip install selenium webdriver-manager beautifulsoup4 markdown

Collecting selenium
  Downloading selenium-4.35.0-py3-none-any.whl.metadata (7.4 kB)
Collecting webdriver-manager
  Downloading webdriver_manager-4.0.2-py2.py3-none-any.whl.metadata (12 kB)
Collecting markdown
  Downloading markdown-3.8.2-py3-none-any.whl.metadata (5.1 kB)
Collecting trio~=0.30.0 (from selenium)
  Downloading trio-0.30.0-py3-none-any.whl.metadata (8.5 kB)
Collecting trio-websocket~=0.12.2 (from selenium)
  Downloading trio_websocket-0.12.2-py3-none-any.whl.metadata (5.1 kB)
Collecting sortedcontainers (from trio~=0.30.0->selenium)
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting outcome (from trio~=0.30.0->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl.metadata (2.6 kB)
Collecting wsproto>=0.14 (from trio-websocket~=0.12.2->selenium)
  Downloading wsproto-1.2.0-py3-none-any.whl.metadata (5.6 kB)
Downloading selenium-4.35.0-py3-none-any.whl (9.6 MB)

In [29]:
import time
import json
import subprocess
from typing import List, Dict, Optional

import requests
from bs4 import BeautifulSoup
from IPython.display import display, Markdown

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

# ---------- Fetchers ----------
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36")
headers = {"User-Agent": UA}

def get_rendered_html(url: str, wait_secs: int = 6) -> str:

    opts = Options()
    opts.add_argument("--headless=new")
    opts.add_argument("--no-sandbox")
    opts.add_argument("--disable-dev-shm-usage")
    opts.add_argument(f"--user-agent={UA}")

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=opts)
    try:
        driver.get(url)
        WebDriverWait(driver, wait_secs).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        time.sleep(1.5)
        return driver.page_source
    finally:
        driver.quit()

# ---------- Website ----------
class Website:
    def __init__(self, url: str, render_js: bool = False, wait_secs: int = 6):
        self.url = url
        if render_js:
            html = get_rendered_html(url, wait_secs=wait_secs)
        else:
            resp = requests.get(url, headers=headers, timeout=30)
            resp.raise_for_status()
            html = resp.text

        soup = BeautifulSoup(html, "html.parser")
        self.title = (
            soup.title.string.strip()
            if soup.title and getattr(soup.title, "string", None)
            else "No title found"
        )

        body = soup.body or soup
        for irrelevant in body(["script", "style", "img", "input", "noscript"]):
            irrelevant.decompose()

        self.text = body.get_text(separator="\n", strip=True)

# ---------- Prompt helpers ----------
SYSTEM_PROMPT = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

def user_prompt_for(website: Website) -> str:
    return (
        f"You are looking at a website titled {website.title}\n"
        "The contents of this website is as follows; provide a short summary "
        "of this website in markdown. If it includes news or announcements, summarize these too.\n\n"
        f"{website.text}"
    )

def messages_for(website: Website) -> List[Dict[str, str]]:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt_for(website)},
    ]

# ---------- LLM client abstraction ----------
class LLMClient:
    def summarize_messages(self, messages: List[Dict[str, str]]) -> str:
        raise NotImplementedError

# OpenAI backend
class OpenAIClient(LLMClient):
    def __init__(self, model: str = "gpt-4o-mini"):
        from openai import OpenAI  # local import to avoid hard dependency when using Ollama only
        self._client = OpenAI()  # explicit client, as in the earlier cells; reads OPENAI_API_KEY
        self.model = model

    def summarize_messages(self, messages: List[Dict[str, str]]) -> str:
        resp = self._client.chat.completions.create(
            model=self.model,
            messages=messages,
        )
        return resp.choices[0].message.content

# Ollama backend (via REST; no extra Python package required)
def _ollama_available(timeout: float = 1.0) -> bool:
    # The client talks to the REST API, so only a successful HTTP probe counts;
    # an installed CLI alone does not mean the daemon is running.
    try:
        r = requests.get("http://127.0.0.1:11434/api/tags", timeout=timeout)
        return r.ok
    except requests.RequestException:
        return False

def _ollama_model_installed(model: str, timeout: float = 2.0) -> bool:
    try:
        r = requests.get("http://127.0.0.1:11434/api/tags", timeout=timeout)
        if not r.ok:
            return False
        tags = r.json().get("models", [])
        names = {m.get("name", "").split(":")[0] for m in tags}
        return model.split(":")[0] in names
    except Exception:
        return False

class OllamaClient(LLMClient):
    def __init__(self, model: str = "llama3.2"):
        if not _ollama_available():
            raise RuntimeError(
                "Ollama is not reachable. Start the Ollama daemon. "
                "On macOS: launch the app; on Linux: `ollama serve`."
            )
        if not _ollama_model_installed(model):
            raise RuntimeError(
                f"Ollama model '{model}' not found locally. Pull it first:\n"
                f"  ollama pull {model}\n"
                f"Then retry."
            )
        self.model = model

    def summarize_messages(self, messages: List[Dict[str, str]]) -> str:
        # POST /api/chat expects: {model, messages:[{role, content}]}
        payload = {"model": self.model, "messages": messages, "stream": False}
        r = requests.post(
            "http://127.0.0.1:11434/api/chat",
            data=json.dumps(payload),
            headers={"Content-Type": "application/json"},
            timeout=120,
        )
        if not r.ok:
            raise RuntimeError(f"Ollama error: HTTP {r.status_code} - {r.text[:500]}")
        data = r.json()
        # Format matches: {'message': {'role': 'assistant', 'content': '...'}, ...}
        msg = data.get("message", {})
        content = msg.get("content")
        if not content:
            # Some versions return aggregated content in 'response'
            content = data.get("response", "")
        return content

# ---------- Factory ----------
def make_client(provider: str, model: Optional[str] = None) -> LLMClient:
    p = provider.lower().strip()
    if p == "openai":
        return OpenAIClient(model=model or "gpt-4o-mini")
    if p == "ollama":
        return OllamaClient(model=model or "llama3.2")
    raise ValueError(f"Unknown provider '{provider}'. Use 'openai' or 'ollama'.")

# ---------- Public API ----------
def summarize(url: str, provider: str = "openai", model: Optional[str] = None, render_js: bool = False) -> str:
    website = Website(url, render_js=render_js)
    client = make_client(provider, model)
    return client.summarize_messages(messages_for(website))

def display_summary(url: str, provider: str = "openai", model: Optional[str] = None, render_js: bool = False):
    summary = summarize(url, provider=provider, model=model, render_js=render_js)
    display(Markdown(summary))


In [30]:
display_summary("https://cariad.technology", provider="openai", model="gpt-4o-mini", render_js=True)

# CARIAD – Automotive Software for Volkswagen

CARIAD is the automotive software company for the Volkswagen Group, focused on pioneering digital technologies to enhance mobility for iconic car brands such as Audi, Volkswagen, and Porsche. The company is dedicated to transforming vehicles into intelligent and connected experiences that prioritize safety, sustainability, and comfort.

## Key Offerings:
- **Automated Driving Solutions**: Focus on road safety and driving comfort through advanced automation.
- **Infotainment Systems**: Development of user-centered infotainment solutions for personalized mobility.
- **Digital Ecosystem**: Creation of interconnected digital environments in and around vehicles.
- **Vehicle Platforms**: Purpose-built platforms for various vehicle functionalities including energy management and motion systems.

## Latest News:
1. **Collaboration with Bosch** (11/08/2025): Enhancing automated driving safety using AI, aiming for a higher level of technological integration.
2. **EV Routing Solutions** (01/08/2025): Introduction of smart energy management features for better long-distance travel solutions for electric vehicle drivers during the summer travel season.
3. **Connected Vehicle Fleet** (16/07/2025): Discussion on how CARIAD supports a connected vehicle fleet, transitioning vehicles from mere transportation means to intelligent, personalized companions.

CARIAD is committed to innovation at scale, powering models like the Volkswagen ID.Buzz and ID.7, the Audi A5, and the electric Porsche Macan with cutting-edge technology. The company is also actively recruiting to expand its tech teams and encourage diversity in the workforce.

In [32]:
display_summary("https://cariad.technology", provider="ollama", model="llama3.2", render_js=True)

# CARIAD – Automotive Software for Volkswagen

CARIAD is the automotive software company of the Volkswagen Group, aiming to create and deliver leading digital technologies for iconic car brands. They develop scalable software platforms and digital customer functions for vehicles like the ID.Buzz, ID.7, Audi A5, and Porsche Macan.

## News and Announcements

* **Bosch and CARIAD Partner on Automated Driving**: The companies are making automated driving even safer with AI.
* **CARIAD's EV Routing Solutions**: Makes electric vehicle routing solutions for long-distance journeys practical during summer months.
* **CARIAD Powers Connected Vehicle Fleet**: Modern vehicles have evolved to become intelligent companions that turn every drive into a personalized, digital experience.

## Solutions

CARIAD offers various solutions:

* **Futureproof Hardware**: Supports the Volkswagen Group's path to becoming a global tech driver in automotive.
* **Unified Software**: Develops synergetic tech for enhanced road safety and driving comfort, user-centric infotainment solutions, and a digital ecosystem in and around the car.
* **Innovative Applications**: Purpose-build vehicle driving platform for energy, body, and motion systems.

## Careers

CARIAD offers job opportunities in various fields, including:

* **Our Jobs**
* **Locations**
* **Recruiting Process**

Note: The website's content is mostly navigation-related, so the provided summary only includes relevant information.