In [21]:
# imports

import os
import requests

from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# If you get an error running this cell, then please head over to the troubleshooting notebook!

# Connecting to OpenAI

The next cell loads the environment variables from your `.env` file and checks your API key; the cell after it connects to OpenAI.



In [22]:
# Load environment variables from a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start with sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!
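The checks in the cell above can be pulled into a pure function, which makes them easy to try on sample strings without touching your real key. This is a minimal sketch: `check_api_key` is a hypothetical name, the sample strings are made-up placeholders, and the `sk-proj-` prefix is this notebook's own heuristic rather than a guaranteed key format. One deliberate change: whitespace is checked before the prefix, because in the cell above a key with a leading space fails the `startswith` test first and gets reported as the wrong kind of key.

```python
def check_api_key(api_key):
    """Return a diagnostic message for an API key string (or None).

    Hypothetical helper mirroring the notebook's checks; the "sk-proj-"
    prefix test is a heuristic, not an official rule.
    """
    if not api_key:
        return "No API key was found"
    # Check whitespace first so " sk-proj-..." is not misreported as a bad prefix
    if api_key.strip() != api_key:
        return "An API key was found, but it has leading or trailing whitespace"
    if not api_key.startswith("sk-proj-"):
        return "An API key was found, but it doesn't start with sk-proj-"
    return "API key found and looks good so far!"


# Usage with made-up placeholder strings (not real keys):
print(check_api_key(None))               # No API key was found
print(check_api_key(" sk-proj-abc123"))  # ...leading or trailing whitespace
print(check_api_key("sk-abc123"))        # ...doesn't start with sk-proj-
print(check_api_key("sk-proj-abc123"))   # API key found and looks good so far!
```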


In [23]:
openai = OpenAI()


# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.
# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions

# Let's make a quick call to a Frontier model to get started, as a preview!

In [24]:
# To give you a preview -- calling OpenAI with these messages is this easy. Any problems, head over to the Troubleshooting notebook.

message = "Hello, GPT! This is my first ever message to you! Hi!"
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hello! It's great to hear from you! How can I assist you today?


## OK, onwards with our first project

In [25]:
# A class to represent a Webpage
# If you're not familiar with classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # raise on 403/404 etc. rather than summarizing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Remove tags whose contents are not readable page text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some pages have no <body>; avoid an AttributeError
            self.text = ""
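To see roughly what `Website` extracts without making a network request, here is a sketch built on Python's standard-library `html.parser` instead of BeautifulSoup. This is a swapped-in technique, so treat it as an approximation of `title` plus `get_text(separator="\n", strip=True)`, not a replacement; `TextExtractor` is a hypothetical name, and this version also skips `<script>` and `<style>` contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> text and the visible <body> text of an HTML string.

    Each non-empty, stripped text chunk in the body becomes one line,
    and anything inside <script> or <style> is ignored.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.lines = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside a <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = (self.title or "") + text
        elif self._in_body and not self._skip_depth:
            self.lines.append(text)


html = """<html><head><title>Example page</title></head>
<body><nav>Home</nav><script>var x = 1;</script>
<p>Hello <b>world</b></p></body></html>"""

parser = TextExtractor()
parser.feed(html)
print(parser.title)             # Example page
print("\n".join(parser.lines))  # Home, Hello, world on separate lines
```

Note that the navigation text ("Home") survives extraction, just like the menu items in the NTT DATA output below; that is why the system prompt later asks the model to ignore navigation-related text.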

In [8]:
# Let's try one out. Change the website and add print statements to follow along.

ed = Website("https://www.nttdata.com/global/en/news/press-release/2025/february/021001")

print(ed.title)
print(ed.text)

NTT DATA Unveils Global Insights on GenAI Adoption in Banking: Divergent Strategies for Boosting Productivity vs. Cutting Costs | NTT DATA Group
Industries
Industries
Automotive
Energy & Utilities
Financial Services
Healthcare
Higher Education & Research
Insurance
Life Sciences & Pharma
Manufacturing
Public Sector
Retail & CPG
Telecom, Media & Technology
Travel, Transportation & Logistics
Services
Services
Application Services
Business Process Services
Cloud
Consulting
CX and Digital Products
Cybersecurity
Data & Artificial Intelligence
Digital Workplace
Edge
Global Data Centers
Network Services
Sustainability Services
Tech Solutions & Integration
Insights
By type
Blog: NTT DATA Focus
Clients Cases
Events
By topic
The World Economic Forum, Davos 2025
Technology
Generative AI
ADM Technology Focus Areas
Unlock the Full Potential of Generative AI with NTT DATA
Read more
About Us
About Us
Who we are
NTT DATA Group Corporation Profile
Message from CEO
Our Way
NTT DATA History
NTT DATA Group

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [26]:
# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.'

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."
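The comment above suggests editing the last sentence to get a Spanish summary. As a sketch, a hypothetical `make_system_prompt` helper builds the same text with an optional language; called with no argument it should reproduce `system_prompt` exactly. It uses parenthesized adjacent string literals instead of trailing backslashes, which avoids the pitfall where a stray space after a backslash breaks the string.

```python
def make_system_prompt(language=None):
    """Build the summarizer system prompt, optionally asking for another language.

    `language` is a hypothetical parameter for experimenting; with None the
    text matches the notebook's system_prompt.
    """
    prompt = ("You are an assistant that analyzes the contents of a website "
              "and provides a short summary, ignoring text that might be navigation related. "
              "Respond in markdown")
    if language:
        prompt += f" in {language}"
    return prompt + "."


print(make_system_prompt())
print(make_system_prompt("Spanish"))  # ...Respond in markdown in Spanish.
```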

In [27]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
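Scraped pages can be long and full of menu text, as the output below shows, and all of it goes into the prompt. A minimal sketch of capping the page text, assuming a hypothetical `max_chars` parameter and function name; characters are only a rough proxy for tokens, so the right limit is something to tune. A `SimpleNamespace` stands in for a `Website` object so this runs offline.

```python
from types import SimpleNamespace


def user_prompt_truncated(website, max_chars=None):
    """Same prompt as user_prompt_for, but optionally keeps only the first max_chars of page text."""
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += ("\nThe contents of this website is as follows; "
                    "please provide a short summary of this website in markdown. "
                    "If it includes news or announcements, then summarize these too.\n\n")
    text = website.text if max_chars is None else website.text[:max_chars]
    return user_prompt + text


# A fake page with 10,000 characters of text (hypothetical stand-in for Website)
page = SimpleNamespace(title="Demo", text="x" * 10_000)
full = user_prompt_truncated(page)
short = user_prompt_truncated(page, max_chars=2_000)
print(len(full) - len(short))  # 8000
```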

In [11]:
print(user_prompt_for(ed))

You are looking at a website titled NTT DATA Unveils Global Insights on GenAI Adoption in Banking: Divergent Strategies for Boosting Productivity vs. Cutting Costs | NTT DATA Group
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Industries
Industries
Automotive
Energy & Utilities
Financial Services
Healthcare
Higher Education & Research
Insurance
Life Sciences & Pharma
Manufacturing
Public Sector
Retail & CPG
Telecom, Media & Technology
Travel, Transportation & Logistics
Services
Services
Application Services
Business Process Services
Cloud
Consulting
CX and Digital Products
Cybersecurity
Data & Artificial Intelligence
Digital Workplace
Edge
Global Data Centers
Network Services
Sustainability Services
Tech Solutions & Integration
Insights
By type
Blog: NTT DATA Focus
Clients Cases
Events
By topic
The World Economic Forum, Davos 2025
Technology
Generative AI
ADM Techn

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

To give you a preview, the next 2 cells make a rather simple call - we won't stretch the mighty GPT (yet!)

In [28]:
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]
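A short sketch for catching malformed message lists before they reach the API. `check_messages` is hypothetical and only checks the basic shape described above. It accepts just the `system`, `user` and `assistant` roles, which is narrower than what the OpenAI API actually accepts (tool messages, for instance), so treat it as a teaching aid rather than a schema.

```python
VALID_ROLES = {"system", "user", "assistant"}


def check_messages(messages):
    """Return a list of problems found in a chat messages list (empty if none)."""
    if not isinstance(messages, list):
        return ["messages should be a list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not a dict")
            continue
        if message.get("role") not in VALID_ROLES:
            problems.append(f"message {i} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i} content is not a string")
    return problems


messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]
print(check_messages(messages))                                   # []
print(check_messages([{"role": "sytem", "content": "a typo"}]))  # unexpected role 'sytem'
```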

In [29]:
# To give you a preview -- calling OpenAI with system and user messages:

response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

Oh, you’re starting with the classics, huh? Well, the answer is 4. Not exactly rocket science, but at least it wasn’t “green” or something.


## And now let's build useful messages for GPT-4o-mini, using a function

In [30]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
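Printing `messages_for(ed)` dumps the whole scraped page as one long string. A hypothetical `preview_messages` helper (a sketch, not part of any library) shows each role with just the start of its content, which is usually enough to confirm the structure is right.

```python
def preview_messages(messages, width=60):
    """Return one line per message: the role, then the first `width` characters of content."""
    lines = []
    for message in messages:
        content = message["content"].replace("\n", " ")
        if len(content) > width:
            content = content[:width] + "..."
        lines.append(f"{message['role']:>6}: {content}")
    return "\n".join(lines)


sample = [
    {"role": "system", "content": "short"},
    {"role": "user", "content": "a" * 100},
]
print(preview_messages(sample, width=10))
```

In the notebook you could call `print(preview_messages(messages_for(ed)))` to get a compact view of the two messages.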

In [31]:
# Try this out, and then try for a few more websites

messages_for(ed)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled NTT DATA Unveils Global Insights on GenAI Adoption in Banking: Divergent Strategies for Boosting Productivity vs. Cutting Costs | NTT DATA Group\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nIndustries\nIndustries\nAutomotive\nEnergy & Utilities\nFinancial Services\nHealthcare\nHigher Education & Research\nInsurance\nLife Sciences & Pharma\nManufacturing\nPublic Sector\nRetail & CPG\nTelecom, Media & Technology\nTravel, Transportation & Logistics\nServices\nServices\nApplication Services\nBusiness Process Services\nCloud\nConsulting\nCX and Digital Products\nCybersecurity\nData & Artificial Intelligenc

## Time to bring it together - the API for OpenAI is very simple!

In [32]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(website)
    )
    return response.choices[0].message.content
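`summarize` uses the global `openai` client, so it always makes a real (paid) API call. As a design sketch, passing the client in as a parameter lets you test the plumbing offline with a fake. `summarize_with` and `FakeClient` are hypothetical names; the fake mimics only the attributes this code touches (`chat.completions.create(...)` returning an object with `.choices[0].message.content`), not the full OpenAI client.

```python
from types import SimpleNamespace


def summarize_with(client, messages, model="gpt-4o-mini"):
    """Send prepared messages to a chat-completions client and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


class FakeClient:
    """Hypothetical stand-in that records calls and returns a canned reply."""

    def __init__(self, reply):
        self._reply = reply
        self.calls = []
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages):
        self.calls.append({"model": model, "messages": messages})
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake = FakeClient("# Summary\n\nA short summary.")
messages = [{"role": "system", "content": "Summarize."}, {"role": "user", "content": "Page text"}]
print(summarize_with(fake, messages))
print(fake.calls[0]["model"])  # gpt-4o-mini
```

With the real client, the equivalent of `summarize(url)` would be `summarize_with(openai, messages_for(Website(url)))`.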

In [33]:
summarize("https://www.nttdata.com/global/en/news/press-release/2025/february/021001")


'# NTT DATA Insights on GenAI Adoption in Banking\n\nNTT DATA has released a research report titled **"Intelligent Banking in the Age of AI,"** which examines the adoption of generative AI (GenAI) in the banking sector globally. The study reveals a divide in strategies among banking institutions, with many seeing GenAI as both a tool for enhancing productivity and a means to cut costs. \n\n## Key Findings:\n- **Adoption Trends**: As of early 2025, 58% of banking organizations are fully embracing GenAI, a notable increase from 45% in 2023.\n- **Divergent Strategies**: Only 50% of banks view GenAI as a productivity enhancer, while 49% see it as a way to reduce operational IT expenses.\n- **Regional Variances**: The emphasis on cost-cutting and productivity varies significantly by region. For example:\n  - **US banks** are more focused on reducing IT spend (59%) compared to their European counterparts (43%).\n  - **European banks** prioritize productivity (46%) over cost reductions more t

In [34]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    
    summary = summarize(url)
    display(Markdown(summary))

In [35]:
display_summary("https://www.nttdata.com/global/en/news/press-release/2025/february/021001")


# Summary of NTT DATA's Insights on GenAI Adoption in Banking

**Date of Release:** February 10, 2025  
**Source:** NTT DATA Group

## Key Findings:

- **GenAI Adoption in Banking:** NTT DATA has released a report titled *Intelligent Banking in the Age of AI*, highlighting a split in banking strategies regarding the adoption of Generative AI (GenAI). While many banks recognize its potential, opinions vary on whether GenAI should focus on boosting productivity or cutting costs.
  
- **Adoption Statistics:** The report notes that 58% of banking organizations have fully embraced GenAI, a notable increase from 45% in 2023. However, only 50% of banks view GenAI primarily as a solution for productivity issues, and 49% see it as a means to reduce operational IT spending.

- **Diverging Strategies:** The approaches to implementing GenAI differ across regions:
  - **US:** 59% of banks prioritize reducing IT budgets, while 47% aim to cut operations budgets.
  - **Europe:** 46% focus on improving productivity, with less emphasis on IT budget cuts compared to the US.
  - **APAC:** Productivity is also a major concern, with 58% prioritizing efficiency.

- **Challenges and ROI:** Achieving a return on investment (ROI) from GenAI is becoming a significant challenge. Many banks are unsure how to start implementing these technologies effectively, leading to recommendations for partnerships with systems integrators to navigate this complexity.

- **Focus Areas:** The research explores various segments within the banking sector, including Payments, Wealth Management, and Fraud Prevention.
  
## About the Research:
The survey involved 810 banking leaders from around the world, providing insights into the trends and challenges of GenAI adoption across different markets.

## Conclusion:
NTT DATA emphasizes the transformational potential of GenAI but warns that banks must develop clear strategies and governance frameworks to ensure successful implementation and ROI.

For the full report, visit [Intelligent Banking in the Age of AI](#).

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!)

Also, websites protected with CloudFront (and similar services) may return 403 errors - many thanks to Andy J for pointing this out.

But many websites will work just fine!
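The two failure modes above can often be told apart with a rough heuristic: a 403 status points at the site refusing the request, while a successful response with almost no text often means the content is filled in by JavaScript after the page loads. The sketch below is hypothetical; `diagnose_scrape` and its 200-character threshold are assumptions to tune, not established rules.

```python
def diagnose_scrape(status_code, text, min_chars=200):
    """Guess why a scrape produced little usable text (heuristic only)."""
    if status_code == 403:
        return "403 Forbidden: the site refused the request (bot protection or a CDN rule)"
    if status_code >= 400:
        return f"HTTP error {status_code}"
    if len(text.strip()) < min_chars:
        return "Very little text: the page may be rendered with JavaScript (try Selenium)"
    return "OK"


print(diagnose_scrape(403, ""))
print(diagnose_scrape(200, "Loading..."))
print(diagnose_scrape(200, "word " * 100))  # OK
```

With `requests`, you would pass `response.status_code` along with the text extracted from `response.content`.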

In [36]:
display_summary("https://cnn.com")

# CNN Overview

CNN is a comprehensive news outlet providing breaking news, analysis, and a wide variety of video content covering multiple topics including U.S. News, World Events, Politics, Business, Health, Entertainment, Style, Travel, Sports, Science, Climate, and Weather.

## Key Highlights
- **Current Events**: The site features live updates on significant current events, such as the Ukraine-Russia War and the Israel-Hamas conflict.
- **Political Coverage**: Recent political developments include a federal judge allowing Trump's 'buyout' plan for federal employees, ongoing Senate discussions about RFK Jr.'s confirmation, and reactions to Trump's proposals on various policies.
- **Legal News**: Noteworthy legal stories include the confirmation of R. Kelly's convictions and the challenges faced by two transgender girls against Trump's executive order barring them from girls' sports.
- **Entertainment and Culture**: Coverage on various entertainment topics including industry reactions to AI-generated content, Super Bowl highlights, and celebrity news like Scarlett Johansson's activism regarding AI deepfakes.

## Other Offerings
- **Interactive Content**: The platform provides various podcasts featuring discussions on social issues, political themes, and personal stories.
- **Lifestyle Sections**: Features articles on Health & Wellness, travel experiences, and consumer goods, as well as expert advice on various life topics.
- **Sports Updates**: Covers major events and sports news including the NFL, NBA, and cultural discussions surrounding sports personalities.

CNN maintains a dynamic presentation of news and information, aiming to keep its audience informed about crucial issues in a rapidly changing world.

In [37]:
display_summary("https://anthropic.com")

# Summary of Anthropic Website

Anthropic is an AI safety and research company based in San Francisco, dedicated to developing reliable and beneficial AI systems with a focus on safety. The website highlights their flagship AI model, **Claude 3.5 Sonnet**, which is presented as their most intelligent model to date.

## Key Highlights
- **Claude 3.5 Sonnet**: The latest AI model available for interaction and development.
- **API Access**: Anthropic provides tools for building AI-powered applications and custom experiences using Claude.
- **Research and Safety**: The company emphasizes responsible AI development, with ongoing research in AI safety and alignment.
- **Company Values**: Anthropic's interdisciplinary team combines expertise in machine learning, physics, policy, and product design to advance AI safely.

## Announcements
- A recent statement from **Dario Amodei** regarding the **Paris AI Action Summit** was made, indicating the company's engagement with global AI safety discussions.
- Publication of **Core Views on AI Safety** on March 8, 2023, outlining their perspectives on when, why, what, and how to ensure AI safety.

For more details, the website offers information about their products, research initiatives, career opportunities, and company policies.