# Creating a website summariser agent with OpenAI

In [10]:
import os
import requests # For fetching website data
from dotenv import load_dotenv # For accessing our API key
from bs4 import BeautifulSoup # For accessing fetched website data
from IPython.display import Markdown, display # For displaying processed website data
from openai import OpenAI # OpenAI client library used to call the summariser model

### 1. Accessing the API key 
#### (Ensure the key is stored in a .env file as 'OPENAI_API_KEY')

In [22]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

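# OpenAI keys start with "sk-" (project keys with "sk-proj-"), so check the shorter prefix
if api_key and api_key.strip() == api_key and api_key.startswith("sk-"):
    print("API key retrieved successfully!")
else:
    print("Error retrieving API key: check that OPENAI_API_KEY is set in your .env file")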

API key retrieved successfully!
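The check above can be factored into a small function so it is easy to test and never prints the full key. This is a sketch: `check_api_key` is a hypothetical helper, and the `sk-` prefix test assumes OpenAI's key format rather than validating the key against the API.

```python
def check_api_key(key):
    """Return a short status message for a (hypothetical) API key check."""
    # Hypothetical helper: mirrors the notebook's checks without exposing the key
    if not key:
        return "No API key found: add OPENAI_API_KEY to your .env file"
    if key.strip() != key:
        return "API key has leading or trailing whitespace"
    # Assumption: OpenAI keys begin with "sk-"
    if not key.startswith("sk-"):
        return "API key does not start with 'sk-' (assumed OpenAI key prefix)"
    # Show only the first 8 characters so the key is never fully displayed
    return f"API key found (starts with {key[:8]}...)"

print(check_api_key(None))
print(check_api_key("sk-proj-example"))
```

A format check like this only catches typos and copy-paste mistakes; a revoked or mistyped key still passes and will fail on the first API call.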


### 2. Creating the OpenAI client

In [29]:
openai = OpenAI() # The API key is read automatically from the OPENAI_API_KEY environment variable, so there is no need to pass it here

### 3. Create request headers for fetching websites (not always necessary, but some sites block requests that lack a browser-like User-Agent)

In [34]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}
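Without custom headers, `requests` identifies itself with a `python-requests/<version>` User-Agent, which is what some sites block. A minimal way to confirm which headers would actually be sent, without touching the network, is to build a prepared request; the CNN URL here is just the notebook's example site.

```python
import requests

# Same browser-like User-Agent as the notebook's `headers` dict
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

# Build (but do not send) the GET request so its headers can be inspected
prepared = requests.Request("GET", "https://www.cnn.com", headers=headers).prepare()
print(prepared.method, prepared.url)
print(prepared.headers["User-Agent"])
```

`prepared.headers` is case-insensitive, so `prepared.headers["user-agent"]` returns the same value.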

### 4. Create a class that fetches a website and stores its title and cleaned text

In [142]:
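class Website:
    # Default browser-like User-Agent, used when no headers are passed in
    DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"}

    def __init__(self, website_url: str, headers: dict = None):
        self.website_url = website_url
        self.headers = headers if headers else dict(self.DEFAULT_HEADERS)
        # Fetch the page; the timeout (in seconds) stops a slow site from hanging the notebook
        self.response = requests.get(url=self.website_url, headers=self.headers, timeout=10)
        self.response.raise_for_status()  # Fail loudly on 4xx/5xx responses instead of parsing an error page
        fetched_data = BeautifulSoup(self.response.content, 'html.parser')
        self.website_title = fetched_data.title.string if fetched_data.title and fetched_data.title.string else 'Website title was not found'

        if fetched_data.body:
            # Remove tags that carry no readable text before extracting it
            for irrelevant_data in fetched_data.body(['script', 'style', 'img', 'input']):
                irrelevant_data.decompose()
            self.website_text = fetched_data.body.get_text(separator='\n', strip=True)
        else:
            self.website_text = ''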


In [144]:
# Uncomment the following lines to test out the class

# cnn = Website('https://www.cnn.com', headers)
# print(cnn.website_title)
# print(cnn.website_text)
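To see what the cleaning step in `Website` keeps without fetching a live site, the same BeautifulSoup calls can be run on a small inline page. The HTML below is a made-up sample, not real site content.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for a fetched website
sample_html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <script>console.log("tracking");</script>
    <h1>Headline</h1>
    <p>First paragraph.</p>
    <img src="photo.jpg">
    <input type="text">
    <p>Second paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
title = soup.title.string if soup.title else "Website title was not found"

# Same clean-up as the Website class: drop tags that contain no readable text
for tag in soup.body(["script", "style", "img", "input"]):
    tag.decompose()

# One line per remaining text fragment, with surrounding whitespace stripped
text = soup.body.get_text(separator="\n", strip=True)
print(title)
print(text)
```

Only the headline and the two paragraphs survive; the script's contents never reach the prompt.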

### 5. Create a function that builds the user prompt from the website data

In [147]:
def user_prompt_for(website_data: Website) -> str:
    # Combine the page title and cleaned page text into a single user message
    user_prompt = (
        f"Current website title: {website_data.website_title}\n"
        "The contents of the website are as follows:\n"
        f"{website_data.website_text}"
    )
    return user_prompt
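Busy pages such as a news home page can produce a lot of text. A hedged sketch of capping it before it goes into the prompt is shown below; `build_user_prompt` and its `max_chars` parameter are hypothetical, and the 5,000-character default is an arbitrary assumption, not a documented model limit.

```python
def build_user_prompt(title: str, text: str, max_chars: int = 5000) -> str:
    """Hypothetical variant of user_prompt_for that caps the page text length."""
    # Keep only the first max_chars characters and mark that the text was cut
    if len(text) > max_chars:
        text = text[:max_chars] + "\n[...content truncated...]"
    return (
        f"Current website title: {title}\n"
        "The contents of the website are as follows:\n"
        f"{text}"
    )

prompt_text = build_user_prompt("Demo page", "word " * 2000, max_chars=100)
print(prompt_text)
```

Truncating by characters is the simplest option; counting tokens would match the model's limit more closely but needs a tokenizer.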

### 6. Create a function that builds the full message list for the AI agent, given a URL

In [164]:
def create_prompt(url: str, system_prompt: str) -> list:
    # Fetch and clean the page, then pair the system prompt with the generated user prompt
    website_data = Website(url)
    user_prompt = user_prompt_for(website_data)
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    return messages
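The chat API expects a list of role/content dictionaries. To inspect that structure without a network request, the sketch below builds the same two-message layout from strings; `build_messages` is a hypothetical offline counterpart of `create_prompt`.

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Hypothetical offline counterpart of create_prompt: same layout, no fetching."""
    # The system message sets the assistant's role; the user message carries the page content
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You summarise websites in markdown.",
    "Current website title: Demo page\nThe contents of the website are as follows:\nHello",
)
for message in messages:
    print(message["role"], "->", message["content"].splitlines()[0])
```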

### 7. Creating a system prompt for assigning a specific role to the AI agent

In [167]:
system_prompt = "You are an assistant that analyzes the contents of a website and provides \
accurate, concise summaries, ignoring text related to adverts, navigation, or menus. \
Responses should be in markdown."

### 8. Put the full prompt together

In [170]:
# Example website url
website = 'https://www.cnn.com'

prompt = create_prompt(url=website, system_prompt=system_prompt)

### 9. Send the query to the agent

In [None]:
response = openai.chat.completions.create(model="gpt-4o-mini", messages=prompt)
# print(response.choices[0].message.content)  # Uncomment to print the raw markdown reply

### 10. Display the summarised website

In [185]:
display(Markdown(response.choices[0].message.content))

# CNN Website Content Summary

## Feedback Section
- CNN encourages user feedback on advertisements and technical issues encountered on the site.

## News Topics
CNN features a wide variety of news categories, including:
- **US**
- **World**
- **Politics**
- **Business**
- **Health**
- **Entertainment**
- **Style**
- **Travel**
- **Sports**
- **Science**
- **Climate**
- **Weather**
- **Ukraine-Russia War**
- **Israel-Hamas War**

## Notable News Stories
1. Trump and Russia:
   - Accusations of Trump being "not sufficiently informed" about Ukraine by a Kremlin aide.
   - Trump to assess Putin's seriousness on Ukraine peace.
   - Zelensky's departure from Germany with pledges for long-range weapon assistance.

2. Middle East Conflicts:
   - Israeli PM Netanyahu announces the death of a Hamas leader.
   - Appeal by Lebanese leaders for US pressure on Israel regarding troop withdrawals.
   - Concerns voiced by freed Israeli hostages.

3. Global News:
   - Sentencing of a surgeon in France for abuses.
   - Tragic incidents involving passenger boats in the Canary Islands.
   - Medical findings linking THC with heart disease.
   - Warning from major retailers about consumer boycotts.

4. Technology and Business:
   - Discussion about consumer behavior and economic forecasts.
   - Developments regarding Apple and online safety regulations.

## Features
- **Videos and Podcasts:** CNN offers various video segments and podcasts, including "CNN 5 Things," "Chasing Life with Dr. Sanjay Gupta," and "The Assignment."
- **Interactive Games:** Crossword puzzles and quizzes are available for users to engage with.

## Special Reports
- International reports on consumer trends, environmental challenges, and updates on significant geopolitical events.

## Health & Wellness
- Articles discussing health topics, including chronic diseases and the need for emotional well-being in teens.

## Closing Note
For more content, visitors can explore CNN's dedicated sections for updates on ongoing stories and featured topics in various domains.
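The send-and-display steps can be wrapped into one reusable function. The sketch below makes the same `chat.completions.create` call as the notebook; `summarise` is a hypothetical name, and `FakeCompletions` is a made-up stand-in client so the function can be exercised offline.

```python
from types import SimpleNamespace


def summarise(messages: list, client, model: str = "gpt-4o-mini") -> str:
    """Send the messages to a chat-completions client and return the reply text."""
    # Same call as the notebook: client.chat.completions.create(model=..., messages=...)
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


# Hypothetical stand-in client for offline testing; in the notebook, pass the real `openai` client
class FakeCompletions:
    def create(self, model, messages):
        reply = f"# Summary\nReceived {len(messages)} messages for {model}"
        return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=reply))])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
print(summarise([{"role": "user", "content": "Hi"}], fake_client))
```

With the real client, the whole pipeline becomes `display(Markdown(summarise(create_prompt(website, system_prompt), openai)))`. Passing the client in as a parameter is what makes the fake client possible.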