In [27]:
import requests
import ollama
from IPython.display import Markdown, display
from bs4 import BeautifulSoup

## Website Scraping

In [14]:
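# Browser-like User-Agent: some sites refuse or alter responses for the default requests UA
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """Fetch a page and keep its title and visible body text."""

    def __init__(self, url):
        self.url = url
        # Time out instead of hanging forever, and fail loudly on HTTP errors
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body is None:
            # Some responses (e.g. non-HTML payloads) have no <body> to read
            self.text = ""
            return
        # Drop tags that carry no readable text before extracting it
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)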
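The cleaning step inside `Website` can be tried without any network request. The sketch below isolates it into a hypothetical helper, `visible_text`, and runs it on a small hand-written HTML string, so you can see exactly what survives `decompose()` and `get_text(separator="\n", strip=True)`.

```python
from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Hypothetical helper: return <body> text with script/style/img/input removed."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.body is None:
        # html.parser does not invent a <body>, so fragments without one yield nothing
        return ""
    # Calling a Tag with a list, as in soup.body([...]), is shorthand for find_all([...])
    for irrelevant in soup.body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    # strip=True trims each text node and drops whitespace-only ones
    return soup.body.get_text(separator="\n", strip=True)


sample = """
<html><head><title>Demo</title></head>
<body>
  <h1>Headline</h1>
  <script>var tracking = 1;</script>
  <style>p { color: red; }</style>
  <p>First paragraph.</p>
  <input type="text" value="search">
</body></html>
"""

print(visible_text(sample))
```

The script and style contents disappear entirely, and the remaining text nodes are joined one per line, which is the same shape as the CNN output below.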

In [15]:
cnn = Website("https://www.cnn.com")

print(cnn.title)
print(cnn.text)

Breaking News, Latest News and Videos | CNN
CNN values your feedback
1. How relevant is this ad to you?
2. Did you encounter any technical issues?
Video player was slow to load content
Video content never loaded
Ad froze or did not finish loading
Video content did not start after ad
Audio on ad was too loud
Other issues
Ad never loaded
Ad prevented/slowed the page from loading
Content moved around while ad loaded
Ad was repetitive to ads I've seen previously
Other issues
Cancel
Submit
Thank You!
Your effort and contribution in providing this feedback is much
                                        appreciated.
Close
Ad Feedback
Close icon
US
World
Politics
Business
Health
Entertainment
Style
Travel
Sports
Science
Climate
Weather
Ukraine-Russia War
Israel-Hamas War
Games
More
US
World
Politics
Business
Health
Entertainment
Style
Travel
Sports
Science
Climate
Weather
Ukraine-Russia War
Israel-Hamas War
Games
Watch
Listen
Live TV
Subscribe
Sign in
My Account
Settings
Newsletters
Topics yo
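The printed text above repeats the whole navigation menu (US, World, Politics, ...) and includes the ad-feedback widget, and all of it goes into the prompt. One optional way to shrink that noise before summarizing is to drop repeated lines. The helper below, `dedupe_lines`, is a hypothetical addition, not part of the notebook; note that it also removes lines that legitimately repeat, such as a headline shown in two places.

```python
def dedupe_lines(text: str) -> str:
    """Hypothetical helper: keep the first occurrence of each non-empty line."""
    seen = set()
    kept = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and anything already emitted
        if not line or line in seen:
            continue
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


sample = "US\nWorld\nPolitics\nUS\nWorld\nPolitics\nHeadline about something"
print(dedupe_lines(sample))
```

Applied to `cnn.text`, this would collapse the duplicated menu into a single copy while keeping the original line order.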

## Summarization

In [16]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [21]:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
        \nplease provide a short summary of this website in markdown. \
        \nIf it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text

    return user_prompt
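The backslash continuations in `user_prompt_for` keep each following line's leading spaces inside the string, which is why the printed prompt below has runs of trailing whitespace before each newline. A sketch of the same prompt built from adjacent string literals avoids that, and also fixes "contents ... is" to "contents ... are". The name `build_user_prompt` is hypothetical, and the `SimpleNamespace` stand-in only mimics the `title` and `text` attributes of `Website`.

```python
from types import SimpleNamespace


def build_user_prompt(website):
    """Hypothetical variant of user_prompt_for without stray indentation."""
    # Adjacent string literals inside parentheses are joined at compile time,
    # so no source indentation ends up in the prompt
    return (
        f"You are looking at a website titled {website.title}\n"
        "The contents of this website are as follows;\n"
        "please provide a short summary of this website in markdown.\n"
        "If it includes news or announcements, then summarize these too.\n\n"
        f"{website.text}"
    )


# Stand-in object with the same two attributes as Website (no network request)
demo = SimpleNamespace(title="Demo", text="Headline\nFirst paragraph.")
print(build_user_prompt(demo))
```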

In [22]:
print(user_prompt_for(website=cnn))

You are looking at a website titled Breaking News, Latest News and Videos | CNN
The contents of this website is as follows;         
please provide a short summary of this website in markdown.         
If it includes news or announcements, then summarize these too.

CNN values your feedback
1. How relevant is this ad to you?
2. Did you encounter any technical issues?
Video player was slow to load content
Video content never loaded
Ad froze or did not finish loading
Video content did not start after ad
Audio on ad was too loud
Other issues
Ad never loaded
Ad prevented/slowed the page from loading
Content moved around while ad loaded
Ad was repetitive to ads I've seen previously
Other issues
Cancel
Submit
Thank You!
Your effort and contribution in providing this feedback is much
                                        appreciated.
Close
Ad Feedback
Close icon
US
World
Politics
Business
Health
Entertainment
Style
Travel
Sports
Science
Climate
Weather
Ukraine-Russia War
Israel-Hamas War
Ga

In [23]:
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
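`messages_for` puts the entire page text into a single user message, and a large page may not fit in the context window the model is run with. A minimal sketch of a guard, assuming a simple character budget: `messages_with_budget` and `max_chars=8000` are hypothetical and illustrative only. Characters are just a rough proxy for tokens, so the right budget depends on the model and its configured context size.

```python
def messages_with_budget(system_prompt, user_prompt, max_chars=8000):
    """Hypothetical variant of messages_for that caps the user prompt length."""
    # Slicing is a no-op when the prompt is already within the budget
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt[:max_chars]},
    ]


msgs = messages_with_budget("Summarize the page.", "x" * 20_000, max_chars=10)
print([m["role"] for m in msgs], len(msgs[1]["content"]))
```

Cutting from the end keeps the title and instructions at the top of the prompt, at the cost of losing whatever appears late in the page.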

In [26]:
messages_for(cnn)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': "You are looking at a website titled Breaking News, Latest News and Videos | CNN\nThe contents of this website is as follows;         \nplease provide a short summary of this website in markdown.         \nIf it includes news or announcements, then summarize these too.\n\nCNN values your feedback\n1. How relevant is this ad to you?\n2. Did you encounter any technical issues?\nVideo player was slow to load content\nVideo content never loaded\nAd froze or did not finish loading\nVideo content did not start after ad\nAudio on ad was too loud\nOther issues\nAd never loaded\nAd prevented/slowed the page from loading\nContent moved around while ad loaded\nAd was repetitive to ads I've seen previously\nOther issues\nCancel\nSubmit\nThank You!\nYour effort and contribut

In [28]:
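def summarize(url):
    website = Website(url)
    # Requires a running local Ollama server with the model available (e.g. `ollama pull llama3.2`)
    response = ollama.chat(
        model='llama3.2',
        messages=messages_for(website),
    )

    # The reply text lives under message -> content
    return response['message']['content']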
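`summarize` couples three steps: fetching the page, building the messages, and calling the model. A sketch that separates the model call and accepts the chat function as a parameter makes the last step testable without a running Ollama server. `summarize_messages` and `fake_chat` are hypothetical names; when no `chat` is passed, the sketch falls back to `ollama.chat`, as the notebook does.

```python
def summarize_messages(messages, chat=None, model="llama3.2"):
    """Hypothetical helper: send prepared messages to a chat function and return the reply text."""
    if chat is None:
        # Imported lazily so the function can be exercised without ollama installed
        import ollama
        chat = ollama.chat
    response = chat(model=model, messages=messages)
    return response["message"]["content"]


def fake_chat(model, messages):
    # Stand-in returning the same nested shape the notebook reads from ollama.chat
    return {"message": {"content": f"{model} saw {len(messages)} messages"}}


demo_messages = [
    {"role": "system", "content": "Summarize."},
    {"role": "user", "content": "Headline"},
]
print(summarize_messages(demo_messages, chat=fake_chat))
```

In the notebook the real call would look like `summarize_messages(messages_for(Website(url)))`.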

In [29]:
url = "https://www.cnn.com"

display(Markdown(summarize(url)))

This appears to be a CNN website, and it contains a vast array of news articles, videos, podcasts, and other content. The site is constantly updating with new stories, so I'll provide an overview of the topics and sections available.

**Top Stories**

The main section displays a selection of recent and trending news headlines from around the world. You can click on individual articles to read more in-depth information.

**Categories**

CNN offers various categories to help you navigate their content:

1. **US Politics**: News and analysis on US government, elections, and politics.
2. **World News**: Global coverage of international events, conflicts, and news.
3. **Business**: Financial news, market analysis, and business insights.
4. **Health**: Medical news, health trends, and wellness information.
5. **Entertainment**: Movies, TV shows, music, and celebrity news.
6. **Sports**: News, scores, and updates on various sports and athletes.

**Podcasts**

CNN offers a range of podcasts covering various topics:

1. **The Axe Files with David Axelrod**
2. **Patriot Act with Hasan Minhaj**
3. **The Daily**
4. ** Cuomo Prime Time**
5. **Anderson Cooper 360° Podcast**

**Videos**

You can watch live TV, videos, and news clips from CNN's various channels:

1. **CNN Live**: Watch the latest news updates.
2. **CNN Prime News**: Live coverage of breaking news events.
3. **CNN Tonight with Don Lemon**
4. **New Day with Christiane Amanpour**

In [30]:
url = "https://www.espn.com"

display(Markdown(summarize(url)))


It appears that this is an ESPN website, and the text is a collection of articles and links to various sports-related content. Here are some of the headlines and topics covered:

1. NFL Draft: The website mentions the NFL draft, with articles about picks, reaches, and surprises.
2. NBA Playoffs: There are several articles about the NBA playoffs, including picks for each series, including the Finals.
3. Stanley Cup Playoffs: The website discusses the NHL playoffs, with a focus on goalies and potential storylines.
4. MLB Trade Deadline: An article previews the upcoming trade deadline in Major League Baseball.
5. Fantasy Sports: There are links to fantasy sports platforms, including ESPN+, where users can create leagues, practice with mock drafts, and play games.
6. UFC Fight Night: The website mentions an upcoming UFC fight, Machado Garry vs. Prates.
7. PGA TOUR LIVE: There is a link to PGA TOUR LIVE, which appears to be a streaming service for golf events.

Overall, the website seems to be focused on providing news, analysis, and entertainment for sports fans, with a emphasis on NFL, NBA, MLB, NHL, UFC, and golf content.