In [1]:
import ollama
import requests
from bs4 import BeautifulSoup 
from IPython.display import Markdown, display

In [2]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

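class Website:
    """Fetch a page and keep its title and visible body text."""

    def __init__(self, url):
        self.url = url

        # Time out instead of hanging, and fail loudly on HTTP errors.
        req = requests.get(self.url, headers=headers, timeout=30)
        req.raise_for_status()
        soup = BeautifulSoup(req.content, "html.parser")
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Drop elements that carry no readable text.
            for item in soup.body(["script", "style", "img", "input"]):
                item.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""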
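To see what the extraction step keeps and discards without hitting a live site, the same BeautifulSoup calls can be run on an inline HTML string. This is a minimal sketch: the `visible_text` helper and the sample HTML are illustrative assumptions, not part of the notebook.

```python
from bs4 import BeautifulSoup

# Sample page invented for this sketch; any HTML string would do.
SAMPLE_HTML = """<html><head><title>Demo page</title></head>
<body><h1>Hello</h1><script>var x = 1;</script><style>p {color: red}</style>
<p>First paragraph.</p><img src="a.png"><input type="text"></body></html>"""


def visible_text(html):
    """Hypothetical helper: return (title, text) the way Website does."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    # Remove tags whose content is not readable text.
    for item in soup.body(["script", "style", "img", "input"]):
        item.decompose()
    # One line per text fragment, with surrounding whitespace stripped.
    text = soup.body.get_text(separator="\n", strip=True)
    return title, text


title, text = visible_text(SAMPLE_HTML)
print(title)
print(text)
```

The script and style contents disappear, and only the heading and paragraph text remain, one fragment per line.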

In [3]:
def msg(website):
    system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

    user_prompt = f"You are looking at a website titled {website.title}.\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text

    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
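The user prompt in `msg` passes the whole page text to the model, which can be very long for a busy news homepage. One possible variation, sketched below, caps the text before building the prompt. The `build_user_prompt` helper, the `MAX_CHARS` value, and the `[truncated]` marker are assumptions made for illustration; the notebook itself does no truncation.

```python
from types import SimpleNamespace

# Hypothetical cap, not in the original notebook; tune it to the model's context window.
MAX_CHARS = 8000


def build_user_prompt(website, max_chars=MAX_CHARS):
    """Assemble the user prompt, truncating very long page text."""
    text = website.text
    if len(text) > max_chars:
        text = text[:max_chars] + "\n[truncated]"
    return (
        f"You are looking at a website titled {website.title}.\n"
        "The contents of this website are as follows; "
        "please provide a short summary of this website in markdown.\n\n"
        + text
    )


# Stand-in for a Website instance so the sketch runs without network access.
page = SimpleNamespace(title="Demo page", text="word " * 5000)
prompt = build_user_prompt(page, max_chars=100)
print(len(prompt))
```

A character cap is a crude proxy for a token limit, so treat the number as a starting point rather than a tuned setting.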
    

In [4]:
def summarizer(website):
    # Assumes a local Ollama server with the llama3.2 model already pulled.
    response = ollama.chat(model="llama3.2", messages=msg(website))
    display(Markdown(response["message"]["content"]))

In [5]:
website = Website("https://cnn.com")
summarizer(website)

Here are the top news stories from CNN:

1. **Trump's energy strategy is 'drill baby drill':** President Trump's plan to increase domestic oil production and reduce environmental regulations has sparked concerns among critics who say it will harm the environment.
2. **Seniors won't pay more than $2,000 for drugs at the pharmacy starting in January:** The Centers for Medicare & Medicaid Services (CMS) announced that seniors with prescription drug coverage through Medicare Part D will not face higher out-of-pocket costs starting in January.
3. **Neil Young pulls out of playing Glastonbury 2025, says it's become a 'corporate turn-off':** Legendary musician Neil Young has cancelled his performance at the upcoming Glastonbury Festival, citing concerns about the festival's increasing commercialization.
4. **Brad Pitt and Angelina Jolie settle their divorce:** After years of litigation, the ex-couple have reached an agreement on their divorce settlement, with details still to be disclosed.
5. **Israel's population growth slows as residents emigrate amid multi-front war:** The number of babies born in Israel has slowed in recent years due to increased migration from the country, experts say.

Additionally, here are some breaking news stories:

1. **Prosecutors say 150 pipe bombs found at Virginia man's home in what could be FBI's largest-ever seizure of explosive devices:** Law enforcement officials have announced a major bust involving hundreds of pipe bombs, raising concerns about domestic terrorism.
2. **Romania and Bulgaria fully join Europe's borderless travel zone:** The two countries will now allow for passport-free travel with other EU member states.

And here are some trending topics:

1. **MrBeast and Thea Booysen are engaged:** Popular YouTuber MrBeast has announced his engagement to his long-time partner, Thea Booysen.
2. **Single cigarette slashes 20 minutes off your life expectancy, UK research suggests:** A new study from the University of Stirling in Scotland has found that smoking a single cigarette can reduce one's life expectancy by up to 20 minutes.

These are just some of the top news stories and trending topics on CNN.

In [6]:
website = Website("https://thestar.com/")
summarizer(website)

This appears to be a Canadian news website's homepage. The site features a mix of national and local news, sports, entertainment, business, and lifestyle articles.

Here are some key features:

1. **Breaking News**: A prominent section displaying the latest news headlines.
2. **First Up**: A daily newsletter sign-up section, which allows users to receive the latest news and updates from the Toronto Star.
3. **News Tips**: A section where readers can submit tips or suggestions for news stories to the publication.
4. **The Star's brand**: The site features a prominent logo and branding elements, including a masthead with contact information and a glossary of terms.

Some popular sections on the website include:

* **Sports**: News and analysis from Canada's top sports leagues, including the Toronto Raptors and Blue Jays.
* **Entertainment**: Movie, TV, music, and arts news.
* **Business**: Financial news and analysis for Canadian business leaders and investors.
* **Lifestyle**: Health, wellness, and relationships content.

Overall, this website appears to be a comprehensive source of news and information for Canadians, with a focus on local stories and issues.

In [7]:
website = Website("https://www.cbc.ca/")
summarizer(website)

This appears to be the homepage of the Canadian Broadcasting Corporation (CBC) website, with a mix of news articles, sports updates, and entertainment features. The content is categorized into various sections, including:

* News: Articles on politics, current events, and national news.
* Sports: Updates on hockey, curling, snowboarding, and other winter sports, as well as Canadian mixed doubles curling trials.
* Entertainment: Features on music, arts, and culture, including albums turning 25 in 2025 and a mom paying her child $100 to read a book.
* Life: Articles on personal finance, relationships, and lifestyle topics, such as how to update your resume even if you haven't had a job in years.
* Politics: Analysis of the Liberal government's plans for 2025, including potential prorogation.
* World News: Updates on international news, including Ukraine halting transit of Russian gas to Europe.

The website also includes various sections and features, such as:

* A section dedicated to Indigenous culture and issues.
* A section on arts and entertainment, including music, books, and film.
* A section on politics and current events.
* A section on sports and fitness.
* A section on personal finance and lifestyle topics.

Overall, the website appears to be a comprehensive source of news, information, and entertainment for Canadians.