# Website Summarization Using Llama
### Week 1 Day 2 Exercise

In [2]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [3]:
# Constants

# OLLAMA_API = "http://localhost:11434/api/chat"
# HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"

In [4]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
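        # Fail fast on slow or broken pages instead of hanging or parsing an error page
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string is None when <title> is empty or contains nested tags
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body is None:
            # Some responses have no <body>; avoid an AttributeError below
            self.text = ""
        else:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)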

In [5]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."


def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
    

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

def summarize(url):
    website = Website(url)
    response = ollama_via_openai.chat.completions.create(
        model=MODEL,
        messages=messages_for(website)
    )
    return response.choices[0].message.content


def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
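
A page as large as a news front page can produce far more text than a small local model handles well, so it may help to cap the text before it goes into the prompt. The sketch below is one possible way to do that, not part of the original exercise: `clip_text`, `MAX_CHARS`, and the `[content truncated]` marker are hypothetical names, and the 8,000-character cap is an arbitrary assumption that should be tuned to the model in use.

```python
# Hypothetical cap (an assumption, not a documented limit); tune it to your model.
MAX_CHARS = 8_000
TRUNCATION_MARKER = "\n[content truncated]"


def clip_text(text: str, max_chars: int = MAX_CHARS) -> str:
    """Return text unchanged if it fits, else cut it at a line boundary near the cap."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    last_break = clipped.rfind("\n")
    # Prefer cutting on a line boundary so the model never sees half a line.
    if last_break > 0:
        clipped = clipped[:last_break]
    return clipped + TRUNCATION_MARKER


# Stand-in for Website.text, so the example runs without a network call.
long_text = "\n".join(f"line {i}" for i in range(5000))
short_text = clip_text(long_text)
print(len(long_text), "->", len(short_text))
```

To use it, `user_prompt_for` could append `clip_text(website.text)` instead of `website.text`. The marker lets the model know the page continues beyond what it was given.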

In [6]:
display_summary("https://cnn.com")

The provided text appears to be a snippet of the CNN website, containing various news articles and other content. To provide a more detailed answer, I'll focus on identifying the main sections or categories present in the text.

1. **News**: The main section is dedicated to providing breaking news updates from around the world, covering topics such as:
	* Politics
	* World Affairs (Middle East conflict, Israel-Hamas War, Ukraine-Russia War)
	* Business and Economics (Ford investment, Trump Administration agenda law)
	* Crime and Justice (Gunman in CDC shooting, 2.4 million people lose food stamps due to Trump agenda law)
2. **Entertainment**:
	* Movies (Theatrical releases or upcoming movies like "Weapons")
	* Television (Shows A-Z, Sports programming schedule)
3. **Technology**:
	* Tech (Ford announces $5 billion US investment as part of its 'next Model T moment')
4. **Life & Culture**:
	* Arts and Design
	* Fashion
	* Beauty
	* Health and Fitness
	* Home and Reviews (Deals, Gifts)
	* Travel (Outdoors, Pets, Destinations, Food & Drink)
5. **Sports**:
	* Pro Football
	* College Football
	* Basketball
	* Baseball
6. **Science and Discovery**:
	* Space
	* Climate
	* Solutions

Please note that this categorization is based on the content present in the provided snippet, which may not reflect an exhaustive list of topics or categories available on the CNN website.