# Week 1 - Day 2 - Challenge

This notebook implements an LLM-powered website summarizer that uses `llama3.2`, served locally by Ollama, to summarize a given webpage. The model is called in three ways:

1. Local call (a direct HTTP request to Ollama's REST API)
2. The `ollama` Python package
3. The `OpenAI` Python library, pointed at Ollama's OpenAI-compatible endpoint

## 0. Import Libraries

In [None]:
import ollama
import requests

from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

## 1. Define Constants

In [None]:
WEBSITE_URL = "https://arxiv.org/"
llama_methods = ["local-call", "python-package", "openai-python-library"]

## 2. Create a Class to Represent a Webpage

In [None]:
class Website:
    """A webpage fetched from `url`, with its title, visible text, and links parsed out."""
    def __init__(
        self,
        url: str
        ) -> None:
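        # Browser-like User-Agent; some sites treat the default python-requests one differently
        headers = {
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 "
                "Safari/537.36"
                )
            }
        self.url = url
        response = requests.get(
            url=url,
            headers=headers,
            timeout=30  # avoid hanging forever on an unresponsive server
            )
        # Fail loudly on HTTP errors instead of summarizing an error page
        response.raise_for_status()
        self.body = response.content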
        soup = BeautifulSoup(
            markup=self.body,
            features="html.parser"
            )
        self.title = soup.title.string if soup.title else "No title found!!!"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(
                separator="\n",
                strip=True
                )
        else:
            self.text = ""
        links = [link.get("href") for link in soup.find_all("a")]
        self.links = [link for link in links if link]

    def get_contents(self) -> str:
        return f"Webpage Title:\n{self.title}\n\nWebpage Contents:\n{self.text}\n\n"
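
Because `Website` fetches the page inside its constructor, its parsing logic is hard to try without a network call. The sketch below assumes you factor that logic into a hypothetical helper, `extract_page(html)`, which uses the same BeautifulSoup calls as the class, so it can be checked against a small inline HTML string (the sample markup is made up for illustration):

```python
from typing import List, Tuple

from bs4 import BeautifulSoup


def extract_page(html: str) -> Tuple[str, str, List[str]]:
    """Hypothetical helper: the parsing half of `Website.__init__`, minus the HTTP request."""
    soup = BeautifulSoup(markup=html, features="html.parser")
    title = soup.title.string if soup.title else "No title found!!!"
    text = ""
    if soup.body:
        # Drop tags whose contents are not readable page text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        # One stripped string per line; whitespace-only strings are skipped
        text = soup.body.get_text(separator="\n", strip=True)
    # Keep only anchors that actually carry an href
    links = [link.get("href") for link in soup.find_all("a")]
    return title, text, [link for link in links if link]


# Made-up sample page for illustration
SAMPLE_HTML = """<html><head><title>Demo page</title></head>
<body><h1>Hello</h1><script>var x = 1;</script>
<p>Some <a href="/about">text</a></p><a>no href</a></body></html>"""

title, text, links = extract_page(SAMPLE_HTML)
print(title)              # Demo page
print(text.splitlines())  # ['Hello', 'Some', 'text', 'no href']
print(links)              # ['/about']
```

Note that the `<script>` body never reaches `text`, and the anchor without an `href` is filtered out of `links`.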

In [None]:
website = Website(url=WEBSITE_URL)

## 3. Create Prompts

In [None]:
system_prompt = """You are an assistant that analyzes the contents of a \
website and provides a short summary while ignoring text that might be \
related to navigation. Your response must be in markdown."""

user_prompt = ""
user_prompt += f"You are looking at a website titled '{website.title}'! "
user_prompt += "Based on the contents of the website, please provide a short "
user_prompt += "summary of the website in markdown. If the website includes "
user_prompt += "news or announcements, summarize them, too. The contents of "
user_prompt += "this website are as follows:\n\n"
user_prompt += website.text
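
All three call paths send the same two-message chat history: the system prompt followed by the user prompt. The sketch below shows one way to package that as reusable functions. It assumes hypothetical helper names (`build_user_prompt`, `build_chat_payload`) and an optional, equally hypothetical `max_chars` cap on the page text, since long pages can exceed the model's context window. It produces the JSON body that the local `/api/chat` call sends:

```python
import json
from typing import Dict, List, Optional


def build_user_prompt(title: str, text: str, max_chars: Optional[int] = None) -> str:
    """Hypothetical helper: same wording as `user_prompt` above, built in one call."""
    if max_chars is not None:
        # Hypothetical cap: keep only the first max_chars characters of page text
        text = text[:max_chars]
    return (
        f"You are looking at a website titled '{title}'! "
        "Based on the contents of the website, please provide a short "
        "summary of the website in markdown. If the website includes "
        "news or announcements, summarize them, too. The contents of "
        "this website are as follows:\n\n"
        + text
    )


def build_chat_payload(system_prompt: str, user_prompt: str, model: str = "llama3.2") -> Dict:
    """Hypothetical helper: JSON body for Ollama's /api/chat with streaming disabled."""
    messages: List[Dict[str, str]] = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    return {"model": model, "messages": messages, "stream": False}


# Illustrative inputs only
prompt = build_user_prompt("Demo page", "Hello\nSome text", max_chars=5)
payload = build_chat_payload("You summarize websites.", prompt)
print(json.dumps(payload, indent=2))
```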

## 4. Call "llama3.2"

In [None]:
def ollama_respond(
    method: str,
    system_prompt: str,
    user_prompt: str
    ) -> None:
    """Summarize with llama3.2 via `method`, render the result, and save it to a text file."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
        ]

    if method == "local-call":
        # Direct HTTP request to Ollama's REST chat endpoint; "stream": False returns one JSON object
        response = requests.post(
            url="http://localhost:11434/api/chat",
            json={
                "model": "llama3.2",
                "messages": messages,
                "stream": False
                },
            headers={"Content-Type": "application/json"}
            )
        response.raise_for_status()
        content = response.json()["message"]["content"]
    elif method == "python-package":
        # The official client talks to the local Ollama server (localhost:11434 by default)
        response = ollama.chat(
            model="llama3.2",
            messages=messages
            )
        content = response["message"]["content"]
    elif method == "openai-python-library":
        # Ollama's OpenAI-compatible endpoint; the client requires an API key, but Ollama ignores it
        api = OpenAI(
            base_url="http://localhost:11434/v1",
            api_key="ollama"
            )
        response = api.chat.completions.create(
            model="llama3.2",
            messages=messages
            )
        content = response.choices[0].message.content
    else:
        raise ValueError(f"Unknown method {method!r}; expected one of {llama_methods}")

    # Render the Markdown summary in the notebook and save the raw text
    display(Markdown(content))
    with open(file="w01_d02_ollama_website_summarizer_output.txt", mode="w", encoding="utf-8") as f:
        f.write(content)
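
The local call sets `"stream": False` so that `/api/chat` answers with a single JSON object. By default, Ollama streams this endpoint instead: the response body is newline-delimited JSON, each object carrying a fragment in `message.content`, and the last one has `"done": true`. The sketch below reassembles such a stream. It assumes a hypothetical helper, `join_stream_chunks`, and uses made-up sample lines in place of a live server:

```python
import json
from typing import Iterable, Union


def join_stream_chunks(lines: Iterable[Union[str, bytes]]) -> str:
    """Hypothetical helper: concatenate message.content fragments from NDJSON stream lines."""
    parts = []
    for line in lines:
        if not line:
            # Skip empty lines between JSON objects
            continue
        chunk = json.loads(line)  # json.loads accepts str or bytes
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            # The final object marks the end of the reply
            break
    return "".join(parts)


# Made-up sample lines shaped like a streamed /api/chat response
sample_lines = [
    b'{"message": {"role": "assistant", "content": "arXiv is "}, "done": false}',
    b'{"message": {"role": "assistant", "content": "an open archive."}, "done": false}',
    b'{"message": {"role": "assistant", "content": ""}, "done": true}',
]
result = join_stream_chunks(sample_lines)
print(result)  # arXiv is an open archive.
```

Against a running server, the lines would come from something like `requests.post(url, json={..., "stream": True}, stream=True).iter_lines()`, which yields each line as `bytes`.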

In [None]:
ollama_respond(
    method=llama_methods[2],
    system_prompt=system_prompt,
    user_prompt=user_prompt
    )
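
The cell above runs only one of the three methods. Because `ollama_respond` writes every method's summary to the same `w01_d02_ollama_website_summarizer_output.txt`, running all three back to back keeps only the last result. The sketch below loops over the methods and writes one output file per method. It assumes a hypothetical `respond(method)` callable that returns the summary text (unlike `ollama_respond`, which returns nothing) and a hypothetical per-method file-name scheme:

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_all_methods(
    respond: Callable[[str], str],
    methods: Iterable[str],
    out_dir: Path,
    ) -> Dict[str, Path]:
    """Call `respond` once per method and save each summary to its own file."""
    written = {}
    for method in methods:
        summary = respond(method)
        # Hypothetical naming scheme: one file per method, so results do not overwrite each other
        path = out_dir / f"w01_d02_ollama_website_summarizer_output_{method}.txt"
        path.write_text(summary, encoding="utf-8")
        written[method] = path
    return written


def fake_respond(method: str) -> str:
    """Stand-in for a real model call, so the sketch runs without Ollama."""
    return f"Summary produced via {method}"


with tempfile.TemporaryDirectory() as tmp:
    paths = run_all_methods(
        fake_respond,
        ["local-call", "python-package", "openai-python-library"],
        Path(tmp),
        )
    contents = {method: path.read_text(encoding="utf-8") for method, path in paths.items()}
print(contents)
```

In the notebook you would pass `llama_methods`, a real `respond` that wraps one of the three calls, and a persistent output directory instead of a temporary one.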

## 5. Final Outputs

### 5.1. "llama3.2" via Local Call

# arXiv.org e-Print Archive Summary

### Overview
The arXiv.org e-Print archive is a free distribution service and open-access archive for nearly 2.4 million scholarly articles in various fields, including physics, mathematics, computer science, and economics.

### Subjects
ArXiv is divided into several subjects, including:

* Physics (astro-ph, cond-mat, gr-qc, hep-ex, hep-lat, hep-ph, hep-th, math-ph)
* Mathematics (math)
* Computer Science (CoRR)
* Quantitative Biology (q-bio)
* Quantitative Finance (q-fin)
* Statistics (stat)
* Electrical Engineering and Systems Science (eess)
* Economics (econ)

### Features
ArXiv allows users to:
* Search articles by title, author, or subject
* Browse articles by subject
* Donate to support the organization
* Login and manage account settings

### Announcements
arXiv has received support from the Simons Foundation, member institutions, and contributors. The organization also offers a donation feature for those who wish to contribute to its mission.

Note: Navigation menu and other non-relevant content have been ignored in this summary.

### 5.2. "llama3.2" via `ollama` Python Package

**Summary of arXiv.org e-Print Archive**
=====================================

* **Description**: A free distribution service and open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
* **Subjects**: Articles are categorized into several subjects, including Astrophysics, Condensed Matter Physics, General Relativity and Quantum Cosmology, High Energy Physics, Mathematics, Nonlinear Sciences, Nuclear Experiment, Nuclear Theory, Physics, and more.
* **News and Announcements**:
	+ arXiv is supported by the Simons Foundation, member institutions, and contributors.
	+ The service is available for free to users worldwide.

Note: I ignored the navigation menu and focus on summarizing the main content of the website.

### 5.3. "llama3.2" via `OpenAI` Python Library

### Summary of arXiv.org e-Print Archive

**Overview**
---------------

The arXiv.org e-Print archive is a free, open-access distribution service and an online archive for peer-reviewed scholarly articles in several fields, including physics, mathematics, computer science, biology, finance, statistics, electrical engineering, and economics.

### Fields of Study

The website provides categorized lists of publications by subject:

* **Physics**: astrophysics, condensed matter, general relativity, high energy physics, etc.
* **Mathematics**: algebraic geometry, dynamical systems, group theory, number theory, etc.
* **Computer Science**: artificial intelligence, machine learning, programming languages, software engineering, etc.
* **Quantitative Biology**: biomolecules, genomics, molecular networks, populations and evolution, etc.
* **Quantitative Finance**: computational finance, economics, mathematical finance, portfolio management, etc.
* **Statistics**: applications, computation, methodology, other statistics, statistical theory, etc.
* **Electrical Engineering and Systems Science**: audio and speech processing, image and video processing, signal processing, systems and control, etc.
* **Economics**: econometrics, general economics, theoretical economics, etc.

### Features

* Search functionality for subjects
* Browse functionality by categories
* Advanced search options
* Support from the Simons Foundation, member institutions, and contributors

**Updates**
---------

Note: The website does not provide explicit "news" or "announcements." However, it does offer a notification service that sends updates via email or Slack for arXiv Operational Status.

### External Links

* arXiv Operational Status notifications via email or Slack