In [1]:
# Summarize a JavaScript-rendered web page using Selenium and a local Ollama model

import requests
from bs4 import BeautifulSoup
import json
from IPython.display import display, Markdown

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Constants
OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"

# System prompt
system_prompt = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)

def get_rendered_html(url, wait_time=5):
    """Load `url` in headless Chrome and return the HTML after scripts have run."""
    import time  # used only for the fixed render delay below

    options = Options()
    options.add_argument("--headless")  # Run without a window; needed where no display is available (servers, containers).
    options.add_argument("--disable-gpu")  # Mostly a holdover from older headless Chrome; harmless.
    options.add_argument("--no-sandbox")  # Disables Chrome's sandbox; often required in containers. Only load pages you trust.
    options.add_argument("--log-level=3")  # Log fatal errors only.

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)

    try:
        driver.get(url)
        # implicitly_wait() only affects element lookups, so pause explicitly
        # to give client-side JavaScript time to render.
        time.sleep(wait_time)
        return driver.page_source
    finally:
        driver.quit()  # Always close the browser, even if loading fails.

def extract_main_text_from_html(html):
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["nav", "header", "footer", "aside", "script", "style"]):
        tag.decompose()
    text = soup.get_text(separator="\n")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(lines[:1000])  # Keep input size manageable for the LLM

def summarize_website_selenium(url):
    html = get_rendered_html(url)
    content = extract_main_text_from_html(html)

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": content}
    ]

    payload = {
        "model": MODEL,
        "messages": messages,
        "stream": False
    }

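    # A local model can take a while on long pages, so allow a generous timeout.
    response = requests.post(OLLAMA_API, headers=HEADERS, data=json.dumps(payload), timeout=120)
    response.raise_for_status()  # Fail loudly on HTTP errors instead of returning the fallback string
    return response.json().get("message", {}).get("content", "No response content")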
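
The last line of `summarize_website_selenium` falls back to the string "No response content" when the body has no message, which can hide a real failure such as a model that has not been pulled. Below is a minimal sketch of stricter handling. It assumes the non-streaming `/api/chat` response carries the reply under `message.content` and that failures come back as a JSON object with an `error` field. `extract_reply` is a hypothetical helper, not part of the notebook, and the sample bodies are made up to match that assumed shape.

```python
def extract_reply(body: dict) -> str:
    """Return the assistant text from a non-streaming Ollama chat response.

    Hypothetical helper. Assumes the reply lives under body["message"]["content"]
    and that errors are reported as {"error": "..."}.
    """
    if "error" in body:
        raise RuntimeError(f"Ollama returned an error: {body['error']}")
    content = body.get("message", {}).get("content")
    if not content:
        raise ValueError("Response contained no message content")
    return content


# Sample body shaped like the assumed response format (not real model output).
ok_body = {
    "model": "llama3.2",
    "message": {"role": "assistant", "content": "# Summary"},
    "done": True,
}
print(extract_reply(ok_body))
```

Raising instead of returning a placeholder means a misconfigured model name surfaces as an exception in the cell rather than as a one-line "summary".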


In [2]:
# Change this to the page you want to summarize
target_url = "https://en.wikipedia.org/wiki/Machine_learning"
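
A long article like this one can easily exceed what a small local model handles well, and the `lines[:1000]` cap in `extract_main_text_from_html` limits line count rather than total size. As a rough sketch of an alternative, the hypothetical helper below keeps whole lines until a character budget is reached. The name `truncate_to_budget` and the default of 8000 characters are assumptions chosen for illustration, not values from the notebook.

```python
def truncate_to_budget(lines, max_chars=8000):
    """Keep whole lines, in order, until adding the next one would exceed max_chars.

    Hypothetical helper; max_chars is an arbitrary illustrative budget.
    """
    kept = []
    used = 0
    for line in lines:
        cost = len(line) + 1  # +1 accounts for the newline that joins lines
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


sample = ["Machine learning", "is a field of study", "x" * 50]
print(truncate_to_budget(sample, max_chars=40))
```

With the sample above, the first two lines fit inside the 40-character budget and the 50-character line is dropped.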

In [4]:
summary = summarize_website_selenium(target_url)
display(Markdown(summary))

Here's an overview of machine learning, with key concepts, techniques, and applications:

**What is Machine Learning?**

Machine learning is a subset of artificial intelligence that involves training algorithms to learn from data and make predictions or decisions without being explicitly programmed.

**Types of Machine Learning**

1. **Supervised Learning**: The algorithm learns from labeled data and makes predictions on new, unseen data.
2. **Unsupervised Learning**: The algorithm identifies patterns and structure in unlabeled data.
3. **Semi-Supervised Learning**: A combination of supervised and unsupervised learning, where some data is labeled and some is not.

**Machine Learning Algorithms**

1. **Classification**: Predicting a categorical label for a new instance (e.g., spam vs. non-spam email).
2. **Regression**: Predicting a continuous value for a new instance (e.g., predicting house prices based on features like number of rooms and square footage).
3. **Clustering**: Grouping similar instances into clusters.
4. **Dimensionality Reduction**: Reducing the number of features in a dataset.

**Reinforcement Learning**

1. **Markov Decision Process (MDP)**: An environment with a set of states, actions, and rewards.
2. **Agent**: An entity that interacts with the environment to maximize rewards.

**Applications of Machine Learning**

1. **Image Classification**: Classifying images into categories (e.g., self-driving cars).
2. **Natural Language Processing (NLP)**: Analyzing and generating human language (e.g., chatbots, text summarization).
3. **Recommendation Systems**: Suggesting products or services based on user behavior.
4. **Speech Recognition**: Transcribing spoken words into text.

**Key Concepts**

1. **Bias-Variance Tradeoff**: Balancing the tradeoff between model simplicity and accuracy.
2. **Overfitting**: When a model is too complex and fits the training data but fails to generalize to new data.
3. **Underfitting**: When a model is too simple and fails to capture important patterns in the data.

**Computational Learning Theory**

1. **Probably Approximately Correct (PAC) Learning**: A framework for analyzing the performance of machine learning algorithms.
2. **Bias-Variance Decomposition**: Analyzing the tradeoff between bias and variance in a model.

I hope this provides a good overview of machine learning!