# LLMs and Agents in Production: Day 5 - Building a Website Summarizer with Python, Ollama, and Markdown Streaming

Language models become far more useful when you give them real-world data, such as the content of a web page. In this hands-on tutorial, we'll build a modular website summarizer using:
- 🧼 BeautifulSoup for web scraping
- 🦙 Ollama for running open-source LLMs locally
- 🔌 the `openai` Python client to call Ollama's OpenAI-compatible API
- 📜 `IPython.display` to stream the summary live as rendered Markdown

This project combines web scraping, local LLM inference, and interactive display, which makes it a handy building block for research, automation, or data-mining workflows.

## Install packages: requests, beautifulsoup4, openai, IPython

In [1]:
# !pip install requests beautifulsoup4 openai ipython

You also need to install and run Ollama:

In [None]:
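# Install Ollama on Linux (https://ollama.com). The script calls sudo, so it is
# easier to run this line in a terminal than in a notebook cell.
!curl -fsSL https://ollama.com/install.sh | sh

# Download an LLM (e.g., Llama 3.2 with only 1 billion parameters).
# `ollama pull` only downloads the model; `ollama run` would open an interactive chat that blocks the cell.
# The Ollama server must be running (the Linux installer usually sets it up as a service; otherwise run `ollama serve`).
!ollama pull llama3.2:1b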

## Step 1: Website Class - Scrape and Clean

In [2]:
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [3]:
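# A browser-like User-Agent: some sites reject the default `requests` agent
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    )
}


class Website:
    """Fetches a web page and keeps its title and visible body text."""

    def __init__(self, url: str):
        self.url = url
        self.title, self.text = self._scrape()

    def _scrape(self):
        response = requests.get(self.url, headers=HEADERS, timeout=15)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of summarizing an error page
        soup = BeautifulSoup(response.content, "html.parser")
        title = soup.title.string if soup.title and soup.title.string else "No title"
        body = soup.body or soup  # some pages have no <body>; fall back to the whole document
        # Remove tags that carry no readable text
        for tag in body(["script", "style", "img", "input"]):
            tag.decompose()
        text = body.get_text(separator="\n", strip=True)
        return title, text

    def as_prompt(self) -> str:
        return (
            f"You are looking at a website titled '{self.title}'.\n"
            "Summarize its main content in markdown. Ignore menus, ads, or navigation.\n\n"
            f"{self.text}"
        )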
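To see what `_scrape` keeps and drops without hitting the network, here is a minimal sketch that applies the same BeautifulSoup cleaning to an inline HTML string. `extract_title_and_text` and the sample page are hypothetical, made up for illustration, and the fallback to the whole document when `<body>` is missing is an assumption rather than part of the original class.

```python
from bs4 import BeautifulSoup


def extract_title_and_text(html: str) -> tuple[str, str]:
    """Hypothetical offline version of Website._scrape: clean an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title and soup.title.string else "No title"
    body = soup.body or soup  # assumption: fall back to the whole document
    # Drop elements whose content is code, styling, or form widgets
    for tag in body(["script", "style", "img", "input"]):
        tag.decompose()
    # One line per text fragment; whitespace-only fragments are skipped
    text = body.get_text(separator="\n", strip=True)
    return title, text


sample_html = """
<html><head><title>Demo page</title></head>
<body>
  <style>p { color: red; }</style>
  <h1>Hello</h1>
  <script>console.log("hidden");</script>
  <p>Visible paragraph.</p>
  <input value="ignored">
</body></html>
"""

title, text = extract_title_and_text(sample_html)
print(title)
print(text)
```

Only the heading and paragraph text survive; the CSS rule and the script body never reach the prompt.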

In [None]:
site = Website("https://medium.com/@ebimsv/python-for-ai-week-11-working-with-files-in-python-92870e1a410a")
print(site.title)
print(site.text[:500])  # preview the first 500 characters of the cleaned text

## Step 2: LLM Setup with Ollama API

In [4]:
from openai import OpenAI

OLLAMA_CLIENT = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
MODEL = "llama3.2:1b"

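- No OpenAI API key is needed: Ollama runs locally. The OpenAI client still requires an `api_key` argument, so `"ollama"` is only a placeholder that Ollama ignores.
- You can set `MODEL` to any other model you have pulled with Ollama (for example with `ollama pull <model>`).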
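Before streaming, it can help to confirm that the Ollama server is reachable and that the model has been pulled. A minimal sketch, assuming Ollama's native REST endpoint `GET /api/tags` on the default port 11434, which lists local models under `models[*].name` (names include the tag, e.g. `llama3.2:1b`). `pulled_models` and `model_is_available` are hypothetical helpers.

```python
import requests

OLLAMA_BASE_URL = "http://localhost:11434"  # assumption: default Ollama address


def pulled_models(tags: dict) -> list[str]:
    """Return the model names listed in an /api/tags response body."""
    return [m["name"] for m in tags.get("models", [])]


def model_is_available(model: str, base_url: str = OLLAMA_BASE_URL) -> bool:
    """True if the server answers and `model` (e.g. "llama3.2:1b") has been pulled."""
    try:
        resp = requests.get(f"{base_url}/api/tags", timeout=5)
        resp.raise_for_status()
    except requests.RequestException:
        return False  # server not running, unreachable, or returned an error status
    return model in pulled_models(resp.json())
```

If `model_is_available(MODEL)` returns `False`, either the server is not running (`ollama serve`) or the model still needs `ollama pull llama3.2:1b`.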

## Step 3: Build the Prompt

In [5]:
SYSTEM_PROMPT = (
    "You are an assistant that analyzes a website's content "
    "and provides a markdown summary. Ignore ads and navigation."
)

def build_messages(website: Website):
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": website.as_prompt()}
    ]
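Long pages can exceed the context window a small model like `llama3.2:1b` is run with, in which case part of the page may never be seen. One option, sketched here as an assumption rather than part of the original pipeline, is to cap the page text before building the messages. `build_messages_from_text` and the 12,000-character default are hypothetical, and characters are only a rough proxy for tokens.

```python
SYSTEM_PROMPT = (
    "You are an assistant that analyzes a website's content "
    "and provides a markdown summary. Ignore ads and navigation."
)


def build_messages_from_text(title: str, text: str, max_chars: int = 12_000) -> list[dict]:
    """Hypothetical variant of build_messages that caps the page text length."""
    if len(text) > max_chars:
        # Keep the beginning of the page and mark the cut so the model knows text is missing
        text = text[:max_chars] + "\n\n[... page text truncated ...]"
    user_prompt = (
        f"You are looking at a website titled '{title}'.\n"
        "Summarize its main content in markdown. Ignore menus, ads, or navigation.\n\n"
        f"{text}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```

With a `Website` instance this would be called as `build_messages_from_text(site.title, site.text)`.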

## Step 4: Stream Markdown Output in Jupyter

In [6]:
from IPython.display import Markdown, display, update_display

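def summarize_website(url: str) -> str:
    site = Website(url)
    messages = build_messages(site)
    stream = OLLAMA_CLIENT.chat.completions.create(
        model=MODEL,
        messages=messages,
        stream=True,
    )
    response = ""
    cleaned = ""  # defined up front so the return works even if the stream is empty
    # Show an empty Markdown output and keep its handle so it can be redrawn in place
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content or ""
        response += content
        cleaned = response.strip()
        # Small models sometimes wrap the whole answer in ```markdown ... ```.
        # Strip only that outer wrapper so code blocks inside the summary keep their fences.
        if cleaned.startswith("```markdown"):
            cleaned = cleaned.removeprefix("```markdown").removesuffix("```").strip()
        update_display(Markdown(cleaned), display_id=display_handle.display_id)
    # Return plain text: returning Markdown(...) as a cell's last value would render the summary twice
    return cleaned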
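Outside Jupyter, `IPython.display` has nothing to render into. Here is a minimal sketch of the same streaming loop for a plain terminal: it writes each delta as it arrives and returns the full text. It assumes the chunk shape used above (`chunk.choices[0].delta.content`); `stream_to_terminal` is a hypothetical helper.

```python
import sys
from typing import Iterable, Optional, TextIO


def stream_to_terminal(stream: Iterable, out: Optional[TextIO] = None) -> str:
    """Write streamed chat-completion deltas as they arrive; return the full text."""
    if out is None:
        out = sys.stdout  # resolved at call time so a redirected stdout is respected
    parts = []
    for chunk in stream:
        if not chunk.choices:  # defensive: skip chunks that carry no choice
            continue
        content = chunk.choices[0].delta.content or ""  # content can be None on the last chunk
        out.write(content)
        out.flush()  # show each piece immediately instead of waiting for the buffer
        parts.append(content)
    out.write("\n")
    return "".join(parts)
```

Usage with the client above: `stream_to_terminal(OLLAMA_CLIENT.chat.completions.create(model=MODEL, messages=build_messages(Website(url)), stream=True))`.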

## Example Usage

In [8]:
# Assign the result so Jupyter does not display the summary a second time
summary = summarize_website("https://medium.com/@ebimsv/python-for-ai-week-11-working-with-files-in-python-92870e1a410a")

**File Handling in Python for AI Projects**
=====================================================

**Introduction**
---------------

File handling is a fundamental skill in AI development, and it's essential to understand the basics of working with different file formats such as `.txt`, `.csv`, `.json`, `.pickle`, and `.yaml`. In this article, we'll focus on using Python's built-in modules for file operations.

**Common File Modes**
---------------------

Python uses several file modes to open files:

*   `'r'`: Read (default)
*   `'w'`: Write
*   `'a'`: Append
*   `'b'`: Binary mode (Combine with other modes for binary files)
*   `'x'`: Exclusive creation (Create a file, fail if it exists)
*   `'+'`: Read and write combined

**Working with Text Files (.txt)**
----------------------------------

Text files are used for storing logs, plain data, or simple structured information.

*   Example: Reading stopwords from a file:
import openpyxl as opx

with open('stopwords.txt', 'r') as file:
    stopwords = [line.strip() for line in file]

print("Stopwords:", stopwords)


**Working with CSV Files (.csv)**
------------------------------

CSV (Comma-Separated Values) is the most common format for storing tabular datasets.

*   Example: Loading a dataset from a CSV file:
import csv

with open('iris.csv', 'r') as file:
    reader = csv.reader(file)
    header = next(reader)
    data = [row for row in reader]

print("Header:", header)
print("First Row:", data[0])


**Working with JSON Files (.json)**
----------------------------------

JSON (JavaScript Object Notation) is widely used for structured data and configurations.

*   Example: Loading model parameters from a JSON file:
import json

with open('config.json', 'r') as file:
    config = json.load(file)

print("Config:", config)


**Working with Pickle Files (.pickle)**
-----------------------------------------

Pickle is Python's built-in module for serializing and deserializing objects.

*   Example: Saving a trained model using pickle:
import pickle

from sklearn.linear_model import LogisticRegression

# Dummy model
model = LogisticRegression()

with open('model.pkl', 'wb') as file:
    pickle.dump(model, file)


**Working with YAML Files (.yaml)**
-----------------------------------

YAML (Yang's Notation) is human-readable and often used for configuration files in ML experiments.

*   Example: Loading experiment config from a YAML file:
import yaml

with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)

print("Model:", config['model'])
print("Training Config:", config['training'])


**Best Practices for Safe File Handling**
-----------------------------------------------

*   Always use `with open(...)`: Automatically closes files, even if exceptions occur.
*   Handle exceptions (like `FileNotFoundError`, `PermissionError`): Make your code robust.
*   Check file existence using `os.path.exists()`.
*   Keep config/data separate from source code for clean project organization.

**Summary**
------------

File handling is an essential skill in AI development. This article covers common file modes, working with text files, CSV files, JSON files, pickle files, and YAML files using Python's built-in modules. By following best practices for safe file handling, you can ensure reliable performance with real-world data.
