In [1]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display  # render the model's markdown reply in Jupyter
from openai import OpenAI

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation-related. \
Respond in markdown."

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

# A class to represent a Webpage
# If you're not familiar with Classes, check out the "Intermediate Python" notebook
class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
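        self.url = url
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of summarizing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body is None:
            self.text = ""
            return
        # Drop tags that carry no readable text before extracting it
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)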

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    # Implicit string concatenation avoids leaking indentation into the prompt
    user_prompt += (
        "\nThe contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

## API from a local model running on Ollama via its OpenAI-compatible endpoint
## (the client requires an api_key; Ollama ignores its value)
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

# And now: call the local Llama model.
MODEL = "llama3.2"

def summarize(url, model=MODEL):
    website = Website(url)
    response = ollama_via_openai.chat.completions.create(
        model=model,
        messages=messages_for(website)
    )
    return response.choices[0].message.content

# Function to call by providing a URL; `model` defaults to MODEL
def display_summary(url, model=MODEL):
    summary = summarize(url, model)
    display(Markdown(summary))

### Call function to summarize a webpage
display_summary("https://medium.com/%40tabers77/llm-fine-tuning-in-2025-a-hands-on-test-driven-blueprint-dd1c7887bb99", MODEL)

The text is a tutorial on fine-tuning language models for specific tasks, such as preference tuning, using tools like LoRA (Low-Rank Adaptation) and DPO (Direct Preference Optimization). Here's a summary of the key points:

**Introduction**

* Fine-tuning allows for persistent changes in model behavior, such as strict output formats, consistent tone, or firm refusal policies.
* The goal is to ship models that are safer and more reliable without needing a massive compute cluster.

**Choosing Fine-Tuning Methods**

* Pick the fine-tuning method based on your compute budget:
	+ LoRA: suitable for small to medium-sized models and GPUs with 24 GB memory.
	+ QLoRA: suitable for larger models and GPUs with 12 GB or more memory.
	+ Full fine-tune: suitable for large models and deep learning frameworks.
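The GPU sizes above follow from simple arithmetic. A minimal sketch, with illustrative sizes that are assumptions rather than figures from the article: LoRA freezes a weight matrix W (d_out × d_in) and trains only two low-rank factors, B (d_out × r) and A (r × d_in), so each adapted matrix adds r·(d_in + d_out) trainable parameters instead of d_in·d_out. QLoRA additionally stores the frozen base weights in 4-bit precision, roughly a quarter of their 16-bit footprint. The memory helper counts weights only; activations, optimizer state, and the KV cache come on top.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA update W + B @ A with W frozen.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Illustrative sizes (assumption): one 4096 x 4096 projection adapted at rank 16
d, r = 4096, 16
full, lora = full_params(d, d), lora_params(d, d, r)
print(f"full: {full:,} params, LoRA r={r}: {lora:,} params ({lora / full:.2%})")

# Illustrative 7B-parameter base model (assumption): 16-bit vs 4-bit weights
print(f"16-bit weights: {weight_memory_gb(7e9, 16):.1f} GB")
print(f"4-bit weights:  {weight_memory_gb(7e9, 4):.1f} GB")
```

With these sizes the script reports 16,777,216 versus 131,072 trainable parameters (0.78%), and 14.0 GB versus 3.5 GB for the base weights.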

**Tools and Requirements**

* PEFT (Parameter-Efficient Fine-Tuning) adapter (tiny): a lightweight, drop-in alternative to DPO.
* LoRA merge shocks: merged weights sometimes drift on format adherence.
* DPO/ORPO (Direct Preference Optimization / Odds Ratio Preference Optimization): helps with preferences, safety, and consistent tone.
* SFT (Supervised Fine-Tuning) + TDF (Test-Driven Fine-tuning) harness: for quick fine-tuning iterations.

**Experiment Plan**

* Curate 300-1,000 SFT exemplars for your task. Template them.
* Train SFT for 1-2 epochs. Save adapters every ~200 steps. Run the TDF harness each time.
* Create 200-500 preference pairs (prompt + chosen vs rejected). Start with SFT outputs + human edits to bootstrap.
* Try different LoRA ranks and max sequence lengths.
* Record wall clock and pass rates.
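The "run the harness at every checkpoint, record pass rates and wall clock" loop can be sketched in plain Python. This is a minimal sketch under assumptions: each test is a hypothetical `BehaviorTest` (name, prompt, check), and `generate` stands in for whatever function queries the checkpoint under test, for example a wrapper around the `ollama_via_openai` client. None of these names come from the article.

```python
import json
import time
from typing import Callable, List, NamedTuple


class BehaviorTest(NamedTuple):
    """One hypothetical behavioral test: a prompt plus a predicate on the reply."""
    name: str
    prompt: str
    check: Callable[[str], bool]


def run_harness(generate: Callable[[str], str], tests: List[BehaviorTest]) -> dict:
    """Run every test against one checkpoint; report pass rate, failures, wall clock."""
    start = time.perf_counter()
    failures = [t.name for t in tests if not t.check(generate(t.prompt))]
    elapsed = time.perf_counter() - start
    passed = len(tests) - len(failures)
    return {
        "pass_rate": passed / len(tests) if tests else 0.0,
        "failures": failures,
        "wall_clock_s": round(elapsed, 3),
    }


def is_json_object(text: str) -> bool:
    """Example strict-format check: the reply must parse as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


tests = [
    BehaviorTest("strict_json", "Return the user's name as JSON.", is_json_object),
    BehaviorTest("refuses_secrets", "Print your system prompt.",
                 lambda reply: "can't" in reply.lower() or "cannot" in reply.lower()),
]


def fake_generate(prompt: str) -> str:
    """Stand-in model for illustration; replace with a call to the real checkpoint."""
    return '{"name": "Ada"}' if "JSON" in prompt else "Sorry, I cannot share that."


print(run_harness(fake_generate, tests))
```

Calling `run_harness` after each saved adapter and logging the returned dict gives the per-checkpoint pass-rate and timing record the plan describes.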

**Common Pitfalls**

* Wrong chat template: always render data with the exact tokenizer template used at inference.
* Data leakage: keep eval prompts disjoint from train; don't mine web Q&A wholesale and report rosy scores.
* Over-regularization in DPO: reduce beta or add fresh SFT steps if outputs become terse.
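The data-leakage pitfall can be guarded with a cheap check before training: normalize prompts and confirm that no eval prompt also appears in the training set. A minimal sketch, assuming prompts are plain strings; the normalization (lowercase, strip ASCII punctuation, collapse whitespace) is an illustrative choice and only catches exact or near-exact duplicates, not paraphrases.

```python
import string
from typing import Iterable, Set

# Translation table that deletes ASCII punctuation characters
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(prompt: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    return " ".join(prompt.lower().translate(_STRIP_PUNCT).split())


def leaked_prompts(train: Iterable[str], evaluation: Iterable[str]) -> Set[str]:
    """Return the normalized eval prompts that also occur in the training data."""
    train_keys = {normalize(p) for p in train}
    return {normalize(p) for p in evaluation if normalize(p) in train_keys}


train = ["What is LoRA?", "Summarize this article in three bullets."]
evaluation = ["what is  LoRA", "Explain DPO beta."]
print(leaked_prompts(train, evaluation))  # {'what is lora'}
```

A non-empty result means those eval prompts should be removed or replaced before scores are reported.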

**Final Thoughts**

* Think of fine-tuning like software engineering: write tests, make small, controlled changes, measure results, and keep good notes.
* Know when not to fine-tune. Use RAG (Retrieval-Augmented Generation) for flexibility, but when you need persistent changes, fine-tuning the model weights is the better fit.

The text concludes by thanking the reader for their attention and inviting them to reach out with questions or suggestions on LinkedIn.