# Module 1: Course Introduction & Local Setup
## Lesson 6: Building a Website Summarizer

### 📄 Overview
In this lesson, we combine the concepts from the previous lessons (web scraping, system prompting, and API calls) to build a working application. We create a reusable pipeline that takes a URL, extracts its text, and uses GPT-4o-mini to generate a summary in a specific tone (e.g., "snarky" or "professional").

### 🗝️ Key Concepts
* **Prompt Construction Functions**: Instead of hardcoding prompts, we write a Python function (`messages_for`) that builds the list of chat messages from the input data.
* **Tone Engineering**: Modifying the `system` message to drastically change the output style without changing the underlying data.
* **Server-Side Scraping**: The lesson uses a basic approach: `requests` to download the page and `BeautifulSoup` to strip the HTML tags.
    * *Limitation:* This only works for static HTML. It fails on Single Page Applications (SPAs), such as React/Vue sites, whose content only appears after JavaScript runs in a browser.
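
The static-HTML limitation can be checked before summarizing. The sketch below is one possible heuristic, not part of the lesson's code: it flags a page as probably JavaScript-rendered when the HTML contains `<script>` tags but almost no visible text. The names `looks_js_rendered` and `min_visible_chars`, and the threshold of 200 characters, are assumptions chosen for illustration. It uses only the standard library so it runs without extra installs.

```python
from html.parser import HTMLParser


class _VisibleTextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.visible_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would actually see
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_js_rendered(html, min_visible_chars=200):
    """Hypothetical heuristic: scripts are present but visible text is nearly absent."""
    counter = _VisibleTextCounter()
    counter.feed(html)
    return counter.script_tags > 0 and counter.visible_chars < min_visible_chars
```

If the check returns `True`, the scraped text is likely an empty shell, and a browser-based tool such as Selenium or Playwright would be needed to get the real content.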

### 🛠️ Technical Implementation: The Pipeline
The architecture consists of three stages:
1.  **Fetch**: Get raw text from URL.
2.  **Construct**: Format text into User/System messages.
3.  **Inference**: Send to OpenAI.
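
The three stages above can be sketched as a plain function composition. This is an illustrative skeleton, not the lesson's implementation: `run_pipeline` and the `fake_*` stand-ins are hypothetical names, used here so the flow can be exercised offline without a network connection or an API key.

```python
def run_pipeline(url, fetch, construct, infer):
    """Run the three stages in order; each stage is any callable."""
    text = fetch(url)            # Stage 1 (Fetch): URL -> raw text
    messages = construct(text)   # Stage 2 (Construct): text -> chat messages
    return infer(messages)       # Stage 3 (Inference): messages -> summary


# Stand-in stages for offline experimentation (assumptions, not real scraping or API calls)
def fake_fetch(url):
    return f"Text scraped from {url}"


def fake_construct(text):
    return [
        {"role": "system", "content": "Summarize the website."},
        {"role": "user", "content": text},
    ]


def fake_infer(messages):
    return f"Summary of: {messages[-1]['content']}"


result = run_pipeline("https://example.com", fake_fetch, fake_construct, fake_infer)
print(result)  # Summary of: Text scraped from https://example.com
```

Keeping each stage behind its own callable makes it possible to swap one out later, for example replacing the scraper with a browser-based one, without touching the other two.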

```python
import os

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from openai import OpenAI

# Read OPENAI_API_KEY from a local .env file
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# --- Helper 1: The Scraper ---
def fetch_website_contents(url):
    """
    A basic scraper that fetches HTML and strips tags.
    Note: Fails on JS-heavy sites (use Selenium/Playwright for those).
    Raises requests.RequestException on network or HTTP errors.
    """
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Treat 4xx/5xx responses as errors
    soup = BeautifulSoup(response.content, "html.parser")
    # Drop tags whose contents are never visible text
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Get text and clean up whitespace
    text = soup.get_text(separator=" ", strip=True)
    # Truncate to 10,000 characters (a rough proxy for a token budget)
    # to avoid blowing up the context window
    return text[:10000]

# --- Helper 2: The Prompt Builder ---
def messages_for(website_text, tone="professional"):
    """
    Dynamically builds the prompt based on the desired tone.
    Unrecognized tones fall back to the default professional prompt.
    """
    system_prompt = (
        "You are an assistant that analyzes the contents of a website and provides a short summary. "
        "Ignore navigation text or cookies/ads."
    )

    if tone == "snarky":
        system_prompt = (
            "You are a snarky, sarcastic assistant. "
            "Roast the website while summarizing it. Make fun of their marketing buzzwords."
        )
    elif tone == "pirate":
        system_prompt = "You are a pirate captain. Summarize this in sea-speak."

    user_prompt = f"Here is the website text:\n\n{website_text}"

    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]

# --- Main Function ---
def summarize(url, tone="professional"):
    print(f"🌍 Fetching {url}...")
    try:
        text = fetch_website_contents(url)
    except requests.RequestException as e:
        # Stop here so the error message is never sent to the model as "website text"
        return f"Error fetching {url}: {e}"

    print(f"🤖 Summarizing (Tone: {tone})...")
    messages = messages_for(text, tone)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )

    return response.choices[0].message.content

# --- Execution ---
url_to_test = "https://example.com" # Replace with a real news site
print(summarize(url_to_test, tone="snarky"))
```
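
As the number of tones grows, the `if`/`elif` chain in `messages_for` can also be written as a lookup table. The following is a minimal sketch of that alternative, reusing the prompts from the code above. The names `TONE_PROMPTS` and `messages_for_tone` are hypothetical, and raising on an unknown tone is a design choice made here, whereas the original function silently falls back to the professional prompt.

```python
# Hypothetical lookup table: one system prompt per supported tone
TONE_PROMPTS = {
    "professional": (
        "You are an assistant that analyzes the contents of a website "
        "and provides a short summary. Ignore navigation text or cookies/ads."
    ),
    "snarky": (
        "You are a snarky, sarcastic assistant. "
        "Roast the website while summarizing it. Make fun of their marketing buzzwords."
    ),
    "pirate": "You are a pirate captain. Summarize this in sea-speak.",
}


def messages_for_tone(website_text, tone="professional"):
    """Build the chat messages, failing loudly on a tone that is not in the table."""
    if tone not in TONE_PROMPTS:
        raise ValueError(f"Unknown tone {tone!r}; expected one of {sorted(TONE_PROMPTS)}")
    return [
        {"role": "system", "content": TONE_PROMPTS[tone]},
        {"role": "user", "content": f"Here is the website text:\n\n{website_text}"},
    ]
```

Adding a tone then means adding one dictionary entry, and a typo such as `tone="snraky"` raises an error instead of quietly producing a professional summary.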

### 🧪 Lab Notes & Engineering Log

#### Experiment 1: Tone Consistency
**Objective:** See if the "Snarky" persona survives long documents.
**Test:**
* I fed it a serious financial report.
* **Result:** It started snarky but drifted back to a serious tone by the end.
* **Insight:** For long contexts, you often need to reiterate the persona at the *end* of the prompt as well: *"Remember to stay snarky!"*
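
The insight above can be sketched as a small variation on the prompt builder that repeats the persona after the website text. This is an illustration under assumptions, not the lesson's code: `messages_with_reminder` and its parameters are hypothetical names, and whether the reminder actually prevents drift for a given document would need to be checked by experiment.

```python
def messages_with_reminder(website_text, system_prompt, reminder):
    """Build chat messages with the persona reminder placed after the long text."""
    user_prompt = (
        f"Here is the website text:\n\n{website_text}\n\n"
        f"{reminder}"  # Last thing the model reads before it answers
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = messages_with_reminder(
    website_text="Quarterly revenue grew 4% year over year.",
    system_prompt="You are a snarky, sarcastic assistant.",
    reminder="Remember to stay snarky!",
)
```

The returned list has the same shape as the output of `messages_for`, so it can be passed to the same chat completion call.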
