# AI-powered Brochure Generator
---
- 🌍 Task: Generate a company brochure from its name and website, aimed at clients, investors, and recruits.
- 🧠 Model: Toggle `USE_OPENAI` to switch between OpenAI and Ollama models.
- 🕵️‍♂️ Data Extraction: Scrape website content and filter key links (About, Products, Careers, Contact).
- 📌 Output Format: A Markdown-formatted brochure streamed in real time.
- 🚀 Tools: BeautifulSoup, OpenAI API, Ollama, and IPython display.
- 🧑‍💻 Skill Level: Intermediate.

🛠️ Requirements
- ⚙️ Hardware: ✅ CPU is sufficient — no GPU required
- 🔑 OpenAI API Key 
- Ollama installed, with `llama3.2:3b` (or another lightweight model) pulled
---
📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)

## 🧩 System Design Overview

### Class Structure

![](https://github.com/lisekarimi/lexo/blob/main/assets/02_brochure_class_diagram.png?raw=true)

This code consists of three main classes:

1. **`Website`**:  
   - Scrapes and processes webpage content.  
   - Extracts **text** and **links** from a given URL.  

2. **`LLMClient`**:  
   - Handles interactions with **OpenAI or Ollama (`llama3`, `deepseek`, `qwen`)**.  
   - Uses `get_relevant_links()` to filter webpage links.  
   - Uses `generate_brochure()` to create and stream a Markdown-formatted brochure.  

3. **`BrochureGenerator`**:  
   - Uses `Website` to scrape the main webpage and relevant links.  
   - Uses `LLMClient` to filter relevant links and generate a brochure.  
   - Calls `generate()` to run the entire process.

### Workflow

1. **`main()`** initializes `BrochureGenerator` and calls `generate()`.  
2. **On initialization, `Website` scrapes the main page**, extracting **text and links** from the given URL.  
3. **`generate()`** calls **`LLMClient.get_relevant_links()`** to select the relevant links using the **LLM (OpenAI/Ollama)**.  
4. **Each relevant link is scraped** using `Website` to collect additional content.  
5. **All collected content is passed to `LLMClient.generate_brochure()`**.  
6. **`LLMClient` streams the generated brochure** using **OpenAI or Ollama**.  
7. **The brochure is displayed in Markdown format** as it streams.

![](https://github.com/lisekarimi/lexo/blob/main/assets/02_brochure_process.png?raw=true)


### Intermediate reasoning

In this workflow, we have intermediate reasoning because the LLM is called twice:

1. **First LLM call**: Takes raw links → filters/selects relevant ones (reasoning step).
2. **Second LLM call**: Takes selected content → generates final brochure.

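🧠 **LLM output becomes LLM input**: the links selected by the first call determine which pages are scraped, and that scraped content is what the second call receives. That's intermediate reasoning.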

![](https://github.com/lisekarimi/lexo/blob/main/assets/02_llm_intermd_reasoning.png?raw=true)
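
The two-call pattern can be sketched without any network access. The snippet below is a minimal illustration, assuming hypothetical stand-ins (`fake_llm_select`, `fake_scrape`, `fake_llm_write`) in place of the real LLM calls and scraper. None of these names exist in the notebook; only the data flow mirrors the workflow described above.

```python
import json

def fake_llm_select(links):
    """Hypothetical stand-in for the first LLM call: returns JSON text listing relevant links."""
    keep = [url for url in links if "about" in url or "careers" in url]
    return json.dumps({"links": [{"type": "page", "url": url} for url in keep]})

def fake_scrape(url):
    """Hypothetical stand-in for scraping: returns placeholder text for a URL."""
    return f"Contents of {url}"

def fake_llm_write(company, content):
    """Hypothetical stand-in for the second LLM call: returns a Markdown brochure."""
    return f"# {company}\n\n{content}"

def run_chain(company, links):
    # Step 1: the first call's text output is parsed into structured data
    selected = json.loads(fake_llm_select(links))["links"]
    # Intermediate step: the selection decides which pages get scraped
    content = "\n".join(fake_scrape(item["url"]) for item in selected)
    # Step 2: the scraped content becomes the second call's input
    return fake_llm_write(company, content)

brochure = run_chain("Acme", [
    "https://acme.example/about",
    "https://acme.example/login",
    "https://acme.example/careers",
])
print(brochure)
```

Because the first call's output is parsed and then used to choose the pages, a bad selection in step 1 directly limits what step 2 can write about.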

## 📦 Import Libraries

In [None]:
import os
import requests
import json
import ollama
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import display, Markdown, update_display
from openai import OpenAI

## 🧠 Define the Model

The user can switch between OpenAI and Ollama by changing a single variable (`USE_OPENAI`). The model selection is dynamic.

In [None]:
# Define the model dynamically
USE_OPENAI = True  # True to use OpenAI, False to use Ollama
MODEL = 'gpt-4o-mini' if USE_OPENAI else 'llama3.2:3b'

# Load the API key; validate it only when the OpenAI backend is selected,
# so that Ollama users are not forced to provide one
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
if USE_OPENAI and (not api_key or not api_key.startswith('sk-')):
    raise ValueError("Invalid OpenAI API key. Check your .env file.")

openai_client = OpenAI(api_key=api_key) if USE_OPENAI else None

## 🏗️ Define Classes

In [None]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to scrape and process website content.
    """
    def __init__(self, url):
        self.url = url
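        # A timeout keeps an unresponsive site from hanging the notebook
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string is None when the <title> tag is empty or contains nested tags
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"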
        self.text = self.extract_text(soup)
        self.links = self.extract_links(soup)

    def extract_text(self, soup):
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            return soup.body.get_text(separator="\n", strip=True)
        return ""

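    def extract_links(self, soup):
        links = [link.get('href') for link in soup.find_all('a')]
        # Keep absolute http(s) URLs only; a substring test on 'http' would also
        # let through relative paths that merely contain that text
        return [link for link in links if link and link.startswith(('http://', 'https://'))]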

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
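
The `Website` class keeps only hrefs that contain `http`, so relative links such as `/about` are discarded before the LLM ever sees them. One possible alternative is to resolve every href against the page URL first. The sketch below uses only the standard library; `normalize_links` is a hypothetical helper, not part of the notebook, and the example URLs are made up.

```python
from urllib.parse import urljoin, urlparse

def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep unique http(s) URLs, in order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # skip missing or empty href attributes
        absolute = urljoin(base_url, href.strip())
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # drops mailto:, tel:, javascript: and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

links = normalize_links(
    "https://company.example/fr/",
    ["/about", "careers", "mailto:hi@company.example",
     "https://company.example/about", None],
)
print(links)
```

In this example the relative paths are resolved, the `mailto:` entry is dropped, and the duplicate `/about` URL is kept only once.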

In [None]:
class LLMClient:
    def __init__(self, model=MODEL):
        self.model = model

    def get_relevant_links(self, website):
        link_system_prompt = """
        You are given a list of links from a company website.
        Select only relevant links for a brochure (About, Company, Careers, Products, Contact).
        Exclude login, terms, privacy, and emails.

        ### **Instructions**
        - Return **only valid JSON**.
        - **Do not** include explanations, comments, or Markdown.
        - Example output:
        {
            "links": [
                {"type": "about", "url": "https://company.com/about"},
                {"type": "contact", "url": "https://company.com/contact"},
                {"type": "product", "url": "https://company.com/products"}
            ]
        }
        """

        user_prompt = f"""
        Here is the list of links on the website of {website.url}:
        Please identify the relevant web links for a company brochure. Respond in JSON format.
        Do not include login, terms of service, privacy, or email links.
        Links (some might be relative links):
        {', '.join(website.links)}
        """

        messages = [
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": user_prompt}
        ]

        if USE_OPENAI:
            response = openai_client.chat.completions.create(
                model=self.model,
                messages=messages,
                response_format={"type": "json_object"}  # Ask the API for a JSON object
            )
            result = (response.choices[0].message.content or "").strip()
        else:
            response = ollama.chat(model=self.model, messages=messages)
            result = response.get("message", {}).get("content", "").strip()

        # Either backend can return malformed JSON, so parse defensively
        try:
            return json.loads(result)
        except json.JSONDecodeError:
            print("Error: Response is not valid JSON")
            return {"links": []}  # Fall back to no links if parsing fails


    def generate_brochure(self, company_name, content, language):
        system_prompt = """
        You are a professional translator and writer who creates fun and engaging brochures.
        Your task is to read content from a company’s website and write a short, humorous, joky,
        and entertaining brochure for potential customers, investors, and job seekers.
        Include details about the company’s culture, customers, and career opportunities if available.
        Respond in Markdown format.
        """

        user_prompt = f"""
        Create a fun brochure for '{company_name}' using the following content:
        {content[:5000]}
        Respond in {language} only, and format your response correctly in Markdown.
        Do NOT escape characters or return extra backslashes.
        """

        if USE_OPENAI:
            response_stream = openai_client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ],
                stream=True
            )
            response = ""
            display_handle = display(Markdown(""), display_id=True)
            for chunk in response_stream:
                response += chunk.choices[0].delta.content or ''
                # Strip code fences only; replacing every "markdown" would also
                # delete that word from the brochure text itself
                cleaned = response.replace("```markdown", "").replace("```", "")
                update_display(Markdown(cleaned), display_id=display_handle.display_id)
        else:
            response_stream = ollama.chat(
                model=self.model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ],
                stream=True
            )
            display_handle = display(Markdown(""), display_id=True)
            full_text = ""
            for chunk in response_stream:
                if "message" in chunk:
                    # Use a distinct name so the `content` parameter is not shadowed
                    piece = chunk["message"]["content"] or ""
                    full_text += piece
                    update_display(Markdown(full_text), display_id=display_handle.display_id)
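
Although the link-selection prompt asks for JSON only, smaller models in particular may wrap the object in a code fence or add a sentence around it. A more tolerant parser is one way to cope with that. The sketch below is an assumption about how this could be handled, not the notebook's method; `parse_links_reply` is a hypothetical helper that uses only the standard library.

```python
import json

def parse_links_reply(reply):
    """Parse an LLM reply into {"links": [...]}, tolerating code fences and stray prose."""
    empty = {"links": []}
    # Take the text between the first "{" and the last "}"
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return empty
    # Keep only entries that are objects with a string "url"
    valid = [item for item in links
             if isinstance(item, dict) and isinstance(item.get("url"), str)]
    return {"links": valid}

fence = "`" * 3  # built this way to avoid a literal fence inside this example
fenced_reply = (
    f"{fence}json\n"
    '{"links": [{"type": "about", "url": "https://company.com/about"}, {"type": "bad"}]}\n'
    f"{fence}"
)
parsed = parse_links_reply(fenced_reply)
print(parsed)
```

Entries without a usable `url` are dropped, and any reply that cannot be parsed yields an empty link list rather than an exception.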


In [None]:
class BrochureGenerator:
    """
    Main class to generate a company brochure.
    """
    def __init__(self, company_name, url, language='English'):
        self.company_name = company_name
        self.url = url
        self.language = language
        self.website = Website(url)
        self.llm_client = LLMClient()

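    def generate(self):
        links = self.llm_client.get_relevant_links(self.website)
        content = self.website.get_contents()

        # Use .get() so a response without a "links" key is treated as no links
        for link in links.get('links', []):
            try:
                linked_website = Website(link['url'])
            except (requests.RequestException, KeyError) as error:
                # Skip pages that fail to load or entries without a URL
                print(f"Skipping link: {error}")
                continue
            content += f"\n\n{link.get('type', 'page')}:\n"
            content += linked_website.get_contents()

        self.llm_client.generate_brochure(self.company_name, content, self.language)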
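
Note that `generate_brochure()` sends only the first 5,000 characters of the combined content, so a long home page can crowd out every linked page. One possible mitigation is to cap each page before concatenating. The sketch below assumes an equal share per page; `build_content` and its `total_limit` parameter are hypothetical and not part of the notebook.

```python
def build_content(pages, total_limit=5000):
    """Concatenate (label, text) pages, giving each an equal share of total_limit characters."""
    if not pages:
        return ""
    per_page_limit = total_limit // len(pages)
    parts = []
    for label, text in pages:
        header = f"{label}:\n"
        # Reserve room for the header and the trailing newline
        room = max(per_page_limit - len(header) - 1, 0)
        parts.append(header + text[:room] + "\n")
    return "".join(parts)

pages = [
    ("home", "H" * 10000),
    ("about", "A" * 300),
    ("careers", "C" * 10000),
]
content = build_content(pages, total_limit=3000)
print(len(content))
```

This simple version does not redistribute the unused share of short pages; it only guarantees that every page contributes something within the overall budget.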


## 📝 Generate Brochure

In [None]:
def main():
    company_name = "Tour Eiffel"
    url = "https://www.toureiffel.paris/fr"
    language = "French"

    generator = BrochureGenerator(company_name, url, language)
    generator.generate()

if __name__ == "__main__":
    main()