# 📄 Brochure Generator Using LLMs
This project builds an automated pipeline that generates a company brochure from real website data, first with OpenAI's GPT-4o-mini model and then with a local LLaMA 3.2 model run through Ollama.
Business use case: Create engaging marketing brochures for potential clients, investors, and talent based on the company's public website.


# with OpenAI's GPT-4o-mini model

In [5]:
# === Imports ===
# Essential libraries for HTTP requests, HTML parsing, environment config, and OpenAI API interaction

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [34]:
# === Load environment variables ===
# Load OpenAI API key from .env file and validate it
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? ")

There might be a problem with your API key? 


In [32]:
# === Initialize model and OpenAI client ===
MODEL = 'gpt-4o-mini'
openai = OpenAI()

In [33]:
# === Website Representation Class ===
# Encapsulates the logic for fetching and parsing a webpage. Extracts the main text content (no scripts, styles, images, or inputs) and the page's hyperlinks.

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    Represents a single website and provides methods to extract readable text and outbound links.
    """
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

# Example initialization
ed = Website("https://en.wikipedia.org/wiki/OpenAI")
ed.links

['#bodyContent',
 '/wiki/Main_Page',
 '/wiki/Wikipedia:Contents',
 '/wiki/Portal:Current_events',
 '/wiki/Special:Random',
 '/wiki/Wikipedia:About',
 '//en.wikipedia.org/wiki/Wikipedia:Contact_us',
 '/wiki/Help:Contents',
 '/wiki/Help:Introduction',
 '/wiki/Wikipedia:Community_portal',
 '/wiki/Special:RecentChanges',
 '/wiki/Wikipedia:File_upload_wizard',
 '/wiki/Special:SpecialPages',
 '/wiki/Main_Page',
 '/wiki/Special:Search',
 'https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en',
 '/w/index.php?title=Special:CreateAccount&returnto=OpenAI',
 '/w/index.php?title=Special:UserLogin&returnto=OpenAI',
 'https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en',
 '/w/index.php?title=Special:CreateAccount&returnto=OpenAI',
 '/w/index.php?title=Special:UserLogin&returnto=OpenAI',
 '/wiki/Help:Introduction',
 '/wiki/Special:MyContributions',
 '/wiki/Special:MyTalk',
 '#',
 '#History',
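
The list above mixes absolute URLs, site-relative paths, protocol-relative links (`//en.wikipedia.org/...`), and bare fragments (`#`, `#History`). A minimal sketch of how such a list could be normalized before it is sent to the model, assuming only the standard library; `normalize_links` is a hypothetical helper, not part of the original notebook:

```python
from urllib.parse import urljoin, urldefrag

def normalize_links(base_url, links):
    """Resolve links against base_url, drop fragments and non-HTTP schemes, and de-duplicate."""
    seen = set()
    resolved = []
    for link in links:
        # urljoin handles "/path", "//host/path", and "#fragment"; urldefrag removes the fragment part
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        if not absolute.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, and similar schemes
        if absolute == base_url or absolute in seen:
            continue  # skip links back to the page itself and repeats
        seen.add(absolute)
        resolved.append(absolute)
    return resolved

sample = [
    "#bodyContent",
    "/wiki/Main_Page",
    "//en.wikipedia.org/wiki/Wikipedia:Contact_us",
    "/wiki/Main_Page",
    "#",
]
print(normalize_links("https://en.wikipedia.org/wiki/OpenAI", sample))
```

Sending fewer, fully resolved links keeps the prompt shorter and spares the model from guessing how to expand relative paths.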


In [9]:
# === Step 1: Identify Relevant Pages for the Brochure ===
# Use LLM reasoning to classify useful links such as "About", "Careers", or "Company" pages.
# This step demonstrates the value of GPT models in semantic filtering tasks.

link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [11]:
# Create a user prompt dynamically using actual website links
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

print(get_links_user_prompt(ed))

Here is the list of links on the website of https://en.wikipedia.org/wiki/OpenAI - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
#bodyContent
/wiki/Main_Page
/wiki/Wikipedia:Contents
/wiki/Portal:Current_events
/wiki/Special:Random
/wiki/Wikipedia:About
//en.wikipedia.org/wiki/Wikipedia:Contact_us
/wiki/Help:Contents
/wiki/Help:Introduction
/wiki/Wikipedia:Community_portal
/wiki/Special:RecentChanges
/wiki/Wikipedia:File_upload_wizard
/wiki/Special:SpecialPages
/wiki/Main_Page
/wiki/Special:Search
https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en
/w/index.php?title=Special:CreateAccount&returnto=OpenAI
/w/index.php?title=Special:UserLogin&returnto=OpenAI
https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&u

In [13]:
# Query the LLM to extract and return only the relevant brochure links
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
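
`get_links` returns whatever JSON the model produced, so a missing `"links"` key or a relative URL would only surface later, when the pages are fetched. A small sketch of a defensive parsing step, assuming the reply is expected to follow the example in `link_system_prompt`; `parse_links_response` is a hypothetical helper name:

```python
import json

def parse_links_response(raw):
    """Return a list of {"type", "url"} dicts, or [] if the reply is unusable."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []  # not JSON at all (or None)
    if not isinstance(data, dict):
        return []
    links = data.get("links", [])
    if not isinstance(links, list):
        return []
    cleaned = []
    for item in links:
        if not isinstance(item, dict):
            continue  # ignore stray strings or numbers in the list
        url = item.get("url")
        # Keep only absolute HTTP(S) URLs, since they are fetched directly afterwards
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            cleaned.append({"type": str(item.get("type", "page")), "url": url})
    return cleaned
```

With a helper like this, `get_links` could return `{"links": parse_links_response(result)}` so that downstream code can always iterate over a list.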

In [None]:
# Run the function on another site (e.g. HuggingFace)
huggingface = Website("https://huggingface.co")
huggingface.links

get_links("https://huggingface.co")

In [None]:
# === Step 2: Compile Page Contents for Brochure Input ===
# Fetch main content from the homepage and any relevant subpages, to prepare an informative input for brochure generation.

def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

print(get_all_details("https://huggingface.co"))

In [None]:
# Define the system prompt for brochure generation (can be customized for tone and formality)
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Generate user prompt combining homepage and key subpages
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000]  # Truncate prompt if needed
    return user_prompt

get_brochure_user_prompt("HuggingFace", "https://huggingface.co")
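
The slice `user_prompt[:5_000]` can cut the prompt in the middle of a word. One possible refinement, shown here only as a sketch, is to back up to the last whitespace before the limit; `truncate_at_boundary` is a hypothetical helper and the 5,000-character default simply mirrors the value used above:

```python
def truncate_at_boundary(text, limit=5_000):
    """Cut text to at most `limit` characters, backing up to the last newline or space."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    # Position of the last whitespace character inside the allowed window
    cut = max(head.rfind("\n"), head.rfind(" "))
    # Fall back to a hard cut when the window contains no whitespace at all
    return head[:cut].rstrip() if cut > 0 else head

print(truncate_at_boundary("alpha beta gamma", limit=12))
```

Note that this still counts characters, not tokens, so it is a rough guard against oversized prompts and not an exact context-window check.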


In [None]:
# === Step 3: Generate the Brochure ===
# Uses OpenAI's GPT model to synthesize structured business content in Markdown format

def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

create_brochure("HuggingFace", "https://huggingface.co")

# === Optional Enhancement: Streamed Response Output ===
# Streams the brochure content to the user with a live typewriter-style animation

def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```", "").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)

# Stream brochure generation for HuggingFace
stream_brochure("HuggingFace", "https://huggingface.co")
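
The streaming loop above removes every occurrence of the word "markdown" from the response, which would also delete it from ordinary sentences in the brochure. A hedged alternative is to strip only a wrapping code fence from the accumulated text just before each display update, without writing the result back into `response`. `strip_outer_fence` is a hypothetical helper:

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example stays easy to read

# An opening fence (optionally tagged "markdown" or "md") at the very start of the text
OPENING = re.compile(r"\A\s*" + FENCE + r"(?:markdown|md)?[ \t]*\n?")
# A closing fence at the very end of the text
CLOSING = re.compile(r"\n?" + FENCE + r"\s*\Z")

def strip_outer_fence(text):
    """Remove a wrapping code fence but leave the word 'markdown' in the body alone."""
    text = OPENING.sub("", text, count=1)
    return CLOSING.sub("", text, count=1)

wrapped = FENCE + "markdown\n# Title\nWe render markdown well.\n" + FENCE
print(strip_outer_fence(wrapped))
```

Inside the loop this could be used as `update_display(Markdown(strip_outer_fence(response)), ...)`, keeping the raw accumulated text intact between chunks.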

# === Tip ===
# Try switching the system prompt to a humorous tone to show how easily the model adapts style.

# with LLaMA 3.2

In [64]:
import subprocess
import json
import requests
from bs4 import BeautifulSoup

# -- headers
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

# -- Website class
class Website:
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text[:2000]}\n\n"  # limit for LLMs

# -- Prompts
link_system_prompt = """You are provided with a list of links found on a webpage. 
You are able to decide which of the links would be most relevant to include in a brochure about the company, 
such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

system_prompt = """You are an assistant that analyzes the contents of several relevant pages from a company website
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.
Include details of company culture, customers and careers/jobs if you have the information.
"""

# -- Call LLaMA 3.2 via ollama
def call_llama32(system_prompt, user_prompt):
    prompt = f"<|system|>\n{system_prompt}\n<|user|>\n{user_prompt}\n"
    result = subprocess.run(
        ["ollama", "run", "llama3.2"],
        input=prompt,
        capture_output=True,
        text=True
    )
    return result.stdout.strip()

# -- Build user prompts
def get_links_user_prompt(website):
    # Imported here so this function stays self-contained
    from urllib.parse import urljoin

    user_prompt = f"Here is the list of links on the website of {website.url}. Please decide which are relevant:\n"
    # urljoin resolves relative, protocol-relative ("//host/path"), and fragment ("#top") links against the page URL
    full_links = [urljoin(website.url, link) for link in website.links]
    user_prompt += "\n".join(full_links)
    return user_prompt

def get_links(url):
    website = Website(url)
    user_prompt = get_links_user_prompt(website)
    response = call_llama32(link_system_prompt, user_prompt)
    try:
        return json.loads(response)
    except Exception as e:
        print("⚠️ Error parsing JSON from LLaMA:", e)
        print("Raw response:\n", response)
        return {"links": []}

def get_all_details(url):
    result = Website(url).get_contents()
    links_info = get_links(url)
    for link in links_info.get("links", []):
        result += f"\n\n{link.get('type', 'Page')}\n"
        result += Website(link["url"]).get_contents()
    return result

def create_brochure(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages:\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:8000]  # Truncate if necessary
    response = call_llama32(system_prompt, user_prompt)
    print("\n--- Generated Brochure ---\n")
    print(response)

# -- Run
create_brochure("HuggingFace", "https://huggingface.co")



--- Generated Brochure ---

# Hugging Face Brochure

## About Us

Hugging Face is the AI community building the future. Our platform is where machine learning professionals collaborate on models, datasets, and applications. We provide a collaboration environment that enables users to create, discover, and share knowledge in the field of artificial intelligence.

### Company Culture

At Hugging Face, we value innovation, community, and transparency. We believe that open-source technology can democratize access to AI and drive progress in various fields. Our team is comprised of passionate individuals who strive to make a positive impact through our work.

### Customers

We serve a diverse range of customers, including:

* Machine learning researchers
* Developers building AI applications
* Data scientists working on large-scale projects
* Students looking for resources to learn and experiment with AI

Our platform offers a wide range of tools and models that cater to the needs of these
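
The `call_llama32` helper above shells out to `ollama run` and passes both prompts as one block of text. A possible alternative, sketched here under the assumption that an Ollama server is listening on its default local address (`http://localhost:11434`), is to send the system and user prompts as separate chat messages to its `/api/chat` endpoint. `build_chat_payload` and `call_llama32_http` are hypothetical names, not part of the original notebook:

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local Ollama address

def build_chat_payload(system_prompt, user_prompt, model="llama3.2"):
    """Build the JSON body for a single, non-streamed chat request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,  # ask for one complete reply instead of chunks
    }

def call_llama32_http(system_prompt, user_prompt, timeout=300):
    """Send the prompts to a local Ollama server and return the reply text."""
    payload = json.dumps(build_chat_payload(system_prompt, user_prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.load(reply)
    return body["message"]["content"].strip()
```

Because it takes the same two arguments and returns a string, `call_llama32_http` could be swapped in for `call_llama32` in `get_links` and `create_brochure` without other changes, provided the server is running and the model has been pulled.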

In [70]:
import time
from IPython.display import Markdown, display, update_display

def stream_brochure(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages:\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:8000]

    full_response = call_llama32(system_prompt, user_prompt)

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for char in full_response:
        response += char
        if char in [".", "\n"]:
            update_display(Markdown(response), display_id=display_handle.display_id)
            time.sleep(0.05)
    update_display(Markdown(response), display_id=display_handle.display_id)

stream_brochure("HuggingFace", "https://huggingface.co")


**Welcome to Hugging Face: The AI Community Building the Future**

Hugging Face is a platform that brings together the machine learning community to collaborate on models, datasets, and applications. Our mission is to accelerate the development of artificial intelligence by providing a collaborative environment for researchers, developers, and innovators.

**Our Mission**

At Hugging Face, we believe that the future of AI depends on collaboration, openness, and accessibility. That's why we've created a platform that allows users to create, discover, and collaborate on ML models, datasets, and applications. Our goal is to make AI more accessible and user-friendly, empowering individuals and organizations to build innovative solutions.

**Our Community**

Hugging Face has a vibrant community of researchers, developers, and innovators from around the world. Our community is driven by a shared passion for AI and a commitment to collaboration, openness, and innovation. We believe that our community is the key to unlocking the full potential of AI.

**What We Offer**

* **Models**: We offer a vast library of pre-trained models, including state-of-the-art models in natural language processing, computer vision, and other areas.
* **Datasets**: Our dataset repository contains millions of datasets, making it easy for users to find and utilize high-quality data for their AI projects.
* **Spaces**: Hugging Face Spaces allows users to host and collaborate on unlimited public models, datasets, and applications.
* **Compute**: We provide paid Compute solutions that enable users to deploy and run AI models quickly and efficiently.

**Our Values**

At Hugging Face, we value:

* **Collaboration**: We believe that collaboration is key to unlocking the full potential of AI.
* **Openness**: We're committed to openness and accessibility in all aspects of our platform.
* **Innovation**: We encourage innovation and experimentation, empowering users to build innovative solutions.

**Join Our Community**

If you're passionate about AI and want to be part of a vibrant community of innovators, join us today! Sign up for our platform, explore our models and datasets, and start building your own AI projects.