# BrochureBuilder: AI-Driven Company Brochure Generator

## App Overview  
BrochureBuilder ingests a company name and its primary website, then automatically produces a polished, brand-consistent brochure in multiple formats (PDF, PPTX, web).

## Key Capabilities  
- **Automated Content Extraction**
  - Parses web pages to identify core information (mission, products, services, leadership bios)
  - Summarizes key messages and value propositions

- **Template-Driven Design**
  - Applies industry-standard layout and styling based on sector or brand identity
  - Offers a library of customizable color schemes, fonts, and imagery

- **Multi-Format Export**
  - Generates print-ready PDFs, editable slide decks (PPTX), and responsive web previews
  - Enables batch generation for multiple subsidiaries or product lines

- **Dynamic Refresh**
  - Monitors the source website for updates and regenerates brochures automatically
  - Ensures collateral remains accurate without manual intervention

## Business Use Cases  
- **Sales Enablement**  
  Rapid creation of tailored brochures for client pitches and proposal attachments.  
- **Investor Relations**  
  Consistent, up-to-date briefing materials for funding rounds and board meetings.  
- **Talent Acquisition**  
  Branded informational kits that highlight company culture, values, and benefits.  
- **Marketing Campaigns**  
  Quick-turn collateral for product launches, trade shows, and digital promotions.  

## Value Proposition  
BrochureBuilder cuts brochure-production time by over 70%, enforces brand consistency across all teams, and eliminates the manual effort of keeping printed materials current.  


In [3]:
# Import necessary libraries
import os, requests, json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import display, Markdown, update_display
from openai import OpenAI

In [5]:
# Initialize

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key successfully loaded!")
else:
    print("Failed to load.")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key successfully loaded!


In [6]:
# Website class: scrapes a page and extracts its title, visible text, and links

headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """A scraped web page: title, visible body text, and the hrefs of its links."""

    def __init__(self, url):
        self.url = url
        # A timeout keeps an unresponsive site from hanging the notebook
        response = requests.get(url, headers=headers, timeout=10)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop elements that carry no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        # Collect every href, skipping anchors that have none
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [11]:
ex = Website("https://trentlimited.com/pages/about-us")
ex.links

['#MainContent',
 '/cart',
 '/',
 '/pages/copy-of-about-us2-0',
 '/pages/board-of-directors',
 '/pages/composition-of-committees',
 '/pages/subsidiaries-associates-and-jvs-new',
 '/pages/about-tata-group-1',
 '/pages/rewards-recognition-new',
 '/pages/fashion-lifestyle',
 '/pages/food-grocery-new',
 '/pages/annual-reports',
 '/pages/financial-information',
 '/pages/corporate-governance-main',
 '/pages/board-meeting-intimation',
 '/pages/agm-documents',
 '/pages/postal-ballot',
 '/pages/policies',
 '/pages/csr-philosophy',
 '/pages/iic',
 '/pages/overview-new',
 '/pages/bag-of-love-1',
 '/pages/live-good-1',
 'https://docs.trent-tata.com/Sustainability-Report.pdf',
 'https://trentlimited.com/pages/careers',
 '/pages/press-releases',
 '/pages/media-gallery',
 '/pages/contact',
 '/cart',
 '/pages/copy-of-about-us2-0',
 '/pages/board-of-directors',
 '/pages/composition-of-committees',
 '/pages/subsidiaries-associates-and-jvs-new',
 '/pages/about-tata-group-1',
 '/pages/rewards-recognition-

In [12]:
ex.url

'https://trentlimited.com/pages/about-us'
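
The link list above mixes fragment-only anchors (`#MainContent`), site-relative paths (`/cart`), and absolute URLs, with several repeats. A minimal sketch of normalizing such a list before it is sent to the model, using the standard library's `urllib.parse`. `normalize_links` is a hypothetical helper and not part of the notebook, which passes the raw hrefs through and asks the model to resolve them.

```python
from urllib.parse import urljoin, urldefrag

def normalize_links(base_url, links):
    """Resolve links against base_url, drop fragments, and remove duplicates.

    Hypothetical helper for illustration only.
    """
    seen = {}
    for href in links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        # Keep only web links, and skip anything that resolves back to the page itself
        if absolute.startswith(("http://", "https://")) and absolute != base_url:
            seen.setdefault(absolute, None)  # dict preserves first-seen order
    return list(seen)

base = "https://trentlimited.com/pages/about-us"
sample = ["#MainContent", "/cart", "/", "/pages/board-of-directors",
          "https://trentlimited.com/pages/careers", "/cart"]
normalized = normalize_links(base, sample)
print(normalized)
```

Resolving links up front shortens the prompt and removes one source of model error, at the cost of an extra preprocessing step.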

In [13]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""


print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
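
One way to keep the example in the prompt well-formed is to build it from a Python dict and serialize it with `json.dumps`, so a typo cannot turn it into invalid JSON. This is a sketch: `example` and `link_system_prompt_sketch` are illustrative names, and the prompt wording is copied from the cell above.

```python
import json

# The example payload as a Python structure; json.dumps guarantees valid JSON
example = {
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"},
    ]
}

link_system_prompt_sketch = (
    "You are provided with a list of links found on a webpage. "
    "You are able to decide which of the links would be most relevant to include in a brochure about the company, "
    "such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
    "You should respond in JSON as in this example:\n"
    + json.dumps(example, indent=4)
)

print(link_system_prompt_sketch)
```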



In [15]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt


print(get_links_user_prompt(ex))

Here is the list of links on the website of https://trentlimited.com/pages/about-us - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
#MainContent
/cart
/
/pages/copy-of-about-us2-0
/pages/board-of-directors
/pages/composition-of-committees
/pages/subsidiaries-associates-and-jvs-new
/pages/about-tata-group-1
/pages/rewards-recognition-new
/pages/fashion-lifestyle
/pages/food-grocery-new
/pages/annual-reports
/pages/financial-information
/pages/corporate-governance-main
/pages/board-meeting-intimation
/pages/agm-documents
/pages/postal-ballot
/pages/policies
/pages/csr-philosophy
/pages/iic
/pages/overview-new
/pages/bag-of-love-1
/pages/live-good-1
https://docs.trent-tata.com/Sustainability-Report.pdf
https://trentlimited.com/pages/careers
/pages/press-releases
/pages/media-gallery
/pages/contact
/cart
/pages/

In [16]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)



# Anthropic has made their site harder to scrape, so I'm using HuggingFace.

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/nanonets/Nanonets-OCR-s',
 '/google/magenta-realtime',
 '/mistralai/Mistral-Small-3.2-24B-Instruct-2506',
 '/MiniMaxAI/MiniMax-M1-80k',
 '/OmniGen2/OmniGen2',
 '/models',
 '/spaces/ilcve21/Sparc3D',
 '/spaces/enzostvs/deepsite',
 '/spaces/tencent/Hunyuan3D-2.1',
 '/spaces/OmniGen2/OmniGen2',
 '/spaces/multimodalart/self-forcing',
 '/spaces',
 '/datasets/EssentialAI/essential-web-v1.0',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/institutional/institutional-books-1.0',
 '/datasets/nvidia/AceReason-1.1-SFT',
 '/datasets/nvidia/OpenScience',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',
 '/docs/transforme

In [17]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'community page', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}
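
`get_links` parses the model's reply with `json.loads`, but nothing guarantees the parsed object has the expected shape. A sketch of a defensive check follows. `validate_links` is a hypothetical helper, and the rule that each entry needs string `type` and `url` fields with a full http(s) URL is an assumption drawn from the example in the system prompt.

```python
def validate_links(payload):
    """Return only the well-formed entries of a {"links": [...]} payload."""
    if not isinstance(payload, dict):
        return []
    entries = payload.get("links")
    if not isinstance(entries, list):
        return []
    valid = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        link_type, url = entry.get("type"), entry.get("url")
        if (isinstance(link_type, str) and isinstance(url, str)
                and url.startswith(("http://", "https://"))):
            valid.append({"type": link_type, "url": url})
    return valid

reply = {
    "links": [
        {"type": "about page", "url": "https://huggingface.co/huggingface"},
        {"type": "careers page"},                 # missing url
        {"type": "docs page", "url": "/docs"},    # relative, not a full URL
        "https://huggingface.co/blog",            # not an object
    ]
}
checked = validate_links(reply)
print(checked)
```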

In [18]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result


print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
nanonets/Nanonets-OCR-s
Updated
5 days ago
•
177k
•
1.17k
google/magenta-realtime
Updated
3 days ago
•
316
mistralai/Mistral-Small-3.2-24B-Instruct-2506
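
`get_all_details` fetches every selected link in turn, so an exception from a single unreachable page would stop the whole run. A sketch of collecting pages while skipping failures follows. `collect_pages` and the injected `fetch` callable are hypothetical stand-ins; in the notebook, `fetch` could be something like `lambda u: Website(u).get_contents()`.

```python
from typing import Callable, Dict, List

def collect_pages(links: List[Dict[str, str]], fetch: Callable[[str], str]) -> str:
    """Concatenate page contents, skipping pages that could not be fetched."""
    parts = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as error:  # broad on purpose: any failed page is skipped
            print(f"Skipping {link['url']}: {error}")
            continue
        parts.append(f"\n\n{link['type']}\n{contents}")
    return "".join(parts)

def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP fetch so the sketch runs offline
    if "broken" in url:
        raise ConnectionError("site unreachable")
    return f"Contents of {url}"

pages = collect_pages(
    [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "careers page", "url": "https://example.com/broken"},
    ],
    fake_fetch,
)
print(pages)
```

Passing the fetcher in as an argument also makes the function easy to exercise without network access.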


In [19]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."



def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
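
Slicing at a fixed character count can cut the prompt in the middle of a word or line. A sketch of a truncation that backs up to the last complete line follows. `truncate_at_line` is a hypothetical helper, and its default limit mirrors the 5,000 characters used above.

```python
def truncate_at_line(text: str, limit: int = 5_000) -> str:
    """Truncate text to at most `limit` characters, ending on a line boundary when possible."""
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    last_newline = clipped.rfind("\n")
    # Fall back to a hard cut if the clipped text is one long line
    return clipped[:last_newline] if last_newline > 0 else clipped

sample = "Hugging Face\nModels\nDatasets\nSpaces"
print(repr(truncate_at_line(sample, limit=22)))
```

With `limit=22` the hard cut would end in `Da`; the helper instead returns the first two complete lines.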


In [20]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nnanonets/Nanonets-OCR-s\nUpdated\n5 days ago\n•\n177k\n•\n1.17k\ngoogle/magenta-realtime\nUpdated\n3 days ago\n•\n316\nmistralai/Mistral-Small-3.2-24B-Instruct-2506\nUpdated\n4 days ago\n•\n5.37k\n•\n241\nMiniMaxAI/MiniMax-M1-80k\nUpdated\n1 day ago\n•\n10.4k\n•\n580\nOmniGen2/OmniGen2\nUpdated\n2 days ago\n•\n1.15k\n•\n157\nBrowse 1M+ models\nSpaces\nRunning\n827\n827\nSparc3D\n🏃\nNext-Ge

In [21]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))


create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/about'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://huggingface.co'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'official Twitter', 'url': 'https://twitter.com/huggingface'}, {'type': 'official LinkedIn', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Company Brochure

## Who We Are
Hugging Face is a leading AI community dedicated to building the future of machine learning. Our platform is a hub for collaboration, where the machine learning community works together on innovative models, datasets, and applications. With over 1 million models and 250,000 datasets, we facilitate progress in AI and machine learning.

## Our Vision
We aim to revolutionize machine learning by providing accessible tools and resources, enhance collaboration among developers, and promote transparency through open-source contributions.

## What We Offer
- **Models**: Explore and utilize over 1 million pre-trained models across diverse AI fields including text, images, audio, and 3D.
- **Datasets**: Access a comprehensive collection of more than 250,000 datasets tailored for specific machine learning tasks.
- **Spaces**: Engage with and deploy applications seamlessly through our robust platform.
- **Enterprise Solutions**: We provide advanced services with enterprise-grade security, dedicated support, and customizable access controls.

## Customer Base
Hugging Face caters to over 50,000 organizations including esteemed names such as:
- **Google**
- **Amazon**
- **Microsoft**
- **Grammarly**
- **Meta**

These organizations leverage our infrastructure to enhance their AI initiatives and services.

## Company Culture
At Hugging Face, we thrive on collaboration and interdisciplinary teamwork. Our environment fosters innovation, creativity, and mutual support among talented professionals. We believe in open-source principles, empowering everyone to share and learn freely.

### Community Engagement
We actively encourage community participation through forums, events, and open-source projects. Our dedicated community is essential to our growth and innovation, and we value every contribution. 

## Careers at Hugging Face
Are you passionate about artificial intelligence and machine learning? Hugging Face is always on the lookout for talented individuals who want to shape the future of AI. Join us to work in an inclusive and dynamic environment that values creativity and collaboration.

### Current Openings
Explore our latest job offerings [here](#). We are seeking candidates who are eager to learn and share their knowledge.

## Join Us
Be a part of the Hugging Face community and help us pave the way for the future of AI. Whether you are a developer, a researcher, or someone curious about AI, there’s a place for you here.

Visit us at [Hugging Face Website](https://huggingface.co) to explore more about our offerings and community initiatives. 

---

*Hugging Face – The AI community building the future.*

In [22]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
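    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Clean a copy of the accumulated text so only fence wrappers are removed;
        # replacing the bare word "markdown" would also delete it from ordinary prose
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)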



stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}


# Hugging Face Company Brochure

## Welcome to Hugging Face

**The AI Community Building the Future**

At Hugging Face, we are dedicated to fostering a collaborative platform for the machine learning community, providing tools, models, datasets, and applications to help advance the field of artificial intelligence. Join us on our mission to democratize high-quality machine learning for everyone.

---

## What We Offer

### **Models**
- Access **1M+ models** ranging from text to image and more.
- Browse trending models like Nanonets-OCR and google/magenta-realtime which are regularly updated to ensure optimal performance.

### **Datasets**
- Utilize our extensive library of **250k+ datasets** which support various machine learning tasks.
- Easily find resources for your projects and research through our curated collections.

### **Spaces & Applications**
- Explore thousands of applications under our **Spaces**, such as real-time video generation and high-resolution 3D model generation.
- Unleash creativity with tools like DeepSite v2 and OmniGen2, designed for modern AI needs.

### **Enterprise Solutions**
- Tailored enterprise offerings with security, access controls, and dedicated support starting at $20/user/month.
- Our paid compute solutions provide optimized inference endpoints for seamless model deployment.

---

## Company Culture

At Hugging Face, we pride ourselves on creating an inclusive and vibrant community. Our approach is firmly rooted in **collaboration**, **open-source** values, and a passion for machine learning. We encourage innovation and knowledge-sharing, enabling individuals and organizations to thrive together. 

### **Our Community**
- Join over **50,000 organizations** leveraging Hugging Face for their AI needs, including industry leaders like Google, Microsoft, Amazon, and Grammarly.
- Engage with our active community through forums and collaborative activities that foster knowledge and skill-sharing.

---

## Careers at Hugging Face

### **Join Our Team**
We are always on the lookout for passionate individuals who want to make a difference in the realm of AI and machine learning. Working at Hugging Face means joining a diverse team that values creativity, collaboration, and continuous learning. 

- **Roles Available:** Software Engineers, Data Scientists, Community Managers, and more.
- **Why Choose Us:** Opportunity to impact the future of AI while participating in an innovative and flexible work environment.

----

## Get Involved

- **Explore** our [website](https://huggingface.co) to learn about our latest models and tools.
- **Follow us** on social media platforms like GitHub, Twitter, and LinkedIn for updates and community discussions.
- **Join** our mission by signing up today and start your journey into the world of AI and machine learning!

---

Hugging Face – where the future of AI is being collaboratively built, one model at a time.
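
A note on the streaming loop in `stream_brochure`: a code-fence marker and its `markdown` language tag can arrive in different chunks, and removing the bare word "markdown" from the output would also delete it from ordinary prose. A sketch of keeping the raw text intact and cleaning a copy on each update follows. The list of strings stands in for the values of `chunk.choices[0].delta.content`, and `clean_for_display` and `accumulate` are hypothetical helpers.

```python
FENCE = "`" * 3  # the three-backtick code-fence marker

def clean_for_display(raw: str) -> str:
    """Remove fence wrappers without touching the word 'markdown' in prose."""
    return raw.replace(FENCE + "markdown", "").replace(FENCE, "")

def accumulate(chunks):
    """Yield the cleaned text after each chunk, as a display loop would render it."""
    raw = ""
    for piece in chunks:
        raw += piece or ""  # a delta may be None
        yield clean_for_display(raw)

# The fence and the language tag arrive in different chunks
chunks = [FENCE, "markdown\n# Brochure\n", "We respond in markdown.\n", None, FENCE]
frames = list(accumulate(chunks))
print(frames[-1])
```

Because cleaning runs on the full accumulated text each time, a marker split across chunks is still recognized once its second half arrives.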