# Brochure Generator - A fully functional business solution

### Business Requirement:

* Develop an application that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.
* We will be provided with the company name and its primary website.

# Initial setup of imports

In [None]:
import os
from dotenv import load_dotenv

import json
from IPython.display import Markdown, display, update_display

from openai import OpenAI

# Local module imports
import scraper

# Load env properties

In [None]:
load_dotenv()

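# Check that both API keys are present, without printing the secrets themselves
api_key = os.getenv("OPENAI_API_KEY")
gemini_key = os.getenv("GEMINI_API_KEY")

if api_key and gemini_key and len(api_key) > 10 and len(gemini_key) > 10:
    print("API keys look good.")
    # Show only the last four characters so the keys never end up in saved notebook output
    print(f"Gemini key ends with ...{gemini_key[-4:]}\nOpenAI key ends with ...{api_key[-4:]}")
else:
    print("API key missing. Please set both OPENAI_API_KEY and GEMINI_API_KEY in your .env file.")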

Construct client

In [None]:
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL")

# Construct client
ollama_client = OpenAI(api_key=api_key, base_url=OLLAMA_BASE_URL)

Test the scraper to load links

In [None]:
links = scraper.fetch_website_links("https://edwarddonner.com")

# Print all extracted links
links

## First step: Use the LLM Models to figure out relevant links
* The LLM should analyse all the links extracted from the website and replace relative links such as "/products" with "https://companydomain.com/products".
* In this app we'll use `one-shot prompting`, where the prompt includes an example of how the model should respond.
* It's an excellent LLM use case because it requires nuanced understanding. Imagine the work required to write an application that parses and analyses all the links by hand; it would be very difficult.

>**Note:** *There is a more advanced technique called `Structured Outputs`, in which we require the model to respond according to a spec. We'll discuss it in upcoming sessions.*
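
To make the relative-to-absolute conversion concrete, here is a minimal sketch of the mechanical part of that task, using the standard library's `urllib.parse.urljoin`. The helper name `to_absolute` and the sample links are illustrative assumptions, not part of this notebook, which delegates the step to the LLM.

```python
from urllib.parse import urljoin


def to_absolute(base_url, links):
    """Resolve each link against base_url; absolute links pass through unchanged."""
    return [urljoin(base_url, link) for link in links]


# Illustrative input: two relative links and one link that is already absolute
sample_links = ["/products", "about", "https://other.example/jobs"]
for resolved in to_absolute("https://companydomain.com/", sample_links):
    print(resolved)
```

This covers only the URL rewriting. Deciding which links are relevant to a brochure is the part that benefits from the model's judgement.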

# Write System and user prompts

In [None]:
LINK_ANALYSIS_SYSTEM_PROMPT = """
You are a helpful assistant that helps to analyse website links and convert relative links to absolute links based on the main domain.
You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About 
page, or a Company page, or Career/Jobs page.

You should respond in JSON format with the following structure:
{
  "relevant_links": [
    {"name": "about page", "url": "https://companydomain.com/about", "reason": "This page tells about the company mission and values."},
    {"name": "careers page", "url": "https://companydomain.com/careers", "reason": "This page provides information about job opportunities."}
  ]
}
"""

In [None]:
# Build user prompt by including all the links extracted from the website
def get_links_user_prompt(url):
    user_prompt = f"""
    Here is a list of links extracted from the website {url}:
    Please analyse and decide which of these links are relevant web links for a brochure about the company.
    Replace any relative links with absolute links based on the main domain {url}.
    Do not include Terms of Service, Privacy Policy, Cookie Policy, or email links.

    # Links (some might be relative links):
    """

    links = scraper.fetch_website_links(url)
    user_prompt += "\n".join(links)
    return user_prompt

Test above user_prompt API

In [None]:
print(get_links_user_prompt("https://edwarddonner.com"))

Function to integrate with the LLM and select the relevant links for the brochure

In [None]:
# Integrate with LLM model to select relevant links
def select_relevant_links_ollama(ollama_model, url):
    print(f"Using model: {ollama_model}")
    payload = [
        {"role": "system", "content": LINK_ANALYSIS_SYSTEM_PROMPT},
        {"role": "user", "content": get_links_user_prompt(url)}
    ]

    # Instruct model to respond in JSON format
    json_resp_format = {"type": "json_object"}

    response = ollama_client.chat.completions.create(model=ollama_model, messages=payload, response_format=json_resp_format)
    result = response.choices[0].message.content

    links = json.loads(result)

    return links
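
`json.loads` raises an exception if the model returns malformed JSON, and even valid JSON may contain entries with an empty name or URL. The following is a minimal sketch of a more defensive parser. The name `parse_relevant_links` is hypothetical, and returning an empty list on failure is an assumption about the desired behaviour.

```python
import json


def parse_relevant_links(raw):
    """Parse the model's JSON text and return only entries with a name and a url."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # assumption: treat unparseable output as "no links"
    entries = data.get("relevant_links", []) if isinstance(data, dict) else []
    if not isinstance(entries, list):
        return []
    return [
        entry for entry in entries
        if isinstance(entry, dict) and entry.get("name") and entry.get("url")
    ]
```

A function like this could stand in for the bare `json.loads(result)` call, with the caller reading the returned list directly instead of indexing `['relevant_links']`.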


Call above API to get the relevant links

In [None]:
LLAMA3_3B_MODEL_KEY = "LLAMA3_3B"
ollama_model = os.getenv(LLAMA3_3B_MODEL_KEY)

if(not ollama_model):
    print(f"No model defined with name {LLAMA3_3B_MODEL_KEY} in .env file.")

In [30]:
select_relevant_links_ollama(ollama_model, "https://edwarddonner.com")

{'relevant_links': [{'name': 'Home page',
   'url': 'https://edwarddonner.com/',
   'reason': 'This is the main homepage of the company.'},
  {'name': 'About me and About Nebula',
   'url': 'https://edwarddonner.com/about-me-and-about-nebula/',
   'reason': 'This page tells about the author and his projects.'},
  {'name': 'Connect our four games',
   'url': 'https://edwarddonner.com/connect-four/',
   'reason': "This page provides information about one of the company's four games."},
  {'name': 'Expert advice',
   'url': 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
   'reason': 'This page shares insights on AI in production.'},
  {'name': 'Learning resources',
   'url': 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
   'reason': 'This page provides information about online courses.'},
  {'name': 'Industry news and updates',
   'url': 'https://www.prnewswire.com/news-releases/wynden-stark-gr

# Test above websites

## Observation
* It's hallucinating: the appointment, feature and freemium links are merged into one single link, creating an invalid URL
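
One way to guard against invented URLs like this is to accept only links that the scraper actually returned. The sketch below assumes the scraped links are plain strings (possibly relative) and the model's entries are dicts with a `url` key, as in the outputs shown in this notebook. The helper name `keep_known_links` and the sample data are illustrative.

```python
from urllib.parse import urljoin, urlparse


def keep_known_links(base_url, scraped_links, model_links):
    """Keep only model-selected entries whose URL matches a scraped link."""
    # Resolve scraped links to absolute form and ignore a trailing slash
    known = {urljoin(base_url, link).rstrip("/") for link in scraped_links}
    kept = []
    for entry in model_links:
        url = entry.get("url", "")
        parsed = urlparse(url)
        well_formed = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        if well_formed and url.rstrip("/") in known:
            kept.append(entry)
    return kept


# Illustrative data: the second entry is a merged, invented URL
scraped = ["/about", "https://nebula.io/features"]
selected = [
    {"name": "about", "url": "https://nebula.io/about"},
    {"name": "merged", "url": "https://nebula.io/appointment/features/freemium"},
]
print(keep_known_links("https://nebula.io/", scraped, selected))
```

Note that this filter also drops the home page unless the scraper lists it, so it trades a little recall for fewer broken links.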

In [31]:
select_relevant_links_ollama(ollama_model, "https://nebula.io/")

Using model: llama3.2:3b


{'relevant_links': [{'name': 'Meet the Team',
   'url': 'https://nebula.io/meet-the-team',
   'reason': 'This page provides information about the company people.'},
  {'name': 'Contact Us',
   'url': 'https://nebula.io/contact',
   'reason': 'This page provides information on how to get in touch with the company.'},
  {'name': 'Support Resources',
   'url': 'https://nebula.io/resources-1',
   'reason': 'These resources provide assistance and support for customers.'},
  {'name': 'Frequently Asked Questions',
   'url': 'https://nebula.io/frequently-asked-questions',
   'reason': 'This page answers common questions about the company or its services.'}]}

Another function to connect with Any Frontier Model 

In [32]:
# Integrate with Any LLM model to select relevant links
def select_relevant_links(base_url, model_name, api_key, website_url):
    
    client = OpenAI(api_key=api_key, base_url=base_url)
    print(f"Using model: {model_name}")

    payload = [
        {"role": "system", "content": LINK_ANALYSIS_SYSTEM_PROMPT},
        {"role": "user", "content": get_links_user_prompt(website_url)}
    ]

    # Instruct model to respond in JSON format
    json_resp_format = {"type": "json_object"}

    response = client.chat.completions.create(model=model_name, messages=payload, response_format=json_resp_format)
    result = response.choices[0].message.content

    links = json.loads(result)

    return links


Connect with Gemini

In [33]:
base_url = os.getenv("GEMINI_BASE_URL")
gemini_model = os.getenv("GEMINI_FM")
gemini_api_key = os.getenv("GEMINI_API_KEY")
website_url = "https://nebula.io/"

select_relevant_links(base_url=base_url, model_name=gemini_model, api_key=gemini_api_key, website_url=website_url)

Using model: gemini-2.5-flash


{'relevant_links': [{'name': 'homepage',
   'url': 'https://nebula.io/',
   'reason': 'Provides an initial overview of the company and its main services.'},
  {'name': 'features page',
   'url': 'https://nebula.io/features',
   'reason': 'Highlights the core functionalities and services offered by the company.'},
  {'name': 'resources page',
   'url': 'https://nebula.io/resources',
   'reason': "Offers valuable information, insights, or support related to the company's products/services."},
  {'name': 'frequently asked questions page',
   'url': 'https://nebula.io/frequently-asked-questions',
   'reason': 'Addresses common questions about the company, its offerings, and general operations.'},
  {'name': 'contact page',
   'url': 'https://nebula.io/contact',
   'reason': 'Provides essential contact information for inquiries, partnerships, or support.'},
  {'name': 'meet the team page',
   'url': 'https://nebula.io/meet-the-team',
   'reason': "Introduces the company's leadership and tea

Analyzing with ollama LLMs

In [35]:
select_relevant_links_ollama(url="https://huggingface.co", ollama_model=ollama_model)

Using model: llama3.2:3b


{'relevant_links': [{'name': 'About Us',
   'url': 'https://huggingface.co/about',
   'reason': 'This page tells about the company mission and values.'},
  {'name': 'Company',
   'url': 'https://huggingface.co/company',
   'reason': 'This page provides an overview of the company.'},
  {'name': 'Careers',
   'url': 'https://huggingface.co/careers',
   'reason': 'This page provides information about job opportunities.'},
  {'name': 'Blog',
   'url': 'https://discuss.huggingface.co',
   'reason': 'This page showcases the blog posts of the company.'},
  {'name': 'GitHub',
   'url': 'https://github.com/huggingface',
   'reason': 'This page allows developers to explore and contribute to the Hugging Face projects'},
  {'name': '',
   'url': 'https://twitter.com/huggingface',
   'reason': '-related social media handle, not a core company webpage.'}]}

Google Gemini

In [36]:

base_url = os.getenv("GEMINI_BASE_URL")
gemini_model = os.getenv("GEMINI_FM")
gemini_api_key = os.getenv("GEMINI_API_KEY")
website_url = "https://huggingface.co"

select_relevant_links(base_url=base_url, model_name=gemini_model, api_key=gemini_api_key, website_url=website_url)

Using model: gemini-2.5-flash


{'relevant_links': [{'name': 'Home page',
   'url': 'https://huggingface.co/',
   'reason': 'Serves as the main landing page and general overview of the company.'},
  {'name': 'Enterprise solutions page',
   'url': 'https://huggingface.co/enterprise',
   'reason': 'Provides information on solutions tailored for businesses and organizations.'},
  {'name': 'Pricing page',
   'url': 'https://huggingface.co/pricing',
   'reason': 'Details the costs and plans for various services and products offered by the company.'},
  {'name': 'Careers page',
   'url': 'https://apply.workable.com/huggingface/',
   'reason': 'Lists available job opportunities and information about working at the company.'},
  {'name': 'Brand page',
   'url': 'https://huggingface.co/brand',
   'reason': "Outlines the company's brand identity and values, which is important for understanding the company's image."},
  {'name': 'Learn page',
   'url': 'https://huggingface.co/learn',
   'reason': "Offers educational resources, 

---

# Second Step: Make the brochure!
Assemble all the details into another prompt to the LLM
* API
  * Extract the text content of the website using the scraper and initialize the result
  * Get all links and filter the relevant ones using the Ollama LLM model
  * Read the content of each link using the scraper and append it to the result
  * Use Markdown formatting to make it presentable

In [50]:
# Define function to create brochure using selected links
def fetch_page_and_all_relevant_links(ollama_model, website_url):
    content = scraper.fetch_text_contents(website_url)
    # Use the model passed in by the caller instead of re-reading it from the environment
    relevant_links = select_relevant_links_ollama(ollama_model, website_url)
    print(relevant_links)

    result = f"## Landing Page: \n\n{content}\n---\n## Relevant Links:\n\n"
    for link in relevant_links['relevant_links']:
        result += f"\n\n### Link: {link['name']}\n"
        result += scraper.fetch_text_contents(link['url'])

    return result

Call above API

> Hallucinating
*  Created an invalid URL, which caused a failure - https://discuss.huggingface.co/topics/career-development

In [51]:
print(fetch_page_and_all_relevant_links(ollama_model, "https://huggingface.co"))

Using model: llama3.2:3b
{'relevant_links': [{'name': 'About Page', 'url': 'https://huggingface.co/about', 'reason': 'This page provides information about the company mission and values.'}, {'name': 'Company Page', 'url': 'https://huggingface.co/blog', 'reason': 'This page provides various blog posts and updates from Hugging Face'}, {'name': 'Career/Jobs Page', 'url': 'https://discuss.huggingface.co/topics/career-development', 'reason': 'This page provides information about job opportunities at Hugging Face.'}]}


HTTPError: 404 Client Error: Not Found for url: https://discuss.huggingface.co/topics/career-development
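
A single bad link should not abort the whole brochure. The sketch below shows one way to skip links that fail to load. The function name `fetch_pages_safely` is hypothetical, and the fetcher is passed in as a parameter so the example runs without network access. In this notebook the fetcher would be `scraper.fetch_text_contents`.

```python
def fetch_pages_safely(links, fetch):
    """Fetch each link with `fetch`, skipping any link that raises an exception."""
    sections, failed = [], []
    for link in links:
        try:
            text = fetch(link["url"])
        except Exception as exc:  # for example an HTTP 404 raised by the scraper
            failed.append((link["url"], str(exc)))
            continue
        sections.append(f"\n\n### Link: {link['name']}\n{text}")
    return "".join(sections), failed


def demo_fetch(url):
    """Stand-in fetcher for illustration: fails for one URL, succeeds otherwise."""
    if url.endswith("/career-development"):
        raise RuntimeError("404 Client Error: Not Found")
    return "page text"


demo_links = [
    {"name": "About Page", "url": "https://huggingface.co/about"},
    {"name": "Career/Jobs Page", "url": "https://discuss.huggingface.co/topics/career-development"},
]
text, failed = fetch_pages_safely(demo_links, demo_fetch)
print(failed)
```

Returning the failed URLs alongside the text keeps the failures visible instead of silently hiding them.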

Create another function to connect with any Frontier model

In [47]:
# Define function to create brochure using selected links
def fetch_page_and_all_relevant_links_fmodel(api_key, base_url, model_name, website_url):
    content = scraper.fetch_text_contents(website_url)
    relevant_links = select_relevant_links(base_url=base_url, model_name=model_name, api_key=api_key, website_url=website_url)
    #print(relevant_links)

    result = f"## Landing Page: \n\n{content}\n---\n## Relevant Links:\n\n"
    for link in relevant_links['relevant_links']:
        result += f"\n\n### Link: {link['name']}\n"
        result += scraper.fetch_text_contents(link['url'])

    return result

Call Gemini LLM

In [52]:
api_key = os.getenv("GEMINI_API_KEY")
base_url = os.getenv("GEMINI_BASE_URL")
gemini_model = os.getenv("GEMINI_FM")
website_url = "https://huggingface.co"

print(fetch_page_and_all_relevant_links_fmodel(api_key=api_key, base_url=base_url, model_name=gemini_model, website_url=website_url))

Using model: gemini-2.5-flash
## Landing Page: 

Hugging Face – The AI community building the future.

Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
MiniMaxAI/MiniMax-M2
Updated
5 days ago
•
726k
•
977
deepseek-ai/DeepSeek-OCR
Updated
10 days ago
•
2.06M
•
2.41k
moonshotai/Kimi-Linear-48B-A3B-Instruct
Updated
2 days ago
•
15k
•
319
briaai/FIBO
Updated
about 24 hours ago
•
2.84k
•
193
dx8152/Qwen-Edit-2509-Multiple-angles
Updated
about 8 hours ago
•
157
Browse 1M+ models
Spaces
Running
on
CPU Upgrade
1.15k
1.15k
The Smol Training Playbook: The Secrets to Building World-Class LLMs
📝
Running
15.6k
15.6k
DeepSite v3
🐳
Generate any application by Vibe Coding
Running
2.25k
2.25k
Wan2.2 Animate
👁
Wan2.2 Animate
Running
on
Z

## Create system prompt for creating brochure from the content we fetched

In [61]:
brochure_system_prompt = """
    You are an assistant that analyses the contents of several relevant pages from a company website and
    creates a short brochure about the company for prospective customers, investors and recruits.
    Respond in markdown format without code blocks; use horizontal rules (---) to separate sections.
    Include details of company culture, customers and careers/jobs if you have that information.
"""

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# brochure_system_prompt = """
# You are an assistant that analyzes the contents of several relevant pages from a company website
# and creates a short, humorous, entertaining, witty brochure about the company for prospective customers, investors and recruits.
# Respond in markdown without code blocks.
# Include details of company culture, customers and careers/jobs if you have the information.
# """

In [62]:
def get_brochure_user_prompt_frontier_models(company_name, website_url):
    user_prompt = f"""
    You are looking at a company called: {company_name}
    Here are the contents of its landing page and other relevant pages;
    use this information to build a short brochure of the company in markdown format without code blocks.\n\n
    """
    user_prompt += fetch_page_and_all_relevant_links_fmodel(api_key=api_key, base_url=base_url, model_name=gemini_model, website_url=website_url)
    user_prompt = user_prompt[:5_000]  # Truncate if more than 5000 characters
    return user_prompt
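
The slice `[:5_000]` keeps only the first 5,000 characters, so a long landing page can crowd out every linked page. One possible alternative, sketched here as an assumption and not as the notebook's method, is to give each page an equal share of the character budget. The name `truncate_sections` and the equal split are illustrative choices.

```python
def truncate_sections(sections, total_budget=5_000):
    """Trim each section to an equal share of total_budget characters."""
    if not sections:
        return ""
    per_section = total_budget // len(sections)
    return "".join(section[:per_section] for section in sections)


# Illustrative input: a long landing page followed by a shorter linked page
pages = ["L" * 8_000, "A" * 1_000]
combined = truncate_sections(pages, total_budget=5_000)
print(len(combined))
```

With this split, short pages leave part of their share unused. A fuller version could hand that remainder to the longer pages.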

In [63]:
get_brochure_user_prompt_frontier_models("Hugging Face", "https://huggingface.co")

Using model: gemini-2.5-flash


"\n    You are looking at a comany called: Hugging Face\n    Here's  the contents of it's landing page and other relevant pages;\n    use this infformation to build a short broucher of the company in markdown format without code blocks.\n\n\n    ## Landing Page: \n\nHugging Face ‚Äì The AI community building the future.\n\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nMiniMaxAI/MiniMax-M2\nUpdated\n5 days ago\n‚Ä¢\n726k\n‚Ä¢\n979\ndeepseek-ai/DeepSeek-OCR\nUpdated\n10 days ago\n‚Ä¢\n2.06M\n‚Ä¢\n2.41k\nmoonshotai/Kimi-Linear-48B-A3B-Instruct\nUpdated\n2 days ago\n‚Ä¢\n15k\n‚Ä¢\n319\nbriaai/FIBO\nUpdated\n1 day ago\n‚Ä¢\n2.84k\n‚Ä¢\n195\ndx8152/Qwen-Edit-2509-Multiple-angles\nUpdated\nabout 9 hours ago\n‚Ä¢\n159\nBrowse 1M+ model

Create brochure

In [64]:
def create_brochure(company_name, website_url):
    payload = [
        {"role": "system", "content": brochure_system_prompt},
        {"role": "user", "content": get_brochure_user_prompt_frontier_models(company_name, website_url)}
    ]

    json_resp_format = {"type": "text"}
    client = OpenAI(api_key=api_key, base_url=base_url)
    response = client.chat.completions.create(model=gemini_model, messages=payload, response_format=json_resp_format)
    brochure = response.choices[0].message.content

    display(Markdown(brochure))

In [65]:
create_brochure("HuggingFace", "https://huggingface.co")

Using model: gemini-2.5-flash


Hugging Face: The Home of Machine Learning - Building the Future of AI, Together

Hugging Face is the leading platform and vibrant community where the world's machine learning experts, developers, and researchers collaborate on cutting-edge models, diverse datasets, and innovative applications. We are dedicated to accelerating the advancement of artificial intelligence by fostering an open and collaborative environment, truly being "the AI community building the future."

---

**What We Offer**

Hugging Face provides an unparalleled ecosystem designed to create, discover, and collaborate on ML better:

*   **Models:** Explore and utilize over 1 million pre-trained models across various modalities, from advanced language models like MiniMax-M2 to OCR solutions like DeepSeek-OCR.
*   **Datasets:** Access a vast collection of over 250,000 datasets, including specialized resources for autonomous vehicles and curated prompts, to train and fine-tune your ML projects.
*   **Spaces (AI Applications):** Launch and experiment with over 400,000 interactive AI applications, or build your own. These range from tools for generating videos from images to advanced LLM training playbooks, runnable on various compute options.
*   **Collaboration Platform:** Our platform facilitates seamless collaboration, allowing you to host and share an unlimited number of public models, datasets, and applications. Build your ML portfolio and connect with the global AI community.
*   **Open Source Stack:** Leverage the powerful Hugging Face open-source stack to move faster and innovate with confidence.
*   **Multi-Modality Support:** Work with all types of data – text, image, video, audio, and even 3D.

---

**For Our Customers & Partners**

Whether you are an individual researcher, a startup, or a large enterprise, Hugging Face offers solutions tailored to your needs:

*   **Individuals & Teams:** Create, discover, and collaborate on ML projects. Our platform is a launchpad for your innovations, enabling you to share your work and build your machine learning profile within a thriving community.
*   **Team & Enterprise Solutions:** Accelerate your organization's AI initiatives with enterprise-grade compute and platform solutions.
    *   **Team Plans:** Starting at $20/user/month, offering advanced collaborative features.
    *   **Enterprise Hub:** For larger organizations, we provide flexible contract options with paramount features like single sign-on (SSO), granular access controls, region selection for data residency, comprehensive audit logs, and dedicated support to ensure security and compliance at scale. Scale your organization with the world’s leading AI platform, giving your team the most advanced tools to build AI.

---

**Our Community & Culture**

At the heart of Hugging Face is a vibrant, global community driven by the belief that collective intelligence and open collaboration are key to building the future of AI. Our culture fosters sharing, innovation, and continuous learning, providing a space where everyone can contribute to and benefit from the advancements in machine learning. We are genuinely the "AI community building the future."

---

**Join Our Journey**

While specific career opportunities are not detailed in this brochure, Hugging Face is continuously expanding and seeking passionate individuals to contribute to our mission. If you are enthusiastic about machine learning, open source, and building the future of AI, we encourage you to explore career opportunities directly on our website. Become a part of the team that empowers millions of developers and researchers worldwide.

---

**Connect With Hugging Face**

Explore the future of AI with us. Discover models, datasets, and applications, or accelerate your team's ML development today.

Visit HuggingFace.co to learn more, sign up, or contact our sales team for enterprise solutions.

---
---

## Finally - A minor improvement

With a small adjustment, we can stream the results back from the model as they are generated,
with the familiar typewriter animation.

In [68]:
def create_brochure_stream(company_name, website_url):
    payload = [
        {"role": "system", "content": brochure_system_prompt},
        {"role": "user", "content": get_brochure_user_prompt_frontier_models(company_name, website_url)}
    ]

    json_resp_format = {"type": "text"}
    client = OpenAI(api_key=api_key, base_url=base_url)
    
    stream = client.chat.completions.create(model=gemini_model, messages=payload, response_format=json_resp_format, stream=True)

    
    display_handle = display(Markdown(""), display_id=True)
    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        update_display(Markdown(response), display_id=display_handle.display_id)
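
The accumulation logic in the loop above can be tried without calling any API. In this sketch, `fake_stream` is a hypothetical stand-in that yields objects shaped like streamed chat-completion chunks, and `accumulate` mirrors the loop: it concatenates each `delta.content` and treats `None` as an empty string. Skipping chunks that have no choices is an added precaution, not something the notebook's loop does.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Yield objects shaped like chat-completion chunks (simulated, no API call)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def accumulate(stream):
    """Concatenate delta contents, treating None and empty choices as no text."""
    response = ""
    for chunk in stream:
        if not chunk.choices:
            continue
        response += chunk.choices[0].delta.content or ""
    return response


print(accumulate(fake_stream(["# Hug", None, "ging Face"])))
```

In the notebook, every pass through the loop re-renders the full accumulated text with `update_display`, which is what produces the typewriter effect.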


In [69]:
create_brochure_stream("HuggingFace", "https://huggingface.co")

Using model: gemini-2.5-flash


Hugging Face: Empowering the Global AI Community

---

**Introduction**

Hugging Face stands as the premier hub for machine learning, a vibrant community dedicated to building the future of artificial intelligence. We provide the essential platform where developers, researchers, and enterprises collaborate, innovate, and deploy cutting-edge AI technologies across all modalities. We are "The Home of Machine Learning," making it easier to create, discover, and collaborate on ML better.

---

**For Our Valued Customers**

Whether you're an individual developer, a research team, or a large enterprise, Hugging Face accelerates your journey in AI.
*   **Discover and Utilize:** Access an unparalleled collection of over 1 million models, 250,000 datasets, and 400,000 AI applications ("Spaces") covering diverse modalities including text, image, video, audio, and 3D. Easily find pre-trained models for tasks like text generation, image-to-text, or text-to-video, with comprehensive filtering options by parameters, libraries, and more.
*   **Collaborate and Create:** Leverage our platform to host unlimited public models, datasets, and applications, fostering seamless collaboration within your team and with the broader AI community.
*   **Accelerate Development:** Move faster with our robust open-source stack, designed to streamline your ML workflows and empower rapid iteration.
*   **Enterprise-Grade Solutions:** For teams requiring advanced capabilities and enhanced support, we offer paid compute resources and enterprise solutions, providing the most sophisticated platform for building AI with confidence and at scale.

---

**For Prospective Investors**

Hugging Face is at the forefront of the AI revolution, serving as the foundational infrastructure for machine learning innovation globally. Our platform demonstrates significant traction and growth, evidenced by:
*   An expansive ecosystem boasting over 1 million models, 250,000 datasets, and 400,000 thriving applications, with millions of interactions weekly.
*   Strategic partnerships and contributions from industry leaders like Nvidia, DeepSeek-AI, and MiniMaxAI, highlighting our critical role in the ML landscape.
*   A clear path to monetization through enterprise solutions and paid compute services, addressing the growing needs of commercial AI development.
*   A committed and ever-expanding global community that drives continuous innovation and adoption. Investing in Hugging Face means investing in the future of AI itself.

---

**For Future Collaborators and Talent (Careers)**

At Hugging Face, our culture is deeply rooted in **community, collaboration, and open-source innovation**. We believe in empowering individuals and fostering an environment where talent thrives.
*   **Community-Driven:** Join a global network of AI enthusiasts, researchers, and developers who are passionate about pushing the boundaries of machine learning. Your contributions help shape the AI landscape.
*   **Impactful Work:** Contribute to the leading platform that is democratizing AI, with your work potentially reaching and benefiting millions worldwide.
*   **Professional Growth:** We encourage you to "Build your portfolio" and "Share your work with the world," providing unparalleled opportunities to showcase your skills and grow your ML profile within a supportive and dynamic environment. We continually seek bright minds dedicated to open-source ML, community building, and advancing AI.

---

**Join the AI Revolution with Hugging Face!**

Whether you're looking to explore, build, invest, or contribute, Hugging Face is your home in the world of machine learning. Discover, create, and collaborate on the AI applications of tomorrow. Sign up today and be part of the community building the future.