# A full business solution

Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.

In [None]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
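
The `links` list built above keeps every `href` in document order, so navigation links that appear several times on a page are repeated in the list. Here is a minimal sketch of order-preserving de-duplication. The helper name `unique_links` is hypothetical and not part of the original notebook; it could be applied to `self.links` before the list is sent to the model.

```python
def unique_links(links):
    """Drop empty entries and exact duplicates while keeping first-seen order."""
    # dict preserves insertion order (Python 3.7+), so fromkeys de-duplicates in order
    return list(dict.fromkeys(link for link in links if link))


sample = ["/", "/models", "/enterprise", "/models", None, "/enterprise"]
print(unique_links(sample))  # ['/', '/models', '/enterprise']
```

Fewer repeated links also means a shorter prompt for the link-selection call.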

In [None]:
hf = Website("https://edwarddonner.com")
hf.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',
 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-wit

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!
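
For comparison, the purely mechanical half of this task, turning a relative link such as "/about" into a full URL, can be handled by the standard library; judging which links are relevant is the part that benefits from the model. A minimal sketch, assuming a hypothetical helper `absolutize` that is not part of the original notebook:

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve each link against base_url; already-absolute links pass through unchanged."""
    return [urljoin(base_url, link) for link in links]


print(absolutize("https://company.com", ["/about", "https://other.example/jobs"]))
```

Resolving links up front would let the prompt ask the model only for relevance, at the cost of one extra preprocessing step.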

In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [6]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
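
One way to keep the example in a one-shot prompt well-formed is to build it from a Python dictionary and serialize it with `json.dumps`, rather than typing the JSON by hand. The sketch below reuses the instruction wording from the prompt above; the names `example_response` and `build_link_system_prompt` are hypothetical.

```python
import json

# The one-shot example as data; json.dumps guarantees the serialized form is valid JSON
example_response = {
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"},
    ]
}


def build_link_system_prompt():
    """Assemble the system prompt with a one-shot example serialized from a dict."""
    prompt = (
        "You are provided with a list of links found on a webpage. "
        "You are able to decide which of the links would be most relevant to include "
        "in a brochure about the company, "
        "such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
    )
    prompt += "You should respond in JSON as in this example:\n"
    prompt += json.dumps(example_response, indent=4)
    return prompt


print(build_link_system_prompt())
```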



In [7]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [None]:
print(get_links_user_prompt(hf))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/
https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/
https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineerin

In [9]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
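
The reply is parsed as JSON, but nothing guarantees that it has the shape the prompt asked for. A minimal sketch of a shape check that could run on the parsed result before it is used; `validate_links` is a hypothetical helper, not part of the original notebook, and the rule "keep only entries with a string type and an http(s) URL" is an assumption about what counts as usable.

```python
def validate_links(payload):
    """Return {'links': [...]} containing only well-formed entries from a parsed reply."""
    entries = payload.get("links", []) if isinstance(payload, dict) else []
    if not isinstance(entries, list):
        entries = []
    valid = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        link_type, url = entry.get("type"), entry.get("url")
        # Keep entries that have a text label and an absolute http(s) URL
        if (isinstance(link_type, str) and isinstance(url, str)
                and url.startswith(("http://", "https://"))):
            valid.append({"type": link_type, "url": url})
    return {"links": valid}


reply = {"links": [
    {"type": "about page", "url": "https://huggingface.co/huggingface"},
    {"type": "pricing page", "url": "/pricing"},   # relative: dropped
    {"url": "https://huggingface.co/blog"},        # no type: dropped
]}
print(validate_links(reply))
```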

In [10]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace.

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/mistralai/Devstral-Small-2505',
 '/google/gemma-3n-E4B-it-litert-preview',
 '/ByteDance-Seed/BAGEL-7B-MoT',
 '/google/medgemma-4b-it',
 '/nari-labs/Dia-1.6B',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/Lightricks/ltx-video-distilled',
 '/spaces/NihalGazi/FLUX-Pro-Unlimited',
 '/spaces/stepfun-ai/Step1X-3D',
 '/spaces/ByteDance/DreamO',
 '/spaces',
 '/datasets/disco-eth/EuroSpeech',
 '/datasets/openbmb/Ultra-FineWeb',
 '/datasets/ministere-culture/comparia-conversations',
 '/datasets/nvidia/OpenCodeReasoning',
 '/datasets/nvidia/OpenMathReasoning',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',
 '/docs/tr

In [11]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog', 'url': 'https://huggingface.co/blog'},
  {'type': 'community discussion', 'url': 'https://discuss.huggingface.co'},
  {'type': 'company LinkedIn',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

## Second step: make the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [12]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [13]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'models page', 'url': 'https://huggingface.co/models'}, {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'}, {'type': 'spaces page', 'url': 'https://huggingface.co/spaces'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
mistralai/Devstral-Small-2505
Updated
1 day ago


In [14]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
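
Rather than commenting prompts in and out, the tone can be passed as a parameter. A minimal sketch, assuming a hypothetical helper `build_system_prompt` (not part of the original notebook) that reuses the wording of the prompt above:

```python
def build_system_prompt(tone=None):
    """Return the brochure system prompt, optionally asking for a particular tone."""
    style = f"short {tone} brochure" if tone else "short brochure"
    return (
        "You are an assistant that analyzes the contents of several relevant pages "
        "from a company website "
        f"and creates a {style} about the company for prospective customers, "
        "investors and recruits. Respond in markdown. "
        "Include details of company culture, customers and careers/jobs "
        "if you have the information."
    )


print(build_system_prompt("humorous, entertaining, jokey"))
```

Calling `build_system_prompt()` with no argument gives the neutral version.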


In [15]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
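
Slicing the whole prompt at 5,000 characters means a long landing page can use up the entire budget, so later pages may never reach the model. One possible alternative, not what the notebook does, is to give each page an equal share of the budget. A minimal sketch, assuming a hypothetical helper `assemble_with_budget` that takes `(label, text)` pairs:

```python
def assemble_with_budget(sections, total_chars=5_000):
    """Join (label, text) sections, giving each an equal share of total_chars."""
    if not sections:
        return ""
    per_section = total_chars // len(sections)
    parts = []
    for label, text in sections:
        header = f"{label}\n"
        # Reserve room for the header, then truncate the body to fit this section's share
        body = text[: max(per_section - len(header), 0)]
        parts.append(header + body)
    # The separators add a few characters, so apply the overall cap at the end
    return "\n\n".join(parts)[:total_chars]


pages = [("Landing page:", "a" * 10_000), ("about page", "b" * 10_000)]
prompt = assemble_with_budget(pages, total_chars=1_000)
print(len(prompt), "about page" in prompt)
```

An equal split is the simplest policy; weighting the landing page more heavily would be an equally reasonable choice.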

In [16]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'community discussion page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nmistralai/Devstral-Small-2505\nUpdated\n1 day ago\n•\n45.9k\n•\n484\ngoogle/gemma-3n-E4B-it-litert-preview\nUpdated\n4 days ago\n•\n442\nByteDance-Seed/BAGEL-7B-MoT\nUpdated\n2 days ago\n•\n998\n•\n413\ngoogle/medgemma-4b-it\nUpdated\n3 days ago\n•\n8.49k\n•\n174\nnari-labs/Dia-1.6B\nUpdated\n11 days ago\n•\n178k\n•\n2.37k\nBrowse 1M+ models\nSpaces\nRunning\n7.09k\n7.09k\nDeepSite\n🐳\nGen

In [17]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [18]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}]}


# Hugging Face Brochure

## Overview
**Hugging Face** is at the forefront of artificial intelligence and machine learning, dedicated to building the future by creating an inclusive community for AI professionals, researchers, and enthusiasts. Our collaborative platform enables the discovery and utilization of a vast array of models, datasets, and applications.

## What We Offer
- **1M+ Models**: Explore and collaborate on a diverse range of machine learning models.
- **250k+ Datasets**: Access an extensive selection of datasets for various tasks in machine learning.
- **Spaces**: Run applications and showcase your projects with ease.
- **Compute and Enterprise Solutions**: We offer paid solutions for scalable performance, with GPU access and enterprise-grade security features.

## Community Engagement
Join a thriving community of over 50,000 organizations, including industry giants like Google, Amazon, Microsoft, and Meta. Engage in meaningful collaborations to enhance your machine learning journey.

## Company Culture
At Hugging Face, we are committed to:
- **Open Source Development**: We believe in building robust foundations of ML tooling together with our community. Our libraries are open for contributions and innovations.
- **Support for Innovation**: Promote creativity and personalization in AI with our leading-edge platforms.
- **Inclusivity**: Encourage diverse participation in the AI field by providing accessible resources and tools.

## Careers at Hugging Face
We are always on the lookout for talented individuals to join our expanding team. Hugging Face offers:
- **Flexible Working Environment**: Embrace remote work and flexible hours.
- **Collaborative Team Culture**: Work in a team that values knowledge sharing and collaboration.
- **Skill Development**: Opportunities for continuous learning and professional growth.
- **Diverse Roles**: From machine learning engineers to community managers, we have roles tailored to various skill sets.

## Why Choose Hugging Face?
- **Leading Technology**: Get access to state-of-the-art models and tools like Transformers, Diffusers, and more.
- **Strong Community Support**: Benefit from extensive documentation, community forums, and collaborative spaces.
- **Impactful Work**: Make a difference in the AI landscape by contributing to groundbreaking projects and initiatives.

**Join us at Hugging Face and be a part of the movement to democratize machine learning!** 

### Contact Us
For inquiries about our services, careers, or partnerships, please visit our [website](https://huggingface.co) or connect with us on social media (Twitter, GitHub, LinkedIn).

---

**The AI community building the future.**

## Finally - a minor improvement

With a small adjustment, we can change this so that the results stream back from OpenAI,
with the familiar typewriter animation.

In [19]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
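    raw = ""
    for chunk in stream:
        raw += chunk.choices[0].delta.content or ''
        # Strip fence markers from the full text so far, so a fence split across chunks is still removed
        # and the word "markdown" survives in ordinary prose
        response = raw.replace("```markdown", "").replace("```", "")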
        update_display(Markdown(response), display_id=display_handle.display_id)
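
The accumulation pattern in this loop can be tried without a network call. In the sketch below, plain strings stand in for the text fragments that the real stream exposes at `chunk.choices[0].delta.content`, and `None` mimics a chunk with no content; `fake_stream` and `accumulate` are hypothetical names used only for illustration.

```python
def fake_stream():
    """Yield text fragments the way a streamed reply arrives; None mimics an empty delta."""
    yield from ["# Hugging", " Face", None, " Brochure"]


def accumulate(fragments):
    """Yield the growing response after each fragment, treating None as an empty string."""
    response = ""
    for fragment in fragments:
        response += fragment or ""
        yield response


for partial in accumulate(fake_stream()):
    print(partial)
```

Each printed line is what would be re-rendered in the notebook at that step, which is what produces the typewriter effect.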

In [20]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'community discussion page', 'url': 'https://discuss.huggingface.co'}]}


# Hugging Face Brochure

---

## Welcome to Hugging Face

Hugging Face is an innovative platform at the forefront of the AI community, dedicated to building the future through collaborative machine learning (ML). We offer an extensive range of open-source tools, libraries, and a vibrant community where both newcomers and experts can explore, develop, and deploy state-of-the-art ML models.

### Our Services

- **Models**: Access over 1 million models spanning various functionalities, including text, images, audio, and even 3D data.
- **Datasets**: Browse 250k+ datasets tailored for any ML task.
- **Spaces**: Create and share applications effortlessly, with thousands of live applications running on our platform.
- **Enterprise Solutions**: We provide tailored offerings for enterprises, such as advanced security, access controls, and dedicated support, ensuring your team has the resources required to excel in AI development.

### The Hugging Face Community

More than 50,000 organizations, including leading tech giants like Google, Microsoft, Amazon, and Meta, trust Hugging Face for their machine learning needs. Our collaborative environment empowers individuals and teams to innovate, share knowledge, and drive discovery in AI.

#### Our open-source initiatives include:

- **Transformers**: The state-of-the-art ML library for PyTorch, TensorFlow, and JAX.
- **Diffusers**: Modern implementation of diffusion models for generative tasks.
- **Tokenizers**: Efficiently optimized for both research and production.
- And much more!

### Company Culture

At Hugging Face, we believe in the power of collaboration, transparency, and community-driven growth. Our culture is built on shared learning, where team members encourage each other to share their ideas and breakthroughs. We’re passionate about democratizing access to machine learning, making it possible for everyone to contribute to and benefit from AI advancements.

#### Employee Experience

- **Empowerment**: Team members are encouraged to take ownership of their projects.
- **Supportive Environment**: Continuous learning opportunities and resources to stimulate professional growth.
- **Inclusivity**: A diverse workforce that welcomes individuals from all backgrounds and experiences.

### Join Us

Hugging Face is always on the lookout for talented individuals ready to make a difference in the field of AI. Whether you’re a developer, researcher, or sales professional, we invite you to explore our career opportunities and be part of our mission to create accessible and effective machine learning solutions.

---

**Ready to Collaborate?**

Join us today to explore the world of machine learning, contribute to cutting-edge projects, and become part of a global community dedicated to building the future of AI! 

[Explore Our Site](https://huggingface.co)

--- 

*For inquiries, follow us on our social platforms or visit our blog for the latest updates and insights.*

In [21]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'status page', 'url': 'https://status.huggingface.co'}]}


# Hugging Face Company Brochure

## Welcome to Hugging Face
**The AI community building the future.**   
At Hugging Face, we believe in the power of collaboration to advance the field of machine learning. Our platform connects individuals and organizations to share models, datasets, and applications, creating a vibrant ecosystem for innovation.

---

## Our Offerings

- **Models:**  
  Access and contribute to over **1 million machine learning models**, including industry-leading architectures in text, image, video, audio, and 3D.
  
- **Datasets:**  
  Browse through **250,000+ datasets** available for various machine learning tasks, fostering research and application development.

- **Spaces:**  
  Create and deploy machine learning applications seamlessly. Explore **400k+ applications** built using our models and tools.

- **Enterprise Solutions:**  
  Comprehensive paid compute options and enterprise-grade solutions starting at **$20/user/month**, providing advanced tools, support, and security.

---

## Community Focus
Hugging Face is more than just a platform; we are a **community** of over **50,000 organizations**, including renowned names like Google, Amazon, and Microsoft. Our collaborative spirit drives us to share knowledge, tools, and resources while democratizing access to advanced machine learning capabilities.

We host initiatives that allow users to publish blog articles, share their work, and collaborate on countless projects, reinforcing our commitment to collective growth. 

---

## Company Culture
At Hugging Face, our culture is built on values of **transparency**, **innovation**, and **inclusivity**. We prioritize a collaborative environment where everyone’s voice is heard, encouraging creativity and exploration in the world of AI.

Join our enthusiastic team of **216 members** dedicated to breaking boundaries in machine learning. We are on a mission to democratize access to great machine learning tools and resources.

---

## Careers at Hugging Face
Looking to contribute to the future of AI? We are always on the lookout for passionate individuals to join our growing team. Whether you're an ML engineer, researcher, or developer, there's a place for you at Hugging Face. Explore our [Jobs Page](https://huggingface.co/jobs) to view current opportunities.

---

## Contact Us
For more information, feel free to visit our website [Hugging Face](https://huggingface.co) or follow us on our social media channels to stay updated on the latest news and opportunities.

---

**Join us in building the future of machine learning, one collaboration at a time!**