# A full business solution

Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.

We will be given a company name and its primary website.

In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key?")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which the prompt includes one example of how the model should respond.
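The relative-to-absolute step can also be done deterministically before the model sees the links. Below is a minimal sketch using the standard library's `urllib.parse`. The helper `resolve_links` is hypothetical (it is not part of this notebook), and it assumes that fragments such as `#endpoints` can be dropped and that only `http`/`https` links matter.

```python
from urllib.parse import urljoin, urldefrag, urlparse

def resolve_links(base_url, links):
    """Turn relative links into absolute ones, dropping duplicates and non-web schemes."""
    seen = set()
    resolved = []
    for link in links:
        # urljoin leaves absolute links untouched and resolves relative ones against base_url
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        # Skip mailto:, javascript: and similar links
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved

sample_links = ["/models", "/pricing#endpoints", "/pricing", "/models", "mailto:hi@example.com"]
print(resolve_links("https://huggingface.co", sample_links))
```

Pre-resolving the links this way would shorten the list sent to the model, while the model still decides which links are relevant.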

In [4]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [10]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [13]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\nLinks (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [14]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result) # turns that string into a Python dictionary.
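The model's reply is parsed with `json.loads`, but nothing checks that it has the expected shape. Below is a minimal sketch of a defensive parser. `parse_links_response` is a hypothetical helper, and the choice to return an empty list on malformed input (rather than raising) is an assumption.

```python
import json

def parse_links_response(raw):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links")
    if not isinstance(links, list):
        return {"links": []}
    # Keep entries that have a string "type" and a full http(s) "url"
    valid = [
        entry for entry in links
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        and entry["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}

raw_reply = (
    '{"links": ['
    '{"type": "about page", "url": "https://huggingface.co/huggingface"}, '
    '{"type": "relative link", "url": "/relative"}, '
    '{"url": "https://example.com/no-type"}]}'
)
print(parse_links_response(raw_reply))
```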

In [16]:
huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/deepseek-ai/DeepSeek-V3-0324',
 '/Qwen/Qwen2.5-Omni-7B',
 '/manycore-research/SpatialLM-Llama-1B',
 '/ds4sd/SmolDocling-256M-preview',
 '/Qwen/Qwen2.5-VL-32B-Instruct',
 '/models',
 '/spaces/ByteDance/InfiniteYou-FLUX',
 '/spaces/enzostvs/deepsite',
 '/spaces/3DAIGC/LHM',
 '/spaces/Trudy/gemini-codrawing',
 '/spaces/starvector/starvector-1b-im2svg',
 '/spaces',
 '/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1',
 '/datasets/glaiveai/reasoning-v1-20m',
 '/datasets/FreedomIntelligence/medical-o1-reasoning-SFT',
 '/datasets/a-m-team/AM-DeepSeek-R1-Distilled-1.4M',
 '/datasets/PixelAI-Team/TalkBody4D',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',


In [17]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'community page', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

## Second step: make the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [18]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
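In `get_all_details`, one unreachable link would abort the whole run. Below is a minimal sketch of a more tolerant loop. `collect_details` and `fake_fetch` are hypothetical names, and the fetch function is passed in as a parameter so the example runs without network access.

```python
def collect_details(landing_text, links, fetch):
    """Concatenate page contents, skipping any link whose fetch raises."""
    result = "Landing page:\n" + landing_text
    skipped = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as error:  # broad on purpose: any failure skips that page
            skipped.append((link["url"], str(error)))
            continue
        result += f"\n\n{link['type']}\n{contents}"
    return result, skipped

def fake_fetch(url):
    """Stand-in for Website(url).get_contents(); fails for URLs containing 'broken'."""
    if "broken" in url:
        raise ValueError("could not load")
    return f"contents of {url}"

example_links = [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "careers page", "url": "https://example.com/broken"},
]
text, skipped = collect_details("Home\n", example_links, fake_fetch)
print(text)
print("Skipped:", skipped)
```

In the notebook, `fetch` could be something like `lambda url: Website(url).get_contents()`.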

In [19]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'discussion page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Br

In [20]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
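Instead of commenting prompts in and out, the tone can be a parameter. Below is a minimal sketch. `build_system_prompt` is a hypothetical helper that reuses the wording of the prompt above and is not part of the notebook.

```python
def build_system_prompt(tone=None):
    """Return the brochure system prompt, optionally with tone adjectives inserted."""
    style = f"short {tone} brochure" if tone else "short brochure"
    return (
        "You are an assistant that analyzes the contents of several relevant pages from a company website "
        f"and creates a {style} about the company for prospective customers, investors and recruits. "
        "Respond in markdown. "
        "Include details of company culture, customers and careers/jobs if you have the information."
    )

print(build_system_prompt())
print(build_system_prompt("humorous, entertaining, jokey"))
```

Adjacent string literals inside parentheses are concatenated by Python, so each piece carries its own trailing space and the sentences do not run together.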

In [21]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
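Slicing at exactly 5,000 characters can cut a line in half. Below is a minimal sketch of a variant that backs off to the last complete line. `truncate_at_line` is a hypothetical helper, and it assumes that losing a partial final line is acceptable.

```python
def truncate_at_line(text, limit=5_000):
    """Truncate text to at most `limit` characters, ending on a line boundary when possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut already falls exactly at the end of a line, keep it as is
    if text[limit] == "\n":
        return cut
    last_break = cut.rfind("\n")
    # With no earlier line break there is nothing to back off to, so keep the hard cut
    return cut[:last_break] if last_break > 0 else cut

print(repr(truncate_at_line("aaa\nbbb\nccc", 6)))
```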

In [22]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\ndeepseek-ai/DeepSeek-V3-0324\nUpdated\n3 days ago\n•\n60.5k\n•\n1.99k\nQwen/Qwen2.5-Omni-7B\nUpdated\n1 day ago\n•\n27.9k\n•\n822\nmanycore-research/SpatialLM-Llama-1B\nUpdated\n9 days ago\n•\n11.6k\n•\n793\nds4sd/SmolDocling-256M-preview\nUpdated\n7 days ago\n•\n48.4k\n•\n1.04k\nQwen/Qwen2.5-VL-32B-Instruct\nUpdated\n4 days ago\n•\n88.3k\n•\n263\nBrowse 1M+ models\nSpaces\nRunning\non\nZero\n

In [23]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [24]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'home page', 'url': 'https://huggingface.co/'}, {'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'discussion forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

---

## Welcome to Hugging Face

**Hugging Face** is at the forefront of the AI revolution, serving as a platform for the global machine learning community to collaborate, innovate, and build the future of artificial intelligence. Our mission is to democratize AI, empowering individuals and organizations to harness the power of machine learning with ease and efficiency.

---

## Our Offerings:

- **Models:** Browse over **1 million models** including state-of-the-art solutions designed for various machine learning tasks.
- **Datasets:** Access around **250,000 datasets** to facilitate your research and development needs.
- **Spaces:** Run and share applications effortlessly on our platform, with over **400,000 applications** ready to explore.
- **Enterprise Solutions:** Tailored services for organizations ensuring top-notch security and support.

### Trending Models
Some of the most popular models this week include:
- **DeepSeek-V3-0324**
- **Qwen/Qwen2.5-Omni-7B**
- **SpatialLM-Llama-1B**

---

## A Thriving Community

At Hugging Face, we are proud to have over **50,000 organizations** leveraging our platform, including industry leaders like:
- **Google**
- **Meta**
- **Amazon**
- **Microsoft**

This vibrant community of users collaborates, shares, and pushes the boundaries of what AI can achieve.

---

## Company Culture

Hugging Face is built on the principles of **collaboration**, **openness**, and **innovation**. Our culture encourages:
- Open-source contributions and sharing knowledge.
- Engaging discussions leading to groundbreaking ideas.
- Supportive environment that fosters creativity and professional growth.

Join a team where your ideas can flourish and where you can make a meaningful impact on the AI landscape!

---

## Careers at Hugging Face

We are constantly on the lookout for talented individuals to join our team. Whether you are a developer, researcher, or community manager, there are various opportunities available. Explore our career page to find roles that align with your skills and interests.

### Why Join Us?
- Work with cutting-edge technology.
- Collaborate with a community of diverse and brilliant minds.
- Contribute to projects that shape the future of AI.

---

## Join Us!

Become part of the Hugging Face community today! Whether you're interested in using our tools, collaborating on projects, or working with us directly, we've got something for everyone.

**Connect with us:**
- [Sign Up](#)
- [Explore our Models](#)
- [Visit our Blog](#)

---

**Hugging Face – Together, let’s build the future of AI!**

## Finally - a minor improvement

With a small adjustment, we can make the results stream back from OpenAI,
with the familiar typewriter animation.

In [25]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
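        # Clean a copy for display: strip the fence markers without deleting the word "markdown" from the brochure text
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)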
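The accumulation logic can be exercised without calling the API. Below is a minimal sketch in which `render_frames` and the `sample` chunks are hypothetical stand-ins for the streamed deltas. It keeps the raw text and strips fence markers from a copy on each step, which is one possible variant of the loop above rather than its exact code.

```python
FENCE = "`" * 3  # the three-backtick marker, built this way to keep the example readable

def render_frames(deltas):
    """Yield the text to display after each streamed delta arrives."""
    raw = ""
    for delta in deltas:
        # Some chunks carry no content, so treat None as an empty string
        raw += delta or ""
        # Strip fence markers from a copy; the raw text stays intact so a
        # fence split across two chunks is still recognised on the next step
        yield raw.replace(FENCE + "markdown", "").replace(FENCE, "")

sample = [FENCE, "markdown\n", "# Title\n", None, "We write markdown.", "\n" + FENCE]
frames = list(render_frames(sample))
print(repr(frames[-1]))
```

Each yielded frame corresponds to one `update_display` call in the notebook, so the last frame is what the reader finally sees.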

In [54]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

---

**Hugging Face: The AI Community Building the Future**

Welcome to Hugging Face, where we empower the machine learning community to create, discover, and collaborate on state-of-the-art AI models, datasets, and applications. Our innovative platform hosts over 1 million models and more than 250,000 datasets, making us the go-to destination for developers, researchers, and enterprises alike.

## Key Offerings

- **Models**: Access a wide variety of models tailored for text, image, video, audio, and 3D tasks. We provide a robust platform for building and deploying machine learning solutions.
  
- **Datasets**: Explore our extensive collection of datasets to support your machine learning projects, ensuring you have the right resources for your training and evaluation needs.

- **Spaces**: Collaborate and run applications effortlessly in our dedicated spaces, showcasing examples such as Gemini Co-Drawing and DeepSite.

- **Enterprise Solutions**: Benefit from our enterprise-grade offerings, which include enhanced security, custom support, and a dedicated platform for your organization's needs.

## Community and Culture

At Hugging Face, we pride ourselves on fostering a collaborative and inclusive community. Our team and users span across multiple domains, including academia, industry, and non-profits, with engagement from organizations like Google, Microsoft, Amazon, and more. 

We believe in the power of open-source, contributing to tools like Transformers and Diffusers, which are fundamental to both research and production in the AI landscape. Our mission is to democratize AI, providing a welcoming space for every practitioner—from newcomers to seasoned experts.

## Careers at Hugging Face

Join us as we build the future of AI! We are constantly looking for talented individuals across various roles. Whether you are an AI researcher, software engineer, or a product manager, at Hugging Face, you will find a workplace that values innovation, collaboration, and the drive to push boundaries. 

- **Culture**: Our culture promotes creativity, accountability, and growth. We encourage team members to share their ideas and explore their passions within the AI domain.
- **Benefits**: We offer competitive salaries, comprehensive benefits, and an environment that prioritizes work-life balance.

## Join Us Today!

Explore our offerings, collaborate with fellow enthusiasts, and make an impact in the world of AI. Whether you are looking to build your portfolio, develop new applications, or start a career in an exciting and growing field, Hugging Face is your partner in innovation.

**Visit us at [Hugging Face](https://huggingface.co)**

Together, let's build the future of AI!