### BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, to be used with prospective clients, investors and potential recruits.

We will be given a company name and its primary website.

In [1]:
# imports
import warnings
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

# Disable warnings (including SSL warnings)
warnings.filterwarnings('ignore')

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please check it is set correctly in your .env file")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        # verify=False skips TLS certificate checks (test purposes only); timeout avoids hanging on slow sites
        response = requests.get(url, headers=headers, verify=False, timeout=10)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [4]:
an = Website("https://anthropic.com")
an.get_contents()

'Webpage Title:\nHome \\ Anthropic\nWebpage Contents:\nSkip to main content\nSkip to footer\nResearch\nEconomic Futures\nCommitments\nInitiatives\nTransparency\nResponsible Scaling Policy\nTrust center\nSecurity and compliance\nLearn\nLearn\nAnthropic Academy\nEngineering at Anthropic\nDeveloper docs\nCompany\nAbout\nCareers\nEvents\nNews\nTry Claude\nTry Claude\nTry Claude\nLearn more about Claude\nProducts\nClaude\nClaude Code\nClaude Developer Platform\nPricing\nContact sales\nModels\nOpus\nSonnet\nHaiku\nLog in\nClaude.ai\nClaude Console\nEN\nThis is some text inside of a div block.\nLog in to Claude\nLog in to Claude\nLog in to Claude\nDownload app\nDownload app\nDownload app\nResearch\nEconomic Futures\nCommitments\nInitiatives\nTransparency\nResponsible Scaling Policy\nTrust center\nSecurity and compliance\nLearn\nLearn\nAnthropic Academy\nEngineering at Anthropic\nDeveloper docs\nCompany\nAbout\nCareers\nEvents\nNews\nTry Claude\nTry Claude\nTry Claude\nLearn more about Claude\

In [5]:
an.links

['#main',
 '#footer',
 'https://www.anthropic.com/',
 'https://www.anthropic.com/research',
 'https://www.anthropic.com/economic-futures',
 'https://www.anthropic.com/transparency',
 'https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy',
 'http://trust.anthropic.com/',
 'https://www.anthropic.com/learn',
 'https://www.anthropic.com/engineering',
 'https://docs.claude.com',
 'https://www.anthropic.com/company',
 'https://www.anthropic.com/careers',
 'https://www.anthropic.com/events',
 'https://www.anthropic.com/news',
 'https://claude.ai',
 'https://claude.com/product/overview',
 'https://claude.com/product/claude-code',
 'https://claude.com/platform/api',
 'https://claude.com/pricing',
 'https://claude.com/contact-sales',
 'https://www.anthropic.com/claude/opus',
 'https://www.anthropic.com/claude/sonnet',
 'https://www.anthropic.com/claude/haiku',
 'https://claude.ai/login',
 'https://console.anthropic.com',
 '#',
 'https://claude.ai/login',
 'https://cla

## Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which the prompt includes an example of how the model should respond.

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec.
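
The notebook asks the model to expand relative links itself. As an alternative, the links could be resolved in code before they go into the prompt. The sketch below is only an illustration under that assumption: `normalize_links` is a hypothetical helper, not part of the notebook, and it relies solely on the standard library's `urllib.parse`.

```python
from urllib.parse import urljoin, urldefrag

def normalize_links(base_url, links):
    """Resolve relative links against base_url; drop in-page anchors, non-web schemes and duplicates."""
    seen = set()
    result = []
    for link in links:
        # Skip in-page anchors and schemes that are not web pages
        if link.startswith(("#", "mailto:", "tel:", "javascript:")):
            continue
        # urljoin makes the link absolute; urldefrag strips any trailing #fragment
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

sample = ["/", "/models", "/pricing#spaces", "/pricing", "#main", "https://github.com/huggingface"]
print(normalize_links("https://huggingface.co", sample))
```

Doing this up front would shorten the prompt (duplicates such as the repeated `/enterprise` entries disappear) and would mean the model only has to choose links, not rewrite them.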

In [6]:
# one-shot prompting for link extraction
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [7]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [8]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [9]:
print(get_links_user_prompt(an))

Here is the list of links on the website of https://anthropic.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
#main
#footer
https://www.anthropic.com/
https://www.anthropic.com/research
https://www.anthropic.com/economic-futures
https://www.anthropic.com/transparency
https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy
http://trust.anthropic.com/
https://www.anthropic.com/learn
https://www.anthropic.com/engineering
https://docs.claude.com
https://www.anthropic.com/company
https://www.anthropic.com/careers
https://www.anthropic.com/events
https://www.anthropic.com/news
https://claude.ai
https://claude.com/product/overview
https://claude.com/product/claude-code
https://claude.com/platform/api
https://claude.com/pricing
https://claude.com/contact-sales
https://www.anthropic.com/c

In [10]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"} # ensures we get valid JSON back
    )
    result = response.choices[0].message.content
    return json.loads(result)
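
JSON mode gives syntactically valid JSON, but it does not promise that the reply follows the shape shown in the one-shot example. A minimal sketch of a defensive parser follows; `parse_links_reply` is a hypothetical helper (not in the notebook) that could wrap the `json.loads(result)` step, and the fallback type `"page"` is an assumption made for illustration.

```python
import json

def parse_links_reply(raw):
    """Parse the model's reply and keep only well-formed entries with absolute URLs."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}
    clean = []
    for entry in links:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        # Keep only entries whose url is an absolute http(s) address
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            clean.append({"type": str(entry.get("type", "page")), "url": url})
    return {"links": clean}

reply = '{"links": [{"type": "about page", "url": "https://example.com/about"}, {"type": "careers page", "url": "/careers"}]}'
print(parse_links_reply(reply))
```

Entries with a relative or missing URL are dropped here, so later code that fetches each link never receives something it cannot request.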

In [11]:
huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/deepseek-ai/DeepSeek-OCR',
 '/MiniMaxAI/MiniMax-M2',
 '/meituan-longcat/LongCat-Video',
 '/PaddlePaddle/PaddleOCR-VL',
 '/tencent/HunyuanWorld-Mirror',
 '/models',
 '/spaces/khang119966/DeepSeek-OCR-DEMO',
 '/spaces/enzostvs/deepsite',
 '/spaces/Wan-AI/Wan2.2-Animate',
 '/spaces/zerogpu-aoti/wan2-2-fp8da-aoti-faster',
 '/spaces/merterbak/DeepSeek-OCR-Demo',
 '/spaces',
 '/datasets/HuggingFaceFW/finewiki',
 '/datasets/nvidia/PhysicalAI-Autonomous-Vehicles',
 '/datasets/karpathy/fineweb-edu-100b-shuffle',
 '/datasets/HuggingFaceM4/FineVision',
 '/datasets/nick007x/github-code-2025',
 '/datasets',
 '/join',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/inference/models',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/mi

In [12]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'community forum', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

## Second step: make the brochure!

Assemble all the details into another prompt for gpt-4o-mini.

In [None]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()  # get main page contents
    links = get_links(url)  # model call to get relevant links on page
    print("Found links:", links)
    # iterate through relevant links
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        # use the same Website class to get contents, append to result
        result += Website(link["url"]).get_contents()
    return result
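
As written, `get_all_details` stops with an exception if any single link cannot be fetched. One possible way to make the loop tolerant of that is sketched below. `collect_pages` and `fake_fetch` are hypothetical names introduced for illustration; `fetch` stands in for something like `lambda url: Website(url).get_contents()`, so the example runs without network access.

```python
def collect_pages(links, fetch):
    """Concatenate page contents for each link, skipping links whose fetch fails."""
    result = ""
    skipped = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as exc:  # deliberately broad in this sketch
            skipped.append((link["url"], str(exc)))
            continue
        result += f"\n\n{link['type']}\n{contents}"
    return result, skipped


def fake_fetch(url):
    # Stand-in fetcher for the demo: fails for one URL, succeeds otherwise
    if "broken" in url:
        raise ConnectionError("could not connect")
    return f"contents of {url}"


links = [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "careers page", "url": "https://broken.example.com/careers"},
]
text, skipped = collect_pages(links, fake_fetch)
print(text)
print("Skipped:", skipped)
```

Returning the skipped URLs alongside the text keeps failures visible instead of silently shrinking the brochure's source material.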

In [14]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'forum', 'url': 'https://discuss.huggingface.co'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
deepseek-ai/DeepSeek-OCR
Updated
5 days ago
•
1.34M
•
2.23k
MiniMaxAI/MiniMax-M2
Updated
about 17 hours ago
•
286k
•
747
meituan-longcat/LongCat-Video
Updated
1 day ago
•
660
•
202
PaddlePaddle/PaddleOCR-VL
Updated
about 20 hours ago
•
22k
•

In [None]:
# use the results gathered to create a brochure
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [25]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
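
Slicing the whole prompt at 5,000 characters means a long landing page can crowd out every other page. A rough alternative, sketched here under the assumption that the pages are available as separate `(title, text)` pairs, is to give each page an equal share of the budget. `truncate_sections` is a hypothetical helper, not part of the notebook.

```python
def truncate_sections(sections, total_limit=5_000):
    """Give each (title, text) section an equal share of total_limit characters."""
    if not sections:
        return ""
    share = total_limit // len(sections)
    parts = []
    for title, text in sections:
        block = f"{title}\n{text}"
        parts.append(block[:share])
    # The separators add a few characters, so apply the overall limit once more
    return "\n\n".join(parts)[:total_limit]

sections = [
    ("Landing page:", "a" * 10_000),
    ("about page", "b" * 10_000),
]
prompt_body = truncate_sections(sections, total_limit=100)
print(len(prompt_body))
print("about page" in prompt_body)
```

An equal split is the simplest policy; weighting the landing page more heavily would be an equally reasonable choice.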

In [17]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\ndeepseek-ai/DeepSeek-OCR\nUpdated\n5 days ago\n•\n1.34M\n•\n2.23k\nMiniMaxAI/MiniMax-M2\nUpdated\nabout 17 hours ago\n•\n286k\n•\n747\nmeituan-longcat/LongCat-Video\nUpdated\n1 day ago\n•\n660\n•\n202\nPaddlePaddle/PaddleOCR-VL\nUpdated\nabout 20 hours ago\n•\n22k\n•\n1.16k\ntencent/HunyuanWorld-Mirror\nUpdated\n5 days ago\n•\n18.3k\n•\n391\nBrowse 1M+ models\nSpaces\nRunning\non\nZero\n34

In [None]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            # second LLM call; building the user prompt runs get_all_details(), which makes the first call to pick links
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [19]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company blog', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface'}]}


# Hugging Face Brochure

Welcome to Hugging Face – The AI Community Building the Future!

## About Hugging Face

Hugging Face is a collaborative platform that empowers the machine learning community to advance the field through sharing models, datasets, and applications. With a commitment to open-source principles, Hugging Face is dedicated to making machine learning accessible and effective for everyone. 

### Key Offerings

- **Models**: Browse over 1 million models, including state-of-the-art AI models for various tasks.
- **Datasets**: Access and share over 250,000 datasets tailored for different ML tasks.
- **Spaces**: Create and host applications effortlessly, with support for multiple media modalities, including text, image, and video.
- **Enterprise Solutions**: Advanced platform with enterprise-grade security, single sign-on (SSO), and dedicated support tailored for teams.

## Company Culture

At Hugging Face, collaboration and community are at the heart of everything we do. Our culture fosters an inclusive environment where everyone is encouraged to contribute ideas, share knowledge, and innovate. We believe in leveraging the collective strength of our community to build the future of AI together.

### Why Join Us?

- **Innovation**: Be part of a company that is at the forefront of AI and ML technology.
- **Community Impact**: Contribute to open-source projects that make a significant difference in the tech world.
- **Growth Opportunities**: Work alongside some of the best minds in the field and constantly evolve your skills.

## Our Customers

Hugging Face serves over 50,000 organizations globally, including notable companies like:

- **Amazon**
- **Google**
- **Microsoft**
- **Grammarly**

Our platform is utilized by both startups and leading enterprises, providing them with the resources to leverage AI innovation effectively.

## Careers at Hugging Face

Are you looking to take your career to the next level? Join Hugging Face! We are always on the lookout for talented and passionate individuals to join our team. 

**Open Positions Include:**
- Machine Learning Engineers
- Data Scientists
- Software Developers

**Benefits of Working with Us:**
- Competitive salary and equity packages
- Flexible work hours and remote work options
- A culture that values continuous learning and development 

### Join Us

Explore our career opportunities [here](https://huggingface.co/jobs) and become part of a thriving community shaping the future of AI.

## Connect With Us!

Stay updated with our activities, resources, and community projects:
- [Blog](https://huggingface.co/blog)
- [GitHub](https://github.com/huggingface)
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://www.linkedin.com/company/huggingface)

---

Join us in our mission as we build the tools for a more innovative and collaborative future in AI. Together, we can achieve more!

## Finally - a minor improvement

With a small adjustment, we can have the results stream back from OpenAI,
with the familiar typewriter animation.

In [None]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
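    # re-render the accumulated text each time a chunk arrives
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # strip only code-fence markers for display; a blanket replace("markdown", "") would also delete the word from the brochure text
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)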
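
The accumulate-and-clean step in the streaming loop can be tried without an API call. The sketch below simulates the stream with a plain list of strings; `accumulate_stream` is a hypothetical helper, and the sample deltas are made up for illustration. It keeps the raw text separate from the cleaned text, so a fence marker that arrives split across two chunks is still removed.

```python
def accumulate_stream(deltas):
    """Yield the cleaned text-so-far after each streamed delta."""
    response = ""
    for delta in deltas:
        # A delta can be None (the loop above guards against this with `or ''`)
        response += delta or ""
        # Clean the whole accumulated text, not the single chunk
        yield response.replace("```markdown", "").replace("```", "")

# Made-up deltas: the opening fence is split across the first two chunks
fake_deltas = ["```mark", "down\n# Hi", None, "\n```"]
for shown in accumulate_stream(fake_deltas):
    print(repr(shown))
```

Because only the fence markers are removed, the word "markdown" survives if it appears in the brochure body.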

In [23]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Company Brochure

## Welcome to Hugging Face – The AI Community Building the Future

Hugging Face is at the forefront of machine learning innovation, providing a collaborative platform where professionals, researchers, and enthusiasts from around the world can build, share, and explore state-of-the-art machine learning models and applications. 

---

## What We Do

At Hugging Face, we empower the AI community with tools that democratize artificial intelligence. Our platform hosts over **1 million models**, **250,000 datasets**, and a myriad of **interactive applications** through our Spaces feature. Whether you're looking to create, discover, or collaborate on machine learning projects, we've built the resources you need to succeed.

- **Models:** Access cutting-edge models for various tasks across text, image, audio, and more.
- **Datasets:** Explore thousands of datasets tailored for machine learning.
- **Spaces:** Create and share applications with ease.

---

## Our Customers

Join a diverse mix of organizations using Hugging Face to harness the power of AI:
- **Tech Giants**: Companies like Google, Microsoft, and Amazon utilize our platform for their advanced AI research and applications.
- **The Community**: These include academic teams and non-profit organizations dedicated to propelling AI innovation.

Over **50,000 organizations** trust us as their AI partner, demonstrating widespread confidence in our technology and community.

---

## Company Culture

At Hugging Face, we believe in an open, collaborative culture that encourages creativity and exploration. Our community-driven model fosters inclusivity, collaborative projects, and the sharing of knowledge across borders. We are committed to transparency and support a healthy work-life balance, ensuring all team members contribute to a mission-driven environment.

We recognize that the future of AI is built on a foundation of collaboration and transparency, and we actively promote these principles in everything we do.

---

## Careers and Opportunities

Hugging Face is expanding, and we’re always on the lookout for passionate individuals to join our team! By becoming part of our dynamic group, you'll contribute to the foundation of machine learning tooling and help shape the future of AI.

### Current Openings 
- **Software Engineers**
- **Machine Learning Researchers**
- **Community Managers**
- **Product Managers**

### Why Join Us?
- **Innovative Environment:** Work with cutting-edge technology and contribute to impactful projects.
- **Collaborative Culture:** Be part of a diverse team that encourages sharing insights and knowledge.
- **Growth Opportunities:** We support continuous learning and career advancement.

For potential applicants, you can find our current job openings on our [jobs page](https://huggingface.co/jobs).

---

## Join Us Today!

Become part of the revolution in AI development and collaboration. Whether you're looking to enhance your skills, partner on innovative projects, or join our growing team at Hugging Face, we invite you to explore our platform today.

- **Website:** [huggingface.co](https://huggingface.co)
- **Follow us on social media:** 
  - [Twitter](https://twitter.com/huggingface)
  - [LinkedIn](https://linkedin.com/company/huggingface)
  - [GitHub](https://github.com/huggingface)

Let's build the future together!

In [27]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}



# Welcome to Hugging Face! 🤗

### Hugging Face – The AI Community Building the Future! 🌟

At Hugging Face, we’re not just a company; we’re a family. A tech-savvy, code-crunching, model-loving family! The place where machine learning enthusiasts and AI wizards unite to **build the future**. 🛠️✨

---

### What Do We Do? 🤔

We offer a *wonderland* where you can:

- Discover **1M+ AI Models** (No, that’s not a typo!)
- Access **250k+ Datasets** (Enough data to make even your big data dreams blush!)
- Create cutting-edge **AI Applications** in our magical **Spaces** (Think of them as cozy online coffee shops for coders).

**This week’s hottest models include:**

- **DeepSeek-OCR**: The model that sees everything! (Just kidding! Mostly text.)
- **MiniMax-M2**: Not just a movie title, but a ML superstar! 🍿
- And the ever-mysterious **LongCat-Video**! (We’re still investigating its origins... 🐱)

---

### Why Join the Hugging Face Family? 💖

**Culture of Collaboration**: 
- Here, the only competition is who can come up with the best meme about neural networks! 🤖😄
- We thrive on helping others grow, whether you’re in our office or halfway across the globe!

**For the Customers**:
- **Join over 50,000 organizations** using our platform! From Amazon to Google, we’re quite the popular kids on the block. We’re like the "it" model at the AI prom.
  
---

### Careers at Hugging Face: Your New Home! 🏠

Are you ready to embark on an adventure in AI? We’re on the lookout for bold explorers (that’s you!) eager to dive into Pytorch, help train transformer models, or pair up with SafetyTensor (a model with quite a bit of weight on its shoulders! 🏋️‍♂️).

- **Current Job Openings**: Check out our nifty jobs page because we want to hear from you! Don’t worry if you don’t understand everything – we hire for potential, not perfection. 😉

---

### Join the Community! 

Got an idea for a model that will solve world hunger, or maybe just generate cat memes? We have a **forum** for that! Connect with others, swap tips, and maybe challenge us to a dance-off – though we will not be responsible for any cringe! 💃

---

### Closing Remarks ✌️

Join us at Hugging Face where every day is an opportunity to learn, create, and maybe have a bit of fun along the way! Who says work can't be filled with laughter and AI sorcery? 

*Ready to embrace the future? Sign up today! We promise we’ll give you hugs, virtually speaking of course!*

---

📧 **Contact Us**: [Let’s Talk!](mailto:contact@huggingface.co)   

Follow us on:
- [GitHub](#) | [Twitter](#) | [LinkedIn](#) | [Discord](#)

**Hugging Face**: AI, but make it cozy! 🌈
