## Business Challenge

Create a product that builds a brochure for a company, to be used by prospective clients, investors, and potential recruits.

We will be provided with a company name and its primary website.


In [4]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [5]:
# Initialization and constants
load_dotenv(override=True)
api_key = os.getenv("OPENAI_API_KEY")

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")

MODEL = 'gpt-4o-mini'
openai = OpenAI()
    

API key looks good so far


In [6]:
# A class to represent a webpage

# Some websites only respond properly when the request carries browser-like headers

headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}
class Website:
    """
    A class called Website that scrapes a webpage and, this time, also collects its links
    """
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get("href") for link in soup.find_all("a")]
        self.links = [link for link in links if link]
    def get_contents(self):
        return f"webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [7]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',
 'https://edwarddonner.com/2025/04/21/the-

## First step: Select the relevant links

In [8]:
link_system_prompt = "you are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as the links to an about page, or a Company page, or careers/Jobs pages. \n"
link_system_prompt += "You should respond in JSON as this example:"
link_system_prompt += """
{
    "links" : [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}   
    ]

}
"""
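
The one-shot example in the prompt is typed by hand, so a stray comma or quote would quietly teach the model malformed JSON. One possible way to avoid that is to build the example from a Python dict and serialize it. This is only a sketch: `EXAMPLE_LINKS` and `build_link_system_prompt` are illustrative names that do not appear in the notebook.

```python
import json

# Hypothetical payload mirroring the one-shot example in the prompt above.
EXAMPLE_LINKS = {
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"},
    ]
}


def build_link_system_prompt(example: dict = EXAMPLE_LINKS) -> str:
    """Return a link-selection system prompt whose JSON example is valid by construction."""
    instructions = (
        "You are provided with a list of links found on a webpage. "
        "You are able to decide which of the links would be most relevant to include "
        "in a brochure about the company, such as links to an About page, "
        "a Company page, or Careers/Jobs pages.\n"
        "You should respond in JSON as in this example:\n"
    )
    # json.dumps guarantees the example parses, unlike a hand-written string.
    return instructions + json.dumps(example, indent=4)
```

The wording of the instructions is kept close to the original prompt; only the way the example is produced changes.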




In [9]:
print(link_system_prompt)

you are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as the links to an about page, or a Company page, or careers/Jobs pages. 
You should respond in JSON as this example:
{
    "links" : [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}   
    ]

}



In [10]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
    Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
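
The prompt warns the model that some links might be relative and leaves the resolution to it. As an alternative, the links could be resolved and de-duplicated locally before they are sent, which also shortens the prompt. A minimal sketch using only the standard library; `normalize_links` is a hypothetical helper, not part of the notebook.

```python
from typing import List
from urllib.parse import urljoin, urlparse


def normalize_links(base_url: str, links: List[str]) -> List[str]:
    """Resolve relative links against base_url, keep only http(s), drop duplicates in order."""
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(base_url, link)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, tel: and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


print(normalize_links("https://huggingface.co", ["/models", "/models", "/pricing"]))
```

In the notebook this could be applied to `website.links` before joining them into the user prompt, assuming `website.url` is used as the base.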

In [11]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format.     Do not include Terms of Service, Privacy, email links.
links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/
https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/
https://edwa

In [12]:
# Send a page's links to OpenAI and get back only the relevant ones

def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role":"system", "content":link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
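
`get_links` trusts the reply to be JSON with a `links` list of `{type, url}` objects. A defensive variant might validate the shape before the URLs are fetched. The sketch below is one way to do that; `parse_links_response` is a hypothetical helper, and the fallback label `"page"` is an arbitrary choice.

```python
import json


def parse_links_response(raw: str) -> dict:
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        return {"links": []}
    valid = [
        {"type": str(item.get("type", "page")), "url": item["url"]}
        for item in links
        if isinstance(item, dict)
        and isinstance(item.get("url"), str)
        and item["url"].startswith(("http://", "https://"))  # only absolute web URLs
    ]
    return {"links": valid}
```

With this in place, `get_links` could return `parse_links_response(result)` instead of `json.loads(result)`, so a malformed reply yields an empty list rather than an exception later on.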

In [13]:
get_links("https://www.anthropic.com")

{'links': [{'type': 'homepage', 'url': 'https://www.anthropic.com/'},
  {'type': 'about page', 'url': 'https://www.anthropic.com/company'},
  {'type': 'careers page', 'url': 'https://www.anthropic.com/careers'},
  {'type': 'team page', 'url': 'https://www.anthropic.com/team'},
  {'type': 'research page', 'url': 'https://www.anthropic.com/research'},
  {'type': 'engineering page', 'url': 'https://www.anthropic.com/engineering'},
  {'type': 'events page', 'url': 'https://www.anthropic.com/events'},
  {'type': 'learn page', 'url': 'https://www.anthropic.com/learn'}]}

In [14]:
# Anthropic has made their site harder to scrape, so let's use HuggingFace instead

huggingface = Website("https://huggingface.co")
huggingface.links


['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/mistralai/Magistral-Small-2506',
 '/openbmb/MiniCPM4-8B',
 '/Qwen/Qwen3-Embedding-0.6B-GGUF',
 '/fishaudio/openaudio-s1-mini',
 '/deepseek-ai/DeepSeek-R1-0528',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/ResembleAI/Chatterbox',
 '/spaces/multimodalart/wan2-1-fast',
 '/spaces/aisheets/sheets',
 '/spaces/NihalGazi/Text-To-Speech-Unlimited',
 '/spaces',
 '/datasets/open-thoughts/OpenThoughts3-1.2M',
 '/datasets/nvidia/Nemotron-Personas',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/a-m-team/AM-DeepSeek-R1-0528-Distilled',
 '/datasets/yandex/yambda',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',
 '

In [15]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog', 'url': 'https://huggingface.co/blog'},
  {'type': 'docs', 'url': 'https://huggingface.co/docs'}]}

## Second step: Make the brochure

### Assemble all the details into another prompt to GPT-4o

In [16]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
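
`get_all_details` fetches every selected link unconditionally, so a single unreachable page raises, and a link that points back at the landing page is downloaded twice. A more tolerant loop might look like the sketch below. The `fetch` parameter is an assumption introduced here so the logic can be tried without network access; in the notebook it would be something like `lambda u: Website(u).get_contents()`.

```python
from typing import Callable, Dict, List


def collect_details(landing_url: str, links: List[Dict[str, str]], fetch: Callable[[str], str]) -> str:
    """Concatenate page contents, skipping pages that fail to load and URLs already fetched."""
    result = "Landing page:\n" + fetch(landing_url)
    seen = {landing_url.rstrip("/")}  # compare URLs without a trailing slash
    for link in links:
        url = link["url"]
        key = url.rstrip("/")
        if key in seen:
            continue
        seen.add(key)
        try:
            contents = fetch(url)
        except Exception as exc:  # broad on purpose: any failure just skips that page
            print(f"Skipping {url}: {exc}")
            continue
        result += f"\n\n{link['type']}\n{contents}"
    return result
```

Skipping a failed page keeps the brochure pipeline running on partial information, which seems preferable here to aborting the whole run.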

In [17]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'github page', 'url': 'https://github.com/huggingface'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}
Landing page:
webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
mistralai/Magistral-Small-2506
Updated
1 day ago
•
5.82k
•
348
openbmb/MiniCPM4-8B
Updated
3 days ag

In [18]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information. "

In [20]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000]
    return user_prompt
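
The slice `user_prompt[:5_000]` caps the prompt at 5,000 characters but can cut a line in half. If that matters, the cut could be moved back to the last complete line. A small sketch; `truncate_at_line` is a hypothetical helper, and the 5,000 default simply mirrors the value used above.

```python
def truncate_at_line(text: str, limit: int = 5_000) -> str:
    """Cut text to at most `limit` characters, preferring to end on a complete line."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut when the slice contains no usable newline.
    return cut[:last_newline] if last_newline > 0 else cut


print(truncate_at_line("first line\nsecond line\nthird line", limit=15))
```

Since the scraped text is joined with newlines, ending on a line boundary usually means ending on a complete menu item or sentence.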

In [21]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'contact page', 'url': 'https://discuss.huggingface.co'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nwebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nmistralai/Magistral-Small-2506\nUpdated\n1 day ago\n•\n5.82k\n•\n348\nopenbmb/MiniCPM4-8B\nUpdated\n3 days ago\n•\n3.06k\n•\n207\nQwen/Qwen3-Embedding-0.6B-GGUF\nUpdated\n4 days ago\n•\n14.8k\n•\n341\nfishaudio/openaudio-s1-mini\nUpdated\n10 days ago\n•\n2.38k\n•\n248\ndeepseek-ai/DeepSeek-R1-0528\nUpdated\n15 days ago\n•\n116k\n•\n1.95k\nBrowse 1M+ models\nSpaces\nRunning\n7.98k\n7.98k\nD

In [25]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],    
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [27]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'documentation page', 'url': 'https://huggingface.co/docs'}]}


# Hugging Face Brochure

**Welcome to Hugging Face – The AI Community Building the Future!**

At Hugging Face, we are dedicated to revolutionizing the machine learning landscape by creating an open and collaborative community for developers, researchers, and enterprises. Our platform allows users to explore, create, and share cutting-edge AI technologies.

---

## **Who We Are**

Hugging Face is a leading platform in the AI space, offering a suite of tools and resources aimed at empowering machine learning enthusiasts and professionals. With over 1 million models and 250,000+ datasets at your fingertips, we provide everything necessary to foster collaboration and innovation.

### **Our Mission**
We believe in open-source and transparency as cornerstones of technological advancement. Our contributions to the machine learning community include state-of-the-art libraries like Transformers and Diffusers, which have become essential for researchers and developers globally.

---

## **What We Offer**

### **Collaborative Tools**
- **Models & Datasets**: Explore and collaborate on a myriad of machine learning models and datasets.
- **Spaces**: Run and discover diverse applications such as DeepChat, Text-to-Speech, and more innovative tools to maximize user engagement.
- **Compute Solutions**: Deploy ML applications seamlessly with optimized GPU inference endpoints starting at just $0.60/hour, or explore enterprise solutions for advanced needs.

### **Enterprise Solutions**
Our enterprise offerings ensure full-scale deployment of AI solutions with:
- Enterprise-grade security
- Access controls
- Dedicated support starting at $20/user/month

### **Community Engagement**
Join a vibrant community where over 50,000 organizations—including giants like Google and Microsoft—collaborate on, share, and innovate in AI.

---

## **Company Culture**
At Hugging Face, we cultivate an inclusive and dynamic community through collaboration, learning, and respect for diversity. Our workplace is built on the principles of support and creativity, encouraging every team member to contribute their ideas and grow with the organization.

## **Careers at Hugging Face**
We are always on the lookout for passionate individuals to join our team! Whether you’re an engineer, researcher, or part of our community support, we invite you to contribute to the future of AI and machine learning with us.

- **Benefits**: We offer flexible working hours, a creative work environment, and opportunities for growth and development.
- **Open Positions**: Explore a range of exciting positions available on our [jobs page](https://huggingface.co/jobs).

---

## **Get Involved**
1. **Explore**: Visit our platform to browse our models and datasets.
2. **Join the Community**: Sign up and start collaborating with other AI enthusiasts.
3. **Follow Us**: Keep up with our latest news and advancements through our social media channels – GitHub, Twitter, LinkedIn, and Discord.

**Join us at Hugging Face, where we’re not just building AI – we're fostering a community that inspires and empowers everyone.**

## Finally - a minor improvement

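We can stream the response so that the brochure renders incrementally as it is generated, giving a typewriter-style animation.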

In [32]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role":"system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True  
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip code fences for display only; removing every bare "markdown" would also delete the word from the brochure text
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)
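
The streaming loop works by concatenating each chunk's delta onto the text received so far and re-rendering the whole string. The same accumulation can be tried offline with a simulated stream. This is a sketch under assumptions: `accumulate_deltas` is a hypothetical helper, and the list of strings stands in for the `delta.content` values, which can be `None` (hence the `or ""`).

```python
from typing import Iterable, Iterator, Optional


def accumulate_deltas(deltas: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield the text received so far after each streamed delta, treating None as empty."""
    received = ""
    for delta in deltas:
        received += delta or ""
        yield received


# Simulated stream; the trailing None imitates a chunk that carries no content.
snapshots = list(accumulate_deltas(["# Hugging", " Face", " Brochure", None]))
print(snapshots[-1])
```

Each snapshot corresponds to one `update_display` call in the notebook, which is why the output appears to grow in place.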

In [34]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}]}


# Hugging Face Brochure

**The AI Community Building the Future**

---

## Overview

Hugging Face is at the forefront of artificial intelligence and machine learning, providing a collaborative platform for developers and researchers to share and enhance a vast repository of models, datasets, and applications. Known for its user-friendly interfaces and robust community, Hugging Face empowers individuals and enterprises alike to create and innovate in the realm of AI.

---

## Our Offerings

- **Models**: Access over 1 million models, tailored for a variety of tasks including text, image, audio, and video processing. From innovative state-of-the-art transformers to the latest in diffusion models, our library is constantly evolving.
  
- **Datasets**: Explore and share 250,000+ datasets for machine learning tasks, facilitating seamless collaboration and experimentation.
  
- **Spaces**: Host and deploy applications effortlessly with our Spaces platform, where users can construct applications quickly and effectively.

- **Enterprise Solutions**: Tailored for organizations, our enterprise services include advanced security features and dedicated support to scale AI capabilities efficiently.

---

## Community & Culture

Hugging Face thrives on a vibrant and inclusive community where collaboration and innovation are central to our culture. With over 50,000 organizations using our resources, including industry giants like Google, Amazon, Microsoft, and Meta, we foster a rich environment for knowledge-sharing among AI enthusiasts, professionals, and newcomers alike.

### Join Us!
- **Connect**: Engage with other AI practitioners in our forums and through our GitHub repositories.
- **Contribute**: Whether by improving our open-source tools or sharing your insights, every contribution is valued.

---

## Career Opportunities

Hugging Face is constantly looking for passionate individuals who are eager to contribute to the future of AI. For those who want to be part of a team that values creativity, collaboration, and continuous learning, we offer various opportunities ranging from engineering to product management. 

### Benefits of Working at Hugging Face:
- **Innovative Environment**: Work on cutting-edge technology in a dynamic and supportive setting.
- **Growth Opportunities**: We invest in our employees' professional development and offer ample opportunities for learning.
- **Remote Work Flexibility**: Embrace a work-life balance with flexible working arrangements.

---

## Get In Touch!

Ready to explore the future of AI with us? Check out our website for more information on our offerings, to join our community, or to apply for a job:

[Visit Hugging Face](https://huggingface.co)

---

> **“At Hugging Face, we believe in the transformative power of AI, and we are dedicated to building tools that make it accessible for everyone.”**