# Business Use Case

## Publish a Company Brochure

This product generates a brochure for a company from its website URL, presenting the key information in a concise, well-formatted way.

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? ")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
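
A page often links to the same destination several times, and some hrefs (in-page anchors, `mailto:` links) are never useful for a brochure. Below is a minimal sketch of a pre-filter that could run before the links are sent to the model. `clean_links` is a hypothetical helper introduced here for illustration; it is not part of the notebook's `Website` class.

```python
def clean_links(links):
    """Drop in-page anchors and mailto/javascript links, then de-duplicate while keeping order."""
    skipped_prefixes = ("#", "mailto:", "javascript:")
    kept = [link for link in links if not link.startswith(skipped_prefixes)]
    # dict.fromkeys preserves first-seen order and removes repeats
    return list(dict.fromkeys(kept))


sample = ["/", "/models", "#page-content", "/enterprise", "/enterprise", "mailto:hi@example.com", "/models"]
print(clean_links(sample))
```

Fewer, cleaner links also mean a shorter prompt, which may reduce token usage on link-heavy pages.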

In [4]:
websiteContent = Website("https://nvidia.com")
websiteContent.links

['https://www.nvidia.com',
 '#page-content',
 'https://www.nvidia.com/en-us/',
 'https://www.nvidia.com/en-us/clara/biopharma/',
 'https://www.nvidia.com/en-us/data-center/dgx-cloud/',
 'https://www.nvidia.com/en-us/gpu-cloud/nemo-llm-service/',
 'https://www.nvidia.com/en-us/omniverse/cloud/',
 'https://docs.nvidia.com/ngc/gpu-cloud/ngc-private-registry-user-guide/index.html',
 'https://www.nvidia.com/en-us/gpu-cloud/',
 'https://www.nvidia.com/en-us/data-center/',
 'https://www.nvidia.com/en-us/data-center/dgx-platform/',
 'https://www.nvidia.com/en-us/data-center/grace-cpu/',
 'https://www.nvidia.com/en-us/data-center/hgx/',
 'https://www.nvidia.com/en-us/edge-computing/products/igx/',
 'https://www.nvidia.com/en-us/data-center/products/mgx/',
 'https://www.nvidia.com/en-us/data-center/products/ovx/',
 'https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/',
 'https://www.nvidia.com/en-us/solutions/autonomous-vehicles/in-vehicle-computing/',
 'https://www.nvidia.com/en-

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!
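
Resolving relative links is the mechanical half of this task and could also be done in code; judging which links are relevant is where the model is most useful. Here is a small sketch using the standard library's `urljoin`, with example hrefs of the kind that appear in the scraped output. This is an illustration only; the notebook itself leaves the resolution to the model.

```python
from urllib.parse import urljoin

base_url = "https://huggingface.co"
hrefs = ["/models", "/pricing#spaces", "https://status.huggingface.co/"]

# urljoin leaves absolute URLs alone and resolves relative ones against the base
absolute = [urljoin(base_url, href) for href in hrefs]
print(absolute)
```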

In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [6]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [7]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [10]:
print(get_links_user_prompt(websiteContent))

Here is the list of links on the website of https://nvidia.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://www.nvidia.com
#page-content
https://www.nvidia.com/en-us/
https://www.nvidia.com/en-us/clara/biopharma/
https://www.nvidia.com/en-us/data-center/dgx-cloud/
https://www.nvidia.com/en-us/gpu-cloud/nemo-llm-service/
https://www.nvidia.com/en-us/omniverse/cloud/
https://docs.nvidia.com/ngc/gpu-cloud/ngc-private-registry-user-guide/index.html
https://www.nvidia.com/en-us/gpu-cloud/
https://www.nvidia.com/en-us/data-center/
https://www.nvidia.com/en-us/data-center/dgx-platform/
https://www.nvidia.com/en-us/data-center/grace-cpu/
https://www.nvidia.com/en-us/data-center/hgx/
https://www.nvidia.com/en-us/edge-computing/products/igx/
https://www.nvidia.com/en-us/data-center/products/mgx/
https://www

In [11]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
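
Even with `response_format={"type": "json_object"}`, the reply is only guaranteed to be JSON, not to follow the shape shown in the prompt. Below is a minimal sketch of a defensive parser; `extract_valid_links` is a hypothetical helper introduced here, and the `https://` check is an assumption about what counts as a usable link.

```python
import json


def extract_valid_links(raw_json):
    """Parse the model's reply and keep only entries with a type and an https URL."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    links = payload.get("links", [])
    if not isinstance(links, list):
        return []
    valid = []
    for entry in links:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        if isinstance(entry.get("type"), str) and isinstance(url, str) and url.startswith("https://"):
            valid.append({"type": entry["type"], "url": url})
    return valid


reply = '{"links": [{"type": "about page", "url": "https://huggingface.co/huggingface"}, {"type": "login", "url": "/login"}]}'
print(extract_valid_links(reply))
```

Filtering here means a malformed or relative entry is dropped quietly instead of raising later when the page is fetched.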

In [12]:
# Anthropic has made their site harder to scrape, so I'm using Hugging Face instead.

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/black-forest-labs/FLUX.1-Kontext-dev',
 '/tencent/Hunyuan-A13B-Instruct',
 '/google/magenta-realtime',
 '/nanonets/Nanonets-OCR-s',
 '/google/gemma-3n-E4B-it',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/ilcve21/Sparc3D',
 '/spaces/OmniGen2/OmniGen2',
 '/spaces/tencent/Hunyuan3D-2.1',
 '/spaces/black-forest-labs/FLUX.1-Kontext-Dev',
 '/spaces',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/institutional/institutional-books-1.0',
 '/datasets/EssentialAI/essential-web-v1.0',
 '/datasets/facebook/seamless-interaction',
 '/datasets/FreedomIntelligence/ShareGPT-4o-Image',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammar

In [13]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'blog', 'url': 'https://huggingface.co/blog'},
  {'type': 'company page',
   'url': 'https://www.linkedin.com/company/huggingface/'},
  {'type': 'discussion forum', 'url': 'https://discuss.huggingface.co'},
  {'type': 'status page', 'url': 'https://status.huggingface.co/'}]}

## Second step: make the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [14]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [15]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/about'}, {'type': 'company page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
black-forest-labs/FLUX.1-Kon

In [16]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [17]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
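
The hard slice at 5,000 characters can cut a word or line in half. As one possible refinement, the sketch below backs up to the last complete line before the limit. `truncate_at_line` is a hypothetical helper, not something the notebook defines.

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, backing up to the last newline if there is one."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut when the first `limit` characters contain no usable newline
    return cut[:last_newline] if last_newline > 0 else cut


sample = "Landing page:\nHugging Face\nModels\nDatasets"
print(repr(truncate_at_line(sample, limit=30)))
```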

In [18]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'discuss page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nblack-forest-labs/FLUX.1-Kontext-dev\nUpdated\nabout 18 hours ago\n•\n12.9k\n•\n826\ntencent/Hunyuan-A13B-Instruct\nUpdated\nabout 12 hours ago\n•\n452\ngoogle/magenta-realtime\nUpdated\n5 days ago\n•\n390\nnanonets/Nanonets-OCR-s\nUpdated\n8 days ago\n•\n202k\n•\n1.22k\ngoogle/gemma-3n-E4B-it\nUpdated\n1 day ago\n•\n5.55k\n•\n222\nBrowse 1M+ models\nSpaces\nRunning\n8.75k\n8.75k\nDeepSite

In [19]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [20]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}]}


```markdown
# Hugging Face Brochure

## Welcome to Hugging Face
**The AI Community Building the Future**

At Hugging Face, we host a collaborative platform that empowers the machine learning (ML) community to innovate and create. With over 1 million models and 250,000 datasets, our mission is to push the boundaries of artificial intelligence through shared knowledge and resources.

---

## Our Offerings
- **Models**: Explore a vast array of machine learning models, from the latest advancements to established classics.
- **Datasets**: Access an extensive collection of datasets tailored for various ML tasks.
- **Spaces**: Engage with applications and tools developed by the community.
- **Enterprise Solutions**: We offer compute and enterprise-grade infrastructure with advanced security features for organizations looking to scale their AI projects.

---

## Our Customers
Hugging Face is trusted by more than 50,000 organizations, including industry leaders like:
- **Meta**
- **Google**
- **Microsoft**
- **Amazon**
- **Intel**

Our client base spans various sectors, including non-profit organizations and tech companies, showcasing the versatility and adaptability of our platform.

---

## Company Culture
We pride ourselves on fostering a diverse and inclusive company culture that nurtures innovation and collaboration. Our community-driven approach emphasizes transparency and participation, allowing individuals from all walks of life to contribute and grow. Our team comprises passionate individuals dedicated to advancing the field of AI and ML.

### Community Engagement
We consistently engage with our community through forums, blogs, and social media platforms, prioritizing open communication and shared learning.

---

## Careers at Hugging Face
We are always looking for smart, driven individuals to join our growing team. At Hugging Face, career possibilities are endless as we constantly innovate and expand our offerings. We value creativity, ambition, and a collaborative spirit. If you're interested in building the future of AI together, explore our [career opportunities](https://huggingface.co/jobs).

---

## Join Us
If you are ready to contribute to a vibrant community that is shaping the future of technology, whether as a customer, investor, or potential team member, Hugging Face is the place for you. 

**Explore our platform today!**

[Visit Hugging Face](https://huggingface.co)

---

_Making machine learning accessible for all._
```

## Finally - a minor improvement

With a small adjustment, we can stream the results back from OpenAI, with the familiar typewriter animation.
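
The accumulation pattern behind streaming can be tried without calling the API. In this sketch, `fake_stream` is a made-up stand-in for the real stream: it yields plain text fragments, with `None` where a real chunk would carry no content.

```python
def fake_stream():
    """Stand-in for the API stream: yields text fragments, with None where a chunk carries no content."""
    yield from ["# Hugging", " Face", None, " Brochure"]


response = ""
for fragment in fake_stream():
    # Mirrors `delta.content or ''`: chunks without content contribute nothing
    response += fragment or ""
    print(repr(response))
```

Each iteration holds the full text so far, which is why the display can simply be redrawn with the accumulated string on every chunk.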

In [21]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
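    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip code fences for display only, so the word "markdown" survives in ordinary text;
        # the raw response is kept intact so a fence split across chunks is still caught
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)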

In [22]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'status page', 'url': 'https://status.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Company Brochure

---

## **About Us**
Hugging Face is a pioneering company in the AI and machine learning community. Our mission is to build the future of AI collaboration by providing a comprehensive platform that connects researchers, developers, and organizations worldwide. From models to datasets and applications, we facilitate discovery and innovation in machine learning, making advanced technology accessible for all.

---

## **Our Offerings**
### **Models**
Explore over 1 million models, including cutting-edge tools from industry leaders like Google, Microsoft, and Amazon. Our models encompass various modalities, including text, image, video, audio, and 3D.

### **Datasets**
Access and share more than 250,000 datasets tailored for every machine learning task. Join our community in building robust datasets to advance research.

### **Spaces**
Utilize our platform’s Spaces to host and collaborate on applications, allowing users to generate and deploy applications quickly.

### **Enterprise Solutions**
With over 50,000 organizations leveraging our tools, we offer enterprise-grade security, access controls, and dedicated support to empower your team in building advanced AI solutions.

- **Compute Services:** Start at $0.60/hour for GPU.
- **Enterprise Pricing:** Starts at $20/user/month.

---

## **Community and Culture**
At Hugging Face, we embrace a vibrant and engaged community committed to open-source values and collaboration. Our team ethos revolves around:

- **Innovation:** Constantly evolving and improving through community feedback.
- **Inclusivity:** Welcoming contributions from diverse backgrounds.
- **Transparency:** Open discussions about our tools and technologies.

Join our community on platforms like GitHub, Twitter, LinkedIn, and Discord to exchange ideas and grow together in AI!

---

## **Careers at Hugging Face**
We are always on the lookout for passionate individuals to join our team. If you believe in the transformative power of AI and want to be part of a supportive and driven community, explore our job openings. We offer flexible work arrangements and a culture that prioritizes growth and innovation.

---

## **Join Us in Building the Future of AI**
Discover the endless possibilities with Hugging Face. Whether you're a researcher, developer, or a business looking to implement AI solutions, we have the tools and community support you need. 

**Contact us today and start your journey with Hugging Face!**

- **Website:** [huggingface.co](https://huggingface.co)
- **Social Media:** [Twitter](https://twitter.com/huggingface) | [LinkedIn](https://www.linkedin.com/company/huggingface/)

---

*Hugging Face - The AI community building the future.*

In [None]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")