# A full business solution


### BUSINESS CHALLENGE:

Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.


In [3]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [4]:
# Initialize and constants

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [5]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [6]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/',
 'https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/',
 'https://edwarddonner.com/2024/08/06/outsmart/',
 'https://edwarddonner.com/2024/08/06/outsmart/',
 'https://edwarddonner.com/2024/06/26/choosing-the-right-llm-resources/

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
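
The prompt asks the model to expand relative links itself, but the same step can also be done deterministically as a safety net. A minimal sketch, assuming a hypothetical helper named `absolutize` (not part of this notebook) built on the standard library's `urllib.parse.urljoin`:

```python
from urllib.parse import urljoin

def absolutize(base_url, links):
    """Resolve relative links against base_url; absolute links pass through unchanged."""
    return [urljoin(base_url, link) for link in links]

print(absolutize("https://company.com", ["/about", "https://other.com/jobs"]))
```

This only handles URL resolution; deciding which links are relevant is still the part that benefits from an LLM.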

In [7]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [8]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [9]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
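
The scraped link lists contain many repeats (for example, `/enterprise` appears several times in the Hugging Face list), and each repeat costs tokens in the prompt. A minimal sketch, assuming a hypothetical helper `unique_links` that is not part of this notebook, which drops duplicates while keeping first-seen order:

```python
def unique_links(links):
    """Drop repeated links while keeping first-seen order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(links))

sample = ["/", "/models", "/enterprise", "/enterprise", "/models"]
print(unique_links(sample))
```

One way to use it would be `"\n".join(unique_links(website.links))` when building the user prompt.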

In [10]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2024/11/13/llm-engineering-resources/
https://edwarddonner.com/2024/11/13/llm-engineering-resources/
https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/
https://edwarddonner

In [11]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
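
JSON mode constrains the reply to be syntactically valid JSON, but it does not promise that the keys match our example. A minimal sketch of a defensive parser, assuming a hypothetical helper `parse_links` (not part of this notebook) that keeps only entries with a string `type` and a string `url`:

```python
import json

def parse_links(raw):
    """Parse the model's reply and return only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        links = []
    valid = [
        item for item in links
        if isinstance(item, dict)
        and isinstance(item.get("type"), str)
        and isinstance(item.get("url"), str)
    ]
    return {"links": valid}
```

It could stand in for the final `json.loads(result)` call if the downstream code should never see a malformed entry.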

In [12]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace.

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/meta-llama/Llama-3.3-70B-Instruct',
 '/tencent/HunyuanVideo',
 '/Datou1111/shou_xin',
 '/black-forest-labs/FLUX.1-dev',
 '/Qwen/QwQ-32B-Preview',
 '/models',
 '/spaces/JeffreyXiang/TRELLIS',
 '/spaces/ginipick/FLUXllama',
 '/spaces/multimodalart/flux-style-shaping',
 '/spaces/Kwai-Kolors/Kolors-Virtual-Try-On',
 '/spaces/Yuanshi/OminiControl',
 '/spaces',
 '/datasets/HuggingFaceFW/fineweb-2',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/CohereForAI/Global-MMLU',
 '/datasets/O1-OPEN/OpenO1-SFT',
 '/datasets/amphora/QwQ-LongCoT-130K',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',
 '/docs/transformers',
 '/docs/diffuse

In [13]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'community page', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}

## Second step: make the brochure!

Assemble all the details into another prompt to gpt-4o-mini.

In [14]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
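
As written, one unreachable link (the model occasionally returns a URL that does not exist) raises an exception and aborts the whole brochure. A minimal sketch of a more forgiving loop, assuming a hypothetical helper `collect_details` that takes the fetch step as a parameter so it can be tried without network access:

```python
def collect_details(links, fetch):
    """Join page contents for each link, skipping pages whose fetch raises."""
    parts = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as err:  # broad on purpose: any failure just skips the page
            print(f"Skipping {link['url']}: {err}")
            continue
        parts.append(f"\n\n{link['type']}\n{contents}")
    return "".join(parts)
```

With the notebook's class, the call might look like `collect_details(links["links"], lambda u: Website(u).get_contents())`.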

In [17]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/about'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Trending on
this week
Models
meta-llama/Llama-3.3-70B-Instruct
Updated
3 days ago
•
102k
•
936
tencent/HunyuanVideo
Updated
6 days ago
•
3.73k
•
978
Datou1111/shou_xin
Updated
4 days ago
•
7.84k
•
273
273
black-forest-labs/FLUX.1-dev

In [18]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [19]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
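
A hard slice at 5,000 characters can cut the prompt in the middle of a word or line. A minimal sketch of a gentler cut, assuming a hypothetical helper `truncate_at_line` (not part of this notebook) that backs up to the last newline before the limit:

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, ending on a complete line when possible."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut if there is no newline to break on
    return text[:cut] if cut > 0 else text[:limit]
```

Because the scraped text is mostly short lines separated by `\n`, this usually loses only a few characters compared with the plain slice.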

In [20]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'contact/discussion page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}


"You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nTrending on\nthis week\nModels\nmeta-llama/Llama-3.3-70B-Instruct\nUpdated\n3 days ago\n•\n102k\n•\n937\ntencent/HunyuanVideo\nUpdated\n6 days ago\n•\n3.73k\n•\n978\nDatou1111/shou_xin\nUpdated\n4 days ago\n•\n7.84k\n•\n273\nblack-forest-labs/FLUX.1-dev\nUpdated\nAug 16\n•\n1.38M\n•\n7.22k\nQwen/QwQ-32B-Preview\nUpdated\n14 days ago\n•\n92.8k\n•\n1.26k\nBrowse 400k+ models\nSpaces\nRunning\non\nZero\n1.08k\n🏢\nTRELLIS\nScalable and Versati

In [21]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [22]:
create_brochure("HuggingFace", "https://huggingface.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.com/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://hugginface.com/enterprise'}]}


# Hugging Face Brochure

## Overview
### Hugging Face – The AI community building the future
Hugging Face is a leading platform in the field of artificial intelligence and machine learning, dedicated to creating a collaborative environment where the community can discover, share, and build upon state-of-the-art models and datasets. With over **400,000 models** and **100,000 datasets**, Hugging Face empowers users to harness the power of machine learning across various modalities including text, images, video, audio, and even 3D creations.

## Our Mission
We are on a mission to **democratize good machine learning**, believing that machine learning tools should be accessible for everyone, enabling innovation and collaboration across various sectors. 

## Our Offerings
Hugging Face provides a robust suite of resources:
- **Models**: Access a diverse range of pre-trained models such as the **Llama-3.3-70B-Instruct** and **HunyuanVideo**, and contribute to the evolution of machine learning.
- **Datasets**: Explore and share extensive datasets tailored for multiple applications including computer vision and natural language processing.
- **Spaces**: Deploy and collaborate on applications in a scalable manner.
- **Enterprise Solutions**: Tailored offerings for organizations seeking advanced tools with added security, user access controls, and dedicated support.

### Trending Resources
- **Models**: Stay updated with trending models being actively developed by our community.
- **Spaces**: Discover innovative applications powered by machine learning.

## Company Culture
Hugging Face fosters a vibrant and inclusive company culture that emphasizes collaboration, open-source ethos, and community involvement. With over **224 team members**, our workforce is composed of diverse talents united by a shared purpose and commitment to advancing AI technology. 

We actively encourage our team to contribute to the open-source community, aligning with our mission to create a **community-driven ecosystem of machine learning tools**. 

### Join Us
Interested in being part of our mission? We’re continuously looking for passionate individuals to join our team. Positions vary across technical, operational, and creative roles, offering the opportunity to shape the future of AI.

## Our Customers
Hugging Face serves **over 50,000 organizations**, including industry giants like **Amazon Web Services, Google, Microsoft**, and several startups striving to leverage AI for transformative purposes. 

## Careers at Hugging Face
Explore job openings across various domains:
- Software Engineering
- Data Science
- Community Management
- Operations and Marketing

Be a part of a dynamic team working on groundbreaking AI technology. 

## Get Started
Join our community today by signing up on our platform or explore our offerings to see how you can leverage machine learning for your own projects.

[**Visit Hugging Face**](https://huggingface.co/)

---

**Connect with Us:**
- GitHub
- Twitter
- LinkedIn
- Discord

Follow us to stay updated on the latest in AI and machine learning!

## Finally - a minor improvement

With a small adjustment, we can change this so that the results stream back from OpenAI,
with the familiar typewriter animation.

In [23]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Clean a display copy so "markdown" is only stripped as part of a code-fence marker
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)
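
The core of streaming is that each chunk carries only a small piece of text (sometimes `None`), and the display is redrawn from the growing total. A minimal sketch of that accumulation, assuming a hypothetical generator `accumulate` and a hand-written list of pieces standing in for `chunk.choices[0].delta.content`, so it runs without calling the API:

```python
def accumulate(deltas):
    """Yield the growing response after each streamed piece; None counts as empty."""
    response = ""
    for delta in deltas:
        response += delta or ""
        yield response

# Stand-in for the pieces a real stream would deliver
fake_deltas = ["# Hug", "ging ", None, "Face"]
snapshots = list(accumulate(fake_deltas))
print(snapshots[-1])
```

Each snapshot corresponds to one `update_display` call in the notebook's loop, which is what produces the typewriter effect.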

In [24]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## About Us
**Hugging Face is a pioneering AI community dedicated to building the future of machine learning (ML).** Our platform serves as a collaborative space where researchers, developers, and organizations come together to share models, datasets, and applications. With over 400,000 models and 100,000 datasets, Hugging Face is the go-to hub for ML innovation.

## Our Mission
At Hugging Face, we aspire to accelerate ML research through collaboration and open-source tools. Our commitment to the community empowers creators worldwide to innovate and build cutting-edge AI solutions.

## Key Offerings

### Models & Datasets
- **Explore 400k+ Models:** Access a diverse range of machine learning models, including state-of-the-art options for text, image, video, and audio processing.
- **100k+ Datasets:** Leverage a rich library of datasets that cater to various ML tasks, or contribute your own to help expand the community's resources.

### Spaces
- Engage with more than 150k applications hosted on Hugging Face Spaces, providing users a platform to build and present their work efficiently. 

### Enterprise Solutions
- **For Organizations:** Get dedicated support and advanced tools with our Enterprise plans, which offer features such as Single Sign-On, priority support, and secure access controls.

## Who We Serve
Hugging Face is trusted by more than 50,000 organizations, including tech giants like Amazon, Google, Microsoft, and Meta. Our diverse clientele ranges from non-profits to large corporations, all leveraging our tools for AI advancements.

## Company Culture
At Hugging Face, we celebrate a culture of openness and collaboration. We believe in harnessing the collective intelligence of the community to drive innovation in AI. Our team is passionate about:

- **Open Source:** We foster an ecosystem where shared knowledge leads to groundbreaking technologies.
- **Innovation:** Continuous learning and experimentation are at the heart of our work ethic. 
- **Inclusivity:** We are dedicated to creating an environment where every voice is heard and valued.

## Careers at Hugging Face
Join our growing team and help shape the future of AI! We are on the lookout for talented individuals across various domains, including but not limited to:

- Machine Learning Engineering
- Research & Development
- Software Development
- Community Management

If you're passionate about AI and eager to make an impact, explore our [career opportunities](#) and become part of our mission!

## Join Us Today!
Discover the possibilities at Hugging Face and become a part of a vibrant community dedicated to pushing the boundaries of machine learning.

- [Sign Up Now](#)
- [Explore Our Models](#)
- [Browse Datasets](#)
- [Learn More About Our Solutions](#)

**Hugging Face – The AI community building the future!**

In [25]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'company page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'github page', 'url': 'https://github.com/huggingface'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'twitter page', 'url': 'https://twitter.com/huggingface'}]}


# Hugging Face Brochure

## Welcome to Hugging Face
**The AI community building the future.**  
Hugging Face is at the forefront of the machine learning revolution, providing a collaborative platform where enthusiasts, researchers, and developers work together on **models**, **datasets**, and **applications**.

---

## Our Offerings
- **Models**: Over **400k** models available for various applications, including trending models such as **Llama-3.3** and **HunyuanVideo**.  
- **Datasets**: Access to more than **100k datasets**, helping users find the right data to fuel their ML projects.  
- **Spaces**: A creative playground with **150k+ applications** for testing and deploying ML projects.  

### Accelerate Your ML Journey
Boost your machine learning efforts using our **open-source stack**. Deploy on optimized inference endpoints or update applications in just a few clicks – starting at **$0.60/hour for GPU**.

### Enterprise Solutions
Hugging Face offers enterprise-grade solutions ideal for businesses including:
- **Dedicated Support**
- **Single Sign-On**
- **Priority Access Controls**

Plans start at **$20/user/month**.

---

## Company Culture
At Hugging Face, we foster a culture of **collaboration and innovation**. We believe in:
- **Open Source**: Building the ML tooling foundation with community input.
- **Diversity and Inclusion**: Creating a welcoming and safe environment for every contributor.
- **Continuous Learning**: Encouraging team members to share knowledge and grow their ML skills.

---

## Our Customers
Join a community of over **50,000 organizations**, including top names like:
- **AI at Meta**
- **Amazon Web Services**
- **Google**
- **Microsoft**

These companies trust Hugging Face to power their AI initiatives.

---

## Careers at Hugging Face
Interested in joining our dynamic team? We offer a variety of positions for creative and driven individuals. Whether you're a developer, a data scientist, or an operations expert, if you're passionate about AI, we want to hear from you!  

Check our **Jobs** page for current openings.

---

## Get Started Today!
Become part of the future of AI at Hugging Face.  
- **[Sign Up](https://huggingface.co/join)**  
- **[Explore Our Models](https://huggingface.co/models)**  
- **[Learn More About Our Services](https://huggingface.co/enterprise)**

*Together, we’re building the future of AI with the power of collaboration.*