# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.

We will be given a company name and its primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

#### NOTES
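This program does not work with single-page application (SPA) websites: `requests` fetches only the initial HTML and does not run JavaScript, so content rendered in the browser is missing.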


In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display, clear_output
from openai import OpenAI

In [2]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")

MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# Request headers and user-supplied company details

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

# Prompt user for company name and URL
company_name = input("Enter the company name: ")
url = input("Enter the company URL: ")



Enter the company name:  Hugging Face
Enter the company URL:  https://huggingface.co


In [4]:
# A class to represent a Webpage

# The request uses the `headers` dict defined in the previous cell
class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        # Get links on page
        links = [link.get('href') for link in soup.find_all('a')]
        # self.links = [link for link in links if link]
        # Bug fix: drop duplicate links while preserving their original order
        self.links = list(dict.fromkeys(link for link in links if link))

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
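
The `dict.fromkeys` idiom used for `self.links` works because dictionary keys are unique and keep insertion order (guaranteed since Python 3.7). Here is a minimal standalone sketch of just that step. The helper name `unique_links` and the sample hrefs are made up for illustration.

```python
def unique_links(hrefs):
    """Drop empty hrefs and duplicates, keeping first-seen order."""
    # dict keys are unique and keep insertion order, so this de-duplicates
    # without sorting or otherwise reordering the links.
    return list(dict.fromkeys(href for href in hrefs if href))


# Hypothetical hrefs, similar to what link.get('href') can return
# (None when an <a> tag has no href attribute).
sample = ["/", "/models", None, "/models", "", "/docs", "/"]
print(unique_links(sample))
```

Using a `set` instead would also remove duplicates, but it would lose the page order, which makes the list harder to read and the prompt less stable between runs.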


In [5]:
website = Website(url)
website.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 'blog/inference-providers-cohere',
 '/microsoft/bitnet-b1.58-2B-4T',
 '/nari-labs/Dia-1.6B',
 '/HiDream-ai/HiDream-I1-Full',
 '/sand-ai/MAGI-1',
 '/microsoft/MAI-DS-R1',
 '/spaces/enzostvs/deepsite',
 '/spaces/bytedance-research/UNO-FLUX',
 '/spaces/nari-labs/Dia-1.6B',
 '/spaces/InstantX/InstantCharacter',
 '/spaces/jamesliu1217/EasyControl_Ghibli',
 '/datasets/zwhe99/DeepMath-103K',
 '/datasets/Anthropic/values-in-the-wild',
 '/datasets/nvidia/OpenCodeReasoning',
 '/datasets/openai/mrcr',
 '/datasets/OpenGVLab/InternVL-Data',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',
 '/docs/transformers',
 '/docs/diffusers',
 '/docs/safetensors',
 '/docs/huggingface_hub',
 '/docs/tokenizers',
 '/docs/trl',
 '/docs/transformers.js',
 '/docs/smolagents',
 '/docs/peft',
 '/docs/dataset

## First step: Have GPT-4o-mini figure out which 'links' are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "few-shot prompting", in which we provide examples of how it should respond in the prompt (two examples here).

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
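
The prompt below asks the model to expand relative links itself. As a complementary, deterministic sketch (not what this notebook does), the standard library's `urllib.parse.urljoin` can resolve relative links against the page URL. The helper name `absolutize` is hypothetical.

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve relative links against base_url; absolute links pass through."""
    return [urljoin(base_url, link) for link in links]


# Sample links in the three shapes seen in website.links:
# root-relative, path-relative, and already absolute.
links = [
    "/about",
    "blog/inference-providers-cohere",
    "https://github.com/huggingface",
]
for full_url in absolutize("https://huggingface.co", links):
    print(full_url)
```

Resolving links in code leaves the model with only the relevance decision, which is the part that actually needs judgment.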

In [6]:
# few-shot prompt: two example responses
link_system_prompt = """
You are provided with a list of links found on a webpage.
You are able to decide which of the links would be most relevant to include in a brochure about the company,
such as links to an About page, or a Company page, or Careers/Jobs pages.

You should respond in JSON as in this example:

EXAMPLE 1:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}

EXAMPLE 2:
{
    "links": [
        {"type": "company blog", "url": "https://blog.example.com"},
        {"type": "our story", "url": "https://example.com/our-story"}
    ]
}
""".strip()


In [7]:
print(link_system_prompt)

You are provided with a list of links found on a webpage.
You are able to decide which of the links would be most relevant to include in a brochure about the company,
such as links to an About page, or a Company page, or Careers/Jobs pages.

You should respond in JSON as in this example:

EXAMPLE 1:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}

EXAMPLE 2:
{
    "links": [
        {"type": "company blog", "url": "https://blog.example.com"},
        {"type": "our story", "url": "https://example.com/our-story"}
    ]
}


In [8]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \n Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt


In [9]:
print(get_links_user_prompt(website)[:350])

Here is the list of links on the website of https://huggingface.co - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. 
 Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
/
/models
/datasets
/spaces
/posts
/docs
/enterprise


In [10]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)


In [11]:
get_links(url)

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/'},
  {'type': 'models page', 'url': 'https://huggingface.co/models'},
  {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'},
  {'type': 'spaces page', 'url': 'https://huggingface.co/spaces'},
  {'type': 'blog', 'url': 'https://huggingface.co/blog'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'community discussion', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}
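
`json.loads` only confirms that the reply is valid JSON. It does not check that the reply has the `links` / `type` / `url` shape requested in the prompt. A minimal sketch of a defensive filter follows. The helper name `clean_links` and the sample payload are assumptions for illustration, not part of the original notebook.

```python
def clean_links(payload):
    """Keep only entries shaped like {"type": str, "url": "http(s)://..."}."""
    if not isinstance(payload, dict):
        return []
    links = payload.get("links")
    if not isinstance(links, list):
        return []
    cleaned = []
    for item in links:
        if not isinstance(item, dict):
            continue
        link_type, url = item.get("type"), item.get("url")
        if (isinstance(link_type, str) and isinstance(url, str)
                and url.startswith(("http://", "https://"))):
            cleaned.append({"type": link_type, "url": url})
    return cleaned


# Hypothetical model output: one good entry, one relative URL, one entry
# that is missing its "type" key.
sample = {
    "links": [
        {"type": "about page", "url": "https://huggingface.co/enterprise"},
        {"type": "blog", "url": "/blog"},
        {"url": "https://huggingface.co/pricing"},
    ]
}
print(clean_links(sample))
```

Filtering before the next step means a single malformed entry does not cause a `KeyError` or a failed request when each link is fetched.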

## Second step: make the brochure!

Assemble all the details into another prompt for gpt-4o-mini, which writes the brochure.

In [12]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)

    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [13]:
details = get_all_details(url)
print(f"\nGet All Details: {details[:500]}, \n\nLength: {len(details)}")


Get All Details: Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
NEW
Welcome Cohere on the Hub 🔥
Welcome Hyperbolic, Nebius AI Studio, and Novita on the Hub 🔥
Welcome Fireworks.ai on the Hub 🎆
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
th, 

Length: 35563


In [14]:
system_prompt1 = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# An alternative for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

system_prompt2 = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# A short-tempered variant that also prescribes the brochure's section structure
system_prompt3 = (
    "You are an assistant that analyzes the contents of several relevant pages from a company website "
    "and creates a short tempered, irritated, disappointed in the world type of brochure about the company for prospective customers, investors, and recruits. "
    "Respond in markdown. Include details of company culture, customers, and careers/jobs if you have the information. Add emoticons where ever possible.\n\n"

    "Please structure the brochure using the following sections:\n"
    "1. **Introduction**: A brief overview of the company.\n"
    "2. **Company Culture**: Emphasize fun, atmosphere, and any unique cultural elements.\n"
    "3. **Customers**: Mention notable customers or industries.\n"
    "4. **Careers/Jobs**: Highlight career opportunities.\n"
    "5. **Conclusion**: Wrap up with a final lighthearted message.\n"
    "6. Finish the brochure with a very sarcastic and pun-intended mission statement.\n"
)

system_prompt = system_prompt3

In [15]:
print(system_prompt)

You are an assistant that analyzes the contents of several relevant pages from a company website and creates a short tempered, irritated, disappointed in the world type of brochure about the company for prospective customers, investors, and recruits. Respond in markdown. Include details of company culture, customers, and careers/jobs if you have the information. Add emoticons where ever possible.

Please structure the brochure using the following sections:
1. **Introduction**: A brief overview of the company.
2. **Company Culture**: Emphasize fun, atmosphere, and any unique cultural elements.
3. **Customers**: Mention notable customers or industries.
4. **Careers/Jobs**: Highlight career opportunities.
5. **Conclusion**: Wrap up with a final lighthearted message.
6. Finish the brochure with a very sarcastic and pun-intended mission statement.



In [16]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:20_000]  # Truncate to at most 20,000 characters
    return user_prompt
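
The slice `user_prompt[:20_000]` keeps the start of the combined text, so pages gathered last can be cut off entirely. One possible alternative, sketched here under the assumption that the page texts are available as a list of strings, gives every page an equal share of the character budget. The helper name `fit_to_budget` is hypothetical.

```python
def fit_to_budget(sections, limit):
    """Trim each section to an equal share of `limit` characters, then join."""
    if not sections:
        return ""
    share = limit // len(sections)  # characters allowed per section
    return "".join(section[:share] for section in sections)


# Three made-up "pages" of very different lengths
pages = ["A" * 50, "B" * 5, "C" * 50]
combined = fit_to_budget(pages, 30)
print(len(combined), combined)
```

This is deliberately simple: a short page does not pass its unused share to longer ones, so the result can come in under the limit.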

In [17]:
get_brochure_user_prompt(company_name, url)[:1000]

'You are looking at a company called: Hugging Face\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nNEW\nWelcome Cohere on the Hub 🔥\nWelcome Hyperbolic, Nebius AI Studio, and Novita on the Hub 🔥\nWelcome Fireworks.ai on the Hub 🎆\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nmicrosoft/bitnet-b1.58-2B-4T\nUpdated\n3 days ago\n•\n17.4k\n•\n675\nnari-labs/Dia-1.6B\nUpdated\nabout 11 hours ago\n•\n5.67k\n•\n503\nHiDream-ai/HiDream-I1-Full\nUpdated\n1 day ago\n•\n26.8k\n•\n696\nsand-ai/MAGI-1\nUpdated\n1 day ago\n•\n218\nmicrosoft/MAI-DS-

### Define a global variable, brochure_text, which will be used for translation

In [34]:
def stream_brochure(company_name, url):
    global brochure_text  # Access the global variable
    brochure_text = ""    # Initialize
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        # Enhancement using Stream
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    # print(f"\nDisplay Id: {display_handle.display_id}") # An unique Id
    
    for chunk in stream:
        content = chunk.choices[0].delta.content or ''
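        brochure_text += content  # Accumulate the raw text
        # Clean the full accumulated text on every update, so a fence marker split
        # across chunks is still caught and the word "markdown" in the prose survives
        response = brochure_text.replace("```markdown", "").replace("```", "")

        # Update the displayed content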
        update_display(Markdown(response), display_id=display_handle.display_id)
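
Removing fence markers with a blanket `replace` also affects any fenced code that appears inside the brochure itself. A more conservative sketch strips only a fence that wraps the whole response. The helper name `strip_outer_fence` is hypothetical, and it is meant for the fully accumulated text (for example before translation) rather than for partial chunks.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


def strip_outer_fence(text):
    """Remove a fence that wraps the whole text, leaving the body untouched."""
    stripped = text.strip()
    # Try the longer opener first so the "markdown" language tag goes with it.
    for opener in (FENCE + "markdown", FENCE):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):]
            break
    if stripped.endswith(FENCE):
        stripped = stripped[:-len(FENCE)]
    return stripped.strip()


wrapped = FENCE + "markdown\n# Title\nWe respond in markdown.\n" + FENCE
print(strip_outer_fence(wrapped))
```

Text without a wrapping fence passes through unchanged apart from leading and trailing whitespace.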


In [35]:
stream_brochure(company_name, url)

# Welcome to the Hugging Face 🤗 Brouhaha Brochure

## Introduction
So, here we are, introducing **Hugging Face** - the so-called "AI community building the future." It's all about collaboration on models, datasets, and applications. You know, just your typical pretense of making AI accessible while trying to look good while doing it. Buckle up! 😒

---

## Company Culture
"Fun" seems to be the word of choice, but really, let's cut to the chase: everyone’s probably stressing over the next big model that will either save the day or crash spectacularly. The expectation is high as everyone in this tech wonderland seems to be constantly "collaborating" (whatever that means) and the atmosphere is, well, let’s just say it might smell a bit like burnt silicon. Here, you can sift through a mountain of models and datasets while trying to look **cool** and **innovative**. 🎉 Or the opposite. Who knows!

---

## Customers
Ah yes, their notable customers include a cavalcade of big names like **Amazon**, **Google**, and **Microsoft**, touted as being part of a massive 50,000+ organizations using them. But let’s be real – does this actually mean they care? Probably not, unless you’re bringing a hefty paycheck along with your enthusiasm. 😂

---

## Careers/Jobs
If you're looking to join this relentless rollercoaster of a company, there are opportunities galore! You could work in a position that desperately tries to push the boundaries of AI or sit in a corner, watching the tech chaos unfold.🙄 However, be prepared for a work culture that sugarcoats high expectations with the promise of “accelerating ML.” More like "speeding through a minefield," am I right?

---

## Conclusion
So, you think you want to dive into the wonderful world of Hugging Face? Just remember it’s all about community, collaboration, and doing your part to avoid horrendous bugs. 😩 It’s a fun ride until it isn’t. But hey, bring your best self, and maybe you’ll make it out unscathed!

---

## Mission Statement
**Hugging Face: Where dreams of AI are crushed by fancy jargon. Join us, if you dare!** 🙃

## Third step: make the Translated brochure!

Send the brochure accumulated in brochure_text to gpt-4o-mini with a translation prompt, and stream the translated result.

In [36]:
def user_translate_brochure(lang):
    # Clear previous output
    clear_output(wait=True)

    # Stream #2: translate accumulated text
    translation_stream = openai.chat.completions.create(  # Changed from ChatCompletion
        model=MODEL,
        messages=[
            {"role": "user", "content": f"Translate the following to {lang}:\n\n{brochure_text}"} # Global variable
        ],
        stream=True
    )
    
    # Setup display for streaming translation
    display_handle = display(Markdown(""), display_id=True)
    translated_text = ""
    
    for chunk in translation_stream:
        content = chunk.choices[0].delta.content or ""
        if content:
            translated_text += content
            update_display(Markdown(translated_text), display_id=display_handle.display_id)
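
The function above reads the brochure from a global variable and sends a single user message. As a sketch of one alternative, the messages can be built by a small function that takes the text as an argument and adds a system prompt asking the model to keep the markdown intact. The helper name `build_translation_messages` and the system prompt wording are assumptions, not part of the original notebook.

```python
def build_translation_messages(text, lang):
    """Build chat messages asking for a translation that preserves markdown."""
    system = (
        "You are a translator. Preserve all markdown formatting, emoji, "
        "and company names; translate only the prose."
    )
    user = f"Translate the following to {lang}:\n\n{text}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


# Hypothetical brochure snippet
messages = build_translation_messages("## Introduction\nHello!", "French")
for message in messages:
    print(message["role"], "->", message["content"][:40])
```

The returned list can be passed as the `messages` argument of `openai.chat.completions.create`. Passing the text in explicitly also makes the translation step usable on any brochure, not only the most recently streamed one.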


In [37]:
# prompt user for language choice
language_choice = input("Enter the language to translate the brochure into (e.g., 'French'): ")

# translate the brochure and stream the translation
user_translate_brochure(language_choice)

# 欢迎来到 Hugging Face 🤗 Brouhaha 手册

## 介绍
好吧，我们在这里介绍**Hugging Face**——这个被称为“建立未来的 AI 社区”。这完全是关于模型、数据集和应用程序的协作。你知道的，正是那种假装让 AI 变得可接触，同时试图做得看起来不错的典型伪装。系好安全带！😒

---

## 公司文化
“有趣”似乎是首选的词，但其实，让我们直接了当：大家可能都在为下一个重大模型而紧张焦虑，这个模型要么能拯救世界，要么会壮观地崩溃。期望值很高，因为在这个科技奇境中，似乎每个人都在不断“合作”（无论那是什么意思），气氛嘛，可以说可能有点像燃烧的硅。你可以在一堆模型和数据集中筛选，同时试图显得**酷**和**创新**。🎉 或者正好相反。谁知道呢！

---

## 客户
啊，是的，他们显著的客户包括一系列大名鼎鼎的公司，如**亚马逊**、**谷歌**和**微软**，被宣传为超 50,000 个组织的一部分。但让我们真实一点——这真的是说他们在乎吗？可能不，除非你带着丰厚的薪水和热情而来。😂

---

## 职业/工作
如果你想加入这个无情过山车般的公司，机会多得是！你可以在一个拼命试图突破 AI 界限的职位上工作，或者在角落里看着科技混乱的展开。🙄 不过，准备好迎接一种用“加速机器学习”来掩饰高期望的工作文化吧。更像是在“走过雷区”对吧？

---

## 结论
所以，你认为你想要进入 Hugging Face 的奇妙世界？只要记住这完全是关于社区、协作和尽你的能力避免可怕的 bug。😩 这是一个有趣的旅程，直到它变得不那么有趣。但嘿，带上你最好的自己，也许你能平安无事地走出去！

---

## 使命声明
**Hugging Face：在华丽的行话中梦碎的地方。如果你敢，加入我们！** 🙃