# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.

We will be given a company name and its primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [1]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [None]:
ed = Website("https://www.geeksforgeeks.org/machine-learning/machine-learning/")
ed.links

## Using GPT-4o-mini to figure out the relevant links!

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which the prompt includes one example of how the model should respond.
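
The notebook leaves the conversion of relative links to the model. As a complementary sketch, the same step can be done deterministically with the standard library before or after the model call. The helper name `normalize_links` and the example URLs are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urljoin, urlparse

def normalize_links(base_url, links):
    """Resolve relative hrefs against base_url and keep unique http(s) URLs.

    Hypothetical helper for illustration; the notebook itself asks the model to do this.
    """
    seen = set()
    result = []
    for href in links:
        absolute = urljoin(base_url, href.strip())
        # Skip mailto:, tel:, javascript: and other non-web schemes
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

example = normalize_links(
    "https://company.com/",
    ["/about", "careers", "mailto:hi@company.com", "https://company.com/about"],
)
print(example)
# → ['https://company.com/about', 'https://company.com/careers']
```

Pre-resolving links this way could shorten the prompt and remove one source of model error, though the model is still needed to judge which links are relevant.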


In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [6]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [7]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [None]:
print(get_links_user_prompt(ed))

In [15]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
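
`get_links` trusts the model to return the exact JSON shape shown in the system prompt. JSON mode should guarantee syntactically valid JSON, but not that every entry has a usable `url`. The following is a minimal sketch of a defensive parser. The name `parse_link_response` and the fallback type `"page"` are assumptions made for illustration.

```python
import json

def parse_link_response(raw):
    """Parse the model's reply and keep only well-formed link entries.

    Hypothetical helper: returns {"links": []} instead of raising on bad input.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        links = []
    valid = [
        {"type": str(item.get("type", "page")), "url": item["url"]}
        for item in links
        if isinstance(item, dict)
        and isinstance(item.get("url"), str)
        and item["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}

reply = '{"links": [{"type": "about page", "url": "https://x.com/about"}, {"type": "bad", "url": "/relative"}]}'
print(parse_link_response(reply))
# → {'links': [{'type': 'about page', 'url': 'https://x.com/about'}]}
```

Under this sketch, `get_links` could return `parse_link_response(result)` in place of `json.loads(result)`, so that later calls to `Website(link["url"])` only receive absolute URLs.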

In [None]:
# Anthropic has made their site harder to scrape, so I'm using Hugging Face instead.

huggingface = Website("https://huggingface.co")
huggingface.links

In [None]:
get_links("https://huggingface.co")

## Make the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [16]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [None]:
print(get_all_details("https://huggingface.co"))

In [18]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [19]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
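
Truncating the whole prompt at 5,000 characters means a long landing page can push every other page out of the prompt entirely. One possible alternative, sketched here under the assumption that pages are available as `(label, text)` pairs, is to give each page an equal share of the budget. The function `build_details` is hypothetical and does not account for the introductory lines of the user prompt.

```python
def build_details(sections, total_limit=5_000):
    """Join (label, text) sections, giving each an equal share of the character budget.

    Hypothetical alternative to slicing the combined prompt with [:5_000].
    """
    if not sections:
        return ""
    per_section = total_limit // len(sections)
    parts = []
    for label, text in sections:
        header = f"{label}\n"
        # Reserve room for the header and the "\n\n" separator between sections
        body_budget = max(per_section - len(header) - 2, 0)
        parts.append(header + text[:body_budget])
    # Final slice is a safety net in case the headers alone exceed the budget
    return "\n\n".join(parts)[:total_limit]

sections = [("Landing page:", "a" * 10_000), ("about page", "b" * 10_000)]
details = build_details(sections)
print(len(details), "about page" in details)
# → 4998 True
```

An equal split is the simplest policy. Weighting the landing page more heavily would be a reasonable variation.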

In [None]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

In [21]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [22]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}


# Hugging Face Brochure

---

## **Welcome to Hugging Face**

### **The AI Community Building the Future**

At Hugging Face, we are committed to revolutionizing the field of Artificial Intelligence through community collaboration and cutting-edge technology. Our platform serves as a vital hub for machine learning enthusiasts and professionals to create, discover, and share models, datasets, and applications.

---

## **What We Offer**

### **Models & Datasets**
- **1M+ Models**: Browse an extensive repository of state-of-the-art models such as Qwen and innovative audio generation models.
- **250K+ Datasets**: Access and share a diverse range of datasets for any machine learning tasks.

### **Spaces**
- Create, run, and share your applications seamlessly with over 11k active Spaces, including various examples for coding and audio processing.

### **Community Collaboration**
Join a thriving community that includes over 50,000 organizations, like Google, Meta, Amazon, and Microsoft, leveraging our open-source technology for transformative AI applications.

---

## **Our Culture**

At Hugging Face, we believe in fostering a culture of **collaboration, innovation**, and **inclusivity**. We are on a mission to democratize machine learning and empower creators and developers of all backgrounds to contribute to the future of AI. Our open-source philosophy encourages sharing knowledge and building tools together.

---

## **Why Choose Hugging Face?**

- **Enterprise Solutions**: Tailored offerings for businesses that require advanced features, including security, dedicated support, and enhanced collaboration tools.
- **Open Source Commitment**: With a robust collection of essential libraries and frameworks—like Transformers, Diffusers, and Tokenizers—we harness the collective power of community contributions.
- **Accelerated Learning & Development**: Leverage our user-friendly platform to enhance your machine learning skills and build your portfolio.

---

## **Careers at Hugging Face**

Join our vibrant team at Hugging Face! We are constantly looking for enthusiastic individuals passionate about AI and sharing knowledge. Benefits of being a part of our team include:

- **Dynamic Work Environment**: Collaborate with like-minded innovators in a flexible work setting.
- **Professional Growth**: Access learning opportunities and growth paths in cutting-edge AI research and technology.
- **Impactful Work**: Contribute to industry-leading projects that shape the future of AI and machine learning.

Explore current openings on our [Jobs page](https://huggingface.co/jobs).

---

## **Get Involved**

Join us on our mission to build the future of AI! Whether you are a company, a researcher, or an aspiring developer, there’s a place for you in our community. 

Feel free to sign up on our [website](https://huggingface.co) and embark on your AI journey with Hugging Face today!

---

### **Connect with Us!**
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://linkedin.com/company/huggingface)
- [Discord](https://discord.gg/huggingface)

---

Together, let's build a future where AI is accessible and beneficial for all!

## Finally - a minor improvement

With a small adjustment, we can stream the results back from OpenAI, giving the familiar typewriter animation.

In [23]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
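        response += chunk.choices[0].delta.content or ''
        # Strip code-fence markers from a display copy only, so the word "markdown" survives in the brochure text
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)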
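
The core of the streaming loop is plain string accumulation: each chunk carries a small piece of text (or `None`), and the display is redrawn with everything received so far. The sketch below reproduces that pattern without calling the API. The generator `accumulate_stream` and the list `fake_deltas` are stand-ins invented for illustration.

```python
def accumulate_stream(deltas):
    """Yield the growing response text after each streamed piece.

    Hypothetical stand-in for the loop body in stream_brochure; `deltas` plays the
    role of chunk.choices[0].delta.content, which can be None for some chunks.
    """
    response = ""
    for delta in deltas:
        response += delta or ""
        yield response

fake_deltas = ["# Hug", "ging Face", None, " Brochure"]
snapshots = list(accumulate_stream(fake_deltas))
print(snapshots[-1])
# → # Hugging Face Brochure
```

Each element of `snapshots` corresponds to one call to `update_display`, which is what produces the typewriter effect in the notebook.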

In [24]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## Welcome to Hugging Face

At Hugging Face, we are on a mission to build the future of artificial intelligence. We are an enthusiastic AI community dedicated to developing, sharing, and enhancing machine learning models, datasets, and applications. Our platform serves as a collaborative space for machine learning engineers, researchers, and enthusiasts alike.

### Our Offerings

- **Models**: Access over 1 million models across various domains, including text, image, video, and audio. Our trending models like Qwen/Qwen3 Coder and BosonAI's Higgs audio showcase state-of-the-art capabilities continually updated for optimal performance.

- **Datasets**: Browse through an extensive collection of over 250,000 datasets curated for diverse ML tasks, ensuring a rich resource pool for any project requirement.

- **Spaces**: Explore innovative applications running on our platform, such as the Higgs Audio Demo and DeepSite v2, which allow users to generate and test their ML applications seamlessly.

- **Enterprise Solutions**: Offer your organization the best in AI development with our paid compute and enterprise options. Our enterprise solutions come with dedicated support, security features, and compliance options to cater to teams of any size.

### Community and Collaboration

Hugging Face thrives on collaboration, providing a space where over 50,000 organizations, including major players like Google, Meta, and Microsoft, join forces to fuel innovation in AI. Our tools promote knowledge sharing and community building, enabling you to connect with likeminded individuals to push the boundaries of what is possible with machine learning.

### Careers at Hugging Face 

Join our diverse and vibrant team! At Hugging Face, we are not only building cutting-edge AI tools but also fostering a company culture that is inclusive, innovative, and community-focused. We value creativity, collaboration, and continuous learning, providing our team with numerous opportunities to grow and make an impact in the world of AI.

### Why Choose Hugging Face?

- **Cutting-Edge Technology**: We provide state-of-the-art ML tools and frameworks, empowering you to build and deploy your AI applications with ease.
  
- **Open Source Commitment**: We are committed to open-source principles, which reverberates through our extensive library of tools like Transformers, Diffusers, and Tokenizers designed for seamless integration in research and production.

- **User-Centric Approach**: Whether you are just starting out or you are an expert, our user-friendly platform is intuitive and designed to meet users' varied needs.

### Get Started Today!

Become part of our thriving community and start your journey in AI with Hugging Face. Explore our models, datasets, and applications, or join our enterprise solutions for a more tailored experience. 

**[Sign Up Now](#)** to unleash the potential of AI!

---
Contact us at [Twitter](https://twitter.com/huggingface), [LinkedIn](https://www.linkedin.com/company/huggingface/), or visit our website for more information. Together, let's build the future of AI!