## Business Statement:
Create a product that builds a brochure for a company, aimed at prospective clients, investors, and potential recruits.
The inputs are the company name and its primary website.

## Imports

In [1]:
# Importing Libraries

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

## Initialize and constants

In [11]:
## Loading ENV file
load_dotenv(override=True)
API_KEY = os.getenv("OPENAI_API_KEY")

if API_KEY and API_KEY.startswith('sk-proj-') and len(API_KEY)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


## Creating a Website Class

In [12]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        # A timeout keeps a slow or unresponsive site from hanging the notebook
        response = requests.get(url, headers=headers, timeout=10)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [13]:
## Checking the links

ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'ht
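The link list above contains several repeated URLs, and each repeat costs tokens once the list is sent to the model. A minimal sketch of removing duplicates while keeping the original order follows; `unique_links` is a hypothetical helper, not part of the `Website` class.

```python
def unique_links(links):
    """Return links with duplicates removed, keeping first-seen order."""
    # dict keys preserve insertion order (Python 3.7+), so this dedupes stably.
    return list(dict.fromkeys(links))


sample = [
    "https://edwarddonner.com/",
    "https://edwarddonner.com/posts/",
    "https://edwarddonner.com/",
]
print(unique_links(sample))
```

One possible use would be `unique_links(ed.links)` before building the prompt.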

## Getting Relevant links

In [14]:
## Writing the prompt

link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a \
brochure about the company, such as links to an About page, or a Company page, or \
Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [15]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [16]:
get_links_user_prompt(ed)

'Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.\nLinks (some might be relative links):\nhttps://edwarddonner.com/\nhttps://edwarddonner.com/connect-four/\nhttps://edwarddonner.com/outsmart/\nhttps://edwarddonner.com/about-me-and-about-nebula/\nhttps://edwarddonner.com/posts/\nhttps://edwarddonner.com/\nhttps://news.ycombinator.com\nhttps://nebula.io/?utm_source=ed&utm_medium=referral\nhttps://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html\nhttps://patents.google.com/patent/US20210049536A1/\nhttps://www.linkedin.com/in/eddonner/\nhttps://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/\nhttps://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/\nhttps://edwarddonner
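The prompt warns the model that some links might be relative and asks it to return full https URLs. An alternative is to resolve them before prompting. The sketch below uses the standard library's `urljoin` and drops non-web schemes such as `mailto:`; `absolute_links` is a hypothetical helper and `example.com` is a placeholder domain.

```python
from urllib.parse import urljoin, urlparse


def absolute_links(base_url, links):
    """Resolve each link against base_url and keep only http(s) results."""
    resolved = []
    for link in links:
        full = urljoin(base_url, link)  # absolute links pass through unchanged
        if urlparse(full).scheme in ("http", "https"):
            resolved.append(full)
    return resolved


demo = ["/about", "careers", "mailto:hi@example.com", "https://other.org/x"]
print(absolute_links("https://example.com/", demo))
```

Under this approach the "some might be relative links" hint in the user prompt would no longer be needed.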

In [20]:
## Getting Response from Model

def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
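`get_links` trusts that the reply is valid JSON with the expected shape. A defensive sketch that parses the reply and keeps only well-formed entries follows; `parse_link_response` is a hypothetical helper, and the rule of keeping only `https://` URLs is an assumption based on the user prompt's request for full https URLs.

```python
import json


def parse_link_response(raw):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return {"links": []}
    cleaned = [
        {"type": e["type"], "url": e["url"]}
        for e in entries
        if isinstance(e, dict)
        and isinstance(e.get("type"), str)
        and isinstance(e.get("url"), str)
        and e["url"].startswith("https://")  # assumption: only full https URLs are usable
    ]
    return {"links": cleaned}


raw = (
    '{"links": [{"type": "about page", "url": "https://x.co/about"}, '
    '{"type": "bad"}, {"type": "rel", "url": "/careers"}]}'
)
print(parse_link_response(raw))
```

In `get_links`, this could stand in for the final `json.loads(result)` call.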

In [21]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

## Creating a Brochure

In [24]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
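`get_all_details` fetches every link the model returns, so a single unreachable page would raise and abort the whole brochure. Here is a sketch of a more tolerant loop. `collect_pages` and `fake_fetch` are hypothetical names, and the fetch function is passed in so the example runs without network access.

```python
def collect_pages(links, fetch):
    """Concatenate page contents, skipping any link whose fetch raises."""
    sections = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as exc:  # broad on purpose: any failure skips that page
            print(f"Skipping {link['url']}: {exc}")
            continue
        sections.append(f"\n\n{link['type']}\n{contents}")
    return "".join(sections)


def fake_fetch(url):
    """Stand-in fetcher for the demo; fails for URLs containing 'broken'."""
    if "broken" in url:
        raise ConnectionError("unreachable")
    return f"contents of {url}\n"


demo_links = [
    {"type": "about page", "url": "https://a.example/about"},
    {"type": "careers page", "url": "https://broken.example/"},
]
print(collect_pages(demo_links, fake_fetch))
```

With the notebook's class, the call might look like `collect_pages(links["links"], lambda url: Website(url).get_contents())`.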

In [25]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/about'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+

In [26]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [27]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
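The slice in `get_brochure_user_prompt` keeps only the first 5,000 characters, so a long landing page can crowd out the About and Careers pages entirely. One possible alternative, sketched below, gives each page an equal share of the budget. `truncate_sections` is a hypothetical helper and the even split is an assumption, not the notebook's method.

```python
def truncate_sections(sections, total_limit=5_000):
    """Trim each section to an equal share of total_limit characters."""
    if not sections:
        return ""
    per_section = total_limit // len(sections)
    return "".join(text[:per_section] for text in sections)


pages = ["a" * 10, "b" * 10]
print(truncate_sections(pages, total_limit=8))
```

The trade-off is that short pages leave part of their share unused; redistributing the remainder would need a second pass.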

In [28]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'models page', 'url': 'https://huggingface.co/models'}, {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'}, {'type': 'spaces page', 'url': 'https://huggingface.co/spaces'}, {'type': 'posts page', 'url': 'https://huggingface.co/posts'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nmicrosoft/Phi-4-multimodal-instruct\nUpdated\nabout 15 hours ago\n•\n37.6k\n•\n844\nWan-AI/Wan2.1-T2V-14B\nUpdated\n7 days ago\n•\n165k\n•\n829\ndeepseek-ai/DeepSeek-R1\nUpdated\n9 days ago\n•\n4.26M\n•\n10.8k\nallenai/olmOCR-7B-0225-preview\nUpdated\n8 days ago\n•\n84k\n•\n417\nmicrosoft/Phi-4-mini-instruct\nUpdated\nabout 10 hours ago\n•\n47.7k\n•\n271\nBrowse 1M+ model

In [29]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [30]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}]}


# Hugging Face Brochure

## The AI Community Building the Future

Welcome to Hugging Face, a vibrant platform designed for the machine learning community, where collaboration flourishes around models, datasets, and application development. Join thousands of innovators and researchers who are actively shaping the future of AI.

### Our Offerings

- **Collaborative Model Hosting**: Share and discover over **1 million models** optimized for various machine learning tasks.
- **Extensive Datasets**: Access **250,000+ datasets** for diverse applications, perfect for researchers and developers.
- **Innovative Spaces**: Utilize or contribute to a collection of **400,000+ applications**, from text generation to image recognition.
- **Enterprise Solutions**: Leverage our enterprise-grade security, access controls, and dedicated support for seamless deployment.

### Our Customers

With more than **50,000 organizations** relying on our platform, Hugging Face supports a diverse clientele, including top-tier companies such as:
- **Meta** - AI at Meta
- **Amazon**
- **Google**
- **Microsoft**
- **Intel**

### Company Culture

At Hugging Face, community is at the heart of our mission. Our culture is characterized by:
- **Open Source Collaboration**: We believe in building a future powered by collective knowledge and open-source contributions. Join us in enhancing tools like Transformers, Diffusers, and more.
- **Innovation and Learning**: We embrace a culture of continuous learning and experimentation, encouraging team members to explore new ideas in AI and ML.
- **Supportive Environment**: We value diversity and foster an inclusive atmosphere where everyone can thrive and contribute meaningfully.

### Careers at Hugging Face

We are always looking for talented individuals passionate about machine learning and AI technology. Whether you’re a seasoned developer, researcher, or an enthusiastic newcomer, explore job opportunities in roles like:
- **Machine Learning Engineer**
- **Data Scientist**
- **Product Manager**

Join us in our mission to democratize AI and make its benefits accessible to all.

### Join Us Today!

Explore Hugging Face and be part of a growing community of innovators. Create, discover, and collaborate to accelerate your machine learning journey. 

- [Explore Models](https://huggingface.co/models)
- [Sign Up Now](https://huggingface.co/join)

Together, let’s build the future of AI!

---

### Connect With Us
- [GitHub](https://github.com/huggingface)
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://linkedin.com/company/huggingface)
- [Discord](https://discord.com/invite/huggingface)

---

*Hugging Face - Where AI innovation meets community collaboration.*

In [31]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
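The cleanup step in `stream_brochure` removes the word "markdown" wherever it appears, so a sentence in the brochure body that mentions markdown would lose that word too. A narrower sketch that strips only a wrapping code fence follows. `strip_wrapping_fence` is a hypothetical helper, and the fence string is built indirectly so the example can sit inside a fenced block.

```python
FENCE = "`" * 3  # three backticks, built indirectly


def strip_wrapping_fence(text):
    """Remove one leading fence line (with optional language tag) and one trailing fence."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


wrapped = FENCE + "markdown\n# Title\nWe render markdown docs.\n" + FENCE
print(strip_wrapping_fence(wrapped))
```

During streaming, one option is to leave the accumulated `response` untouched and pass `strip_wrapping_fence(response)` to `Markdown(...)` on each update.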

In [33]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'join page', 'url': 'https://huggingface.co/join'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

---

### Welcome to Hugging Face

**The AI Community Building the Future**

Hugging Face is a collaborative platform at the forefront of the machine learning revolution. We unite a vibrant community of innovators, researchers, and developers to create and share advanced machine learning models, datasets, and applications. Our mission is to democratize AI and empower users of all backgrounds to harness its potential in transformative ways.

---

### What We Offer

**Explore**: With over 1 million models and 250,000 datasets available, Hugging Face provides a rich resource for anyone interested in AI. From cutting-edge text, image, and audio processing systems to applications in diverse fields, our environment encourages innovation and exploration.

**Collaborate**: Join a community that thrives on collaboration. Share your work, contribute to open-source projects, and build your reputation in the machine learning space. Collaborate in "Spaces" to host and run applications seamlessly.

**Enterprise Solutions**: For organizations looking for advanced AI solutions, we provide enterprise-grade services with robust security, access controls, and dedicated support. Our enterprise offerings start at $20/user/month, making it easier for teams to deploy and scale AI applications securely.

---

### Company Culture

At Hugging Face, we foster a culture that promotes **innovation, inclusion, and collaboration**. Our diverse team is dedicated to building tools that make AI accessible to everyone. We celebrate knowledge sharing and encourage continuous learning in an open and friendly environment. We believe that the best ideas come from collaboration and diverse perspectives.

---

### Our Customers

More than **50,000 organizations** are leveraging Hugging Face's resources, including major players like:
- **Meta**
- **Amazon**
- **Google**
- **Microsoft**
- **Intel**

These organizations rely on our platform to push the boundaries of AI research and application development.

---

### Careers at Hugging Face

Join us in shaping the future of AI! We are looking for passionate individuals to grow our team. At Hugging Face, you will work with some of the brightest minds in the field, tackling challenging problems in a supportive environment. Whether you're a developer, researcher, or marketer, there’s a place for you to contribute.

Check our [Careers Page](#) for current job openings and learn how you can be part of the exciting journey at Hugging Face.

---

### Join the Hugging Face Community

Come explore our platform, share your insights, and transform the AI landscape with us. Sign up today to start collaborating, modeling, and innovating!

**[Explore Hugging Face](#)**

---

For more information, follow us on our social platforms:
- [GitHub](#)
- [Twitter](#)
- [LinkedIn](#)
- [Discord](#)

Together, let’s build the future of AI!