This program builds a brochure for a company from the content of its website

Import Statements

In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

Load OpenAI API Key

In [2]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


Website Class

In [3]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [7]:
ed = Website("https://edwarddonner.com")
print(ed.get_contents())
ed.links

Webpage Title:
Home - Edward Donner
Webpage Contents:
Home
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of pr

['https://edwarddonner.com/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/',
 'https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/',
 'https://edwarddonner.com/

System Prompt

In [8]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [9]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



User Prompt

In [10]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
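
The prompt above warns the model that some links might be relative, and the scraped lists also contain duplicates. As a minimal sketch, assuming we wanted to tidy the list before sending it, the links could be resolved and de-duplicated with the standard library. The helper name `normalize_links` and the `mailto:` address are hypothetical and not part of the original notebook.

```python
from urllib.parse import urljoin, urlparse

def normalize_links(base_url, links):
    """Resolve relative links against base_url, drop non-web schemes, and de-duplicate."""
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(base_url, link.strip())
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips mailto:, tel:, javascript: and similar
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

# Sample input; the mailto address is made up for illustration
sample = ["/models", "/pricing#spaces", "/models",
          "mailto:hello@example.com", "https://github.com/huggingface"]
print(normalize_links("https://huggingface.co", sample))
```

Doing this up front would shorten the prompt and spare the model from guessing the base URL, at the cost of one extra step before `get_links_user_prompt`.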

In [13]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/
https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/
https://edwarddonner.com/2024/11/13/llm-engineering-resources/
https://edwarddonner.com/2024/11/13/ll

Get Links Function

In [11]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
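
`get_links` trusts that the reply is valid JSON with a `links` list of `type`/`url` entries. A defensive sketch, assuming we would rather drop malformed entries than fail later, might look like this. `parse_link_reply` is a hypothetical helper, not something the notebook defines.

```python
import json

def parse_link_reply(raw):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(payload, dict):
        return {"links": []}
    entries = payload.get("links", [])
    if not isinstance(entries, list):
        return {"links": []}
    links = [
        entry for entry in entries
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        and entry["url"].startswith(("http://", "https://"))  # full URLs only
    ]
    return {"links": links}

# Sample reply with one good entry, one missing a url, and one relative url
raw_reply = '''{"links": [
    {"type": "about page", "url": "https://huggingface.co/huggingface"},
    {"type": "broken"},
    {"type": "relative", "url": "/pricing"}
]}'''
print(parse_link_reply(raw_reply))
```

The result keeps the same shape as the output of `get_links`, so it could be used wherever that dictionary is consumed.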

In [14]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace instead.

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/hexgrad/Kokoro-82M',
 '/openbmb/MiniCPM-o-2_6',
 '/deepseek-ai/DeepSeek-R1',
 '/MiniMaxAI/MiniMax-Text-01',
 '/microsoft/phi-4',
 '/models',
 '/spaces/hexgrad/Kokoro-TTS',
 '/spaces/JeffreyXiang/TRELLIS',
 '/spaces/lllyasviel/iclight-v2',
 '/spaces/Kwai-Kolors/Kolors-Virtual-Try-On',
 '/spaces/FaceOnLive/Face-Search-Online',
 '/spaces',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/NovaSky-AI/Sky-T1_data_17k',
 '/datasets/HumanLLMs/Human-Like-DPO-Dataset',
 '/datasets/DAMO-NLP-SG/multimodal_textbook',
 '/datasets/FreedomIntelligence/medical-o1-reasoning-SFT',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',
 '/docs/tran

In [15]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'discussion page', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}

Get All Details Function

In [17]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

# We pass in a URL
# The function fetches the landing page
# and adds its contents to the result
# It then calls get_links, which asks gpt-4o-mini to pick the relevant links
# It prints the links that were found
# Finally, it fetches each selected link and adds that page's contents in the same way
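
The flow described in these comments can be sketched offline. The version below is an illustration under assumptions, not the notebook's implementation: `fetch_contents` and `select_links` are hypothetical stand-ins for `Website(url).get_contents()` and `get_links(url)`, and the sketch additionally skips pages that fail to load so one bad link does not abort the whole run.

```python
def collect_details(url, fetch_contents, select_links):
    """Gather landing-page text plus the text of each selected link.

    fetch_contents(url) -> str and select_links(url) -> {"links": [...]} are
    hypothetical stand-ins for Website(url).get_contents() and get_links(url).
    """
    result = "Landing page:\n" + fetch_contents(url)
    for link in select_links(url).get("links", []):
        link_url = link.get("url", "")
        try:
            page = fetch_contents(link_url)
        except Exception as error:  # a failing page should not abort the whole brochure
            print(f"Skipping {link_url}: {error}")
            continue
        result += f"\n\n{link.get('type', 'page')}\n{page}"
    return result

# Offline stand-ins so the sketch runs without network or API access
def fake_fetch(url):
    if "broken" in url:
        raise ConnectionError("page unavailable")
    return f"Contents of {url}\n"

def fake_select(url):
    return {"links": [
        {"type": "about page", "url": f"{url}/about"},
        {"type": "careers page", "url": f"{url}/broken"},
    ]}

details = collect_details("https://example.com", fake_fetch, fake_select)
print(details)
```

Passing the two functions in as arguments keeps the loop testable without calling OpenAI or scraping a live site.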

In [18]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'homepage', 'url': 'https://huggingface.co'}, {'type': 'about page', 'url': 'https://huggingface.co/about'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, 

New System Prompt

In [19]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
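
Instead of commenting lines in and out, the tone could be passed as a parameter. This is only a sketch: `build_system_prompt` is a hypothetical helper, and the wording is copied from the prompt above with a slot for the tone.

```python
def build_system_prompt(tone=""):
    """Return the brochure system prompt, optionally with a tone such as 'humorous'."""
    style = f"{tone} " if tone else ""
    return (
        "You are an assistant that analyzes the contents of several relevant pages from a company website "
        f"and creates a short {style}brochure about the company for prospective customers, investors and recruits. "
        "Respond in markdown. "
        "Include details of company culture, customers and careers/jobs if you have the information."
    )

print(build_system_prompt("humorous, entertaining, jokey"))
```

Adjacent string literals inside parentheses avoid the backslash continuations, which makes missing spaces between sentences easier to spot.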

Get Brochure Function

In [20]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
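
The slice above cuts at exactly 5,000 characters, which can end the prompt mid-word. A small variation, assuming a line-aligned cut is preferred, is sketched below; `truncate_at_line` is a hypothetical helper name.

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, ending on a complete line when possible."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut if there is no newline before the limit
    return text[:cut] if cut > 0 else text[:limit]

sample_text = "alpha\nbravo\ncharlie"
print(repr(truncate_at_line(sample_text, limit=14)))
```

Because scraped page text is joined with newlines, cutting at the last newline before the limit usually drops only a partial menu item or sentence.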

In [21]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'home page', 'url': 'https://huggingface.co'}, {'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'support/community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nTrending on\nthis week\nModels\nhexgrad/Kokoro-82M\nUpdated\n3 days ago\n•\n27.4k\n•\n2.09k\nopenbmb/MiniCPM-o-2_6\nUpdated\nabout 15 hours ago\n•\n18.3k\n•\n682\ndeepseek-ai/DeepSeek-R1\nUpdated\nabout 9 hours ago\n•\n616\nMiniMaxAI/MiniMax-Text-01\nUpdated\n4 days ago\n•\n2.7k\n•\n438\nmicrosoft/phi-4\nUpdated\n12 days ago\n•\n134k\n•\n1.47k\nBrowse 400k+ models\nSpaces\nRunning\non\nZero\n1.24k\n❤️\nKokoro TTS\nNow in 5 languages!\nRunning\non\nZero\n3.16k\n🏢\n

Create Brochure Function

In [22]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [23]:
create_brochure("HuggingFace", "https://huggingface.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.com/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

---

## Welcome to Hugging Face

**The AI community building the future.**

At Hugging Face, we are a vibrant machine learning community that empowers innovators, researchers, and developers to collaborate seamlessly on cutting-edge models, datasets, and applications. Our platform serves as the heartbeat of AI and machine learning advancement, allowing members to create, discover, and share their work with the world.

---

## Our Offerings

### Collaboration Platform
- **Models**: Over 400k models available for exploration and collaboration, including state-of-the-art solutions that run on various modalities like text, image, video, and audio.
- **Datasets**: Access a vast collection of over 100k datasets optimized for a variety of machine learning tasks.
- **Spaces**: Host and run applications effortlessly, with a focus on user experience and scalability.

### Innovations from Hugging Face
- **Transformers**: Leverage the leading ML library compatible with Pytorch, TensorFlow, and JAX.
- **Diffusers**: Explore advanced diffusion models for generating images and audio.
- **Tokens and more**: Fast tokenizers and tools designed for both research and production settings.

---

## Who We Serve

With over **50,000 organizations** utilizing our platform, including industry giants like Meta, Amazon Web Services, Google, and Microsoft, we foster a rich collaborative environment for individuals and enterprises alike. Whether you are a developer, data scientist, or enterprise leader, Hugging Face has the resources to meet your AI and ML needs.

---

## Company Culture

At Hugging Face, we believe in an open-source ethos and community-driven development. We prioritize collaboration, transparency, and continuous learning, fostering a work environment where diversity of thought is celebrated, and innovation thrives. Our mission is not just to provide artificial intelligence tools but to empower every contributor in this exciting field.

---

## Careers at Hugging Face

We are always on the lookout for passionate individuals to join our growing team! Whether you're a developer, researcher, or creative thinker, explore various opportunities to contribute to an exciting and dynamic company focused on shaping the future of AI. 

- **Current Opportunities**: [Explore Jobs](https://huggingface.co/jobs)

---

## Join Us

Become a part of a forward-thinking community that is dedicated to advancing the world of artificial intelligence. Collaborate, innovate, and share your creations with Hugging Face. 

- **Sign Up Today**: [Get Started](https://huggingface.co/join)

---

### Stay Connected
Follow us on:
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://linkedin.com/company/huggingface)
- [GitHub](https://github.com/huggingface)

For more information, visit our website: [Hugging Face](https://huggingface.co)

--- 

**Hugging Face: The AI community building the future.**

Improvement to stream the results from OpenAI

In [24]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip code-fence markers from a copy so the word "markdown" in body text is kept
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)
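
The accumulate-and-redraw pattern in `stream_brochure` can be tried without an API call. In this sketch, a plain list stands in for the values of `chunk.choices[0].delta.content` (which can be `None`), and `accumulate_stream` is a hypothetical helper. It cleans a copy of the text for display and strips only the fence markers, so the word "markdown" in body text survives.

```python
def accumulate_stream(pieces):
    """Yield the cleaned, cumulative text after each streamed piece.

    `pieces` stands in for chunk.choices[0].delta.content values, which can be None.
    """
    raw = ""
    for piece in pieces:
        raw += piece or ""
        # Clean a copy for display; the raw text stays intact for the next piece
        yield raw.replace("```markdown", "").replace("```", "")

# Simulated pieces, including a code fence split across two chunks and a None
fake_pieces = ["```", "markdown\n", "# Hugging", " Face\n", None, "Uses markdown.\n", "```"]
snapshots = list(accumulate_stream(fake_pieces))
print(snapshots[-1])
```

Each snapshot is what would be passed to `update_display`, so the rendered brochure grows as pieces arrive.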

In [25]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'home page', 'url': 'https://huggingface.co'}, {'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

---

## About Us

**Hugging Face** is at the forefront of the AI community, pioneering a collaborative platform that unites individuals and organizations in their machine learning endeavors. Our aim is to empower researchers, developers, and enterprises to create, discover, and share models, datasets, and applications, all while accelerating innovation in AI.

## Our Offerings

- **Models:** Access over 400,000 models to jumpstart your machine learning projects.
- **Datasets:** Explore a rich repository of more than 100,000 datasets tailored for various tasks.
- **Spaces:** Collaborate and showcase applications in a user-friendly environment.
- **Enterprise Solutions:** Delivering advanced platforms for organizations that require robust performance and security.

### Pricing

- **Compute Services:** Starting from $0.60/hour for GPU usage.
- **Enterprise Solutions:** Starting at $20/user/month, with features like Single Sign-On and priority support.

## Our Customers

Hugging Face is proud to serve over **50,000 organizations**, including industry giants such as:

- **Meta**
- **Amazon Web Services**
- **Google**
- **Microsoft**
  
These companies leverage our tools and community for cutting-edge AI development.

## Open Source Commitment

We are dedicated to building a strong foundation for ML tooling through community collaboration. Our open-source projects include:

- **Transformers:** Over 137,000 models for Pytorch, TensorFlow, and JAX.
- **Diffusers:** State-of-the-art models for image and audio generation.
- **Safetensors:** A secure method for storing and distributing neural network weights.

## Company Culture

At Hugging Face, we foster a **supportive and inclusive environment** where creativity thrives. Our culture encourages collaboration, knowledge sharing, and pushing boundaries to explore new horizons in AI. We believe in the power of community and aim to build a place where everyone has a voice.

## Careers

Looking to make an impact in the AI space? Hugging Face is always on the lookout for passionate individuals to join our team. We offer various roles catering to different skill sets, with opportunities for growth and innovation.

---

**Join us in building the future of AI!**

[Sign Up Now](#) | [Explore Our Resources](#)

--- 

Feel free to reach out for more information on how Hugging Face can assist you in your AI journey!