# Business Problem
## Company sales brochure generator

### Create a product that can generate marketing brochures about a company
- for prospective clients
- for investors
- for recruitment

### The technology
- use the OpenAI API
- use one-shot prompting
- stream back results and show them with formatting

In [105]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI
import json

# If you get an error running this cell, then please head over to the troubleshooting notebook!

In [5]:
# initialize and constants

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
if api_key and api_key[:8]=='sk-proj-':
    print("API key looks good so far.")
else:
    print("There might be a problem with your API key?")

API key looks good so far.
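A slightly more forgiving variant of this check, written as a pure function so it can be tried without a `.env` file. It assumes OpenAI keys begin with `sk-` (the project keys checked above use the longer `sk-proj-` prefix) and also flags stray whitespace copied into the key; `describe_api_key` is a hypothetical helper, not part of any library.

```python
import os

def describe_api_key(api_key):
    """Return a short diagnostic for an OpenAI-style key (sketch; the 'sk-' prefix is an assumption)."""
    if not api_key:
        return "No API key found - check that OPENAI_API_KEY is set in your .env file."
    if api_key != api_key.strip():
        return "API key has leading or trailing whitespace - remove it."
    if not api_key.startswith("sk-"):
        return "API key does not start with 'sk-' - it may be the wrong value."
    return "API key looks good so far."

# Check the key that load_dotenv() placed in the environment
print(describe_api_key(os.getenv("OPENAI_API_KEY")))
```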


In [70]:
MODEL = 'gpt-4o-mini'
openai = OpenAI()

In [58]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers, timeout=10)  # fail instead of hanging forever on an unresponsive server
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
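The `links` attribute is simply the `href` of every `<a>` tag that has a non-empty one. To see that without a network request or BeautifulSoup, here is a minimal sketch using the standard library's `html.parser` instead (a swapped-in parser, not what the class above uses); `LinkCollector` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, skipping anchors without one (sketch)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # mirrors `[link for link in links if link]` in Website
                self.links.append(href)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

sample = ('<body><a href="/about">About</a><a name="top">No href</a>'
          '<a href="">Empty</a><a href="https://example.com/jobs">Jobs</a></body>')
print(extract_links(sample))  # ['/about', 'https://example.com/jobs']
```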

In [59]:
ed = Website("https://edwarddonner.com")

In [60]:
print(ed.get_contents())

Webpage Title:
Home - Edward Donner
Webpage Contents:
Home
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of pr

In [61]:
print(ed.links)

['https://edwarddonner.com/', 'https://edwarddonner.com/outsmart/', 'https://edwarddonner.com/about-me-and-about-nebula/', 'https://edwarddonner.com/posts/', 'https://edwarddonner.com/', 'https://news.ycombinator.com', 'https://nebula.io/?utm_source=ed&utm_medium=referral', 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html', 'https://patents.google.com/patent/US20210049536A1/', 'https://www.linkedin.com/in/eddonner/', 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/', 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/', 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/', 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/', 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/', 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/', 'https://edwarddonner.com/2024/10/16/from-software-engineer-to

In [87]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages. You should respond in JSON using the format below, with a type and a url for each entry under links:\n"
link_system_prompt += """
{
    "links": [
    {"type":"about page", "url":"https://full.url/goes/here/about"},
    {"type":"careers page", "url":"https://another.full.url/careers"}
    ]
}
"""
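One-shot prompting means showing the model a single example of the desired output. Here the example is embedded in the system message. An alternative layout (swapped in for illustration, not what this notebook does) is to present the example as an earlier user/assistant exchange in the same `messages` list format passed to `chat.completions.create`. The `example.com` URLs and `build_one_shot_messages` are hypothetical.

```python
import json

# A single worked example, shown to the model as a previous exchange
EXAMPLE_LINKS = ["https://example.com/about", "https://example.com/careers", "https://example.com/privacy"]
EXAMPLE_ANSWER = {"links": [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "careers page", "url": "https://example.com/careers"},
]}

def build_one_shot_messages(system_prompt, user_prompt):
    """Return a chat messages list with one example turn before the real request (sketch)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Links:\n" + "\n".join(EXAMPLE_LINKS)},
        {"role": "assistant", "content": json.dumps(EXAMPLE_ANSWER)},
        {"role": "user", "content": user_prompt},
    ]

messages = build_one_shot_messages("Pick brochure-relevant links and answer in JSON.",
                                   "Links:\nhttps://huggingface.co/about")
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```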

In [88]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with "
    user_prompt += "the full https URL in JSON format. Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [89]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, or email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/
https:

In [90]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role":"system", "content":link_system_prompt},
            {"role":"user", "content":get_links_user_prompt(website)}
        ],
        response_format={"type":"json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
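`response_format={"type":"json_object"}` asks the model for syntactically valid JSON, but as far as I know it does not enforce the `links` structure described in the system prompt, and the later loop indexes `link["url"]` directly. A minimal defensive parser is sketched below, assuming that dropping malformed or non-absolute entries is acceptable; `parse_links_response` is a hypothetical helper that could replace the final `json.loads(result)`.

```python
import json

def parse_links_response(raw):
    """Parse the model's reply, keeping entries with a string type and an absolute http(s) url (sketch)."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):  # TypeError covers a None reply
        return {"links": []}
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return {"links": []}
    valid = []
    for item in links:
        if not isinstance(item, dict):
            continue
        link_type, url = item.get("type"), item.get("url")
        if isinstance(link_type, str) and isinstance(url, str) and url.startswith(("http://", "https://")):
            valid.append({"type": link_type, "url": url})
    return {"links": valid}

raw = ('{"links": [{"type": "about page", "url": "https://huggingface.co/huggingface"},'
       ' {"type": "careers page"}, {"type": "blog", "url": "/blog"}, "oops"]}')
print(parse_links_response(raw))
```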

In [91]:
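# Note: "antropic.com" (missing the "h") is not Anthropic's site; the links below come from an unrelated French-language site
get_links("https://antropic.com")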

{'links': [{'type': 'about page',
   'url': 'https://antropic.com/en-savoir-plus/'},
  {'type': 'contact page', 'url': 'https://antropic.com/contacter-lauteur/'}]}

In [92]:
# Anthropic has made their site harder to scrape, so I'm using Hugging Face instead.

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/blog/inference-providers',
 '/deepseek-ai/DeepSeek-R1',
 '/deepseek-ai/Janus-Pro-7B',
 '/mistralai/Mistral-Small-24B-Instruct-2501',
 '/deepseek-ai/DeepSeek-V3',
 '/unsloth/DeepSeek-R1-GGUF',
 '/models',
 '/spaces/deepseek-ai/Janus-Pro-7B',
 '/spaces/tencent/Hunyuan3D-2',
 '/spaces/lllyasviel/iclight-v2',
 '/spaces/Qwen/Qwen2.5-Max-Demo',
 '/spaces/hexgrad/Kokoro-TTS',
 '/spaces',
 '/datasets/open-thoughts/OpenThoughts-114k',
 '/datasets/cognitivecomputations/dolphin-r1',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/ServiceNow-AI/R1-Distill-SFT',
 '/datasets/bespokelabs/Bespoke-Stratos-17k',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/gr
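Most Hugging Face links are relative and many repeat (`/enterprise` appears several times), so the notebook relies on the model to turn them into full URLs. One option is to normalize them before building the prompt. This is a sketch using `urllib.parse`; `normalize_links` is a hypothetical helper, and dropping non-web schemes and `#fragments` is a design choice rather than something the notebook does.

```python
from urllib.parse import urldefrag, urljoin, urlparse

def normalize_links(base_url, links):
    """Resolve relative links, drop non-http(s) schemes and fragments, de-duplicate in order (sketch)."""
    seen = set()
    result = []
    for link in links:
        absolute = urldefrag(urljoin(base_url, link)).url
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, tel: and similar
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

print(normalize_links("https://huggingface.co",
                      ["/", "/models", "/pricing#endpoints", "/pricing", "mailto:hi@example.com", "/models"]))
```

Joining the output of `normalize_links(website.url, website.links)` in place of `website.links` would give the model a shorter, absolute list to choose from.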

In [93]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}]}

In [96]:
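from urllib.parse import urljoin

def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        # Resolve the URL in case the model returned a relative link despite the prompt
        page_url = urljoin(url, link["url"])
        result += f"\n\n{link['type']}\n"
        result += Website(page_url).get_contents()
    return result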
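To exercise this aggregation logic without network access, here is a sketch that takes the fetch function as a parameter and skips pages that fail to download instead of aborting the whole brochure. `gather_details`, `fake_fetch` and the `example.com` pages are hypothetical; in the notebook the fetcher would be `lambda u: Website(u).get_contents()`.

```python
from urllib.parse import urljoin

def gather_details(url, links, fetch):
    """Combine landing-page and linked-page contents; `fetch(url) -> str` is injected (sketch)."""
    result = "Landing page:\n" + fetch(url)
    for link in links.get("links", []):
        page_url = urljoin(url, link["url"])  # tolerate relative URLs from the model
        try:
            contents = fetch(page_url)
        except OSError as exc:  # requests.RequestException is an OSError subclass
            print(f"Skipping {page_url}: {exc}")
            continue
        result += f"\n\n{link['type']}\n{contents}"
    return result

# Offline usage with a fake fetcher standing in for Website(url).get_contents()
pages = {"https://example.com": "Home text\n", "https://example.com/about": "About text\n"}

def fake_fetch(page_url):
    if page_url not in pages:
        raise OSError(f"no page at {page_url}")
    return pages[page_url]

sample_links = {"links": [{"type": "about page", "url": "/about"},
                          {"type": "careers page", "url": "https://example.com/careers"}]}
print(gather_details("https://example.com", sample_links, fake_fetch))
```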

In [97]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
NEW
Welcome to Inference Providers on the Hub 🔥
smolagents - a smol library to build great agents
Use models from the HF Hub in LM Studio
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Trending on
this week
Models
deepseek-ai/DeepSeek-R1
Updated
4 days ago
•
1.04M
•
6.59k
deepseek-ai/Janus-Pro-7B
Updated
4 days ago
•
175k
•
2.54k
mistralai/Mistral-Small-24B-Instruct-2501
Updated
3 days ago
•
18.4k
•
582
deepseek-ai/DeepSeek-V3
Updated
12 days ago
•
930k
•
3.11k
unsloth/DeepSeek-R1-GGUF
Updated
5 days ago
•
301k
•
525
Browse 1M+ models
Spaces
Running


In [98]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
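The business problem lists three audiences (prospective clients, investors, recruitment), and the tone swap above is done by commenting code in and out. Both can be turned into parameters instead; this is a sketch, and `make_system_prompt` with its `audience` and `tone` arguments is a hypothetical helper whose defaults reproduce the prompt above.

```python
def make_system_prompt(audience="prospective customers, investors and recruits", tone="short"):
    """Build a brochure system prompt for a given audience and tone (hypothetical helper)."""
    return (
        "You are an assistant that analyzes the contents of several relevant pages from a company website "
        f"and creates a {tone} brochure about the company for {audience}. Respond in markdown. "
        "Include details of company culture, customers and careers/jobs if you have the information."
    )

investor_prompt = make_system_prompt(audience="investors")
jokey_prompt = make_system_prompt(tone="short humorous, entertaining, jokey")
print(investor_prompt)
```

Keeping each piece as a separate string literal also avoids the easy-to-miss missing space that backslash-continued strings can produce.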

In [99]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
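Slicing the whole prompt with `user_prompt[:5_000]` keeps only the first 5,000 characters, so a long landing page (which comes first) can push the about and careers pages out entirely. One alternative, sketched here under the assumption that the pages are available as separate `(title, text)` pairs rather than one string, is to give each page an equal share of the budget; `truncate_sections` is a hypothetical helper.

```python
def truncate_sections(sections, budget=5_000):
    """Give each (title, text) section an equal share of the character budget (sketch)."""
    if not sections:
        return ""
    per_section = budget // len(sections)
    # Only the page text is capped; titles are short and kept whole
    return "\n\n".join(f"{title}\n{text[:per_section]}" for title, text in sections)

sections = [("Landing page", "a" * 6000), ("about page", "b" * 100), ("careers page", "c" * 100)]
prompt_body = truncate_sections(sections, budget=300)
print(prompt_body[-20:])
```

An even split is the simplest policy; a refinement would hand unused budget from short pages to longer ones.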

In [100]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nNEW\nWelcome to Inference Providers on the Hub 🔥\nsmolagents - a smol library to build great agents\nUse models from the HF Hub in LM Studio\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nTrending on\nthis week\nModels\ndeepseek-ai/DeepSeek-R1\nUpdated\n4 days ago\n•\n1.04M\n•\n6.6k\ndeepseek-ai/Janus-Pro-7B\nUpdated\n4 days ago\n•\n175k\n•\n2.54k\nmistralai/Mistral-Small-24B-Instruct-2501\nUpdated\n3 days ago\n•\n18.4k\n•\n582\ndeepseek-ai/DeepSeek-V3\nUpdated\n12 days ago\n•\n930k\n•\n3.11k\nunsloth/DeepSee

In [101]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [102]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}]}


# Hugging Face Brochure

---

## **Welcome to Hugging Face**

### **The AI Community Building the Future**

Hugging Face is a pioneering platform where the machine learning community collaborates on cutting-edge models, comprehensive datasets, and innovative applications. We cater to over 50,000 organizations including giants like Meta, Google, and Microsoft, providing a vibrant space for sharing ideas and advancing AI technologies. 

---

## **What We Offer**

- **Models**: Our hub hosts over **1 million models** that facilitate advanced machine learning tasks across various domains.
- **Datasets**: Access and share **250,000+ datasets** essential for your research and development.
- **Spaces**: Run **400,000+ applications** seamlessly and showcase your projects in a collaborative environment.

---

## **Why Choose Hugging Face?**

- **Open Source**: We believe in building a robust foundation for machine learning tools through community engagement. We provide tools like Transformers, Diffusers, and Tokenizers to empower developers.
- **Enterprise Solutions**: Our robust **enterprise-grade platform** ensures security and dedicated support for businesses looking to leverage AI in their operations.
- **Innovation**: We continuously update our offerings, including our recently launched Inference Providers on the Hub and targeted solutions for deployment and training.

---

## **Company Culture**

At Hugging Face, we foster an inclusive, collaborative, and innovative culture that encourages creativity and the free exchange of ideas. Our community-driven approach is central to our mission, ensuring that everyone has a voice in the future of AI.

---

## **Join Our Community**

- **Careers**: We are constantly seeking passionate and skilled individuals to join our diverse team. Whether you are a developer, researcher, or business professional, you can contribute to our mission of AI democratization. 
- **Engagement**: Join a global network of practitioners and enthusiasts in our forums and social channels, where independent projects and collaborative work thrive.

---

### **Get Started Today!**

Explore our offerings or sign up to join the community. Whether you're looking to collaborate, deploy a new AI model, or enhance your skills, Hugging Face is here to support your journey.

**Follow us on:**  
- [Twitter](https://twitter.com/huggingface)  
- [LinkedIn](https://www.linkedin.com/company/huggingface)  
- [GitHub](https://github.com/huggingface)

---

**Hugging Face** - Together, let’s build the future of AI.  

For more information, visit [huggingface.co](https://huggingface.co)

In [103]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip only the code-fence markers the model sometimes wraps its answer in;
        # a blanket replace("markdown", "") would also delete that word from the brochure text
        shown = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(shown), display_id=display_handle.display_id)
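A fence marker can be split across chunks (for example "```mark" followed by "down"), which is why the stripping is applied to the accumulated text on every update rather than to each chunk. Below is a pure-Python sketch with simulated deltas (no API call) that removes only the fence markers, unlike a blanket `replace("markdown", "")`, so the word "markdown" inside the brochure survives. `render_stream` and `strip_markdown_fence` are hypothetical names.

```python
def strip_markdown_fence(text):
    """Remove ```markdown / ``` fence markers but keep the word 'markdown' elsewhere (sketch)."""
    return text.replace("```markdown", "").replace("```", "")

def render_stream(deltas):
    """Accumulate streamed text deltas and yield what would be displayed after each chunk (sketch)."""
    response = ""
    for delta in deltas:
        response += delta or ""  # a chunk may carry no content (delta is None)
        yield strip_markdown_fence(response)

frames = list(render_stream(["```mark", "down\n# Brochure", None, "\nWe use markdown.\n```"]))
print(repr(frames[-1]))  # '\n# Brochure\nWe use markdown.\n'
```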

In [106]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}]}


# Hugging Face Brochure

---

## Welcome to Hugging Face!

**The AI Community Building the Future**

At Hugging Face, we aim to revolutionize the machine learning landscape by providing a platform where the community collaborates on models, datasets, and applications. Our vision is centered around collaboration, innovation, and empowering individuals and organizations to harness the power of AI.

---

## What We Offer

### **Models**
- Access over **1 million models** ranging from state-of-the-art transformers to custom-built solutions.
- Track trending models updated regularly to stay at the forefront of AI research.

### **Datasets**
- Explore an extensive library of **250,000+ datasets** optimized for various applications including NLP, computer vision, and audio processing.

### **Spaces**
- Collaborate and run applications with an impressive array of **400,000+ applications** built within our ecosystem.

### **Enterprise Solutions**
Provide your team with:
- **Enterprise-grade security**
- Dedicated support
- Access to advanced ML tools
- Competitive pricing starting at **$20/user/month**.

 *Over 50,000 organizations including industry leaders like **Google, Microsoft, and Amazon Web Services** trust us.*

---

## Company Culture

At Hugging Face, we foster an open and inclusive community where innovation thrives. We believe in the open-source ethos, which means we build and share our tools with the global community. Our culture is defined by:

- **Collaboration:** We encourage partnerships and knowledge sharing among our users.
- **Open Source:** Our commitment to transparency and accessibility drives us to provide robust open-source ML tools.
- **Continuous Learning:** We support our team and community members with resources to learn and adapt in this fast-paced industry.

---

## Careers at Hugging Face

Join us in shaping the future of AI! We are constantly looking for passionate individuals who are eager to contribute to our mission. Opportunities range from technical roles in AI development to community management and beyond.

- **Why Work With Us?**
  - Be part of a pioneering team at the forefront of AI research.
  - Collaborate with top-tier talent in a supportive environment.
  - Contribute to impactful projects that have the potential to change the world.

Explore our [Career Page](https://huggingface.co/jobs) to learn more.

---

## Connect with Us

Stay updated with Hugging Face through our social channels:
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://linkedin.com/company/huggingface)
- [GitHub](https://github.com/huggingface)

Join us in building the future of AI. Get involved today!

---

**Hugging Face** - Where the AI community collaborates, innovates, and shapes a better future for all.