# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [4]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'ht

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which the prompt includes an example of how the model should respond.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
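
The purely mechanical part of this step, turning a relative link such as "/about" into a full URL, can also be done without an LLM. Here is a minimal sketch using the standard library's `urljoin`. The helper name `resolve_links` is an assumption for illustration and is not part of this notebook's code. Deciding which links are *relevant* is still the part that benefits from the model.

```python
from urllib.parse import urljoin

def resolve_links(base_url, links):
    """Turn relative hrefs into absolute URLs; absolute URLs pass through unchanged."""
    return [urljoin(base_url, link) for link in links]

# A few links in the same shape as the ones scraped in this notebook
resolved = resolve_links(
    "https://huggingface.co",
    ["/models", "/pricing#spaces", "https://github.com/huggingface"],
)
print(resolved)
```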

In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [None]:
print(link_system_prompt)

In [6]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
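
The scraped link lists in this notebook contain many repeated entries, and each one costs tokens when the list is pasted into the prompt. As a hedged sketch, the links could be cleaned before they are joined into the user prompt. The helper `clean_links` and its skip list are assumptions for illustration, not part of the original code.

```python
def clean_links(links):
    """Drop duplicates (keeping first-seen order) and links that are not web pages."""
    skip_prefixes = ("mailto:", "tel:", "javascript:", "#")  # assumed skip list
    seen = set()
    cleaned = []
    for link in links:
        link = link.strip()
        if not link or link.startswith(skip_prefixes):
            continue
        if link in seen:
            continue
        seen.add(link)
        cleaned.append(link)
    return cleaned

sample = ["/enterprise", "/enterprise", "mailto:a@b.c", "/pricing", "#top", "/pricing#spaces"]
print(clean_links(sample))
```

With this in place, the last line of the prompt builder could join `clean_links(website.links)` instead of `website.links`.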

In [7]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2024/12/21/

In [9]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
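
Even with `response_format={"type": "json_object"}`, the reply is only guaranteed to be JSON, not to match the shape shown in the system prompt. The following is a minimal sketch of a defensive parser that keeps only entries with a full http(s) URL. The name `parse_links_response` and the `example.com` URLs are hypothetical and used only for illustration.

```python
import json

def parse_links_response(raw):
    """Parse the model's reply and keep only well-formed link entries."""
    empty = {"links": []}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return empty
    valid = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        # Keep only absolute web URLs; relative or missing URLs are dropped
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            valid.append({"type": str(entry.get("type", "unknown")), "url": url})
    return {"links": valid}

reply = (
    '{"links": [{"type": "about page", "url": "https://example.com/about"},'
    ' {"type": "careers page", "url": "/careers"},'
    ' {"url": "https://example.com/jobs"}]}'
)
parsed = parse_links_response(reply)
print(parsed)
```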

In [10]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace..

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/meta-llama/Llama-4-Scout-17B-16E-Instruct',
 '/Qwen/Qwen2.5-Omni-7B',
 '/deepseek-ai/DeepSeek-V3-0324',
 '/meta-llama/Llama-4-Maverick-17B-128E-Instruct',
 '/reducto/RolmOCR',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/jamesliu1217/EasyControl_Ghibli',
 '/spaces/VAST-AI/TripoSG',
 '/spaces/Stable-X/Hi3DGen',
 '/spaces/VIDraft/Open-Meme-Studio',
 '/spaces',
 '/datasets/nvidia/OpenCodeReasoning',
 '/datasets/open-thoughts/OpenThoughts2-1M',
 '/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1',
 '/datasets/FreedomIntelligence/medical-o1-reasoning-SFT',
 '/datasets/LLM360/MegaMath',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '

In [11]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'models page', 'url': 'https://huggingface.co/models'},
  {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'},
  {'type': 'spaces page', 'url': 'https://huggingface.co/spaces'}]}

## Second step: make the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [12]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [13]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Bro

In [24]:
# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

# The version below produces a more humorous brochure - this demonstrates how easy it is to incorporate 'tone'.
# To switch back to the plain version, comment it out and uncomment the lines above:

system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."


In [16]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
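
Because the 5,000-character cut is applied to the combined string, a long landing page may use up the whole allowance, so pages later in the list may never reach the model. One alternative, sketched here under the assumption that pages are available as `(label, text)` pairs, is to give each page an equal share of the budget. The function `truncate_sections` is hypothetical and not part of the notebook.

```python
def truncate_sections(sections, total_budget=5_000):
    """Give each (label, text) section an equal share of the text budget.

    The budget applies to page text only; labels and separators are extra.
    """
    if not sections:
        return ""
    per_section = total_budget // len(sections)
    parts = [f"{label}\n{text[:per_section]}" for label, text in sections]
    return "\n\n".join(parts)

pages = [("Landing page:", "x" * 10_000), ("about page", "y" * 10_000)]
combined = truncate_sections(pages, total_budget=100)
print(len(combined))
```

An equal split is the simplest policy; weighting the landing page more heavily would be a reasonable variation.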

In [17]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'community discussion', 'url': 'https://discuss.huggingface.co'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nmeta-llama/Llama-4-Scout-17B-16E-Instruct\nUpdated\n2 days ago\n•\n101k\n•\n623\nQwen/Qwen2.5-Omni-7B\nUpdated\n8 days ago\n•\n105k\n•\n1.26k\ndeepseek-ai/DeepSeek-V3-0324\nUpdated\n13 days ago\n•\n168k\n•\n2.44k\nmeta-llama/Llama-4-Maverick-17B-128E-Instruct\nUpdated\n3 days ago\n•\n12.4k\n•\n240\nreducto/RolmOCR\nUpdated\n6 days ago\n•\n1.69k\n•\n220\nBrowse 1M+ models\nSpaces\nRunning\n3.39

In [18]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [19]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'discussion page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## Welcome to Hugging Face

**The AI community building the future.**  
At Hugging Face, we are dedicated to fostering collaboration within the machine learning community. Our platform provides the tools for users to create, discover, and share a wealth of models, datasets, and applications.

### Our Offerings
- **Models**: Access over 1 million machine learning models.
- **Datasets**: Explore more than 250,000 datasets tailored for your ML needs.
- **Spaces**: Build and host applications effortlessly with our collaborative Spaces.
- **Compute Solutions**: We offer optimized computer resources for your projects, starting at just $0.60 per hour for GPU use.
- **Enterprise Solutions**: Empower your team with enterprise-grade security, access controls, and dedicated support starting at $20 per user per month.

### Who We Serve

Hugging Face is trusted by over **50,000 organizations**, including industry leaders such as:
- **Meta**
- **Google**
- **Amazon**
- **Microsoft**

We are committed to empowering businesses, researchers, and developers by providing them with sophisticated machine learning tools.

### Our Open Source Commitment

We believe in the power of community! Hugging Face is actively building an open-source foundation of ML tooling, featuring:
- Transformers: State-of-the-art ML for PyTorch, TensorFlow, JAX.
- Diffusers: Advanced diffusion models in PyTorch.
- Tokenizers: Optimized for research and production.
- And much more!

### Company Culture

At Hugging Face, our mission is to democratize machine learning and make it accessible to everyone. We thrive on collaboration, innovation, and continuous learning. Our community-driven approach ensures that every voice is heard, and every contribution is valued. 

### Join Us!

Whether you are a customer looking for robust AI solutions, an investor looking for promising opportunities in the tech space, or an enthusiastic recruit eager to make a difference in AI, Hugging Face has something to offer for everyone.

- **Careers**: Explore career opportunities with us on our [Jobs page](https://huggingface.co/jobs).
- **Community Engagement**: Join us on platforms like [GitHub](https://github.com/huggingface), [Twitter](https://twitter.com/huggingface), and [Discord](https://huggingface.co/discord) to connect and collaborate.

### Contact Us

For more information about our services, partnerships, or any inquiries, please visit our website at [Hugging Face](https://huggingface.co).

---

*Hugging Face is your partner in building the future of AI - collaboratively and openly!*

## Finally - a minor improvement

With a small adjustment, we can stream the results back from OpenAI, with the familiar typewriter animation.

In [20]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
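        # Strip code-fence markers the model sometimes wraps around its reply.
        # Note: this also removes the word "markdown" wherever it appears in the text.
        response = response.replace("```","").replace("markdown", "")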
        update_display(Markdown(response), display_id=display_handle.display_id)
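
The accumulation pattern in this loop can be tried offline, without calling the API. The sketch below fakes a stream with `SimpleNamespace` objects that mimic the `chunk.choices[0].delta.content` shape used above. The helpers `fake_stream` and `accumulate` are assumptions for illustration only. It also shows why the `or ''` fallback matters: some chunks carry `None` instead of text.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield objects shaped like streamed chunks: chunk.choices[0].delta.content."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def accumulate(stream):
    """Concatenate the text deltas, treating a missing delta (None) as empty."""
    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ""
    return response

text = accumulate(fake_stream(["# Hugging", " Face", None, " Brochure"]))
print(text)
```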

In [22]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community discussions', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## **About Us**
Hugging Face is an innovative AI and machine learning company dedicated to building the future of AI collaboration. Our community-driven platform enables developers, researchers, and organizations to contribute, discover, and collaborate on state-of-the-art models, datasets, and applications. 

With over 1 million models and 250,000 datasets, our open-source foundation empowers users to accelerate their machine learning projects and facilitate creativity across a range of modalities, including text, image, video, audio, and 3D.

## **Our Mission**
The mission of Hugging Face is to **democratize AI** by providing the best tools, resources, and collaboration opportunities for everyone interested in machine learning. By fostering a vibrant community, we aim to make machine learning more accessible and engaging.

## **Company Culture**
At Hugging Face, we believe in a culture of **collaboration, creativity, and transparency.** Our team is composed of passionate individuals who value open-source contributions and actively engage with the community. We encourage professional growth and provide a supportive environment where team members can explore new ideas and technologies.

Our inclusive workplace promotes creativity and innovation, where every voice is heard. We prioritize a healthy work-life balance, ensuring our staff is motivated and enjoys their work experience.

## **Services and Offerings**
- **Models**: Browse and discover the latest machine learning models supported by the Hugging Face community.
- **Datasets**: Access a vast library of datasets tailored for various ML tasks.
- **Spaces**: Create and host interactive applications and demos using our easy-to-use platform.
- **Enterprise Solutions**: Our enterprise offerings provide advanced security and support for team collaboration in AI projects.

### **Pricing**
- **Compute solutions** starting at $0.60/hour for GPU.
- **Enterprise solutions** starting at $20/user/month, which includes features such as Single Sign-On, priority support, and dedicated resource groups.

## **Our Customers**
Hugging Face serves a diverse clientele of over 50,000 organizations, including industry leaders such as:
- Google
- Microsoft
- Amazon
- Intel
- Meta

Our platform is equipped with models and datasets to meet the needs of enterprises while maintaining a strong focus on the development of cutting-edge AI tools.

## **Careers/Jobs**
Join our growing team! We are continuously on the lookout for innovative and talented individuals who are eager to make a difference in the AI community. Opportunities exist in various areas, from software engineering to community management.

### Current Openings:
- Machine Learning Engineer
- Software Developer
- Research Scientist
- Community Manager

Interested candidates are encouraged to apply through our official careers page.

## **Connect With Us**
Follow Hugging Face to stay updated on the latest trends, models, and community news:
- [GitHub](https://github.com/huggingface)
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://linkedin.com/company/huggingface)
- [Discord](https://discord.gg/huggingface)

---

**Hugging Face**: Together, we build the future of AI. Join our mission today!

In [25]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# 🥳 Welcome to Hugging Face! 🥳

## The AI Community Building the Future

### What Do We Do? 🤔
At Hugging Face, we’re not just a sweet-faced conglomeration of AI enthusiasts! We're the place where machine learning gets cozy. Our platform is bursting with **1M+ models** (not that we’re counting 😏), ready for you to explore. Think of us as a candy store for data scientists and AI aficionados, minus the sticky fingers!

### Why Choose Hugging Face? 🌟
- **Models Galore**: From **deepseek-ai** to **meta-llama**, we have more models than a cat has lives. And just like your cat, our models love attention! 🐱
- **Datasets for Days**: Hosting over **250k datasets** – why did the dataset break up with the model? It found a better match on Hugging Face! 💔
- **Spaces**: Join a lively co-working space without the coffee spill! Build apps and create magic, all in our digital hangout spots.

### Who’s Using Us? 👥
Big-name players love to hug it out with us! More than **50,000 organizations** have joined our cuddle puddle, including tech giants like **Google**, **Amazon**, and **Microsoft**. They’re in good company (just like you’d be), hugging it out over AI development.

### Company Culture – Hug It Out! 🤗
At Hugging Face, we believe in the power of collaboration! We’re a community of researchers, engineers, and dreamers (yes, including people who still think they can become astronauts). Our philosophy? “Teamwork makes the dream work… and also makes the models run better!” 🥇

### Careers at Hugging Face – Join the Hug Fest! 🚀
Ready to dive into the future and make an impact? We’re on the lookout for imaginative minds to craft the next wave of machine learning wonders! 
- **Positions Available**: Data Scientists, AI Researchers, Model Tango Experts (okay, we made that one up, but you get the gist). 
- **What You Get**: A creative, diverse workplace where your ideas matter! Plus, flexible hours because no one likes traffic while dreaming up the next big thing.

### In Summary… 📜
- **Explore AI Apps** 🖥️
- **Browse 1M+ Models** 📊
- **Collaborate and Create** 🤝
- **Be Part of Something Hugely Awesome!** 💙

So, if you’re looking to be part of the AI revolution without the heavy lifting, come join our family at Hugging Face! Together, we’ll hug that future right into existence! 

### Ready to get involved? 
Hop on over to [**Hugging Face**](https://huggingface.co) and let’s get that AI love flowing! 

--- 
**Disclaimer**: Hugs, puns, and laughter may be included in the terms and conditions of collaboration. 🥳

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise we extended the Day 1 code to make multiple LLM calls, and generate a document.

This is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.

Generating content in this way is one of the most common use cases. As with summarization, this can be applied to any business vertical. Write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype. See what other students have done in the community-contributions folder -- so many valuable projects -- it's wild!</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you move to Week 2 (which is tons of fun)</h2>
            <span style="color:#900;">Please see the week1 EXERCISE notebook for your challenge for the end of week 1. This will give you some essential practice working with Frontier APIs, and prepare you well for Week 2.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">A reminder on 3 useful resources</h2>
            <span style="color:#f71;">1. The resources for the course are available <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">here.</a><br/>
            2. I'm on LinkedIn <a href="https://www.linkedin.com/in/eddonner/">here</a> and I love connecting with people taking the course!<br/>
            3. I'm trying out X/Twitter and I'm at <a href="https://x.com/edwarddonner">@edwarddonner</a> and hoping people will teach me how it's done..  
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../thankyou.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#090;">Finally! I have a special request for you</h2>
            <span style="color:#090;">
                My editor tells me that it makes a MASSIVE difference when students rate this course on Udemy - it's one of the main ways that Udemy decides whether to show it to others. If you're able to take a minute to rate this, I'd be so very grateful! And regardless - always please reach out to me at ed@edwarddonner.com if I can help at any point.
            </span>
        </td>
    </tr>
</table>

In [26]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()


# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"




link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""


def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt



def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)


def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result


system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."


def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
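
Slicing at exactly 5,000 characters can cut the prompt in the middle of a word or sentence. If that matters, one alternative is to cut at the last line break before the limit. This is a small sketch, assuming the scraped text uses `\n` between lines; `truncate_at_line` is a hypothetical helper.

```python
def truncate_at_line(text, limit=5_000):
    """Hypothetical helper: trim text to at most `limit` characters, preferring a line boundary."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut when there is no usable line break before the limit
    return text[:cut] if cut > 0 else text[:limit]
```

Because the cut is applied to the whole prompt, later pages are still the first to be dropped, exactly as with the plain slice.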


# def create_brochure(company_name, url):
#     response = openai.chat.completions.create(
#         model=MODEL,
#         messages=[
#             {"role": "system", "content": system_prompt},
#             {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
#           ],
#     )
#     result = response.choices[0].message.content
#     display(Markdown(result))




def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
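    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Clean a copy for display: drop code-fence markers but keep the word "markdown" in the text itself
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)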
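
Cleaning fences with plain string replacement can also touch text you meant to keep. A slightly more careful option is to remove only a fence that wraps the whole reply. The sketch below is an assumption-laden illustration: `strip_markdown_fence` is a hypothetical helper, and it only recognises an opening fence tagged `markdown`, `md`, or left untagged.

```python
import re

FENCE = "`" * 3  # built programmatically so the example contains no literal fence

_OPEN = re.compile(r"\A\s*" + FENCE + r"(?:markdown|md)?[ \t]*\n")
_CLOSE = re.compile(r"\n?" + FENCE + r"\s*\Z")


def strip_markdown_fence(text):
    """Hypothetical helper: remove a wrapping markdown fence, leaving the body untouched."""
    stripped, count = _OPEN.subn("", text, count=1)
    if count == 0:
        # No wrapping fence at the start, so return the text unchanged
        return text
    return _CLOSE.sub("", stripped, count=1)
```

Inside the streaming loop this would be applied to a copy of the accumulated text just before display. While the opening fence is still arriving it may be visible for a moment.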


stream_brochure("ibis acam", "https://www.ibisacam.at")

API key looks good so far
Found links: {'links': [{'type': 'about page', 'url': 'https://www.ibisacam.at/vision-wir/unsere-geschichte/'}, {'type': 'company page', 'url': 'https://www.ibisacam.at/vision-wir/missionvision/'}, {'type': 'company page', 'url': 'https://www.ibisacam.at/vision-wir/team/'}, {'type': 'careers page', 'url': 'https://www.ibisacam.at/arbeiten-bei-ibis-acam/'}, {'type': 'careers page', 'url': 'https://www.ibisacam.at/arbeiten-bei-ibis-acam-karriere/'}, {'type': 'contact page', 'url': 'https://www.ibisacam.at/kontakt/'}, {'type': 'courses page', 'url': 'https://www.ibisacam.at/kurse/'}, {'type': 'services for companies page', 'url': 'https://www.ibisacam.at/fuer-unternehmen/'}, {'type': 'job opportunities page', 'url': 'https://www.ibisacam.at/ausbildungjob/'}]}


# ibis acam Brochure

## Welcome to ibis acam

At ibis acam, we empower individuals and organizations through targeted educational and personnel placement programs. Our mission is to contribute to a sustainable world by facilitating personal and professional success through quality education and career guidance.

---

### Our Vision

We see ourselves as learning and life companions, dedicated to supporting people on their journey to achieving their fullest potential. With various programs, courses, and initiatives, we aim to foster a community that thrives on education and personal development.

---

### Educational Offerings

#### Courses
- **Career Orientation**: Discover your path with us through professional orientation services.
- **Digital Literacy**: Engage in workshops that prepare you for the digital world, offering topics such as internet security, digital administration, and AI.
- **Language Courses**: Improve your German language skills for better integration and participation in Austrian society.
- **Life Design Coaching**: Tailored coaching to help you reach personal and professional goals.

#### Specialized Programs
- **AQUA – Workplace-Integrated Qualification**: Our innovative approach to training and recruitment empowers companies to develop talented staff efficiently.
- **Youth College**: A dedicated program for youth seeking job placements or vocational training.

### Career Opportunities

Join our team and contribute to making a difference! At ibis acam, we are always looking for passionate individuals who want to engage in meaningful work and help others achieve their goals. 

#### Why Work with Us?
- **Community-Oriented Culture**: Be part of a supportive team that values growth, collaboration, and social responsibility.
- **Diverse Job Openings**: With over [number] open positions in various fields, we encourage talented individuals to apply and flourish.
- **Comprehensive Onboarding and Continuous Support**: We are committed to providing our employees with the resources they need to thrive.

---

### Our Clients

We work with a diverse range of clients, including individuals seeking to improve their skills, organizations looking for qualified personnel, and municipalities aiming for digital transformation within their communities. Our focus on sustainable practices ensures long-term success for everyone involved.

### Join Us Today!

Are you ready to take the next step in your career, support community growth, and make a meaningful impact? **Explore what ibis acam has to offer and become part of our mission**! 

For more information about our courses, recruitment services, or career opportunities, visit [ibis acam’s website](#) or contact us directly.

**Together, let’s build a future where everyone has the opportunity to succeed!**