# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [4]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'ht

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one shot prompting" in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
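
The link list can also be tidied before it is sent to the model. The following is a minimal sketch, not part of the original notebook: `normalize_links` is a hypothetical helper that uses the standard library's `urllib.parse.urljoin` to resolve relative links and drops duplicates and non-web schemes. Deciding which links are relevant is still left to the LLM; this only shortens the prompt.

```python
from urllib.parse import urljoin, urlparse

def normalize_links(base_url, links):
    """Resolve relative links against base_url and drop duplicates (hypothetical helper)."""
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(base_url, link)
        # Skip mailto:, javascript: and other non-web schemes
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Keep only the first occurrence of each URL, preserving order
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

print(normalize_links(
    "https://huggingface.co",
    ["/models", "/enterprise", "/enterprise", "mailto:hi@example.com"],
))
```

If you try this, the output of `normalize_links(website.url, website.links)` could be joined into the user prompt in place of the raw `website.links`.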

In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [6]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [7]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [8]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2024/12/21/

In [9]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [10]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace..

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/Qwen/QwQ-32B',
 '/deepseek-ai/DeepSeek-R1',
 '/microsoft/Phi-4-multimodal-instruct',
 '/Wan-AI/Wan2.1-T2V-14B',
 '/CohereForAI/aya-vision-8b',
 '/models',
 '/spaces/Wan-AI/Wan2.1',
 '/spaces/nanotron/ultrascale-playbook',
 '/spaces/ASLP-lab/DiffRhythm',
 '/spaces/Qwen/QwQ-32B-Demo',
 '/spaces/black-forest-labs/FLUX.1-dev',
 '/spaces',
 '/datasets/facebook/natural_reasoning',
 '/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k',
 '/datasets/FreedomIntelligence/medical-o1-reasoning-SFT',
 '/datasets/GeneralReasoning/GeneralThought-195K',
 '/datasets/KodCode/KodCode-V1',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/gramm

In [11]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'blog', 'url': 'https://huggingface.co/blog'},
  {'type': 'company page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}
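
The model's reply is trusted as-is once `json.loads` succeeds, but the shape of the JSON can vary between runs. As a hedged sketch (the helper name `parse_links` is an assumption, not part of the notebook), the reply could be checked before any page is fetched, keeping only entries that have a string `type` and a full http(s) `url`:

```python
import json

def parse_links(raw):
    """Parse the model's JSON reply and keep only well-formed link entries (hypothetical helper)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(payload, dict):
        return {"links": []}
    entries = payload.get("links", [])
    if not isinstance(entries, list):
        return {"links": []}
    valid = [
        entry for entry in entries
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        and entry["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}

reply = '{"links": [{"type": "about page", "url": "https://x.com/about"}, {"type": "bad", "url": "/relative"}]}'
print(parse_links(reply))
```

Returning an empty list on malformed input means the brochure step would fall back to the landing page alone instead of raising an exception.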

## Second step: make the brochure!

Assemble all the details into another prompt for gpt-4o-mini.
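
When the pages are assembled, this notebook simply cuts the combined prompt at 5,000 characters, so a long landing page can crowd out the linked pages. One possible alternative, sketched here under the assumption of a hypothetical helper `assemble_pages`, is to give each page an equal share of the character budget:

```python
def assemble_pages(pages, total_budget=5_000):
    """Join (label, text) pairs, giving each page an equal share of the character budget.

    Labels and separators are not counted, so the result can slightly exceed total_budget.
    """
    if not pages:
        return ""
    per_page = total_budget // len(pages)
    parts = []
    for label, text in pages:
        # Truncate each page's text individually rather than the combined string
        parts.append(f"{label}\n{text[:per_page]}")
    return "\n\n".join(parts)

demo = assemble_pages(
    [("Landing page:", "a" * 100), ("about page", "b" * 100)],
    total_budget=20,
)
print(demo)
```

An equal split is only one choice; weighting the landing page more heavily would be just as reasonable.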

In [12]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [13]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
Qwen/QwQ-32B
Updated
1 day ago
•
83.8k
•
1.58k
deepseek-ai/DeepSeek-R1
Updated
13 days ago
•
3.89M
•
11k
microsoft/Phi-4-multimodal-instruct
Updated
about 23 hours ago
•
155k
•
1k
Wan-AI/Wan2.1-T2V-14B
Updated
10 days ago
•
182k
•
928
CohereFo

In [14]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [15]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt

In [16]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'jobs page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'discussion page', 'url': 'https://discuss.huggingface.co'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nQwen/QwQ-32B\nUpdated\n1 day ago\n•\n83.8k\n•\n1.58k\ndeepseek-ai/DeepSeek-R1\nUpdated\n13 days ago\n•\n3.89M\n•\n11k\nmicrosoft/Phi-4-multimodal-instruct\nUpdated\nabout 23 hours ago\n•\n155k\n•\n1k\nWan-AI/Wan2.1-T2V-14B\nUpdated\n10 days ago\n•\n182k\n•\n929\nCohereForAI/aya-vision-8b\nUpdated\n4 days ago\n•\n119k\n•\n195\nBrowse 1M+ models\nSpaces\nRunning\n1.02k\n1.02k\nWan2.1\n💻\nWan: Op

In [17]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [18]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'discuss page', 'url': 'https://discuss.huggingface.co'}, {'type': 'github page', 'url': 'https://github.com/huggingface'}, {'type': 'twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'linkedin page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

## Welcome to Hugging Face
**The AI community building the future.**  
At Hugging Face, we are committed to creating an open platform for collaboration in the machine learning space, empowering developers, researchers, and organizations to innovate and share their contributions to AI. 

---

## Our Offerings
### Explore:
- **1M+ Models**: A vast library of state-of-the-art machine learning models for various applications, from text generation and image processing to more complex datasets.
- **250k+ Datasets**: Comprehensive datasets that cater to the diverse needs of the ML community.
- **AI Apps**: Explore a variety of AI applications hosted on our platform.

### Special Features:
- **Spaces**: Collaborate on public models, datasets, and applications.
- **Compute Solutions**: Paid options for optimized inference and enterprise-grade support, starting at **$0.60/hour for GPU** usage.
- **Enterprise security and support**: Tailored solutions for organizations starting at **$20/user/month**.

---

## Who We Work With
More than **50,000 organizations** trust Hugging Face, including prominent names like:
- **Meta, Amazon, Google, Microsoft, and Intel**.
- Non-profits like **AI2** also call Hugging Face home, showcasing our commitment to democratizing AI.

---

## Our Community and Culture
Hugging Face thrives on the principles of open-source collaboration and community engagement. We believe in building foundations for machine learning tools with collective input from developers and researchers around the globe.

- **Team Size**: Over **210 team members**, including engineers, researchers, and advocates for machine learning.
- **Collaborative Spirit**: Our community fosters an environment where sharing knowledge and resources is encouraged, allowing collective progress in the field.
  
---

## Career Opportunities
Join us in our mission! We offer a dynamic work culture with opportunities for personal and professional growth. Whether you’re a developer, a data scientist, or an AI enthusiast, there are varied roles available that cater to all skill levels.

### Current Openings:
- AI Research Scientist
- Software Engineer
- Community Manager
- Data Analyst  
*Check our jobs page for the latest opportunities.*

---

## Join the AI Revolution!
### Connect with Us
To learn more or to become a part of our vibrant community, visit us at:  
[Hugging Face](https://huggingface.co)  
Follow us on **Twitter, LinkedIn, and Discord** to stay updated.

---

*Become a part of the future of AI at Hugging Face—where collaboration and innovation converge.*

## Finally - a minor improvement

With a small adjustment, we can change this so that the results stream back from OpenAI,
with the familiar typewriter animation.
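
Streaming means the text arrives in small pieces that are appended to a running string, and the display is refreshed after each piece. The sketch below simulates this without calling the API: `fake_deltas` stands in for the values of `chunk.choices[0].delta.content`, and `strip_markdown_fence` and `accumulate` are hypothetical helpers. It removes only a wrapping code fence, so the word "markdown" inside the brochure text is left alone.

```python
FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed

def strip_markdown_fence(text):
    """Remove a leading fence (with or without the 'markdown' tag) and a trailing fence."""
    stripped = text.strip()
    for opener in (FENCE + "markdown", FENCE):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):]
            break
    if stripped.endswith(FENCE):
        stripped = stripped[:-len(FENCE)]
    return stripped.strip()

def accumulate(deltas):
    """Yield the cleaned running text after each streamed piece (None pieces are skipped)."""
    response = ""
    for delta in deltas:
        response += delta or ""
        yield strip_markdown_fence(response)

# Simulated pieces; a real stream would supply these one chunk at a time
fake_deltas = [FENCE + "mark", "down\n# Title", "\nWe love markdown.", None, "\n" + FENCE]
for partial in accumulate(fake_deltas):
    pass
print(partial)
```

Because the raw text is kept separately from the cleaned text, a fence that is split across two pieces may flash briefly as a partial word, but it is cleaned up as soon as the next piece arrives.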

In [19]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
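    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip only the code fences from the displayed text, so the word "markdown" in the brochure itself is kept
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)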

In [21]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company blog', 'url': 'https://huggingface.co/blog'}, {'type': 'community discussion', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face: The AI Community Building the Future

## About Us
Hugging Face is a collaborative platform for the machine learning community, dedicated to building, sharing, and deploying state-of-the-art Artificial Intelligence models, datasets, and applications. Our mission is to empower researchers, developers, and enterprises to share their innovations and accelerate the advancement of AI technology.

## What We Offer
- **1M+ Models**: Explore and use a vast collection of AI models, including the latest trends in machine learning like Qwen/QwQ-32B and others.
- **Datasets**: Access over 250,000 diverse datasets tailored for various machine learning tasks.
- **Spaces**: Utilize our collaborative workspace to run and share applications, model demos, and more.
- **Compute Solutions**: Experience optimized inference endpoints and Request for Compute solutions for various enterprise needs, starting from $0.60/hour.

## Our Customers
Hugging Face serves more than 50,000 organizations across industries, including tech giants like:
- **AI2**
- **Meta**
- **Amazon**
- **Google**
- **Microsoft**
- **Intel**
  
These enterprises rely on our platform to implement cutting-edge machine learning solutions, share their research, and foster innovation.

## Company Culture
At Hugging Face, we embrace a community-driven culture centered around collaboration, transparency, and innovation. We are committed to building an open-source foundation that empowers users to contribute, learn, and grow together. Our team values creativity and encourages individuals to explore new ideas that challenge conventional boundaries in AI. 

## Careers at Hugging Face
Join us in shaping the future of AI! We are continuously looking for passionate and talented individuals to become part of our growing team. Opportunities exist across various disciplines, and we welcome candidates from diverse backgrounds.

### Why Work with Us?
- Be at the forefront of AI technology.
- Work in an inclusive and diverse environment.
- Build impactful applications that serve thousands of users.
- Engage with an enthusiastic community dedicated to open-source principles.

Whether you are a prospective customer seeking advanced AI solutions, an investor looking to be part of an innovative company, or a recruit eager to join a dynamic team, Hugging Face welcomes you!

---

For further information, visit our website at [Hugging Face](https://huggingface.co) and join our thriving community today!

In [22]:
# The humorous system prompt - this demonstrates how easy it is to incorporate 'tone':

system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."


In [23]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Welcome to Hugging Face: Where AI Gets Warm Hugs!

## 🤖 **Join the AI Revolution!**
At Hugging Face, we’re not just building AI, we’re cultivating the future—one model, dataset, and hug at a time. Think of us as the big, cozy blanket that wraps around the machine learning community, helping everyone collaborate and create better things. 

## 🏢 **What’s the Buzz?** 
More than **50,000 organizations** are already snuggling up with us, including big names like **Amazon, Google**, and **Microsoft**. They know that good things happen when smart minds come together (and we supply lots of models)!

## 📊 **1 Million Models and Counting!** 
We’ve got more models than you can shake a stick at! 🚀 From **Qwen/QwQ-32B** to **DeepSeek-R1**, no model is left unloved! 

Come for the models, stay for the fun! We have a little something for everyone.

- **Explore AI Apps**: Try out our cool gadgets, like the **Di♪♪Rhythm** 🎶 for generating tunes. (Warning: may cause spontaneous dance parties.)
- **Trending Models**: Watching the latest stars of AI become famous—right here!

## 🤝 **Our Company Culture: Hug it Out!**
At Hugging Face, we believe in collaboration with zero awkwardness. We provide a friendly environment where ideas can flourish and employees feel supported. No walls, just open minds and open-source tools!

Want to wear your heart (and your work) on your sleeve? Build your portfolio and show the world your brilliance!

### **Careers with Heart ❤️**
Thinking about becoming a Hugging Face? We’re always on the lookout for innovative and passionate individuals! From AI wizards to code magicians, we offer careers that provide both growth and a sense of community.
- 💼 **Jobs Available**: Check out our careers page if you’re ready to jump into the deep end of AI fun. Or just tiptoe in! Either way, we’ve got floaties. 

## 💰 **Enterprise Solutions – Serious Hugging!**
Looking for something a bit more tailored? Our **Enterprise** offers advanced platform support and enterprise-grade security. With plans starting at just **$20/user/month**, you can give your team the top-notch tools they need—just think of us as your secret AI weapon! 🤫 

## 🌍 **Join the Hugging Face Family!**
Ready to make your mark on the future and snuggle up with some cutting-edge technology? Join the Hugging Face family and help us build the community where AI hugs are just part of everyday life. 

### **Sign Up today:**
Get ready to explore, collaborate, and maybe even get a little cozy with AI! Remember, we’re not just about technology; we’re about fellowship of the smartest kind. 

---

**Hugging Face**: Because the future isn’t just bright—it’s warm and fuzzy!

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise we extended the Day 1 code to make multiple LLM calls, and generate a document.

This is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.

Generating content in this way is one of the most common use cases. As with summarization, this can be applied to any business vertical. Write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype. See what other students have done in the community-contributions folder -- so many valuable projects -- it's wild!</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you move to Week 2 (which is tons of fun)</h2>
            <span style="color:#900;">Please see the week1 EXERCISE notebook for your challenge for the end of week 1. This will give you some essential practice working with Frontier APIs, and prepare you well for Week 2.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">A reminder on 3 useful resources</h2>
            <span style="color:#f71;">1. The resources for the course are available <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">here.</a><br/>
            2. I'm on LinkedIn <a href="https://www.linkedin.com/in/eddonner/">here</a> and I love connecting with people taking the course!<br/>
            3. I'm trying out X/Twitter and I'm at <a href="https://x.com/edwarddonner">@edwarddonner</a> and hoping people will teach me how it's done.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../thankyou.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#090;">Finally! I have a special request for you</h2>
            <span style="color:#090;">
                My editor tells me that it makes a MASSIVE difference when students rate this course on Udemy - it's one of the main ways that Udemy decides whether to show it to others. If you're able to take a minute to rate this, I'd be so very grateful! And regardless - always please reach out to me at ed@edwarddonner.com if I can help at any point.
            </span>
        </td>
    </tr>
</table>

In [None]:
# Creating an alternate solution, Challenge! https://ibm-learning.udemy.com/course/llm-engineering-master-ai-and-large-language-models/learn/lecture/46871457#overview

In [24]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [25]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [29]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

        self.filtered_links = None  # set later to the dict returned by get_link_agent

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [30]:
## Agent 1 - Get Links for a website

def get_link_agent(website):
    #website = Website(url)

    link_system_prompt = "You are provided with a list of links found on a webpage. \
    You are able to decide which of the links would be most relevant to include in a brochure about the company, \
    such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
    link_system_prompt += "You should respond in JSON as in this example:"
    link_system_prompt += """
    {
        "links": [
            {"type": "about page", "url": "https://full.url/goes/here/about"},
            {"type": "careers page", "url": "https://another.full.url/careers"}
        ]
    }
    """

    #trying multi-shot prompting
    link_system_prompt += """\nHere's another example:
    {
        "links": [
            {"type": "about page", "url": "https://huggingface.co/"},
            {"type": "enterprise page", "url": "https://huggingface.co/enterprise"}
        ]
    }   
    """
    
    # def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
    Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)

    # def get_links(url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": user_prompt}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [44]:
def create_brochure_agent(website, links, company_name):
    system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
    and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.\
    Include details of company culture, customers and careers/jobs if you have the information."

    def get_all_details():
        # Use the Website object and the links passed in, rather than a global `url`;
        # this also avoids fetching and appending the landing page twice
        result = "Landing page:\n"
        result += website.get_contents()
        # print("Found links:", links)

        # get the content for each link for the website
        for link in links["links"]:
            result += f"\n\n{link['type']}\n"
            result += Website(link["url"]).get_contents()
        return result
    
    # def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details()
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    # return user_prompt

    # def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
    )
    result = response.choices[0].message.content
    return result
    #display(Markdown(result))
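
The function above truncates the whole user prompt to 5,000 characters. Because the landing page comes first, a long landing page can crowd out the About, Careers and other pages entirely. The sketch below shows one possible alternative: giving each page an equal share of the budget. `combine_pages` is a hypothetical helper, not part of the original notebook, and the equal split is an assumption chosen for simplicity.

```python
def combine_pages(pages, max_chars=5_000, separator="\n\n"):
    """Hypothetical helper: join (title, text) pairs, giving each page an equal character budget."""
    if not pages:
        return ""
    # Reserve room for the separators, then split what is left evenly
    budget = max_chars - len(separator) * (len(pages) - 1)
    per_page = max(budget // len(pages), 0)
    sections = [f"{title}\n{text}"[:per_page] for title, text in pages]
    # The final slice guards the edge case where the separators alone exceed the budget
    return separator.join(sections)[:max_chars]
```

With this approach, the list of pairs would be built from the landing page and each selected link, for example `("about page", Website(link["url"]).get_contents())`, before the company name and instructions are prepended.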

In [36]:
url = "https://huggingface.co"
company = "HuggingFace"

website = Website(url)


In [38]:
retrieved_links = get_link_agent(website)
website.filtered_links = retrieved_links



In [41]:
website.filtered_links

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'learn page', 'url': 'https://huggingface.co/learn'},
  {'type': 'company page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

In [45]:
result = create_brochure_agent(website, retrieved_links, company)
display(Markdown(result))

```markdown
# Hugging Face Brochure

## About Us
**Hugging Face** is at the forefront of the AI community, dedicated to building the future of machine learning. We provide a platform that fosters collaboration among developers, researchers, and enthusiasts globally, enabling them to share models, datasets, and applications efficiently.

### Our Mission
To empower the machine learning community through open collaboration, innovative tools, and a comprehensive ecosystem for all things AI.

## Company Culture
At Hugging Face, we prioritize **community, innovation, and open-source collaboration**. Our culture thrives on creativity and inclusivity, encouraging team members to strive for excellence while supporting one another in a flexible, engaging work environment. We believe in building not just technology, but a **community of builders and academic leaders**.

### The Hugging Face Community
More than **50,000 organizations** utilize our tools and resources, including industry giants like **Meta, Google, Microsoft**, and **Amazon**. We are committed to creating state-of-the-art solutions and fostering growth and learning opportunities.

## Products & Services
Hugging Face offers a variety of powerful tools and resources for machine learning applications, including:

- **Transformers**: State-of-the-art machine learning models compatible with PyTorch, TensorFlow, and JAX.
- **Datasets**: Access to over **250k datasets** for a multitude of machine learning tasks.
- **Spaces**: A collaborative platform to showcase and deploy applications running on our advanced models.
- **Enterprise Solutions**: Starting at $20/user/month, providing enterprise-grade security and dedicated support.

## Join Us!
Hugging Face is not only a leader in AI technology but also a great place to build your career. We are continuously looking for innovative thinkers and passionate doers in various fields. 

### Career Opportunities
We offer a range of roles from software engineering to research, and encourage diversity in our talent pool. Here, you can make a real impact on the AI landscape while working alongside some of the brightest minds in the industry.

- **Explore Job Openings**: [Join our team!](https://huggingface.co/jobs)

## Connect with Us
Join the Hugging Face community and be part of shaping the future of AI!

- **Website**: [huggingface.co](https://huggingface.co)
- **GitHub**: [Hugging Face GitHub](https://github.com/huggingface)
- **Twitter**: [Follow us on Twitter](https://twitter.com/huggingface)
- **LinkedIn**: [Connect with us](https://www.linkedin.com/company/huggingface)

### Explore AI Together
Join us in breaking new ground in AI applications and fostering a collaborative community that welcomes everyone to contribute.
```
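
The brochure above came back wrapped in a markdown code fence (the triple-backtick lines at the top and bottom), so `display(Markdown(result))` shows it as a literal code block rather than as formatted text. Here is a minimal sketch of stripping one outer fence before display. `strip_markdown_fence` is a hypothetical helper, not part of the original notebook, and it only handles the simple case where the entire reply is wrapped in a single fence.

```python
FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed


def strip_markdown_fence(text):
    """Hypothetical helper: remove one outer code fence if the whole reply is wrapped in it."""
    lines = text.strip().splitlines()
    wrapped = (
        len(lines) >= 2
        and lines[0].startswith(FENCE)   # opening fence, with or without a language tag
        and lines[-1].strip() == FENCE   # closing fence
    )
    if wrapped:
        return "\n".join(lines[1:-1]).strip()
    return text.strip()
```

Usage would look like `display(Markdown(strip_markdown_fence(result)))`. A reply that merely begins and ends with two separate code blocks would be misread by this check, so treat it as a starting point.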


In [46]:
def convert_brochure_to_spanish_agent(text_to_convert: str):
    system_prompt = "You are an assistant that takes English brochures and translates them into Spanish."
    
    user_prompt = f"Translate this brochure into Spanish, in markdown format: {text_to_convert}"
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
    )
    result = response.choices[0].message.content
    return result
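
The translation agent above hard-codes Spanish in both prompts. As a rough sketch, the message construction could be pulled out and parameterized by language. `build_translation_messages` and its `target_language` parameter are hypothetical names, not part of the original notebook, and the extra instruction about preserving links is an assumption about what a translated brochure should keep.

```python
def build_translation_messages(text, target_language="Spanish"):
    """Hypothetical helper: build chat messages for translating a brochure."""
    system_prompt = (
        f"You are an assistant that translates English brochures into {target_language}. "
        "Keep the markdown structure, links and product names unchanged."
    )
    user_prompt = (
        f"Translate this brochure into {target_language}, in markdown format:\n\n{text}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
```

The returned list can be passed as `messages=` to the same `openai.chat.completions.create` call used above.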

In [47]:
spanish_brochure = convert_brochure_to_spanish_agent(result)
display(Markdown(spanish_brochure))

```markdown
# Folleto de Hugging Face

## Sobre Nosotros
**Hugging Face** está a la vanguardia de la comunidad de IA, dedicada a construir el futuro del aprendizaje automático. Proporcionamos una plataforma que fomenta la colaboración entre desarrolladores, investigadores y entusiastas a nivel mundial, permitiéndoles compartir modelos, conjuntos de datos y aplicaciones de manera eficiente.

### Nuestra Misión
Empoderar a la comunidad de aprendizaje automático a través de la colaboración abierta, herramientas innovadoras y un ecosistema integral para todo lo relacionado con la IA.

## Cultura de la Empresa
En Hugging Face, priorizamos **la comunidad, la innovación y la colaboración de código abierto**. Nuestra cultura prospera en la creatividad y la inclusión, fomentando que los miembros del equipo se esfuercen por la excelencia mientras se apoyan mutuamente en un entorno de trabajo flexible y atractivo. Creemos en construir no solo tecnología, sino una **comunidad de creadores y líderes académicos**.

### La Comunidad de Hugging Face
Más de **50,000 organizaciones** utilizan nuestras herramientas y recursos, incluidas grandes empresas como **Meta, Google, Microsoft** y **Amazon**. Estamos comprometidos con la creación de soluciones de vanguardia y el fomento de oportunidades de crecimiento y aprendizaje.

## Productos y Servicios
Hugging Face ofrece una variedad de potentes herramientas y recursos para aplicaciones de aprendizaje automático, que incluyen:

- **Transformers**: Modelos de aprendizaje automático de última generación compatibles con PyTorch, TensorFlow y JAX.
- **Conjuntos de Datos**: Acceso a más de **250,000 conjuntos de datos** para una multitud de tareas de aprendizaje automático.
- **Espacios**: Una plataforma colaborativa para mostrar y desplegar aplicaciones que funcionan con nuestros modelos avanzados.
- **Soluciones Empresariales**: A partir de $20/usuario/mes, ofreciendo seguridad de nivel empresarial y soporte dedicado.

## ¡Únete a Nosotros!
Hugging Face no solo es un líder en tecnología de IA, sino también un gran lugar para construir tu carrera. Estamos continuamente buscando pensadores innovadores y hacedores apasionados en varios campos.

### Oportunidades Laborales
Ofrecemos una variedad de roles desde ingeniería de software hasta investigación, y fomentamos la diversidad en nuestro grupo de talento. Aquí, puedes tener un impacto real en el panorama de la IA mientras trabajas junto a algunas de las mentes más brillantes de la industria.

- **Explora Vacantes**: [¡Únete a nuestro equipo!](https://huggingface.co/jobs)

## Conéctate con Nosotros
¡Únete a la comunidad de Hugging Face y sé parte de la configuración del futuro de la IA!

- **Sitio Web**: [huggingface.co](https://huggingface.co)
- **GitHub**: [GitHub de Hugging Face](https://github.com/huggingface)
- **Twitter**: [Síguenos en Twitter](https://twitter.com/huggingface)
- **LinkedIn**: [Conéctate con nosotros](https://www.linkedin.com/company/huggingface)

### Exploremos la IA Juntos
Únete a nosotros para abrir nuevos caminos en aplicaciones de IA y fomentar una comunidad colaborativa que dé la bienvenida a todos para contribuir.
```