# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [42]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [43]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [44]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers, timeout=10)  # timeout so a slow site can't hang the notebook
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [45]:
# ed = Website("https://edwarddonner.com")
# ed.links

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
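
The relative-link step can also be done deterministically, as a safety net around the model call. Here is a minimal sketch using the standard library's `urljoin`; the helper name `resolve_links` and the example URLs are illustrative assumptions, not part of the course code:

```python
from urllib.parse import urljoin

def resolve_links(base_url, links):
    """Turn relative links into absolute URLs; absolute links pass through unchanged."""
    return [urljoin(base_url, link) for link in links]

# Hypothetical inputs for illustration
print(resolve_links("https://company.com", ["/about", "careers", "https://other.com/jobs"]))
```

This does not replace the LLM's judgment about which links are relevant; it only covers the mechanical part of the task.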

In [46]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [47]:
# print(link_system_prompt)

In [48]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [49]:
# print(get_links_user_prompt(ed))

In [50]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
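
Later code indexes `links["links"]` and `link["url"]` directly, so a reply that is valid JSON but has an unexpected shape would raise an error. One possible guard is sketched below; `parse_links` is a hypothetical helper name, and the rule of keeping only entries with an `http(s)` URL is an assumption about what counts as usable:

```python
import json

def parse_links(raw):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}
    # Keep entries that are dicts with an absolute http(s) URL
    valid = [
        link for link in links
        if isinstance(link, dict)
        and isinstance(link.get("url"), str)
        and link["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}
```

If you adopt something like this, `get_links` could return `parse_links(result)` instead of `json.loads(result)`.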

In [51]:
# # Anthropic has made their site harder to scrape, so I'm using HuggingFace..

# huggingface = Website("https://huggingface.co")
# huggingface.links

In [52]:
# get_links("https://huggingface.co")

## Second step: make the brochure!

Assemble all the details into another prompt to gpt-4o-mini.

In [53]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [54]:
# print(get_all_details("https://huggingface.co"))

In [55]:
# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

# The humorous version below is active - this demonstrates how easy it is to incorporate 'tone'.
# For a serious brochure, comment it out and uncomment the version above:

system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."


In [56]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
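
The hard cut at 5,000 characters can stop in the middle of a line. As an optional refinement, here is a minimal sketch that prefers to end on a line break; `truncate_at_line` is a hypothetical helper, not part of the original notebook:

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, preferring to end on a line break."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut if there is no line break in range
    return text[:cut] if cut > 0 else text[:limit]
```

It could stand in for the `user_prompt[:5_000]` slice; whether the cleaner cut matters in practice depends on the pages being scraped.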

In [57]:
# get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

In [58]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [59]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'community discussion', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Twitter', 'url': 'https://twitter.com/huggingface'}]}


# Welcome to Hugging Face! 🤗

**Where AI gets cozy and the future is as bright as our coffee! ☕💡**

---

## Who Are We?

We’re the AI community building the future—one model, one dataset, and one application at a time! At Hugging Face, we believe that the only thing better than a hug is an AI community where you can collaborate and innovate. Why? Because seriously—who doesn't love hugging a robot?!

---

## What We Do

**We give you access to:**  
- **1M+ AI Models:** Explore more models than you have fingers! (though it is a bit awkward for our 6-fingered friends)
- **250k+ Datasets:** Because one dataset just isn't enough. Add more to the mix!
- **Creativity Spaces:** Like a high-tech sandbox—but instead of shovels, you have cutting-edge AI applications!

**Trending Models This Week:**  
- **HunyuanImage-3.0:** Visuals that make it look like you know what you’re doing.  
- **DeepSeek-V3.2-Exp:** For when you want your machine learning to sound more impressive.  
- **GLM-4.6:** It’s not just a model; it’s a lifestyle.  

---

## Culture

At Hugging Face, we’re not just about AI—we're a family of AI enthusiasts who lift each other up, just like a great hug! 

- **Teamwork makes the Dream Work:** Collaboration is our middle name (it can be a little clunky to type, but we persist)!
- **Open Source Lovers:** Together, we contribute to open-source ML tooling, ensuring the AI revolution is open for everyone (no secret handshakes required).
- **Diversity and Inclusion:** We welcome everyone—both humans and friendly AI. Yes, even your pet robot!

---

## Join Us!

**Careers at Hugging Face:**  
Are you a machine learning genius, a wizard of data sets, or simply someone who believes that AI should be fun? We want YOU!  
- **Roles available:** Data whisperer, AI philosopher, and the occasional office comedian (applications for the last one are open until the position is filled by our other office comedians).

---

## Our Customers

More than 50,000 organizations trust us to help shape their AI strategies. Even popular giants like Google, Amazon, and Microsoft are hugging it out with us! 💼🤖

---

## Let's Connect!     
    
Want to explore AI Apps that make you feel like a superhero? Visit [Hugging Face](https://huggingface.co) and experience the magic yourself! 

### Remember: If you’re in the AI game, it’s better to go in equipped with a hug! 🤗

---

*Terms and conditions apply, including the need to occasionally share snacks with AI during training.*

## Finally - a minor improvement

With a small adjustment, we can change this so that the results stream back from OpenAI, with the familiar typewriter animation.

In [64]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
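    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip only the code fences the model sometimes wraps around its answer,
        # so the word "markdown" survives if it appears in the brochure text
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)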
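
To see what the loop in `stream_brochure` is doing without calling the API, the accumulation step can be tried on stand-in objects. This is a sketch under assumptions: `accumulate_deltas` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the `choices[0].delta.content` shape that the loop reads:

```python
from types import SimpleNamespace

def accumulate_deltas(chunks):
    """Yield the growing response text as each streamed chunk arrives."""
    response = ""
    for chunk in chunks:
        # delta.content can be None (for example on the final chunk), so default to ""
        response += chunk.choices[0].delta.content or ""
        yield response

def fake_chunk(content):
    """Build a stand-in object shaped like a streamed chat completion chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

# Hypothetical stream for illustration
snapshots = list(accumulate_deltas([fake_chunk("# Bro"), fake_chunk("chure"), fake_chunk(None)]))
print(snapshots[-1])
```

Each snapshot is what would be passed to `update_display`, which is why the output appears to type itself out.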

In [65]:
stream_brochure("cnn", "https://cnn.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://www.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'company page', 'url': 'https://www.cnn.com/profiles'}, {'type': 'leadership profiles', 'url': 'https://www.cnn.com/profiles/cnn-leadership'}]}


# CNN Brochure: Your Inside Scoop on All Things Newsworthy! 

---

### Welcome to CNN - Where News Meets Fun!

**Breaking News?** We’re always on it! From spicy political gossip to the hottest trends in weather (Did someone say rain?), CNN is your one-stop shop for all things informative and entertaining. It’s like a buffet for your brain - just with fewer calories and more 24-hour coverage.

### What’s Brewing Over at CNN?

At CNN, we pride ourselves on the *value of feedback*, just like that friend who always asks you to rate their new haircut—awkward, but necessary! From technical issues to ad annoyances, we want to hear from you. Please hold your applause until the end (just kidding; we love it).  

We are not just about hard-hitting news but also sprinkle in some entertainment and lifestyle content—because sometimes you need a giggle in between world crises! Welcome to our range of topics: *Politics, Health, Sports, and more!* You can even catch our tasty segment, *CNN Underscored*, where we give you the lowdown on stylish hats and comfy socks!

### Our Customers: News Junkies & Coffee Lovers!

Our audience is as diverse as a box of chocolates (though we’d never mix politics with sweets—yikes!). Whether you're a *news junkie*, a *celebrity gossip enthusiast*, or a casual reader looking to spice up your Insta feed, we've got something for you! We even have international editions in Arabic and Spanish—because who doesn’t love a little *multilingual* flavor?

### The CNN Culture - Where News Meets Vibes!

At CNN, we believe in creating a culture that’s *more inclusive than a family reunion* (minus the weird cousin). From our newsroom to your inbox, we promote collaboration, creativity, and the occasional fun debate on the best coffee roast—dark roast, we’re looking at you!

Want a career with us? Picture this: you reporting live from the scene, a coffee in one hand, and breakfast burrito in the other. Who wouldn’t want their job to be that exciting? Join our squad and take your career *beyond the news desk*! We’re always looking for bright minds ready to make waves (in a *non-tsunami* way).

### Why Choose CNN? 

- **Expert Reporting:** We dive deep! Just like you into that half-eaten pizza in your fridge.
- **Diverse Topics:** Because sometimes you need to know what’s happening from outer space to your local coffee shop.
- **Careers with Flavor:** One word: *exciting*. Join our team and become a part of our *news fam*!

### Final Thoughts: 

So, whether you’re here for the news, the entertainment, or that midday chuckle, CNN has got your back. Tune in, follow up, and feel free to give us that feedback—we promise we’ll make a mental note (or write you a sticky!). 

---

**Headline:** *Stay informed, stay entertained – with CNN!* 🎤🎉

In [62]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

# stream_brochure("HuggingFace", "https://huggingface.co")

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise we extended the Day 1 code to make multiple LLM calls, and generate a document.

This is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.

Generating content in this way is one of the most common use cases. As with summarization, it can be applied to any business vertical: write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype. See what other students have done in the community-contributions folder - so many valuable projects - it's wild!</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you move to Week 2 (which is tons of fun)</h2>
            <span style="color:#900;">Please see the week1 EXERCISE notebook for your challenge for the end of week 1. This will give you some essential practice working with Frontier APIs, and prepare you well for Week 2.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">A reminder on 3 useful resources</h2>
            <span style="color:#f71;">1. The resources for the course are available <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">here.</a><br/>
            2. I'm on LinkedIn <a href="https://www.linkedin.com/in/eddonner/">here</a> and I love connecting with people taking the course!<br/>
            3. I'm trying out X/Twitter and I'm at <a href="https://x.com/edwarddonner">@edwarddonner</a> and hoping people will teach me how it's done.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../thankyou.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#090;">Finally! I have a special request for you</h2>
            <span style="color:#090;">
                My editor tells me that it makes a MASSIVE difference when students rate this course on Udemy - it's one of the main ways that Udemy decides whether to show it to others. If you're able to take a minute to rate this, I'd be so very grateful! And regardless - always please reach out to me at ed@edwarddonner.com if I can help at any point.
            </span>
        </td>
    </tr>
</table>

In [63]:
create_brochure("cnn", "https://cnn.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://www.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'company page', 'url': 'https://www.cnn.com'}]}


```markdown
# 🥳 Welcome to CNN: 24/7 News with a Side of Humor! 🥳

## 🌎 Who Are We?

At **CNN**, we bring you the latest and greatest from every corner of the globe and every crevice of your living room! Whether it’s breaking news, political scandals, or the *latest* celebrity haircuts, we’ve got the scoop (and probably some popcorn too). After all, why take life too seriously, when there’s *always* something happening and a new ad to critique?

## 🤔 What’s Our Flavor?

We serve up a delicious buffet of:
- **US & World News**: What’s shaking both near and far. Spoiler alert: It’s often shaking quite vigorously.
- **Politics**: All the drama of a reality show, but people take the stakes a little more seriously.
- **Health**: Life hacks guaranteed to make you feel great... most of the time.
- **Entertainment**: Because Taylor Swift is always breaking up (or making up), and we must keep our viewers updated!
- **Sports**: Who needs a playbook? Our sports section covers all the major highlights and maybe a few “oops” moments. 
- **Science**: Reports on things that sound impossible but are happening anyway — like eating cake for breakfast while staying fit!

## 👀 Who’s Watching Us?

Our customers? They’re anyone from concerned citizens wanting to know what’s up, to folks just looking for something to keep them awake while they sip their fourth cup of coffee. We know you love feedback, and boy do we appreciate the barrage of responses about our *excellent* ad placements! 

## 🎉 Culture: We Take News Seriously… but Not That Seriously!

At CNN, humor and serious news dance like awkward relatives at a family wedding. Here’s what makes our culture stand out:
- **Diversity**: We welcome everyone from reporters who have traveled the globe to those who can barely find their car keys.
- **Flexibility**: Like a rubber band! We believe in adapting to every twist and turn in the news world. News is water, and we’re just trying not to drown in it.
- **Innovation**: Did you know we even have an actual "Innovative Cities" section? Spoiler: it's not just hipster coffee shops and street art (but we do love a good latte)!

## 💼 Careers: Want to Join the Fun?

Looking for a career where you can use your best witty comebacks and sprinkle some fact-checking magic? CNN is hiring! We’re looking for:
- **Reporters** who can spot a story from a mile away! 
- **Editors** who keep the news straight, rightside-up, and not upside down.
- **Marketing Mavericks** - Because someone has to make sure our headlines grab eyeballs without making people cringe too much.
- **Tech Wizards** who can fix our slow-loading video player before we lose the attention of A.D.D. viewers!

## 🥳 Join the Party or Just Tune In!

At CNN, we're on the frontlines bringing you the news you need (and a bunch you didn't) with a splash of humor, a dash of enthusiasm, and lots of commitments to feedback. Grab your coffee, sit back, and let us take you on a wild news ride. Your couch is ready!

**Follow us for news about everything - except those items on your to-do list!**

---

👩‍💻 Get all the updates on your screen and join the conversation!

### #CNN #WeHopeYouLaugh #NewsAndNonsense
```


In [74]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def scrape_page_text(base_url, path):
    """
    Scrapes clean visible text from a given webpage.
    Args:
        base_url (str): Site root, e.g. "https://cnn.com".
        path (str): Relative path or absolute URL to fetch.
    Returns:
        str: Clean text content of the page, or an error message if the fetch fails.
    """
    url = urljoin(base_url, path)

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }

    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        return f"Error fetching {url}: {e}"

    soup = BeautifulSoup(response.text, "html.parser")

    # Remove non-content elements before extracting text
    for tag in soup(["script", "style", "noscript", "header", "footer", "form", "nav"]):
        tag.decompose()

    text = soup.get_text(separator=" ", strip=True)

    # Collapse runs of whitespace into single spaces
    return " ".join(text.split())


In [73]:
base_url = "https://cnn.com"

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

USEFUL_KEYWORDS = [
    "about", "company", "who-we-are", "our-story", "team",
    "service", "services", "solutions", "what-we-do",
    "product", "products", "offerings", "portfolio",
    "feature", "features", "capabilities",
    "pricing", "plans", "packages", "rates",
    "case-study", "case-studies", "success", "projects", "work",
    "testimonial", "testimonials", "reviews", "clients", "customers",
    "contact", "contact-us", "support", "help", "enquiry", "get-in-touch",
    "blog", "insights", "resources", "news", "articles"
]

def get_useful_links(base_url):
    """Return absolute URLs from the page whose link contains one of USEFUL_KEYWORDS."""
    try:
        response = requests.get(base_url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching {base_url}: {e}")
        return []

    soup = BeautifulSoup(response.text, "html.parser")
    links = set()

    for a in soup.find_all("a", href=True):
        href = a["href"]
        # Resolve using the original case: URL paths can be case-sensitive
        full_url = urljoin(base_url, href)

        # Match keywords case-insensitively, ignoring the query string
        # so tracking parameters don't trigger false positives
        path_part = href.lower().split("?", 1)[0]
        if any(keyword in path_part for keyword in USEFUL_KEYWORDS):
            links.add(full_url)

    return list(links)


useful_paths = get_useful_links(base_url)
print("Useful paths found:", useful_paths)

for path in useful_paths:
    page_text = scrape_page_text(base_url, path)
    print(f"--- Content from {path} ---\n{page_text[:500]}...\n")


Useful paths found: ['https://bleacherreport.com/articles/25255248-dolphins-tua-tagovailoa-talks-diet-changes-help-avoid-future-concussions-video?utm_source=cnn.com&utm_medium=referral&utm_campaign=editorial', 'https://cnn.com/cnn-underscored/reviews/best-drip-coffee-makers', 'https://bleacherreport.com/articles/25255180-shedeur-sanders-keeps-making-his-biggest-problem-worse?utm_source=cnn.com&utm_medium=referral&utm_campaign=editorial', 'https://cnn.com/cnn-underscored/reviews/best-kitchen-knife-sets', 'https://cnn.com/newsletters', 'https://bleacherreport.com/articles/25254854-speed-trains-tom-brady-sauce-gardner-amendola-be-nfl-player-new-video?utm_source=cnn.com&utm_medium=referral&utm_campaign=editorial', 'https://cnn.com/cnn-underscored/reviews/truskin-vitamin-c-serum', 'https://www.cnn.com/newsletters', 'https://www.cnn.com/climate/solutions', 'https://cnn.com/cnn-underscored/reviews/best-leaf-blower', 'https://www.cnn.com/cnn-underscored/reviews', 'https://cnn.com/2025/10/02/po