# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, aimed at prospective clients, investors, and potential recruits.

We will be given a company name and its primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [27]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import json
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from bs4 import BeautifulSoup
import requests

In [29]:
# Standard headers to fetch a website
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


def fetch_website_contents(url):
    """
    Return the title and contents of the website at the given url;
    truncate to 2,000 characters as a sensible limit
    """
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
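    # soup.title.string is None when <title> is empty or has nested tags, so guard against that too
    title = soup.title.string if soup.title and soup.title.string else "No title found"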
    if soup.body:
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    return (title + "\n\n" + text)[:2_000]


def fetch_website_links(url):
    """
    Return the links on the website at the given url
    I realize this is inefficient as we're parsing twice! This is to keep the code in the lab simple.
    Feel free to use a class and optimize it!
    """
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    return [link for link in links if link]

In [30]:
# Initialization and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-5-nano'
openai = OpenAI()

API key looks good so far


In [31]:
links = fetch_website_links("https://edwarddonner.com")
links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/11/11/ai-live-event/',
 'https://edwarddonner.com/2025/11/11/ai-live-event/',
 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/28/connecting-my-cou

## First step: Have GPT-5-nano figure out which links are relevant

### Use a call to gpt-5-nano to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.
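
The relative-link part of this task does not strictly need an LLM. As a minimal sketch, assuming we only want to turn hrefs into full web URLs before or after the model sees them, the standard library's `urljoin` can resolve them against the page URL. The helper name `absolutize` and the example hrefs are hypothetical; judging which links are relevant is still left to the model.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against base_url and keep only http(s) results."""
    resolved = []
    for href in hrefs:
        full = urljoin(base_url, href)  # absolute hrefs pass through unchanged
        if full.startswith(("http://", "https://")):
            resolved.append(full)  # drops mailto:, tel:, javascript: and similar
    return resolved


# Hypothetical hrefs, mixing relative, absolute and non-web links
example_hrefs = ["/about", "careers", "https://other.com/x", "mailto:hi@company.com"]
print(absolutize("https://company.com", example_hrefs))
```

Doing this step in code would let the prompt focus on relevance alone, at the cost of one more helper to maintain.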

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Side note: there is a more advanced technique called "Structured Outputs", in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
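
To give a rough idea of what such a spec might look like for our links reply, here is a sketch of a JSON Schema wrapped in a `response_format` value. This reflects my understanding of the `json_schema` response format in the OpenAI Chat Completions API and should be checked against the current documentation. The schema name `relevant_links` is a made-up label, and this is not necessarily how the Week 8 project does it.

```python
import json

# JSON Schema describing the reply shape used in this lab:
# {"links": [{"type": "...", "url": "..."}, ...]}
links_schema = {
    "type": "object",
    "properties": {
        "links": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string"},
                    "url": {"type": "string"},
                },
                "required": ["type", "url"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["links"],
    "additionalProperties": False,
}

# Assumed shape of the response_format argument (verify before relying on it)
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "relevant_links", "strict": True, "schema": links_schema},
}

print(json.dumps(response_format, indent=2))
```

With a spec like this, the one-shot example in the prompt becomes a hint rather than the only thing keeping the reply in shape.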

In [32]:
link_system_prompt = """
You are provided with a list of links found on a webpage.
You are able to decide which of the links would be most relevant to include in a brochure about the company,
such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [33]:
def get_links_user_prompt(url):
    user_prompt = f"""
Here is the list of links on the website {url} -
Please decide which of these are relevant web links for a brochure about the company, 
respond with the full https URL in JSON format.
Do not include Terms of Service, Privacy, email links.

Links (some might be relative links):

"""
    links = fetch_website_links(url)
    user_prompt += "\n".join(links)
    return user_prompt
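
The link list printed earlier for edwarddonner.com contains repeated URLs, and every repeat is sent to the model as extra input. A small sketch of removing duplicates while keeping the original order, assuming order matters to us; `dedupe_links` is a hypothetical helper, not part of the lab code:

```python
def dedupe_links(links):
    """Drop repeated links, keeping the first occurrence of each in order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(links))


sample = [
    "https://edwarddonner.com/",
    "https://edwarddonner.com/posts/",
    "https://edwarddonner.com/",
]
print(dedupe_links(sample))
```

It could be applied to the result of `fetch_website_links(url)` before the `"\n".join(links)` step.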

In [34]:
print(get_links_user_prompt("https://edwarddonner.com"))


Here is the list of links on the website https://edwarddonner.com -
Please decide which of these are relevant web links for a brochure about the company, 
respond with the full https URL in JSON format.
Do not include Terms of Service, Privacy, email links.

Links (some might be relative links):

https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/11/11/ai-live-event/
https://edwarddonner.com/2025/11/11/ai-live-event/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
htt

In [35]:
def select_relevant_links(url):
    print(f"Selecting relevant links for {url} by calling {MODEL}")
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(url)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    links = json.loads(result)
    print(f"Found {len(links['links'])} relevant links")
    return links
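
As written, `json.loads` raises if the reply is not valid JSON, and `links['links']` raises if the model omits that key. A more defensive variant might look like the sketch below. It assumes we prefer an empty result over an exception, and `parse_links_reply` is a hypothetical name rather than anything in the lab.

```python
import json


def parse_links_reply(raw):
    """Parse the model reply; return {"links": [...]} with only well-formed entries."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return {"links": []}  # not JSON, or not a string at all
    entries = data.get("links") if isinstance(data, dict) else None
    if not isinstance(entries, list):
        return {"links": []}  # missing or malformed "links" key
    clean = [
        entry
        for entry in entries
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        and entry["url"].startswith("https://")  # the prompt asks for full https URLs
    ]
    return {"links": clean}


print(parse_links_reply('{"links": [{"type": "about page", "url": "https://example.com/about"}]}'))
```

In `select_relevant_links`, this would stand in for the bare `json.loads(result)` call.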

In [36]:
select_relevant_links("https://huggingface.co")

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 21 relevant links


{'links': [{'type': 'homepage', 'url': 'https://huggingface.co/'},
  {'type': 'brand page', 'url': 'https://huggingface.co/brand'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'join page', 'url': 'https://huggingface.co/join'},
  {'type': 'Discord community', 'url': 'https://huggingface.co/join/discord'},
  {'type': 'blog', 'url': 'https://huggingface.co/blog'},
  {'type': 'community forum', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn', 'url': 'https://www.linkedin.com/company/huggingface/'},
  {'type': 'Zhihu', 'url': 'https://www.zhihu.com/org/huggingface'},
  {'type': 'Brand partner', 'url': 'https://huggingface.co/allenai'},
  {'type': 'Brand partner',

## Second step: make the brochure!

Assemble all the details into another prompt. Note that the cells below send this second prompt to gpt-4.1-mini; link selection still uses gpt-5-nano.

In [37]:
def fetch_page_and_all_relevant_links(url):
    contents = fetch_website_contents(url)
    relevant_links = select_relevant_links(url)
    result = f"## Landing Page:\n\n{contents}\n## Relevant Links:\n"
    for link in relevant_links['links']:
        result += f"\n\n### Link: {link['type']}\n"
        result += fetch_website_contents(link["url"])
    return result

In [38]:
print(fetch_page_and_all_relevant_links("https://huggingface.co"))

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 14 relevant links
## Landing Page:

Hugging Face – The AI community building the future.

Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
Tongyi-MAI/Z-Image-Turbo
Updated
2 days ago
•
44.5k
•
1.46k
black-forest-labs/FLUX.2-dev
Updated
4 days ago
•
171k
•
760
tencent/HunyuanOCR
Updated
3 days ago
•
92.9k
•
543
deepseek-ai/DeepSeek-Math-V2
Updated
4 days ago
•
3.53k
•
507
microsoft/Fara-7B
Updated
2 days ago
•
10.7k
•
326
Browse 1M+ models
Spaces
Running
on
Zero
306
FLUX.2 [dev]
💻
306
Generate images from text prompts with optional image editing
Running
on
Zero
MCP
Featured
1.41k
Qwen Image Edit Camera Control
🎬
1.41k
Fast 4 step inference wit

In [39]:
audit_system_prompt = """
You are a senior consultant in marketing, sales optimization, and website value proposition analysis.

Your task is to analyze all extracted content from the company's website and produce:

1. A diagnosis of the current value proposition.
2. Detected weaknesses (messaging, clarity, UX, trust, structure).
3. Opportunities for quick improvements.
4. Exact recommended phrases, headlines, and improved copy.
5. A 'quick wins' list with actions that can be implemented in 24 hours.

Write in concise, professional English.
Respond in markdown without code blocks.
Base all insights strictly on the extracted website content.
"""

In [40]:
def get_audit_user_prompt(company_name, url):
    user_prompt = f"""
You are analyzing a company called: {company_name}

Here is the content from its landing page and all relevant internal pages.
Use this information to produce a professional website value proposition audit,
including:

- Value proposition diagnosis  
- Messaging clarity issues  
- UX / structure weaknesses  
- Trust & credibility gaps  
- Quick-win improvements  
- Suggested headlines and improved copy  

Respond in markdown without code blocks.

Below is the extracted content:
"""
    user_prompt += fetch_page_and_all_relevant_links(url)
    return user_prompt[:5_000]
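
One consequence of the final `[:5_000]` slice: each fetched page can contribute up to 2,000 characters, so the instructions, the landing page, and roughly the first linked page use up the budget, and later pages are cut off. Here is a minimal sketch of one alternative, assuming we would rather give every page an equal share. `build_prompt_within_budget` and its parameters are hypothetical names, not part of the notebook.

```python
def build_prompt_within_budget(instructions, pages, budget=5_000):
    """Build a prompt from (label, text) pages, splitting the leftover budget evenly."""
    if not pages:
        return instructions[:budget]
    headers = [f"\n\n### {label}\n" for label, _ in pages]
    # Characters already spoken for by the instructions and the section headers
    fixed = len(instructions) + sum(len(header) for header in headers)
    per_page = max((budget - fixed) // len(pages), 0)
    parts = [instructions]
    for header, (_, text) in zip(headers, pages):
        parts.append(header + text[:per_page])
    # Final slice is a safety net in case the fixed parts alone exceed the budget
    return "".join(parts)[:budget]


pages = [("Landing page", "x" * 4_000), ("About page", "y" * 4_000)]
prompt = build_prompt_within_budget("Audit this site:", pages, budget=1_000)
print(len(prompt))
```

The trade-off is that every page is shortened a little instead of most pages being dropped entirely. Whether that gives a better audit depends on the site.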

In [41]:
get_audit_user_prompt("HuggingFace", "https://huggingface.co")

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 10 relevant links


Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.


'\nYou are analyzing a company called: HuggingFace\n\nHere is the content from its landing page and all relevant internal pages.\nUse this information to produce a professional website value proposition audit,\nincluding:\n\n- Value proposition diagnosis  \n- Messaging clarity issues  \n- UX / structure weaknesses  \n- Trust & credibility gaps  \n- Quick-win improvements  \n- Suggested headlines and improved copy  \n\nRespond in markdown without code blocks.\n\nBelow is the extracted content:\n## Landing Page:\n\nHugging Face – The AI community building the future.\n\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nTongyi-MAI/Z-Image-Turbo\nUpdated\n2 days ago\n•\n44.5k\n•\n1.46k\nblack-forest-labs/FLUX.2-dev\nUpdated\n4 da

In [42]:
def create_website_audit(company_name, url):
    response = openai.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": audit_system_prompt},
            {"role": "user", "content": get_audit_user_prompt(company_name, url)}
        ],
    )
    display(Markdown(response.choices[0].message.content))


In [43]:
create_website_audit("HuggingFace", "https://huggingface.co")

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 8 relevant links


# Hugging Face Website Value Proposition Audit

---

## 1. Value Proposition Diagnosis

**Core proposition:**  
Hugging Face positions itself as *“The AI community building the future,”* serving as a collaborative platform where machine learning (ML) practitioners and enthusiasts share, explore, and experiment with models, datasets, and applications.

Key strengths:  
- Emphasis on community and collaboration.  
- Comprehensive access to over 1 million ML models, datasets, and applications.  
- Multi-modality support (text, image, video, audio, 3D).  
- Free hosting of unlimited public projects plus paid compute and enterprise solutions, targeting both individual users and teams.  
- Clear call to action to explore AI apps or browse models.

The value is mainly community-driven - it’s a hub for discovery, collaboration, and development of ML projects with open source ethos.

---

## 2. Messaging Clarity Issues

- **Headline vagueness:**  
  "The AI community building the future" is inspirational but abstract; it doesn’t clearly state *why* the visitor should use Hugging Face or what specific problem it solves.

- **Redundancy and lack of unique benefit focus:**  
  Phrases like "The platform where the machine learning community collaborates on models, datasets, and applications" restate the community aspect but don’t clarify unique benefits or outcomes for different target audiences (researchers, developers, enterprises).

- **Mixed terminology & call-to-actions:**  
  Multiple CTAs like *Explore AI Apps*, *Browse 1M+ models*, *Sign Up* are presented without clear prioritization or context, which can confuse first-time visitors.

- **Missing differentiated value for paid offerings:**  
  The transition from free open-source resources to *paid Compute and Enterprise solutions* is abrupt and minimally explained. It’s unclear what advantages paid tiers provide beyond “acceleration”.

---

## 3. UX / Structure Weaknesses

- **Information overload on landing page:**  
  The homepage displays a large volume of trending models, datasets, spaces, and applications upfront. This might overwhelm new users unfamiliar with ML or the platform’s ecosystem.

- **Lack of segmented navigation:**  
  Users with different intentions (e.g., researchers, enterprises, hobbyists) are not immediately guided to focused content or tailored value propositions.

- **Insufficient onboarding guidance:**  
  The platform could benefit from onboarding prompts for new users to understand how to best use the platform - e.g., suggestions on where to start, how to create, or how to collaborate.

- **Community aspect under-leveraged:**  
  Although "community" is in the core messaging, there is limited visible emphasis on community success stories, testimonials, or active discussions which reinforce engagement.

---

## 4. Trust & Credibility Gaps

- **No visible social proof or endorsements:**  
  The landing page lacks logos of key enterprise clients, partner institutions, or testimonials that could strengthen credibility.

- **No clear security or compliance information:**  
  For enterprise customers, trust is critical but there’s no visible mention of data security, privacy, or compliance certifications.

- **Unclear about support & reliability:**  
  There is no immediate reference to uptime guarantees, developer support options, or SLAs for enterprise users.

---

## 5. Quick-Win Improvements

- **Clarify and sharpen the main value proposition statement** to clearly articulate who the platform is for and the outcome they can expect.

- **Simplify and prioritize CTAs**; guide new users explicitly to sign up or explore starter content.

- **Add a concise benefits statement under the main headline** reinforcing unique strengths: e.g., collaborative hub, open-source ecosystem, multi-modal support.

- **Introduce visible social proof** - featuring logos of major partners, user metrics, or testimonials.

- **Create segmented user pathways** on the homepage or via quick links for researchers, developers, and enterprise teams.

- **Include brief messaging around enterprise features** such as security, support, and scalability in the pricing and enterprise sections.

---

## 6. Suggested Headlines and Improved Copy

**Current main headline:**  
*“The AI community building the future.”*

**Improved headline options:**  
- *“The World’s Leading Collaborative Platform for Machine Learning Innovation.”*  
- *“Explore, Create, and Share Cutting-Edge AI Models and Data - Together.”*  
- *“Accelerate Your Machine Learning Projects With the Largest Open AI Community.”*

**Supporting subheadlines:**  
- *“Join over 1 million ML practitioners sharing models, datasets, and apps across all AI modalities.”*  
- *“Host unlimited public projects for free. Scale with paid compute and enterprise solutions.”*  
- *“From research to deployment - collaborate, build, and showcase your work in one place.”*

**Clear CTA options:**  
- *Get Started - Browse AI Models & Datasets*  
- *Create Your First ML Space*  
- *Explore Enterprise Solutions*  

**Revised enterprise copy snippet:**  
*“Need scalable compute and enterprise-grade support? Hugging Face provides secure, reliable hosting and tailored solutions to accelerate your team’s AI projects.”*

**Social proof block example:**  
*Trusted by leading AI innovators worldwide: Microsoft, NVIDIA, OpenAI, Tencent*  
*(Logos displayed)*

---

## 7. Quick Wins List (Implement within 24 hours)

- Replace the headline on the landing page with a clearer, benefit-driven statement.  
- Add a brief 1-2 line subheadline clarifying who the platform serves and the main value it delivers.  
- Prioritize two main CTAs placed prominently (e.g., "Get Started" and "Explore Models"). Remove clutter around CTAs.  
- Introduce a small “Trusted by” section with partner or client logos to build immediate credibility.  
- Add tooltip or microcopy guiding new users on next steps (e.g., “Not sure where to start? Try browsing top models”).  
- Organize the “Trending” section to show fewer items initially with an option to expand - reducing cognitive load.  
- Insert a short “About Enterprise” link or highlight near pricing to address business user needs quickly.

---

# Summary

Hugging Face’s website has a strong community and open-source focus but suffers from vague messaging and UX clutter that dilute its core value proposition. By sharpening the headline, prioritizing CTAs, highlighting trust signals, and guiding diverse user types with clearer pathways, the site can significantly improve clarity, engagement, and conversion for both individual and enterprise customers. Quick wins center on simplifying messaging, boosting credibility, and enhancing navigation without major redesign.