# A full business solution

### BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.

We will be given a company name and its primary website.

See the end of this notebook for examples of real-world business applications.

In [1]:
# Importing the Libraries

import requests
import json
import time
import re
from typing import List
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display

In [2]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """
    def __init__(self, url):
        self.url = url
        response = requests.get(headers=headers, url=self.url)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No Title Found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator='\n', strip=True)
        else:
            self.text = ""

        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"


In [3]:
test_url = Website("https://edwarddonner.com")
print(test_url.get_contents()[:100])
print(test_url.links)

Webpage Title:
Home - Edward Donner
Webpage Contents:
Home
Connect Four
Outsmart
An arena that pits 
['https://edwarddonner.com/', 'https://edwarddonner.com/connect-four/', 'https://edwarddonner.com/outsmart/', 'https://edwarddonner.com/about-me-and-about-nebula/', 'https://edwarddonner.com/posts/', 'https://edwarddonner.com/', 'https://news.ycombinator.com', 'https://nebula.io/?utm_source=ed&utm_medium=referral', 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html', 'https://patents.google.com/patent/US20210049536A1/', 'https://www.linkedin.com/in/eddonner/', 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/', 'https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/', 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/', 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expe

## First step: have Mistral/Qwen figure out which links are relevant
Use a call to Mistral/Qwen to read the links on a webpage, and respond in structured JSON.

It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".
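
The relative-link part of this task can also be handled deterministically, which gives a useful cross-check on whatever the model returns. The sketch below is not part of the original notebook; `absolutize` is a hypothetical helper name, and it relies only on `urllib.parse.urljoin` from the standard library.

```python
from urllib.parse import urljoin

def absolutize(base_url, links):
    """Resolve relative links against base_url and keep only https URLs.

    Hypothetical helper for illustration; not part of the notebook's pipeline.
    """
    resolved = [urljoin(base_url, link) for link in links]
    # Non-web schemes such as mailto: pass through urljoin unchanged,
    # so the https filter drops them here.
    return [url for url in resolved if url.startswith("https://")]

examples = ["/about", "careers", "https://company.com/blog", "mailto:hi@company.com"]
print(absolutize("https://company.com/", examples))
```

Deciding which of the resolved links are relevant is still the part that benefits from an LLM.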

We will use "one-shot prompting", in which the prompt includes one example of how the model should respond.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
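
As a rough illustration of what a spec-constrained request could look like with a local Ollama server, the sketch below builds a chat payload whose `format` field carries a JSON schema. This is an assumption about the installed Ollama version: recent releases accept a schema in `format`, while older ones may only accept the string `"json"`. The names `links_schema` and `build_structured_payload` are hypothetical and do not appear in the notebook.

```python
import json

# JSON schema describing the reply shape used in the one-shot example.
links_schema = {
    "type": "object",
    "properties": {
        "links": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string"},
                    "url": {"type": "string"},
                },
                "required": ["type", "url"],
            },
        }
    },
    "required": ["links"],
}

def build_structured_payload(model, system_prompt, user_prompt):
    """Build a chat payload that asks the server to constrain the reply to links_schema."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,
        # Assumption: the server supports a schema here; otherwise try "json".
        "format": links_schema,
    }

payload = build_structured_payload("qwen3", "system prompt here", "user prompt here")
print(json.dumps(payload["format"], indent=2))
```

The payload would be posted to the same chat endpoint used in the rest of this notebook; nothing here has been run against a live server.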

In [4]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages and don't give urls which is not starting with https.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [5]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages and don't give urls which is not starting with https.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [6]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [7]:
print(get_links_user_prompt(test_url))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
https://edwarddonner.com/2025/09/15/ai-in-production-gen-ai-and-agentic-ai-on-aws-at-scale/
https://edward

In [8]:
MODEL_NAME = "qwen3"  # Pick a model from your local Ollama install; run "ollama list" to see what is available
OLLAMA_URL = "http://localhost:11434/api/chat"

In [9]:
def get_links(url):
    website = Website(url)
    payload = {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        "stream": False  # Important: disable streaming so the whole reply arrives as one JSON object
    }
    start_time = time.time()
    response = requests.post(OLLAMA_URL, json=payload, stream=False)
    print("Total Time Taken: ", time.time()-start_time)
    reply = response.json()["message"]["content"]
    reply = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()
    return reply
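
The string returned by `get_links` is later passed straight to `json.loads`, which raises if the model wraps the JSON in a code fence, adds commentary, or gets cut off. A more defensive parser might look like the sketch below; `parse_links_reply` is a hypothetical helper, not something the notebook defines, and returning an empty list on failure is just one possible policy.

```python
import json
import re

def parse_links_reply(reply):
    """Extract {"links": [...]} from a model reply, tolerating extra text around it."""
    # Remove any reasoning block emitted by "thinking" models.
    cleaned = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL)
    # Keep only the span from the first "{" to the last "}".
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end <= start:
        return {"links": []}
    try:
        data = json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError:
        return {"links": []}
    raw = data.get("links")
    if not isinstance(raw, list):
        raw = []
    # Keep only well-formed entries whose URL starts with https://.
    links = [
        item for item in raw
        if isinstance(item, dict) and str(item.get("url", "")).startswith("https://")
    ]
    return {"links": links}

sample = '<think>hmm</think>```json\n{"links": [{"type": "about page", "url": "https://a.com/about"}, {"type": "x", "url": "/rel"}]}\n```'
print(parse_links_reply(sample))
```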

In [10]:
print(get_links("https://huggingface.co"))

Total Time Taken:  317.3220841884613
{
    "links": [
        {"type": "homepage", "url": "https://huggingface.co/"},
        {"type": "models page", "url": "https://huggingface.co/models"},
        {"type": "datasets page", "url": "https://huggingface.co/datasets"},
        {"type": "spaces page", "url": "https://huggingface.co/spaces"},
        {"type": "documentation", "url": "https://huggingface.co/docs"},
        {"type": "enterprise page", "url": "https://huggingface.co/enterprise"},
        {"type": "pricing page", "url": "https://hugging.face.co/pricing"},
        {"type": "community", "url": "https://huggingface.co/chat"},
        {"type": "blog", "url": "https://huggingface.co/blog"},
        {"type": "social media", "url": "https://twitter.com/huggingface"},
        {"type": "social media", "url": "https://www.linkedin.com/company/huggingface/"},
        {"type": "github", "url": "https://github.com/huggingface"},
        {"type": "careers page", "url": "https://apply.workab

## Second step: make the brochure!

In [12]:
def get_all_details(url):
    result = "Landing Page:\n"
    result += Website(url).get_contents()
    links = json.loads(get_links(url))  # parse the model's JSON string into a Python dict
    print(f"Found Links: {links}")
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"  # single quotes inside the f-string keep this valid before Python 3.12
        result += Website(link["url"]).get_contents()
    return result
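
The model often lists the homepage among the relevant links, which means the landing page is fetched and appended twice. One way to avoid that is to drop duplicates before the loop; the sketch below is an optional addition with a hypothetical helper name, `dedupe_links`, and it treats URLs that differ only by a trailing slash as the same page.

```python
def dedupe_links(links, landing_url):
    """Drop entries that repeat the landing page or an earlier link.

    links is a list of {"type": ..., "url": ...} dicts as returned by the model.
    """
    seen = {landing_url.rstrip("/")}
    unique = []
    for link in links:
        key = link["url"].rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(link)
    return unique

found = [
    {"type": "homepage", "url": "https://huggingface.co/"},
    {"type": "blog", "url": "https://huggingface.co/blog"},
    {"type": "blog page", "url": "https://huggingface.co/blog/"},
]
print(dedupe_links(found, "https://huggingface.co"))
```

Inside `get_all_details`, this would be applied to `links["links"]` before fetching each page.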

In [15]:
print(get_all_details("https://huggingface.co"))

Total Time Taken:  111.2578513622284
Found Links: {'links': [{'type': 'homepage', 'url': 'https://huggingface.co/'}, {'type': 'company page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise services', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing', 'url': 'https://huggingface.co/pricing'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'social media', 'url': 'https://twitter.com/huggingface'}, {'type': 'social media', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'code repository', 'url': 'https://github.com/huggingface'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}]}
Landing Page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine 

In [16]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

In [17]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:20000]
    return user_prompt

In [18]:
print(get_brochure_user_prompt(company_name="Hugging Face", url="https://huggingface.co"))

Total Time Taken:  185.34936714172363
Found Links: {'links': [{'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'about page', 'url': 'https://huggingface.co/brand'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}]}
You are looking at a company called: Hugging Face
Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.
Landing Page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
deepseek-ai/DeepSeek-OCR
Updated
4 days ago
•
1.18M
•
2.18k
MiniMaxAI/MiniMax-M

In [19]:
def create_brochure(company_name, url):
    payload = {
        "model": "mistral",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        "stream": False  # Important: disable streaming so the whole reply arrives as one JSON object
    }
    start_time = time.time()
    response = requests.post(OLLAMA_URL, json=payload, stream=False)
    print("Total Time Taken: ", time.time()-start_time)
    reply = response.json()["message"]["content"]
    reply = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()
    display(Markdown(reply))
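
With `"stream": False` the notebook waits several minutes before showing anything. If streaming is switched on instead, Ollama's chat endpoint sends newline-delimited JSON chunks, each carrying a fragment of the message and a `done` flag. The sketch below shows one way to accumulate those chunks; `accumulate_stream` is a hypothetical helper, and the sample lines are hand-written stand-ins for what `response.iter_lines()` would yield, not captured server output.

```python
import json

def accumulate_stream(lines):
    """Yield the reply text so far after each streamed chunk.

    lines is any iterable of JSON-encoded chunks (str or bytes), one per line.
    """
    text = ""
    for line in lines:
        if not line:
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)
        text += chunk.get("message", {}).get("content", "")
        yield text
        if chunk.get("done"):
            break

# Hand-written stand-in for a streamed response.
fake_lines = [
    b'{"message": {"content": "Hug"}, "done": false}',
    b'{"message": {"content": "ging"}, "done": false}',
    b'',
    b'{"message": {"content": " Face"}, "done": true}',
]
for partial in accumulate_stream(fake_lines):
    print(partial)
```

In the notebook, the iterable would come from `requests.post(OLLAMA_URL, json=payload, stream=True).iter_lines()` with `"stream": True` in the payload, and each partial string could be rendered with `update_display(Markdown(partial), display_id=...)` using a handle obtained from `display(Markdown(""), display_id=True)`.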

In [20]:
create_brochure(company_name="Hugging Face", url="https://huggingface.co")

Total Time Taken:  322.74373054504395
Found Links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'models page', 'url': 'https://huggingface.co/models'}, {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'}, {'type': 'spaces page', 'url': 'https://huggingface.co/spaces'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'learn page', 'url': 'https://huggingface.co/learn'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://huggingface.co/discuss'}, {'type': 'github page', 'url': 'https://huggingface.co/github'}, {'type': 'social media (twitter)', 'url': 'https://twitter.com/huggingface'}, {'type': 'social media (linkedin)', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'status page', 'url': 'https://hu

The **"Krea Realtime"** Space on Hugging Face is likely a specific AI application or tool, though its exact details aren't fully described in the provided text. Here's how to interpret and address the query:

---

### **Key Observations from the Context**
1. **"Running on CPU Upgrade"**  
   This suggests the Space might be optimized for **CPU execution** (e.g., for users without GPU access) or uses a **CPU-specific configuration**. Some Hugging Face Spaces are designed to run efficiently on CPUs, though performance may vary compared to GPU-accelerated versions.

2. **Possible Purpose of "Krea Realtime"**  
   While not explicitly stated, the name implies it could be related to **real-time AI processing**, such as:
   - Real-time video/image generation.
   - Live data analysis or inference.
   - Interactive tools for creative tasks (e.g., editing, animation, or simulation).

3. **Location in the Spaces Directory**  
   The Space is listed under the **"All running apps"** section, indicating it is currently active and accessible via Hugging Face's Spaces platform.

---

### **How to Explore or Use "Krea Realtime"**
1. **Access the Space**  
   - Navigate to [Hugging Face Spaces](https://huggingface.co/spaces).
   - Search for **"Krea Realtime"** in the search bar.
   - Click the Space to view its description, documentation, and interface.

2. **Check for Documentation**  
   - Look for a **README.md** or **usage instructions** in the Space's repository.
   - If unavailable, check the **"About"** section or community discussions for details.

3. **CPU vs. GPU Performance**  
   - If the Space is optimized for CPU, it may have lower latency or require less hardware resources.
   - For GPU acceleration, ensure your environment supports it (e.g., using a GPU instance on Hugging Face).

4. **Community and Support**  
   - Engage with the Space's creators via **Discussions** or **Issues** on the repository.
   - Check for tutorials or examples in the **"Files"** section of the Space.

---

### **If You're Facing Issues**
- **Performance Concerns**: If the Space is slow on CPU, consider using a GPU-powered instance (e.g., via Hugging Face's **Inference API** or **Spaces with GPU support**).
- **Missing Features**: If the Space lacks documentation, request clarification from the maintainers or check for related Spaces (e.g., "Krea Realtime" might be part of a series).

---

### **Example Use Cases**
- **Real-Time Video Generation**: If the Space uses a model like **Wan2.2** or **Qwen Image Edit**, it might generate videos from text/images in real time.
- **Creative Tools**: It could be an interactive app for editing images, generating art, or simulating environments.
- **Data Analysis**: A tool for real-time data visualization or anomaly detection.

---

### **Next Steps**
1. **Visit the Space Directly**: Use the Hugging Face Spaces search to locate "Krea Realtime."
2. **Check for Updates**: The Space might have been updated recently (e.g., "about 17 hours ago" in the example).
3. **Engage with the Community**: If the Space is under active development, reach out to the creators for insights.

If you can provide more details about the Space (e.g., its description or repository link), I can offer more specific guidance!