# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, to be used for prospective clients, investors and potential recruits.

We will be given a company name and its primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [12]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI
#import ollama

In [2]:
OLLAMA_API = "http://localhost:11434/api/chat"
MODEL = "llama3.2"

# Let's just make sure the model is loaded

!ollama pull llama3.2

[?2026h[?25l[1Gpulling manifest â ‹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ™ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ¹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ¸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ¼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ´ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ¦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â § [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ‡ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â � [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ‹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ™ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ¹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ¸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ¼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ´ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â ¦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest â § [K[?25h

In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [None]:
ed = Website("https://edwarddonner.com")
ed.links
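
The list returned by `ed.links` can mix absolute URLs, relative paths, page anchors and `mailto:` links. The notebook leaves the job of producing full URLs to the model; as a hedged alternative, here is a minimal sketch that resolves them in plain Python first. The helper name `absolutize_links` and the `example.com` URLs are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urldefrag, urljoin, urlparse

def absolutize_links(base_url, links):
    """Resolve hrefs against base_url, drop non-web schemes, and de-duplicate in order."""
    seen = set()
    resolved = []
    for href in links:
        # urljoin leaves absolute URLs alone and resolves relative ones
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved

example = absolutize_links(
    "https://example.com/company/",
    ["/about", "careers", "mailto:hi@example.com", "https://example.com/about", "#top"],
)
print(example)
```

With the class above, a call might look like `absolutize_links(ed.url, ed.links)`.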

## First step: Retrieve relevant links from website


In [24]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages. Remove any blog pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [None]:
print(link_system_prompt)

In [33]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, press contact email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [None]:
print(get_links_user_prompt(ed))

In [None]:
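# Point the OpenAI client at Ollama's OpenAI-compatible endpoint.
# The client requires an api_key, but Ollama ignores its value.
openai = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def get_links(url):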
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
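
`get_links` passes the model's reply straight to `json.loads`, which raises an error if a small local model returns text that is not valid JSON, or returns JSON without the expected `links` list. The following is a minimal sketch of a more defensive parse, assuming we would rather continue with an empty list than fail. The helper name `parse_links_reply` is hypothetical.

```python
import json

def parse_links_reply(reply_text):
    """Return {"links": [...]} containing only entries with string 'type' and 'url' fields."""
    try:
        data = json.loads(reply_text)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all, or the reply was None
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}
    clean = [
        item for item in links
        if isinstance(item, dict)
        and isinstance(item.get("type"), str)
        and isinstance(item.get("url"), str)
    ]
    return {"links": clean}

good = parse_links_reply(
    '{"links": [{"type": "about page", "url": "https://example.com/about"}, {"type": "broken"}]}'
)
bad = parse_links_reply("Sure! Here are the links you asked for.")
print(good, bad)
```

In `get_links`, this could stand in for the final `json.loads(result)` call.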

In [36]:
get_links("https://huggingface.co")

{'links': [{'type': 'About page', 'url': 'https://huggingface.co/'},
  {'type': 'Company page', 'url': 'https://huggingface.co brand'},
  {'type': 'Careers/Jobs page',
   'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'Pricing page',
   'url': 'https://ui.endpoints.huggingface.co/chat pricing#endpoints'}]}
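
Note that two of the URLs in this output contain a space (`https://huggingface.co brand` and the pricing link), so they are not usable as written. A minimal sketch of a filter that drops such entries before any page is fetched is shown below. The function names `is_fetchable_url` and `keep_fetchable` are assumptions for illustration, and the check only covers URL shape, not whether the page exists.

```python
from urllib.parse import urlparse

def is_fetchable_url(url):
    """True for http(s) URLs that have a host and contain no whitespace."""
    if not isinstance(url, str) or any(ch.isspace() for ch in url):
        return False
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def keep_fetchable(links):
    """Keep only the entries of a {"links": [...]} dict whose 'url' passes is_fetchable_url."""
    return {
        "links": [
            link for link in links.get("links", [])
            if is_fetchable_url(link.get("url"))
        ]
    }

# The reply shown above, copied verbatim
reply = {'links': [
    {'type': 'About page', 'url': 'https://huggingface.co/'},
    {'type': 'Company page', 'url': 'https://huggingface.co brand'},
    {'type': 'Careers/Jobs page', 'url': 'https://apply.workable.com/huggingface/'},
    {'type': 'Pricing page', 'url': 'https://ui.endpoints.huggingface.co/chat pricing#endpoints'},
]}
filtered = keep_fetchable(reply)
print(filtered)
```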

## Second step: generate brochure



In [27]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
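
As written, `get_all_details` stops with an exception if any single link cannot be fetched, for example when the model returns a malformed URL. Here is a minimal sketch of a loop that skips failing links and reports them instead. The helper `collect_pages` and its injected `fetch` argument are hypothetical, and `fake_fetch` exists only so the example runs without network access.

```python
def collect_pages(links, fetch):
    """Fetch each link with fetch(url); return (combined_text, skipped) where skipped lists failures."""
    sections = []
    skipped = []
    for link in links.get("links", []):
        try:
            sections.append(f"\n\n{link['type']}\n{fetch(link['url'])}")
        except Exception as error:  # broad on purpose: any failure skips just this link
            skipped.append((link.get("url"), str(error)))
    return "".join(sections), skipped

def fake_fetch(url):
    """Stand-in fetcher for the demo; rejects URLs that contain a space."""
    if " " in url:
        raise ValueError("bad url")
    return f"contents of {url}"

text, skipped = collect_pages(
    {"links": [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "company page", "url": "https://example.com brand"},
    ]},
    fake_fetch,
)
print(text)
print("Skipped:", skipped)
```

In the notebook, `fetch` could be `lambda url: Website(url).get_contents()`.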

In [None]:
print(get_all_details("https://huggingface.co"))

In [None]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."


In [38]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
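
Truncating the whole prompt at 5,000 characters means a long landing page can use up the entire budget, so the About and Careers pages may never reach the model. One possible alternative, sketched here under the assumption that each page should get its own cap, is to truncate page by page before joining. The helper `build_capped_details` and the default limit of 1,500 characters are illustrative choices, not part of the original notebook.

```python
def build_capped_details(pages, per_page_limit=1_500):
    """pages is a list of (label, text) pairs; each text is truncated separately, then joined."""
    parts = []
    for label, text in pages:
        parts.append(f"{label}\n{text[:per_page_limit]}")
    return "\n\n".join(parts)

demo = build_capped_details(
    [("Landing page:", "a" * 10), ("about page", "b" * 10)],
    per_page_limit=4,
)
print(demo)
```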

In [39]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'About page', 'url': 'https://huggingface.co'}, {'type': 'Company website', 'url': 'https://huggingface.co'}, {'type': 'Careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter profile', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'Discord server (join link)', 'url': 'https://huggingface.co/join/discord'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nQwen/Qwen2.5-Omni-7B\nUpdated\n6 days ago\n•\n90k\n•\n1.21k\ndeepseek-ai/DeepSeek-V3-0324\nUpdated\n10 days ago\n•\n158k\n•\n2.36k\nmeta-llama/Llama-4-Scout-17B-16E-Instruct\nUpdated\nabout 12 hours ago\n•\n16k\n•\n326\nall-hands/openhands-lm-32b-v0.1\nUpdated\n3 days ago\n•\n3.62k\n•\n271\nopenfree/flux-chatgpt-ghibli-lora\nUpdated\n1 day ago\n•\n6.7k\n•\n275\nBrowse 1M+ models\nSpaces\nRunni

In [41]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ]
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [44]:
create_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'company page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers/job page', 'url': 'https://apply.workable.com/huggingface/'}]}


**Hugging Face: The AI Community Building the Future**
======================================================

**Welcome to Hugging Face, where machine learning and community collaboration come together.**

### Our Mission

We harness the power of open-source AI tools to empower researchers, developers, and practitioners globally. By unleashing the full potential of machine learning, we're shaping a more intelligent future for all.

### Products and Services

*   **Hugging Face Hub**: A platform offering 1M+ pre-trained models, datasets, and applications that facilitate easy discovery, integration, and collaboration.
*   **Spaces**: An open-source framework allowing users to host, deploy, and manage AI workloads anywhere.
*   **Compute**: Efficient inference endpoints for accelerating model training with optimized computing resources.

### Our Story

Since our inception in 2016, Hugging Face has grown to become the largest open-source AI platform. With over 50,000 organizations using our services, we're redefining how machine learning is explored and utilized across industries.

We take pride in providing:

*   **Access**: Free access to a vast library of pre-trained models and datasets
*   **Accelerate**: Computational acceleration via optimized inference endpoints and compute resources

### Community Support

Engage with our vibrant community at [GitHub](https://github.com), where you'll find an extensive collection of open-source projects, tutorials, and collaboration opportunities.

Develop your expertise:

*   **Transformers**: Learn state-of-the-art transformers for text classification, language modeling, image captioning, and more.
*   **Diffusers**: Discover cutting-edge diffusion models and advanced training techniques for generating realistic images and videos.

Our API-focused community enables seamless integration of AI applications into various applications with ease.

### Join Our Journey

At Hugging Face, we believe that collaboration and innovation are key to shaping a better future for all. If you share our vision and value open-source contributions, we invite you to join us on this exciting journey!

[**Career Opportunities**](#)

[Internships, Jobs]( Careers.md )

[GitHub Repository (link)](https://github.com/huggingface/)

[Discord Server Join](https://discord.huggingface.co/)