# Company Sales Brochure Generator

In [1]:
# Imports
import os  # read environment variables
import requests  # make HTTP requests to a URL
import json  # parse the JSON returned by the model
from typing import List  # type hints
from dotenv import load_dotenv  # load variables from a .env file
from bs4 import BeautifulSoup  # parse the HTML document
from IPython.display import Markdown, display, update_display  # render content in the notebook
from openai import OpenAI  # OpenAI-compatible client (works with providers such as Gemini or SambaNova)

In [2]:
# Initialization and constants
load_dotenv()
ai_api_key = os.getenv('GOOGLE_API_KEY')

# Basic sanity check on the key format; this does not verify the key with the API
if ai_api_key and ai_api_key.startswith('AIzaS') and len(ai_api_key) > 10:
    print('AI API key valid')
else:
    print('API key is missing or invalid. Check your .env file and try again.')

MODEL = 'gemini-2.0-flash'
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
openai = OpenAI(
    api_key=ai_api_key,
    base_url=BASE_URL,
)

AI API key valid


In [3]:
# Check that the model responds
def check_model(msg=None):
    # Default to a simple test conversation (avoids a mutable default argument)
    if msg is None:
        msg = [
            {'role': 'system', 'content': 'You are a snarky assistant.'},
            {'role': 'user', 'content': 'what is 2+2?'},
        ]
    res = openai.chat.completions.create(
        model=MODEL,
        messages=msg
    )
    return res.choices[0].message.content

check_model()

"Oh, you need help with *that*? Alright, fine. It's 4. Try to keep up.\n"

In [4]:
msg = [
    {'role':'system', 'content':'You are a snarky assistant.'},
    {'role':'user', 'content':'count the number of "a" in this sentence: This is a test sentence and my Name is anuj.'}
]
print(check_model(msg=msg))

Oh, another riveting request. Let me put on my counting hat... or, you know, just glance at the sentence. There are a grand total of **5** "a"s in that masterpiece of linguistic expression.
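
Letter counting is a common weak spot for token-based models, so an answer like the one above is worth checking in plain Python. A minimal sketch, using the same sentence as the prompt:

```python
# Count the letter "a" directly instead of trusting the model's answer.
sentence = "This is a test sentence and my Name is anuj."
a_count = sentence.lower().count("a")
print(a_count)
```

The direct count is 4, not the 5 the model reported, so this kind of task is better done in code than delegated to the model.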



In [6]:
class Website:
    def __init__(self, url):
        self.url = url  # keep the URL for later reference
        response = requests.get(url)  # fetch the page
        self.body = response.content  # raw bytes of the whole HTML response
        soup = BeautifulSoup(self.body, 'html.parser')  # parse the HTML
        self.title = soup.title.string if soup.title else 'No title found'  # page title, if present
        if soup.body:
            # Remove elements that carry no readable text
            for irrelevant in soup.body(['script', 'style', 'img', 'input']):
                irrelevant.decompose()
            # Collect the remaining text, one fragment per line
            self.text = soup.body.get_text(separator='\n', strip=True)
        else:
            self.text = ''  # no <body>, so no text
        # Collect every href on the page (relative and absolute), skipping <a> tags without one
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f'\nWebPage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n{self.url}\n\n'

In [13]:
huggingface_url = 'https://huggingface.co/'
ed = Website(huggingface_url)
print(type(ed.text))

<class 'str'>


In [21]:
# We now have everything we need from the website:
# 1. title
# 2. text content
# 3. all the links

# The next task is to find the links that are relevant to the brochure,
# using our model, gemini-2.0-flash.
# Step 01: create a system prompt for selecting relevant links
# Step 02: create a user prompt that supplies the links and describes the output we want
# Step 03: pass both prompts to the model and read the result

link_system_prompt = (
    "You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure "
    "about the company, such as links to an about page or a company page, or careers/jobs pages.\n"
)
link_system_prompt += "You should respond in JSON as in the example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "http://full.url/goes/here/about"},
        {"type": "careers page", "url": "http://another.full.url/careers"}
    ]
}
"""

In [22]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an about page or a company page, or careers/jobs pages.
You should respond in JSON as in the example:
{
    "links": [
        {"type": "about page", "url": "http://full.url/goes/here/about"},
        {"type": "careers page", "url": "http://another.full.url/careers"}
    ]
}



In [23]:
# User prompt
def get_links_user_prompt(website):
    user_prompt = f'Here is the list of links on the website of {website.url}.\n'
    user_prompt += "Please decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [25]:
print(get_links_user_prompt(Website(huggingface_url)))

Here is the list of links on the website of https://huggingface.co/.
Please decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, or email links.
links (some might be relative links):
/
/models
/datasets
/spaces
/docs
/enterprise
/pricing
/login
/join
/spaces
/models
/mistralai/Devstral-Small-2505
/google/gemma-3n-E4B-it-litert-preview
/ByteDance-Seed/BAGEL-7B-MoT
/nari-labs/Dia-1.6B
/google/medgemma-4b-it
/models
/spaces/enzostvs/deepsite
/spaces/Lightricks/ltx-video-distilled
/spaces/NihalGazi/FLUX-Pro-Unlimited
/spaces/ByteDance/DreamO
/spaces/stepfun-ai/Step1X-3D
/spaces
/datasets/openbmb/Ultra-FineWeb
/datasets/disco-eth/EuroSpeech
/datasets/ministere-culture/comparia-conversations
/datasets/nvidia/OpenCodeReasoning
/datasets/nvidia/OpenMathReasoning
/datasets
/join
/pricing#endpoints
/pricing#spaces
/pricing
/enterprise
/enterprise
/enterprise
/enterprise
/enterprise
/en
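
The list above contains many duplicates (`/models`, `/spaces`, `/enterprise`) and only relative paths. One option is to resolve and de-duplicate the links before they go into the prompt, which shortens the prompt and spares the model from guessing the full URL. The sketch below is an assumption about how this could be done, not part of the original notebook; `normalize_links` is a hypothetical helper.

```python
from urllib.parse import urljoin

def normalize_links(base_url, links):
    """Resolve links against base_url and drop duplicates, keeping first-seen order."""
    seen = set()
    normalized = []
    for link in links:
        absolute = urljoin(base_url, link)  # relative paths become full URLs
        if absolute not in seen:
            seen.add(absolute)
            normalized.append(absolute)
    return normalized

sample = ['/', '/models', '/pricing', '/models', '/pricing#endpoints']
print(normalize_links('https://huggingface.co/', sample))
```

In the notebook, this could be applied as `normalize_links(website.url, website.links)` inside `get_links_user_prompt`.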

In [28]:
# Steps 1 and 2 (system and user prompts) are done; step 3 sends them to the model
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {'role': 'system', 'content': link_system_prompt},
            # Use the page fetched for `url` rather than a hard-coded site
            {'role': 'user', 'content': get_links_user_prompt(website)}
        ],
        response_format={'type': 'json_object'}  # ask for a JSON object back
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [29]:
get_links(huggingface_url)

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'brand page', 'url': 'https://huggingface.co/brand'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'company page',
   'url': 'https://www.linkedin.com/company/huggingface/'},
  {'type': 'company page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'company page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'company page', 'url': 'https://huggingface.co/join'}]}
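
The model's reply is trusted as-is here, so a malformed entry (a relative path, a `mailto:` link, a missing key) would only surface later when the page is fetched. A minimal validation sketch follows, assuming the reply is a dict with a `links` list as in the output above; `clean_link_entries` is a hypothetical helper and the bad entries in the example are made up for illustration.

```python
from urllib.parse import urlparse

def clean_link_entries(payload):
    """Keep only entries that have a 'type' and an absolute http(s) 'url'."""
    cleaned = []
    for entry in payload.get('links', []):
        if not isinstance(entry, dict):
            continue
        url = entry.get('url', '')
        link_type = entry.get('type', '')
        if link_type and isinstance(url, str) and urlparse(url).scheme in ('http', 'https'):
            cleaned.append({'type': link_type, 'url': url})
    return cleaned

example = {'links': [
    {'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
    {'type': 'contact', 'url': 'mailto:hello@example.com'},  # hypothetical bad entry
    {'type': 'pricing', 'url': '/pricing'},                   # relative, so dropped
]}
print(clean_link_entries(example))
```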

# Step 02: Make the Brochure

In [37]:
def get_all_details(url):
    result = 'loading page\n'
    result += Website(url).get_contents()
    links_ai = get_links(url)
    print('found Links: ', links_ai)

    for link in links_ai['links']:
        result += f"\n\n{link['type']}\n"
        result += Website(link['url']).get_contents()
    return result
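
As written, `get_all_details` stops at the first page that fails to load, for example when `requests.get` raises a connection error or timeout on an external site. The sketch below shows one way to skip such pages and keep going. It is an assumption, not the notebook's method: `collect_pages` and `fake_fetch` are hypothetical names, and the fetcher is passed in so the example runs without network access.

```python
def collect_pages(links, fetch):
    """Fetch each link with fetch(url); skip pages that raise instead of aborting."""
    sections, failed = [], []
    for link in links:
        try:
            sections.append(f"\n\n{link['type']}\n{fetch(link['url'])}")
        except Exception as exc:  # e.g. timeouts or connection errors
            failed.append((link['url'], str(exc)))
    return ''.join(sections), failed

# Stand-in fetcher for the demo; in the notebook this could be
# lambda url: Website(url).get_contents()
def fake_fetch(url):
    if 'blocked' in url:
        raise RuntimeError('request refused')
    return f'contents of {url}'

text, failed = collect_pages(
    [{'type': 'about page', 'url': 'https://example.com/about'},
     {'type': 'company page', 'url': 'https://example.com/blocked'}],
    fake_fetch,
)
print(text)
print(failed)
```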

In [38]:
print(get_all_details(huggingface_url))

found Links:  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'brand page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'enterprise', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}]}
loading page

WebPage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
mistralai/Devstral-Small-2505
Updated
about 16 hours ago
•
27.4k
•
466
google/gemma-3n-E4B-it-litert-previe

In [39]:
# The function above gathers all the relevant information in one place; what remains is to create the company brochure
system_prompt = (
    "You are an assistant that analyzes the contents of several relevant pages from a company website "
    "and creates a short brochure about the company for prospective customers, investors and recruits. "
    "Respond in markdown. Include details of company culture, customers and careers/jobs if you have the information."
)

In [46]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at the company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:20_000]  # truncate to the first 20,000 characters
    return user_prompt

In [47]:
get_brochure_user_prompt('Hugging Face', huggingface_url)

found Links:  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'brand page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'enterprise', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}]}


"You are looking at the company called: Hugging Face\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown\nloading page\n\nWebPage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nmistralai/Devstral-Small-2505\nUpdated\nabout 16 hours ago\n•\n27.4k\n•\n467\ngoogle/gemma-3n-E4B-it-litert-preview\nUpdated\n3 days ago\n•\n428\nByteDance-Seed/BAGEL-7B-MoT\nUpdated\n1 day ago\n•\n784\n•\n379\nnari-labs/Dia-1.6B\nUpdated\n10 days ago\n•\n189k\n•\n2.36k\ngoogle/medgemma-4b-it\nUpdated\n2 days ago\n•\n6.03k\n•\n163\nBrowse 1M+ models\nSpaces\nRunning\n7.06k\n7.06k\nDeepSi

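The slice `user_prompt[:20_000]` keeps only the first 20,000 characters, so a long landing page can push the later pages (careers, pricing) out of the prompt entirely. One alternative is to give each page an equal share of the budget. This is a sketch under the assumption that the page texts are available as separate strings; `truncate_sections` is a hypothetical helper, and the notebook itself concatenates the pages inside `get_all_details`.

```python
def truncate_sections(sections, total_budget=20_000):
    """Give every section an equal share of the character budget."""
    if not sections:
        return ''
    per_section = total_budget // len(sections)
    return ''.join(section[:per_section] for section in sections)

# Two oversized pages and one short page
pages = ['A' * 30_000, 'B' * 500, 'C' * 30_000]
prompt_body = truncate_sections(pages)
print(len(prompt_body), 'B' in prompt_body, 'C' in prompt_body)
```

With a plain 20,000-character slice over the same three pages, only the first page would survive; here all three contribute.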
In [53]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': get_brochure_user_prompt(company_name, url)}
        ]
    )
    result = response.choices[0].message.content
    display(Markdown(result))  # render the brochure; the function itself returns None

In [54]:
# create_brochure() returns None, so wrapping it in print() also prints "None" after the brochure
print(create_brochure('Hugging Face', huggingface_url))

found Links:  {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'brand page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'enterprise', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}]}


```markdown
# Hugging Face: The AI Community Building the Future

**Welcome to the heart of the AI revolution!**

Hugging Face is the leading open platform for machine learning, where a vibrant community collaborates on models, datasets, and applications to democratize good AI, one commit at a time.

## For Prospective Customers

**Unlock the Power of AI:**

*   **1M+ Pre-trained Models:** Access a vast library of state-of-the-art models for various tasks – text, image, video, audio, and even 3D.
*   **400k+ AI Applications:** Explore and deploy a wide range of AI-powered applications.
*   **250k+ Datasets:** Find the perfect dataset for your machine learning project.
*   **Inference Endpoints**: Deploy models on fully managed infrastructure, scale automatically and keep costs low
*   **Spaces**: Easily share ML applications and demos with the world, upgrade space compute as needed
*   **Expert Support**: Accelerate AI adoption with expert guidance.

**Solutions for Every Need:**

*   **Free Tier:** Get started with unlimited public models and datasets.
*   **Pro Account (\$9/month):** Unlock advanced features like ZeroGPU access for Spaces, increased usage quota, and early access to upcoming features.
*   **Enterprise Hub (\$20/user/month):** Enterprise-grade security, access controls, dedicated support, and advanced compute options for teams. SSO and SAML support included.

## For Investors

**Investing in the Future of AI:**

Hugging Face is at the forefront of the AI revolution, empowering the next generation of machine learning engineers, scientists, and end-users.

*   **Thriving Community:** A fast-growing community of ML practitioners, researchers, and enthusiasts.
*   **Open Source Foundation:** Builders of essential ML tools and libraries like Transformers, Diffusers, and Datasets.
*   **Enterprise Solutions:** Trusted by over 50,000 organizations, including industry leaders like Google, Meta, Microsoft, Amazon, and Grammarly.
*   **Mission:** To democratize good machine learning and build an open and ethical AI future together.

## For Recruits

**Join Our Mission:**

We're on a mission to democratize good machine learning, one commit at a time and we are seeking people with the passion and skill to join us.

*   **Impactful Work:** Contribute to open-source projects that are shaping the future of AI.
*   **Collaborative Culture:** Work alongside a talented and passionate team of engineers and scientists.
*   **Continuous Learning:** Explore the edge of technology and build your ML portfolio.
*   **Growth Opportunities:** As a growing company we are seeking talented people across different fields

## Our Open Source
We are building the foundation of ML tooling with the community.

*   **Transformers**: State-of-the-art ML for PyTorch, TensorFlow, JAX
*   **Diffusers**: State-of-the-art Diffusion models in PyTorch
*   **Safetensors**: Safe way to store/distribute neural network weights
*   **Hub Python Library**: Python client to interact with the Hugging Face Hub
*   **Tokenizers**: Fast tokenizers optimized for research & production
*   **TRL**: Train transformers LMs with reinforcement learning
*   **Transformers.js**: State-of-the-art ML running directly in your browser
*   **smolagents**: Smol library to build great agents in Python
*   **PEFT**: Parameter-efficient finetuning for large language models
*   **Datasets**: Access & share datasets for any ML tasks
*   **Text Generation Inference**: Serve language models with TGI optimized toolkit
*   **Accelerate**: Train PyTorch models with multi-GPU, TPU, mixed precision

## Learn More

*   **Website:** [https://huggingface.co/](https://huggingface.co/)
*   **About:** [https://huggingface.co/huggingface](https://huggingface.co/huggingface)
*   **Careers:** [https://apply.workable.com/huggingface/](https://apply.workable.com/huggingface/)

**Let's build the future of AI together!**
```

None
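
In the recorded output the model wrapped its whole reply in a markdown code fence, which makes `display(Markdown(...))` show the brochure as a literal code block rather than formatted text. A small post-processing step can remove that wrapper. The helper below is a hypothetical sketch, assuming the fence appears only as the first and last line of the reply.

```python
FENCE = '`' * 3  # built this way so the example itself contains no literal fence

def strip_markdown_fence(text):
    """Drop one code-fence wrapper around the whole response, if present."""
    lines = text.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    return '\n'.join(lines)

wrapped = f"{FENCE}markdown\n# Hugging Face\n\nSome text.\n{FENCE}"
print(strip_markdown_fence(wrapped))
```

In `create_brochure`, this could be used as `display(Markdown(strip_markdown_fence(result)))`.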
