We are building a simple yet powerful brochure generator that creates professional brochures for companies to share with clients, investors, and potential recruits. Given a company name and its primary website, the tool scrapes the landing page, asks the model which of the linked pages are relevant (About, Careers, and so on), and scrapes those pages as well. It then asks the model to organize this material into a clean, well-structured Markdown brochure that highlights key offerings and achievements.

In [35]:
import os
import json 
from dotenv import load_dotenv
import requests 
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from typing import List
from bs4 import BeautifulSoup

In [36]:
load_dotenv(override=True)
api_key=os.getenv("OPENAI_API_KEY")

if api_key and api_key.startswith("sk-proj-") and len(api_key)>10:
    print("API key is found ")
else:
    print("there is a problem with your API key")

MODEL="gpt-4o-mini"
openai=OpenAI()

API key is found 


In [37]:
# Some sites reject requests that lack a browser-like User-Agent
headers={
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url):
        self.url=url
        response=requests.get(url, headers=headers)
        self.body=response.content
        soup=BeautifulSoup(self.body, 'html.parser')
        self.title= soup.title.string if soup.title else "No title Found"

        if soup.body:
            # Remove tags that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text= soup.body.get_text(separator="\n", strip=True)
        else:
            self.text=""
        links=[link.get('href') for link in soup.find_all('a')]
        self.links=[link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
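
The `links` attribute collected above holds raw `href` values, so it can contain relative paths, `mailto:` links, and duplicates. A minimal sketch of one way to normalize them before further use, assuming a hypothetical helper named `normalize_links` (not part of the original notebook) built on the standard library's `urllib.parse`:

```python
from urllib.parse import urljoin, urlparse

def normalize_links(base_url, links):
    """Resolve relative hrefs against base_url and keep unique http(s) URLs."""
    seen = set()
    result = []
    for href in links:
        absolute = urljoin(base_url, href.strip())
        # Skip non-web schemes such as mailto:, tel: or javascript:
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

example = normalize_links(
    "https://example.com/home",
    ["/about", "careers", "mailto:hi@example.com", "https://example.com/about"],
)
print(example)
```

Resolving links locally like this would reduce the reliance on the model to turn relative paths into full URLs, though the notebook as written leaves that step to the model.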
            

In [38]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [39]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [40]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [41]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
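
`get_links` trusts the model to return exactly the shape shown in the system prompt. A hedged sketch of a defensive parser, assuming a hypothetical helper named `parse_link_response` (not in the original notebook) that drops entries without a `type` or without a full `http(s)` URL:

```python
import json

def parse_link_response(raw):
    """Parse the model's JSON reply, keeping only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        links = []
    valid = [
        {"type": str(item["type"]), "url": item["url"]}
        for item in links
        if isinstance(item, dict)
        and "type" in item
        and isinstance(item.get("url"), str)
        and item["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}

raw_reply = (
    '{"links": ['
    '{"type": "about page", "url": "https://example.com/about"},'
    '{"type": "relative", "url": "/careers"},'
    '{"url": "https://example.com/no-type"}'
    ']}'
)
print(parse_link_response(raw_reply))
```

Under this assumption, the last line of `get_links` could return `parse_link_response(result)` instead of `json.loads(result)`, so that `get_all_details` never tries to fetch a malformed URL.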

In [42]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [43]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [44]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
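
The slice `user_prompt[:5_000]` can cut the prompt in the middle of a word or line. A small sketch of an alternative that backs up to the last complete line within the limit, assuming a hypothetical helper named `truncate_at_line` (not part of the original notebook):

```python
def truncate_at_line(text, limit=5_000):
    """Truncate text to at most `limit` characters, ending on a full line if possible."""
    if len(text) <= limit:
        return text
    # Find the last newline inside the allowed window
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut when the window contains no newline
    return text[:cut] if cut > 0 else text[:limit]

print(repr(truncate_at_line("abc\ndef\nghi", limit=9)))
```

Whether this matters depends on the pages being scraped; the hard slice in the notebook is simpler and is usually good enough for a short brochure.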

In [45]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
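
Stripping fences with chained `replace` calls also deletes the word "markdown" wherever it appears in the brochure text. A minimal sketch of a narrower cleanup, assuming a hypothetical helper named `strip_markdown_fence` (not in the original notebook) that removes only a wrapping code fence:

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def strip_markdown_fence(text):
    """Remove a wrapping markdown code fence, leaving the body untouched."""
    stripped = text.strip()
    # Try the longer opener first so the language tag is removed with the fence
    for opener in (FENCE + "markdown", FENCE):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):]
            break
    if stripped.endswith(FENCE):
        stripped = stripped[:-len(FENCE)]
    return stripped.strip()

wrapped = FENCE + "markdown\n# Title\nWe render markdown.\n" + FENCE
print(strip_markdown_fence(wrapped))
```

In the streaming loop, one option would be to keep `response` as the raw accumulated text and pass `strip_markdown_fence(response)` to `Markdown(...)` on each update. A partially received opener may then appear for a moment until the next chunk arrives.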

In [47]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/about'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}


# Hugging Face Brochure

### The AI Community Building the Future

Welcome to **Hugging Face**, a vibrant platform where the machine learning community collaborates on cutting-edge models, datasets, and applications. Our mission is to accelerate the development and deployment of AI technologies through community-driven projects and open-source solutions.

---

## Key Offerings

### Models
Explore **1M+ models** shared by the community. Whether it's text, image, video, audio, or even 3D, Hugging Face offers a wide array of state-of-the-art machine learning models catering to every need.

### Datasets
With over **250k datasets** available, Hugging Face is an invaluable resource for those looking to access and share datasets for various ML tasks.

### Spaces
Our platform hosts **400k+ applications**, allowing users to run and share their creations with a global audience. Build applications that utilize the latest AI technology with ease.

### Enterprise Solutions
We specialize in providing robust solutions for organizations of all sizes. Our **Enterprise Hub** offers:
- Advanced security and access controls.
- Dedicated support for a seamless AI integration experience.
- Starting at just **$20/user/month**.

---

## Our Community

Hugging Face is home to **more than 50,000 organizations**, including industry giants like:
- **Meta**
- **Amazon**
- **Google**
- **Microsoft**

Engaging with our community means collaborating with the best minds in AI, sharing insights, and accelerating your journey in machine learning.

---

## Company Culture

At Hugging Face, we value:
- **Collaboration**: Work seamlessly with a diverse team and community.
- **Innovation**: Be part of a platform that sets the standard in AI and ML technologies.
- **Growth**: Build your professional portfolio and expertise in a rapidly evolving field.

Join us in building the future of AI!

---

## Careers at Hugging Face

Hugging Face is constantly on the lookout for talented individuals to join our dynamic team. By becoming a part of our company, you can expect:
- Opportunities to work on meaningful projects within the AI space.
- A supportive environment that encourages creativity and collaboration.
- Competitive compensation and benefits along with a flexible work culture.

### Current Openings
Check our careers page for the latest job opportunities and join us in shaping the future of AI.

---

## Connect With Us

- **Website**: [huggingface.co](https://huggingface.co)
- **Social Media**: Follow us on [Twitter](https://twitter.com/huggingface), [LinkedIn](https://linkedin.com/company/huggingface), and [Discord](https://discord.gg/huggingface).

Join the Hugging Face community today and immerse yourself in the world of AI innovation!