This notebook builds a brochure for a company from the contents of its website.
We use GPT-4o-mini twice: first to pick out the pages worth including, then to write the brochure from their contents.

In [1]:
# importing tools

import os  # Helps us work with files and environment variables
import requests  # Lets us fetch data from websites
import json  # Helps us work with JSON (a way to store data)
from typing import List  # Helps define lists in code
from dotenv import load_dotenv  # Loads secret keys from a `.env` file
from bs4 import BeautifulSoup  # Helps extract information from websites
from IPython.display import Markdown, display, update_display  # Displays results nicely in Jupyter
from openai import OpenAI  # Lets us use GPT-4o-mini

In [2]:
# setting up API key - a secret password to use GPT-4o-mini

load_dotenv(override=True)  # Load the `.env` file
api_key = os.getenv('OPENAI_API_KEY')  # Get the API key from the file

if api_key and api_key.startswith('sk-') and len(api_key) > 10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key!")

API key looks good so far


In [3]:
# setting up our AI model

MODEL = 'gpt-4o-mini'  # The AI model we want to use
openai = OpenAI(api_key=api_key)  # Connect to GPT-4o-mini using the API key

In [4]:
# fetching and cleaning a website
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)  # Fetch the webpage
        self.body = response.content  # Get the raw content
        soup = BeautifulSoup(self.body, 'html.parser')  # Parse the webpage
        self.title = soup.title.string if soup.title else "No title found"  # Get the title
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()  # Remove unnecessary parts
            self.text = soup.body.get_text(separator="\n", strip=True)  # Get the main text
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]  # Collect the href of every <a> tag
        self.links = [link for link in links if link]  # Drop missing or empty hrefs
   
    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpageContents:\n{self.text}\n\n"

# we get the title, main text and all the links on the page

In [5]:
devxpress = Website("https://devxpress.ca/") # fetch the website
devxpress.links # show all the links found on the page

['index.php?lang=',
 'index.php?lang=',
 'a-propos.php?lang=',
 '#.',
 'conception-web.php?lang=',
 'optimisation-SEO.php?lang=',
 'reseaux-sociaux.php?lang=',
 'campagne-publicitaire.php?lang=',
 'application-web.php?lang=',
 'application-mobile.php?lang=',
 'portfolio.php?lang=',
 'blogs.php?lang=',
 'contact.php?lang=',
 'https://www.facebook.com/DevXpressInc/',
 'https://www.instagram.com/devxpress.ca/',
 'https://www.linkedin.com/company/devxpress/',
 'contact.php?lang=',
 'blog.php?lang=en&blogId=yro9jwUOHATDJhIsYV5f',
 'blog.php?lang=en&blogId=S1qFrJWqbLA3EcP3rdAD',
 'portfolio-project.php?lang=en&projectId=1rEVGsCUkBgCTDi6jKq1',
 'portfolio-project.php?lang=en&projectId=eLSIqT8ZLDnkF0ieh4uQ',
 'portfolio-project.php?lang=en&projectId=ysmW16LKXcBRUrx1pNNX',
 'blog.php?lang=en&blogId=Y1hPMENPmBG1CSHXsqTq',
 'blogs.php?lang=',
 'portfolio.php?lang=',
 'https://www.facebook.com/DevXpressInc/',
 'https://www.instagram.com/devxpress.ca/',
 'https://www.linkedin.com/company/devxpress/
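
The list above contains duplicates and a fragment-only link (`#.`). A minimal sketch of one way to tidy it before handing it to the model, assuming such entries are safe to drop. `clean_links` is a hypothetical helper, not part of the notebook:

```python
def clean_links(links):
    """Drop duplicates, fragment-only links and mailto/tel/javascript links, keeping order."""
    skipped_prefixes = ("#", "mailto:", "tel:", "javascript:")
    seen = set()
    cleaned = []
    for link in links:
        link = link.strip()
        # Skip empty strings and link types that never lead to a brochure page
        if not link or link.lower().startswith(skipped_prefixes):
            continue
        # Skip anything already kept
        if link in seen:
            continue
        seen.add(link)
        cleaned.append(link)
    return cleaned


sample = [
    "index.php?lang=",
    "index.php?lang=",
    "#.",
    "contact.php?lang=",
    "mailto:someone@example.com",
    "contact.php?lang=",
]
print(clean_links(sample))
```

A shorter list also means fewer tokens in the user prompt, although whether it changes the model's choices has not been checked here.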

Step 1: Use GPT-4o-mini to figure out which links are relevant

In [6]:
# deciding which links are important

link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [7]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [8]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links. \n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links) # Add all the links, separated by newlines
    return user_prompt

In [9]:
print(get_links_user_prompt(devxpress))

Here is the list of links on the website of https://devxpress.ca/ - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links. 
Links (some might be relative links):
index.php?lang=
index.php?lang=
a-propos.php?lang=
#.
conception-web.php?lang=
optimisation-SEO.php?lang=
reseaux-sociaux.php?lang=
campagne-publicitaire.php?lang=
application-web.php?lang=
application-mobile.php?lang=
portfolio.php?lang=
blogs.php?lang=
contact.php?lang=
https://www.facebook.com/DevXpressInc/
https://www.instagram.com/devxpress.ca/
https://www.linkedin.com/company/devxpress/
contact.php?lang=
blog.php?lang=en&blogId=yro9jwUOHATDJhIsYV5f
blog.php?lang=en&blogId=S1qFrJWqbLA3EcP3rdAD
portfolio-project.php?lang=en&projectId=1rEVGsCUkBgCTDi6jKq1
portfolio-project.php?lang=en&projectId=eLSIqT8ZLDnkF0ieh4uQ
portfolio-project.php?lang=en&projectId=ysmW16LKXcBRUrx1pNNX
blog.php?lang=en&b

In [10]:
def get_links(url):
    try:
        website = Website(url)
        response = openai.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": link_system_prompt},
                {"role": "user", "content": get_links_user_prompt(website)}
            ],
            response_format={"type": "json_object"}  # Ensure JSON response
        )
        result = response.choices[0].message.content
        parsed_result = json.loads(result)
        if "links" not in parsed_result:
            print("Warning: Response missing 'links' key")
            return {"links": []}
        return parsed_result
    except json.JSONDecodeError as e:
        print(f"JSON parsing error: {e}")
        return {"links": []}
    except Exception as e:
        print(f"Error in get_links: {e}")
        return {"links": []}
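
`get_links` checks that the reply has a `links` key, but not that each entry has the expected shape. A minimal sketch of a stricter check, assuming entries without a string `type` and `url` should simply be dropped. `parse_links_response` is a hypothetical helper, not part of the notebook:

```python
import json


def parse_links_response(raw):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    # The reply must be a JSON object with a list under "links"
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links")
    if not isinstance(links, list):
        return {"links": []}
    # Keep only entries that have both fields as strings
    valid = [
        {"type": item["type"], "url": item["url"]}
        for item in links
        if isinstance(item, dict)
        and isinstance(item.get("type"), str)
        and isinstance(item.get("url"), str)
    ]
    return {"links": valid}


reply = '{"links": [{"type": "about page", "url": "https://example.com/about"}, {"url": 42}]}'
print(parse_links_response(reply))
print(parse_links_response("not json"))
```

Returning the same `{"links": []}` shape on every failure keeps the calling code free of special cases.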

In [11]:
get_links("https://devxpress.ca/")

{'links': [{'type': 'about page',
   'url': 'https://devxpress.ca/a-propos.php?lang='},
  {'type': 'portfolio page',
   'url': 'https://devxpress.ca/portfolio.php?lang='},
  {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='},
  {'type': 'blog page', 'url': 'https://devxpress.ca/blogs.php?lang='},
  {'type': 'portfolio project',
   'url': 'https://devxpress.ca/portfolio-project.php?lang=en&projectId=1rEVGsCUkBgCTDi6jKq1'},
  {'type': 'portfolio project',
   'url': 'https://devxpress.ca/portfolio-project.php?lang=en&projectId=eLSIqT8ZLDnkF0ieh4uQ'},
  {'type': 'portfolio project',
   'url': 'https://devxpress.ca/portfolio-project.php?lang=en&projectId=ysmW16LKXcBRUrx1pNNX'},
  {'type': 'blog post',
   'url': 'https://devxpress.ca/blog.php?lang=en&blogId=yro9jwUOHATDJhIsYV5f'},
  {'type': 'blog post',
   'url': 'https://devxpress.ca/blog.php?lang=en&blogId=S1qFrJWqbLA3EcP3rdAD'},
  {'type': 'blog post',
   'url': 'https://devxpress.ca/blog.php?lang=en&blogId=Y1hPMEN

Step 2: Make a brochure

In [12]:
from urllib.parse import urljoin  # Resolves relative links against a base URL

def get_all_details(url):
    try:
        # Get the landing page
        result = "Landing page:\n"
        result += Website(url).get_contents()

        # Get and validate links
        links = get_links(url)
        if not links or "links" not in links:
            print("Warning: No valid links found")
            return result

        print("Found links:", links)

        # Process each link with error handling
        for link in links["links"]:
            try:
                # Ensure the URL is absolute; urljoin leaves absolute URLs unchanged
                link_url = urljoin(url, link["url"])
                
                result += f"\n\n{link['type']}\n"
                result += Website(link_url).get_contents()
            except Exception as e:
                print(f"Warning: Could not fetch content for {link.get('url', 'unknown URL')}: {str(e)}")
                continue
                
        return result
    except Exception as e:
        print(f"Error in get_all_details: {str(e)}")
        return "Error occurred while generating brochure."
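
The loop above has to turn relative links into absolute URLs before fetching them. A short sketch of how the standard library's `urllib.parse.urljoin` handles the cases that come up here; the `example.com` URL is an illustrative assumption:

```python
from urllib.parse import urljoin

base = "https://devxpress.ca/"

# A relative link is resolved against the base URL
print(urljoin(base, "a-propos.php?lang="))

# An absolute link is returned unchanged
print(urljoin(base, "https://www.linkedin.com/company/devxpress/"))

# On a nested page, a relative link replaces the last path segment
print(urljoin("https://example.com/blog/post.html", "contact.php"))
```

The last case is where plain string concatenation of the base and the link would produce a wrong URL.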

In [13]:
print(get_all_details("https://devxpress.ca/"))

Found links: {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}, {'type': 'portfolio page', 'url': 'https://devxpress.ca/portfolio.php?lang='}, {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='}, {'type': 'careers page', 'url': 'https://devxpress.ca/careers.php?lang='}]}
Landing page:
Webpage Title:
DevXpress | Agence Web
WebpageContents:
Loading...
Home
About
Services
Web Design
SEO Optimization
Social Networks
Advertising Campaigns
Web application
Mobile app
Portfolio
Blog
Contact us
Facebook
Instagram
LinkedIn
Get a quote
DevXpress
We are
Your trusted partners for the creation of websites, technological solutions and digital strategies.
Our mission
Our mission is to create websites that captivate and engage users. We aim to deliver a seamless user experience, with cutting-edge design and clean, high-performance code.
Our
Values
Quality & Innovation
Collaboration & Agility
Integrity & Customer Service
Recent Projects
- 11 Aug 20

In [None]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."

In [16]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
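
Slicing at exactly 5,000 characters can cut the prompt in the middle of a word or line. A minimal sketch of an alternative that cuts at the last line break inside the limit, assuming a slightly shorter prompt is acceptable. `truncate_at_line` is a hypothetical helper, not part of the notebook:

```python
def truncate_at_line(text, limit=5_000):
    """Return at most `limit` characters, cutting at the last newline when one exists."""
    if len(text) <= limit:
        return text
    # Search up to index `limit` so a newline exactly at the limit still counts
    cut = text.rfind("\n", 0, limit + 1)
    if cut <= 0:
        # No usable line break inside the limit, so fall back to a hard cut
        return text[:limit]
    return text[:cut]


sample = "first line\nsecond line\nthird line"
print(repr(truncate_at_line(sample, limit=15)))
```

Since the page text is joined with newlines, cutting on a newline keeps each remaining line whole.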

In [17]:
get_brochure_user_prompt("DevXpress", "https://devxpress.ca/")

Found links: {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}, {'type': 'portfolio page', 'url': 'https://devxpress.ca/portfolio.php?lang='}, {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='}, {'type': 'facebook', 'url': 'https://www.facebook.com/DevXpressInc/'}, {'type': 'instagram', 'url': 'https://www.instagram.com/devxpress.ca/'}, {'type': 'linkedin', 'url': 'https://www.linkedin.com/company/devxpress/'}]}


"You are looking at a company called: DevXpress\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nDevXpress | Agence Web\nWebpageContents:\nLoading...\nHome\nAbout\nServices\nWeb Design\nSEO Optimization\nSocial Networks\nAdvertising Campaigns\nWeb application\nMobile app\nPortfolio\nBlog\nContact us\nFacebook\nInstagram\nLinkedIn\nGet a quote\nDevXpress\nWe are\nYour trusted partners for the creation of websites, technological solutions and digital strategies.\nOur mission\nOur mission is to create websites that captivate and engage users. We aim to deliver a seamless user experience, with cutting-edge design and clean, high-performance code.\nOur\nValues\nQuality & Innovation\nCollaboration & Agility\nIntegrity & Customer Service\nRecent Projects\n- 11 Aug 2024\n- 09 Aug 2024\nAppro Expert\nShowcase site - 17 Jun 2024\nService Renaud\nShowcase site - 13 Jun 20

In [20]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        response_format={"type": "text"}
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [21]:
create_brochure("DevXpress", "https://devxpress.ca/")

Found links: {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}, {'type': 'portfolio page', 'url': 'https://devxpress.ca/portfolio.php?lang='}, {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='}, {'type': 'blog page', 'url': 'https://devxpress.ca/blogs.php?lang='}]}


# Welcome to the Wacky World of **DevXpress!** 🎉

### Your Ultimate Digital Playground

At DevXpress, we’re not just flipping the switch on websites; we’re igniting a dazzling digital bonanza! Established in 2022 amidst the hustle and bustle of Montreal, we’re your trusted partners in crafting cutting-edge websites and tech solutions. Forget those boring, cookie-cutter designs! With us, you'll get services that are as original as that third cousin who keeps wearing socks with sandals.

## Our Mission: 
Create websites that *captivate and engage!* 🕵️‍♂️ Think of us as the digital magicians waving our wands, delivering seamless user experiences with stunning designs and squeaky-clean code. Poof! Your perfect site is born!

## What We Value:
- **Quality & Innovation:** Because who wants second-rate?
- **Collaboration & Agility:** Teamwork makes the dream work—especially when it comes to avoiding Monday morning Zoom calls.
- **Integrity & Customer Service:** We’ll be there faster than you can say, “Do I need to pay for this?”

## Our Clients: 
We’ve got stars in our eyes! 🌟 With over 100 projects and a sparkling reputation of 20+ five-star reviews, we’re trusted by clients like Olivier Beauchamps and Rym Bourezg—who, by the way, totally thinks we’re the bee's knees!

### Feelin' Lucky? Check Out Our Portfolio!
From **Web Design** to **Mobile Apps**, our portfolio is like a box of chocolates—except we promise it won’t melt in your hands. With a plethora of happy clients, we’re ready to create fancy digital treats for everyone.

## Join the Circus! 🎪
### Careers at DevXpress:
Want to hop on our digital rollercoaster? We’re always on the lookout for talent with a sprinkle of pizzazz! If you love innovation, have a knack for creativity, and can tolerate our office puns, shoot us your resume quicker than a speeding browser tab!

### Come Hang Out:
Located in the vibrant heart of Westmount, Montreal, we invite you to swing by or shoot us an email at [hello@devxpress.com](mailto:hello@devxpress.com) for a chat that’s 100% free of sales pitches—unless you're into that sort of thing. 😉

## Connect with Us! 
- 📱 Facebook, Instagram, LinkedIn
- 💌 Send us a message and let’s get the digital party started!

### DevXpress: 
Where we turn your digital dreams into reality—without a wand, just a lot of coffee and maybe a slice of cake! 🍰 

---

_DevXpress: Your delightfully quirky digital service agency—because your website deserves a personality!_

Bonus: typewriter animation

In [30]:
import time  # Used for the small delay between display updates

def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)

    # Process the stream chunk by chunk
    for chunk in stream:
        # A chunk may carry no choices or no text; skip those
        if not chunk.choices:
            continue
        new_content = chunk.choices[0].delta.content
        if not new_content:
            continue
        # Add to the running response and re-render it as Markdown
        response += new_content
        update_display(Markdown(response), display_id=display_handle.display_id)
        # Small delay for the typewriter effect
        time.sleep(0.01)
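
The accumulation pattern in the loop can be tried without an API call. A minimal sketch, assuming a plain generator stands in for the OpenAI stream. `fake_stream` and `accumulate` are hypothetical names, and the fragments are made-up sample text:

```python
def fake_stream():
    """Stand-in for a streamed reply: yields text fragments, some of them empty."""
    for piece in ["# Dev", "Xpress", None, "\n", "Web ", "", "agency"]:
        yield piece


def accumulate(pieces):
    """Yield the running text after each non-empty fragment."""
    response = ""
    for piece in pieces:
        # Empty or missing fragments add nothing, so skip them
        if not piece:
            continue
        response += piece
        yield response


snapshots = list(accumulate(fake_stream()))
print(snapshots[-1])
```

Each snapshot corresponds to one `update_display` call in the notebook's loop: the whole text so far is re-rendered rather than only the new fragment, so the Markdown formatting stays correct as the text grows.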

In [31]:
stream_brochure("DevXpress", "https://devxpress.ca/")

Found links: {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}, {'type': 'portfolio page', 'url': 'https://devxpress.ca/portfolio.php?lang='}, {'type': 'blog page', 'url': 'https://devxpress.ca/blogs.php?lang='}, {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='}, {'type': 'facebook', 'url': 'https://www.facebook.com/DevXpressInc/'}, {'type': 'instagram', 'url': 'https://www.instagram.com/devxpress.ca/'}, {'type': 'linkedin', 'url': 'https://www.linkedin.com/company/devxpress/'}]}


# Welcome to DevXpress! 🚀

### Your Go-To Web Wizards, One Pixel at a Time

#### Who Are We? 
At DevXpress, we're not just a web agency; we're your digital best friends! Founded in 2022 by a group of passionate techies with over 20 years of collaboration under our belts, we sprinkle innovation on the Internet like it's confetti. 💻✨

#### Our Superpowers
We make websites that *attract* users like moths to a flame. With a mission to provide seamless user experiences through captivating designs and turbo-charged code, we are dedicated to keeping your digital dreams alive. 

Our offerings include:
- **Web Design** - Because your website should look as good as your cat memes! 🐱
- **SEO Optimization** - We’ll get you ranking higher than that one annoying celebrity on Google! 🎤
- **Social Networks** - Making sure your likes are piled high like a mountain of ice cream. 🍦
- **Advertising Campaigns** - Because who doesn’t want their product to be the next shiny thing on Instagram?! 📱
- **Web Applications & Mobile Apps** - If it’s got pixels, we can create it! 📲

#### Our Values (And We're Not Just Saying This)
- **Quality & Innovation** - We don’t just follow trends; we start them!
- **Collaboration & Agility** - Like a dance party, we move beautifully together! 💃🕺
- **Integrity & Customer Service** - We’ll treat you like family (the good kind, not that weird uncle)! 🤗

#### Our Customers
From small local businesses to large enterprises, our clients are as varied as the toppings on a pizza! 🍕 We’re on a quest to help YOU turn your online vision into a digital reality.

#### What Our Customers Say
Here’s what happy (and not-so-silent) customers are saying about us:
- "DevXpress took my expectations, spun them around, and delivered something even better! I can't recommend them enough!" – *Olivier Beauchamps*
- "They cleaned up my online presence faster than my dog can shred a new shoe!" – *Yannick Kongue*
- "Thanks to DevXpress, my yoga studio’s website flows with serenity! I can finally stretch properly instead of pulling my hair out." – *Rym Bourezg*

#### Join Our Team! 🧑‍💻
Looking to work with a bunch of creative geniuses who also know how to have fun? We’re always on the lookout for new talent! If you’re ready to roll up your sleeves, get your geek on, and make the Internet a more magical place, drop us a line.

#### Let’s Connect!
- **Address:** 2-4275 boul. Maisonneuve West, Westmount, QC, H3Z 1K8
- **Email:** [contact@devxpress.com](mailto:contact@devxpress.com)
- **Phone:** (514) 909-4880  
- **Follow Us:** Facebook | Instagram | LinkedIn  

So what are you waiting for? Whether you need a website that dazzles or a job that inspires, join the DevXpress delight today! 🙌💼