This notebook creates a brochure for a company from the contents of its website.
We use GPT-4o-mini to pick out the most relevant pages of the website and then to write the brochure from them.

In [54]:
# importing tools

import os  # Helps us work with files and environment variables
import requests  # Lets us fetch data from websites
import json  # Helps us work with JSON (a way to store data)
from typing import List  # Helps define lists in code
from dotenv import load_dotenv  # Loads secret keys from a `.env` file
from bs4 import BeautifulSoup  # Helps extract information from websites
from IPython.display import Markdown, display, update_display  # Displays results nicely in Jupyter
from openai import OpenAI  # Lets us use GPT-4o-mini

In [55]:
# setting up API key - a secret password to use GPT-4o-mini

load_dotenv(override=True)  # Load the `.env` file
api_key = os.getenv('OPENAI_API_KEY')  # Get the API key from the file

if api_key and api_key.startswith('sk-') and len(api_key) > 10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key!")

API key looks good so far


In [56]:
# setting up our AI model

MODEL = 'gpt-4o-mini'  # The AI model we want to use
openai = OpenAI(api_key=api_key)  # Connect to GPT-4o-mini using the API key

In [57]:
# fetching and cleaning a website
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers, timeout=10)  # Fetch the webpage; give up after 10 seconds instead of hanging
        self.body = response.content  # Get the raw content
        soup = BeautifulSoup(self.body, 'html.parser')  # Parse the webpage
        self.title = soup.title.string if soup.title else "No title found"  # Get the title
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()  # Remove unnecessary parts
            self.text = soup.body.get_text(separator="\n", strip=True)  # Get the main text
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]  # Find all links
        self.links = [link for link in links if link]  # Keep only valid links
   
    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpageContents:\n{self.text}\n\n"

# we get the title, main text and all the links on the page

In [58]:
devxpress = Website("https://devxpress.ca/") # fetch the website
devxpress.links # show all the links on the website

['index.php?lang=',
 'index.php?lang=',
 'a-propos.php?lang=',
 '#.',
 'conception-web.php?lang=',
 'optimisation-SEO.php?lang=',
 'reseaux-sociaux.php?lang=',
 'campagne-publicitaire.php?lang=',
 'application-web.php?lang=',
 'application-mobile.php?lang=',
 'portfolio.php?lang=',
 'blogs.php?lang=',
 'contact.php?lang=',
 'https://www.facebook.com/DevXpressInc/',
 'https://www.instagram.com/devxpress.ca/',
 'https://www.linkedin.com/company/devxpress/',
 'contact.php?lang=',
 'blog.php?lang=en&blogId=yro9jwUOHATDJhIsYV5f',
 'blog.php?lang=en&blogId=S1qFrJWqbLA3EcP3rdAD',
 'portfolio-project.php?lang=en&projectId=1rEVGsCUkBgCTDi6jKq1',
 'portfolio-project.php?lang=en&projectId=eLSIqT8ZLDnkF0ieh4uQ',
 'portfolio-project.php?lang=en&projectId=ysmW16LKXcBRUrx1pNNX',
 'blog.php?lang=en&blogId=Y1hPMENPmBG1CSHXsqTq',
 'blogs.php?lang=',
 'portfolio.php?lang=',
 'https://www.facebook.com/DevXpressInc/',
 'https://www.instagram.com/devxpress.ca/',
 'https://www.linkedin.com/company/devxpress/

Step 1: Use GPT-4o-mini to figure out which links are relevant

In [59]:
# deciding which links are important

link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [60]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [61]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company. Respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links) # Add all the links, separated by newlines
    return user_prompt

In [62]:
print(get_links_user_prompt(devxpress))

Here is the list of links on the website of https://devxpress.ca/ - please decide which of these are relevant web links for a brochure about the company. Respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, or email links.
Links (some might be relative links):
index.php?lang=
index.php?lang=
a-propos.php?lang=
#.
conception-web.php?lang=
optimisation-SEO.php?lang=
reseaux-sociaux.php?lang=
campagne-publicitaire.php?lang=
application-web.php?lang=
application-mobile.php?lang=
portfolio.php?lang=
blogs.php?lang=
contact.php?lang=
https://www.facebook.com/DevXpressInc/
https://www.instagram.com/devxpress.ca/
https://www.linkedin.com/company/devxpress/
contact.php?lang=
blog.php?lang=en&blogId=yro9jwUOHATDJhIsYV5f
blog.php?lang=en&blogId=S1qFrJWqbLA3EcP3rdAD
portfolio-project.php?lang=en&projectId=1rEVGsCUkBgCTDi6jKq1
portfolio-project.php?lang=en&projectId=eLSIqT8ZLDnkF0ieh4uQ
portfolio-project.php?lang=en&projectId=ysmW16LKXcBRUrx1pNNX
blog.php?lang=en&b
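The link list printed above repeats some entries (`index.php?lang=` and `contact.php?lang=` each appear twice) and includes an in-page anchor (`#.`), which cost tokens without helping the model. Below is a minimal pre-filter sketch, assuming we want to drop anchors, `mailto:`/`tel:`/`javascript:` links and exact duplicates before building the prompt. `filter_links` is a hypothetical helper, not part of the notebook, and the `mailto:` entry in the sample is made up for illustration.

```python
def filter_links(links):
    """Drop in-page anchors, non-web schemes and duplicates, keeping the original order."""
    skip_prefixes = ("#", "mailto:", "tel:", "javascript:")
    seen = set()
    kept = []
    for link in links:
        link = link.strip()
        if not link or link.lower().startswith(skip_prefixes):
            continue  # nothing a brochure could use
        if link in seen:
            continue  # an identical link was already kept
        seen.add(link)
        kept.append(link)
    return kept


sample = [
    "index.php?lang=",
    "index.php?lang=",
    "#.",
    "contact.php?lang=",
    "mailto:info@example.com",  # hypothetical entry, not from the DevXpress page
    "contact.php?lang=",
    "https://www.linkedin.com/company/devxpress/",
]
print(filter_links(sample))
# ['index.php?lang=', 'contact.php?lang=', 'https://www.linkedin.com/company/devxpress/']
```

In `get_links_user_prompt`, it could be used as `"\n".join(filter_links(website.links))`. Order is preserved, so the model still sees the navigation links first.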

In [63]:
def get_links(url):
    try:
        website = Website(url)
        response = openai.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": link_system_prompt},
                {"role": "user", "content": get_links_user_prompt(website)}
            ],
            response_format={"type": "json_object"}  # Ensure JSON response
        )
        result = response.choices[0].message.content
        parsed_result = json.loads(result)
        if "links" not in parsed_result:
            print("Warning: Response missing 'links' key")
            return {"links": []}
        return parsed_result
    except json.JSONDecodeError as e:
        print(f"JSON parsing error: {e}")
        return {"links": []}
    except Exception as e:
        print(f"Error in get_links: {e}")
        return {"links": []}
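`response_format={"type": "json_object"}` makes the reply valid JSON, but as far as I know JSON mode does not force a particular structure, so `get_links` only checks that a `links` key exists. Here is a slightly stricter parsing sketch, assuming each entry should be a dict with string `type` and `url` fields; `parse_links_payload` is a hypothetical name:

```python
import json


def parse_links_payload(raw):
    """Parse the model's JSON reply and keep only well-formed {"type", "url"} entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    # The top level must be an object with a list under "links"
    entries = data.get("links") if isinstance(data, dict) else None
    if not isinstance(entries, list):
        return {"links": []}
    valid = [
        entry for entry in entries
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        and entry["url"].strip()
    ]
    return {"links": valid}


reply = '{"links": [{"type": "about page", "url": "https://devxpress.ca/a-propos.php?lang="}, {"type": "broken"}]}'
print(parse_links_payload(reply))
# {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}]}
```

It could stand in for the `json.loads` call and the `"links"` check inside `get_links`, so malformed entries are dropped up front rather than failing one by one inside `get_all_details`.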

In [64]:
get_links("https://devxpress.ca/")

{'links': [{'type': 'about page',
   'url': 'https://devxpress.ca/a-propos.php?lang='},
  {'type': 'portfolio page',
   'url': 'https://devxpress.ca/portfolio.php?lang='},
  {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='},
  {'type': 'Facebook page', 'url': 'https://www.facebook.com/DevXpressInc/'},
  {'type': 'Instagram page', 'url': 'https://www.instagram.com/devxpress.ca/'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/devxpress/'}]}
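The model also picked the Facebook, Instagram and LinkedIn pages. These are often login-walled or script-heavy when fetched with `requests`, so they may add little text to the brochure. A sketch that keeps only links on the company's own host, assuming the URLs are already absolute (a relative URL has an empty host and would be dropped) and treating a leading `www.` as the same host; `same_site_links` is hypothetical:

```python
from urllib.parse import urlparse


def same_site_links(links, base_url):
    """Keep only link entries whose host matches the host of base_url (ignoring a leading 'www.')."""
    def host(url):
        netloc = urlparse(url).netloc.lower()
        return netloc[4:] if netloc.startswith("www.") else netloc

    base_host = host(base_url)
    return [link for link in links if host(link["url"]) == base_host]


found = [
    {"type": "about page", "url": "https://devxpress.ca/a-propos.php?lang="},
    {"type": "Facebook page", "url": "https://www.facebook.com/DevXpressInc/"},
    {"type": "contact page", "url": "https://devxpress.ca/contact.php?lang="},
]
print([link["type"] for link in same_site_links(found, "https://devxpress.ca/")])
# ['about page', 'contact page']
```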

Step 2: Make a brochure

In [65]:
def get_all_details(url):
    try:
        # Get the landing page
        result = "Landing page:\n"
        result += Website(url).get_contents()
        
        # Get and validate links
        links = get_links(url)
        if not links or "links" not in links:
            print("Warning: No valid links found")
            return result
        
        print("Found links:", links)
        
        # Process each link with error handling
        for link in links["links"]:
            try:
                # Ensure the URL is absolute
                link_url = link["url"]
                if not link_url.startswith(('http://', 'https://')):
                    link_url = f"{url.rstrip('/')}/{link_url.lstrip('/')}"
                
                result += f"\n\n{link['type']}\n"
                result += Website(link_url).get_contents()
            except Exception as e:
                print(f"Warning: Could not fetch content for {link.get('url', 'unknown URL')}: {str(e)}")
                continue
                
        return result
    except Exception as e:
        print(f"Error in get_all_details: {str(e)}")
        return "Error occurred while generating brochure."
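`get_all_details` builds absolute URLs by string concatenation. That works for a root URL like `https://devxpress.ca/`, but not when the starting URL has a path or the link is root-relative. The standard library's `urllib.parse.urljoin` resolves relative links the way a browser does. The sketch below compares the two; `naive_join` reproduces the notebook's logic, and the `example.com` URLs are made up for illustration:

```python
from urllib.parse import urljoin


def naive_join(base_url, link_url):
    """The string concatenation used in get_all_details, reproduced for comparison."""
    if link_url.startswith(("http://", "https://")):
        return link_url
    return f"{base_url.rstrip('/')}/{link_url.lstrip('/')}"


cases = [
    ("https://devxpress.ca/", "a-propos.php?lang="),
    ("https://example.com/fr/index.php", "contact.php"),
    ("https://example.com/fr/index.php", "/about"),
]
for base, link in cases:
    print(naive_join(base, link), "|", urljoin(base, link))
# https://devxpress.ca/a-propos.php?lang= | https://devxpress.ca/a-propos.php?lang=
# https://example.com/fr/index.php/contact.php | https://example.com/fr/contact.php
# https://example.com/fr/index.php/about | https://example.com/about
```

Inside the loop, `link_url = urljoin(url, link["url"])` could replace the `if not link_url.startswith(...)` branch, since `urljoin` returns an already absolute URL unchanged.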

In [66]:
print(get_all_details("https://devxpress.ca/"))

Found links: {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}, {'type': 'portfolio page', 'url': 'https://devxpress.ca/portfolio.php?lang='}, {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='}, {'type': 'blog page', 'url': 'https://devxpress.ca/blogs.php?lang='}]}
Landing page:
Webpage Title:
DevXpress | Agence Web
WebpageContents:
Loading...
Home
About
Services
Web Design
SEO Optimization
Social Networks
Advertising Campaigns
Web application
Mobile app
Portfolio
Blog
Contact us
Facebook
Instagram
LinkedIn
Get a quote
DevXpress
We are
Your trusted partners for the creation of websites, technological solutions and digital strategies.
Our mission
Our mission is to create websites that captivate and engage users. We aim to deliver a seamless user experience, with cutting-edge design and clean, high-performance code.
Our
Values
Quality & Innovation
Collaboration & Agility
Integrity & Customer Service
Recent Projects
- 11 Aug 2024
- 

In [67]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Alternative prompt for a plain (non-humorous) brochure:
# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
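The two prompts above differ only in the tone of the brochure. One way to switch between them without commenting code in and out is a small, hypothetical helper like the one below; building the text from adjacent string literals also keeps the space between sentences, which is easy to lose with backslash line continuations:

```python
def build_system_prompt(humorous=True):
    """Return the brochure system prompt, with or without the jokey tone."""
    tone = "a short humorous, entertaining, jokey brochure" if humorous else "a short brochure"
    # Adjacent string literals are joined by Python; each one ends with an explicit space
    return (
        "You are an assistant that analyzes the contents of several relevant pages from a company website "
        f"and creates {tone} about the company for prospective customers, investors and recruits. "
        "Respond in markdown. "
        "Include details of company culture, customers and careers/jobs if you have the information."
    )


system_prompt = build_system_prompt(humorous=False)
print(system_prompt)
```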

In [68]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
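`user_prompt[:5_000]` can stop in the middle of a word, as the output below ends with `13 Jun 20`. A sketch that backs up to the last complete line instead, assuming a hard cut is still acceptable when there is no newline to fall back to; `truncate_at_line` is hypothetical and the sample text comes from the landing page output:

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, backing up to the last full line when possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut if there is no earlier line break to stop at
    return cut[:last_newline] if last_newline > 0 else cut


sample = "Our\nValues\nQuality & Innovation\nCollaboration & Agility"
print(repr(truncate_at_line(sample, limit=40)))
# 'Our\nValues\nQuality & Innovation'
```

In `get_brochure_user_prompt`, the slicing line could become `user_prompt = truncate_at_line(user_prompt)`. The instructions sit at the start of the prompt, so they survive either way; only the tail of the scraped text is affected.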

In [69]:
get_brochure_user_prompt("DevXpress", "https://devxpress.ca/")

Found links: {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}, {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='}, {'type': 'portfolio page', 'url': 'https://devxpress.ca/portfolio.php?lang='}, {'type': 'blog page', 'url': 'https://devxpress.ca/blogs.php?lang='}, {'type': 'Facebook page', 'url': 'https://www.facebook.com/DevXpressInc/'}, {'type': 'Instagram page', 'url': 'https://www.instagram.com/devxpress.ca/'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/devxpress/'}]}


"You are looking at a company called: DevXpress\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nDevXpress | Agence Web\nWebpageContents:\nLoading...\nHome\nAbout\nServices\nWeb Design\nSEO Optimization\nSocial Networks\nAdvertising Campaigns\nWeb application\nMobile app\nPortfolio\nBlog\nContact us\nFacebook\nInstagram\nLinkedIn\nGet a quote\nDevXpress\nWe are\nYour trusted partners for the creation of websites, technological solutions and digital strategies.\nOur mission\nOur mission is to create websites that captivate and engage users. We aim to deliver a seamless user experience, with cutting-edge design and clean, high-performance code.\nOur\nValues\nQuality & Innovation\nCollaboration & Agility\nIntegrity & Customer Service\nRecent Projects\n- 11 Aug 2024\n- 09 Aug 2024\nAppro Expert\nShowcase site - 17 Jun 2024\nService Renaud\nShowcase site - 13 Jun 20

In [73]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        response_format={"type": "text"}
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [74]:
create_brochure("DevXpress", "https://devxpress.ca/")

Found links: {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}, {'type': 'portfolio page', 'url': 'https://devxpress.ca/portfolio.php?lang='}, {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='}, {'type': 'blog page', 'url': 'https://devxpress.ca/blogs.php?lang='}]}


# Welcome to the Wild World of DevXpress! 🎉

**Your One-Stop Shop for All Things Web-illiant!**

---

## Who Are We? 🤔
We’re not just any agency; we’re **DevXpress**, your trusted partners in the digital jungle! Hailing from the vibrant lands of **Montreal**, we are a bunch of web wizards who turned a 20-year friendship into a mission to revolutionize the web! 🌍✨

### Our Mission
Our goal? Craft websites that are **so captivating**, they might just earn themselves a couple of fan clubs. (Seriously, don't be surprised if you find your site getting love letters!) 

## Our Services - Where the Magic Happens! 🪄

- **Web Design**: Because every pixel counts!
- **SEO Optimization**: We know how to tickle Google’s fancy and push you up the ranks!
- **Social Networks**: Get your social game on point. #DevXpress
- **Advertising Campaigns**: Let's make your brand the talk of the town.
- **Web & Mobile Applications**: Apps that are as smooth as butter on toast!

---

## Company Culture: Where Pixels Blend With Fun! 🎨
We believe in **collaboration, agility**, and above all, a **good laugh** (usually at someone's expense… just kidding!). Our team thrives on the edge of creativity and technical expertise. We’re like a perfectly brewed cup of coffee - a little strong, a bit sweet, and **fully energizing**! ☕️💻

### Life at DevXpress
- Casual Friday is an everyday affair! 
- Tech debates over lunch - Mac vs. PC, anyone?
- Regular team-building activities that include trust falls and... well, maybe just trust parties with cake.

---

## Customer Love ❤️
Speaking of love, our happy customers (over 100 of them with ⭐️⭐️⭐️⭐️⭐️ ratings) have said:
- *“Their team is both talented and creative!”* - Olivier Beauchamps
- *“You can trust them with your eyes closed!”* - Rym Bourezg (though we recommend keeping them open for better results! 😜)

---

## Join Us! Become a Part of Our Journey 🚀
We're always on the lookout for creative minds and tech enthusiasts who want to plunge into the digital adventure! If you’re:
- A web design ninja 🥷
- An SEO guru 📈
- A coding sorcerer 🧙
- Or just a friendly human with a good attitude!

**We Want You!**

---

## Get In Touch! 📞
Ready to embark on this wonderful web journey? 

Reach out to us!
- **Address**: 2-4275 boul. Maisonneuve West, Westmount, QC, H3Z 1K8
- **Email**: [email protected]
- **Phone**: (514) 909-4880

---

**DevXpress: Shaping the Future of the Web, One Click at a Time!** 

Let’s create magic together! 🪄✨

Bonus: typewriter animation

In [75]:
import time  # Used for the small typewriter delay below

def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)

    # Process the stream
    for chunk in stream:
        # Skip chunks with no choices or no text (e.g. the final chunk that only carries the finish reason)
        if not chunk.choices or not chunk.choices[0].delta.content:
            continue
        # Add the new text to the running response
        response += chunk.choices[0].delta.content
        # Re-render the whole response so the markdown formatting stays correct
        update_display(Markdown(response), display_id=display_handle.display_id)
        # Small delay for the typewriter effect
        time.sleep(0.1)
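The streaming loop only uses the text part of each chunk, and some chunks carry none (typically the first, which holds just the role, and the last, which holds the finish reason). To check that logic without calling the API, here is a sketch that moves the accumulation into a generator and feeds it fake chunks built with `SimpleNamespace`; the fake objects only mimic the `chunk.choices[0].delta.content` shape, and `accumulate_stream` is a hypothetical name:

```python
from types import SimpleNamespace


def accumulate_stream(chunks):
    """Yield the running text after each chunk that carries new content."""
    text = ""
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks (e.g. usage-only ones) may have no choices
        piece = chunk.choices[0].delta.content
        if piece:
            text += piece
            yield text


def fake_chunk(content):
    """Build a stand-in object shaped like a streamed chat completion chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


chunks = [fake_chunk(None), fake_chunk("# Welcome"), fake_chunk(" to DevXpress"), SimpleNamespace(choices=[])]
for partial in accumulate_stream(chunks):
    print(partial)
# # Welcome
# # Welcome to DevXpress
```

In `stream_brochure`, the loop body would then reduce to `for partial in accumulate_stream(stream): update_display(Markdown(partial), display_id=display_handle.display_id)`, plus the optional delay.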

In [76]:
stream_brochure("DevXpress", "https://devxpress.ca/")

Found links: {'links': [{'type': 'about page', 'url': 'https://devxpress.ca/a-propos.php?lang='}, {'type': 'contact page', 'url': 'https://devxpress.ca/contact.php?lang='}, {'type': 'portfolio page', 'url': 'https://devxpress.ca/portfolio.php?lang='}, {'type': 'blog page', 'url': 'https://devxpress.ca/blogs.php?lang='}, {'type': 'Facebook page', 'url': 'https://www.facebook.com/DevXpressInc/'}, {'type': 'Instagram page', 'url': 'https://www.instagram.com/devxpress.ca/'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/devxpress/'}]}


# Welcome to the DevXpress Universe! 🚀

**Your Friendly Neighborhood Web Agency**  
Located in the charming streets of Montreal, DevXpress is like the superhero of the digital world—swooping in to save your online presence with spiffy websites and digital strategies that dazzle!

---

## Our Mission: More Than Just Code 🍰

At DevXpress, we believe websites should be as captivating as a triple chocolate cake! Our dedicated team is on a noble quest to create online experiences that are smoother than a freshly buttered pancake. We mix cutting-edge design with clean, high-performance code for a user experience that’ll have your visitors saying, “Wowza!”

---

## What We Offer: Services to Make Your Head Spin! 🌪️

- **Web Design:** More tasty than it looks! 
- **SEO Optimization:** Because nobody likes being lost in the digital abyss.
- **Social Networks:** We'll turn your likes into "Love at first sight!"
- **Advertising Campaigns:** Making your audience click instead of snooze!
- **Web Applications & Mobile Apps:** Building the future... one app at a time!

---

## Our Values: Serious Business, With a Twist of Fun! 🎉

At DevXpress, we embrace:
- **Quality & Innovation:** Like peanut butter and jelly; they just go together.
- **Collaboration & Agility:** Quick as a cat on a hot tin roof!
- **Integrity & Customer Service:** We believe in honesty—like the act of admitting you ate the last cookie.

---

## Meet Our Happy Customers! 🎤🎶

Our clients love us! Don’t believe us? Here are some fan favorites from the audience:
- **Olivier Beauchamps:** "Working with DevXpress was like finding an extra fry at the bottom of the bag—so satisfying!"
- **Yannick Kongue:** "Excellent service, like a well-brewed coffee—strong and reliable!"
- **Rym Bourezg:** "I would trust them with my cat's Instagram account!"
- **Samuel Tremblay:** "Top-notch quality delivered fast, like pizza on a movie night!"

---

## Careers at DevXpress: Join The Ruckus! ✈️

Are you looking for a workplace that’s full of laughs, creativity, and the occasional impromptu dance party? At DevXpress, we’re always on the lookout for new talent to join our merry misfits! We value innovation, and your quirky ideas could be just what we need to spark something magical!

**Apply now** if you’re passionate about digital solutions! 😉

---

**Location:**  
2-4275 Boul. Maisonneuve West,  
Westmount, QC, H3Z 1K8

**Contact Us:**  
Email: [email protected]  
Phone: (514) 909-4880  

Come for the web design, stay for the laughs! Let’s make the digital world a better place together! 🌐✨