## Project Name: **BrochureGenie**

### Description:
**BrochureGenie** is an intelligent brochure generation platform that automatically creates professional, visually appealing brochures for companies using just their name and website. It extracts key business insights, branding elements, and offerings from the company’s site and turns them into brochures tailored for potential clients, investors, and recruits.
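The description implies a three-stage flow: fetch the landing page, let the model pick brochure-relevant links, then write the brochure from the combined page text. Below is a minimal sketch of that flow. It assumes each stage is passed in as a plain callable so the flow can run without network access or an API key. The names `build_brochure`, `fetch`, `select_links` and `write` are hypothetical and are not part of this notebook.

```python
from typing import Callable, Dict, List


def build_brochure(
    company_name: str,
    url: str,
    fetch: Callable[[str], str],               # hypothetical: returns page text for a URL
    select_links: Callable[[str], List[str]],  # hypothetical: picks relevant URLs (the LLM step)
    write: Callable[[str, str], str],          # hypothetical: turns name + collected text into a brochure
) -> str:
    """Sketch of the fetch -> select links -> write flow described above."""
    sections: List[str] = [f"Landing page:\n{fetch(url)}"]
    for link in select_links(url):
        # Each selected page is appended under its own URL heading
        sections.append(f"{link}:\n{fetch(link)}")
    return write(company_name, "\n\n".join(sections))


# Usage with stub stages (no network, no model)
pages: Dict[str, str] = {
    "https://example.com": "We build widgets.",
    "https://example.com/about": "Our team and history.",
}
brochure = build_brochure(
    "ExampleCo",
    "https://example.com",
    fetch=lambda u: pages[u],
    select_links=lambda u: ["https://example.com/about"],
    write=lambda name, text: f"# {name}\n\n{text}",
)
print(brochure)
```

Because the stages are injected, the same flow can later be wired to `Website`, `get_links` and a Gemini call without changing its structure.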


In [71]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
import google.generativeai as genai


In [72]:
load_dotenv()  # read variables from a local .env file, if one exists
gemini_api_key = os.getenv("GEMINI_API_KEY")

In [73]:
API_KEY = gemini_api_key
if not API_KEY:
    raise ValueError("GEMINI_API_KEY environment variable is not set.")

# Configure the client only after confirming the key is present
genai.configure(api_key=API_KEY)
MODEL_NAME = "gemini-2.0-flash"

In [74]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A scraped web page: its URL, title, visible body text, and the raw href values of its links
    """

    def __init__(self, url):
        self.url = url
        # A timeout keeps the notebook from hanging on an unresponsive server
        response = requests.get(url, headers=headers, timeout=15)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Remove tags that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        # Keep every non-empty href; relative links are left unresolved
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
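`Website` keeps each `href` exactly as written. Entries such as `#main`, which appears in the link list printed later in this notebook, therefore stay relative, and the user prompt has to warn the model that "some might be relative links". Below is a minimal sketch of normalising them before they reach the model, using the standard library's `urllib.parse.urljoin`. The helper name `absolute_links` and the choice to drop fragments, non-HTTP schemes and duplicates are assumptions, not part of the original notebook.

```python
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


def absolute_links(base_url: str, hrefs: List[str]) -> List[str]:
    """Hypothetical helper: resolve hrefs against base_url, keep http(s) only, dedupe in order."""
    seen = set()
    result: List[str] = []
    for href in hrefs:
        # urljoin resolves relative paths ("/about/") and leaves absolute URLs unchanged;
        # urldefrag then drops any "#fragment" part
        full, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(full).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, etc.
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


resolved = absolute_links(
    "https://www.geeksforgeeks.org/large-language-model-llm/",
    ["#main", "/about/", "https://www.geeksforgeeks.org/", "mailto:feedback@example.com", "/about/"],
)
print(resolved)
```

Note that `#main` collapses to the page's own URL. A caller may also want to drop `base_url` itself from the result.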

In [75]:
data = Website("https://www.geeksforgeeks.org/large-language-model-llm/")
print(data.title)
print(data.text)

What is a Large Language Model (LLM) | GeeksforGeeks
Skip to content
Courses
DSA to Development
Get IBM Certification
Newly Launched!
Master Django Framework
Become AWS Certified
For Working Professionals
Interview 101: DSA & System Design
Data Science Training Program
JAVA Backend Development (Live)
DevOps Engineering (LIVE)
Data Structures & Algorithms in Python
For Students
Placement Preparation Course
Data Science (Live)
Data Structure & Algorithm-Self Paced (C++/JAVA)
Master Competitive Programming (Live)
Full Stack Development with React & Node JS (Live)
Full Stack Development
Data Science Program
All Courses
Tutorials
Data Structures & Algorithms
ML & Data Science
Interview Corner
Programming Languages
Web Development
CS Subjects
DevOps And Linux
School Learning
Practice
GfG 160: Daily DSA
Problem of the Day
Practice Coding Problems
GfG SDE Sheet
Python
R Language
Python for Data Science
NumPy
Pandas
OpenCV
Data Analysis
ML Math
Machine Learning
NLP
Deep Learning
Deep Learning I

In [76]:
data.get_contents()

"Webpage Title:\nWhat is a Large Language Model (LLM) | GeeksforGeeks\nWebpage Contents:\nSkip to content\nCourses\nDSA to Development\nGet IBM Certification\nNewly Launched!\nMaster Django Framework\nBecome AWS Certified\nFor Working Professionals\nInterview 101: DSA & System Design\nData Science Training Program\nJAVA Backend Development (Live)\nDevOps Engineering (LIVE)\nData Structures & Algorithms in Python\nFor Students\nPlacement Preparation Course\nData Science (Live)\nData Structure & Algorithm-Self Paced (C++/JAVA)\nMaster Competitive Programming (Live)\nFull Stack Development with React & Node JS (Live)\nFull Stack Development\nData Science Program\nAll Courses\nTutorials\nData Structures & Algorithms\nML & Data Science\nInterview Corner\nProgramming Languages\nWeb Development\nCS Subjects\nDevOps And Linux\nSchool Learning\nPractice\nGfG 160: Daily DSA\nProblem of the Day\nPractice Coding Problems\nGfG SDE Sheet\nPython\nR Language\nPython for Data Science\nNumPy\nPandas\nO

In [78]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON format as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [79]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON format as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
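The model's reply is parsed with `json.loads`, and `get_all_details` reads each link's `"type"` and `"url"` keys. A reply with a different shape therefore fails only at that later step. Below is a minimal sketch of checking a parsed reply against the shape the prompt asks for. `is_valid_links_payload` is a hypothetical helper. Requiring absolute `http(s)` URLs is an assumption based on the prompt's request for "the full https URL".

```python
import json
from typing import Any


def is_valid_links_payload(data: Any) -> bool:
    """Hypothetical check: data must look like {"links": [{"type": str, "url": str}, ...]}."""
    if not isinstance(data, dict) or not isinstance(data.get("links"), list):
        return False
    return all(
        isinstance(item, dict)
        and isinstance(item.get("type"), str)
        and isinstance(item.get("url"), str)
        and item["url"].startswith(("http://", "https://"))
        for item in data["links"]
    )


example = """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""
print(is_valid_links_payload(json.loads(example)))            # True
print(is_valid_links_payload({"links": [{"type": "about"}]}))  # False: no "url" key
```

Running the prompt's own example through `json.loads` like this is also a quick way to confirm that the example itself is valid JSON.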



In [81]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
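`get_links_user_prompt` asks the model to leave out Terms of Service, Privacy and email links. Those links can also be removed before the request, which shortens the prompt and leaves less to the model's judgement. Below is a minimal sketch that assumes case-insensitive substring matching is good enough. The helper `prefilter_links` and its keyword list are hypothetical.

```python
from typing import Iterable, List

# Hypothetical keyword list; tune it for the sites actually being scraped
EXCLUDED_KEYWORDS = ("terms", "privacy", "cookie")


def prefilter_links(links: Iterable[str]) -> List[str]:
    """Drop email links, in-page anchors, and legal pages before prompting the model."""
    kept: List[str] = []
    for link in links:
        lowered = link.lower()
        if lowered.startswith(("mailto:", "#")):
            continue
        if any(word in lowered for word in EXCLUDED_KEYWORDS):
            continue
        kept.append(link)
    return kept


sample = [
    "#main",
    "https://example.com/",
    "https://example.com/about/",
    "https://example.com/privacy-policy/",
    "https://example.com/terms-of-service",
    "mailto:hello@example.com",
]
print(prefilter_links(sample))
```

Inside `get_links_user_prompt`, this would replace `"\n".join(website.links)` with `"\n".join(prefilter_links(website.links))`. Substring matching is crude: a keyword such as `terms` can also match unrelated paths.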

In [82]:
print(get_links_user_prompt(data))

Here is the list of links on the website of https://www.geeksforgeeks.org/large-language-model-llm/ - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
#main
https://www.geeksforgeeks.org/
https://www.geeksforgeeks.org/courses/dsa-to-development-coding-guide/
https://www.geeksforgeeks.org/courses/category/ibm-certification/
https://www.geeksforgeeks.org/courses/mastering-django-framework-beginner-to-advance/
https://www.geeksforgeeks.org/courses/search?query=AWS
https://www.geeksforgeeks.org/courses/interviewe-101-data-structures-algorithm-system-design/
https://www.geeksforgeeks.org/courses/full-stack-applied-data-science-program/
https://www.geeksforgeeks.org/courses/Java-backend-live
https://www.geeksforgeeks.org/courses/devops-live
https://www.geeksforgeeks.org/courses/Data-Structures-With-Python
https://www

In [85]:
def get_links(url):
    website = Website(url)

    # The system instructions and the user-level prompt built from the
    # Website object are sent to Gemini together as a single prompt
    prompt = link_system_prompt + "\n\n" + get_links_user_prompt(website)

    model = genai.GenerativeModel(MODEL_NAME)
    response = model.generate_content(prompt)

    # Strip a ```json ... ``` Markdown fence, if the reply has one, before parsing
    cleaned = response.text.strip()
    if cleaned.startswith("```json"):
        cleaned = cleaned.removeprefix("```json").strip()
    if cleaned.endswith("```"):
        cleaned = cleaned.removesuffix("```").strip()

    try:
        result = json.loads(cleaned)
        return result
    except json.JSONDecodeError:
        print("Failed to parse cleaned JSON:", cleaned)
        return None
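`get_links` only strips a fence that opens with three backticks followed by `json`. A reply might instead use a bare fence or wrap the JSON in a sentence. Below is a minimal sketch of a more tolerant extractor. It uses a regular expression to take the body of the first fenced block and, when there is no fence, falls back to the span from the first `{` to the last `}`. `extract_json` is a hypothetical helper. The fallback is a heuristic that can fail on replies containing several separate JSON objects.

```python
import json
import re
from typing import Any, Optional

# A fenced block with an optional language tag, e.g. ```json ... ``` or ``` ... ```
FENCE_RE = re.compile(r"```(?:[a-zA-Z]+)?\s*(.*?)\s*```", re.DOTALL)


def extract_json(text: str) -> Optional[Any]:
    """Hypothetical helper: pull a JSON value out of a model reply, or return None."""
    match = FENCE_RE.search(text)
    if match:
        candidate = match.group(1)
    else:
        # Fallback heuristic: everything from the first "{" to the last "}"
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            return None
        candidate = text[start : end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


print(extract_json('```json\n{"links": []}\n```'))
print(extract_json('```\n{"links": []}\n```'))
print(extract_json('Here you go: {"links": []} Hope this helps.'))
print(extract_json("no json here"))
```

Inside `get_links`, `result = extract_json(response.text)` could replace the manual prefix and suffix stripping. A `None` result plays the same role as the existing `JSONDecodeError` branch.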


In [88]:
links = get_links("https://www.geeksforgeeks.org/large-language-model-llm/")
print(links)

{'links': [{'type': 'homepage', 'url': 'https://www.geeksforgeeks.org/'}, {'type': 'about page', 'url': 'https://www.geeksforgeeks.org/about/'}, {'type': 'contact us', 'url': 'https://www.geeksforgeeks.org/about/contact-us/'}, {'type': 'advertise with us', 'url': 'https://www.geeksforgeeks.org/advertise-with-us/'}, {'type': 'corporate solutions', 'url': 'https://www.geeksforgeeks.org/gfg-corporate-solution/'}, {'type': 'campus training program', 'url': 'https://www.geeksforgeeks.org/campus-training-program/'}, {'type': 'press releases', 'url': 'https://www.geeksforgeeks.org/press-release/'}]}


### Make the brochure

In [89]:
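def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    # get_links returns None when the model's reply is not valid JSON;
    # in that case only the landing page is used
    for link in (links or {}).get("links", []):
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()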
    return result
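`get_all_details` concatenates the full text of every selected page. The output below shows that these GeeksforGeeks pages open with long navigation menus. Below is a minimal sketch of capping each page's contribution so the combined text stays within a fixed budget. It keeps the same layout as `get_all_details`. `assemble_details` and the 2,000-character default are hypothetical, and the notebook does not state a limit. Cutting from the start keeps menu text first, so a smarter trim may be worth it.

```python
from typing import List, Tuple


def assemble_details(
    landing: str,
    pages: List[Tuple[str, str]],
    max_chars_per_page: int = 2_000,  # hypothetical budget, not from the notebook
) -> str:
    """Combine landing-page text and (type, contents) pairs, truncating each page."""

    def clip(text: str) -> str:
        # Keep the first max_chars_per_page characters and mark the cut
        if len(text) <= max_chars_per_page:
            return text
        return text[:max_chars_per_page] + "\n[truncated]\n"

    result = "Landing page:\n" + clip(landing)
    for page_type, contents in pages:
        # Same layout as get_all_details: blank line, link type, then the page contents
        result += f"\n\n{page_type}\n" + clip(contents)
    return result


details = assemble_details(
    "Webpage Title:\nExample\nWebpage Contents:\nHello\n\n",
    [("about page", "A" * 100)],
    max_chars_per_page=60,
)
print(details)
```

In the notebook, `landing` would come from `Website(url).get_contents()`, and each pair from a link's `"type"` and `Website(link["url"]).get_contents()`.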

In [90]:
print(get_all_details("https://www.geeksforgeeks.org/large-language-model-llm/"))

Found links: {'links': [{'type': 'homepage', 'url': 'https://www.geeksforgeeks.org/'}, {'type': 'about page', 'url': 'https://www.geeksforgeeks.org/about/'}, {'type': 'contact us', 'url': 'https://www.geeksforgeeks.org/about/contact-us/'}, {'type': 'advertise with us', 'url': 'https://www.geeksforgeeks.org/advertise-with-us/'}, {'type': 'corporate solution', 'url': 'https://www.geeksforgeeks.org/gfg-corporate-solution/'}, {'type': 'campus training program', 'url': 'https://www.geeksforgeeks.org/campus-training-program/'}, {'type': 'press release', 'url': 'https://www.geeksforgeeks.org/press-release/'}]}
Landing page:
Webpage Title:
What is a Large Language Model (LLM) | GeeksforGeeks
Webpage Contents:
Skip to content
Courses
DSA to Development
Get IBM Certification
Newly Launched!
Master Django Framework
Become AWS Certified
For Working Professionals
Interview 101: DSA & System Design
Data Science Training Program
JAVA Backend Development (Live)
DevOps Engineering (LIVE)
Data Structure