# Website Brochure Generator

An AI-powered tool that automatically generates professional brochures from any website. This notebook lets you run the brochure generator step by step in Jupyter.

## Features

- 🌐 **Website Analysis**: Automatically scrapes and analyzes website content
- 🤖 **AI-Powered**: Uses OpenAI GPT-4o-mini for intelligent content generation
- 📄 **Professional Output**: Generates markdown-formatted brochures
- 🌍 **Multi-Language Support**: Translate brochures to any language using AI
- ⚡ **Interactive**: Run step-by-step in Jupyter notebooks
- 🎨 **Beautiful Output**: Native Jupyter markdown rendering with HTML styling

## Prerequisites

- Python 3.8 or higher
- OpenAI API key
- Jupyter notebook environment

## Setup Instructions

1. **Get your OpenAI API key**:
   - Visit [OpenAI API Keys](https://platform.openai.com/api-keys)
   - Create a new API key

2. **Set up environment variables**:
   - Create a `.env` file in the project directory with: `OPENAI_API_KEY=your_api_key_here`
   - Or set the environment variable directly in the notebook

3. **Install dependencies**:
   ```bash
   pip install openai python-dotenv requests beautifulsoup4 ipywidgets
   ```


In [None]:
# Import required libraries
from openai import OpenAI
from dotenv import load_dotenv
import os
import requests
import json
from typing import List
from bs4 import BeautifulSoup
import ipywidgets as widgets
from IPython.display import display, Markdown, HTML, clear_output
import time

print("✅ All libraries imported successfully!")


## Configuration

Set up your OpenAI API key and configure the client.


In [None]:
# Configuration cell - Set up your OpenAI API key
def get_client_and_headers():
    """Initialize OpenAI client and headers for web scraping"""
    load_dotenv(override=True)
    api_key = os.getenv("OPENAI_API_KEY")
    
    # OpenAI keys start with "sk-" (project keys with "sk-proj-"); this is only a format check
    if api_key and api_key.startswith('sk-') and len(api_key) > 10:
        print("✅ API key looks good!")
    else:
        print("⚠️  There might be a problem with your API key")
        print("Make sure you have set OPENAI_API_KEY in your .env file or environment variables")

    client = OpenAI(api_key=api_key)
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    }
    return client, headers

# Initialize the client
client, headers = get_client_and_headers()
print("✅ OpenAI client initialized successfully!")


## Core Functions

The main functions for website analysis and brochure generation.


In [None]:
# Utility methods to display content in markdown format
def display_content(content, is_markdown=True):
    """Display content using Jupyter's display methods"""
    if is_markdown:
        display(Markdown(content))
    else:
        print(content)

def stream_content(response, title="Content"):
    """
    Utility function to handle streaming content display in Jupyter
    
    Args:
        response: OpenAI streaming response object
        title (str): Title to display for the streaming content
    
    Returns:
        str: Complete streamed content
    """
    result = ""
    
    # Display title
    display(HTML(f"<h3 style='color: #1f77b4;'>{title}...</h3>"))
    
    # Print each chunk as it arrives for a live streaming effect
    for chunk in response:
        # Some chunks carry no choices, and some deltas have content=None; treat both as ""
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content or ""
        result += content
        print(content, end='', flush=True)
    
    # Display completion message
    display(HTML(f"<div style='color: green; font-weight: bold; margin-top: 20px;'>{'='*50}</div>"))
    display(HTML(f"<div style='color: green; font-weight: bold;'>{title.upper()} COMPLETE</div>"))
    display(HTML(f"<div style='color: green; font-weight: bold;'>{'='*50}</div>"))
    
    return result

print("✅ Utility functions loaded!")
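`stream_content` builds the brochure from `chunk.choices[0].delta.content`. The sketch below isolates that accumulation logic so it can be tried without an API key. The chunk objects are stand-ins built with `types.SimpleNamespace` that only mimic the attributes the loop reads; they are an assumption, not the SDK's real classes, and `collect_deltas` / `fake_chunk` are hypothetical names.

```python
from types import SimpleNamespace

def collect_deltas(chunks):
    """Join the text deltas from a sequence of streaming chunks."""
    parts = []
    for chunk in chunks:
        # Skip chunks without choices (e.g. a trailing usage-only chunk)
        if not chunk.choices:
            continue
        # Some deltas (such as the final one) carry content=None
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)

def fake_chunk(text):
    """Hypothetical stand-in for one streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

chunks = [
    fake_chunk(None),
    fake_chunk("# Acme"),
    fake_chunk(" Corp"),
    SimpleNamespace(choices=[]),  # mimics a usage-only chunk
    fake_chunk(None),
]
print(collect_deltas(chunks))  # prints: # Acme Corp
```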


In [None]:
# Utility class to get the contents of a website
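class Website:
    """Fetch a page and keep its title, visible text and raw links"""
    def __init__(self, url):
        self.url = url
        # Reuse the client and headers from the configuration cell instead of re-reading .env per page
        self.client, self.headers = client, headers
        print(f"🌐 Fetching content from: {url}")
        # Time out instead of hanging on slow sites, and fail loudly on HTTP errors (e.g. 403 from bot blocking)
        response = requests.get(url, headers=self.headers, timeout=15)
        response.raise_for_status()
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Drop elements that carry no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]
        print(f"✅ Website analyzed: {self.title}")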

    def get_contents(self):
        return f"Webpage Title: {self.title}\nWebpage Contents: {self.text}\n\n"

print("✅ Website class loaded!")
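`Website` keeps each `href` exactly as it appears, so relative links such as `/about` reach the model unresolved and the link prompt has to ask it to complete them. A minimal sketch of resolving and filtering them up front with the standard library's `urllib.parse`; `normalize_links` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urljoin, urldefrag, urlparse

def normalize_links(base_url, hrefs):
    """Resolve relative hrefs against base_url and drop non-web links."""
    seen = set()
    result = []
    for href in hrefs:
        # Turn "/about" into "https://example.com/about" and strip "#section" fragments
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        # Keep only http(s) links; this drops mailto:, tel:, javascript: and similar
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Preserve first-seen order while removing duplicates
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

links = ["/about", "careers", "mailto:hi@example.com", "#top", "https://example.com/about#team"]
print(normalize_links("https://example.com/", links))
# ['https://example.com/about', 'https://example.com/careers', 'https://example.com/']
```

One way to apply it would be `self.links = normalize_links(url, self.links)` at the end of `__init__`, which also shortens the link list sent to the model.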


In [None]:
# AI Prompt Functions
def get_links_system_prompt():
    link_system_prompt = """You are provided with a list of links found on a webpage. \
        You are able to decide which of the links would be most relevant to include in a brochure about the company. \
        Relevant links usually include an About page, a Company page, Careers/Jobs pages, or a News page.\n"""
    link_system_prompt += "Always respond in JSON exactly like this: \n"
    link_system_prompt += """
        {
            "links": [
                {"type": "<page type>", "url": "<full URL>"},
                {"type": "<page type>", "url": "<full URL>"}
            ]
        }\n
    """
    link_system_prompt += """ If no relevant links are found, return:
        {
            "links": []
        }\n
    """
    link_system_prompt += "If multiple links could map to the same type (e.g. two About pages), include the best candidate only.\n"

    link_system_prompt += "Respond in JSON as in the examples below:\n"
    link_system_prompt += """
        ## Example 1
        Input links:
        - https://acme.com/about  
        - https://acme.com/pricing  
        - https://acme.com/blog  
        - https://acme.com/signup  

        Output:
        {
        "links": [
            {"type": "about page", "url": "https://acme.com/about"},
            {"type": "blog page", "url": "https://acme.com/blog"},
            {"type": "pricing page", "url": "https://acme.com/pricing"}
        ]
        }
        """
    link_system_prompt += """
        ## Example 2
        Input links:
        - https://startup.io/  
        - https://startup.io/company  
        - https://startup.io/careers  
        - https://startup.io/support  

        Output:
        {
        "links": [
            {"type": "company page", "url": "https://startup.io/company"},
            {"type": "careers page", "url": "https://startup.io/careers"}
        ]
        }
        """
    link_system_prompt += """
        ## Example 3
        Input links:
        - https://coolapp.xyz/login  
        - https://coolapp.xyz/random  

        Output:
        {
        "links": []
        }
        """
    return link_system_prompt

def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format.\n"
    user_prompt += "Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

def get_brochure_system_prompt():
    brochure_system_prompt = """
        You are an assistant that analyzes the contents of several relevant pages from a company website \
        and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.
        Include details of company culture, customers and careers/jobs if you have the information.
    """
    return brochure_system_prompt

def get_brochure_user_prompt(url):
    user_prompt = f"You are looking at the website of a company: {url}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_details_for_brochure(url)
    user_prompt = user_prompt[:15000] # Truncate if more than 15,000 characters
    return user_prompt

def get_translation_system_prompt(target_language):
    translation_system_prompt = f"You are a professional translator specializing in business and marketing content. \
    Translate the provided brochure to {target_language} while maintaining all formatting and professional tone."
    return translation_system_prompt

def get_translation_user_prompt(original_brochure, target_language):
    translation_prompt = f"""
    You are a professional translator. Please translate the following brochure content to {target_language}.
    
    Important guidelines:
    - Maintain the markdown formatting exactly as it appears
    - Keep all headers, bullet points, and structure intact
    - Translate the content naturally and professionally
    - Preserve any company names, product names, or proper nouns unless they have established translations
    - Maintain the professional tone and marketing style
    
    Brochure content to translate:
    {original_brochure}
    """
    return translation_prompt

print("✅ AI prompt functions loaded!")
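`get_brochure_user_prompt` slices the prompt at exactly 15,000 characters, which can end it mid-word or mid-line. A small alternative, assuming line breaks are acceptable cut points (`Website` already separates text blocks with `separator="\n"`); `truncate_at_boundary` is a hypothetical helper.

```python
def truncate_at_boundary(text, limit=15_000):
    """Cut text to at most `limit` characters, preferring to end at a line break."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last newline so a sentence or link is not split mid-way
    newline = cut.rfind("\n")
    # Fall back to a hard cut when the last newline would discard more than half the budget
    return cut[:newline] if newline > limit // 2 else cut

sample = "Landing page:\n" + "a" * 20 + "\nAbout page:\n" + "b" * 20
short = truncate_at_boundary(sample, limit=40)  # keeps the landing page, drops the partial "About" line
```

In the prompt function this would replace `user_prompt[:15000]` with `truncate_at_boundary(user_prompt, 15_000)`.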


In [None]:
# Core Brochure Generation Functions
def get_links(url, website=None):
    """Ask the model which links on the page are relevant for a brochure"""
    # Reuse an already-fetched page when one is passed in, instead of downloading it again
    website = website or Website(url)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": get_links_system_prompt()},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    print("🔗 Found relevant links:", result)
    return json.loads(result)

def get_details_for_brochure(url):
    """Collect the landing page plus the relevant pages the model selected"""
    website = Website(url)
    result = "Landing page:\n"
    result += website.get_contents()
    links = get_links(url, website)
    print("📄 Analyzing additional pages...")
    for link in links.get("links", []):
        try:
            page = Website(link["url"])
        except Exception as e:
            # Skip pages that fail to load instead of aborting the whole brochure
            print(f"⚠️  Skipping {link.get('url')}: {e}")
            continue
        result += f"\n\n{link.get('type', 'page')}\n"
        result += page.get_contents()
    return result

def create_brochure(url):
    """Create a brochure from a website URL"""
    print("🤖 Generating brochure with AI...")
    # The shared client is enough here; the site itself is scraped inside get_brochure_user_prompt
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": get_brochure_system_prompt()},
            {"role": "user", "content": get_brochure_user_prompt(url)}
        ]
    )
    result = response.choices[0].message.content
    display_content(result, is_markdown=True)
    return result

def stream_brochure(url):
    """Create a brochure with streaming output"""
    print("🤖 Generating brochure with streaming output...")
    # The shared client is enough here; the site itself is scraped inside get_brochure_user_prompt
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": get_brochure_system_prompt()},
            {"role": "user", "content": get_brochure_user_prompt(url)}
        ],
        stream=True
    )
    
    # Use the reusable streaming utility function
    result = stream_content(response, "Generating brochure")
    return result

print("✅ Core brochure generation functions loaded!")
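`get_links` passes the model's reply straight to `json.loads`, and `get_details_for_brochure` then reads `type` and `url` from each entry. JSON mode (`response_format={"type": "json_object"}`) makes the reply parse as JSON but does not enforce the `{"links": [...]}` shape from the system prompt. A defensive parser, sketched under the assumption that entries without an absolute http(s) URL should be dropped; `parse_links_response` is hypothetical.

```python
import json

def parse_links_response(raw):
    """Parse the model's JSON reply into a list of {"type", "url"} dicts."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Treat a malformed or missing reply as "no extra pages" instead of crashing
        return []
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return []
    cleaned = []
    for item in links:
        if not isinstance(item, dict):
            continue
        url, page_type = item.get("url"), item.get("type")
        # Keep entries that have a page type and an absolute http(s) URL
        if page_type and isinstance(url, str) and url.startswith(("http://", "https://")):
            cleaned.append({"type": str(page_type), "url": url})
    return cleaned

reply = ('{"links": [{"type": "about page", "url": "https://acme.com/about"}, '
         '{"type": "blog"}, {"type": "careers page", "url": "/careers"}]}')
print(parse_links_response(reply))
```

`get_links` could then `return {"links": parse_links_response(result)}`, so its callers keep the same dictionary shape.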


In [None]:
# Translation Functions
def translate_brochure(url, target_language="Spanish", stream_mode=False):
    """
    Generate a brochure and translate it to the target language
    
    Args:
        url (str): The website URL to generate brochure from
        target_language (str): The target language for translation (default: "Spanish")
        stream_mode (bool): Whether to use streaming output (default: False)
    
    Returns:
        str: Translated brochure content
    """
    # First generate the original brochure
    print(f"🌍 Generating brochure and translating to {target_language}...")
    original_brochure = create_brochure(url)
    
    # Get translation prompts
    translation_system_prompt = get_translation_system_prompt(target_language)
    translation_user_prompt = get_translation_user_prompt(original_brochure, target_language)
    
    # Reuse the shared client; there is no need to fetch the site again just to get one
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": translation_system_prompt},
            {"role": "user", "content": translation_user_prompt}
        ],
        stream=stream_mode
    )

    if stream_mode:
        # Use the reusable streaming utility function
        translated_brochure = stream_content(response, f"Translating brochure to {target_language}")
    else:
        translated_brochure = response.choices[0].message.content
        # Display the translated content
        display_content(translated_brochure, is_markdown=True)
    
    return translated_brochure

print("✅ Translation functions loaded!")


## Interactive Examples

Now let's try generating brochures for some example websites. You can run these cells to see the brochure generator in action!


In [None]:
# Example 1: Generate a brochure for a sample website
# You can change this URL to any website you want to analyze

sample_url = "https://openai.com"  # Change this to any website you want to analyze

print(f"🚀 Generating brochure for: {sample_url}")
print("=" * 60)

# Generate the brochure
brochure = create_brochure(sample_url)


In [None]:
# Example 2: Generate a brochure with streaming output
# This shows the brochure being generated in real-time

streaming_url = "https://anthropic.com"  # Change this to any website you want to analyze

print(f"🚀 Generating brochure with streaming for: {streaming_url}")
print("=" * 60)

# Generate the brochure with streaming
streaming_brochure = stream_brochure(streaming_url)


In [None]:
# Example 3: Generate and translate a brochure
# This creates a brochure and then translates it to another language

translation_url = "https://huggingface.co"  # Change this to any website you want to analyze
target_language = "Spanish"  # Change this to any language you want

print(f"🚀 Generating and translating brochure for: {translation_url}")
print(f"🌍 Target language: {target_language}")
print("=" * 60)

# Generate and translate the brochure
translated_brochure = translate_brochure(translation_url, target_language, stream_mode=False)


## Interactive Widget Interface

Use the widgets below to interactively generate brochures for any website!


In [None]:
# Interactive Widget Interface
import ipywidgets as widgets
from IPython.display import display, clear_output

# Create widgets
url_input = widgets.Text(
    value='https://openai.com',
    placeholder='Enter website URL (e.g., https://example.com)',
    description='Website URL:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='500px')
)

language_dropdown = widgets.Dropdown(
    options=['English', 'Spanish', 'French', 'German', 'Chinese', 'Japanese', 'Portuguese', 'Italian'],
    value='English',
    description='Language:',
    style={'description_width': 'initial'}
)

stream_checkbox = widgets.Checkbox(
    value=False,
    description='Use streaming output',
    style={'description_width': 'initial'}
)

translate_checkbox = widgets.Checkbox(
    value=False,
    description='Translate brochure',
    style={'description_width': 'initial'}
)

generate_button = widgets.Button(
    description='Generate Brochure',
    button_style='success',
    icon='rocket'
)

output_area = widgets.Output()

def on_generate_clicked(b):
    with output_area:
        clear_output(wait=True)
        url = url_input.value.strip()
        
        if not url:
            print("❌ Please enter a valid URL")
            return
            
        if not url.startswith(('http://', 'https://')):
            url = 'https://' + url
            
        print(f"🚀 Generating brochure for: {url}")
        print("=" * 60)
        
        try:
            if translate_checkbox.value:
                # Generate and translate
                result = translate_brochure(url, language_dropdown.value, stream_mode=stream_checkbox.value)
            else:
                # Generate only
                if stream_checkbox.value:
                    result = stream_brochure(url)
                else:
                    result = create_brochure(url)
            
            print("\n✅ Brochure generation completed!")
            
        except Exception as e:
            print(f"❌ Error generating brochure: {str(e)}")
            print("Please check your API key and internet connection.")

generate_button.on_click(on_generate_clicked)

# Display widgets
print("🎯 Interactive Brochure Generator")
print("Enter a website URL and click 'Generate Brochure' to create a professional brochure!")
print()

display(url_input)
display(widgets.HBox([language_dropdown, stream_checkbox, translate_checkbox]))
display(generate_button)
display(output_area)
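The click handler only rejects an empty string and prepends `https://`, so input such as `hello` still triggers a request to `https://hello`. A slightly stricter check, as a sketch; `normalize_url` and its "hostname must contain a dot (or be localhost)" rule are illustrative assumptions, not part of the notebook.

```python
from urllib.parse import urlparse

def normalize_url(raw):
    """Return a usable http(s) URL for user input, or None if it cannot be one."""
    url = raw.strip()
    if not url:
        return None
    # Same rule as the widget: assume https when no scheme is given
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    # Require a host that looks like a domain name (contains a dot) or is localhost
    host = urlparse(url).hostname or ""
    if "." not in host and host != "localhost":
        return None
    return url
```

Inside `on_generate_clicked`, `url = normalize_url(url_input.value)` followed by an `if url is None:` error message would replace the two separate checks.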


## Advanced Usage Examples

Here are some advanced examples showing different ways to use the brochure generator.


In [None]:
# Advanced Example 1: Analyze multiple websites and compare
websites_to_analyze = [
    "https://openai.com",
    "https://anthropic.com", 
    "https://huggingface.co"
]

print("🔍 Analyzing multiple websites...")
print("=" * 60)

brochures = {}
for url in websites_to_analyze:
    print(f"\n📊 Generating brochure for: {url}")
    try:
        brochure = create_brochure(url)
        brochures[url] = brochure
        print(f"✅ Successfully generated brochure for {url}")
    except Exception as e:
        print(f"❌ Failed to generate brochure for {url}: {str(e)}")
    
    print("-" * 40)

print(f"\n🎉 Generated {len(brochures)} brochures successfully!")


In [None]:
# Advanced Example 2: Generate brochures in multiple languages
target_website = "https://openai.com"  # Change this to any website
languages = ["Spanish", "French", "German", "Chinese"]

print(f"🌍 Generating brochures in multiple languages for: {target_website}")
print("=" * 60)

multilingual_brochures = {}
for language in languages:
    print(f"\n🔄 Translating to {language}...")
    try:
        translated_brochure = translate_brochure(target_website, language, stream_mode=False)
        multilingual_brochures[language] = translated_brochure
        print(f"✅ Successfully translated to {language}")
    except Exception as e:
        print(f"❌ Failed to translate to {language}: {str(e)}")
    
    print("-" * 40)

print(f"\n🎉 Generated brochures in {len(multilingual_brochures)} languages!")
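Each `translate_brochure` call above regenerates the original brochure (re-scraping the site and its linked pages) before translating, so four languages cost four full generations. A sketch that generates the original once and reuses it; the `translate(text, language)` callable is an assumption that you would implement with `get_translation_system_prompt`, `get_translation_user_prompt` and the shared `client`.

```python
def translate_many(original, languages, translate):
    """Translate one already-generated brochure into several languages.

    `translate(text, language)` is a hypothetical callable, e.g. a wrapper
    around client.chat.completions.create using the translation prompts.
    """
    results = {}
    for language in languages:
        try:
            results[language] = translate(original, language)
        except Exception as e:
            # One failed language should not abort the others
            print(f"❌ Failed to translate to {language}: {e}")
    return results

def fake_translate(text, language):
    """Stand-in translator so the sketch runs without an API key."""
    return f"[{language}] {text}"

print(translate_many("# Acme", ["Spanish", "French"], fake_translate))
```

In the notebook, `original = create_brochure(target_website)` would supply the first argument, leaving one generation plus one translation call per language.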


## Custom Functions

Create your own custom functions for specific use cases.


In [None]:
# Custom Function: Save brochure to file
def save_brochure_to_file(brochure_content, filename, url):
    """Save brochure content to a markdown file"""
    try:
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(f"# Brochure for {url}\n\n")
            f.write(f"Generated on: {time.strftime('%Y-%m-%d %H:%M:%S')}\n\n")
            f.write("---\n\n")
            f.write(brochure_content)
        print(f"✅ Brochure saved to: {filename}")
        return True
    except Exception as e:
        print(f"❌ Error saving brochure: {str(e)}")
        return False

# Custom Function: Generate brochure with custom analysis
def generate_custom_brochure(url, focus_areas=None):
    """Generate a brochure with focus on specific areas"""
    if focus_areas is None:
        focus_areas = ["company overview", "products", "culture", "careers"]
    
    # Custom system prompt with focus areas
    custom_system_prompt = f"""
    You are an assistant that analyzes website content and creates a professional brochure.
    Focus specifically on these areas: {', '.join(focus_areas)}.
    Create a markdown brochure that emphasizes these aspects for prospective customers, investors and recruits.
    """
    
    # The shared client is enough here; the site itself is scraped inside get_brochure_user_prompt
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": custom_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(url)}
        ]
    )
    
    result = response.choices[0].message.content
    display_content(result, is_markdown=True)
    return result

# Custom Function: Quick website analysis
def quick_website_analysis(url):
    """Perform a quick analysis of a website without generating full brochure"""
    website = Website(url)
    
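    # Build the markdown without leading indentation (indented lines would render as a code block)
    sample_links = "\n".join(f"- {link}" for link in website.links[:10])
    analysis = (
        f"# Quick Website Analysis: {url}\n\n"
        f"**Title:** {website.title}\n\n"
        f"**Total Links Found:** {len(website.links)}\n\n"
        f"**Content Length:** {len(website.text)} characters\n\n"
        f"## Sample Content (first 500 characters)\n\n"
        f"{website.text[:500]}...\n\n"
        f"## First 10 Links\n\n"
        f"{sample_links}\n"
    )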
    
    display_content(analysis, is_markdown=True)
    return analysis

print("✅ Custom functions loaded!")


## Usage Examples with Custom Functions

Try these examples with the custom functions we just created.


In [None]:
# Example: Quick website analysis
test_url = "https://openai.com"  # Change this to any website

print("🔍 Performing quick website analysis...")
print("=" * 50)

quick_analysis = quick_website_analysis(test_url)


In [None]:
# Example: Generate custom brochure with specific focus
custom_url = "https://anthropic.com"  # Change this to any website
focus_areas = ["AI safety", "research", "products", "team"]  # Custom focus areas

print("🎯 Generating custom brochure with specific focus...")
print(f"Focus areas: {', '.join(focus_areas)}")
print("=" * 50)

custom_brochure = generate_custom_brochure(custom_url, focus_areas)


In [None]:
# Example: Generate brochure and save to file
save_url = "https://huggingface.co"  # Change this to any website

print("💾 Generating brochure and saving to file...")
print("=" * 50)

# Generate brochure
brochure_content = create_brochure(save_url)

# Save to file
filename = f"brochure_{save_url.replace('https://', '').replace('/', '_')}.md"
save_success = save_brochure_to_file(brochure_content, filename, save_url)

if save_success:
    print(f"📁 You can find the saved brochure in: {filename}")
else:
    print("❌ Failed to save brochure to file")


## Troubleshooting and Tips

### Common Issues and Solutions

1. **API Key Issues**
   - Make sure your OpenAI API key is set in the `.env` file
   - Verify your API key has sufficient credits
   - Check that the key starts with `sk-` (project keys start with `sk-proj-`)

2. **Website Scraping Issues**
   - Some websites may block automated requests
   - Try different websites if one fails
   - The tool sends a browser-like User-Agent header to avoid basic bot blocking

3. **Large Pages**
   - Pages with a lot of text produce very large prompts
   - The brochure prompt is truncated to 15,000 characters to keep token usage and cost bounded

4. **Rate Limiting**
   - OpenAI has rate limits on API calls
   - If you hit limits, wait a few minutes before trying again
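Instead of retrying by hand, calls can be wrapped in exponential backoff. A minimal sketch; `retry_with_backoff` is a hypothetical helper, and in the notebook you would pass `retry_on=(openai.RateLimitError,)` (after `import openai`). The `openai` client also retries some failures on its own, and `OpenAI(max_retries=...)` adjusts how many times.

```python
import random
import time

def retry_with_backoff(fn, retry_on=(Exception,), max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exception types with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... seconds plus up to 1s of random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
```

Usage in the notebook might look like `retry_with_backoff(lambda: create_brochure(url), retry_on=(openai.RateLimitError,))`. The injectable `sleep` argument exists only so the helper can be tested without real waiting.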

### Tips for Better Results

1. **Choose Good Websites**
   - Websites with clear About, Products, and Careers pages work best
   - Avoid websites that are mostly images or that render their content with JavaScript; `requests` does not run scripts, so such pages come back nearly empty

2. **Use Streaming for Long Content**
   - Enable streaming for better user experience with long brochures
   - Streaming shows progress in real-time

3. **Custom Focus Areas**
   - Use the custom brochure function to focus on specific aspects
   - This can help generate more targeted content

4. **Save Your Work**
   - Use the save function to keep brochures for later reference
   - Files are saved in markdown format for easy editing


## Conclusion

This Jupyter notebook provides a comprehensive interface for the Website Brochure Generator. You can:

- ✅ Generate professional brochures from any website
- ✅ Translate brochures to multiple languages
- ✅ Use interactive widgets for easy operation
- ✅ Save brochures to files for later use
- ✅ Perform quick website analysis
- ✅ Create custom brochures with specific focus areas
- ✅ Generate brochures with streaming output for real-time feedback

### Next Steps

1. **Try the Interactive Widget**: Use the widget interface above to generate brochures for your favorite websites
2. **Experiment with Different URLs**: Test the tool with various types of websites
3. **Explore Translation Features**: Generate brochures in different languages
4. **Save Your Work**: Use the save function to keep your generated brochures
5. **Customize Focus Areas**: Create brochures tailored to specific aspects of companies

### Support

For issues and questions:
- Check the troubleshooting section above
- Verify your OpenAI API key is properly configured
- Ensure you have a stable internet connection
- Try different websites if one fails

Happy brochure generating! 🚀
