# Department Brochure Generator

## Overview
This notebook creates professional brochures for the Math and Computer Science departments at the University of Maryland (UMD). The brochures are aimed at prospective students and faculty members interested in joining a department, and help them explore its academic programs, faculty research areas, departmental activities, and career opportunities.


In [None]:
import sys
import os
import json

# Add the parent directory to the import path so the local `scraper` module can be imported
sys.path.insert(0, os.path.abspath('..'))

from IPython.display import Markdown, display, update_display
from dotenv import load_dotenv
from scraper import fetch_website_links, fetch_website_contents
from openai import OpenAI

In [None]:
# Load environment variables and configure OpenAI client
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

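# Heuristic format check only: OpenAI secret keys start with "sk-" (project keys with "sk-proj-")
if not api_key:
    print("No API key was found. Please check that OPENAI_API_KEY is set in your .env file.")
elif api_key.startswith('sk-') and len(api_key) > 10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key. Please visit the troubleshooting notebook!")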

# Set model and initialize OpenAI client
MODEL = 'gpt-5-nano'
openai = OpenAI()

## Example: UMD Math Department

This section demonstrates how to scrape the UMD Math department website and extract relevant links for brochure generation.

In [None]:
# Fetch all links from the UMD Math department homepage
links = fetch_website_links("https://www-math.umd.edu")
links

As we can see, some links are relative paths rather than full URLs, so we write a new function that converts them into absolute URLs.

In [None]:
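from urllib.parse import urljoin

def new_fetch_website_links(url):
    # Resolve relative links ("/people", "people", "../x") against the page URL;
    # absolute URLs are returned unchanged
    links = fetch_website_links(url)
    return [urljoin(url, link) for link in links]

new_fetch_website_links("https://www-math.umd.edu")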
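
Prefixing the base URL to every link that does not start with `http` only works for hrefs that begin with `/`. The offline comparison below uses the standard library's `urllib.parse.urljoin` to show the difference; the sample hrefs are made up for illustration and are not taken from the UMD site.

```python
from urllib.parse import urljoin

base = "https://www-math.umd.edu"

# Hypothetical hrefs of the kinds a department homepage might contain
hrefs = ["/people/faculty.html", "graduate", "https://www.umd.edu", "mailto:info@example.edu"]

for href in hrefs:
    # Simple concatenation: prepend the base URL to anything not starting with "http"
    naive = href if href.startswith("http") else base + href
    # urljoin resolves relative paths and leaves absolute URLs (and other schemes) untouched
    resolved = urljoin(base, href)
    print(f"{href}\n  naive:   {naive}\n  urljoin: {resolved}")
```

Under concatenation, `graduate` becomes `https://www-math.umd.edugraduate` and the `mailto:` link is glued onto the domain, while `urljoin` yields `https://www-math.umd.edu/graduate` and passes `mailto:info@example.edu` through unchanged.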

Next, we write a system prompt and a user prompt, and use them to have the model pick the links that are relevant for the brochure.

In [None]:
link_system_prompt = """
You are provided with a list of links found on a webpage.
Decide which links are most relevant to include in a department brochure.

Include links related to:
- Department overview
- Academic programs and requirements
- Faculty research (names and research areas)
- Student opportunities
- Career outcomes
- Contact information

Exclude links to:
- External sites (social media, third-party services)
- Privacy policies and legal pages
- Administrative forms

Respond in JSON format as shown in this example:
{
    "links": [
        {"type": "faculty page", "url": "https://full.url/goes/here/faculty"},
        {"type": "students page", "url": "https://another.full.url/students"}
    ]
}
"""

# Build the user prompt: the page URL followed by every link found on it (as full URLs)
def get_links_user_prompt(url):
    user_prompt = f"""
Here is the list of links found on the webpage {url}.
Please identify the relevant links for the department brochure.

Links:
"""
    links = new_fetch_website_links(url)
    user_prompt += "\n".join(links)
    return user_prompt

# Use the model to find the relevant links
def select_relevant_links(url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(url)}
        ],
        response_format={"type": "json_object"}  
    )
    
    result = response.choices[0].message.content
    links = json.loads(result)
    return links

# Test the function with UMD Math department
select_relevant_links("https://www-math.umd.edu")
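
The system prompt asks the model to exclude external sites, but nothing enforces it, and JSON mode guarantees valid JSON rather than the exact `{"links": [...]}` shape. A minimal post-processing sketch, assuming that shape, could drop malformed entries and off-site URLs before any page is fetched. `parse_link_selection` is a hypothetical helper (not part of `scraper` or the OpenAI SDK), the exact-host match is a design assumption (links to `www.umd.edu` would also be dropped), and the sample reply is invented.

```python
import json
from urllib.parse import urlparse


def parse_link_selection(raw_json, allowed_host):
    """Return the well-formed {"type", "url"} entries whose URL is on `allowed_host`.

    Hypothetical helper: assumes the {"links": [...]} shape requested in the
    system prompt and silently drops anything that does not match it.
    """
    try:
        data = json.loads(raw_json)
    except (TypeError, json.JSONDecodeError):
        return []  # None or non-JSON reply
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        return []

    kept = []
    for item in links:
        if not isinstance(item, dict) or not isinstance(item.get("url"), str):
            continue
        parsed = urlparse(item["url"])
        # Only http(s) links on the department's own host survive
        if parsed.scheme in ("http", "https") and parsed.hostname == allowed_host:
            kept.append({"type": item.get("type", "page"), "url": item["url"]})
    return kept


# A made-up reply, not real model output
reply = """{"links": [
    {"type": "faculty page", "url": "https://www-math.umd.edu/people/faculty.html"},
    {"type": "social media", "url": "https://twitter.com/example"},
    {"type": "broken entry"}
]}"""
print(parse_link_selection(reply, "www-math.umd.edu"))
```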

## Brochure Generation

This section implements the complete brochure generation pipeline:
1. Fetch content from the department's main page and its relevant links
2. Use the model to synthesize that information into a professional brochure
3. Stream the output in real time for a better user experience
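
The first step, collecting the landing page and the selected pages into one text, can be tried offline. A minimal sketch, assuming the fetch and link-selection functions are passed in as parameters (a hypothetical design, not how the notebook's functions are wired), shows how the pages might be combined into one markdown string while skipping pages that fail to fetch. `assemble_pages`, `fake_fetch`, `fake_select_links`, and the `example.edu` data are all invented for illustration.

```python
def assemble_pages(url, fetch_contents, select_links):
    """Join the landing page and each selected page into one markdown string.

    The fetch and link-selection functions are passed in (hypothetical design) so
    this logic can be tried offline; pages that fail to fetch are skipped.
    """
    sections = [f"## Landing Page:\n\n{fetch_contents(url)}", "## Relevant Links:"]
    for link in select_links(url).get("links", []):
        try:
            body = fetch_contents(link["url"])
        except Exception:
            continue  # one unreachable page should not abort the whole brochure
        sections.append(f"### Link: {link['type']}\n\n{body}")
    return "\n\n".join(sections)


# Stand-ins for fetch_website_contents and select_relevant_links (made-up data)
fake_pages = {
    "https://example.edu": "Welcome to the department.",
    "https://example.edu/faculty": "Prof. A works on topology.",
}


def fake_fetch(url):
    return fake_pages[url]  # raises KeyError for unknown pages, like a failed fetch


def fake_select_links(url):
    return {"links": [
        {"type": "faculty page", "url": "https://example.edu/faculty"},
        {"type": "missing page", "url": "https://example.edu/404"},
    ]}


combined = assemble_pages("https://example.edu", fake_fetch, fake_select_links)
print(combined)
```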

In [None]:
# Fetch the landing page plus the contents of every relevant link
def fetch_page_and_all_relevant_links(url):
    contents = fetch_website_contents(url)
    relevant_links = select_relevant_links(url)
    result = f"## Landing Page:\n\n{contents}\n## Relevant Links:\n"
    # Use .get() so a reply without a "links" key adds no pages instead of raising KeyError
    for link in relevant_links.get('links', []):
        result += f"\n\n### Link: {link['type']}\n"
        result += fetch_website_contents(link["url"])
    return result

# System prompt for brochure generation
brochure_system_prompt = """
You are an assistant that analyzes the contents of several relevant pages from a department website
and creates a short, professional brochure about the department for prospective students, faculty, and staff.

Respond in markdown format without code blocks.

Include the following sections:
- Academic programs and program requirements
- Faculty research (including research areas + names of faculty members doing research in those areas)
- Student opportunities (clubs, research programs, internships)
- Career outcomes for graduates
- Contact information

Please use reasonable formatting, such as bolding and bullet points.
"""


def get_brochure_user_prompt(department_name, url):
    user_prompt = f"""
You are looking at a department called: {department_name}

Here are the contents of its landing page and other relevant pages.
Use this information to build a short brochure of the department in markdown format (without code blocks).

"""
    user_prompt += fetch_page_and_all_relevant_links(url)
    # Cap the prompt at 5,000 characters to keep the request small;
    # anything past this point (often the later linked pages) is dropped
    user_prompt = user_prompt[:5_000]
    return user_prompt

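def create_brochure(department_name, url):
    # Request a streamed completion so the brochure appears as it is generated
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(department_name, url)}
        ],
        stream=True
    )
    response = ""
    # Create an empty Markdown output and keep its handle so it can be updated in place
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        # delta.content can be None (e.g. on the final chunk), hence the fallback to ''
        response += chunk.choices[0].delta.content or ''
        update_display(Markdown(response), display_id=display_handle.display_id)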
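
The 5,000-character slice in `get_brochure_user_prompt` keeps whatever comes first. As a worked example with made-up page sizes and ignoring the instruction preamble: a 4,500-character landing page joined to the next page by a two-character separator leaves 5,000 - 4,500 - 2 = 498 characters for every other page combined. One alternative, sketched below under that simplification, gives each page an equal share of the budget; `fit_sections` is a hypothetical helper and does not redistribute the unused budget of short pages.

```python
def fit_sections(sections, max_chars, separator="\n\n"):
    """Trim every section to an equal share of `max_chars` before joining them.

    Unlike slicing the joined prompt, each page keeps its opening text. Unused
    budget from short sections is not redistributed (kept simple on purpose).
    """
    if not sections:
        return ""
    usable = max_chars - len(separator) * (len(sections) - 1)
    share = max(usable // len(sections), 0)
    return separator.join(section[:share] for section in sections)


# Made-up page sizes, in characters
landing, faculty, programs = "L" * 4_500, "F" * 3_000, "P" * 2_000

naive = "\n\n".join([landing, faculty, programs])[:5_000]
fitted = fit_sections([landing, faculty, programs], 5_000)

print(naive.count("F"), naive.count("P"))                       # 498 0
print(fitted.count("L"), fitted.count("F"), fitted.count("P"))  # 1665 1665 1665
print(len(fitted))                                              # 4999
```

With the plain slice, the third page disappears entirely; with equal shares, each page contributes its first 1,665 characters.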

In [None]:
create_brochure("UMD Math Department", "https://www-math.umd.edu")
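
The streaming loop in `create_brochure` relies on each chunk exposing `choices[0].delta.content`, which may be `None` (typically on the final chunk). The sketch below replays that accumulation offline, with `SimpleNamespace` objects standing in for real chunks. `accumulate_stream` and `fake_chunk` are hypothetical names, and the empty-`choices` guard is a defensive assumption (for example, for a usage-only final chunk) rather than something the notebook needs as written.

```python
from types import SimpleNamespace


def accumulate_stream(chunks):
    """Concatenate streamed text deltas into one string, as the brochure loop does."""
    response = ""
    for chunk in chunks:
        if not chunk.choices:  # defensive: skip chunks that carry no choices
            continue
        # delta.content may be None, so fall back to an empty string
        response += chunk.choices[0].delta.content or ""
    return response


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk with the same attribute layout."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


chunks = [fake_chunk("# UMD "), fake_chunk("Math"), fake_chunk(None), SimpleNamespace(choices=[])]
print(repr(accumulate_stream(chunks)))  # '# UMD Math'
```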

Now let's generate a brochure for the UMD Computer Science department.

In [None]:
create_brochure("UMD CS Department", "https://www.cs.umd.edu")