Import the Libraries

In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

Initialize the Constants

In [2]:
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key[:8]=='sk-proj-':
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


Getting all the links of a single website using the Website Class

In [3]:
class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        # A timeout keeps the notebook from hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [5]:
web = Website("https://lalgbtcenter.org/")
web.links

['#MainContent',
 '/events/',
 '/about/locations',
 '/?s=',
 'https://lalgbtcenter.org',
 'https://www.instagram.com/lalgbtcenter/',
 'https://www.facebook.com/lalgbtcenter/',
 'https://twitter.com/LALGBTCenter/',
 'https://lalgbtcenter.followmyhealth.com/',
 'https://volunteer.lalgbtcenter.org/',
 'https://seniors.lalgbtcenter.org/#/login',
 'https://translounge.org/trans-lounge-membership/',
 'https://lalgbtcenter.org/',
 'https://lalgbtcenter.org/es/',
 'https://donate.lalgbtcenter.org/donate/',
 'https://lalgbtcenter.org/services/',
 '#',
 'https://lalgbtcenter.org/our-services/',
 'https://lalgbtcenter.org/services/youth-services/',
 '#',
 'https://lalgbtcenter.org/services/youth-services/',
 'https://lalgbtcenter.org/services/youth-services/housing/',
 'https://lalgbtcenter.org/services/youth-services/drop-in-center/',
 'https://lalgbtcenter.org/services/youth-services/education/',
 'https://lalgbtcenter.org/services/youth-services/employment-assistance/',
 'https://lalgbtcenter.
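
The list above mixes absolute URLs, site-relative paths such as `/events/`, and in-page anchors such as `#`. A minimal sketch of cleaning such a list before it goes into a prompt, assuming a hypothetical helper `normalize_links` (not part of this notebook) built on the standard library's `urllib.parse`; the `mailto:` address in the sample is made up for illustration:

```python
from urllib.parse import urljoin, urldefrag


def normalize_links(base_url, links):
    """Hypothetical helper: resolve relative links against base_url,
    skip in-page anchors and non-http(s) schemes, and drop duplicates
    while keeping the original order."""
    seen = set()
    result = []
    for link in links:
        if link.startswith("#"):
            continue  # in-page anchor, points back to the same page
        # Resolve relative paths, then strip any trailing #fragment
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        if not absolute.startswith(("http://", "https://")):
            continue  # e.g. mailto: or tel: links
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


sample = [
    "#MainContent",
    "/events/",
    "https://lalgbtcenter.org/",
    "mailto:info@example.org",  # made-up address for illustration
    "/events/",
]
print(normalize_links("https://lalgbtcenter.org/", sample))
```

Resolving links up front would mean the model no longer has to guess the full URL for relative paths, at the cost of one extra pass over the list.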

GPT-4o-mini System Prompt to filter the most relevant links for the brochure

In [6]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

GPT-4o-mini User Prompt to filter the links

In [7]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

Get only the useful links for the brochure

In [8]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [10]:
get_links('https://lalgbtcenter.org/')

{'links': [{'type': 'about page', 'url': 'https://lalgbtcenter.org/about/'},
  {'type': 'careers page', 'url': 'https://lalgbtcenter.org/about/careers/'},
  {'type': 'services page', 'url': 'https://lalgbtcenter.org/services/'},
  {'type': 'programs page', 'url': 'https://lalgbtcenter.org/our-programs/'}]}
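
`get_links` trusts that the model's reply is valid JSON with the expected `links` list. A minimal sketch of a more defensive parse, assuming a hypothetical helper `parse_links_response` (not part of this notebook) that falls back to an empty list and keeps only entries carrying a string `type` and a full `https://` `url`:

```python
import json


def parse_links_response(raw):
    """Hypothetical helper: parse the model's JSON reply and keep only
    well-formed link entries; return {"links": []} on any parse problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return {"links": []}
    valid = [
        {"type": entry["type"], "url": entry["url"]}
        for entry in entries
        if isinstance(entry, dict)
        and isinstance(entry.get("type"), str)
        and isinstance(entry.get("url"), str)
        and entry["url"].startswith("https://")
    ]
    return {"links": valid}


reply = (
    '{"links": [{"type": "about page", "url": "https://lalgbtcenter.org/about/"}, '
    '{"type": "broken"}, {"type": "relative", "url": "/events/"}]}'
)
print(parse_links_response(reply))
```

With a helper like this, the last line of `get_links` could return `parse_links_response(result)` instead of `json.loads(result)`, so a malformed reply yields no extra pages rather than an exception.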

Define the get_all_details function to fetch the contents of the most important pages on the website

In [11]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [13]:
get_all_details('https://lalgbtcenter.org/')

Found links: {'links': [{'type': 'about page', 'url': 'https://lalgbtcenter.org/about/'}, {'type': 'careers page', 'url': 'https://lalgbtcenter.org/about/careers/'}, {'type': 'services page', 'url': 'https://lalgbtcenter.org/services/'}, {'type': 'programs page', 'url': 'https://lalgbtcenter.org/our-programs/'}, {'type': 'get involved page', 'url': 'https://lalgbtcenter.org/get-involved/'}]}


"Landing page:\nWebpage Title:\nLos Angeles LGBT Center - A Safe Welcoming Space for LGBTQ+\nWebpage Contents:\nskip to main content\nCalendar\nLocations\nSearch\nPortal\nPatient\nVolunteer\nSenior\nTrans*Lounge\nen\nes\nDonate\nServices\nView All Services\nYouth Services\nView All Youth Services\nHousing\nDrop-In Center\nEducation\nEmployment Assistance\nLifeworks Mentorship Program\nFoster Youth Services (RISE)\nSenior Services\nView All Senior Services\nHousing\nCase Management\nFood & Nutrition Services\nEmployment Assistance\nActivities\nTransgender Services\nView All Transgender Services\nTransgender Wellness Center\nTrans* Lounge\nEmployment Assistance\nGroup Therapy\nTGI/ENBY+ Resource Index\nSurvivor Services\nView All Survivor Services\nHate Crimes and Police Misconduct\nDomestic Violence, Sexual Assault, and Stalking\nMedical Services\nView All Medical Services\nPrimary Care\nHIV Care\nGender Affirming Care\nAudre Lorde Health Program\nInsurance Plans\nPharmacy\nMental Healt

GPT-4o-mini system prompt to get the most important information for the brochure

In [14]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

GPT-4o-mini user prompt to get the most relevant information for the brochure, truncated to 20,000 characters

In [15]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:20_000] # Truncate if more than 20,000 characters
    return user_prompt

In [17]:
get_brochure_user_prompt('Los Angeles LGBT Center', 'https://lalgbtcenter.org/')

Found links: {'links': [{'type': 'about page', 'url': 'https://lalgbtcenter.org/about/'}, {'type': 'careers page', 'url': 'https://lalgbtcenter.org/about/careers/'}, {'type': 'services page', 'url': 'https://lalgbtcenter.org/services/'}, {'type': 'our programs page', 'url': 'https://lalgbtcenter.org/our-programs/'}]}


"You are looking at a company called: Los Angeles LGBT Center\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nLos Angeles LGBT Center - A Safe Welcoming Space for LGBTQ+\nWebpage Contents:\nskip to main content\nCalendar\nLocations\nSearch\nPortal\nPatient\nVolunteer\nSenior\nTrans*Lounge\nen\nes\nDonate\nServices\nView All Services\nYouth Services\nView All Youth Services\nHousing\nDrop-In Center\nEducation\nEmployment Assistance\nLifeworks Mentorship Program\nFoster Youth Services (RISE)\nSenior Services\nView All Senior Services\nHousing\nCase Management\nFood & Nutrition Services\nEmployment Assistance\nActivities\nTransgender Services\nView All Transgender Services\nTransgender Wellness Center\nTrans* Lounge\nEmployment Assistance\nGroup Therapy\nTGI/ENBY+ Resource Index\nSurvivor Services\nView All Survivor Services\nHate Crimes and Police Misconduct\nDo
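
Slicing the whole prompt at 20,000 characters keeps the start of the text, so a long landing page can push later pages out entirely. A minimal sketch of one alternative, assuming a hypothetical helper `truncate_sections` (not part of this notebook) that gives every page an equal share of the character budget:

```python
def truncate_sections(sections, max_chars=20_000):
    """Hypothetical helper: sections is a list of (title, text) pairs.
    Each section gets an equal share of max_chars, and the joined
    result is capped at max_chars overall."""
    if not sections:
        return ""
    share = max_chars // len(sections)
    parts = []
    for title, text in sections:
        block = f"{title}\n{text}"
        parts.append(block[:share])  # cut each page to its own share
    # The separators add a few characters, so cap the total once more
    return "\n\n".join(parts)[:max_chars]


pages = [("Landing page", "a" * 50), ("about page", "b" * 50)]
print(truncate_sections(pages, max_chars=40))
```

An equal split is only one possible policy; weighting the landing page more heavily would be an equally reasonable choice.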

Define a function to stream the brochure from OpenAI

In [18]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
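
The `replace` calls in `stream_brochure` remove every occurrence of the word "markdown", including any that appear in the brochure text itself. A minimal sketch of a narrower cleanup, assuming a hypothetical helper `strip_outer_fence` (not part of this notebook) that removes only an opening code fence (optionally tagged `markdown`) and a closing one:

```python
import re

FENCE = "`" * 3  # three backticks, built here to keep this block easy to embed

# Opening fence at the very start, optionally tagged "markdown"
_OPEN = re.compile(r"\A\s*" + FENCE + r"(?:markdown)?[ \t]*\n?")
# Closing fence at the very end
_CLOSE = re.compile(r"\n?" + FENCE + r"\s*\Z")


def strip_outer_fence(text):
    """Hypothetical helper: drop a leading and trailing code fence,
    leaving the text in between untouched."""
    text = _OPEN.sub("", text)
    text = _CLOSE.sub("", text)
    return text


raw = FENCE + "markdown\n# Title\nWe write markdown docs.\n" + FENCE
print(strip_outer_fence(raw))
```

In the streaming loop, this would be applied to a copy used only for display, for example `update_display(Markdown(strip_outer_fence(response)), ...)`, while `response` keeps accumulating the raw chunks. A partially received fence tag may flash briefly until the next chunk arrives.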

In [19]:
stream_brochure('Los Angeles LGBT Center', 'https://lalgbtcenter.org/')

Found links: {'links': [{'type': 'about page', 'url': 'https://lalgbtcenter.org/about/'}, {'type': 'careers page', 'url': 'https://lalgbtcenter.org/about/careers/'}, {'type': 'services page', 'url': 'https://lalgbtcenter.org/services/'}, {'type': 'donate page', 'url': 'https://donate.lalgbtcenter.org/donate/'}, {'type': 'programs page', 'url': 'https://lalgbtcenter.org/our-programs/'}]}


# Los Angeles LGBT Center

---

## A Safe and Welcoming Space for LGBTQ+

Since its inception in 1969, the **Los Angeles LGBT Center** has emerged as the largest LGBTQ+ organization worldwide, providing invaluable support and services to the community. 

## Our Mission

We care for, champion, and celebrate LGBTQ individuals and families, ensuring they can thrive as healthy, equal, and complete members of society. Our diverse range of programs spans four primary areas:

- **Health**
- **Social Services and Housing**
- **Culture and Education**
- **Leadership and Advocacy**

## Our Vision

Founded by a group of passionate activists, we started as a humble shelter and have evolved into a multifaceted service provider. We aim to cultivate a world where everyone can bloom, shine, and live freely, regardless of their sexual orientation or gender identity.

## Our Services

The Center offers over **40 unique services**, including, but not limited to:

- **Youth Services**
  - Lifeworks Mentorship Program
  - Foster Youth Services
- **Transgender Services**
  - Transgender Wellness Center
  - Group Therapy
- **Medical Services**
  - Primary Care, including HIV and Gender Affirming Care
  - Mental Health & Psychiatry Services
- **Legal Services**
  - Legal Clinic & Lawyer Referral
  - Immigrant Legal Services
- **Housing Services**
  - Case Management and Shelter Resources

---

## Company Culture

### Our Values

- **Respect**: Embracing individuality and treating everyone with dignity.
- **Excellence**: Committing to high-quality services and support.
- **Inclusivity**: Fostering diverse representation and perspectives.
- **Innovation**: Pioneering programs to meet community needs.
- **Integrity**: Honor our mission through honest collaboration.

### Community Involvement

With **over 500,000 visits** annually and **800 dedicated staff members**, the Center stands tall as a beacon of hope and support. The community thrives on our engagement, which includes a wide array of social events and advocacy efforts.

---

## Careers

Join our dynamic team and contribute directly to positive change. The Los Angeles LGBT Center is continuously seeking passionate individuals who are committed to serving the LGBTQ+ community. Opportunities range from direct service roles to support functions across various departments.

### **Open Job Positions**
- Direct Service Providers
- Community Advocates
- Mental Health Professionals
- Administrative Support

---

## Get Involved

### Volunteer
Your time can make a difference! We offer various volunteer opportunities tailored to different interests and expertise.

### Donations
Support us in our mission to create a safe and welcoming environment. Your donations enable us to expand our reach and enhance our services.

---

For more information, visit our website or contact us!

**Together, we can create a world where LGBTQ+ people can thrive.**