<a href="https://colab.research.google.com/github/alexfazio/firecrawl-quickstart/blob/main/events-scout-examples/eventbrite.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Eventbrite AI Event Scout  
By [Alex Fazio](https://x.com/alxfazio/)  
GitHub repo: [social-crawler](https://github.com/alexfazio/social-crawler)

This Jupyter notebook demonstrates how to build an automated event discovery pipeline using Firecrawl's scraping and extraction capabilities combined with AI filtering. By the end of this tutorial, you'll be able to:

- Use Firecrawl to scrape event listings from Eventbrite by location
- Extract structured event data using Firecrawl's LLM extraction
- Implement AI-powered semantic filtering with OpenAI
- Set up automated Discord notifications for matching events

This workflow is designed for developers and community managers who want to automatically track relevant in-person events across multiple cities worldwide.

**Key Firecrawl Uses:**
- Location-based URL construction for event discovery
- Structured data extraction from event pages
- Schema validation for consistent event data formatting

**Practical Application:**
Automatically find and notify your team about AI-related meetups/conferences in specified cities, filtering out irrelevant events through semantic analysis.

**Requirements:**  
Before proceeding, ensure you have:  
- Firecrawl API key (for web scraping/extraction)  
- OpenAI API key (for AI filtering)  
- Discord webhook URL (for notifications)  

We'll be using the following packages:  
- firecrawl-py: For location-based scraping and structured data extraction  
- openai: For semantic event filtering  
- requests: For Discord webhook integration  


## Input API Keys

In [29]:
from getpass import getpass
openai_api_key = getpass("Enter your OpenAI API key: ")
firecrawl_api_key = getpass("Enter your Firecrawl API key: ")
DISCORD_WEBHOOK_URL = getpass("Enter your Discord webhook URL: ")

# Firecrawl Scrape

## Install Dependencies

In [30]:
%pip install requests python-dotenv openai firecrawl-py -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Imports

In [31]:
import os
import requests
import json
import pprint
from firecrawl import FirecrawlApp

## Define City of Interest

In [32]:
# Define hierarchical location structure
locations = {
    "Americas": {
        "United States": [
            'atlanta', 'austin', 'boston', 'chicago', 'dallas', 'denver',
            'houston', 'los-angeles', 'miami', 'nyc', 'philadelphia',
            'phoenix', 'portland', 'salt-lake-city', 'san-diego', 'sf',
            'seattle', 'washington-dc'
        ],
        "Canada": ['montreal', 'toronto', 'vancouver', 'waterloo'],
        "Mexico": ['mexico-city'],
        "Brazil": ['sao-paulo'],
        "Colombia": ['bogota'],
        "Argentina": ['buenos-aires']
    },
    "Europe": {
        "Italy": ['milan'],
        "France": ['paris'],
        "Germany": ['berlin', 'munich'],
        "UK": ['london'],
        "Spain": ['barcelona', 'madrid'],
        "Netherlands": ['amsterdam'],
        "Switzerland": ['zurich'],
        "Sweden": ['stockholm'],
        "Portugal": ['lisbon'],
        "Belgium": ['brussels'],
        "Denmark": ['copenhagen'],
        "Finland": ['helsinki'],
        "Turkey": ['istanbul']
    },
    "Asia": {
        "Japan": ['tokyo'],
        "India": ['bengaluru', 'mumbai', 'new-delhi'],
        "China": ['hong-kong'],
        "Singapore": ['singapore'],
        "South Korea": ['seoul'],
        "Thailand": ['bangkok'],
        "Vietnam": ['ho-chi-minh-city'],
        "Indonesia": ['jakarta'],
        "Malaysia": ['kuala-lumpur'],
        "Philippines": ['manila'],
        "Taiwan": ['taipei'],
        "Israel": ['tel-aviv'],
        "UAE": ['dubai']
    },
    "Africa": {
        "Nigeria": ['lagos'],
        "Kenya": ['nairobi'],
        "South Africa": ['capetown']
    },
    "Oceania": {
        "Australia": ['melbourne', 'sydney']
    }
}

# Region selection
print("Select a region:")
regions = list(locations.keys())
for idx, region in enumerate(regions, 1):
    print(f"{idx}. {region}")
    
while True:
    try:
        region_sel = int(input("\nEnter region number: ")) - 1
        selected_region = regions[region_sel]
        break
    except (ValueError, IndexError):
        print("Invalid input. Please enter a valid number from the list.")

# Nation selection
print(f"\nSelect a nation in {selected_region}:")
nations = list(locations[selected_region].keys())
for idx, nation in enumerate(nations, 1):
    print(f"{idx}. {nation}")

while True:
    try:
        nation_sel = int(input("\nEnter nation number: ")) - 1
        selected_nation = nations[nation_sel]
        break
    except (ValueError, IndexError):
        print("Invalid input. Please enter a valid number from the list.")

# City selection
print(f"\nSelect a city in {selected_nation}:")
cities = locations[selected_region][selected_nation]
for idx, city in enumerate(cities, 1):
    print(f"{idx}. {city.replace('-', ' ').title()}")

while True:
    try:
        city_sel = int(input("\nEnter city number: ")) - 1
        selected_city = cities[city_sel]
        break
    except (ValueError, IndexError):
        print("Invalid input. Please enter a valid number from the list.")

# Create city variable
city = f"{selected_nation.lower().replace(' ', '-')}--{selected_city}"
print(f"\nSelected location: {city}")

Select a region:
1. Americas
2. Europe
3. Asia
4. Africa
5. Oceania

Select a nation in Europe:
1. Italy
2. France
3. Germany
4. UK
5. Spain
6. Netherlands
7. Switzerland
8. Sweden
9. Portugal
10. Belgium
11. Denmark
12. Finland
13. Turkey

Select a city in Italy:
1. Milan

Selected location: italy--milan


## Define Timeframe

In [33]:
# Timeframe selection options
timeframe_options = [
    'events--today',
    'events--tomorrow',
    'events--this-weekend',
    'events--this-week',
    'events--next-week',
    'events--this-month',
    'events--next-month'
]

# Display timeframe choices
print("Select a timeframe for events:")
for idx, timeframe in enumerate(timeframe_options, 1):
    # Format display name with "events" prefix and title case
    display_name = "events " + timeframe.split('--')[1].replace('-', ' ').title()
    print(f"{idx}. {display_name}")

# Get valid user input
while True:
    try:
        selection = int(input("\nEnter timeframe number (1-7): "))
        if 1 <= selection <= len(timeframe_options):
            timeframe = timeframe_options[selection - 1]
            break
        else:
            print(f"Please enter a number between 1 and {len(timeframe_options)}")
    except ValueError:
        print("Invalid input. Please enter a number.")

# Display confirmation with consistent formatting
display_name = "events " + timeframe.split('--')[1].replace('-', ' ').title()
print(f"\nSelected timeframe: {display_name}")

Select a timeframe for events:
1. events Today
2. events Tomorrow
3. events This Weekend
4. events This Week
5. events Next Week
6. events This Month
7. events Next Month

Selected timeframe: events Today
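
The location and timeframe slugs chosen above are later joined into an Eventbrite search URL. A minimal sketch of that step as a reusable helper follows; `build_search_url` and `EVENTBRITE_BASE` are hypothetical names introduced here for illustration, and the URL pattern is simply the one this notebook uses in later cells.

```python
from urllib.parse import quote

# Base path used by the search URLs in this notebook
EVENTBRITE_BASE = "https://www.eventbrite.com/d"


def build_search_url(city: str, timeframe: str, page: int = 1) -> str:
    """Return the search URL for a location slug, a timeframe slug, and a page number."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # quote() leaves hyphens untouched and escapes any unexpected characters in a slug
    url = f"{EVENTBRITE_BASE}/{quote(city)}/{quote(timeframe)}/"
    # Page 1 needs no query string; later pages use ?page=N
    return url if page == 1 else f"{url}?page={page}"


print(build_search_url("italy--milan", "events--today"))
print(build_search_url("italy--milan", "events--today", page=3))
```

Keeping URL construction in one function makes it easier to adjust if a slug format turns out not to match what Eventbrite expects for a given country.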


## Instantiate FirecrawlApp

In [34]:
# Initialize Firecrawl client
app = FirecrawlApp(api_key=firecrawl_api_key)

## Firecrawl Extract

### Imports

In [35]:
from firecrawl import FirecrawlApp
from pydantic import BaseModel, Field
from typing import Any, Optional, List

### Extract the Total Number of Pages Returned by the Search

In [36]:
class ExtractSchema(BaseModel):
    total_search_pages_returned: float

data = app.extract(
    [f'https://www.eventbrite.com/d/{city}/{timeframe}/'],
    {
        'prompt': 'Extract the total number of pages returned by the event search.',
        'schema': ExtractSchema.model_json_schema(),
    }
)

# Store the total pages in a variable with error handling
try:
    total_pages = int(data['data']['total_search_pages_returned'])
    print(f"Total search pages returned: {total_pages}")
except (KeyError, TypeError, ValueError) as e:
    print(f"Error extracting page count: {str(e)}")
    print("Defaulting to 1 page")
    total_pages = 1  # Fallback value

Total search pages returned: 9
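
Because the page count comes from an LLM extraction, it may be missing, non-numeric, or zero. The sketch below shows one way to normalize it before looping over pages. `parse_page_count` is a hypothetical helper, and it assumes the response is a dict shaped like the one read in the cell above (`data` → `total_search_pages_returned`).

```python
from typing import Any


def parse_page_count(response: Any, default: int = 1) -> int:
    """Return a usable page count (at least 1) from an extract response."""
    try:
        raw = response["data"]["total_search_pages_returned"]
        # The schema declares a float, so convert through float before int
        pages = int(float(raw))
    except (KeyError, TypeError, ValueError, OverflowError):
        # Missing keys, None, non-numeric strings, NaN, or infinity
        return default
    # Never return fewer than one page, so the scrape loop runs at least once
    return max(pages, 1)


print(parse_page_count({"data": {"total_search_pages_returned": 9.0}}))
print(parse_page_count({"data": {}}))
```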


## Firecrawl Scrape

Collect the event links from every page of the search results.

In [37]:
# Firecrawl Scrape

# Verify city selection exists
if 'city' not in locals():
    raise ValueError("City not selected - please run the city selection cell first")

# Initialize list to collect all events
links_list = []

# Scrape all pages
for page in range(1, total_pages + 1):
    print(f"Scraping page {page}/{total_pages}")
    page_url = f'https://www.eventbrite.com/d/{city}/{timeframe}/?page={page}'
    
    response = app.scrape_url(
        url=page_url,
        params={
            'formats': ['links'],
            'includeTags': ['a.event-card-link']
        }
    )
    
    # Add page results to main list
    page_links = response.get('links', [])
    links_list.extend(page_links)
    print(f"Found {len(page_links)} events on this page")

# Extract and validate all links
if not links_list:
    print(f"No events found for {city.replace('-', ' ').title()}")
else:
    print(f"\nTotal {len(links_list)} events found across {total_pages} pages:")
    pprint.pprint(links_list)

Scraping page 1/9
Found 20 events on this page
Scraping page 2/9
Found 19 events on this page
Scraping page 3/9
Found 20 events on this page
Scraping page 4/9
Found 19 events on this page
Scraping page 5/9
Found 20 events on this page
Scraping page 6/9
Found 20 events on this page
Scraping page 7/9
Found 19 events on this page
Scraping page 8/9
Found 19 events on this page
Scraping page 9/9
Found 0 events on this page

Total 156 events found across 9 pages:
['https://www.eventbrite.it/e/biglietti-cattolica-goes-to-gattopardo-1244243468709?aff=ebdssbdestsearch',
 'https://www.eventbrite.com/e/klinikum-the-power-of-patient-centered-design-tickets-1123834091019?aff=ebdssbdestsearch',
 'https://www.eventbrite.com/e/klinikum-revolutionizing-healthcare-the-power-of-patient-centered-design-tickets-1123728324669?aff=ebdssbdestsearch',
 'https://www.eventbrite.com/e/ceramica-meditativa-dharana-pottery-corso-da-mini-design-club-tickets-1093556068639?aff=ebdssbdestsearch',
 'https://www.eventbrit
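
The collected links carry a tracking query string (`?aff=...`), and the same event could appear on more than one results page. The following sketch shows one way to normalize and de-duplicate the list before spending extraction credits on it. `clean_event_links` is a hypothetical helper, and the `/e/` path check is an assumption based on the URLs printed above.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_event_links(links):
    """Drop non-event URLs, strip query strings, and remove duplicates (order preserved)."""
    seen = set()
    cleaned = []
    for link in links:
        parts = urlsplit(link)
        # Assumption: event pages live under an /e/ path, as in the output above
        if "/e/" not in parts.path:
            continue
        # Rebuild the URL without query string or fragment
        bare = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        if bare not in seen:
            seen.add(bare)
            cleaned.append(bare)
    return cleaned


sample = [
    "https://www.eventbrite.it/e/biglietti-cattolica-goes-to-gattopardo-1244243468709?aff=ebdssbdestsearch",
    "https://www.eventbrite.it/e/biglietti-cattolica-goes-to-gattopardo-1244243468709?aff=other",
    "https://www.eventbrite.com/d/italy--milan/events--today/",
]
print(clean_event_links(sample))
```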

## Firecrawl Extract

### Imports

In [38]:
from firecrawl import FirecrawlApp
from pydantic import BaseModel, Field
from typing import Any, Optional, List

### Extract Event Cards

In [40]:
# extract name, date, location, and description from each event URL

# TEMPORARY SAFETY LIMIT - set to None to remove limit
MAX_EXTRACTIONS = 10  # Change this number or set to None for full processing

class NestedModel1(BaseModel):
    event_name: str
    event_date: str
    event_location_city: str
    event_description: str

class ExtractSchema(BaseModel):
    events: list[NestedModel1]

# Initialize an empty list to store the results
results = []

# Process each link in the links_list with safety limit
for link in links_list[:MAX_EXTRACTIONS] if MAX_EXTRACTIONS else links_list:
    data = app.extract([link], {
        'prompt': 'Extract the name, date, location, and description of each event. Ensure that the name, date, and location are always captured.',
        'schema': ExtractSchema.model_json_schema(),
    })
    # Append a dictionary containing the link and its corresponding data
    results.append({
        'url': link,
        'result': data
    })
    
if MAX_EXTRACTIONS:
    print(f"\n⚠️ TEMPORARY LIMIT ACTIVE: Processed first {MAX_EXTRACTIONS} events")
    print("To remove limit, set MAX_EXTRACTIONS = None at top of cell\n")

# Convert the results to JSON if needed
import json
results_json = json.dumps(results, indent=2)

# Print or use the JSON results as needed
pprint.pprint(results_json)


⚠️ TEMPORARY LIMIT ACTIVE: Processed first 10 events
To remove limit, set MAX_EXTRACTIONS = None at top of cell

('[\n'
 '  {\n'
 '    "url": '
 '"https://www.eventbrite.it/e/biglietti-cattolica-goes-to-gattopardo-1244243468709?aff=ebdssbdestsearch",\n'
 '    "result": {\n'
 '      "success": true,\n'
 '      "data": {\n'
 '        "events": [\n'
 '          {\n'
 '            "event_date": "2025-02-13T23:30:00+01:00",\n'
 '            "event_name": "CATTOLICA goes to GATTOPARDO",\n'
 '            "event_description": "WELCOME TO MILAN - OFFICIAL PARTY '
 'CATTOLICA EXCHANGE STUDENTS",\n'
 '            "event_location_city": "Milano"\n'
 '          }\n'
 '        ]\n'
 '      },\n'
 '      "status": "completed",\n'
 '      "expiresAt": "2025-02-13T19:58:12.000Z"\n'
 '    }\n'
 '  },\n'
 '  {\n'
 '    "url": '
 '"https://www.eventbrite.com/e/klinikum-the-power-of-patient-centered-design-tickets-1123834091019?aff=ebdssbdestsearch",\n'
 '    "result": {\n'
 '      "success": true,\n'
 ' 
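
Each entry in `results` nests the extracted events under `result` → `data` → `events`. A small flattening step can make later filtering simpler and skips entries whose extraction failed. This is a sketch only: `flatten_results` is a hypothetical helper, and it assumes the response shape shown in the output above.

```python
def flatten_results(results):
    """Turn per-URL extract results into a flat list of event dicts, each tagged with its URL."""
    events = []
    for item in results:
        result = item.get("result") or {}
        # Skip extractions that did not report success
        if not result.get("success"):
            continue
        data = result.get("data") or {}
        for event in data.get("events") or []:
            events.append({"url": item.get("url"), **event})
    return events


sample_results = [
    {
        "url": "https://www.eventbrite.it/e/biglietti-cattolica-goes-to-gattopardo-1244243468709?aff=ebdssbdestsearch",
        "result": {
            "success": True,
            "data": {
                "events": [
                    {
                        "event_date": "2025-02-13T23:30:00+01:00",
                        "event_name": "CATTOLICA goes to GATTOPARDO",
                        "event_description": "WELCOME TO MILAN - OFFICIAL PARTY CATTOLICA EXCHANGE STUDENTS",
                        "event_location_city": "Milano",
                    }
                ]
            },
            "status": "completed",
        },
    },
    # Hypothetical failed extraction, included to show it is skipped
    {"url": "https://example.com/e/failed", "result": {"success": False}},
]
for event in flatten_results(sample_results):
    print(event["event_name"], "->", event["url"])
```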

# Semantic Filtering w/ OpenAI + Structured Outputs

## Imports

In [41]:
from openai import OpenAI

In [42]:
# Initialize OpenAI client
client = OpenAI(api_key=openai_api_key)

## Define Category of Interest

In [43]:
# Category Prompt

DESIRED_CATEGORY = """
AI related social events.
"""

## Score and Filter Events

In [44]:
import json
from pydantic import BaseModel, ValidationError
import openai

class CategoryMatch(BaseModel):
    belongs_to_category: bool
    confidence: float

def belongs_to_category(event_name: str, event_description: str, desired_category: str) -> tuple[bool, float]:
    system_instructions = (
        "You are an event classifier. "
        "Given: a desired_category, an event_name, and an event_description, "
        "determine if the event belongs to the desired_category. "
        "Output only valid JSON with the exact format: "
        "{ \"belongs_to_category\": boolean, \"confidence\": float }. "
        "Where 'belongs_to_category' is True if the event belongs to the specified desired_category, "
        "otherwise False, and 'confidence' is a float between 0 and 1. No additional keys or text."
    )

    user_prompt = (
        f"desired_category: {desired_category}\n"
        f"event_name: {event_name}\n"
        f"event_description: {event_description}"
    )

    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_instructions},
                {"role": "user", "content": user_prompt},
            ],
            temperature=0.7
        )
        
        message_content = response.choices[0].message.content.strip()
        
        if not message_content:
            return False, 0.0
            
        parsed_args = json.loads(message_content)

        classification = CategoryMatch(**parsed_args)
        return classification.belongs_to_category, classification.confidence

    # JSONDecodeError lives in the json module; the bare name is not imported here
    except (json.JSONDecodeError, ValidationError):
        return False, 0.0

# Parse the JSON string into a Python object
events_data = json.loads(results_json)

# Desired category (this overrides the value set in the "Define Category of Interest" cell)
DESIRED_CATEGORY = "AI Events"

# Initialize a list to store the results
filtered_events = []

# Process each event
for event_data in events_data:
    url = event_data["url"]
    for event in event_data["result"]["data"]["events"]:
        name = event["event_name"]
        description = event["event_description"]
        belongs, confidence = belongs_to_category(name, description, DESIRED_CATEGORY)
        
        # Only store and print results that are True with confidence above 0.7
        if belongs and confidence > 0.7:
            filtered_events.append({
                "url": url,
                "event_name": name,
                "event_description": description,
                "belongs_to_category": belongs,
                "confidence": confidence
            })
            
            print(f"Event URL: {url}")
            print(f"Event: {name}")
            print(f"Belongs to '{DESIRED_CATEGORY}': {belongs} with confidence {confidence}\n")

# Handle empty results
if not filtered_events:
    print(f"\n⚠️ No events passed the {DESIRED_CATEGORY} filter with confidence > 0.7")
else:
    print(f"\n✅ Found {len(filtered_events)} events matching {DESIRED_CATEGORY} criteria")

# Now `filtered_events` contains all the results that are True with confidence above 0.7


⚠️ No events passed the AI Events filter with confidence > 0.7
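
The classifier relies on the model returning bare JSON, but chat models sometimes wrap the object in a Markdown code fence or add surrounding text. The sketch below shows one tolerant way to parse such a reply using only the standard library. `parse_classification` is a hypothetical helper, not part of the OpenAI SDK, and it assumes the two-key format requested in the system prompt above.

```python
import json


def parse_classification(message_content: str) -> tuple[bool, float]:
    """Parse a classifier reply into (belongs_to_category, confidence); fall back to (False, 0.0)."""
    text = message_content or ""
    # Take the outermost {...} span so code fences or stray prose around it are ignored
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return False, 0.0
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return False, 0.0
    if not isinstance(payload, dict):
        return False, 0.0
    belongs = payload.get("belongs_to_category")
    confidence = payload.get("confidence")
    # Reject wrong types; bool is excluded explicitly because it is a subclass of int
    if not isinstance(belongs, bool):
        return False, 0.0
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return False, 0.0
    # Clamp confidence into the [0, 1] range the prompt asks for
    return belongs, min(max(float(confidence), 0.0), 1.0)


fence = "`" * 3
wrapped = f'{fence}json\n{{"belongs_to_category": true, "confidence": 0.92}}\n{fence}'
print(parse_classification(wrapped))
print(parse_classification("Sorry, I cannot classify this."))
```

Falling back to `(False, 0.0)` means a malformed reply is treated as a non-match rather than stopping the whole run.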


# Discord Notification Test

In [14]:
import json
import requests

# Desired confidence threshold
CONFIDENCE_THRESHOLD = 0.95

def send_event_notification(event_name: str, event_description: str, url: str):
    """Send a new event notification to Discord"""
    message = {
        "embeds": [
            {
                "title": "🎉 New AI Event Detected!",
                "description": f"**Event Name:** {event_name}\n\n"
                               f"**Description:** {event_description}\n\n"
                               f"[View Event]({url})",
                "color": 5814783,  # Example color
            }
        ]
    }

    try:
        print(f"Sending notification for event: {event_name}")
        # A timeout keeps the cell from hanging if Discord does not respond
        response = requests.post(DISCORD_WEBHOOK_URL, json=message, timeout=10)
        if response.status_code == 204:  # Discord returns 204 on success
            print("Successfully sent Discord notification")
        else:
            print(f"Discord API returned status {response.status_code}: {response.text}")
                    
    except requests.RequestException as e:
        print(f"Error sending Discord notification: {str(e)}")

# Process each filtered event
for event in filtered_events:
    url = event["url"]
    name = event["event_name"]
    confidence = event["confidence"]
    description = event.get("event_description", "")  # Ensure description is correctly extracted
    print(f"Processing event: {name} with confidence {confidence}")
    if confidence >= CONFIDENCE_THRESHOLD:
        # Ensure the correct description is passed
        send_event_notification(name, description, url)
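
Event descriptions scraped from the web can be long, and Discord rejects embeds that exceed its size limits. The sketch below builds the same payload as `send_event_notification` while trimming the text to fit. `build_event_embed` and `truncate` are hypothetical helpers, and the limits used (256 characters for a title, 4096 for a description) are stated here as assumptions based on Discord's documented embed limits.

```python
TITLE_LIMIT = 256          # Assumed Discord embed title limit
DESCRIPTION_LIMIT = 4096   # Assumed Discord embed description limit


def truncate(text: str, limit: int) -> str:
    """Shorten text to at most `limit` characters, marking the cut with an ellipsis."""
    return text if len(text) <= limit else text[: limit - 1] + "…"


def build_event_embed(event_name: str, event_description: str, url: str) -> dict:
    """Build a Discord webhook payload whose embed description fits the assumed limit."""
    header = f"**Event Name:** {truncate(event_name, TITLE_LIMIT)}\n\n**Description:** "
    footer = f"\n\n[View Event]({url})"
    # Reserve space for the header and link, then give the rest to the description
    room = DESCRIPTION_LIMIT - len(header) - len(footer)
    body = truncate(event_description, max(room, 1))
    return {
        "embeds": [
            {
                "title": "🎉 New AI Event Detected!",
                "description": header + body + footer,
                "color": 5814783,
            }
        ]
    }


payload = build_event_embed("AI Meetup", "x" * 10000, "https://www.eventbrite.com/e/example")
print(len(payload["embeds"][0]["description"]))
```

The returned dict can be passed as the `json=` argument of `requests.post`, in the same way the `message` dict is sent above.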