# Luma AI Event Scout  
By [Alex Fazio](https://x.com/alxfazio/)  
GitHub repo: [social-crawler](https://github.com/alexfazio/social-crawler)

This Jupyter notebook demonstrates an automated pipeline for discovering AI-related events on Luma, using Firecrawl for web scraping and LLM-powered data extraction. The workflow enables you to:

- Interactively select one of 60 cities and build its Luma page URL (`https://lu.ma/<city>`)
- Scrape event listings from Luma's city-specific pages
- Extract structured event data using Firecrawl's LLM-powered extraction
- Implement semantic filtering with OpenAI to identify AI-focused events
- Automate Discord notifications for high-confidence matches

This solution is particularly valuable for community builders and tech enthusiasts who want to track emerging AI meetups, workshops, and conferences across 60 cities worldwide.

**Key Firecrawl Uses:**
- City-specific URL scraping with CSS selector targeting
- Structured extraction of event details, guided by a JSON Schema generated from Pydantic models
- Consistent event fields (name, date, city, description) across every page
- Extraction over every scraped event URL, one page per request

**Practical Application:**
Monitor Luma's city event pages across major tech hubs worldwide, filter for AI-related gatherings using semantic analysis, and receive Discord alerts for high-confidence matches.

**Requirements:**  
Before proceeding, ensure you have:  
- A Firecrawl API key (for city-based scraping and extraction)  
- An OpenAI API key (for AI content filtering)  
- A Discord webhook URL (for real-time notifications)  

We leverage these core packages:  
- firecrawl-py: For location-based event discovery and structured data extraction  
- openai: For semantic analysis of event descriptions  
- pydantic: For defining the extraction schema and validating classifier output  
- requests: For Discord integration and notification delivery  

## Input API Keys

In [1]:
from getpass import getpass
openai_api_key = getpass("Enter your OpenAI API key: ")
firecrawl_api_key = getpass("Enter your Firecrawl API key: ")
DISCORD_WEBHOOK_URL = getpass("Enter your Discord webhook URL: ")
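
The keys cell prompts on every run, and `python-dotenv` is installed below but never used. A minimal sketch, assuming you keep the keys in environment variables or a local `.env` file, reads them from the environment first and only falls back to `getpass`. The helper `get_secret` and the variable names `OPENAI_API_KEY`, `FIRECRAWL_API_KEY`, and `DISCORD_WEBHOOK_URL` are assumptions, not part of the original notebook:

```python
import os
from getpass import getpass

# Optional: load ./.env into the environment if python-dotenv is available
try:
    from dotenv import load_dotenv
    load_dotenv(".env")  # returns False and does nothing if the file is missing
except ImportError:
    pass


def get_secret(env_var: str, prompt: str) -> str:
    """Return env_var from the environment, or prompt for it if unset or blank.

    Hypothetical helper; the environment variable names are assumptions.
    """
    value = os.environ.get(env_var, "").strip()
    return value if value else getpass(prompt)
```

With this in place, the three prompts could become, for example, `openai_api_key = get_secret("OPENAI_API_KEY", "Enter your OpenAI API key: ")`, so scheduled runs never block on input.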

# Firecrawl Scrape

## Install Dependencies

In [2]:
%pip install requests python-dotenv openai firecrawl-py -q

## Imports

In [3]:
import os
import requests
import json
import pprint
from firecrawl import FirecrawlApp

## Define City of Interest

In [26]:
# Map each menu number to its Luma city slug (the path in https://lu.ma/<slug>)
cities = {
    1: 'atlanta',
    2: 'austin',
    3: 'bogota',
    4: 'boston',
    5: 'buenos-aires',
    6: 'chicago',
    7: 'dallas',
    8: 'denver',
    9: 'houston',
    10: 'los-angeles',
    11: 'mexico-city',
    12: 'miami',
    13: 'montreal',
    14: 'nyc',
    15: 'philadelphia',
    16: 'phoenix',
    17: 'portland',
    18: 'salt-lake-city',
    19: 'san-diego',
    20: 'sf',
    21: 'sao-paulo',
    22: 'seattle',
    23: 'toronto',
    24: 'vancouver',
    25: 'washington-dc',
    26: 'waterloo',
    27: 'lagos',
    28: 'nairobi',
    29: 'bangkok',
    30: 'bengaluru',
    31: 'dubai',
    32: 'ho-chi-minh-city',
    33: 'hong-kong',
    34: 'jakarta',
    35: 'kuala-lumpur',
    36: 'manila',
    37: 'melbourne',
    38: 'mumbai',
    39: 'new-delhi',
    40: 'seoul',
    41: 'singapore',
    42: 'sydney',
    43: 'taipei',
    44: 'tel-aviv',
    45: 'tokyo',
    46: 'amsterdam',
    47: 'barcelona',
    48: 'berlin',
    49: 'brussels',
    50: 'copenhagen',
    51: 'helsinki',
    52: 'istanbul',
    53: 'lisbon',
    54: 'london',
    55: 'madrid',
    56: 'milan',
    57: 'munich',
    58: 'paris',
    59: 'stockholm',
    60: 'zurich'
}

# Display city selection
print("Select a city to scrape:")
for num, name in cities.items():
    print(f"{num}. {name.replace('-', ' ').title()}")

# Prompt until the user enters a valid city number
while True:
    try:
        selection = int(input(f"\nEnter city number (1-{len(cities)}): "))
    except ValueError:
        print("Invalid input. Please enter a number.")
        continue
    if selection in cities:
        city = cities[selection]
        break
    print(f"Please enter a number between 1 and {len(cities)}")

print(f"\nSelected city: {city.replace('-', ' ').title()}")

Select a city to scrape:
1. Atlanta
2. Austin
3. Bogota
4. Boston
5. Buenos Aires
6. Chicago
7. Dallas
8. Denver
9. Houston
10. Los Angeles
11. Mexico City
12. Miami
13. Montreal
14. Nyc
15. Philadelphia
16. Phoenix
17. Portland
18. Salt Lake City
19. San Diego
20. Sf
21. Sao Paulo
22. Seattle
23. Toronto
24. Vancouver
25. Washington Dc
26. Waterloo
27. Lagos
28. Nairobi
29. Bangkok
30. Bengaluru
31. Dubai
32. Ho Chi Minh City
33. Hong Kong
34. Jakarta
35. Kuala Lumpur
36. Manila
37. Melbourne
38. Mumbai
39. New Delhi
40. Seoul
41. Singapore
42. Sydney
43. Taipei
44. Tel Aviv
45. Tokyo
46. Amsterdam
47. Barcelona
48. Berlin
49. Brussels
50. Copenhagen
51. Helsinki
52. Istanbul
53. Lisbon
54. London
55. Madrid
56. Milan
57. Munich
58. Paris
59. Stockholm
60. Zurich

Selected city: Sf
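
The `input()` loop blocks unattended runs, such as a scheduled job. A minimal sketch, using the hypothetical helpers `resolve_city` and `luma_city_url`, accepts either a menu number or a city name so the selection can come from a variable or environment setting instead:

```python
def resolve_city(choice, cities: dict[int, str]) -> str:
    """Map a menu number (int or digit string) or a city name/slug to a Luma slug.

    Hypothetical helper; raises ValueError for unknown input.
    """
    text = str(choice).strip()
    if text.isdigit():
        number = int(text)
        if number not in cities:
            raise ValueError(f"City number must be between 1 and {len(cities)}")
        return cities[number]
    # Accept display names like "New Delhi" as well as slugs like "new-delhi"
    slug = text.lower().replace(" ", "-")
    if slug not in cities.values():
        raise ValueError(f"Unknown city: {choice!r}")
    return slug


def luma_city_url(slug: str) -> str:
    """Build the city page URL that the scrape step requests."""
    return f"https://lu.ma/{slug}"
```

For example, `city = resolve_city(os.environ.get("LUMA_CITY", "sf"), cities)` would replace the prompt, where `LUMA_CITY` is a hypothetical variable name.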


## Instantiate FirecrawlApp

In [27]:
# Initialize Firecrawl client
app = FirecrawlApp(api_key=firecrawl_api_key)

## Firecrawl Scrape

In [28]:
# Stop early if the city-selection cell has not been run
if 'city' not in globals():
    raise ValueError("City not selected - please run the city selection cell first")

# Scrape the city page for links, restricted to Luma's event-card elements
# ('includeTags' keeps only elements matching the CSS selector)
response = app.scrape_url(
    url=f'https://lu.ma/{city}',
    params={
        'formats': ['links'],
        'includeTags': ['.event-link.content-link']
    }
)

# Extract and validate links
links_list = response.get('links', [])
if not links_list:
    print(f"No events found for {city.replace('-', ' ').title()}")
else:
    print(f"Found {len(links_list)} events in {city.replace('-', ' ').title()}:")
    pprint.pprint(links_list)

Found 20 events in Sf:
['https://lu.ma/aidevfeb',
 'https://lu.ma/z9blbdti',
 'https://lu.ma/sfdata',
 'https://lu.ma/thoughtexperiments',
 'https://lu.ma/5ycmcpa8',
 'https://lu.ma/0zo4ebka',
 'https://lu.ma/6pk5espk',
 'https://lu.ma/trends2025',
 'https://lu.ma/vgss0zqv',
 'https://lu.ma/LSEevent',
 'https://lu.ma/kf9v1pd9',
 'https://lu.ma/rxpzx3gt',
 'https://lu.ma/thfrtcq7',
 'https://lu.ma/rdbts77j',
 'https://lu.ma/vyozu7eq',
 'https://lu.ma/rotky4am',
 'https://lu.ma/mi17jr8p',
 'https://lu.ma/wrxzfek3',
 'https://lu.ma/fyu8iqnk',
 'https://lu.ma/myb0z44f']
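
The cell above only checks that the link list is non-empty. If the `.event-link.content-link` selector ever matched extra elements, duplicate or non-event links would each cost a Firecrawl extraction call. A minimal sketch of a hypothetical `clean_event_links` helper, assuming event pages are single-segment paths on the `lu.ma` host:

```python
from urllib.parse import urlsplit, urlunsplit


def clean_event_links(links: list[str], city_slug: str) -> list[str]:
    """Keep unique lu.ma links with a single path segment, excluding the city page.

    Assumption: event pages look like https://lu.ma/<slug>.
    """
    seen = set()
    cleaned = []
    for link in links:
        parts = urlsplit(link)
        path = parts.path.strip("/")
        # Skip other hosts, the bare domain, nested paths, and the city page itself
        if parts.netloc != "lu.ma" or not path or "/" in path or path == city_slug:
            continue
        # Drop query strings and fragments so tracking parameters do not create duplicates
        normalized = urlunsplit(("https", parts.netloc, "/" + path, "", ""))
        if normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned
```

Running `links_list = clean_event_links(links_list, city)` before extraction would apply it. Single-segment pages that are not events would still pass this check.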


## Firecrawl Extract

### Imports

In [29]:
from pydantic import BaseModel

### Extract Event Cards

In [30]:
# Pydantic models describing the fields Firecrawl should extract from each event page
class EventDetails(BaseModel):
    event_name: str
    event_date: str
    event_location_city: str
    event_description: str

class ExtractSchema(BaseModel):
    events: list[EventDetails]

# Collect one {'url', 'result'} record per event page
results = []

for link in links_list:
    try:
        # LLM-powered extraction, guided by the JSON Schema generated from ExtractSchema
        data = app.extract([link], {
            'prompt': 'Extract the name, date, location, and description of each event. Ensure that the name, date, and location are always captured.',
            'schema': ExtractSchema.model_json_schema(),
        })
    except Exception as e:
        # Skip pages that fail so one bad URL does not abort the whole run
        print(f"Extraction failed for {link}: {e}")
        continue
    results.append({
        'url': link,
        'result': data
    })

# Serialize the results to a JSON string (parsed again in the filtering step)
results_json = json.dumps(results, indent=2)

pprint.pprint(results_json)

('[\n'
 '  {\n'
 '    "url": "https://lu.ma/aidevfeb",\n'
 '    "result": {\n'
 '      "success": true,\n'
 '      "data": {\n'
 '        "events": [\n'
 '          {\n'
 '            "event_date": "February 12, 2025",\n'
 '            "event_name": "AI Dev Tools Night @ Cloudflare HQ (February)",\n'
 '            "event_description": "Join us for networking & learning '
 'opportunities with awesome speakers & dev tool builders. We\\u2019ll '
 'kickstart it with up to 4 lightning talks by AI dev tool builders, followed '
 'by plenty of time for networking, asking questions, and further '
 'discussions.",\n'
 '            "event_location_city": "San Francisco, California"\n'
 '          }\n'
 '        ]\n'
 '      },\n'
 '      "status": "completed",\n'
 '      "expiresAt": "2025-02-12T19:56:28.000Z"\n'
 '    }\n'
 '  },\n'
 '  {\n'
 '    "url": "https://lu.ma/z9blbdti",\n'
 '    "result": {\n'
 '      "success": true,\n'
 '      "data": {\n'
 '        "events": [\n'
 '          {\n'
 '
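
The extraction output is nested (`result` → `data` → `events`), and the notebook never validates the returned events itself. A minimal sketch, using a hypothetical `EventRecord` model and `flatten_results` helper, validates each event locally with Pydantic and collects the URLs of pages whose extraction failed or came back incomplete:

```python
from pydantic import BaseModel, ValidationError


class EventRecord(BaseModel):
    """One extracted event plus the page it came from (hypothetical model)."""
    url: str
    event_name: str
    event_date: str
    event_location_city: str
    event_description: str


def flatten_results(results: list[dict]) -> tuple[list[EventRecord], list[str]]:
    """Turn [{'url', 'result'}] records into validated events plus failed URLs."""
    events, failed = [], []
    for item in results:
        result = item.get("result") or {}
        if not result.get("success"):
            failed.append(item["url"])
            continue
        for raw in (result.get("data") or {}).get("events", []):
            try:
                # The page URL overrides any 'url' key the extractor may have added
                events.append(EventRecord.model_validate({**raw, "url": item["url"]}))
            except ValidationError:
                failed.append(item["url"])
    return events, failed
```

Calling `events, failed = flatten_results(results)` would let the filtering loop iterate over typed records directly instead of re-parsing `results_json`.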

# Semantic Filtering w/ OpenAI + Structured Outputs

## Imports

In [31]:
from openai import OpenAI

In [32]:
# Initialize OpenAI client
client = OpenAI(api_key=openai_api_key)

In [33]:
# Category Prompt

DESIRED_CATEGORY = """
AI related social events.
"""

In [34]:
import json
from pydantic import BaseModel, ValidationError
import openai

class CategoryMatch(BaseModel):
    belongs_to_category: bool
    confidence: float

def belongs_to_category(event_name: str, event_description: str, desired_category: str) -> tuple[bool, float]:
    """Classify one event against desired_category; return (belongs, confidence)."""
    system_instructions = (
        "You are an event classifier. "
        "Given: a desired_category, an event_name, and an event_description, "
        "determine if the event belongs to the desired_category. "
        "Output only valid JSON with the exact format: "
        "{ \"belongs_to_category\": boolean, \"confidence\": float }. "
        "Set 'belongs_to_category' to true if the event belongs to the specified desired_category, "
        "otherwise false; 'confidence' is a float between 0 and 1. No additional keys or text."
    )

    user_prompt = (
        f"desired_category: {desired_category}\n"
        f"event_name: {event_name}\n"
        f"event_description: {event_description}"
    )

    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_instructions},
                {"role": "user", "content": user_prompt},
            ],
            response_format={"type": "json_object"},  # JSON mode: constrains the reply to JSON
            temperature=0  # deterministic labels suit classification
        )

        # content can be None (e.g. on a refusal), so default to an empty string
        message_content = (response.choices[0].message.content or "").strip()
        if not message_content:
            return False, 0.0

        # Validate the reply's shape with Pydantic
        classification = CategoryMatch.model_validate(json.loads(message_content))
        return classification.belongs_to_category, classification.confidence

    except (json.JSONDecodeError, ValidationError):
        # Malformed or off-schema reply: treat the event as not matching
        return False, 0.0
    except openai.OpenAIError as e:
        # API or network failure: report it and treat the event as not matching
        print(f"OpenAI request failed for '{event_name}': {e}")
        return False, 0.0

# Parse the JSON string from the extraction step back into Python objects
events_data = json.loads(results_json)

# Category label sent to the classifier (overrides the DESIRED_CATEGORY cell above)
DESIRED_CATEGORY = "AI Events"

# Events that pass the first-stage filter
filtered_events = []

# Classify every extracted event
for event_data in events_data:
    url = event_data["url"]
    # Pages whose extraction returned no data contribute no events
    page_events = (event_data["result"].get("data") or {}).get("events", [])
    for event in page_events:
        name = event["event_name"]
        description = event.get("event_description", "")
        belongs, confidence = belongs_to_category(name, description, DESIRED_CATEGORY)

        # Keep (and print) only positive matches with confidence above 0.7
        if belongs and confidence > 0.7:
            filtered_events.append({
                "url": url,
                "event_name": name,
                "event_description": description,  # used by the Discord notification
                "belongs_to_category": belongs,
                "confidence": confidence
            })

            print(f"Event URL: {url}")
            print(f"Event: {name}")
            print(f"Belongs to '{DESIRED_CATEGORY}': {belongs} with confidence {confidence}\n")

# `filtered_events` now holds every positive match with confidence above 0.7

Event URL: https://lu.ma/aidevfeb
Event: AI Dev Tools Night @ Cloudflare HQ (February)
Belongs to 'AI Events': True with confidence 0.95

Event URL: https://lu.ma/sfdata
Event: SF Data + AI Happy Hour
Belongs to 'AI Events': True with confidence 0.95

Event URL: https://lu.ma/6pk5espk
Event: Groq Developer Meetup
Belongs to 'AI Events': True with confidence 0.85

Event URL: https://lu.ma/trends2025
Event: Funding Trends for Early Stage Founders in 2025 (Crunchbase, VC Panel, & Pitch)
Belongs to 'AI Events': True with confidence 0.85

Event URL: https://lu.ma/fyu8iqnk
Event: Multimodal AI Agents - 2 day Hackathon
Belongs to 'AI Events': True with confidence 0.95
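
The classifier trusts whatever `confidence` the model returns: a value such as 1.7 would pass `CategoryMatch` and clear both the 0.7 filter and the 0.95 notification threshold. A minimal sketch, with a hypothetical `BoundedCategoryMatch` model and `parse_classification` helper, bounds the confidence using Pydantic's `Field(ge=..., le=...)` and tolerates replies wrapped in a Markdown code fence:

```python
import json
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class BoundedCategoryMatch(BaseModel):
    """Like CategoryMatch, but rejects confidences outside [0, 1] (hypothetical)."""
    belongs_to_category: bool
    confidence: float = Field(ge=0.0, le=1.0)


def parse_classification(raw: Optional[str]) -> tuple[bool, float]:
    """Parse the model's JSON reply; fall back to (False, 0.0) on any problem."""
    text = (raw or "").strip()
    # Some replies wrap the JSON in a Markdown code fence; strip it if present
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    try:
        match = BoundedCategoryMatch.model_validate(json.loads(text))
    except (json.JSONDecodeError, ValidationError):
        return False, 0.0
    return match.belongs_to_category, match.confidence
```

Inside `belongs_to_category`, the parsing lines could then reduce to `return parse_classification(response.choices[0].message.content)`.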



# Discord Notification Test

In [35]:
import requests

# Only events at or above this confidence are posted (stricter than the 0.7 filter)
CONFIDENCE_THRESHOLD = 0.95

def send_event_notification(event_name: str, event_description: str, url: str):
    """Post one event to the Discord webhook as an embed."""
    # Discord caps embed descriptions at 4096 characters; trim long event blurbs
    if len(event_description) > 3000:
        event_description = event_description[:3000] + "..."

    message = {
        "embeds": [
            {
                "title": "🎉 New AI Event Detected!",
                "description": f"**Event Name:** {event_name}\n\n"
                               f"**Description:** {event_description}\n\n"
                               f"[View Event]({url})",
                "color": 5814783,  # embed accent color as a decimal RGB value (0x58B9FF)
            }
        ]
    }

    try:
        print(f"Sending notification for event: {event_name}")
        response = requests.post(DISCORD_WEBHOOK_URL, json=message, timeout=10)
        if response.status_code == 204:  # Discord returns 204 No Content on success
            print("Successfully sent Discord notification")
        else:
            print(f"Discord API returned status {response.status_code}: {response.text}")
    except requests.RequestException as e:
        print(f"Error sending Discord notification: {e}")

# Post every filtered event that meets the notification threshold
for event in filtered_events:
    url = event["url"]
    name = event["event_name"]
    confidence = event["confidence"]
    description = event.get("event_description", "")
    print(f"Processing event: {name} with confidence {confidence}")
    if confidence >= CONFIDENCE_THRESHOLD:
        send_event_notification(name, description, url)

Processing event: AI Dev Tools Night @ Cloudflare HQ (February) with confidence 0.95
Sending notification for event: AI Dev Tools Night @ Cloudflare HQ (February)
Successfully sent Discord notification
Processing event: SF Data + AI Happy Hour with confidence 0.95
Sending notification for event: SF Data + AI Happy Hour
Successfully sent Discord notification
Processing event: Groq Developer Meetup with confidence 0.85
Processing event: Funding Trends for Early Stage Founders in 2025 (Crunchbase, VC Panel, & Pitch) with confidence 0.85
Processing event: Multimodal AI Agents - 2 day Hackathon with confidence 0.95
Sending notification for event: Multimodal AI Agents - 2 day Hackathon
Successfully sent Discord notification
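
The overview frames this as ongoing monitoring, but if the notebook runs repeatedly (for example on a schedule), the same high-confidence events would be posted to Discord on every run. A minimal sketch, assuming a local JSON file as the store of already-notified URLs; the filename `notified_events.json` and the helpers `load_notified`, `save_notified`, and `select_new_events` are hypothetical:

```python
import json
from pathlib import Path


def load_notified(path: Path) -> set[str]:
    """Return the set of event URLs already posted, or an empty set on first run."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_notified(path: Path, urls: set[str]) -> None:
    """Persist notified URLs as a sorted JSON list."""
    path.write_text(json.dumps(sorted(urls), indent=2))


def select_new_events(events: list[dict], notified: set[str], threshold: float) -> list[dict]:
    """Keep events at or above threshold whose URL has not been posted yet."""
    return [e for e in events if e["confidence"] >= threshold and e["url"] not in notified]
```

Before the notification loop, load `notified = load_notified(Path("notified_events.json"))`, post each event from `select_new_events(filtered_events, notified, CONFIDENCE_THRESHOLD)`, add its URL to `notified` after posting, and finish with `save_notified(...)`.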
