# Welcome to the Lab 🏠 📊

### Property Data Downloader

This interactive notebook helps you download and analyze property-level data using the Parcl Labs API. You'll be guided through:

1. **Market Search**: Find and select specific markets using their names or location types
2. **Property Discovery**: Apply customizable filters to find properties that match your criteria
3. **Event Collection**: Retrieve detailed property histories including sales, rentals, and listings
4. **Data Export**: Save everything to Excel-friendly CSV files for further analysis

**You'll learn how to:**
* Search through 70,000+ markets efficiently
* Apply filters like property type, size, age, ownership status, and when properties were added to our database
* Track property histories and market events, including recent data updates
* Export data in analysis-ready formats

**Before you start:**
* Get your Parcl Labs API key [here](https://dashboard.parcllabs.com/signup)
* For immediate use, open in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ParclLabs/parcllabs-cookbook/blob/main/examples/getting_started/property_downloader_simplified.ipynb)

**Note**: For large-scale data downloads across multiple markets, consider upgrading to a Starter account ($99/mo, no commitment) at your [Parcl Labs dashboard](https://dashboard.parcllabs.com/login).

### Import the Parcl Labs Python Library and set up your API key

In [None]:
# If needed, install or upgrade to the latest version of the Parcl Labs Python library
%pip install --upgrade parcllabs

In [None]:
# Import required libraries
import os
import pandas as pd
import numpy as np
from parcllabs import ParclLabsClient
from datetime import datetime

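# Input your API key here
# If you don't have an API key, sign up at https://dashboard.parcllabs.com/signup
# Option a: uncomment the next line, replace the placeholder with your key, and remove the os.getenv line below
# PARCL_LABS_API_KEY = "YOUR_API_KEY"
# Option b: read the key from the PARCL_LABS_API_KEY environment variable
PARCL_LABS_API_KEY = os.getenv('PARCL_LABS_API_KEY')

# Initialize the Parcl Labs Client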
try:
    client = ParclLabsClient(
        api_key=PARCL_LABS_API_KEY,
        num_workers=20
    )

    # Test the client with a simple API call
    test_market = client.search.markets.retrieve(query='United States', limit=1)
    print("✓ API connection successful!")

except Exception as e:
    print("\n❌ Error initializing the Parcl Labs client!")
    print("\nTo fix this:")
    print("1. Sign up for an API key at https://dashboard.parcllabs.com/signup")
    print("2. Either:")
    print("   a) Uncomment PARCL_LABS_API_KEY = \"YOUR_API_KEY\" and replace it with your key")
    print("   or")
    print("   b) Set your PARCL_LABS_API_KEY environment variable")
    print(f"\nError details: {str(e)}")
    raise

print("✓ Setup complete! You're ready to start searching for properties.")

# Market Search 🗺️
Let's find the market you want to analyze. You can search by:

* Name: City, metro area, or region name (e.g., "New York", "Miami", "Los Angeles")
* Location Type: Filter by specific geographic levels
  * `CBSA`: Core-Based Statistical Areas (metro areas)
  * `CITY`: Individual cities
  * `ZIP5`: 5-digit ZIP codes
  * `COUNTY`: County level areas

You can search by name alone or combine it with a location type for more specific results.

**Note**: Each market has a unique `parcl_id`, which we'll use to retrieve data in later steps.
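
The optional location type only needs to be sent when it is actually supplied. The following is a minimal sketch of that idea, assuming a hypothetical helper `build_market_search_kwargs` (it is not part of the `parcllabs` library). The keyword names `query`, `location_type`, and `limit` are the ones this notebook passes to `client.search.markets.retrieve`; the validation set is limited to the four types listed above, and the API may accept others.

```python
# Hypothetical helper for illustration; not part of the parcllabs library.
# Assumption: only the four location types listed above are validated here.
VALID_LOCATION_TYPES = {"CBSA", "CITY", "ZIP5", "COUNTY"}


def build_market_search_kwargs(query, location_type=None, limit=5):
    """Return keyword arguments for a market search, omitting unset filters."""
    query = query.strip()
    if not query:
        raise ValueError("Market name must not be empty.")

    kwargs = {"query": query, "limit": limit}

    # Treat None, '' and whitespace-only input as "no location type filter".
    location_type = (location_type or "").strip().upper()
    if location_type:
        if location_type not in VALID_LOCATION_TYPES:
            raise ValueError(f"Unknown location type: {location_type}")
        kwargs["location_type"] = location_type
    return kwargs


print(build_market_search_kwargs("Atlanta"))
print(build_market_search_kwargs("Atlanta", "cbsa"))
```

The resulting dictionary can be unpacked directly into the search call, for example `client.search.markets.retrieve(**build_market_search_kwargs("Atlanta", "CBSA"))`.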

In [None]:
# Prompt the user for the market name
print("🔍 Let's find your market!")
market_query = input("Enter the market name (e.g., 'Atlanta', 'New York', 'Miami'): ").strip()

# Optional: Specify a location type to narrow your search
print("\nOptional: Specify a location type to narrow your search")
print("Common types: 'CBSA' (metro area), 'CITY', 'ZIP5', 'COUNTY'")
print("Press Enter to search all types")
location_type_input = input("Enter location type or press Enter to skip: ").strip().upper()
if location_type_input == '':
    location_type_input = None
    print(f"Searching for markets with the name: {market_query} (all location types)")
else:
    print(f"Searching for markets with the name: {market_query} and location type: {location_type_input}")
# Prepare search arguments
search_kwargs = {'query': market_query}
if location_type_input is not None:
    search_kwargs['location_type'] = location_type_input

try:
    market_df = client.search.markets.retrieve(**search_kwargs, limit=5)
except Exception as e:
    print(f"\n❌ Error during market search: {str(e)}")
    print("Please try again with a different market name or location type.")
    raise
# Check the results of the search
if market_df.empty:
    raise ValueError("No markets found for this query. Please try another search.")

# Use positional row numbers so the selection matches what is displayed
market_df = market_df.reset_index(drop=True)

# If multiple results are returned, let the user pick one
if len(market_df) > 1:
    print("\n📍 Multiple markets found:")
    for i, row in market_df.iterrows():
        print(f"{i}: {row['name']} (Type: {row['location_type']}, ID: {row['parcl_id']})")

    selection = input("\nEnter the number of the market you want to analyze: ").strip()
    valid_indices = list(market_df.index)
    if not selection.isdigit() or int(selection) not in valid_indices:
        raise ValueError(f"Invalid selection. Please restart and choose one of: {valid_indices}")
    selection_idx = int(selection)
else:
    selection_idx = 0

# Store selected market information
selected_market = market_df.iloc[selection_idx]
selected_market_id = int(selected_market['parcl_id'])
selected_market_name = selected_market['name']
selected_market_location_type = selected_market['location_type']
print(f"\n✓ Selected Market: {selected_market_name} (ID: {selected_market_id})")
print("\nMarket selection complete! Ready to search for properties.")

# Property Search 🏡

Now that we've selected a market, let's find properties that match your criteria. You can filter by:

* **Property Type**: Single Family, Condo, Townhouse, or Other
* **Size**: Square footage and number of bedrooms/bathrooms
* **Age**: Year built range
* **Status**: New construction, owner-occupied, or investor-owned
* **History**: Properties with sales, rentals, or listing events
* **Ownership**: Filter by specific institutional owners
* **Record Added Date**: Filter by when properties were added to our database

You can use any combination of these filters, or none at all to see all properties.
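
Because every filter is optional, the search parameters can be assembled by dropping whatever was left unset. The sketch below assumes a hypothetical helper `build_property_search_params` (not part of the `parcllabs` library) and a placeholder market ID; the parameter names are the ones used in the search cell of this notebook.

```python
# Hypothetical helper for illustration; not part of the parcllabs library.
def build_property_search_params(parcl_id, property_type, **filters):
    """Combine a market ID, a property type, and any optional filters.

    Filters left as None are dropped, so the search is only narrowed by the
    criteria that were actually provided. Values such as 0 and False are kept
    because they are meaningful filter values.
    """
    params = {"parcl_ids": [parcl_id], "property_type": property_type}
    for name, value in filters.items():
        if value is not None:
            params[name] = value

    # Reject ranges that can never match, e.g. a minimum above the maximum.
    for prefix in ("square_footage", "bedrooms", "bathrooms", "year_built"):
        low, high = params.get(f"{prefix}_min"), params.get(f"{prefix}_max")
        if low is not None and high is not None and low > high:
            raise ValueError(f"{prefix}_min ({low}) exceeds {prefix}_max ({high})")
    return params


# 1234567 is a placeholder market ID, not a real parcl_id.
params = build_property_search_params(
    1234567,
    "SINGLE_FAMILY",
    bedrooms_min=3,
    bedrooms_max=None,                  # skipped filter
    current_investor_owned_flag=False,  # kept: False is a real choice
)
print(params)
```

Checking for `None` rather than truthiness matters for the boolean flags: answering "no" to a status question is a filter in its own right, not a skipped one.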

In [None]:
# Define helper functions
def get_numeric_input(prompt, allow_empty=True, min_value=None, max_value=None):
    """Helper function to get validated numeric input"""
    while True:
        value = input(prompt).strip()
        if allow_empty and value == '':
            return None
        try:
            num_value = int(value)
            if min_value is not None and num_value < min_value:
                print(f"Value must be at least {min_value}")
                continue
            if max_value is not None and num_value > max_value:
                print(f"Value must be no more than {max_value}")
                continue
            return num_value
        except ValueError:
            print("Please enter a valid number")

def get_boolean_input(prompt):
    """Helper function to get validated boolean input"""
    while True:
        value = input(prompt).strip().lower()
        if value == '':
            return None
        if value in ['y', 'yes']:
            return True
        if value in ['n', 'no']:
            return False
        print("Please enter 'y' for yes or 'n' for no (or press Enter to skip)")

def get_date_input(prompt):
    """Helper function to get and validate date input"""
    while True:
        date_str = input(prompt).strip()
        if date_str == '':
            return None
        try:
            # Parse date string and format it
            date = datetime.strptime(date_str, '%Y-%m-%d')
            return date.strftime('%Y-%m-%d')
        except ValueError:
            print("Please use YYYY-MM-DD format (e.g., 2020-01-01) or press Enter to skip")

print("\n🏠 Let's set up your property search filters!")
print("(Press Enter to skip any filter you don't want to apply)\n")

try:
    # Collect ALL parameters first
    search_params = {
        'parcl_ids': [selected_market_id]
    }

    # Size filters
    print("\n📏 SIZE FILTERS")
    # Compare against None so that a value of 0 is still applied as a filter
    sqft_min = get_numeric_input("Minimum square footage: ", min_value=0)
    if sqft_min is not None:
        search_params['square_footage_min'] = sqft_min

    sqft_max = get_numeric_input("Maximum square footage: ", min_value=0)
    if sqft_max is not None:
        search_params['square_footage_max'] = sqft_max

    beds_min = get_numeric_input("Minimum bedrooms: ", min_value=0)
    if beds_min is not None:
        search_params['bedrooms_min'] = beds_min

    beds_max = get_numeric_input("Maximum bedrooms: ", min_value=0)
    if beds_max is not None:
        search_params['bedrooms_max'] = beds_max

    baths_min = get_numeric_input("Minimum bathrooms: ", min_value=0)
    if baths_min is not None:
        search_params['bathrooms_min'] = baths_min

    baths_max = get_numeric_input("Maximum bathrooms: ", min_value=0)
    if baths_max is not None:
        search_params['bathrooms_max'] = baths_max

    # Age filters
    print("\n📅 AGE FILTERS")
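    # Use the current year as the upper bound instead of a hard-coded value
    current_year = datetime.now().year
    year_min = get_numeric_input("Earliest year built: ", min_value=1800, max_value=current_year)
    if year_min is not None:
        search_params['year_built_min'] = year_min

    year_max = get_numeric_input("Latest year built: ", min_value=1800, max_value=current_year)
    if year_max is not None:
        search_params['year_built_max'] = year_max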

    # Event history filters
    print("\n📊 EVENT HISTORY FILTERS")
    print("Filter for properties with specific types of history:")

    has_sales = get_boolean_input("Include only properties with sales history? (y/n): ")
    if has_sales is not None:
        search_params['event_history_sale_flag'] = has_sales

    has_rentals = get_boolean_input("Include only properties with rental history? (y/n): ")
    if has_rentals is not None:
        search_params['event_history_rental_flag'] = has_rentals

    has_listings = get_boolean_input("Include only properties with listing history? (y/n): ")
    if has_listings is not None:
        search_params['event_history_listing_flag'] = has_listings

    # Current status filters
    print("\n🏗️ CURRENT STATUS FILTERS")
    is_new = get_boolean_input("Filter for new construction only? (y/n): ")
    if is_new is not None:
        search_params['current_new_construction_flag'] = is_new

    is_owner_occupied = get_boolean_input("Filter for owner-occupied properties? (y/n): ")
    if is_owner_occupied is not None:
        search_params['current_owner_occupied_flag'] = is_owner_occupied

    is_investor = get_boolean_input("Filter for investor-owned properties? (y/n): ")
    if is_investor is not None:
        search_params['current_investor_owned_flag'] = is_investor

    is_on_market = get_boolean_input("Filter for on market properties? (y/n): ")
    if is_on_market is not None:
        search_params['current_on_market_flag'] = is_on_market

    # Institutional owner filter
    print("\n🏢 INSTITUTIONAL OWNER FILTER")
    print("Available options: AMH, TRICON, INVITATION_HOMES, HOME_PARTNERS_OF_AMERICA,")
    print("PROGRESS_RESIDENTIAL, FIRSTKEY_HOMES, AMHERST, MAYMONT_HOMES,")
    print("VINEBROOK_HOMES, SFR3")
    owner = input("\nEnter institutional owner (or press Enter to skip): ").strip().upper()
    if owner in ['AMH', 'TRICON', 'INVITATION_HOMES', 'HOME_PARTNERS_OF_AMERICA',
                 'PROGRESS_RESIDENTIAL', 'FIRSTKEY_HOMES', 'AMHERST', 'MAYMONT_HOMES',
                 'VINEBROOK_HOMES', 'SFR3']:
        search_params['current_entity_owner_name'] = owner

    # Record Added Date filters
    print("\n📅 RECORD ADDED DATE FILTERS")
    print("Filter for properties based on when they were added to our database")
    print("Format: YYYY-MM-DD (e.g., 2024-12-13)")
    print("Press Enter to skip if you want all properties")
    print("Note: Properties added before December 13, 2024 will show that date")

    record_added_start = get_date_input("Start date for record added date: ")
    if record_added_start:
        search_params['record_added_date_start'] = record_added_start

    record_added_end = get_date_input("End date for record added date: ")
    if record_added_end:
        search_params['record_added_date_end'] = record_added_end

    # Now get property type and execute search
    print("\n📋 PROPERTY TYPE")
    print("Available types: SINGLE_FAMILY, CONDO, TOWNHOUSE, OTHER, ALL")
    print("\n⚠️  Note: Selecting 'ALL' will process each property type separately")
    print("   This may take longer and will download more data\n")

    prop_type = input("Enter property type (SINGLE_FAMILY, CONDO, TOWNHOUSE, OTHER, ALL): ").strip().upper()
    print()  # Add blank line after input

    # Initialize properties DataFrame
    properties_df = pd.DataFrame()

    if prop_type == 'ALL':
        print("\n📥 Processing all property types...")
        property_types = ['SINGLE_FAMILY', 'CONDO', 'TOWNHOUSE', 'OTHER']

        for pt in property_types:
            print(f"\n⏳ Searching {pt}...")
            search_params['property_type'] = pt
            try:
                temp_df = client.property.search.retrieve(**search_params)
                properties_df = pd.concat([properties_df, temp_df], ignore_index=True)
                print(f"✓ Found {len(temp_df)} {pt} properties")
            except Exception as e:
                print(f"❌ Error processing {pt}: {str(e)}")
                continue

    elif prop_type in ['SINGLE_FAMILY', 'CONDO', 'TOWNHOUSE', 'OTHER']:
        print("\n🔍 Searching for properties...", flush=True)
        search_params['property_type'] = prop_type
        properties_df = client.property.search.retrieve(**search_params)
        print(f"\n✓ Found {len(properties_df):,} properties of type {prop_type}!", flush=True)

    else:
        raise ValueError("Invalid property type selected. Please try again.")

    # Display results summary
    total_properties = len(properties_df)
    print("\n" + "="*50, flush=True)
    print("📊 RESULTS SUMMARY", flush=True)
    print("="*50, flush=True)
    print(f"✓ Found {total_properties:,} total properties matching your criteria!", flush=True)
    if not properties_df.empty:
        print("\nFirst few properties:")
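        preview_cols = ['address', 'city', 'property_type', 'bedrooms', 'bathrooms', 'square_footage']
        # Only preview columns that are present in the results
        preview_cols = [col for col in preview_cols if col in properties_df.columns]
        print(properties_df[preview_cols].head().to_string())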

    print("\nSearch complete! Ready for the next step.", flush=True)

except Exception as e:
    print(f"\n❌ Error during property search: {str(e)}")
    print("Please try again with different search criteria.")
    raise

# Save Results & Prepare for Events 💾

Now that we've found your properties, we can:
1. Save the current results to a CSV file (optional)
2. Get the property IDs ready for event history collection
3. Preview what we found before moving on

This step helps you:
* Keep a record of the properties we found
* Double-check the data before proceeding
* Prepare for collecting historical events
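
Market names can contain commas, slashes, or other characters that are awkward in filenames, while the cell below only replaces spaces. As a hedged alternative, here is a small sketch of a stricter filename builder; `make_export_filename` is a hypothetical helper and the market name in the example is made up for illustration.

```python
import re
from datetime import datetime


def make_export_filename(prefix, market_name, now=None):
    """Build a timestamped CSV filename that is safe on common filesystems."""
    now = now or datetime.now()
    # Replace anything other than letters, digits, '-' and '_' with '_',
    # which covers spaces, commas, and slashes in market names.
    safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", market_name).strip("_")
    return f"{prefix}_{safe_name}_{now:%Y%m%d_%H%M%S}.csv"


# Illustrative market name and a fixed timestamp so the output is reproducible.
print(make_export_filename("properties", "Atlanta-Sandy Springs-Alpharetta, Ga",
                           now=datetime(2025, 1, 15, 9, 30, 0)))
# prints: properties_Atlanta-Sandy_Springs-Alpharetta_Ga_20250115_093000.csv
```

The result can be passed to `properties_df.to_csv(filename, index=False)` exactly as in the cell below.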

In [None]:
# Summarize the results and optionally save them to a CSV file for later use
if not properties_df.empty:
    by_type = properties_df['property_type'].value_counts()
    print("\nBreakdown by property type:")
    for prop_type, count in by_type.items():
        print(f"  • {prop_type}: {count:,} properties")

    # Get the list of parcl_property_ids for the next step
    property_ids = properties_df['parcl_property_id'].tolist()
    print(f"\n✓ Prepared {len(property_ids):,} property IDs for event collection")

    # Ask if user wants to save current results
    save_choice = input("\nWould you like to save the current property results to CSV? (y/n): ").strip().lower()

    if save_choice in ['y', 'yes']:
        # Create filename with market and timestamp
        # (datetime and os were already imported in the setup cell)
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"properties_{selected_market_name.replace(' ', '_')}_{timestamp}.csv"
        save_path = os.path.abspath(filename)
        
        # Save to CSV
        print(f"\n💾 Saving results to:")
        print(f"   {save_path}")
        properties_df.to_csv(filename, index=False)
        print("✓ Results saved successfully!")

        # Show file preview
        print("\nFile preview (first few rows and columns):")
        preview_cols = ['address', 'city', 'property_type', 'bedrooms',
                        'bathrooms', 'square_footage', 'year_built']
        print(properties_df[preview_cols].head().to_string())

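# Store key variables for next step
total_properties = len(properties_df)
if total_properties > 0:
    print(f"\n✓ Ready to collect events for {total_properties:,} properties!")
    print("   Note: You can proceed with event collection in the next step.")
else:
    # Define property_ids even when nothing was found so later cells do not fail with a NameError
    property_ids = []
    print("\n⚠️  No properties found. Adjust your filters in the previous step before collecting events.")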

# Collect Property Events 📅

Now we'll gather historical events for our properties. You can:
* Choose which types of events to collect (Sales, Rentals, Listings, or All)
* Filter by when events occurred using event dates
* Filter by when event records were last updated in our system
* Filter by other criteria if needed

Available Event Types:
* 🏠 SALE: Property purchase transactions
* 📋 LISTING: Property listing events
* 🔑 RENTAL: Rental listings and leases
* ✨ ALL: Collect all types of events

Event Date vs Record Update Date:
* Event Date: When the actual event (sale, listing, etc.) occurred
* Record Update Date: When our database record of the event was last modified

**Note**: Collecting events for many properties might take a few minutes, especially if you choose "ALL" event types.
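
The two date ranges are independent and can be combined. The sketch below shows one way to validate them before a request is made; `build_date_filters` is a hypothetical helper (not part of the `parcllabs` library), and the keys it returns (`start_date`, `end_date`, `record_updated_date_start`, `record_updated_date_end`) mirror the parameter names used in the cell below.

```python
from datetime import datetime


def build_date_filters(event_start=None, event_end=None,
                       updated_start=None, updated_end=None):
    """Validate date strings and return only the filters that were set."""
    def parse(value):
        # strptime raises ValueError if the string cannot be parsed
        # as a year-month-day date.
        return datetime.strptime(value, "%Y-%m-%d").date() if value else None

    ranges = {
        ("start_date", "end_date"): (parse(event_start), parse(event_end)),
        ("record_updated_date_start", "record_updated_date_end"):
            (parse(updated_start), parse(updated_end)),
    }

    filters = {}
    for (start_key, end_key), (start, end) in ranges.items():
        if start and end and start > end:
            raise ValueError(f"{start_key} must not be after {end_key}")
        if start:
            filters[start_key] = start.isoformat()
        if end:
            filters[end_key] = end.isoformat()
    return filters


# Events that occurred in 2023, restricted to records modified on or after 2024-12-13.
date_filters = build_date_filters(event_start="2023-01-01", event_end="2023-12-31",
                                  updated_start="2024-12-13")
print(date_filters)
```

Catching a reversed range locally avoids spending a request on a query that cannot return any events.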

In [None]:
def get_date_input(prompt):
    """Helper function to get and validate date input"""
    while True:
        date_str = input(prompt).strip()
        if date_str == '':
            return None
        try:
            # Parse date string and format it
            date = datetime.strptime(date_str, '%Y-%m-%d')
            return date.strftime('%Y-%m-%d')
        except ValueError:
            print("Please use YYYY-MM-DD format (e.g., 2020-01-01) or press Enter to skip")
try:
    print("📅 Let's collect event history for your properties!")
    print(f"Ready to process {len(property_ids):,} properties\n")

    # Get event type preference
    print("Available event types (SELECT ONE): SALE, LISTING, RENTAL, ALL")
    print("Note: 'ALL' will collect all types of events\n")

    event_type = input("Enter event type: SALE, LISTING, RENTAL, ALL: ").strip().upper()
    if event_type not in ['SALE', 'LISTING', 'RENTAL', 'ALL']:
        raise ValueError("Invalid event type selected")

    # Get entity owner filter (optional)
    print("\n🏢 ENTITY OWNER FILTER (optional)")
    print("Available options: AMH, TRICON, INVITATION_HOMES, HOME_PARTNERS_OF_AMERICA,")
    print("PROGRESS_RESIDENTIAL, FIRSTKEY_HOMES, AMHERST, MAYMONT_HOMES,")
    print("VINEBROOK_HOMES, SFR3")
    print("Press Enter to skip if you don't want to filter by owner")
    owner = input("Enter institutional owner: ").strip().upper()

    # Event Date Range (when events occurred)
    print("\n📆 EVENT DATE RANGE (when events occurred)")
    print("Format: YYYY-MM-DD (e.g., 2020-01-01)")
    print("Press Enter to skip if you want all available dates")
    event_start_date = get_date_input("Event start date: ")
    event_end_date = get_date_input("Event end date: ")

    # Record Update Date Range (when our data was updated)
    print("\n🔄 RECORD UPDATE DATE RANGE (when our data was last modified)")
    print("Format: YYYY-MM-DD (e.g., 2024-12-13)")
    print("Note: Records from before December 13, 2024 will show that date")
    record_updated_start = get_date_input("Record update start date: ")
    record_updated_end = get_date_input("Record update end date: ")

    # Prepare event search parameters
    event_params = {
        'parcl_property_ids': property_ids,
        'event_type': event_type
    }

    # Add event date parameters if specified
    if event_start_date:
        event_params['start_date'] = event_start_date
    if event_end_date:
        event_params['end_date'] = event_end_date

    # Add record update date parameters if specified
    if record_updated_start:
        event_params['record_updated_date_start'] = record_updated_start
    if record_updated_end:
        event_params['record_updated_date_end'] = record_updated_end

    if owner in ['AMH', 'TRICON', 'INVITATION_HOMES', 'HOME_PARTNERS_OF_AMERICA',
                'PROGRESS_RESIDENTIAL', 'FIRSTKEY_HOMES', 'AMHERST', 'MAYMONT_HOMES',
                'VINEBROOK_HOMES', 'SFR3']:
        event_params['entity_owner_name'] = owner

    # Collect events
    proceed = input("\nProceed with event collection? (y/n): ").strip().lower()
    if proceed not in ['y', 'yes']:
        print("Event collection cancelled.")
        raise SystemExit

    print("\n🔄 Collecting events...")
    events_df = client.property.events.retrieve(**event_params)

    # Process results
    total_events = len(events_df)
    print(f"\n✓ Found {total_events:,} events!")

    if not events_df.empty:
        # Show breakdown by event type
        print("\nBreakdown by event type:")
        by_event = events_df['event_type'].value_counts()
        for etype, count in by_event.items():
            print(f"  • {etype}: {count:,} events")

        # Show date ranges in data
        event_date_range = events_df['event_date'].agg(['min', 'max'])
        print(f"\nEvent date range: {event_date_range['min']} to {event_date_range['max']}")

        if 'record_updated_date' in events_df.columns:
            update_date_range = events_df['record_updated_date'].agg(['min', 'max'])
            print(f"Record update date range: {update_date_range['min']} to {update_date_range['max']}")

        # Preview of events
        print("\nPreview of recent events:")
        preview_cols = ['event_date', 'event_type', 'event_name', 'price', 'record_updated_date']
        print(events_df.sort_values('event_date', ascending=False)[preview_cols].head().to_string())

        # Ask to save events
        save_choice = input("\nWould you like to save the events to CSV? (y/n): ").strip().lower()
        if save_choice in ['y', 'yes']:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"events_{selected_market_name.replace(' ', '_')}_{timestamp}.csv"

            print(f"\n💾 Saving events to {filename}...")
            events_df.to_csv(filename, index=False)
            print("✓ Events saved successfully!")

        print("\n✓ Event collection complete! Ready for final data combination.")
except Exception as e:
    print(f"❌ Error during event collection: {str(e)}")
    print("Please try again with a different event type or date range.")
    raise

# Combine Property & Event Data 🔄

Now we'll combine the property details with their associated events into a single dataset. This will:
* Merge property characteristics with event history
* Keep all property metadata (bedrooms, bathrooms, square footage, etc.)
* Preserve event details including when events occurred and when records were updated
* Create a comprehensive record for each property-event combination
* Save the final dataset in an Excel-friendly format
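
To make the merge behavior concrete, here is a small sketch on toy data with made-up IDs and values standing in for the API results. It uses the same left merge on `parcl_property_id` as the cell below and adds two optional `pandas.merge` arguments, `validate` and `indicator`, which the notebook itself does not use.

```python
import pandas as pd

# Toy data with made-up IDs and values, standing in for the API results.
properties_df = pd.DataFrame({
    "parcl_property_id": [101, 102],
    "bedrooms": [3, 2],
    "record_added_date": ["2024-12-13", "2025-01-05"],
})
events_df = pd.DataFrame({
    "parcl_property_id": [101, 101, 103],
    "event_type": ["SALE", "LISTING", "SALE"],
    "event_date": ["2021-06-01", "2021-04-15", "2022-09-30"],
    "price": [350000, 360000, 410000],
})

# A left merge keeps every event; property columns are NaN when the property
# (here ID 103) is missing from properties_df. validate="many_to_one" raises
# if a property ID appears more than once on the property side.
combined_df = pd.merge(
    events_df,
    properties_df,
    on="parcl_property_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

print(combined_df)
print(combined_df["_merge"].value_counts())
```

Rows marked `left_only` in the `_merge` column are events whose property details were not found, which is a quick way to spot gaps before exporting.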

In [None]:
try:
    print("🔄 Combining property and event data...")

    # Merge properties with events
    if events_df.shape[0] > 0:
        # Merge on parcl_property_id
        combined_df = pd.merge(
            events_df,
            properties_df,
            on='parcl_property_id',
            how='left'  # Keep all events, even if property details missing
        )

        # Organize columns in a logical order
        # Keep record tracking dates with their respective data
        property_cols = ['record_added_date'] + [col for col in properties_df.columns
                                               if col not in ['parcl_property_id', 'record_added_date']]
        event_cols = ['event_date', 'record_updated_date'] + [col for col in events_df.columns
                                                            if col not in ['parcl_property_id', 'event_date', 'record_updated_date']]
        organized_cols = ['parcl_property_id'] + property_cols + event_cols

        combined_df = combined_df[organized_cols]

        print(f"\n✓ Successfully combined {len(properties_df):,} properties with {len(events_df):,} events!")
        print(f"Final dataset has {len(combined_df):,} rows")

        # Save combined dataset
        save_choice = input("\nWould you like to save the combined dataset to CSV? (y/n): ").strip().lower()
        if save_choice in ['y', 'yes']:
            # Create filename with market and timestamp
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"combined_data_{selected_market_name.replace(' ', '_')}_{timestamp}.csv"
            save_path = os.path.abspath(filename)

            # Save to CSV
            print(f"\n💾 Saving combined dataset to:")
            print(f"   {save_path}")
            combined_df.to_csv(filename, index=False)
            print("✓ Combined dataset saved successfully!")

            # Show preview of final dataset
            print("\nPreview of combined dataset:")
            preview_cols = ['address', 'property_type', 'bedrooms', 'square_footage',
                          'record_added_date', 'event_date', 'record_updated_date',
                          'event_type', 'event_name', 'price']
            print(combined_df[preview_cols].head().to_string())

            print("\nℹ️  The combined dataset includes:")
            print("  • All property characteristics from the original search")
            print("  • Property record addition dates")
            print("  • Event details including dates and prices")
            print("  • Event record update dates")
            print("  • One row per property-event combination")
    else:
        print("\n⚠️  No events found to combine with property data")

except Exception as e:
    print(f"\n❌ Error combining data: {str(e)}")
    print("Please ensure both property and event data are available.")
    raise