<center>
<h1>Welcome to the Lab 🥼🧪</h1>
</center>

### We will learn how to download individual units for a given `parcl id`. 

#### Need help getting started?

As a reminder, you can get your Parcl Labs API key [here](https://dashboard.parcllabs.com/signup) to follow along.

To run this immediately, you can use Google Colab. Remember, you must set your `PARCL_LABS_API_KEY`.

You will need a paid account. 
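
Before creating the client, it can help to confirm that the key is actually visible to the notebook. The following is a minimal sketch using only the standard library; `get_api_key` is a hypothetical helper, not part of the `parcllabs` package, and the error wording is our own.

```python
import os


def get_api_key(env=None, var_name="PARCL_LABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced by a plain dict when experimenting.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; set it before creating the client."
        )
    return key


# Example with an explicit mapping instead of the real environment
print(get_api_key({"PARCL_LABS_API_KEY": "  demo-key  "}))  # demo-key
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the first API call.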

Run in Colab --> [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ParclLabs/parcllabs-cookbook/blob/main/examples/getting_started/bulk_data_download.ipynb)

In [1]:
%pip install --upgrade parcllabs

Note: you may need to restart the kernel to use updated packages.


In [5]:
import os
from datetime import datetime, timedelta
import pandas as pd
from parcllabs import ParclLabsClient

In [10]:
# Instantiate the client and make sure we have a folder for the downloaded data
client = ParclLabsClient(
    api_key=os.environ.get('PARCL_LABS_API_KEY', "<your Parcl Labs API key if not set as environment variable>"), 
    limit=10 # set default limit
)

# Create the directory if it doesn't exist and return its path
def ensure_directory(directory):
    os.makedirs(directory, exist_ok=True)
    return directory

# Create a 'downloads' directory for our CSV files
download_dir = ensure_directory('downloads')

print("Setup complete. You can now use the 'client' object to interact with the Parcl Labs API.")

Setup complete. You can now use the 'client' object to interact with the Parcl Labs API.


In [14]:
# Define the parameters we want to use when searching for properties. In this case we are using
# Pittsburgh, PA as the market of interest.

# Define the search parameters
search_params = {
    'parcl_ids': [5377717],  # Required
    'property_type': 'SINGLE_FAMILY',  # Required
    #'current_entity_owner_name': 'NULL',  # Specify one of the options or 'NULL'
    'square_footage_min': 4000,
    'square_footage_max': 5000,
    #'bedrooms_min': 2,
    #'bedrooms_max': 4,
    #'bathrooms_min': 2,
    #'bathrooms_max': 3,
    #'year_built_min': 1990,
    #'year_built_max': 2023,
    #'event_history_sale_flag': True,
    #'event_history_rental_flag': False,
    #'event_history_listing_flag': True,
    #'current_new_construction_flag': False,
    #'current_owner_occupied_flag': True,
    #'current_investor_owned_flag': False
}

# We search for properties in the market defined above, using only the parameters that are not commented out.
# We use try/except to catch any error that might occur during the search.
try:
    search_results = client.property.search.retrieve(**search_params)
    print(f"Found {len(search_results)} properties matching the criteria.")
except Exception as e:
    print(f"An error occurred while searching for properties: {str(e)}")
    print("Try adjusting your search parameters or checking your API key.")

Processing Parcl IDs |████████████████████████████████████████| 1/1 [100%] in 1.8s (0.57/s) 
Found 654 properties matching the criteria.
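
Instead of commenting filters in and out by hand, the search dictionary can also be assembled programmatically. The sketch below is one possible approach: `build_search_params` is a hypothetical helper (not part of `parcllabs`) that drops filters set to `None` and checks that each `*_min` value does not exceed its `*_max` counterpart. The parameter names are the ones used in the cell above.

```python
def build_search_params(parcl_ids, property_type, **filters):
    """Assemble a search dictionary, skipping filters whose value is None.

    Hypothetical helper for illustration only.
    """
    params = {"parcl_ids": list(parcl_ids), "property_type": property_type}
    params.update({k: v for k, v in filters.items() if v is not None})

    # Check every *_min / *_max pair that is present in the dictionary
    for key, low in params.items():
        if key.endswith("_min"):
            max_key = key[: -len("_min")] + "_max"
            high = params.get(max_key)
            if high is not None and low > high:
                raise ValueError(f"{key}={low} is greater than {max_key}={high}")
    return params


search_params = build_search_params(
    [5377717],
    "SINGLE_FAMILY",
    square_footage_min=4000,
    square_footage_max=5000,
    bedrooms_min=None,  # ignored because it is None
)
print(search_params)
```

The resulting dictionary can be unpacked into the search call in the same way as the hand-written one.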


In [24]:
# We are interested in properties that have been sold, so we will filter the search results to only include
# properties that have been sold using the event_history_sale_flag. 
search_results = search_results.query("event_history_sale_flag==True")
print(f"Found {len(search_results)} properties that have been sold.")

Found 24 properties that have been sold.
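
For readers less familiar with `DataFrame.query`, the same filter can be written as a boolean mask. The sketch below uses a small made-up DataFrame rather than real API results, and it assumes the flag column has a boolean dtype.

```python
import pandas as pd

# Toy data standing in for search_results (values are invented)
toy_results = pd.DataFrame(
    {
        "parcl_property_id": [1, 2, 3],
        "event_history_sale_flag": [True, False, True],
    }
)

# Filter with a query string, as in the cell above
by_query = toy_results.query("event_history_sale_flag == True")

# Equivalent filter with a boolean mask
by_mask = toy_results[toy_results["event_history_sale_flag"]]

print(by_query.equals(by_mask))  # True
print(by_mask["parcl_property_id"].tolist())  # [1, 3]
```

Both forms keep the original row index, so the results can still be aligned with the unfiltered DataFrame if needed.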


Alternatively, we can apply the same filter at search time by uncommenting the corresponding parameter in the `search_params` dictionary above. Let's modify the code slightly to retrieve only single-family homes that have `event_history_sale_flag` set to `True` and were built in 2000 or later, by uncommenting the `year_built_min` parameter and setting its value to 2000.

In [23]:
# Apply the sale filter at search time by enabling event_history_sale_flag,
# and keep only homes built in 2000 or later via year_built_min.
search_params = {
    'parcl_ids': [5377717],  # Required
    'property_type': 'SINGLE_FAMILY',  # Required
    #'current_entity_owner_name': 'NULL',  # Specify one of the options or 'NULL'
    'square_footage_min': 4000,
    'square_footage_max': 5000,
    #'bedrooms_min': 2,
    #'bedrooms_max': 4,
    #'bathrooms_min': 2,
    #'bathrooms_max': 3,
    'year_built_min': 2000,
    #'year_built_max': 2023,
    'event_history_sale_flag': True,
    #'event_history_rental_flag': False,
    #'event_history_listing_flag': True,
    #'current_new_construction_flag': False,
    #'current_owner_occupied_flag': True,
    #'current_investor_owned_flag': False
}

try:
    search_results = client.property.search.retrieve(**search_params)
    print(f"Found {len(search_results)} properties matching the criteria.")
    print(search_results.head(2))
except Exception as e:
    print(f"An error occurred while searching for properties: {str(e)}")
    print("Try adjusting your search parameters or checking your API key.")

Processing Parcl IDs |████████████████████████████████████████| 1/1 [100%] in 1.5s (0.68/s) 
Found 24 properties matching the criteria.
   parcl_property_id             address  unit        city state_abbreviation  \
0           99430895       300 ANITA AVE  None  PITTSBURGH                 PA   
1           88547870  1543 PARKVIEW BLVD  None  PITTSBURGH                 PA   

    zip5  zip4   latitude  longitude  property_type  bedrooms  bathrooms  \
0  15217  3171  40.423689 -79.917562  SINGLE_FAMILY         4        4.0   
1  15217  2597  40.418775 -79.917150  SINGLE_FAMILY         4        4.0   

   square_footage  year_built  cbsa_parcl_id       cbsa_name  county_parcl_id  \
0            5000        2009        2900251  Pittsburgh, PA          5822911   
1            4132        2012        2900251  Pittsburgh, PA          5822911   

        county_name  city_parcl_id        city_name  zip_parcl_id zip_code  \
0  Allegheny County        5377717  Pittsburgh city       5350774    

Now that we have our subset of homes, we can get the sale history associated with each property by taking the
`parcl_property_id` column from the `search_results` DataFrame and passing those ids to the Parcl Labs client.
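
When the id list is much longer than the 24 properties here, it can be convenient to remove duplicates and work through the ids in smaller batches. This is a sketch under our own assumptions: `unique_batches` is a hypothetical helper, the batch size is arbitrary, and the client may well handle long lists on its own.

```python
def unique_batches(ids, batch_size=100):
    """Yield lists of unique ids in first-seen order (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    unique_ids = list(dict.fromkeys(ids))  # drops duplicates, keeps order
    for start in range(0, len(unique_ids), batch_size):
        yield unique_ids[start:start + batch_size]


# Example ids taken from the outputs in this notebook, with one duplicate
batches = list(unique_batches([99430895, 88547870, 99430895, 66128550], batch_size=2))
print(batches)  # [[99430895, 88547870], [66128550]]
```

Each batch could then be passed as `parcl_property_ids` in a separate call and the resulting DataFrames combined with `pd.concat`.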

In [30]:
search_results_ids = search_results['parcl_property_id'].tolist()
try:
    sale_events = client.property.events.retrieve(
        parcl_property_ids=search_results_ids,
        event_type='SALE',
        )
    print(f"Found {len(sale_events)} events matching the criteria.")
    print(sale_events.head(2))
except Exception as e:
    print(f"An error occurred while retrieving sale events: {str(e)}")
    print("Try adjusting your search parameters or checking your API key.")

|████████████████████████████████████████| 24/24 [100%] in 0.5s (52.16/s) 
Found 36 events matching the criteria.
   parcl_property_id event_date event_type event_name     price  \
0           66128550 2016-08-24       SALE       SOLD       NaN   
1           68822818 2011-02-16       SALE       SOLD  645000.0   

   owner_occupied_flag  new_construction_flag  sale_index investor_flag  \
0                  0.0                      1           2          None   
1                  1.0                      1           2          None   

  entity_owner_name  
0              None  
1              None  


We can also search for all events related to the properties in the search results by modifying the `event_type` parameter when looking for event history.

In [32]:
# Re-run the retrieval for all event types
# Set event_type to 'ALL'
event_type = 'ALL'
try:
    all_events = client.property.events.retrieve(
        parcl_property_ids=search_results_ids,
        event_type=event_type,
        )
    print(f"Found {len(all_events)} events matching the criteria.")
    print(all_events.head(2))
except Exception as e:
    print(f"An error occurred while retrieving events: {str(e)}")
    print("Try adjusting your search parameters or checking your API key.")

|████████████████████████████████████████| 24/24 [100%] in 0.3s (75.69/s) 
Found 96 events matching the criteria.
   parcl_property_id event_date event_type event_name     price  \
0           66128550 2016-08-24       SALE       SOLD       NaN   
1           68822818 2011-02-16       SALE       SOLD  645000.0   

   owner_occupied_flag  new_construction_flag  sale_index investor_flag  \
0                  0.0                      1           2          None   
1                  1.0                      1           2          None   

  entity_owner_name  
0              None  
1              None  
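
The first two rows only show sale events, so a quick breakdown by event type can make the mix of events easier to see. The sketch below runs on a small invented DataFrame that reuses the `parcl_property_id` and `event_type` column names from the output above; the `LISTING` value is an assumption for illustration.

```python
import pandas as pd

# Toy data standing in for all_events (values are invented)
toy_events = pd.DataFrame(
    {
        "parcl_property_id": [1, 1, 2, 2, 2],
        "event_type": ["SALE", "LISTING", "SALE", "LISTING", "LISTING"],
    }
)

# Total number of events per type
type_totals = toy_events["event_type"].value_counts()

# Number of events per property and type, one column per event type
per_property = (
    toy_events.groupby(["parcl_property_id", "event_type"])
    .size()
    .unstack(fill_value=0)
)

print(type_totals)
print(per_property)
```

The same two expressions should work on `all_events` as long as it contains those two columns.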


In [33]:
# Now that we have the events, merge them with the property attributes from the search results
final_data_events = all_events.merge(search_results, on='parcl_property_id', how='left')
print(f"Final data shape: {final_data_events.shape}")

Final data shape: (96, 39)
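
A left merge keeps one row per event only if each `parcl_property_id` appears once on the property side. The sketch below shows one way to check this with pandas' `validate` and `indicator` options, using small invented DataFrames in place of `all_events` and `search_results`.

```python
import pandas as pd

# Toy stand-ins for all_events and search_results (values are invented)
events = pd.DataFrame(
    {
        "parcl_property_id": [1, 1, 2],
        "event_type": ["SALE", "LISTING", "SALE"],
    }
)
properties = pd.DataFrame(
    {
        "parcl_property_id": [1, 2],
        "year_built": [2009, 2012],
    }
)

merged = events.merge(
    properties,
    on="parcl_property_id",
    how="left",
    validate="many_to_one",  # raises MergeError if a property id repeats on the right
    indicator=True,          # adds a _merge column describing each row's match
)

# Events without a matching property row would be marked "left_only"
unmatched = merged[merged["_merge"] == "left_only"]

print(merged.shape, len(unmatched))  # (3, 4) 0
```

With `validate="many_to_one"`, the row count after the merge equals the number of events, which matches the 96 rows reported above.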


In [27]:
# Save the event results to a CSV file
events_filename = f'property_events_{event_type}_{datetime.now().strftime("%Y%m%d_%H%M%S")}.csv'
events_file_path = os.path.join(download_dir, events_filename)
final_data_events.to_csv(events_file_path, index=False)
print(f"Event history saved to {events_file_path}")
print(f"Total events retrieved: {len(final_data_events)}")

Final data shape: (36, 39)
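
The timestamped file name used in the last cell can be wrapped in a small function so that other exports follow the same pattern. This is a sketch using only the standard library; `timestamped_csv_path` is a hypothetical helper, and replacing unusual characters in the prefix is our own choice.

```python
from datetime import datetime
from pathlib import Path


def timestamped_csv_path(directory, prefix, now=None):
    """Build '<directory>/<prefix>_<YYYYmmdd_HHMMSS>.csv' (hypothetical helper)."""
    now = now or datetime.now()
    # Keep letters, digits, '-' and '_'; replace anything else with '_'
    safe_prefix = "".join(c if c.isalnum() or c in "-_" else "_" for c in prefix)
    return Path(directory) / f"{safe_prefix}_{now.strftime('%Y%m%d_%H%M%S')}.csv"


# A fixed timestamp makes the example reproducible
path = timestamped_csv_path(
    "downloads", "property_events_ALL", now=datetime(2024, 1, 2, 3, 4, 5)
)
print(path.name)  # property_events_ALL_20240102_030405.csv
```

The returned `Path` can be passed directly to `DataFrame.to_csv`.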
