<center>
<h1>Welcome to the Lab 🥼🧪</h1>
</center>

### In this notebook, we will learn how to download individual units for a given `parcl_id` and select the events of interest for those units using the Parcl Labs API.


#### Need help getting started?

As a reminder, you can get your Parcl Labs API key [here](https://dashboard.parcllabs.com/signup) to follow along.

To run this immediately, you can use Google Colab. Remember, you must set your `PARCL_LABS_API_KEY`.

You will need a paid Parcl Labs account to run this notebook.

Run in Colab --> [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ParclLabs/parcllabs-cookbook/blob/main/examples/getting_started/property_data_download.ipynb)

In [None]:
%pip install --upgrade parcllabs

After installing the required libraries, we load them and instantiate the Parcl Labs client, which handles searching, retrieving, and formatting the data for us. You can paste your `API_KEY` directly into the code, but saving it as an environment variable is more secure. If you are using Colab, you can follow these [steps](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75).
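If the environment variable is missing, it can help to fail early with a clear message instead of sending a placeholder key to the API. The sketch below shows one way to do that. Note that `get_api_key` is a hypothetical helper written for this notebook, not part of the `parcllabs` library.

```python
import os


def get_api_key(var_name: str = "PARCL_LABS_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Hypothetical helper (not part of parcllabs): raises a clear error
    when the variable is missing or empty.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or add it as a Colab secret."
        )
    return api_key
```

The returned value could then be passed as the `api_key` argument when instantiating the client.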

In [None]:
import os
from datetime import datetime, timedelta
import pandas as pd
from parcllabs import ParclLabsClient

In [None]:
# Instantiate the client and make sure we have a folder to download the data
client = ParclLabsClient(
    api_key=os.environ.get('PARCL_LABS_API_KEY', "<your Parcl Labs API key if not set as environment variable>"),
    limit=10  # set default limit
)

# Function to check and create directory if it doesn't exist
def ensure_directory(directory):
    if not os.path.exists(directory):
        os.makedirs(directory)
    return directory

# Create a 'downloads' directory for our CSV files
download_dir = ensure_directory('downloads')

print("Setup complete. You can now use the 'client' object to interact with the Parcl Labs API.")
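The directory check in the cell above can also be written as a single call: `os.makedirs` accepts `exist_ok=True`, which makes the explicit `os.path.exists` test unnecessary. The sketch below demonstrates this in a temporary folder so it does not touch your working directory. The name `ensure_directory_exist_ok` is chosen only for this illustration.

```python
import os
import tempfile


def ensure_directory_exist_ok(directory: str) -> str:
    """Create the directory if needed; do nothing if it already exists."""
    os.makedirs(directory, exist_ok=True)
    return directory


# Demonstrate inside a temporary folder that is removed afterwards
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "downloads")
    ensure_directory_exist_ok(target)
    ensure_directory_exist_ok(target)  # the second call is a no-op
    created = os.path.isdir(target)

print(f"Directory created: {created}")
```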

With the client object, you can now interact with the Parcl Labs API. The client exposes several methods for downloading data; you can find more information about them [here](https://github.com/ParclLabs/parcllabs-python). In this case, we use the [property search endpoint](https://docs.parcllabs.com/reference/search_v1_property_search_get) to find all the single-family homes in the city of Pittsburgh, PA (`parcl_id`: `5377717`) that are between 4,000 and 5,000 square feet. In the example below, we define the search parameters and then call the Parcl Labs client to download the data. Several additional parameters are commented out; they can narrow the search further, but for now we use only `parcl_ids`, `property_type`, `square_footage_min`, and `square_footage_max`.

If you are interested in other markets you can search them [following these steps](https://github.com/ParclLabs/parcllabs-cookbook/blob/main/examples/getting_started/search.ipynb).


In [None]:
# Define the search parameters. In this case, the market of interest is Pittsburgh, PA
# (parcl_id 5377717).
search_params = {
    'parcl_ids': [5377717],  # Required
    'property_type': 'SINGLE_FAMILY',  # Required
    #'current_entity_owner_name': 'NULL',  # Specify one of the options or 'NULL'
    'square_footage_min': 4000,
    'square_footage_max': 5000,
    #'bedrooms_min': 2,
    #'bedrooms_max': 4,
    #'bathrooms_min': 2,
    #'bathrooms_max': 3,
    #'year_built_min': 1990,
    #'year_built_max': 2023,
    #'event_history_sale_flag': True,
    #'event_history_rental_flag': False,
    #'event_history_listing_flag': True,
    #'current_new_construction_flag': False,
    #'current_owner_occupied_flag': True,
    #'current_investor_owned_flag': False
}

# Search for properties in the market defined above, using only the parameters that are not commented out.
# The search_params dictionary is unpacked into keyword arguments for the retrieve method with **search_params.
search_results = client.property.search.retrieve(**search_params)

print(f"Found {len(search_results)} properties matching the criteria.")

Our call was successful, and we found several properties of interest. To focus only on properties that have sales information, we can filter the results on the `event_history_sale_flag` field, keeping the rows where it is `True`.

In [None]:
# We are interested in properties that have been sold, so we will filter the search results to include
# only properties that have been sold using the event_history_sale_flag.
search_results = search_results.query("event_history_sale_flag == True")

print(f"Found {len(search_results)} properties that have been sold.")
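The `query` call above is one of two common ways to filter a pandas DataFrame; a boolean mask gives the same rows. The sketch below compares both on a small made-up table that stands in for the search results. The property IDs are invented for illustration and the table contains only the two columns needed here.

```python
import pandas as pd

# Toy stand-in for search_results; the IDs are invented for illustration
toy_results = pd.DataFrame(
    {
        "parcl_property_id": [101, 102, 103],
        "event_history_sale_flag": [True, False, True],
    }
)

# Option 1: query string, as used in the cell above
sold_via_query = toy_results.query("event_history_sale_flag == True")

# Option 2: boolean mask on the flag column
sold_via_mask = toy_results[toy_results["event_history_sale_flag"]]

print(sold_via_query.equals(sold_via_mask))
print(sold_via_mask["parcl_property_id"].tolist())
```

Both options return a filtered copy and leave the original DataFrame untouched, so the unfiltered results remain available if they are kept under a different variable name.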

Alternatively, we can instruct the Parcl Labs client to return only properties that match the criteria we defined above (single-family homes between 4,000 and 5,000 square feet in Pittsburgh) and have been sold, by uncommenting the `event_history_sale_flag` parameter in the `search_params` dictionary and setting it to `True`. Let's narrow our filters even further to houses built since the year 2000 by uncommenting the `year_built_min` parameter and setting its value to 2000. The full list of parameters, with a detailed explanation of each, can be found [here](https://docs.parcllabs.com/reference/search_v1_property_search_get).

In [None]:
# Alternatively, we can have the API return only properties that have been sold and were built since 2000
# by uncommenting those parameters in the search_params dictionary.
search_params = {
    'parcl_ids': [5377717],  # Required
    'property_type': 'SINGLE_FAMILY',  # Required
    #'current_entity_owner_name': 'NULL',  # Specify one of the options or 'NULL'
    'square_footage_min': 4000,
    'square_footage_max': 5000,
    #'bedrooms_min': 2,
    #'bedrooms_max': 4,
    #'bathrooms_min': 2,
    #'bathrooms_max': 3,
    'year_built_min': 2000,
    #'year_built_max': 2023,
    'event_history_sale_flag': True,
    #'event_history_rental_flag': False,
    #'event_history_listing_flag': True,
    #'current_new_construction_flag': False,
    #'current_owner_occupied_flag': True,
    #'current_investor_owned_flag': False
}

# we can pass the search_params dictionary to the retrieve method to get the search results using **search_params
search_results = client.property.search.retrieve(**search_params)

print(f"Found {len(search_results)} properties matching the criteria.")
print(search_results.head(2))

The newly added parameters narrowed the results. To retrieve the sales information for those homes, we use their `parcl_property_id` values. The client has a separate method (`client.property.events.retrieve`) for retrieving the events associated with a home. We collect the `parcl_property_id` of each home in the search results into a list and pass that list to the method.

In [None]:
# Pass the parcl_property_ids from the search results to a list named search_results_ids to retrieve the sale events 
# for those properties.
search_results_ids = search_results['parcl_property_id'].tolist()

# Define the parameters we want to use in the search for property events.
property_events_parameters = {
    'parcl_property_ids': search_results_ids,
    'event_type': 'SALE',
    #'entity_owner_name': 'NULL',
    #'start_date': '2020-01-01',
    #'end_date': '2021-01-01',
}

# Call the client with the list of property ids and the event_type as 'SALE' to retrieve the sale events for the properties.
# The property_events_parameters dictionary is unpacked into keyword arguments with **property_events_parameters.
sale_events = client.property.events.retrieve(
    **property_events_parameters
    )

print(f"Found {len(sale_events)} events matching the criteria.")
print(sale_events.head(2))
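When a search returns many properties, it may be convenient to send the IDs in smaller batches instead of one long list. The sketch below shows only the batching step. Note that `chunked` is a hypothetical helper, the IDs are invented, and the batch size is an arbitrary choice for illustration, not a documented limit of the API.

```python
from typing import Iterator, List


def chunked(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


example_ids = [11, 12, 13, 14, 15]  # invented IDs for illustration
batches = list(chunked(example_ids, size=2))  # batch size chosen arbitrarily
print(batches)
```

Each batch could then be passed as `parcl_property_ids` to `client.property.events.retrieve`, and the resulting DataFrames combined with `pd.concat`.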

Just as in the case of property search, we can also modify parameters to get more information about a particular home. For a detailed list of parameters, you can see the documentation [here](https://docs.parcllabs.com/reference/property_events_v1_property_event_history_post). In the example below, we will modify the `event_type` parameter from `SALE` to `ALL`.

In [None]:
# Re-run the results for all event types by modifying the event_type parameter to 'ALL'
property_events_parameters = {
    'parcl_property_ids': search_results_ids,
    'event_type': 'ALL',
    #'entity_owner_name': 'NULL',
    #'start_date': '2020-01-01',
    #'end_date': '2021-01-01',
    }

all_events = client.property.events.retrieve(
    **property_events_parameters
    )

print(f"Found {len(all_events)} events matching the criteria.")
print(all_events.head(2))

With the new parameters, we now get all the events associated with the selected properties, including sales, listings, and rentals. To look only at events since 2022, we can uncomment the `start_date` parameter and set it to `'2022-01-01'`.

In [None]:
# Re-run the request for all event types, this time limited to events since 2022 via the start_date parameter
property_events_parameters = {
    'parcl_property_ids': search_results_ids,
    'event_type': 'ALL',
    #'entity_owner_name': 'NULL',
    'start_date': '2022-01-01',
    #'end_date': '2023-01-01',
}

all_events = client.property.events.retrieve(
    **property_events_parameters
    )

print(f"Found {len(all_events)} events matching the criteria.")
print(all_events.head(2))
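Once the events are in a DataFrame, the same kind of date cutoff can also be applied locally, which is useful for checking the result or slicing it further without another API call. The sketch below uses a made-up events table. The column names `event_type` and `event_date` are assumptions for illustration, so check the columns of your own `all_events` DataFrame before adapting it.

```python
import pandas as pd

# Toy events table; IDs, dates, and the column names
# 'event_type' and 'event_date' are assumptions for illustration
toy_events = pd.DataFrame(
    {
        "parcl_property_id": [101, 101, 103],
        "event_type": ["SALE", "LISTING", "SALE"],
        "event_date": ["2021-06-01", "2022-03-15", "2023-01-10"],
    }
)

# Parse the dates so they compare chronologically rather than as text
toy_events["event_date"] = pd.to_datetime(toy_events["event_date"])

# Keep events on or after the cutoff, then count them by type
cutoff = pd.Timestamp("2022-01-01")
recent = toy_events[toy_events["event_date"] >= cutoff]
counts = recent.groupby("event_type").size()

print(recent)
print(counts)
```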

When you are ready to save your data, you can use the following code to combine the events with the property attributes and write the result to a CSV file.

In [None]:
# Combine the events with the property attributes from the search results before saving to CSV.
# A left merge keeps every event row and adds the matching property columns.
final_data_events = all_events.merge(search_results, on='parcl_property_id', how='left')

print(f"Final data shape: {final_data_events.shape}")
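A left merge like the one above silently duplicates event rows if a property ID appears more than once on the right-hand side. pandas can guard against this with the `validate` and `indicator` arguments of `merge`. The sketch below shows both on made-up tables. It assumes each property appears once in the search results, and the columns other than `parcl_property_id` are invented for illustration.

```python
import pandas as pd

# Toy tables; all values and the non-key columns are invented for illustration
toy_events = pd.DataFrame(
    {
        "parcl_property_id": [101, 101, 103],
        "event_type": ["SALE", "LISTING", "SALE"],
    }
)
toy_properties = pd.DataFrame(
    {
        "parcl_property_id": [101, 103],
        "square_footage": [4200, 4800],
    }
)

merged = toy_events.merge(
    toy_properties,
    on="parcl_property_id",
    how="left",
    validate="many_to_one",  # raises MergeError if a property ID repeats on the right
    indicator=True,          # adds a '_merge' column showing where each row matched
)

print(merged.shape)
print((merged["_merge"] == "both").all())
```

Rows marked `left_only` in the `_merge` column would be events whose property was not found in the search results.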

In [None]:
# Save the event results to a CSV file using today's date in the filename for easier tracking
events_filename = f'pittsburgh_property_events_all_events_{datetime.now().strftime("%Y-%m-%d")}.csv'
events_file_path = os.path.join(download_dir, events_filename)
final_data_events.to_csv(events_file_path, index=False)

print(f"Event history saved to {events_file_path}")
print(f"Total events retrieved: {len(final_data_events)}")
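After saving, a quick way to confirm the file was written as expected is to read it back and compare its shape with the DataFrame in memory. The sketch below does this with a made-up table in a temporary folder. The file name and values are placeholders for illustration.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

# Toy table standing in for final_data_events; values are invented
toy = pd.DataFrame(
    {
        "parcl_property_id": [101, 103],
        "event_type": ["SALE", "LISTING"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder file name using the same date pattern as the cell above
    filename = f'toy_events_{datetime.now().strftime("%Y-%m-%d")}.csv'
    path = os.path.join(tmp_dir, filename)
    toy.to_csv(path, index=False)

    # Read the file back before the temporary folder is removed
    reloaded = pd.read_csv(path)

print(f"Rows and columns written: {reloaded.shape}")
print(f"Shape matches the original: {reloaded.shape == toy.shape}")
```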