<center>
<h1>Welcome to the Lab 🥼🧪</h1>
</center>

### We will learn how to bulk pull metrics for thousands of markets across the country and load them as CSV files or SQL tables.

#### Need help getting started?

As a reminder, you can get your Parcl Labs API key [here](https://dashboard.parcllabs.com/signup) to follow along.

To run this immediately, you can use Google Colab. Remember, you must set your `PARCL_LABS_API_KEY`.

You will need a paid account. 

Run in Colab --> [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ParclLabs/parcllabs-cookbook/blob/main/examples/getting_started/bulk_data_download.ipynb)

### Initial Setup

In [None]:
# if needed, install and/or upgrade to the latest version of the Parcl Labs Python library
%pip install --upgrade parcllabs

In [None]:
import os
import pandas as pd
from parcllabs import ParclLabsClient

client = ParclLabsClient(
    api_key=os.environ.get('PARCL_LABS_API_KEY', "<your Parcl Labs API key if not set as environment variable>"), 
    limit=12 # set default limit
)
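
With the placeholder default above, a missing key will likely only surface when the first request fails. A minimal sketch of a fail-fast check, assuming you prefer an explicit error up front; `require_api_key` is a hypothetical helper and not part of the `parcllabs` library:

```python
import os


def require_api_key(env=None, name="PARCL_LABS_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    # hypothetical helper for illustration only
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before creating the client")
    return key
```

The returned value could then be passed as `api_key=` when constructing the client.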

### Define Market Criteria

In [None]:
# get all metros
# in this case, let's look at the US market overall
metros = client.search.markets.retrieve(
    sort_by='TOTAL_POPULATION',
    sort_order='DESC',
    location_type='CBSA',
    limit=1000,
    auto_paginate=True
)

In [None]:
# get the 1000 most populous zip codes
zipcodes = client.search.markets.retrieve(
    location_type='ZIP5',
    limit=1000,
    sort_by='TOTAL_POPULATION',
    sort_order='DESC', # largest first, matching the metro query above
    # auto_paginate=True # if you want to get all zip codes, set this to True
)

In [None]:
# prepare one metadata table for metros and zipcodes
# this will allow you to do cross sectional analysis on income, population, etc. 
market_metadata = pd.concat([metros, zipcodes])

In [None]:
# join zips and metros together to do one pull of all listings
parcl_market_ids = metros['parcl_id'].tolist() + zipcodes['parcl_id'].tolist()
len(parcl_market_ids) # traversing 1000 most populous zip codes, and 927 metros/micro markets nationwide
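
If you later want to send the IDs in smaller batches (for example, to checkpoint progress on a long pull), a sketch like the following could work. The helper names, the sample IDs, and the batch size are assumptions for illustration; the calls in this notebook pass the full list at once.

```python
def dedupe_ids(ids):
    """Drop repeated IDs while keeping first-seen order."""
    return list(dict.fromkeys(ids))


def batched(ids, size):
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# made-up IDs for illustration only
sample_ids = [101, 102, 101, 103, 104, 105]
unique_ids = dedupe_ids(sample_ids)
batches = list(batched(unique_ids, size=2))
```

Each batch could then be passed as `parcl_ids=` in a separate `retrieve` call, with the results combined via `pd.concat`.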

### Pull Down Data

We are going to keep a tight scope: all active inventory and inventory with changing prices.

In [None]:
# set the analysis start date
START_DATE = '2023-01-01'
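
For incremental pulls, the next start date can be derived from the latest date already stored. A minimal sketch, assuming dates are ISO `YYYY-MM-DD` strings and that you track the most recent loaded date yourself; `next_start_date` is a hypothetical helper:

```python
from datetime import date, timedelta


def next_start_date(last_loaded, default="2023-01-01"):
    """Return the day after `last_loaded`, or `default` when nothing is loaded yet."""
    if last_loaded is None:
        return default
    return (date.fromisoformat(last_loaded) + timedelta(days=1)).isoformat()
```

On the first run, `next_start_date(None)` falls back to the default; on later runs, pass the maximum date found in your stored table.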

In [None]:
# get for sale listings -- weekly metric
active_listings = client.for_sale_market_metrics.for_sale_inventory.retrieve(
    parcl_ids=parcl_market_ids,
    property_type='ALL_PROPERTIES', # can swap this with SINGLE_FAMILY, CONDO or TOWNHOUSE
    start_date=START_DATE # once you load into an internal system, will use this to do an incremental pull
)

In [None]:
# get single family for sale listings -- weekly metric
sfh_active_listings = client.for_sale_market_metrics.for_sale_inventory.retrieve(
    parcl_ids=parcl_market_ids,
    property_type='SINGLE_FAMILY', 
    start_date=START_DATE # once you load into an internal system, will use this to do an incremental pull
)

In [None]:
# get condo for sale listings -- weekly metric
condo_active_listings = client.for_sale_market_metrics.for_sale_inventory.retrieve(
    parcl_ids=parcl_market_ids,
    property_type='CONDO', # townhouse is another option, would follow the same pattern
    start_date=START_DATE # once you load into an internal system, will use this to do an incremental pull
)

In [None]:
# we now have weekly active listings for all metros and zipcodes in one file. 
# would recommend loading this directly as an augmentation table to your internal system and keeping the market metadata separate. 
active_listings_by_type = pd.concat([active_listings, sfh_active_listings, condo_active_listings])

In [None]:
# now we have one datafile with all active listings, all single family home active listings, and all condo active listings
# for every week dating back to 1/1/2023. 
active_listings_by_type.head()

In [None]:
# let's enrich this with price changes to act as a leading indicator for distressed seller signals
price_changes = client.for_sale_market_metrics.for_sale_inventory_price_changes.retrieve(
    parcl_ids=parcl_market_ids,
    property_type='ALL_PROPERTIES', # can swap this with SINGLE_FAMILY, CONDO or TOWNHOUSE
    start_date=START_DATE # once you load into an internal system, will use this to do an incremental pull
)

In [None]:
sfh_price_changes = client.for_sale_market_metrics.for_sale_inventory_price_changes.retrieve(
    parcl_ids=parcl_market_ids,
    property_type='SINGLE_FAMILY',
    start_date=START_DATE # once you load into an internal system, will use this to do an incremental pull
)

In [None]:
condo_price_changes = client.for_sale_market_metrics.for_sale_inventory_price_changes.retrieve(
    parcl_ids=parcl_market_ids,
    property_type='CONDO', # townhouse is another option, would follow the same pattern
    start_date=START_DATE # once you load into an internal system, will use this to do an incremental pull
)

In [None]:
all_price_changes = pd.concat([price_changes, sfh_price_changes, condo_price_changes])
all_price_changes.head()
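
The inventory and price change tables can be merged for analysis, assuming both carry the market, date, and property type. The sketch below uses made-up rows; the column names (`parcl_id`, `date`, `property_type`, `for_sale_inventory`, `count_price_drop`) are assumptions about the response shape, so check `all_price_changes.columns` before relying on them.

```python
import pandas as pd

# made-up rows standing in for active_listings_by_type
listings = pd.DataFrame({
    "parcl_id": [1, 1, 2],
    "date": ["2023-01-02", "2023-01-09", "2023-01-02"],
    "property_type": ["ALL_PROPERTIES"] * 3,
    "for_sale_inventory": [100, 110, 40],
})

# made-up rows standing in for all_price_changes
changes = pd.DataFrame({
    "parcl_id": [1, 1, 2],
    "date": ["2023-01-02", "2023-01-09", "2023-01-02"],
    "property_type": ["ALL_PROPERTIES"] * 3,
    "count_price_drop": [10, 22, 4],
})

keys = ["parcl_id", "date", "property_type"]
# validate raises if either side has duplicate key rows
combined = listings.merge(changes, on=keys, how="left", validate="one_to_one")

# share of active listings with a price drop in each market-week
combined["pct_price_drop"] = combined["count_price_drop"] / combined["for_sale_inventory"]
```

Merging on all three keys rather than `parcl_id` alone avoids multiplying rows across weeks and property types.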

### Prepare data export

In [None]:
# we now have three tables:
#   1. market metadata
#   2. inventory for all actives, single family actives, and condo actives
#   3. seller behavior via price changes, days between price changes, etc.
# we will store these as three separate files, which can all be efficiently joined on parcl_id.

# to save as flat files, uncomment:
# market_metadata.to_csv('market_metadata.csv', index=False)
# active_listings_by_type.to_csv('active_listings.csv', index=False)
# all_price_changes.to_csv('price_changes.csv', index=False)

In [None]:
# to save straight to a database, uncomment and modify the connection string:
# import sqlalchemy
# engine = sqlalchemy.create_engine('postgresql://user:password@localhost:5432/database')
# market_metadata.to_sql('market_metadata', engine, if_exists='replace', index=False)
# active_listings_by_type.to_sql('active_listings', engine, if_exists='replace', index=False)
# all_price_changes.to_sql('price_changes', engine, if_exists='replace', index=False)
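
Note that `if_exists='replace'` rewrites each table on every run. For incremental loads, `if_exists='append'` may be a better fit. A minimal sketch using an in-memory SQLite database as a stand-in for your own database, with made-up rows and assumed column names:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for a real database connection

# made-up rows for two consecutive weekly pulls
first_pull = pd.DataFrame({
    "parcl_id": [1, 2],
    "date": ["2023-01-02"] * 2,
    "for_sale_inventory": [100, 40],
})
next_pull = pd.DataFrame({
    "parcl_id": [1, 2],
    "date": ["2023-01-09"] * 2,
    "for_sale_inventory": [110, 42],
})

# the initial load creates the table; later loads add rows instead of replacing it
first_pull.to_sql("active_listings", conn, if_exists="replace", index=False)
next_pull.to_sql("active_listings", conn, if_exists="append", index=False)

row_count = conn.execute("SELECT COUNT(*) FROM active_listings").fetchone()[0]
latest_date = conn.execute("SELECT MAX(date) FROM active_listings").fetchone()[0]
conn.close()
```

The latest stored date is what you would feed back into `start_date` on the next pull.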