# Welcome to the Lab 🥼🧪

## Introduction

In this notebook, we will go over how to do a simple housing stock analysis. We will be explicitly addressing the following questions:
- Which markets have the highest/lowest percentage of single family homes?
- Which markets have seen the greatest increase/decrease in the percentage of single family home development out of all construction in the last 5 years?

**Note:** This notebook will work with any of the 70k+ markets supported by the Parcl Labs API.

As a reminder, you can get your Parcl Labs API key [here](https://dashboard.parcllabs.com/signup) to follow along. 

To run this immediately, you can use Google Colab. Remember, you must set your `PARCL_LABS_API_KEY` as a secret. See this [guide](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75) for more information.

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ParclLabs/parcllabs-examples/blob/main/python/introduction.ipynb)

In [None]:
# Environment setup
import os
import sys
import subprocess
from datetime import datetime

# Colab setup from the one-click badge above
if "google.colab" in sys.modules:
    from google.colab import userdata
    %pip install parcllabs plotly kaleido
    api_key = userdata.get('PARCL_LABS_API_KEY')
else:
    api_key = os.getenv('PARCL_LABS_API_KEY')
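
`os.getenv` returns `None` when the variable is unset, so a missing key can go unnoticed until later in the notebook. A minimal sketch of an early check, assuming a hypothetical helper named `require_api_key` (it is not part of `parcllabs`):

```python
import os


def require_api_key(env=None, name="PARCL_LABS_API_KEY"):
    """Return the API key found in `env`, or raise a clear error if it is missing.

    `require_api_key` is a hypothetical helper written for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; define it before running the notebook.")
    return key


# Demonstration with a plain dictionary standing in for the environment
print(require_api_key({"PARCL_LABS_API_KEY": "example-key"}))
```

Called with no arguments, the helper reads from `os.environ`, so it could replace the `os.getenv` call in the non-Colab branch.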

In [None]:
import parcllabs
import pandas as pd
import plotly.express as px
from parcllabs import ParclLabsClient

print(f"Parcl Labs Version: {parcllabs.__version__}")

Parcl Labs Version: 0.1.16


In [None]:
# Set up client
client = ParclLabsClient(api_key=api_key)

In [None]:
# let's get the top 50 metros by population size
top_50_metros = client.search_markets.retrieve(
    location_type='CBSA',
    sort_by='TOTAL_POPULATION',
    sort_order='DESC',
    params={
        'limit': 50
    },
    as_dataframe=True
)

top_50_metros.head()

Unnamed: 0,parcl_id,country,geoid,state_fips_code,name,state_abbreviation,region,location_type,total_population,median_income,parcl_exchange_market,pricefeed_market,case_shiller_10_market,case_shiller_20_market
0,2900187,USA,35620,,"New York-Newark-Jersey City, Ny-Nj-Pa",,,CBSA,19908595,93610,0,1,1,1
1,2900078,USA,31080,,"Los Angeles-Long Beach-Anaheim, Ca",,,CBSA,13111917,89105,0,1,1,1
2,2899845,USA,16980,,"Chicago-Naperville-Elgin, Il-In-Wi",,,CBSA,9566955,85087,0,1,1,1
3,2899734,USA,19100,,"Dallas-Fort Worth-Arlington, Tx",,,CBSA,7673379,83398,0,1,0,1
4,2899967,USA,26420,,"Houston-The Woodlands-Sugar Land, Tx",,,CBSA,7142603,78061,0,1,0,0


In [None]:
# let's set aside the NY MSA parcl_id for analysis
ny_msa_parcl_id = top_50_metros.iloc[0]['parcl_id']
ny_msa_parcl_id

2900187

In [None]:
# let's set aside all top markets as well
top_market_ids = top_50_metros['parcl_id'].tolist()

#### Retrieve Housing Stock for a Single Market

In [None]:
# let's start with the basics and get the breakdown of housing stock in the New York metro.
# Housing stock is the mix of condos, single family homes, and townhomes in a market. This mix changes all the time: urban
# areas get built out, creating a denser concentration of units, and COVID caused a suburban shock that increased the velocity of
# suburban home development. Assuming a fixed denominator in housing is a mistake.

housing_stock_ny_msa = client.market_metrics_housing_stock.retrieve(
    parcl_id=ny_msa_parcl_id,
    params={
        'limit': 1 # let's get the most recent stock
    },
    as_dataframe=True # make life easy on ourselves
)

housing_stock_ny_msa

Unnamed: 0,date,single_family,condo,townhouse,other,all_properties,parcl_id
0,2024-03-01,2802362,957017,76608,1583688,5419675,2900187


#### Retrieve Housing Stock for Many Markets

In [None]:
# as of March 2024, there are 5.4 million units within the NY metro, 2.8 million of which are single family homes
# and roughly 1 million of which are condos.

# let's see how this mix compares to other metros on a proportional basis. 
housing_stock = client.market_metrics_housing_stock.retrieve_many(
    parcl_ids=top_market_ids,
    params={
        'limit': 1 # let's get most recent again
    },
    as_dataframe=True
)

housing_stock.head()



Unnamed: 0,date,single_family,condo,townhouse,other,all_properties,parcl_id
0,2024-03-01,2802362,957017,76608,1583688,5419675,2900187
1,2024-03-01,1997656,858838,19719,555526,3431739,2900078
2,2024-03-01,2017899,768604,123514,588914,3498931,2899845
3,2024-03-01,1921281,457709,41641,373657,2794288,2899734
4,2024-03-01,1765489,383733,33520,355410,2538152,2899967


In [None]:
# add names back
housing_stock = pd.merge(housing_stock, top_50_metros, on='parcl_id')
housing_stock.head()
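
The merge above attaches every column of `top_50_metros`. If only the market name is needed, a narrower merge keeps the frame easier to read. A minimal sketch using three rows copied from the outputs above; `validate` is a standard `pd.merge` argument that raises an error when the key is not unique on the sides it names:

```python
import pandas as pd

# Values copied from the outputs shown above (first three metros)
stock = pd.DataFrame({
    "parcl_id": [2900187, 2900078, 2899845],
    "single_family": [2802362, 1997656, 2017899],
    "all_properties": [5419675, 3431739, 3498931],
})
metros = pd.DataFrame({
    "parcl_id": [2900187, 2900078, 2899845],
    "name": [
        "New York-Newark-Jersey City, Ny-Nj-Pa",
        "Los Angeles-Long Beach-Anaheim, Ca",
        "Chicago-Naperville-Elgin, Il-In-Wi",
    ],
})

# Keep only the lookup columns we need, and let pandas raise a MergeError
# if parcl_id turns out to be duplicated on either side.
named = pd.merge(
    stock,
    metros[["parcl_id", "name"]],
    on="parcl_id",
    how="left",
    validate="one_to_one",
)
print(named[["name", "single_family"]])
```

For the historical data later in the notebook, where each market has many rows, `validate="many_to_one"` would be the matching choice.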

In [None]:
# let's focus on mix of single family homes, condos, and townhouses
housing_stock['pct_single_family'] = housing_stock['single_family']/housing_stock['all_properties']
housing_stock['pct_condo'] = housing_stock['condo']/housing_stock['all_properties']
housing_stock['pct_townhouse'] = housing_stock['townhouse']/housing_stock['all_properties']
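
As a worked check of the share calculation, the sketch below recomputes the three percentages for the five metros shown in the output above. It divides all unit columns in one step, which is one possible alternative to the three separate assignments rather than a required change:

```python
import pandas as pd

# Values copied from the housing stock output shown above
stock = pd.DataFrame({
    "parcl_id": [2900187, 2900078, 2899845, 2899734, 2899967],
    "single_family": [2802362, 1997656, 2017899, 1921281, 1765489],
    "condo": [957017, 858838, 768604, 457709, 383733],
    "townhouse": [76608, 19719, 123514, 41641, 33520],
    "all_properties": [5419675, 3431739, 3498931, 2794288, 2538152],
})

unit_cols = ["single_family", "condo", "townhouse"]

# Divide each unit column by the row's total and prefix the results with pct_
shares = stock[unit_cols].div(stock["all_properties"], axis=0).add_prefix("pct_")
stock = pd.concat([stock, shares], axis=1)

print(stock[["parcl_id", "pct_single_family", "pct_condo", "pct_townhouse"]].round(3))
```

The three shares do not sum to 1 because the `other` category is left out of the mix.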

In [None]:
# which market has the highest percentage of single family homes?
housing_stock.sort_values('pct_single_family', ascending=False).head(5)
# Oklahoma City, Sacramento, Fresno, Richmond, and Indianapolis all have over 75% of the mix allocated towards
# single family homes

In [None]:
# which markets have the smallest percentage of single family homes? 
housing_stock.sort_values('pct_single_family').head(5)
# Miami, Boston, Washington DC, Baltimore, and New York all have roughly 50% or less of their stock in single family homes.

# why is this important? Indices like the Case-Shiller Index only track single family homes, so they leave out a lot of the activity in these markets

#### Retrieve Housing Stock for Many Markets Over Time

In [None]:
# now let's see how this has changed over the last 5 years, by market.
# let's find the markets with the greatest increase and the greatest decline
# in the share of single family homes over the last 5 years

start_date = '2019-01-01'
end_date = '2024-04-01'
housing_stock_hist = client.market_metrics_housing_stock.retrieve_many(
    parcl_ids=top_market_ids,
    start_date=start_date,
    end_date=end_date,
    params={
        'limit': 200 # let's expand the limit to collect all observations in one call
    },
    as_dataframe=True
)

# add names
housing_stock_hist = pd.merge(housing_stock_hist, top_50_metros, on='parcl_id')
housing_stock_hist.head()

In [None]:
# recalc percentages
housing_stock_hist['pct_single_family'] = housing_stock_hist['single_family']/housing_stock_hist['all_properties']
housing_stock_hist['pct_condo'] = housing_stock_hist['condo']/housing_stock_hist['all_properties']
housing_stock_hist['pct_townhouse'] = housing_stock_hist['townhouse']/housing_stock_hist['all_properties']

In [None]:
# get the first value at 2019-01-01
hs_first = housing_stock_hist.loc[housing_stock_hist['date'] == start_date][['parcl_id', 'pct_single_family', 'pct_condo', 'pct_townhouse']]
hs_first = hs_first.rename(
    columns={
    'pct_single_family': 'pct_single_family_start',
    'pct_condo': 'pct_condo_start',
    'pct_townhouse': 'pct_townhouse_start'
    }
)

In [None]:
# join with full history
housing_stock_hist_v2 = pd.merge(housing_stock_hist, hs_first, on='parcl_id')
housing_stock_hist_v2.head()
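
The baseline lookup above keeps only rows dated exactly `2019-01-01`, so a market without an observation on that date would be dropped by the inner merge. A minimal sketch of a more tolerant baseline that takes each market's earliest available row; the toy data and the `date_start` column are hypothetical:

```python
import pandas as pd

# Hypothetical toy history: market 2 has no row dated exactly 2019-01-01
hist = pd.DataFrame({
    "parcl_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2019-01-01", "2021-01-01", "2024-03-01", "2019-02-01", "2024-03-01"]
    ),
    "pct_single_family": [0.600, 0.603, 0.606, 0.500, 0.495],
})

# After sorting by date, first() returns the first non-null value per column
# for each market; with no missing values that is the earliest observation.
baseline = (
    hist.sort_values("date")
    .groupby("parcl_id", as_index=False)
    .first()
    .rename(columns={
        "date": "date_start",
        "pct_single_family": "pct_single_family_start",
    })
)

# Each history row picks up its market's baseline; both markets are retained
joined = hist.merge(baseline, on="parcl_id", validate="many_to_one")
print(joined)
```

Keeping `date_start` alongside the baseline makes it visible when a market's window starts later than the others.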

In [None]:
# going back to our original question: which market has had the largest increase in single family home percentage?
housing_stock_hist_v2['pct_single_family_delta'] = housing_stock_hist_v2['pct_single_family']-housing_stock_hist_v2['pct_single_family_start']

housing_stock_hist_v2.loc[housing_stock_hist_v2['date'] == '2024-03-01'].sort_values('pct_single_family_delta', ascending=False)[['name', 'pct_single_family_delta']].head(5)

In [None]:
# Dallas, Austin, Jacksonville, Las Vegas, and Orlando have each added over 50 basis points to the share of single family homes.
# In other words, single family homes have grown their share of all housing stock in these markets by more than 0.5 percentage points.
# One thesis is that consumers in these markets particularly prefer single family
# homes over other types of housing stock.
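
The comment above quotes changes in basis points, while `pct_single_family_delta` is stored as a fraction. A short sketch of the conversion, using hypothetical shares for two made-up markets (one basis point is 0.0001 as a fraction, or 0.01 percentage points):

```python
import pandas as pd

# Hypothetical shares for two markets at the start and end of the window
df = pd.DataFrame({
    "name": ["Market A", "Market B"],
    "pct_single_family_start": [0.6000, 0.5000],
    "pct_single_family": [0.6060, 0.4950],
})

# Fraction -> basis points: multiply the difference by 10,000
df["delta_bps"] = (
    (df["pct_single_family"] - df["pct_single_family_start"]) * 10_000
).round(1)

print(df[["name", "delta_bps"]])
```

Here Market A gains 60 basis points and Market B loses 50, which are the same units as the figures quoted in the comments.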

In [None]:
# what about the inverse? 
housing_stock_hist_v2.loc[housing_stock_hist_v2['date'] == '2024-03-01'].sort_values('pct_single_family_delta', ascending=True)[['name', 'pct_single_family_delta']].head(5)

In [None]:
# Charlotte, Seattle, Salt Lake City, Nashville, and Boston have seen single family homes lose share of
# total housing stock over the last 5 years, meaning other property types grew faster in these markets