<center>
<h1>Welcome to the Lab 🥼🧪</h1>
</center>

## How to identify markets that could disrupt the US?

In this notebook, we look for markets whose supply growth is outpacing the national rate: the needles in the haystack that are changing faster than the US as a whole. We screen for the following criteria:
- Markets with a large, trending skew in supply & demand growth where supply is substantially outpacing demand
- Markets with signals for motivated sellers, specifically the share of all inventory experiencing price drops
- Markets that appreciated significantly since COVID, yet have not given back any of those price gains

The notebook is broken up into the following sections:
1. [Import required packages and setup the Parcl Labs API key](#1-import-required-packages-and-setup-the-parcl-labs-api-key)
2. [Search for markets](#2-search-for-markets)
3. [Retrieve the data](#3-retrieve-the-data)
4. [Initial data preparation](#4-initial-data-preparation)
5. [Supply & demand skew](#5-supply--demand-skew)
6. [Active supply price drops](#6-active-supply-price-drops)
7. [New construction impact on supply](#7-new-construction-impact-on-supply)
8. [Appreciation since COVID](#8-appreciation-since-covid)
9. [Real time price check](#9-real-time-price-check)

#### What will you create in this notebook?

##### Understand changes in supply and demand YoY
<p align="center">
  <img src="../../../images/changes_supply_yoy_scatter.png" alt="Scatter plot of YoY changes in supply vs. demand">
</p>

##### Understanding gaps in supply and demand

<p align="center">
  <img src="../../../images/changes_supply_yoy_bar.png" alt="Bar chart of YoY changes in supply and demand">
</p>

##### Prices since the beginning of COVID-19
<p align="center">
  <img src="../../../images/pct_change_home_values_since_covid_line_chart.png" alt="Line chart of percent change in home values since COVID-19">
</p>

#### Need help getting started?

As a reminder, you can get your Parcl Labs API key [here](https://dashboard.parcllabs.com/signup) to follow along.

To run this immediately, you can use Google Colab. Remember, you must set your `PARCL_LABS_API_KEY`.

Run in Colab --> [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ParclLabs/parcllabs-cookbook/blob/main/examples/experimental/supply_and_demand/markets_that_could_disrupt.ipynb)

### 1. Import required packages and setup the Parcl Labs API key

In [None]:
# if needed, install and/or upgrade to the latest version of the Parcl Labs Python library
%pip install --upgrade parcllabs nbformat

In [2]:
import os
import pandas as pd
import plotly.express as px
from datetime import timedelta
import plotly.graph_objects as go
from parcllabs import ParclLabsClient
from parcllabs.beta.charting.styling import SIZE_CONFIG
from parcllabs.beta.ts_stats import TimeSeriesAnalysis
from parcllabs.beta.charting.utils import (
    create_labs_logo_dict,
    save_figure,
)
from parcllabs.beta.charting.styling import default_style_config as style_config


client = ParclLabsClient(
    api_key=os.environ.get('PARCL_LABS_API_KEY', "<your Parcl Labs API key if not set as environment variable>"), 
    limit=1000, 
    turbo_mode=True # set turbo mode to True
)

In [3]:
# define root dir to save assets
ROOT_DIR = "../../../outputs" # Replace with your own directory 
ANALYSIS_MONTHLY_SERIES = '9/1/2024'

### 2. Search for markets

In [4]:
# Retrieve top 100 metro markets, sorted by total population in descending order
metros = client.search.markets.retrieve(
    sort_by='TOTAL_POPULATION',  # Sort by total population
    sort_order='DESC',           # In descending order
    location_type='CBSA',        # Location type set to Core Based Statistical Area (CBSA)
    limit=100                    # Limit results to top 100 metros
)

# Retrieve national data for the United States to use as a benchmark
us = client.search.markets.retrieve(
    query='United States',  # Query for the United States as a whole
    limit=1                 # Limit results to one (national-level data)
)

# Concatenate metro market data with national data for comparison
markets = pd.concat([metros, us])

In [5]:
# Let's move the parcl_id values of our markets to a list so we can retrieve the data
market_parcl_ids = markets['parcl_id'].tolist()

### 3. Retrieve the Data

In [None]:
# Retrieve different datasets from the SDK endpoints.
# Capture weekly supply plus monthly demand and price metrics for the top 100 metros and the US.

# Define the start date for supply and demand data
start_date = '2022-09-01'


# Retrieve the supply (for-sale inventory) data for the market starting from the specified date
supply_df = client.for_sale_market_metrics.for_sale_inventory.retrieve(
    parcl_ids=market_parcl_ids,
    auto_paginate=True,
    start_date=start_date,
)

# Retrieve the demand data (housing event counts) for the market starting from the specified date
demand_df = client.market_metrics.housing_event_counts.retrieve(
    parcl_ids=market_parcl_ids,
    auto_paginate=True,
    start_date=start_date,
)

# Retrieve the price data (housing event prices) for the market starting from March 2020
prices_df = client.market_metrics.housing_event_prices.retrieve(
    parcl_ids=market_parcl_ids,
    auto_paginate=True,
    start_date='2020-03-01',  # Different start date to capture historical price trends
)

In [None]:
# Each retrieve call above already returns a DataFrame
# Output the length of each DataFrame to understand the volume of data retrieved
print(f'Length of supply data: {len(supply_df)}, prices data: {len(prices_df)}, and demand data: {len(demand_df)}')

# Output the number of unique 'parcl_id' values in each DataFrame to check for coverage across different markets
print(f'There are {len(supply_df.parcl_id.unique())} unique parcl_ids in the supply data, '
      f'{len(prices_df.parcl_id.unique())} unique parcl_ids in the prices data, and '
      f'{len(demand_df.parcl_id.unique())} unique parcl_ids in the demand data')


In [None]:
# Check the most recent date available in each dataset
print(prices_df['date'].max())
print(demand_df['date'].max())
print(supply_df['date'].max())

The `supply_df` dataframe contains the for-sale inventory across all of our markets and is updated weekly. The `prices_df` dataframe contains the median price for sales and listings, along with the standard deviation of prices, on a monthly basis. The `demand_df` dataframe contains the number of events that occurred in each market, including new listings, sales, and units offered for rent, also on a monthly basis. Together, these datasets form the first step in our analysis, helping us understand supply and demand dynamics alongside price trends.

We also need information on price cuts. For this, we will use the SDK, specifically the `for_sale_market_metrics.for_sale_inventory_price_changes` method of our client. This endpoint retrieves price cuts across all property types and is updated weekly.


In [None]:
# Retrieve price changes in inventory for the market starting from the specified date
price_changes_df = client.for_sale_market_metrics.for_sale_inventory_price_changes.retrieve(
    parcl_ids=market_parcl_ids,        # Specify the market by its parcl_id
    auto_paginate=True,
    start_date=start_date     # Use the same start date defined earlier for consistency
)

In [None]:
# Output the length of the price changes DataFrame to verify the amount of data retrieved
print(f'Length of price changes data: {len(price_changes_df)}')

# Output the number of unique 'parcl_id' values in the price changes DataFrame to ensure market coverage
print(f'There are {len(price_changes_df.parcl_id.unique())} unique parcl_ids in the price changes data')

Now that we have our data, we can start our analysis.

### 4. Initial data preparation

In [None]:
# Calculate monthly supply and percentage of price drops
# Note: Supply data is weekly, and price changes are weekly, so we resample both to a monthly frequency

supply_monthly = (
    supply_df.copy(deep=True)  # Create a deep copy of the supply DataFrame to avoid modifying the original data
    
    # Merge with price_changes_df on 'parcl_id' and 'date' to include price drop data for each market
    .merge(price_changes_df[['parcl_id', 'date', 'count_price_drop']], on=['parcl_id', 'date'])
    
    # Add new columns for percentage of price drops and resample dates to monthly
    .assign(
        pct_price_drops=lambda df: df['count_price_drop'] / df['for_sale_inventory'],  # Calculate percentage of price drops out of total supply
        date=lambda df: df['date'].dt.to_period('M').dt.to_timestamp()  # Convert the 'date' to monthly frequency
    )
    
    # Group the data by 'parcl_id' and 'date' (now monthly) and calculate the median
    .groupby(['parcl_id', 'date'])
    .agg({
        'for_sale_inventory': 'median',     # Calculate the median inventory for each market and month
        'pct_price_drops': 'median'         # Calculate the median percentage of price drops
    })
    
    # Reset the index to return a flat DataFrame
    .reset_index()

    # Calculate the mean percentage of price drops for each month for all markets
    .assign(
        pct_price_drops_mean=lambda df: df.groupby('date')['pct_price_drops'].transform('mean'),
    )

)

# Output the length of the final monthly supply DataFrame to verify the amount of data
print(f'Length of monthly supply data: {len(supply_monthly)}')

# Output the number of unique 'parcl_id' values in the monthly supply data to verify market coverage
print(f'There are {len(supply_monthly.parcl_id.unique())} unique parcl_ids in the monthly supply data')
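
The monthly roll-up above can be hard to picture on real data. The following is a minimal sketch on made-up weekly rows for a single hypothetical market (`parcl_id` 1 and every value are invented for illustration). It applies the same month snap with `to_period('M')` and the same median aggregation as the cell above.

```python
import pandas as pd

# Hypothetical weekly observations for a single market (made-up values)
weekly = pd.DataFrame({
    "parcl_id": [1] * 5,
    "date": pd.to_datetime(
        ["2024-08-19", "2024-08-26", "2024-09-02", "2024-09-09", "2024-09-16"]
    ),
    "for_sale_inventory": [900, 1000, 1000, 1100, 1200],
    "count_price_drop": [180, 250, 250, 330, 300],
})

monthly = (
    weekly
    .assign(
        # Share of inventory with a price drop in each week
        pct_price_drops=lambda df: df["count_price_drop"] / df["for_sale_inventory"],
        # Snap every weekly date to the first day of its month
        date=lambda df: df["date"].dt.to_period("M").dt.to_timestamp(),
    )
    .groupby(["parcl_id", "date"], as_index=False)
    .agg({"for_sale_inventory": "median", "pct_price_drops": "median"})
)

print(monthly)
```

Each month collapses to one row per market. In this toy data, August keeps the median of two weekly readings and September the median of three, so months with a different number of weekly snapshots are still comparable.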


In [12]:
# Calculate the weekly percentage of price drops
# Note: Supply data and price changes are both weekly, so no resampling is needed here

supply_weekly = (
    supply_df.copy(deep=True)  # Create a deep copy of the supply DataFrame to avoid modifying the original data
    # Merge with price_changes_df on 'parcl_id' and 'date' to include price drop data for each market
    .merge(price_changes_df[['parcl_id', 'date', 'count_price_drop']], on=['parcl_id', 'date'])
    # Add a new column for the percentage of price drops out of total supply
    .assign(
        pct_price_drops=lambda df: df['count_price_drop'] / df['for_sale_inventory']
    )
)


In [None]:
# Merge the monthly supply data (with price drops) with the demand data
# Note: Demand data is already in a monthly series, so we can directly join the datasets on 'parcl_id' and 'date'

supply_demand_data = (
    demand_df.copy(deep=True)
    .loc[:,['date', 'parcl_id', 'sales']]  # Select relevant columns from the demand DataFrame (date, parcl_id, and sales)
    .merge(supply_monthly,                    # Merge with the supply_monthly DataFrame that includes supply and price drop data
           on=['date', 'parcl_id'])           # Join on 'date' and 'parcl_id' to align data across markets and time periods
)

# Output the length of the combined supply and demand DataFrame to verify data consistency
print(f'Length of supply_demand_data: {len(supply_demand_data)}')

# Output the number of unique 'parcl_id' values to check how many markets are covered in the merged dataset
print(f'There are {len(supply_demand_data.parcl_id.unique())} unique parcl_ids in the supply_demand_data')


This new dataframe gives us a snapshot of each market's status, including for-sale inventory, the share of that inventory with price cuts, and sales activity. The next step is to calculate imbalances between supply and demand. The key idea is that, with the data we have so far, we can identify markets with dwindling demand and price drop pressure.

### 5. Supply & demand skew

In [None]:
# Sort the DataFrame by 'parcl_id' and 'date' to ensure chronological order for percentage change calculations
supply_demand_df_imbalances = (
    supply_demand_data.copy(deep=True)  # Create a deep copy of the supply_demand_data DataFrame to avoid modifying the original data
    .sort_values(['parcl_id', 'date'])  # Sort by 'parcl_id' and 'date'
    
    .assign(
        # Calculate percentage change in 'sales' over 12 periods (1 year) for each 'parcl_id'
        pct_change_demand=lambda df: df.groupby('parcl_id')['sales'].pct_change(periods=12),
       
        # Calculate percentage change in 'for_sale_inventory' over 12 periods for each 'parcl_id'
        pct_change_supply=lambda df: df.groupby('parcl_id')['for_sale_inventory'].pct_change(periods=12),
        
        # Calculate a 3-month moving average of percentage change in demand ('pct_change_demand')
        ma_pct_change_demand=lambda df: df.groupby('parcl_id')['pct_change_demand']
                                           .transform(lambda x: x.rolling(window=3).mean()),
        
        # Calculate a 3-month moving average of percentage change in supply ('pct_change_supply')
        ma_pct_change_supply=lambda df: df.groupby('parcl_id')['pct_change_supply']
                                           .transform(lambda x: x.rolling(window=3).mean())
                        
        )
    # Drop rows with missing values in the calculated columns
    .dropna(subset=['pct_change_demand', 'pct_change_supply', 'ma_pct_change_demand', 'ma_pct_change_supply'])
    .assign(
        gap_demand_supply=lambda df: df['ma_pct_change_supply'] - df['ma_pct_change_demand']   
        )
    .sort_values('gap_demand_supply', ascending=False)
    )
print(f'length of supply_demand_df_imbalances df is {len(supply_demand_df_imbalances)}')
print(f'there are {len(supply_demand_df_imbalances.parcl_id.unique())} unique parcl_ids in the supply_demand_df_imbalances data')
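
To make the skew calculation above concrete, here is a minimal sketch on a made-up monthly series for one hypothetical market. The `parcl_id` and all values are invented; the column names and the steps (12-period percent change, 3-month rolling mean, supply minus demand) mirror the cell above.

```python
import pandas as pd

# Hypothetical monthly series for one market: 15 months of made-up values
dates = pd.date_range("2023-07-01", periods=15, freq="MS")
toy = pd.DataFrame({
    "parcl_id": 1,
    "date": dates,
    "sales": [100] * 12 + [90, 80, 70],                  # demand falling YoY
    "for_sale_inventory": [200] * 12 + [220, 240, 260],  # supply rising YoY
})

toy = (
    toy.sort_values(["parcl_id", "date"])
    .assign(
        # Year-over-year change: compare each month with the same month one year earlier
        pct_change_demand=lambda df: df.groupby("parcl_id")["sales"].pct_change(periods=12),
        pct_change_supply=lambda df: df.groupby("parcl_id")["for_sale_inventory"].pct_change(periods=12),
        # Smooth the YoY changes with a 3-month moving average
        ma_pct_change_demand=lambda df: df.groupby("parcl_id")["pct_change_demand"]
                                          .transform(lambda x: x.rolling(window=3).mean()),
        ma_pct_change_supply=lambda df: df.groupby("parcl_id")["pct_change_supply"]
                                          .transform(lambda x: x.rolling(window=3).mean()),
    )
    # Positive gap: supply growth is outpacing demand growth
    .assign(gap_demand_supply=lambda df: df["ma_pct_change_supply"] - df["ma_pct_change_demand"])
)

usable = toy.dropna(subset=["ma_pct_change_demand", "ma_pct_change_supply"])
latest = usable.iloc[-1]
print(len(usable), round(latest["gap_demand_supply"], 4))
```

With 15 months of history only one row survives the `dropna`: 12 months are consumed by the year-over-year comparison and 2 more by the rolling window. In this toy series the smoothed supply change is +20% and the smoothed demand change is -20%, giving a gap of 0.4.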


In [15]:
# Clean up the 'markets' DataFrame by extracting the state and cleaning the market names
markets = (
    markets.assign(
        # Extract the state from the 'name' column by splitting on commas and hyphens, then standardizing it
        state=lambda df: df['name'].apply(lambda x: x.split(',')[-1].strip().upper().split('-')[0]),

        # Create a 'clean_name' by extracting the first part of 'name' and appending the state
        clean_name=lambda df: df.apply(
            lambda x: f"{x['name'].split('-')[0].split(',')[0].strip()}, {x['state']}", axis=1
        )
    )
    # Replace 'United States Of America, UNITED STATES OF AMERICA' with 'USA'
    .replace({'clean_name': {'United States Of America, UNITED STATES OF AMERICA': 'USA'}})
)
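
The string handling in the cell above is easier to follow on a few sample inputs. This sketch wraps the same split rules in a small helper; `parse_market_name` is a hypothetical name introduced only for illustration, and the input strings are assumed examples of how multi-city market names might be formatted, not values taken from the API.

```python
def parse_market_name(name: str) -> tuple[str, str]:
    """Return (clean_name, state) using the same split rules as the cell above."""
    # State: text after the last comma, upper-cased, first entry of a multi-state list
    state = name.split(',')[-1].strip().upper().split('-')[0]
    # City: text before the first hyphen, then before the first comma
    city = name.split('-')[0].split(',')[0].strip()
    return f"{city}, {state}", state


# Hypothetical inputs, formatted like multi-city metro names
print(parse_market_name("Dallas-Fort Worth-Arlington, Tx"))
print(parse_market_name("New York-Newark-Jersey City, Ny-Nj-Pa"))
print(parse_market_name("United States Of America"))
print(parse_market_name("Winston-Salem, Nc"))
```

A name with no comma, such as the national market, ends up with its full name repeated as the "state", which is why the cell above replaces that value with `USA`. Because the city is cut at the first hyphen, a hyphenated city name like the last example would be shortened to its first part under these rules.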


In [16]:
# Filter supply_demand_df_imbalances to the month defined in ANALYSIS_MONTHLY_SERIES,
# then merge with the 'markets' DataFrame to add names and states.
# Define the US parcl_id
usa_parcl_id = us["parcl_id"].values[0]
max_date = pd.to_datetime(ANALYSIS_MONTHLY_SERIES)

# Get the analysis month of imbalance data
supply_demand_imbalance_last = (
    supply_demand_df_imbalances.copy(deep=True)  # Create a deep copy of the supply_demand_df_imbalances DataFrame
    .loc[lambda df: df['date'] == max_date]  # Filter for the analysis month
    .merge(markets[['parcl_id', 'clean_name', 'state']], on='parcl_id')  # Merge with 'markets' to add 'clean_name' and 'state'
)
# Keep the US row separately so it is not included in the ranking.
# Give it a rank of None so it can be concatenated with the ranked markets later.
supply_demand_imbalance_last_us = (
    supply_demand_imbalance_last
    .query('parcl_id == @usa_parcl_id')
    .assign(rank=None)
)

In [None]:
# create ranking for the gap_demand_supply, filter US parcl_id
# this has the df with the rank of the gap_demand_supply  minus the US data
input_final_df_supply_demand_imbalance_last = (
    supply_demand_imbalance_last.copy(deep=True)
    # filter out US data
    .query("parcl_id!=@usa_parcl_id")
    # create a rank based on gap_demand_supply
    .assign(rank = lambda x: x['gap_demand_supply'].rank(ascending=False))
    )

input_final_df_supply_demand_imbalance_last_df = (
    supply_demand_imbalance_last_us.copy(deep=True)
    .loc[:,['date', 'parcl_id', 'pct_change_demand', 'pct_change_supply',
          'ma_pct_change_demand','ma_pct_change_supply','gap_demand_supply','clean_name', 'state','rank']]
    )
# check that we have 100 markets in the input_final_df_supply_demand_imbalance_last dataframe 
print(len(input_final_df_supply_demand_imbalance_last))
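
As a quick illustration of how the ranking above behaves, the sketch below ranks four hypothetical markets (names and gap values are made up) with the same `rank(ascending=False)` call.

```python
import pandas as pd

# Hypothetical markets with made-up gap values; C and D are tied on purpose
toy = pd.DataFrame({
    "clean_name": ["A", "B", "C", "D"],
    "gap_demand_supply": [0.10, 0.55, 0.30, 0.30],
})

ranked = (
    toy
    # Largest gap gets rank 1; pandas' default tie handling averages tied ranks
    .assign(rank=lambda x: x["gap_demand_supply"].rank(ascending=False))
    .sort_values("rank")
)

print(ranked)
```

Because pandas uses `method='average'` by default, the two tied markets both receive rank 2.5 rather than 2 and 3. If whole-number ranks are preferred, passing `method='min'` or `method='first'` to `rank` is one option.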

In [None]:
# Prepare the data for export; the US row is appended at the end
data_for_table = pd.concat([
    input_final_df_supply_demand_imbalance_last[
    ['date', 'parcl_id', 'pct_change_demand','pct_change_supply',
    'ma_pct_change_demand','ma_pct_change_supply',
    'gap_demand_supply','clean_name', 'state','rank']],
    input_final_df_supply_demand_imbalance_last_df[[
    'date', 'parcl_id', 'pct_change_demand', 'pct_change_supply',
    'ma_pct_change_demand','ma_pct_change_supply',
    'gap_demand_supply','clean_name', 'state','rank']]
    ]
    )
# print the length of the data, we should have the usa data and the rest of the data
print(len(data_for_table))

In [19]:
# save full rankings to csv
data_for_table_output_file = data_for_table[[
    'parcl_id', 
    'clean_name',
    'state',
    'rank',
    'date', 
    'ma_pct_change_demand',
    'ma_pct_change_supply',
    'gap_demand_supply',
]]

data_for_table_output_file = data_for_table_output_file.rename(columns={
    'clean_name': 'name',
    'ma_pct_change_demand': 'trend_pct_change_demand',
    'ma_pct_change_supply': 'trend_pct_change_supply',
    'gap_demand_supply': 'trend_gap_demand_supply'
})
imbalanced_with_price_changes_data_all = (
    supply_df.copy(deep=True)
    .merge(price_changes_df[['parcl_id', 'date', 'count_price_drop']], on=['parcl_id', 'date'])
    .sort_values(by=['parcl_id', 'date'], ascending=[True, True])
    .assign(pct_price_drops=lambda df: df['count_price_drop'] / df['for_sale_inventory'])
    # Calculate the 3-period rolling average of the price drop share for each parcl_id (3 weeks, since this data is weekly)
    .assign(
        ma_price_changes=lambda df: df.groupby('parcl_id')['pct_price_drops'].transform(lambda x: x.rolling(window=3).mean())
    )
    .groupby('parcl_id').tail(1)
)

# merge data with price changes
data_for_table_output_file = data_for_table_output_file.merge(imbalanced_with_price_changes_data_all[['parcl_id', 'ma_price_changes']], on='parcl_id', how='left')
data_for_table_output_file = data_for_table_output_file.rename(columns={'ma_price_changes': 'pct_inventory_with_price_cuts'})

# Save output in directory
data_for_table_output_file.to_csv(f'{ROOT_DIR}/september_rankings.csv', index=False)

In [None]:

# Further filter based on sales, inventory, and gap conditions
# We want to filter out the markets with low sales, low inventory, and a small gap between demand and supply
# We use a threshold of 500 for sales and inventory and 0.5 for gap_demand_supply, meaning supply growth
# outpaces demand growth by more than 50 percentage points
supply_demand_imbalance_last_filtered = (
    supply_demand_imbalance_last.copy(deep=True)
    .loc[
        (supply_demand_imbalance_last['sales'] > 500) & 
        (supply_demand_imbalance_last['for_sale_inventory'] > 500) 
        & (supply_demand_imbalance_last['gap_demand_supply'] > 0.5)

    ]
)

# Concatenate US-specific data with the filtered data
supply_demand_imbalance_last = pd.concat([supply_demand_imbalance_last_us, supply_demand_imbalance_last_filtered])

print(f'length of supply_demand_imbalance_last is {len(supply_demand_imbalance_last)}')
print(f'there are {len(supply_demand_imbalance_last.parcl_id.unique())} unique parcl_ids in the supply_demand_imbalance_last data')

In [None]:
# pass the list of parcl_ids to imbalanced_parcl_ids; this already includes the USA national data
imbalanced_parcl_ids = supply_demand_imbalance_last['parcl_id'].unique().tolist() 
print(f'before filtering for price cuts larger than the national average we have {len(imbalanced_parcl_ids)} imbalanced markets')

### 6. Active supply price drops

In [None]:
# now we will filter based on price drops
# Calculate the 3-period rolling average of price drops and keep the latest observation for each parcl_id
imbalanced_with_price_changes_data_all = (
    supply_df.copy(deep=True)
    .merge(price_changes_df[['parcl_id', 'date', 'count_price_drop']], on=['parcl_id', 'date'])
    .assign(pct_price_drops=lambda df: df['count_price_drop'] / df['for_sale_inventory'])
    .sort_values(by=['parcl_id', 'date'], ascending=[True, True])
    # Calculate the 3-period rolling average of the price drop share for each parcl_id (3 weeks, since this data is weekly)
    .assign(
        ma_price_changes=lambda df: df.groupby('parcl_id')['pct_price_drops'].transform(lambda x: x.rolling(window=3).mean())
    )
    # Keep the most recent observation for each parcl_id (rows are already sorted by parcl_id and date)
    .groupby('parcl_id').tail(1)
    # Sort by the rolling average of price changes in descending order
    .sort_values('ma_price_changes', ascending=False)
)
# define input for table
input_for_table_imbalanced_with_price_changes_data = imbalanced_with_price_changes_data_all.copy(deep=True)
print(f'length of imbalanced_with_price_changes_data_all is {len(imbalanced_with_price_changes_data_all)}')


In [None]:
# filter to only include markets with a gap larger than the defined threshold; the result should have
# the same length as imbalanced_parcl_ids
imbalanced_with_price_changes_data = (
    imbalanced_with_price_changes_data_all.copy(deep=True)
    # Further filter to include only imbalanced parcl_ids using query
    .query('parcl_id in @imbalanced_parcl_ids')
)
print(f'length of imbalanced_with_price_changes_data is {len(imbalanced_with_price_changes_data)}')

In [None]:
# get the value for the usa
us_price_changes = (imbalanced_with_price_changes_data
                    .query('(parcl_id == @usa_parcl_id)')
                    )['ma_price_changes'].values[0]
 
# filter based on the latest 3-period (3-week) moving average of inventory with price cuts at the national level
print(f'the price cut threshold for this month is {us_price_changes:.2%}')
# Filter the imbalanced markets based on price changes
print(f'before filtering for price cuts larger than the national average we have {len(imbalanced_with_price_changes_data.parcl_id.unique())} imbalanced markets')

In [25]:
# merge with markets info
imbalanced_with_price_changes_data = (
    imbalanced_with_price_changes_data.copy(deep=True)
    .merge(markets[['parcl_id', 'clean_name', 'state']], on='parcl_id')
    .sort_values('ma_price_changes', ascending=False)
    # filter on max date
    .loc[lambda df: df['date'] == df['date'].max()]
    )

In [None]:
# Filter and print how many observations we have
imbalanced_with_price_changes_data = imbalanced_with_price_changes_data.query('ma_price_changes > @us_price_changes')
imbalanced_parcl_ids_final = imbalanced_with_price_changes_data['parcl_id'].unique().tolist()
# add back comp to US
imbalanced_parcl_ids_final = imbalanced_parcl_ids_final + [usa_parcl_id]
# drop Baton Rouge due to volatility
imbalanced_parcl_ids_final = [x for x in imbalanced_parcl_ids_final if x != 2899589]
print(len(imbalanced_parcl_ids_final))
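
The benchmark filter above keeps only markets whose share of inventory with price cuts exceeds the national figure, then adds the US back for comparison. A minimal sketch on made-up data follows; `USA_ID` and the other ids are hypothetical stand-ins, and boolean masks are used here in place of `query` purely to keep the example self-contained.

```python
import pandas as pd

USA_ID = 0  # hypothetical id standing in for usa_parcl_id

# Latest smoothed share of inventory with price cuts per market (made-up values)
latest = pd.DataFrame({
    "parcl_id": [0, 11, 12, 13],
    "ma_price_changes": [0.30, 0.42, 0.25, 0.35],
})

# National benchmark: the US row's own value
us_benchmark = latest.loc[latest["parcl_id"] == USA_ID, "ma_price_changes"].values[0]

# Keep markets strictly above the benchmark; the US row drops out because it equals it
above = latest.loc[latest["ma_price_changes"] > us_benchmark]

# Add the US back so it can serve as a comparison point in later charts
final_ids = above["parcl_id"].unique().tolist() + [USA_ID]

print(f"threshold: {us_benchmark:.2%}, kept: {final_ids}")
```

Since the comparison is strict, the US never passes its own threshold, which is why the cell above appends `usa_parcl_id` explicitly after filtering.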

In [27]:
# filter supply demand based on the new list
supply_demand_imbalance_last = supply_demand_imbalance_last.query('parcl_id in @imbalanced_parcl_ids_final')

In [None]:
# Add a column to identify selected states
target_states = {'TX', 'FL'}
supply_demand_imbalance_last = supply_demand_imbalance_last.assign(
    color_group=lambda df: df['state'].apply(
        lambda x: 'FL, TX' if x in target_states else 'Other')
)

# Get the maximum date for the chart title
chart_max_date = supply_demand_imbalance_last['date'].max()
chart_max_date = chart_max_date.strftime('%B, %Y')


CHART_WIDTH = 1000
CHART_HEIGHT = 800
# Creating the scatter plot
fig = px.scatter(
    supply_demand_imbalance_last, 
    x='ma_pct_change_demand', 
    y='ma_pct_change_supply', 
    color='color_group',  # Use the new color_group column for color
    hover_name='clean_name', 
    title=f'YoY Changes in Supply vs. Demand ({chart_max_date})',
    color_discrete_map={'FL, TX':'red' , 'Other': 'blue'},  # Customize colors,
    text='clean_name'
)

fig.update_traces(
    textposition='top center',
    mode='markers+text'  # Ensure that both markers and text are displayed
)

fig.add_layout_image(
        create_labs_logo_dict()
    )

# Update axes labels and layout to format as a square
fig.update_layout(
    margin=dict(l=40, r=40, t=80, b=40),
    title={
        'y': 0.98,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': style_config['title_font']
    },
     xaxis=dict(
            title_text='YoY % Change Demand (Sales)',
            showgrid=style_config['showgrid'],
            gridwidth=style_config['gridwidth'],
            gridcolor=style_config['grid_color'],
            # tickangle=style_config['tick_angle'],
            tickformat='.0%',
            linecolor=style_config['line_color_axis'],
            linewidth=style_config['linewidth'],
            titlefont=style_config['title_font_axis'],
            zeroline=False,
        ),
        yaxis=dict(
            title_text='YoY % Change Supply',
            showgrid=style_config['showgrid'],
            gridwidth=style_config['gridwidth'],
            gridcolor=style_config['grid_color'],
            tickfont=style_config['axis_font'],
            zeroline=False,
            tickformat='.0%',
            linecolor=style_config['line_color_axis'],
            linewidth=style_config['linewidth'],
            titlefont=style_config['title_font_axis']
        ),
    plot_bgcolor=style_config['background_color'],
    paper_bgcolor=style_config['background_color'],
    font=dict(color=style_config['font_color']),
    legend_title_text='',
    autosize=False,
    height=CHART_HEIGHT,
    width=CHART_WIDTH,
    title_font=dict(size=24),
    xaxis_title_font=dict(size=18),
    yaxis_title_font=dict(size=18),
    legend_title_font=dict(size=14),
    legend_font=dict(size=12),
    legend=dict(
            x=style_config['legend_x'],
            y=style_config['legend_y'],
            xanchor=style_config['legend_xanchor'],
            yanchor=style_config['legend_yanchor'],
            font=style_config['legend_font'],
            bgcolor='rgba(0, 0, 0, 0)'
        ),
)
save_figure(fig, save_path=f'{ROOT_DIR}/changes_supply_yoy_scatter.png', 
            width=CHART_WIDTH, height=CHART_HEIGHT)
fig.show()


In [None]:
# Create the bar chart

# Select the columns needed for the chart and sort markets by gap so the bars appear in ascending order
merged_data = supply_demand_imbalance_last[['clean_name', 'ma_pct_change_demand', 'ma_pct_change_supply', 'gap_demand_supply']]
merged_data = merged_data.sort_values('gap_demand_supply', ascending=True)

# Melt the data for the bar chart
data_for_bar = pd.melt(merged_data, 
                       id_vars=['clean_name'], 
                       value_vars=['ma_pct_change_demand', 'ma_pct_change_supply'], 
                       var_name='type', 
                       value_name='percent_change')

data_for_bar['type'] = data_for_bar['type'].map({'ma_pct_change_demand': 'Demand', 
                                                 'ma_pct_change_supply': 'Supply',
                                                 })

fig = px.bar(data_for_bar, 
             x='clean_name', 
             y='percent_change', 
             color='type', 
             barmode='relative', 
             title=f'YoY Change in Supply and Demand ({chart_max_date})',
             labels={'percent_change': 'Percent Change', 'clean_name': 'Market'},
             color_discrete_map={'Demand': 'red', 'Supply': 'green'})

# Update the legend names
for trace in fig.data:
    if trace.name == 'Demand':
        trace.name = 'Demand (Sales)'
    elif trace.name == 'Supply':
        trace.name = 'Supply (Inventory)'

# Define dimensions
CHART_WIDTH = 1600
CHART_HEIGHT = 800

fig.update_layout(
    margin=dict(l=40, r=40, t=80, b=40),
    title={
        'y': 0.98,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': style_config['title_font']
    },
    xaxis=dict(
        title_text='',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis'],
        tickfont=dict(size=style_config['axis_font']['size'], color=style_config['axis_font']['color']),
        # showticklabels=False
    ),
    yaxis=dict(
        title_text='Percent Change',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        tickfont=style_config['axis_font'],
        zeroline=False,
        tickformat='.0%',
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    plot_bgcolor=style_config['background_color'],
    paper_bgcolor=style_config['background_color'],
    font=dict(color=style_config['font_color']),
    legend_title_text='',
    autosize=False,
    width=CHART_WIDTH,
    height=CHART_HEIGHT,
    title_font=dict(size=24),
    xaxis_title_font=dict(size=18),
    yaxis_title_font=dict(size=18),
    legend_title_font=dict(size=14),
    legend_font=dict(size=12),
    legend=dict(
        x=style_config['legend_x'],
        y=style_config['legend_y'],
        xanchor=style_config['legend_xanchor'],
        yanchor=style_config['legend_yanchor'],
        font=style_config['legend_font'],
        bgcolor='rgba(0, 0, 0, 0)'
    ),
)

fig.add_layout_image(create_labs_logo_dict())
save_figure(fig, save_path=f'{ROOT_DIR}/changes_supply_yoy_bar.png', 
            width=CHART_WIDTH, height=CHART_HEIGHT)

fig.show()
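
The bar chart above relies on reshaping the data from wide to long with `pd.melt` so that Plotly can color bars by series. The sketch below shows that reshape on two hypothetical markets with made-up values, using the same column names as the cell above.

```python
import pandas as pd

# Wide format: one row per market, one column per series (made-up values)
wide = pd.DataFrame({
    "clean_name": ["A", "B"],
    "ma_pct_change_demand": [-0.2, -0.1],
    "ma_pct_change_supply": [0.4, 0.5],
})

# Long format: one row per (market, series) pair
long = pd.melt(
    wide,
    id_vars=["clean_name"],
    value_vars=["ma_pct_change_demand", "ma_pct_change_supply"],
    var_name="type",
    value_name="percent_change",
)

# Replace the raw column names with readable labels for the legend
long["type"] = long["type"].map({
    "ma_pct_change_demand": "Demand",
    "ma_pct_change_supply": "Supply",
})

print(long)
```

Two markets with two series each become four rows, one per bar segment. The `type` column is what `px.bar` receives as `color`.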

In [30]:
# save data from scatter plot and bar plot
supply_demand_imbalance_out = supply_demand_imbalance_last[[
    'parcl_id',
    'clean_name',
    'date',
    'ma_pct_change_demand',
    'ma_pct_change_supply',
    'gap_demand_supply'
]]

supply_demand_imbalance_out = supply_demand_imbalance_out.rename(columns={
    'clean_name': 'name',
    'ma_pct_change_demand': 'trend_pct_change_demand',
    'ma_pct_change_supply': 'trend_pct_change_supply'
})

supply_demand_imbalance_out.to_csv(f'{ROOT_DIR}/changes_supply_yoy_gap_bar_scatter_data.csv', index=False)

In [None]:
# Select the gap column and sort markets by gap so the bars appear in ascending order
merged_data = supply_demand_imbalance_last[['clean_name', 'gap_demand_supply']]
merged_data = merged_data.sort_values('gap_demand_supply', ascending=True)

# No need to melt the data as we are only using one value (gap_demand_supply)
data_for_bar = merged_data.copy()

# Create the bar chart for gap_demand_supply with orange color
fig = px.bar(data_for_bar, 
             x='clean_name', 
             y='gap_demand_supply', 
             title=f'Gap Between Demand and Supply ({chart_max_date})',
             labels={'gap_demand_supply': 'Gap (Percentage)', 'clean_name': 'Market'},
             color_discrete_sequence=['orange'])  # Set the bar color to orange

# Define dimensions
CHART_WIDTH = 1600
CHART_HEIGHT = 800

fig.update_layout(
    margin=dict(l=40, r=40, t=80, b=40),
    title={
        'y': 0.98,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': style_config['title_font']
    },
    xaxis=dict(
        title_text='',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis'],
        tickfont=dict(size=style_config['axis_font']['size'], color=style_config['axis_font']['color']),
    ),
    yaxis=dict(
        title_text='Gap (Percentage)',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        tickfont=style_config['axis_font'],
        zeroline=False,
        tickformat='.0%',
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    plot_bgcolor=style_config['background_color'],
    paper_bgcolor=style_config['background_color'],
    font=dict(color=style_config['font_color']),
    autosize=False,
    width=CHART_WIDTH,
    height=CHART_HEIGHT,
    title_font=dict(size=24),
    xaxis_title_font=dict(size=18),
    yaxis_title_font=dict(size=18),
)

# Add any custom images (like a logo)
fig.add_layout_image(create_labs_logo_dict())

# Save the figure
save_figure(fig, save_path=f"{ROOT_DIR}/changes_supply_gap_bar.png", 
            width=CHART_WIDTH, height=CHART_HEIGHT)

# Show the chart
fig.show()

### 7. New construction impact on supply

In [None]:
# Retrieve housing event counts for all listings and for new construction, paginating through every result
new_listings = client.market_metrics.housing_event_counts.retrieve(
    parcl_ids=imbalanced_parcl_ids_final,
    start_date='2024-09-01',
    auto_paginate=True
    # limit =1 # limit to 1 to get the most recent data
)

new_listings_construction = client.new_construction_metrics.housing_event_counts.retrieve(
    parcl_ids=imbalanced_parcl_ids_final,
    start_date='2024-09-01',
    auto_paginate=True
    # limit=1 # limit to 1 to get the most recent data
)

In [None]:
# Rename the columns to distinguish between new listings and new construction data
new_listings_construction = (
    new_listings_construction
    .rename(columns={'new_listings_for_sale': 'new_construction_new_listings_for_sale'})
    )

# Output the length of the new listings data to confirm the amount of data retrieved
print(f'Length of new_listings data: {len(new_listings)} and nc data: {len(new_listings_construction)}')

# Output the number of unique 'parcl_id' values to verify coverage across different markets
print(f'There are {len(new_listings.parcl_id.unique())} unique parcl_ids in the new_listings data and'
      f' {len(new_listings_construction.parcl_id.unique())} unique parcl_ids in the new construction data')


In [34]:
# Merge new listings data with new construction listings, calculate percentage, and merge with market names
new_listings_all = (
    new_listings
    # Merge on 'parcl_id' and 'date' so each month is paired only with the same month
    .merge(new_listings_construction[['parcl_id', 'date', 'new_construction_new_listings_for_sale']], 
           on=['parcl_id', 'date'])
    
    # Calculate the percentage of new construction listings out of total new listings
    .assign(
        pct_new_construction=lambda x: x['new_construction_new_listings_for_sale'] / x['new_listings_for_sale']
    )
    
    # Merge with the 'markets' DataFrame to add clean market names based on 'parcl_id'
    .merge(markets[['parcl_id', 'clean_name']], on='parcl_id')
    
)
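
A minimal sketch of the new construction share calculation on toy data, assuming both frames carry `parcl_id` and `date` columns. The IDs and counts below are made up for illustration and are not Parcl Labs data. Joining on both keys keeps each month paired with itself, while joining on `parcl_id` alone pairs every month with every other month for that market.

```python
import pandas as pd

# Toy inputs: two markets, two months each (all values are made up).
all_listings = pd.DataFrame({
    'parcl_id': [1, 1, 2, 2],
    'date': pd.to_datetime(['2024-08-01', '2024-09-01'] * 2),
    'new_listings_for_sale': [100, 120, 50, 40],
})
nc_listings = pd.DataFrame({
    'parcl_id': [1, 1, 2, 2],
    'date': pd.to_datetime(['2024-08-01', '2024-09-01'] * 2),
    'new_construction_new_listings_for_sale': [10, 30, 5, 20],
})

# Join on both keys; validate raises if either side has duplicate key pairs.
share = (
    all_listings
    .merge(nc_listings, on=['parcl_id', 'date'], validate='one_to_one')
    .assign(pct_new_construction=lambda df:
            df['new_construction_new_listings_for_sale'] / df['new_listings_for_sale'])
)

# For comparison: joining on parcl_id alone multiplies the rows per market.
cross = all_listings.merge(nc_listings, on='parcl_id')
print(len(share), len(cross))
```

The `validate='one_to_one'` argument is optional; it is included here as a cheap guard against duplicated market-month rows.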

In [35]:
# Prepare data for the bar chart: keep the most recent month, sort, and reshape to long format
data_for_bar = (
    new_listings_all
    .loc[lambda df: df['date'] == df['date'].max()]  # Filter for the most recent date
    .sort_values('pct_new_construction', ascending=True)  # Sort by percentage of new construction
    .pipe(
        lambda df: pd.melt(df, id_vars=['clean_name'], 
                           value_vars=['pct_new_construction'], 
                           var_name='type', 
                           value_name='percentage')  # Reshape for bar chart
    )
)

In [None]:
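# Format the latest month in the new listings data for the chart title
chart_max_date = new_listings_all['date'].max().strftime('%B, %Y')

# Create the bar chart (only one series is plotted, so stacking has no visual effect)
fig = px.bar(data_for_bar, 
             x='clean_name', 
             y='percentage', 
             color='type', 
             barmode='stack', 
             title=f'Percent of New Listings Coming from New Construction ({chart_max_date})',
             labels={'percentage': 'Percentage', 'clean_name': 'Market'},
             color_discrete_map={'pct_new_construction': 'orange'})  # Keys are values of the 'type' column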

CHART_WIDTH = 1600
CHART_HEIGHT = 800

# Update the layout to remove the legend
fig.update_layout(
    margin=dict(l=40, r=40, t=80, b=40),
    title={
        'y': 0.98,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': style_config['title_font']
    },
    xaxis=dict(
        title_text='',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis'],
        tickfont=dict(size=style_config['axis_font']['size'], color=style_config['axis_font']['color']),
    ),
    yaxis=dict(
        title_text='% of New Inventory',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        tickfont=style_config['axis_font'],
        zeroline=False,
        tickformat='.0%',
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    plot_bgcolor=style_config['background_color'],
    paper_bgcolor=style_config['background_color'],
    font=dict(color=style_config['font_color']),
    autosize=False,
    width=CHART_WIDTH,
    height=CHART_HEIGHT,
    title_font=dict(size=24),
    xaxis_title_font=dict(size=18),
    yaxis_title_font=dict(size=18),
    legend=dict(
        x=style_config['legend_x'],
        y=style_config['legend_y'],
        xanchor=style_config['legend_xanchor'],
        yanchor=style_config['legend_yanchor'],
        font=style_config['legend_font'],
        bgcolor='rgba(0, 0, 0, 0)'
    ),
    showlegend=False  # This will hide the legend
)

fig.add_layout_image(create_labs_logo_dict())
save_figure(fig, save_path=f'{ROOT_DIR}/pct_new_listings_construction_bar.png', 
            width=CHART_WIDTH, height=CHART_HEIGHT)

fig.show()


In [37]:
# save the data
nc_out_data = pd.merge(data_for_bar, markets[['clean_name', 'parcl_id']], on='clean_name')
nc_out_data = nc_out_data[['parcl_id', 'clean_name', 'percentage']]
nc_out_data = nc_out_data.rename(columns={
    'clean_name': 'name',
    'percentage': 'pct_new_construction'
})
nc_out_data.to_csv(f'{ROOT_DIR}/pct_new_listings_construction_bar.csv', index=False)

In [38]:
# Clean and process price changes data, calculating percentage of price drops and merging relevant columns
price_changes_skewed = (
    price_changes_df.copy(deep=True)
    # Filter for relevant parcl_ids using the pre-combined list
    .query('parcl_id in @imbalanced_parcl_ids_final')
    )


price_changes_skewed = (
    price_changes_skewed
    # Merge with the supply data on 'parcl_id' and 'date' to bring in for_sale_inventory
    .merge(supply_df[['parcl_id', 'date', 'for_sale_inventory']], on=['parcl_id', 'date'])
    
    # Calculate the percentage of price drops relative to the for_sale_inventory
    .assign(
        pct_price_drops=lambda df: df['count_price_drop'] / df['for_sale_inventory']
    )
    
    # Merge with the markets DataFrame to add clean market names
    .merge(markets[['parcl_id', 'clean_name']], on='parcl_id')
)

In [None]:
price_changes_skewed.groupby('parcl_id').tail(1)

In [None]:
# Get max date for chart
max_date_for_chart = price_changes_skewed['date'].max().date()
max_date_for_chart = max_date_for_chart.strftime('%B %d, %Y')

CHART_WIDTH = 1600
CHART_HEIGHT = 800
# Create the line chart using Plotly Express
fig = px.line(
    price_changes_skewed,
    x='date',
    y='pct_price_drops',
    color='clean_name',
    line_group='clean_name',
    labels={'pct_price_drops': '% of Inventory with Price Cuts'},
    title=f'Percentage of Inventory with Price Reductions ({max_date_for_chart})'
)

# Update traces to apply specific styles
for trace in fig.data:
    if trace.name == 'USA':
        trace.update(
            line=dict(color='red', width=4),
            opacity=1
        )
    else:
        trace.update(
            line=dict(color='lightblue', dash='dash', width=2),
            opacity=0.8
        )
    # Remove text annotations from traces
    trace.update(
        mode='lines'
    )

# Find the latest date in the dataset
latest_date = max(price_changes_skewed['date'])

# Add annotations for each line on the far right
annotations = []
y_positions = []

for trace in fig.data:
    # Get the last y-value for each clean_name
    last_y_value = price_changes_skewed[
        (price_changes_skewed['clean_name'] == trace.name) &
        (price_changes_skewed['date'] == latest_date)
    ]['pct_price_drops'].values[0]
    
    # Only add the annotation if it doesn't overlap with existing annotations
    if not any(abs(last_y_value - y) < 0.02 for y in y_positions):  # Adjust threshold as needed
        annotations.append(dict(
            x=latest_date,
            y=last_y_value,
            xref='x',
            yref='y',
            text=trace.name,
            showarrow=False,
            xanchor='left',
            font=dict(size=12)  # Adjust the font size if needed
        ))
        y_positions.append(last_y_value)

fig.add_layout_image(
        create_labs_logo_dict()
)

# Update layout for axes, title, and other styling
fig.update_layout(
    width=CHART_WIDTH,
    height=CHART_HEIGHT,
    xaxis=dict(
        title='',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        # tickangle=style_config['tick_angle'],
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    yaxis=dict(
        title='% Price Reductions',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        tickfont=style_config['axis_font'],
        zeroline=False,
        tickformat='.0%',
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    plot_bgcolor=style_config['background_color'],
    paper_bgcolor=style_config['background_color'],
    font=dict(color=style_config['font_color']),
    showlegend=False,  # Remove the legend
    margin=dict(l=40, r=40, t=80, b=40),
    title={
        'y': 0.98,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': dict(size=24)
    },
    annotations=annotations  # Add annotations
)
save_figure(fig, save_path=f'{ROOT_DIR}/pct_inventory_price_reductions_line_chart.png', 
            width=CHART_WIDTH, height=CHART_HEIGHT)

fig.show()
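
The end-of-line labelling used in the chart above can be isolated into a small helper, which makes the overlap rule easier to test and tune. This is a sketch only: `pick_end_labels` and its `min_gap` parameter are hypothetical names, not part of the notebook or of Plotly.

```python
def pick_end_labels(last_values, min_gap=0.02):
    """Return the series names whose final y-values are at least `min_gap` apart.

    `last_values` maps a series name to its last y-value. Names are visited in
    insertion order, so earlier entries win when two lines end close together.
    """
    kept = {}
    for name, y in last_values.items():
        # Keep the label only if it is far enough from every label kept so far.
        if all(abs(y - other) >= min_gap for other in kept.values()):
            kept[name] = y
    return list(kept)


labels = pick_end_labels({'USA': 0.30, 'Market A': 0.31, 'Market B': 0.45})
print(labels)  # ['USA', 'Market B']
```

Because the first entry always wins, putting the benchmark series first in the mapping guarantees it is labelled.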

### 8. Appreciation since COVID

In [41]:
# Save data for the line chart
price_changes_data = price_changes_skewed[['parcl_id', 'clean_name', 'date', 'pct_price_drops']]
price_changes_data = price_changes_data.rename(columns={'clean_name': 'name'})
price_changes_data.to_csv(f'{ROOT_DIR}/pct_inventory_price_reductions_line_chart_data.csv', index=False)

In [None]:
# Filter to the markets most out of balance on supply and demand, plus parcl_id 5826765 for comparison
prices_need_to_give_back = prices_df.loc[prices_df['parcl_id'].isin(imbalanced_parcl_ids_final + [5826765])].copy(deep=True)
print(f'There are {len(prices_need_to_give_back)} observations in the price history df.')
print(f'There are {len(prices_need_to_give_back["parcl_id"].unique())} markets with substantial price reductions and distressed demand.')

In [43]:
# We will iterate over the parcl_ids to get the time series analysis and identify what
# parcls need to give back the most from the beginning of the pandemic compared to the USA
all_rows = []
for pid in prices_need_to_give_back['parcl_id'].unique().tolist():
    prices_skew_test = prices_need_to_give_back.copy(deep=True).loc[prices_need_to_give_back['parcl_id']==pid]
    price_ts_analysis = TimeSeriesAnalysis(prices_skew_test, 'date', 'price_per_square_foot_median_sales', freq='M')
    price_rate_of_change_stats = price_ts_analysis.calculate_changes(change_since_date='3/1/2020')
    row = pd.json_normalize(price_rate_of_change_stats)
    row['parcl_id'] = pid
    all_rows.append(row)

In [44]:
# Repeat the time series analysis for each unique parcl_id using a list comprehension.
# Note: `all_rows` is reused here to hold the list of parcl_ids, replacing the rows built above.
all_rows = (
    prices_need_to_give_back['parcl_id'].unique()  # Get the unique parcl_ids
    .tolist()  # Convert to a list for iteration
)

ts_analysis = pd.concat([
    pd.json_normalize(
        TimeSeriesAnalysis(
            prices_need_to_give_back.query('parcl_id == @pid'),  # Filter for each parcl_id
            'date', 'price_per_square_foot_median_sales', freq='M'  # Perform time series analysis
        ).calculate_changes(change_since_date='3/1/2020')  # Calculate changes since 3/1/2020
    ).assign(parcl_id=pid)  # Add the parcl_id to the result
    for pid in all_rows  # Iterate over each unique parcl_id
], ignore_index=True)

In [45]:
# Get unique parcl_ids
all_rows = prices_need_to_give_back['parcl_id'].unique().tolist()

# Calculate price changes for the last month (September 2024)
last_month_changes = pd.concat([
    pd.json_normalize(
        TimeSeriesAnalysis(
            prices_need_to_give_back.query('parcl_id == @pid'),  # Filter for each parcl_id
            'date', 'price_per_square_foot_median_sales', freq='M'  # Perform time series analysis
        ).calculate_changes(change_since_date='3/1/2020')  # Calculate changes since 3/1/2020 for the last month
    ).assign(parcl_id=pid)  # Add the parcl_id to the result
    for pid in all_rows
], ignore_index=True)

# Calculate price changes for the second-to-last month (August 2024)
second_last_month_changes = pd.concat([
    pd.json_normalize(
        TimeSeriesAnalysis(
            prices_need_to_give_back.query('parcl_id == @pid and date <= "2024-08-31"'),  # Filter for each parcl_id up to August 2024
            'date', 'price_per_square_foot_median_sales', freq='M'  # Perform time series analysis
        ).calculate_changes(change_since_date='3/1/2020')  # Calculate changes since 3/1/2020 for the second-to-last month
    ).assign(parcl_id=pid)  # Add the parcl_id to the result
    for pid in all_rows
], ignore_index=True)
second_last_month_changes = second_last_month_changes.rename(columns={'change_since_date.percent_change': 'change_august'})

# Merge the two results on parcl_id
ts_analysis = last_month_changes.merge(
    second_last_month_changes[['parcl_id','change_august']],
    on='parcl_id',
)

# Now `ts_analysis` contains both the last month and second-to-last month changes.
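
For intuition, the comparison of two cutoffs can be sketched with plain pandas. This does not use `TimeSeriesAnalysis`, whose internals are not shown in this notebook; `change_since` is a hypothetical helper, the prices are made up, and the result is a fraction (0.5 means +50%), which may differ from the units the helper class reports.

```python
import pandas as pd


def change_since(series, since, cutoff=None):
    """Fractional change from the first observation on or after `since`
    to the last observation on or before `cutoff` (or the end of the series)."""
    window = series.sort_index().loc[since:cutoff]
    return window.iloc[-1] / window.iloc[0] - 1


# Toy price-per-square-foot series for one market (values are made up).
prices = pd.Series(
    [200.0, 250.0, 300.0, 294.0],
    index=pd.to_datetime(['2020-03-01', '2022-06-01', '2024-08-01', '2024-09-01']),
)

change_september = change_since(prices, '2020-03-01')
change_august = change_since(prices, '2020-03-01', cutoff='2024-08-31')
print(round(change_september, 2), round(change_august, 2))
```

A market whose September figure sits below its August figure has started to give back some of its gains since March 2020.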


In [46]:
# Prepare the data for the line chart
hf = (
    ts_analysis
)

In [None]:

# Merge filtered hf with markets DataFrame and retrieve the unique parcl_ids in a chained operation
parcls_need_to_give_back_list = (
    hf.loc[:, ['parcl_id', 'peak_to_current.percent_change', 'change_since_date.percent_change','change_august']]  # Use .loc[] for column selection
    # Merge with markets DataFrame to add 'clean_name'
    .merge(markets[['parcl_id', 'clean_name']], on='parcl_id')
    
    # Extract unique parcl_id values and convert them to a list
    .parcl_id.unique().tolist()
)

# parcls_need_to_give_back_list contains the unique parcl_ids after the merge
print(len(parcls_need_to_give_back_list))


In [48]:
# Filter prices_df based on parcl_id from parcls_need_to_give_back_list and a specific parcl_id (5826765)

prices_need_to_give_back_df = (
    prices_df
    # Filter rows where parcl_id is in the list plus the specific parcl_id 5826765
    .loc[prices_df['parcl_id'].isin(parcls_need_to_give_back_list + [5826765])]
)

In [None]:
# Show percent change relative to the first value after 2020-03-01

chart = (
    prices_need_to_give_back_df
    # Filter rows where the date is greater than or equal to '2020-03-01'
    .loc[lambda df: df['date'] >= '2020-03-01']
    
    # Sort the filtered data by date
    .sort_values('date')
    
    # Select relevant columns for further processing
    .loc[:, ['date', 'parcl_id', 'price_per_square_foot_median_sales']]
    
    # Merge the current data with the first value for each 'parcl_id' on '3/1/2020'
    .merge(
        prices_need_to_give_back_df
        .loc[lambda df: df['date'] == '2020-03-01', ['parcl_id', 'price_per_square_foot_median_sales']]
        .rename(columns={'price_per_square_foot_median_sales': 'start'}),
        on='parcl_id'
    )
    
    # Calculate the percentage change relative to the start value
    .assign(
        pct_change=lambda df: (df['price_per_square_foot_median_sales'] - df['start']) / df['start'],
        max_value = lambda df: df.groupby('parcl_id')['price_per_square_foot_median_sales'].transform('max')
    )
    # Merge the data with the markets DataFrame to add clean market names
    .merge(markets[['parcl_id', 'clean_name']], on='parcl_id')
    .assign(diff_from_peak = lambda x: (x['max_value'] - x['price_per_square_foot_median_sales'])/x['max_value'])
)

prices_since_last_report = (
    prices_need_to_give_back_df.copy(deep=True)
    # Filter rows where the date is greater than or equal to '2020-03-01'
    .query('date >= "2020-03-01"')
    
    # Sort the filtered data by date
    .sort_values(by =['parcl_id','date'])
    
    # Select relevant columns for further processing
    .loc[:, ['date', 'parcl_id', 'price_per_square_foot_median_sales']]

    .assign(
        change_price_mom = lambda df: df.groupby('parcl_id')['price_per_square_foot_median_sales'].pct_change(),
    )
    
    .query('date == "2024-09-01"')
    .merge(markets[['parcl_id', 'clean_name']], on='parcl_id')
)
prices_since_last_report
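
The rebasing step can also be written without a self-merge. The sketch below uses `groupby(...).transform('first')` to take each market's first observation in the window as its baseline, a swapped-in technique rather than the notebook's merge on the exact start date. The column name `ppsf` and all values are illustrative assumptions.

```python
import pandas as pd

# Toy data: two markets, three observations each (values are made up).
prices = pd.DataFrame({
    'parcl_id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2020-03-01', '2022-06-01', '2024-09-01'] * 2),
    'ppsf': [100.0, 150.0, 135.0, 200.0, 220.0, 242.0],
})

rebased = (
    prices
    .sort_values(['parcl_id', 'date'])
    .assign(
        # Baseline and peak per market, broadcast back onto every row.
        start=lambda df: df.groupby('parcl_id')['ppsf'].transform('first'),
        peak=lambda df: df.groupby('parcl_id')['ppsf'].transform('max'),
    )
    .assign(
        pct_change=lambda df: df['ppsf'] / df['start'] - 1,
        diff_from_peak=lambda df: (df['peak'] - df['ppsf']) / df['peak'],
    )
)

# Latest observation per market.
latest = rebased.groupby('parcl_id').tail(1).set_index('parcl_id')
print(latest[['pct_change', 'diff_from_peak']])
```

One design consequence: a market with no row exactly on the start date still gets a baseline here, whereas an inner merge on the exact date drops it silently.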

In [None]:

# get max date
chart_max_date = chart['date'].max()
chart_max_date = chart_max_date.strftime('%B, %Y')
print(chart_max_date)

CHART_WIDTH = 1600
CHART_HEIGHT = 800

fig = px.line(
    chart,
    x='date',
    y='pct_change',
    color='clean_name',
    line_group='clean_name',
    labels={'pct_change': '% Change'},
    title=f'% Change in Home Values since the Start of the Pandemic ({chart_max_date})'
)

# Update traces to apply specific styles
for trace in fig.data:
    if trace.name == 'USA':
        trace.update(
            line=dict(color='red', width=4),
            opacity=1
        )
    else:
        trace.update(
            line=dict(color='lightblue', dash='dash', width=2),
            opacity=0.8
        )
    # Remove text annotations from traces
    trace.update(
        mode='lines'
    )

# Find the latest date in the dataset
latest_date = max(chart['date'])

# Add annotations for each line on the far right
annotations = []
y_positions = []

for trace in fig.data:
    # Get the last y-value for each clean_name
    last_y_value = chart[
        (chart['clean_name'] == trace.name) &
        (chart['date'] == latest_date)
    ]['pct_change'].values[0]
    
    # Only add the annotation if it doesn't overlap with existing annotations
    if not any(abs(last_y_value - y) < 0.02 for y in y_positions):  # Adjust threshold as needed
        annotations.append(dict(
            x=latest_date,
            y=last_y_value,
            xref='x',
            yref='y',
            text=trace.name,
            showarrow=False,
            xanchor='left',
            font=dict(size=12)  # Adjust the font size if needed
        ))
        y_positions.append(last_y_value)

fig.add_layout_image(
        create_labs_logo_dict()
)

# Update layout for axes, title, and other styling
fig.update_layout(
    width=CHART_WIDTH,
    height=CHART_HEIGHT,
    xaxis=dict(
        title='',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        # tickangle=style_config['tick_angle'],
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    yaxis=dict(
        title='% Change',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        tickfont=style_config['axis_font'],
        zeroline=False,
        tickformat='.0%',
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    plot_bgcolor=style_config['background_color'],
    paper_bgcolor=style_config['background_color'],
    font=dict(color=style_config['font_color']),
    showlegend=False,  # Remove the legend
    margin=dict(l=40, r=40, t=80, b=40),
    title={
        'y': 0.98,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': dict(size=24)
    },
    annotations=annotations  # Add annotations
)
save_figure(fig, save_path=f'{ROOT_DIR}/pct_change_home_values_since_covid_line_chart.png', 
            width=CHART_WIDTH, height=CHART_HEIGHT)
fig.show()


In [51]:
# save data
chart_data = chart[['parcl_id', 'clean_name', 'date', 'pct_change']]
chart_data.to_csv(f'{ROOT_DIR}/pct_change_home_values_since_covid_line_chart_data.csv', index=False)

### 9. Real time price check

In [None]:
START_DATE = '2020-03-01'

# Isolate markets in the list that have price feeds.
# Each condition is parenthesized because `&` binds more tightly than `==`.
pf_ids = markets.loc[
    (markets['parcl_id'].isin(imbalanced_parcl_ids_final)) & (markets['pricefeed_market'] == 1),
    'parcl_id'
].tolist()

# WARNING: auto-pagination can use ~6k credits for a single Parcl price feed.
# Change START_DATE to a more recent date to reduce the number of credits used.
sales_price_feeds = client.price_feed.price_feed.retrieve(
     parcl_ids=pf_ids,
     start_date=START_DATE,
     limit=1000,  # expand the limit to 1000, these are daily series
     auto_paginate=True,  # auto paginate to get all the data
)
print(len(pf_ids))
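
When combining boolean conditions in pandas, each comparison needs its own parentheses, because Python's `&` operator binds more tightly than `==`. A small sketch on toy data (the frame and IDs below are made up):

```python
import pandas as pd

# Toy markets table: 1 marks a market that has a price feed.
markets_toy = pd.DataFrame({
    'parcl_id': [10, 20, 30, 40],
    'pricefeed_market': [1, 0, 1, 0],
})
wanted_ids = [10, 20, 40]

# Parenthesize the comparison so it is evaluated before the `&`.
mask = markets_toy['parcl_id'].isin(wanted_ids) & (markets_toy['pricefeed_market'] == 1)
pf_ids_toy = markets_toy.loc[mask, 'parcl_id'].tolist()
print(pf_ids_toy)  # [10]
```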

In [53]:
# Show percent change for sales price feeds relative to the first value after 2020-03-01

chart_pf = (
    sales_price_feeds
    # Sort the data by date
    .sort_values('date')
    
    # Select relevant columns for further processing
    .loc[:, ['date', 'parcl_id', 'price_feed']]
    
    # Merge the current data with the first value for each 'parcl_id' on '3/1/2020'
    .merge(
        sales_price_feeds
        .loc[lambda df: df['date'] == '2020-03-01', ['parcl_id', 'price_feed']]
        .rename(columns={'price_feed': 'start'}),
        on='parcl_id'
    )
    
    # Calculate the percentage change relative to the start value
    .assign(
        pct_change=lambda df: (df['price_feed'] - df['start']) / df['start']
    )
    
    # Merge the data with the markets DataFrame to add clean market names
    .merge(markets[['parcl_id', 'clean_name']], on='parcl_id')
)


In [None]:
# create chart
chart_max_date = chart_pf['date'].max()
chart_max_date = chart_max_date.strftime('%B %d, %Y')
print(chart_max_date)

fig = px.line(
    chart_pf,
    x='date',
    y='pct_change',
    color='clean_name',
    line_group='clean_name',
    labels={'pct_change': '% Change'},
    title=f'% Change in Pricefeed since the Start of the Pandemic ({chart_max_date})'
)

# Update traces to apply specific styles
for trace in fig.data:
    if trace.name == 'USA':
        trace.update(
            line=dict(color='red', width=4),
            opacity=1
        )
    else:
        trace.update(
            line=dict(color='lightblue', dash='dash', width=2),
            opacity=0.8
        )
    # Remove text annotations from traces
    trace.update(
        mode='lines'
    )

# Find the latest date in the dataset
latest_date = max(chart_pf['date'])

# Add annotations for each line on the far right
annotations = []
y_positions = []

for trace in fig.data:
    # Get the last y-value for each clean_name
    last_y_value = chart_pf[
        (chart_pf['clean_name'] == trace.name) &
        (chart_pf['date'] == latest_date)
    ]['pct_change'].values[0]
    
    # Only add the annotation if it doesn't overlap with existing annotations
    if not any(abs(last_y_value - y) < 0.02 for y in y_positions):  # Adjust threshold as needed
        annotations.append(dict(
            x=latest_date,
            y=last_y_value,
            xref='x',
            yref='y',
            text=trace.name,
            showarrow=False,
            xanchor='left',
            font=dict(size=12)  # Adjust the font size if needed
        ))
        y_positions.append(last_y_value)

fig.add_layout_image(
        create_labs_logo_dict()
)

# Update layout for axes, title, and other styling
fig.update_layout(
    width=CHART_WIDTH,
    height=CHART_HEIGHT,
    xaxis=dict(
        title='',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        # tickangle=style_config['tick_angle'],
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    yaxis=dict(
        title='% Change',
        showgrid=style_config['showgrid'],
        gridwidth=style_config['gridwidth'],
        gridcolor=style_config['grid_color'],
        tickfont=style_config['axis_font'],
        zeroline=False,
        tickformat='.0%',
        linecolor=style_config['line_color_axis'],
        linewidth=style_config['linewidth'],
        titlefont=style_config['title_font_axis']
    ),
    plot_bgcolor=style_config['background_color'],
    paper_bgcolor=style_config['background_color'],
    font=dict(color=style_config['font_color']),
    showlegend=False,  # Remove the legend
    margin=dict(l=40, r=40, t=80, b=40),
    title={
        'y': 0.98,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': dict(size=24)
    },
    annotations=annotations  # Add annotations
)
save_figure(fig, save_path=f'{ROOT_DIR}/realtime_pct_change_home_values_since_covid_line_chart.png',
            width=CHART_WIDTH, height=CHART_HEIGHT)
fig.show()

In [55]:
# Save the data
chart_pf_data = chart_pf[['parcl_id', 'clean_name', 'date', 'pct_change']]
chart_pf_data = chart_pf_data.rename(columns={'clean_name': 'name'})
chart_pf_data.to_csv(f'{ROOT_DIR}/realtime_pct_change_home_values_since_covid_line_chart_data.csv', index=False)