<center>
<h1>Welcome to the Lab 🥼🧪</h1>
</center>

### How can I analyze dispositions by SFR operators, and how profitable were those sales?

In this notebook, we will examine homes sold by Invitation Homes in the Houston MSA over the last 12 months (as of July 22, 2025). We will use the Parcl Labs API to get the event history for the properties of interest.

#### Need help getting started?

As a reminder, you can get your Parcl Labs API key [here](https://app.parcllabs.com/) to follow along.

To run this immediately, you can use Google Colab.

Run in Colab --> [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ParclLabs/parcllabs-cookbook/blob/main/examples/experimental/inventory_analysis_invitation_homes.ipynb)

In [20]:
# install most recent version of parcllabs
%pip install --upgrade parcllabs

Note: you may need to restart the kernel to use updated packages.


In [22]:
# import the libraries for the analysis
import os
import pandas as pd
import numpy as np
from parcllabs import ParclLabsClient


# Create a ParclLabsClient instance
client = ParclLabsClient(
    api_key=os.environ.get('PARCL_LABS_API_KEY', "<your Parcl Labs API key if not set as environment variable>"), 
    limit=1000, 
)
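The cell above falls back to a placeholder string when `PARCL_LABS_API_KEY` is not set, so a missing key is likely to surface only once a request is made. A minimal sketch of an alternative, using a hypothetical helper `get_parcl_api_key` (it is not part of `parcllabs`), that fails immediately with a clear message:

```python
import os


def get_parcl_api_key(env_var: str = "PARCL_LABS_API_KEY") -> str:
    """Hypothetical helper: read the API key from the environment or fail loudly."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail before any request is made instead of sending a placeholder key
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Parcl Labs API key."
        )
    return key
```

You could then create the client with `ParclLabsClient(api_key=get_parcl_api_key(), limit=1000)`, using the same constructor arguments as above.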

In [23]:
# search for Houston market
markets = client.search.markets.retrieve(
    query='Houston',
    location_type='CBSA',
    sort_by='TOTAL_POPULATION',  # sort by total population
    sort_order='DESC',           # in descending order
    limit=10,                    # return the top 10 matching metros
)
markets

|   | parcl_id | country | geoid | state_fips_code | name | state_abbreviation | region | location_type | total_population | median_income | parcl_exchange_market | pricefeed_market | case_shiller_10_market | case_shiller_20_market |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2899967 | USA | 26420 | 48 | Houston-The Woodlands-Sugar Land, Tx | TX | WEST_SOUTH_CENTRAL | CBSA | 7142603 | 78061 | 0 | 1 | 0 | 0 |


In [24]:
# save the parcl_id of the Houston market; here it is the first market
# in the list (index 0)
market_for_analysis_id = markets.iloc[0]['parcl_id']
print('Houston market parcl id: ', market_for_analysis_id)

Houston market parcl id:  2899967


In [25]:
# set the data frame to show all columns
pd.set_option('display.max_columns', None)

In [26]:
# get every listing event in the Houston market over the last 12 months for
# properties owned by Invitation Homes at the time of the event; their event
# histories will later let us measure time on market
df_properties_listed_last_12_months, metadata = client.property_v2.search.retrieve(
    parcl_ids=[market_for_analysis_id],
    event_names=['ALL_LISTINGS'],     # search all listing events
    min_event_date="2024-07-22",      # get the events for the last 12 months
    max_event_date="2025-07-22",
    include_property_details=True,    # include the property details in the response
    owner_name=["INVITATION_HOMES"],  # only get the properties that are owned by Invitation Homes at the time of the event
    limit=10000,                      # set a high limit to get all the events
)
all_properties_of_interest = df_properties_listed_last_12_months.parcl_property_id.unique().tolist()
print(f'We found {len(all_properties_of_interest)} properties that were listed for sale in the last 12 months')
print(f'total number of rows: {len(df_properties_listed_last_12_months)}')
df_properties_listed_last_12_months.head()

Processing property search request...
We found 85 properties that were listed for sale in the last 12 months
total number of rows: 296


|   | parcl_property_id | property_metadata_bathrooms | property_metadata_bedrooms | property_metadata_sq_ft | property_metadata_year_built | property_metadata_property_type | property_metadata_address1 | property_metadata_address2 | property_metadata_city | property_metadata_state | property_metadata_latitude | property_metadata_longitude | property_metadata_city_name | property_metadata_county_name | property_metadata_metro_name | property_metadata_record_added_date | property_metadata_current_on_market_flag | property_metadata_current_on_market_rental_flag | event_event_type | event_event_name | event_event_date | event_entity_owner_name | event_true_sale_index | event_price | event_transfer_index | event_investor_flag | event_owner_occupied_flag | event_new_construction_flag | event_current_owner_flag | event_record_updated_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | PRICE_CHANGE | 2025-06-25 | INVITATION_HOMES | 7 | 150000.0 | 8 | 1 | 0 | 0 | 1 | 2025-07-23 |
| 1 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | LISTED_SALE | 2025-06-25 | INVITATION_HOMES | 7 | 150000.0 | 8 | 1 | 0 | 0 | 1 | 2025-07-23 |
| 2 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | LISTING_REMOVED | 2025-06-14 | INVITATION_HOMES | 7 | 219000.0 | 8 | 1 | 0 | 0 | 1 | 2025-07-23 |
| 3 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | PRICE_CHANGE | 2025-05-22 | INVITATION_HOMES | 7 | 219000.0 | 8 | 1 | 0 | 0 | 1 | 2025-07-23 |
| 4 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | LISTED_SALE | 2024-12-18 | INVITATION_HOMES | 7 | 245000.0 | 8 | 1 | 0 | 0 | 1 | 2025-07-23 |
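The 12-month window is hard-coded as `"2024-07-22"` to `"2025-07-22"` in both search calls (and again in a later query). As a sketch, assuming you want to rerun the notebook for a different as-of date, a hypothetical helper `twelve_month_window` can derive both bounds from one date:

```python
from datetime import date


def twelve_month_window(as_of: date) -> tuple[str, str]:
    """Hypothetical helper: (min_event_date, max_event_date) as ISO strings
    covering the 12 months that end on `as_of`."""
    try:
        start = as_of.replace(year=as_of.year - 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February
        start = as_of.replace(year=as_of.year - 1, day=28)
    return start.isoformat(), as_of.isoformat()


min_event_date, max_event_date = twelve_month_window(date(2025, 7, 22))
print(min_event_date, max_event_date)  # 2024-07-22 2025-07-22
```

The two strings can then be passed as `min_event_date=min_event_date, max_event_date=max_event_date` to each `client.property_v2.search.retrieve` call, so every step uses the same window.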


In [27]:
# now let's get all the event data for those properties;
# we don't filter by owner name here, so that we also capture the sale
# in which Invitation Homes sold the property and stopped being its owner
all_events_for_properties_of_interest, metadata = client.property_v2.search.retrieve(
    parcl_property_ids=all_properties_of_interest,
    event_names=['ALL_SOLD', 'ALL_LISTINGS'],  # search all listing and all sold events
    min_event_date="2024-07-22",               # get the events for the last 12 months
    max_event_date="2025-07-22",
    include_property_details=True,             # include the property details in the response
    limit=10000,                               # set a high limit to get all the events
)
print(len(all_events_for_properties_of_interest))
all_events_for_properties_of_interest.head()

Processing property search request...
434


|   | parcl_property_id | property_metadata_bathrooms | property_metadata_bedrooms | property_metadata_sq_ft | property_metadata_year_built | property_metadata_property_type | property_metadata_address1 | property_metadata_address2 | property_metadata_city | property_metadata_state | property_metadata_latitude | property_metadata_longitude | property_metadata_city_name | property_metadata_county_name | property_metadata_metro_name | property_metadata_record_added_date | property_metadata_current_on_market_flag | property_metadata_current_on_market_rental_flag | event_event_type | event_event_name | event_event_date | event_entity_owner_name | event_true_sale_index | event_price | event_transfer_index | event_investor_flag | event_owner_occupied_flag | event_new_construction_flag | event_current_owner_flag | event_record_updated_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | PRICE_CHANGE | 2025-06-25 | INVITATION_HOMES | 7 | 150000.0 | 8 | 1.0 | 0.0 | 0 | 1 | 2025-07-23 |
| 1 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | LISTED_SALE | 2025-06-25 | INVITATION_HOMES | 7 | 150000.0 | 8 | 1.0 | 0.0 | 0 | 1 | 2025-07-23 |
| 2 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | LISTING_REMOVED | 2025-06-14 | INVITATION_HOMES | 7 | 219000.0 | 8 | 1.0 | 0.0 | 0 | 1 | 2025-07-23 |
| 3 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | PRICE_CHANGE | 2025-05-22 | INVITATION_HOMES | 7 | 219000.0 | 8 | 1.0 | 0.0 | 0 | 1 | 2025-07-23 |
| 4 | 66650970 | 2.0 | 3 | 1750 | 1995.0 | SINGLE_FAMILY | 4558 BUCKLERIDGE RD |  | HOUSTON | TX | 29.58446 | -95.45311 | Houston City | Fort Bend County |  | 2024-12-13 | 1 | 0.0 | LISTING | LISTED_SALE | 2024-12-18 | INVITATION_HOMES | 7 | 245000.0 | 8 | 1.0 | 0.0 | 0 | 1 | 2025-07-23 |
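Dropping the owner filter for this second search matters because a disposition is the event in which Invitation Homes stops being the owner. A plain-Python illustration on hand-written rows modeled on property 75571945 (whose SOLD rows have an empty `event_entity_owner_name` in the event table further below); it mimics the filter rather than calling the API:

```python
# Hand-written events modeled on parcl_property_id 75571945: the SOLD rows
# carry no owner name because the home has left the portfolio.
events = [
    {"event_event_name": "LISTED_SALE", "event_event_date": "2024-08-29",
     "event_entity_owner_name": "INVITATION_HOMES"},
    {"event_event_name": "SOLD", "event_event_date": "2024-10-01",
     "event_entity_owner_name": None},
    {"event_event_name": "SOLD", "event_event_date": "2024-10-03",
     "event_entity_owner_name": None},
]

# Mimic an owner filter: only the listing survives ...
owner_filtered = [e for e in events
                  if e["event_entity_owner_name"] == "INVITATION_HOMES"]
# ... so the sale is only visible in the unfiltered history
sold_in_filtered = [e for e in owner_filtered if e["event_event_name"] == "SOLD"]
sold_in_all = [e for e in events if e["event_event_name"] == "SOLD"]

print(len(sold_in_filtered), len(sold_in_all))  # 0 2
```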


In [28]:
# we can save this data to a csv file
all_events_for_properties_of_interest.to_csv('all_events_for_properties_of_interest_invitation_homes_072225.csv', index=False)
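A CSV file keeps the values but not the column types: if you reload this file in a later session, `event_event_date` comes back as text and needs parsing again before any date arithmetic. A standard-library sketch on a two-row stand-in for the file (values copied from property 75571945 in this notebook):

```python
import csv
import io
from datetime import date

# Stand-in for the saved file, keeping only a few of its columns
csv_text = """parcl_property_id,event_event_name,event_event_date,event_price
75571945,LISTED_SALE,2024-08-29,225000.0
75571945,SOLD,2024-10-01,0.0
"""

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    rows.append({
        "parcl_property_id": int(row["parcl_property_id"]),
        "event_event_name": row["event_event_name"],
        # Dates are stored as text in the CSV; parse them back into dates
        "event_event_date": date.fromisoformat(row["event_event_date"]),
        "event_price": float(row["event_price"]),
    })

print(rows[0]["event_event_date"].year)  # 2024
```

With pandas, `pd.read_csv(path, parse_dates=['event_event_date'])` parses the dates while reading.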

In [29]:
# Let's find the number of properties listed for sale in the last 12 months
number_of_unique_properties_listed_for_sale_in_12_months = (all_events_for_properties_of_interest
                                                .query('event_event_name == "LISTED_SALE"')
                                                .query('event_event_date >= "2024-07-22"')
                                                .parcl_property_id.nunique())
print(f'Number of unique properties listed for sale in the last 12 months: {number_of_unique_properties_listed_for_sale_in_12_months}')


Number of unique properties listed for sale in the last 12 months: 83
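The date filter above compares `event_event_date` with `"2024-07-22"` while the column may still hold strings (it is explicitly converted with `pd.to_datetime` only in a later cell). That is safe because zero-padded `YYYY-MM-DD` strings sort in chronological order. A small check of that property, and of why it would not hold for a `MM/DD/YYYY` format:

```python
from datetime import date

iso_dates = ["2025-06-25", "2024-12-18", "2024-07-01", "2025-05-22", "2024-08-29"]

# Zero-padded YYYY-MM-DD strings sort exactly like the dates they encode ...
assert sorted(iso_dates) == sorted(iso_dates, key=date.fromisoformat)

# ... so a plain string comparison works as a date filter
on_or_after = [d for d in iso_dates if d >= "2024-07-22"]
print(on_or_after)  # ['2025-06-25', '2024-12-18', '2025-05-22', '2024-08-29']

# The same trick breaks for MM/DD/YYYY: '06/25/2025' sorts before '12/18/2024'
us_dates = ["12/18/2024", "06/25/2025"]
print(sorted(us_dates))  # ['06/25/2025', '12/18/2024']
```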


In [30]:
# now let's count how many were sold in the last 12 months
number_of_unique_properties_sold_in_12_months = (all_events_for_properties_of_interest
                                                .query('event_event_name == "SOLD"')
                                                .parcl_property_id.nunique())
print(f'Number of unique properties sold in the last 12 months: {number_of_unique_properties_sold_in_12_months}')
number_of_unique_properties_sold_in_12_months_ids = (all_events_for_properties_of_interest
                                                .query('event_event_name == "SOLD"')
                                                .parcl_property_id.unique().tolist())

Number of unique properties sold in the last 12 months: 51


In [31]:
# get a list of the properties that actually sold
properties_sold_in_12_months = (all_events_for_properties_of_interest
                                .query('event_event_name == "SOLD"')
                                .parcl_property_id.unique().tolist())
print(len(properties_sold_in_12_months))

51


In [32]:
# check values for a specific property
all_events_for_properties_of_interest.query('parcl_property_id==75571945').sort_values(by='event_event_date', ascending=True)


|   | parcl_property_id | property_metadata_bathrooms | property_metadata_bedrooms | property_metadata_sq_ft | property_metadata_year_built | property_metadata_property_type | property_metadata_address1 | property_metadata_address2 | property_metadata_city | property_metadata_state | property_metadata_latitude | property_metadata_longitude | property_metadata_city_name | property_metadata_county_name | property_metadata_metro_name | property_metadata_record_added_date | property_metadata_current_on_market_flag | property_metadata_current_on_market_rental_flag | event_event_type | event_event_name | event_event_date | event_entity_owner_name | event_true_sale_index | event_price | event_transfer_index | event_investor_flag | event_owner_occupied_flag | event_new_construction_flag | event_current_owner_flag | event_record_updated_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 75571945 | 2.0 | 3 | 2025 | 1979.0 | SINGLE_FAMILY | 12919 VENICE LN |  | STAFFORD | TX | 29.629315 | -95.53977 | Stafford City | Fort Bend County |  | 2024-12-13 | 0 | 0.0 | LISTING | LISTED_SALE | 2024-08-29 | INVITATION_HOMES | 3 | 225000.0 | 5 | 1.0 | 0.0 | 0 | 0 | 2025-02-06 |
| 17 | 75571945 | 2.0 | 3 | 2025 | 1979.0 | SINGLE_FAMILY | 12919 VENICE LN |  | STAFFORD | TX | 29.629315 | -95.53977 | Stafford City | Fort Bend County |  | 2024-12-13 | 0 | 0.0 | SALE | SOLD | 2024-10-01 |  | 4 | 0.0 | 6 |  |  | 0 | 0 | 2025-02-07 |
| 16 | 75571945 | 2.0 | 3 | 2025 | 1979.0 | SINGLE_FAMILY | 12919 VENICE LN |  | STAFFORD | TX | 29.629315 | -95.53977 | Stafford City | Fort Bend County |  | 2024-12-13 | 0 | 0.0 | SALE | SOLD | 2024-10-03 |  | 5 | 0.0 | 7 | 0.0 | 1.0 | 0 | 1 | 2025-02-07 |


In [33]:
# Ensure the 'event_event_date' column is of datetime type before any calculations
all_events_for_properties_of_interest['event_event_date'] = pd.to_datetime(all_events_for_properties_of_interest['event_event_date'])

# Calculate days from first listing event to first SOLD event for each property
days_on_market_df = (all_events_for_properties_of_interest
    .query('parcl_property_id in @properties_sold_in_12_months')
    .query('event_event_name in ["LISTED_SALE", "RELISTED", "PRICE_CHANGE", "SOLD"]')
    .sort_values(['parcl_property_id', 'event_event_date'])
    .assign(
        # Mark properties that have SOLD events
        has_sold = lambda x: x.groupby('parcl_property_id')['event_event_name'].transform(lambda y: 'SOLD' in y.values),
        # Mark the first event for each property
        is_first_event = lambda x: x.groupby('parcl_property_id').cumcount() == 0,
        # Create normalized event name - only transform if it's first event AND property has SOLD
        normalized_event_name = lambda x: np.where(
            (x['has_sold']) & 
            (x['is_first_event']) & 
            (x['event_event_name'].isin(['RELISTED', 'PRICE_CHANGE'])),
            'LISTED_SALE',
            x['event_event_name']
        )
    )
    .query('normalized_event_name in ["LISTED_SALE", "SOLD"]')
    .groupby(['parcl_property_id', 'normalized_event_name'])['event_event_date'].first()
    .unstack('normalized_event_name')
    .dropna()  # Only keep properties with both LISTED_SALE and SOLD events
    .assign(days_on_market = lambda x: (x['SOLD'] - x['LISTED_SALE']).dt.days)
    .query('days_on_market >= 0')  # Ensure SOLD comes after LISTED_SALE
)

# Calculate average days on market
avg_days_on_market = days_on_market_df['days_on_market'].mean()
median_days_on_market = days_on_market_df['days_on_market'].median()
print(f'Average days on market: {avg_days_on_market}')
print(f'Median days on market: {median_days_on_market}')
print(f'Number of properties sold used for the analysis: {len(days_on_market_df)}')
days_on_market_df.head(15)

Average days on market: 89.03921568627452
Median days on market: 68.0
Number of properties sold used for the analysis: 51


| parcl_property_id | LISTED_SALE | SOLD | days_on_market |
|---|---|---|---|
| 75571945 | 2024-08-29 | 2024-10-01 | 33 |
| 75577853 | 2025-02-21 | 2025-05-06 | 74 |
| 76017546 | 2025-02-21 | 2025-04-03 | 41 |
| 76468961 | 2024-09-20 | 2024-10-31 | 41 |
| 79456747 | 2024-08-14 | 2025-03-31 | 229 |
| 81163110 | 2024-09-27 | 2025-04-17 | 202 |
| 83049510 | 2024-12-23 | 2025-06-17 | 176 |
| 84115018 | 2025-03-12 | 2025-04-25 | 44 |
| 91835509 | 2024-08-30 | 2025-02-25 | 179 |
| 92917625 | 2024-08-27 | 2024-12-30 | 125 |
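The pandas chain above is compact but dense. The same rule in plain Python, as a sketch for checking individual properties by hand: sort each property's listing and SOLD events by date, treat a RELISTED or PRICE_CHANGE that opens the history as the listing, and measure to the first SOLD event. The 75571945 events come from the output above; the other two histories are hypothetical.

```python
from datetime import date
from statistics import mean, median

# Listing-type events, as in the pandas cell
LISTING_EVENTS = {"LISTED_SALE", "RELISTED", "PRICE_CHANGE"}


def days_on_market(events):
    """Days from the first listing to the first SOLD event for one property.

    `events` is a list of (event_name, date) tuples. Returns None when there
    is no listing or no SOLD event, or when the first SOLD precedes the listing.
    """
    # Keep listing and SOLD events in date order (sorted() is stable for ties)
    ordered = sorted(
        (e for e in events if e[0] in LISTING_EVENTS or e[0] == "SOLD"),
        key=lambda e: e[1],
    )
    # A RELISTED or PRICE_CHANGE that opens the history counts as the listing
    normalized = [
        ("LISTED_SALE" if i == 0 and name in LISTING_EVENTS else name, day)
        for i, (name, day) in enumerate(ordered)
    ]
    listed = next((day for name, day in normalized if name == "LISTED_SALE"), None)
    sold = next((day for name, day in normalized if name == "SOLD"), None)
    if listed is None or sold is None or sold < listed:
        return None
    return (sold - listed).days


histories = {
    # From the event table for 75571945 above
    "75571945": [("LISTED_SALE", date(2024, 8, 29)),
                 ("SOLD", date(2024, 10, 1)),
                 ("SOLD", date(2024, 10, 3))],
    # Hypothetical: the history opens with a price change
    "hypothetical_a": [("PRICE_CHANGE", date(2025, 1, 10)),
                       ("SOLD", date(2025, 3, 1))],
    # Hypothetical: never sold
    "hypothetical_b": [("LISTED_SALE", date(2025, 2, 1))],
}

results = {pid: days_on_market(ev) for pid, ev in histories.items()}
sold_days = [d for d in results.values() if d is not None]
print(results)  # {'75571945': 33, 'hypothetical_a': 50, 'hypothetical_b': None}
print(f"Average: {mean(sold_days)}, median: {median(sold_days)}")
```

The 33 days for 75571945 matches the first row of the table above; the second SOLD event on 2024-10-03 is ignored because only the first SOLD date is used.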


In [37]:
# Calculate price changes (absolute and percentage) for properties with price changes
price_changes_analysis = (all_events_for_properties_of_interest
    .query('event_event_name in ["LISTED_SALE", "PRICE_CHANGE", "RELISTED", "SOLD"]')  # Events with prices
    .query('event_price>0')
    .sort_values(['parcl_property_id', 'event_event_date'], ascending=[True, True])
    .assign(
        previous_price = lambda x: x.groupby('parcl_property_id')['event_price'].shift(1),
        price_change_absolute = lambda x: x['event_price'] - x['previous_price'],
        price_change_percentage = lambda x: ((x['event_price'] - x['previous_price']) / x['previous_price'] * 100)
    )
    .assign(
        is_price_change = lambda x: (
            (x['event_event_name'].isin(['PRICE_CHANGE', 'RELISTED','LISTED_SALE'])) & 
            (x['previous_price'].notna()) &  # Must have a previous price
            (x['price_change_absolute'] != 0)  # Price must actually change
        )
    )
    .assign(
        price_changes_per_property = lambda x: x.groupby('parcl_property_id')['is_price_change'].transform('sum')
    )
    .query('event_price > 0')  # remove properties with no price
    .query('is_price_change == True or event_event_name == "SOLD"')  # Show price changes and final sales
    # can be modified to only show is_price_change, meaning a true price change
    .loc[:, ['parcl_property_id', 'event_event_name', 'event_event_date', 
            'event_price', 'previous_price', 
            'price_change_absolute', 'price_change_percentage', 
            'price_changes_per_property']]
)

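# note: this counts every property left in price_changes_analysis, including those
# kept only for their SOLD row (hence the 0 minimum in the statistics below)
print(f'Number of properties with price changes: {price_changes_analysis.parcl_property_id.nunique()}')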

# Summary statistics for price changes per property
price_change_counts_summary = (price_changes_analysis
    .groupby('parcl_property_id')['price_changes_per_property']
    .first()  # Get unique count per property
    .describe()
)

print('Price changes per property statistics:')
print(price_change_counts_summary)

price_changes_analysis.head(10)

Number of properties with price changes: 67
Price changes per property statistics:
count    67.000000
mean      1.686567
std       1.698564
min       0.000000
25%       0.000000
50%       1.000000
75%       3.000000
max       6.000000
Name: price_changes_per_property, dtype: float64


|   | parcl_property_id | event_event_name | event_event_date | event_price | previous_price | price_change_absolute | price_change_percentage | price_changes_per_property |
|---|---|---|---|---|---|---|---|---|
| 3 | 66650970 | PRICE_CHANGE | 2025-05-22 | 219000.0 | 245000.0 | -26000.0 | -10.612245 | 2 |
| 0 | 66650970 | PRICE_CHANGE | 2025-06-25 | 150000.0 | 219000.0 | -69000.0 | -31.506849 | 2 |
| 9 | 70880233 | PRICE_CHANGE | 2025-06-14 | 248000.0 | 258000.0 | -10000.0 | -3.875969 | 3 |
| 8 | 70880233 | PRICE_CHANGE | 2025-07-07 | 235000.0 | 248000.0 | -13000.0 | -5.241935 | 3 |
| 7 | 70880233 | PRICE_CHANGE | 2025-07-14 | 224950.0 | 235000.0 | -10050.0 | -4.276596 | 3 |
| 24 | 75577853 | PRICE_CHANGE | 2025-03-20 | 237500.0 | 245000.0 | -7500.0 | -3.061224 | 3 |
| 22 | 75577853 | PRICE_CHANGE | 2025-05-01 | 227500.0 | 237500.0 | -10000.0 | -4.210526 | 3 |
| 20 | 75577853 | PRICE_CHANGE | 2025-05-22 | 219000.0 | 227500.0 | -8500.0 | -3.736264 | 3 |
| 26 | 76017546 | SOLD | 2025-04-04 | 225000.0 | 215000.0 | 10000.0 | 4.651163 | 0 |
| 42 | 76468961 | SOLD | 2024-10-31 | 350000.0 | 160000.0 | 190000.0 | 118.75 | 3 |
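Each price change above is measured against the previous priced event of the same property (`groupby(...).shift(1)`). A plain-Python sketch of that step for listing prices only (it leaves out the SOLD rows the cell also keeps), with a hypothetical helper `price_changes`:

```python
def price_changes(prices):
    """Absolute and percentage change between consecutive prices.

    `prices` is one property's listing prices in date order, already
    restricted to positive values (like the `event_price > 0` filter).
    Consecutive equal prices are skipped, like `price_change_absolute != 0`.
    """
    changes = []
    for previous, current in zip(prices, prices[1:]):
        delta = current - previous
        if delta != 0:
            changes.append((delta, delta / previous * 100))
    return changes


# Listing prices for 75577853, read from the previous_price / event_price
# columns above: 245,000 followed by three PRICE_CHANGE events
history_75577853 = [245000.0, 237500.0, 227500.0, 219000.0]
for absolute, percentage in price_changes(history_75577853):
    print(f"{absolute:+,.0f} ({percentage:.2f}%)")
```

The three printed changes (-7,500, -10,000 and -8,500) agree with the `price_change_absolute` and, after rounding, the `price_change_percentage` values for 75577853 in the table above.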


In [38]:
# Summary statistics for price changes
price_change_summary = (price_changes_analysis
    .agg({
        'price_change_absolute': ['mean', 'median', 'std', 'min', 'max'],
        'price_change_percentage': ['mean', 'median', 'std', 'min', 'max']
    })
    .round(2)
)
price_change_summary

|   | price_change_absolute | price_change_percentage |
|---|---|---|
| mean | -1247.94 | 0.92 |
| median | -10000.0 | -4.15 |
| std | 40227.41 | 19.53 |
| min | -190000.0 | -54.29 |
| max | 240079.0 | 118.75 |


In [39]:
# Calculate original listing vs final sale price analysis
listing_vs_sale_analysis = (all_events_for_properties_of_interest
    .query('event_event_name in ["LISTED_SALE", "RELISTED", "PRICE_CHANGE", "SOLD"]')
    .sort_values(['parcl_property_id', 'event_event_date'])
    .assign(
        # Mark first event for each property
        is_first_event = lambda x: x.groupby('parcl_property_id').cumcount() == 0,
        # Mark listing-type events
        is_listing_event = lambda x: x['event_event_name'].isin(['LISTED_SALE', 'RELISTED', 'PRICE_CHANGE']),
        # First listing could be LISTED_SALE, RELISTED, or PRICE_CHANGE if it's the first event
        is_first_listing = lambda x: x['is_first_event'] & x['is_listing_event'],
        
        # Mark valid sale events (SOLD with price > 0)
        is_valid_sale = lambda x: (x['event_event_name'] == 'SOLD') & (x['event_price'] > 0)
    )
    .groupby('parcl_property_id')
    .apply(lambda group: {
        'original_listing_price': group.loc[group['is_first_listing'], 'event_price'].iloc[0] 
            if group['is_first_listing'].any() else None,
        'final_sale_price': group.loc[group['is_valid_sale'], 'event_price'].iloc[-1] 
            if group['is_valid_sale'].any() else None
    })
    .apply(pd.Series)
    .dropna()  # Only keep properties with both valid listing and sale prices
    .assign(
        price_difference_absolute = lambda x: x['final_sale_price'] - x['original_listing_price'],
        price_difference_percentage = lambda x: (((x['final_sale_price'] - x['original_listing_price']) / x['original_listing_price']) * 100)
    )
)
listing_vs_sale_analysis.head(10)





| parcl_property_id | original_listing_price | final_sale_price | price_difference_absolute | price_difference_percentage |
|---|---|---|---|---|
| 76017546 | 215000.0 | 225000.0 | 10000.0 | 4.651163 |
| 76468961 | 160000.0 | 350000.0 | 190000.0 | 118.75 |
| 81163110 | 260000.0 | 215000.0 | -45000.0 | -17.307692 |
| 83049510 | 245000.0 | 270019.0 | 25019.0 | 10.211837 |
| 84115018 | 240000.0 | 240000.0 | 0.0 | 0.0 |
| 91835509 | 200000.0 | 201875.0 | 1875.0 | 0.9375 |
| 99756922 | 195000.0 | 225625.0 | 30625.0 | 15.705128 |
| 110027738 | 280000.0 | 315000.0 | 35000.0 | 12.5 |
| 111123650 | 305000.0 | 270000.0 | -35000.0 | -11.47541 |
| 113497416 | 130000.0 | 266998.0 | 136998.0 | 105.383077 |
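The per-property rule in the cell above, sketched in plain Python with a hypothetical helper `listing_vs_sale`: the original listing price counts only when the first event is a listing-type event, and the final sale is the last SOLD event with a positive price. The prices match the 76017546 row above; the event rows themselves, including the zero-price SOLD, are a hand-written stand-in. Unlike the cell, the sketch also returns `None` for a zero listing price instead of dividing by it.

```python
# Listing-type events, as in the pandas cell
LISTING_EVENTS = {"LISTED_SALE", "RELISTED", "PRICE_CHANGE"}


def listing_vs_sale(events):
    """Return (absolute, percentage) difference between the final sale price
    and the original listing price, or None if either is unavailable.

    `events` holds (event_name, iso_date, price) tuples for one property.
    """
    # ISO date strings sort chronologically; sorted() is stable for ties
    relevant = sorted(
        (e for e in events if e[0] in LISTING_EVENTS or e[0] == "SOLD"),
        key=lambda e: e[1],
    )
    if not relevant or relevant[0][0] not in LISTING_EVENTS:
        return None  # first event is not a listing: no original listing price
    original = relevant[0][2]
    sales = [price for name, _, price in relevant if name == "SOLD" and price > 0]
    if original <= 0 or not sales:
        return None
    difference = sales[-1] - original  # last valid sale is the final sale
    return difference, difference / original * 100


events_76017546 = [
    ("LISTED_SALE", "2025-02-21", 215000.0),
    ("SOLD", "2025-04-03", 0.0),       # hypothetical zero-price SOLD row, ignored
    ("SOLD", "2025-04-04", 225000.0),
]
absolute, percentage = listing_vs_sale(events_76017546)
print(f"{absolute:+,.0f} ({percentage:.2f}%)")  # +10,000 (4.65%)
```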


In [40]:

# Calculate comprehensive summary statistics
listing_vs_sale_summary = {
    'properties_analyzed': len(listing_vs_sale_analysis),
    'avg_original_listing_price': listing_vs_sale_analysis['original_listing_price'].mean(),
    'median_original_listing_price': listing_vs_sale_analysis['original_listing_price'].median(),
    'avg_final_sale_price': listing_vs_sale_analysis['final_sale_price'].mean(),
    'median_final_sale_price': listing_vs_sale_analysis['final_sale_price'].median(),
    'avg_absolute_difference': listing_vs_sale_analysis['price_difference_absolute'].mean(),
    'median_absolute_difference': listing_vs_sale_analysis['price_difference_absolute'].median(),
    'avg_percentage_difference': listing_vs_sale_analysis['price_difference_percentage'].mean(),
    'median_percentage_difference': listing_vs_sale_analysis['price_difference_percentage'].median(),
    'std_absolute_difference': listing_vs_sale_analysis['price_difference_absolute'].std(),
    'std_percentage_difference': listing_vs_sale_analysis['price_difference_percentage'].std()
}

# Show properties that sold above vs below listing
price_performance = (listing_vs_sale_analysis
    .assign(
        performance = lambda x: x['price_difference_absolute'].apply(
            lambda y: 'Sold Above Listing' if y > 0 else 'Sold Below Listing' if y < 0 else 'Sold at Listing'
        )
    )
    .groupby('performance')
    .agg({
        'price_difference_absolute': ['count', 'mean', 'median'],
        'price_difference_percentage': ['mean', 'median']
    })
    .round(2)
)

# Additional detailed statistics
detailed_stats = (listing_vs_sale_analysis
    .agg({
        'original_listing_price': ['count', 'mean', 'median', 'std', 'min', 'max'],
        'final_sale_price': ['count', 'mean', 'median', 'std', 'min', 'max'],
        'price_difference_absolute': ['mean', 'median', 'std', 'min', 'max'],
        'price_difference_percentage': ['mean', 'median', 'std', 'min', 'max']
    })
    .round(2)
)

# Print summary
print("=== LISTING VS SALE PRICE ANALYSIS ===")
print(f"Properties analyzed where final price is available and different to 0: {listing_vs_sale_summary['properties_analyzed']}")
print(f"Average original listing: ${listing_vs_sale_summary['avg_original_listing_price']:,.2f}")
print(f"Average final sale: ${listing_vs_sale_summary['avg_final_sale_price']:,.2f}")
print(f"Average absolute difference: ${listing_vs_sale_summary['avg_absolute_difference']:,.2f}")
print(f"Average percentage difference: {listing_vs_sale_summary['avg_percentage_difference']:.2f}%")
print(f"Median absolute difference: ${listing_vs_sale_summary['median_absolute_difference']:,.2f}")
print(f"Median percentage difference: {listing_vs_sale_summary['median_percentage_difference']:.2f}%")

print("\n=== PERFORMANCE BREAKDOWN ===")
print(price_performance)

print("\n=== DETAILED STATISTICS ===")
print(detailed_stats)

=== LISTING VS SALE PRICE ANALYSIS ===
Properties analyzed where final price is available and different to 0: 38
Average original listing: $218,828.95
Average final sale: $240,782.84
Average absolute difference: $21,953.89
Average percentage difference: 14.14%
Median absolute difference: $2,692.00
Median percentage difference: 1.49%

=== PERFORMANCE BREAKDOWN ===
                   price_difference_absolute                     \
                                       count      mean   median   
performance                                                       
Sold Above Listing                        20  58726.60  32994.0   
Sold Below Listing                        17 -20016.71 -12748.0   
Sold at Listing                            1      0.00      0.0   

                   price_difference_percentage         
                                          mean median  
performance                                            
Sold Above Listing                       33.85  16.44  
Sold Be
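The performance breakdown buckets each property by the sign of `price_difference_absolute`. A sketch of the same bucketing and per-group mean and median in plain Python, applied only to the ten differences shown in the listing-vs-sale table (the breakdown above covers all 38 properties, so its numbers differ):

```python
from statistics import mean, median


def performance_bucket(price_difference):
    """Label a sale by the sign of (final sale price - original listing price)."""
    if price_difference > 0:
        return "Sold Above Listing"
    if price_difference < 0:
        return "Sold Below Listing"
    return "Sold at Listing"


# price_difference_absolute for the ten properties in the listing-vs-sale table
differences = [10000.0, 190000.0, -45000.0, 25019.0, 0.0,
               1875.0, 30625.0, 35000.0, -35000.0, 136998.0]

# Group the differences by bucket label
groups = {}
for diff in differences:
    groups.setdefault(performance_bucket(diff), []).append(diff)

for label, values in sorted(groups.items()):
    print(f"{label}: count={len(values)}, "
          f"mean={mean(values):,.2f}, median={median(values):,.2f}")
```

Even in this small sample, the mean of the "Sold Above Listing" group sits well above its median, because two large differences (190,000 and 136,998) pull it up.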

In [41]:
import plotly.express as px
import plotly.graph_objects as go

# First, let's combine the days on market with price changes data
scatter_plot_data = (all_events_for_properties_of_interest
    .pipe(lambda df: 
        # Get days on market data
        df.query('event_event_name in ["LISTED_SALE", "RELISTED", "PRICE_CHANGE", "SOLD"]')
        .sort_values(['parcl_property_id', 'event_event_date'])
        .assign(
            has_sold = lambda x: x.groupby('parcl_property_id')['event_event_name'].transform(lambda y: 'SOLD' in y.values),
            is_first_event = lambda x: x.groupby('parcl_property_id').cumcount() == 0,
            normalized_event_name = lambda x: np.where(
                (x['has_sold']) & 
                (x['is_first_event']) & 
                (x['event_event_name'].isin(['RELISTED', 'PRICE_CHANGE'])),
                'LISTED_SALE',
                x['event_event_name']
            )
        )
        .query('normalized_event_name in ["LISTED_SALE", "SOLD"]')
        .groupby(['parcl_property_id', 'normalized_event_name'])['event_event_date'].first()
        .unstack('normalized_event_name')
        .dropna()
        .assign(days_on_market = lambda x: (x['SOLD'] - x['LISTED_SALE']).dt.days)
        .query('days_on_market >= 0')
        .reset_index()
    )
    .merge(
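        # Count PRICE_CHANGE events per property; they are labelled "price cuts"
        # below, which assumes every PRICE_CHANGE is a reduction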
        all_events_for_properties_of_interest
        .groupby('parcl_property_id')['event_event_name']
        .apply(lambda x: (x == 'PRICE_CHANGE').sum())
        .reset_index(name='price_cuts'),
        on='parcl_property_id'
    )
    .query('price_cuts > 0')  # Exclude properties with 0 price cuts
    .merge(
        # Get property address and combine fields
        all_events_for_properties_of_interest[['parcl_property_id', 'property_metadata_address1', 'property_metadata_address2', 'property_metadata_city']]
        .drop_duplicates()
        .assign(
            full_address = lambda x: (
                (x['property_metadata_address1'].fillna('').astype(str) + ' ' +
                 x['property_metadata_address2'].fillna('').astype(str) + ', ' +
                 x['property_metadata_city'].fillna('').astype(str))
                .str.replace(r'\s+', ' ', regex=True)  # collapse repeated whitespace
                .str.replace(' ,', ',', regex=False)   # no space before the comma when address2 is empty
                .str.strip()
                .str.rstrip(',')
            )
        ),
        on='parcl_property_id'
    )
    .assign(
        hover_text = lambda x: x['full_address'] + '<br>' + 
                              'Days on Market: ' + x['days_on_market'].astype(str) + '<br>' +
                              'Price Cuts: ' + x['price_cuts'].astype(str)
    )
)

# Create the scatter plot
fig = px.scatter(
    scatter_plot_data,
    x='days_on_market',
    y='price_cuts',
    hover_data={'full_address': True, 'parcl_property_id': True},
    title='Days on Market vs. Number of Price Cuts (Properties with Price Cuts Only)',
    labels={
        'days_on_market': 'Days on Market',
        'price_cuts': 'Number of Price Cuts'
    },
    template='plotly_white'
)

# Customize the hover template
fig.update_traces(
    hovertemplate='<b>%{customdata[0]}</b><br>' +
                  'Days on Market: %{x}<br>' +
                  'Price Cuts: %{y}<br>' +
                  'Property ID: %{customdata[1]}<extra></extra>',
    customdata=scatter_plot_data[['full_address', 'parcl_property_id']].values
)

# Update layout for better appearance
fig.update_layout(
    width=800,
    height=600,
    title_font_size=16,
    xaxis_title_font_size=14,
    yaxis_title_font_size=14
)

# Show the plot
fig.show()

# Print summary statistics for context
print("=== SCATTER PLOT DATA SUMMARY (Properties with Price Cuts Only) ===")
print(f"Total properties plotted: {len(scatter_plot_data)}")
print(f"Average days on market: {scatter_plot_data['days_on_market'].mean():.1f}")
print(f"Average price cuts: {scatter_plot_data['price_cuts'].mean():.1f}")
print(f"Max days on market: {scatter_plot_data['days_on_market'].max()}")
print(f"Max price cuts: {scatter_plot_data['price_cuts'].max()}")
print(f"Min price cuts: {scatter_plot_data['price_cuts'].min()}")  # Should be 1 now

=== SCATTER PLOT DATA SUMMARY (Properties with Price Cuts Only) ===
Total properties plotted: 24
Average days on market: 116.2
Average price cuts: 1.8
Max days on market: 232
Max price cuts: 6
Min price cuts: 1
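Two ingredients of the scatter plot, sketched in plain Python on hand-written rows: counting PRICE_CHANGE events per property and keeping properties with at least one (the cell labels these "price cuts", which assumes every PRICE_CHANGE is a reduction), and building a display address from its parts. The hypothetical `full_address` helper joins only non-empty parts, so an empty `address2` cannot leave a stray space before the comma.

```python
import re
from collections import Counter

# Hand-written (parcl_property_id, event_event_name) rows modeled on this notebook
events = [
    (66650970, "LISTED_SALE"), (66650970, "PRICE_CHANGE"), (66650970, "PRICE_CHANGE"),
    (75571945, "LISTED_SALE"), (75571945, "SOLD"),
]

# Count PRICE_CHANGE events per property; Counter returns 0 for absent keys
price_change_counts = Counter(pid for pid, name in events if name == "PRICE_CHANGE")
all_ids = {pid for pid, _ in events}
# Keep only properties with at least one change, like `price_cuts > 0`
with_changes = {pid: price_change_counts[pid] for pid in all_ids
                if price_change_counts[pid] > 0}


def full_address(address1, address2, city):
    """Hypothetical helper: 'ADDRESS1 ADDRESS2, CITY' with empty parts skipped."""
    street = " ".join(part for part in (address1, address2) if part)
    text = ", ".join(part for part in (street, city) if part)
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


print(with_changes)  # {66650970: 2}
print(full_address("4558 BUCKLERIDGE RD", "", "HOUSTON"))  # 4558 BUCKLERIDGE RD, HOUSTON
```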
