<center>
<h1>Welcome to the Lab 🥼🧪</h1>
</center>

### How can I analyze dispositions by SFR operators and how profitable those sales were?

In this notebook, we examine homes sold by Invitation Homes in the Houston MSA over the last 12 months (as of July 22, 2025). We use the Parcl Labs API to retrieve the event history for the properties of interest.

#### Need help getting started?

As a reminder, you can get your Parcl Labs API key [here](https://app.parcllabs.com/) to follow along.

To run this immediately, you can use Google Colab.

Run in Colab --> [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ParclLabs/parcllabs-cookbook/blob/main/examples/experimental/inventory_analysis_invitation_homes.ipynb)

In [None]:
# install most recent version of parcllabs
%pip install --upgrade parcllabs

In [None]:
# import the libraries for the analysis
import os
import pandas as pd
import numpy as np
from parcllabs import ParclLabsClient


# Create a ParclLabsClient instance
client = ParclLabsClient(
    api_key=os.environ.get('PARCL_LABS_API_KEY', "<your Parcl Labs API key if not set as environment variable>"), 
    limit=1000, 
)
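
The client above falls back to a placeholder string when the environment variable is missing, which only surfaces later as an authentication error. A minimal sketch of failing early instead is shown below. `get_api_key` is a hypothetical helper written for this notebook, not part of the `parcllabs` package.

```python
import os


def get_api_key(env_var: str = "PARCL_LABS_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before creating the client")
    return key


# Hypothetical usage: report a clear message instead of sending a placeholder key
try:
    api_key = get_api_key()
    print("API key found")
except RuntimeError as err:
    print(err)
```

The returned value could then be passed as `api_key=` when creating the client.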

In [None]:
# search for Houston market
markets = client.search.markets.retrieve(
    query = 'Houston',
    location_type = 'CBSA',
    sort_by='TOTAL_POPULATION',  # Sort by total population
    sort_order='DESC',           # In descending order
    limit=10                     # Limit results to the top 10 matches
)
markets

In [None]:
# save the parcl id of the houston market, in this case it is the first market 
# in the list with index 0
market_for_analysis_id = markets.iloc[0]['parcl_id']
print('Houston market parcl id: ', market_for_analysis_id)
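
Taking row 0 relies on the Houston metro being the most populous match. A slightly more defensive sketch filters on the market name before picking a row. The `name` and `total_population` columns and every value in the sample frame are assumptions made for illustration, so check them against the actual search output.

```python
import pandas as pd

# Placeholder search results: ids, names, and populations are made up for illustration
markets_example = pd.DataFrame({
    "parcl_id": [111, 222, 333],
    "name": ["Houston-The Woodlands-Sugar Land, Tx", "Houston County Example", "Houston Example Town"],
    "total_population": [7_000_000, 100_000, 2_000],
})

# Keep rows whose name starts with the expected metro, then take the most populous one
houston_rows = markets_example[markets_example["name"].str.startswith("Houston-The Woodlands")]
if houston_rows.empty:
    raise ValueError("No matching market found; inspect the search results manually")

example_market_id = int(
    houston_rows.sort_values("total_population", ascending=False).iloc[0]["parcl_id"]
)
print(example_market_id)
```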

In [None]:
# set the data frame to show all columns
pd.set_option('display.max_columns', None)

In [None]:
# get the listing events for properties owned by Invitation Homes in the Houston market
# the resulting property ids are the basis for the days-on-market and price analysis below
df_properties_listed_last_12_months, metadata = client.property_v2.search.retrieve(
    parcl_ids=[market_for_analysis_id],
    event_names=['ALL_LISTINGS'], # search all listings events
    max_event_date="2025-07-22", # get the events for the last 12 months
    min_event_date="2024-07-22",
    include_property_details=True, # include the property details in the response
    owner_name=["INVITATION_HOMES"], # only get the properties that are owned by Invitation Homes at the time of the event
    limit = 10000 # set a high limit to get all the events
    )
all_properties_of_interest = df_properties_listed_last_12_months.parcl_property_id.unique().tolist()
print(f'We found {len(all_properties_of_interest)} properties that were listed for sale in the last 12 months')
print(f'total number of rows: {len(df_properties_listed_last_12_months)}')
df_properties_listed_last_12_months.head()
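
The two date bounds above are hard-coded. A small sketch of deriving both from a single as-of date is shown below, so that rerunning the notebook later only needs one value changed. `trailing_year_window` is a hypothetical helper, not part of the Parcl Labs client.

```python
from datetime import date


def trailing_year_window(as_of: date) -> tuple[str, str]:
    """Return (min_event_date, max_event_date) as ISO strings for the 12 months ending at as_of."""
    try:
        start = as_of.replace(year=as_of.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in the previous year; fall back to Feb 28
        start = as_of.replace(year=as_of.year - 1, day=28)
    return start.isoformat(), as_of.isoformat()


min_event_date, max_event_date = trailing_year_window(date(2025, 7, 22))
print(min_event_date, max_event_date)  # 2024-07-22 2025-07-22
```

The two strings match the `min_event_date` and `max_event_date` values used in the search calls in this notebook.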

In [None]:
# now let's get all the event data for those properties
# we don't include the owner name here
# to make sure we also get events from when a property was sold by Invitation Homes
# and is no longer associated with them
all_events_for_properties_of_interest, metadata = client.property_v2.search.retrieve(
    parcl_property_ids=all_properties_of_interest,
    event_names=['ALL_SOLD','ALL_LISTINGS'], # search all listings and all sold events
    max_event_date="2025-07-22", # get the events for the last 12 months
    min_event_date="2024-07-22",
    include_property_details=True, # include the property details in the response
    
    limit = 10000 # set a high limit to get all the events
    )
print(len(all_events_for_properties_of_interest))
all_events_for_properties_of_interest.head()

In [None]:
# we can save this data to a csv file
all_events_for_properties_of_interest.to_csv('all_events_for_properties_of_interest_invitation_homes_072225.csv', index=False)

In [None]:
# Let's find the number of properties listed for sale in the last 12 months
number_of_unique_properties_listed_for_sale_in_12_months = (all_events_for_properties_of_interest
                                                .query('event_event_name == "LISTED_SALE"')
                                                .query('event_event_date >= "2024-07-22"')
                                                .parcl_property_id.nunique())
print(f'Number of unique properties listed for sale in the last 12 months: {number_of_unique_properties_listed_for_sale_in_12_months}')


In [None]:
# now let's count how many were sold in the last 12 months
number_of_unique_properties_sold_in_12_months = (all_events_for_properties_of_interest
                                                .query('event_event_name == "SOLD"')
                                                .parcl_property_id.nunique())
print(f'Number of unique properties sold in the last 12 months: {number_of_unique_properties_sold_in_12_months}')
number_of_unique_properties_sold_in_12_months_ids = (all_events_for_properties_of_interest
                                                .query('event_event_name == "SOLD"')
                                                .parcl_property_id.unique().tolist())

In [None]:
# get a list of the properties that actually sold
properties_sold_in_12_months = (all_events_for_properties_of_interest
                                .query('event_event_name == "SOLD"')
                                .parcl_property_id.unique().tolist())
print(len(properties_sold_in_12_months))

In [None]:
# check values for a specific property
all_events_for_properties_of_interest.query('parcl_property_id==75571945').sort_values(by='event_event_date', ascending=True)


In [None]:
# Ensure the 'event_event_date' column is of datetime type before any calculations
all_events_for_properties_of_interest['event_event_date'] = pd.to_datetime(all_events_for_properties_of_interest['event_event_date'])

# Calculate days from first listing event to first SOLD event for each property
days_on_market_df = (all_events_for_properties_of_interest
    .query('parcl_property_id in @properties_sold_in_12_months')
    .query('event_event_name in ["LISTED_SALE", "RELISTED", "PRICE_CHANGE", "SOLD"]')
    .sort_values(['parcl_property_id', 'event_event_date'])
    .assign(
        # Mark properties that have SOLD events
        has_sold = lambda x: x.groupby('parcl_property_id')['event_event_name'].transform(lambda y: 'SOLD' in y.values),
        # Mark the first event for each property
        is_first_event = lambda x: x.groupby('parcl_property_id').cumcount() == 0,
        # Create normalized event name - only transform if it's first event AND property has SOLD
        normalized_event_name = lambda x: np.where(
            (x['has_sold']) & 
            (x['is_first_event']) & 
            (x['event_event_name'].isin(['RELISTED', 'PRICE_CHANGE'])),
            'LISTED_SALE',
            x['event_event_name']
        )
    )
    .query('normalized_event_name in ["LISTED_SALE", "SOLD"]')
    .groupby(['parcl_property_id', 'normalized_event_name'])['event_event_date'].first()
    .unstack('normalized_event_name')
    .dropna()  # Only keep properties with both LISTED_SALE and SOLD events
    .assign(days_on_market = lambda x: (x['SOLD'] - x['LISTED_SALE']).dt.days)
    .query('days_on_market >= 0')  # Ensure SOLD comes after LISTED_SALE
)

# Calculate average days on market
avg_days_on_market = days_on_market_df['days_on_market'].mean()
median_days_on_market = days_on_market_df['days_on_market'].median()
print(f'Average days on market: {avg_days_on_market}')
print(f'Median days on market: {median_days_on_market}')
print(f'Number of properties sold used for the analysis: {len(days_on_market_df)}')
days_on_market_df.head(15)
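
To make the days-on-market logic easier to follow, here is a simplified sketch on toy data. It treats the earliest listing-type event in the window as the start of the listing, which covers the case where the original listing predates the 12-month window and the first visible event is a `PRICE_CHANGE`. It is a simplified variant of the cell above rather than a drop-in replacement, and the rows are invented for illustration.

```python
import pandas as pd

# Toy events: property 2 has no LISTED_SALE inside the window, property 3 never sold
events = pd.DataFrame({
    "parcl_property_id": [1, 1, 1, 2, 2, 3],
    "event_event_name": ["LISTED_SALE", "PRICE_CHANGE", "SOLD",
                         "PRICE_CHANGE", "SOLD", "LISTED_SALE"],
    "event_event_date": pd.to_datetime(["2024-08-01", "2024-08-20", "2024-09-10",
                                        "2024-07-25", "2024-08-24", "2024-10-01"]),
})

listing_names = ["LISTED_SALE", "RELISTED", "PRICE_CHANGE"]

# Earliest listing-type event and earliest SOLD event per property
first_listing = (events[events["event_event_name"].isin(listing_names)]
                 .groupby("parcl_property_id")["event_event_date"].min())
first_sold = (events[events["event_event_name"] == "SOLD"]
              .groupby("parcl_property_id")["event_event_date"].min())

# Subtraction aligns on property id; unsold properties become NaT and are dropped
toy_days_on_market = (first_sold - first_listing).dropna().dt.days
toy_days_on_market = toy_days_on_market[toy_days_on_market >= 0]
print(toy_days_on_market.to_dict())  # {1: 40, 2: 30}
```

Because the window truncates history, days on market for properties like property 2 is a lower bound on the true listing duration.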

In [None]:
# Calculate price changes (absolute and percentage) for properties with price changes
price_changes_analysis = (all_events_for_properties_of_interest
    .query('event_event_name in ["LISTED_SALE", "PRICE_CHANGE", "RELISTED", "SOLD"]')  # Events with prices
    .query('event_price>0')
    .sort_values(['parcl_property_id', 'event_event_date'], ascending=[True, True])
    .assign(
        previous_price = lambda x: x.groupby('parcl_property_id')['event_price'].shift(1),
        price_change_absolute = lambda x: x['event_price'] - x['previous_price'],
        price_change_percentage = lambda x: ((x['event_price'] - x['previous_price']) / x['previous_price'] * 100)
    )
    .assign(
        is_price_change = lambda x: (
            (x['event_event_name'].isin(['PRICE_CHANGE', 'RELISTED','LISTED_SALE'])) & 
            (x['previous_price'].notna()) &  # Must have a previous price
            (x['price_change_absolute'] != 0)  # Price must actually change
        )
    )
    .assign(
        price_changes_per_property = lambda x: x.groupby('parcl_property_id')['is_price_change'].transform('sum')
    )
    .query('event_price > 0')  # drop events without a price (already filtered above, kept as a safeguard)
    .query('is_price_change == True or event_event_name == "SOLD"')  # Show price changes and final sales
    # can be modified to only show is_price_change, meaning a true price change
    .loc[:, ['parcl_property_id', 'event_event_name', 'event_event_date', 
            'event_price', 'previous_price', 
            'price_change_absolute', 'price_change_percentage', 
            'price_changes_per_property']]
)

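# count only properties with at least one true price change (properties with only SOLD rows are excluded)
number_of_properties_with_price_changes = (price_changes_analysis
    .query('price_changes_per_property > 0')
    .parcl_property_id.nunique())
print(f'Number of properties with price changes: {number_of_properties_with_price_changes}')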

# Summary statistics for price changes per property
price_change_counts_summary = (price_changes_analysis
    .groupby('parcl_property_id')['price_changes_per_property']
    .first()  # Get unique count per property
    .describe()
)

print('Price changes per property statistics:')
print(price_change_counts_summary)

price_changes_analysis.head(10)

In [None]:
# Summary statistics for price changes (SOLD rows are included, so the step from the last list price to the sale price counts too)
price_change_summary = (price_changes_analysis
    .agg({
        'price_change_absolute': ['mean', 'median', 'std', 'min', 'max'],
        'price_change_percentage': ['mean', 'median', 'std', 'min', 'max']
    })
    .round(2)
)
price_change_summary
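
The price-change columns rest on a per-property `shift`, which pairs each event with the price of the event before it. A minimal sketch on toy data is shown below. The prices are invented, and the rows are assumed to be already sorted by date within each property.

```python
import pandas as pd

# Toy price history, already ordered by date within each property
prices = pd.DataFrame({
    "parcl_property_id": [1, 1, 1, 2, 2],
    "event_event_name": ["LISTED_SALE", "PRICE_CHANGE", "SOLD", "LISTED_SALE", "SOLD"],
    "event_price": [300_000, 285_000, 280_000, 200_000, 200_000],
})

# shift(1) within each property gives the previous event's price (NaN for the first event)
prices["previous_price"] = prices.groupby("parcl_property_id")["event_price"].shift(1)
prices["price_change_absolute"] = prices["event_price"] - prices["previous_price"]
prices["price_change_percentage"] = (
    prices["price_change_absolute"] / prices["previous_price"] * 100
).round(2)

print(prices[["parcl_property_id", "event_event_name",
              "price_change_absolute", "price_change_percentage"]])
```

The first event of each property has no previous price, which is why the analysis requires `previous_price` to be present before counting a price change.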

In [None]:
# Calculate original listing vs final sale price analysis
listing_vs_sale_analysis = (all_events_for_properties_of_interest
    .query('event_event_name in ["LISTED_SALE", "RELISTED", "PRICE_CHANGE", "SOLD"]')
    .sort_values(['parcl_property_id', 'event_event_date'])
    .assign(
        # Mark first event for each property
        is_first_event = lambda x: x.groupby('parcl_property_id').cumcount() == 0,
        # Mark listing-type events
        is_listing_event = lambda x: x['event_event_name'].isin(['LISTED_SALE', 'RELISTED', 'PRICE_CHANGE']),
        # First listing could be LISTED_SALE, RELISTED, or PRICE_CHANGE if it's the first event
        is_first_listing = lambda x: x['is_first_event'] & x['is_listing_event'],
        
        # Mark valid sale events (SOLD with price > 0)
        is_valid_sale = lambda x: (x['event_event_name'] == 'SOLD') & (x['event_price'] > 0)
    )
    .groupby('parcl_property_id')
    .apply(lambda group: {
        'original_listing_price': group.loc[group['is_first_listing'], 'event_price'].iloc[0] 
            if group['is_first_listing'].any() else None,
        'final_sale_price': group.loc[group['is_valid_sale'], 'event_price'].iloc[-1] 
            if group['is_valid_sale'].any() else None
    })
    .apply(pd.Series)
    .dropna()  # Only keep properties with both valid listing and sale prices
    .assign(
        price_difference_absolute = lambda x: x['final_sale_price'] - x['original_listing_price'],
        price_difference_percentage = lambda x: (((x['final_sale_price'] - x['original_listing_price']) / x['original_listing_price']) * 100)
    )
)
listing_vs_sale_analysis.head(10)

In [None]:

# Calculate comprehensive summary statistics
listing_vs_sale_summary = {
    'properties_analyzed': len(listing_vs_sale_analysis),
    'avg_original_listing_price': listing_vs_sale_analysis['original_listing_price'].mean(),
    'median_original_listing_price': listing_vs_sale_analysis['original_listing_price'].median(),
    'avg_final_sale_price': listing_vs_sale_analysis['final_sale_price'].mean(),
    'median_final_sale_price': listing_vs_sale_analysis['final_sale_price'].median(),
    'avg_absolute_difference': listing_vs_sale_analysis['price_difference_absolute'].mean(),
    'median_absolute_difference': listing_vs_sale_analysis['price_difference_absolute'].median(),
    'avg_percentage_difference': listing_vs_sale_analysis['price_difference_percentage'].mean(),
    'median_percentage_difference': listing_vs_sale_analysis['price_difference_percentage'].median(),
    'std_absolute_difference': listing_vs_sale_analysis['price_difference_absolute'].std(),
    'std_percentage_difference': listing_vs_sale_analysis['price_difference_percentage'].std()
}

# Show properties that sold above vs below listing
price_performance = (listing_vs_sale_analysis
    .assign(
        performance = lambda x: x['price_difference_absolute'].apply(
            lambda y: 'Sold Above Listing' if y > 0 else 'Sold Below Listing' if y < 0 else 'Sold at Listing'
        )
    )
    .groupby('performance')
    .agg({
        'price_difference_absolute': ['count', 'mean', 'median'],
        'price_difference_percentage': ['mean', 'median']
    })
    .round(2)
)

# Additional detailed statistics
detailed_stats = (listing_vs_sale_analysis
    .agg({
        'original_listing_price': ['count', 'mean', 'median', 'std', 'min', 'max'],
        'final_sale_price': ['count', 'mean', 'median', 'std', 'min', 'max'],
        'price_difference_absolute': ['mean', 'median', 'std', 'min', 'max'],
        'price_difference_percentage': ['mean', 'median', 'std', 'min', 'max']
    })
    .round(2)
)

# Print summary
print("=== LISTING VS SALE PRICE ANALYSIS ===")
print(f"Properties analyzed (final sale price available and greater than 0): {listing_vs_sale_summary['properties_analyzed']}")
print(f"Average original listing: ${listing_vs_sale_summary['avg_original_listing_price']:,.2f}")
print(f"Average final sale: ${listing_vs_sale_summary['avg_final_sale_price']:,.2f}")
print(f"Average absolute difference: ${listing_vs_sale_summary['avg_absolute_difference']:,.2f}")
print(f"Average percentage difference: {listing_vs_sale_summary['avg_percentage_difference']:.2f}%")
print(f"Median absolute difference: ${listing_vs_sale_summary['median_absolute_difference']:,.2f}")
print(f"Median percentage difference: {listing_vs_sale_summary['median_percentage_difference']:.2f}%")

print("\n=== PERFORMANCE BREAKDOWN ===")
print(price_performance)

print("\n=== DETAILED STATISTICS ===")
print(detailed_stats)
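
As a worked example of the listing-versus-sale arithmetic, the sketch below applies the same formulas and the same three performance labels to a single pair of prices. `compare_listing_to_sale` is a hypothetical helper and the prices are illustrative, not taken from the data.

```python
def compare_listing_to_sale(original_listing_price: float, final_sale_price: float) -> dict:
    """Absolute and percentage difference between the first list price and the sale price."""
    difference = final_sale_price - original_listing_price
    if difference > 0:
        performance = "Sold Above Listing"
    elif difference < 0:
        performance = "Sold Below Listing"
    else:
        performance = "Sold at Listing"
    return {
        "price_difference_absolute": difference,
        "price_difference_percentage": round(difference / original_listing_price * 100, 2),
        "performance": performance,
    }


example = compare_listing_to_sale(300_000, 285_000)  # illustrative prices
print(example)
```

Note that this measures the gap between the original list price and the sale price. It says nothing about profit relative to what the seller originally paid for the home.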

In [None]:
import plotly.express as px
import plotly.graph_objects as go

# First, let's combine the days on market with price changes data
scatter_plot_data = (all_events_for_properties_of_interest
    .pipe(lambda df: 
        # Get days on market data
        df.query('event_event_name in ["LISTED_SALE", "RELISTED", "PRICE_CHANGE", "SOLD"]')
        .sort_values(['parcl_property_id', 'event_event_date'])
        .assign(
            has_sold = lambda x: x.groupby('parcl_property_id')['event_event_name'].transform(lambda y: 'SOLD' in y.values),
            is_first_event = lambda x: x.groupby('parcl_property_id').cumcount() == 0,
            normalized_event_name = lambda x: np.where(
                (x['has_sold']) & 
                (x['is_first_event']) & 
                (x['event_event_name'].isin(['RELISTED', 'PRICE_CHANGE'])),
                'LISTED_SALE',
                x['event_event_name']
            )
        )
        .query('normalized_event_name in ["LISTED_SALE", "SOLD"]')
        .groupby(['parcl_property_id', 'normalized_event_name'])['event_event_date'].first()
        .unstack('normalized_event_name')
        .dropna()
        .assign(days_on_market = lambda x: (x['SOLD'] - x['LISTED_SALE']).dt.days)
        .query('days_on_market >= 0')
        .reset_index()
    )
    .merge(
        # Count PRICE_CHANGE events per property (increases are counted as well as cuts)
        all_events_for_properties_of_interest
        .groupby('parcl_property_id')['event_event_name']
        .apply(lambda x: (x == 'PRICE_CHANGE').sum())
        .reset_index(name='price_cuts'),
        on='parcl_property_id'
    )
    .query('price_cuts > 0')  # Exclude properties with 0 price cuts
    .merge(
        # Get property address and combine fields
        all_events_for_properties_of_interest[['parcl_property_id', 'property_metadata_address1', 'property_metadata_address2', 'property_metadata_city']]
        .drop_duplicates()
        .assign(
            full_address = lambda x: (
                x['property_metadata_address1'].fillna('').astype(str) + ' ' +
                x['property_metadata_address2'].fillna('').astype(str) + ', ' +
                x['property_metadata_city'].fillna('').astype(str)
            ).str.replace(r'\s+', ' ', regex=True).str.strip().str.rstrip(',')
        ),
        on='parcl_property_id'
    )
    .assign(
        hover_text = lambda x: x['full_address'] + '<br>' + 
                              'Days on Market: ' + x['days_on_market'].astype(str) + '<br>' +
                              'Price Cuts: ' + x['price_cuts'].astype(str)
    )
)

# Create the scatter plot
fig = px.scatter(
    scatter_plot_data,
    x='days_on_market',
    y='price_cuts',
    hover_data={'full_address': True, 'parcl_property_id': True},
    title='Days on Market vs. Number of Price Cuts (Properties with Price Cuts Only)',
    labels={
        'days_on_market': 'Days on Market',
        'price_cuts': 'Number of Price Cuts'
    },
    template='plotly_white'
)

# Customize the hover template
fig.update_traces(
    hovertemplate='<b>%{customdata[0]}</b><br>' +
                  'Days on Market: %{x}<br>' +
                  'Price Cuts: %{y}<br>' +
                  'Property ID: %{customdata[1]}<extra></extra>',
    customdata=scatter_plot_data[['full_address', 'parcl_property_id']].values
)

# Update layout for better appearance
fig.update_layout(
    width=800,
    height=600,
    title_font_size=16,
    xaxis_title_font_size=14,
    yaxis_title_font_size=14
)

# Show the plot
fig.show()

# Print summary statistics for context
print("=== SCATTER PLOT DATA SUMMARY (Properties with Price Cuts Only) ===")
print(f"Total properties plotted: {len(scatter_plot_data)}")
print(f"Average days on market: {scatter_plot_data['days_on_market'].mean():.1f}")
print(f"Average price cuts: {scatter_plot_data['price_cuts'].mean():.1f}")
print(f"Max days on market: {scatter_plot_data['days_on_market'].max()}")
print(f"Max price cuts: {scatter_plot_data['price_cuts'].max()}")
print(f"Min price cuts: {scatter_plot_data['price_cuts'].min()}")  # At least 1, given the price_cuts > 0 filter above
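
The `price_cuts` column above counts every `PRICE_CHANGE` event, including price increases. If only true reductions should be counted, one possible sketch is to compare each price with the previous one and count the negative moves. The data below is invented, and rows are assumed to be sorted by date within each property.

```python
import pandas as pd

# Toy events: property 1 has one cut and one increase, property 2 has two cuts
toy_events = pd.DataFrame({
    "parcl_property_id": [1, 1, 1, 2, 2, 2],
    "event_event_name": ["LISTED_SALE", "PRICE_CHANGE", "PRICE_CHANGE",
                         "LISTED_SALE", "PRICE_CHANGE", "PRICE_CHANGE"],
    "event_price": [300_000, 290_000, 295_000, 200_000, 195_000, 190_000],
})

# Difference to the previous event's price within each property
toy_events["price_delta"] = toy_events.groupby("parcl_property_id")["event_price"].diff()
is_change = toy_events["event_event_name"] == "PRICE_CHANGE"
is_cut = is_change & (toy_events["price_delta"] < 0)

# Summing booleans per property gives event counts
direction_counts = pd.DataFrame({
    "price_changes": is_change.groupby(toy_events["parcl_property_id"]).sum(),
    "price_cuts": is_cut.groupby(toy_events["parcl_property_id"]).sum(),
})
print(direction_counts)
```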