# Building-level Deltas of Energy Usage By Property Type

**Author:** Kiki Mei  
**Date:** October 26, 2025    
**Data Source:** Chicago Energy Benchmarking Dataset (2014-2023) - City of Chicago Open Data Portal  

## Section 1: Data Setup

In [None]:
# load in local helper functions
from utils.data_utils import clean_property_type, concurrent_buildings, load_data

In [None]:
import pandas as pd

# Load the full energy benchmarking dataset
energy_data = load_data()

# Keep only concurrent buildings, i.e. those that appear across 2016-2023
energy_data = concurrent_buildings(energy_data, 2016, 2023)
energy_data = clean_property_type(energy_data)

print(
    f"Loaded dataset with {energy_data.shape[0]:,} rows and {energy_data.shape[1]} columns."
)
energy_data.head()

Use the helpers in the refactored `data_utils.py` to load the energy dataset. The working dataset is restricted to concurrent buildings, i.e. those that appear across 2016-2023.
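The body of `concurrent_buildings` is not shown in this notebook. As a rough sketch only, assuming "concurrent" means a building reports in every year of the range, the filter could look like the following. The function name `keep_concurrent` and the toy data are hypothetical.

```python
import pandas as pd


def keep_concurrent(df, start, end, id_col="ID", year_col="Data Year"):
    """Hypothetical stand-in: keep buildings present in every year from start to end."""
    in_range = df[df[year_col].between(start, end)]
    n_years = end - start + 1
    # Count the distinct reporting years per building within the range
    years_per_building = in_range.groupby(id_col)[year_col].nunique()
    full_ids = years_per_building[years_per_building == n_years].index
    return in_range[in_range[id_col].isin(full_ids)].copy()


# Toy data: building "b" is missing 2017, so it is dropped
toy = pd.DataFrame(
    {
        "ID": ["a", "a", "a", "b", "b", "c", "c", "c"],
        "Data Year": [2016, 2017, 2018, 2016, 2018, 2016, 2017, 2018],
    }
)
subset = keep_concurrent(toy, 2016, 2018)
print(sorted(subset["ID"].unique()))
```

The actual helper may define concurrency differently, so this is only one reading of the comment in the cell above.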

In [None]:
energy_data = energy_data[
    energy_data["Primary Property Type"].notna()
    & (energy_data["Primary Property Type"].str.lower() != "nan")
]
print(energy_data.shape)

energy_data.groupby("Primary Property Type")["ID"].nunique().reset_index().sort_values(
    by="ID", ascending=False
).head(10)

Check the composition of buildings in this period: drop rows with a missing property type, then count unique buildings per type. Most of them are multifamily housing.

## Section 2: Building Level Tendency by Property Type

We start by analyzing how key energy performance metrics trend over the years at the property-type level. The key metrics include: Electricity Use (kBtu), Natural Gas Use (kBtu), District Steam Use (kBtu), GHG Intensity (kg CO2e/sq ft), Site EUI (kBtu/sq ft), Source EUI (kBtu/sq ft), Weather Normalized Site EUI (kBtu/sq ft), and Weather Normalized Source EUI (kBtu/sq ft).

The overall trend is visualized as the median of each energy metric per year, which reduces the impact of outliers.
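To illustrate why the median is the more robust summary here, a small sketch with made-up Site EUI values (not taken from the dataset), where one building reports an implausibly high number:

```python
from statistics import mean, median

# Made-up Site EUI values (kBtu/sq ft); the last one mimics a reporting outlier
site_eui = [62.0, 70.5, 75.0, 81.2, 950.0]

avg = mean(site_eui)
mid = median(site_eui)
print(f"mean={avg:.2f}, median={mid:.1f}")
```

The single outlier pulls the mean far above every typical value, while the median stays at the middle building.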

We use the Altair package for chart layout and interactivity. The visualizations focus on the top 8 property types for readability.

In [None]:
import altair as alt

In [None]:
# load in local helper functions
from utils.plot_utils import (
    plot_delta_property_chart,
    plot_energy_persistence_by_year,
    plot_metric_by_property,
)

Import the helper functions from `plot_utils.py` for plotting the trends.
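The internals of `plot_metric_by_property` are not shown here. As an assumption about the aggregation behind such a chart, the yearly median per property type could be computed along these lines, using toy values rather than the real data:

```python
import pandas as pd

# Toy rows; the real dataset has one row per building per year
toy = pd.DataFrame(
    {
        "Data Year": [2019, 2019, 2019, 2020, 2020, 2020],
        "Primary Property Type": [
            "office", "office", "k-12 school",
            "office", "office", "k-12 school",
        ],
        "Site EUI (kBtu/sq ft)": [80.0, 100.0, 60.0, 70.0, 90.0, 50.0],
    }
)

# One median value per (year, property type) pair
yearly_median = toy.groupby(
    ["Data Year", "Primary Property Type"], as_index=False
)["Site EUI (kBtu/sq ft)"].median()
print(yearly_median)
```

Each row of `yearly_median` would then correspond to one point on a line chart, with one line per property type.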

In [None]:
# Keep only top 8 most common property types overall
top_types = energy_data["Primary Property Type"].value_counts().nlargest(8).index

top_energy = energy_data[energy_data["Primary Property Type"].isin(top_types)].copy()
print(top_energy.shape)
top_energy.head()

Keep only the top 8 most common property types so the charts stay readable.

In [None]:
median_site = plot_metric_by_property(
    df=top_energy, metric_col="Site EUI (kBtu/sq ft)", agg_func=pd.Series.median
)
median_site

In [None]:
median_source = plot_metric_by_property(
    df=top_energy, metric_col="Source EUI (kBtu/sq ft)", agg_func=pd.Series.median
)
median_source

In [16]:
median_elec = plot_metric_by_property(
    df=top_energy, metric_col="Electricity Use (kBtu)", agg_func=pd.Series.median
)
median_elec

In [18]:
median_ghg = plot_metric_by_property(
    df=top_energy, metric_col="GHG Intensity (kg CO2e/sq ft)", agg_func=pd.Series.median
)
median_ghg

## Section 3: Year-over-year change of key metrics by Property Type

Next, we build interactive dashboards to explore the year-over-year change of key metrics by property type.

In [None]:
alt.data_transformers.disable_max_rows()

In [None]:
delta_site = plot_delta_property_chart(
    df=top_energy,
    metric_col="Site EUI (kBtu/sq ft)",
)
delta_site

Multifamily housing shows a large increase in Site EUI in 2020, followed by a steep drop in 2021. The building behind this swing is The Wellington Condominium Association. Another building with an unusual drop in Site EUI, in 2022, is Frank O Lowden Homes.

In [None]:
# Named delta_elec because the metric plotted is electricity use, not Source EUI
delta_elec = plot_delta_property_chart(
    df=top_energy,
    metric_col="Electricity Use (kBtu)",
)
delta_elec

Compared with Site EUI, the rebound in electricity use after 2021 is much stronger, as the pandemic eased and companies returned to normal operation. This is most visible in the office category. A representative building with a sharp drop in electricity use in 2020 is Willis Tower. Most office buildings then show electricity use recovering, for example 425 South Financial.

A mixed use property with a large increase in electricity use in 2021 and a steep drop in 2022 is Olympia Centre.

## Section 4: Compute Consecutive-Year Deltas

Key question: If a building’s energy use decreases from year N to N+1, is it likely to continue decreasing from N+1 to N+2 (momentum), or to increase instead (reversal or random walk)?

To answer this question, we look at the correlation between the change in energy use from year N to N+1 and the change from year N+1 to N+2.

In [None]:
cols = ["ID", "Data Year", "Primary Property Type", "Site EUI (kBtu/sq ft)"]
site_df = top_energy[cols].dropna().copy()

This part focuses on Site EUI. We create a dataframe of year-over-year deltas and a lagged version of it to examine the relationship between consecutive years.
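As a compact illustration of the delta and lagged-delta construction, here is a sketch on toy data that uses `groupby(...).diff()` and `groupby(...).shift(-1)` instead of `apply`. The values are invented, and it assumes each building has one row per consecutive year.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "ID": ["a"] * 4 + ["b"] * 4,
        "Data Year": [2016, 2017, 2018, 2019] * 2,
        "Site EUI (kBtu/sq ft)": [100.0, 90.0, 95.0, 85.0, 50.0, 55.0, 52.0, 58.0],
    }
).sort_values(["ID", "Data Year"])

# Delta: change from the previous year to this year, within each building
toy["Delta"] = toy.groupby("ID")["Site EUI (kBtu/sq ft)"].diff()
# Delta_next: the following year's change, aligned on the same row
toy["Delta_next"] = toy.groupby("ID")["Delta"].shift(-1)

# Keep only rows where both consecutive changes are defined
pairs = toy.dropna(subset=["Delta", "Delta_next"])
print(pairs[["ID", "Data Year", "Delta", "Delta_next"]])
```

Each remaining row pairs the change from N to N+1 with the change from N+1 to N+2 for the same building, which is the input to the persistence correlation.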

In [None]:
site_df["Data Year"] = site_df["Data Year"].astype(int)
site_df["ID"] = site_df["ID"].astype(str)
site_df["Primary Property Type"] = site_df["Primary Property Type"].astype(str)
site_df["Site EUI (kBtu/sq ft)"] = pd.to_numeric(
    site_df["Site EUI (kBtu/sq ft)"], errors="coerce"
)
site_df = site_df.dropna(subset=["Site EUI (kBtu/sq ft)"])

In [20]:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    df_delta = (
        site_df.sort_values(["ID", "Data Year"])
        .groupby("ID", group_keys=False)
        .apply(
            lambda g: g.assign(Delta=g["Site EUI (kBtu/sq ft)"].diff()),
            include_groups=True,
        )
        .dropna(subset=["Delta"])
        .reset_index(drop=True)
    )

In [None]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    df_lagged = (
        df_delta.sort_values(["Primary Property Type", "ID", "Data Year"])
        .groupby(["Primary Property Type", "ID"])
        .apply(lambda g: g.assign(Delta_next=g["Delta"].shift(-1)), include_groups=True)
        .dropna(subset=["Delta", "Delta_next"])
        .reset_index(drop=True)
    )

In [None]:
corrs = (
    df_lagged.groupby("Primary Property Type")[["Delta", "Delta_next"]]
    .corr()
    .iloc[0::2, -1]
    .reset_index()
    .rename(columns={"Delta_next": "Persistence (Δₜ → Δₜ₊₁)"})
)
corrs

The correlations show that persistence is negative for most property types: an increase in Site EUI from year N to N+1 tends to be followed by a decrease from N+1 to N+2, and vice versa. This points to reversal rather than momentum.
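One caveat when reading these values, offered as a sketch under a simplifying assumption that this notebook does not test: suppose a building's Site EUI is a stable level plus independent year-to-year noise, $E_t = \mu + \varepsilon_t$ with $\operatorname{Var}(\varepsilon_t) = \sigma^2$. Consecutive deltas then share the term $\varepsilon_{t+1}$ with opposite signs:

$$
\Delta_t = E_{t+1} - E_t = \varepsilon_{t+1} - \varepsilon_t, \qquad
\Delta_{t+1} = E_{t+2} - E_{t+1} = \varepsilon_{t+2} - \varepsilon_{t+1}
$$

$$
\operatorname{Cov}(\Delta_t, \Delta_{t+1}) = -\sigma^2, \qquad
\operatorname{Var}(\Delta_t) = \operatorname{Var}(\Delta_{t+1}) = 2\sigma^2
\quad\Longrightarrow\quad
\operatorname{Corr}(\Delta_t, \Delta_{t+1}) = \frac{-\sigma^2}{2\sigma^2} = -\frac{1}{2}
$$

Under this assumption, a persistence near $-0.5$ would be consistent with noise around a stable level rather than an active correction, so it may be more informative to compare the observed values against $-0.5$ than against $0$.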

In [None]:
plot_energy_persistence_by_year(df_lagged)

We distinguish the points before and after 2019 to see how the time period influences the deltas and their correlation.
Before 2019, most of the persistence points show a negative relationship, but the slope is shallow: an increase in energy use from N to N+1 tends to be followed by only a small decrease from N+1 to N+2. After 2019, the slope is more strongly negative: a small increase from N to N+1 tends to be followed by a larger decrease from N+1 to N+2.
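To make the slope comparison concrete, here is a minimal sketch that fits a least-squares line to the delta pairs in each period. The data are synthetic and constructed to have slopes of -0.2 and -0.8; whether `plot_energy_persistence_by_year` fits lines this way is an assumption, and `fit_slope` is a hypothetical name.

```python
import numpy as np


def fit_slope(delta, delta_next):
    """Slope of the least-squares line predicting Delta_next from Delta."""
    return float(np.polyfit(delta, delta_next, 1)[0])


# Synthetic delta pairs, not values from the dataset
pre_delta = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
pre_next = -0.2 * pre_delta  # shallow reversal
post_delta = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
post_next = -0.8 * post_delta  # steep reversal

slope_pre = fit_slope(pre_delta, pre_next)
slope_post = fit_slope(post_delta, post_next)
print(f"pre-2019 slope: {slope_pre:.2f}, post-2019 slope: {slope_post:.2f}")
```

With a slope of -0.2, a 10-unit increase is followed on average by a 2-unit decrease. With -0.8, the same increase is followed by an 8-unit decrease.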

## Section 5: Buildings Grouped by Extent of COVID Impact

As suggested last week, we categorize building types by how COVID affected them, in three groups: permanent effects (e.g. downtown office buildings), temporary effects that rebound in 2022/2023 (possibly schools), and little effect or even increased energy use (multifamily housing, hospitals).

First we check the composition of the dataset, then group the property types. We use the top 10 categories, which cover most of the buildings in this dataset.

In [None]:
energy_data.groupby("Primary Property Type")["ID"].nunique().reset_index().sort_values(
    by="ID", ascending=False
).head(10)

In [None]:
top_10_types = energy_data["Primary Property Type"].value_counts().nlargest(10).index

top_10 = energy_data[energy_data["Primary Property Type"].isin(top_10_types)].copy()
print(top_10.shape)

In [None]:
permanent_effects = ["office", "hotel", "retail store"]

temporary_rebound = ["k-12 school", "college/university"]

stable_or_increased = [
    "multifamily housing",
    "residential",
    "senior care community",
    "senior living community",
    "supermarket/grocery store",
    "residence hall/dormitory",
]

We assign each of the top 10 most common property types to one of the three groups of buildings affected differently by COVID, then compare the energy usage trends of the groups.
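The same assignment can also be expressed as a lookup table. The sketch below is an alternative formulation on a toy series, not the notebook's own code: it inverts a group-to-types dictionary (shortened here) and applies it with `Series.map`, falling back to "Other/Unclassified".

```python
import pandas as pd

# Shortened group definitions, for illustration only
groups = {
    "Permanent COVID Effects": ["office", "hotel", "retail store"],
    "Temporary Effects (Rebounded)": ["k-12 school", "college/university"],
    "Stable or Increased Usage": ["multifamily housing", "senior care community"],
}

# Invert to a property type -> group label lookup
type_to_group = {
    ptype: label for label, ptypes in groups.items() for ptype in ptypes
}

toy_types = pd.Series(
    ["office", "k-12 school", "mixed use property", "multifamily housing"],
    name="Primary Property Type",
)
# Types that are not in the lookup become NaN, then get the fallback label
category = toy_types.map(type_to_group).fillna("Other/Unclassified")
print(category.tolist())
```

A lookup table keeps the group definitions in one place and avoids building the column in a row-by-row Python loop.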

In [None]:
impact_categories = []
for ptype in top_10["Primary Property Type"]:
    if ptype in permanent_effects:
        impact_categories.append("Permanent COVID Effects")
    elif ptype in temporary_rebound:
        impact_categories.append("Temporary Effects (Rebounded)")
    elif ptype in stable_or_increased:
        impact_categories.append("Stable or Increased Usage")
    else:
        impact_categories.append("Other/Unclassified")

# Add new column
top_10["COVID Impact Category"] = impact_categories

In [34]:
top_10.groupby(["Primary Property Type", "COVID Impact Category"])[
    "ID"
].nunique().reset_index().sort_values(by="ID", ascending=False)

| | Primary Property Type | COVID Impact Category | ID |
|---|---|---|---|
| 4 | multifamily housing | Stable or Increased Usage | 1169 |
| 2 | k-12 school | Temporary Effects (Rebounded) | 379 |
| 5 | office | Permanent COVID Effects | 313 |
| 0 | college/university | Temporary Effects (Rebounded) | 75 |
| 1 | hotel | Permanent COVID Effects | 67 |
| 8 | senior care community | Stable or Increased Usage | 52 |
| 7 | retail store | Permanent COVID Effects | 51 |
| 9 | supermarket/grocery store | Stable or Increased Usage | 48 |
| 3 | mixed use property | Other/Unclassified | 36 |
| 6 | residence hall/dormitory | Stable or Increased Usage | 25 |


The only unclassified category is mixed use property.

In [None]:
covid_site = plot_metric_by_property(
    df=top_10,
    metric_col="Site EUI (kBtu/sq ft)",
    property_col="COVID Impact Category",
    agg_func=pd.Series.median,
)
covid_site

In [None]:
covid_elec = plot_metric_by_property(
    df=top_10,
    metric_col="Electricity Use (kBtu)",
    property_col="COVID Impact Category",
    agg_func=pd.Series.median,
)
covid_elec

Before 2019, all groups show modest variation or slight increases in median Site EUI, indicating that energy use per square foot was largely stable prior to policy intervention. After the Placard system’s introduction in 2019, every category shows a noticeable downward shift in EUI between 2019 and 2020, suggesting a broad efficiency improvement or reporting shift coinciding with the policy’s public-rating incentive. This timing also overlaps with the onset of COVID, so the two effects cannot be separated from these charts alone.

The persistence and rebound patterns afterward differ by category:
- Permanent COVID Effects (offices, hotels, retail) maintain lower EUI and electricity use after 2020, consistent with remote work and reduced building occupancy.
- Temporary Effects (Rebounded) properties (K-12 schools and colleges/universities) drop sharply during 2020 but recover by 2022–2023 as in-person activity resumed.
- Stable or Increased Usage properties (multifamily housing, senior care communities, supermarkets/grocery stores, residence halls) remain relatively steady, reflecting continuous occupancy and essential operation through the pandemic.