# Building-level analysis and Within-building Variation Manipulation

**Author:** Kiki Mei  
**Date:** November 13, 2025    
**Data Source:** Chicago Energy Benchmarking Dataset (2014-2023) - City of Chicago Open Data Portal  

## Section 1: Data Setup

In [None]:
# Import Altair and the project's local helper functions
import altair as alt

from utils.data_utils import (
    clean_property_type,
    concurrent_buildings,
    covid_impact_category,
    load_data,
)
from utils.plot_utils import plot_delta_divergence, plot_delta_kernel_density

In [4]:
energy_data = load_data()
energy_data = concurrent_buildings(energy_data, 2016, 2023)
energy_data = clean_property_type(energy_data)
energy_data = covid_impact_category(energy_data)
energy_data = energy_data[
    energy_data["Primary Property Type"].notna()
    & (energy_data["Primary Property Type"].str.lower() != "nan")
]

print(
    f"Loaded dataset with {energy_data.shape[0]:,} rows and {energy_data.shape[1]} columns."
)
energy_data.head()

2025-11-19 04:31:50,628 [INFO] ✅ COVID Impact Category assignment (with 'Other' group) complete.
2025-11-19 04:31:50,632 [INFO] Category counts:
COVID Impact Category
Other                    985
Permanent               2370
Stable/Increased       10171
Temporary/Rebounded     5378


Loaded dataset with 18,854 rows and 31 columns.


Unnamed: 0,Data Year,ID,Property Name,Address,ZIP Code,Community Area,Primary Property Type,Gross Floor Area - Buildings (sq ft),Year Built,# of Buildings,...,GHG Intensity (kg CO2e/sq ft),Latitude,Longitude,Location,Reporting Status,Chicago Energy Rating,Exempt From Chicago Energy Rating,Water Use (kGal),Row_ID,COVID Impact Category
24486,2016,116336,lasalle private residences,1212 N LaSalle,60610,near north side,multifamily housing,367627.0,1986.0,1.0,...,7.0,41.904201,-87.633825,point (-87.63382507 41.90420084),,,,,,Stable/Increased
24496,2016,101745,161 north clark,161 North Clark,60601,loop,office,1200836.0,1992.0,1.0,...,12.9,41.884905,-87.630518,point (-87.6305179 41.88490511),,,,,,Permanent
24495,2016,101448,1401 w roosevelt - 2017 resubmit,1401 W. Roosevelt,60608,near west side,residential,69385.0,2006.0,1.0,...,3.5,41.849153,-87.670896,point (-87.67089596 41.84915346),,,,,,Stable/Increased
24494,2016,159892,promontory corporation,5530-5532 S Shore Drive,60637,hyde park,multifamily housing,180351.0,1949.0,1.0,...,7.3,41.794687,-87.580465,point (-87.58046479 41.794687),,,,,,Stable/Increased
24493,2016,103602,190 south lasalle,190 South LaSalle,60603,loop,office,882560.0,1985.0,1.0,...,13.1,41.879756,-87.632687,point (-87.63268685 41.8797561),,,,,,Permanent


In [5]:
energy_data.groupby("Primary Property Type")["ID"].nunique().reset_index().sort_values(
    by="ID", ascending=False
).head(10)

Unnamed: 0,Primary Property Type,ID
12,multifamily housing,1059
5,k-12 school,377
14,office,300
20,residential,123
0,college/university,73
15,other,61
4,hotel,59
23,senior care community,51
22,retail store,42
8,mall,41


In [6]:
# Keep only top 10 most common property types overall
top_types = energy_data["Primary Property Type"].value_counts().nlargest(10).index

top_energy = energy_data[energy_data["Primary Property Type"].isin(top_types)].copy()
print(top_energy.shape)

(17231, 31)


## Section 2: Building-level distribution of year-over-year deltas

We explore building-level energy change with a diverging bar chart. Each bar is one building's year-over-year change in EUI, with decreases and increases extending in opposite directions from zero. The first chart covers offices from 2018 to 2019.
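The quantity behind each bar can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of `plot_delta_divergence` (which lives in the project's `utils` module and is not shown here). The `yearly_delta` helper, the short `EUI` column name, and the toy values are all hypothetical.

```python
import pandas as pd

# Toy data: hypothetical buildings and EUI values, for illustration only.
toy = pd.DataFrame(
    {
        "ID": [1, 1, 2, 2, 3, 3],
        "Data Year": [2018, 2019, 2018, 2019, 2018, 2019],
        "EUI": [80.0, 75.0, 60.0, 66.0, 90.0, 90.0],
    }
)


def yearly_delta(df, year_before, year_after, value_col="EUI"):
    """Return one row per building with delta = value(year_after) - value(year_before)."""
    # One row per building, one column per reporting year.
    wide = df.pivot(index="ID", columns="Data Year", values=value_col)
    # Keep only buildings that reported in both years.
    wide = wide.dropna(subset=[year_before, year_after])
    delta = (wide[year_after] - wide[year_before]).rename("delta")
    # Sort so decreases and increases end up on opposite ends of the chart.
    return delta.sort_values().reset_index()


deltas = yearly_delta(toy, 2018, 2019)
print(deltas)
```

Negative deltas are buildings whose EUI fell and positive deltas are buildings whose EUI rose. Sorting by delta is what gives a diverging chart its characteristic shape.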

In [None]:
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

In [24]:
plot_delta_divergence(
    df=energy_data,
    ptype="office",
    year_before=2018,
    year_after=2019,
    eui_col="Weather Normalized Site EUI (kBtu/sq ft)",
    name_col="Property Name",
)

Year-over-year building-level energy change for offices from 2019 to 2020.

In [27]:
plot_delta_divergence(
    df=energy_data,
    ptype="office",
    year_before=2019,
    year_after=2020,
    eui_col="Weather Normalized Site EUI (kBtu/sq ft)",
    name_col="Property Name",
)

We repeat the comparison for multifamily housing, a property type in the Stable/Increased COVID impact category, starting with 2018 to 2019.

In [25]:
plot_delta_divergence(
    df=energy_data,
    ptype="multifamily housing",
    year_before=2018,
    year_after=2019,
    eui_col="Weather Normalized Site EUI (kBtu/sq ft)",
    name_col="Property Name",
)

Year-over-year building-level energy change for multifamily housing from 2019 to 2020.

In [26]:
plot_delta_divergence(
    df=energy_data,
    ptype="multifamily housing",
    year_before=2019,
    year_after=2020,
    eui_col="Weather Normalized Site EUI (kBtu/sq ft)",
    name_col="Property Name",
)

Distribution of Delta EUI before and after the placard program (kernel density plots).

In [43]:
plot_delta_kernel_density(
    energy_data,
    property_type="office",
    metric_col="Weather Normalized Site EUI (kBtu/sq ft)",
)

In [42]:
plot_delta_kernel_density(
    energy_data,
    property_type="multifamily housing",
    metric_col="Weather Normalized Site EUI (kBtu/sq ft)",
)

Most buildings change very little from year to year: Delta Site EUI falls between -10 and +10 kBtu/sq ft for the majority of buildings. For offices, the post-2019 curves are slightly lower and narrower, suggesting less variation in Site EUI after the placard program (a period that also includes the COVID impact). For multifamily housing, the post-2019 curves are slightly higher and broader, indicating more variation in Site EUI after the placard policy and suggesting that residential buildings were less affected by COVID shutdown cycles.
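The visual reading of the density curves can be cross-checked with two numbers per period: the spread of the deltas and the share that falls within ±10 kBtu/sq ft. The sketch below uses toy data with a hypothetical `EUI` column and assumes a 2019 cutoff between the "pre" and "post" periods. It does not reproduce the logic of `plot_delta_kernel_density`.

```python
import pandas as pd

# Toy panel: two hypothetical buildings observed over four years.
toy = pd.DataFrame(
    {
        "ID": [1] * 4 + [2] * 4,
        "Data Year": [2017, 2018, 2019, 2020] * 2,
        "EUI": [100.0, 96.0, 97.0, 85.0, 50.0, 52.0, 51.0, 49.0],
    }
).sort_values(["ID", "Data Year"])

# Year-over-year change within each building.
toy["delta"] = toy.groupby("ID")["EUI"].diff()
# Label each delta by the year it ends in (assumed cutoff: 2019).
toy["period"] = toy["Data Year"].map(lambda y: "post" if y >= 2019 else "pre")

# The first year of each building has no previous year, so its delta is NaN.
changes = toy.dropna(subset=["delta"])
summary = changes.groupby("period")["delta"].agg(
    std="std",
    share_within_10=lambda d: d.abs().le(10).mean(),
)
print(summary)
```

A smaller standard deviation in the post period would correspond to a narrower density curve, and a larger one to a broader curve.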

## Section 3: Within-Building Fixed-Effects Model

Q: How did the same building’s energy usage change before and after the 2019 placard rollout, net of any fixed differences (like size, design, or age)?

A within-building fixed-effects regression addresses this question.

**Within-building fixed-effect model**

$$
EUI_{it}
= \alpha_i
+ \beta_1 \,\text{post\_placard}_{it}
+ \beta_2 \,\text{post\_covid}_{it}
+ \beta_3 \,\text{EnergyStar}_{it}
+ \varepsilon_{it}
$$

- $\alpha_i$: building fixed effects
- No property-type indicators are needed, because the fixed effects absorb all time-invariant building characteristics
- The COVID indicator and ENERGY STAR Score vary over time, so they can be included in the fixed-effects model

*This model estimates how changes within a building over time relate to changes in its energy intensity.*
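Building fixed effects are equivalent to the "within" transformation: subtract each building's own mean from every variable, then run OLS on the demeaned data. The sketch below shows this with plain pandas and NumPy on a toy panel. The values and the short `eui` column name are hypothetical, and only one regressor is used to keep the arithmetic visible.

```python
import pandas as pd

# Toy panel: two buildings with very different baseline EUI (hypothetical values).
panel = pd.DataFrame(
    {
        "ID": [1, 1, 1, 1, 2, 2, 2, 2],
        "post_placard": [0, 0, 1, 1, 0, 0, 1, 1],
        "eui": [100.0, 102.0, 97.0, 99.0, 50.0, 52.0, 47.0, 49.0],
    }
)

cols = ["post_placard", "eui"]
# Within transformation: subtract each building's own mean from x and y.
demeaned = panel[cols] - panel.groupby("ID")[cols].transform("mean")

# OLS slope on the demeaned data (no intercept is needed after demeaning).
x = demeaned["post_placard"].to_numpy()
y = demeaned["eui"].to_numpy()
beta_within = float(x @ y / (x @ x))
print(beta_within)
```

The 50-point gap in baseline EUI between the two toy buildings drops out entirely. The slope reflects only how each building moved relative to its own average.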

In [47]:
# prepare the key indicator variables for model
cols = [
    "ID",
    "Data Year",
    "Primary Property Type",
    "COVID Impact Category",
    "ENERGY STAR Score",
    "Weather Normalized Site EUI (kBtu/sq ft)",
]
mod_data = top_energy[cols].dropna(subset=["Weather Normalized Site EUI (kBtu/sq ft)"])

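# Policy-period indicators: each switches on in the stated year and stays on afterward
year_2019 = 2019  # placard rollout year
year_2020 = 2020  # first COVID year
mod_data["post_placard"] = (mod_data["Data Year"] >= year_2019).astype(int)
mod_data["post_covid"] = (mod_data["Data Year"] >= year_2020).astype(int)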
mod_data = mod_data.set_index(["ID", "Data Year"])

In [None]:
import statsmodels.api as sm
from linearmodels.panel import PanelOLS

y = mod_data["Weather Normalized Site EUI (kBtu/sq ft)"]
X = mod_data[
    [
        "post_placard",
        "post_covid",
        "ENERGY STAR Score",
    ]
]

In [48]:
X = sm.add_constant(X)

model_fe = PanelOLS(
    y,
    X,
    entity_effects=True,  # building fixed effects: estimates use within-building variation only
).fit(cov_type="clustered", cluster_entity=True)

print(model_fe)

                                     PanelOLS Estimation Summary                                      
Dep. Variable:     Weather Normalized Site EUI (kBtu/sq ft)   R-squared:                        0.2244
Estimator:                                         PanelOLS   R-squared (Between):              0.2067
No. Observations:                                     13233   R-squared (Within):               0.2244
Date:                                      Wed, Nov 19 2025   R-squared (Overall):              0.2646
Time:                                              05:51:37   Log-likelihood                -5.864e+04
Cov. Estimator:                                   Clustered                                           
                                                              F-statistic:                      1083.4
Entities:                                              1999   P-value                           0.0000
Avg Obs:                                             6.6198   Distributio

Inputs contain missing values. Dropping rows with missing observations.
  super().__init__(dependent, exog, weights=weights, check_rank=check_rank)


The post_placard coefficient is negative and statistically significant (coef = -3.28): from 2019 onward, the average building's weather-normalized site EUI was about 3.28 kBtu/sq ft lower than its own pre-2019 level, controlling for COVID and ENERGY STAR Score.

**The effect is within-building, not driven by differences across types of buildings.**

Model_fe_B: add an interaction between post_placard and the Permanent COVID impact category (the interaction term varies over time).

In [None]:
mod_data["covid_perm"] = (mod_data["COVID Impact Category"] == "Permanent").astype(int)
mod_data["placard_perm"] = mod_data["post_placard"] * mod_data["covid_perm"]

X_1 = sm.add_constant(
    mod_data[
        [
            "post_placard",
            "post_covid",
            "ENERGY STAR Score",
            "covid_perm",
            "placard_perm",
        ]
    ]
)

In [52]:
model_1 = PanelOLS(
    y,
    X_1,
    entity_effects=True,
).fit(cov_type="clustered", cluster_entity=True)

full_params = model_1.summary.tables[1]
print(full_params)

                                 Parameter Estimates                                 
                   Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-------------------------------------------------------------------------------------
const                 133.61     2.9963     44.591     0.0000      127.73      139.48
post_placard         -1.8647     0.3977    -4.6888     0.0000     -2.6443     -1.0852
post_covid            1.1512     0.3571     3.2238     0.0013      0.4512      1.8512
ENERGY STAR Score    -0.8492     0.0450    -18.887     0.0000     -0.9373     -0.7610
covid_perm            7.6018     10.370     0.7330     0.4636     -12.726      27.929
placard_perm         -8.9161     0.6878    -12.964     0.0000     -10.264     -7.5680


Inputs contain missing values. Dropping rows with missing observations.
  super().__init__(dependent, exog, weights=weights, check_rank=check_rank)


* Post_placard = -1.865 (p < 0.001): from 2019 onward, buildings outside the “Permanent” COVID category reduced their own weather-normalized EUI by ~1.86 kBtu/sq ft.
* Post_covid = +1.151 (p = 0.0013): from 2020 onward, buildings increased energy intensity by ~1.15 kBtu/sq ft, relative to their own pre-COVID levels.
* Placard_perm = -8.916 (p < 0.001): buildings in the “Permanent” COVID category experienced an additional 8.9 kBtu/sq ft decrease in EUI once the placard program began, beyond the placard effect for other buildings. They responded more strongly to the placard.
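Because placard_perm is an interaction term, the implied post-placard change for “Permanent” buildings is the sum of the post_placard and placard_perm coefficients. The arithmetic is sketched below using the point estimates from the parameter table above. The variable names are illustrative. No standard error is computed for the sum, since that would need the covariance between the two estimates.

```python
# Point estimates copied from the parameter table above.
beta_post_placard = -1.8647
beta_placard_perm = -8.9161

# Post-placard change for buildings outside the "Permanent" category.
effect_other = beta_post_placard
# Post-placard change for "Permanent" buildings: main effect plus interaction.
effect_permanent = beta_post_placard + beta_placard_perm

print(round(effect_other, 4), round(effect_permanent, 4))
```

Under this model, “Permanent” buildings show a combined decrease of roughly 10.8 kBtu/sq ft from 2019 onward, holding post_covid and ENERGY STAR Score fixed.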