# Notebook 5: Synthesis & Recommendations

**OSU Campus Energy Analysis â€” Data I/O 2026 Advanced Track**

This final notebook synthesizes findings from all prior analyses to deliver actionable recommendations. We cluster buildings by load profile to tailor strategies, present a prioritized retrofit list, and summarize the path to energy optimization.

**Narrative arc**: "Here is the roadmap to victory: specific actions, prioritized by ROI, backed by data-driven insights."

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

sns.set_theme(style='whitegrid', font_scale=1.1)
plt.rcParams['figure.figsize'] = (14, 6)

DATA_DIR = Path('/Users/Siddarth/Data IO/processed')

print('Ready for synthesis.')

Ready for synthesis.


In [2]:
# Load all results
elec = pd.read_parquet(DATA_DIR / 'meter_electricity.parquet')
elec['date'] = pd.to_datetime(elec['date'])
buildings = pd.read_parquet(DATA_DIR / 'buildings.parquet')

savings = pd.read_parquet(DATA_DIR / 'savings_potential.parquet')
quality = pd.read_parquet(DATA_DIR / 'data_quality_scorecard.parquet')
peak = pd.read_parquet(DATA_DIR / 'peak_demand_predictions.parquet')
retrofits = pd.read_parquet(DATA_DIR / 'retrofit_candidates.parquet')

print(f'Loaded savings analysis for {len(savings)} buildings')
print(f'Loaded retrofit candidates: {len(retrofits)}')
print(f'Loaded peak predictions: {len(peak)} days')
# === DATA CLEANING: Remove Known Anomalies ===
anomalies = [1044, 79, 279, 134, 992, '1044', '0079', '0279', '0134', '0992']
print(f'Savings rows before filter: {len(savings):,}')
# Robust filtering: Convert to int to handle str/int ambiguity
savings['simscode_int'] = pd.to_numeric(savings['simscode'], errors='coerce')
target_anomalies = [1044, 79, 279, 134, 992]
savings = savings[~savings['simscode_int'].isin(target_anomalies)]
savings = savings.drop(columns=['simscode_int'])
print(f'Savings rows after filter: {len(savings):,}')


Loaded savings analysis for 131 buildings
Loaded retrofit candidates: 5
Loaded peak predictions: 58 days
Savings rows before filter: 131
Savings rows after filter: 126


## 1. Building Archetypes (Clustering)

We group buildings by their normalized daily load shapes to understand usage patterns (e.g., Office vs. Lab vs. Housing).

In [3]:
# Pivot to Building x Hour matrix for clustering
# We'll use average hourly profile per building
elec['hour'] = elec['readingtime'].dt.hour
hourly_profile = elec.groupby(['simscode', 'hour'])['readingvalue'].mean().reset_index()
pivot_profile = hourly_profile.pivot(index='simscode', columns='hour', values='readingvalue').dropna()

# Normalize each building to 0-1 range to focus on shape, not magnitude
scaler = MinMaxScaler()
norm_profile = pd.DataFrame(scaler.fit_transform(pivot_profile.T).T, 
                          index=pivot_profile.index, columns=pivot_profile.columns)

# K-Means Clustering
n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
clusters = kmeans.fit_predict(norm_profile)
pivot_profile['cluster'] = clusters

fig, axes = plt.subplots(1, n_clusters, figsize=(18, 5), sharey=True)
for i in range(n_clusters):
    cluster_data = norm_profile[clusters == i]
    # Plot individual lines (dimmed)
    for idx in cluster_data.index[:50]:  # Plot subset for performance
        axes[i].plot(cluster_data.loc[idx], color='gray', alpha=0.1)
    # Plot centroid
    axes[i].plot(kmeans.cluster_centers_[i], color='red', linewidth=2, label='Centroid')
    axes[i].set_title(f'Cluster {i} (n={len(cluster_data)})')
    axes[i].set_xticks([0, 6, 12, 18, 23])
    if i == 0:
        axes[i].set_ylabel('Normalized Load')
plt.suptitle('Building Load Profile Archetypes', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## 2. Prioritized Investment Plan

Combining our efficiency analysis (Notebook 3) with predictive modeling (Notebook 4), we recommend the following top 10 investments.

In [4]:
top_opportunities = savings.nlargest(10, 'excess_kwh')[['buildingname', 'campusname', 'eui', 'peer_median_eui', 'excess_kwh', 'savings_dollars']]
top_opportunities['roi_score'] = top_opportunities['savings_dollars'] / 1000  # Proxy for ROI

print('=== Top 10 Efficiency Retrofit Opportunities ===')
print(top_opportunities.to_string(index=False))

fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(data=top_opportunities, x='savings_dollars', y='buildingname', palette='viridis', ax=ax)
ax.set_title('Estimated Annual Savings by Building ($)', fontweight='bold')
ax.set_xlabel('Annual Savings ($)')
ax.set_ylabel('')
plt.tight_layout()
plt.show()

=== Top 10 Efficiency Retrofit Opportunities ===
                                     buildingname     campusname         eui  peer_median_eui   excess_kwh  savings_dollars    roi_score
             McPherson Chemical Laboratory (0053)       Columbus 2006.868719         3.179032 2.357561e+08     1.886049e+07 18860.490579
                              Hopkins Hall (0149)       Columbus   96.697937         3.179032 1.030148e+07     8.241185e+05   824.118516
                          Scott Laboratory (0148)       Columbus   39.317059         2.532811 9.651267e+06     7.721014e+05   772.101355
       Chiller Plant, South Campus Central (0388)       Columbus  107.667746         3.179032 8.678415e+06     6.942732e+05   694.273172
                     McCracken Power Plant (0069)       Columbus   58.231133         3.179032 6.063549e+06     4.850839e+05   485.083882
        Chilled Water Plant, East Regional (0376)       Columbus  127.247535         3.024960 4.254126e+06     3.403301e+05   340

## 3. Peak Demand Management

Our predictive model flagged specific days as high-risk for peak demand charges. We recommend demand response protocols for these days.

In [5]:
# Filter for high probability peak days in test set
high_risk = peak[peak['peak_prob'] > 0.7].sort_values('date')
print(f'Identified {len(high_risk)} high-risk peak days in test period:')
print(high_risk[['date', 'temp_mean', 'peak_prob']].head(10).to_string(index=False))

Identified 0 high-risk peak days in test period:
Empty DataFrame
Columns: [date, temp_mean, peak_prob]
Index: []


## 4. Final Recommendations

1. **Retrofit the "Top 10"**: These buildings account for a disproportionate share of energy waste. Targeting them ensures maximum ROI.
2. **Implement Clustering-Based Scheduling**: Use the 4 identified load profiles to tailor HVAC schedules. "Cluster 2" (likely offices) should have aggressive night setbacks.
3. **Deploy Peak Alerts**: Integrate the XGBoost peak demand classifier into the BMS to trigger pre-cooling or load shedding on high-risk days.
4. **Fix Data Gaps**: The 10 meters with >50% missing data (identified in Notebook 3) need immediate sensor repair to ensure visibility.

In [6]:
# === Communication: Building Report Generator ===
def generate_report_card(bldg_id):
    if bldg_id not in savings.index:
        return f'Building {bldg_id} not found in savings analysis.'
    
    row = savings.loc[bldg_id]
    name = row['buildingname']
    eui = row['eui']
    roi = row['roi_score']
    cluster = clusters.loc[bldg_id, 'cluster_label'] if bldg_id in clusters.index else 'Unknown'
    
    if roi > 1e6: grade = 'F'
    elif roi > 1e5: grade = 'D'
    elif roi > 1e4: grade = 'C'
    elif roi > 1e3: grade = 'B'
    else: grade = 'A'
    
    report = f'''
    ------------- BUILDING REPORT CARD -------------
    Name: {name} ({bldg_id})
    Grade: {grade}
    Cluster Profile: {cluster}
    ------------------------------------------------
    EUI: {eui:,.0f} (Peer Median: {row['peer_median_eui']:,.0f})
    Excess Spend: ${row['savings_dollars']:,.0f} / year
    Recommendation: {'Immediate Audit' if grade in ['F','D'] else 'Monitor'}
    ------------------------------------------------
    '''
    return report

print(generate_report_card('0148')) # Scott Lab
print(generate_report_card('0079')) # Substation


Building 0148 not found in savings analysis.
Building 0079 not found in savings analysis.


In [7]:
# === Dashboard Export ===
import json
dashboard_data = {
    'top_opportunities': savings.head(20).to_dict(orient='records'),
    'clusters': pivot_profile['cluster'].to_dict() if 'pivot_profile' in locals() else {},
    'peak_alert_days': peak_predictions['date'].astype(str).tolist() if 'peak_predictions' in locals() and not peak_predictions.empty else []
}

with open('processed/dashboard_export.json', 'w') as f:
    json.dump(dashboard_data, f, indent=2)
print('Saved processed/dashboard_export.json')


Saved processed/dashboard_export.json
