# OSU Energy Dataset Starter Notebook (Pandas)

### **Data Use Notice**
This dataset is sourced from Ohio State’s public dashboard and is being provided for use **only within the Data I/O 2026 challenge.** 

By participating, you agree to follow [Ohio State’s Institutional Data Policy (IDP)](https://it.osu.edu/data/institutional-data-policy) and understand that this data should **not be used or shared outside of this competition.**

**Instructions:**
1. Run the **Install & Download Data** cell first.
2. Then run the rest of the code to load CSVs into Pandas DataFrames.
3. Access any CSV via `pdf_dict["<filename>.csv"]`, e.g., `pdf_dict["meter-readings-march-2025.csv"]`.
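
As a rough illustration of step 3, the sketch below looks up frames by file name and stacks the monthly meter-reading files into one DataFrame. It is only a sketch under stated assumptions: `get_frame`, `meters`, and `source_file` are hypothetical names that do not appear in this notebook, and the tiny `pdf_dict` built here is a stand-in so the snippet runs on its own. In the notebook you would use the real `pdf_dict` created by the download cell.

```python
import pandas as pd

# Stand-in for the notebook's pdf_dict (file name -> DataFrame).
# The real dictionary is built by the download cell; these frames are made up.
pdf_dict = {
    "meter-readings-jan-2025.csv": pd.DataFrame({"meterid": [1], "readingvalue": [1.0]}),
    "meter-readings-feb-2025.csv": pd.DataFrame({"meterid": [1], "readingvalue": [2.0]}),
    "building_metadata.csv": pd.DataFrame({"siteid": [10]}),
}


def get_frame(frames, name):
    """Hypothetical helper: return frames[name], or fail listing what is loaded."""
    try:
        return frames[name]
    except KeyError:
        available = ", ".join(sorted(frames))
        raise KeyError(f"{name!r} not loaded; available files: {available}") from None


# Select the monthly meter-reading files by their shared name prefix.
meter_names = sorted(k for k in pdf_dict if k.startswith("meter-readings-"))

# Stack them into one frame, tagging each row with the file it came from.
meters = pd.concat(
    [get_frame(pdf_dict, k).assign(source_file=k) for k in meter_names],
    ignore_index=True,
)
print(meters)
```

Keeping a `source_file` column makes it possible to trace a row back to its month after concatenation. Whether all monthly files share the same columns is an assumption worth checking on the real data before stacking them.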



In [0]:
# ---------------------------
# Install gdown
# ---------------------------
%pip install gdown --quiet

import gdown
import zipfile
import pandas as pd
import os

# Temporary folder for downloads & extraction
tmp_folder = "/tmp/energy_dataset"
os.makedirs(tmp_folder, exist_ok=True)

# ---------------------------
# Step 1: Download Core + Bonus ZIPs
# ---------------------------
zip_files = {
    "core": "https://drive.google.com/uc?id=13o_2ojFRCCqwmYMN3w3qu5fQxieXATTd",
    "bonus": "https://drive.google.com/uc?id=1Hvqi5nv66m3b1aEN23NnUOBkVKQrfP5z"
}

extracted_csv_paths = []

for name, url in zip_files.items():
    zip_path = os.path.join(tmp_folder, f"{name}_dataset.zip")
    print(f"\nDownloading {name} ZIP...")
    gdown.download(url, zip_path, quiet=False)
    
    print(f"Extracting CSVs from {name} ZIP...")
    with zipfile.ZipFile(zip_path, "r") as z:
        for member in z.namelist():
            if member.endswith(".csv") and "__MACOSX" not in member:
                print(f"  Extracting {member}")
                z.extract(member, tmp_folder)
                extracted_csv_paths.append(os.path.join(tmp_folder, member))

# ---------------------------
# Step 2: Print list of CSV files
# ---------------------------
print("\nAll extracted CSV files:")
for csv_path in extracted_csv_paths:
    print(f" - {os.path.basename(csv_path)}")

# ---------------------------
# Step 3: Load CSVs into Pandas
# ---------------------------
pdf_dict = {}
for csv_path in extracted_csv_paths:
    csv_name = os.path.basename(csv_path)
    print(f"\nLoading {csv_name} into Pandas...")
    pdf_dict[csv_name] = pd.read_csv(csv_path, encoding="latin1")
    print(f"  {csv_name} loaded, shape: {pdf_dict[csv_name].shape}")

# Example usage: index directly so a missing file raises a KeyError
# instead of .get() silently returning None
train = pdf_dict["meter-readings-march-2025.csv"]
train.head()


Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.

Downloading core ZIP...


Downloading...
From (original): https://drive.google.com/uc?id=13o_2ojFRCCqwmYMN3w3qu5fQxieXATTd
From (redirected): https://drive.google.com/uc?id=13o_2ojFRCCqwmYMN3w3qu5fQxieXATTd&confirm=t&uuid=de41d6f5-08c6-4779-a85e-353c08782cc7
To: /tmp/energy_dataset/core_dataset.zip

Extracting CSVs from core ZIP...
  Extracting advanced_core/meter-readings-march-2025.csv
  Extracting advanced_core/meter-readings-april-2025.csv
  Extracting advanced_core/weather_data_hourly_2025.csv
  Extracting advanced_core/meter-readings-jan-2025.csv
  Extracting advanced_core/building_metadata.csv
  Extracting advanced_core/meter-readings-feb-2025.csv

Downloading bonus ZIP...


Downloading...
From (original): https://drive.google.com/uc?id=1Hvqi5nv66m3b1aEN23NnUOBkVKQrfP5z
From (redirected): https://drive.google.com/uc?id=1Hvqi5nv66m3b1aEN23NnUOBkVKQrfP5z&confirm=t&uuid=2b8daba3-d524-4a6c-9d84-1c359898390b
To: /tmp/energy_dataset/bonus_dataset.zip

Extracting CSVs from bonus ZIP...
  Extracting advanced_bonus/meter-readings-may-2025.csv
  Extracting advanced_bonus/meter-readings-sept-2025.csv
  Extracting advanced_bonus/meter-readings-nov-2025.csv
  Extracting advanced_bonus/meter-readings-dec-2025.csv
  Extracting advanced_bonus/meter-readings-aug-2025.csv
  Extracting advanced_bonus/meter-readings-oct-2025.csv
  Extracting advanced_bonus/meter-readings-june-2025.csv
  Extracting advanced_bonus/meter-readings-july-2025.csv

All extracted CSV files:
 - meter-readings-march-2025.csv
 - meter-readings-april-2025.csv
 - weather_data_hourly_2025.csv
 - meter-readings-jan-2025.csv
 - building_metadata.csv
 - meter-readings-feb-2025.csv
 - meter-readings-may-2025.csv
 - meter-readings-sept-2025.csv
 - meter-readings-nov-2025.csv
 - meter-readings-dec-2025.csv
 - meter-readings-aug-2025.csv
 - meter-readings-oct-2025.csv
 - meter-readings-june-2025.csv
 - meter-readings-july-2025.csv

Loading meter-readings-march-2025.csv into Pandas...

Unnamed: 0,meterid,siteid,sitename,simscode,utility,readingtime,readingvalue,readingunits,readingunitsdisplay,readingwindowstart,readingwindowend,expectedwindowreadings,totalwindowreadings,missingwindowreadings,filteredwindowreadings,readingwindowsum,readingwindowmean,readingwindowstandarddeviation,readingwindowmin,readingwindowmintime,readingwindowmax,readingwindowmaxtime,year,month,day
0,245933,44591,Ackerman Rd,,ELECTRICITY,2025-03-01T05:00:00,8.109953,kWh,Kilowatt hour,2025-03-01T05:00:00,2025-03-02T04:45:00,96,96,0,0,789.819283,8.227284,1.213462,4.721724,2025-03-01T15:15:00,11.313869,2025-03-02T04:30:00,2025,3,1
1,246104,44060,Hamilton Hall,38.0,ELECTRICITY,2025-03-01T05:00:00,0.0,kWh,Kilowatt hour,2025-03-01T05:00:00,2025-03-02T04:45:00,96,96,0,0,0.0,0.0,0.0,0.0,2025-03-01T05:00:00,0.0,2025-03-01T05:00:00,2025,3,1
2,301948,44132,Research Center,73.0,HEAT,2025-03-01T05:00:00,,kWh,Kilowatt hour,2025-03-01T05:00:00,2025-03-02T04:45:00,96,0,96,0,0.0,,,,,,,2025,3,1
3,246127,44073,Jennings Hall,14.0,ELECTRICITY,2025-03-01T05:00:00,48.705495,kWh,Kilowatt hour,2025-03-01T05:00:00,2025-03-02T04:45:00,96,96,0,0,4797.303761,49.971914,0.932538,46.698375,2025-03-01T08:00:00,51.984483,2025-03-02T03:15:00,2025,3,1
4,247554,44138,Schoenbaum Undergrad Program Bldg,251.0,HEAT,2025-03-01T05:00:00,51.037863,kWh,Kilowatt hour,2025-03-01T05:00:00,2025-03-02T04:45:00,96,96,0,0,2151.23756,22.408725,10.381618,5.944383,2025-03-02T01:15:00,51.258556,2025-03-01T05:30:00,2025,3,1
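
The preview above suggests that `readingtime` arrives as ISO 8601 strings and that `readingvalue` can be empty (row 2). A minimal sketch of a first cleaning pass follows: it parses the timestamps and counts readings, missing values, and totals per utility. The small `sample` frame is hand-copied from a few columns of the preview rows so the snippet runs without the download. The names `sample`, `is_missing`, and `summary` are illustrative, not part of the notebook.

```python
import pandas as pd

# Hand-copied subset of the preview rows above (a few columns only).
# In the notebook, start from pdf_dict["meter-readings-march-2025.csv"] instead.
sample = pd.DataFrame(
    {
        "meterid": [245933, 246104, 301948, 246127, 247554],
        "sitename": [
            "Ackerman Rd",
            "Hamilton Hall",
            "Research Center",
            "Jennings Hall",
            "Schoenbaum Undergrad Program Bldg",
        ],
        "utility": ["ELECTRICITY", "ELECTRICITY", "HEAT", "ELECTRICITY", "HEAT"],
        "readingtime": ["2025-03-01T05:00:00"] * 5,
        "readingvalue": [8.109953, 0.0, None, 48.705495, 51.037863],
    }
)

# Convert the ISO 8601 strings to real timestamps so they can be sorted and resampled.
sample["readingtime"] = pd.to_datetime(sample["readingtime"])

# Flag missing readings explicitly; sum() skips NaN, so totals alone would hide them.
sample["is_missing"] = sample["readingvalue"].isna()

summary = sample.groupby("utility").agg(
    n_readings=("readingvalue", "count"),  # non-missing readings
    n_missing=("is_missing", "sum"),       # missing readings
    total=("readingvalue", "sum"),         # total over non-missing readings
)
print(summary)
```

Reporting the missing count next to the total is a deliberate choice here: a meter with no valid readings would otherwise show a total of 0.0, which is indistinguishable from a meter that genuinely read zero (as Hamilton Hall does in the preview).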
