# OSU Energy Dataset Starter Notebook (Pandas)

### **Data Use Notice**
This dataset is sourced from Ohio State’s public dashboard and is being provided for use **only within the Data I/O 2026 challenge.** 

By participating, you agree to follow [Ohio State’s Institutional Data Policy](https://it.osu.edu/data/institutional-data-policy) and understand that this data should **not be used or shared outside of this competition.**

**Instructions:**
1. Run the **Install & Download Data** cell first.
2. Then run the rest of the code to load CSVs into Pandas DataFrames.
3. Access any CSV via `pdf_dict["<filename>.csv"]`, e.g., `pdf_dict["meter-readings-march-2025.csv"]`.
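
The files are extracted in arbitrary order, so it can help to list the monthly meter-reading keys chronologically before combining them. The sketch below is one possible way to do that. The helper name `monthly_reading_keys` and the `MONTH_ORDER` table are illustrative assumptions, not part of the dataset or the starter code. The month tokens are taken from the file names this notebook prints.

```python
import re

# Month tokens as they appear in this dataset's file names
# (mixed abbreviations: "jan", "march", "sept", ...).
MONTH_ORDER = {
    "jan": 1, "feb": 2, "march": 3, "april": 4, "may": 5, "june": 6,
    "july": 7, "aug": 8, "sept": 9, "oct": 10, "nov": 11, "dec": 12,
}

# Matches names such as "meter-readings-march-2025.csv".
_PATTERN = re.compile(r"^meter-readings-([a-z]+)-(\d{4})\.csv$")


def monthly_reading_keys(names):
    """Return the meter-readings file names in `names`, sorted by (year, month).

    Hypothetical helper: names that do not match the pattern
    (weather, building metadata, ...) are skipped.
    """
    dated = []
    for name in names:
        match = _PATTERN.match(name)
        if match and match.group(1) in MONTH_ORDER:
            month, year = match.group(1), int(match.group(2))
            dated.append((year, MONTH_ORDER[month], name))
    return [name for _, _, name in sorted(dated)]


example = [
    "meter-readings-march-2025.csv",
    "weather_data_hourly_2025.csv",
    "meter-readings-jan-2025.csv",
    "building_metadata.csv",
    "meter-readings-feb-2025.csv",
]
print(monthly_reading_keys(example))
# ['meter-readings-jan-2025.csv', 'meter-readings-feb-2025.csv', 'meter-readings-march-2025.csv']
```

Once `pdf_dict` is populated, `pd.concat([pdf_dict[k] for k in monthly_reading_keys(pdf_dict)], ignore_index=True)` would stack the months into a single DataFrame, assuming every monthly file shares the same columns (this notebook does not verify that).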



In [None]:
# ---------------------------
# Install & Download Data: install gdown and import dependencies
# ---------------------------
%pip install gdown --quiet

import gdown
import zipfile
import pandas as pd
import os

# Use a temp folder on Databricks; otherwise use a local ./data folder.
if os.environ.get("DATABRICKS_RUNTIME_VERSION"):
    workspace_folder = "/tmp/energy_dataset"
else:
    workspace_folder = os.path.join(os.getcwd(), "data")
os.makedirs(workspace_folder, exist_ok=True)

# ---------------------------
# Step 1: Download Core + Bonus ZIPs
# ---------------------------
zip_files = {
    "core": "https://drive.google.com/uc?id=13o_2ojFRCCqwmYMN3w3qu5fQxieXATTd",
    "bonus": "https://drive.google.com/uc?id=1Hvqi5nv66m3b1aEN23NnUOBkVKQrfP5z"
}

extracted_csv_paths = []


for name, url in zip_files.items():
    zip_path = os.path.join(workspace_folder, f"{name}_dataset.zip")
    print(f"\nDownloading {name} ZIP...")
    gdown.download(url, zip_path, quiet=False)
    
    print(f"Extracting CSVs from {name} ZIP...")
    with zipfile.ZipFile(zip_path, "r") as z:
        for member in z.namelist():
            if member.endswith(".csv") and "__MACOSX" not in member:
                print(f"  Extracting {member}")
                z.extract(member, workspace_folder)
                extracted_csv_paths.append(os.path.join(workspace_folder, member))

# ---------------------------
# Step 2: Print list of CSV files
# ---------------------------
print("\nAll extracted CSV files:")
for csv_path in extracted_csv_paths:
    print(f" - {os.path.basename(csv_path)}")

# ---------------------------
# Step 3: Load CSVs into Pandas
# ---------------------------
pdf_dict = {}
for csv_path in extracted_csv_paths:
    csv_name = os.path.basename(csv_path)
    print(f"\nLoading {csv_name} into Pandas...")
    pdf_dict[csv_name] = pd.read_csv(csv_path, encoding="latin1")
    print(f"  {csv_name} loaded, shape: {pdf_dict[csv_name].shape}")

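# Example usage: look up a DataFrame by its CSV file name.
# Indexing (rather than .get) raises a clear KeyError if the name is wrong.
train = pdf_dict["meter-readings-march-2025.csv"]
train.head()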


Note: you may need to restart the kernel to use updated packages.

Downloading core ZIP...


Downloading...
From (original): https://drive.google.com/uc?id=13o_2ojFRCCqwmYMN3w3qu5fQxieXATTd
From (redirected): https://drive.google.com/uc?id=13o_2ojFRCCqwmYMN3w3qu5fQxieXATTd&confirm=t&uuid=5f13f0d2-a686-4840-9d9f-cfcd09aa881a
To: /home/anshu/Github-Repositories/NYC_Housing_Price_Analysis/data/core_dataset.zip
100%|██████████| 201M/201M [00:18<00:00, 11.1MB/s] 


Extracting CSVs from core ZIP...
  Extracting advanced_core/meter-readings-march-2025.csv
  Extracting advanced_core/meter-readings-april-2025.csv
  Extracting advanced_core/weather_data_hourly_2025.csv
  Extracting advanced_core/meter-readings-jan-2025.csv
  Extracting advanced_core/building_metadata.csv
  Extracting advanced_core/meter-readings-feb-2025.csv

Downloading bonus ZIP...


Downloading...
From (original): https://drive.google.com/uc?id=1Hvqi5nv66m3b1aEN23NnUOBkVKQrfP5z
From (redirected): https://drive.google.com/uc?id=1Hvqi5nv66m3b1aEN23NnUOBkVKQrfP5z&confirm=t&uuid=57e66353-758f-4ea3-975a-0b016fa02160
To: /home/anshu/Github-Repositories/NYC_Housing_Price_Analysis/data/bonus_dataset.zip
100%|██████████| 416M/416M [00:30<00:00, 13.4MB/s] 


Extracting CSVs from bonus ZIP...
  Extracting advanced_bonus/meter-readings-may-2025.csv
  Extracting advanced_bonus/meter-readings-sept-2025.csv
  Extracting advanced_bonus/meter-readings-nov-2025.csv
  Extracting advanced_bonus/meter-readings-dec-2025.csv
  Extracting advanced_bonus/meter-readings-aug-2025.csv
  Extracting advanced_bonus/meter-readings-oct-2025.csv
  Extracting advanced_bonus/meter-readings-june-2025.csv
  Extracting advanced_bonus/meter-readings-july-2025.csv

All extracted CSV files:
 - meter-readings-march-2025.csv
 - meter-readings-april-2025.csv
 - weather_data_hourly_2025.csv
 - meter-readings-jan-2025.csv
 - building_metadata.csv
 - meter-readings-feb-2025.csv
 - meter-readings-may-2025.csv
 - meter-readings-sept-2025.csv
 - meter-readings-nov-2025.csv
 - meter-readings-dec-2025.csv
 - meter-readings-aug-2025.csv
 - meter-readings-oct-2025.csv
 - meter-readings-june-2025.csv
 - meter-readings-july-2025.csv

Loading meter-readings-march-2025.csv into Pandas...