# 01 - Download BDG2 Dataset

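This notebook downloads the Building Data Genome 2 (BDG2) metadata and weather files from GitHub. The larger meter data files are not downloaded here; instead, the notebook logs instructions for obtaining them.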

**What this notebook does:**
- Downloads metadata.csv and weather.csv
- Provides instructions for downloading meter data
- Logs provenance (URLs, checksums, timestamps)
- Verifies downloads

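The provenance and verification steps listed above can be done with only the standard library. The following is a minimal sketch under stated assumptions, not the `src.data.download` implementation. The helper names (`sha256_of`, `is_lfs_pointer`, `record_provenance`) and the JSON-lines log layout are hypothetical. The LFS check is included because, in the run below, both files are only about 130 bytes ("Size: 0.00 MB"). That is roughly the size of a Git LFS pointer stub, not a full metadata or weather table. Treat that diagnosis as an assumption and confirm it by opening the files.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large meter files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file is a Git LFS pointer stub rather than real CSV data."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def record_provenance(path: Path, url: str, log_file: Path) -> dict:
    """Append one JSON line recording the source URL, checksum, size, and UTC time.

    Hypothetical helper: the log format here is an illustration, not the project's.
    """
    entry = {
        "file": path.name,
        "url": url,
        "sha256": sha256_of(path),
        "size_bytes": path.stat().st_size,
        "downloaded_at": datetime.now(timezone.utc).isoformat(),
        "lfs_pointer": is_lfs_pointer(path),
    }
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Appending one JSON object per line keeps earlier records intact across re-runs. The `lfs_pointer` flag makes a "successful" download of a stub easy to spot later.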

In [2]:
import sys
from pathlib import Path

# Add the project root (parent of the current working directory) to sys.path so `src` is importable
sys.path.append(str(Path.cwd().parent))

from src.data.download import download_bdg2_data, verify_downloads
from src.utils.config import load_config, get_paths
from src.utils.logging_utils import setup_logger

# Set up logging
logger = setup_logger("download", level="INFO")

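`sys.path.append(str(Path.cwd().parent))` only works if the notebook is launched from a directory exactly one level below the project root. The following is a slightly more robust variant. It assumes the project root is the first ancestor that contains a `src/` directory and searches upward for it. `find_project_root` and `add_project_root_to_path` are hypothetical helpers, not part of the project.

```python
import sys
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker/` is found."""
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' directory found above {start}")


def add_project_root_to_path(start: Optional[Path] = None) -> Path:
    """Put the project root at the front of sys.path (once) and return it."""
    origin = start if start is not None else Path.cwd()
    root = find_project_root(origin.resolve())
    if str(root) not in sys.path:
        # insert(0) lets the project's `src` take precedence over any installed package of that name
        sys.path.insert(0, str(root))
    return root
```

The membership check prevents re-running the setup cell from adding duplicate entries to `sys.path`.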

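The next cell calls `download_bdg2_data(force_redownload=False)`, and its log reports Successful, Failed, and Skipped counts. The following is a minimal standard-library sketch of that skip-or-download pattern. `download_file`, `download_all`, and the name-to-URL mapping are hypothetical, and the real function may differ (for example, it displays a tqdm progress bar).

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Dict, List


def download_file(url: str, dest: Path, force_redownload: bool = False,
                  chunk_size: int = 1 << 16) -> str:
    """Stream `url` to `dest`; return 'skipped' if it already exists, else 'downloaded'."""
    if dest.exists() and not force_redownload:
        return "skipped"
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_suffix(dest.suffix + ".part")
    try:
        with urllib.request.urlopen(url) as response, tmp.open("wb") as out:
            shutil.copyfileobj(response, out, chunk_size)
    except BaseException:
        tmp.unlink(missing_ok=True)  # never leave a half-written file behind
        raise
    tmp.replace(dest)  # expose the file under its final name only once complete
    return "downloaded"


def download_all(urls: Dict[str, str], raw_dir: Path,
                 force_redownload: bool = False) -> Dict[str, List[str]]:
    """Download each named file to raw_dir/<name>.csv and tally the outcomes."""
    summary: Dict[str, List[str]] = {"downloaded": [], "skipped": [], "failed": []}
    for name, url in urls.items():
        try:
            status = download_file(url, raw_dir / f"{name}.csv", force_redownload)
            summary[status].append(name)
        except OSError:  # urllib.error.URLError and HTTPError are OSError subclasses
            summary["failed"].append(name)
    return summary
```

Writing to a `.part` file first means an interrupted download is never mistaken for a complete one on a later run with `force_redownload=False`.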
In [3]:
# Load configuration and check paths
config = load_config()
paths = get_paths(config)

print(f"Data will be downloaded to: {paths['raw_data']}")
print("\nProject structure ready!")

# Download data
print("\n" + "=" * 60)
print("Starting download...")
print("=" * 60)
results = download_bdg2_data(force_redownload=False)


Data will be downloaded to: /Users/aryaaa/Desktop/ML PROJECT/data/raw
\nProject structure ready!
Starting download...
2025-11-11 00:05:09,142 - src.data.download - INFO - Starting BDG2 data download to /Users/aryaaa/Desktop/ML PROJECT/data/raw
2025-11-11 00:05:09,143 - src.data.download - INFO - Files to download: ['metadata', 'weather']
2025-11-11 00:05:09,143 - src.data.download - INFO - Downloading metadata.csv...


metadata.csv: 131B [00:00, 108kB/s]                    

2025-11-11 00:05:09,471 - src.data.download - INFO - Logged provenance for metadata.csv
2025-11-11 00:05:09,471 - src.data.download - INFO - ✓ Successfully downloaded metadata.csv
2025-11-11 00:05:09,471 - src.data.download - INFO -   Size: 0.00 MB
2025-11-11 00:05:09,471 - src.data.download - INFO -   SHA256: 2a73307e7426c7a2...

2025-11-11 00:05:09,973 - src.data.download - INFO - Downloading weather.csv...


weather.csv: 133B [00:00, 376kB/s]                    

2025-11-11 00:05:10,188 - src.data.download - INFO - Logged provenance for weather.csv
2025-11-11 00:05:10,189 - src.data.download - INFO - ✓ Successfully downloaded weather.csv
2025-11-11 00:05:10,189 - src.data.download - INFO -   Size: 0.00 MB
2025-11-11 00:05:10,189 - src.data.download - INFO -   SHA256: 744ab9a3ed175f64...

2025-11-11 00:05:10,694 - src.data.download - INFO - 
2025-11-11 00:05:10,695 - src.data.download - INFO - Download Summary:
2025-11-11 00:05:10,696 - src.data.download - INFO -   Successful: 2
2025-11-11 00:05:10,696 - src.data.download - INFO -   Failed: 0
2025-11-11 00:05:10,696 - src.data.download - INFO -   Skipped: 0
2025-11-11 00:05:10,697 - src.data.download - INFO - 
2025-11-11 00:05:10,697 - src.data.download - INFO - IMPORTANT: Meter Data Download Instructions
2025-11-11 00:05:10,698 - src.data.download - INFO - 
The meter data files are large and need to be downloaded separately.
2025-11-11 00:05:10,698 - src.data.download - INFO - 
Option 1: Download from Kaggle (Recommended)
2025-11-11 00:05:10,699 - src.data.download - INFO -   1. Install Kaggle API: pip install kaggle
2025-11-11 00:05:10,699 - src.data.download - INFO -   2. Set up Kaggle credentials: https://www.kaggle.com/docs/api
2025-11-11 00:05:10,699 - src.data.download - INFO -   3. Download ASHRAE Great Energy P