# SEATTLE ENERGY BENCHMARKING PROJECT
## Notebook 00: Environment initialization and data ingestion

---

### Document identity
* **Status:** Phase 1 (exploration & prototyping)
* **Created:** 26 December 2025
* **Last updated:** 29 December 2025
* **Notebook dependencies:** None

### Description
This notebook lays the project's foundations. It sets up the infrastructure (directory tree, logging) and the configuration-management tooling (Hydra). The goal is to guarantee a reproducible base before the data-quality audit begins.

### Main objectives
1. Initialize the configuration-management system via Hydra.
2. Create the project's directory tree.
3. Retrieve and load the source dataset (immutable).
4. Produce a first structural diagnosis of the loaded data.

### Critical dependencies
* `hydra`: configuration management.
* `src.utils` & `src.data`: internal support modules.

### Deliverables
1. Initialized technical environment (Hydra, directories).
2. Raw dataset in `data/raw/`.
3. Initial data dictionary (`reports/data_dictionary.md`).
4. Structural summary note.
5. Clean notebook.

---

# Table of Contents
- [Section 0: Package imports](#section-1--configuration)
- [Section 1: Configuration and initialization](#section-1--configuration)
- [Section 2: Context & provenance](#section-2--contextualisation--provenance)
- [Section 3: Loading and structural inspection](#section-3--ingestion-et-premier-contact)
- [Section 4: Variable typology](#section-4--typologie-et-cartographie-des-variables)
- [Section 5: Focus on the target variable](#section-5--analyse-de-la-variable-cible)
- [Section 6: Identifying apparent redundancies](#section-6--étude-des-redondances-et-unités)
- [Section 7: First inconsistencies detected](#section-7--détection-des-signaux-dalerte-immédiats)
- [Section 8: Summary and data dictionary generation](#section-8--synthèse-et-génération-du-dictionnaire)
---

# Section 0: Package imports

In [4]:
import logging
import sys
from pathlib import Path

import pandas as pd

# Make the project's src/ directory importable.
# This assumes the notebook runs from a direct subfolder of the project root.
PROJECT_ROOT = Path.cwd().parent
SRC_PATH = PROJECT_ROOT / "src"

if str(SRC_PATH) not in sys.path:
    sys.path.insert(0, str(SRC_PATH))

# Internal helper functions
from data.load_data import load_data_raw
from utils.config_loader import load_config, create_directories
from utils.eda_logger import setup_eda_logger
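
`Path.cwd().parent` only points at the project root when the notebook is launched from a direct subfolder of it. A more tolerant alternative is to walk upward until a marker directory is found. The sketch below is illustrative only: `find_project_root` is a hypothetical helper, not part of `src.utils`, and using the `src` folder as the marker is an assumption.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    # Check the starting directory first, then each of its ancestors in turn
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains a '{marker}' directory")


# Possible use in the notebook (hypothetical):
# PROJECT_ROOT = find_project_root(Path.cwd())
```

With this approach the import cell would no longer depend on how deep the notebook sits below the project root.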
 


# Section 1: Configuration and initialization

In [5]:
# Load the main configuration (Hydra)
cfg = load_config()

# Initialize the logger
setup_eda_logger(cfg)
logger = logging.getLogger(__name__)

# Create the directories needed at run time (raw, interim, processed, reports, etc.)
create_directories(cfg)


2025-12-30 20:34:03,872 - utils.config_loader - INFO - Configuration 'config' chargée (project_root=C:\Users\HP\Desktop\temp\TODO\SEMESTRE_1\ML1\ML-prediction-CO2)
2025-12-30 20:34:03,875 - utils.config_loader - INFO - Répertoire prêt : C:\Users\HP\Desktop\temp\TODO\SEMESTRE_1\ML1\ML-prediction-CO2\data\raw
2025-12-30 20:34:03,876 - utils.config_loader - INFO - Répertoire prêt : C:\Users\HP\Desktop\temp\TODO\SEMESTRE_1\ML1\ML-prediction-CO2\data\interim
2025-12-30 20:34:03,877 - utils.config_loader - INFO - Répertoire prêt : C:\Users\HP\Desktop\temp\TODO\SEMESTRE_1\ML1\ML-prediction-CO2\data\processed
2025-12-30 20:34:03,877 - utils.config_loader - INFO - Répertoire prêt : C:\Users\HP\Desktop\temp\TODO\SEMESTRE_1\ML1\ML-prediction-CO2\figures
2025-12-30 20:34:03,877 - utils.config_loader - INFO - Répertoire prêt : C:\Users\HP\Desktop\temp\TODO\SEMESTRE_1\ML1\ML-prediction-CO2\reports
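
The log above reports one "directory ready" entry per folder. A minimal sketch of how such an idempotent setup step could look is given below. `ensure_directories` is a hypothetical helper written for illustration, not the project's `create_directories`, and it takes a plain list of relative paths instead of the Hydra configuration object.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def ensure_directories(project_root: Path, relative_dirs: list[str]) -> list[Path]:
    """Create each directory if it is missing and return the paths."""
    ready = []
    for rel in relative_dirs:
        path = project_root / rel
        # parents=True builds intermediate folders; exist_ok=True makes reruns harmless
        path.mkdir(parents=True, exist_ok=True)
        logger.info("Directory ready: %s", path)
        ready.append(path)
    return ready


# Possible use, mirroring the folders listed in the log (hypothetical call):
# ensure_directories(Path.cwd().parent,
#                    ["data/raw", "data/interim", "data/processed", "figures", "reports"])
```

Because existing folders are left untouched, rerunning the cell should not alter anything already stored under `data/raw/`.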


# Section 2: Loading and structural inspection

This section covers the initial loading of the data from the CSV file and a basic structural inspection. The goal is to get a first overview of the dataset, verify its integrity, and identify any obvious format or structure problems.
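
As an illustration of what a basic structural inspection can cover, the sketch below summarizes each column's dtype, share of missing values, and number of distinct values, then counts fully duplicated rows. `structural_summary` is a hypothetical helper, and the small frame is made-up data that reuses two column names from the dataset; it stands in for the real `df_raw`.

```python
import pandas as pd


def structural_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, share of missing values, distinct non-null values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing_share": df.isna().mean(),
            "n_unique": df.nunique(dropna=True),
        }
    )


# Tiny made-up frame standing in for the raw dataset
sample = pd.DataFrame(
    {
        "OSEBuildingID": [1, 2, 3, 3],
        "ENERGYSTARScore": [60.0, None, 43.0, 43.0],
    }
)

summary = structural_summary(sample)
# Rows that exactly repeat an earlier row
n_duplicate_rows = int(sample.duplicated().sum())
print(summary)
print(f"Duplicated rows: {n_duplicate_rows}")
```

Applied to the raw DataFrame, a table of this kind could also serve as a starting point for the data dictionary listed in the deliverables.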

## Importing the CSV

In [6]:
# Load the raw data
df_raw = load_data_raw(cfg)

# Check the dimensions
n_rows, n_cols = df_raw.shape
logger.info(f"Dataset chargé ({n_rows} lignes, {n_cols} colonnes)")  # "Dataset loaded (rows, columns)"



2025-12-30 20:43:27,072 - data.load_data - INFO - DataFrame chargé : 3376 lignes, 46 colonnes
2025-12-30 20:43:27,102 - data.load_data - INFO -  Intégrité des données validée (Aucune modification détectée).
2025-12-30 20:43:27,106 - __main__ - INFO - Dataset chargé (3376 lignes, 46 colonnes)


---
Loading yields a DataFrame with 3376 rows and 46 columns.
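
The loader's log also reports that data integrity was validated with no modification detected. How `load_data_raw` performs this check is not shown in this notebook. One common way to guard an immutable raw file is to compare its SHA-256 digest with a value recorded at download time, as in the sketch below. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files never have to fit in memory at once
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, expected_digest: str) -> bool:
    """Compare the file's current digest with the one recorded earlier."""
    return file_sha256(path) == expected_digest
```

The reference digest would typically be stored alongside the configuration, so that any accidental edit of the file in `data/raw/` is caught at load time.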

## Display for a first visual impression

In [None]:
# Force the display of all columns
pd.set_option("display.max_columns", None)
# Show the first rows
print("Premières 10 lignes :")  # "First 10 rows:"
display(df_raw.head(10))

# Show the last rows
print("Dernières 10 lignes :")  # "Last 10 rows:"
display(df_raw.tail(10))

# Show a random sample
print("Échantillon aléatoire de 20 lignes :")  # "Random sample of 20 rows:"
display(df_raw.sample(20))

Unnamed: 0,OSEBuildingID,DataYear,BuildingType,PrimaryPropertyType,PropertyName,Address,City,State,ZipCode,TaxParcelIdentificationNumber,CouncilDistrictCode,Neighborhood,Latitude,Longitude,YearBuilt,NumberofBuildings,NumberofFloors,PropertyGFATotal,PropertyGFAParking,PropertyGFABuilding(s),ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA,YearsENERGYSTARCertified,ENERGYSTARScore,SiteEUI(kBtu/sf),SiteEUIWN(kBtu/sf),SourceEUI(kBtu/sf),SourceEUIWN(kBtu/sf),SiteEnergyUse(kBtu),SiteEnergyUseWN(kBtu),SteamUse(kBtu),Electricity(kWh),Electricity(kBtu),NaturalGas(therms),NaturalGas(kBtu),DefaultData,Comments,ComplianceStatus,Outlier,TotalGHGEmissions,GHGEmissionsIntensity
0,1,2016,NonResidential,Hotel,Mayflower park hotel,405 Olive way,Seattle,WA,98101.0,659000030,7,DOWNTOWN,47.6122,-122.33799,1927,1.0,12,88434,0,88434,Hotel,Hotel,88434.0,,,,,,60.0,81.699997,84.300003,182.5,189.0,7226362.5,7456910.0,2003882.0,1156514.0,3946027.0,12764.5293,1276453.0,False,,Compliant,,249.98,2.83
1,2,2016,NonResidential,Hotel,Paramount Hotel,724 Pine street,Seattle,WA,98101.0,659000220,7,DOWNTOWN,47.61317,-122.33393,1996,1.0,11,103566,15064,88502,"Hotel, Parking, Restaurant",Hotel,83880.0,Parking,15064.0,Restaurant,4622.0,,61.0,94.800003,97.900002,176.100006,179.399994,8387933.0,8664479.0,0.0,950425.2,3242851.0,51450.81641,5145082.0,False,,Compliant,,295.86,2.86
2,3,2016,NonResidential,Hotel,5673-The Westin Seattle,1900 5th Avenue,Seattle,WA,98101.0,659000475,7,DOWNTOWN,47.61393,-122.3381,1969,1.0,41,956110,196718,759392,Hotel,Hotel,756493.0,,,,,,43.0,96.0,97.699997,241.899994,244.100006,72587024.0,73937112.0,21566550.0,14515440.0,49526664.0,14938.0,1493800.0,False,,Compliant,,2089.28,2.19
3,5,2016,NonResidential,Hotel,HOTEL MAX,620 STEWART ST,Seattle,WA,98101.0,659000640,7,DOWNTOWN,47.61412,-122.33664,1926,1.0,10,61320,0,61320,Hotel,Hotel,61320.0,,,,,,56.0,110.800003,113.300003,216.199997,224.0,6794584.0,6946800.5,2214446.0,811525.3,2768924.0,18112.13086,1811213.0,False,,Compliant,,286.43,4.67
4,8,2016,NonResidential,Hotel,WARWICK SEATTLE HOTEL (ID8),401 LENORA ST,Seattle,WA,98121.0,659000970,7,DOWNTOWN,47.61375,-122.34047,1980,1.0,18,175580,62000,113580,"Hotel, Parking, Swimming Pool",Hotel,123445.0,Parking,68009.0,Swimming Pool,0.0,,75.0,114.800003,118.699997,211.399994,215.600006,14172606.0,14656503.0,0.0,1573449.0,5368607.0,88039.98438,8803998.0,False,,Compliant,,505.01,2.88
5,9,2016,Nonresidential COS,Other,West Precinct,810 Virginia St,Seattle,WA,98101.0,660000560,7,DOWNTOWN,47.61623,-122.33657,1999,1.0,2,97288,37198,60090,Police Station,Police Station,88830.0,,,,,,,136.100006,141.600006,316.299988,320.5,12086616.0,12581712.0,0.0,2160444.0,7371434.0,47151.81641,4715182.0,False,,Compliant,,301.81,3.1
6,10,2016,NonResidential,Hotel,Camlin,1619 9th Avenue,Seattle,WA,98101.0,660000825,7,DOWNTOWN,47.6139,-122.33283,1926,1.0,11,83008,0,83008,Hotel,Hotel,81352.0,,,,,,27.0,70.800003,74.5,146.600006,154.699997,5758795.0,6062767.5,0.0,823919.9,2811215.0,29475.80078,2947580.0,False,,Compliant,,176.14,2.12
7,11,2016,NonResidential,Other,Paramount Theatre,911 Pine St,Seattle,WA,98101.0,660000955,7,DOWNTOWN,47.61327,-122.33136,1926,1.0,8,102761,0,102761,Other - Entertainment/Public Assembly,Other - Entertainment/Public Assembly,102761.0,,,,,,,61.299999,68.800003,141.699997,152.300003,6298131.5,7067881.5,2276286.0,1065843.0,3636655.0,3851.890137,385189.0,False,,Compliant,,221.51,2.16
8,12,2016,NonResidential,Hotel,311wh-Pioneer Square,612 2nd Ave,Seattle,WA,98104.0,939000080,7,DOWNTOWN,47.60294,-122.33263,1904,1.0,15,163984,0,163984,Hotel,Hotel,163984.0,,,,,,43.0,83.699997,86.599998,180.899994,187.199997,13723820.0,14194054.0,0.0,2138898.0,7297919.0,64259.0,6425900.0,False,,Compliant,,392.16,2.39
9,13,2016,Multifamily MR (5-9),Mid-Rise Multifamily,Lyon Building,607 - 3rd Ave.,Seattle,WA,98104.0,939000105,7,DOWNTOWN,47.60284,-122.33184,1910,1.0,6,63712,1496,62216,Multifamily Housing,Multifamily Housing,56132.0,,,,,,1.0,81.5,85.599998,182.699997,187.399994,4573777.0,4807679.5,1039735.0,742091.2,2532015.0,10020.25977,1002026.0,False,,Compliant,,151.12,2.37


Dernières 10 lignes :


Unnamed: 0,OSEBuildingID,DataYear,BuildingType,PrimaryPropertyType,PropertyName,Address,City,State,ZipCode,TaxParcelIdentificationNumber,CouncilDistrictCode,Neighborhood,Latitude,Longitude,YearBuilt,NumberofBuildings,NumberofFloors,PropertyGFATotal,PropertyGFAParking,PropertyGFABuilding(s),ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA,YearsENERGYSTARCertified,ENERGYSTARScore,SiteEUI(kBtu/sf),SiteEUIWN(kBtu/sf),SourceEUI(kBtu/sf),SourceEUIWN(kBtu/sf),SiteEnergyUse(kBtu),SiteEnergyUseWN(kBtu),SteamUse(kBtu),Electricity(kWh),Electricity(kBtu),NaturalGas(therms),NaturalGas(kBtu),DefaultData,Comments,ComplianceStatus,Outlier,TotalGHGEmissions,GHGEmissionsIntensity
3366,50210,2016,Nonresidential COS,Office,Central West HQ / Brown Bear,1403 w howe,Seattle,WA,,2425039137,7,MAGNOLIA / QUEEN ANNE,47.63572,-122.37525,1952,1.0,1,13661,0,13661,Office,Office,13661.0,,,,,,75.0,36.799999,40.900002,115.5,128.399994,502667.7,558525.1,0.0,147323.5,502667.8,0.0,0.0,True,,Error - Correct Default Data,,3.5,0.26
3367,50212,2016,Nonresidential COS,Other,Conservatory Campus,1400 E Galer St,Seattle,WA,,2925049087,3,EAST,47.63228,-122.31574,1912,1.0,1,23445,0,23445,Other - Recreation,Other - Recreation,23445.0,,,,,,,254.899994,286.5,380.100006,413.200012,5976246.0,6716330.0,0.0,369539.8125,1260870.0,47153.75781,4715376.0,False,,Compliant,,259.22,11.06
3368,50219,2016,Nonresidential COS,Mixed Use Property,Garfield Community Center,2323 East Cherry St,Seattle,WA,,7544800245,3,CENTRAL,47.60775,-122.30225,1994,1.0,1,20050,0,20050,"Fitness Center/Health Club/Gym, Office, Other ...",Other - Recreation,8108.0,Fitness Center/Health Club/Gym,7726.0,Office,3779.0,,,90.400002,99.400002,175.199997,184.600006,1813404.0,1993137.0,0.0,225513.7969,769453.1,10439.51074,1043951.0,False,,Compliant,,60.81,3.03
3369,50220,2016,Nonresidential COS,Office,Genesee/SC SE HQ,4420 S Genesee,Seattle,WA,,4154300585,2,SOUTHEAST,47.5644,-122.27813,1960,1.0,1,15398,0,15398,Office,Office,15398.0,,,,,,93.0,25.200001,26.9,64.099998,66.699997,387810.0,414172.4,0.0,81341.39844,277536.9,1102.72998,110273.0,True,,Error - Correct Default Data,,7.79,0.51
3370,50221,2016,Nonresidential COS,Other,High Point Community Center,6920 34th Ave SW,Seattle,WA,,2524039059,1,DELRIDGE NEIGHBORHOODS,47.54067,-122.37441,1982,1.0,1,18261,0,18261,Other - Recreation,Other - Recreation,18261.0,,,,,,,51.0,56.200001,126.0,136.600006,932082.1,1025432.0,0.0,185334.7031,632362.0,2997.199951,299720.0,False,,Compliant,,20.33,1.11
3371,50222,2016,Nonresidential COS,Office,Horticulture building,1600 S Dakota St,Seattle,WA,,1624049080,2,GREATER DUWAMISH,47.56722,-122.31154,1990,1.0,1,12294,0,12294,Office,Office,12294.0,,,,,,46.0,69.099998,76.699997,161.699997,176.100006,849745.7,943003.2,0.0,153655.0,524270.9,3254.750244,325475.0,True,,Error - Correct Default Data,,20.94,1.7
3372,50223,2016,Nonresidential COS,Other,International district/Chinatown CC,719 8th Ave S,Seattle,WA,,3558300000,2,DOWNTOWN,47.59625,-122.32283,2004,1.0,1,16000,0,16000,Other - Recreation,Other - Recreation,16000.0,,,,,,,59.400002,65.900002,114.199997,118.900002,950276.2,1053706.0,0.0,116221.0,396546.1,5537.299805,553730.0,False,,Compliant,,32.17,2.01
3373,50224,2016,Nonresidential COS,Other,Queen Anne Pool,1920 1st Ave W,Seattle,WA,,1794501150,7,MAGNOLIA / QUEEN ANNE,47.63644,-122.35784,1974,1.0,1,13157,0,13157,"Fitness Center/Health Club/Gym, Other - Recrea...",Other - Recreation,7583.0,Fitness Center/Health Club/Gym,5574.0,Swimming Pool,0.0,,,438.200012,460.100006,744.799988,767.799988,5765898.0,6053764.0,0.0,525251.6875,1792159.0,39737.39063,3973739.0,False,,Compliant,,223.54,16.99
3374,50225,2016,Nonresidential COS,Mixed Use Property,South Park Community Center,8319 8th Ave S,Seattle,WA,,7883603155,1,GREATER DUWAMISH,47.52832,-122.32431,1989,1.0,1,14101,0,14101,"Fitness Center/Health Club/Gym, Food Service, ...",Other - Recreation,6601.0,Fitness Center/Health Club/Gym,6501.0,Pre-school/Daycare,484.0,,,51.0,55.5,105.300003,110.800003,719471.2,782841.3,0.0,102248.0,348870.2,3706.01001,370601.0,False,,Compliant,,22.11,1.57
3375,50226,2016,Nonresidential COS,Mixed Use Property,Van Asselt Community Center,2820 S Myrtle St,Seattle,WA,,7857002030,2,GREATER DUWAMISH,47.53939,-122.29536,1938,1.0,1,18258,0,18258,"Fitness Center/Health Club/Gym, Food Service, ...",Other - Recreation,8271.0,Fitness Center/Health Club/Gym,8000.0,Pre-school/Daycare,1108.0,,,63.099998,70.900002,115.800003,123.900002,1152896.0,1293722.0,0.0,126774.3984,432554.2,7203.419922,720342.0,False,,Compliant,,41.27,2.26


Échantillon aléatoire de 20 lignes :


Unnamed: 0,OSEBuildingID,DataYear,BuildingType,PrimaryPropertyType,PropertyName,Address,City,State,ZipCode,TaxParcelIdentificationNumber,CouncilDistrictCode,Neighborhood,Latitude,Longitude,YearBuilt,NumberofBuildings,NumberofFloors,PropertyGFATotal,PropertyGFAParking,PropertyGFABuilding(s),ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA,YearsENERGYSTARCertified,ENERGYSTARScore,SiteEUI(kBtu/sf),SiteEUIWN(kBtu/sf),SourceEUI(kBtu/sf),SourceEUIWN(kBtu/sf),SiteEnergyUse(kBtu),SiteEnergyUseWN(kBtu),SteamUse(kBtu),Electricity(kWh),Electricity(kBtu),NaturalGas(therms),NaturalGas(kBtu),DefaultData,Comments,ComplianceStatus,Outlier,TotalGHGEmissions,GHGEmissionsIntensity
2227,24868,2016,NonResidential,Worship Facility,Magnolia,2415 31st Ave,Seattle,WA,98199.0,8127700530,7,MAGNOLIA / QUEEN ANNE,47.64,-122.39743,1965,1.0,1,20140,0,20140,"Parking, Worship Facility",Worship Facility,22000.0,Parking,0.0,,,,35.0,45.099998,52.900002,59.0,67.800003,992794.7,1164371.0,0.0,35870.4,122390.0,8704.049805,870405.0,False,,Compliant,,47.08,2.34
1119,20928,2016,NonResidential,Worship Facility,St. Paul Church & School,10001 57th Ave. S,Seattle,WA,98178.0,1686400005,2,SOUTHEAST,47.51042,-122.26277,1954,1.0,2,27876,0,27876,Worship Facility,Worship Facility,44816.0,,,,,,69.0,40.799999,46.799999,71.400002,78.5,1829981.0,2095655.0,0.0,179244.2,611581.0,12184.0,1218400.0,False,,Compliant,,68.97,2.47
2780,26919,2016,Multifamily LR (1-4),Low-Rise Multifamily,Lambda Chi Alpha,4509 19th Ave NE,Seattle,WA,98105.0,8823902740,4,NORTHEAST,47.66164,-122.30762,1928,1.0,3,20629,0,20629,Multifamily Housing,Multifamily Housing,20629.0,,,,,,52.0,58.599998,64.199997,96.199997,102.099998,1209434.0,1325284.0,0.0,100262.1,342094.0,8673.390625,867339.0,False,,Compliant,,48.45,2.35
1516,22194,2016,Multifamily LR (1-4),Low-Rise Multifamily,Sixty Five,1433 NW 64th,Seattle,WA,98107.0,2767600820,6,BALLARD,47.67504,-122.37529,1986,1.0,4,44957,0,44957,"Multifamily Housing, Parking",Multifamily Housing,30044.0,Parking,6000.0,,,,71.0,30.799999,33.0,96.800003,103.599998,926261.2,990882.9,0.0,271471.6,926261.0,0.0,0.0,False,,Compliant,,6.46,0.14
6,10,2016,NonResidential,Hotel,Camlin,1619 9th Avenue,Seattle,WA,98101.0,660000825,7,DOWNTOWN,47.6139,-122.33283,1926,1.0,11,83008,0,83008,Hotel,Hotel,81352.0,,,,,,27.0,70.800003,74.5,146.600006,154.699997,5758795.0,6062768.0,0.0,823919.9,2811215.0,29475.80078,2947580.0,False,,Compliant,,176.14,2.12
1913,23789,2016,Multifamily HR (10+),High-Rise Multifamily,Highlander Condominiums,525 Belmount Ave East,Seattle,WA,98102.0,3302700000,3,EAST,47.62395,-122.32442,1965,1.0,11,66150,11290,54860,Multifamily Housing,Multifamily Housing,66150.0,,,,,,21.0,37.700001,40.799999,118.5,128.0,2496808.0,2696894.0,0.0,731772.4,2496807.0,0.0,0.0,False,,Compliant,,17.41,0.26
710,19682,2016,Multifamily MR (5-9),Mid-Rise Multifamily,The Audrey at Belltown,2922 Western Avenue,Seattle,WA,98121.0,695000005,7,DOWNTOWN,47.61687,-122.35376,1992,1.0,8,188717,53284,135433,"Multifamily Housing, Parking",Multifamily Housing,135433.0,Parking,53284.0,,,,85.0,31.0,32.400002,90.099998,93.400002,4192895.0,4381679.0,0.0,1093983.0,3732669.0,4602.260254,460226.0,False,,Compliant,,50.46,0.27
3367,50212,2016,Nonresidential COS,Other,Conservatory Campus,1400 E Galer St,Seattle,WA,,2925049087,3,EAST,47.63228,-122.31574,1912,1.0,1,23445,0,23445,Other - Recreation,Other - Recreation,23445.0,,,,,,,254.899994,286.5,380.100006,413.200012,5976246.0,6716330.0,0.0,369539.8,1260869.84,47153.75781,4715375.781,False,,Compliant,,259.22,11.06
1015,20512,2016,Multifamily LR (1-4),Low-Rise Multifamily,Central Park East,2001 E Yesler Way,Seattle,WA,98122.0,1496130000,3,CENTRAL,47.60109,-122.30551,1980,1.0,3,52166,0,52166,Multifamily Housing,Multifamily Housing,43490.0,,,,,,28.0,33.0,35.900002,103.5,112.599998,1433573.0,1560122.0,0.0,420156.2,1433573.0,0.0,0.0,False,,Compliant,,9.99,0.19
2080,24381,2016,Nonresidential COS,Mixed Use Property,Rainier Community Center,4630 38th Ave S,Seattle,WA,98118.0,7950304230,2,SOUTHEAST,47.56214,-122.28143,1995,1.0,1,28425,0,28425,"Fitness Center/Health Club/Gym, Office, Other ...",Fitness Center/Health Club/Gym,14081.0,Other - Recreation,12824.0,Office,1479.0,,,75.199997,82.0,168.399994,175.5,2133798.0,2326238.0,0.0,356185.4,1215305.0,9184.930664,918493.0,False,,Compliant,,57.25,2.01
