The island of misfit buildings: Detecting mixed-use or primary-use-type outliers using load shape clustering
A project focused on clustering buildings and identifying outliers among them from raw whole-building data.
Data collection and pre-processing
- Data sources: Two datasets are currently used: the Building Genome Dataset (BDG) and the Washington Dataset (DGS).
- Data Collection: Read the temporal dataset and make sure readings are resampled to hourly intervals.
- Context Extraction: Generate resampled datasets by filtering observations based on the specified context.
- Load Curve Generation: Based on the given aggregation function, generate one load curve for each building.
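The three pre-processing steps above can be sketched with pandas as follows. This is a minimal illustration rather than the project's actual code: the function names, the `"weekday"`/`"weekend"` context labels, and the assumed input layout (a `DatetimeIndex` with one column per building) are all hypothetical.

```python
import pandas as pd


def resample_hourly(readings: pd.DataFrame) -> pd.DataFrame:
    """Resample raw meter readings to hourly means.

    Assumes a DatetimeIndex and one column per building (hypothetical layout).
    """
    return readings.resample("60min").mean()


def filter_context(hourly: pd.DataFrame, context: str) -> pd.DataFrame:
    """Keep only the observations that belong to the given context.

    The "weekday" and "weekend" labels are illustrative assumptions.
    """
    is_weekday = hourly.index.dayofweek < 5  # Monday=0 ... Sunday=6
    if context == "weekday":
        return hourly[is_weekday]
    if context == "weekend":
        return hourly[~is_weekday]
    raise ValueError(f"unknown context: {context!r}")


def load_curves(hourly: pd.DataFrame, agg="mean") -> pd.DataFrame:
    """Build one 24-point load curve per building.

    Readings are grouped by hour of day and reduced with `agg`
    (any aggregation accepted by pandas, e.g. "mean" or "median").
    The result has one row per building and one column per hour.
    """
    by_hour = hourly.groupby(hourly.index.hour).agg(agg)
    by_hour.index.name = "hour"
    return by_hour.T
```

Chaining the helpers, for example `load_curves(filter_context(resample_hourly(raw), "weekday"), "mean")`, yields a buildings-by-24 matrix that a clustering step can consume directly.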
The naming format for all generated files is
DatasetName_Context_LoadCurveFunction_Algorithm_TypeOfFile.extension. Each part is described as follows:
- DatasetName: BDG for the Building Genome Dataset, DGS for the Washington Dataset, and BDG-DGS for the combination of both
- Context: Currently implemented
- LoadCurveFunction: Aggregation function used. Currently implemented
- Algorithm: Clustering algorithm used. Currently implemented
- TypeOfFile: The type of data that is stored in this file, most of the times is
- Extension: Usually
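The naming convention above could be encoded in a pair of small helpers like the ones below. This is only a sketch: `build_filename` and `parse_filename` are hypothetical names, and the example values used for the context, aggregation function, algorithm, and file type are placeholders, not the options actually implemented in the project.

```python
FIELDS = ("dataset", "context", "load_curve_function", "algorithm", "type_of_file")


def build_filename(dataset, context, load_curve_function, algorithm,
                   type_of_file, extension):
    """Join the five name parts with underscores and append the extension."""
    parts = [dataset, context, load_curve_function, algorithm, type_of_file]
    for part in parts:
        # An underscore inside a part would make the name ambiguous to parse.
        if "_" in part or not part:
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join(parts) + "." + extension.lstrip(".")


def parse_filename(filename):
    """Split a generated file name back into its named parts."""
    stem, extension = filename.rsplit(".", 1)
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected file name: {filename!r}")
    parsed = dict(zip(FIELDS, parts))
    parsed["extension"] = extension
    return parsed


# Placeholder values, chosen only to illustrate the format.
example = build_filename("BDG-DGS", "weekday", "mean", "kmeans", "scores", "csv")
```

Because the dataset name `BDG-DGS` uses a hyphen rather than an underscore, splitting on underscores recovers the five parts unambiguously.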
Clustering and Validation
- Clustering: Generate building clusters based on daily profiles (formed from hourly readings) using the specified algorithm
- Clustering Validation Metrics: Calculate validation metrics for the clustering results for different choices of k and different algorithms
- Experiments: Sandbox to run all possible combinations of datasets, contexts, and load curve functions as experiments for clustering
- Experiments Utils: Notebook containing the main functions used in
ExperimentPlayground.ipynb. It also serves as a middle layer between the playground and the rest of the notebooks.
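As a rough illustration of the clustering and validation steps, the sketch below runs k-means from scikit-learn over several values of k and records two common internal validation metrics. Everything here is an assumption made for the example: the choice of k-means, the silhouette and Davies-Bouldin scores, the column names, and the synthetic load curves are not taken from the project's notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score


def make_demo_curves(n_per_group=20, seed=0):
    """Synthetic 24-hour profiles: a midday-peaking and an evening-peaking group."""
    rng = np.random.default_rng(seed)
    hours = np.arange(24)
    midday = np.exp(-((hours - 13) ** 2) / 18.0)
    evening = np.exp(-((hours - 20) ** 2) / 8.0)
    base = np.vstack([np.tile(midday, (n_per_group, 1)),
                      np.tile(evening, (n_per_group, 1))])
    return base + rng.normal(scale=0.02, size=base.shape)


def score_kmeans(curves, k_values, random_state=0):
    """Cluster the load curves for each k and collect validation metrics."""
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(curves)
        rows.append({
            "algorithm": "kmeans",
            "k": k,
            # Higher is better, in the range [-1, 1].
            "silhouette": silhouette_score(curves, labels),
            # Lower is better, bounded below by 0.
            "davies_bouldin": davies_bouldin_score(curves, labels),
        })
    return pd.DataFrame(rows)
```

Calling `score_kmeans(make_demo_curves(), range(2, 5))` returns one row per value of k, which makes it straightforward to compare configurations side by side.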
After an experiment is run, a scores .csv file is generated (following the naming conventions above) that looks as follows: