ALDI++: Automatic and parameter-less discord detection for daily load energy profiles

Initial codebase: https://github.com/intelligent-environments-lab/ALDI

This repository is the official implementation of ALDI++: Automatic and parameter-less discord detection for daily load energy profiles.

Requirements

Local

To run locally, create the conda environment for your operating system:

conda env create --file env/environment_<OS>.yaml # replace OS with either `macos` or `ubuntu`
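The `<OS>` placeholder can also be resolved programmatically. The following is a minimal sketch and not part of the repository: the helper name `environment_file` is hypothetical, and mapping every Linux system to the `ubuntu` file is an assumption.

```python
import platform

# Map platform.system() values to the <OS> suffixes named above.
# Assumption: any Linux system uses the `ubuntu` environment file.
_OS_SUFFIX = {"Darwin": "macos", "Linux": "ubuntu"}


def environment_file(system=None):
    """Return the conda environment file path for a platform name.

    `system` defaults to the value reported by platform.system().
    """
    system = system or platform.system()
    try:
        suffix = _OS_SUFFIX[system]
    except KeyError:
        raise ValueError(f"no environment file provided for {system!r}") from None
    return f"env/environment_{suffix}.yaml"
```

The returned path can then be passed to `conda env create --file`.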

AWS

For the forecasting portion of this project (training and prediction), we recommend the following EC2 instance, which was used in our experiments:

Instance Type: g4dn.4xlarge (16 vCPUs, 64 GB RAM, and 600 GB disk)
AMI: Deep Learning AMI (Ubuntu 18.04)
Conda environment: tensorflow2_p36

Data

We use the following publicly available dataset:

Building Data Genome Project 2

Specifically, we use the subset from the Great Energy Predictor III (GEPIII) machine learning competition.

Download the datasets from the competition's data tab into data/.

The manually labeled outliers, from the top winning teams, are extracted from the following resources and stored in data/outliers:

rank-1 winning team

Then, run the notebook bad_meter_processing.ipynb to create the labeled train set.

Benchmarking models

Statistical model (2-Standard deviation)
ALDI
Variational Auto-encoder (VAE)
ALDI++ (our method)
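To illustrate the simplest baseline in this list, the sketch below shows one common reading of a 2-standard-deviation rule applied to daily load profiles. It is an assumption for illustration only and may differ from the repository's statistical model: the function name `two_sd_discords`, the use of the daily mean as the summary statistic, and the threshold parameter `k` are all hypothetical.

```python
from statistics import mean, pstdev


def two_sd_discords(daily_profiles, k=2.0):
    """Flag days whose mean load lies more than k standard deviations
    from the mean over all days. Returns one boolean per day.

    daily_profiles: list of days, each a list of load readings.
    """
    daily_means = [mean(day) for day in daily_profiles]
    mu = mean(daily_means)
    sigma = pstdev(daily_means)  # population standard deviation
    if sigma == 0:
        # All days have the same mean load, so nothing stands out.
        return [False] * len(daily_means)
    return [abs(m - mu) > k * sigma for m in daily_means]


# Nine ordinary days and one day with ten times the load.
profiles = [[10] * 24 for _ in range(9)] + [[100] * 24]
flags = two_sd_discords(profiles)
```

In this toy input only the last day is flagged. A rule of this kind needs the threshold `k` to be chosen by hand, which is the sort of parameter ALDI++ is described as avoiding.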

Evaluation

Discord classification

Confusion matrices and ROC-AUC metrics are evaluated using the following notebooks:

classification_<model>.ipynb

where <model> is one of the benchmarked models: 2sd, vae, aldi, aldipp
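For readers unfamiliar with these metrics, the following is a minimal, dependency-free sketch of a binary confusion matrix and ROC-AUC. It is not the code used in the notebooks; the function names are illustrative, and labels are assumed to be 1 for a discord and 0 otherwise.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 marks a discord."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random discord scores higher
    than a random non-discord (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise formulation is quadratic in the number of samples, so it is only suitable for small inputs; for full-size evaluations a library implementation such as scikit-learn's `confusion_matrix` and `roc_auc_score` is the usual choice.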

Energy Forecasting

To change the settings and parameters for data pre-processing, training, and evaluation, edit the yaml files inside the configs/ folder. The pipeline used for energy forecasting is based on the Rank-1 team's solution.

The pipeline assumes that at least the following folder structure exists:

.
├── configs
│   ├── ...
├── data
│   ├── outliers
│   │   ├── ...
│   ├── preprocessed
│       ├── ...
...
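A quick way to confirm this layout before starting the pipeline is sketched below. The helper `missing_dirs` is hypothetical and not part of the repository; it only checks the three folders named in the structure above.

```python
from pathlib import Path

# Folders named in the structure above, relative to the repository root.
REQUIRED_DIRS = ("configs", "data/outliers", "data/preprocessed")


def missing_dirs(root="."):
    """Return the required folders that do not exist under `root`."""
    root = Path(root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs()` from the repository root returns an empty list when the layout is complete.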

Training pipeline

Each yaml file inside configs/ holds the configuration for a different discord detection algorithm. To run a stripped-down version of the Rank-1 team's solution with a given configuration, execute:

./rank1-solution-simplified.sh configs/{your_config}.yaml
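To run the pipeline once per configuration, the command above can be wrapped in a small driver. This is a sketch under assumptions: the helper names `pipeline_commands` and `run_all` are hypothetical, the script is assumed to take exactly one config path as shown above, and it is assumed to be invoked from the repository root.

```python
import subprocess
from pathlib import Path


def pipeline_commands(config_dir="configs"):
    """Build one command per yaml file found in `config_dir`, sorted by name."""
    return [
        ["./rank1-solution-simplified.sh", str(path)]
        for path in sorted(Path(config_dir).glob("*.yaml"))
    ]


def run_all(config_dir="configs"):
    """Run the pipeline for each configuration, stopping at the first failure."""
    for cmd in pipeline_commands(config_dir):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it makes it easy to inspect what would be executed before committing to the full run.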

Results

Dictionaries with the computed results can be found in results/. The table below reports the forecasting performance (RMSLE) and computation time (min) on the GEPIII dataset for our model, alongside the original competition winning team, a simple statistical approach, a commonly used deep learning approach, and the original ALDI:

Discords labeled by	RMSLE	Computation time (min)
Kaggle winning team	2.841	480
2-Standard deviation	2.835	1
ALDI	2.834	40
VAE	2.829	32
ALDI++	2.665	8
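For reference, RMSLE (root mean squared logarithmic error) is the square root of the mean squared difference between log(1 + prediction) and log(1 + actual value); lower is better. The sketch below shows the standard definition and is not the repository's evaluation code; the function name `rmsle` is illustrative.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error over paired non-negative values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Because the error is computed on a logarithmic scale, it penalizes relative rather than absolute deviations, which suits meter readings that span several orders of magnitude.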
