# Building Energy Clustering and Outlier Visualization

# Data Collection

Currently the following datasets are in the repo:
- Building Genome Dataset
- Washington D.C. dataset

In [1]:
import functions as func

# load Building Genome dataset (BDG)
df_bdg = func.loadDataset('BDG')
print("Building Genome Dataset: hourly meter data from {} buildings".format(len(df_bdg.columns)))

# load DC building dataset (DC)
df_dc = func.loadDataset('DC')
print("DC Dataset: 15min interval meter data (resampled to hourly) from {} buildings".format(len(df_dc.columns)))


Building Genome Dataset: hourly meter data from 507 buildings

DC Dataset: 15min interval meter data (resampled to hourly) from 322 buildings


# Extract Context

The goal of this step is to make the data homogeneous by grouping the hourly meter readings by context. Currently, the following contexts are considered:
- Weekday `weekday`
- Weekend `weekend`
- Entire Week `entireweek`

The function `getContext(context, dataframe, datasetName)` from `functions.py` takes a time series dataframe and returns the context-related dataframe of the specified dataset. The dataset name is needed because only some specific time periods are evaluated, depending on the dataset. For more details, see the file `RawFeatures_BDG.ipynb`.
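
As a rough illustration of what such a context split could look like, here is a minimal sketch. It is not the code in `functions.py`: `split_by_context` is a hypothetical helper, the building names are made up, and the dataset-specific time-period filtering is left out.

```python
import pandas as pd

def split_by_context(df, context):
    """Return the rows of an hourly DataFrame that belong to the given context.

    Hypothetical helper for illustration only; assumes a DatetimeIndex.
    """
    # DatetimeIndex.dayofweek: Monday=0 ... Sunday=6
    is_weekend = df.index.dayofweek >= 5
    if context == "weekday":
        return df[~is_weekend]
    if context == "weekend":
        return df[is_weekend]
    if context == "entireweek":
        return df
    raise ValueError("unknown context: {}".format(context))

# Toy data: one full week (Monday to Sunday) of hourly readings
# for two made-up buildings.
index = pd.date_range("2015-01-05", periods=7 * 24, freq="60min")
df_toy = pd.DataFrame(
    {"Office_A": range(7 * 24), "Lab_B": range(7 * 24)},
    index=index,
)

df_toy_weekday = split_by_context(df_toy, "weekday")
df_toy_weekend = split_by_context(df_toy, "weekend")
print(df_toy_weekday.shape, df_toy_weekend.shape)
```

With this toy week, the weekday frame keeps five days of hourly rows and the weekend frame keeps two.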

In [2]:
import functions as func

df_weekday_BDG = func.getContext('weekday', df_bdg, 'BDG')
df_weekend_BDG = func.getContext('weekend', df_bdg, 'BDG')
df_weekday_DC = func.getContext('weekday', df_dc, 'DC')
df_weekend_DC = func.getContext('weekend', df_dc, 'DC')

df_weekday_BDG.head(3)

timestamp,Office_Cristina,PrimClass_Jolie,PrimClass_Jaylin,Office_Jesus,PrimClass_Jayla,PrimClass_Janiya,PrimClass_Janice,Office_Jett,UnivLab_Paul,Office_Jerry,...,UnivDorm_Payton,Office_Alyson,UnivClass_Alfredo,UnivLab_Aurora,UnivLab_Alfonso,UnivLab_Carole,Office_Georgia,UnivDorm_Lysander,PrimClass_Jazmin,PrimClass_Jenna
2015-01-01 00:00:00,,0.3,0.8,0.6,0.8,0.8,0.7,0.6,,2.4,...,,,,,,,2.648996,43.639849,18.12,0.9
2015-01-01 01:00:00,,0.6,1.5,1.3,1.8,1.6,1.5,1.2,,4.6,...,,,,,,,4.788145,91.579697,32.77,1.7
2015-01-01 02:00:00,,0.6,1.4,1.6,1.6,1.6,1.5,1.1,,4.5,...,,,,,,,5.018539,91.179697,24.77,1.7


In [3]:
df_weekend_BDG.head(3)

timestamp,Office_Cristina,PrimClass_Jolie,PrimClass_Jaylin,Office_Jesus,PrimClass_Jayla,PrimClass_Janiya,PrimClass_Janice,Office_Jett,UnivLab_Paul,Office_Jerry,...,UnivDorm_Payton,Office_Alyson,UnivClass_Alfredo,UnivLab_Aurora,UnivLab_Alfonso,UnivLab_Carole,Office_Georgia,UnivDorm_Lysander,PrimClass_Jazmin,PrimClass_Jenna
2015-01-03 00:00:00,3.35,0.7,1.2,1.8,1.6,1.6,0.9,1.3,73.808334,3.9,...,50.875,244.6,186.3375,319.59,204.52,774.125,6.026254,98.98315,29.9375,1.6
2015-01-03 01:00:00,3.225,0.7,1.7,1.2,1.7,1.6,0.9,1.3,71.465278,3.7,...,42.416667,289.07,186.1875,296.7125,205.09,761.224976,5.212829,91.98315,37.07,1.6
2015-01-03 02:00:00,2.8,0.5,1.2,1.7,1.8,1.6,0.8,1.2,73.599999,3.7,...,26.3,272.84,190.4325,313.0575,190.64,749.974976,5.275703,89.88315,30.13,1.7


In [4]:
print(df_weekday_DC.shape)
df_weekday_DC.head(3)

(12984, 322)


timestamp,1st District Headquarters,Impound Lot #1 & Fleet Fueling Site,Jefferson Playing Fields,DC Village,200 I Street Municipal Building,C.W. Harris Elementary School,Nalle Elementary School,Fort Davis Recreation Center,Kimball Elementary School,Income Maintenance Administration Office,...,Shepard Park Library,Spring Road Community Support Services,Van Ness Elementary School,Warehouse/Office,Washington Seniors Wellness Center,Waterfront Municipal Center East,Waterfront Municipal Center West,Winston Educational Center,H.D. Woodson High School,Youth Services Administration #3
2016-01-04 00:00:00,130.75,0.16,0.0,24.49,1022.66,0.0,185.33,26.82,37.29,37.9,...,15.34,25.42,39.91,17.53,9.91,244.16,360.65,0.0,247.25,37.75
2016-01-04 01:00:00,124.54,0.16,0.0,23.08,1024.1,0.0,184.21,27.48,37.89,38.67,...,15.55,24.72,37.52,23.9,10.21,241.73,355.51,0.0,244.92,33.11
2016-01-04 02:00:00,127.96,0.16,0.0,22.42,995.2,0.0,185.35,26.77,37.41,37.71,...,15.33,25.54,41.1,23.44,9.76,234.6,346.98,0.0,246.12,37.17


In [5]:
df_weekend_DC.head(3)

timestamp,1st District Headquarters,Impound Lot #1 & Fleet Fueling Site,Jefferson Playing Fields,DC Village,200 I Street Municipal Building,C.W. Harris Elementary School,Nalle Elementary School,Fort Davis Recreation Center,Kimball Elementary School,Income Maintenance Administration Office,...,Shepard Park Library,Spring Road Community Support Services,Van Ness Elementary School,Warehouse/Office,Washington Seniors Wellness Center,Waterfront Municipal Center East,Waterfront Municipal Center West,Winston Educational Center,H.D. Woodson High School,Youth Services Administration #3
2016-01-03 00:00:00,127.37,0.2,0.0,418.14,1083.51,0.0,173.52,32.5,42.61,43.41,...,33.7,26.64,15.77,43.08,18.36,261.26,474.55,281.77,268.98,27.89
2016-01-03 01:00:00,126.49,0.2,0.0,432.4,1036.84,0.0,144.73,33.26,42.5,43.34,...,33.4,26.77,16.18,47.42,18.58,260.68,482.08,282.56,266.03,27.84
2016-01-03 02:00:00,125.85,0.2,0.0,420.77,1128.8,0.0,145.97,32.89,41.94,43.4,...,33.48,26.55,16.27,49.51,19.11,259.83,482.73,281.24,267.86,27.37


# Load Curves Aggregation

Aggregate the energy consumption for a specific context using a chosen aggregation function. Currently, the following functions are implemented:
- Average
- Median
- Linear Regression
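
To make the idea concrete, the sketch below collapses an hourly time series into one 24-point daily load curve per building using the average or the median. It is an assumption about what the aggregation does, not the body of `doAggregation`; `aggregate_load_curve` and the toy building are hypothetical, and the linear regression option is omitted because its definition is not given here.

```python
import pandas as pd

def aggregate_load_curve(df, how):
    """Collapse an hourly DataFrame to one value per hour of day and building.

    Hypothetical helper for illustration only; assumes a DatetimeIndex.
    """
    grouped = df.groupby(df.index.hour)
    if how == "average":
        return grouped.mean()
    if how == "median":
        return grouped.median()
    raise ValueError("unknown aggregation: {}".format(how))

# Toy data: three days of hourly readings where the load equals the hour of day.
index = pd.date_range("2015-01-05", periods=3 * 24, freq="60min")
df_toy = pd.DataFrame(
    {"Building_A": index.hour.to_numpy().astype(float)},
    index=index,
)
# Inject a single spike at midnight of the first day.
df_toy.iloc[0, 0] = 30.0

curve_average = aggregate_load_curve(df_toy, "average")
curve_median = aggregate_load_curve(df_toy, "median")
print(curve_average.head(3))
print(curve_median.head(3))
```

In this toy case the single spike shifts the midnight average but leaves the midnight median untouched, which is one reason to compare both aggregations.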

In [10]:
df_average_weekday_BDG = func.doAggregation('average', df_weekday_BDG)

AttributeError: 'int' object has no attribute 'index'

# Feature Extraction

### Raw Values

The goal is to sample the original raw data so that we end up with an n x m matrix, where n is the number of buildings and m is the number of meter readings in each time series. An additional feature can be prepended to indicate the building ID, although the index of the table can be used for this purpose.

\begin{bmatrix}%
x_1^1 & x_2^1 & \dots & x_m^1 \\
x_1^2 & x_2^2 & \dots & x_m^2 \\
\vdots & \vdots & \ddots & \vdots \\
x_1^n & x_2^n & \dots & x_m^n \\
\end{bmatrix}

Raw values of the BDG time series, from 01/01/15 to 30/11/15:
- Weekday context: 368 buildings x 238 days
- Weekend context: 368 buildings x 97 days

Raw values of the DC time series (date range: TODO):
- Weekday context: 325 buildings x 238 days
- Weekend context: 325 buildings x 97 days
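
A minimal sketch of how a context dataframe (timestamps as rows, buildings as columns) could be turned into the n x m matrix described above. Dropping buildings with missing readings is an assumption made here for illustration; the building names and values are made up.

```python
import numpy as np
import pandas as pd

# Toy context dataframe: rows are timestamps, columns are buildings.
index = pd.date_range("2015-01-01", periods=4, freq="60min")
df_context = pd.DataFrame(
    {
        "Office_A": [1.0, 2.0, 3.0, 4.0],
        "Office_B": [np.nan, 1.0, 1.0, 1.0],  # incomplete series
        "Lab_C": [5.0, 6.0, 7.0, 8.0],
    },
    index=index,
)

# Assumption: keep only buildings without any missing reading.
df_complete = df_context.dropna(axis="columns")

# Transpose so that each row is one building and each column one reading.
X = df_complete.T.to_numpy()
building_ids = list(df_complete.columns)

print(X.shape)       # (n buildings, m readings)
print(building_ids)
```

Keeping `building_ids` next to `X` preserves the mapping from matrix rows back to buildings without adding an ID column to the features.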

### Features learned using TSFRESH and DTW

We will use the Time Series Feature extraction based on scalable hypothesis tests (TSFRESH) library (https://github.com/blue-yonder/tsfresh). Additionally, the Dynamic Time Warping (DTW) distance will be appended as another feature.
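
For reference, the DTW distance between two series can be computed with the classic dynamic-programming recurrence. The sketch below is a plain, exact implementation with an absolute-difference cost, written only to illustrate the idea; `dtw_distance` is a hypothetical name, and how the distance is turned into a per-building feature (for example, which reference series is used) is not specified here.

```python
def dtw_distance(a, b):
    """Exact DTW distance between two sequences of numbers.

    Uses an absolute-difference local cost and runs in O(len(a) * len(b)).
    Hypothetical helper for illustration only.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

same = dtw_distance([1, 2, 3], [1, 2, 3])
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
print(same, stretched, offset)
```

The second call shows the point of warping: a series that only differs by a repeated value is still matched at zero cost.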

It is important to highlight that TSFRESH will require the raw time series values from above.
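
TSFRESH typically consumes time series in a long ("stacked") layout with one row per observation. The sketch below reshapes a wide dataframe into that layout with pandas only; the column names `building` and `value` and the toy data are assumptions. The resulting frame could then be passed to `tsfresh.extract_features` with `column_id="building"`, `column_sort="timestamp"`, and `column_value="value"`.

```python
import pandas as pd

# Toy wide dataframe: rows are timestamps, columns are buildings.
index = pd.date_range("2015-01-01", periods=3, freq="60min")
df_wide = pd.DataFrame(
    {"Office_A": [1.0, 2.0, 3.0], "Lab_B": [4.0, None, 6.0]},
    index=index,
)
df_wide.index.name = "timestamp"

# One row per (timestamp, building) observation.
df_long = (
    df_wide.reset_index()
    .melt(id_vars="timestamp", var_name="building", value_name="value")
    .dropna(subset=["value"])  # missing readings are dropped before extraction
    .reset_index(drop=True)
)

print(df_long)
```

Dropping missing readings is one simple option; imputing them instead is equally plausible and depends on the dataset.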

### Temporal Features from existing work on BDG

Approximately 215 features have already been extracted in previous work (https://github.com/buds-lab/temporal-features-for-nonres-buildings-library).

# Experiments

First, generate the CSV files for each context. Currently, the **weekday** and **weekend** context CSV files can be found in **data/**.

### Experiment 1: k-Shape on Raw Time Series

1. Select context csv to work with (see above)
2. Download k-Shape library (https://github.com/Mic92/kshape)
3. Run k-Shape algorithm
4. Evaluation:
    1. Evaluate resulting clusters with silhouette coefficient plot
    2. Evaluate resulting clusters with elbow method
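
k-Shape clusters series by shape using a shape-based distance (SBD) built on normalized cross-correlation. The sketch below shows only that distance in NumPy, as an illustration of why shifted but similarly shaped load curves end up close together. It is not the linked library's implementation and does not include the centroid (shape extraction) step; `zscore` and `sbd` are hypothetical helper names.

```python
import numpy as np

def zscore(x):
    """Z-normalize a 1-D series; a constant series is only mean-centered."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation over all shifts."""
    x, y = zscore(x), zscore(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        # Assumption: treat constant series as maximally uninformative.
        return 1.0
    ncc = np.correlate(x, y, mode="full") / denom
    return 1.0 - float(ncc.max())

peak_early = [0, 1, 2, 1, 0, 0, 0, 0]
peak_late = [0, 0, 0, 1, 2, 1, 0, 0]   # same shape, shifted by two steps
ramp = [0, 1, 2, 3, 4, 5, 6, 7]        # different shape

d_same = sbd(peak_early, peak_early)
d_shifted = sbd(peak_early, peak_late)
d_ramp = sbd(peak_early, ramp)
print(d_same, d_shifted, d_ramp)
```

The distance is zero for identical series and stays small when the same peak merely occurs at a different time.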

### Experiment 2: Feature Extraction and Clustering

1. Select context csv to work with (see above)
2. Download TSFRESH library (https://github.com/blue-yonder/tsfresh)
3. Run TSFRESH on dataset
4. Calculate Dynamic Time Warping (DTW) (https://pypi.org/project/fastdtw/) as an extra feature
5. Run clustering algorithms
    1. Run K-means on resulting features (TSFRESH + DTW)
        1. Run with K = 5
        2. Run with K $\in$ [2,10]
    2. Run Hierarchical clustering on resulting features (TSFRESH + DTW)
        1. Run with K = 5
        2. Run with K $\in$ [2,10]
6. Evaluation:
    1. Evaluate resulting clusters with silhouette coefficient plot
    2. Evaluate resulting clusters with elbow method
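
The K sweep and both evaluation criteria from the steps above can be sketched with scikit-learn as follows. The feature matrix here is synthetic (three well-separated blobs standing in for the TSFRESH + DTW features), so the numbers say nothing about the real datasets.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: 3 blobs of 20 "buildings" each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

results = {}
for k in range(2, 11):  # K in [2, 10]
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    results[k] = {
        "inertia": model.inertia_,  # within-cluster sum of squares (elbow method)
        "silhouette": silhouette_score(X, model.labels_),
    }

# K with the highest mean silhouette coefficient.
best_k = max(results, key=lambda k: results[k]["silhouette"])
for k, scores in results.items():
    print(k, round(scores["inertia"], 2), round(scores["silhouette"], 3))
print("best K by silhouette:", best_k)
```

For the hierarchical variant, `sklearn.cluster.AgglomerativeClustering(n_clusters=k)` can be swapped in for `KMeans`. It has no `inertia_` attribute, so only the silhouette part of the loop carries over directly.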

### Experiment 3: Feature Extraction and Classification

1. Select context csv to work with (see above)
2. Download TSFRESH library (https://github.com/blue-yonder/tsfresh)
3. Run TSFRESH on dataset
4. Calculate Dynamic Time Warping (DTW) (https://pypi.org/project/fastdtw/) as an extra feature
5. Run classification algorithms:
    1. Append primary use type as ground truth labels from meta data **(data/meta_open.csv)**
    2. Run Random-Forest on resulting features (TSFRESH + DTW)
    3. Run SVM on resulting features (TSFRESH + DTW)
6. Evaluation:
    1. Micro-averaged F1 score using ground truth labels from metadata
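
As a small worked example of the evaluation metric, the micro-averaged F1 score pools true positives, false positives, and false negatives over all classes before computing F1. The sketch below is plain Python; `f1_micro` is a hypothetical helper, and the labels are toy primary use types rather than values read from the metadata file.

```python
from collections import Counter

def f1_micro(y_true, y_pred):
    """Micro-averaged F1 for single-label, multi-class predictions.

    Hypothetical helper for illustration only.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    denom = 2 * total_tp + total_fp + total_fn
    return 2 * total_tp / denom if denom else 0.0

y_true = ["Office", "Office", "PrimClass", "UnivLab"]
y_pred = ["Office", "PrimClass", "PrimClass", "UnivLab"]
score = f1_micro(y_true, y_pred)
print(score)
```

When every building has exactly one label, each error counts once as a false positive and once as a false negative, so the micro-averaged F1 equals the plain accuracy.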