# Home Dataset Creation

This notebook creates the home dataset by combining several appliances and their power consumption traces. The source datasets are GREEND and UK-DALE. Power consumption and metadata are read from the datasets using the [nilmtk](https://github.com/nilmtk) library.

The notebook does not include the datasets themselves, which can be downloaded from the respective websites:
- [GREEND](https://sourceforge.net/projects/greend/files/)
- [UK-DALE](https://data.ukedc.rl.ac.uk/browse/edc/efficiency/residential/EnergyConsumption/Domestic/UK-DALE-2017/UK-DALE-FULL-disaggregated?dataid=7d78f943-f9fe-413b-af52-1816f9d968b0)

Both datasets are provided in the HDF5 (`.h5`) format supported by NILMTK, so no conversion should be needed. If the provided `.h5` files do not work for any reason, download the raw data and convert it to HDF5 using the [NILMTK converters](https://github.com/nilmtk/nilmtk/blob/master/docs/manual/user_guide/data.ipynb).

Running this notebook in a local `conda` environment is recommended. Installing NILMTK in Google Colab or another cloud environment requires fragile workarounds. Using `venv` instead of `conda` does not work either, because NILMTK fails to install in a `venv` environment.

## Setup

In [None]:
%pip install git+https://github.com/nilmtk/nilmtk # Install from git because the conda package is out of date at the time of writing
%pip install git+https://github.com/nilmtk/nilm_metadata # Manual installation required because nilmtk is installed from git

In [None]:
GREEND_PATH = '/data/greend.h5'
UK_DALE_PATH = '/data/uk_dale.h5'
APPLIANCES_OUT_DIR = '/appliances'
ACTIVATIONS_OUT_DIR = '/activations'
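
The extraction cells below write into `APPLIANCES_OUT_DIR` and `ACTIVATIONS_OUT_DIR`, so both directories need to exist before the first `to_csv` call. A minimal sketch of one way to prepare them follows. The helper name `ensure_dirs` and the temporary base directory are illustrative assumptions, not part of the original notebook.

```python
import tempfile
from pathlib import Path


def ensure_dirs(*dirs):
    """Create each directory (and missing parents); existing ones are left alone."""
    created = []
    for d in dirs:
        path = Path(d)
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demonstration in a throwaway base directory (hypothetical layout).
base = Path(tempfile.mkdtemp())
out_dirs = ensure_dirs(base / "appliances", base / "activations")
print([p.name for p in out_dirs])
```

In the notebook itself, the same call could be made with the two configured paths, for example `ensure_dirs(APPLIANCES_OUT_DIR, ACTIVATIONS_OUT_DIR)`.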

## GREEND

In [1]:
from nilmtk import DataSet

greend = DataSet(GREEND_PATH)

### Extract Raw Appliance Data

In [18]:
# Map of building number to list of appliances
# In NILMTK the building numbers start from 1, not 0
appliances = {
    3: ['microwave', 'washing machine', 'audio amplifier'],
    5: ['desktop computer', 'television', 'fridge', 'dish washer']
}

for building, appliance_list in appliances.items():
    print("> Building:", building)

    for appliance in appliance_list:
        print(f"  - Appliance: {appliance} ... ", end="")
        name_undercase = appliance.replace(' ', '_')

        df = next(greend.buildings[building].elec[appliance].load()).reset_index()

        # Rename columns and drop timestamp
        df.columns = ["timestamp", "power"]
        df.drop(columns=["timestamp"], inplace=True)

        # Downsample to one value per minute: trailing 60-sample mean, keeping every 60th row
        # (this assumes the 1 Hz sampling rate of GREEND)
        df = df.rolling(60, min_periods=1).mean()[::60]

        df.to_csv(f"{APPLIANCES_OUT_DIR}/{name_undercase}.csv", index=False)
        print("OK")

> Building: 1
  - Appliance: lamp ... OK
> Building: 3
  - Appliance: microwave ... OK
  - Appliance: washing machine ... OK
  - Appliance: audio amplifier ... OK
> Building: 5
  - Appliance: desktop computer ... OK
  - Appliance: television ... OK
  - Appliance: fridge ... OK
  - Appliance: dish washer ... OK
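
The downsampling step, `rolling(60, min_periods=1).mean()[::60]`, keeps the trailing-window mean at rows 0, 60, 120, and so on, so the first kept value averages a single sample. The sketch below illustrates this on toy data with a window of 3 and compares it with averaging non-overlapping blocks. The toy values and the `groupby` alternative are assumptions for illustration only; the notebook does not use them.

```python
import pandas as pd

# Toy power readings; the notebook uses a window of 60, here 3 for readability.
df = pd.DataFrame({"power": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]})
window = 3

# Same pattern as the notebook: trailing rolling mean, then keep every `window`-th row.
trailing = df.rolling(window, min_periods=1).mean()[::window]

# Alternative for comparison: mean of consecutive, non-overlapping blocks.
blocks = df.groupby(df.index // window).mean()

print(trailing["power"].tolist())  # rows 0, 3, 6 of the rolling mean
print(blocks["power"].tolist())    # one mean per block of 3 rows
```

Both approaches return one value per window, but the windows are offset from each other, so the resulting series differ slightly.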


### Extract Activations

In [3]:
import os

# Map of building number to appliances and their activation parameters:
# [min_off_duration, min_on_duration, on_power_threshold]
activations = {
    3: {'microwave': [20, 10, 20], 'washing machine': [30, 30, 20], 'audio amplifier': [5, 5, 10]},
    # Building 5 is processed on Kaggle because of local memory constraints
}

for building, appliance_list in activations.items():
    print("> Building:", building)

    for appliance, params in appliance_list.items():
        print(f"  - Appliance: {appliance} ... ", end="")
        name_undercase = appliance.replace(' ', '_')

        app = greend.buildings[building].elec[appliance].get_activations(
            min_off_duration=params[0], min_on_duration=params[1], on_power_threshold=params[2])

        # Create the output directory; do not fail if it already exists
        os.makedirs(f"{ACTIVATIONS_OUT_DIR}/{name_undercase}", exist_ok=True)

        for i, activ in enumerate(app):
            # Work on a copy instead of modifying the activation in place
            activ = activ.reset_index(drop=True).dropna()

            activ.to_csv(f"{ACTIVATIONS_OUT_DIR}/{name_undercase}/{i}.csv", index=False)

        print("OK")

> Building: 1
  - Appliance: fridge ... OK
  - Appliance: lamp ... OK
  - Appliance: television ... OK
> Building: 3
  - Appliance: microwave ... OK
  - Appliance: washing machine ... OK
  - Appliance: audio amplifier ... OK
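
The three values given per appliance map to `min_off_duration`, `min_on_duration` and `on_power_threshold`. As a rough illustration of what such parameters typically control, the sketch below segments a power trace in plain Python. It is a simplified stand-in, not NILMTK's implementation: the function name `find_activations` is hypothetical, and durations are counted in samples here, whereas NILMTK's docstring describes them in seconds.

```python
def find_activations(power, on_power_threshold, min_on, min_off):
    """Return (start, end) index pairs (end exclusive) of activations.

    Durations are counted in samples, an assumption made for simplicity.
    """
    # 1. Collect runs of consecutive samples at or above the threshold.
    runs = []
    start = None
    for i, p in enumerate(power):
        if p >= on_power_threshold and start is None:
            start = i
        elif p < on_power_threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(power)))

    # 2. Merge runs separated by an off period shorter than min_off.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_off:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    # 3. Drop activations shorter than min_on.
    return [(s, e) for s, e in merged if e - s >= min_on]


# Toy trace: two bursts split by a 1-sample dip, then an isolated 1-sample spike.
trace = [0, 0, 50, 60, 0, 55, 58, 0, 0, 0, 0, 70, 0, 0]
found = find_activations(trace, on_power_threshold=20, min_on=2, min_off=3)
print(found)
```

With these toy settings, the short dip is bridged into a single activation and the isolated spike is discarded as too short.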


## UK-DALE

In [1]:
from nilmtk import DataSet

ukdale = DataSet(UK_DALE_PATH)

### AC

In [173]:
ac = ukdale.buildings[5].elec.all_meters()[24]
df = next(ac.load())
df = df["power"]["active"].reset_index()
df.columns = ["timestamp", "power"]
df.drop(columns=["timestamp"], inplace=True)

# Downsample: trailing 60-sample mean, keeping every 60th row
df = df.rolling(60, min_periods=1).mean()[::60]

df.to_csv(f"{APPLIANCES_OUT_DIR}/ac.csv", index=False)
print("OK")

OK
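
How much time a rolling window covers depends on the meter's sample period. The sketch below does that arithmetic for the window sizes used in this notebook. The sample periods are assumptions (1 s for GREEND, 6 s for UK-DALE appliance meters), and the helper names are hypothetical. NILMTK meters appear to expose the configured value through `sample_period()`, which would be worth checking against the actual data.

```python
def window_duration_seconds(window_samples, sample_period_s):
    """Time span covered by a rolling window of `window_samples` readings."""
    return window_samples * sample_period_s


def window_for_interval(interval_s, sample_period_s):
    """Number of samples needed to cover `interval_s` seconds (at least 1)."""
    return max(1, round(interval_s / sample_period_s))


# Assumed sample periods: 1 s for GREEND, 6 s for UK-DALE appliance meters.
greend_span = window_duration_seconds(60, 1)   # GREEND cells
ac_span = window_duration_seconds(60, 6)       # AC cell
boiler_span = window_duration_seconds(10, 6)   # boiler and lamp cells
one_minute_window = window_for_interval(60, 6)

print(greend_span, ac_span, boiler_span, one_minute_window)
```

Under these assumptions, the window of 60 used for the AC spans six minutes, while the window of 10 used for the boiler and lamp spans one minute. Whether the difference is intentional is not stated in the notebook.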


In [32]:
import os

activations = ac.get_activations(min_off_duration=120, min_on_duration=120, on_power_threshold=300)

# Create the output directory; do not fail if it already exists
os.makedirs(f"{ACTIVATIONS_OUT_DIR}/ac", exist_ok=True)

for i, activ in enumerate(activations):
    # Work on a copy instead of modifying the activation in place
    activ = activ.reset_index(drop=True).dropna()

    activ.to_csv(f"{ACTIVATIONS_OUT_DIR}/ac/{i}.csv", index=False)

print("OK")

1095 5763.330593607306
OK


### Boiler

In [181]:
boiler = ukdale.buildings[4].elec["boiler"]
df = next(boiler.load())
df = df["power"]["active"].reset_index()
df.columns = ["timestamp", "power"]
df.drop(columns=["timestamp"], inplace=True)

# Downsample: trailing 10-sample mean, keeping every 10th row
df = df.rolling(10, min_periods=1).mean()[::10]

df.to_csv(f"{APPLIANCES_OUT_DIR}/boiler.csv", index=False)
print("OK")

OK


In [34]:
activations = boiler.get_activations(min_off_duration=10, min_on_duration=5, on_power_threshold=50)

# Create the output directory; do not fail if it already exists
os.makedirs(f"{ACTIVATIONS_OUT_DIR}/boiler", exist_ok=True)

for i, activ in enumerate(activations):
    # Work on a copy instead of modifying the activation in place
    activ = activ.reset_index(drop=True).dropna()

    activ.to_csv(f"{ACTIVATIONS_OUT_DIR}/boiler/{i}.csv", index=False)

print("OK")

607 903.6457990115322
OK


### Lamp

In [185]:
lamp = ukdale.buildings[1].elec.all_meters()[21]
df = next(lamp.load())
df = df["power"]["active"].reset_index()
df.columns = ["timestamp", "power"]
df.drop(columns=["timestamp"], inplace=True)

# Downsample: trailing 10-sample mean, keeping every 10th row
df = df.rolling(10, min_periods=1).mean()[::10]

df.to_csv(f"{APPLIANCES_OUT_DIR}/lamp.csv", index=False)
print("OK")

OK
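
The AC, boiler and lamp cells repeat the same load, flatten and downsample steps. A sketch of how they could be folded into one helper follows. The names `meter_to_power_frame` and `FakeMeter` are hypothetical, and `FakeMeter` only imitates the part of a NILMTK meter used here (a `load()` generator yielding a frame with a `("power", "active")` column), so the example runs without the datasets.

```python
import pandas as pd


def meter_to_power_frame(meter, window):
    """Load the first chunk of a meter and downsample its active power."""
    chunk = next(meter.load())
    # Drop the timestamp index and keep only the active power readings.
    power = chunk["power"]["active"].reset_index(drop=True)
    # Trailing rolling mean, then keep every `window`-th row, as in the cells above.
    downsampled = power.rolling(window, min_periods=1).mean()[::window]
    return downsampled.to_frame("power")


class FakeMeter:
    """Hypothetical stand-in for a NILMTK meter, used only to exercise the helper."""

    def load(self):
        index = pd.date_range("2020-01-01", periods=6, freq=pd.Timedelta(seconds=6))
        columns = pd.MultiIndex.from_tuples([("power", "active")])
        yield pd.DataFrame([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]],
                           index=index, columns=columns)


frame = meter_to_power_frame(FakeMeter(), window=3)
print(frame["power"].tolist())
```

With a real meter, the returned frame could be written out the same way as in the cells above, for example `frame.to_csv(f"{APPLIANCES_OUT_DIR}/lamp.csv", index=False)`.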


In [36]:
activations = lamp.get_activations(min_off_duration=1, min_on_duration=1, on_power_threshold=5)

# Create the output directory; do not fail if it already exists
os.makedirs(f"{ACTIVATIONS_OUT_DIR}/lamp", exist_ok=True)

for i, activ in enumerate(activations):
    # Work on a copy instead of modifying the activation in place
    activ = activ.reset_index(drop=True).dropna()

    activ.to_csv(f"{ACTIVATIONS_OUT_DIR}/lamp/{i}.csv", index=False)

print("OK")

812 412.7894088669951
OK
