All up-to-date models are found here.
The research paper is found here.
The (final) presentation given at the symposium is found here.
This repository contains code for the KB-74-OPSCHALER project. KB-74 stands for the minor Applied Data Science at The Hague University of Applied Sciences, with Opschaler being the project name. The goal of this project is to predict the energy usage of houses, 1 week ahead with a 10 second resolution. More information about Opschaler can be found at their website.
Links to the personal portfolios of the KB-74-OPSCHALER group members are listed below.
Sensor data from within the dwellings (occupancy, CO2 values, humidity, temperature and more) is also available; it has not been added to this file.
Parameter | Unit | Sample rate | Description |
---|---|---|---|
Timestamp | - | 10 s | Timestamp of data telegram (set by smart meter) in local time |
eMeter | kWh | 10 s | Meter reading electricity delivered to client, normal tariff |
eMeterReturn | kWh | 10 s | Meter reading electricity delivered by client, normal tariff |
eMeterLow | kWh | 10 s | Meter reading electricity delivered to client, low tariff |
eMeterLowReturn | kWh | 10 s | Meter reading electricity delivered by client, low tariff |
ePower | kWh | 10 s | Actual electricity power delivered to client |
ePowerReturn | kWh | 10 s | Actual electricity power delivered by client |
gasTimestamp | - | 1 h | Timestamp of the gasMeter reading (set by smart meter) in local time |
gasMeter | m3 | 1 h | Last hourly value (temperature converted), gas delivered to client |
This is weather data from the KNMI weather station in Rotterdam with a sample rate of 10 minutes.
According to a representative from OPSCHALER, this weather station is the closest one to all the dwellings; the exact dwelling locations, however, are unknown.
They are probably within a 25 km radius of this weather station.
Parameter | Unit | Description |
---|---|---|
DD | degrees | Wind direction |
DR | s | Precipitation time |
FX | m/s | Maximum gust of wind at 10 m |
FF | m/s | Wind speed at 10 m |
N | okta | Cloud coverage |
P | hPa | Outside pressure |
Q | W/m2 | Global radiation |
RG | mm/h | Rain intensity |
SQ | min | Sunshine duration |
T | deg C | Temperature at 1.5 m (1 minute mean) |
T10 | deg C | Minimum temperature at 10 cm |
TD | deg C | Dew point temperature |
U | % | Relative humidity at 1.5 m |
VV | m | Horizontal visibility |
WW | - | Weather- and station-code |
All (sub)chapters below are meant for the KB74-Opschaler group members.
- Log in to JupyterHub on the datascience server.
- In the top right press 'New -> Terminal'. An SSH terminal should pop up in a new window.
- Next follow this tutorial: link.
- When you have done this you will need to add the SSH key to your GitHub account: link. Note that step 1 will not work because 'clip' is not recognized! Work around this by using FileZilla to browse to `~/.ssh/id_rsa.pub`, where `~` is your home folder, and download the file. Then open the file with a text editor, copy the contents and continue with the tutorial.
- Test your connection: link
- You are ready to clone repositories.
- `ls` Lists directory contents.
- `cd directory_name` Moves into directory_name.
- `cd ..` Moves up one directory (to the parent folder).
- `cp` Copies a file or directory to a directory.
- Press tab to finish a word automatically.

Note that `~` represents your home folder. More info on Linux commands: link
- Once GitHub has been set up correctly you can clone this repository by pressing the green `Clone or download` button and copying the [link](https://github.com/deKeijzer/KB-74-OPSCHALER.git).
- In the Jupyter terminal window you should see the line `studentnumber@datascience:~$`. Move to the 'notebooks' folder by typing `cd notebooks`. The directory you are in now should be `~/notebooks`.
- While in here type `git clone <the link you copied, from this repository>`.
- Once this is done, move to the 'KB-74-OPSCHALER' folder by typing `cd KB-74-OPSCHALER`.
- Once in here type `git status`. This will give you additional information and show you that you have cloned successfully.
Before you start working on code in Jupyter, be sure that you have the latest version of this repository. Do this by typing `git pull`. Once you have written certain parts of code and want to upload them to this repository, do this as follows.
- `git add .` (this will select all files)
- `git commit -m 'commit message. For example, the changes that you made to the code.'`
- `git push`

More push & pull information can be found in this notebook.
Below is a list of the most important data locations for the Opschaler project. Make sure not to modify or add any files in the folders listed below. Some notebooks have been written in such a way that they expect all files in a folder to follow a certain structure. For example: the only files in the `smartmeter_data` folder should be smartmeter files in the format `dwelling_id.csv`. Any other file in there will crash the notebook that uses this folder to process the files.
- Only read files, do not write to them.
- Use the Processed dwelling_id dataframes files for EDA.
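
Because the notebooks assume that every file in a data folder follows the expected naming scheme, code that reads such a folder can defensively skip anything else. The following is a minimal sketch, assuming dwelling IDs look like the example `P01S01W0373` used in this README. The regular expression and the helper name `list_dwelling_files` are hypothetical and not part of the project's notebooks.

```python
import re
import tempfile
from pathlib import Path

# Assumed ID pattern, modelled on the example 'P01S01W0373' (hypothetical).
DWELLING_FILE = re.compile(r"^P\d{2}S\d{2}W\d{4}\.csv$")


def list_dwelling_files(folder):
    """Return sorted paths of files in `folder` that match the assumed naming scheme."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and DWELLING_FILE.match(p.name)
    )


# Demonstration on a temporary folder instead of the shared /datc data.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["P01S01W0373.csv", "P01S02W0167.csv", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = [p.name for p in list_dwelling_files(tmp)]

print(found)
```

The sketch only reads the directory listing, which is in line with the read-only rule above.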
The KNMI data consists of two dataframes. One is in the raw format, the way KNMI provided the data. The other is the processed dataset, which has been cleaned and prepared so that it can be used for EDA.
Location: /datc/opschaler/weather_data/knmi_10_min_raw_data
This is the raw 10 minute interval data from 2015 till 2018 as provided by the KNMI (by mail).
Location: /datc/opschaler/weather_data/weather.csv
The KNMI dataframe (1.82 GB) contains weather data from 2015 to 2018, with a 10 minute resolution.
More information can be found in this notebook.
Reading in the data is done as follows:
import pandas as pd
weather = pd.read_csv('//datc//opschaler//weather_data//weather.csv', delimiter='\t', comment='#', parse_dates=['datetime'])
weather = weather.set_index(['datetime'])
weather.head()
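
To illustrate what the `read_csv` arguments above do, the following sketch parses a small in-memory sample. The sample rows and their values are made up for illustration; only the `datetime` column name and the `T` and `U` parameter names come from this README, and the real file layout may differ.

```python
import io

import pandas as pd

# Made-up sample: tab-separated values preceded by a '#' comment line.
sample = (
    "# comment lines like this one are skipped\n"
    "datetime\tT\tU\n"
    "2017-01-01 00:00:00\t3.1\t91\n"
    "2017-01-01 00:10:00\t3.0\t92\n"
)

weather = pd.read_csv(
    io.StringIO(sample),
    delimiter="\t",            # columns are separated by tabs
    comment="#",               # ignore everything after a '#'
    parse_dates=["datetime"],  # convert the column to timestamps
)
weather = weather.set_index("datetime")

# With a DatetimeIndex, rows can be selected by (partial) date strings.
first_day = weather.loc["2017-01-01"]
print(first_day)
```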
This is the smartmeter data as downloaded from the TU Delft server.
Location: /datc/opschaler/smartmeter_data
These are the raw smartmeter dataframes from the TU Delft server.
They should be in the format `export_dwelling_id.csv`.
These files contain the raw electricity and raw gas data.
Location: /datc/opschaler/combined_gas_smart_weather_dfs/unprocessed
The smartmeter, gasmeter and weather dataframes merged into one dataframe.
`_hour` files have a one hour sample rate, `_10s` files have a 10 second sample rate.
NaNs are not removed. The following has been done (in order):

For `_hour` files:
- gasPower calculated by using `.diff()` on the gas column.
- smartmeter and weather data downsampled to 1 hour, using the mean.
- merged smartmeter, gas and weather data.

For `_10s` files:
- gas upsampled to 10 s by using forward fill (`.ffill()`).
- gasPower calculated by using `.diff()` on the gas column.
- weather upsampled to 10 s by using forward fill.
- merged smartmeter, gas and weather data.
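
The processing order listed above can be sketched with pandas on synthetic data. This is an illustration only, not the project's actual notebook code: the input frames, their values and the column selection are made up, while `gasMeter`, `gasPower`, `ePower` and `T` follow the names used in this README.

```python
import numpy as np
import pandas as pd

# Synthetic inputs (assumed layout): 10 s smartmeter, 1 h gas, 10 min weather.
idx_10s = pd.date_range("2017-01-01", periods=720, freq="10s")  # 2 hours
smart = pd.DataFrame({"ePower": np.arange(720, dtype=float)}, index=idx_10s)

idx_1h = pd.date_range("2017-01-01", periods=2, freq="h")
gas = pd.DataFrame({"gasMeter": [100.0, 100.5]}, index=idx_1h)

idx_10min = pd.date_range("2017-01-01", periods=12, freq="10min")
weather = pd.DataFrame({"T": np.linspace(3.0, 4.1, 12)}, index=idx_10min)

# --- _hour variant ---
gas_hour = gas.copy()
gas_hour["gasPower"] = gas_hour["gasMeter"].diff()  # difference of meter readings
smart_hour = smart.resample("1h").mean()            # downsample using the mean
weather_hour = weather.resample("1h").mean()
df_hour = pd.concat([smart_hour, gas_hour, weather_hour], axis=1)  # merge

# --- _10s variant ---
gas_10s = gas.reindex(idx_10s).ffill()              # upsample gas with forward fill
gas_10s["gasPower"] = gas_10s["gasMeter"].diff()    # difference of meter readings
weather_10s = weather.reindex(idx_10s).ffill()      # upsample weather with forward fill
df_10s = pd.concat([smart, gas_10s, weather_10s], axis=1)  # merge

print(df_hour)
print(df_10s.head())
```

In this sketch, forward filling the gas readings before calling `.diff()` means that gasPower in the 10 s frame is zero everywhere except at the sample where a new hourly reading arrives.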
Location: /datc/opschaler/combined_gas_smart_weather_dfs/processed
The smartmeter, gasmeter and weather dataframes merged into one dataframe.
Rows that are part of a NaN streak longer than the accepted length have been dropped.
NaNs in the weather data have been forward filled.
NaNs in 'eMeter', 'eMeterReturn', 'eMeterLowReturn' and 'gasMeter' have been interpolated.
ePower, ePowerReturn and gasPower might still contain NaNs; drop these after reading in the files (if required).
More information can be found here.
import pandas as pd
dir = '//datc//opschaler//combined_gas_smart_weather_dfs//processed//'
dwelling_id = 'P01S01W0373'  # for example
df = pd.read_csv(dir+dwelling_id+'.csv', delimiter='\t', parse_dates=['datetime'])
df = df.set_index(['datetime'])
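
The cleaning described for the processed files, and the suggested NaN drop after reading them, can be sketched as follows. This is not the project's implementation: the synthetic data, the helper `nan_streak_lengths` and the threshold `max_streak` are hypothetical, and the streak length accepted for the real files is not stated here.

```python
import numpy as np
import pandas as pd


def nan_streak_lengths(series):
    """For each row, the length of the consecutive-NaN run it belongs to (0 if not NaN)."""
    is_nan = series.isna()
    run_id = is_nan.astype(int).diff().ne(0).cumsum()  # new id whenever NaN-ness changes
    run_len = is_nan.groupby(run_id).transform("sum")  # NaN count per run
    return run_len.where(is_nan, 0).astype(int)


idx = pd.date_range("2017-01-01", periods=10, freq="10s")
df = pd.DataFrame(
    {
        "eMeter": [1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, 8.0, 9.0, 10.0],
        "ePower": [0.1, 0.2, np.nan, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    },
    index=idx,
)

max_streak = 2  # hypothetical threshold
streaks = nan_streak_lengths(df["eMeter"])
df = df[streaks <= max_streak].copy()      # drop rows in too-long NaN streaks
df["eMeter"] = df["eMeter"].interpolate()  # fill the remaining short gaps
df = df.dropna(subset=["ePower"])          # drop NaNs left in the power column

print(df)
```

The last line of the cleaning corresponds to the step a user of the processed files would still run themselves, for example `df.dropna(subset=['ePower', 'ePowerReturn', 'gasPower'])`.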
Location: /datc/opschaler/honeywell_sensors_per_dwelling_combined/honeywell_all_dwellings_combined.csv
Processed Honeywell sensor data.
All sensor data in one dataframe with dwelling labels.
Note that the serial data in this file has not yet been converted to the room labels.
The serial to room datafile `honeywell_serial_to_room.xlsx` can be found in the same folder.
Location: /datc/opschaler/nan_information
This folder contains `dwelling_id_threshold_percentage.csv` files together with corresponding plots to get in-depth knowledge about the NaNs in all used data.
The notebook in which `dwelling_id_threshold_percentage.csv` is created can be found here.
Location: /datc/opschaler/EDA/
The EDA results, saved per dwelling.
For example, correlation coefficient matrices are saved in /datc/opschaler/EDA/correlation_matrices
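
A correlation coefficient matrix like the ones stored in that folder can be computed with pandas `DataFrame.corr()`. The following is a minimal sketch on synthetic data; the values are made up, and the project's notebooks may use a different method or column selection.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temperature = rng.normal(10.0, 5.0, size=200)
df = pd.DataFrame(
    {
        "T": temperature,
        # Synthetic: gas use falls as temperature rises, plus noise.
        "gasPower": 2.0 - 0.1 * temperature + rng.normal(0.0, 0.1, size=200),
        "ePower": rng.normal(0.5, 0.1, size=200),
    }
)

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))
```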
In Linux:
- `top` to see CPU & RAM usage.
- `nvidia-smi -l 1` to see GPU usage and refresh this information every second.

On Windows:
To use `nvidia-smi`, first move to its folder: `cd C:\Program Files\NVIDIA Corporation\NVSMI`
Then run `nvidia-smi` by typing `.\nvidia-smi -l 1`.
To see the CPU usage: `wmic cpu get loadpercentage`