# **Load Agent**

The Load Agent calculates the average load of the shiftable appliances for each hour. As input, it takes the date to be predicted and the data cleaned of noisy loads (preprocessed by the ***Preparation Agent***). The agent then calculates the average usage of each appliance per hour up to the preceding date and returns the typical load profile of the household's shiftable appliances.

The load profile is used to determine the operating costs of the devices in the ***Recommendation Agent***. 
The Recommendation Agent then searches for the hour(s) at which the costs of using an appliance are minimal, taking into account the ***Activity*** and ***Usage*** Agents, and makes a recommendation.
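
As an illustration of how a load profile can feed into such a cost search, here is a minimal sketch. It assumes a hypothetical list of hourly prices and a simplified cost rule (load times price, summed over the run). The function name `cheapest_start` and all values are illustrative only; the actual ***Recommendation Agent*** additionally weighs the outputs of the Activity and Usage Agents.

```python
import numpy as np

def cheapest_start(profile, prices):
    """Return (start_hour, cost) minimising the cost of running `profile`.

    profile: typical load for the consecutive hours after switch-on (h1, h2, ...)
    prices:  price for each of the 24 hours of the day (hypothetical values)
    """
    profile = np.asarray(profile, dtype=float)
    prices = np.asarray(prices, dtype=float)
    costs = []
    # only consider start hours for which the whole run fits into the day
    for start in range(len(prices) - len(profile) + 1):
        window = prices[start:start + len(profile)]
        costs.append(float(np.dot(profile, window)))
    best = int(np.argmin(costs))
    return best, costs[best]

# toy values: a two-hour run and a price dip in hours 2 and 3 (0-based)
profile = [800.0, 40.0]
prices = [30.0] * 24
prices[2], prices[3] = 10.0, 12.0
start, cost = cheapest_start(profile, prices)
print(start, cost)
```

With these toy numbers the cheapest start is the hour in which the high first-hour load meets the lowest price.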

This notebook creates the ***Load Agent*** class, adds its functions one by one, and ends with the *pipeline* function that combines the whole functionality of the class. The complete class can be found in the Appendix of this notebook.



## **1. Preparing the Environment**

### **1.1 Load Scripts and Data**

Set up the connection to Google Drive and load the scripts containing the Helper functions and the Preparation Agent.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

DATA_PATH = '/content/drive/MyDrive/T4_Recommendation-system-for-demand-response-and-load-shifting/02_data/'

!cp /content/drive/MyDrive/T4_Recommendation-system-for-demand-response-and-load-shifting/03_scripts/helper_functions.py .
!cp /content/drive/MyDrive/T4_Recommendation-system-for-demand-response-and-load-shifting/03_scripts/agents.py .

Mounted at /content/drive


### **1.2 Calling the Preparation Agent**

We need to preprocess the input data for the *Load Agent*. To do so, we call the *pipeline_load* function from the ***Preparation Agent*** module and set its configuration. Further details on the *pipeline_load* function can be found in the ***Preparation Agent*** notebook (sec. 2.5).

Load the necessary classes from our modules: Helper and Preparation_Agent. Create an object 'helper' as an instance of the 'Helper' class.

In [None]:
from helper_functions import Helper
from agents import Preparation_Agent

helper = Helper()

Load the dataset of Household 1 with the function 'load_household'.

In [None]:
household = helper.load_household(DATA_PATH, 1)
household

Time,Time,Unix,Aggregate,Fridge,Chest Freezer,Upright Freezer,Tumble Dryer,Washing Machine,Dishwasher,Computer Site,Television Site,Electric Heater,Issues
2013-10-09 13:06:17,2013-10-09 13:06:17,1381323977,523,74,0,69,0,0,0,0,0,1,0
2013-10-09 13:06:31,2013-10-09 13:06:31,1381323991,526,75,0,69,0,0,0,0,0,1,0
2013-10-09 13:06:46,2013-10-09 13:06:46,1381324006,540,74,0,68,0,0,0,0,0,1,0
2013-10-09 13:07:01,2013-10-09 13:07:01,1381324021,532,74,0,68,0,0,0,0,0,1,0
2013-10-09 13:07:15,2013-10-09 13:07:15,1381324035,540,74,0,69,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2015-07-10 11:56:05,2015-07-10 11:56:05,1436529365,187,0,45,0,0,0,0,0,0,1,0
2015-07-10 11:56:12,2015-07-10 11:56:12,1436529372,185,0,45,0,0,0,0,0,0,1,0
2015-07-10 11:56:18,2015-07-10 11:56:18,1436529378,181,0,45,0,0,0,0,0,0,1,0
2015-07-10 11:56:25,2015-07-10 11:56:25,1436529385,186,0,45,0,0,0,0,0,0,1,0


Now, create an object from the Preparation Agent class.

In [None]:
# calling the preparation agent
prep = Preparation_Agent(household)

Set up the parameters for the *pipeline_load* function. Here we set the **shiftable devices** and the **threshold to detect active appliances** manually. 

**Shiftable** devices are those whose usage can be shifted within a day. These are the devices for which we make recommendations.

We assume that the users of our recommendation system can choose the shiftable appliances themselves. Here, the shiftable appliances are *Tumble Dryer, Washing Machine* and *Dishwasher*.

We also assume that the *Computer Site* and *Television Site* are non-shiftable appliances. Users usually switch these on for a specific purpose at a specific time: the computer, for example, to do homework if they are students, and the television to watch particular shows (e.g. the news at 7 p.m.).

Other appliances, such as the refrigerator and the freezers, cannot be switched off and are therefore non-shiftable by definition.

The **threshold** is important for cleaning the data of noisy loads. For example, a washing machine that is plugged in but not running still consumes some energy, which does not reflect actual usage by the user. More about the threshold can be found in the ***Preparation Agent*** notebook (sec. 2.4).
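
To make the idea of the threshold concrete, here is a minimal sketch. It assumes that the threshold is compared against the min-max scaled load and that readings below it are set to zero in the original series. This is an assumption for illustration; the helper `zero_out_standby` is hypothetical, and the actual logic is described in the ***Preparation Agent*** notebook and may differ.

```python
import pandas as pd

def zero_out_standby(load, threshold=0.15):
    """Set readings whose min-max scaled value is below `threshold` to zero."""
    span = load.max() - load.min()
    # avoid a division by zero for a constant series
    scaled = (load - load.min()) / span if span > 0 else load * 0.0
    return load.where(scaled >= threshold, 0.0)

# toy washing-machine readings: small standby draw vs. real cycles
load = pd.Series([0.0, 2.0, 3.0, 400.0, 380.0, 1.0])
cleaned = zero_out_standby(load)
print(cleaned.tolist())
```

Only the two readings that belong to a real cycle survive; the small standby values are treated as noise.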


In [None]:
truncation_params = {
    'features': 'all', 
    'factor': 1.5, 
    'verbose': 1
}

scale_params = {
    'features': 'all', 
    'kind': 'MinMax', 
    'verbose': 1
}

aggregate_params = {
    'resample_param': '60T'
}

shiftable_devices = ['Tumble Dryer', 'Washing Machine', 'Dishwasher'] 

device_params = {
    'threshold': 0.15
}

load_pipe_params = {
    'truncate': truncation_params,
    'scale': scale_params,
    'aggregate': aggregate_params,
    'shiftable_devices': shiftable_devices, 
    'device': device_params
}

Call the *pipeline_load* function, which preprocesses the input data. Of its outputs, we use:

*df* - the data with the original loads of the shiftable appliances (in kWh) after outlier truncation and the cleaning step. This data is used as the input for the *Load Agent*.

Further information about the preprocessing steps can be found in the ***Preparation Agent*** notebook (sec. 2.2).

In [None]:
# calling the load pipeline function
df, df_scaled, df_truncated = prep.pipeline_load(household, load_pipe_params)

[outlier truncation: Unix]: 100%|██████████| 6960008/6960008 [00:06<00:00, 1028660.43it/s]
[outlier truncation: Unix]: 0 outliers were truncated.

[outlier truncation: Aggregate]: 100%|██████████| 6959964/6959964 [00:06<00:00, 1075149.56it/s]
[outlier truncation: Aggregate]: 853913 outliers were truncated.

[outlier truncation: Fridge]: 100%|██████████| 1611328/1611328 [00:01<00:00, 1076920.27it/s]
[outlier truncation: Fridge]: 60120 outliers were truncated.

[outlier truncation: Chest Freezer]: 100%|██████████| 2395354/2395354 [00:02<00:00, 1091001.01it/s]
[outlier truncation: Chest Freezer]: 207420 outliers were truncated.

[outlier truncation: Upright Freezer]: 100%|██████████| 2800342/2800342 [00:02<00:00, 1096279.47it/s]
[outlier truncation: Upright Freezer]: 197818 outliers were truncated.

[outlier truncation: Tumble Dryer]: 100%|██████████| 28117/28117 [00:00<00:00, 915181.83it/s]
[outlier truncation: Tumble Dryer]: 5934 outliers were truncated.

[outlier truncation: Washing Machine]: 100%|██████████| 156232/156232 [00:00<00:00, 1004252.05it/s]
[outlier truncation: Washing Machine]: 27412 outliers were truncated.

[outlier truncation: Dishwasher]: 100%|██████████| 65272/65272 [00:00<00:00, 937920.70it/s]
[outlier truncation: Dishwasher]: 0 outliers were truncated.

[outlier truncation: Computer Site]: 100%|██████████| 756639/756639 [00:00<00:00, 986356.98it/s]
[outlier truncation: Computer Site]: 253600 outliers were truncated.

[outlier truncation: Television Site]: 100%|██████████| 1273899/1273899 [00:01<00:00, 1012637.05it/s]
[outlier truncation: Television Site]: 225564 outliers were truncated.

[outlier truncation: Electric Heater]: 100%|██████████| 6899335/6899335 [00:06<00:00, 1074925.99it/s]
[outlier truncation: Electric Heater]: 473668 outliers were truncated.

[outlier truncation: Issues]: 100%|██████████| 58183/58183 [00:00<00:00, 962933.46it/s]
[outlier truncation: Issues]: 0 outliers were truncated.

[MinMaxScaler] Finished scaling the data.


In [None]:
df.iloc[300:360,:]

Time,Tumble Dryer,Washing Machine,Dishwasher
2013-10-22 01:00:00,0.0,0.0,0.0
2013-10-22 02:00:00,0.0,0.0,0.0
2013-10-22 03:00:00,0.0,0.0,0.0
2013-10-22 04:00:00,0.0,0.0,0.0
2013-10-22 05:00:00,0.0,0.0,0.0
2013-10-22 06:00:00,0.0,0.0,0.0
2013-10-22 07:00:00,0.0,0.0,0.0
2013-10-22 08:00:00,0.0,0.0,0.0
2013-10-22 09:00:00,0.0,0.0,0.0
2013-10-22 10:00:00,0.0,0.0,0.0


In [None]:
df.describe()

Unnamed: 0,Tumble Dryer,Washing Machine,Dishwasher
count,13520.0,13520.0,13520.0
mean,0.591432,3.317218,10.267475
std,14.784178,24.859087,90.38138
min,0.0,0.0,0.0
25%,0.0,0.0,0.0
50%,0.0,0.0,0.0
75%,0.0,0.0,0.0
max,905.180288,733.0,1087.478788


In [None]:
df[df.iloc[:, :] != 0].describe()

Unnamed: 0,Tumble Dryer,Washing Machine,Dishwasher
count,26.0,280.0,185.0
mean,307.544581,160.174247,750.35814
std,141.48181,68.771746,204.531646
min,193.252927,110.040816,380.971319
25%,215.962301,127.49115,547.980583
50%,277.143647,144.878762,826.905028
75%,334.43277,165.578057,928.783964
max,905.180288,733.0,1087.478788


## **2. Load Agent**

In [None]:
class Load_Agent:

  def __init__(self, load_input_df):
    self.input = load_input_df

### **2.1 Define the Subtasks for the Load Agent**

The **load profile** table is constructed under the assumption that an appliance can be used 24 hours per day. This is, of course, not the case in real applications; rather, it is the framework for further use in the ***Recommendation Agent*** (because each day consists of 24 hours). 

Most appliances are used for one to two hours, so the remaining hours in the table are filled with zeros.


**Prove start and end date**

First of all, we have to make sure that the input for the *Load Agent* contains full-day data for each day. 

This is important because otherwise we have no data for the missing hours, which would be treated as non-usage of the appliances during these hours.

We also cannot run the following functions, which are based on 24-hour intervals.

The function below checks whether the start date and the end date each contain all 24 hourly observations. If the start date is incomplete, the data start from the next date; if the end date is incomplete, the data end with the previous date (end date - 1). Otherwise, the input data are "proven" and used as they are.
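
The idea of a full-day check can be sketched on toy data as follows. Note that this sketch counts the observations of every calendar day, whereas the function in this notebook only inspects the first and the last day; the data frame and its values are made up for illustration.

```python
import pandas as pd

# toy hourly index: starts at 13:00 on day 1 and ends at 11:00 on day 4
index = pd.date_range('2013-10-09 13:00', '2013-10-12 11:00',
                      freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'Dishwasher': 0.0}, index=index)

# number of hourly observations per calendar day
obs_per_day = df.groupby(df.index.normalize()).size()
full_days = obs_per_day[obs_per_day == 24].index

# keep only rows that belong to complete days
df_full = df[df.index.normalize().isin(full_days)]
print(df_full.index[0], df_full.index[-1])
```

The incomplete first and last days are dropped, so the remaining data start at midnight and end at 23:00.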

In [None]:
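def prove_start_end_date(self, df, date):
  import pandas as pd

  start_date = (df.index[0]).strftime('%Y-%m-%d')
  end_date = date

  # .loc with a date string selects all rows of that day;
  # df['YYYY-MM-DD'] is treated as a column lookup in pandas 2.x
  if len(df.loc[start_date]) < 24:
    # the first day is incomplete: start from the following day
    start_date = (pd.to_datetime(start_date) + pd.Timedelta(days = 1)).strftime('%Y-%m-%d')
    df = df.loc[start_date:end_date]
  else: 
    df = df.loc[:end_date]

  if len(df.loc[end_date]) < 24:
    # the last day is incomplete: end with the previous day
    end_new = (pd.to_datetime(end_date) - pd.Timedelta(days = 1)).strftime('%Y-%m-%d')
    df = df.loc[:end_new]
  else: 
    df = df.loc[:end_date]
  return df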

setattr(Load_Agent, 'prove_start_end_date', prove_start_end_date)
del prove_start_end_date

In [None]:
load_agent = Load_Agent(df)
load_agent.prove_start_end_date(df, '2015-07-10') # just last date to show that it works

Time,Tumble Dryer,Washing Machine,Dishwasher
2013-10-10 00:00:00,0.0,0.0,0.0
2013-10-10 01:00:00,0.0,0.0,0.0
2013-10-10 02:00:00,0.0,0.0,0.0
2013-10-10 03:00:00,0.0,0.0,0.0
2013-10-10 04:00:00,0.0,0.0,0.0
...,...,...,...
2015-07-09 19:00:00,0.0,0.0,0.0
2015-07-09 20:00:00,0.0,0.0,0.0
2015-07-09 21:00:00,0.0,0.0,0.0
2015-07-09 22:00:00,0.0,0.0,0.0


Let's assume that we perform these steps to get the recommendations for '2014-01-01'.

In [None]:
df = load_agent.prove_start_end_date(df, '2014-01-01')
df

Time,Tumble Dryer,Washing Machine,Dishwasher
2013-10-10 00:00:00,0.0,0.0,0.0
2013-10-10 01:00:00,0.0,0.0,0.0
2013-10-10 02:00:00,0.0,0.0,0.0
2013-10-10 03:00:00,0.0,0.0,0.0
2013-10-10 04:00:00,0.0,0.0,0.0
...,...,...,...
2014-01-01 19:00:00,0.0,0.0,0.0
2014-01-01 20:00:00,0.0,0.0,0.0
2014-01-01 21:00:00,0.0,0.0,0.0
2014-01-01 22:00:00,0.0,0.0,0.0


**Detection of yesterday's date**

We need the load profile up to yesterday's date, which the recommender system uses to compute the recommendations for the current (*input*) date.
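
A small sketch of this cut-off on toy data (the data frame is made up for illustration): slicing a `DatetimeIndex` by a date string with `.loc` includes the whole of that day.

```python
import pandas as pd

date = '2014-01-01'
yesterday = pd.to_datetime(date) - pd.Timedelta(days=1)

# toy hourly data covering three full days
index = pd.date_range('2013-12-30', '2014-01-01 23:00',
                      freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'Dishwasher': 0.0}, index=index)

# label-based slicing with a date string keeps all hours of that day
history = df.loc[:yesterday.strftime('%Y-%m-%d')]
print(history.index[-1])
```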

In [None]:
def df_yesterday_date(self, df, date):
  import pandas as pd
  # keep all observations up to and including the day before `date`
  yesterday = (pd.to_datetime(date) - pd.Timedelta(days = 1)).strftime('%Y-%m-%d')
  return df.loc[:yesterday]

setattr(Load_Agent, 'df_yesterday_date', df_yesterday_date)
del df_yesterday_date

In [None]:
df = load_agent.df_yesterday_date(df, '2014-01-01')
df

Time,Tumble Dryer,Washing Machine,Dishwasher
2013-10-10 00:00:00,0.0,0.0,0.0
2013-10-10 01:00:00,0.0,0.0,0.0
2013-10-10 02:00:00,0.0,0.0,0.0
2013-10-10 03:00:00,0.0,0.0,0.0
2013-10-10 04:00:00,0.0,0.0,0.0
...,...,...,...
2013-12-31 19:00:00,0.0,0.0,0.0
2013-12-31 20:00:00,0.0,0.0,0.0
2013-12-31 21:00:00,0.0,0.0,0.0
2013-12-31 22:00:00,0.0,0.0,0.0


**Raw load table**

This function returns a dictionary of data frames, one per appliance. Each data frame contains the hourly loads of that appliance. 

The start hour of an appliance usage is defined as a non-zero value that appears after a zero value.

The function is called *raw* because each table contains *dirty* loads: the function collects all values greater than zero in an interval of length 24 (corresponding to the columns that represent the hours of a whole day). 
Between the first non-zero value and the end of this interval, zero values may be followed by further non-zero values, i.e. an appliance could be turned on twice within 24h.

Therefore, this function does not yet determine a clear end hour of the appliance usage.

However, this is not a problem, because: 
- the non-zero values that appear after zeros are collected by the inner loop, and the outer loop meets them again as new start values (new rows),

- the next function removes the non-zero values that appear after zeros, so that we get clean load profiles.

In other words, one long task was split into two functions/sub-tasks to keep each solution simple.

Please note that the tables do not contain zero values, because we only store values *greater than zero*. The *zero values* are represented as *nan values* in these tables.
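
The start-hour rule and the resulting raw table can be illustrated on a toy series. This is a vectorised sketch of the idea rather than the loop-based function of this notebook; the series, the shortened window of 5 hours and all variable names are illustrative assumptions.

```python
import pandas as pd

load = pd.Series([0.0, 120.0, 130.0, 0.0, 0.0, 110.0, 0.0])

# a run starts where the load is positive and the previous hour was zero
# (the first observation has no predecessor, so it is compared against 0)
is_start = (load > 0) & (load.shift(1, fill_value=0.0) == 0)
start_positions = [int(i) for i in load.index[is_start]]

# for each start, collect the following hours; zeros become NaN as in the raw table
window = 5  # shortened from 24 to keep the toy output small
rows = {
    pos: load.iloc[pos:pos + window].where(lambda s: s > 0).reset_index(drop=True)
    for pos in start_positions
}
raw = pd.DataFrame(rows).T
raw.columns = ['h' + str(h + 1) for h in raw.columns]
print(raw)
```

The row that starts at position 1 still contains the later value 110 in `h5`. This is the kind of *dirty* load that the cleaning step removes, while the same value appears again as `h1` of the row that starts at position 5.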


In [None]:
def load_profile_raw(self, df, shiftable_devices):
  import pandas as pd

  hours = ['h' + str(hour) for hour in range(1, 25)]
  df_hours = {}

  for appliance in shiftable_devices:
    df_hours[appliance] = pd.DataFrame(index = None, columns = hours)
    # use a plain array for positional access; integer keys on a Series
    # with a DatetimeIndex are ambiguous
    column = df[appliance].to_numpy()

    for i in range(len(column)):
      # a usage starts at a non-zero value that follows a zero value
      # (or at the very first observation, which has no predecessor)
      if (column[i] > 0) and (i == 0 or column[i-1] == 0):
        # collect all non-zero values of the next 24 hours
        for j in range(0, 24): 
          if (i + j) < len(column) and (column[i + j] > 0):
            df_hours[appliance].loc[i, 'h' + str(j+1)] = column[i + j]

  return df_hours

setattr(Load_Agent, 'load_profile_raw', load_profile_raw)
del load_profile_raw

In [None]:
df_hours = load_agent.load_profile_raw(df, shiftable_devices)
df_hours['Washing Machine']

Unnamed: 0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24
318,209.085,,,,,,,127.137,132.099,,,,,,,,,,,,,,,
325,127.137,132.099,,,,,,,,,,,,,,,,,,,,117.364,,
346,117.364,,,,,,,,,,,,,,,,,,,,,,,
418,201.958,,,,,,,,,,,,,,,,,,,,,,,
445,114.042,,,,,,,,,,,,,,,,,,,,,,138.236,131.192
467,138.236,131.192,,,,,,,,,,,,,,,,,,,,,,
586,132.78,,,,,,,,,,,,,,,,,,,,,,,
613,136.164,,,,,,,,,,,,,,,,,,,,,,,
678,159.398,,,,,,,,,,,,,,,,,,,,,,,
758,188.727,,,,,,,,,,,,,,,,,,,,,,,


**Clean load table**

We have to clean the previous table of the loads that start in the hour(s) after a break in usage. This function replaces the first *nan* value in each row, and all values that follow it, with zero. The end of use is thus defined as the last non-zero value with no *nan* values between the start and this value.
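
The cleaning rule can be sketched in vectorised form on a toy raw table. This is an alternative formulation for illustration, not the loop used in this notebook, and the table values are made up.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame(
    [[120.0, 130.0, np.nan, np.nan, 110.0],
     [110.0, np.nan, np.nan, np.nan, np.nan]],
    index=[1, 5],
    columns=['h1', 'h2', 'h3', 'h4', 'h5'],
)

# True from the first NaN of each row onwards
after_break = raw.isna().astype(int).cummax(axis=1).astype(bool)

# everything from the first break onwards is set to zero
cleaned = raw.mask(after_break, 0.0)
print(cleaned)
```

The value 110 in `h5` of the first row is removed because it follows a break, while the uninterrupted run 120, 130 is kept.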

In [None]:
def load_profile_cleaned(self, df_hours):
  import pandas as pd

  for app in df_hours.keys():
    for i in df_hours[app].index:
      for j in df_hours[app].columns:
        # from the first missing value onwards, the rest of the row is set to zero
        if pd.isna(df_hours[app].loc[i, j]):
          df_hours[app].loc[i, j:] = 0 
          break
  return df_hours

setattr(Load_Agent, 'load_profile_cleaned', load_profile_cleaned)
del load_profile_cleaned

In [None]:
df_hours = load_agent.load_profile_cleaned(df_hours)
df_hours['Dishwasher']

Unnamed: 0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24
10,427.454,498.593,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
370,894.062,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
459,935.766,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
508,880.49,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
581,727.527,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
653,997.465,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
710,842.965,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
730,899.404,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
754,446.333,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
797,924.428,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
df_hours['Washing Machine']

Unnamed: 0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24
318,209.085,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
325,127.137,132.099,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
346,117.364,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
418,201.958,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
445,114.042,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
467,138.236,131.192,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
586,132.78,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
613,136.164,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
678,159.398,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
758,188.727,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**The load profile table**

Here we calculate the average values for all hours of each shiftable appliance. To do so, we take each data frame in the dictionary and calculate the average load for every hour of the respective appliance. Then we create a new data frame with the appliances as the index and fill the *hours* columns with the respective average values.
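
The averaging step can be sketched on two toy cleaned tables (all values are made up, and only three hour columns are used to keep the example short):

```python
import pandas as pd

cleaned = {
    'Washing Machine': pd.DataFrame(
        [[120.0, 130.0, 0.0], [110.0, 0.0, 0.0]], columns=['h1', 'h2', 'h3']),
    'Dishwasher': pd.DataFrame(
        [[900.0, 0.0, 0.0], [800.0, 100.0, 0.0]], columns=['h1', 'h2', 'h3']),
}

# column-wise mean per appliance; one row per appliance in the result
profile = pd.DataFrame(
    {app: table.mean(axis=0) for app, table in cleaned.items()}
).T
print(profile)
```

Because the zero-filled hours are part of the mean, the value for `h2` is averaged over all usage events, including those that lasted only one hour. This is why later hours in a load profile are typically much smaller than `h1`.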

In [None]:
def load_profile(self, df_hours, shiftable_devices): 
  import pandas as pd

  hours = df_hours[shiftable_devices[0]].columns
  loads = pd.DataFrame(columns = hours)

  for app in df_hours.keys():
    app_mean = df_hours[app].apply(lambda x: x.mean(), axis = 0)
    for hour in app_mean.index:
      loads.loc[app, hour] = app_mean[hour]

  loads = loads.fillna(0)   
  return loads

setattr(Load_Agent, 'load_profile', load_profile)
del load_profile

In [None]:
load_profiles = load_agent.load_profile(df_hours, shiftable_devices)
load_profiles

Unnamed: 0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24
Tumble Dryer,355.682148,30.426509,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Washing Machine,147.090697,15.920504,3.307045,3.020827,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Dishwasher,814.583135,40.581581,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### **2.2 Pipeline Function**

The output of the *pipeline* function is the final table with the average loads for each hour and each appliance. The inputs are the data frame preprocessed by the ***Preparation Agent***, the date to be predicted and the list of pre-defined shiftable appliances.

This function chains all functions of the ***Load Agent*** described above and returns the result in a single call. 



In [None]:
def pipeline(self, df, date, shiftable_devices):

  df = self.prove_start_end_date(df, date)
  df = self.df_yesterday_date(df, date)
  df_hours = self.load_profile_raw(df, shiftable_devices)
  df_hours = self.load_profile_cleaned(df_hours)
  loads = self.load_profile(df_hours, shiftable_devices)

  return loads

setattr(Load_Agent, 'pipeline', pipeline)
del pipeline

In [None]:
df, df_scaled, df_truncated = prep.pipeline_load(household, load_pipe_params)



In [None]:
load_pipe = Load_Agent(df) 
load_table1 = load_pipe.pipeline(df, '2014-01-01', shiftable_devices) 
load_table1

Unnamed: 0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24
Tumble Dryer,355.682148,30.426509,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Washing Machine,147.090697,15.920504,3.307045,3.020827,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Dishwasher,814.583135,40.581581,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
load_table2 = load_pipe.pipeline(df, '2015-05-09', shiftable_devices) 
load_table2

Unnamed: 0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24
Tumble Dryer,310.109881,9.736483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Washing Machine,153.608168,27.518637,7.967235,2.124543,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Dishwasher,784.803087,68.961369,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## **Appendix A1: Complete Load Agent**

In [None]:
class Load_Agent:

  def __init__(self, load_input_df):
    self.input = load_input_df


  def prove_start_end_date(self, df, date):
    import pandas as pd

    start_date = (df.index[0]).strftime('%Y-%m-%d')
    end_date = date

    # .loc with a date string selects all rows of that day;
    # df['YYYY-MM-DD'] is treated as a column lookup in pandas 2.x
    if len(df.loc[start_date]) < 24:
      # the first day is incomplete: start from the following day
      start_date = (pd.to_datetime(start_date) + pd.Timedelta(days = 1)).strftime('%Y-%m-%d')
      df = df.loc[start_date:end_date]
    else: 
      df = df.loc[:end_date]

    if len(df.loc[end_date]) < 24:
      # the last day is incomplete: end with the previous day
      end_new = (pd.to_datetime(end_date) - pd.Timedelta(days = 1)).strftime('%Y-%m-%d')
      df = df.loc[:end_new]
    else: 
      df = df.loc[:end_date]
    return df


  def df_yesterday_date(self, df, date):
    import pandas as pd

    # keep all observations up to and including the day before `date`
    yesterday = (pd.to_datetime(date) - pd.Timedelta(days = 1)).strftime('%Y-%m-%d')
    return df.loc[:yesterday]


  def load_profile_raw(self, df, shiftable_devices):
    import pandas as pd

    hours = ['h' + str(hour) for hour in range(1, 25)]
    df_hours = {}

    for appliance in shiftable_devices:
      df_hours[appliance] = pd.DataFrame(index = None, columns = hours)
      # use a plain array for positional access; integer keys on a Series
      # with a DatetimeIndex are ambiguous
      column = df[appliance].to_numpy()

      for i in range(len(column)):
        # a usage starts at a non-zero value that follows a zero value
        # (or at the very first observation, which has no predecessor)
        if (column[i] > 0) and (i == 0 or column[i-1] == 0):
          # collect all non-zero values of the next 24 hours
          for j in range(0, 24): 
            if (i + j) < len(column) and (column[i + j] > 0):
              df_hours[appliance].loc[i, 'h' + str(j+1)] = column[i + j]

    return df_hours


  def load_profile_cleaned(self, df_hours):
    import pandas as pd

    for app in df_hours.keys():
      for i in df_hours[app].index:
        for j in df_hours[app].columns:
          # from the first missing value onwards, the rest of the row is set to zero
          if pd.isna(df_hours[app].loc[i, j]):
            df_hours[app].loc[i, j:] = 0 
            break
    return df_hours


  def load_profile(self, df_hours, shiftable_devices): 
    import pandas as pd

    hours = df_hours[shiftable_devices[0]].columns
    loads = pd.DataFrame(columns = hours)

    # average load per hour column for each appliance
    for app in df_hours.keys():
      app_mean = df_hours[app].apply(lambda x: x.mean(), axis = 0)
      for hour in app_mean.index:
        loads.loc[app, hour] = app_mean[hour]

    loads = loads.fillna(0)   
    return loads


  def pipeline(self, df, date, shiftable_devices):

    df = self.prove_start_end_date(df, date)
    df = self.df_yesterday_date(df, date)
    df_hours = self.load_profile_raw(df, shiftable_devices)
    df_hours = self.load_profile_cleaned(df_hours)
    loads = self.load_profile(df_hours, shiftable_devices)

    return loads
