# **Performance Evaluation Agent**

To evaluate the performance of our recommender system, we will analyze the system regarding the individual agents’ performance, the cold start problem, the sensitivity to changes in the recommendation hyperparameter and potential energy cost savings. We will introduce a further agent called Evaluation Agent which will perform all evaluation actions.

We will perform our evaluation analysis for households 1 to 10 in the REFIT: Electrical Load Measurements data (Murray et at., 2019), to validate our evaluation results.


## **1. Preparing the Environment**

In [None]:
from agents import Performance_Evaluation_Agent
from helper_functions import Helper

helper = Helper()

import pandas as pd
import numpy as np
import json
import pickle
from copy import deepcopy
import time
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm
#import statsmodels.api as sm

import scipy.spatial

In [None]:
DATA_PATH = '../data/'
EXPORT_PATH = '../export/'

<br>

## **2. Preparations for Evaluating the Performance of our Recommender System**
### **2.1 Determining User Input**

Before we are able to use our recommender system and evaluate its performance, we need to specify the required user inputs (i.e. active appliances, shiftable devices and the consumption threshold). For specifying which of the devices in the respective household will be determined as active appliances and shiftable devices, we look at the description of the devices provided in the readme file and categorize the devices according to our definitions of the categories. Furthermore, we validate that the used devices do not contain any noise in their consumption data and remove devices which contain noise, i.e. consume energy constantly over time. 

In [None]:
readme = DATA_PATH+'REFIT_Readme.txt'
readme = helper.load_txt(readme)
start = readme.rfind('House 1\n')
end = readme.find('House 11\n')
print(readme[start:end])

<br>

**Shiftable Devices**

In [None]:
# manual input, determined using information provided in the readme
# validated in the next step using the validate thresholds functionality
shiftable_devices = {
    1: ['Tumble Dryer', 'Washing Machine', 'Dishwasher'],
    2: ['Washing Machine', 'Dishwasher'],
    3: ['Tumble Dryer', 'Washing Machine', 'Dishwasher'],
    4: ['Washing Machine (1)', 'Washing Machine (2)'],
    5: ['Tumble Dryer'], # , 'Washing Machine' --> consumes energy constantly; , 'Dishwasher' --> noise at 3am
    6: ['Washing Machine', 'Dishwasher'],
    7: ['Tumble Dryer', 'Washing Machine', 'Dishwasher'],
    8: ['Washing Machine'], # 'Dryer' --> consumes constantly
    9: ['Washer Dryer', 'Washing Machine', 'Dishwasher'], 
    10: ['Washing Machine'] #'Dishwasher'
}

<br>

**Active Appliances**

In [None]:
# manual input, determined using information provided in the readme
# validated in the next step using the validate thresholds functionality
active_appliances = {
    1: deepcopy(shiftable_devices[1]) + ['Television Site', 'Computer Site'],
    2: deepcopy(shiftable_devices[2]) + ['Television', 'Microwave', 'Toaster', 'Hi-Fi', 'Kettle'],
    3: deepcopy(shiftable_devices[3]) + ['Toaster', 'Television', 'Microwave', 'Kettle'],
    4: deepcopy(shiftable_devices[4]) + ['Television Site', 'Kettle'], #'Microwave', 'Computer Site' --> consume energy constantly 
    5: deepcopy(shiftable_devices[5]) + ['Television Site', 'Combination Microwave', 'Kettle', 'Toaster'], # 'Computer Site', --> consumes energy constantly
    6: deepcopy(shiftable_devices[6]) + ['MJY Computer', 'Kettle', 'Toaster'], #, 'PGM Computer', 'Television Site' 'Microwave' --> consume energy constantly 
    7: deepcopy(shiftable_devices[7]) + ['Television Site', 'Toaster', 'Kettle'],
    8: deepcopy(shiftable_devices[8]) + ['Toaster', 'Kettle'], # 'Television Site', 'Computer' --> consume energy constantly
    9: deepcopy(shiftable_devices[9]) + ['Microwave', 'Kettle'], #'Television Site', 'Hi-Fi' --> consume energy constantly
    10: deepcopy(shiftable_devices[10]) + ['Magimix (Blender)', 'Microwave'] # 'Television Site' --> consume energy constantly
}

<br>

**Energy Consumption Threshold**

Our Preparation Agent will require the energy consumption threshold, which will determine if a device was used in a given period. This threshold will allow to reduce the impact of noise in the data. We will determine the optimal thresholds for the households using the Preparation Agent’s validate thresholds method. To demonstrate how noise in the consumption data occurs, we call the validate thresholds method for household 1. The consumption data regarding the Television Site in household 1 seems to contain daily noise around 3 am. We will choose the optimal threshold, such that the noise is removed from the data.

Furthermore, we will create our initial evaluation configuration file for each household which will contain the specified user input.

In [None]:
# validating the thresholds for household 1 to show noise in the data
household_id = 3

In [None]:
# creating the config including the user input
config =  {'data': {'household': deepcopy(household_id)}}
config['user_input'] = {
    'shiftable_devices': deepcopy(shiftable_devices[config['data']['household']]),
    'active_appliances': deepcopy(active_appliances[config['data']['household']])
}

# initializing the evaluation agent
model_type = "logit"

In [None]:
evaluation = Performance_Evaluation_Agent(DATA_PATH, model_type, config, load_data=True)

In [None]:
preparation = evaluation.preparation

# Data-Preparation
df_th = preparation.truncate(preparation.input)
df_th = preparation.scale(df_th)
df_th = helper.aggregate(df_th, '60T')

# Graphical analysis of candidate thresholds
#thresholds = [0] + list(np.geomspace(.01, .4, 5))
#preparation.validate_thresholds(df_th, thresholds, config['user_input']['active_appliances'])

In [None]:
thresholds = {
    1: 0.15,
    2: 0.01,
    3: 0.01, 
    4: 0.01, 
    5: 0.025,
    6: 0.065, 
    7: 0.01, 
    8: 0.01, # washing machine over night
    9: 0.01, 
    10: 0.01
}

<br>

### **2.2 Running our Pipeline**

Before we are able to analyze the performance of our recommender system, we need to calculate all outputs of all our agents and all recommendations possible based on the available data. To conveniently compute these outputs and recommendations for the households, we added a pipeline method to the Evaluation Agent which allows to run every agent of our recommender system for every available date iteratively. We will demonstrate its functionality by creating the recommendations for household 3. 

Additionally, we will use a further method of the Evaluation Agent to receive the default configuration for evaluating our recommender system.

In [None]:
# creating the config including the user input
config =  {'data': {'household': deepcopy(household_id)}}
config['user_input'] = {
    'shiftable_devices': deepcopy(shiftable_devices[config['data']['household']]),
    'active_appliances': deepcopy(active_appliances[config['data']['household']]),
    'threshold': deepcopy(thresholds[config['data']['household']])
}

<br>

**Preparing the data**

In [None]:
# calling the evaluation agent
evaluation = Performance_Evaluation_Agent(DATA_PATH, model_type, config)
evaluation.get_default_config('preparation')
evaluation.config

In [None]:
evaluation.pipeline('preparation')


<br>

**Creating all recommendations**


In [None]:
evaluation.get_default_config(['activity', 'usage', 'load'])
evaluation.config

In [None]:
activity_predictions, model = evaluation._pipeline_activity_usage_load("activity")

In [None]:
#this takes the prediction out of nesting and puts it into own row
activity_predictions_test = activity_predictions.explode('Prediction')
activity_predictions_test

In [None]:
# just to create time (other way?)
row = 1
hour = 0
hours = []
while row <=len(activity_predictions_test):
    hours.append(hour)
    hour += 1
    row += 1
    if hour == 24:
        hour = 0

print(hours)

In [None]:
config

In [None]:
activity_predictions_test['time'] = hours
activity_predictions_test

In [None]:
activity_treshold = config['preparation']['activity']['activity']['threshold']
active_predictions = activity_predictions_test[activity_predictions_test.Prediction > activity_treshold]

Get data for prediction with Lime and Shap.

In [None]:
# load household data for Household 1
household = helper.load_household(DATA_PATH, 3)

In [None]:
threshold = 0.01
active_appliances = ['Tumble Dryer', 'Dishwasher', 'Washing Machine','Television', 'Microwave', 'Kettle']
shiftable_devices = ['Tumble Dryer', 'Washing Machine', 'Dishwasher']
#model_types = ['logit', 'knn', 'ada', 'random forest']

In [None]:
#activity params
truncation_params = {
    'features': 'all',
    'factor': 1.5,
    'verbose': 0
}

scale_params = {
    'features': 'all',
    'kind': 'MinMax',
    'verbose': 0
}

aggregate_params = {
    'resample_param': '60T'
}

activity_params = {
    'active_appliances': active_appliances,
    'threshold': threshold
}

time_params = {
    'features': ['hour', 'day_name']
}

activity_lag_params = {
    'features': ['activity'],
    'lags': [24, 48, 72]
}

activity_pipe_params = {
    'truncate': truncation_params,
    'scale': scale_params,
    'aggregate': aggregate_params,
    'activity': activity_params,
    'time': time_params,
    'activity_lag': activity_lag_params
}

#load agent
device_params = {
    'threshold': threshold
}

load_pipe_params = {
    'truncate': truncation_params,
    'scale': scale_params,
    'aggregate': aggregate_params,
    'shiftable_devices': shiftable_devices,
    'device': device_params
}

#usage agent

device = {
    'threshold' : threshold}

aggregate_params24_H = {
    'resample_param': '24H'
}

usage_pipe_params = {
    'truncate': truncation_params,
    'scale': scale_params,
    'activity': activity_params,
    'aggregate_hour': aggregate_params,
    'aggregate_day': aggregate_params24_H,
    'time': time_params,
    'activity_lag': activity_lag_params,
    'shiftable_devices' : shiftable_devices,
    'device': device
}

In [None]:
from agents import Activity_Agent, Usage_Agent, Load_Agent, Price_Agent, Preparation_Agent
#get out X_train etc.
# Load pickle data
#activity_df = pd.read_pickle('../data/processed_pickle/activity_df.pkl')
prep = Preparation_Agent(household)
activity_df = prep.pipeline_activity(household, activity_pipe_params)

In [None]:
activity = Activity_Agent(activity_df)

In [None]:
date = '2016-07-09'
X_train, y_train, X_test, y_test = activity.train_test_split(activity_df, date)
activity_df.shape, X_train.shape, y_train.shape, X_test.shape, y_test.shape

# Evaluation Lime, Shap

In [None]:
predictive_models = [model]
predictive_models
#y_test.head(20)#.iloc[2]

In [None]:
#14752
activity_predictions_test

In [None]:
activity_df.columns

In [None]:
activity_df

NameError: name 'activity_predictions_test' is not defined

In [79]:
### LIME Function
from lime import lime_tabular
import statistics

local = 9 #the instance we want to explain
n_iter = 1
i = 0

data = {'Explainability Model': [],
        'Predictive Model': [],
        'Classifier': [],
        'Run Duration': [],
        'MAEE': [],
        'MSEE': []}
exp_eval_df = pd.DataFrame(data)

for local in range(0, len(activity_predictions)):
    for pred_model in predictive_models:
        classifier = pred_model
        print(classifier)

        print(pred_model)
        print(str(pred_model))

        if "KNeighbors" in str(pred_model):
            predictive_model = "KNN"
        elif "Random" in str(pred_model):
            predictive_model = "Random Forest"
        elif "Ada" in str(pred_model):
            predictive_model = "AdaBoost"
        else:
            predictive_model = "Unknown model"

        print(predictive_model)

        # LIME
        explainability_model = 'LIME'
        start_time = time.time()
        #run once
        explainer = lime_tabular.LimeTabularExplainer(training_data = np.array(X_train),
                                                  mode = "regression",
                                                  feature_names = X_train.columns,
                                                  categorical_features = [0])

        exp = explainer.explain_instance(data_row = X_test.iloc[local], #changed to somewhere where activity= 1
                                    predict_fn = pred_model.predict)

        exp.show_in_notebook(show_table = True)
        end_time = time.time()
        difference_time = end_time - start_time

        #compute MSEE:
        y_expl_i = exp.local_pred
        #print(y_expl_i)
        #SEE = (y_pred_i - y_expl_i)**2 #squared prediction error for this computation (repetition necessary for MSEE)

        rep = 0

        exp_list_abs = []
        exp_list_squ = []
        for rep in range(n_iter): #number of iterations for computing the diffferent lime models

            exp = explainer.explain_instance(data_row = X_test.iloc[local], #changed to somewhere where activity= 1
                                    predict_fn = pred_model.predict)
            exp_list_abs.append(y_pred_i-exp.local_pred)
            exp_list_squ.append((y_pred_i-exp.local_pred)**2)

        exp_np_abs =np.array(exp_list_abs)
        exp_np_squ =np.array(exp_list_squ)
        #exp_list_abs.to_numpy()
        #exp_list_abs.flatten()
        #print(exp_list_abs.flatten())

        MAEE = statistics.mean(exp_np_abs.flatten())
        MSEE = statistics.mean(exp_np_squ.flatten())
        print(MAEE)

        #MSEE = mean(exp_list_squ[0])
        #print(MSEE)

        #exp.as_list()
        exp_eval_df.loc[i+1] = [explainability_model, predictive_model, classifier, difference_time, MAEE, MSEE]

        i=i+1


        # SHAP

        explainability_model = 'SHAP'
        start_time = time.time()

        explainer = shap.KernelExplainer(pred_model.predict_proba, X_train)
        shap_values = explainer.shap_values(X_test.iloc[local,:])
        display(shap.force_plot(explainer.expected_value[1], shap_values[1], X_test.iloc[local,:]))

        end_time = time.time()
        difference_time = end_time - start_time

        #compute MSEE:
        rep = 0

        shap_list_abs = []
        shap_list_squ = []

        for rep in range(n_iter):
            explainer = shap.KernelExplainer(pred_model.predict_proba, X_train)
            shap_values = explainer.shap_values(X_test.iloc[local,:])
            print(shap_values)
            # first array = contribution to class 0
            # second array = contribution to class 1
            contribution_to_class_1 = np.array(shap_values).sum(axis=1)[1] # the red part of the diagram
            print(contribution_to_class_1)
            base_value = explainer.expected_value[1] # the mean prediction
            print(base_value)
            y_expl_i = base_value + contribution_to_class_1
            print(y_expl_i)
            SEE = (y_pred_i[0] - y_expl_i)**2 #squared prediction error for this computation (repetition necessary for MSEE)

            shap_list_abs.append(y_pred_i-y_expl_i)
            shap_list_squ.append((y_pred_i-y_expl_i)**2)

            print(shap_list_abs)

        shap_np_abs =np.array(shap_list_abs)
        shap_np_squ =np.array(shap_list_squ)

        MAEE = statistics.mean(shap_np_abs.flatten())
        MSEE = statistics.mean(shap_np_squ.flatten())
        print(MAEE)

        exp_eval_df.loc[i+1] = [explainability_model, predictive_model, classifier, difference_time, MAEE, MSEE]

        i = i+1

        print(exp_eval_df)

LogisticRegression(random_state=0)
LogisticRegression(random_state=0)
LogisticRegression(random_state=0)
Unknown model


IndexError: single positional indexer is out-of-bounds

In [None]:
exp_eval_df