# Result figures for *Enhanced spatio-temporal electric load forecasts with less data using active deep learning*

---

## Overview
1. Test hyper parameters
2. Numerical results
3. Training and validation losses against unqueried candidates
4. Validation losses against all candidates
5. Heuristics subsampling
6. Heuristics points per cluster
7. Heuristics query by coordinate
8. Query sequence importance
9. Manuscript figure: Results summary
10. Manuscript figure: Heuristics summary

In this notebook session, we summarize and visualize the results of our experiments. First, we test whether all numerical results were computed with the same hyper parameters. Next, we plot our results and exemplar predictions so that we can inspect and evaluate them visually. Lastly, we create figures for our manuscript from the experiments that best illustrate our findings. We start by importing the packages that we use throughout this notebook session.

In [5]:
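# plotting packages used by the figure cells further below
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

import vis_results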

HYPER_VIS = vis_results.HyperParameterVisualizing()

## 1. Test hyper parameters

Here, we import the results and the hyper parameters that were used in each of our experiments. We check whether all imported results were computed with exactly the same hyper parameters for the hypothesis test.

In [6]:
vis_results.test_hyper(HYPER_VIS)
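
The internals of `vis_results.test_hyper` are not shown in this notebook. As a rough illustration of what such a consistency check can look like, the following sketch compares one hyper parameter dictionary per experiment and reports every entry that differs. The function name `find_hyper_mismatches`, the dictionary layout, and the example keys `epochs` and `batch_size` are assumptions made for this example only.

```python
def find_hyper_mismatches(hyper_by_experiment):
    """Return {parameter: {experiment: value}} for parameters that differ.

    hyper_by_experiment is a hypothetical mapping from an experiment name
    to the dictionary of hyper parameters that was used for it.
    """
    experiments = sorted(hyper_by_experiment)

    # collect every hyper parameter name seen in any experiment
    all_keys = set()
    for name in experiments:
        all_keys.update(hyper_by_experiment[name])

    mismatches = {}
    for key in sorted(all_keys):
        # a parameter missing from an experiment counts as a mismatch
        values = {
            name: hyper_by_experiment[name].get(key, '<missing>')
            for name in experiments
        }
        if len({repr(value) for value in values.values()}) > 1:
            mismatches[key] = values

    return mismatches


# made-up hyper parameters, for illustration only
runs = {
    'delta1_valup1': {'epochs': 30, 'batch_size': 16},
    'delta0_valup0': {'epochs': 30, 'batch_size': 32},
}
print(find_hyper_mismatches(runs))
```

An empty result means that all experiments share the same hyper parameters; any other result names the offending entries per experiment.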

## 2. Numerical results

In this section, we report, for each of our experiments, the percentage of the data budget that is used, the percentage of novel sensors among the queried candidate data points, and the testing loss.

In [7]:
vis_results.show_numerical_results(HYPER_VIS)

profiles_100
delta1_valup1


| Result column | comp_fac | budget_usage | sensor_usage | test_loss | RF_loss | accuracy |
|---|---|---|---|---|---|---|
| spatio-temporal None PL train | 1.0 | 100 | 100 | 1.317648 | 1.624579 | 19 |
| spatio-temporal None PL val | 1.0 | 100 | 100 | 1.317648 | 1.624579 | 19 |
| spatio-temporal X_st rnd d_c train | 4.7 | 100 | 100 | 0.425527 | 1.624579 | 74 |
| spatio-temporal X_st rnd d_c val | 4.7 | 100 | 100 | 0.425527 | 1.624579 | 74 |
| spatio-temporal X_st min d_c train | 11.6 | 100 | 100 | 0.668999 | 1.624579 | 59 |
| spatio-temporal X_st min d_c val | 11.6 | 100 | 100 | 0.668999 | 1.624579 | 59 |
| spatio-temporal X_st max d_c train | 2.4 | 100 | 100 | 0.397463 | 1.624579 | 76 |
| spatio-temporal X_st max d_c val | 2.4 | 100 | 100 | 0.397463 | 1.624579 | 76 |
| spatio-temporal X_st avg d_c train | 2.1 | 100 | 100 | 0.381471 | 1.624579 | 77 |
| spatio-temporal X_st avg d_c val | 2.1 | 100 | 100 | 0.381471 | 1.624579 | 77 |


profiles_100
delta1_valup0


| Result column | comp_fac | budget_usage | sensor_usage | test_loss | RF_loss | accuracy |
|---|---|---|---|---|---|---|
| spatio-temporal None PL train | 1.0 | 100 | 100 | 1.752188 | 2.619218 | 33 |
| spatio-temporal None PL val | 1.0 | 100 | 100 | 1.752188 | 2.619218 | 33 |
| spatio-temporal X_st rnd d_c train | 5.1 | 100 | 100 | 0.689054 | 2.619218 | 74 |
| spatio-temporal X_st rnd d_c val | 5.1 | 100 | 100 | 0.689054 | 2.619218 | 74 |
| spatio-temporal X_st min d_c train | 15.2 | 100 | 100 | 0.857415 | 2.619218 | 67 |
| spatio-temporal X_st min d_c val | 15.2 | 100 | 100 | 0.857415 | 2.619218 | 67 |
| spatio-temporal X_st max d_c train | 7.4 | 100 | 100 | 0.811436 | 2.619218 | 69 |
| spatio-temporal X_st max d_c val | 7.4 | 100 | 100 | 0.811436 | 2.619218 | 69 |
| spatio-temporal X_st avg d_c train | 4.6 | 100 | 100 | 1.177447 | 2.619218 | 55 |
| spatio-temporal X_st avg d_c val | 4.6 | 100 | 100 | 1.177447 | 2.619218 | 55 |


profiles_100
delta0_valup0


| Result column | comp_fac | budget_usage | sensor_usage | test_loss | RF_loss | accuracy |
|---|---|---|---|---|---|---|
| spatio-temporal None PL train | 1.0 | 81 | 100 | 2.075299 | 1.387907 | 0 |
| spatio-temporal None PL val | 1.0 | 81 | 100 | 2.075299 | 1.387907 | 0 |
| spatio-temporal X_st rnd d_c train | 2.9 | 55 | 100 | 1.460858 | 1.387907 | 0 |
| spatio-temporal X_st rnd d_c val | 2.9 | 55 | 100 | 1.460858 | 1.387907 | 0 |
| spatio-temporal X_st min d_c train | 2.8 | 55 | 100 | 1.052247 | 1.387907 | 24 |
| spatio-temporal X_st min d_c val | 2.8 | 55 | 100 | 1.052247 | 1.387907 | 24 |
| spatio-temporal X_st max d_c train | 2.9 | 36 | 88 | 0.78666 | 1.387907 | 43 |
| spatio-temporal X_st max d_c val | 2.9 | 36 | 88 | 0.78666 | 1.387907 | 43 |
| spatio-temporal X_st avg d_c train | 2.4 | 33 | 85 | 0.951458 | 1.387907 | 31 |
| spatio-temporal X_st avg d_c val | 2.4 | 33 | 85 | 0.951458 | 1.387907 | 31 |


profiles_100
delta0_valup1


| Result column | comp_fac | budget_usage | sensor_usage | test_loss | RF_loss | accuracy |
|---|---|---|---|---|---|---|
| spatio-temporal None PL train | 1.0 | 81 | 100 | 2.106433 | 1.569501 | 0 |
| spatio-temporal None PL val | 1.0 | 81 | 100 | 2.106433 | 1.569501 | 0 |
| spatio-temporal X_st rnd d_c train | 4.5 | 63 | 100 | 1.194894 | 1.569501 | 24 |
| spatio-temporal X_st rnd d_c val | 4.5 | 63 | 100 | 1.194894 | 1.569501 | 24 |
| spatio-temporal X_st min d_c train | 3.9 | 63 | 100 | 1.309338 | 1.569501 | 17 |
| spatio-temporal X_st min d_c val | 3.9 | 63 | 100 | 1.309338 | 1.569501 | 17 |
| spatio-temporal X_st max d_c train | 3.9 | 39 | 98 | 1.166607 | 1.569501 | 26 |
| spatio-temporal X_st max d_c val | 3.9 | 39 | 98 | 1.166607 | 1.569501 | 26 |
| spatio-temporal X_st avg d_c train | 3.8 | 37 | 100 | 1.16317 | 1.569501 | 26 |
| spatio-temporal X_st avg d_c val | 3.8 | 37 | 100 | 1.16317 | 1.569501 | 26 |
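
The `accuracy` values above appear to follow the expression that the plotting cells of this notebook use for their legends, `1 - min(1, loss / RF_loss)`, expressed in percent. The following minimal sketch reproduces that expression; the function name `accuracy_vs_baseline` is introduced here for illustration and is not part of `vis_results`.

```python
def accuracy_vs_baseline(test_loss, baseline_loss):
    """Relative improvement over a baseline loss, clipped to [0, 1].

    Mirrors the expression 1 - min(1, loss / RF_loss) from the plotting
    cells: 0 means no better than the baseline, 1 means zero loss.
    """
    if baseline_loss <= 0:
        raise ValueError('baseline_loss must be positive')
    return 1 - min(1, test_loss / baseline_loss)


# values taken from the tables above
print(round(100 * accuracy_vs_baseline(1.317648, 1.624579)))  # PL, delta1_valup1
print(round(100 * accuracy_vs_baseline(0.425527, 1.624579)))  # rnd d_c, delta1_valup1
print(round(100 * accuracy_vs_baseline(2.075299, 1.387907)))  # PL, delta0_valup0
```

Because of the clipping, every model whose loss exceeds the random forest baseline is reported with an accuracy of 0, regardless of how much worse it is.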


## 3. Training and validation losses against unqueried candidates

For each prediction task, each query variable and each query variant, we create figures that compare the training and validation losses over the course of querying new candidate data points in each iteration of our proposed algorithm.
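
The result columns are addressed by names composed of the prediction task, the query variable, the query variant, and the data split, for example `spatio-temporal X_st rnd d_c val`. The passive learning benchmark uses `None` as variable and `PL` as variant. As a small sketch of this naming scheme, the helper below builds such names in one place; `loss_column` is a hypothetical name that does not exist in `vis_results`.

```python
def loss_column(pred_type, split, AL_variable='None', AL_variant='PL'):
    """Build a result column name such as 'spatio-temporal X_st rnd d_c val'.

    The defaults produce the passive learning (PL) benchmark column.
    """
    if split not in ('train', 'val'):
        raise ValueError("split must be 'train' or 'val'")
    return ' '.join([pred_type, AL_variable, AL_variant, split])


print(loss_column('spatio-temporal', 'val'))
print(loss_column('spatio-temporal', 'train', 'X_st', 'rnd d_c'))
```

Building the names through one function avoids repeating the string concatenation for every loop level and keeps the train and val names in sync.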

In [6]:
%%capture
# prevents figures from being printed out; must be at the beginning of the cell

### Define a series of manual corrections for figures ###

class ManualFigureCorrections:
    
    """ Bundles information for manually correcting figure axes """
    
    
    def __init__(
        self,
        pred_type, 
        AL_variable_list, 
        parameter, 
        column, 
        y_lim_bottom, 
        y_lim_top
    ):
        
        """ Takes required arguments for correcting axes. """
        
        self.pred_type = pred_type
        self.AL_variable_list = AL_variable_list
        self.parameter = parameter
        self.column = column
        self.y_lim_bottom = y_lim_bottom
        self.y_lim_top = y_lim_top
        
correction_list = []

# set the fontsize for figures
mpl.rcParams.update({'font.size': FONTSIZE})

# create a counter over the list of numeric results
result_index_counter = 0

# iterate over all considered prediction types
for index_pred, pred_type in enumerate(PRED_TYPE_LIST):
    
    # iterate over all considered parameter constellations
    for index_param, parameter in enumerate(PARAMETER_LIST):

        delta = int(parameter[5])
        valup = int(parameter[12])
        
        # get results df corresponding to currently iterated parameter and pred_type
        result = results_list[result_index_counter]
        
        # increment result index counter
        result_index_counter += 1
        
        # create the column names for random (passive learning) losses
        col_name_train = (
            pred_type 
            + ' None ' 
            + 'PL ' 
            + 'train'
        )
        col_name_val = (
            pred_type 
            + ' None ' 
            + 'PL ' 
            + 'val'
        )
        
        
        # get random results
        PL_train = result[col_name_train][9:].dropna().values
        t_iter = int(result[col_name_train][1])

        PL_val = result[col_name_val][9:].dropna().values
        budget_usage = result[col_name_val][2]
        sensor_usage = result[col_name_val][3]
        RF_loss = result[col_name_val][5]
        PL_loss = result[col_name_val][6]
        PL_accuracy = 1 - min(1, PL_loss / RF_loss)
        
        # create the figure legends for random losses
        legend_RF = 'RF baseline'
        legend_PL_train = '{} {}s'.format(
            'PL random:', 
            t_iter
        )
        legend_PL_val = '{}  {:.0%} data  {:.0%} sensors  {:.0%} accuracy'.format(
            'PL random:', 
            budget_usage, 
            sensor_usage,
            PL_accuracy
        )
        
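        # squeeze=False keeps ax two-dimensional, so that ax[index_var, 0]
        # also works if AL_VARIABLES contains a single entry
        fig, ax = plt.subplots(
            len(AL_VARIABLES), 
            2, 
            figsize=(
                20, 
                len(AL_VARIABLES) * WIDTH_FACTOR
            ),
            squeeze=False
        )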
        
        # set figure titles
        if valup == 0:

            string_valup = 'initial'

        else:

            string_valup = 'unqueried'
        
        # counter to increment subtitle of figures
        title_counter = 0
        
        # iterate over all AL variables
        for index_var, AL_variable in enumerate(AL_VARIABLES):
            
            # plot passive learning losses
            ax[index_var, 0].plot(
                PL_train, 
                color='b', 
                linestyle='--', 
                label=legend_PL_train
            )
            ax[index_var, 1].plot(
                PL_val, 
                color='b', 
                linestyle='--', 
                label=legend_PL_val
            )
            
            if index_var == 0:
                # set column titles
                cols = [
                    'Training losses \n {}'.format(AL_variable), 
                    'Validation losses \n {}'.format(AL_variable)
                ]
                for axes, col in zip(ax[0], cols):
                    axes.set_title(col)
            else:
                
                # set title
                ax[index_var, 0].set_title(AL_variable)
                ax[index_var, 1].set_title(AL_variable)
                       
            # iterate over all AL variants
            for index_method, AL_variant in enumerate(AL_VARIANTS):
                
                # create the column names for iterated training and validation losses
                col_name_train = (
                    pred_type 
                    + ' ' 
                    + AL_variable 
                    + ' ' 
                    + AL_variant 
                    + ' train'
                )
                col_name_val = (
                    pred_type 
                    + ' ' 
                    + AL_variable 
                    + ' ' 
                    + AL_variant 
                    + ' val'
                )
                
                # get training loss history and training time of the AL variant
                train_history = result[col_name_train][9:].dropna().values
                t_iter = int(result[col_name_train][1])

                # get validation loss history and summary values of the AL variant
                val_history = result[col_name_val][9:].dropna().values
                budget = result[col_name_val][2]
                sensor = result[col_name_val][3]
                RF_loss = result[col_name_val][5]
                AL_loss = result[col_name_val][6]
                AL_accuracy = 1 - min(1, AL_loss / RF_loss)
                
                # create the legends
                legend_train = 'AL {}: {}s'.format(
                    AL_variant, 
                    t_iter
                )
                legend_val = 'AL {}:  {:.0%} data  {:.0%} sensors  {:.0%} accuracy'.format(
                    AL_variant, 
                    budget, 
                    sensor,
                    AL_accuracy
                )
                
                # plot iterated training and validation losses
                ax[index_var, 0].plot(
                    train_history, 
                    label=legend_train
                )
                ax[index_var, 1].plot(
                    val_history, 
                    label=legend_val
                )

            # set legend
            ax[index_var, 0].legend(
                loc='best', 
                frameon=False,
                fontsize=FONTSIZE-2
            )
            ax[index_var, 1].legend(
                loc='best', 
                frameon=False,
                fontsize=FONTSIZE-2
            )

            # set y-axis labels
            ax[index_var, 0].set_ylabel(
                'L2 loss [kW²]', 
                fontsize=FONTSIZE+3
            )
            
            
        # set x-axis
        ax[index_var, 0].set_xlabel(
            'epoch', 
            fontsize=FONTSIZE+3
        )
        ax[index_var, 1].set_xlabel(
            'epoch', 
            fontsize=FONTSIZE+3
        )

        # create saving paths 
        saving_path = (
            path_to_saving_lossesvsunqueried 
            + pred_type 
            + ' ' 
            + parameter 
            + '.pdf'
        )

        # set layout tight
        fig.tight_layout()

        # save figures
        fig.savefig(saving_path)
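
The loops in this notebook read δ and valup from fixed character positions of parameter strings such as `delta1_valup1` (`parameter[5]` and `parameter[12]`). This works for single-digit values but would silently break if the naming changed. A slightly more defensive alternative, sketched here under the assumption that all parameter strings follow the pattern `delta<int>_valup<int>`, parses them with a regular expression. The name `parse_parameter` is hypothetical.

```python
import re

# assumed naming pattern of the entries in PARAMETER_LIST
_PARAMETER_PATTERN = re.compile(r'delta(\d+)_valup(\d+)')


def parse_parameter(parameter):
    """Return (delta, valup) parsed from a string like 'delta1_valup0'."""
    match = _PARAMETER_PATTERN.fullmatch(parameter)
    if match is None:
        raise ValueError('unexpected parameter string: ' + repr(parameter))
    return int(match.group(1)), int(match.group(2))


delta, valup = parse_parameter('delta1_valup0')
print(delta, valup)
```

Unlike positional indexing, this raises an error for a malformed string instead of returning the wrong digit.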

## 4. Validation losses against all candidates

Here, we compare validation losses, measured against the initial candidate data pool, for two cases: removing candidate data points once they are queried, and keeping them. The purpose is to see how each variant of our proposed algorithm deals with biases when the initial prediction model is extended with newly queried data points. To be concise, we compare only validation losses and omit training losses.

In [7]:
%%capture 
# prevents figures from being printed out; must be at the beginning of the cell

### Define a set of manual corrections for figures ###

# set the fontsize for figures
mpl.rcParams.update({'font.size': FONTSIZE})

# create a counter over the list of numeric results
result_index_counter = 0
    
# iterate over all considered prediction types
for index_pred, pred_type in enumerate(PRED_TYPE_LIST):
    
    # create figure
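    # squeeze=False keeps ax two-dimensional, so that ax[index_var, 0]
    # also works if AL_VARIABLES contains a single entry
    fig, ax = plt.subplots(
        len(AL_VARIABLES), 
        2, 
        figsize=(
            20, 
            len(AL_VARIABLES) * WIDTH_FACTOR
        ),
        squeeze=False
    )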

    # iterate over all considered parameter constellations
    for index_param, parameter in enumerate(PARAMETER_LIST):
    
        delta = int(parameter[5])
        valup = int(parameter[12])

        # get results df corresponding to currently iterated parameter and pred_type
        result = results_list[result_index_counter]
        
        # increment result index counter
        result_index_counter += 1
        
        # skip results with valup == 1; only losses against the initial pool are plotted
        if valup == 1:
            continue
        if delta == 1:
            plot_column = 0
        else:
            plot_column = 1
            
        # create the column name for PL validation loss
        col_name_val = (
            pred_type 
            + ' None ' 
            + 'PL ' 
            + 'val'
        )

        PL_val = result[col_name_val][9:].dropna().values
        budget_usage = result[col_name_val][2]
        sensor_usage = result[col_name_val][3]
        RF_loss = result[col_name_val][5]
        PL_loss = result[col_name_val][6]
        PL_accuracy = 1 - min(1, PL_loss / RF_loss)
        
        # create the figure legends for random losses
        legend_RF = 'RF baseline'
        legend_PL_val = 'PL random {:.0%} data  {:.0%} sensors  {:.0%} accuracy'.format(
             budget_usage, 
             sensor_usage,
             PL_accuracy
        )
        
        title_counter = 0
        
        # iterate over all sort variables
        for index_var, AL_variable in enumerate(AL_VARIABLES):

            # plot passive learning validation losses
            ax[index_var, plot_column].plot(
                PL_val, 
                color='b', 
                linestyle='--', 
                label=legend_PL_val
            )
            
            if index_var == 0:
                # set column titles
                cols = [
                    'Validation losses with δ=1\n {}'.format(AL_variable), 
                    'Validation losses with δ=0\n {}'.format(AL_variable)
                ]
                for axes, col in zip(ax[0], cols):
                    axes.set_title(col)
            else:
                
                # set title
                ax[index_var, 0].set_title(AL_variable)
                ax[index_var, 1].set_title(AL_variable)

            # iterate over all methods of currently iterated sort variable
            for index_method, method in enumerate(AL_VARIANTS):

                # create the column name for iterated validation loss
                col_name_val = (
                    pred_type 
                    + ' ' 
                    + AL_variable 
                    + ' ' 
                    + method 
                    + ' val'
                )
                
                val_history = result[col_name_val][9:].dropna().values
                budget = result[col_name_val][2]
                sensor = result[col_name_val][3]
                RF_loss = result[col_name_val][5]
                AL_loss = result[col_name_val][6]
                AL_accuracy = 1 - min(1, AL_loss / RF_loss)
                
                # create the legends
                legend_val = 'AL {}  {:.0%} data  {:.0%} sensors  {:.0%} accuracy'.format(
                    method, 
                    budget, 
                    sensor, 
                    AL_accuracy
                )
          
                # plot iterated validation losses
                ax[index_var, plot_column].plot(
                    val_history, 
                    label=legend_val
                )
                
                # set legends
                ax[index_var, plot_column].legend(
                    loc='best', 
                    frameon=False,
                    fontsize=FONTSIZE-2
                )

                # set y-axis labels
                ax[index_var, plot_column].set_ylabel(
                    'L2 loss [kW²]', 
                    fontsize=FONTSIZE+3
                )
                

        # set x-axis labels
        ax[index_var, plot_column].set_xlabel(
            'epoch', 
            fontsize=FONTSIZE+3
        )

    # set layout tight
    fig.tight_layout()

    # create saving paths 
    saving_path = (
        path_to_saving_lossesvsall 
        + pred_type 
        + '.pdf'
    )

    # save figures
    fig.savefig(saving_path)
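
Each result column that the plotting cells read mixes summary values in its first rows with the loss history from row 9 onward, padded with missing values. The following sketch separates the two parts for a plain Python list. The row positions are taken from the indexing used in the cells above; the key names, the function name `split_result_column`, and the example numbers are assumptions made for illustration.

```python
import math

# row positions as indexed by the plotting cells of this notebook
ROW_T_ITER = 1
ROW_BUDGET = 2
ROW_SENSOR = 3
ROW_RF_LOSS = 5
ROW_LOSS = 6
FIRST_HISTORY_ROW = 9


def split_result_column(column):
    """Split one result column into summary values and its loss history."""
    # drop NaN padding, similar to .dropna() on a pandas Series
    history = [
        value for value in column[FIRST_HISTORY_ROW:]
        if not (isinstance(value, float) and math.isnan(value))
    ]
    return {
        't_iter': int(column[ROW_T_ITER]),
        'budget_usage': column[ROW_BUDGET],
        'sensor_usage': column[ROW_SENSOR],
        'RF_loss': column[ROW_RF_LOSS],
        'loss': column[ROW_LOSS],
        'history': history,
    }


# made-up column, for illustration only
nan = float('nan')
example = [nan, 120.0, 0.36, 0.88, nan, 1.5, 0.75, nan, nan, 0.9, 0.8, nan]
summary = split_result_column(example)
print(summary['t_iter'], summary['history'])
```

Naming the row positions once makes it easier to notice if the layout of the result files changes.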

## 5. Heuristics subsampling


## 6. Heuristics points per cluster


## 7. Heuristics query by coordinate

## 8. Query sequence importance

Here, we compare the training and validation losses of our active learning models against the losses obtained when learning from the same data in a randomized sequence.

In [8]:
%%capture 
# prevents figures from being printed out; must be at the beginning of the cell

# set the fontsize for figures
mpl.rcParams.update({'font.size': FONTSIZE})

# create list of custom lines for custom legend
custom_lines = [
    Line2D([0], [0], color='b', linestyle="--"),
    Line2D([0], [0], color='b')
]

# create color list for plots of same AL variant to have same color
color_list = [
    '#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', 
    '#8c564b',  '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'
]

# create a counter over the list of numeric results
result_index_counter = 0

# iterate over all considered prediction types
for index_pred, pred_type in enumerate(PRED_TYPE_LIST):
    
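    # squeeze=False keeps the axes arrays two-dimensional, so that
    # ax[index_var, 0] also works if AL_VARIABLES contains a single entry
    fig_valup0, ax_valup0 = plt.subplots(
        len(AL_VARIABLES), 
        2, 
        figsize=(
            20, 
            len(AL_VARIABLES) * WIDTH_FACTOR
        ),
        squeeze=False
    )
    
    fig_valup1, ax_valup1 = plt.subplots(
        len(AL_VARIABLES), 
        2, 
        figsize=(
            20, 
            len(AL_VARIABLES) * WIDTH_FACTOR
        ),
        squeeze=False
    )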
    
    # iterate over all considered parameter constellations
    for index_param, parameter in enumerate(PARAMETER_LIST):

        delta = int(parameter[5])
        valup = int(parameter[12])
        
        # get results df corresponding to currently iterated parameter and pred_type
        AL_result = results_list[
            result_index_counter
        ]
        seqimportance_result = seqimportance_list[
            result_index_counter
        ]
        
        # increment result index counter
        result_index_counter += 1
        
        # set wanted plot column
        if delta == 1:
            plot_column = 0
        else:
            plot_column = 1
        
        if valup == 0:
            fig = fig_valup0
            ax = ax_valup0
        else:
            fig = fig_valup1
            ax = ax_valup1
        
        title_counter = 0
        # iterate over all AL variables
        for index_var, AL_variable in enumerate(AL_VARIABLES):
      
            if index_var == 0:
                # set column titles
                cols = [
                    'Validation losses with δ=1\n {}'.format(AL_variable), 
                    'Validation losses with δ=0\n {}'.format(AL_variable)
                ]
                for axes, col in zip(ax[0], cols):
                    axes.set_title(col)
            else:
                
                # set title
                ax[index_var, 0].set_title(AL_variable)
                ax[index_var, 1].set_title(AL_variable)

            
            # iterate over all AL variants
            for index_method, AL_variant in enumerate(AL_VARIANTS):
                
                # create the column name for iterated validation loss
                col_name_val = (
                    pred_type 
                    + ' ' 
                    + AL_variable 
                    + ' ' 
                    + AL_variant 
                    + ' val'
                )
                
                # get validation losses for AL
                AL_val_history = (
                    AL_result[col_name_val][9:].dropna().values
                )
                
                # get validation losses with randomized sequence tests
                seqimportance_val_history = (
                    seqimportance_result[col_name_val][1:].dropna().values
                )
                
                # plot iterated losses
                ax[index_var, plot_column].plot(
                    AL_val_history, 
                    color=color_list[index_method]
                )
                ax[index_var, plot_column].plot(
                    seqimportance_val_history, 
                    color=color_list[index_method], 
                    linestyle="--"
                )
                
                # set y-axis labels
                ax[index_var, 0].set_ylabel(
                    'L2 loss [kW²]', 
                    fontsize=FONTSIZE+3
                )
                
                
            # set legend
            ax[index_var, 0].legend(
                custom_lines, 
                ['random sequence', 'original sequence'], 
                loc="best", 
                frameon=False
            )
            ax[index_var, 1].legend(
                custom_lines, 
                ['random sequence', 'original sequence'], 
                loc="best", 
                frameon=False
            )
                
        # set x-axis
        ax[index_var, 0].set_xlabel(
            'epoch', 
            fontsize=FONTSIZE+3
        )
        ax[index_var, 1].set_xlabel(
            'epoch', 
            fontsize=FONTSIZE+3
        )

    # create saving paths 
    saving_path_valup0 = (
        path_to_saving_seqimportance 
        + pred_type
        + ' valup0.pdf'
    )

    saving_path_valup1 = (
        path_to_saving_seqimportance 
        + pred_type 
        + ' valup1.pdf'
    )

    # set layout tight
    fig_valup0.tight_layout()
    fig_valup1.tight_layout()

    # save figures
    fig_valup0.savefig(saving_path_valup0)
    fig_valup1.savefig(saving_path_valup1)
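
How the randomized sequences were generated is not shown in this notebook. As a minimal sketch of the idea, assuming that the queried data is available as an ordered list of batches, the order can be shuffled with a fixed seed while the content of the batches stays unchanged. The name `randomize_query_sequence` and the batch representation are hypothetical.

```python
import random


def randomize_query_sequence(query_batches, seed=0):
    """Return the same query batches in a reproducibly shuffled order.

    The input list is left untouched; only the order of the batches
    changes, not the data points inside them.
    """
    shuffled = list(query_batches)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# indices of data points queried in four consecutive iterations (made up)
original_sequence = [[0, 1], [2, 3], [4, 5], [6, 7]]
random_sequence = randomize_query_sequence(original_sequence, seed=42)
print(random_sequence)
```

Training on `random_sequence` instead of `original_sequence` uses exactly the same data, so any difference in the losses can be attributed to the order in which the data is presented.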

## 9. Manuscript figure: Results summary


## 10. Manuscript figure: Heuristics summary