# Loading Experiment Data

In this notebook, we load the data collected while running the different experiment-wares, and preprocess this data so that it can be analyzed further in dedicated notebooks.

## Imports

We first import the modules required to load the data.
In particular, we import *Metrics-Wallet*, which we use to manipulate our data.

In [None]:
from itertools import product
from metrics.wallet import BasicAnalysis

## Reading the data

The next step is to read the data from the log files produced by our different experiment-wares.
This data is described in the file [`{{ config_file }}.yml`](config/{{ config_file }}.yml), and automatically parsed by *Metrics-Scalpel* to create a `BasicAnalysis` object.

In [None]:
analysis = BasicAnalysis(input_file='config/{{ config_file }}.yml', log_level='WARNING')

The `BasicAnalysis` object instantiated above provides elementary and general methods for preprocessing our data before actually analyzing the results (which requires more specific methods, as can be seen in the dedicated notebooks).

An important thing to do now is to visualize the collected data, to make sure that everything was properly read.
This can be achieved by looking at the data-frame that has been built inside the `BasicAnalysis` object.

In [None]:
analysis.data_frame

**TODO: CHECK THAT EVERYTHING IS INDEED OK BEFORE PREPROCESSING THE DATA!**
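
A few quick sanity checks can complement a visual inspection. The sketch below runs on a small hand-made pandas data-frame: the column names (`experiment_ware`, `input`, `cpu_time`) and the values are assumptions chosen for illustration. The same calls may be applied to `analysis.data_frame`, provided that it is a pandas `DataFrame` with similar columns.

```python
import pandas as pd

# Hypothetical data standing in for analysis.data_frame.
df = pd.DataFrame({
    'experiment_ware': ['solver-a', 'solver-a', 'solver-b', 'solver-b'],
    'input': ['x.opb', 'y.opb', 'x.opb', 'y.opb'],
    'cpu_time': [1.5, None, 2.0, 3.5],
})

# Number of rows read for each experiment-ware (expected: one per input).
rows_per_xp_ware = df.groupby('experiment_ware').size()

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Pairs (experiment-ware, input) that appear more than once.
duplicated_runs = df[df.duplicated(subset=['experiment_ware', 'input'], keep=False)]

print(rows_per_xp_ware)
print(missing_per_column)
print(duplicated_runs)
```

An unexpected row count, a column full of missing values, or duplicated runs usually point to a problem in the configuration file or in the log files rather than in the experiments themselves.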

## More meaningful names for experiment-wares

Currently, the names of the experiment-wares correspond to those extracted by *Metrics-Scalpel*.
While these names are sufficient to distinguish the experiment-wares in the analysis, they are not necessarily meaningful.
It is possible to replace them with more descriptive names, which may even contain LaTeX code if you want pretty names in your papers.
To do so, fill the following dictionary, using the current name of each experiment-ware as key and its new name as value.

In [None]:
name_map = {
    xp_ware: xp_ware for xp_ware in analysis.experiment_wares()
}
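
For instance, the map could be filled as sketched below. The experiment-ware names are hypothetical, and a plain list stands in for the result of `analysis.experiment_wares()` so that the example runs on its own. Any experiment-ware without an explicit renaming keeps its original name.

```python
# Hypothetical names, standing in for analysis.experiment_wares().
extracted_names = ['solver_v1', 'solver_v2_restart', 'baseline']

# Explicit renamings; raw strings keep the LaTeX backslashes intact.
pretty_names = {
    'solver_v1': r'$\mathit{Solver}_{1}$',
    'solver_v2_restart': r'$\mathit{Solver}_{2}^{\mathit{restart}}$',
}

# Experiment-wares without an explicit renaming keep their original name.
name_map = {
    xp_ware: pretty_names.get(xp_ware, xp_ware) for xp_ware in extracted_names
}

print(name_map)
```

Falling back on the original name means that every experiment-ware remains a key of `name_map`, so a later lookup cannot fail with a `KeyError`.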

Based on the map above, we can easily replace the names of the experiment-wares as follows.

In [None]:
analysis = analysis.add_variable(
    new_var='experiment_ware', 
    function=lambda xp: name_map[xp['experiment_ware']]
)

## General data preprocessing

**TODO: ADD HERE ALL THE CODE YOU NEED TO FIX POTENTIAL PROBLEMS WITH THE COLLECTED DATA (E.G., MISSING VALUES, TYPOS, ETC.).**
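
As an illustration of the kind of fixes that may be needed, the sketch below normalizes a misspelled decision and fills in a missing time. It works on a hand-made pandas data-frame: the column names, the values and the time limit are assumptions, and how to apply such fixes to the `BasicAnalysis` object itself depends on the *Metrics-Wallet* API.

```python
import pandas as pd

# Hypothetical raw data with a typo in a decision and a missing time.
df = pd.DataFrame({
    'experiment_ware': ['solver-a', 'solver-b', 'solver-b'],
    'decision': ['OPTIMUM FOUND', 'OPTIMUM  FOUND ', 'UNKNOWN'],
    'cpu_time': [12.3, 45.6, None],
})
timeout = 1200.0  # Assumed time limit, in seconds.

# Collapsing repeated whitespace and trimming fixes the misspelled decision.
df['decision'] = df['decision'].str.replace(r'\s+', ' ', regex=True).str.strip()

# Runs without a recorded time are counted as having reached the time limit.
df['cpu_time'] = df['cpu_time'].fillna(timeout)

print(df)
```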

We can now check that the changes have been taken into account by having a look at the data-frame of the analysis.

In [None]:
analysis.data_frame

## Checking the success and consistency of the results

During our analysis, we will need to know whether a given experiment was successful.
As an example, we provide below the code to check the success of an optimization solver.

In [None]:
def is_success(xp):
    """
    This function checks that a solver either proved the optimality of its best
    bound within the time limit, or proved the input to be unsatisfiable.
    """
    return xp['decision'] == 'OPTIMUM FOUND' or xp['decision'] == 'UNSATISFIABLE'
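
To see how this function behaves, the sketch below applies it to a few hand-made records. Plain dictionaries stand in for the rows of the data-frame, the experiment-ware names are hypothetical, and the decision `'UNKNOWN'` is an assumed value for a solver that stopped without concluding.

```python
def is_success(xp):
    # Same criterion as above: optimality or unsatisfiability was proved.
    return xp['decision'] in ('OPTIMUM FOUND', 'UNSATISFIABLE')

# Hypothetical records, one per experiment.
experiments = [
    {'experiment_ware': 'solver-a', 'decision': 'OPTIMUM FOUND'},
    {'experiment_ware': 'solver-b', 'decision': 'SATISFIABLE'},
    {'experiment_ware': 'solver-c', 'decision': 'UNSATISFIABLE'},
    {'experiment_ware': 'solver-d', 'decision': 'UNKNOWN'},
]

successes = [is_success(xp) for xp in experiments]
print(successes)  # [True, False, True, False]
```

Note that a solver answering `'SATISFIABLE'` is not counted as successful here, since it found a solution without proving its optimality.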

To make sure that our experiments are consistent, we also need to compare the results obtained by the different experiment-wares.
As an example, we provide below the code to check that, when different optimization solvers claim to have found an optimal value, this value is the same for all solvers.

In [None]:
def is_consistent_by_input(df_input):
    """
    This function checks that the results obtained by the different solvers on
    the same input are consistent: no solver contradicts an unsatisfiability
    proof, all proven optimal values are equal, and no solver reports a bound
    better than the proven optimum.
    """
    # Checking the decision of the solvers.
    decisions = df_input['decision'].unique()
    if 'OPTIMUM FOUND' in decisions and 'UNSATISFIABLE' in decisions:
        # A solver has found an optimal solution while another proved unsatisfiability.
        return False
    if 'SATISFIABLE' in decisions and 'UNSATISFIABLE' in decisions:
        # A solver has found a solution while another proved unsatisfiability.
        return False

    # Collecting the values proven optimal (rows without a bound are ignored).
    best_values_for_complete_search = df_input[df_input['success']]['best_bound'].dropna().unique()
    if len(best_values_for_complete_search) == 0:
        # No optimal value has been proven: there is nothing to compare.
        return True

    # Computing the best value found by any solver.
    if df_input['objective'].unique()[0] == 'min':
        best_global_value = df_input['best_bound'].min()
    else:
        best_global_value = df_input['best_bound'].max()

    # Checking that exactly one optimal value exists, and that there is no
    # better value than the optimal one.
    return len(best_values_for_complete_search) == 1 and \
           best_values_for_complete_search[0] == best_global_value
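
When the bounds are floating-point values, comparing them with exact equality may flag harmless rounding differences as inconsistencies. A tolerance-based pairwise comparison is one possible alternative. The helper `all_close` and its default tolerances below are assumptions made for this sketch, and are not part of *Metrics-Wallet*.

```python
from itertools import combinations
from math import isclose

def all_close(values, rel_tol=1e-6, abs_tol=1e-9):
    """
    Hypothetical helper: checks that every pair of values differs by a
    small enough amount for these values to be considered the same.
    """
    return all(
        isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in combinations(values, 2)
    )

print(all_close([42.0, 42.0000001]))  # True
print(all_close([42.0, 43.0]))        # False
print(all_close([]))                  # True
```

Such a helper could replace the test on the number of distinct optimal values, at the cost of having to choose tolerances that suit the scale of the objective function.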

We can now use the functions above to check the consistency of the different experiments in the analysis.

In [None]:
analysis.check_success(is_success)
analysis.check_input_consistency(is_consistent_by_input)

**TODO: CHECK THAT NO WARNING IS RAISED.**

**TODO: IF WARNINGS WERE RAISED, LOOK AT THE FOLLOWING TABLE TO UNDERSTAND WHY.**

**TODO: IN BOTH CASES, ADD A COMMENT HERE TO CONFIRM THAT THERE IS NO PROBLEM OR WHY THERE ARE PROBLEMS.**

In [None]:
analysis.error_table()

## Summary and export of the analysis

We can now summarize the analysis in the following table.

In [None]:
analysis.description_table()

Finally, we export the analysis, both to share the data for the sake of reproducibility, and to reuse it in other notebooks dedicated to more specific analyses.

In [None]:
analysis.export('.cache')