# Bootstrap multiple comparisons tutorial

This notebook provides a step-by-step guide to using BootComp to perform multiple comparisons of simulation output data.

## Preamble

To help with management and housekeeping, BootComp is split over a number of Python modules.  We need to import these into the notebook before we can run any analysis.

1. Bootstrap = core bootstrap routines for resampling and calculating test statistics.
2. BootIO = file input and output functions, as well as functions for displaying results on screen.
3. BootChartExtensions = Matplotlib functions for displaying comparisons using charts.
4. MCC = multiple comparison utility functions, e.g. Bonferroni adjustment.
5. ConvFuncs = utility functions for converting between data types.

In [None]:
import Bootstrap as bs
import BootIO as io
#import BootChartExtensions as ch
import MCC as mcc
import ConvFuncs as cf


We are also going to use pandas, a Python data science library.

In [None]:
import pandas as pd

## 1. Parameters and setup

In [None]:
N_BOOTS = 2000
INPUT_DATA = "data/real_scenarios.csv"

Scenarios need to be in a .csv file.  Each scenario needs to have the same number of replications/batch means.

In [None]:
# Note: the Bootstrap routines require a list of lists (one inner list per scenario)
scenario_data = cf.list_of_lists(io.load_scenarios(INPUT_DATA))
N_SCENARIOS = len(scenario_data)
print("Loaded data. {0} scenarios".format(N_SCENARIOS))
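
The exact layout expected by io.load_scenarios is not shown in this notebook. As a rough sketch only, assuming one column per scenario, one row per replication and a header row, a list of lists could be built with the standard library as below. The names EXAMPLE_CSV and load_columns are hypothetical and are not part of BootComp.

```python
import csv
from io import StringIO

# Hypothetical CSV content: one column per scenario, one row per replication.
EXAMPLE_CSV = """s1,s2,s3
10.2,11.5,9.8
9.9,12.1,10.4
10.7,11.8,10.1
"""

def load_columns(handle):
    """Read a CSV with a header row and return (header, list of column lists)."""
    rows = list(csv.reader(handle))
    header, body = rows[0], rows[1:]
    # Every row must supply a value for every scenario (equal replications).
    if any(len(row) != len(header) for row in body):
        raise ValueError("each scenario needs the same number of replications")
    # zip(*body) transposes rows into columns, giving one list per scenario.
    columns = [[float(x) for x in col] for col in zip(*body)]
    return header, columns

header, example_scenarios = load_columns(StringIO(EXAMPLE_CSV))
print("Loaded data. {0} scenarios".format(len(example_scenarios)))
```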

Before we run the bootstrap we need to set up a BootstrapArguments object.  The arguments class is a container class that encapsulates all arguments for the bootstrap.  Some of these arguments are simple values, e.g. the number of bootstrap resamples, and others are functions, e.g. the test statistic we will calculate on each bootstrap resample.

The ones we are going to use in this example are:

1. nboots = the number of bootstrap resamples to run.
2. nscenarios = the number of scenarios to compare.
3. ncomparisons = the number of simultaneous comparisons.
4. confidence = the confidence level used in any 100(1 - alpha)% confidence interval we construct.
5. point_estimate_func = a function that takes a set of bootstrap samples as a parameter and calculates a point estimate, e.g. the mean or standard deviation. Signature: func(data, args).
6. test_statistic_function = a function that calculates a test statistic for comparing two scenarios (e.g. the difference). Signature: func(first_scenario_data, second_scenario_data).
7. comp_function = how the test statistic is analysed: percentile CIs, the proportion of Sj > Si, or graphically. Specify a function with signature func(data, args).
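
Items 3 and 4 are linked when confidence intervals are adjusted for multiple comparisons. The sketch below shows the standard Bonferroni arithmetic for an all-pairwise comparison; the function names are hypothetical and this is not necessarily how the MCC module implements it.

```python
def n_pairwise_comparisons(n_scenarios):
    """Number of distinct pairs among n_scenarios: k(k - 1) / 2."""
    return n_scenarios * (n_scenarios - 1) // 2

def bonferroni_confidence(confidence, n_comparisons):
    """Per-comparison confidence level that keeps the overall level at 'confidence'."""
    alpha = 1.0 - confidence
    # Bonferroni: split the overall alpha equally across all comparisons.
    return 1.0 - alpha / n_comparisons

k = n_pairwise_comparisons(5)              # 5 scenarios give 10 pairs
adjusted = bonferroni_confidence(0.95, k)  # approximately 0.995
print(k, round(adjusted, 4))
```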

For this example the test statistic is the mean difference and the comparison is the proportion of resamples where Sj > Si.

In [None]:
args =  bs.BootstrapArguments()

args.nboots = N_BOOTS
args.nscenarios = N_SCENARIOS

args.point_estimate_func = bs.bootstrap_mean
args.test_statistic_function = bs.boot_mean_diff
args.comp_function = bs.proportion_x2_lessthan_x1
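
To make the container idea concrete, here is a minimal sketch of a class that holds both plain values and functions. ToyArguments and toy_mean_diff are hypothetical stand-ins written for illustration; they are not the real bs.BootstrapArguments or bs.boot_mean_diff, whose internals are not shown in this notebook.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional

@dataclass
class ToyArguments:
    """Hypothetical stand-in for a bootstrap arguments container."""
    nboots: int = 1000
    nscenarios: int = 0
    point_estimate_func: Optional[Callable] = None
    test_statistic_function: Optional[Callable] = None
    comp_function: Optional[Callable] = None

def toy_mean_diff(first, second):
    """Element-wise difference between two lists of resampled means."""
    return [a - b for a, b in zip(first, second)]

toy_args = ToyArguments(nboots=3, nscenarios=2)
toy_args.point_estimate_func = mean               # functions are stored like any value
toy_args.test_statistic_function = toy_mean_diff

diffs = toy_args.test_statistic_function([10.0, 11.0, 12.0], [9.0, 11.5, 10.0])
print(diffs)  # [1.0, -0.5, 2.0]
```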


## 2. Analysis procedures

### 2.1 Resampling

To generate a dataset containing resampled point estimates for each scenario, run the resample_all_scenarios function from the Bootstrap module.  It takes two parameters: (i) the scenario data and (ii) a BootstrapArguments object.

The example below runs the number of bootstrap resamples set in N_BOOTS and returns the resampled means for each scenario.


In [None]:
boot_data = bs.resample_all_scenarios(scenario_data, args)
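
For readers new to the bootstrap, the sketch below shows the general technique: draw a sample of the same size with replacement, compute the mean, and repeat. It uses only the standard library; bootstrap_means and resample_all are hypothetical names, and the real resample_all_scenarios may differ in detail.

```python
import random
from statistics import mean

def bootstrap_means(sample, n_boots, rng):
    """Return n_boots means, each from a with-replacement resample of 'sample'."""
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_boots)]

def resample_all(scenarios, n_boots, seed=42):
    """Bootstrap every scenario; returns one list of resampled means per scenario."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [bootstrap_means(s, n_boots, rng) for s in scenarios]

toy_scenarios = [[10.2, 9.9, 10.7, 10.1], [11.5, 12.1, 11.8, 12.4]]
toy_boot = resample_all(toy_scenarios, n_boots=10)
print(len(toy_boot), len(toy_boot[0]))  # 2 10
```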

boot_data is a list of lists.  Each list contains the resampled point estimates from one simulation experiment/scenario.  To visualise this data we are going to use a pandas DataFrame object.  After transposing, each column represents an experiment/scenario and each row represents a bootstrap sample.

In [None]:
df = pd.DataFrame(boot_data, columns = [str(i) for i in range(1, args.nboots+1)])
df.index += 1
df2 = df.transpose()
df2.to_csv("resamples.csv")

### 2.2. Comparing scenarios

Use the compare_scenarios_pairwise function to conduct an all pairwise comparison of the bootstrapped scenarios.

In [None]:
results = bs.compare_scenarios_pairwise(boot_data, args) 
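
As an illustration of what an all-pairwise comparison involves, the sketch below applies a comparison function to every pair of scenarios. The names proportion_less_than and compare_pairwise are hypothetical, and the data structure returned by the real compare_scenarios_pairwise is not shown in this notebook, so the dictionary used here is an assumption.

```python
from itertools import combinations

def proportion_less_than(x1, x2):
    """Proportion of paired resamples in which x2 is smaller than x1."""
    return sum(b < a for a, b in zip(x1, x2)) / len(x1)

def compare_pairwise(boot, comp):
    """Apply comp to every pair (i, j) with i < j; keys are 1-based scenario ids."""
    return {
        (i + 1, j + 1): comp(boot[i], boot[j])
        for i, j in combinations(range(len(boot)), 2)
    }

toy_boot = [
    [10.0, 10.4, 9.8, 10.1],   # scenario 1
    [9.5, 10.6, 9.7, 9.9],     # scenario 2
    [11.0, 11.2, 10.9, 11.1],  # scenario 3
]
toy_results = compare_pairwise(toy_boot, proportion_less_than)
print(toy_results)  # {(1, 2): 0.75, (1, 3): 0.0, (2, 3): 0.0}
```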

### 2.3. Printing the results of a comparison

There are two options for printing results to the screen.  For a large number of comparisons it is recommended that you use 'long format'.

In [None]:
io.print_long_format_comparison_results(results)

For a small number of comparisons it is recommended that you use matrix format.

In [None]:
matrix = io.results_to_matrix(results)

To fill in the missing cells of the comparison matrix, use the insert_inverse_results function.

In [None]:
io.insert_inverse_results(matrix, args.nscenarios)
df = io.matrix_to_dataframe(matrix, io.scenario_headers(args.nscenarios))
df.style.applymap(io.colour_cells_by_proportion)
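
One plausible way to fill the missing cells, shown purely as an assumption about what "inverse" means here, is to use the complement of each proportion: if scenario j beat scenario i in a proportion p of resamples, then i beat j in 1 - p of them, provided there are no ties. fill_inverse is a hypothetical helper and may not match insert_inverse_results.

```python
def fill_inverse(matrix):
    """Fill the lower triangle with 1 - p from the upper triangle (assumes no ties)."""
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] is not None:
                matrix[j][i] = 1.0 - matrix[i][j]
    return matrix

# Upper triangle holds the pairwise proportions; None marks a missing cell.
toy_matrix = [
    [None, 0.75, 0.0],
    [None, None, 0.25],
    [None, None, None],
]
fill_inverse(toy_matrix)
print(toy_matrix[1][0], toy_matrix[2][0], toy_matrix[2][1])  # 0.25 1.0 0.75
```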

### 2.4. Writing results to file

Once the results are in matrix format you can write these to file with a single function.  The file is written to the working directory and is called results_matrix.csv. The results can be opened in Excel (or otherwise).

In [None]:
io.write_results_matrix(matrix, io.scenario_headers(args.nscenarios))

### 2.5 Visual comparisons

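If required, scenarios can also be compared visually. Chart-based comparison functions are contained in the BootChartExtensions module.  The cells in this section are commented out because the BootChartExtensions import is commented out in the preamble; uncomment both to run them.  Here we illustrate plotting the PDF of mean differences.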

In [None]:
#args.comp_function = ch.plot_boostrap_samples_pdf
#bs.compare_scenarios_pairwise(boot_data, args) 

Here we illustrate plotting the CDF of mean differences.

In [None]:
#args.comp_function = ch.plot_boostrap_samples_cdf
#bs.compare_scenarios_pairwise(boot_data, args) 
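
The quantity behind a CDF chart is the empirical cumulative distribution of the resampled differences. The sketch below computes those points without any plotting library; ecdf is a hypothetical helper and is not the chart function in BootChartExtensions.

```python
def ecdf(values):
    """Return (x, F(x)) pairs, where F(x) is the proportion of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (k + 1) / n) for k, x in enumerate(ordered)]

toy_diffs = [0.5, -0.2, 0.1, 0.7]  # hypothetical resampled mean differences
points = ecdf(toy_diffs)
for x, p in points:
    print(x, p)
```

Reading the curve at a difference of zero gives the proportion of resamples in which the difference was negative or zero.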

## 2.6 Ranking Systems

### 2.6.1 Best systems - proportion of resamples

To help identify the top systems found in experimentation, use Bootstrap.rank_systems_min and Bootstrap.rank_systems_max.  These return a frequency table reporting the number or proportion of bootstrap resamples in which system i was the best (minimum or maximum).  Results exclude systems that never came top.

In [None]:
help(bs.rank_systems_min)

In [None]:
df_boots = cf.resamples_to_df(boot_data, args.nboots)  #put the bootstrap data into a Pandas.DataFrame
bs.rank_systems_min(df_boots, args)
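
The idea behind this ranking can be sketched in a few lines: for each bootstrap resample, find the system with the smallest value and count how often each system wins. rank_min is a hypothetical helper working on a plain list of lists; the real rank_systems_min takes a DataFrame and its output format may differ.

```python
from collections import Counter

def rank_min(boot):
    """Proportion of resamples in which each system had the smallest value."""
    n_boots = len(boot[0])
    winners = Counter()
    for b in range(n_boots):
        values = [scenario[b] for scenario in boot]
        winners[values.index(min(values)) + 1] += 1  # 1-based system id
    # Systems that never win do not appear in the table.
    return {system: count / n_boots for system, count in winners.most_common()}

toy_boot = [
    [10.0, 10.4, 9.8, 10.1],   # system 1
    [9.5, 10.6, 9.7, 9.9],     # system 2
    [11.0, 11.2, 10.9, 11.1],  # system 3
]
print(rank_min(toy_boot))  # {2: 0.75, 1: 0.25}
```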

### 2.6.2 Proportion of time system is the best m systems

Alternatively, you can find the proportion of resamples in which a system was among the best m systems.
For example: in what proportion of the bootstrap resamples was the result for system 1 among the 5 smallest of all systems?

There are two functions:

1. Bootstrap.rank_systems_msmallest(DataFrame, BootstrapArguments, m)
2. Bootstrap.rank_systems_mlargest(DataFrame, BootstrapArguments, m)

Note that when m = 1 these functions are equivalent to those introduced in section 2.6.1.

For this example the output measure relates to waiting times.  We therefore use function 1 above, which is concerned with finding systems that consistently rank among the lowest 5 waiting times.

In [None]:
help(bs.rank_systems_msmallest)

In [None]:
msmallest = bs.rank_systems_msmallest(df_boots, args, 5)
msmallest
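
The same counting idea extends to the best m systems: in each resample, take the m smallest values and credit every system in that set. rank_m_smallest is a hypothetical helper on a list of lists, offered as a sketch rather than the implementation behind rank_systems_msmallest.

```python
from collections import Counter

def rank_m_smallest(boot, m):
    """Proportion of resamples in which each system was among the m smallest."""
    n_boots = len(boot[0])
    counts = Counter()
    for b in range(n_boots):
        values = [scenario[b] for scenario in boot]
        # Indexes of systems ordered from smallest to largest value.
        order = sorted(range(len(values)), key=values.__getitem__)
        counts.update(idx + 1 for idx in order[:m])  # 1-based system ids
    return {system: count / n_boots for system, count in counts.most_common()}

toy_boot = [
    [10.0, 10.4, 9.8, 10.1],  # system 1
    [9.5, 10.6, 9.7, 9.9],    # system 2
    [9.9, 11.2, 10.9, 10.0],  # system 3
]
top2 = rank_m_smallest(toy_boot, m=2)
print(top2)  # system 2 -> 1.0, systems 1 and 3 -> 0.5
```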

### 2.6.3 Rerunning comparisons using 'the best' subset of all systems

The ranking analysis from 2.6.2 provides a shortlist of the systems that consistently appear in the best m systems.  This information can be used to perform a comparison of the subset of systems with the most promising outputs.  First we need to do two things:

1. Get the system/scenario ids as a list.
2. Get the bootstrap resamples corresponding to this subset.

In [None]:
subset_indexes = msmallest.index.values.tolist()
subset = cf.subset_of_list(boot_data, subset_indexes)

Next we update the BootstrapArguments object, i.e. the number of scenarios we are working with and the type of comparison we want to do. Then we run the compare_scenarios_pairwise function.

In [None]:
args.nscenarios = len(subset)
args.comp_function = bs.proportion_x2_lessthan_x1
results = bs.compare_scenarios_pairwise(subset, args) 
matrix = io.results_to_matrix(results) 
io.insert_inverse_results(matrix, args.nscenarios)

In [None]:
df = io.matrix_to_dataframe(matrix, [str(i) for i in subset_indexes])
df.style.applymap(io.colour_cells_by_proportion)

The code below limits the comparison to the top 10 ranked systems.

In [None]:
subset_indexes = msmallest.index.values.tolist()[:10]
subset = cf.subset_of_list(boot_data, subset_indexes)
args.nscenarios = len(subset)
args.comp_function = bs.proportion_x2_lessthan_x1
results = bs.compare_scenarios_pairwise(subset, args) 

matrix = io.results_to_matrix(results) 
io.insert_inverse_results(matrix, args.nscenarios)

df = io.matrix_to_dataframe(matrix, [str(i) for i in subset_indexes])

df.style.applymap(io.colour_cells_by_proportion)

In [None]:
from statistics import mean
# Mean of the original (non-resampled) data for each shortlisted system
subset = cf.subset_of_list(scenario_data, subset_indexes)
[mean(i) for i in subset]