# Bootstrap multiple comparisons tutorial (WSC18)

This Jupyter _Python 3_ notebook has been written to accompany the WSC18 paper:

**PRACTICAL CONSIDERATIONS IN SELECTING THE BEST SET OF SIMULATED SYSTEMS**  _by Christine Currie and Tom Monks_.

The notebook provides a worked example of using BootComp to conduct a two-stage screening and search of a simulation model.  

## 1. Preamble

### 1.1. Detail of the simulation model

The simulation model was used in a 2017 project in the UK to help a hospital, a community healthcare provider and a clinical commissioning group design and plan a new community rehabilitation ward.  In the UK, patients who require rehabilitation are often stuck in a queuing system where they must wait (inappropriately) in an acute hospital bed for a space in the rehabilitation ward.  The model investigated the sizing of the new ward in order to minimise patient waiting time whilst meeting probabilistic constraints regarding ward occupancy (bed utilization) and the number of transfers between single-sex bays.

<img src="images/DToC.jpg" alt="Delayed Transfers of Care Model" title="Simulation Model and KPIs" />

### 1.2. Output data

The output data for the example analysis are bundled with the git repository.  There are three .csv files in the data/ directory for 'waiting times', 'utilization' and 'transfers'.  

The model itself is not needed.  There are 50 replications of 1051 competing design points (the count reported when the data are loaded in Step 1).  Users can vary the number of replications used in the two-stage procedure.  

The experimental design is also included for reference.

## 2. Prerequisites

### 2.1. BootComp Modules

In [1]:
import Bootstrap as bs
import BootIO as io
import ConvFuncs as cf

In [2]:
#WSC18 specific
import Bootstrap_crn as crn
from Bootstrap_crn import bootstrap_chance_constraint

### 2.2. Python Data Science Modules

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## 3. Procedure: Stage 1

**Optimization Parameters**

N_BOOTS = no. bootstraps to perform

**Stage 1**

$n_1$ = no. stage 1 independent replications for each system / competing design

$p_1$ = proportion of bootstrap samples that must meet each chance constraint in stage 1 (`gamma_1` in the code)

$y_1$ = proportion of bootstrap samples of the primary KPI that must be within $x_1$ percent of the best system in stage 1

**Stage 2**

$n_2$ = no. stage 2 independent replications for each system / competing design

$p_2$ = proportion of bootstrap samples that must meet each chance constraint in stage 2 (`gamma_2` in the code)

$y_2$ = proportion of bootstrap samples of the primary KPI that must be within $x_2$ percent of the best system in stage 2

In [4]:
N_BOOTS = 1000
n_1 = 5
n_2 = 45

gamma_1 = 0.7
x_1 = 0.1 
y_1 = 0.95

gamma_2 = 0.95
x_2 = 0.05
y_2 = 0.95


**Chance constraints**

In [5]:
min_util = 80 # ward occupancy >= 80%
max_tran = 50 # transfers between single sex bays <= 50

### Step 1: Read in initial $ n_1 $  replications

In [6]:

INPUT_DATA1 = "data/replications_wait_times.csv"
INPUT_DATA2 = "data/replications_util.csv"
INPUT_DATA3 = "data/replications_transfers.csv"
DESIGN = "data/doe.csv"

In [7]:
system_data_wait = crn.load_systems(INPUT_DATA1, exclude_reps = 50-n_1)
system_data_util = crn.load_systems(INPUT_DATA2, exclude_reps = 50-n_1)
system_data_tran = crn.load_systems(INPUT_DATA3, exclude_reps = 50-n_1)

N_SCENARIOS = system_data_wait.shape[1]
N_REPS = system_data_wait.shape[0]

print("Loaded waiting time data. {0} systems; {1} replications".format(system_data_wait.shape[1], system_data_wait.shape[0]))
print("Loaded utilization data. {0} systems; {1} replications".format(system_data_util.shape[1], system_data_util.shape[0]))
print("Loaded transfers data. {0} systems; {1} replications".format(system_data_tran.shape[1], system_data_tran.shape[0]))

Loaded waiting time data. 1051 systems; 5 replications
Loaded utilization data. 1051 systems; 5 replications
Loaded transfers data. 1051 systems; 5 replications


In [8]:
df_tran = pd.DataFrame(system_data_tran)
df_util = pd.DataFrame(system_data_util)
df_wait = pd.DataFrame(system_data_wait)

### Step 2: Limit to systems that satisfy chance constraints

Bootstrap function arguments

In [9]:
args =  bs.BootstrapArguments()

args.nboots = N_BOOTS
args.nscenarios = N_SCENARIOS
args.point_estimate_func = bs.bootstrap_mean


#### Chance constraint 1:  Utilisation Threshold (value for money)

In [10]:
passed_1 = bootstrap_chance_constraint(data = system_data_util.T, threshold=min_util, boot_args=args, gamma=gamma_1)

In [None]:
passed_1

Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            421, 422, 423, 424, 425, 426, 427, 428, 429, 430],
           dtype='int64', length=429)
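
As a rough illustration of what a bootstrap chance-constraint check can look like, the sketch below resamples the replications of each system with replacement, computes the mean of each resample, and keeps the systems whose bootstrap means satisfy the threshold in at least a proportion `gamma` of resamples. This is not the `bootstrap_chance_constraint` implementation from BootComp; the function name `sketch_chance_constraint`, its arguments and the toy data are assumptions made for the example.

```python
import numpy as np


def sketch_chance_constraint(data, threshold, n_boots=1000, gamma=0.7,
                             kind="lower", seed=42):
    """Hypothetical helper: indices of systems passing a chance constraint.

    data has shape (n_systems, n_replications).
    kind="lower" tests mean >= threshold; kind="upper" tests mean <= threshold.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_reps = data.shape[1]
    # Replication indices drawn with replacement, shape (n_boots, n_reps).
    # The same indices are reused for every system in this sketch.
    idx = rng.integers(0, n_reps, size=(n_boots, n_reps))
    # boot_means[s, b] is the mean of bootstrap sample b for system s
    boot_means = data[:, idx].mean(axis=2)
    if kind == "lower":
        ok = boot_means >= threshold
    else:
        ok = boot_means <= threshold
    # Keep systems where the proportion of passing resamples reaches gamma
    return np.flatnonzero(ok.mean(axis=1) >= gamma)


# Toy utilisation data (made up): 3 systems x 5 replications
toy_util = np.array([
    [85.0, 88.0, 90.0, 86.0, 87.0],  # always above 80
    [70.0, 72.0, 69.0, 71.0, 73.0],  # always below 80
    [79.0, 81.0, 80.5, 78.5, 82.0],  # borderline
])
passed_lower = sketch_chance_constraint(toy_util, threshold=80, gamma=0.7)
passed_upper = sketch_chance_constraint(toy_util, threshold=80, gamma=0.7,
                                        kind="upper")
print(passed_lower, passed_upper)
```

The borderline system may or may not pass depending on the resamples drawn, which is the behaviour the `gamma` parameter is meant to control.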

#### Chance constraint 2: Upper bound on transfers between bays

In [None]:
passed_2 = bootstrap_chance_constraint(data = system_data_tran.T, threshold=max_tran, boot_args=args, 
                                       gamma=gamma_1, kind='upper')

In [None]:
passed_2

#### Filter for systems that meet all chance constraints

In [None]:
subset = np.intersect1d(passed_1, passed_2)
subset

In [None]:
subset.shape
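
The filter above keeps only systems that appear in both index sets. A small sketch with made-up indices shows the behaviour of `np.intersect1d`, and how the same idea could extend to more than two constraints using `functools.reduce`. The array names and values here are hypothetical.

```python
from functools import reduce

import numpy as np

passed_util = np.array([0, 1, 2, 5, 8])  # hypothetical: passed utilisation
passed_tran = np.array([1, 2, 3, 8, 9])  # hypothetical: passed transfers
passed_other = np.array([2, 8, 10])      # hypothetical third constraint

both = np.intersect1d(passed_util, passed_tran)
all_three = reduce(np.intersect1d, [passed_util, passed_tran, passed_other])

print(both)       # [1 2 8]
print(all_three)  # [2 8]
```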

In [None]:

subset_waits = df_wait[subset].mean()
subset_waits.rename('wait', inplace=True)
subset_utils = df_util[subset].mean()
subset_utils.rename('util', inplace=True)
subset_tran = df_tran[subset].mean()
subset_tran.rename('tran', inplace=True)

List and rank the systems along with their performance measures

In [None]:
subset_kpi = pd.concat([subset_waits, subset_utils, subset_tran], axis=1)

In [None]:
subset_kpi.sort_values(by=['wait', 'util', 'tran'])

In [None]:
best_system_index = subset_kpi.sort_values(by=['wait', 'util', 'tran']).index[0]

In [None]:
best_system_index

### Step 3: Set up differences

In [None]:
feasible_systems = df_wait[subset]

In [None]:
feasible_systems

In [None]:
# .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0
diffs =  pd.DataFrame(feasible_systems.values.T - np.array(feasible_systems[best_system_index])).T
diffs.columns = subset
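
The differences step subtracts the best system's waiting time from every feasible system, replication by replication. The sketch below shows an equivalent formulation with `DataFrame.sub`, which keeps the column labels so they do not need to be reassigned afterwards. The toy data and the names `feasible` and `best` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical waiting times: rows = replications, columns = system labels
feasible = pd.DataFrame({3: [1.0, 2.0, 3.0],
                         7: [1.5, 2.5, 3.5],
                         9: [0.5, 2.0, 4.0]})
best = 3  # label of the system with the lowest mean wait in this toy data

# Subtract the best system's column from every column, row by row
diffs = feasible.sub(feasible[best], axis=0)

# Same numbers as the transpose-based expression used in the notebook
same = np.allclose(diffs.values,
                   (feasible.values.T - feasible[best].values).T)
print(diffs)
print(same)
```

The best system's own column is all zeros, so it is always within any non-negative indifference zone in the later steps.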

### Step 4: Quality Bootstrap i.e. Simple bootstrap of differences

In [None]:
resample_diffs = bs.resample_all_scenarios(diffs.values.T.tolist(), args)

In [None]:
df_boots_diffs= cf.resamples_to_df(resample_diffs, N_BOOTS)
df_boots_diffs.columns = subset
df_boots_diffs.shape
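
For readers without BootComp to hand, a bootstrap of the differences can be sketched in plain NumPy: draw replication rows with replacement and take the mean difference for each system in each resample. This is an assumption about the general technique, not a description of `bs.resample_all_scenarios`; the function `sketch_bootstrap_means` and the toy data are hypothetical.

```python
import numpy as np


def sketch_bootstrap_means(diffs, n_boots=1000, seed=0):
    """Hypothetical helper: bootstrap means of column-wise differences.

    diffs has shape (n_replications, n_systems).
    Returns an array of shape (n_boots, n_systems).
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, dtype=float)
    n_reps = diffs.shape[0]
    # Row indices drawn with replacement, one set per bootstrap sample
    idx = rng.integers(0, n_reps, size=(n_boots, n_reps))
    # diffs[idx] has shape (n_boots, n_reps, n_systems); average over reps
    return diffs[idx].mean(axis=1)


toy_diffs = np.array([[0.0, 0.5, -0.5],
                      [0.0, 0.5, 0.0],
                      [0.0, 0.5, 1.0]])
boot = sketch_bootstrap_means(toy_diffs, n_boots=200)
print(boot.shape)  # (200, 3)
```

Resampling whole rows keeps the pairing between systems within a replication, which matters when the replications were generated with common random numbers.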

### Step 6: Rank systems  

In [None]:
indifference = feasible_systems[best_system_index].mean() * x_1
indifference

In [None]:
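# Convert each bootstrap difference to 0 or 1
# 1 = difference less than or equal to the indifference value (0.244 here)
# 0 = difference greater than the indifference value

def indifferent(x, indifference):
    """
    Return 1 if the difference x is no greater than the
    indifference threshold, otherwise 0.
    """
    if x <= indifference:
        return 1
    else:
        return 0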

In [None]:
df_indifference = df_boots_diffs.applymap(lambda x: indifferent(x, indifference))
df_indifference
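
The element-wise conversion above can also be written as a single vectorised comparison, which avoids a Python function call per cell. The sketch below uses made-up bootstrap differences; `boot_diffs` and `within` are hypothetical names, and 0.244 is simply the value quoted in the notebook comments.

```python
import pandas as pd

# Hypothetical bootstrap mean differences: rows = resamples, columns = systems
boot_diffs = pd.DataFrame({3: [0.0, 0.0, 0.0],
                           7: [0.1, 0.3, 0.2],
                           9: [0.5, 0.2, 0.25]})
indifference = 0.244

# True where the difference is within the indifference zone, then cast to 0/1
within = (boot_diffs <= indifference).astype(int)
print(within)
print(within.sum(axis=0))
```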

### Step 7: Define set $J$ where y% of bootstraps are within x% of the best mean

In [None]:
threshold = N_BOOTS * y_1
df_within_limit = df_indifference.sum(0)
df_within_limit= pd.DataFrame(df_within_limit, columns=['sum'])
take_forward = df_within_limit.loc[df_within_limit['sum'] >= threshold].index

In [None]:
take_forward
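
To make the selection rule concrete: with `N_BOOTS = 1000` and `y_1 = 0.95`, a system needs roughly 950 or more of its 1000 bootstrap differences inside the indifference zone to be taken forward. A minimal sketch with made-up counts follows; the labels and counts are hypothetical.

```python
import numpy as np

n_boots, y = 1000, 0.95
labels = np.array([3, 7, 9])                 # hypothetical system labels
within_counts = np.array([1000, 940, 960])   # resamples inside the zone

threshold = n_boots * y                      # about 950 resamples
keep = labels[within_counts >= threshold]
print(keep)  # [3 9]
```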

In [None]:
no_stage1 = take_forward.shape[0]

_Quick look at stage 1 results_

In [None]:

df_doe = pd.read_csv(DESIGN, index_col='System')
df_doe.index -= 1
subset_kpi=  subset_kpi[subset_kpi.index.isin(take_forward)]
temp = df_doe[df_doe.index.isin(take_forward)]
df_stage1 = pd.concat([temp, subset_kpi], axis=1)
df_stage1.sort_values(by=['wait', 'util', 'tran'])

## 4. Procedure - Stage 2

### Step 8: More replicates of promising solutions using Common Random Numbers

The user simulates $ n_2 $ additional replicates for the feasible solutions brought forward from stage 1.

Example = 50 replicates (45 extra)
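
One reason to use common random numbers (CRN) is that comparing systems on the same random inputs tends to reduce the variance of the estimated differences between them. The toy sketch below is not the hospital model: it assumes two made-up systems whose outputs share a common noise term under CRN, and compares the variance of their difference with the independent-sampling case. All names and numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

common = rng.normal(size=n)              # shared randomness, e.g. arrivals
noise_a = rng.normal(scale=0.3, size=n)  # system-specific noise
noise_b = rng.normal(scale=0.3, size=n)

# CRN: both systems see the same common term
a = 5.0 + common + noise_a
b_crn = 5.5 + common + noise_b

# Independent sampling: system B gets a fresh draw of the common term
b_ind = 5.5 + rng.normal(size=n) + noise_b

var_crn = np.var(a - b_crn, ddof=1)
var_ind = np.var(a - b_ind, ddof=1)
print(var_crn < var_ind)
```

In this construction the shared term cancels in the CRN difference, leaving only the system-specific noise.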

In [None]:
df_wait_s2 = pd.DataFrame(crn.load_systems(INPUT_DATA1))[take_forward]
df_util_s2 = pd.DataFrame(crn.load_systems(INPUT_DATA2))[take_forward]
df_tran_s2 = pd.DataFrame(crn.load_systems(INPUT_DATA3))[take_forward]

N_SCENARIOS = df_wait_s2.shape[1]
N_REPS = df_wait_s2.shape[0]

print("Loaded waiting time data. {0} systems; {1} replications".format(df_wait_s2.shape[1], df_wait_s2.shape[0]))
print("Loaded utilization data. {0} systems; {1} replications".format(df_util_s2.shape[1], df_util_s2.shape[0]))
print("Loaded transfers data. {0} systems; {1} replications".format(df_tran_s2.shape[1], df_tran_s2.shape[0]))

### Step 9: Repeat steps 2 - 6 from stage 1

#### Step 2 - Chance constraints

In [None]:
passed_1 = bootstrap_chance_constraint(data = df_util_s2.values.T, threshold=min_util, boot_args=args, gamma=gamma_2)

In [None]:
passed_1

In [None]:
take_forward

In [None]:
cc_1 = np.array([take_forward[x] for x in passed_1])
cc_1

In [None]:
cc_1.shape
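
In stage 2 the constraint check is run on a plain array, so the values it returns are treated in the cell above as positions within the stage 2 data and are mapped back to the original system labels through `take_forward`. The sketch below shows the same mapping with NumPy fancy indexing; the labels and positions are made up.

```python
import numpy as np

# Hypothetical system labels carried forward from stage 1
take_forward_labels = np.array([12, 40, 57, 301])
# Hypothetical positions (0-based, within the stage 2 array) that passed
passed_positions = np.array([0, 2, 3])

# Positions -> original system labels
cc = take_forward_labels[passed_positions]
print(cc)  # [ 12  57 301]
```

Working in original labels is what allows the two constraint results to be intersected and then used to select columns from the stage 2 data frames.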

In [None]:
passed_2 = bootstrap_chance_constraint(data = df_tran_s2.values.T, threshold=max_tran, boot_args=args, gamma=gamma_2, kind='upper')

In [None]:
passed_2

In [None]:
cc_2 = np.array([take_forward[x] for x in passed_2])
cc_2

In [None]:
cc_2.shape

In [None]:
subset = np.intersect1d(cc_1, cc_2)
subset

In [None]:
subset.shape

In [None]:
def get_subset_kpi(subset):
    subset_waits = df_wait_s2[subset].mean()
    subset_waits.rename('wait', inplace=True)
    subset_utils = df_util_s2[subset].mean()
    subset_utils.rename('util', inplace=True)
    subset_tran = df_tran_s2[subset].mean()
    subset_tran.rename('tran', inplace=True)
    
    subset_kpi = pd.concat([subset_waits, subset_utils, subset_tran], axis=1)
    subset_kpi.index.rename('System', inplace=True)
    
    return subset_kpi

In [None]:

subset_waits = df_wait_s2[subset].mean()
subset_waits.rename('wait', inplace=True)
subset_utils = df_util_s2[subset].mean()
subset_utils.rename('util', inplace=True)
subset_tran = df_tran_s2[subset].mean()
subset_tran.rename('tran', inplace=True)

In [None]:
subset_kpi = pd.concat([subset_waits, subset_utils, subset_tran], axis=1)
subset_kpi.index.rename('System', inplace=True)

In [None]:
subset_kpi.sort_values(by=['wait', 'util', 'tran'])


In [None]:
best_system_index = subset_kpi.sort_values(by=['wait', 'util', 'tran']).index[0]

In [None]:
best_system_index

### Step [?]  Set up differences from best (stage 2)

In [None]:
feasible_systems = df_wait_s2[subset]
# .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0
diffs =  pd.DataFrame(feasible_systems.values.T - np.array(feasible_systems[best_system_index])).T
diffs.columns = subset

### Bootstrap differences

In [None]:
resample_diffs = bs.resample_all_scenarios(diffs.values.T.tolist(), args)


In [None]:
df_boots_diffs= cf.resamples_to_df(resample_diffs, args.nboots)
df_boots_diffs.columns = subset
df_boots_diffs.shape

In [None]:
indifference = feasible_systems[best_system_index].mean() * x_2
indifference

In [None]:
df_indifference = df_boots_diffs.applymap(lambda x: indifferent(x, indifference))
df_indifference

In [None]:
threshold = args.nboots * y_2
df_within_limit = df_indifference.sum(0)
df_within_limit= pd.DataFrame(df_within_limit, columns=['sum'])
final_set = df_within_limit.loc[df_within_limit['sum'] >= threshold].index

In [None]:
final_set

Final set of feasible systems selected from the competing designs

In [None]:
df_doe = pd.read_csv(DESIGN, index_col='System')
df_doe.index -= 1
#subtract 1 from index so that it matches zero indexing in analysis.



In [None]:
subset_kpi = get_subset_kpi(final_set)
subset_kpi

In [None]:
temp = df_doe[df_doe.index.isin(final_set)]
#subset_kpi = subset_kpi.applymap(lambda x: '%.4f' % x)
df_final = pd.concat([temp, subset_kpi], axis=1)
df_final.sort_values(by=['wait', 'util', 'tran'])


In [None]:
print('No. in final set {0}'.format(df_final.shape[0]))

In [None]:
print('No. taken forward from stage 1: {0}'.format(no_stage1))

In [None]:
df_final.to_clipboard(excel=True)

## Charts for paper

In [None]:
df_doe = pd.read_csv(DESIGN, index_col='System')
df_doe.index -= 1

Utilisation

In [None]:
temp = df_doe.loc[df_doe['Number of Bays']==0]
#temp.index += 1
subset_waits = df_wait[temp.index].mean()
subset_waits.rename('wait', inplace=True)
subset_utils = df_util[temp.index].mean()
subset_utils.rename('util', inplace=True)
subset_trans = df_tran[temp.index].mean()
subset_trans.rename('tran', inplace=True)



subset_utils_sem = df_util[temp.index].sem()
subset_utils_sem.rename('util_sem', inplace=True)

subset_utils_count = df_util[temp.index].count()
subset_utils_count.rename('n_util', inplace=True)

import scipy as sp
import scipy.stats

subset_kpi = pd.concat([temp, subset_waits, subset_utils, subset_trans, subset_utils_sem, subset_utils_count], axis = 1)
subset_kpi['Waiting Time (hrs)'] = round(subset_kpi['wait']*24, 2)

confidence = 0.95

subset_kpi['hw_95'] = subset_kpi['util_sem'] * sp.stats.t.ppf((1+confidence)/2., subset_kpi['n_util']-1)

#fig = plt.figure()
#ax = fig.add_subplot(111)
fig, axes = plt.subplots(nrows=1, ncols=2, sharey=False)

subset_kpi.sort_values('util').plot(y = 'util', x= 'Number of Singles', figsize=(20, 8), fontsize = 14, 
                                    linewidth=3, legend =False, kind='scatter', ax=axes[1], xticks=[x for x in range(43, 56, 1)], xlim=(42, 56), yerr='hw_95')#, xlim=(70, 92), ylim=(0, 35))
# right-hand panel shows utilization
axes[1].set_ylabel('Mean Utilization (% beds)', fontsize = 14)
axes[1].set_xlabel('Number of Singles (beds)', fontsize = 14)

subset_kpi.plot('Number of Singles', 'Waiting Time (hrs)', figsize=(20, 8), fontsize = 14, 
                                    linewidth=3, legend =False, kind='line', ms=10, style='o-', ax=axes[0], xlim=(42, 56),
                                    xticks=[x for x in range(43, 56, 1)])
# left-hand panel shows waiting time
axes[0].set_xlabel('No of Singles (beds)', fontsize = 14)
axes[0].set_ylabel('Mean Waiting Time (hrs)', fontsize = 14)
axes[0].grid(True)
axes[1].grid(True)
axes[1].legend(['mean (n=5)','95% Confidence Interval'],fontsize=14)
#plt.tight_layout()

In [None]:
fig.savefig("chance_constraint_stage1.pdf", format = 'pdf', dpi=300, bbox_inches='tight')

In [None]:
subset_kpi
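
The `hw_95` column computed in the utilisation chart cell corresponds to the usual $t$-based confidence interval half-width for a mean. Writing $\bar{x}$ for the sample mean, $s$ for the sample standard deviation and $n$ for the number of replications (so that `util_sem` is $s/\sqrt{n}$):

$$
h = t_{1-\alpha/2,\; n-1} \, \frac{s}{\sqrt{n}}, \qquad \text{CI} = \bar{x} \pm h
$$

With `confidence = 0.95`, $\alpha = 0.05$, so `(1+confidence)/2` evaluates to the 0.975 quantile that is passed to `sp.stats.t.ppf` with $n-1$ degrees of freedom.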

Patient transfers between bays of beds

In [None]:
temp=df_doe.loc[df_doe['Total beds']<=54]
#temp.index += 1
subset_waits = df_wait[temp.index].mean()
subset_waits.rename('wait', inplace=True)
# mean transfers per system (named subset_trans to match its contents)
subset_trans = df_tran[temp.index].mean()
subset_trans.rename('tran', inplace=True)

subset_kpi = pd.concat([temp, subset_waits, subset_trans], axis = 1)
subset_kpi['Waiting Time (hrs)'] = round(subset_kpi['wait']*24, 2)

subset_kpi.head()

In [None]:
import scipy as sp
import scipy.stats

means = subset_kpi.groupby(['Size of Bays'])['tran'].mean()
means.rename('mean', inplace=True)
sems = subset_kpi.groupby(['Size of Bays'])['tran'].sem()
sems.rename('sem', inplace=True)
counts = subset_kpi.groupby(['Size of Bays'])['tran'].count()
counts.rename('n', inplace=True)




transfers = pd.concat([means, sems, counts], axis=1)
confidence = 0.95

transfers['hw_95'] = transfers['sem'] * sp.stats.t.ppf((1+confidence)/2., transfers['n']-1)

#fig = plt.figure()
#ax = fig.add_subplot(111)
fig, axes = plt.subplots(nrows=1, ncols=1)
transfers = transfers.loc[transfers.index >0]
transfers = transfers.loc[transfers.index <27]
transfers.plot(y='mean', x=transfers.index, figsize=(20, 8), fontsize = 14, 
                              linewidth=3, legend =False, kind='line', ax=axes, yerr='hw_95'
              , xlim=(2, 27), xticks=[x for x in range(3, 26, 2)])

axes.set_xlabel('Bay Size (beds)', fontsize = 14)

axes.set_ylabel('Mean Bay Transfers', fontsize = 14)

In [None]:
transfers