# Comparison of BootComp against OBCA-m on single objective problems

This notebook reruns some of the analysis published with OBCA-m, using common random numbers (CRN) and the BootComp algorithm.

All problems are single objective with no chance constraints.

* The first problem has 10 system designs with a constant variance.  That task is to return the top 3 (largest) systems.

* The second problem is taken from the Law inventory problem, but is adapted to operate with common random numbers

In [2]:
import numpy as np
import pandas as pd
from numba import jit

As all problems used in the comparison do not have chance constraints then only the `quality_bootstrap()` function is needed

In [3]:
from bootcomp.bootstrap import quality_bootstrap

## Functions for creating the data used in the experiments

In [4]:
@jit(nopython=True)
def bootstrap(data, boots=1000):
    """
    Returns a numpy array containing the bootstrap resamples
    Useful for creating a large number of experimental datasets 
    for testing R&S routines
    
    Keyword arguments:
    data -- numpy.ndarray of systems to boostrap
    boots -- integer number of bootstraps (default = 1000)
    """
    experiments = boots
    designs = 10
    samples = data.shape[1]

    datasets = np.zeros((experiments, designs, samples))
     
    for exp in range(experiments):
        
        for design in range(designs):

            for i in range(samples):

                datasets[exp][design][i] = data[design][round(np.random.uniform(0, samples)-1)]
      
    return datasets

In [5]:
@jit(nopython=True)
def crn_bootstrap(data, boots=1000):
    """
    Returns a numpy array containing the bootstrap resamples
    Useful for creating a large number of experimental datasets 
    for testing R&S routines.  This function assumes common
    random numbers have been used to create the original data.
    
    Keyword arguments
    data -- numpy.ndarray of systems to boostrap
    boots -- integer number of bootstraps (default = 1000)
    """

    experiments = boots
    designs = data.shape[0]
    samples = data.shape[1]
    
    datasets = np.zeros((experiments, designs, samples))
     
    for exp in range(experiments):
        
         for i in range(samples):

                row = data.T[np.random.choice(data.shape[0])]
                
                for design in range(designs):
                    datasets[exp][design][i] = row[design]  
      
    return datasets

#### Use the following code to create independent samples

In [6]:
def experiments_independent_samples(ifile_name, boots=1000):
    data = np.genfromtxt(ifile_name, delimiter=",", skip_footer=0).transpose()
    experiments = bootstrap(data, boots=boots)
    return experiments

#### Use the following code to create CRN data sets

In [7]:
def experiments_dependent_samples(ifile_name, boots=1000):
    data = np.genfromtxt(ifile_name, delimiter=",", skip_footer=0).transpose()
    experiments = crn_bootstrap(data, boots=boots)
    return experiments

## Automated version of BootComp

Used in practice BootComp is run in two stages.

After an $n_0$ initial replications the user can then decide how strict to be with the quality bootstrap. This will affect the number of system designs carried For example, if user required designs to be within 5% of the best mean 95% of the time then this might only carry over m-1 designs. In these circumstances the user might be less conservative and allow designs to be within 10-20% of best.  

The automated BootComp routines attempts to mimic this decision making process.  It is given a range of tolerances to consider and returns the most conservative estimate that returns at least m designs.

In [8]:
def simulate_experiment(model, reps, systems):
    return model[:,:reps][systems]

In [9]:
def cs(selected_top_m, true_top_m):
    """Returns boolean value:
    True = correct selection of top m
    False = incorrect selection (one or more of selected top m is incorrect)
    
    Keyword arguments:
    
    selected_top_m --   numpy.array containing the indexes of 
                        the top m means selected by the algorithm
                        
    true_top_m --       numpy.array containing the indexes of 
                        the true top m means
    
    """
    return np.array_equal(np.sort(selected_top_m), true_top_m)

In [10]:
def get_budgets(max_t, min_t, increment_t):
    #incremental budgets 200, 400, .... T
    budgets = [i for i in range(min_t, max_t + increment_t, increment_t)]
    return budgets

In [11]:
def simulate_stage(model, design_indexes, reps, x, y, boots, opt='max'):
    
        
    output_data = simulate_experiment(model=model, 
                                      reps=reps, 
                                      systems=design_indexes)
    
    if opt == 'max':
        best_design_index_sub = output_data.mean(axis=1).argmax()
    else:
        best_design_index_sub = output_data.mean(axis=1).argmin()
    
    best_design_index = design_indexes[best_design_index_sub]
    
    df_output = pd.DataFrame(output_data).T
    df_output.columns = design_indexes 
    
    results = quality_bootstrap(df_output, 
                                design_indexes, 
                                best_design_index, 
                                x, y, boots)

    return results

In [12]:
def bootcomp(model, budget, m, n_0, boots, x1, y1, x2, y2, opt='max'):

    #stage 1
    #simulate n reps from all system designs
    k = model.shape[0]
    design_indexes = [i for i in range(k)]

    stage_one_results = simulate_stage(model, design_indexes,
                                      reps=n_0, x=x1, y=y1, boots=boots,
                                      opt=opt)
       
    
    #stage 2
    #equal allocation of remaining budget
    stage_two_reps = int((budget - (n_0 * k))/len(stage_one_results)) + n_0
       
    design_indexes = (stage_one_results).tolist()
        
    stage_two_results = simulate_stage(model, design_indexes,
                                       reps=stage_two_reps, 
                                       x=x2, y=y1, boots=boots,
                                       opt=opt)

    return stage_two_results.tolist(), stage_two_reps

In [13]:
def bootcomp_top_m(model, reps, design_indexes, m, opt='max'):
    
    output_data = simulate_experiment(model=model, 
                                      reps=reps, 
                                      systems=design_indexes)
    
    sorted_indexes = output_data.mean(axis=1).argsort()
    
    if opt == 'max':
        x = sorted_indexes[-m:]
        best_design_indexes = np.array(design_indexes)[x] 
        
        
    else:
        x = sorted_indexes[:m]
        best_design_indexes = np.array(design_indexes)[x] 
        
    return best_design_indexes
        
    

In [14]:
def numerical_experiment(experiments, budgets, m, n_0, true_top_m, 
                         x1, y1, x2, y2, nboots=1000, opt='max'):
    """
    Conduct a user set number of numerical experiments on the algorithm
    for different computational budgets
    
    Returns:
    1. numpy.ndarray containing P{cs} for each budget
        
    Keyword arguments:
    experiments -- numpy.ndarray[experiments][designs][replication]
    budgets -- python list containing budgets
    model_file -- string path to model 
    
    """
    n_experiments = experiments.shape[0]
    k = experiments.shape[1]  
    
    correct_selections = np.zeros((n_experiments, len(budgets)))
   
    for exp in range(n_experiments):

        for t in range(len(budgets)):

            selected_top_m, reps = bootcomp(budget=budgets[t],
                                            model=experiments[exp],
                                            m=m, 
                                            n_0=n_0, 
                                            boots=nboots, x1=x1, y1=y1,
                                            x2=x2, y2=y2, opt=opt)
            
            
            #BootComp can return > m solutions.  This function trims to top m 
            top_m = bootcomp_top_m(experiments[exp], reps, selected_top_m, m, opt)
            
            #is it the correct selection?
            correct_selections[exp][t] = cs(top_m, true_top_m)
                        
    return correct_selections
    

## Tests 1: Top 3 systems from 10

In this example we simulate 10,000 experiments of 10 competing system designs.  Each design has equal variance.  BootComp must return the 3 designs with the largest mean.

In [15]:
def experiment_1():
    T = 3000
    increment_t = 100
    min_budget = 300
    n_0 = 20
    n_experiments = 10
    x1 = 0.4 
    y1 = 0.8 # 1 - alpha
    x2 = 0.3
    y2 = 0.95 # 1-alpha
    boots=1000
    m = 3

    #specific to this implementation
    ifile_name = 'data/EG1a_CRN.csv'
    reps_available = 10000

    #info for correct selection
    true_top_m = np.array([7, 8, 9])

    #incremental budgets 200, 400, .... T
    budgets = get_budgets(T, min_budget, increment_t)
    
    #generate experimental dataset
    experiments = experiments_dependent_samples(ifile_name, 
                                                boots=n_experiments)
    
    #run numerical experiment
    css = numerical_experiment(experiments, 
                               budgets, 
                               m,
                               n_0,
                               true_top_m, 
                               x1, y1, x2, y2, boots)
    return css

In [57]:
def experiment_1_No_CRN():
    T = 8000
    increment_t = 100
    min_budget = 8000
    n_0 = 20
    n_experiments = 10000
    x1 = 0.6
    y1 = 0.8
    x2 = 0.3
    y2 = 0.95
    boots=1000
    m = 3

    #specific to this implementation
    ifile_name = 'data/EG1a.csv'
    reps_available = 10000

    #info for correct selection
    true_top_m = np.array([7, 8, 9])

    #incremental budgets 200, 400, .... T
    budgets = get_budgets(T, min_budget, increment_t)
    
    #generate experimental dataset - independent samples!
    experiments = experiments_independent_samples(ifile_name, 
                                                boots=n_experiments)
    
    #run numerical experiment
    css = numerical_experiment(experiments, 
                               budgets, 
                               m,
                               n_0,
                               true_top_m, 
                               x1, y1, x2, y2, boots)
    return css

## Tests 2: Law Inventory Example

In this example, we simulate 10,000 experiments of 9 competing system designs in the Law inventory problem. Common random numbers have been employed achieveing ~89% variance reduction. BootComp must return the 3 designs with the largest mean.

### Maximisation Test (top 3)

In [17]:
def experiment_2():
    T = 500
    increment_t = 100
    min_budget = 300
    n_0 = 20
    n_experiments = 1000
    x1 = 0.3
    y1 = 0.8
    x2 = 0.2
    y2 = 0.95
    boots = 1000
    m = 3

    #specific to this implementation
    ifile_name = 'data/EGLaw_CRN.csv'
   
    #info for correct selection
    true_top_m = np.array([6, 7, 8])

    #incremental budgets 200, 400, .... T
    budgets = get_budgets(T, min_budget, increment_t)
    
    #generate experimental dataset
    experiments = experiments_dependent_samples(ifile_name, 
                                                boots=n_experiments)
    
    #run numerical experiment
    css = numerical_experiment(experiments, 
                               budgets, 
                               m,
                               n_0,
                               true_top_m, 
                               x1, y1, x2, y2, boots)
    return css

### Minisation Example (top 2)

In [18]:
def experiment_3():
    T = 1000
    increment_t = 100
    min_budget = 300
    n_0 = 20
    n_experiments = 10
    x1 = 0.1
    y1 = 0.95
    x2 = 0.05
    y2 = 0.95
    boots=1000
    m = 2
    opt = 'min'

    #specific to this implementation
    ifile_name = 'data/EGLaw_CRN.csv'
   
    #info for correct selection
    true_top_m = np.array([1, 2])

    #incremental budgets 200, 400, .... T
    budgets = get_budgets(T, min_budget, increment_t)
    
    #generate experimental dataset
    experiments = experiments_dependent_samples(ifile_name, 
                                                boots=n_experiments)
    
    #run numerical experiment
    css = numerical_experiment(experiments, 
                               budgets, 
                               m,
                               n_0,
                               true_top_m, 
                               x1, y1, x2, y2, boots, opt=opt)
    return css

## Run Experiments

Note: If running a large number of experiments (e.g. 10,000) against a large number of budgets (e.g. 30) then expect runtimes of around 4-5 hours.

**OBCA-m dataset: k = 10, var=const, m=3, type=maximisation**

In [87]:
results1 = experiment_1()

In [88]:
results1.shape

(10, 28)

In [89]:
pd.DataFrame(results1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,18,19,20,21,22,23,24,25,26,27
0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
6,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
7,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
8,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
9,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


**OBCA-m dataset: k = 10, var=const, m=3, type=maximisation, CRN=False**

In [58]:
results1a = experiment_1_No_CRN()
results1a.shape

(10000, 1)

In [59]:
pd.DataFrame(results1a).sum()

0    9290.0
dtype: float64

**Law Inventory example. k = 9, m=3, type=maximisation**

In [97]:
results_law_max = experiment_2()

In [98]:
results_law_max.shape

(10, 8)

In [99]:
pd.DataFrame(results_law_max)

Unnamed: 0,0,1,2,3,4,5,6,7
0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
6,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
7,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
8,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
9,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


**Law Inventory example. k = 9, m=2, type=minimisation** 

In [100]:
results_law_min = experiment_3()

In [101]:
results_law_min.shape

(10, 8)

In [102]:
pd.DataFrame(results_law_min)

Unnamed: 0,0,1,2,3,4,5,6,7
0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
6,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
7,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
8,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
9,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
