**Implementing D-MAB, as described in DaCosta et al. - 2008 - Adaptive operator selection with dynamic multi-arm**

>  (hybrid between UCB1 and Page-Hinkley (PH) test)

D-MAB maintains four indicators for each arm $i$:
1. number $n_{i, t}$ of times $i$-th arm has been played up to time $t$;
2. the average empirical reward $\widehat{p}_{i, t}$ at time $t$;
3. the average and maximum deviation $m_i$ and $M_i$ involved in the PH test, initialized to $0$ and updated as detailed below.

At each time step $t$, D-MAB selects the arm $i$ that maximizes equation 1:

$$\widehat{p}_{i, t} + \sqrt{\frac{2 \log \sum_{k}n_{k, t}}{n_{i, t}}}$$

> Notice that the sum of the number of times each arm was pulled is equal to the time $\sum_{k}n_{k, t} = t$, but since their algorithm resets the number of picks, we need to go with the summation. 

and receives some reward $r_t$, drawn from the reward distribution $p_{i, t}$.

> I think there is a typo in eq. 1 of the paper, so I replaced $j$ with $i$ in the subscripts.
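As a quick numeric illustration of equation 1, the sketch below computes the scores for three hypothetical arms. The function name `ucb1_scores` and the numbers are made up for this example, and it assumes every arm has been played at least once.

```python
import numpy as np

def ucb1_scores(avg_rewards, num_pulls):
    """Equation 1: empirical mean plus exploration bonus (all n_i > 0)."""
    avg_rewards = np.asarray(avg_rewards, dtype=float)
    num_pulls = np.asarray(num_pulls, dtype=float)
    total = num_pulls.sum()  # equals t only if no restart has happened
    return avg_rewards + np.sqrt(2.0 * np.log(total) / num_pulls)

# Hypothetical state: arm 0 has the best mean, arm 2 was rarely played
scores = ucb1_scores([0.6, 0.5, 0.4], [50, 40, 2])
best_arm = int(np.argmax(scores))
print(scores.round(3), best_arm)
```

Here the rarely played arm gets the highest score even though its empirical mean is the lowest, because its exploration bonus dominates.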

The four indicators are updated accordingly:

- $\widehat{p}_{i, t} :=\frac{1}{n_{i, t} + 1}(n_{i, t}\widehat{p}_{i, t} + r_t)$
- $n_{i, t} := n_{i, t}+1$
- $m_i := m_i + (\widehat{p}_{i, t} - r_t + \delta)$
- $M_i:= \text{max}(M_i, m_i)$

If the PH test is triggered ($M_i - m_i > \lambda$), the bandit is restarted: for all arms, all indicators are set to zero. The authors argue that, empirically, resetting the values is more robust than decreasing them with some mechanism such as probability matching.

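> The original paper resets the indicators to 0. To avoid a division by zero when calculating UCB1 right after a restart, the implementation below keeps the reset at 0 and adds 1 to the pull counts inside the UCB1 formula.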
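The update rules and the restart can be sketched for a single pull as follows. This is a minimal illustration under my own assumptions, not the paper's code: the dictionary layout and the function name `dmab_update` exist only for this example.

```python
import numpy as np

def dmab_update(state, arm, reward, delta=0.15, lmbda=0.25):
    """Apply one D-MAB update in place; return True if the PH test fired."""
    n, p = state["n"], state["p"]
    p[arm] = (n[arm] * p[arm] + reward) / (n[arm] + 1)   # running mean
    n[arm] += 1
    state["m"][arm] += p[arm] - reward + delta           # cumulative deviation
    state["M"][arm] = max(state["M"][arm], state["m"][arm])
    if state["M"][arm] - state["m"][arm] > lmbda:
        for key in state:                                # restart: all arms
            state[key][:] = 0.0
        return True
    return False

# Two arms, all indicators start at zero; arm 0 gets rewards 0, 0, 1
state = {k: np.zeros(2) for k in ("n", "p", "m", "M")}
fired = [dmab_update(state, arm=0, reward=r) for r in (0.0, 0.0, 1.0)]
print(fired, state["n"])
```

With these values the third reward sits far above the running mean, so $m_i$ drops below $M_i$ by more than $\lambda$ and every indicator is reset.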

The PH test is a standard test for the change hypothesis. It works by monitoring the difference between $M_i$ and $m_i$: when the difference is greater than some user-specified threshold $\lambda$, the PH test is triggered, i.e., the change hypothesis is considered to hold.

Parameter $\lambda$ controls the trade-off between false alarms and unnoticed changes. Parameter $\delta$ enforces the robustness of the test when dealing with slowly varying environments.
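To see how $\lambda$ affects the detection delay, the sketch below runs the test for a single arm over a hand-made, deterministic reward stream. The stream and the helper `first_alarm` are illustrative assumptions, not part of the paper.

```python
def first_alarm(rewards, delta, lmbda):
    """Return the index of the first reward that triggers the PH test, or None."""
    n, p, m, M = 0, 0.0, 0.0, 0.0
    for t, r in enumerate(rewards):
        p = (n * p + r) / (n + 1)   # running mean of the rewards
        n += 1
        m += p - r + delta          # cumulative deviation
        M = max(M, m)               # largest deviation seen so far
        if M - m > lmbda:
            return t
    return None

# Rewards stay at 0.2 and then jump to 0.8 at index 20
stream = [0.2] * 20 + [0.8] * 20
small = first_alarm(stream, delta=0.05, lmbda=0.5)
large = first_alarm(stream, delta=0.05, lmbda=2.0)
print(small, large)
```

A small threshold reacts as soon as the rewards change, while a larger one needs several consecutive deviating rewards before raising the alarm.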

We also need a scaling mechanism to control the Exploration _versus_ Exploitation balance. The authors propose two, of which I will focus on the first: Multiplicative Scaling (cUCB). It consists of multiplying all rewards by a fixed user-defined parameter $C_{M-\text{scale}}$.
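A small sketch of what multiplicative scaling does to the arm choice is shown next. `c_scale` stands in for $C_{M-\text{scale}}$, and the function name and numbers are hypothetical. Multiplying every reward by a constant multiplies the empirical means by the same constant, which is what the code relies on.

```python
import numpy as np

def scaled_ucb1(avg_rewards, num_pulls, c_scale):
    """UCB1 scores computed on rewards that were multiplied by c_scale."""
    avg = c_scale * np.asarray(avg_rewards, dtype=float)
    n = np.asarray(num_pulls, dtype=float)
    return avg + np.sqrt(2.0 * np.log(n.sum()) / n)

# Arm 0 has the better mean, arm 1 has been played less often
avg, pulls = [0.9, 0.1], [30, 10]
low  = int(np.argmax(scaled_ucb1(avg, pulls, c_scale=0.1)))
high = int(np.argmax(scaled_ucb1(avg, pulls, c_scale=10.0)))
print(low, high)
```

A small scale lets the exploration bonus dominate (the less-played arm is picked), while a large scale makes the empirical mean dominate (the better arm is picked).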

D-MAB therefore needs three parameters: $\lambda$, $\delta$, and $C_{M - \text{scale}}$. The paper includes a sensitivity analysis of these parameters, but I think they should be fine-tuned for each specific data set.

> Brush originally samples the mutations from a uniform distribution, whereas this algorithm chooses the arm deterministically: the one that maximizes the UCB1 score. We need some way to convert between the two so that the implementation stays transparent to the user.
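One possible conversion, which is purely an assumption of mine and not something taken from the paper or from Brush, is to normalize the UCB1 scores so they sum to one and sample from them. The helper name `scores_to_probabilities` is hypothetical.

```python
import numpy as np

def scores_to_probabilities(scores):
    """Normalize non-negative scores so they sum to 1 (uniform if all are zero)."""
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, None)
    total = scores.sum()
    if total == 0.0:
        return np.full(len(scores), 1.0 / len(scores))
    return scores / total

probs = scores_to_probabilities([1.0, 0.5, 0.5, 2.0])
rng = np.random.default_rng(0)
arm = rng.choice(len(probs), p=probs)  # stochastic pick instead of argmax
print(probs, arm)
```

This keeps the ranking given by the scores but no longer guarantees that the best-scoring arm is always chosen, so it changes the behavior of the bandit and would need to be evaluated separately.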

In [None]:
!pip install matplotlib > /dev/null
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

import numpy as np
import time
import pandas as pd

from brush.estimator import BrushEstimator
from sklearn.base import ClassifierMixin, RegressorMixin
from deap import creator
import _brush
from deap_api import nsga2 

In [None]:
class D_MAB:
    def __init__(self, num_bandits, delta=0.15, lmbda=0.25):
        self.num_bandits = num_bandits

        # Store learner status when the update function is called
        self.pull_history = {
            c:[] for c in ['t', 'arm idx', 'reward', 'update'] + 
                          [f'UCB1 {i}'  for i in range(num_bandits)] + 
                          [f'weight {i}' for i in range(num_bandits)] } 

        # This is the probability that should be used to update brush probs
        self._probabilities = np.ones(num_bandits)/num_bandits

        self.delta = delta # how to define these values???
        self.lmbda = lmbda

        self._reset_indicators() # Creating the indicators 

    def _reset_indicators(self):
        self._avg_rewards    = np.zeros(self.num_bandits)
        self._num_pulls      = np.zeros(self.num_bandits)
        self._avg_deviations = np.zeros(self.num_bandits)
        self._max_deviations = np.zeros(self.num_bandits)

    def _calculate_UCB1s(self):
        # Rewards are expected to be in [0, 1]. Normalizing the average
        # rewards as well seemed to give worse results, so only the raw
        # rewards are assumed to be scaled, not necessarily avg_rewards.
        rs = self._avg_rewards
        ns = self._num_pulls
        
        UCB1s = rs + np.sqrt(2*np.log1p(sum(ns))/(ns+1))

        return UCB1s

    @property
    def probabilities(self):
        # How to transform our UCB1 scores into node probabilities?
        return self._probabilities
    
    @probabilities.setter
    def probabilities(self, new_probabilities):
        if len(self._probabilities)==len(new_probabilities):
            self._probabilities = new_probabilities
        else:
            print(f"New probabilities must have size {self.num_bandits}")

    def choose_arm(self):
        """Uses previous recordings of rewards to pick the arm that maximizes
        the UCB1 function. The choice is made in a deterministic way.
        """

        UCB1s = self._calculate_UCB1s()

        return np.nanargmax(UCB1s)

    def update(self, arm_idx, reward):
        # Here we expect that the reward was already scaled to be in the 
        # interval [0, 1] (in the original paper, they suggest using a scaling
        # factor as a hyperparameter).
        self.pull_history['t'].append( len(self.pull_history['t']) )
        self.pull_history['arm idx'].append( arm_idx )
        self.pull_history['reward'].append( reward )

        # Updating counters
        self._avg_rewards[arm_idx]    = \
            (self._num_pulls[arm_idx]*self._avg_rewards[arm_idx] + reward)/(self._num_pulls[arm_idx]+1)
        self._avg_deviations[arm_idx] = \
            self._avg_deviations[arm_idx] + (self._avg_rewards[arm_idx] - reward + self.delta)    
        self._num_pulls[arm_idx]    = self._num_pulls[arm_idx] +1
        self._max_deviations[arm_idx] = \
            np.maximum(self._max_deviations[arm_idx], self._avg_deviations[arm_idx])

        if (self._max_deviations[arm_idx] - self._avg_deviations[arm_idx] > self.lmbda):
            self._reset_indicators()
            self.pull_history['update'].append( 1 )
        else:
            self.pull_history['update'].append( 0 )

        self._probabilities = self._calculate_UCB1s()

        for i, UCB1 in enumerate(self._calculate_UCB1s()):
            self.pull_history[f'UCB1 {i}'].append( UCB1 )
            self.pull_history[f'weight {i}'].append( self.probabilities[i] )

        return self

In [None]:
def plot_learner_history(learner, arm_labels=None):

    # getting the labels to use in plots (falls back to generic names)
    if arm_labels is None or len(arm_labels) != learner.num_bandits:
        arm_labels = [f'arm {i}' for i in range(learner.num_bandits)]

    # Setting up the figure layout
    fig = plt.figure(figsize=(15, 10), tight_layout=True)
    gs = gridspec.GridSpec(7, 6)

    learner_log = pd.DataFrame(learner.pull_history).set_index('t')
                
    total_rewards = learner_log.groupby('arm idx')['reward'].sum().to_dict()
    total_pulls   = learner_log['arm idx'].value_counts().to_dict()

    arms = range(learner.num_bandits) # arms that were never pulled count as zero
    data_total_pulls    = np.array([total_pulls.get(k, 0) for k in arms])
    data_total_rewards  = np.array([total_rewards.get(k, 0) for k in arms])
    data_total_failures = data_total_pulls-data_total_rewards

    ylim = np.maximum(data_total_rewards.max(), data_total_failures.max())

    axs = fig.add_subplot(gs[0:2, 4:])

    axs.bar(arm_labels, -1*data_total_failures, label="Null reward")
    axs.bar(arm_labels, data_total_rewards, label="Positive reward")

    axs.set_xlabel("Arm")
    axs.set_ylim( (-1.05*ylim, 1.05*ylim) )
    axs.legend()

    win_ratios = pd.DataFrame.from_dict({
        'arm'      : arm_labels,
        'totpulls' : data_total_pulls,
        '0 reward' : data_total_failures,
        '+ reward' : data_total_rewards,
        'success%' : (data_total_rewards/(data_total_pulls)).round(2)
    })

    axs = fig.add_subplot(gs[2:4, 4:])
    axs.table(cellText=win_ratios.values, colLabels=win_ratios.columns, loc='center')
    axs.axis('off')
    axs.axis('tight')

    # Plotting rewards and pulls -----------------------------------------------
    # plot the cumulative number of pulls (for evaluations, not generations) ---
    data = np.zeros( (learner_log.shape[0]+1, learner.num_bandits) )
    for i, row in learner_log.iterrows():
        data[i+1, :] = data[i]
        data[i+1, row['arm idx'].astype(int)] += 1

    axs = fig.add_subplot(gs[0:2, :4])
    axs.plot(data, label=arm_labels)
    axs.set_ylabel("Number of times mutation was used")
    axs.legend()

    # multiple lines all full height showing when D-MAB restarted (PH test triggered)
    plt.vlines(x=[i for i, e in enumerate(learner_log['update']) if e != 0],
               ymin=0, ymax=np.max(data), colors='k', ls='-', lw=0.025)

    # Plotting UCB1 scores -----------------------------------------------------
    for i, col in enumerate(['UCB1']):
        columns = learner_log.columns[learner_log.columns.str.startswith(f'{col} ')]
        labels  = [f"{col} {arm_labels[j]}" for j in range(learner.num_bandits)] 
        data    = learner_log.loc[:, columns]

        axs = fig.add_subplot(gs[(i+1)*2:(i+1)*2+2, :4])
        axs.plot(data, label=labels)
        axs.set_ylabel(f"{col}s")
        axs.legend()

        # multiple lines all full height showing when D-MAB restarted (PH test triggered)
        axs.vlines(x=[i for i, e in enumerate(learner_log['update']) if e != 0],
                ymin=0, ymax=np.max(data), colors='k', ls='-', lw=0.025)
    
    axs.set_xlabel("Evaluations") # Label only on last plot

    plt.show()

Below I'll create a simple bandit configuration so we can do a sanity check of our `D_MAB` implementation.

In [None]:
# Sanity checks
class Bandits:
    def __init__(self, reward_prob):
        # Implementing simple bandits. Each entry of reward_prob is a threshold
        # on a standard normal sample: the lower the threshold, the higher the
        # chance of a reward. The learner shouldn't know these values.
        self.reward_prob = reward_prob
        self.n_bandits   = len(reward_prob) 

    def pull(self, arm_idx):
        # Sampling from a normal distribution with mu=0 and var=1
        result = np.random.randn()
        
        # Return a positive or null reward (Bernoulli random variable).
        return 1 if result > self.reward_prob[arm_idx] else 0

for probs, descr, expec in [
    (np.array([ 1.0,  1.0, 1.0,  1.0]), 'All bandits with same probs'  , 'similar amount of pulls for each arm'         ),
    (np.array([-1.0,  0.2, 0.0,  1.0]), 'One bandit with higher prob'  , 'more pulls for first arm, less pulls for last'),
    (np.array([-0.2, -1.0, 0.0, -1.0]), 'Two bandits with higher probs', '2nd approx 4th > 1st > 3rd'                   ),
]:
    bandits = Bandits(probs)

    print("------------------------ optimizing ------------------------")

    learner = D_MAB(4)
    for i in range(1000):
        arm_idx = learner.choose_arm()
        reward  = bandits.pull(arm_idx)

        learner.update(arm_idx, reward) 

    plot_learner_history(learner)
    print(f"(it was expected: {expec})")

Ok, so the D-MAB seems to work. Now let's plug this MAB into the mutation step so that it updates the PARAMS option and dynamically controls the mutation probabilities during evolution.

We can import the brush estimator and replace its `_mutate` method with a custom function. Ideally, to use this Python MAB optimizer, we need an object that keeps track of the variables, wraps the _pull_ action, and evaluates the reward based on the result.

> We'll need a workaround (a _gambiarra_) to know which mutation was used so we can correctly update `D_MAB`. All MAB logic is implemented in Python, and we choose the mutation in Python as well. To make sure a specific mutation is used, we force it by setting the other mutations' weights to zero. This way we know exactly what happened in the C++ code.
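The weight trick can be sketched on its own, without Brush, as below. The helper `force_mutation` is a hypothetical name for this example; the mutation names are the ones used later in `BrushEstimatorMod`.

```python
def force_mutation(mutation_options, mutations, chosen_idx):
    """Return a copy of the weights where only the chosen mutation is non-zero."""
    forced = dict(mutation_options)
    for i, name in enumerate(mutations):
        forced[name] = 1.0 if i == chosen_idx else 0.0
    return forced

mutations = ['point', 'insert', 'delete', 'toggle_weight']
options = {name: 0.25 for name in mutations}
forced = force_mutation(options, mutations, chosen_idx=2)
print(forced)
```

Working on a copy leaves the user-supplied weights untouched, which is one way to keep the change invisible from the outside.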

In [None]:
class BrushEstimatorMod(BrushEstimator): # Modifying brush estimator
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        # mutations optimized by the learner. Learner arms correspond to
        # these mutations in the order they appear here
        self.mutations_ = ['point', 'insert', 'delete', 'toggle_weight']

        # Number of rewards to collect before the learner is updated. Rewards
        # are stored in batch_rewards_ and the learner is updated in a batch
        # once batch_size_ rewards were collected (here, one update per
        # pop_size mutations), after which batch_rewards_ is emptied.
        self.batch_size_    = self.pop_size
        self.batch_rewards_ = []

    def _mutate(self, ind1):
        # Overriding the mutation so it updates our sampling method. Doing the
        # logic on the python-side for now.

        # Creating a wrapper for mutation to be able to control what is happening
        # in the C++ code (this should be prettier in a future implementation)
        
        params = self.get_params()
        
        mutation_idx = self.learner_.choose_arm()

        for i, m in enumerate(self.mutations_):
            params['mutation_options'][m] = 0 if i != mutation_idx else 1.0

        _brush.set_params(params)

        opt = ind1.prg.mutate()

        if opt:
            offspring = creator.Individual(opt)
            # print("mutation")
            # print(ind1.prg.get_model())
            # print(offspring.prg.get_model())

            offspring.fitness.values = self.toolbox_.evaluate(offspring)
            
            # We compare fitnesses using the deap overloaded operators
            # from the docs: When comparing fitness values that are **minimized**,
            # ``a > b`` will return :data:`True` if *a* is **smaller** than *b*.
            # (this means that this comparison should work agnostic of min/max problems,
            # or even a single-objective or multi-objective problem)
            reward = 1.0 if offspring.fitness > ind1.fitness else 0.0
            
            # if not ignore_this_time:
            #     self.batch_rewards_.append( (mutation_idx, reward) )

            self.batch_rewards_.append( (mutation_idx, reward) )

            if len(self.batch_rewards_) >= self.batch_size_:
                for (mutation_idx, reward) in self.batch_rewards_:
                    self.learner_.update(mutation_idx, reward)
                self.batch_rewards_ = []
            
            return offspring

        return None
    
    def fit(self, X, y):

        _brush.set_params(self.get_params())

        self.data_ = self._make_data(X,y)
        # self.data_.print()

        # set n classes if relevant
        if self.mode=="classification":
            self.n_classes_ = len(np.unique(y))

        # We have 4 different mutations, and the learner will learn to choose
        # between these options by maximizing the reward when using each one
        self.learner_ = D_MAB(4)

        if isinstance(self.functions, list):
            self.functions_ = {k:1.0 for k in self.functions}
        else:
            self.functions_ = self.functions

        self.search_space_ = _brush.SearchSpace(self.data_, self.functions_)

        self.toolbox_ = self._setup_toolbox(data=self.data_)

        archive, logbook = nsga2(
            self.toolbox_, self.max_gen, self.pop_size, 0.9, self.verbosity)

        self.archive_ = archive
        self.logbook_ = logbook
        self.best_estimator_ = self.archive_[0].prg

        return self
    

class BrushClassifierMod(BrushEstimatorMod,ClassifierMixin):
    def __init__( self, **kwargs):
        super().__init__(mode='classification',**kwargs)

    def _fitness_function(self, ind, data: _brush.Dataset):
        ind.prg.fit(data)
        return (
            np.abs(data.y-ind.prg.predict(data)).sum(), 
            ind.prg.size()
        )
    
    def _make_individual(self):
        return creator.Individual(
            self.search_space_.make_classifier(self.max_depth, self.max_size)
            if self.n_classes_ == 2 else
            self.search_space_.make_multiclass_classifier(self.max_depth, self.max_size)
            )

    def predict_proba(self, X):
        data = self._make_data(X)
        return self.best_estimator_.predict_proba(data)


class BrushRegressorMod(BrushEstimatorMod, RegressorMixin):
    def __init__(self, **kwargs):
        super().__init__(mode='regressor',**kwargs)

    def _fitness_function(self, ind, data: _brush.Dataset):
        ind.prg.fit(data)
        return (
            np.sum((data.y- ind.prg.predict(data))**2),
            ind.prg.size()
        )

    def _make_individual(self):
        return creator.Individual(
            self.search_space_.make_regressor(self.max_depth, self.max_size)
        )

## Regression problem

In [None]:
# This is needed to avoid race conditions (https://deap.readthedocs.io/en/master/tutorials/basic/part4.html)
if __name__ == '__main__':
    from brush import BrushRegressor
    
    import warnings
    warnings.filterwarnings("ignore")

    from pmlb import fetch_data

    # X, y = fetch_data('537_houses', return_X_y=True, local_cache_dir='./')

    data = pd.read_csv('../../docs/examples/datasets/d_example_patients.csv')
    X = data.drop(columns='target')
    y = data['target']

    # data = pd.read_csv('../../docs/examples/datasets/d_2x1_subtract_3x2.csv')
    # X = data.drop(columns='target')
    # y = data['target']

    # data = pd.read_csv('../../docs/examples/datasets/d_square_x1_plus_2_x1_x2_plus_square_x2.csv')
    # X = data.drop(columns='target')
    # y = data['target']

    kwargs = {
        'verbosity' : False,
        'pop_size'  : 60,
        'max_gen'   : 300,
        'max_depth' : 10,
        'max_size'  : 20,
        'mutation_options' : {"point":0.25, "insert": 0.25, "delete":  0.25, "toggle_weight": 0.25}
    }

    results = pd.DataFrame(columns=pd.MultiIndex.from_tuples(
        [('Original', 'score'), ('Original', 'best model'), 
         ('Original', 'size'),  ('Original', 'depth'), ('Original', 'Time'), 
         ('Modified', 'score'), ('Modified', 'best model'), 
         ('Modified', 'size'),  ('Modified', 'depth'), ('Modified', 'Time'), 
         ('Modified', 'point mutation calls'),
         ('Modified', 'insert mutation calls'),
         ('Modified', 'delete mutation calls'),
         ('Modified', 'toggle_weight mutation calls')],
        names=('Brush version', 'metric')))
    
    est_mab = None
    for i in range(30):
        try:
            print(f"{i}, ", end='\n' if (i==29) else '')

            est_start_time = time.time()
            est     = BrushRegressor(**kwargs).fit(X,y)
            est_end_time = time.time() - est_start_time

            est_mab_start_time = time.time()
            est_mab = BrushRegressorMod(**kwargs).fit(X,y)
            est_mab_end_time = time.time() - est_mab_start_time

            learner_log = pd.DataFrame(est_mab.learner_.pull_history).set_index('t')
            total_pulls = learner_log['arm idx'].value_counts().to_dict()
            
            results.loc[f'run {i}'] = [
                # Original implementation
                est.score(X,y), est.best_estimator_.get_model(),
                est.best_estimator_.size(), est.best_estimator_.depth(), est_end_time,

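                # Implementation using D-MAB
                est_mab.score(X,y), est_mab.best_estimator_.get_model(), 
                est_mab.best_estimator_.size(), est_mab.best_estimator_.depth(), est_mab_end_time,
                
                # Mutation count, in the same order as the result columns
                # (value_counts sorts by frequency, not by arm index)
                *[total_pulls.get(k, 0) for k in range(len(est_mab.mutations_))]]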
        except Exception as e:
            print(e)

    # Showing results and statistics
    display(results)
    display(results.describe())

In [None]:
def generate_plots(est_mab):

    learner_log = pd.DataFrame(est_mab.learner_.pull_history).set_index('t')

    # Setting up the figure layout
    fig = plt.figure(figsize=(12, 6), tight_layout=True)
    gs = gridspec.GridSpec(6, 6)

    # Approximating the percentage of usage for each generation ----------------
    data = np.zeros( (est_mab.max_gen, 4) )
    for g in range(est_mab.max_gen):
        idx_start = g*(learner_log.shape[0]//est_mab.max_gen)
        idx_end   = (g+1)*(learner_log.shape[0]//est_mab.max_gen)

        df_in_range = learner_log.iloc[idx_start:idx_end]
        g_data = df_in_range['arm idx'].value_counts(normalize=True).to_dict()
        for k, v in g_data.items():
            data[g, k] = v

    axs = fig.add_subplot(gs[0:3, :3])
    axs.stackplot(range(est_mab.max_gen), data.T, labels=est_mab.mutations_)

    axs.set_ylabel("Percentage of usage")
    axs.legend()

    # success rate of each mutation for each generation ------------------------
    data = np.zeros( (est_mab.max_gen, 4) )
    for g in range(est_mab.max_gen):
        idx_start = g*(learner_log.shape[0]//est_mab.max_gen)
        idx_end   = (g+1)*(learner_log.shape[0]//est_mab.max_gen)

        learner_log_in_range = learner_log.iloc[idx_start:idx_end]

        total_rewards = learner_log_in_range.groupby('arm idx')['reward'].sum().to_dict()
        total_pulls   = learner_log_in_range['arm idx'].value_counts().to_dict()

        keys = sorted(total_pulls) # same order as the arrays built below
        data_total_pulls    = np.array([total_pulls[k] for k in sorted(keys)])
        data_total_rewards  = np.array([total_rewards[k] for k in sorted(keys)])

        # Success rate
        data[g, [int(i) for i in keys]] = data_total_rewards/data_total_pulls

    axs = fig.add_subplot(gs[3:6, :3])
    axs.stackplot(range(est_mab.max_gen), data.T, labels=est_mab.mutations_)

    axs.set_xlabel("Generations")
    axs.set_ylabel("brush Weights conversion")
    axs.legend()

    # --------------------------------------------------------------------------
    logbook = pd.DataFrame(columns=['gen', 'evals', 'ave m1', 'ave m2',
                                    'std m1', 'std m2', 'min m1', 'min m2'])
    for item in est_mab.logbook_:
        # Store the statistics logged for each generation
        logbook.loc[item['gen']] = (
            item['gen'], item['evals'], *item['ave'], *item['std'], *item['min']
        )

    x = logbook['gen']
    for i, metric in enumerate(['m1', 'm2']):
        axs = fig.add_subplot(gs[(3*i):(3*i + 3), 3:])

        y     = logbook[f'ave {metric}']
        y_err = logbook[f'std {metric}']
        y_min = logbook[f'min {metric}']

        axs.plot(x, y, 'b', label='Avg.')
        axs.fill_between(x, y-y_err, y+y_err, fc='b', alpha=0.5, label="Std.")
        axs.plot(x, y_min, 'k', label='Min.')

        axs.set_ylabel("Score" if metric=='m1' else "Size")
        axs.legend()

    axs.set_xlabel("Generations")

    plt.show()

In [None]:
plot_learner_history(est_mab.learner_, arm_labels=est_mab.mutations_)
generate_plots(est_mab)

## Classification problem

In [None]:
if __name__ == '__main__':
    from brush import BrushClassifier
    
    import warnings
    warnings.filterwarnings("ignore")

    from pmlb import fetch_data

    # X, y = fetch_data('adult', return_X_y=True, local_cache_dir='./')

    data = pd.read_csv('../../docs/examples/datasets/d_analcatdata_aids.csv')
    X = data.drop(columns='target')
    y = data['target']

    kwargs = {
        'verbosity' : False,
        'pop_size'  : 60,
        'max_gen'   : 300,
        'max_depth' : 10,
        'max_size'  : 20,
        'mutation_options' : {"point":0.25, "insert": 0.25, "delete":  0.25, "toggle_weight": 0.25}
    }

    results = pd.DataFrame(columns=pd.MultiIndex.from_tuples(
        [('Original', 'score'), ('Original', 'best model'), 
         ('Original', 'size'),  ('Original', 'depth'), ('Original', 'Time'), 
         ('Modified', 'score'), ('Modified', 'best model'), 
         ('Modified', 'size'),  ('Modified', 'depth'), ('Modified', 'Time'), 
         ('Modified', 'point mutation calls'),
         ('Modified', 'insert mutation calls'),
         ('Modified', 'delete mutation calls'),
         ('Modified', 'toggle_weight mutation calls')],
        names=('Brush version', 'metric')))
    
    est_mab = None
    for i in range(30):
        try:
            print(f"{i}, ", end='\n' if (i==29) else '')

            est_start_time = time.time()
            est = BrushClassifier(**kwargs).fit(X,y)
            est_end_time = time.time() - est_start_time

            est_mab_start_time = time.time()
            est_mab = BrushClassifierMod(**kwargs).fit(X,y)
            est_mab_end_time = time.time() - est_mab_start_time

            learner_log = pd.DataFrame(est_mab.learner_.pull_history).set_index('t')
            total_pulls = learner_log['arm idx'].value_counts().to_dict()
            
            results.loc[f'run {i}'] = [
                # Original implementation
                est.score(X,y), est.best_estimator_.get_model(),
                est.best_estimator_.size(), est.best_estimator_.depth(), est_end_time,

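                # Implementation using D-MAB
                est_mab.score(X,y), est_mab.best_estimator_.get_model(), 
                est_mab.best_estimator_.size(), est_mab.best_estimator_.depth(), est_mab_end_time,
                
                # Mutation count, in the same order as the result columns
                # (value_counts sorts by frequency, not by arm index)
                *[total_pulls.get(k, 0) for k in range(len(est_mab.mutations_))]]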
            
        except Exception as e:
            print(e)

    # Showing results and statistics
    display(results)
    display(results.describe())

In [None]:
plot_learner_history(est_mab.learner_, arm_labels=est_mab.mutations_)
generate_plots(est_mab)