## Implementing a multi-armed bandit (Thompson sampling) in Python

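We install SciPy (if it is not already available) and import its `stats` module, which provides the beta and binomial distributions used in the simulation.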

```python
import sys
# Install SciPy into the notebook kernel's own Python environment (IPython shell escape)
!{sys.executable} -m pip install scipy
from scipy import stats
```

First, we define the number of users to simulate and the number of treatment arms in the experiment. (`num_arms` is informational only: the code below takes the arms from the keys of `success_rates`.)

```python
num_users = 10000  # number of simulated users
num_arms = 2       # number of treatment arms
```

Next, we define the success rate of each arm. In a real-life experiment this rate is unknown, but for the purposes of this simulation we must set it so that the multi-armed bandit can discover it.

```python
# True success probability of each arm (known only because this is a simulation)
success_rates = {'A': 0.5, 'B': 0.60}
```

This means the first arm A 'succeeds' 50% of the time and the second arm B 60% of the time. 

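Here, success could mean that a user simply keeps using the app, or that they complete some activity later on. The only requirement is that the outcome be binary (success or failure, i.e. a boolean) rather than a continuous numeric value, because the simulation models each arm's unknown success rate with a beta distribution, which is updated by success/failure counts. Most conversion-style metrics fit this requirement.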
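The binary restriction is what makes the beta distribution a natural fit. Assuming a uniform $\mathrm{Beta}(1, 1)$ prior on an arm's success rate $p$ (the `+ 1` terms in the simulation below correspond to this prior), observing $s$ successes and $f$ failures yields another beta distribution, which is exactly what the bandit samples from:

$$
p \sim \mathrm{Beta}(1, 1), \qquad
P(s, f \mid p) \propto p^{s}(1 - p)^{f}
\;\Longrightarrow\;
p \mid s, f \sim \mathrm{Beta}(s + 1,\; f + 1), \qquad
\mathbb{E}[p \mid s, f] = \frac{s + 1}{s + f + 2}.
$$

For example, an arm with 3 successes and 1 failure has posterior $\mathrm{Beta}(4, 2)$ with mean $4/6 \approx 0.67$.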

### Simulation

```python
# Running tallies of successes and failures for each arm, all starting at zero
result_counts = {arm: {'success': 0, 'failure': 0} for arm in success_rates}
```

For the simulation we keep track of the number of successes and failures for each arm; these counts are all the bandit needs to learn how much better one arm is than the other. Now let's run the simulation:

```python
for _ in range(num_users):
    # Thompson sampling: draw a plausible success rate for each arm from its
    # Beta(successes + 1, failures + 1) posterior (uniform Beta(1, 1) prior)
    current_draw = {}
    for arm in result_counts:
        current_draw[arm] = stats.beta.rvs(result_counts[arm]['success'] + 1,
                                           result_counts[arm]['failure'] + 1)
    # Select the arm with the highest sampled value
    chosen_arm = max(current_draw, key=current_draw.get)
    # Simulate the user's outcome (1 = success, 0 = failure) from the arm's true
    # success rate; in an actual experiment we would observe this instead
    current_result = stats.binom.rvs(1, success_rates[chosen_arm])
    if current_result == 1:
        # Increment the success count for the selected arm
        result_counts[chosen_arm]['success'] += 1
    else:
        # Increment the failure count for the selected arm
        result_counts[chosen_arm]['failure'] += 1
    # print(f'draws: {current_draw}\t chose: {chosen_arm}\t result: {current_result}\t counts: {result_counts}')
```
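For reuse, the loop can be packaged into a function. This is a sketch rather than the author's code: the name `run_thompson_sampling` and its `seed` parameter are hypothetical, and it swaps SciPy for NumPy's `Generator` (`rng.beta`, `rng.random`) so that a fixed seed makes runs reproducible. Drawing every arm's beta sample in one vectorised `rng.beta` call also lets it handle any number of arms.

```python
import numpy as np


def run_thompson_sampling(success_rates, num_users, seed=None):
    """Hypothetical helper: simulate Beta-Bernoulli Thompson sampling.

    `success_rates` maps arm name -> true success probability (known only in
    simulation). Returns per-arm {'success': int, 'failure': int} counts.
    """
    rng = np.random.default_rng(seed)
    arms = list(success_rates)
    true_rates = np.array([success_rates[arm] for arm in arms])
    successes = np.zeros(len(arms), dtype=int)
    failures = np.zeros(len(arms), dtype=int)

    for _ in range(num_users):
        # One draw per arm from its Beta(successes + 1, failures + 1) posterior
        draws = rng.beta(successes + 1, failures + 1)
        chosen = int(np.argmax(draws))
        # Simulated binary outcome: success with the chosen arm's true probability
        if rng.random() < true_rates[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1

    return {arm: {'success': int(successes[i]), 'failure': int(failures[i])}
            for i, arm in enumerate(arms)}


counts = run_thompson_sampling({'A': 0.5, 'B': 0.6}, num_users=10000, seed=42)
users_per_arm = {arm: c['success'] + c['failure'] for arm, c in counts.items()}
print(users_per_arm)
```

With a fixed seed the allocation is reproducible, though the exact counts will differ from the notebook run in this section.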

### Results

Below is the observed success rate for each arm. We should estimate the rate of the better arm fairly accurately, since the bandit allocated most users to it; we know less about the less performant arm because it received far fewer users.

```python
# Observed success rate per arm: successes / (successes + failures)
{arm: result_counts[arm]['success'] / (result_counts[arm]['success'] + result_counts[arm]['failure'])
 for arm in result_counts}
```

```
{'A': 0.4726027397260274, 'B': 0.5974223665516541}
```
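One way to put a number on "we know less about arm A" is a 95% credible interval from each arm's beta posterior. The counts below are an assumption reconstructed from this section's outputs rather than re-run: multiplying the observed rates above by the per-arm user totals reported in this section (146 and 9854) gives 69 successes for A and 5887 for B. `stats.beta(a, b)` creates a frozen SciPy distribution whose `interval` method returns the central interval.

```python
from scipy import stats

# Counts reconstructed from the outputs (assumption):
# A: 69 / 146 = 0.4726..., B: 5887 / 9854 = 0.5974...
counts = {'A': {'success': 69, 'failure': 77},
          'B': {'success': 5887, 'failure': 3967}}

intervals = {}
for arm, c in counts.items():
    # Posterior Beta(successes + 1, failures + 1) under a uniform prior
    posterior = stats.beta(c['success'] + 1, c['failure'] + 1)
    intervals[arm] = posterior.interval(0.95)
    low, high = intervals[arm]
    print(f"{arm}: posterior mean {posterior.mean():.3f}, "
          f"95% credible interval ({low:.3f}, {high:.3f})")
```

The interval for A comes out several times wider than B's, reflecting 146 versus 9854 users, while both still contain the true rates of 0.5 and 0.6.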

Accordingly, the bandit did not send much traffic to the less performant arm, hurray. The number of users allocated to each arm:

```python
# Number of users allocated to each arm
{arm: result_counts[arm]['success'] + result_counts[arm]['failure']
 for arm in result_counts}
```

```
{'A': 146, 'B': 9854}
```
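A back-of-the-envelope way to see why this allocation matters is expected regret: each user sent to A instead of B forgoes 0.6 - 0.5 = 0.1 expected successes. The sketch below compares the bandit's allocation with a hypothetical fixed 50/50 split of the same 10000 users; `expected_regret` is an illustrative helper, and it relies on the true success rates, which are only known in a simulation.

```python
# True success rates from the simulation (unknown in a real experiment)
success_rates = {'A': 0.5, 'B': 0.60}

# Users per arm from the bandit run above, versus a hypothetical fixed 50/50 split
bandit_allocation = {'A': 146, 'B': 9854}
split_allocation = {'A': 5000, 'B': 5000}


def expected_regret(allocation, rates):
    """Expected successes lost compared with sending every user to the best arm."""
    best_rate = max(rates.values())
    return sum(n * (best_rate - rates[arm]) for arm, n in allocation.items())


print(round(expected_regret(bandit_allocation, success_rates), 2))  # 14.6
print(round(expected_regret(split_allocation, success_rates), 2))   # 500.0
```

Under these rates, the bandit's allocation costs about 15 expected successes, compared with about 500 for an evenly split test.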