# A/B/n Testing

A/B/n testing is an exploration approach for finding the best option among a set of alternatives, each of which follows an unknown reward distribution.

It does so by sampling approximately uniformly from all available alternatives and recording the reward each sample returns.

The best option is then the one with the highest average observed reward.
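
As a compact way to state this selection rule, write $N_a$ for the number of times alternative $a$ was sampled and $r_t$ for the reward observed at step $t$, when alternative $a_t$ was shown. These symbols are notation assumed here for illustration; $Q_a$ and $N_a$ are chosen to match the `Q` and `N` arrays used in the code later in this notebook.

$$
Q_a = \frac{1}{N_a} \sum_{t \,:\, a_t = a} r_t, \qquad a^{*} = \arg\max_{a} Q_a
$$

With uniform sampling over $K$ alternatives and $T$ total samples, each $N_a$ is roughly $T/K$.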

## 0. Imports

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

## 1. Defining a Marketing Scenario

In this section, we define a hypothetical marketing scenario with 5 ads, each of which follows a Bernoulli distribution: showing an ad returns a reward of 1 with some fixed probability and 0 otherwise.

The goal is to find the best ad to show to users.

Since each ad follows a Bernoulli distribution, let's first implement that distribution.

### 1.1. Defining Bernoulli Distribution

In [2]:
class Bernoulli:
    def __init__(self, p):
        """
        Define a Bernoulli distribution

        Parameters
        ----------
        p : float, between 0 and 1
            The probability of drawing 1.

        Returns
        -------
        None.

        """
        self.p = p
        
    def draw(self):
        """
        Draw a single sample from the Bernoulli distribution

        Returns
        -------
        reward : int, 0 or 1
            A sample from the distribution
        """
        reward = np.random.binomial(n=1, p=self.p)
        return reward
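
As a quick sanity check, the sketch below draws many samples from one distribution and compares the empirical rate of 1s with `p`. The class is repeated in compact form only so the snippet runs on its own; the seed, the probability `0.073`, and the sample size are illustrative assumptions.

```python
import numpy as np


class Bernoulli:
    """Compact copy of the class above so this snippet is self-contained."""

    def __init__(self, p):
        self.p = p

    def draw(self):
        return np.random.binomial(n=1, p=self.p)


np.random.seed(0)  # assumed seed, used only to make the sketch reproducible

ad = Bernoulli(0.073)  # hypothetical probability of a reward of 1
n_draws = 20_000       # hypothetical sample size

# Draw one sample at a time, exactly as the testing class does later.
samples = np.array([ad.draw() for _ in range(n_draws)])
empirical_rate = samples.mean()

print(f"empirical rate: {empirical_rate:.4f} (true p = {ad.p})")
```

The empirical rate should land close to `p`, and it tightens as `n_draws` grows. This is the property A/B/n testing relies on when it ranks ads by their average reward.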

### 1.2. Defining the A/B/n Testing Class

Now we can define the A/B/n testing class, which receives a list of reward distributions (the ads) and lets us sample from them.

In [3]:
class ABnTesting:
    def __init__(self, ads):
        self.ads = ads
        self.n_ads = len(ads)
        
        self._reset()
        self._build_model()
        
    def _reset(self):
        """
        Define some variables to keep track of the reward and timestep.

        Returns
        -------
        None.

        """
        self.rewards = []
        self.total_reward = 0
        self.avg_reward = 0
        self.avg_rewards = []
        self.time_step = 0
        
    
    def _build_model(self):
        """
        Build a tabular model to keep track of the action values and the
        number of times each action has been selected.

        Returns
        -------
        None.

        """
        self.Q = np.zeros(self.n_ads) # action value 
        self.N = np.zeros(self.n_ads) # action frequency
        
        
    def _fit(self, action, rew):
        """
        Update the model with the latest action and reward.

        Parameters
        ----------
        action : int
            The action selected in the current timestep.
        rew : float
            The reward the environment gives the agent for its selected action.

        Returns
        -------
        None.

        """
        self.N[action] += 1
        self.Q[action] += 1/self.N[action] * (rew - self.Q[action])
        
    
    def _update(self, rew):
        """
        Update the reward-related variables and time_step at each timestep.

        Parameters
        ----------
        rew : float
            The reward the environment gives the agent for its selected action.

        Returns
        -------
        None.

        """
        self.rewards.append(rew)
        self.total_reward += rew
        self.avg_reward += (rew - self.avg_reward)/(self.time_step+1)
        self.avg_rewards.append(self.avg_reward)
        self.best_action = np.argmax(self.Q)
        self.time_step += 1
        
        
    def _step(self, action):
        """
        Execute the action the agent selected to see how rewarding it is.

        Parameters
        ----------
        action : int
            The action selected in the current timestep.

        Returns
        -------
        None.

        """
        rew = self.ads[action].draw()
        self._fit(action, rew)
        self._update(rew)
            
    
    def train(self, n_iters=10_000_000):
        """
        In the training phase of A/B/n testing, we select an action uniformly
        at random at every iteration.
        In the test phase, by contrast, we always select the best action.

        Parameters
        ----------
        n_iters : int, optional
            Number of training iterations. The default is 10_000_000.

        Returns
        -------
        None.

        """
        self.phase_name = 'train'
        
        for i in range(n_iters):
            action = np.random.randint(self.n_ads)
            self._step(action)
        
        
    def test(self, best_action, n_iters=100):
        """
        In the test phase, we always select the best action. So, we need to 
        know what the best action is.

        Parameters
        ----------
        best_action : int
            The best action learned in the training phase.
        n_iters : int, optional
            Number of test iterations. The default is 100.

        Returns
        -------
        None.

        """
        self.phase_name = 'test'
        
        self._reset()
        for i in range(n_iters):
            self._step(best_action)
        
        
    def render(self):
        """
        Printing the results.

        Returns
        -------
        None.

        """
        print(f"\n----- abn total_reward: {self.total_reward}")
        print(f"----- abn avg_reward: {self.avg_reward}")
        print(f"----- abn best_action: {self.best_action}")
        if self.phase_name == 'train':
            print(f"----- abn action_value: {self.Q}")
            print(f"----- abn n_visits_per_ad: {self.N}")
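
The update in `_fit` is an incremental mean: after the n-th reward for an action, `Q` moves by `(rew - Q) / n`, so `Q` always equals the plain average of the rewards seen so far for that action. The sketch below checks this on a single hypothetical reward stream; the seed, the probability `0.06`, and the stream length are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

# Hypothetical stream of 0/1 rewards for a single ad.
rewards = rng.binomial(n=1, p=0.06, size=1_000)

q, n = 0.0, 0
for rew in rewards:
    n += 1
    q += (rew - q) / n  # same form as the update in _fit

batch_mean = rewards.mean()

print(f"incremental estimate: {q:.6f}")
print(f"batch mean:           {batch_mean:.6f}")
```

The two values agree up to floating-point error. The incremental form avoids storing every reward per action, which is why the class keeps only `Q` and `N`.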

## 2. Training and Testing the A/B/n Testing Model

In [4]:
## Define some ads, each following a Bernoulli distribution

b_ad_A = Bernoulli(0.03)
b_ad_B = Bernoulli(0.06)
b_ad_C = Bernoulli(0.073)
b_ad_D = Bernoulli(0.036)
b_ad_E = Bernoulli(0.027)

b_ads = [b_ad_A, b_ad_B, b_ad_C, b_ad_D, b_ad_E]

In [5]:
## Instantiate and train the A/B testing model

n_iters = 100_000

abn_tester = ABnTesting(b_ads)

abn_tester.train(n_iters=int(0.1*n_iters))

abn_tester.render()


----- abn total_reward: 467
----- abn avg_reward: 0.04670000000000006
----- abn best_action: 2
----- abn action_value: [0.03178739 0.0629648  0.07081749 0.04057524 0.02533532]
----- abn n_visits_per_ad: [1919. 2017. 2104. 1947. 2013.]
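
To put the training `avg_reward` in context, the sketch below works out the expected reward per impression under uniform sampling. Because every ad is shown with equal probability, that expectation is simply the mean of the five true rates. It uses the true probabilities defined above, which are known here only because the scenario is simulated; in a real test they would be unknown.

```python
import numpy as np

# True reward probabilities of ads A-E, as defined earlier in this notebook.
true_rates = np.array([0.03, 0.06, 0.073, 0.036, 0.027])

# Under uniform sampling each ad is shown with probability 1/5,
# so the expected reward per impression is the mean of the rates.
expected_uniform = true_rates.mean()

best_ad = int(true_rates.argmax())
best_rate = true_rates.max()

# Expected reward given up per impression by exploring uniformly
# instead of always showing the best ad.
gap = best_rate - expected_uniform

print(f"expected avg reward under uniform sampling: {expected_uniform:.4f}")
print(f"best ad index: {best_ad}, best rate: {best_rate:.3f}")
print(f"per-impression gap: {gap:.4f}")
```

The expected value of 0.0452 is close to the observed training `avg_reward` of about 0.0467, and the index of the highest true rate matches the `best_action` of 2 found in training.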


In [6]:
## Test the trained A/B testing model

n_iters = 100

abn_tester.test(best_action=abn_tester.best_action, n_iters=int(0.9*n_iters))

abn_tester.render()


----- abn total_reward: 8
----- abn avg_reward: 0.08888888888888888
----- abn best_action: 2
