# Assessing Ad Performance using Bayesian A/B Algorithms

This project assesses the performance of different adverts, as measured by conversion rate, using several algorithms for Bayesian A/B testing. The following algorithms are explored:

- Epsilon-Greedy
- Optimistic Initial Value
- Upper Confidence Bound
- Thompson Sampling

The dataset used is available at https://www.kaggle.com/osuolaleemmanuel/ad-ab-testing. As well as discussing the intricacies of each algorithm, we also consider their applicability to online learning and marketing.

***

<br>

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import StandardScaler

Problems with frequentist Hypothesis testing.

In [None]:
# Load dataset
# DATA_PATH is a placeholder: set it to the local path of the CSV downloaded from Kaggle
df = pd.read_csv(DATA_PATH)

***

## Bayesian A/B Testing

The adaptive approach of Bayesian A/B testing is particularly useful for online platforms, for example when choosing which advert to show or which webpage design to use in order to maximize conversion. A/B tests can be applied to any problem in which we want to randomly assign different treatments and assess which treatment optimizes some objective function or statistic. Examples include validating the positive impact of drugs, selecting from several webpage or advert designs, and choosing the content to show on news feeds based on engagement.

Standard A/B testing assigns different treatments to a control group and one or more test groups. In medical statistics, this includes sample size calculations to achieve a specific test power. The mean value and confidence interval for each group are compared after a pre-determined number of trials, and the treatment with the best performance is then assigned to the whole population. A problem with this approach is that the sub-optimal treatment continues to be assigned to one of the groups for the full duration of the test, even when early results already favor another treatment.
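To make the sample size calculation concrete, here is a minimal sketch using the standard normal-approximation formula for a two-sided two-proportion z-test. The function name, the default significance level and power, and the conversion rates are illustrative assumptions, not values taken from the dataset.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, p_test, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided two-proportion z-test.

    Uses the normal approximation with unpooled variance. The alpha and
    power defaults are illustrative assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    effect = p_test - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical conversion rates (assumed for illustration only)
n_small_effect = sample_size_per_group(0.30, 0.35)
n_large_effect = sample_size_per_group(0.30, 0.40)
print("per-group sample size, 5 point lift:", n_small_effect)
print("per-group sample size, 10 point lift:", n_large_effect)
```

The required sample size grows quickly as the assumed effect size shrinks, which is why a poor estimate of the effect size can make a fixed-horizon test either underpowered or needlessly long.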

Sample size calculations also assume that we have a good estimate of the effect size. If we have strong prior knowledge that one advert should perform better than the other, we should factor this into our analysis.

Another issue with frequentist approaches to hypothesis testing is the subjectivity in the choice of significance threshold. The p-value used to choose between treatments also varies as the number of trials increases, so the difference in performance between two treatments can move between significance and insignificance depending on when the test is stopped.
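The fluctuation of the p-value can be seen in a small simulation. The sketch below assumes two hypothetical adverts with true conversion rates of 0.30 and 0.35 and recomputes a pooled two-proportion z-test at regular checkpoints. The helper name, the rates, and the checkpoint spacing are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation observed yet, so no evidence of a difference
    z = (successes_a / n_a - successes_b / n_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(0)   # fixed seed so the run is repeatable
p_a, p_b = 0.30, 0.35    # hypothetical true conversion rates
conv_a = conv_b = 0
p_values = []            # (trials per group, p-value) at each checkpoint
for n in range(1, 2001):
    conv_a += rng.random() < p_a
    conv_b += rng.random() < p_b
    if n % 200 == 0:
        p_values.append((n, two_proportion_p_value(conv_a, n, conv_b, n)))

for n, p in p_values:
    print(f"n per group = {n:4d}  p-value = {p:.3f}")
```

Reading the printed checkpoints from top to bottom shows how the conclusion of the test can depend on the moment at which it is stopped.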

__Explore-Exploit Dilemma__

On one hand, we need enough samples to provide a low-variance estimate of the click-through rate (CTR) for each advert. On the other hand, we would like to quickly exploit the advert with the highest CTR in order to maximize sales and revenue. Bayesian methods provide an effective way of balancing the two. This is also the main trade-off in reinforcement learning models.

Bayesian statistics lets us place a distribution on the conversion rate itself rather than reporting a single point estimate, which gives an improved measure of uncertainty about the true value of the parameter. We can then use Bayes' theorem to combine the prior and likelihood to obtain a posterior distribution for the click-through rate.
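For Bernoulli trials with a Beta prior, this update has a closed form: a Beta(a, b) prior combined with k clicks in n impressions gives a Beta(a + k, b + n - k) posterior. The sketch below illustrates the update; the function names and the click counts are made up for the example.

```python
def update_beta(a, b, clicks, impressions):
    """Return the posterior Beta parameters after observing Bernoulli click data."""
    return a + clicks, b + (impressions - clicks)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

prior = (1, 1)  # uniform prior on [0, 1]
# Hypothetical data: 12 clicks out of 400 impressions
posterior = update_beta(*prior, clicks=12, impressions=400)

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
print("prior variance:", beta_variance(*prior))
print("posterior variance:", beta_variance(*posterior))
```

The posterior variance shrinks as impressions accumulate, which is the sense in which the posterior quantifies our remaining uncertainty about the click-through rate.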

The discussed algorithms allow us to balance exploration and exploitation. 

_Exploration_ involves collecting additional data for each bandit to increase our certainty (reduce the variance) in the estimate of the conversion rate for each bandit.

_Exploitation_ involves showing the bandit that currently looks best (the one with the highest maximum likelihood estimate, or MLE, of the conversion rate) to consumers, in order to maximize conversion and sales.

Ideally we want a high level of both exploration and exploitation. However, at each iteration we must choose between exploring and exploiting: we can't do both at the same time.

The standard frequentist approach to A/B testing has several further limitations, summarized in the next section.

### Limitations of Standard A/B Testing

- During the exploratory phase, impressions are wasted whilst we continue to show inferior bandits in order to gather more data

- The test may have to run for a long time in order to gather enough data to select amongst the bandits with sufficient statistical confidence

- The performance of different bandits may change over time, and results may fluctuate between significance and insignificance as more data is gathered. This is particularly prevalent when the assumption of independent trials is violated.

- There is a jump from exploration to exploitation, rather than a smooth transition

## Click Through Rate

The click-through rate is the probability that a user clicks an advert or link. Each 'trial' (impression) follows a Bernoulli distribution, and the CTR is estimated as:

$$\text{CTR} = \frac{\text{No. Clicks}}{\text{No. Impressions}}$$
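As a small worked example of the formula, the sketch below computes the CTR estimate together with the usual binomial standard error. The function name and the click and impression counts are hypothetical.

```python
import math

def click_through_rate(clicks, impressions):
    """Return the CTR estimate and its approximate (binomial) standard error."""
    ctr = clicks / impressions
    std_err = math.sqrt(ctr * (1 - ctr) / impressions)
    return ctr, std_err

# Hypothetical counts: 30 clicks out of 1000 impressions
ctr, std_err = click_through_rate(clicks=30, impressions=1000)
print(f"CTR = {ctr:.3f}, standard error = {std_err:.4f}")
```

The standard error falls with the square root of the number of impressions, so halving the uncertainty requires roughly four times as many impressions.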

***


## Epsilon-Greedy

Epsilon-Greedy adjusts the 'greedy' algorithm so that, with probability $\epsilon$, a bandit is chosen at random. Here, being 'greedy' means always choosing the bandit with the highest maximum likelihood estimate (MLE) of the conversion rate. A purely greedy approach can get stuck on a suboptimal bandit: for example, if only one of the bandits returns a sale in the first iteration, it will always have a higher estimate than the other bandits, because they are never played again.

The value of $\epsilon$ controls the amount of exploration: a higher value of $\epsilon$ means more exploration of different bandits but less exploitation, that is, the optimal bandit (advert) is shown to fewer users.

We can also introduce a __cooling schedule__, whereby the value of $\epsilon$ (the level of exploration) decreases at each iteration. This lets us exploit the optimal bandit more as we collect more data and become more confident in our estimates of the conversion rate for each bandit.

Pseudo-code

    while True:
        p = random number in [0, 1)
        if p < epsilon:
            j = choose a random bandit
        else:
            j = argmax(estimated bandit means)
        x = play bandit j and observe the reward
        update the estimated mean of bandit j with x
        decrease epsilon according to the cooling schedule
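The pseudo-code leaves the form of the cooling schedule open. Two common choices are sketched below, inverse-time decay and exponential decay. Both function names and the decay constant `k` are assumptions for illustration, not part of the original algorithm description.

```python
def inverse_decay(eps0, k, t):
    """Epsilon after t trials under inverse-time decay: eps0 / (1 + k * t)."""
    return eps0 / (1 + k * t)

def exponential_decay(eps0, k, t):
    """Epsilon after t trials under exponential decay: eps0 * (1 - k) ** t."""
    return eps0 * (1 - k) ** t

eps0 = 0.1  # initial exploration probability (assumed)
k = 0.01    # decay constant (assumed)
for t in (0, 10, 100, 1000):
    print(t, inverse_decay(eps0, k, t), exponential_decay(eps0, k, t))
```

Inverse-time decay keeps a small amount of exploration for much longer than exponential decay, which may matter if conversion rates drift over time.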

In [4]:
Num_trials = 10000
EPS = 0.1
Bandit_Probabilities = [0.3, 0.35, 0.4]

#Bandit class: stores the true conversion probability of one bandit, simulates
#a True/False outcome (1/0 in Python) each time the bandit is played, and uses
#the update method to maintain a running estimate of that probability
class Bandit:
    def __init__(self, p):
        #p: win/conversion rate
        self.p = p
        self.p_estimate = 0
        self.N = 0

    def pull(self):
        #Draw a win (converted) with probability p
        return(np.random.random() < self.p)

    def update(self, x):
        self.N += 1
        #update the estimate probability of success
        self.p_estimate = (1 / self.N) * ((self.N - 1) * self.p_estimate + x)


Next create a function to simulate trials:

In [None]:
def Simulation(Bandit_Probabilities, cooling_schedule, Num_trials = 10000, EPS = 0.1):

    """Function to simulate epsilon-greedy bandit algorithm
    Inputs:
        Bandit_Probabilities: list of known conversion probabilities for each bandit
        cooling_schedule: hyperparameter to adjust speed of exploration decay
        Num_trials: scalar indicating the number of simulations to perform
        EPS: scalar value for the initial epsilon
    Returns:
        bandits: list of Bandit objects holding the final estimates
        rewards: array of the 0/1 reward observed at each trial"""

    #Initialize each probability as a Bandit object
    bandits = [Bandit(p) for p in Bandit_Probabilities]

    #Record metrics
    rewards = np.zeros(Num_trials)
    num_times_explored = 0
    num_times_exploited = 0
    num_optimal = 0
    optimal_j = np.argmax([b.p for b in bandits])
    print("optimal j:", optimal_j)

    #Run algorithm
    for i in range(Num_trials):

        #Decay epsilon over time. The inverse-time form is an assumption:
        #the notebook does not specify how cooling_schedule enters the update
        eps = EPS / (1 + cooling_schedule * i)

        #Use epsilon-greedy to select next bandit
        if np.random.random() < eps:
            num_times_explored += 1
            #choose random bandit
            j = np.random.randint(len(bandits))
        else:
            num_times_exploited += 1
            #choose bandit with the highest p_estimate
            j = np.argmax([b.p_estimate for b in bandits])

        if j == optimal_j:
            num_optimal += 1

        #pull the arm of the selected bandit (generate a 'win' / 'loss')
        x = bandits[j].pull()

        #update reward log
        rewards[i] = x

        #Update the estimate for the bandit we selected
        bandits[j].update(x)

    #Report summary metrics
    print("overall win rate:", rewards.sum() / Num_trials)
    print("num times explored:", num_times_explored)
    print("num times exploited:", num_times_exploited)
    print("num times optimal bandit selected:", num_optimal)

    return bandits, rewards

In [5]:
bandits, rewards = Simulation(Bandit_Probabilities = [0.3, 0.35, 0.4], cooling_schedule=0.1)

for b in bandits:
    print("Mean estimate:", b.p_estimate)


The exploration rate, EPS, plays a role similar to the temperature in simulated annealing: it controls the trade-off between exploration and exploitation. Exploring makes it more likely that the algorithm identifies the bandit with the highest conversion, whilst exploiting takes advantage of the best bandit found so far.

In order to maximize profit, an online advertiser could use machine learning algorithms to improve conversion by placing the adverts that yield the highest conversion rate. Moreover, one could split the analysis according to characteristics associated with the performance of individual adverts, that is, effectively run a separate A/B test for each level of a variable. However, this results in information loss if each A/B test is carried out independently. We therefore require an algorithm (and may need to create one if none exists yet) that shares information across A/B tests but also uses the information in its own A/B test to diverge from the other groups when required.

***

### Prior Expectations

Given our prior domain experience of advertising, we expect click-through rates (CTRs) to lie in the region of 1% to 5%. Instead of using a Beta(1, 1) prior, which corresponds to the uniform distribution on [0, 1], we can choose other parameters (a, b) to encode our prior knowledge of the expected click-through rate.
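One way to check whether a candidate prior matches this belief is to look at its mean and a central 95% interval. The sketch below does this by sampling. Beta(9, 291) is an illustrative choice with a mean of 3%, not a prior fitted to any data, and the helper name is hypothetical.

```python
import random

def beta_summary(a, b, n_draws=20000, seed=0):
    """Mean and central 95% interval of Beta(a, b); the interval is estimated by sampling."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws)]
    return a / (a + b), lower, upper

# Illustrative informative prior (assumed parameters)
mean, lower, upper = beta_summary(9, 291)
print(f"Beta(9, 291): mean = {mean:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")

# Uniform prior for comparison
u_mean, u_lower, u_upper = beta_summary(1, 1)
print(f"Beta(1, 1):   mean = {u_mean:.3f}, 95% interval = ({u_lower:.3f}, {u_upper:.3f})")
```

Larger values of a + b make the prior more concentrated, so they should only be used when the prior belief is held with corresponding strength.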

We could simulate a small number of trials, say 100, and then plot the posterior distribution for each bandit to show how the algorithm works.
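A minimal sketch of such a simulation is given below, using Thompson sampling (one of the algorithms listed in the introduction) with Beta posteriors. The conversion rates, the uniform prior, and the function name are assumptions for illustration. The sketch prints the posterior parameters rather than plotting them.

```python
import random

def thompson_sampling(true_rates, n_trials=100, prior=(1, 1), seed=0):
    """Run Thompson sampling on simulated Bernoulli bandits.

    Returns the posterior Beta parameters [a, b] for each bandit.
    """
    rng = random.Random(seed)
    posteriors = [list(prior) for _ in true_rates]
    for _ in range(n_trials):
        # Draw one plausible conversion rate per bandit from its current posterior
        samples = [rng.betavariate(a, b) for a, b in posteriors]
        j = samples.index(max(samples))        # play the bandit with the largest draw
        reward = rng.random() < true_rates[j]  # simulated conversion (True/False)
        posteriors[j][0] += reward             # a counts successes
        posteriors[j][1] += 1 - reward         # b counts failures
    return posteriors

# Hypothetical true conversion rates (assumed)
posteriors = thompson_sampling([0.3, 0.35, 0.4], n_trials=100)
for j, (a, b) in enumerate(posteriors):
    # With the Beta(1, 1) prior, the number of plays is a + b - 2
    print(f"bandit {j}: Beta({a}, {b}), plays = {a + b - 2}, posterior mean = {a / (a + b):.3f}")
```

Bandits that are played more often end up with narrower posteriors, so after only 100 trials the posteriors of rarely played bandits will typically remain wide.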

## Simulation

***

## Marketing Application

***

## Conclusion

***

## References

https://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf