In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#### Dataset

- Suppose an advertising company is running 10 different ads on a webpage, all targeted at a similar population.
- The dataset records which ads each user clicked.
- Each column represents a different ad.
- A value is 1 if the user clicked the ad and 0 if they did not.
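
To make the layout concrete, here is a minimal sketch that builds a small synthetic click matrix in the same shape (one column per ad, 0/1 entries) and computes each ad's click rate. The click rates, the seed, and the names `toy`, `ctr` and `best_ad` are assumptions for illustration only; they are not taken from `Ads_Optimisation.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical click rates for three ads (not from the real dataset)
click_rates = [0.05, 0.10, 0.30]

rng = np.random.default_rng(42)

# Each row is one user, each column one ad; 1 = clicked, 0 = not clicked
toy = pd.DataFrame(
    (rng.random((2000, 3)) < click_rates).astype(int),
    columns=["Ad 1", "Ad 2", "Ad 3"],
)

# The column mean of a 0/1 matrix is the observed click rate per ad
ctr = toy.mean()
best_ad = ctr.idxmax()

print(ctr)
print("Ad with the highest observed click rate:", best_ad)
```

The "optimal ad" discussed in the rest of the notebook is the column with the highest click rate; on the real data, `dataset.mean()` would give the same per-ad summary.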

In [2]:
location = r'S:\AI-DATASETS\Ads_Optimisation.csv'

In [4]:
# Importing the dataset
dataset = pd.read_csv(location)
dataset.shape

(10000, 10)

In [5]:
dataset.head()

Unnamed: 0,Ad 1,Ad 2,Ad 3,Ad 4,Ad 5,Ad 6,Ad 7,Ad 8,Ad 9,Ad 10
0,1,0,0,0,1,0,0,0,1,0
1,0,0,0,0,0,0,0,0,1,0
2,0,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,0,1,0,0
4,0,0,0,0,0,0,0,0,0,0


First, we will try a random selection technique, where we randomly select any ad and show it to the user.

If the user clicks the ad, we get paid and if not, there is no profit.

#### Implementing Random Selection

In [6]:
import random

In [7]:
nbr_of_trials = 10000
nbr_of_ads    = 10

ads_selected_list = []
total_reward      = 0

for trial_ctr in range(0, nbr_of_trials):
    
    ad_nbr = random.randrange(nbr_of_ads)
    
    ads_selected_list.append(ad_nbr)
    
    reward = dataset.values[trial_ctr, ad_nbr]
    
    total_reward = total_reward + reward

print("Total rewards = ", total_reward)

Total rewards =  1227


In [8]:
pd.Series(ads_selected_list).head(1000).value_counts(normalize=True)

3    0.114
6    0.106
8    0.106
0    0.106
2    0.102
4    0.100
5    0.096
9    0.095
1    0.091
7    0.084
dtype: float64

In [9]:
pd.Series(ads_selected_list).sample(1000).value_counts(normalize=True)

3    0.117
8    0.111
5    0.109
2    0.105
4    0.100
7    0.097
6    0.096
9    0.093
0    0.088
1    0.084
dtype: float64

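Whether we look at the first 1000 trials or at a random sample of 1000 trials, the selections are spread almost evenly across the 10 ads. Random selection never learns which ad performs best, so it cannot find the optimal ad.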
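
Uniform random selection shows each ad with probability 1/10, so its expected reward per trial is simply the average click rate across all ads. The run above earned 1227 clicks in 10000 trials, about 0.12 per trial. The arithmetic can be sketched as follows; the click rates and the helper `expected_random_reward` are hypothetical and only illustrate the gap between a random policy and always showing the best ad.

```python
def expected_random_reward(click_rates, n_trials):
    """Expected total clicks when every ad is equally likely to be shown."""
    return n_trials * sum(click_rates) / len(click_rates)

# Hypothetical click rates, not taken from the real dataset
rates = [0.05, 0.10, 0.30]

expected = expected_random_reward(rates, 10000)

# Upper limit in expectation: always showing the ad with the highest rate
best_possible = 10000 * max(rates)

print("Random selection (expected):", round(expected))
print("Always showing the best ad (expected):", round(best_possible))
```

The difference between the two numbers is the reward that a learning strategy such as UCB tries to recover.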

----------------------------
#### Implementing UCB
----------------------
The idea behind UCB is very simple:

- Select the action (arm) that has the highest sum of average reward and upper confidence bound
- Pull the arm and receive a reward
- Update the arm's reward and confidence bound

But how do we calculate UCB?

We can calculate UCB using the formula $c \sqrt{\frac{\ln t}{N_t(a)}}$, where $N_t(a)$ is the number of times the arm has been pulled and $t$ is the total number of rounds so far.

So, in UCB, we select an arm with the following formula:

$$ \large A_{t} \doteq \underset{a}{\arg \max }\left[Q_{t}(a)+c \sqrt{\frac{\ln t}{N_{t}(a)}}\right]$$

where 

- $\ln t$ denotes the natural logarithm of $t$ (the number that $e \approx 2.71828$ would have to be raised to in order to equal $t$)

- $N_t(a)$ denotes the number of times that action $a$ has been selected prior to time t (the denominator), and 

- the number c > 0 controls the degree of exploration. 

- If $N_t(a) = 0$, then $a$ is considered to be a `maximizing` action.
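
As a worked example of the formula, the sketch below scores three arms. The function `ucb_scores` and the input values are hypothetical. The default $c = \sqrt{2}$ is chosen because the notebook's implementation uses `math.sqrt(2 * math.log(t) / N)`, which is the same expression with $c = \sqrt{2}$.

```python
import math

def ucb_scores(q_values, counts, t, c=math.sqrt(2)):
    """Return Q_t(a) + c * sqrt(ln t / N_t(a)) for every arm.

    An arm that has never been pulled (N_t(a) = 0) gets an infinite
    score, so it is treated as a maximizing action.
    """
    scores = []
    for q, n in zip(q_values, counts):
        if n == 0:
            scores.append(math.inf)
        else:
            scores.append(q + c * math.sqrt(math.log(t) / n))
    return scores

# Hypothetical state after 110 rounds: arm 2 has not been tried yet
scores = ucb_scores(q_values=[0.5, 0.4, 0.0], counts=[100, 10, 0], t=110)

# Index of the arm with the highest score
best = max(range(len(scores)), key=scores.__getitem__)

print(scores)
print("Selected arm:", best)
```

Arm 1 has a lower average reward than arm 0 but has been pulled only 10 times, so its bonus is larger and its score ends up higher. The untried arm 2 is selected before either of them.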


In [11]:
import math

In [12]:
nbr_of_trials = 10000
number_of_ads = 10

ads_selected_list     = []
numbers_of_selections = [0] * number_of_ads
sums_of_reward        = [0] * number_of_ads

total_reward          = 0

for trial_ctr in range(0, nbr_of_trials):

    ad              = 0
    max_upper_bound = 0
    
    for i in range(0, number_of_ads):
        
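        if (numbers_of_selections[i] > 0):
            # Ad tried at least once: average reward plus exploration bonus
            average_reward = sums_of_reward[i] / numbers_of_selections[i]
            delta_i        = math.sqrt(2 * math.log(trial_ctr+1) / numbers_of_selections[i])
            upper_bound    = average_reward + delta_i
        else:
            # Untried ad: infinite bound, so it is selected before any tried ad
            upper_bound = math.inf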
            
        if upper_bound > max_upper_bound:
            max_upper_bound = upper_bound
            ad = i
            
    ads_selected_list.append(ad)
    
    numbers_of_selections[ad] += 1
    
    reward = dataset.values[trial_ctr, ad]
    
    sums_of_reward[ad] += reward
    
    total_reward += reward

In [13]:
pd.Series(ads_selected_list).sample(1000).value_counts(normalize=True)

4    0.564
7    0.113
0    0.088
8    0.045
6    0.043
1    0.042
3    0.041
2    0.032
5    0.017
9    0.015
dtype: float64
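
The sample above draws from all 10000 trials, including the early exploratory ones. To check whether UCB has settled on one ad, it can help to look only at the most recent trials. The sketch below wraps the same selection rule in a hypothetical function `run_ucb` and replays it on synthetic data. The click rates, the seed and all names are assumptions, so the numbers it prints say nothing about `Ads_Optimisation.csv`.

```python
import numpy as np

def run_ucb(rewards, c=np.sqrt(2)):
    """Replay UCB over a (n_trials, n_arms) matrix of 0/1 rewards."""
    n_trials, n_arms = rewards.shape
    counts = np.zeros(n_arms, dtype=int)   # times each arm was selected
    sums = np.zeros(n_arms)                # total reward per arm
    selected = np.empty(n_trials, dtype=int)

    for t in range(n_trials):
        untried = np.flatnonzero(counts == 0)
        if untried.size > 0:
            # Try every arm once before using the bound
            arm = int(untried[0])
        else:
            bounds = sums / counts + c * np.sqrt(np.log(t + 1) / counts)
            arm = int(np.argmax(bounds))

        selected[t] = arm
        counts[arm] += 1
        sums[arm] += rewards[t, arm]

    return selected, int(sums.sum())

# Synthetic data with hypothetical click rates; arm 2 is the best arm
rng = np.random.default_rng(0)
rates = np.array([0.1, 0.2, 0.6])
rewards = (rng.random((10000, 3)) < rates).astype(int)

selected, total = run_ucb(rewards)

# Share of selections per arm in the last 1000 trials only
tail_share = np.bincount(selected[-1000:], minlength=3) / 1000

print("Total reward:", total)
print("Selection share in the last 1000 trials:", tail_share)
```

On the notebook's own results, the equivalent check would be `pd.Series(ads_selected_list).tail(1000).value_counts(normalize=True)`.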