# Online Advertising as a Multi-armed Bandit (MAB)

Imagine you are a digital marketer running an online advertising campaign. You have several ad variations that you can display to users, each with its own click-through rate (CTR) or conversion rate. Your goal is to maximize user engagement or conversions by selecting the most effective ad variation.

Let's assume you have three ad variations, represented by arms A1, A2 and A3. Each arm has an associated probability distribution, denoted Q1, Q2 and Q3, that models how users respond to that ad variation (clicks or conversions). At each time step 't', you choose an ad variation 'A' to display to users. Users interact with it, and you observe the outcome, which is a reward drawn from the distribution Q(A) of the chosen ad variation.
 
Assume that the three probability distributions Q1, Q2 and Q3 are normal distributions with means of {7, 10, 6} and standard deviations of {0.45, 0.65, 0.35}, respectively. Your objective is to maximize the cumulative reward (here, the number of clicks) over a series of ad displays, say 10,000.
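
To make this setup concrete, here is a minimal sketch of how a single reward could be drawn for a chosen arm. It assumes NumPy's `Generator` API; the helper name `pull_arm` and the seed value are illustrative choices, not part of the original notebook. With enough draws, the sample mean of each arm should approach the means stated above.

```python
import numpy as np

# Parameters taken from the text: means and standard deviations of Q1, Q2, Q3
mu = [7, 10, 6]
sigma = [0.45, 0.65, 0.35]


def pull_arm(arm, rng):
    """Hypothetical helper: draw one reward for ad variation `arm` (0-based index)."""
    return rng.normal(mu[arm], sigma[arm])


# Seed chosen arbitrarily so the sketch is reproducible
rng = np.random.default_rng(seed=42)

# Draw 1,000 rewards per arm and compare the sample means with mu
samples = np.array([[pull_arm(arm, rng) for _ in range(1000)] for arm in range(3)])
sample_means = samples.mean(axis=1)
print("Sample means per arm:", sample_means)
```

In this sketch the second arm (index 1) has the highest mean reward, so an ideal strategy would display it as often as possible once it has been identified.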
 
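The code below implements an ε-greedy strategy to decide which ad variation to display at each time step: with probability ε it explores by picking an arm at random, and otherwise it exploits the arm with the highest estimated click-through rate.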


### Set up the MAB

In [7]:
import numpy as np

# Initialization
num_arms = 3
num_trials = 10000

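# Mean and standard deviation of each arm's normal reward distribution
mu = [7, 10, 6]
sigma = [0.45, 0.65, 0.35]

# Per-arm statistics
counts = np.zeros(num_arms)   # number of times each arm has been displayed
rewards = np.zeros(num_arms)  # cumulative reward collected from each arm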

# Select an initial arm (reassigned on the first iteration of the loop)
a = np.random.choice(num_arms)

### $\epsilon$-greedy strategy

In [8]:
# Epsilon for epsilon-greedy algorithm
eps = 0.1

for t in range(num_trials):
    # Select arm
    if np.random.rand() > eps:  # Exploit: pick the arm with the highest estimated mean reward
        # The small constant avoids division by zero for arms not yet displayed
        a = np.argmax(rewards / (counts + 1e-5))
    else:  # Explore
        a = np.random.choice(num_arms)

    # Simulate the reward observed for the displayed ad
    reward = np.random.normal(mu[a], sigma[a])
    
    # Update counters
    counts[a] += 1
    rewards[a] += reward

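# Estimated click-through rates (mean observed reward per arm);
# np.maximum guards against division by zero if an arm was never displayed
estimates = rewards / np.maximum(counts, 1)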

### Print estimated click-through rates

In [9]:
print("Estimated click-through rates: ", estimates)

Estimated click-through rates:  [ 6.99670105 10.00601168  6.00450476]
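
The estimates alone do not show how the displays were split between exploration and exploitation. As a rough sketch, the same ε-greedy loop can be wrapped in a function that also returns the number of displays per arm and the total reward. The function name `run_epsilon_greedy`, the seeded `Generator`, and the incremental mean update are assumptions made for this illustration, not part of the notebook above. The incremental update gives the same running average as dividing the cumulative reward by the count.

```python
import numpy as np


def run_epsilon_greedy(mu, sigma, num_trials=10_000, eps=0.1, seed=0):
    """Hypothetical wrapper: run epsilon-greedy on normally distributed arms.

    Returns (estimates, counts, total_reward).
    """
    rng = np.random.default_rng(seed)  # seed is an arbitrary choice
    num_arms = len(mu)
    counts = np.zeros(num_arms, dtype=int)  # displays per arm
    estimates = np.zeros(num_arms)          # running mean reward per arm
    total_reward = 0.0

    for _ in range(num_trials):
        if rng.random() > eps:
            arm = int(np.argmax(estimates))    # exploit the current best estimate
        else:
            arm = int(rng.integers(num_arms))  # explore a uniformly random arm

        reward = rng.normal(mu[arm], sigma[arm])
        counts[arm] += 1
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = run_epsilon_greedy([7, 10, 6], [0.45, 0.65, 0.35])
print("Estimated click-through rates:", estimates)
print("Displays per arm:", counts)
print("Average reward per display:", total_reward / counts.sum())
```

With ε = 0.1, roughly 90% of displays are exploitation steps, so most displays should go to the arm with the highest estimated mean once it has been tried at least once. The average reward per display stays somewhat below the best arm's mean of 10 because exploration keeps showing the weaker ads.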
