<h1>
    <center>
        AI for Sales and Advertising – Sell like the Wolf of AI Street
    </center>
</h1>

# Let's start coding

## First, import the following three required libraries:

1. numpy, which you will use to build the environment matrix.
2. matplotlib.pyplot, which you will use to plot the histogram.
3. random, which you will use to generate the random numbers needed for the simulation.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import random 

## Then set the parameters for the number of customers and strategies

1. N = 10,000 customers.
2. d = 9 strategies.

In [3]:
N = 10000
conversion_rates = [0.05, 0.13, 0.09, 0.16, 0.11, 0.04, 0.20, 0.08, 0.01]
d = len(conversion_rates)

In [4]:
# Build the environment: X[i][j] = 1 if customer i converts under strategy j
X = np.zeros((N, d))
for i in range(N):
    for j in range(d):
        if np.random.rand() < conversion_rates[j]:
            X[i][j] = 1
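The same environment can also be built without explicit loops. The following is a minimal sketch, assuming NumPy's `default_rng` generator with a fixed seed (the seed and the integer dtype are choices made here for reproducibility and are not part of the original cell). It draws one uniform number per customer and strategy, then compares each column with that strategy's conversion rate.

```python
import numpy as np

# Same parameters as in the notebook cells.
N = 10000
conversion_rates = [0.05, 0.13, 0.09, 0.16, 0.11, 0.04, 0.20, 0.08, 0.01]
d = len(conversion_rates)

# Assumption: a seeded Generator, used only to make this sketch reproducible.
rng = np.random.default_rng(0)

# One uniform draw per (customer, strategy) pair; broadcasting compares each
# column against the conversion rate of the corresponding strategy.
X = (rng.random((N, d)) < np.array(conversion_rates)).astype(int)

# The column means should land close to the conversion rates.
print(X.mean(axis=0).round(3))
```

With 10,000 simulated customers, each column mean should sit within a percentage point or so of its conversion rate, which is a quick way to check that the environment was built as intended.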

Now that the environment is ready, you can start implementing the AI. To do so, the first step is to introduce and initialize the variables you will need for the implementation:

1. **strategies_selected_rs**: A list that will contain the strategies selected over the rounds by the Random Selection algorithm. Initialize it as an empty list.

2. **strategies_selected_ts**: A list that will contain the strategies selected over the rounds by the Thompson Sampling AI model. Initialize it as an empty list.

3. **total_reward_rs**: The total reward accumulated over the rounds by the Random Selection algorithm. Initialize it as 0.

4. **total_reward_ts**: The total reward accumulated over the rounds by the Thompson Sampling AI model. Initialize it as 0.

5. **numbers_of_rewards_1**: A list of 9 elements which will contain, for each strategy, the number of times it received a 1 reward. Initialize it as a list of 9 zeros.

6. **numbers_of_rewards_0**: A list of 9 elements which will contain, for each strategy, the number of times it received a 0 reward. Initialize it as a list of 9 zeros.

In [None]:
strategies_selected_rs = []
strategies_selected_ts = []
total_reward_rs = 0
total_reward_ts = 0
numbers_of_rewards_1 = [0] * d
numbers_of_rewards_0 = [0] * d
print(numbers_of_rewards_1)

In [None]:
for n in range(0, N):
    # random selection
    strategy_rs = random.randrange(d)
    strategies_selected_rs.append(strategy_rs)
    reward_rs = X[n, strategy_rs]
    total_reward_rs = total_reward_rs + reward_rs

Next, you need to implement Thompson Sampling by following Steps 1, 2, and 3 exactly as described earlier. I recommend looking at these steps again before coding the next part, and trying to code it yourself before reading my solution. That's the best way to progress; practice makes perfect.

You should implement Thompson Sampling step by step, starting with the first step. Let's remind ourselves of it:


**Step 1**: For each strategy i, take a random draw from the following distribution:

\begin{equation}
\theta_i(n)\sim \beta(N_i^1(n)+1,N_i^0(n)+1)
\end{equation}

where:

1. $N_i^1(n)$ is the number of times the strategy i has received a 1 reward up to round n;
2. $N_i^0(n)$ is the number of times the strategy i has received a 0 reward up to round n.

Let's see how Step 1 is implemented.

Code a second for loop that iterates over the 9 strategies, because you have to take a random draw from the Beta distribution of each of the 9 strategies.

The random draws from the Beta distributions are generated by the **betavariate()** function taken from the **random** library, which you imported at the beginning.

In [1]:
# Thompson Sampling
#    strategy_ts = 0
#    max_random = 0
#    for i in range(0, d):
#        random_beta = random.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
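To see Step 1 in isolation, here is a small runnable sketch. The counts below are hypothetical values chosen for illustration, and the fixed seed is an assumption made only for reproducibility. With no observations, the draw comes from Beta(1, 1), which is uniform on [0, 1]. As observations accumulate, the draws concentrate around the observed conversion rate.

```python
import random

random.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical counts for three strategies after some rounds (illustrative only).
numbers_of_rewards_1 = [0, 2, 40]
numbers_of_rewards_0 = [0, 8, 160]

# Step 1: one draw per strategy from Beta(N_i^1 + 1, N_i^0 + 1).
draws = [
    random.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
    for i in range(len(numbers_of_rewards_1))
]
print(draws)
```

The third strategy has 200 observations at a 20% success rate, so its draw will usually fall close to 0.2, while the first strategy's draw can land anywhere between 0 and 1.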

Now implement Step 2, that is:

**Step 2**: Select the strategy $s(n)$ that has the highest $\theta_i(n)$:

\begin{equation}
    s(n) = \underset{i \in \{1,\dots,9\}}{\operatorname{argmax}}\ \theta_i(n)
\end{equation}

To implement Step 2, you stay in the second for loop, which iterates over the 9 strategies, and use a simple trick with an if condition to find the highest $\theta_i(n)$.

The trick is the following: while iterating over the strategies, if you find a random draw (**random_beta**) that is higher than the maximum of the random draws obtained so far (**max_random**), then that maximum becomes equal to that higher random draw, and **strategy_ts** is set to the index of the strategy that produced it.

In [2]:
# Thompson Sampling
#    strategy_ts = 0
#    max_random = 0
#    for i in range(0, d):
#        random_beta = random.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
#        if random_beta > max_random:
#            max_random = random_beta
#            strategy_ts = i
#    reward_ts = X[n, strategy_ts]
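The selection logic of Step 2 can be checked on its own with a short sketch. The draws below are hypothetical values, not output from the simulation. Starting `max_random` at 0 works because Beta draws always lie between 0 and 1. The one-liner using Python's built-in `max` is shown as an equivalent alternative, not as the notebook's method.

```python
# Hypothetical draws for the 9 strategies in a single round (illustrative values).
thetas = [0.07, 0.12, 0.10, 0.18, 0.09, 0.03, 0.22, 0.06, 0.02]

# Explicit loop, mirroring the notebook: track the best draw and its index.
strategy_ts = 0
max_random = 0
for i in range(len(thetas)):
    if thetas[i] > max_random:
        max_random = thetas[i]
        strategy_ts = i

# Equivalent one-liner: the index of the largest draw.
strategy_ts_alt = max(range(len(thetas)), key=lambda i: thetas[i])

print(strategy_ts, strategy_ts_alt)  # 6 6
```

Both versions return the first index that holds the maximum, so they agree even if two draws tie.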

And finally, let's implement Step 3, the easiest one:

**Step 3**: Update $N_i^1(n)$ and $N_i^0(n)$ for the selected strategy $i = s(n)$ according to the following conditions:
1. If the strategy selected $s(n)$ received a 1 reward:

$N_i^1(n) := N_i^1(n) + 1$

2. If the strategy selected $s(n)$ received a 0 reward:

$N_i^0(n) := N_i^0(n) + 1$

Implement that simply with the exact same two if conditions, translated into code.

In [3]:
# Thompson Sampling
#    strategy_ts = 0
#    max_random = 0
#    for i in range(0, d):
#        random_beta = random.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
#        if random_beta > max_random:
#            max_random = random_beta
#            strategy_ts = i
#    reward_ts = X[n, strategy_ts]
#    if reward_ts == 1:
#        numbers_of_rewards_1[strategy_ts] = numbers_of_rewards_1[strategy_ts] + 1
#    else:
#        numbers_of_rewards_0[strategy_ts] = numbers_of_rewards_0[strategy_ts] + 1
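As a standalone check of Step 3, the update can be wrapped in a small function. The helper name `update_counts` and the short variable names are hypothetical, introduced here only for illustration. The logic is the same pair of conditions as in the cell above.

```python
def update_counts(numbers_of_rewards_1, numbers_of_rewards_0, strategy, reward):
    """Hypothetical helper: increment the count that matches the observed reward."""
    if reward == 1:
        numbers_of_rewards_1[strategy] += 1
    else:
        numbers_of_rewards_0[strategy] += 1


ones = [0] * 9
zeros = [0] * 9
update_counts(ones, zeros, strategy=6, reward=1)
update_counts(ones, zeros, strategy=6, reward=0)
update_counts(ones, zeros, strategy=2, reward=0)
print(ones)   # [0, 0, 0, 0, 0, 0, 1, 0, 0]
print(zeros)  # [0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Only the counts of the selected strategy change in each round. The other strategies keep their current Beta distributions.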

Next, don't forget to add the strategy selected in Step 2 to our list of strategies (**strategies_selected_ts**), and also to compute the total reward accumulated over the rounds by Thompson Sampling (**total_reward_ts**).

In [4]:
# Thompson Sampling
#    strategy_ts = 0
#    max_random = 0
#    for i in range(0, d):
#        random_beta = random.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
#        if random_beta > max_random:
#            max_random = random_beta
#            strategy_ts = i
#    reward_ts = X[n, strategy_ts]
#    if reward_ts == 1:
#        numbers_of_rewards_1[strategy_ts] = numbers_of_rewards_1[strategy_ts] + 1
#    else:
#        numbers_of_rewards_0[strategy_ts] = numbers_of_rewards_0[strategy_ts] + 1
#    strategies_selected_ts.append(strategy_ts)
#    total_reward_ts = total_reward_ts + reward_ts

Then compute the final score, which is the relative return of Thompson Sampling with respect to our benchmark, which is Random Selection:
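A minimal sketch of that computation follows. It assumes the relative return is defined as the gain of Thompson Sampling over Random Selection, divided by the Random Selection total and expressed as a percentage. The function name `relative_return` is hypothetical, and the totals passed in are placeholders rather than results from the simulation.

```python
def relative_return(total_reward_ts, total_reward_rs):
    """Percentage gain of Thompson Sampling over Random Selection."""
    # Assumption: relative return = (TS - RS) / RS * 100.
    return (total_reward_ts - total_reward_rs) / total_reward_rs * 100


# Placeholder totals for illustration only, not results from the simulation.
print("Relative Return: {:.0f} %".format(relative_return(150, 100)))  # Relative Return: 50 %
```

In the notebook, you would call it with the `total_reward_ts` and `total_reward_rs` values accumulated by the main loop.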

## The final result

By executing this code, I obtained a final relative return of 91%. In other words, Thompson Sampling almost doubled the performance of my Random Selection benchmark. Not too bad!

Finally, plot a histogram of the selected strategies to check that Strategy 7 (at index 6) was the one most selected, since it is the one with the highest conversion rate. To do this, use the hist() function from the matplotlib library.

In [None]:
# Implementing Random Selection and Thompson Sampling
strategies_selected_rs = []
strategies_selected_ts = []
total_reward_rs = 0
total_reward_ts = 0
numbers_of_rewards_1 = [0] * d
numbers_of_rewards_0 = [0] * d
for n in range(0, N):
    # Random Selection
    strategy_rs = random.randrange(d)
    strategies_selected_rs.append(strategy_rs)
    reward_rs = X[n, strategy_rs]
    total_reward_rs = total_reward_rs + reward_rs
    # Thompson Sampling
    strategy_ts = 0
    max_random = 0
    for i in range(0, d):
        random_beta = random.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
        if random_beta > max_random:
            max_random = random_beta
            strategy_ts = i
    reward_ts = X[n, strategy_ts]
    if reward_ts == 1:
        numbers_of_rewards_1[strategy_ts] = numbers_of_rewards_1[strategy_ts] + 1
    else:
        numbers_of_rewards_0[strategy_ts] = numbers_of_rewards_0[strategy_ts] + 1
    strategies_selected_ts.append(strategy_ts)
    total_reward_ts = total_reward_ts + reward_ts
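The histogram check described earlier can be sketched as follows. The list `example_selected` is a placeholder standing in for `strategies_selected_ts`. The headless `Agg` backend, the bin edges, the axis labels, and the output file name are assumptions made so that the sketch runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

d = 9
# Placeholder selections for illustration; in the notebook, pass strategies_selected_ts.
example_selected = [6, 6, 3, 6, 1, 6, 6, 3, 6, 6]

# One bin per strategy index, centred on the integers 0..d-1.
bin_edges = [i - 0.5 for i in range(d + 1)]
counts, _, _ = plt.hist(example_selected, bins=bin_edges)
plt.title("Histogram of Selections")
plt.xlabel("Strategy")
plt.ylabel("Number of times the strategy was selected")
plt.savefig("selections_hist.png")  # in a notebook, plt.show() displays the plot instead
```

With the real `strategies_selected_ts` list, the tallest bar should sit at index 6 if Thompson Sampling has identified the strategy with the highest conversion rate.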