<h1>
    <center>
       The Thompson Sampling model
    </center>
</h1>

# Determining the machine with the highest chance of winning

The Thompson Sampling model will be used to determine which of the machines offers the highest chance of winning. This algorithm draws random values from the distribution presented below:

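\begin{equation}
    x \sim \mathrm{Beta}(a, b)
\end{equation}
where:
* $x$ is a random draw from the Beta distribution;
* $\mathrm{Beta}$ is the Beta distribution;
* $a$ is the first shape parameter (the machine's number of wins plus one);
* $b$ is the second shape parameter (the machine's number of losses plus one).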

## Importing the libraries 

In [1]:
import numpy as np

## Setting conversion rates and the number of samples

Before going further, one point is essential. This simulation is meant to mimic a real-life situation. In reality, every slot machine gives some chance of winning, and some machines give a higher chance than others, so the simulated environment needs a predefined winning rate for each machine. Keep in mind, however, that the AI does not know these rates. It cannot simply read them and judge which machine is best; it has to learn that from the rewards it observes.

In [49]:
conversionRates = [0.15, 0.04, 0.13, 0.11, 0.05]
N = 10000
d = len(conversionRates)

## Creating the training dataset

In [50]:
X = np.zeros((N,d))
for i in range(N):
    for j in range(d):
        if np.random.rand() < conversionRates[j]:
            X[i][j] = 1
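
The nested loop above fills the reward matrix one entry at a time. As a sketch of an equivalent construction, the same kind of matrix can be built in a single vectorized comparison. The seeded generator `rng` is an assumption added here for reproducibility; the notebook itself uses the unseeded `np.random.rand()`.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed, not in the original notebook

conversionRates = np.array([0.15, 0.04, 0.13, 0.11, 0.05])
N = 10000
d = len(conversionRates)

# Draw an (N, d) array of uniform numbers and compare each column with its rate.
# Broadcasting applies conversionRates[j] to every row of column j.
X = (rng.random((N, d)) < conversionRates).astype(float)

# The column means should land close to the predefined conversion rates.
print(X.mean(axis=0))
```

Each row of `X` is one round, and each column records whether that machine would have paid out in that round.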

## Counters for wins and losses

In [51]:
nPosReward = np.zeros(d)
nNegReward = np.zeros(d)

## Selecting the best slot machine with the Beta distribution and updating its wins and losses

In [52]:
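for i in range(N):
    selected = 0
    maxRandom = 0
    # Draw one random value per machine and keep the machine with the highest draw
    for j in range(d):
        randomBeta = np.random.beta(nPosReward[j] + 1, nNegReward[j] + 1)
        if randomBeta > maxRandom:
            selected = j
            maxRandom = randomBeta
    # Play the selected machine in round i and record a win or a loss
    if X[i][selected] == 1:
        nPosReward[selected] += 1
    else:
        nNegReward[selected] += 1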

## Showing which slot machine is considered the best

In [53]:
nSelected = nPosReward + nNegReward
for i in range(d):
    print('Machine number ' + str(i + 1) + ' was selected ' +
    str(nSelected[i]) + ' times')
print('Conclusion: Best machine is machine number ' + str(np.argmax(nSelected) + 1))

Machine number 1 was selected 7816.0 times
Machine number 2 was selected 174.0 times
Machine number 3 was selected 1690.0 times
Machine number 4 was selected 264.0 times
Machine number 5 was selected 56.0 times
Conclusion: Best machine is machine number 1
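
The selection counts show which machine was played most, but not which win rate the algorithm inferred for each one. The sketch below reruns the same procedure in a self-contained form and also reports the posterior mean, $(\text{wins} + 1) / (\text{wins} + \text{losses} + 2)$, for every machine. The function name `thompson_sampling`, the seed, and the vectorized dataset are assumptions made for this illustration, so the exact counts will differ from the output above.

```python
import numpy as np

def thompson_sampling(X, rng):
    """Run Thompson Sampling over the reward matrix X (rounds x machines)."""
    N, d = X.shape
    wins = np.zeros(d)
    losses = np.zeros(d)
    for i in range(N):
        # One Beta draw per machine; play the machine with the largest draw
        draws = rng.beta(wins + 1, losses + 1)
        selected = int(np.argmax(draws))
        if X[i, selected] == 1:
            wins[selected] += 1
        else:
            losses[selected] += 1
    return wins, losses

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
conversionRates = np.array([0.15, 0.04, 0.13, 0.11, 0.05])
X = (rng.random((10000, len(conversionRates))) < conversionRates).astype(float)

wins, losses = thompson_sampling(X, rng)
nSelected = wins + losses

# Posterior mean of Beta(wins + 1, losses + 1) for each machine
estimatedRates = (wins + 1) / (nSelected + 2)

for j in range(len(conversionRates)):
    print(f'Machine {j + 1}: selected {int(nSelected[j])} times, '
          f'estimated rate {estimatedRates[j]:.3f}')
```

The estimate for the most-played machine is based on many rounds and should sit close to its true rate. The estimates for rarely played machines rest on few observations and are correspondingly noisy.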
