# Reinforcement learning

Suppose you are in Las Vegas and have $1,000 to play *bandits* (slot machines). Each play costs one coin, and one coin is worth one dollar.

There are ten bandits, each with a different chance of winning.

What would be suitable strategies to maximize your profit?

## Solution via *exploration*

Implement a single bandit. The bandits have different chances of winning (can you guess how it works?).

In [None]:
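import random

def bandit(i):
    """Play machine i (0-9) once and return the payout for one coin."""
    # Machine i wins with probability (2*i + 1) / 20: 5%, 15%, ..., 95%.
    threshold = (i*2+1)/20
    if random.random() < threshold:
        return 2  # a win pays out 2 coins
    return 0  # a loss pays out nothing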

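`bandit(i)` simulates slot machine number `i` (`0` to `9`). The chances of winning are `5%`, `15%`, ..., `95%`, and a win pays 2 coins for each coin played. Averaged over all ten machines the game is *fair*: the mean chance of winning is 50%, so the expected payout equals the stake.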
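
The fairness claim can be checked without playing a single coin. The following is a minimal sketch; the names `win_prob` and `expected_payout` are introduced here for illustration and do not appear elsewhere in the notebook.

```python
# Win probability of each machine, using the same formula as bandit(i).
win_prob = [(i * 2 + 1) / 20 for i in range(10)]

# A win pays 2 coins, so the expected payout per coin is 2 * probability.
expected_payout = [2 * p for p in win_prob]

# Mean expected payout across all ten machines (close to 1 coin per coin).
mean_payout = sum(expected_payout) / len(expected_payout)
print(win_prob)
print(mean_payout)
```

Only machines `5` to `9` pay back more than one coin per coin on average, which is why it matters where the money is spent.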

Start the experiment and play each machine 100 times:

In [None]:
import numpy as np

random.seed(42)
money = np.array([0]*10)
for b in range(10):
    for d in range(100):
        money[b] += bandit(b)

In [None]:
sum(money)

In [None]:
import pandas as pd
pd.DataFrame(money).plot.bar()

## Solution via *exploitation*

First play each machine 10 times, keeping track of how much money each one pays out and how many coins it has taken:

In [None]:
random.seed(42)
money = np.array([0]*10)
coins = np.array([0]*10)
for b in range(10):
    for d in range(10):
        money[b] += bandit(b)
        coins[b] += 1

Estimate the average return per coin for each machine:

In [None]:
money/coins

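Favour the machines that yielded higher returns: rank them by average return, then give the worst machine 1 coin, the next 3, and so on up to 19 coins for the best: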

In [None]:
c = 1
for b in (money/coins).argsort():
    for d in range(c):
        money[b] += bandit(b)
        coins[b] += 1
    c += 2

Check that the number of coins spent so far is correct:

In [None]:
sum(coins)
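
The expected total can be worked out by hand. As a small sketch (the name `allocation` is hypothetical), the weighted round hands out an increasing odd number of coins per rank:

```python
# Coins per machine in one weighted round, ordered from worst to best rank.
# This mirrors the loop above: c starts at 1 and grows by 2 per machine.
allocation = [1 + 2 * rank for rank in range(10)]

print(allocation)       # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(sum(allocation))  # 100
```

Together with the 100 coins from the first pass (10 per machine), the check should therefore report 200 coins.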

Recalculate the average return per coin:

In [None]:
money/coins

Play the remaining $800 in eight more rounds of 100 coins each, re-ranking the machines before every round:

In [None]:
for r in range(8):
    c = 1
    for b in (money/coins).argsort():
        for d in range(c):
            money[b] += bandit(b)
            coins[b] += 1
        c += 2

All the money has been spent now:

In [None]:
sum(coins)

Did you make more profit?

In [None]:
sum(money)

Works as expected!

In [None]:
money/coins

Exploration yields better estimates of each individual machine's chance of winning, but exploitation gives a higher profit. The number of coins spent per machine shows how unevenly the second strategy samples them:

In [None]:
coins
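
A single seed could be a lucky one, so it may be worth repeating both experiments over several seeds. The following is a rough sketch under stated assumptions: `play_uniform` and `play_weighted` are hypothetical helper names that repackage the cells above, `bandit` is redefined so the block runs on its own, and 50 seeds is an arbitrary choice.

```python
import random
import numpy as np

def bandit(i):
    """Same machine as above: pays 2 with probability (2*i + 1) / 20."""
    return 2 if random.random() < (i * 2 + 1) / 20 else 0

def play_uniform(n_per_machine=100):
    """Exploration only: spend the same number of coins on every machine."""
    return sum(bandit(b) for b in range(10) for _ in range(n_per_machine))

def play_weighted(rounds=9, warmup=10):
    """Warm-up pass, then weighted rounds that favour the best estimates."""
    money = np.zeros(10)
    coins = np.zeros(10)
    for b in range(10):
        for _ in range(warmup):
            money[b] += bandit(b)
            coins[b] += 1
    for _ in range(rounds):
        c = 1
        # The ranking is fixed at the start of each round.
        for b in (money / coins).argsort():
            for _ in range(c):
                money[b] += bandit(b)
                coins[b] += 1
            c += 2
    return money.sum(), coins.sum()

uniform, weighted = [], []
for seed in range(50):
    random.seed(seed)
    uniform.append(play_uniform())
    random.seed(seed)
    payout, spent = play_weighted()  # spent is 1000 coins in every run
    weighted.append(payout)

print("mean payout, uniform play: ", np.mean(uniform))
print("mean payout, weighted play:", np.mean(weighted))
```

With a perfect ranking, one weighted round would return 133 coins per 100 played in expectation (the sum of (2r+1)^2 / 10 over the ranks r = 0, ..., 9), so a clear gap between the two averages is to be expected.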