# Multi-armed Bandits
An implementation of the $k$-armed bandit problem with greedy and $\epsilon$-greedy action selection, as described in Chapter 2 of _Reinforcement Learning: An Introduction, 2nd Edition (2018)_ by Richard S. Sutton and Andrew G. Barto.

In [2]:
%load_ext autoreload
%autoreload 2

import numpy as np
import scipy as sp


In [7]:
class KArmedBandit:
    """
    Class implementing a k-armed bandit.
    
    For each of the k actions, the expected action is initialised from a
    Gaussian distribution with mean 0 and unit variance.
    
    The reward for each action is then selected from a Gaussian distribution
    with mean of the expected action and unit variance.
    
    Args:
        k (int, default=10): Number of actions (arms)
    Atributes:
        q_a (1D array of floats): List of expected action values with length k
    """
    
    def __init__(self, k=10):
        """
        Initialise expected action values
        """
        self.k = k
        self.q_a = np.random.normal(0.0, 1.0, k)
    
    def reward(self, action):
        """
        Generate reward for a particular action.
        
        Args:
            action (int): Integer action from 0 to k-1
        """
        return np.random.normal(self.q_a[action], 1.0)
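
The $\epsilon$-greedy method mentioned in the introduction can be sketched against this class as follows. This is a minimal sketch, not necessarily the implementation used in the rest of the notebook: the function name `run_epsilon_greedy`, its parameters, and the choice of incremental sample-average estimates with random tie-breaking are assumptions made for illustration. It takes any callable that maps an action index to a reward, so the `reward` method of a `KArmedBandit` instance can be passed in directly.

```python
import numpy as np


def run_epsilon_greedy(reward_fn, k=10, steps=1000, epsilon=0.1, seed=None):
    """Run one epsilon-greedy agent (hypothetical helper, for illustration).

    reward_fn: callable mapping an action index (0..k-1) to a float reward,
        for example the `reward` method of a KArmedBandit instance.
    Returns (q_est, counts, rewards).
    """
    rng = np.random.default_rng(seed)
    q_est = np.zeros(k)              # current estimate Q(a) of each action value
    counts = np.zeros(k, dtype=int)  # number of times each action was selected
    rewards = np.empty(steps)        # reward received at each step

    for t in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            action = int(rng.integers(k))
        else:
            # Exploit: pick a greedy action, breaking ties at random.
            best = np.flatnonzero(q_est == q_est.max())
            action = int(rng.choice(best))

        r = reward_fn(action)
        counts[action] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / N.
        q_est[action] += (r - q_est[action]) / counts[action]
        rewards[t] = r

    return q_est, counts, rewards
```

With the class above, a run might look like `bandit = KArmedBandit(k=10)` followed by `q_est, counts, rewards = run_epsilon_greedy(bandit.reward, k=bandit.k, epsilon=0.1)`. Setting `epsilon=0.0` gives the purely greedy method. Note that `seed` only controls the agent's own choices; the bandit draws its rewards from NumPy's global random state.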