# Task model

## Bandit task

Parameters:
* Number of options ($N$)
* Reward probability of each option, i.e. the probability that choosing option $i$ yields a success ($\{p_{reward}(i)\}_{i=1,\dots,N}$)
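The task can be simulated in a few lines. The sketch below assumes Bernoulli outcomes: option $i$ pays 1 with probability $p_{reward}(i)$ and 0 otherwise. The class name `Bandit` and its method `draw` are hypothetical and do not come from the original notebook.

```python
import numpy as np


class Bandit:
    """Hypothetical N-armed bandit with Bernoulli rewards (an assumption)."""

    def __init__(self, p_reward, seed=None):
        self.p_reward = np.asarray(p_reward, dtype=float)  # p_reward(i) for each option
        self.n_option = len(self.p_reward)                  # N
        self.rng = np.random.default_rng(seed)

    def draw(self, i):
        # Returns 1 (success) with probability p_reward[i], otherwise 0
        return int(self.rng.random() < self.p_reward[i])


bandit = Bandit([0.2, 0.5, 0.8], seed=0)
outcomes = [bandit.draw(2) for _ in range(1000)]  # empirical mean should be close to 0.8
```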

# Cognitive model

## Random

* Decision rule (no free parameter)

\begin{equation}
p_{choice}(i) = \dfrac{1}{N}
\end{equation}

* Updating rule

None: the model does not learn from outcomes.
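Because every option is picked with probability $1/N$, the expected reward per trial of the random model is the average reward probability, $\frac{1}{N}\sum_{i=1}^{N} p_{reward}(i)$, which makes it a natural chance baseline. The sketch below checks this by simulation, assuming Bernoulli rewards; `random_agent_reward` is a hypothetical helper.

```python
import numpy as np


def random_agent_reward(p_reward, n_trials, seed=None):
    """Average reward earned by uniform random choice (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p_reward = np.asarray(p_reward, dtype=float)
    choices = rng.integers(len(p_reward), size=n_trials)  # uniform over the N options
    rewards = rng.random(n_trials) < p_reward[choices]    # Bernoulli outcomes
    return rewards.mean()


p = [0.2, 0.5, 0.8]
expected = np.mean(p)  # analytic chance level: (0.2 + 0.5 + 0.8) / 3 = 0.5
simulated = random_agent_reward(p, 100_000, seed=1)
```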

## Rescorla-Wagner

* Decision rule ($\tau$)

\begin{equation}
p_{choice}(i) = \dfrac{\exp (v(i)/\tau)}{\sum_{j=1}^{N} \exp (v(j)/\tau)}
\end{equation}
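The softmax rule can be written directly. In this sketch (the function name `softmax` is our own), subtracting the largest scaled value before exponentiating leaves the probabilities unchanged but avoids overflow. A low temperature $\tau$ concentrates the choice on the highest-valued option, while a high $\tau$ makes choices nearly uniform.

```python
import numpy as np


def softmax(values, tau):
    """p_choice(i) = exp(v(i)/tau) / sum_j exp(v(j)/tau)."""
    z = np.asarray(values, dtype=float) / tau
    z -= z.max()  # shift for numerical stability; probabilities are unchanged
    e = np.exp(z)
    return e / e.sum()


v = [0.2, 0.5, 0.8]
p_greedy = softmax(v, tau=0.01)  # almost all mass on option 2
p_flat = softmax(v, tau=100.0)   # close to 1/3 for every option
```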

* Updating rule ($\alpha$)

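\begin{equation}
v^{t+1}(i) = v^t(i) + \alpha(s - v^t(i))
\end{equation}

where $i$ is the option chosen at trial $t$, $s$ is the outcome ($s = 1$ if the choice was rewarded, $s = 0$ otherwise) and $\alpha$ is the learning rate. The values of the options that were not chosen are left unchanged.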
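As a worked example, assume rewards are coded as $s = 1$ and omissions as $s = 0$ (as with `success` in the code). With $\alpha = 0.5$ and $v^0(i) = 0$, two rewards take the value from $0$ to $0.5$ to $0.75$: each step closes half of the remaining gap to the outcome. A following omission halves it to $0.375$. The helper name `rw_update` is hypothetical.

```python
def rw_update(v, s, alpha):
    """One Rescorla-Wagner step: v <- v + alpha * (s - v)."""
    prediction_error = s - v
    return v + alpha * prediction_error


v = 0.0
trace = []
for s in [1, 1, 0]:  # reward, reward, no reward
    v = rw_update(v, s, alpha=0.5)
    trace.append(v)
# trace == [0.5, 0.75, 0.375]
```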

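```python
import numpy as np


class Random:
    """Chooses uniformly at random among the options and never learns."""

    def __init__(self, n_option):
        self.options = np.arange(n_option)

    def choose(self):
        return self.decision_rule()

    def learn(self, i, success):
        self.updating_rule(i, success)

    def decision_rule(self):
        # Uniform choice: each option has probability 1/N
        return np.random.choice(self.options)

    def updating_rule(self, i, success):
        # The random model has no updating rule
        pass
```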

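```python
import numpy as np


class RL(Random):
    """Rescorla-Wagner learner with a softmax decision rule."""

    def __init__(self, n_option, learning_rate, temp):
        super().__init__(n_option)
        self.values = np.zeros(n_option)    # v(i), initialised at 0
        self.learning_rate = learning_rate  # alpha
        self.temp = temp                    # tau

    def decision_rule(self):
        # Softmax over values; subtracting the max avoids overflow in exp
        z = (self.values - self.values.max()) / self.temp
        p = np.exp(z) / np.sum(np.exp(z))
        return np.random.choice(self.options, p=p)

    def updating_rule(self, i, success):
        # Delta rule: move v(i) toward the observed outcome
        self.values[i] += self.learning_rate * (success - self.values[i])
```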
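To run a model on the task, a simple trial loop is enough. The sketch below is a hypothetical driver, `run_session`, that takes the agent's `choose` and `learn` methods as plain callables and assumes Bernoulli rewards. It records the choice and outcome of every trial.

```python
import numpy as np


def run_session(choose, learn, p_reward, n_trials, seed=None):
    """Hypothetical driver: an agent plays n_trials of a Bernoulli bandit.

    choose()          -> index of the chosen option
    learn(i, success) -> updates the agent after observing the outcome
    """
    rng = np.random.default_rng(seed)
    p_reward = np.asarray(p_reward, dtype=float)
    choices = np.empty(n_trials, dtype=int)
    successes = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        i = choose()
        success = int(rng.random() < p_reward[i])  # reward with probability p_reward(i)
        learn(i, success)
        choices[t], successes[t] = i, success
    return choices, successes
```

With the classes above, one could write, for instance, `agent = RL(n_option=2, learning_rate=0.1, temp=0.1)` followed by `run_session(agent.choose, agent.learn, [0.2, 0.8], 200)`. Note that the agents draw their choices from NumPy's global random state, so `seed` here only controls the rewards.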