## Implement and simulate RL model 

This notebook implements a basic RL model on an n-armed bandit task.
The simulation should produce results that mimic the behavioral data including:

* trial: trial number in block
* stimulus: stimulus presented for this trial (1:x, x = set size)
* key_press: action for the trial (0 = J, 1 = K, 2 = L)
* key_answer: correct action for the trial (0 = J, 1 = K, 2 = L)
* correct: whether the response was correct
* set_size: set size of this block
* set: image folder used for this block
* img_num: image file used for this trial's stimulus
* iteration: how many times this stimulus has been seen so far
* delay: how many trials since last presentation of this stimulus
* reward_history: how many correct responses for this stimulus since block start
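
As a rough sketch of how the last three fields could be derived from a simulated trial sequence, the snippet below computes them with pandas. The helper name `add_history_columns` and the toy `trials` table are hypothetical. It also assumes that `iteration` starts at 1 on the first presentation, that `delay` is undefined (NaN) on the first presentation, and that `reward_history` counts only trials before the current one; the field descriptions above do not settle these details.

```python
import pandas as pd

def add_history_columns(df):
    """Add iteration, delay and reward_history columns to a trial table.

    Assumes one row per trial, in presentation order, with 'stimulus'
    and 'correct' columns (a hypothetical layout for this sketch).
    """
    out = df.copy()
    # iteration: 1 on the first presentation of a stimulus, 2 on the second, ...
    out["iteration"] = out.groupby("stimulus").cumcount() + 1
    # delay: trials since the previous presentation (NaN on the first one)
    trial_idx = pd.Series(range(len(out)), index=out.index)
    out["delay"] = trial_idx.groupby(out["stimulus"]).diff()
    # reward_history: correct responses to this stimulus before the current trial
    out["reward_history"] = out.groupby("stimulus")["correct"].cumsum() - out["correct"]
    return out

trials = pd.DataFrame({"stimulus": [0, 1, 0, 0, 1], "correct": [0, 1, 1, 0, 1]})
print(add_history_columns(trials))
```

If the behavioral data count the current trial in `reward_history`, dropping the `- out["correct"]` term would match that convention instead.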

The simulation should also make it possible to examine the effects of the model's hyperparameters, including:

* alpha (float between 0 and 1): learning rate
* beta (float): inverse temperature of the softmax (larger values give more deterministic choices)
* epsilon (float between 0 and 1): noise
* phi (float between 0 and 1): decay
* pers (float between 0 and 1): perseveration; values near 1 represent complete neglect of negative feedback, values near 0 represent equal learning from positive and negative feedback
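
To make the roles of these parameters concrete, here is a minimal sketch of the choice rule and the value update they feed into. The function names `choice_probabilities` and `update_q` are hypothetical helpers written for illustration. The sketch assumes a softmax over Q-values scaled by beta, mixed with uniform noise weighted by epsilon, and a delta-rule update whose learning rate is scaled by `(1 - pers)` after an unrewarded trial. The decay parameter phi is left out.

```python
import numpy as np

def choice_probabilities(q_values, beta, epsilon):
    """Softmax over Q-values, mixed with uniform noise."""
    q_values = np.asarray(q_values, dtype=float)
    # Subtracting the max leaves the softmax unchanged but avoids overflow
    z = beta * (q_values - q_values.max())
    softmax = np.exp(z) / np.exp(z).sum()
    return (1 - epsilon) * softmax + epsilon / len(q_values)

def update_q(q, reward, alpha, pers):
    """One delta-rule update with a reduced learning rate after no reward."""
    rate = alpha if reward == 1 else (1 - pers) * alpha
    return q + rate * (reward - q)

q = [0.8, 0.1, 0.1]
print(choice_probabilities(q, beta=0.5, epsilon=0.0))   # close to uniform
print(choice_probabilities(q, beta=10.0, epsilon=0.0))  # strongly favors action 0
print(update_q(0.5, reward=0, alpha=0.4, pers=1.0))     # pers = 1: value unchanged
```

With epsilon = 1 the choice probabilities are uniform regardless of the Q-values, and with pers = 0 rewarded and unrewarded trials use the same learning rate.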


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
import pandas as pd
import random

In [65]:
class simulate_RL6():
    """Simulate an RL agent on a task with 3 stimuli and 3 actions."""
    def __init__(self, alpha, beta, epsilon, phi, pers, T):
        self.alpha = alpha  # learning rate after rewarded trials
        self.pers = pers  # perseveration (neglect of negative feedback)
        self.neg_alpha = (1-self.pers)*self.alpha  # learning rate after unrewarded trials
        self.beta = beta  # softmax inverse temperature
        self.epsilon = epsilon  # weight of uniform random choice
        self.phi = phi  # decay
        self.T = T  # number of trials to simulate
        self.Q = np.zeros((3,3))  # Q[stimulus, action]
        self.correct = []
        self.key_press = []
        self.key_answer = []
        self.stimulus = []
    def fit(self):
        """Simulate T trials, updating the Q-values after each one."""
        # Correct action for each stimulus: 0 -> 0, 1 -> 0, 2 -> 2
        answers = [0, 0, 2]
        for i in range(0, self.T):
            s = np.random.randint(0, 3)
            self.stimulus.append(s)
            # Softmax policy mixed with uniform noise
            p = np.exp(self.beta * self.Q[s, :])
            p = p/np.sum(p)
            p = (1-self.epsilon)*p + self.epsilon*(1/3)
            a = random.choices([0, 1, 2], weights=p)[0]
            self.key_press.append(a)
            self.key_answer.append(answers[s])
            r = 1 if a == answers[s] else 0

            Q0 = self.Q[s, a].copy()  # value before this trial's update
            if r == 0:
                self.Q[s, a] = self.Q[s, a] + self.neg_alpha*(r-self.Q[s, a])
            else:
                self.Q[s, a] = self.Q[s, a] + self.alpha*(r-self.Q[s, a])

            # Decay: pull the updated value back toward its pre-update value
            self.Q[s, a] = self.Q[s, a] + self.phi*(Q0-self.Q[s, a])
            self.correct.append(r)
    def _get_delay(self):
        pass
    def _get_iteration(self):
        pass
    def _get_reward_history(self):
        pass
        
    

In [66]:
model = simulate_RL6(alpha=1, beta=0.5, epsilon=0.4, phi=0.6, pers=0.6, T=10)

In [67]:
model.fit()