# Reinforcement learning model notebook

This notebook runs the reinforcement learning model presented in the paper on an example participant, loading that participant's subject CSV.

The code for the reinforcement learning model is in `src.rl.mb_mf_fit`.



In [1]:
import pandas as pd
from src.rl.mb_mf_fit import MB_MF_rllik_learn_mat_arms, param_init, apply_priors
import scipy.optimize
import numpy as np

In [2]:
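# Parameter bounds, in the order of the fitted parameter vector
# (note: this does not take the priors into account)
bounds = [
    (0.00001, 20),      # softmax inverse temperature (beta)
    (0.00001, 0.9999),  # learning rate (alpha)
    (0.00001, 0.9999),  # eligibility trace decay (lambda)
    (0.00001, 0.9999),  # weight, low stakes low arm (w)
    (0.00001, 0.9999),  # weight, high stakes low arm
    (0.00001, 0.9999),  # weight, low stakes high arm
    (0.00001, 0.9999),  # weight, high stakes high arm
    (-20, 20),          # stickiness (pi)
    (-20, 20),          # response stickiness (rho)
    (0.00001, 0.9999),  # transition matrix updating (eta)
    (0.00001, 0.9999),  # sophisticated updating of the other action (kappa)
]


def param_init(bounds):
    """
    Take a list of (lower, upper) bound tuples and return one starting value
    per parameter, drawn uniformly from within its bounds.
    """
    # This local definition shadows the param_init imported above.
    params = []
    for lower, higher in bounds:
        # The generator is re-created with the same seed on every iteration, so
        # every parameter gets the same underlying uniform draw, rescaled to its bounds.
        curr_param = np.random.default_rng(202204).uniform(low=lower, high=higher, size=1)
        params.append(curr_param[0])
    return params


params = param_init(bounds)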
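
Because `param_init` above builds a fresh generator with the same seed on every loop iteration, each parameter receives the same underlying uniform draw, rescaled to its own bounds. If independent starting values are wanted, one option is to create the generator once. The sketch below is an assumption about intent, not the paper's code: `param_init_independent` and its `seed` argument are hypothetical names, and using it would change the starting point and therefore the numbers printed later in this notebook.

```python
import numpy as np


def param_init_independent(bounds, seed=202204):
    """Draw one starting value per (lower, upper) pair from a single seeded generator."""
    rng = np.random.default_rng(seed)  # created once, so successive draws differ
    lows = np.array([b[0] for b in bounds], dtype=float)
    highs = np.array([b[1] for b in bounds], dtype=float)
    # Vectorised draw: one uniform sample per parameter, each within its own bounds.
    return rng.uniform(low=lows, high=highs).tolist()


# Small hypothetical bounds list, used only to exercise the function.
example_bounds = [(0.00001, 20), (0.00001, 0.9999), (0.00001, 0.9999), (-20, 20)]
example_params = param_init_independent(example_bounds)
```

The fixed seed keeps the starting point reproducible across runs while still giving each parameter its own draw.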

In [3]:
sub_df = pd.read_csv('../../data/interim/experiment_2/subject_csvs/sub_A1CUDX7TTS2W61.csv')

The following cell runs a single evaluation of the reinforcement learning model with a single $w$ parameter, a learned transition matrix, and $\eta$ and $\kappa$ each left as free parameters. The function returns the negative log-likelihood for the current parameter values; the priors have not been applied here.

In [4]:
# single run 
MB_MF_rllik_learn_mat_arms(
    params, sub_df, stakes='1', final=False, kappa_equivalent=False
)

array([977.33081467])
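
For readers unfamiliar with the quantity returned above, the sketch below shows how a negative log-likelihood is typically accumulated under a softmax choice rule with inverse temperature $\beta$. It is a generic illustration, not the implementation in `src.rl.mb_mf_fit`; the function `softmax_nll` and the toy values are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp


def softmax_nll(q_values, choices, beta):
    """Negative log-likelihood of observed choices under a softmax over beta * Q."""
    q_values = np.asarray(q_values, dtype=float)  # shape (n_trials, n_actions)
    choices = np.asarray(choices, dtype=int)      # shape (n_trials,)
    logits = beta * q_values
    # Log of the softmax probabilities, computed stably via logsumexp.
    log_probs = logits - logsumexp(logits, axis=1, keepdims=True)
    # Sum the log-probability of the action actually chosen on each trial.
    return -log_probs[np.arange(len(choices)), choices].sum()


# Toy action values and choices (hypothetical, for illustration only).
toy_q = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
toy_choices = [1, 0, 0]
toy_nll = softmax_nll(toy_q, toy_choices, beta=2.0)
```

With $\beta = 0$ every action is equally likely, so the value reduces to the number of trials times $\log 2$ for two actions; larger $\beta$ lowers it when choices follow the higher-valued action.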

Next we apply the code that weights this negative log-likelihood by our priors, which gives a slightly different value.

In [5]:
model_func_parameters = [sub_df, '1', False, False]
apply_priors(params, model_func_parameters)

array([1019.15892043])
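
One common way to weight a negative log-likelihood by priors is to subtract the log prior density of each parameter, which gives a negative log-posterior up to a constant. The sketch below illustrates that idea only; the priors actually used are defined inside `apply_priors` and are not shown in this notebook. The Gamma and Beta priors, their shape values, and the name `penalized_nll` are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def penalized_nll(nll, beta, rates):
    """Return nll minus the summed log prior density (hypothetical priors)."""
    # Assumed Gamma prior on the inverse temperature.
    log_prior = stats.gamma.logpdf(beta, a=3.0, scale=0.2)
    # Assumed Beta prior on each parameter bounded in (0, 1).
    log_prior += stats.beta.logpdf(rates, a=2.0, b=2.0).sum()
    return nll - log_prior


# Hypothetical inputs: a raw negative log-likelihood and a few parameter values.
example = penalized_nll(100.0, beta=0.5, rates=np.array([0.3, 0.6]))
```

Parameter values that are unlikely under the priors have a low log density and therefore raise the penalized value, pulling the optimizer toward more plausible values.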

Now we can pass `apply_priors` to `scipy.optimize.minimize` to find the optimal parameters for this individual given our priors (this takes approximately 30 seconds).

In [6]:
fit = scipy.optimize.minimize(
    apply_priors,
    params,
    # args must be a tuple; its single element is the list of model settings
    args=([sub_df, '1', False, False],),
    method="L-BFGS-B",
    bounds=bounds,
)
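
The call above follows the general `scipy.optimize.minimize` pattern: an objective that takes the parameter vector first, extra arguments passed through `args`, and box constraints passed through `bounds`. The toy example below shows the same pattern on a simple quadratic so that it runs without the subject data; `toy_objective` and its target values are hypothetical.

```python
import numpy as np
import scipy.optimize


def toy_objective(x, settings):
    """Scaled squared distance to a target; `settings` mimics a single extra argument list."""
    target, scale = settings
    return scale * np.sum((x - np.asarray(target)) ** 2)


toy_bounds = [(0.00001, 20), (0.00001, 0.9999)]
toy_fit = scipy.optimize.minimize(
    toy_objective,
    x0=[1.0, 0.5],
    args=([[2.0, 1.5], 1.0],),  # one extra argument: a list of settings
    method="L-BFGS-B",
    bounds=toy_bounds,
)
# The second target (1.5) lies outside its bounds, so that coordinate
# ends up at the upper bound rather than at the unconstrained minimum.
```

A fitted value sitting exactly on a bound, as in the second coordinate here, is worth checking for in real fits because it suggests the bound rather than the data determined the estimate.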

In [7]:
fit

      fun: array([154.58468828])
 hess_inv: <11x11 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 0.00028706,  0.00090949,  0.00017337,  0.00035811,  0.00013927,
        0.00013927,  0.00013927, -0.00093507,  0.00253522,  0.00043769,
       -0.00021032])
  message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 864
      nit: 52
     njev: 72
   status: 0
  success: True
        x: array([ 2.19670665,  0.81635205,  0.52280367,  0.88147683,  0.50001739,
        0.50001739,  0.50001739,  0.05185304, -0.14285279,  0.42637492,
        0.46106694])

Here we can see that the final prior-weighted negative log-likelihood was approximately 154.585, that the optimizer ran for 52 iterations (`nit`), and that we get a fitted value for each of the parameters.

This subject's fitted parameters:  
softmax inverse temperature $\beta$ : 2.19670665  
learning rate $\alpha$ : 0.81635205  
eligibility trace decay $\lambda$ : 0.52280367  
weight low stakes low arm $w$ : 0.88147683 (this is the overall weight if the stakes flag is not triggered)  
weight high stakes low arm : 0.50001739 (we fit with stakes = '1', so this is not fit away from its initialization)  
weight low stakes high arm : 0.50001739 (we fit with stakes = '1', so this is not fit away from its initialization)  
weight high stakes high arm : 0.50001739 (we fit with stakes = '1', so this is not fit away from its initialization)  
stickiness $\pi$ : 0.05185304  
response stickiness $\rho$ : -0.14285279  
transition matrix updating $\eta$ : 0.42637492  
sophisticated updating of the other action $\kappa$ : 0.46106694  
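
To avoid reading parameters off by position, the fitted vector can be paired with names. The sketch below copies the `x` values printed above so that it runs on its own; in the notebook itself `fit.x` could be used instead. The short names in `PARAM_NAMES` are hypothetical labels chosen here to follow the order of the list above, not identifiers from `src.rl.mb_mf_fit`.

```python
import numpy as np

# Hypothetical labels, in the same order as the fitted parameter vector.
PARAM_NAMES = [
    "beta", "alpha", "lambda",
    "w_low_stakes_low_arm", "w_high_stakes_low_arm",
    "w_low_stakes_high_arm", "w_high_stakes_high_arm",
    "pi", "rho", "eta", "kappa",
]

# Values copied from the optimizer output shown earlier.
fitted_x = np.array([
    2.19670665, 0.81635205, 0.52280367, 0.88147683, 0.50001739,
    0.50001739, 0.50001739, 0.05185304, -0.14285279, 0.42637492,
    0.46106694,
])

named = dict(zip(PARAM_NAMES, fitted_x))
for name, value in named.items():
    print(f"{name:>24}: {value: .6f}")
```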