# Data simulation and model comparison

In this notebook I will simulate data based on the risk and ambiguity task in the aging experiment.

In this version of the task there are 84 trials.

Values: 5, 8, 12, 25

Risk: 0.25, 0.5, 0.75

Ambiguity: 0, 0.24, 0.5, 0.74


## Libraries that will be used in the experiment

In [1]:
import pandas as pd
import numpy as np
import scipy as sp

from scipy.special import expit
from scipy import stats, special
from scipy.optimize import minimize 

import pymc3 as pm
import arviz as az
import cmdstanpy

import matplotlib.pyplot as plt
import seaborn as sns

## Load data and simulate participants choices

This file includes all the trials w/o choices

In [3]:
db = pd.read_csv('data/sim.csv')

Here we declare how many participants will be simulated.

risk aversion (α) is taken from a beta distribution with a mean of 0.33, and then multiply by 2 for a final mean of 0.66 (medium risk aversion).

Ambiguity aversion (β) is taken from a truncated normal distribution between -1.5 and 1.5 with a 0.3 average (slight ambiguity aversion).


In [14]:
n_subs = 20

α_true = np.random.beta(2,   4, n_subs)
α_true = α_true*2
β_true = stats.truncnorm.rvs(-1.5, 1.5, 0.3, size = n_subs)

values    = np.tile(np.array(db.value),     n_subs)
risk      = np.tile(np.array(db.risk),      n_subs)
ambiguity = np.tile(np.array(db.ambiguity), n_subs)

In [15]:
refValue       = 5 # constant
refProbability = 1 # constant
refAmbiguity   = 0 # constant

refProbabilities = np.tile(refProbability, len(values))
refValues        = np.tile(refValue,       len(values))
refAmbiguities   = np.tile(refAmbiguity,   len(values))

In [16]:
riskTol = np.repeat(α_true, len(risk) / n_subs)
ambTol  = np.repeat(β_true, len(ambiguity) / n_subs)

Simulate choice base on the utility function 

In [17]:
uRef = refValues ** riskTol
uLotto = (values ** riskTol) * (risk - (ambTol * ambiguity / 2))
p = sp.special.expit(uLotto - uRef)

choices = np.random.binomial(1, p, len(p))

In [18]:
sub_idx = np.arange(n_subs)
sub_idx = np.repeat(sub_idx, 84)
ID = sub_idx+1
n_trials = np.arange(len(choices))

Create a simulation dataframe 

In [19]:
simdata = pd.DataFrame({'sub': ID, 'choices': choices, 'values': values, 'risk': risk, 'ambiguity': ambiguity, 'riskTol': riskTol, 'ambTol': ambTol})
simdata.to_csv('results/simdata.csv')
simdata.head()

Unnamed: 0,sub,choices,values,risk,ambiguity,riskTol,ambTol
0,1,0,5,0.25,0.0,1.060578,-0.196382
1,1,0,5,0.5,0.5,1.060578,-0.196382
2,1,1,12,0.5,0.74,1.060578,-0.196382
3,1,1,25,0.5,0.74,1.060578,-0.196382
4,1,0,8,0.5,0.5,1.060578,-0.196382


## Maximum Likelihood

The first approach I will test is Maximum likelihood. 

In [21]:
def MLE_riskamb(parameters):
    # extract parameters
    α, β, γ = parameters
    
    
    svLotto = (db_sub['values'].values ** α) * (db_sub['risk'].values - (β * (db_sub['ambiguity'].values/2)))
    svRef = 5 ** α
    #p = 1/(1+np.exp((svRef - svLotto) * γ))
    p = sp.special.expit((svLotto - svRef)* γ)

    # Calculate the log-likelihood for normal distribution
    LL = np.sum(stats.norm.logpdf(db_sub.choices, p))
    
    # Calculate the negative log-likelihood
    neg_LL = -1*LL
    return neg_LL 

In [22]:
subs = simdata['sub'].unique()
mLL = pd.DataFrame()
for sub in subs:
    db_sub = simdata[simdata['sub'] == sub]
    mle_model = minimize(MLE_riskamb, np.array([1,1,1]), method='L-BFGS-B', bounds=[(0,2),(-1.5,1.5),(-np.inf,np.inf)])
    x = mle_model.x
    temp = {'sub': sub, 'alpha' : x[0], 'beta': x[1], 'gamma': x[2]}
    mLL = mLL.append(temp,ignore_index=True)

In [23]:
mLL['α_true'] = α_true
mLL['β_true'] = β_true
mLL['β_true'] = β_true
mLL['∆α'] = mLL.alpha - mLL.α_true
mLL['∆β'] = mLL.beta - mLL.β_true
mLL

Unnamed: 0,alpha,beta,gamma,sub,α_true,β_true,∆α,∆β
0,1.027755,-0.14762,1.202876,1.0,1.060578,-0.196382,-0.032824,0.048762
1,0.543841,0.277514,0.752108,2.0,0.482898,0.166313,0.060943,0.111201
2,0.861355,1.461015,25.407092,3.0,0.813778,1.17736,0.047577,0.283656
3,1.47478,0.914778,7.693282,4.0,1.385437,0.910229,0.089343,0.004549
4,0.0,-0.867271,1.440395,5.0,0.070385,-0.379166,-0.070385,-0.488105
5,0.966261,0.636993,0.984578,6.0,0.816412,0.648739,0.149849,-0.011746
6,0.0,-1.438835,1.15942,7.0,0.322352,-0.591591,-0.322352,-0.847244
7,0.672768,1.214418,0.940284,8.0,0.4671,1.317581,0.205668,-0.103163
8,0.0,-1.5,1.624996,9.0,0.136524,-0.596887,-0.136524,-0.903113
9,0.366484,-0.349624,0.90567,10.0,0.314409,-0.745825,0.052075,0.396201


## The second approach is using Bayes-STAN

In [24]:
AmbiguityModel = cmdstanpy.CmdStanModel(stan_file='models/AmbiguityModel.stan')

INFO:cmdstanpy:compiling stan file /home/nachshon/Documents/Aging/Aging/RiskandAmbiguity/models/AmbiguityModel.stan to exe file /home/nachshon/Documents/Aging/Aging/RiskandAmbiguity/models/AmbiguityModel
INFO:cmdstanpy:compiled model executable: /home/nachshon/Documents/Aging/Aging/RiskandAmbiguity/models/AmbiguityModel


In [25]:
standata_ambiguity = {
    'N' : len(refProbabilities),
    'choice' : simdata['choices'],
    'refProbabilities' : refProbabilities,
    'refAmbiguities' : refAmbiguities,
    'refValues' : refValues,
    'lotteryProbabilities' : simdata['risk'],
    'lotteryAmbiguities' : simdata['ambiguity'],
    'ID' : ID,
    'lotteryValues' : simdata['values'],
    'n_sub': n_subs
}

In [26]:
fit_ambiguity_model = AmbiguityModel.sample(
  data = standata_ambiguity,
  chains = 4,
  iter_warmup = 2000,
  iter_sampling = 2000,
  adapt_delta = .9,
  inits = 0.2,
)

INFO:cmdstanpy:CmdStan start procesing


chain 1 |          | 00:00 Status

chain 2 |          | 00:00 Status

chain 3 |          | 00:00 Status

chain 4 |          | 00:00 Status

                                                                                                                                                                                                                                                                                                                                

INFO:cmdstanpy:CmdStan done processing.





In [27]:
stan = az.from_cmdstanpy(posterior=fit_ambiguity_model,
                                       posterior_predictive="y_hat",
                                       log_likelihood="log_lik")

## Next we will look at PYMC3

In [28]:
with pm.Model() as RiskAmb:
    # hyper
    aMu = pm.Normal('rMu', 0.7 ,1)
    aSig = pm.Exponential('rSig', 1)
    bMu = pm.Normal('aMu', 0,1)
    bSig = pm.Exponential('aSig', 1)
    
    gMu = pm.Normal('nMu', 0,1)
    gSig = pm.Exponential('nSig', 1)
    
    
    α = pm.TruncatedNormal('α', aMu, aSig, lower = 0, upper = 2, shape = n_subs)
    β = pm.TruncatedNormal('β', bMu, bSig, lower = -1.5, upper = 1.5, shape = n_subs)
    γ = pm.Lognormal('γ', gMu , gSig, shape = n_subs)
    # Priors for unknown model parameters
    
      
    # Expected value of outcome
    svLotto = (simdata['values'].values ** α[sub_idx]) * (simdata['risk'].values - (β[sub_idx] * (simdata['ambiguity'].values/2)))
    svRef = 5 ** α[sub_idx]
    p = (svLotto - svRef)/γ[sub_idx]
    mu = pm.invlogit(p)
       
    # Likelihood (sampling distribution) of observations
    choice = pm.Binomial('choice',1, mu, observed=simdata['choices'])
    trace = pm.sample(4000, return_inferencedata=True, target_accept=0.95)

Auto-assigning NUTS sampler...
INFO:pymc3:Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
INFO:pymc3:Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
INFO:pymc3:Multiprocess sampling (4 chains in 4 jobs)
NUTS: [γ, β, α, nSig, nMu, aSig, aMu, rSig, rMu]
INFO:pymc3:NUTS: [γ, β, α, nSig, nMu, aSig, aMu, rSig, rMu]


Sampling 4 chains for 1_000 tune and 4_000 draw iterations (4_000 + 16_000 draws total) took 537 seconds.
INFO:pymc3:Sampling 4 chains for 1_000 tune and 4_000 draw iterations (4_000 + 16_000 draws total) took 537 seconds.
There were 254 divergences after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc3:There were 254 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.8861047028927292, but should be close to 0.95. Try to increase the number of tuning steps.
There were 241 divergences after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc3:There were 241 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.9021393570062204, but should be close to 0.95. Try to increase the number of tuning steps.
There were 95 divergences after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc3:There were 

In [None]:
az.compare({'pymc':trace, 
            'stan': stan},
            ic='waic')

In [None]:
model_compare = az.compare({'pymc':trace, 'stan': stan})
az.plot_compare(model_compare, figsize=(12, 4))

plt.show()

In [None]:
α_pymc = az.summary(trace, var_names=['α'])['mean']
β_pymc = az.summary(trace, var_names=['β'])['mean']
α_stan = az.summary(stan, var_names=['riskTol'])['mean']
β_stan = az.summary(stan, var_names=['ambTol'])['mean']

In [None]:
bayesA = pd.DataFrame({'α_true': α_true, 'α_pymc': α_pymc.values, 'α_stan': α_stan.values,})

bayesA

In [None]:
bayesB = pd.DataFrame({'β_true': β_true, 'β_pymc': β_pymc.values, 'β_stan': β_stan.values, })

bayesB

In [None]:
az.summary(stan, var_names=['riskTol','ambTol'])

In [None]:
az.summary(trace, var_names=['α','β'])