# Delay discounting analysis: SOLO models
This notebook gives an overview of using SOLO models to analyse delay discounting data. The SOLO models estimate parameters for each data file independently of the rest, and each data file is processed entirely separately. This approach scales to very large datasets because we avoid building _very_ large models with hundreds or thousands of participants. Fitting can still take time, but we avoid both memory and computational capacity limitations.

**Parameter estimation**

We can do parameter estimation by creating a model instance and calling the `sample_posterior` method while providing the data.

**Posterior prediction**

Once we have a posterior distribution over the parameters given the data, we can do posterior predictive model checking by plotting the predicted discount function along with the data. This is done with the `plot_discount_functions_region` method.
We can also use the `df_comparison(models, data)` function to plot the data along with the posterior predictions of multiple models.

**Model comparison**

Plotting the posterior predictions (see above) gives a qualitative sanity check of each model. For quantitative evaluation, we also compute:

- WAIC
- LOO
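
To make concrete what WAIC measures, here is a minimal from-scratch sketch that computes it from a table of pointwise log-likelihoods (one row per posterior sample, one column per trial). This is not the toolbox's implementation, which is not shown in this notebook. The function name `waic_from_log_lik` and the input layout are assumptions made for illustration, and the result is reported on the deviance scale (lower is better).

```python
import math
from statistics import variance

def waic_from_log_lik(log_lik):
    """Return (waic, p_waic) from pointwise log-likelihoods.

    log_lik: list of rows, one row per posterior sample, one entry per trial.
    Illustrative helper only, not part of the toolbox.
    """
    n_samples = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [row[i] for row in log_lik]
        # log of the posterior-mean likelihood, computed stably (log-sum-exp)
        m = max(column)
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_samples)
        # penalty: sample variance of the log-likelihood across posterior samples
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic), p_waic

# Toy input: 2 posterior samples x 2 trials, every trial predicted with p = 0.5
toy = [[math.log(0.5), math.log(0.5)],
       [math.log(0.5), math.log(0.5)]]
waic, p_waic = waic_from_log_lik(toy)
print(waic, p_waic)
```

The penalty term grows when the log-likelihood of a trial varies a lot across posterior samples, which is how WAIC penalises overly flexible models.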

We also calculate the log loss goodness of fit metric.
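
As a rough illustration of the log loss metric, the sketch below computes the mean negative log-likelihood of binary choices given predicted choice probabilities. The function name `log_loss`, the coding of choices (1 = chose the delayed reward) and the clipping constant `eps` are assumptions made for this example; the toolbox may define the details differently.

```python
import math

def log_loss(choices, p_delayed, eps=1e-12):
    """Mean negative log-likelihood of binary choices.

    choices: 1 if the delayed reward was chosen, 0 otherwise (assumed coding).
    p_delayed: predicted probability of choosing the delayed reward.
    """
    total = 0.0
    for y, p in zip(choices, p_delayed):
        # clip so that a prediction of exactly 0 or 1 cannot produce log(0)
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(choices)

# A model that always predicts 0.5 scores log(2) regardless of the data
print(log_loss([1, 0, 1], [0.5, 0.5, 0.5]))
```

A predictor that always outputs 0.5 scores log(2) ≈ 0.693 whatever the choices are, so this is a useful baseline: a model needs a lower log loss than that to be doing better than chance.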

First, some basic boilerplate setup code:

In [None]:
# file handling
from glob import glob
import os

# data + modelling
import math
import numpy as np
import pandas as pd
import pymc3 as pm

# plotting
import seaborn as sns
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt

# set up plotting preferences
plt.style.use('seaborn-darkgrid')

SMALL_SIZE = 16
MEDIUM_SIZE = 18
BIGGER_SIZE = 22

plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title

## Import toolbox code

In [None]:
# autoreload imported modules. Convenient while I'm developing the code.
%load_ext autoreload
%autoreload 2

In [None]:
from models_solo import *
from df_data import build_metadata, import_raw_data
from model_comparison import *
from fitting import fit

# Data import
For more info on this, see the **Data preparation** notebook.

In [None]:
files = glob('data/non_parametric/*.txt')

## Step 1: create experiment level metadata

In [None]:
import os
from collections import namedtuple

# Field names become the column headers of the experiment metadata table.
Metadata = namedtuple('Metadata', ['filename', 'initials', 'domain'])

def parse_filename(fname):
    """Extract experiment metadata from a filename and return it as a named tuple."""
    file = os.path.basename(fname)
    # Filenames are dash-separated: the first two fields are initials and domain.
    initials, domain = file.split('-')[:2]
    return Metadata(filename=fname, initials=initials, domain=domain)

expt_data = build_metadata(files, parse_filename)
expt_data.head()
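
To see what the filename parsing produces, here is a small self-contained sketch using a compact equivalent of `parse_filename`. The filename `AB-gain-1.txt` is hypothetical, invented purely for illustration; your own files may follow a different naming scheme, in which case the parsing function needs to be adapted.

```python
import os
from collections import namedtuple

Metadata = namedtuple('Metadata', ['filename', 'initials', 'domain'])

def parse_filename(fname):
    """Split '<initials>-<domain>-...' out of the file name part of a path."""
    file = os.path.basename(fname)
    initials, domain = file.split('-')[:2]
    return Metadata(filename=fname, initials=initials, domain=domain)

# Hypothetical filename, not taken from the real dataset
example = parse_filename('data/non_parametric/AB-gain-1.txt')
print(example.initials, example.domain)
```

Note that the domain is only clean if another dash follows it. With a name such as `AB-gain.txt`, the second field would be `gain.txt`, extension included.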

## Step 2: Import raw behavioural data

In [None]:
raw_data = import_raw_data(expt_data['filename'])
raw_data.head()

# Parameter estimation + Model Comparison

First we set up a list of models that we want to examine. We then 'fit' these models (parameter estimation) and do Bayesian model comparison.

In [None]:
models = [Coinflip,
          Exponential, 
          Hyperbolic,
          HyperboloidA, 
          HyperboloidB, 
          ConstantSensitivity, 
          ExponentialPower,
          ExponentialLog,
          HyperbolicLog,
          DoubleExponential,
          BetaDelta,
          TradeOff,
          ITCH,
          DRIFT]

# When we do model comparison we want model names in the WAIC/LOO plots. There will be a better solution, but we currently implement this workaround.
# See https://discourse.pymc.io/t/can-we-add-model-names-when-we-do-model-comparison/935/2 for more.
MODEL_NAME_MAP = {
    0: "Coinflip",
    1: "Exponential",
    2: "Hyperbolic",
    3: "Hyperboloid A",
    4: "Hyperboloid B",
    5: "Constant Sensitivity",
    6: "Exponential Power",
    7: "Exponential Log",
    8: "Hyperbolic Log",
    9: "Double Exponential",
    10: "BetaDelta",
    11: "TradeOff",
    12: "ITCH",
    13: "DRIFT"
}
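
For orientation, two of the simplest models in the list above can be written in their standard textbook forms: exponential discounting, exp(-k·D), and hyperbolic discounting, 1 / (1 + k·D), where D is the delay and k the discount rate. The sketch below is not the code in `models_solo`, whose parameterisation may differ; the function names and the example values are assumptions for illustration.

```python
import math

def exponential_discount(delay, k):
    """Discount fraction under exponential discounting: exp(-k * delay)."""
    return math.exp(-k * delay)

def hyperbolic_discount(delay, k):
    """Discount fraction under hyperbolic discounting: 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)

def present_value(reward, delay, k, discount_function):
    """Subjective present value of a delayed reward."""
    return reward * discount_function(delay, k)

# Illustrative values only: a reward of 100 with discount rate k = 0.1
for delay in [0, 10, 100]:
    print(delay,
          present_value(100, delay, 0.1, exponential_discount),
          present_value(100, delay, 0.1, hyperbolic_discount))
```

Both functions equal 1 at zero delay, but for the same k the hyperbolic curve falls more slowly than the exponential one at every positive delay.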

Running 14 models on all the participants can take time, so for basic testing we can use the shorter model list commented out below:

In [None]:
# models = [Coinflip,
#           Exponential, 
#           Hyperbolic]

# MODEL_NAME_MAP = {
#     0: "Coinflip",
#     1: "Exponential",
#     2: "Hyperbolic"
# }

In [None]:
# NOTE: This will take time to compute! And it will save outputs to the specified directory
results = fit(models, raw_data, expt_data, MODEL_NAME_MAP, save_dir='temp_analysis')

## Examine results
We should now have a series of saved plots, all located in the specified `save_dir` (`'temp_analysis'` in the call above; the default is `'temp'`). This folder contains one model comparison plot per participant, plus a subfolder for each participant containing a series of plots for model diagnostics etc.
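
A quick way to check what was written to disk is to list the files folder by folder. The helper below is a minimal sketch using only the standard library. The name `list_saved_outputs` is made up for this example, and it makes no assumption about the file names or formats that `fit` produces.

```python
from pathlib import Path

def list_saved_outputs(save_dir):
    """Map each folder under save_dir (as a relative path) to its sorted file names."""
    root = Path(save_dir)
    outputs = {}
    for path in sorted(root.rglob('*')):
        if path.is_file():
            # '.' stands for save_dir itself; other keys are subfolder paths
            folder = str(path.parent.relative_to(root))
            outputs.setdefault(folder, []).append(path.name)
    return outputs

# Only runs if the analysis above has already created the directory
if Path('temp_analysis').is_dir():
    for folder, names in list_saved_outputs('temp_analysis').items():
        print(folder, names)
```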

In [None]:
results