# Score discounting data

The goal of this notebook is to score the raw delay discounting data. 

Our primary analysis focuses on the $\log(k)$ parameter of the hyperbolic discount function (Mazur, 1987). We chose this function because it has only one parameter, yet is well known to provide good empirical fits to discounting data.



For the sake of completeness and to test the robustness of the findings, we also conduct analyses based upon fits of multiple discount functions:
- Exponential (Samuelson, 1937)
- Hyperbolic (Mazur, 1987)
- Modified Rachlin (Vincent & Stewart, 2019)
- Hyperboloid (Myerson & Green, 1995).
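
For reference, the four discount functions listed above can be sketched in their standard textbook forms. This is a minimal illustration only: the function names and the parameter names `k` and `s` are assumptions, and the parameterisation used inside the `models` module may differ.

```python
import numpy as np

# Each function returns the discount fraction (present subjective value as a
# proportion of the undiscounted amount) for a given delay.
# Parameter names `k` (discount rate) and `s` (exponent) are illustrative.

def exponential(delay, k):
    """Exponential discounting: exp(-k * delay)."""
    return np.exp(-k * delay)

def hyperbolic(delay, k):
    """Hyperbolic discounting: 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)

def modified_rachlin(delay, k, s):
    """Modified Rachlin: 1 / (1 + (k * delay)^s)."""
    return 1.0 / (1.0 + (k * delay) ** s)

def hyperboloid(delay, k, s):
    """Hyperboloid: 1 / (1 + k * delay)^s."""
    return 1.0 / (1.0 + k * delay) ** s

delays = np.array([0.0, 1.0, 7.0, 30.0, 365.0])  # delays in days
print(hyperbolic(delays, k=np.exp(-4)))          # i.e. log(k) = -4
```

Both two-parameter functions reduce to the hyperbolic function when `s = 1`, which is one reason the hyperbolic $\log(k)$ is a convenient primary measure.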

Because these discount functions have different numbers of parameters, we must a) compare them based upon a single metric, and b) use analysis methods which control for model complexity (e.g. number of parameters). We achieve the former by using the Area Under Curve (AUC) metric (Myerson, Green, & Warusawitharana, 2001), and the latter by using an information criterion, which in this notebook is WAIC.
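
As a rough illustration of the AUC metric, the sketch below normalises the delays to the range 0 to 1 and sums the trapezoids under the curve, in the spirit of Myerson, Green, & Warusawitharana (2001). The function name `area_under_curve` is hypothetical, and this is not necessarily how the `models` module computes its `AUC` metric.

```python
import numpy as np

def area_under_curve(delays, discount_fractions):
    """Trapezoidal area under a discount curve with delays normalised to [0, 1].

    `discount_fractions` are subjective values expressed as a proportion of
    the undiscounted amount, so the result lies between 0 (maximal
    discounting) and 1 (no discounting).
    """
    delays = np.asarray(delays, dtype=float)
    fractions = np.asarray(discount_fractions, dtype=float)
    order = np.argsort(delays)          # make sure delays are increasing
    x = delays[order] / delays.max()    # normalise delays by the longest delay
    y = fractions[order]
    # sum of trapezoids: width * mean height of each adjacent pair of points
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Example: a hyperbolic curve with log(k) = -4, evaluated on a grid of delays
grid = np.linspace(0, 365, 1000)
print(area_under_curve(grid, 1.0 / (1.0 + np.exp(-4) * grid)))
```

Because the AUC is defined for any curve, it gives a single number that can be compared across discount functions regardless of how many parameters each has.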

The adaptive delay discounting procedure used was an early version of that developed by Vincent & Rainforth (pre-print). Its goal is to adaptively choose discount functions so as to reduce our uncertainty about the $\log(k)$ discount rate parameter of the hyperbolic function as efficiently as possible. So while the obtained behavioural data _can_ be used to estimate the parameters of other discount functions, we must bear in mind that distinguishing between different discount functions was not the aim of our adaptive procedure.

We conduct Bayesian parameter estimation using multiple discount functions (Exponential, Hyperbolic, ...), calculate the Area Under Curve (AUC) for each fit, and output a long-format .csv file for each of the discount functions. This is _non_-hierarchical, in that we do parameter estimation for each raw delay discounting experiment (i.e. participant/condition/commodity combination) separately.

In [None]:
import pandas as pd
import os
from glob import glob
import numpy as np
np.random.seed(123)  # Initialize random number generator

# plotting
import seaborn as sns
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt

# PyMC3 & my models
import pymc3 as pm
import models
from plot import plot_data

# Autoreload imported modules. Convenient while I'm developing the code.
%load_ext autoreload
%autoreload 2

Create a list of filenames of the raw delay discounting data files that we want to iterate over.

In [None]:
root_data_folder = '../data/discounting/'
search_string = root_data_folder + '*.txt'
# glob returns files in arbitrary order, so sort for a reproducible run order
files = sorted(glob(search_string))

Define the set of models that we will use for parameter estimation.

In [None]:
model_list = {'Exponential': models.Exponential, 
              'Hyperbolic': models.Hyperbolic,
              'ModifiedRachlin': models.ModifiedRachlin, 
              'Hyperboloid': models.Hyperboloid}

In [None]:
sample_options = {'tune': 1_000, 'draws': 2_000,
                  'chains': 4, 'cores': 4,
                  'nuts_kwargs': {'target_accept': 0.95}}

Set up our output location for saved files.

In [None]:
out_dir = 'fits/'
os.makedirs(out_dir, exist_ok=True)  # create the folder only if it is missing

Iterate over all the raw files, fitting every model to each one. As we go, we build up one list of single-row dataframes per model, and at the end each list is concatenated into a single dataframe. These are in long format, so each row corresponds to a single file (i.e. one participant/commodity/condition combination). The helper below extracts those three labels from a filename.

In [None]:
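def parse_filename(fname):
    """Return (initials, commodity, condition) parsed from a data filename.

    The filename is expected to hold exactly five '-' separated fields:
    initials, commodity, condition, date and time. Date and time are ignored.
    """
    file = os.path.basename(fname)
    initials, commodity, condition, date, time = file.split('-')
    return (initials, commodity, condition)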
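
As a quick check of the naming convention that `parse_filename` relies on, here is a small usage sketch. The example path is hypothetical and only illustrates the expected five-field pattern; the real data files may use different values.

```python
import os

def parse_filename(fname):
    # Same logic as the helper above: five '-' separated fields are expected
    file = os.path.basename(fname)
    initials, commodity, condition, date, time = file.split('-')
    return (initials, commodity, condition)

# Hypothetical filename following the initials-commodity-condition-date-time pattern
example = '../data/discounting/AB-money-control-20190101-1200.txt'
print(parse_filename(example))  # → ('AB', 'money', 'control')

# A filename with a different number of fields fails loudly instead of
# silently mislabelling a participant
try:
    parse_filename('AB-money-control.txt')
except ValueError as err:
    print('Unexpected filename format:', err)
```

Note that a date written with hyphens (e.g. `2019-01-01`) would add extra fields and trigger the same `ValueError`.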

## Iterate over files and discount functions
🔥 This will take a few hours to compute 🔥

In [None]:
# One empty list per discount function, in the same order as model_list
fit_data = [[] for _ in model_list]

for i, fname in enumerate(files):
    
    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    
    initials, commodity, condition = parse_filename(fname)
    data = pd.read_csv(fname, sep='\t')
    
    plot_data(data, ax[0])
    
    # Fit each model to this file's data. The list is named `fitted_models`
    # so that it does not shadow the imported `models` module.
    fitted_models = []
    
    for m, (model_name, model_build_func) in enumerate(model_list.items()):
        model = model_build_func(data)
        model.fit(sample_options)
        model.plot(ax[0], label=model_name)
        fitted_models.append(model)

        # save info to appropriate element in fit_data list
        info = {'id': [initials], 
                'commodity': [commodity],
                'condition': [condition],
                'model': [model_name], 
                'log_loss': [np.median(model.metrics['log_loss'])],
                'AUC': [model.metrics['AUC']],
                'WAIC': [model.metrics['WAIC']], 
                'roc_auc': [model.metrics['roc_auc']]}
        # add the mean parameter estimates and merge them into info
        params = model.mean_parameters()
        info.update(params)
        row = pd.DataFrame.from_dict(info)
        display(row)
        fit_data[m].append(row)

    # update and export data + model fit figure
    ax[0].legend()
    ax[0].set_title(f'{initials}, {commodity}, {condition}')
    ax[0].set_xlabel('delay [days]')
    ax[0].set_ylabel('discount fraction')

    # now we've got all models for this file, we can do model comparison
    df_comp_WAIC = pm.compare({fm.model: fm.trace for fm in fitted_models})
    
    display(df_comp_WAIC)

    # save WAIC model comparison plot
    waic_plot = pm.compareplot(df_comp_WAIC, ax=ax[1])

    savename = os.path.join(out_dir, f'{initials}_{commodity}_{condition}.pdf')
    plt.savefig(savename, bbox_inches='tight')
    plt.close(fig)

    
# Now concatenate and export the .csv files for each model
for m, (model_name, _) in enumerate(model_list.items()):
    fit_data[m] = pd.concat(fit_data[m], ignore_index=True)
    fit_data[m].to_csv(f'parameter_estimation_{model_name}.csv', index=False)
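
Once the per-model .csv files exist, they can be read back and stacked for comparison on the shared AUC metric. The sketch below is one possible way to do this, not part of the original pipeline: `load_all_fits` is a hypothetical helper, and the demonstration writes toy files with placeholder AUC values into a temporary folder so that it runs without the real fits.

```python
import os
import tempfile
import pandas as pd

def load_all_fits(model_names, folder='.'):
    """Read parameter_estimation_<model>.csv for each model and stack the rows."""
    frames = []
    for name in model_names:
        path = os.path.join(folder, f'parameter_estimation_{name}.csv')
        frames.append(pd.read_csv(path))
    # Models have different parameter columns; missing ones become NaN
    return pd.concat(frames, ignore_index=True, sort=False)

# Toy demonstration with placeholder values (not real results)
with tempfile.TemporaryDirectory() as tmp:
    for name, auc in [('Exponential', 0.4), ('Hyperbolic', 0.5)]:
        toy = pd.DataFrame({'id': ['AB'], 'commodity': ['money'],
                            'condition': ['control'], 'model': [name],
                            'AUC': [auc]})
        toy.to_csv(os.path.join(tmp, f'parameter_estimation_{name}.csv'),
                   index=False)
    combined = load_all_fits(['Exponential', 'Hyperbolic'], folder=tmp)

# One row per participant/commodity/condition, one AUC column per model
wide = combined.pivot_table(index=['id', 'commodity', 'condition'],
                            columns='model', values='AUC')
print(wide)
```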

## References

Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior (pp. 55–73). Hillsdale, NJ: Erlbaum.

Myerson, J., & Green, L. (1995). Discounting of delayed rewards: Models of individual choice. Journal of the Experimental Analysis of Behavior, 64(3), 263–276.

Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the Experimental Analysis of Behavior, 76(2), 235–243. http://doi.org/10.1901/jeab.2001.76-235

Samuelson, P. A. (1937). A note on measurement of utility. The Review of Economic Studies, 4(2), 155–161.

Vincent & Rainforth (pre-print) 

Vincent, B. T., & Stewart, N. (2019). The case of muddled units in temporal discounting.