# Assignment 3: BEST Decisions
## Quantified Cognition
### Psychology 5332


# Name: *Your Name Here*
# User ID: *Your ID Here*

# Objectives

Upon completion of this assignment, the student will demonstrate the ability to:

1. Build/Modify a RunDEMC model
2. Fit the model to data
3. Plot parameter posteriors
4. Evaluate model fit results


# Assignment

- The goal of this assignment is to analyze the reaction time data from the perceptual decision-making task we examined in the first assignment. We want to know whether there are reliable differences in RTs (in either means or standard deviations) among the speeded, accuracy, and neutral conditions.

- You will perform this assignment by writing code in *this notebook* (***after making a copy and renaming it to have your userid in the title --- e.g., A03_BEST_decisions_mst3k***).

- In addition to this notebook, you will need to download the data file, `decision_data.csv`, from the same directory on GitHub.

- ***When you are done, save this notebook as HTML (`File -> Download as -> HTML`) and upload it to the matching assignment on UVACollab.***

## HINTS

- Be sure to comment your code
- I have provided cells with general instructions for what they should contain.
  

In [None]:
# Load in necessary modules
%matplotlib inline

# import some useful libraries
import numpy as np                # numerical analysis linear algebra
import pandas as pd               # efficient tables
import matplotlib.pyplot as plt   # plotting
from scipy import stats

import arviz as az

# load pieces from RunDEMC
from RunDEMC import Model, Param, dists
from RunDEMC.io import arviz_dict

In [None]:
# Read in the data from the decision_data.csv file to a Pandas DataFrame
# (This is data from a perceptual decision-making task, BTW)
df = pd.read_csv('decision_data.csv', index_col='Unnamed: 0')

In [None]:
# Use the .head method of the DataFrame to show the first few lines of the file
# The columns are:
#   correct: whether they made a correct response (1=Correct, 0=Incorrect)
#   rt: their reaction time in seconds
#   cond: whether they were in a speeded, accuracy, or neutral condition
#   log_rt: log-transformed reaction time
df.head(5)

## Bayesian t-test with all three pairwise comparisons

We want to test whether there are differences between the log reaction times (`log_rt` column in the data) for *correct* responses in each of the three conditions (`cond` column): Accuracy, Neutral, and Speed.

I've pasted in the 2-class independent samples t-test from class. You will need to extend this to three classes. 

First, you'll need to extract the log RTs for each of the three conditions.
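As a minimal sketch of this extraction step, the snippet below builds a tiny synthetic table with the same column names as `decision_data.csv` and splits the log RTs of correct trials by condition. The values and the condition labels (`'Accuracy'`, `'Neutral'`, `'Speed'`) are assumptions for illustration, so check `df['cond'].unique()` for the actual labels. The names `toy`, `toy_correct`, and `log_rt_by_cond` are hypothetical, and the pooled `overall_mean` and `overall_std` at the end are only one plausible way to define the quantities used in the model priors, not necessarily how they were defined in class.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for decision_data.csv (values and labels are made up)
toy = pd.DataFrame({
    'correct': [1, 1, 0, 1, 1, 1, 0, 1],
    'rt': [0.62, 0.48, 0.55, 0.71, 0.80, 0.45, 0.50, 0.66],
    'cond': ['Accuracy', 'Speed', 'Speed', 'Neutral',
             'Accuracy', 'Speed', 'Neutral', 'Neutral'],
})
toy['log_rt'] = np.log(toy['rt'])

# keep only the correct responses
toy_correct = toy[toy['correct'] == 1]

# one NumPy array of log RTs per condition
log_rt_by_cond = {cond: grp['log_rt'].to_numpy()
                  for cond, grp in toy_correct.groupby('cond')}

# pooled summary statistics (an assumed definition for the prior settings)
pooled = np.concatenate(list(log_rt_by_cond.values()))
overall_mean = pooled.mean()
overall_std = pooled.std()
```

With the real data, the same pattern would apply to `df`, with each array assigned to its own variable (for example `A`, `B`, and `C`) so it can be passed to the model.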

Then, you'll need to modify the model code pasted below to fit one more dataset (it currently fits datasets A and B, but you need to add a dataset C).
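To make the shape logic of that extension concrete, here is a hedged sketch that uses `scipy.stats.t` as a stand-in for `dists.students_t` and a plain dictionary as a stand-in for the `pop` array that RunDEMC passes to the likelihood function. Both substitutions, the function name `three_group_log_like`, and all the numbers are assumptions for illustration. Note that `scipy.stats.t` takes its arguments as (`df`, `loc`, `scale`), which differs from the argument order used in the model code. The point is that each parameter column of shape `(n_proposals, 1)` broadcasts against a dataset of shape `(n_obs,)`, so summing over axis 1 leaves one log likelihood per proposal, and a third dataset adds one more term to that sum.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# three made-up datasets with different numbers of observations
datA = rng.normal(-0.4, 0.3, size=20)
datB = rng.normal(-0.5, 0.3, size=25)
datC = rng.normal(-0.7, 0.2, size=30)

# a made-up population of proposals (dict stand-in for RunDEMC's pop)
n_prop = 4
pop = {'nu': rng.uniform(2, 30, n_prop)}
for label in 'ABC':
    pop['mu_' + label] = rng.normal(-0.5, 0.2, n_prop)
    pop['sigma_' + label] = rng.uniform(0.1, 0.5, n_prop)


def three_group_log_like(pop, datA, datB, datC):
    """Summed Student's t log likelihood across three groups, per proposal."""
    log_like = np.zeros(len(pop['nu']))
    for label, dat in (('A', datA), ('B', datB), ('C', datC)):
        # (n_prop, 1) parameters broadcast against (n_obs,) data
        dist = stats.t(df=pop['nu'][:, np.newaxis],
                       loc=pop['mu_' + label][:, np.newaxis],
                       scale=pop['sigma_' + label][:, np.newaxis])
        # sum across observations, leaving one value per proposal
        log_like += dist.logpdf(dat).sum(1)
    # treat invalid proposals as impossible
    log_like[np.isnan(log_like)] = -np.inf
    return log_like


log_like = three_group_log_like(pop, datA, datB, datC)
```

In the RunDEMC model itself, the same idea would also require adding `mu` and `sigma` parameters for the third group to `params` and passing the third dataset through `like_args`.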

Finally, you'll fit the model to the data and plot posteriors (especially of the differences in posteriors) to assess where there are differences in the means (mu) and standard deviations (sd) between pairs of conditions.
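The idea of a posterior difference can be sketched with plain NumPy. The draws below are synthetic stand-ins for posterior samples of `mu_A`, `mu_B`, and `mu_C` (the means, spreads, and the names `post` and `summary` are made up for illustration, not output from the model). For each pair, the difference is computed draw by draw and summarized with a 95% equal-tailed interval, which is a simpler substitute for the highest density interval that ArviZ reports.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 4000

# synthetic "posterior" draws (made up, not model output)
post = {'mu_A': rng.normal(-0.40, 0.02, n_draws),
        'mu_B': rng.normal(-0.42, 0.02, n_draws),
        'mu_C': rng.normal(-0.70, 0.02, n_draws)}

# all three pairwise comparisons
pairs = [('A', 'B'), ('A', 'C'), ('B', 'C')]

summary = {}
for g1, g2 in pairs:
    # difference computed sample by sample
    diff = post['mu_' + g1] - post['mu_' + g2]
    lo, hi = np.percentile(diff, [2.5, 97.5])
    summary[(g1, g2)] = {
        'mean': diff.mean(),
        'lo': lo,
        'hi': hi,
        # proportion of the difference distribution above zero
        'p_gt_0': (diff > 0).mean(),
    }

for pair, s in summary.items():
    print(pair, {k: round(float(v), 3) for k, v in s.items()})
```

A difference whose interval excludes zero is the kind of evidence to look for. The same calculation applies to the `sigma` parameters, and plotting each difference with a reference value at zero shows the same information graphically.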


In [None]:
# Make three datasets from df
# one for each condition 
# (like A and B from class, but for all three experimental conditions).



In [None]:
# Build a RunDEMC model to perform an independent samples Bayesian t-test
# ***UPDATE THIS MODEL TO HANDLE THREE CONDITIONS***
# Define a likelihood function
def ind_students_like(pop, *args):
    # pull the data out of the args
    datA = args[0]
    datB = args[1]
    
    # instantiate the distributions with the proposed params
    distA = dists.students_t(pop['mu_A'][:, np.newaxis], 
                             pop['sigma_A'][:, np.newaxis], 
                             pop['nu'][:, np.newaxis])
    distB = dists.students_t(pop['mu_B'][:, np.newaxis], 
                             pop['sigma_B'][:, np.newaxis], 
                             pop['nu'][:, np.newaxis])
    
    # calc the log like and sum across obs
    log_like = np.log(distA.pdf(datA)).sum(1)
    log_like += np.log(distB.pdf(datB)).sum(1)
    log_like[np.isnan(log_like)] = -np.inf
    
    # return the log like for each proposal
    return log_like

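# set up the model
# NOTE: overall_mean and overall_std are not defined in this notebook;
# define them before running this cell (e.g., from the pooled log RTs)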
params = [Param(name='mu_A', prior=dists.normal(overall_mean, overall_std*2.0)),
          Param(name='mu_B', prior=dists.normal(overall_mean, overall_std*2.0)),
          Param(name='sigma_A', prior=dists.halfcauchy(5)),
          Param(name='sigma_B', prior=dists.halfcauchy(5)),
          Param(name='nu', prior=dists.exp(1/29.), 
                transform=lambda x: x + 1,
                inv_transform=lambda x: x - 1)]

# set up DE
m = Model('t-test_ind', 
          params=params,
          like_fun=ind_students_like,
          like_args=(A, B),
          verbose=True)

# run the model, first with some burnin
times = m.sample(100, burnin=True)
times = m.sample(1600, burnin=False)

In [None]:
# convert the data to arviz


In [None]:
# show an arviz summary


In [None]:
# plot the traces


In [None]:
# calculate posterior differences


In [None]:
# plot the posterior differences for all possible comparisons of interest


## Evaluation

***Where do we see reliable differences in mean or standard deviation between conditions?***

### Write your short answer here:
