# Lesson 12: Final Project

## Intro to Quantified Cognition

<a href="https://colab.research.google.com/github/compmem/QuantCog/blob/master/notebooks/12_Final_Project.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

## Final Project

- The goal of the final project is to demonstrate an application of some of the modeling tools we've discussed this semester.

- Unless you are developing a novel theory, most mechanistic cognitive modeling involves starting from an existing model, which you may then tweak for your needs and hypotheses.

- Below I list some options for the final project.

## General Info

- Please turn in the project in the form of a Jupyter notebook, along with any other files I might need to run the notebook.

- It is due ***Wednesday, May 8th, 2019***.

- I will be available for meetings from now until it is due. Please email to schedule one.

- Feel free to work together in teams of *up to 3 people*, but make it *VERY CLEAR* in your Jupyter notebook submission who was on the team so that you all get credit.

- Please make use of other members of the class for help along the way.

## Option 0: Some analysis of your own data

Many of you have datasets from your own research. Your project would entail performing some form of Bayesian or cognitive model-based analysis of those data. 

Note that it is acceptable to use PyMC3 to build a generative Bayesian model of your data, as long as we have not already performed that specific analysis in class.

You must include the following:

- A short introduction (a couple of sentences) explaining the question you'll be asking, to frame the analysis
- Tests of some alternative model variants
- Some form of model comparison (via Bayes Factor, BPIC, WAIC, etc.)
- Model fits (at least the best-fitting parameters, and potentially posterior predictives)
- A summary of findings

## Option 1: Generating Model Contest

Very often we have some data and our goal is to identify what model might have generated those data. For this project, I have generated sets of data with two different decision models: WFPT and LBA. 

Your job is to fit models (and model variants with and without some parameters free, such as between-trial variability in drift rate) to identify which model (and which parameter values, as best fits or posteriors) generated each dataset.

This will require model comparison via some method of your choosing (Bayes Factor, BPIC, WAIC, etc.).
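As a rough sketch of one of these options, WAIC can be computed from a matrix of pointwise log-likelihoods with one row per posterior sample and one column per trial. The function name `waic` is hypothetical, and the sketch assumes you have already extracted such a matrix for each candidate model from your posterior samples. (The notebook also imports `calc_bpic` from RunDEMC for BPIC; its interface is not assumed here.)

```python
import numpy as np

def waic(log_lik):
    """Hypothetical helper: WAIC from an (n_samples, n_obs) log-likelihood array.

    Returns (waic, lppd, p_waic). WAIC is on the deviance scale,
    so lower values indicate better expected out-of-sample fit.
    """
    log_lik = np.asarray(log_lik, dtype=np.float64)
    n_samples = log_lik.shape[0]

    # log pointwise predictive density: log of the posterior-mean
    # likelihood for each trial, computed stably via log-mean-exp
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_samples))

    # effective number of parameters: posterior variance of each
    # trial's log-likelihood, summed over trials
    p_waic = np.sum(log_lik.var(axis=0, ddof=1))

    return -2.0 * (lppd - p_waic), lppd, p_waic
```

Compute `waic` for each model variant on the same dataset; the variant with the lowest value is preferred, though small differences should be weighed against their uncertainty.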

## Option 2: LBA Model Extension

Extend LBA to generate a confidence value in addition to a choice and reaction time. One way to achieve this (though I'm open to other approaches as long as you justify them) is to assume that confidence is the ratio of the winning accumulator's activation to the sum of all the accumulator activations at the time of the decision.

Intuitively, this approach makes some sense. If the selected choice has a high level of activation relative to the non-selected choice, then confidence will be high (close to 1.0). If, on the other hand, there is strong evidence for both choices and one just barely wins out over the other, then the ratio of the winning activation to the summed activations will be closer to 0.5 (with two choices).
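A minimal sketch of this confidence rule, assuming the same parameterization as the `lba_sim` function defined later in this notebook (start points uniform on [0, A], trial drifts normal with mean `I` and SD `S`, threshold `b`, non-decision time `t0`). The name `lba_sim_conf` and its `seed` argument are hypothetical. At the moment the winner reaches `b`, each losing accumulator sits at its start point plus drift times the decision time, so confidence is `b` divided by the sum of those activations.

```python
import numpy as np
import pandas as pd

def lba_sim_conf(I=(1.0, 1.5), A=.1, S=1.0, b=1.0, t0=0.0,
                 num_sims=1000, max_time=2., seed=None):
    """Hypothetical LBA simulator that also returns a confidence value."""
    rng = np.random.default_rng(seed)
    dr = np.asarray(I, dtype=np.float64)
    nc = len(dr)

    # start points and trial-level drifts (non-positive drifts clipped)
    k = rng.uniform(0., A, (num_sims, nc))
    d = rng.normal(dr, S, (num_sims, nc))
    d[d <= 0.0] = np.finfo(dr.dtype).eps

    # time for each accumulator to reach threshold; the earliest wins
    t = (b - k) / d
    inds = t.argmin(axis=1)
    rows = np.arange(num_sims)
    t_win = t[rows, inds]  # decision time, before adding t0

    # activation of every accumulator when the winner hits threshold;
    # the winner is exactly at b, the losers are still below b
    act = k + d * t_win[:, None]
    act[rows, inds] = b
    conf = b / act.sum(axis=1)

    # add non-decision time and code slow responses as 0
    times = t_win + t0
    resp = inds + 1
    resp[times >= max_time] = 0
    return pd.DataFrame({'choice': resp, 'rt': times, 'conf': conf})
```

Because every losing activation lies between 0 and `b`, confidence is bounded between 1/(number of choices) and 1.0. With the default `I=(1.0, 1.5)`, choice 2 is the accumulator with the larger mean drift, so `sims.groupby('choice')['conf'].mean()` gives a first look at correct versus error confidence.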

To test whether this model is, indeed, making sensible predictions, pick the variant of the LBA model that best fit the speed-accuracy trade-off decision data (the one that allowed the drift rates to change between conditions) and perform the fit again with this new model. Even though you are not fitting to confidence, you can simulate the model with the best-fitting parameters to generate distributions of confidence values for the speed and accuracy conditions. Some questions to answer:

- Are people more confident in the accuracy condition? 
- Are the confidence values different for correct and incorrect answers?
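One way to answer both questions is to simulate each condition separately from its best-fitting parameters, tag every simulated trial with its condition and correctness, and tabulate confidence. The helpers `mark_correct` and `confidence_summary` below are hypothetical; they assume one row per simulated trial with columns `cond`, `choice` (0 meaning no response) and `conf`, and that you know which `choice` code corresponds to the correct response in your parameterization.

```python
import pandas as pd

def mark_correct(sims, correct_choice):
    """Drop non-responses (choice 0) and flag correct trials as 1, errors as 0."""
    out = sims[sims['choice'] != 0].copy()
    out['correct'] = (out['choice'] == correct_choice).astype(int)
    return out

def confidence_summary(sims):
    """Mean, SD and count of confidence for each condition x correctness cell."""
    return sims.groupby(['cond', 'correct'])['conf'].agg(['mean', 'std', 'count'])
```

For example, with two hypothetical simulation outputs `speed_sims` and `acc_sims`, you could build `sims = pd.concat([speed_sims.assign(cond='Speed'), acc_sims.assign(cond='Accuracy')])` and then inspect `confidence_summary(mark_correct(sims, correct_choice=2))`, ideally alongside histograms of the full distributions rather than the means alone.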

## Option 3: Flanker Analysis

While there have been full models of cognitive control proposed to account for the dynamics of the decision process in congruent vs. incongruent conditions, insight can be gained by fitting to these two conditions separately with a standard decision-making model.

- Fit the WFPT model separately to the incongruent and congruent trials from the flanker task
- Decide what parameters should be kept constant between the two conditions and what parameters should be allowed to change
- Justify this decision (perhaps even via a model comparison)
- Show fits of the best-fitting parameters (or posterior predictives) to the data
- Show full posteriors for the parameters and discuss whether they make sense
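Deciding which parameters are shared and which vary between conditions amounts to choosing how a single parameter vector maps onto per-condition parameter sets. A minimal sketch, assuming a generic per-condition likelihood function `log_like(data, **params)`; the names `split_params` and `total_log_like` are hypothetical, and you would need to wrap the notebook's `wfpt_like` (whose signature is not assumed here) to match that form.

```python
def split_params(theta, shared, varying, conds):
    """Hypothetical helper: map a flat parameter vector onto per-condition dicts.

    theta holds the shared params first, followed by one block of the
    varying params for each condition (in the order given by conds).
    """
    theta = list(theta)
    expected = len(shared) + len(varying) * len(conds)
    if len(theta) != expected:
        raise ValueError(f"expected {expected} params, got {len(theta)}")

    base = dict(zip(shared, theta[:len(shared)]))
    out = {}
    pos = len(shared)
    for c in conds:
        p = dict(base)  # shared values are identical across conditions
        for name in varying:
            p[name] = theta[pos]
            pos += 1
        out[c] = p
    return out

def total_log_like(theta, data_by_cond, shared, varying, log_like):
    """Sum the per-condition log-likelihoods under a given sharing scheme."""
    params = split_params(theta, shared, varying, list(data_by_cond))
    return sum(log_like(data_by_cond[c], **params[c]) for c in data_by_cond)
```

Each sharing scheme (for example, only drift rate free versus drift rate and boundary free) then defines a separate model with its own parameter count, and those models can be compared with the same criterion you use elsewhere.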


## *ONLY* if on Google Colab

In [None]:
# to install RunDEMC
!pip install git+https://github.com/compmem/RunDEMC.git


In [None]:
# to retrieve the data
!wget https://raw.githubusercontent.com/compmem/QuantCog/master/notebooks/decision_data.csv
!wget https://raw.githubusercontent.com/compmem/QuantCog/master/notebooks/contest_dataset_1.csv
!wget https://raw.githubusercontent.com/compmem/QuantCog/master/notebooks/contest_dataset_2.csv
!wget https://raw.githubusercontent.com/compmem/QuantCog/master/notebooks/contest_dataset_3.csv
!wget https://raw.githubusercontent.com/compmem/QuantCog/master/notebooks/flanker_s1.csv

In [None]:
# to retrieve the wfpt model
!wget https://raw.githubusercontent.com/compmem/QuantCog/master/notebooks/wfpt.py

## Load and process the data

In [1]:
# load matplotlib inline mode
%matplotlib inline

# import some useful libraries
import numpy as np                # numerical analysis linear algebra
import pandas as pd               # efficient tables
import matplotlib.pyplot as plt   # plotting
from scipy import stats

from RunDEMC.density import kdensity
from RunDEMC import Model, Param, dists, calc_bpic, joint_plot

from wfpt import wfpt_like, wfpt_gen

from joblib import Parallel, delayed
try:
    import scoop
    from scoop import futures
except ImportError:
    print("Error loading scoop, reverting to joblib.")
    scoop = None


Error loading scoop, reverting to joblib.


In [2]:
def lba_sim(I=(1.0,1.5), A=.1, S=1.0, b=1.0, t0=0.0, 
            num_sims=1000, max_time=2., I_scales_S=False, **kwargs):
    """Simulate choices and RTs from a linear ballistic accumulator (LBA).

    Extra keyword arguments are accepted and ignored. Returns a
    DataFrame with 'choice' (1-indexed; 0 = no response before
    max_time) and 'rt'.
    """
    # set mean drift rates from the inputs (one per choice)
    dr = np.asarray(I, dtype=np.float64)
    
    # set the number of choices
    nc = len(dr)
    
    # pick starting points uniformly between 0 and A
    k = np.random.uniform(0., A, (num_sims, nc))
    
    # optionally scale the drift SD by the norm of the drift rates
    if I_scales_S:
        S = np.sqrt((dr**2).sum())*S
        
    # pick trial-level drifts from normal distributions
    d = np.random.normal(dr, S, (num_sims, nc))
    
    # (disabled) resample sims where no drift is above zero
    #ind = np.all(d<=0.0,axis=1)
    #while np.any(ind):
    #    d[ind,:] = np.random.normal(dr,S,(ind.sum(),nc))
    #    ind = np.all(d<=0.0,axis=1)

    # instead, clip non-positive drifts to a tiny value to avoid
    # divide-by-zero (those accumulators then take a very long time)
    d[d<=0.0] = np.finfo(dr.dtype).eps

    # calc the time for each accumulator to reach threshold b
    t = (b-k)/d

    # the earliest accumulator to finish determines the response
    inds = t.argmin(1)
    times = t[np.arange(num_sims), inds]

    # add non-decision time
    times += t0
    
    # responses slower than max_time are coded as no response (0)
    resp_ind = times < max_time
    resp = inds+1
    resp[~resp_ind] = 0
    
    # return as data frame
    return pd.DataFrame.from_dict({'choice':resp, 'rt':times})
    


In [2]:
# load in the data
dat = pd.read_csv('decision_data.csv', index_col=0)
dat = dat[dat.cond != 'Neutral']
dat['rt_acc'] = dat['rt']
dat.loc[dat.correct==0,'rt_acc'] = -dat['rt']
dat.head()

|   | correct | rt | cond | log_rt | rt_acc |
|---|---|---|---|---|---|
| 3 | 1 | 0.4784 | Accuracy | -0.737308 | 0.4784 |
| 4 | 1 | 0.43 | Accuracy | -0.84397 | 0.43 |
| 5 | 1 | 0.4486 | Accuracy | -0.801624 | 0.4486 |
| 6 | 1 | 0.3991 | Speed | -0.918543 | 0.3991 |
| 8 | 1 | 0.4393 | Accuracy | -0.822573 | 0.4393 |
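The `rt_acc` column packs accuracy and RT into one signed number (positive for correct trials, negative for errors), which is a convenient way to plot or fit the RT distributions of both responses together. A sketch of the same encoding using `np.where`, plus a hypothetical helper `summarize_by_cond` that recovers per-condition accuracy and mean correct and error RTs from the sign alone (column names follow `decision_data.csv` as loaded above):

```python
import numpy as np
import pandas as pd

def add_signed_rt(df):
    """Return a copy with rt_acc = +rt for correct trials and -rt for errors."""
    out = df.copy()
    out['rt_acc'] = np.where(out['correct'] == 1, out['rt'], -out['rt'])
    return out

def summarize_by_cond(df):
    """Per-condition accuracy and mean RT for correct and error trials."""
    g = df.groupby('cond')['rt_acc']
    return pd.DataFrame({
        'accuracy': g.apply(lambda x: (x > 0).mean()),
        'mean_rt_correct': g.apply(lambda x: x[x > 0].mean()),
        # errors are stored as negative values, so flip the sign back
        'mean_rt_error': g.apply(lambda x: (-x[x < 0]).mean()),
    })
```

On the data loaded above, `summarize_by_cond(add_signed_rt(dat))` should reproduce the existing `rt_acc` encoding and give a quick check that the Accuracy condition is slower and more accurate than the Speed condition before any model is fit.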
