# Assignment 3: BEST Decisions
## Quantified Cognition
### Psychology 5332


# Name: *Your Name Here*
# User ID: *Your ID Here*

# Objectives

Upon completion of this assignment, the student will demonstrate the ability to:

1. Build/Modify a PyMC3 model
2. Fit the model to data
3. Plot parameter posteriors
4. Evaluate model fit results


# Assignment

- The goal of this assignment is to analyze the reaction time data from the perceptual decision-making task we examined in the first assignment. We want to know whether there are reliable differences in RTs (in either the means or the standard deviations) among the speeded, accuracy, and neutral conditions.

- You will perform this assignment by writing code in *this notebook* (***after making a copy and renaming it to have your userid in the title --- e.g., A03_BEST_decisions_mst3k***).

- In addition to this notebook, you will need to download the data from the same directory on GitHub. The file is `decision_data.csv`.

- ***When you are done, save this notebook as HTML (`File -> Download as -> HTML`) and upload it to the matching assignment on UVACollab.***

## HINTS

- Be sure to comment your code.
- I have provided cells with general instructions for what they should contain.
  

In [None]:
# Load in necessary modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pymc3 as pm


In [None]:
# Read in the data from the decision_data.csv file to a Pandas DataFrame
# (This is data from a perceptual decision-making task, BTW)
df = pd.read_csv('decision_data.csv', index_col='Unnamed: 0')


In [None]:
# Use the .head method of the DataFrame to show the first few lines of the file
# The columns are:
#   correct: whether they made a correct response (1=Correct, 0=Incorrect)
#   rt: their reaction time in seconds
#   cond: whether they were in a speeded, accuracy, or neutral condition
#   log_rt: log-transformed reaction time
df.head(5)

## Bayesian t-test with all three pairwise comparisons

We want to test whether there are differences between the log reaction times (`log_rt` column in the data) for *correct* responses in each of the three conditions (`cond` column): Accuracy, Neutral, and Speed.

I've pasted in the 2-class independent samples t-test from class. You will need to extend this to three classes. 

First, you'll need to extract the log RTs for each of the three conditions.

Then, you'll need to modify the model code pasted below so that it fits one more dataset (it currently fits datasets A and B, but you need to add a dataset C).

Finally, you'll fit the model to the data and plot the posteriors (especially the posteriors of the pairwise differences) to assess where the means (mu) and standard deviations (sd) differ between pairs of conditions.
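
As a rough illustration of the first step, here is a minimal sketch of splitting log RTs by condition for correct responses only. It runs on a small made-up DataFrame (`toy`) rather than the real file, and it assumes the `cond` column uses the labels `'Accuracy'`, `'Neutral'`, and `'Speed'`; check `df['cond'].unique()` to see the labels actually stored in your data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for decision_data.csv (values are made up for illustration)
toy = pd.DataFrame({
    'correct': [1, 1, 0, 1, 1, 1],
    'rt': [0.9, 0.6, 0.5, 0.4, 1.1, 0.7],
    # Assumed condition labels; the real file may spell them differently
    'cond': ['Accuracy', 'Neutral', 'Speed', 'Speed', 'Accuracy', 'Neutral'],
})
toy['log_rt'] = np.log(toy['rt'])

# Keep only correct responses, then collect the log RTs for each condition
correct_only = toy[toy['correct'] == 1]
by_cond = {name: grp['log_rt'].to_numpy()
           for name, grp in correct_only.groupby('cond')}

A = by_cond['Accuracy']
B = by_cond['Neutral']
C = by_cond['Speed']
```

The same filtering and grouping should carry over to `df` once the label spellings are confirmed.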


In [None]:
# Make three datasets from df
# one for each condition 
# (like A and B from class, but for all three conditions).



In [None]:
# Build a PyMC3 model to perform a Bayesian t-test
# ***UPDATE THIS MODEL TO HANDLE THREE CONDITIONS***
with pm.Model() as model:
    # set up the params/priors for each condition
    mu_A = pm.Normal('mu_A', A.mean(), A.std()*2.0)
    sd_A = pm.HalfCauchy('sd_A', 5)
    
    mu_B = pm.Normal('mu_B', B.mean(), B.std()*2.0)
    sd_B = pm.HalfCauchy('sd_B', 5)
    
    # degrees of freedom are shared across conditions
    nu = pm.Exponential('df_minus_one', 1/29.) + 1.
    
    # build the model (needs to have three conditions)
    data_A = pm.StudentT('data_A', mu=mu_A, sd=sd_A, nu=nu, observed=A)
    data_B = pm.StudentT('data_B', mu=mu_B, sd=sd_B, nu=nu, observed=B)
    
    # set up some deterministic vars to keep (you need these for every pairwise combination of A, B, and C)
    diff_of_means = pm.Deterministic('difference of means', mu_A - mu_B)
    diff_of_stds = pm.Deterministic('difference of stds', sd_A - sd_B)
    effect_size = pm.Deterministic('effect size',
                                   diff_of_means / np.sqrt((sd_A**2 + sd_B**2) / 2))


In [None]:
# fit the model
with model:
    trace = pm.sample(2000, cores=2)

In [None]:
# plot the traces
pm.traceplot(trace);

In [None]:
# plot the posterior differences
# ***Update this for all your pairwise comparisons***
pm.plot_posterior(trace, varnames=['difference of means','difference of stds', 'effect size'],
                  ref_val=0.0);

## Evaluation

***Where do we see reliable differences in mean or standard deviation between conditions?***
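
One common way to judge whether a difference is reliable is to check whether the 95% highest density interval (HDI) of the posterior difference excludes zero. The sketch below shows that check for all three pairs using NumPy only. The `hdi` helper is a hypothetical function written for this illustration, not part of PyMC3, and the "posterior samples" are synthetic draws; with a fitted model you would substitute arrays such as `trace['mu_A']`, where the name for the third condition depends on how you named it in your model.

```python
from itertools import combinations
import numpy as np

def hdi(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    s = np.sort(np.asarray(samples))
    n_in = int(np.ceil(mass * len(s)))  # number of samples inside the interval
    widths = s[n_in - 1:] - s[:len(s) - n_in + 1]  # width of each candidate interval
    lo = int(np.argmin(widths))
    return s[lo], s[lo + n_in - 1]

# Synthetic stand-ins for posterior samples of each condition's mean
# (made up for illustration; replace with samples from your own trace)
rng = np.random.default_rng(0)
post_mu = {
    'A': rng.normal(0.0, 0.05, 4000),
    'B': rng.normal(0.0, 0.05, 4000),
    'C': rng.normal(-0.5, 0.05, 4000),
}

results = {}
for x, y in combinations(sorted(post_mu), 2):
    diff = post_mu[x] - post_mu[y]
    low, high = hdi(diff)
    excludes_zero = not (low <= 0.0 <= high)
    results[(x, y)] = (low, high, excludes_zero)
    print(f"mu_{x} - mu_{y}: 95% HDI [{low:.3f}, {high:.3f}], "
          f"excludes zero: {excludes_zero}")
```

The same loop applies to the standard deviations. The plots from `pm.plot_posterior` with `ref_val=0.0` convey the same information graphically.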

### Write your short answer here:
