# Analyzing behavioral data and fitting a hierarchical DDM in Python

In [1]:
# pandas is 'R' for Python: it offers convenient dataframes with split-apply-combine-functionality, etc.
import pandas

# seaborn is a plotting library
import seaborn as sns

import matplotlib.pyplot as plt

# numpy provides arrays and numerical functions
import numpy as np



In [2]:
# plot inline
%matplotlib inline

Now we load the behavioral dataframe with pandas.

In [3]:
df = pandas.read_pickle('/data/behavior.pandas')

[Pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html) is a library for Python that aids in statistical analysis.

Have a short look at this [tutorial](http://pandas.pydata.org/pandas-docs/stable/10min.html).

1) Now use the .head()-method. What kind of data do you see?

2) Use the groupby()-method to find mean values for the different conditions. What do you find?
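As a hint for the two questions above, here is a minimal sketch on a made-up dataframe. The column names mirror the ones used in this notebook (subj_idx, cond, rt, correct), but the name `toy`, the condition labels and all values are invented for illustration; the real data will look different.

```python
import pandas

# Toy data (invented values): two subjects, two conditions
toy = pandas.DataFrame({
    'subj_idx': [0, 0, 0, 0, 1, 1, 1, 1],
    'cond':     ['spd', 'spd', 'acc', 'acc', 'spd', 'spd', 'acc', 'acc'],
    'rt':       [0.40, 0.50, 0.70, 0.80, 0.45, 0.55, 0.75, 0.85],
    'correct':  [1, 0, 1, 1, 0, 1, 1, 1],
})

# First rows of the dataframe
print(toy.head())

# Mean RT and mean accuracy per condition
cond_means = toy.groupby('cond')[['rt', 'correct']].mean()
print(cond_means)
```

Passing a list of column names to groupby (for example subject and condition) gives one row of means per combination.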

factorplot (renamed catplot in seaborn 0.9 and later) is a convenient tool for plotting mean values across conditions. It also draws 95% bootstrapped error bars. For these to be meaningful, you have to tell it, via the units argument, which factor corresponds to independent observations. Here, as in most cases, that is the subject: subj_idx.

In [None]:
sns.factorplot('cond', 'rt', data=df, units='subj_idx', kind='point')
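To see why the units argument matters, the sketch below bootstraps a 95% interval by hand: it first averages within subjects and then resamples subjects, not single trials, with replacement. This is a simplified stand-in for the idea, not seaborn's actual implementation; the data and the helper name `bootstrap_ci` are made up.

```python
import numpy as np
import pandas

def bootstrap_ci(values, n_boot=2000, ci=95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = np.random.RandomState(seed)
    values = np.asarray(values)
    # Resample the observations with replacement, n_boot times
    idx = rng.randint(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower = (100 - ci) / 2.0
    return np.percentile(boot_means, [lower, 100 - lower])

# Toy data (invented): 5 subjects with different baselines, 20 trials each
rng = np.random.RandomState(1)
subject_offsets = np.repeat([0.0, 0.1, -0.1, 0.2, -0.2], 20)
toy = pandas.DataFrame({
    'subj_idx': np.repeat(np.arange(5), 20),
    'rt': rng.normal(0.6, 0.1, size=100) + subject_offsets,
})

# One mean per subject: these are the independent observations
subj_means = toy.groupby('subj_idx').rt.mean()
low, high = bootstrap_ci(subj_means.values)
print(low, subj_means.mean(), high)
```

Resampling all 100 trials as if they were independent would typically give a narrower, overconfident interval, because trials from the same subject are correlated.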

Can you make the same plot for correct/incorrect?

In [None]:
sns.factorplot(...

2) To also use 'correct' as a factor, we have to map it to a string. Can you spot what this code does?

In [29]:
df['correct_str'] = df.correct.map({1:'correct', 0:'incorrect'})
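A small sketch on invented values shows the effect of map with a dictionary: every element is looked up as a key and replaced by the corresponding value. Elements that do not appear in the dictionary become missing values (NaN).

```python
import pandas

# Invented accuracy codes; the 2 is deliberately not in the dictionary
correct = pandas.Series([1, 0, 1, 2])
labels = correct.map({1: 'correct', 0: 'incorrect'})
print(labels)
```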

3) Now plot RTs using factorplot, splitting on condition using the x-value (like before), and on correct/incorrect using the hue-variable.

4) The apply-function is especially powerful. What does the following code do?

In [88]:
def return_quantiles(d):
    ps = [10, 30, 50, 70, 90]
    qs =  np.percentile(d, ps)
    tmp  = pandas.DataFrame()
    tmp['p'] = ps
    tmp['q'] = qs
    return tmp

In [None]:
dg = df.groupby(['subj_idx', 'cond', 'correct']).rt.apply(return_quantiles).reset_index()
dg.head()
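If the result is hard to read, it can help to run the function on a single group first and only then on all groups. The sketch below does this on invented data; the function is restated in compact form so the block runs on its own, and the name `toy` and the condition labels are made up.

```python
import numpy as np
import pandas

def return_quantiles(d):
    ps = [10, 30, 50, 70, 90]
    qs = np.percentile(d, ps)
    return pandas.DataFrame({'p': ps, 'q': qs})

# Toy data (invented): one subject, two conditions, five trials each
toy = pandas.DataFrame({
    'subj_idx': [0] * 10,
    'cond': ['spd'] * 5 + ['acc'] * 5,
    'rt': [0.3, 0.4, 0.5, 0.6, 0.7, 0.5, 0.6, 0.7, 0.8, 0.9],
})

# Apply the function to the RTs of one group by hand ...
one_group = return_quantiles(toy[toy.cond == 'spd'].rt)
print(one_group)

# ... and to every group at once: one small table per group, stacked
all_groups = toy.groupby(['subj_idx', 'cond']).rt.apply(return_quantiles).reset_index()
print(all_groups)
```

Each group contributes five rows (one per percentile), and reset_index turns the group keys back into ordinary columns.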

In [None]:
g = sns.FacetGrid(dg, row="subj_idx", hue="correct", col='cond', palette=['red','green'], sharex=False)
g.map(plt.plot, "q", "p", lw=2, marker='o')
g.add_legend();
plt.ylim(0, 100)

6) What kind of patterns do you observe?

Try to use [FacetGrid](http://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html) and [sns.distplot](http://stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html#combining-plot-styles-distplot) to plot the reaction time distributions, with one panel for every subject and one color for every condition.

In [None]:
g = sns.FacetGrid(df, col="subj_idx", ...)
g.map(...)

## HDDM

Now we are going to fit a very simple drift diffusion model. Have a look at this example code.

<code>m_stim = hddm.HDDM(data, depends_on={'v': 'stim'})
m_stim.find_starting_values()
m_stim.sample(2500, burn=1000)</code>

Try something similar, but instead of letting the drift rate vary over stimulus conditions, let the threshold parameter $a$ vary over the speed-accuracy conditions (cond).

First make sure that:

* You have a column 'rt' with reaction time in _seconds_
* You have a column 'response' which indicates whether the response was correct or incorrect
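A minimal sketch of these two checks, on invented data. It assumes the accuracy column is called correct and coded 1/0, as in the cells above, and it uses a crude heuristic (a median RT above 10) to guess that RTs were stored in milliseconds. Both are assumptions to adapt to your own data, and the helper name `prepare_for_hddm` is hypothetical.

```python
import pandas

def prepare_for_hddm(df):
    """Return a copy with 'rt' in seconds and a 0/1 'response' column."""
    out = df.copy()
    # Heuristic (assumption): a median RT above 10 cannot be in seconds
    if out['rt'].median() > 10:
        out['rt'] = out['rt'] / 1000.0
    # Accuracy coding: 1 = correct, 0 = incorrect
    out['response'] = out['correct'].astype(int)
    return out

# Toy data (invented): RTs in milliseconds
toy = pandas.DataFrame({'rt': [450.0, 620.0, 580.0], 'correct': [1, 0, 1]})
prepared = prepare_for_hddm(toy)
print(prepared)
```

Working on a copy leaves the original dataframe untouched, so the cell can be re-run without dividing the RTs twice.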

In [107]:
import hddm

In [None]:
model = hddm.HDDM(
    ...

7) Now you can print the stats with model.print_stats(). What do you observe?

You can also access the posteriors of the group-level parameters using the nodes_db:

In [37]:
# .loc replaces the .ix indexer, which was removed in pandas 1.0
v_trace = model.nodes_db.loc['v'].node.trace[:]

8) Plot this distribution of posteriors using distplot

9) Can you use similar code to access the traces for the threshold-parameter $a$ belonging to the different conditions? Plot the distributions in the same figure. What do you see?
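Once you have two traces as NumPy arrays, one common way to compare them is the fraction of posterior samples in which one exceeds the other. The sketch below uses random numbers as stand-ins for real traces, so the names `a_acc` and `a_spd` and all values are invented; with a fitted model you would fill them from nodes_db instead.

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-ins (invented) for posterior traces of the threshold in two conditions
a_acc = rng.normal(2.0, 0.1, size=1500)
a_spd = rng.normal(1.2, 0.1, size=1500)

# Proportion of samples in which a is larger in the first condition
p_larger = (a_acc > a_spd).mean()
print("P(a_acc > a_spd) =", p_larger)
```

A value close to 1 (or close to 0) indicates that the two posteriors barely overlap.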

Save the results using

In [None]:
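# list(...) is needed on Python 3, where .items() returns a view
results = pandas.DataFrame(list(model.values.items()), columns=['parameter', 'value'])
# to_pickle replaces the removed DataFrame.save; read it back with pandas.read_pickle
results.to_pickle('/data/results_hddm.pandas')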
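To check that a saved table can be read back unchanged, a round trip through a temporary file looks roughly like this. The dictionary `values` and its numbers are invented stand-ins for the fitted parameters, and the temporary path is only for the demonstration.

```python
import os
import tempfile
import pandas

# Invented parameter values standing in for the fitted model's values
values = {'a': 1.8, 'v': 0.9, 't': 0.3}
results = pandas.DataFrame(list(values.items()), columns=['parameter', 'value'])

# Write to a temporary location and read the file back
path = os.path.join(tempfile.mkdtemp(), 'results_hddm.pandas')
results.to_pickle(path)
reloaded = pandas.read_pickle(path)
print(reloaded)
```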

Nice. Is the preprocessing already done? Let's go to [part 3](http://54.74.144.112:8889/notebooks/part3_registration.ipynb)