# SMBFMRI 
(Simultaneous model-based fMRI)

This is inspired, with some small modifications, by the work in these two papers:

- Turner, B. M., Forstmann, B. U., Wagenmakers, E.-J., Brown, S. D., Sederberg, P. B., & Steyvers, M. (2013). A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage, 72, 193–206. doi:10.1016/j.neuroimage.2013.01.048
- Turner, B. M., van Maanen, L., & Forstmann, B. U. (2015). Informing cognitive abstractions through neuroimaging: The neural drift diffusion model. Psychological Review, 122(2), 312–336. doi:10.1037/a0038894

In traditional model-based fMRI (e.g. O’Doherty, J. P., Hampton, A., & Kim, H. (2007) doi:10.1196/annals.1390.022), one first estimates cognitive model parameters from behavioral data and then looks for neural correlates of those parameters in the fMRI data. Because the second stage treats the point estimates from the first stage as if they were known exactly, the combined model is overconfident, and the estimates themselves are worse than they need to be. A particularly notable deficiency is that this two-stage approach cannot estimate trial-by-trial changes in cognitive model parameters without an explicit trial-by-trial learning model.

A better approach is to estimate the cognitive model parameters and their neural correlates simultaneously. We do so in a Bayesian setting. We specify four components: a dimension reduction model for brain data, a neural model for the trial-by-trial distribution of neural data, a cognitive model for behavioral data, and a model of their connection. The cognitive model serves as a "dimension augmentation" model for behavioral data, inferring some latent properties from the distribution of behavioral responses.

In the Turner et al. papers, the dimension reduction is done independently, so they use point estimates of the features rather than distributions. Conveniently, this means we can reuse feature extraction pipelines such as TFA (topographic factor analysis) off the shelf. That said, the right way to do this would be semi-supervised, with the dimension reduction informed by the rest of the model.
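
As a point of comparison for the off-the-shelf route, here is a minimal sketch of an independent dimension reduction using plain PCA instead of TFA. The function name `pca_features` and the synthetic data are illustrative assumptions, not part of the pipeline used below; the input is assumed to be a voxels-by-TRs array.

```python
import numpy as np

def pca_features(data, n_components=5):
    """Return per-TR PCA scores (n_tr x n_components) for a voxels x TRs array."""
    # center each voxel's time series
    centered = data - data.mean(axis=1, keepdims=True)
    # SVD of the TR x voxel matrix; rows of vt are the spatial components
    u, s, vt = np.linalg.svd(centered.T, full_matrices=False)
    # scores are the projections of each TR onto the leading components
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
fake_data = rng.standard_normal((200, 40))   # 200 voxels, 40 TRs (synthetic)
features = pca_features(fake_data, n_components=5)
print(features.shape)
```

The result has one row per TR and one column per component, which is the same layout as the transposed TFA weights used later as the neural features.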

We apply this here to the NYU slow event-related flanker task, which allows us to use Wiener first passage time distributions to model behavioral response times and choices. The dataset is available at https://openfmri.org/dataset/ds000102/, though ideally for this example we would find a dataset in the examples provided with some existing package (but it has to be a simple behavioral task like random dots, Stroop/flanker, etc.).

Here is our model, for a single subject only: 

$$
\begin{aligned}
y_{i} &\sim \mathrm{WFPT}(\psi)\\
b_{i} &\sim \mathcal{N}(\phi, S)\\
[\psi, b] &\sim \mathcal{N}(\mu, \Sigma)
\end{aligned}
$$

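Here $i$ indexes behavioral trials (and the TRs during which they occur; since this is an event-related design, the mapping is trivial). $y_i$ is a tuple of RT and choice, drawn from the Wiener first passage time ("DDM") distribution with DDM (cognitive) parameters $\psi$. $b_i$ is some dimension-reduced representation of the fMRI data for the $i$th TR (in our case, the output from TFA, though it could also be ICA/PCA). $\phi$ and $S$ are the mean and covariance of these neural features, and $\mu$ and $\Sigma$ are the mean and covariance of the Gaussian that links the cognitive and neural sides of the model. (In the code below, this linking Gaussian is evaluated on $\psi$ concatenated with the neural mean $\phi$.) In a perfect world, we would explicitly model the conditional distribution of $b$ and the raw neural data here as well.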
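
To make the generative story concrete, the following is a rough sketch that draws one synthetic data set from a simplified version of the model. Everything in it is an assumption made for illustration: there is a single condition, $\psi$ is reduced to three parameters (drift, boundary, non-decision time), the starting point is fixed at the midpoint, the hyperparameter values are made up, and the WFPT draw is approximated by an Euler-Maruyama simulation rather than sampled exactly. Following the code further down, the linking Gaussian is placed on $\psi$ and the neural mean $\phi$.

```python
import numpy as np

def simulate_ddm_trial(drift, boundary, start_frac, ndt, rng, dt=1e-3, max_t=5.0):
    """Euler-Maruyama simulation of one diffusion trial; returns (rt, choice)."""
    x = start_frac * boundary      # starting point between 0 and boundary
    t = 0.0
    while 0.0 < x < boundary and t < max_t:
        x += drift * dt + np.sqrt(dt) * rng.standard_normal()
        t += dt
    choice = int(x >= boundary)    # 1 = upper boundary, 0 = lower (or timeout)
    return ndt + t, choice

rng = np.random.default_rng(1)
n_trials, n_features = 20, 5

# hypothetical hyperparameters mu, Sigma for the joint vector [psi, phi];
# psi is (drift, log boundary, log non-decision time), phi has 5 entries
mu = np.r_[1.0, np.log(1.5), np.log(0.3), np.zeros(n_features)]
Sigma = 0.01 * np.eye(mu.size)
joint = rng.multivariate_normal(mu, Sigma)
drift, boundary, ndt = joint[0], np.exp(joint[1]), np.exp(joint[2])
phi = joint[3:]

# behavior: y_i ~ WFPT(psi), approximated by simulation
y = np.array([simulate_ddm_trial(drift, boundary, 0.5, ndt, rng)
              for _ in range(n_trials)])
# neural features: b_i ~ N(phi, S)
S = 0.5 * np.eye(n_features)
b = rng.multivariate_normal(phi, S, size=n_trials)
print(y.shape, b.shape)
```

Data simulated this way could serve as a sanity check for the likelihood code, since the generating parameters are known.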

To do this, first we need to load data and do TFA: 

In [None]:
from nilearn.input_data import NiftiMasker
import numpy as np
import pandas as pd
import nibabel as nib
from brainiak.factor_analysis.tfa import TFA
from wfpt.wfpt import wfpt_logp
from scipy.stats import multivariate_normal
import logging
import sys
import scipy.optimize as op

logger = logging.getLogger(__name__)
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
np.set_printoptions(precision=2)

# modified from the TFA data-extraction example
def extract_data(nifti_file, mask_file=None, zscore=True, detrend=True, smoothing_fwhm=None):
    """Return (voxels x TRs data, voxel coordinates in scanner RAS space)."""
    if mask_file is None:
        # whole brain, get coordinate info from nifti_file itself
        mask = nib.load(nifti_file)
    else:
        mask = nib.load(mask_file)
    affine = mask.affine
    mask_data = mask.get_fdata()
    if mask_file is None and mask_data.ndim == 4:
        # get mask in 3D: use the nonzero voxels of the middle volume
        img_data_type = mask.header.get_data_dtype()
        n_tr = mask_data.shape[3]
        mask_data = mask_data[:, :, :, n_tr // 2].astype(bool)
        mask = nib.Nifti1Image(mask_data.astype(img_data_type), affine)
    else:
        mask_data = mask_data.astype(bool)

    # get voxel coordinates
    R = np.float64(np.argwhere(mask_data))

    # get scanner RAS coordinates based on voxel coordinates
    if affine is not None:
        R = (np.dot(affine[:3, :3], R.T) + affine[:3, 3:4]).T

    # get ROI data, and run preprocessing
    nifti_masker = NiftiMasker(mask_img=mask, standardize=zscore, detrend=detrend,
                               smoothing_fwhm=smoothing_fwhm)
    img = nib.load(nifti_file)
    all_images = np.float64(nifti_masker.fit_transform(img))
    # transpose to voxels x TRs, the layout TFA expects
    data = all_images.T.copy()

    return data, R


# get brain data
neurodata, R = extract_data('PATH_TO_DOWNLOADED_DATA.nii.gz')

# to make TFA quick for testing, only do it on the TRs where we have behavior. To do this, first load behavior: 
behavdata = pd.read_table('PATH_TO_DOWNLOADED_DATA.tsv')

# figure out which TRs contain a behavioral trial
# (onsets are in seconds; dividing by 2 assumes a 2 s TR)
behavdataTRs = (behavdata.onset/2).values.astype(int)
# subset neuro data
neurodata = neurodata[:,behavdataTRs]

# Run TFA
n_voxel, n_tr = neurodata.shape
tfa = TFA(K=5, max_num_voxel=int(n_voxel*0.5), max_num_tr=int(n_tr*0.5), verbose=True)
tfa.fit(neurodata, R)
# get weights (and transpose to have standard format)
brainWeights = tfa.W_.T

# extract behavior data
behavdata = np.array([behavdata.response_time.values,(behavdata.correctness=='correct').values.astype(int), behavdata.StimVar.values])
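
The onset-to-TR mapping above hard-codes the division by 2. A slightly more defensive version is sketched below, assuming onsets are given in seconds and the TR is passed in explicitly. `onsets_to_tr_indices` is a hypothetical helper, and the onsets and volume count in the example are made up.

```python
import numpy as np

def onsets_to_tr_indices(onsets_sec, tr_sec, n_tr):
    """Map event onsets (seconds) to the index of the TR in which they fall."""
    idx = np.floor(np.asarray(onsets_sec, dtype=float) / tr_sec).astype(int)
    # fail loudly rather than silently indexing past the end of the run
    if idx.min() < 0 or idx.max() >= n_tr:
        raise ValueError("an onset falls outside the acquired volumes")
    return idx

# illustrative onsets; a 2 s TR matches the division by 2 used above
example_idx = onsets_to_tr_indices([0.0, 10.0, 21.5], tr_sec=2.0, n_tr=100)
print(example_idx)
```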

Most DDM parameters have bounded support (they must be positive). Sampling and optimization are much easier over an unbounded space, so we write some transformations between the two here (maybe these should live in wfpt_py?)

In [None]:
# get a vec'd likelihood
wfpt_logp_vec = np.vectorize(wfpt_logp)

def wfptParamTransform(x0, t0, cong_a, incong_a, cong_z, incong_z):
    # go from the unbounded space used by the optimizer/sampler to the
    # positive values the DDM expects (x0 is left untransformed)
    t0 = np.exp(t0)
    cong_a = np.exp(cong_a)
    incong_a = np.exp(incong_a)
    cong_z = np.exp(cong_z)
    incong_z = np.exp(incong_z)
    return x0, t0, cong_a, incong_a, cong_z, incong_z

def wfptParamUntransform(x0, t0, cong_a, incong_a, cong_z, incong_z):
    # inverse of wfptParamTransform: go from positive DDM parameters
    # back to the unbounded space (x0 is left untransformed)
    t0 = np.log(t0)
    cong_a = np.log(cong_a)
    incong_a = np.log(incong_a)
    cong_z = np.log(cong_z)
    incong_z = np.log(incong_z)
    return x0, t0, cong_a, incong_a, cong_z, incong_z
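
A quick way to convince yourself that the two transformations are inverses is a round-trip check. The sketch below uses array versions of the same log/exp maps; the helper names `to_unconstrained` and `to_constrained` and the parameter values are illustrative, not part of the code above.

```python
import numpy as np

def to_unconstrained(positive_params):
    """Map strictly positive parameters to the whole real line."""
    return np.log(np.asarray(positive_params, dtype=float))

def to_constrained(free_params):
    """Inverse map: any real number becomes a strictly positive parameter."""
    return np.exp(np.asarray(free_params, dtype=float))

positive = np.array([0.3, 1.0, 0.9, 1.5, 1.0])   # illustrative values only
free = to_unconstrained(positive)
# the round trip recovers the original values
assert np.allclose(to_constrained(free), positive)
# any proposal on the real line maps back to a valid positive value
assert np.all(to_constrained(np.array([-50.0, 0.0, 50.0])) > 0)
```

The second assertion is the point of the exercise: whatever value the optimizer or sampler proposes, the likelihood only ever sees positive parameters.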


Now construct the likelihood. 

In [None]:
def modelLik(p, neurodata, behavdata):
    # unpack parameters
    x0, t0, cong_a, incong_a, cong_z, incong_z, *neuroparams = p
    # transform them into positive support as needed
    x0, t0, cong_a, incong_a, cong_z, incong_z = wfptParamTransform(x0, t0, cong_a, incong_a, cong_z, incong_z)

    logger.debug("behav params: x0=%f, t0=%f, cong_a=%f, incong_a=%f, cong_z=%f, incong_z=%f" % (x0, t0, cong_a, incong_a, cong_z, incong_z))

    # 5 TFA centers so neuroparams will have a length-5 vector
    neuromean = np.array(neuroparams[0:5])
    # next, cholesky of 5x5 covariance matrix (to ensure positive semidefinite)
    neuroChol = np.zeros((5, 5))
    neuroChol[np.triu_indices(5, 0)] = np.array(neuroparams[5:20])
    neuroCov = neuroChol.T @ neuroChol

    logger.debug("neuroDet %f" % np.linalg.det(neuroCov) )
    # if det is 0 we have singular covariance, return -inf? ideally we would never get here
    if np.linalg.det(neuroCov) <= 1e-5:
        logger.debug("singular neurocov!")
        logger.debug(neuroChol)
        logger.debug(neuroCov)
        return -np.inf

    # next the hyperprior has an 11-element mean (6 DDM parameters + 5 neural means)
    hypermean = np.array(neuroparams[20:31])
    # hypercov is the rest
    hyperChol = np.zeros((11,11))
    hyperChol[np.triu_indices(11, 0)] = np.array(neuroparams[31:])
    hyperCov = hyperChol.T @ hyperChol
    logger.debug("hyperDet %f" % np.linalg.det(hyperCov) )

    # again if det is 0 return -inf
    if np.linalg.det(hyperCov) <= 1e-5:
        logger.debug("singular hypercov!")
        logger.debug(hyperChol)
        logger.debug(hyperCov)
        return -np.inf

    # unpack behav data
    rts, accs, conds = behavdata
    
    # assign parameters to each trial according to its condition
    a = np.where(conds==1, cong_a, incong_a)
    z = np.where(conds==1, cong_z, incong_z)
    
    # ddm likelihood
    ddmlik = np.sum(wfpt_logp_vec(rts, accs, x0, t0, a, z, eps = 1e-10))
    
    # if the ddm log-likelihood is not finite we might as well return here
    if not np.isfinite(ddmlik):
        return -np.inf

    # neuro likelihood
    neurolik = np.sum(multivariate_normal.logpdf(neurodata, neuromean, neuroCov))

    # hyper likelihood
    hyperlik = multivariate_normal.logpdf(np.r_[x0, t0, cong_a, incong_a, cong_z, incong_z, neuromean], hypermean, hyperCov)

    logger.debug("ddmlik %f, neurolik %f, hyperlik %f" % (ddmlik, neurolik, hyperlik))

    return ddmlik + neurolik + hyperlik
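
The covariance handling in `modelLik` can be factored into small helpers. The sketch below shows the same upper-triangular Cholesky parameterization, plus a singularity check on the log-determinant scale; using `np.linalg.slogdet` here is a substitution for the raw determinant used above, and the helper names are hypothetical.

```python
import numpy as np

def unpack_cov(theta, k):
    """Build a k x k covariance from k*(k+1)/2 free upper-triangular entries."""
    theta = np.asarray(theta, dtype=float)
    n_expected = k * (k + 1) // 2
    if theta.size != n_expected:
        raise ValueError("expected %d entries, got %d" % (n_expected, theta.size))
    chol = np.zeros((k, k))
    chol[np.triu_indices(k)] = theta
    return chol.T @ chol          # symmetric positive semidefinite by construction

def is_well_conditioned(cov, min_logdet=np.log(1e-5)):
    """Check the determinant on the log scale to avoid under/overflow."""
    sign, logdet = np.linalg.slogdet(cov)
    return sign > 0 and logdet > min_logdet

cov5 = unpack_cov(np.eye(5)[np.triu_indices(5)], 5)
# DDM, neural mean, neural Cholesky, hyper mean, hyper Cholesky
n_params = 6 + 5 + 15 + 11 + 66
print(n_params)
```

The parameter count makes the layout of the vector `p` explicit: the slices `[0:6]`, `[0:5]`, `[5:20]`, `[20:31]` and `[31:]` used above partition exactly this many entries.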


emcee recommends initializing the walkers around the MLE, so we try to find it here:

In [None]:
nll = lambda *args: -modelLik(*args)

# generate initial guess
psi0 = np.array([0.1, 0.3, 1, 0.9, 1.5, 1])
neuromean0 = np.zeros(5)
neurocov0 = np.eye(5)[np.triu_indices(5, 0)].flatten()
hypermean0 = np.zeros(11)
hypercov0 = np.eye(11)[np.triu_indices(11, 0)].flatten()

x0 = np.r_[wfptParamUntransform(*psi0), neuromean0, neurocov0, hypermean0, hypercov0]

# test initial guess
modelLik(x0, brainWeights, behavdata)

# find MLE
result = op.minimize(nll, x0, args=(brainWeights, behavdata))
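
The same pattern (minimize a negative log-likelihood over unconstrained parameters, then map back) is easier to inspect on a toy problem. The sketch below fits a Gaussian's mean and log standard deviation to synthetic data. It uses the derivative-free Nelder-Mead method rather than the default used above, which is a substitution made for illustration, and it falls back to the initial guess if the optimizer reports failure.

```python
import numpy as np
import scipy.optimize as op

rng = np.random.default_rng(2)
samples = rng.normal(loc=1.5, scale=0.7, size=500)   # synthetic data

def toy_nll(p, x):
    """Negative log-likelihood of a Gaussian with free mean and log-sd."""
    mean, log_sd = p
    sd = np.exp(log_sd)            # same log/exp trick as for the DDM parameters
    return (np.sum(0.5 * ((x - mean) / sd) ** 2 + log_sd)
            + 0.5 * x.size * np.log(2 * np.pi))

toy_result = op.minimize(toy_nll, x0=np.zeros(2), args=(samples,),
                         method="Nelder-Mead")

# fall back to the initial guess if the optimizer did not converge
start = toy_result.x if toy_result.success else np.zeros(2)
mle_mean, mle_sd = start[0], np.exp(start[1])
print(mle_mean, mle_sd)
```

Checking `success` before using the result matters for the full model too, since a failed optimization would otherwise silently seed every walker from a poor starting point.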

Now we do MCMC! 

In [None]:
import emcee

ndim, nwalkers = x0.shape[0], 350
# initial condition is MLE plus gaussian ball
pos = [result['x'] + 0.1*np.random.randn(ndim) for i in range(nwalkers)]

# create sampler
sampler = emcee.EnsembleSampler(nwalkers, ndim, modelLik, args=(brainWeights, behavdata))

sampler.run_mcmc(pos, 500)
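
Once sampling finishes, the chain still lives on the unconstrained scale. The sketch below shows one way to summarize it: drop burn-in, pool the walkers, and map the log-transformed parameters back. It works on a plain array of shape `(n_steps, n_walkers, n_dim)`, which is assumed to be the layout returned by `sampler.get_chain()` in emcee 3; a synthetic array stands in for real samples here, and `summarize_chain` and the burn-in length are illustrative choices.

```python
import numpy as np

def summarize_chain(chain, n_burn, positive_idx=()):
    """Posterior medians and 95% intervals from a (n_steps, n_walkers, n_dim) chain."""
    kept = chain[n_burn:]                             # drop burn-in steps
    flat = kept.reshape(-1, kept.shape[-1]).copy()    # pool walkers
    idx = list(positive_idx)
    flat[:, idx] = np.exp(flat[:, idx])               # back to the constrained scale
    lo, med, hi = np.percentile(flat, [2.5, 50, 97.5], axis=0)
    return med, lo, hi

# synthetic stand-in for a real chain
rng = np.random.default_rng(3)
fake_chain = rng.standard_normal((500, 20, 6))
# positions 1..5 are the log-transformed DDM parameters (t0, a's, z's)
med, lo, hi = summarize_chain(fake_chain, n_burn=100, positive_idx=range(1, 6))
print(med.shape)
```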