> credits: originally by Gael Varoquaux
> 
> shamelessly ~~stolen~~ adapted by: Chris Holdgraf

In [None]:
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import os.path as op
path_data = op.join(op.expanduser('~'), 'nilearn_data/')
%matplotlib inline

# About this tutorial
This is a quick introduction to neuroimaging decoding with `nilearn`. It covers the following topics:

* Loading a neuroimaging dataset suitable for decoding
* Extracting an ROI and vectorizing our 3D data so it may be passed to a `sklearn` object
* Using pandas to munge some data
* Fitting a `sklearn` classifier on our neuroimaging data
* Visualizing classifier weights on the subject's brain

  > Note: A lot of the material in this tutorial is drawn from the `nilearn` collection of examples. Open-source packages can be a great way to learn both about a package, and about the things that package tries to do (e.g., machine learning).

  > In addition, many `nilearn` developers have recently released a paper covering the topics of decoding brain states with fMRI (and other modalities).

  > * [Link to original `nilearn` tutorial](https://nilearn.github.io/auto_examples/plot_decoding_tutorial.html#sphx-glr-auto-examples-plot-decoding-tutorial-py)
  > * [Link to Varoquaux decoding paper](https://arxiv.org/pdf/1606.05201v2.pdf)

# An introductory tutorial to fMRI decoding (brain -> world)
Thus far we've covered the general topics of machine learning, but how do they apply specifically to neuroscience data? 

This is a short tutorial on decoding with nilearn. It reproduces the
Haxby 2001 study on a face vs cat discrimination task in a mask of the
ventral stream.

* [Here's a link](http://www.pymvpa.org/datadb/haxby2001.html) to the Haxby 2001 Dataset (w/ a link to the paper too)

In a decoding model, we ask "what information about the world can we predict using brain activity?" These are often called "backward models" as they run counter to the flow of time (for most experimental setups).

Examples of **inputs** to a decoding model include:
* Mean voxel activity in each trial
* Full timecourses of voxel activity (e.g., with continuously-varying stimuli)

Examples of **outputs** from a decoding model include:

* Experimental conditions
* Muscle movement
* Stimulus features

## What's the difference between encoding and decoding?

That is...a complicated question. The simplest answer is:

* Encoding == forward model == world -> model -> brain
* Decoding == backward model == brain -> model -> world
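
To make the two directions concrete, here is a minimal sketch on made-up data. The arrays `stimulus` and `brain` are hypothetical stand-ins (they are not part of the Haxby dataset), and ordinary least squares is used only because it is the simplest linear model, not because it is what either kind of study would necessarily use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 time points, 3 stimulus features, 10 voxels
n_samples, n_stim, n_voxels = 200, 3, 10
stimulus = rng.normal(size=(n_samples, n_stim))      # the "world"
true_weights = rng.normal(size=(n_stim, n_voxels))
brain = stimulus @ true_weights + 0.1 * rng.normal(size=(n_samples, n_voxels))

# Encoding (forward) model: world -> brain
w_enc, *_ = np.linalg.lstsq(stimulus, brain, rcond=None)
print(w_enc.shape)  # one weight per (stimulus feature, voxel) pair

# Decoding (backward) model: brain -> world
w_dec, *_ = np.linalg.lstsq(brain, stimulus, rcond=None)
print(w_dec.shape)  # one weight per (voxel, stimulus feature) pair
```

The only thing that changes between the two fits is which array plays the role of input and which plays the role of output.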

People disagree about when you should use one vs the other, and what you can interpret from each. This lecture won't try to answer any of those questions (we'd need several beers for this). However, here is a list of useful papers covering this topic:

* [Weichwald et al, 2015](https://arxiv.org/pdf/1511.04780.pdf)
* [Kay, 2017](http://www.sciencedirect.com/science/article/pii/S1053811917306638?via%3Dihub)
* [Naselaris, 2012](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3037423/)


# Preparing the data

## Retrieve and load the fMRI data from the Haxby study

The `nilearn.datasets.fetch_haxby` function will download the
Haxby dataset if it's not present on the disk. It'll put this in the `nilearn` data directory and only needs to be downloaded once.

In [None]:
from nilearn import datasets
from nilearn import plotting

# By default the 2nd subject will be fetched
haxby_dataset = datasets.fetch_haxby(data_dir=path_data)
fmri_filename = haxby_dataset.func[0]

# Print basic information on the dataset
print('Functional nifti images (4D) for the fetched subject are at: %s' %
      fmri_filename)  # 4D data

In [None]:
mask_filename = haxby_dataset.mask_vt[0]
plotting.plot_roi(mask_filename, bg_img=haxby_dataset.anat[0],
                  cmap='Paired')

In [None]:
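try:
    from nilearn.maskers import NiftiMasker  # newer nilearn releases
except ImportError:
    from nilearn.input_data import NiftiMasker  # older nilearn releases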
# Load the mask from disk
masker = NiftiMasker(mask_img=mask_filename, standardize=True)

# Fitting the transformer initializes it to operate on new data
masker.fit(fmri_filename)

In [None]:
# Now we'll transform our fMRI data
fmri_masked = masker.transform(fmri_filename)

The variable `fmri_masked` is a 2-D numpy array.

In [None]:
print(fmri_masked)

Its shape corresponds to the number of time-points x the number of
voxels in the mask. Note that this is far fewer than the total number of voxels in the nifti image.

In [None]:
print(fmri_masked.shape)
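
If it helps to see what the masking step does conceptually, here is a rough numpy-only sketch. The array sizes and the mask are made up, and the z-scoring line is only an approximation of what `standardize=True` does; the real `NiftiMasker` also handles image loading, affines, and other options that are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4D "fMRI" array: x, y, z, time
data_4d = rng.normal(size=(4, 5, 6, 20))

# Hypothetical boolean ROI mask in the same 3D space
mask_3d = np.zeros((4, 5, 6), dtype=bool)
mask_3d[1:3, 2:4, 1:5] = True   # 2 * 2 * 4 = 16 voxels

# Boolean indexing on the spatial axes gives (n_voxels, n_timepoints);
# transpose to get the (n_samples, n_features) layout sklearn expects
masked_2d = data_4d[mask_3d].T

# Z-score each voxel's time course (roughly what standardize=True does)
masked_2d = (masked_2d - masked_2d.mean(axis=0)) / masked_2d.std(axis=0)

print(masked_2d.shape)   # (20, 16): time points x voxels in the mask
print(mask_3d.size)      # 120 voxels in the full volume
```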

## Load the behavioral labels

Now that we have our vectorized fMRI activity, we need labels for the state of the experiment in order to fit our classifier. The behavioral labels are stored in a space-delimited text file.

We use `pandas` to load them into a `DataFrame`. This is a library that is excellent for representing and manipulating tabular data. It's got a steep learning curve but is very powerful.

In [None]:
import pandas as pd
import numpy as np

# Load the target information (one row per time point) into a DataFrame
labels = pd.read_csv(haxby_dataset.session_target[0], delimiter=' ')
print(labels.head())

It looks like labels has the same length as our fMRI data, meaning that they share the same time-base.

In [None]:
print(labels.shape)
print(fmri_masked.shape)

Next, we'll retrieve the behavioral targets from the labels. These will be the "classes" that we attempt to predict.

Note that these labels aren't integers like before. That's fine - `sklearn` classifiers accept string labels and encode them internally when we fit the model.

In [None]:
print(labels['labels'].values[:50])
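
As a small illustration of how string labels are handled, the sketch below fits a classifier on a hypothetical four-sample dataset (`X_toy` and `y_toy` are made up for this example). After fitting, the unique labels are available on the estimator as `classes_`, and predictions are returned as the original strings.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters with string labels
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_toy = np.array(['cat', 'cat', 'face', 'face'])

clf = SVC(kernel='linear').fit(X_toy, y_toy)

# The unique labels are stored (sorted) on the fitted estimator
print(clf.classes_)

# Predictions come back in the original string form
toy_pred = clf.predict([[0.0, 0.0], [5.0, 5.0]])
print(toy_pred)
```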

## Restrict the analysis to cats and faces

As we can see from the targets above, the experiment contains many
conditions. Today we'll restrict the decoding to two categories of interest: cats and faces.

To do this, we'll use `pandas` to create a mask corresponding to these categories, and then use it to extract only the rows we care about.

In [None]:
# Create a mask w/ Pandas
condition_mask = labels.eval('labels in ["face", "cat"]').values

# Alternatively, create the same mask w/ Numpy
# condition_mask = np.isin(labels['labels'].values, ['face', 'cat'])

In [None]:
# We apply this mask along the "samples" axis to restrict the
# classification to the face vs cat discrimination
fmri_masked = fmri_masked[condition_mask]
targets = labels[condition_mask]['labels'].values
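
Here is the same idea on a tiny made-up table, in case the boolean-mask logic is easier to follow at a small scale. `toy_labels` and `toy_data` are hypothetical stand-ins for the real labels and masked fMRI data, and `Series.isin` is used as one possible alternative to `eval`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real labels table and masked fMRI array
toy_labels = pd.DataFrame(
    {'labels': ['rest', 'face', 'cat', 'house', 'face', 'rest']})
toy_data = np.arange(12).reshape(6, 2)   # 6 time points, 2 voxels

# Series.isin builds the same kind of boolean mask as the eval() call
toy_mask = toy_labels['labels'].isin(['face', 'cat']).values
print(toy_mask)

# One mask, applied to both arrays, keeps samples and labels aligned
toy_X = toy_data[toy_mask]
toy_y = toy_labels.loc[toy_mask, 'labels'].values
print(toy_X.shape, toy_y)
```

Applying a single mask to both the data and the labels is what guarantees that row `i` of `X` still corresponds to entry `i` of `y`.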

Note that we now have fewer samples.

In [None]:
print(fmri_masked.shape)

# Decoding with an SVM

Now we have all the components we'll need to fit a model. We have:

* Masked a subset of voxels in which we are interested.
* Vectorized those masked voxels so that they have shape (n_samples, n_features). **This is `X`**
* Extracted a set of labels, one for each timepoint, corresponding to the stimulus being shown at that time. **This is `y`**
* Masked our time dimension so that we only have two classes of interest.

Now we'll fit our model. As before, we'll use the [scikit-learn](http://www.scikit-learn.org) toolbox on the `fmri_masked` data.

As a decoder, we'll use a Support Vector Classifier (SVC) with a linear
kernel.

In [None]:
from sklearn.svm import SVC
svc = SVC(kernel='linear')
print(svc)

As our data is already in the shape for `sklearn`, we can quickly fit the model.

In [None]:
svc.fit(fmri_masked, targets)

## Predicting with / Scoring our model
Machine learning is all about **prediction**. As such, we'll use our fitted model to make a prediction about the class of some data, given the structure the model has already found:

In [None]:
# Here we'll predict with the same input training data
prediction = svc.predict(fmri_masked)
print(prediction)

`sklearn` has a number of functions for defining the "score" of a model. The proper one to use depends on the nature of your model and data. Here we'll use a simple scorer for classification. These functions expect two arrays:

* The array of "true" classes for each sample
* The array of predicted classes given our model.
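
For plain accuracy, the computation is simple enough to sketch by hand: it is the fraction of samples whose predicted class matches the true class. The two arrays below are made up for illustration.

```python
import numpy as np

# Hypothetical true and predicted classes for four samples
y_true = np.array(['face', 'cat', 'cat', 'face'])
y_pred = np.array(['face', 'cat', 'face', 'face'])

# Accuracy is the fraction of samples whose prediction matches the truth
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 3 of 4 correct -> 0.75
```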

In [None]:
from sklearn.metrics import accuracy_score

print(accuracy_score(targets, prediction))

Wow, a score of 100%! Ship it off to *Nature*, right? Not quite yet. We scored the model on the same data we used to fit it, which tells us nothing about how well it generalizes. You should **always make predictions on data that hasn't been used to fit the model**.
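
To see why a perfect training score is not evidence of anything, consider the sketch below. It fits the same kind of linear SVM to pure noise with arbitrary labels (`X_noise` and `y_noise` are hypothetical and contain no signal by construction). With many more features than samples, a linear model can usually separate the training set anyway.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical pure-noise data: 60 samples, 500 features, arbitrary labels
X_noise = rng.normal(size=(60, 500))
y_noise = np.tile([0, 1], 30)

# Fit on the first 40 samples only
clf = SVC(kernel='linear').fit(X_noise[:40], y_noise[:40])

train_acc = clf.score(X_noise[:40], y_noise[:40])
test_acc = clf.score(X_noise[40:], y_noise[40:])
print(train_acc)  # score on the data used for fitting
print(test_acc)   # score on unseen data; expect something near chance (0.5)
```

The training score looks impressive even though there is nothing to learn, while the held-out score gives a more honest picture.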

## Validating our model

The proper way to measure error rates or prediction accuracy is via
cross-validation: leaving out some data and testing on it. There are many ways to do this.

Here we'll do this by **manually leaving out datapoints** during fit. We'll set them aside and use them to score the model.

Let's leave out the last 30 data points during training, and test the
prediction on these 30 points:

In [None]:
svc.fit(fmri_masked[:-30], targets[:-30])

prediction = svc.predict(fmri_masked[-30:])
print((prediction == targets[-30:]).sum() / len(targets[-30:]))


However, this seems unfortunate. We've now got 30 fewer samples with which to fit the model, and our score rests on only 30 test points. Ideally, we'd like to do two things:

* Validate our model properly (aka, on held-out data not used in model fitting)
* Use as much of our data as possible.

It is difficult to satisfy both of these conditions properly, but *cross-validation* is one way of getting closer to this goal. 
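
As a rough sketch of what this can look like with `sklearn`, the example below holds out one whole "run" at a time, so that every sample is used for testing exactly once. The data and the `groups` array are made up; with real fMRI data, `groups` would be whatever identifies the acquisition run of each sample (assuming such information is available in the labels table).

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical data: 6 "runs" of 10 samples each, 50 features
n_runs, n_per_run, n_features = 6, 10, 50
y_cv = np.tile([0, 1], n_runs * n_per_run // 2)
X_cv = rng.normal(size=(n_runs * n_per_run, n_features))
X_cv[:, 0] += 3 * y_cv        # make one feature informative
groups = np.repeat(np.arange(n_runs), n_per_run)   # run id for each sample

# Each fold holds out one whole run and trains on the rest
scores = cross_val_score(SVC(kernel='linear'), X_cv, y_cv,
                         groups=groups, cv=LeaveOneGroupOut())
print(scores.shape)   # one score per held-out run
print(scores.mean())  # average held-out accuracy
```

Splitting by run rather than at random is a common choice for fMRI because neighboring time points within a run tend to be correlated.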

For more details on this, check out the [cross validation notebook](01.5-cross_validation.ipynb).

## Inspecting the model weights


Finally, now that our model has been fit to data, and validated on held-out data, it may be useful to inspect and display the model weights. This is often used to understand the voxels that were particularly important in discriminating these two classes.

`sklearn` models that are linear store their weights in an attribute called `coef_`. We'll look at this below. Note that there is one weight per feature (in this case, per voxel).

In [None]:
coef_ = svc.coef_
# coef_ has shape (1, n_voxels), so index the first row before slicing
print(coef_[0, :5])

In [None]:
print(svc.coef_.shape)
print(fmri_masked.shape)

### Turning the weights into a nifti image
Using our `NiftiMasker`, we can collect these weights and reshape them back into the 3-D space of our fMRI data.

This turns the weights back into a Nifti image, in essence "inverting"
what the `NiftiMasker` has done.

For this, we can call inverse_transform on the NiftiMasker:
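
Conceptually, the inverse step writes each weight back to the voxel it came from and leaves everything outside the mask at zero. Here is a numpy-only sketch of that idea with a made-up mask and weight vector; the real `inverse_transform` additionally wraps the result in an image object with the right spatial information, which is not shown here.

```python
import numpy as np

# Hypothetical ROI mask and weight vector (one weight per masked voxel)
mask_3d = np.zeros((4, 5, 6), dtype=bool)
mask_3d[1:3, 2:4, 1:5] = True
n_voxels = int(mask_3d.sum())

# Shaped (1, n_voxels), like coef_ for a two-class linear model
weights = np.arange(1, n_voxels + 1, dtype=float).reshape(1, n_voxels)

# Start from an empty volume and write each weight back to its voxel
volume = np.zeros(mask_3d.shape)
volume[mask_3d] = weights[0]

print(volume.shape)               # (4, 5, 6): back in 3D space
print(np.count_nonzero(volume))   # 16: only masked voxels carry a weight
```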

In [None]:
coef_img = masker.inverse_transform(coef_)
print(coef_img)

`coef_img` is now a `Nifti1Image`. Essentially, this is like the statistical "t-maps" that we've visualized before.

We can save the coefficients as a nii.gz file:

In [None]:
coef_img.to_filename('haxby_svc_weights.nii.gz')

## Visualizing our weights

Finally, we can plot the weights that the model found, using the subject's anatomical as a background.

In [None]:
from nilearn.plotting import plot_stat_map

plot_stat_map(coef_img, bg_img=haxby_dataset.anat[0],
              title="SVM weights")

# Recap
Above we've shown a sample pipeline for fitting a linear classifier using fMRI activity and object classes. We've covered a few of the basics in the machine learning pipeline, and shown how `nilearn` and `sklearn` complement one another.

Much more information about these functions, and the options available for machine learning, can be found in the `nilearn` documentation.