# Basic Brain Decoding: Classification

This notebook provides a simple brain decoding analysis tutorial: **classification of stimulus image categories from fMRI signals**.

## Setup

*For Kamitani lab members:*

Please skip the following cell if you are running this notebook on Kamitani lab servers.
This notebook works on the default Python environment of our servers without additional package installation.

In [None]:
!pip install numpy
!pip install scikit-learn
!pip install matplotlib
!pip install hdf5storage
!pip install git+https://github.com/KamitaniLab/bdpy.git

This notebook requires the following Python packages.
Please install them via pip (or conda).

- bdpy
- numpy
- scikit-learn (imported as `sklearn`)
- matplotlib
- hdf5storage
- tqdm

### Data

In [None]:
!mkdir -p data

In [None]:
!curl -L https://ndownloader.figshare.com/files/28215453\?private_link=3bd9a1c29f19649c8c0d -o data/sub-04_ImageNet12Cat_volume_native_preproc.h5

### Module import

In [None]:
import os

import bdpy
import bdpy.ml
import matplotlib.pyplot as plt
import numpy as np
import sklearn.svm
import sklearn.metrics

## Classification of stimulus image categories

In this notebook, we will classify categories of images from fMRI signals collected from a human subject viewing images.

### Stimuli and experiment design

In the experiment, brain activity was collected with fMRI from a subject viewing static images.
The images were selected from ImageNet.
In each trial, one of 240 images (20 images per category × 12 categories) was presented for 8 seconds, flashed every 500 ms.
Each image presentation trial began with the fixation point flashing red.
Each run consisted of 13 image presentation trials, including one-back repetition trials.
In a one-back repetition trial, the image from the previous trial was shown again.
The subject was required to press a key whenever an image was repeated.
This one-back repetition task was introduced to keep the subject's attention on the images.

12 categories:

1. animal
2. body part
3. cloth
4. dish
5. furniture
6. human
7. indoor
8. natural food
9. outdoor
10. plant
11. tool
12. vehicle

### MRI data acquisition

- Voxel size: 2 mm isotropic
- FOV: 192x192 mm
- 76 slices
- TR: 2 sec
- Multi-band EPI

### Preprocessing

The following steps were applied after standard fMRI preprocessing with SPM (slice timing correction, motion correction, anatomical-functional coregistration).

- Temporal shifting of samples
- Regressing-out motion parameters, mean subtraction, and linear detrending
- Outlier reduction
- Temporal averaging within blocks (trials)
- Removal of rest and repetition blocks
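To make the "temporal shifting", "temporal averaging within blocks", and "removal of rest blocks" steps more concrete, here is a minimal sketch on synthetic data. It is not the pipeline that produced the provided file. The function name `shift_and_average`, the 2-volume shift (4 s at TR = 2 s, a common allowance for the hemodynamic delay), and the block coding (`0` = rest, positive integers = trials) are all assumptions made for illustration.

```python
import numpy as np


def shift_and_average(signals, block_ids, shift=2):
    """Delay block labels by `shift` volumes, then average volumes within each trial block.

    Hypothetical helper for illustration. Block id 0 is treated as rest and dropped.
    """
    signals = np.asarray(signals, dtype=float)
    block_ids = np.asarray(block_ids)

    # Volume t receives the label of the block shown at t - shift;
    # the first `shift` volumes have no label (-1).
    shifted_ids = np.full(block_ids.shape, -1, dtype=int)
    shifted_ids[shift:] = block_ids[:len(block_ids) - shift]

    # Keep trial blocks only (drop unlabeled volumes and rest blocks)
    blocks = np.array([b for b in np.unique(shifted_ids) if b > 0])

    # One averaged sample per block
    averaged = np.vstack([signals[shifted_ids == b].mean(axis=0) for b in blocks])
    return blocks, averaged


# Toy example: 16 volumes x 3 voxels, two 4-volume trials separated by rest
toy_signals = np.arange(16)[:, np.newaxis] * np.ones((1, 3))
toy_block_ids = np.array([0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0])

blocks, averaged = shift_and_average(toy_signals, toy_block_ids, shift=2)
print(blocks)          # block ids that survived
print(averaged.shape)  # (number of trial blocks, number of voxels)
```

After this kind of averaging, each remaining trial contributes exactly one sample (row), which is why the data loaded below can be treated as a samples-by-voxels array.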

In [None]:
bdata = bdpy.BData('data/sub-04_ImageNet12Cat_volume_native_preproc.h5')

In this hands-on tutorial, we use fMRI responses in the lateral occipital complex (LOC).

In [None]:
fmri_data_loc = bdata.select('ROI_LOC')
print('fMRI data (samples x voxels): {}'.format(fmri_data_loc.shape))

In [None]:
# Stimulus labels
stimulus_labels = bdata.get_label('stimulus_name')
stimulus_labels

In [None]:
category_labels = bdata.get_label('category_name')
category_labels

The aim of this analysis is to predict the category labels from fMRI data.

In [None]:
# Run numbers
runs = bdata.select('Run')
runs

We will evaluate the model performance with cross-validation.
Specifically, we will conduct *run-wise cross-validation*, also called *leave-one-run-out cross-validation*, in which the samples from each run constitute one fold of K-fold cross-validation.
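The fold construction can be sketched with plain NumPy as follows. This is only an illustration of the idea on toy run numbers; the helper `leave_one_run_out` is hypothetical and is not necessarily how `bdpy.ml.cvindex_groupwise` is implemented.

```python
import numpy as np


def leave_one_run_out(runs):
    """Return a list of (train_index, test_index) pairs, one per unique run.

    Hypothetical helper for illustration only.
    """
    runs = np.asarray(runs).ravel()  # accept (n,) or (n, 1) arrays
    folds = []
    for run in np.unique(runs):
        test_idx = np.flatnonzero(runs == run)   # samples from the held-out run
        train_idx = np.flatnonzero(runs != run)  # samples from all other runs
        folds.append((train_idx, test_idx))
    return folds


# Toy example: 6 samples from 3 runs
toy_runs = np.array([1, 1, 2, 2, 3, 3])
folds = leave_one_run_out(toy_runs)
for train_idx, test_idx in folds:
    print('train:', train_idx, 'test:', test_idx)
```

Holding out whole runs keeps samples acquired close together in time from appearing in both the training and test sets.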

Now everything you need for the decoding analysis is ready. The fMRI data is stored as a samples-by-features (voxels) array, so you can run the decoding with typical machine learning code.
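As one illustration of what "typical machine learning code" can look like, the sketch below runs leave-one-run-out classification on synthetic data with a scikit-learn pipeline. All `toy_*` names and sizes are made up for the example. Note that `StandardScaler` divides by the population standard deviation (`ddof=0`), so its output differs slightly from a normalization that uses `ddof=1`.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data: 4 runs x 12 samples, 50 "voxels", 3 classes (all assumed values)
rng = np.random.default_rng(0)
n_runs, n_per_run, n_voxels, n_classes = 4, 12, 50, 3
n_samples = n_runs * n_per_run

toy_runs = np.repeat(np.arange(n_runs), n_per_run)
toy_labels = np.tile(np.arange(n_classes), n_samples // n_classes)

# Each class has its own mean pattern; samples are pattern + Gaussian noise
class_patterns = rng.normal(size=(n_classes, n_voxels))
toy_x = class_patterns[toy_labels] + rng.normal(size=(n_samples, n_voxels))

# The scaler is fitted on the training folds only, then applied to the test fold
pipeline = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(
    pipeline, toy_x, toy_labels,
    groups=toy_runs, cv=LeaveOneGroupOut(),
)
print('Accuracy per fold:', scores)
print('Mean accuracy:', scores.mean())
```

Putting the scaler inside the pipeline ensures the normalization parameters are estimated from training samples only, which is the same precaution the explicit loop below takes by hand.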

In [None]:
cvindex = bdpy.ml.cvindex_groupwise(runs)

prediction_accuracy_cv = []

for ind_train, ind_test in cvindex:
    x_train = fmri_data_loc[ind_train, :]
    y_train = np.array(category_labels)[ind_train]
    x_test = fmri_data_loc[ind_test, :]
    y_test = np.array(category_labels)[ind_test]
    
    # Normalization
    norm_mean = np.mean(x_train, axis=0)
    norm_scale = np.std(x_train, axis=0, ddof=1)
    
    x_train = (x_train - norm_mean) / norm_scale
    x_test = (x_test - norm_mean) / norm_scale

    # Model training
    model = sklearn.svm.LinearSVC()
    model.fit(x_train, y_train)

    # Prediction
    y_pred = model.predict(x_test)
    acc = sklearn.metrics.accuracy_score(y_test, y_pred)

    prediction_accuracy_cv.append(acc)
    
prediction_accuracy = np.mean(prediction_accuracy_cv)
print('Prediction accuracy: {}'.format(prediction_accuracy))

The prediction accuracy is actually modest, but it is higher than the chance level ($1 / 12 \approx 0.083$).
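One simple way to check whether an accuracy is reliably above chance is a one-sided binomial test, sketched below. The helper `binomial_p_value` and the numbers passed to it are hypothetical and are not results from this dataset. The test also assumes that test samples are independent, which is only an approximation for fMRI data; a label-permutation test is a common alternative.

```python
from math import comb


def binomial_p_value(n_correct, n_total, p_chance):
    """Probability of at least `n_correct` hits out of `n_total` under chance.

    Computes P(X >= n_correct) for X ~ Binomial(n_total, p_chance).
    """
    return sum(
        comb(n_total, k) * p_chance ** k * (1 - p_chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


chance_level = 1 / 12  # 12 equally frequent categories

# Hypothetical numbers for illustration only: 60 correct out of 240 test samples
p_value = binomial_p_value(n_correct=60, n_total=240, p_chance=chance_level)
print('Chance level: {:.3f}'.format(chance_level))
print('p-value: {:.3g}'.format(p_value))
```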

## Exercise tasks

**Task 1**: Try classification with fMRI data from early visual areas (V1 + V2 + V3), LOC, FFA, and PPA. Then, plot the prediction accuracies as a bar chart.

In [None]:
rois = ['Early', 'LOC', 'FFA', 'PPA']
rois_select = {
    'Early': 'ROI_V1 + ROI_V2 + ROI_V3',
    'LOC':   'ROI_LOC',
    'FFA':   'ROI_FFA',
    'PPA':   'ROI_PPA',
}

prediction_accuracies = []

# Your code comes here

# Plotting
xticks = list(range(len(rois)))

plt.bar(xticks, prediction_accuracies)

plt.xlim([-1, len(rois)])
plt.xticks(xticks)
plt.gca().set_xticklabels(rois)

plt.ylabel('Prediction accuracy')
plt.ylim([0, 1])

# Chance level
plt.plot([-1, len(rois)], [1 / 12, 1 / 12], color='k', linestyle=':')

**Task 2**: Try other classification methods, such as logistic regression. Use fMRI signals in "LOC".

In [None]:
# Your code comes here

**Task 3**: Decoding and other machine learning-based analyses of fMRI data typically suffer from overfitting because the number of features (voxels) is high relative to the number of samples. Think of a method to reduce overfitting on fMRI data, implement it, and see whether it works. Use fMRI signals in "LOC".

In [None]:
# Your code comes here

## References

- Pereira et al. (2009) Machine learning classifiers and fMRI: A tutorial overview. NeuroImage. [doi:10.1016/j.neuroimage.2008.11.007](http://dx.doi.org/10.1016/j.neuroimage.2008.11.007)