# Decoding

Updated: 2020-07-01

This notebook provides a brief example of fMRI decoding analysis.

In [None]:
# Additional setup for Google Colab
!pip install numpy==1.17
!pip install scipy==1.1
!pip install bdpy

In [1]:
# Setup for Google Colaboratory
#from google.colab import drive
#drive.mount('/content/drive')

#data_dir = '/content/drive/My Drive/path/to/data/directory'

# Setup for local run
data_dir = './data'

In [2]:
import os

import bdpy
import bdpy.ml
import matplotlib.pyplot as plt
import numpy as np
import sklearn.svm
import sklearn.metrics
import tqdm

## fMRI data

The fMRI data used in this notebook was collected in [Shen, Horikawa, Majima & Kamitani (2019) Deep image reconstruction from human brain activity. PLOS Comput Biol](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006633).

In brief, the subject was presented with natural images selected from ImageNet during fMRI scans.
A total of 1200 images from 150 object categories (synsets) were used as stimuli (thus, 8 images per category).
Each run presented 50 images from different categories (plus 5 one-back repetition trials).
Thus, it took 3 runs to cover the 150 categories, and 24 runs to present all the images.
In the original study, each image was presented five times,
so it took 120 runs in total.
The entire dataset contains 6000 samples (1200 images x 5 presentations).
Please see the paper for more details.

In this notebook, we use a subset of the original data: only the fMRI data for stimuli from 10 categories.
Thus, the example data contains 400 samples (8 images/category x 10 categories x 5 presentations).
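
The sample counts above follow from simple arithmetic. The short check below uses only the numbers quoted in the text; the variable names are ours and are not part of the dataset or of bdpy.

```python
# Design numbers quoted in the description above
n_categories = 150
images_per_category = 8
images_per_run = 50      # excluding the one-back repetition trials
n_presentations = 5

n_images = n_categories * images_per_category                # 1200 images
runs_to_cover_categories = n_categories // images_per_run    # 3 runs
runs_per_pass = n_images // images_per_run                   # 24 runs
n_runs_total = runs_per_pass * n_presentations               # 120 runs
n_samples_full = n_images * n_presentations                  # 6000 samples

# Subset used in this notebook: 10 categories
n_categories_subset = 10
n_samples_subset = images_per_category * n_categories_subset * n_presentations

print(n_images, runs_per_pass, n_runs_total, n_samples_full, n_samples_subset)
# prints: 1200 24 120 6000 400
```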

The file `sub-04_task-ImageNetTraining_bold_preproc_native_VC_10cat.h5` contains preprocessed fMRI data for 24 runs from one subject.
After slice timing correction, motion correction, and coregistration, the fMRI data

The full dataset is available at <https://openneuro.org/datasets/ds001506/> (raw data) or <https://figshare.com/articles/Deep_Image_Reconstruction/7033577> (preprocessed data).

In [3]:
bdata = bdpy.BData(os.path.join(data_dir, 'sub-04_task-ImageNetTraining_bold_preproc_native_VC_10cat.h5'))

In this example, we use fMRI responses in the lateral occipital cortex (LOC), which underlies object recognition.

In [4]:
# fMRI data in the lateral occipital cortex (LOC)
fmri_data_loc = bdata.select('ROI_LOC')
print('fMRI data (samples x voxels): {}'.format(fmri_data_loc.shape))

fMRI data (samples x voxels): (400, 3066)


In [5]:
# Stimulus labels
stimulus_labels = bdata.get_label('stimulus_name')
#stimulus_labels

In [6]:
# Run numbers
runs = bdata.select('Run')

# Regroup runs for cross-validation
runs_groups = (runs + 2) // 3
#runs_groups
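
The regrouping formula above maps every three consecutive runs to one group index. A minimal illustration on made-up run numbers, assuming runs are numbered from 1 (`runs_example` and `groups_example` are hypothetical names, not part of the data):

```python
import numpy as np

# Hypothetical run numbers 1..9 (the real data contains more runs)
runs_example = np.arange(1, 10)

# Same formula as in the cell above: runs 1-3 -> group 1, runs 4-6 -> group 2, ...
groups_example = (runs_example + 2) // 3
print(groups_example)  # [1 1 1 2 2 2 3 3 3]
```

Leaving one such group out at a time yields the leave-three-run-out cross-validation used below.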

## Classification of object categories

Here we decode object categories from the fMRI data.
We use a linear SVM as the classifier and evaluate prediction performance with leave-three-run-out cross-validation, since three runs cover all 150 categories.

First, we convert stimulus labels (e.g., `n01639765_47681`) to category labels (e.g., `n01639765`).
The string before `_` in a stimulus label is the ImageNet synset ID representing the category.

http://www.image-net.org/synset?wnid=n01639765

In [7]:
# Convert stimulus labels to category labels
category_labels = np.array([
    lb.split('_')[0]
    for lb in stimulus_labels
])
len(np.unique(category_labels))  # This should be 10.

10

In [8]:
cvindex = bdpy.ml.cvindex_groupwise(runs_groups)

prediction_accuracy_cv = []

for ind_train, ind_test in tqdm.tqdm(cvindex):
    x_train = fmri_data_loc[ind_train, :]
    y_train = category_labels[ind_train]
    x_test = fmri_data_loc[ind_test, :]
    y_test = category_labels[ind_test]
    
    # Normalization
    norm_mean = np.mean(x_train, axis=0)
    norm_scale = np.std(x_train, axis=0, ddof=1)
    
    x_train = (x_train - norm_mean) / norm_scale
    x_test = (x_test - norm_mean) / norm_scale

    # Model training
    model = sklearn.svm.LinearSVC()
    model.fit(x_train, y_train)

    # Prediction
    y_pred = model.predict(x_test)
    acc = sklearn.metrics.accuracy_score(y_test, y_pred)

    prediction_accuracy_cv.append(acc)
    
prediction_accuracy = np.mean(prediction_accuracy_cv)
print('Prediction accuracy: {}'.format(prediction_accuracy))

40it [05:33,  8.34s/it]

Prediction accuracy: 0.415
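
The manual loop above can also be written with scikit-learn's own cross-validation utilities. The sketch below is an alternative formulation, not the method used in this notebook. It runs on synthetic stand-in data rather than the fMRI data, and it swaps in `LeaveOneGroupOut` where the notebook uses `bdpy.ml.cvindex_groupwise`. Note that `StandardScaler` uses the population standard deviation (`ddof=0`), whereas the loop above uses `ddof=1`, so results may differ slightly.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the fMRI data):
# 4 groups x 30 samples, 20 features, 3 balanced classes
n_groups, n_per_group, n_features, n_classes = 4, 30, 20, 3
y = np.tile(np.arange(n_classes), n_groups * n_per_group // n_classes)
groups = np.repeat(np.arange(n_groups), n_per_group)
class_means = rng.normal(scale=2.0, size=(n_classes, n_features))
X = class_means[y] + rng.normal(size=(y.size, n_features))

# Inside cross_val_score the scaler is re-fit on the training folds only,
# so no information from the held-out group leaks into the normalization
pipeline = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(pipeline, X, y, groups=groups, cv=LeaveOneGroupOut())

print('Accuracy per fold: {}'.format(scores))
print('Mean accuracy: {}'.format(scores.mean()))
```

To apply this to the notebook's data, one would presumably pass `fmri_data_loc`, `category_labels`, and a flattened `runs_groups` in place of `X`, `y`, and `groups`.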





## Exercise

Try the classification with fMRI data in V1 (`ROI_V1`).

[Optional] Compare the decoding accuracy between LOC and V1, and discuss why the decoding accuracies from the two brain regions differ.

In [None]:
# Your code comes here

Your answer comes here:

Try other classification methods, such as logistic regression.

In [None]:
# Your code comes here

[Optional] Decoding and other machine learning-based analyses of fMRI data typically suffer from overfitting due to the high dimensionality of the features (voxels). Think of a method to mitigate overfitting in fMRI decoding, implement it, and see whether it works.

In [None]:
# Your code comes here