# Assignment 3: Decoding and Classification (10 pts in total)

In [12]:
# Load required tools
import warnings
import sys
if not sys.warnoptions:
    warnings.simplefilter('ignore')

# Import numerical library
import numpy as np

# Import classification related toolbox
from sklearn import svm
from sklearn.model_selection import cross_val_score

In [13]:
# Download the dataset used in this assignment
!gdown 1LZStu_QhsBErJ5LHv_u9qfF2Duc-E-T_

Downloading...
From: https://drive.google.com/uc?id=1LZStu_QhsBErJ5LHv_u9qfF2Duc-E-T_
To: /content/data_classification.npy
100% 1.60M/1.60M [00:00<00:00, 24.7MB/s]


### Dataset description
In this experiment, the subjects viewed 4 types of images (conditions) in the fMRI scanner: outdoor scenes, indoor scenes, animals, and tools. The preprocessed fMRI signals from two brain regions, _Retrosplenial cortex (RSC)_ and _Lateral occipital complex (LOC)_, are stored in the dataset dictionary `data_classification.npy`. This dictionary contains 3 arrays, named `'label'`, `'fMRI_RSC'`, and `'fMRI_LOC'`. In each fMRI array, each row corresponds to an experimental trial, and each column corresponds to the activation of one voxel within the region while the subject viewed the image. The `'label'` array contains the condition of each experimental trial.

In [14]:
# Read in data
data_classification = np.load('data_classification.npy',allow_pickle=True).item()
print(data_classification['fMRI_RSC'].shape)
print(data_classification['fMRI_LOC'].shape)

(400, 200)
(400, 300)
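
The loading call above relies on `allow_pickle=True` and `.item()` because a dictionary saved with `np.save` is stored as a 0-dimensional object array. The sketch below illustrates this round trip on a hypothetical stand-in dictionary with the same shapes as the real dataset. The values are random noise and the integer label coding is an assumption, not something the dataset description states.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in with the same layout as the described dataset;
# the values are random noise, not real fMRI data.
rng = np.random.default_rng(0)
toy = {
    'label': np.repeat(np.arange(4), 100),        # assumed: 4 conditions x 100 trials
    'fMRI_RSC': rng.standard_normal((400, 200)),  # trials x voxels
    'fMRI_LOC': rng.standard_normal((400, 300)),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_classification.npy')
    np.save(path, toy)                      # the dict is pickled into a 0-d object array
    raw = np.load(path, allow_pickle=True)  # object arrays need allow_pickle=True
    loaded = raw.item()                     # .item() unwraps the 0-d array into a dict

print(raw.shape, type(loaded).__name__)
print(sorted(loaded.keys()))
print(loaded['fMRI_RSC'].shape, loaded['fMRI_LOC'].shape)
```

Without `.item()`, indexing with a string key such as `raw['label']` would fail, because `raw` is still a NumPy array rather than a dictionary.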


#### ✏️ Do it yourself (1 pt):
Now we want to build a classifier to use the fMRI signal to decode the experimental conditions (image label). What is the chance level for the classification?

Write your answer here:
> **Answer:**
>
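>   The dataset contains 4 conditions (outdoor scenes, indoor scenes, animals, and tools), so this is a 4-class classification problem, not a binary one. Assuming the conditions are balanced across the 400 trials, the chance level is 1/4 = 0.25 (25%): random guessing among 4 equally likely classes is correct 25% of the time on average.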
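
The chance level can also be read off the label counts. The following is a minimal sketch using a hypothetical stand-in for `data_classification['label']`. The condition names and the balanced 100-trials-per-condition split are assumptions made for illustration, since the real label coding is not shown in this notebook.

```python
from collections import Counter

# Hypothetical stand-in for data_classification['label']: 4 conditions,
# 100 trials each (the balanced split is an assumption).
labels = ['outdoor', 'indoor', 'animal', 'tool'] * 100

counts = Counter(labels)
n_classes = len(counts)

# Uniform random guessing is correct 1 / n_classes of the time.
uniform_chance = 1 / n_classes

# Always predicting the most frequent class is correct at its base rate;
# for balanced classes this equals the uniform chance level.
majority_chance = max(counts.values()) / len(labels)

print(n_classes, uniform_chance, majority_chance)
```

If the real labels turned out to be unbalanced, the majority-class rate would be the more conservative baseline to compare against.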


#### ✏️ Do it yourself (4 pts):
### Build linear classifier with 2-fold CV
1. Build linear support vector machine (SVM) classifier
2. Specify cross-validation
3. Run it with the RSC and LOC data
4. Calculate the mean accuracy for each region

_Hint: use `svm` in the sklearn package to build the classifier. Use `cross_val_score` for cross-validation._

In [18]:
# Fill in the missing code in this cell


# Specify the SVM classifier
clf = svm.SVC(kernel='linear')

# Calculate cross-validation score (accuracy) for RSC
accScores_RSC = cross_val_score(clf,
                                data_classification['fMRI_RSC'],
                                data_classification['label'],
                                cv=2)
print(np.mean(accScores_RSC))

# Calculate cross-validation score (accuracy) for LOC
accScores_LOC = cross_val_score(clf,
                                data_classification['fMRI_LOC'],
                                data_classification['label'],
                                cv=2)
print(np.mean(accScores_LOC))

0.21250000000000002
0.865
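
For reference, passing an integer `cv` to `cross_val_score` with a classifier uses `StratifiedKFold` without shuffling, so each fold keeps the class proportions of the full label vector. The sketch below checks this on small synthetic data. The data, class shift, and sizes are assumptions made for illustration and have nothing to do with the fMRI dataset.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption): 4 balanced classes, 40 trials, 10 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 10)
X = rng.standard_normal((40, 10)) + y[:, None]  # shift each class mean apart

clf = svm.SVC(kernel='linear')

# Integer cv with a classifier: stratified folds, no shuffling.
scores_int = cross_val_score(clf, X, y, cv=2)
scores_explicit = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=2))

# Count how many trials of each class land in each test fold.
fold_counts = [np.bincount(y[test_idx])
               for _, test_idx in StratifiedKFold(n_splits=2).split(X, y)]

print(fold_counts)
print(np.allclose(scores_int, scores_explicit))
```

Because the default splitter does not shuffle, trial order can matter. Passing `StratifiedKFold(n_splits=2, shuffle=True, random_state=0)` is one way to check whether the scores depend on it.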


Run an alternative SVM classifier with a non-linear sigmoid kernel. What is the 2-fold cross-validation accuracy for each region? Does the SVM classifier with the non-linear kernel perform better? Why?

#### ✏️ Do it yourself (3 pts):
### Build non-linear classifier with 2-fold CV
1. Build an SVM classifier with a non-linear kernel
2. Run classifier with cross-validation
3. Calculate the mean accuracy for each region

In [22]:
# Fill in the missing code in this cell

# Specify the SVM classifier
svc = svm.SVC(kernel='sigmoid', C=1)

# Calculate cross-validation score for RSC
SVMscores_RSC = cross_val_score(svc,
                                data_classification['fMRI_RSC'],
                                data_classification['label'],
                                cv=2)
print(np.mean(SVMscores_RSC))

# Calculate cross-validation score for LOC
SVMscores_LOC = cross_val_score(svc,
                                data_classification['fMRI_LOC'],
                                data_classification['label'],
                                cv=2)
print(np.mean(SVMscores_LOC))

0.23249999999999998
0.8025
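
To make the kernel used above concrete: the sigmoid kernel computes `tanh(gamma * <x, x'> + coef0)` for each pair of trials. The sketch below evaluates it by hand on synthetic data and compares the result with `sklearn.metrics.pairwise.sigmoid_kernel`. The data are random stand-ins, and `gamma` and `coef0` are set explicitly to mirror the `SVC` defaults (`gamma='scale'`, `coef0=0.0`).

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

# Synthetic stand-in (assumption): 6 trials, 5 voxels.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5))

# SVC's default gamma='scale' is 1 / (n_features * X.var()); default coef0 is 0.0.
gamma = 1.0 / (X.shape[1] * X.var())
coef0 = 0.0

# Sigmoid kernel by hand: tanh of a scaled, shifted inner product.
K_manual = np.tanh(gamma * (X @ X.T) + coef0)
K_sklearn = sigmoid_kernel(X, X, gamma=gamma, coef0=coef0)

print(K_manual.shape)
print(np.allclose(K_manual, K_sklearn))
```

Unlike the linear kernel, this kernel has two parameters (`gamma` and `coef0`) whose values affect performance. The cell above leaves both at their defaults.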


#### ✏️ Do it yourself (2 pts):
Describe how the performance of the SVM classifier with a non-linear kernel changes compared with the linear SVM. Explain why.

Write your answer here:
> **Answer:**
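> The non-linear (sigmoid) kernel did not improve performance overall. For LOC, mean accuracy dropped from 0.865 with the linear SVM to 0.8025 with the sigmoid kernel. For RSC, it rose slightly from 0.2125 to 0.2325, but both values are below the 4-class chance level of 0.25 (assuming balanced conditions), so neither classifier decodes the conditions from RSC and the small difference is not meaningful. A likely explanation is that the LOC patterns are already close to linearly separable, so a non-linear kernel adds flexibility without adding useful information. With 2-fold CV there are only 200 training trials against 300 LOC voxels, so the extra flexibility and the untuned parameters of the sigmoid kernel can hurt generalization rather than help.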
>
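
The comparison can be tabulated directly from the printed accuracies. This is a small sketch. The accuracy values are copied from the cell outputs above, and the chance level of 0.25 assumes the 4 conditions are balanced.

```python
# Mean 2-fold accuracies copied from the cell outputs above.
linear = {'RSC': 0.2125, 'LOC': 0.865}
sigmoid = {'RSC': 0.2325, 'LOC': 0.8025}
chance = 1 / 4  # 4 conditions, assuming balanced classes

changes = {}
for region in linear:
    # Positive change means the sigmoid kernel scored higher than the linear one.
    changes[region] = sigmoid[region] - linear[region]
    print(f"{region}: linear={linear[region]:.4f}, "
          f"sigmoid={sigmoid[region]:.4f}, "
          f"change={changes[region]:+.4f}, "
          f"linear minus chance={linear[region] - chance:+.4f}")
```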