## **Inside Out ML: Emotion Classification from fMRI Using Parcellation and Projection Techniques**

#### **Group 36 on Canvas**


<div style="text-align: center">
    <img src="./cover-img.png" alt="image" width="600" />
</div>

## **Abstract**

This study explores feature extraction techniques for analyzing high-dimensional, non-independently distributed data in multi-class classification tasks using functional magnetic resonance imaging (fMRI). Extending beyond the scope of CS 6140 Machine Learning, this investigation aims to establish a protocol for applying machine learning and deep learning methods to fMRI data by identifying meaningful brain parcels and data projections for emotion classification.

Keywords: computer vision, functional magnetic resonance imaging (fMRI), multi-class classification, feature extraction, emotion recognition

## **Introduction**

In recent years, researchers have investigated the ability of deep learning and machine learning models to analyze neuroimages. Because neuroimaging data is dense and high-dimensional, practitioners often reduce its dimensionality with brain parcellation masks before applying a model to a dataset. These masks average voxel values over specific brain parcels, so that analysis is organized by region of interest (ROI) rather than by individual voxel, of which there can be close to a million per image.

Given this development, studies have begun to explore best practices for choosing a brain parcellation technique for feature extraction. The inspiration for our investigation came from a study that analyzed a teleological approach [B], recommending that the brain parcellation be chosen based on the aim of the study. To expand upon this work, we investigated the coefficients of a traditional machine learning model to identify important embeddings and regions of interest for emotion classification.


This paper bridges machine learning methodology with neuroscience application by evaluating how both statistical and domain-specific feature extraction techniques affect multi-class classification of emotional states from fMRI data. Our results are intended to point interdisciplinary researchers toward specific ROIs and embeddings to focus on when studying emotion and mood in neurodegenerative, neurodevelopmental, and psychiatric disorders.


## **Data**

The dataset comprises 270 fMRI scans collected from 30 participants by **Northeastern University’s Affective and Brain Sciences Lab**. Each subject was shown emotionally evocative images designed to elicit anger, fear, or disgust. This was followed by a priming word that was either congruent (same emotion as the image), incongruent (different emotion), or neutral (emotionally neutral).

Each subject completed all combinations of emotional state and priming condition (3 emotions × 3 priming types = 9 scans per subject), resulting in 270 scans in total. Each scan is stored as a .nii.gz file and labeled with subject ID, emotion state, and priming condition. The classification targets are emotional state and priming condition, while subject ID—used to model the hierarchical structure—introduces non-independence across samples from the same participant.

Due to the dataset’s size, it is stored in a Hugging Face dataset and loaded locally. In the future, this setup will be modified to support streaming, avoiding the need to store all 270 .nii.gz files locally.

The input data is 4D, with dimensions 270 × 91 × 109 × 91, for a total of roughly 244 million values (902,629 per scan). This is too large to feed directly into a classifier. Therefore, dimensionality reduction and brain parcellation techniques will be applied, reducing each scan to approximately 20–23 features for downstream modeling.
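
The arithmetic behind these figures can be checked with a short sketch. The variable names are illustrative only and do not come from the project code.

```python
# Dimensions reported for the dataset: 270 scans, each a 91 x 109 x 91 volume.
n_scans = 270
volume_shape = (91, 109, 91)

# Number of voxel values in a single scan.
values_per_scan = volume_shape[0] * volume_shape[1] * volume_shape[2]

# Number of values across the whole dataset.
total_values = n_scans * values_per_scan

print(values_per_scan)  # 902629
print(total_values)     # 243709830
```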

<div style="text-align: center">
    <img src="./sub_16_anger_neutral.png" alt="image" width="600" />
</div>

## **Implementation**

### **Data Organization**

Here is how the project is structured. Review the `README.md` to understand how to navigate the repo and grade the project deliverables.

<pre>
<strong>emotion-fmri-classification</strong>: project directory
├── <strong>requirements.txt</strong>: libraries and packages to install
├── <strong>README.md</strong>: run and navigation instructions (including this tree structure)
├── <strong>emotion-fmri-neu</strong>: 270 .nii.gz files of fMRI data
├── <strong>workflow.ipynb</strong>: full project pipeline, loading 
├── <strong>output.ipynb</strong>: outputs all information
├── <strong>report.ipynb</strong>: Jupyter Notebook Version of report with comprehensive, complete analysis
├── <strong>report.pdf</strong>: IEEE Version of report with abridged analysis  
├── <strong>presentation.pdf</strong>: slide deck of project overview and analysis  
├── <strong>presentation.mov</strong>: 10 min presentation recording of project 
├── <strong>data</strong>
│   ├── <strong>collinearity</strong>
│   ├── <strong>cost_function_logs</strong>
│   ├── <strong>dimension_reductions</strong>
│   ├── <strong>interpretation</strong>
│   ├── <strong>parcellations</strong>
│   ├── <strong>bivariate_data</strong>
│   ├── <strong>models</strong>
│   ├── <strong>log_reg_results</strong>
│   │   ├── <strong>classification_reports</strong>
│   │   ├── <strong>coefficients</strong>
│   │   ├── <strong>errors</strong>
│   │   └── <strong>metrics</strong>
├── <strong>visualizations</strong>
│   ├── <strong>atlas_maps</strong>
│   ├── <strong>collinearity</strong>
│   ├── <strong>confusion_matrices</strong>
│   ├── <strong>cost_functions</strong>
│   ├── <strong>heteroscedasticity</strong>
│   ├── <strong>bivariate_data</strong>
│   │   ├── <strong>emotion</strong>
│   │   ├── <strong>subject</strong>
│   │   └── <strong>priming</strong>
└── <strong>cover-image.png</strong>: image from Pixar Movie Inside Out of characters Anger, Disgust and Fear
</pre>

### **Brain Parcellations**

Brain parcellation is popular for its spatial interpretability and computational efficiency. Neuroimaging data is not tabular; it carries spatial and temporal dependencies that show where and when brain activity occurs. Each data point is the activation value of a specific voxel, and a single fMRI image can contain close to 1 million such voxels. Beyond this high dimensionality, raw images often contain noise unrelated to the blood-oxygen-level-dependent (BOLD) signal that reflects brain activity. Brain parcellation uses predetermined masker objects or atlas maps to segment the high-dimensional voxel data into groups that represent specific regions of the brain. These regions of interest (ROIs) are non-overlapping and yield a structured signal (an average activation value per ROI) that can be used as a feature while reducing noise [Pereira].

All atlases were accessed using the `nilearn` library, which provides a standardized interface to several widely-used functional neuroimaging templates. Each `.nii.gz` file was first loaded using **Nibabel**, converted to voxel-level arrays, and transformed into 1D feature vectors using the `NiftiLabelsMasker` class in **Nilearn**. The corresponding atlas maps were retrieved via `nilearn.datasets.fetch_atlas_[ATLAS_NAME]`, and stored in a dictionary along with their labels and metadata for reproducibility and interpretability.
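
To make the averaging step concrete, here is a minimal NumPy-only sketch of what a label-based masker does conceptually: it takes one scan and an integer label volume and returns one mean value per parcel. This is not Nilearn's implementation (which also handles resampling, affine alignment, and other details), and the function name `parcel_means` and the toy arrays are hypothetical.

```python
import numpy as np

def parcel_means(volume, labels):
    """Average voxel values within each non-zero atlas label.

    volume: 3D array of voxel values for one scan.
    labels: 3D integer array of the same shape; 0 marks background.
    Returns (label_ids, means), with one mean per parcel.
    """
    if volume.shape != labels.shape:
        raise ValueError("volume and labels must have the same shape")
    label_ids = np.unique(labels)
    label_ids = label_ids[label_ids != 0]  # drop the background label
    means = np.array([volume[labels == k].mean() for k in label_ids])
    return label_ids, means

# Toy example: a 2 x 2 x 2 "scan" split into two parcels plus one background voxel.
volume = np.arange(8, dtype=float).reshape(2, 2, 2)
labels = np.array([[[1, 1], [1, 1]], [[2, 2], [2, 0]]])
ids, features = parcel_means(volume, labels)
print(ids, features)  # parcel 1 averages 0..3, parcel 2 averages 4..6
```

Each scan is reduced from one value per voxel to one value per ROI, which is why the feature count equals the number of regions in the chosen atlas.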

The following atlases were used in this study:

---

#### A. Harvard-Oxford Atlases

The Harvard-Oxford atlases are probabilistic anatomical maps created from structural MRI data by the Harvard Center for Morphometric Analysis. They segment both cortical and subcortical regions and are widely used in structural parcellation. Three variants were included, each at a 0% probability threshold and 1 mm resolution:

- **Cortical (48 ROIs):**  
  `harvard_oxford_cort_0_1` — Encompasses bilateral cortical structures.

- **Lateralized Cortical (96 ROIs):**  
  `harvard_oxford_cortl_0_1` — Splits each cortical region into separate left- and right-hemisphere parcels, providing enhanced spatial detail.

- **Subcortical (21 ROIs):**  
  `harvard_oxford_sub_0_1` — Includes key subcortical structures such as the thalamus, hippocampus, and amygdala.


---

#### B. Talairach Atlases

The Talairach atlas provides multiple anatomical segmentations of the brain based on tissue type and macrostructural organization. Several variants were included:

- **Brodmann Areas (71 ROIs):**  
  `talairach_ba` — Segments the brain into Brodmann areas, commonly used in functional localization.

- **Gyrus Level (55 ROIs):**  
  `talairach_gyrus` — Provides segmentation at the level of cortical gyri.

- **Combined Hemi-Lobe-Tissue (22 ROIs total):**  
  Merged feature set combining:
  - Hemisphere (`talairach_hemi`, 7 ROIs)  
  - Lobe (`talairach_lobe`, 12 ROIs)  
  - Tissue type (`talairach_tissue`, 3 ROIs)  
  Due to their individually low dimensionality, these three were concatenated into a single feature set.
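
The concatenation of the three low-dimensional Talairach sets can be sketched as a column-wise stack. The random matrices below are placeholders standing in for the masker outputs, not project data.

```python
import numpy as np

n_scans = 270
rng = np.random.default_rng(0)

# Placeholder feature matrices standing in for the three maskers' outputs.
hemi = rng.normal(size=(n_scans, 7))     # talairach_hemi
lobe = rng.normal(size=(n_scans, 12))    # talairach_lobe
tissue = rng.normal(size=(n_scans, 3))   # talairach_tissue

# Stack column-wise so each scan keeps one row with 7 + 12 + 3 features.
combined = np.hstack([hemi, lobe, tissue])
print(combined.shape)  # (270, 22)
```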

---

#### C. Schaefer 2018 Atlas

- **Schaefer 100 × 17 × 1 (100 ROIs):**  
  `schaefer_100_17_1` — A functional atlas based on resting-state fMRI data, dividing the cortex into 100 regions grouped into 17 large-scale networks, at 1 mm resolution.

---

#### D. AAL (SPM12) Atlas

- **AAL SPM12 (116 ROIs):**  
  `aal_spm12` — The Automated Anatomical Labeling atlas for SPM12, derived from the MNI single-subject T1 template. It is commonly used for whole-brain anatomical segmentation.

---

#### E. Juelich Atlas

- **Juelich 0 × 1 (62 ROIs):**  
  `juelich_0_1` — A probabilistic cytoarchitectonic atlas derived from postmortem histological data, with a threshold of 0 and 1 mm resolution. Distributed via FSL.

---

These atlases enable diverse views of brain organization—ranging from structural anatomy to functional networks—facilitating the extraction of meaningful and compact features for downstream classification models.


### **Dimension Reduction**




#### **Sammon Mapping**

We used the open-source `sammon-mapping` package (https://pypi.org/project/sammon-mapping). The package emitted a convergence warning and the mapping did not converge.

We then attempted a customized from-scratch implementation, but it did not converge in time, so Sammon Mapping was removed from the analysis.

```python
# Inside the loop over target dimensions d: store the Sammon error E.
output, E = sammon_mapping(input_data, d)
errors[d] = E
```


#### **Autoencoder**

An autoencoder was coded from scratch but took too long to compute, and no scikit-learn equivalent was included.

#### **Principal Component Analysis**

A common practice for PCA is to keep the minimum number of components needed to explain 99% of the variance. In this case that is **226 components**. Below you can see where the cumulative variance curve meets the horizontal line at the 0.99 threshold.

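```python
import numpy as np
from sklearn.decomposition import PCA

# Fit a PCA model with all components; X is the (n_scans, n_features) matrix.
pca = PCA()
X_test_pca = pca.fit_transform(X)

# Cumulative explained variance across components.
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components that reaches 99% of the variance.
n_components_99 = np.argmax(cumulative_variance >= 0.99) + 1
```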

A from-scratch implementation was not included because of the explained-variance computation; with more time, it would have been implemented.
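
For reference, the explained-variance calculation can be sketched without scikit-learn by taking the singular values of the centered data. This is only an illustration on synthetic data, not the project's code; `components_for_variance` and `X_demo` are hypothetical names.

```python
import numpy as np

def components_for_variance(X, threshold=0.99):
    """Return the smallest number of principal components whose
    cumulative explained variance ratio reaches `threshold`,
    together with the full cumulative curve."""
    X_centered = X - X.mean(axis=0)
    # Singular values of the centered data give the component variances.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    explained_variance = singular_values ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    cumulative = np.cumsum(ratio)
    return int(np.argmax(cumulative >= threshold) + 1), cumulative

# Synthetic data: 50 samples, 5 features with very different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5)) * np.array([10.0, 5.0, 1.0, 0.1, 0.01])
n_keep, cumulative = components_for_variance(X_demo)
print(n_keep, cumulative.round(4))
```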

<div style="text-align: center">
    <img src="./visualizations/cost_functions/PCA.png" alt="image" width="600" />
</div>

#### **Isomap**

Isomap was evaluated with `reconstruction_error`. Fits at the following target dimensions `d` were skipped with the error "There are significant negative eigenvalues (... of the maximum positive). Either the matrix is not PSD, or there was an issue while computing the eigendecomposition of the matrix."

| d | Negative eigenvalue magnitude (fraction of the maximum positive) |
|---|---|
| 151 | 0.000294068 |
| 161 | 0.0013739 |
| 171 | 0.00256321 |
| 181 | 0.00432735 |
| 191 | 0.00599045 |
| 201 | 0.00808176 |
| 211 | 0.0108316 |
| 221 | 0.0138365 |
| 231 | 0.0174674 |
| 241 | 0.0238572 |
| 251 | 0.0332415 |
| 261 | 0.0524513 |

#### **Hessian Eigenmapping**

For Hessian Eigenmapping, the number of neighbors must be greater than $\frac{n \cdot (n + 3)}{2}$, where $n$ is the number of components. Because the dataset has only 270 instances, the number of neighbors must stay below 270, so the largest usable $n$ satisfies $\frac{n \cdot (n + 3)}{2} < 270$. The boundary case can be rewritten as $n^2 + 3n - 540 = 0$ and solved using the quadratic formula, as done below, giving $n \approx 21.79$. The maximum number of components for Hessian Eigenmapping on this dataset is therefore 21, where the number of neighbors is 262 (above the bound of $\frac{21 \cdot 24}{2} = 252$).

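```python
def quadratic_formula(a, b, c):
    # Positive root of a*x**2 + b*x + c = 0.
    return (-b + (b**2 - 4*a*c) ** 0.5) / (2 * a)

# Boundary value of n for 270 instances (about 21.79).
d = quadratic_formula(1, 3, -540)
# Neighbor count for n = 21 (262), above the bound 21 * (21 + 3) / 2 = 252.
int(21 * (21 + 4) / 2), d
```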

Implemented with `LocallyLinearEmbedding` with `method` set to `'hessian'`.
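
The same limit can be found by direct search instead of the quadratic formula. The sketch below assumes the number of neighbors must be strictly less than the number of samples; the helper names are hypothetical.

```python
import math

def min_neighbors_hessian(n_components):
    """Smallest integer n_neighbors that is strictly greater than
    n_components * (n_components + 3) / 2."""
    return math.floor(n_components * (n_components + 3) / 2) + 1

def max_components_hessian(n_samples):
    """Largest n_components whose minimum neighbor count still fits
    under n_samples (assumption: n_neighbors must be < n_samples)."""
    d = 1
    while min_neighbors_hessian(d + 1) < n_samples:
        d += 1
    return d

print(max_components_hessian(270))   # 21
print(min_neighbors_hessian(21))     # 253
```

With 270 scans, any neighbor count from 253 to 269 satisfies the constraint for 21 components, so the value of 262 reported above is within range.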

#### **Modified Locally Linear Embedding**

Implemented with `LocallyLinearEmbedding` with `method` set to `'modified'`.

#### **Multidimensional Scaling**

Implemented with `MDS`, using `dissimilarity='euclidean'` and `normalized_stress='auto'`. The fitted model's `stress_` attribute was used as the cost measure.
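
As a rough illustration of what a stress value measures, the sketch below computes raw stress as the sum of squared differences between pairwise distances in the original space and in the embedding. It is a simplified stand-in written for this report, not scikit-learn's code, and library values may be normalized differently.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def raw_stress(X, X_embedded):
    """Sum of squared differences between original and embedded
    pairwise distances, counting each pair once."""
    d_high = pairwise_distances(X)
    d_low = pairwise_distances(X_embedded)
    upper = np.triu_indices(X.shape[0], k=1)
    return float(((d_high[upper] - d_low[upper]) ** 2).sum())

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 6))
print(raw_stress(X_demo, X_demo))         # identical "embedding": 0.0
print(raw_stress(X_demo, X_demo[:, :2]))  # dropping dimensions: positive
```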

#### **t-Distributed Stochastic Neighbor Embedding**

scikit-learn's default Barnes-Hut t-SNE supports at most 3 output dimensions, so 3 is the maximum set here.

A from-scratch implementation was also coded but took too long to compute. For consistency, scikit-learn implementations are used for every method in the analysis; the from-scratch code is included only as a reference for these attempts.

### **Exploratory Data Analysis**

#### **Multicollinearity**

Correlation matrices are visualized as heatmaps for the parcellation and projection methods that produce fewer than 30 input features; anything larger was harder to visualize but may be considered in the future.

The Talairach hemisphere, tissue, and lobe sets are treated as one dataset instead of three because of their very low-dimensional input spaces: 7 + 3 + 12 = 22 features.

Variance inflation factor (VIF) values are calculated, and features are dropped until every VIF is below 5, a commonly used threshold. The percentage of features dropped shows how much the features within each parcellation and projection technique overlap.

These percentages are visualized in bar charts, grouped into parcellation methods and dimensionality reduction methods, to identify which methods produce the most multicollinearity.
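
A minimal sketch of the VIF procedure follows. The draft does not specify the order in which features are dropped, so removing the highest-VIF feature first is an assumption, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_until_vif_below(X, threshold=5.0):
    """Drop the highest-VIF column until all VIFs are below threshold.
    Returns kept column indices and the fraction of columns dropped."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        values = vif(X[:, keep])
        worst = int(np.argmax(values))
        if values[worst] < threshold:
            break
        keep.pop(worst)
    return keep, 1.0 - len(keep) / X.shape[1]

# Synthetic data: the fourth column is nearly the sum of the first two.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 200))
X_demo = np.column_stack([a, b, c, a + b + 0.01 * rng.normal(size=200)])
kept, dropped_fraction = drop_until_vif_below(X_demo)
print(kept, dropped_fraction)
```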



#### **Homogeneity of Variance**
Levene's test for homogeneity of variance.
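
The draft does not state which groups are compared, so the sketch below assumes three groups of 90 values (one per emotion class) purely for illustration. It computes only Levene's W statistic with NumPy; `scipy.stats.levene` provides the same statistic together with a p-value.

```python
import numpy as np

def levene_statistic(*groups, center=np.median):
    """Levene's W statistic for equality of variances across groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's center.
    z = [np.abs(g - center(g)) for g in groups]
    z_means = np.array([zi.mean() for zi in z])
    z_grand = np.concatenate(z).mean()
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum(((zi - m) ** 2).sum() for zi, m in zip(z, z_means))
    return (n_total - k) / (k - 1) * between / within

# Synthetic groups: equal spread versus one group with a larger spread.
rng = np.random.default_rng(0)
same = [rng.normal(scale=1.0, size=90) for _ in range(3)]
different = [rng.normal(scale=s, size=90) for s in (1.0, 1.0, 4.0)]
print(levene_statistic(*same), levene_statistic(*different))
```

A larger W indicates stronger evidence that the group variances differ.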

#### **Autocorrelation**

Every feature is plotted against the subject variable to show how strongly it correlates with the subject. This reveals which parcellation and projection techniques are likely to perform poorly because of overdependence on the subject number.
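
The numeric counterpart of these plots is the correlation between each feature and the subject number. The sketch below uses synthetic data shaped like the dataset (30 subjects, 9 scans each); the function name and the features are hypothetical.

```python
import numpy as np

def subject_correlations(X, subject_ids):
    """Pearson correlation between each feature column and subject ID."""
    subject_ids = np.asarray(subject_ids, dtype=float)
    return np.array([np.corrcoef(X[:, j], subject_ids)[0, 1]
                     for j in range(X.shape[1])])

# Toy data: 30 subjects x 9 scans; feature 0 tracks the subject number.
subjects = np.repeat(np.arange(1, 31), 9)
rng = np.random.default_rng(0)
X_demo = np.column_stack([subjects + rng.normal(scale=0.5, size=270),
                          rng.normal(size=270)])
r = subject_correlations(X_demo, subjects)
print(r.round(3))
```

A feature with a correlation near 1 mostly encodes who was scanned rather than what emotion was elicited.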

### **Multi-Classification**

#### **Hierarchical K-Fold Cross Validation**
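
The scheme is not yet described under this heading. One plausible reading, given that scans from the same participant are not independent, is a grouped split that keeps all of a subject's scans in the same fold. The pure-Python sketch below illustrates that idea only; `grouped_k_fold` is a hypothetical name and this is not necessarily the procedure used in the project.

```python
import random

def grouped_k_fold(subject_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs where no subject appears
    in both the training and test indices of a fold."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Deal shuffled subjects round-robin into n_splits folds.
    folds = [set(subjects[i::n_splits]) for i in range(n_splits)]
    for held_out in folds:
        test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
        yield train_idx, test_idx

# 30 subjects with 9 scans each, as in the dataset description.
subject_ids = [s for s in range(1, 31) for _ in range(9)]
splits = list(grouped_k_fold(subject_ids, n_splits=5))
print([len(test) for _, test in splits])  # [54, 54, 54, 54, 54]
```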

#### **Logistic Regression**

### **Results**

This section shows the top coefficients for each parcellation method and their magnitudes, then identifies the top features overall, with a brief look at related research and what these features likely mean.
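
One way to rank features from a multi-class logistic regression is by the mean absolute coefficient across classes. The draft does not state how coefficients are aggregated, so this aggregation is an assumption, and the coefficients and ROI names below are made up for illustration.

```python
import numpy as np

def top_features(coef, feature_names, k=3):
    """Rank features by mean absolute coefficient across classes.

    coef: array of shape (n_classes, n_features), as produced by a
    multinomial logistic regression.
    """
    importance = np.abs(coef).mean(axis=0)
    order = np.argsort(importance)[::-1][:k]
    return [(feature_names[i], float(importance[i])) for i in order]

# Made-up coefficients for 3 emotion classes and 4 hypothetical ROIs.
coef = np.array([[ 0.9, -0.1,  0.3, 0.0],
                 [-0.6,  0.2,  0.1, 0.0],
                 [-0.3, -0.1, -0.4, 0.0]])
names = ["roi_a", "roi_b", "roi_c", "roi_d"]
ranked = top_features(coef, names, k=2)
print(ranked)
```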

The results also cover:

- The relationship between multicollinearity and prediction, and between heterogeneity of variance and prediction.
- Why it makes sense that none of the features from dimensionality reduction methods are collinear.

Planned contents are 3 tables and 9 figures, categorized by analysis type and what was learned:

- Include only the t-SNE plot for separability, because none of the projections look separable.
- Include a figure showing the 1:1 positive correlation of some variables with subject number.

### **Conclusion**


Features from projection (dimensionality reduction) methods are less interpretable: their axes are abstract and harder to map back to brain regions.

Features from parcellation are highly interpretable. If you select "Activity in left amygdala" and "Activity in right prefrontal cortex," you know exactly what each axis represents in physiological terms.

Averaging values across categories and then feeding those means into a model may work better than expected.

#### Future Work

- Train the model without the subject variable included.
- Apply Laplacian Eigenmaps, as done in [D].
- Organize files more neatly, separating test and train data as well as parcellations and projections.

### **Appendix**

The remaining images may be included here.

### **Acknowledgements**

Thank you to Professor Ahmad for a great semester!

### **References**

Works Cited

Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in neuroinformatics, 8, 14.

Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: a tutorial overview. Neuroimage, 45(1), S199-S209.