# Music Brain Project

*Can we guess how you will react to music?*


In the spring of 2019, I (Ted Lewitt), Ben Hahn, and Jack Elliott partnered with the Brain and Creativity Institute at USC to use deep learning methods to predict patients' responses to instrumental music.

### The Dataset

36 patients went into an MRI scanner and listened to 5-minute-long instrumental songs that were meant to be either uplifting or sad. The MRI recorded their brain activations, and the patients used a sliding scale to represent how they felt while listening to the music.

![fd](Images/glass_brain.png "A sample MRI image")

### Our Hypothesis

We can use a recurrent neural network to predict how a patient felt at a moment in time, based on the MRI data from that moment.


In [None]:
import pandas as pd
import numpy as np
import nibabel as nib
import helperFunctions as hf
import keras
from keras import metrics
from keras.models import Sequential
from keras.layers import Activation, Dense, LSTM
import nilearn
import os
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

## Preprocessing

**Data Types** <br>

+ The MRI data file is known as fMRI and is stored in a NIFTI file (suffix .nii). It is a time series of MRI data, with each time step being a 3D volumetric snapshot of brain activations.

+ The slider data was a text file with timestamps and activations, on a scale of 0 to 127, with 127 being very happy and 0 being very sad.

### Preprocessing the Slider Data

**Initial Steps**

- The fMRI data was at 1 Hz but the slider data was at 30 Hz, so we had to down-sample the slider data to match the fMRI data.
- We scaled the slider data to be on a scale of (0,1) from (0,127).

- Human brains take an average of 6 seconds to process the music, so we needed to add a 6-second delay between the slider data and the MRI image.
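
A minimal sketch of the three steps above, run on synthetic data. The function names, the choice to keep every 30th sample rather than average each second, and the direction in which the delay is applied (the scan at second t + 6 is paired with the slider value at second t) are all assumptions made for illustration.

```python
import numpy as np

SLIDER_HZ = 30   # slider sampling rate; the fMRI data is at 1 Hz
DELAY = 6        # seconds between hearing the music and the brain response


def down_sample_and_scale(slider, step=SLIDER_HZ):
    """Keep one slider sample per second and rescale it from 0-127 to 0-1."""
    slider = np.asarray(slider, dtype=float)
    # Assumption: take every 30th sample instead of averaging each second
    return slider[::step] / 127.0


def apply_delay(fmri, slider, delay=DELAY):
    """Pair the scan at second t + delay with the slider value at second t."""
    # Assumption: the first `delay` scans and the last `delay` slider values
    # have no partner, so both are dropped
    return fmri[delay:], slider[:len(slider) - delay]


# Synthetic stand-ins: 495 seconds of slider data and 495 scans with 10 features
raw_slider = np.linspace(0, 127, 495 * SLIDER_HZ)
fmri = np.zeros((495, 10))

labels = down_sample_and_scale(raw_slider)
fmri_aligned, labels_aligned = apply_delay(fmri, labels)
print(fmri_aligned.shape, labels_aligned.shape)  # (489, 10) (489,)
```

Both arrays end up with 495 - 6 = 489 time steps, so every remaining scan has exactly one label.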

**Bucketing the Data & Switching from Regression to Classification** <br>

We were having issues with regressing the data (see our conclusions section for more info), so we decided to switch to a classification problem using data bucketing.
![Bucket](Images/binning.png "An example of bucketing data")

Data bucketing transforms a continuous variable (our slider data) into a discrete variable by taking all the continuous data that lies in a certain range and lumping it together in one bucket. In our example, all the continuous slider data above 67.5 can be bucketed into the **Happy** bucket and all the data below 67.5 is bucketed into the **Sad** bucket. This allows us to use one-hot encoding to represent the different classes, however many buckets we decide to use.
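
The bucketing and one-hot encoding described above can be sketched as follows. This is an illustration under assumptions: the name `bucket_one_hot` is hypothetical, and the buckets are taken to be equal-width over the 0 to 127 scale, which puts the two-bucket boundary at the midpoint of the scale rather than at the specific threshold quoted above.

```python
import numpy as np


def bucket_one_hot(values, num_buckets, low=0.0, high=127.0):
    """Assign each value to one of `num_buckets` equal-width buckets, one-hot encoded."""
    values = np.asarray(values, dtype=float)
    # Interior boundaries only, e.g. a single midpoint boundary for 2 buckets
    edges = np.linspace(low, high, num_buckets + 1)[1:-1]
    # np.digitize returns the index of the bucket each value falls into
    bucket_index = np.digitize(values, edges)
    # Row i of the identity matrix is the one-hot vector for bucket i
    return np.eye(num_buckets)[bucket_index]


encoded = bucket_one_hot([10, 120, 60, 90], num_buckets=2)
print(encoded)
# [[1. 0.]
#  [0. 1.]
#  [1. 0.]
#  [0. 1.]]
```

Here the first column plays the role of the **Sad** bucket and the second the **Happy** bucket; passing `num_buckets=4` yields four columns instead.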

In [1]:
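NUM_BUCKETS = 2 # or 4

def preprocess_slider_data(data_file):
    data = load_data(data_file)
    down_sampled_data = down_sample(data)
    bucket_data = bucket(down_sampled_data, NUM_BUCKETS)
    return bucket_data

# The bodies of the three helpers below are omitted here.
def load_data(file):
    """Read the slider text file (timestamps and 0-127 values)."""
    raise NotImplementedError

def down_sample(array):
    """Down-sample the 30 Hz slider data to match the 1 Hz fMRI data."""
    raise NotImplementedError

def bucket(data, buckets):
    """Assign each slider value to one of `buckets` classes and one-hot encode it."""
    raise NotImplementedError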

### Preprocessing the MRI Data
We used the nilearn package to transform our 4D fMRI data (time steps plus 3 spatial dimensions) into a 2D time series (time steps, features). The initial shape of the input tensor was (495,100,90,100) and we wanted it to be (495,#Features) so it could be fed into our TensorFlow LSTM.

nilearn has a class called NiftiMasker that does exactly that, transforming our data into a (495,250000) shaped time series. (We know this number of features is incredibly high; see our conclusions for more.) <br>
![dg](Images/masking1.jpg "Illustration of Nifti Masker")
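
To make the reshaping concrete, here is a toy NumPy sketch of what masking does to the array shape. It is not how NiftiMasker is implemented (the real class also computes the mask and standardizes each voxel); the tiny array sizes and the hand-made mask are assumptions for illustration.

```python
import numpy as np


def mask_to_time_series(volume_4d, mask_3d):
    """Flatten a (time, x, y, z) array into (time, voxels inside the mask)."""
    # Boolean indexing over the three spatial axes keeps only masked voxels
    return volume_4d[:, mask_3d]


rng = np.random.default_rng(0)
fmri = rng.normal(size=(5, 4, 3, 4))     # toy stand-in for (495, 100, 90, 100)
mask = np.zeros((4, 3, 4), dtype=bool)
mask[1:3, :, 1:3] = True                 # hypothetical "inside the brain" region

series = mask_to_time_series(fmri, mask)
print(series.shape)  # (5, 12)
```

Each row is one time step and each column is one voxel inside the mask, which is the (time steps, features) layout the LSTM expects.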


In [None]:
from IPython.display import Image
Image

In [None]:
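import nilearn.image
import nilearn.input_data

DELAY = 6  # seconds of delay between the slider data and the MRI image

def create_time_series(mri_file):
    nifti_masker = nilearn.input_data.NiftiMasker(standardize=True, mask_strategy='template')
    # Keep the first 495-DELAY volumes of the fMRI time series
    indexer = list(range(0, 495 - DELAY))
    result = nilearn.image.index_img(mri_file, indexer)
    nifti_masker.fit(result)
    masked = nifti_masker.transform(result)
    masked = np.array(masked)
    # Return the (time steps, features) array and its number of features
    return masked, masked.shape[1]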

To keep track of which label data corresponded to which MRI data, each patient was given an ID, so we load the data based on the ID.
This step does all the preprocessing and will take a long time.

In [None]:
txt_files_dir = ""
mri_files_dir = ""

nii_array = []
label_array = []

for count in range(1, 40):
    # Patient IDs are zero-padded to two digits
    patient_id = str(count).zfill(2)

    #First we preprocess the nii file
    niifile = "patient_" + patient_id + ".nii"
    labelfile = "patient_" + patient_id + ".txt"
    fullfile = os.path.join(txt_files_dir, labelfile)
    fullNii = os.path.join(mri_files_dir, niifile)
    try:
        nii, sizeValue = create_time_series(fullNii)
        label = np.array(preprocess_slider_data(fullfile))
    except FileNotFoundError:
        print("Incomplete Data for Patient %s" % patient_id)
        continue
    # Only keep patients for whom both files were found
    nii_array.append(nii)
    label_array.append(label)

nii_array = np.array(nii_array)
label_array = np.array(label_array)


In [None]:
totalFiles=len(label_array)

train_labels, test_labels, train_nii, test_nii=train_test_split(label_array, nii_array, train_size=.75, test_size=.25)

train_labels = np.array(train_labels)
train_nii = np.array(train_nii)
test_labels = np.array(test_labels)
test_nii = np.array(test_nii)

### Hyperparameters

These are the optimal hyperparameters we found for our model.
See the HyperParameter Tuning section for a discussion of the other hyperparameters we explored for our model.

In [None]:
EPOCHS = 4
BATCH_SIZE = 5
if NUM_BUCKETS == 2: 
    LOSS = 'binary_crossentropy'
    METRICS = [metrics.binary_accuracy]
else:
    LOSS = 'categorical_crossentropy'
    METRICS = [metrics.categorical_accuracy]
OPTIMIZER = 'Adam'
inputShape = (495-DELAY,sizeValue)


## Training

We decided to use the Keras implementation of the Long Short-Term Memory RNN because of the time series nature of the data. The human brain also has the memory of the whole song when forming emotions, so we needed a model that had this capability too. The LSTM's ability to capture short-term trends while keeping a longer memory suited our needs well.

To predict a bucket over all 495 time steps at the same time, we added the TimeDistributed Dense layer. This means the model's output was a (495,NUM_BUCKETS) matrix with a one-hot encoding of the predicted bucket.
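
As a rough NumPy illustration of what a time-distributed dense layer with softmax computes: the same weights are applied independently at every time step. The toy sizes, random weights, and function name are assumptions; in the model itself Keras handles this.

```python
import numpy as np


def time_distributed_softmax(hidden, weights, bias):
    """Apply one shared dense + softmax layer to every time step."""
    logits = hidden @ weights + bias                       # (time, buckets)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
hidden = rng.normal(size=(495, 8))   # toy LSTM outputs, one row per time step
weights = rng.normal(size=(8, 2))    # shared across all time steps
bias = np.zeros(2)

probs = time_distributed_softmax(hidden, weights, bias)
predicted_bucket = probs.argmax(axis=1)
print(probs.shape, predicted_bucket.shape)  # (495, 2) (495,)
```

Strictly speaking, the softmax output is a row of bucket probabilities per time step; taking the argmax of each row gives the predicted bucket.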


In [None]:
#Model Initialization
model=Sequential()

model.add(LSTM(units=inputShape[0], activation='tanh', dropout=.15, input_shape=inputShape, return_sequences=True))

model.add(keras.layers.TimeDistributed(Dense(NUM_BUCKETS,activation='softmax')))

print(model.summary())

In [None]:
## Training
model.compile(loss=LOSS,optimizer=OPTIMIZER, metrics=METRICS)

model.fit(train_nii,train_labels,epochs=EPOCHS,batch_size=BATCH_SIZE)

In [None]:
#Evaluation
score=model.evaluate(test_nii,test_labels,verbose=1, batch_size=4)
print(score)

### HyperParameter Tuning

To find optimal hyperparameters, we used a naive grid search method, combining all of the following options:

**Epochs**- [4,8,12,16] <br>
**Batch Size** - [5,10] <br>
**Dropout** - [.08,.15,.25,.40] <br>
**Number of Buckets**- [2,4,10] <br>

The optimal combination we found is: <br>
**Epochs**: 4 <br>
**Batch Size**: 5 <br>
**Dropout**: .15 <br>
**Number of Buckets**: 2 <br> 

We found the model has major issues with overfitting, so fewer epochs performed better.

We also found that fewer buckets made the model more accurate, which intuitively makes sense because each bucket then covers a much wider range of slider values, leaving more room for error.
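
The grid described above can be enumerated with `itertools.product`. This sketch only lists the combinations; the dictionary keys and the `grid_combinations` helper are hypothetical names, and training and scoring a model for each combination is left out.

```python
from itertools import product

GRID = {
    "epochs": [4, 8, 12, 16],
    "batch_size": [5, 10],
    "dropout": [0.08, 0.15, 0.25, 0.40],
    "num_buckets": [2, 4, 10],
}


def grid_combinations(grid):
    """Return every combination of hyperparameter values as a dict."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


combos = grid_combinations(GRID)
print(len(combos))  # 96
print(combos[0])    # {'epochs': 4, 'batch_size': 5, 'dropout': 0.08, 'num_buckets': 2}
```

With 4 x 2 x 4 x 3 = 96 combinations, a naive grid search means training the model 96 times.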

### Best Results
+ 88% Accuracy with 2 Buckets
+ 32% Accuracy with 4 Buckets


# Conclusion

Our project was a failure. We didn't achieve high accuracy, and it doesn't seem like our model learned anything about the human brain. That doesn't mean we didn't learn a lot: analyzing why it didn't work has been a huge learning experience for us and will help us on future projects. Here is a non-exhaustive list of things that went wrong.

+ **Lack of Domain Knowledge in Preprocessing** We had no domain knowledge in the field of fMRI, so we blindly used the nilearn preprocessing with little understanding of the methods behind it.
+ **Too Many Features and Too Little Data** Our feature space was over 250,000 features, far too many for any predictive model, even with a massive dataset. Our model could never hope to accurately use them all and suffered accordingly. Even if we could have reduced the feature space, we still only had 17,325 (495 seconds X 35 people) instances in our full dataset.
+ **Oversimplification of the Human Brain** Even if we had an incredibly powerful model, human brains are intricate and work in complex ways, so trying to model them this way makes a lot of assumptions. At the time of writing, the most accurate paper we could find, [Bandettini et al.](https://www.sciencedirect.com/science/article/abs/pii/S1053811918320263), has 50% accuracy with 4 buckets, showing how even state-of-the-art methods still struggle with modeling the human brain.



## Future Steps

+ **Try a CNN-LSTM** Instead of using nilearn to preprocess our data, feed the MRI data directly into a 3D Convolutional Neural Network and pass its output into the LSTM. This would be much more challenging to train and our dataset is far too small, but it could see some improvements due to the CNN's ability to capture spatial features.

+ **Reduce the Feature Space using ROIs** Try using only certain parts of the brain (Regions of Interest) that are known to correlate strongly with music and/or emotion, reducing the feature space to a subsection of the brain.
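
A toy sketch of the ROI idea, assuming an atlas that assigns an integer region label to every voxel. The atlas, the region labels, and the `roi_mask` helper are made up for illustration; a real analysis would use a published brain atlas.

```python
import numpy as np


def roi_mask(atlas, roi_labels):
    """Boolean mask of the voxels whose atlas label is one of `roi_labels`."""
    return np.isin(atlas, roi_labels)


# Toy atlas: each voxel carries an integer region label (0 = background)
atlas = np.zeros((4, 3, 4), dtype=int)
atlas[0, :, :] = 1      # hypothetical region 1 (12 voxels)
atlas[1, :, :2] = 2     # hypothetical region 2 (6 voxels)
atlas[2:, :, :] = 3     # hypothetical region 3 (24 voxels)

fmri = np.ones((5, 4, 3, 4))          # toy (time, x, y, z) data
mask = roi_mask(atlas, [1, 2])        # keep only regions 1 and 2
roi_series = fmri[:, mask]
print(roi_series.shape)  # (5, 18)
```

In this toy example the feature count drops from 48 voxels to 18; the same selection on real data would shrink the 250,000-feature input to only the voxels in the chosen regions.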
