# Perceptron Demo - Distinguishing Traces of Schizophrenia

## CSCI 4850-5850 - Neural Networks

Being able to detect traces of schizophrenia in a person's brain could be a valuable aid to diagnosis. Schizophrenia can be diagnosed in several ways, such as physical examinations, tests and screenings, or a psychiatric evaluation, but reaching a firm diagnosis can be difficult, time-consuming, and costly. In this project, we apply neural networks to try to detect traces of schizophrenia and diagnose it accurately.

## What data to use - fMRI

There are several ways to detect schizophrenia, and one of the most popular is through brain scans. Because schizophrenia is a mental disorder, it is closely tied to how the brain works: for example, the brain's dopamine activity is linked to the hallucinations that patients with schizophrenia see or hear. A good way to observe brain activity is Functional Magnetic Resonance Imaging (fMRI), which measures changes in blood flow in the brain. By viewing an fMRI scan, a doctor can check whether activity or inactivity in a particular region of the patient's brain might be a sign of schizophrenia. Since these images can indicate whether a patient shows traces of schizophrenia, we can feed them into a neural net and see whether it can detect those traces for us. Ideally, a doctor could then scan a patient's brain, feed the scan into the neural net, and let the net decide whether the patient has schizophrenia. To cut down on misdiagnoses, time, and cost, however, we want the highest accuracy we can get.

fMRI data is highly valuable, but it packs a lot of information into a few dimensions, which makes it difficult to use in a neural net. Each fMRI scan is a series of 3D images of the patient's brain, one per time point, so a single scan is a four-dimensional array that is hard to feed into a neural net efficiently.
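To make the four-dimensional structure concrete, here is a minimal NumPy sketch. It assumes one scan has the per-scan shape this notebook reports further down (a 27x32x26 voxel grid recorded at 150 time points) and fills it with random numbers, so it illustrates indexing only, not real COBRE data.

```python
import numpy as np

# Synthetic stand-in for ONE fMRI scan: a 27x32x26 voxel grid recorded at
# 150 time points. Random values, for illustration only (not COBRE data).
rng = np.random.default_rng(0)
scan = rng.standard_normal((27, 32, 26, 150))

# One 3D brain volume: every voxel at a single time point.
volume_t0 = scan[..., 0]            # shape (27, 32, 26)

# One voxel's signal across the whole session: a 1D time series.
voxel_series = scan[13, 16, 13, :]  # shape (150,)

# Flattening the whole scan into a single input vector, as a plain
# perceptron would need, gives 27 * 32 * 26 * 150 features per patient.
n_features = scan.size

print(volume_t0.shape, voxel_series.shape, n_features)
```

Even at this modest resolution, flattening one scan yields 3,369,600 inputs, far more than the 145 subjects used here, which is one reason fMRI data is hard to feed directly into a neural net.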

The data for this demo is provided by the Center for Biomedical Research Excellence (COBRE). The dataset contains MR data from 72 patients with schizophrenia and 75 healthy controls, ranging in age from 18 to 65.
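Since the two groups are almost the same size, a model that always predicts "healthy control" would already be right about half the time. The sketch below computes this majority-class baseline from the counts above; the helper name `majority_baseline` is ours, not from any library, and the exact figure shifts slightly with the subjects actually loaded later in the notebook.

```python
def majority_baseline(class_counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total

# Class sizes as described above for COBRE (72 patients, 75 controls).
counts = {"schizophrenia": 72, "control": 75}
print(f"{majority_baseline(counts):.3f}")  # 75 / 147, about 0.510
```

Any trained net needs to clearly beat this number before its accuracy means anything.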

## Step 1: Taking a look at the data

To work with the COBRE dataset, we need two tools: `nilearn` and `nibabel`.

In [3]:
# nilearn provides a fetcher for the COBRE dataset, plus tools for loading and plotting brain images
import nilearn
from nilearn import plotting
from nilearn import image
from nilearn import datasets

In [5]:
# nibabel reads the NIfTI (.nii.gz) files that store each fMRI scan
from nibabel.testing import data_path
import nibabel as nib

# import other basic necessities
import os
import numpy as np
import keras
from keras import backend as K
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
%matplotlib inline
from IPython.display import display

Now that our tools are set up, let's start unpacking the data.

In [7]:
# Get the dataset with nilearn's COBRE fetcher.
# If it is not downloaded yet, it will be downloaded; otherwise the local copy is used.
dataset = nilearn.datasets.fetch_cobre(n_subjects=146,
                                       data_dir="/nfshome/sandbox/perceptron",
                                       url=None,
                                       verbose=0)


In [8]:
phenotypes = dataset["phenotypic"]
file_paths = dataset["func"]
phenotypes.sort(0)  # sort records by their first field, the patient number
file_paths.sort()   # alphabetical order, which here is also patient-number order
display(phenotypes[0])
display(file_paths[0])
# file_paths is now a regular Python list of paths to the fMRI scans
# phenotypes is now an np.recarray of np.record entries storing patient info

(40000, 20, b'Female', b'Right', b'Patient', b'295.9', 140, 0.21234, 0.20245)

'/nfshome/sandbox/perceptron/cobre/fmri_0040000.nii.gz'
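The cell above pairs scans with patient records by sorting both lists and trusting that their orders agree. A more defensive sketch, assuming file names always follow the `fmri_<subject id>.nii.gz` pattern shown in the output above, pulls the subject ID out of each path and looks scans up by ID. The helpers `subject_id_from_path` and `pair_paths_with_ids` are hypothetical; the ID list could come from the first field of each phenotype record, e.g. `[int(rec[0]) for rec in phenotypes]`.

```python
import os
import re

# Assumed file-name pattern, e.g. ".../fmri_0040000.nii.gz" for subject 40000.
_SUBJECT_RE = re.compile(r"fmri_(\d+)\.nii\.gz$")

def subject_id_from_path(path):
    """Extract the integer subject ID from an fMRI file path."""
    match = _SUBJECT_RE.search(os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1))  # "0040000" -> 40000

def pair_paths_with_ids(file_paths, subject_ids):
    """Return file paths reordered to match subject_ids; fail loudly on gaps."""
    by_id = {subject_id_from_path(p): p for p in file_paths}
    missing = [sid for sid in subject_ids if sid not in by_id]
    if missing:
        raise KeyError(f"no scan found for subjects {missing}")
    return [by_id[sid] for sid in subject_ids]

# Usage with made-up paths and IDs (not real COBRE files):
paths = ["/data/cobre/fmri_0040002.nii.gz", "/data/cobre/fmri_0040000.nii.gz"]
print(pair_paths_with_ids(paths, [40000, 40002]))
```

Raising an error on a missing scan makes a misalignment visible immediately, instead of letting one patient's label end up attached to another patient's scan.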

In [9]:
# Get just the diagnosis information from the phenotypes
diagnosis = phenotypes['diagnosis']
diagnosis_converted = []
# This step is necessary to convert each bytes value into a string, and then sort
# those strings into 2 categories: schizophrenia or no schizophrenia
for item in diagnosis:
    s = item.decode('UTF-8')
    if s != "None":
        diagnosis_converted.append(1.0)   # person has schizophrenia
    else:
        diagnosis_converted.append(0.0)   # person doesn't have schizophrenia

# The scan at index 74 has different dimensions from all the others,
# so remove it together with its label
del diagnosis_converted[74]
del file_paths[74]

y_train = np.array(diagnosis_converted)
y_train = keras.utils.to_categorical(y_train, len(np.unique(y_train)))  # one-hot encoding

# Build x_train by loading every scan from its file path
scans = []
for item in file_paths:
    scan = nib.load(item)
    data = scan.get_fdata()
    scans.append(data)
x_train = np.array(scans)
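The cell above removes the mismatched scan by the hard-coded index 74, which would silently drop the wrong subject if the file order ever changed. A sketch of a more defensive alternative, assuming the odd scan is simply the one whose shape differs from the majority: drop every array that does not match the most common shape, along with its label. It also shows a vectorized version of the label loop and a NumPy one-hot encoding comparable to `keras.utils.to_categorical`. The helpers `encode_labels`, `one_hot`, and `keep_common_shape` are hypothetical. With nibabel, `nib.load(path).shape` reads the shape from the file header without loading the voxel data, so the same check could run on the file paths before loading.

```python
from collections import Counter

import numpy as np

def encode_labels(diagnosis):
    """Mirror the loop above: b"None" -> 0.0 (control), anything else -> 1.0."""
    return np.array([0.0 if d.decode("utf-8") == "None" else 1.0 for d in diagnosis])

def one_hot(labels, num_classes=2):
    """One-hot encode integer-valued labels (float64 rows of an identity matrix)."""
    return np.eye(num_classes)[labels.astype(int)]

def keep_common_shape(arrays, labels):
    """Drop arrays whose shape differs from the most common one, with their labels."""
    common_shape, _ = Counter(a.shape for a in arrays).most_common(1)[0]
    keep = [i for i, a in enumerate(arrays) if a.shape == common_shape]
    return [arrays[i] for i in keep], labels[keep], keep

# Usage with tiny dummy data (not COBRE): the third "scan" has a different shape.
diagnosis = np.array([b"295.9", b"None", b"295.3"])
scans = [np.zeros((2, 2, 2, 3)), np.zeros((2, 2, 2, 3)), np.zeros((2, 3, 2, 3))]
labels = encode_labels(diagnosis)
scans_ok, labels_ok, kept = keep_common_shape(scans, labels)
print(kept, one_hot(labels_ok).tolist())  # [0, 1] [[0.0, 1.0], [1.0, 0.0]]
```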

In [10]:
# x_train now holds 145 fMRI scans, each with dimensions 27x32x26x150
# 27x32x26 is the spatial grid (length, width, height) of voxels, i.e. 3D pixels
# 150 is the number of time points: each scan is a sequence of 150 3D volumes
x_train.shape

(145, 27, 32, 26, 150)
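Holding every scan in memory at once is expensive. The sketch below works out the size of `x_train` for the shape printed above. `get_fdata()` returns float64 by default; passing `dtype=np.float32` to `get_fdata` (supported in recent nibabel releases) would halve the footprint. It also shows the flattening step a perceptron-style dense layer would need, using two all-zero dummy scans rather than real data. `dataset_nbytes` is a hypothetical helper.

```python
import numpy as np

def dataset_nbytes(shape, dtype):
    """Bytes needed to hold an array of the given shape and dtype in memory."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize

shape = (145, 27, 32, 26, 150)  # x_train.shape from the output above
for dtype in (np.float64, np.float32):
    gb = dataset_nbytes(shape, dtype) / 1e9
    print(f"{np.dtype(dtype).name}: {gb:.2f} GB")  # float64: 3.91 GB, float32: 1.95 GB

# A dense (perceptron-style) layer expects one flat vector per sample,
# so each 4D scan is reshaped into a single row of features.
demo = np.zeros((2, 27, 32, 26, 150), dtype=np.float32)  # 2 dummy scans
flat = demo.reshape(len(demo), -1)
print(flat.shape)  # (2, 3369600)
```

The `-1` in `reshape` lets NumPy infer the feature count, so the same line works for the full `x_train` array.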