In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from processing_functions import create_numeric_labels

In this notebook I'm going to attempt to reproduce the findings of the dataset paper for the five benchmark models the authors tried: LightGBM (hyperparameters optimized with Hyperopt), a Multilayer Perceptron (1 hidden layer of 3 neurons), Random Forests, a Support Vector Machine (polynomial kernel), and k-Nearest Neighbors (k = 3). They considered both the AD-CN (Alzheimer's vs. healthy) and the FTD-CN (frontotemporal dementia vs. healthy) classification problems. The metrics they reported were accuracy, sensitivity, specificity, and F1 score, obtained by Leave-One-Subject-Out (LOSO) cross-validation. This method iterates through the subjects and leaves out one subject at a time: a model is trained on the remaining subjects, and the confusion matrix of its predictions on the left-out subject's data is computed. The confusion matrices from all folds are then summed, and the metrics are computed from this total confusion matrix.
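Pooling the confusion matrices is not the same as averaging per-subject scores: every epoch counts once, so subjects with more epochs carry more weight. The sketch below illustrates this on made-up predictions for three left-out subjects; `pooled_confusion` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def pooled_confusion(y_true_per_subject, y_pred_per_subject, labels):
    """Sum the confusion matrices of every left-out subject into one total matrix."""
    total = np.zeros((len(labels), len(labels)), dtype=int)
    for y_true, y_pred in zip(y_true_per_subject, y_pred_per_subject):
        # Passing labels fixes the matrix shape even though each subject has a single true class.
        total += confusion_matrix(y_true, y_pred, labels=labels)
    return total

# Made-up epoch-level predictions for three left-out subjects.
y_true = [np.array([1, 1, 1, 1]), np.array([0, 0]), np.array([1, 1])]
y_pred = [np.array([1, 1, 1, 0]), np.array([0, 1]), np.array([0, 0])]

total = pooled_confusion(y_true, y_pred, labels=[0, 1])
pooled_acc = np.trace(total) / total.sum()
per_subject_acc = [np.mean(t == p) for t, p in zip(y_true, y_pred)]

print(total)                     # [[1 1]
                                 #  [3 3]]
print(pooled_acc)                # 0.5
print(np.mean(per_subject_acc))  # about 0.417
```

The pooled accuracy (0.5) differs from the mean of the per-subject accuracies (0.75, 0.5, 0.0), because the first subject contributes twice as many epochs as the others.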

I will use two different processing methods to obtain the relative band power and compare the results. The first is the method described in the dataset paper, which uses epoch_length = 2000 samples (4 seconds, so the sampling rate is 500 Hz) and nperseg = 256 (the default value, giving a frequency resolution of 500/256 ~ 1.95 Hz). The second was suggested in a sleep-research blog post I found and also partially agrees with the method used in the CNN paper: epoch_length = 15000 samples (30 seconds) and nperseg = 2000 (frequency resolution 0.25 Hz). I'll refer to the first as "short epochs" and the second as "long epochs". The short-epoch method produces a much larger dataset to train on, but its frequency resolution is too coarse to resolve the lower cutoff of the Delta band (0.5 Hz) or even the cutoff between the Alpha and Beta bands (13 Hz). The long-epoch method produces a much smaller dataset, but its finer resolution allows precise integration over each of the five frequency bands.
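To make the resolution argument concrete, the sketch below lists the frequency grid for each nperseg and checks which band edges land exactly on a bin. It assumes the power spectra come from `scipy.signal.welch` with its default one-sided output and nfft = nperseg, whose frequency grid matches `np.fft.rfftfreq(nperseg, 1/fs)`; the 500 Hz sampling rate is inferred from 2000 samples = 4 seconds.

```python
import numpy as np

FS = 500  # Hz, inferred from epoch_length = 2000 samples = 4 s (assumption)
BAND_EDGES = [0.5, 4.0, 8.0, 13.0, 25.0, 45.0]  # Hz, the freq_bands used in this notebook

results = {}
for nperseg in (256, 2000):
    freqs = np.fft.rfftfreq(nperseg, d=1 / FS)  # one-sided frequency bins
    resolution = freqs[1] - freqs[0]
    # Band edges that coincide with a frequency bin (edge is an integer multiple of the resolution).
    on_grid = [edge for edge in BAND_EDGES
               if np.isclose(edge / resolution, round(edge / resolution))]
    results[nperseg] = (resolution, on_grid)
    print(nperseg, round(resolution, 3), on_grid)

# 256 1.953 []
# 2000 0.25 [0.5, 4.0, 8.0, 13.0, 25.0, 45.0]
```

With nperseg = 256 no band edge falls on a bin (the 0.5 Hz edge is even below the first nonzero bin at ~1.95 Hz), while with nperseg = 2000 every edge does.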

To generate the .npy files used in this notebook, uncomment and run the following code block at the end of the data_processing notebook (after running the import and function-definition cells). You can of course also experiment with other parameter choices.

In [3]:
# Uncomment and run at the end of the data_processing notebook.
# process_and_save(epoch_length=2000,overlap_ratio=0.5,freq_bands=np.array([0.5,4.0,8.0,13.0,25.0,45.0]),nperseg=256,filenames=['processed_data/short_num_epochs','processed_data/short_rbp'])
# process_and_save(epoch_length=15000,overlap_ratio=0.5,freq_bands=np.array([0.5,4.0,8.0,13.0,25.0,45.0]),nperseg=2000,filenames=['processed_data/long_num_epochs','processed_data/long_rbp'])

If the files were generated correctly, the following cell should run without errors.

In [5]:
short_num_epochs = np.load('processed_data/short_num_epochs.npy')
long_num_epochs = np.load('processed_data/long_num_epochs.npy')
short_rbp = np.load('processed_data/short_rbp.npy')
long_rbp = np.load('processed_data/long_rbp.npy')

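The shapes should be (88,), (88,), (88, 639, 5, 19), and (88, 84, 5, 19). The num_epochs arrays hold one epoch count per subject (88 subjects), and the relative band power arrays are indexed as subjects x epochs x bands x channels, with the epoch axis zero-padded so that every subject has the same length.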

In [9]:
print(short_num_epochs.shape)
print(long_num_epochs.shape)
print(short_rbp.shape)
print(long_rbp.shape)

(88,)
(88,)
(88, 639, 5, 19)
(88, 84, 5, 19)


The following cell creates numerical labels for the target variable: 0 for the healthy group, 1 for Alzheimer's, and 2 for frontotemporal dementia. The resulting array is indexed by subject in the same order as the first dimension of each array loaded above.

In [31]:
ppt_diagnostics = pd.read_csv('data/ds004504/participants.tsv',sep='\t')
target_labels = ppt_diagnostics['Group'].apply(create_numeric_labels).values
target_labels

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
      dtype=int64)
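`create_numeric_labels` lives in `processing_functions.py` and is not shown here. The sketch below is a hypothetical stand-in, assuming the `Group` column uses the codes `C`, `A`, and `F` (the same codes `remove_class` uses). It also includes a small helper that finds where the label changes; on the array printed above it should return `[36, 65]`, the boundaries hard-coded in `remove_class`.

```python
import numpy as np
import pandas as pd

# Hypothetical re-implementation; assumes 'Group' holds 'C' (healthy), 'A' (AD), 'F' (FTD).
GROUP_TO_LABEL = {'C': 0, 'A': 1, 'F': 2}

def create_numeric_labels_sketch(group):
    return GROUP_TO_LABEL[group]

def block_boundaries(labels):
    """Indices where the label changes, i.e. the start of each contiguous block of subjects."""
    return (np.flatnonzero(np.diff(labels)) + 1).tolist()

# Toy 'Group' column with the same ordering pattern as participants.tsv (A, then C, then F).
groups = pd.Series(['A'] * 3 + ['C'] * 2 + ['F'] * 2)
toy_labels = groups.apply(create_numeric_labels_sketch).to_numpy()
print(toy_labels)                    # [1 1 1 0 0 2 2]
print(block_boundaries(toy_labels))  # [3, 5]
```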

The sklearn models expect input of shape (number of examples) x (number of features), so some dimensions of our arrays need to be flattened; the helper function below does this. partial_flatten stacks the epochs of all subjects along the first dimension and flattens the five bands across all 19 channels into the second, giving 5 x 19 = 95 features per epoch. The num_epochs array we loaded is passed as the second argument so that the zero-padded epochs can be excluded easily. The function also returns a 1-d array of shape (number of examples,) containing the class label of each epoch.
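The sketch below reproduces the flattening on a small random array (the sizes besides 5 bands and 19 channels are made up) and shows how a flattened feature index maps back to a band and a channel under NumPy's default row-major (C) order.

```python
import numpy as np

# Toy array: 2 subjects, up to 3 epochs (zero-padded), 5 bands, 19 channels.
n_bands, n_channels = 5, 19
rbp = np.zeros((2, 3, n_bands, n_channels))
num_epochs = np.array([3, 2])  # the second subject has 2 real epochs; its 3rd is padding
rng = np.random.default_rng(0)
for s, n in enumerate(num_epochs):
    rbp[s, :n] = rng.random((n, n_bands, n_channels))

# Drop the padding, stack the epochs of all subjects, then flatten bands x channels.
epochs = np.concatenate([rbp[s, :n] for s, n in enumerate(num_epochs)])
X = epochs.reshape(epochs.shape[0], -1)
print(X.shape)  # (5, 95)

# In C order, feature j holds band j // 19 and channel j % 19.
band, channel = 2, 7
assert X[0, band * n_channels + channel] == epochs[0, band, channel]
```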

The function assumes that any class not used in the classification has already been removed from rbp_array; for example, for Alzheimer's vs. healthy classification, all FTD subjects must be removed first. This can be done by passing rbp_array, num_epochs, and target_labels through the remove_class function and feeding its outputs into partial_flatten.

If the "exclude" argument is not None then the subject corresponding to that index is left out in the returned arrays. Note that this exclude index corresponds to the index of the subject in the rbp_array being fed into the function, which may not necessarily be the same as the subject's index in the original rbp array that we loaded. The flatten_final argument is a boolean that specifies whether or not the bands x channels part of the array should be flattened into a single dimension. This defaults to True, which is used for the sklearn models, but the non-flattened version will likely later be used in some other models (it's used in the CNN paper, for instance).
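Because exclude indexes into the class-reduced arrays, it can help to map it back to a subject's position in the original arrays. A minimal sketch, assuming the hypothetical helper name `original_subject_index` and the label ordering printed above (36 AD, 29 healthy, 23 FTD):

```python
import numpy as np

def original_subject_index(target_labels, removed_label, reduced_index):
    """Map an index into the class-reduced arrays back to the original subject index."""
    kept = np.flatnonzero(target_labels != removed_label)
    return int(kept[reduced_index])

# Label ordering from participants.tsv: AD (1) x36, healthy (0) x29, FTD (2) x23.
labels = np.array([1] * 36 + [0] * 29 + [2] * 23)
print(original_subject_index(labels, removed_label=1, reduced_index=0))   # 36
print(original_subject_index(labels, removed_label=0, reduced_index=36))  # 65
```

When the healthy class is removed, reduced index 36 is the first FTD subject, which sits at index 65 in the original arrays.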

In [21]:
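def remove_class(rbp_array,num_epochs,target_labels,class_):
    # Subjects are ordered AD (0-35), healthy (36-64), FTD (65-87); drop one contiguous block.
    if class_ == 'F':
        return rbp_array[:65].copy(),num_epochs[:65].copy(),target_labels[:65].copy()
    if class_ == 'A':
        return rbp_array[36:].copy(),num_epochs[36:].copy(),target_labels[36:].copy()
    if class_ == 'C':
        # np.concatenate takes a sequence of arrays as its first argument
        return (np.concatenate((rbp_array[:36],rbp_array[65:])),
                np.concatenate((num_epochs[:36],num_epochs[65:])),
                np.concatenate((target_labels[:36],target_labels[65:])))
    raise ValueError("class_ must be 'A', 'C' or 'F'")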

def partial_flatten(rbp_array,num_epochs,target_labels,exclude=None,flatten_final=True):
    total_subjects = len(target_labels)
    feature_arrays = []
    target_arrays = []
    for i in range(total_subjects):
        feature_arrays.append(rbp_array[i,0:num_epochs[i],:,:])
        target_arrays.append(target_labels[i]*np.ones(num_epochs[i]))
    if exclude is None:
        features= np.concatenate(feature_arrays)
        targets = np.concatenate(target_arrays)
    else:
        features= np.concatenate(feature_arrays[:exclude] + feature_arrays[exclude+1:])
        targets = np.concatenate(target_arrays[:exclude] + target_arrays[exclude+1:])
    if flatten_final:
        features = features.reshape((features.shape[0],-1))
    return features, targets
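remove_class relies on the subjects being sorted into contiguous blocks that start at indices 36 and 65. A boolean-mask variant works for any subject ordering; the sketch below is a hypothetical alternative (`remove_class_by_mask` is not used elsewhere in this notebook) that assumes the same C/A/F codes and 0/1/2 labels.

```python
import numpy as np

# Assumed code-to-label mapping, matching the labels used in this notebook.
CLASS_TO_LABEL = {'C': 0, 'A': 1, 'F': 2}

def remove_class_by_mask(rbp_array, num_epochs, target_labels, class_):
    """Drop every subject whose label matches class_, regardless of subject order."""
    keep = target_labels != CLASS_TO_LABEL[class_]
    # Boolean indexing returns copies, so no explicit .copy() is needed.
    return rbp_array[keep], num_epochs[keep], target_labels[keep]

# Toy usage: 5 subjects in mixed order, 4 padded epochs, 5 bands, 19 channels.
toy_rbp = np.zeros((5, 4, 5, 19))
toy_num_epochs = np.array([4, 3, 2, 4, 1])
toy_labels = np.array([1, 0, 2, 1, 0])
rbp_af, n_af, y_af = remove_class_by_mask(toy_rbp, toy_num_epochs, toy_labels, 'F')
print(rbp_af.shape, y_af)  # (4, 4, 5, 19) [1 0 1 0]
```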

The next set of functions takes a 2 x 2 (total) confusion matrix, given as a numpy array, and computes the metrics we are interested in. They assume sklearn's convention that rows are true labels and columns are predicted labels, with index 0 corresponding to the negative class and index 1 to the positive class.

In the comments TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative. 
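As a sanity check on the index conventions, the sketch below builds a confusion matrix from made-up predictions, computes the metrics with the same formulas as the functions that follow, and compares them with sklearn's built-in scorers (specificity is the recall of the negative class).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up epoch-level predictions; 1 = positive class, 0 = negative class.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

# Rows are true labels, columns are predictions: [[TN, FP], [FN, TP]].
c = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(c)  # [[4 2]
          #  [1 3]]

tn, fp, fn, tp = c.ravel()
acc = (tn + tp) / c.sum()  # 0.7
sens = tp / (tp + fn)      # 0.75
spec = tn / (tn + fp)      # 4/6
prec = tp / (tp + fp)      # 0.6
f1 = 2 * prec * sens / (prec + sens)

# The same quantities via sklearn.
assert np.isclose(acc, accuracy_score(y_true, y_pred))
assert np.isclose(sens, recall_score(y_true, y_pred))
assert np.isclose(spec, recall_score(y_true, y_pred, pos_label=0))
assert np.isclose(prec, precision_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))
```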

In [28]:
def accuracy(confusion):
    # (TN + TP)/total
    return (confusion[0,0]+confusion[1,1])/np.sum(confusion)
def sensitivity(confusion):
    # TP/(TP+FN)
    return confusion[1,1]/(confusion[1,1]+confusion[1,0])
def specificity(confusion):
    # TN/(TN+FP)
    return confusion[0,0]/(confusion[0,0]+confusion[0,1])
def precision(confusion):
    # TP/(TP+FP)
    return confusion[1,1]/(confusion[1,1]+confusion[0,1])
def f1(confusion):
    # harmonic mean of precision and sensitivity
    return 2*(precision(confusion)*sensitivity(confusion))/(precision(confusion)+sensitivity(confusion))

The function below runs our two-class kNN routine with leave-one-subject-out cross-validation and returns a dictionary of the accuracy, sensitivity, specificity, and F1 score computed from the total confusion matrix. removed_class is the class code to exclude in order to create a two-class problem ('A', 'C', or 'F', as in remove_class), and n_neighbors is the number of neighbors k used by kNN. Of the two remaining classes, the one with the larger numeric label is treated as positive (Alzheimer's for AD-CN, FTD for FTD-CN).
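The same LOSO routine can also be expressed with sklearn's `LeaveOneGroupOut` splitter and a `Pipeline`, which refits the scaler inside each fold. The sketch below runs on synthetic data; with this notebook's arrays, the per-epoch subject ids could be built as `np.repeat(np.arange(len(mod_target_labels)), mod_num_epochs)`. The helper name `loso_confusion` is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def loso_confusion(X, y, groups, n_neighbors=3):
    """LOSO-CV for scaler + kNN; returns the total confusion matrix over all subjects."""
    # The scaler is inside the pipeline, so it is fit on the training subjects of each fold only.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=n_neighbors))
    # Each epoch is predicted exactly once (when its subject is held out), so a single
    # confusion matrix over all predictions equals the sum of the per-subject matrices.
    y_pred = cross_val_predict(model, X, y, groups=groups, cv=LeaveOneGroupOut())
    return confusion_matrix(y, y_pred, labels=np.unique(y))

# Synthetic data: 6 subjects x 10 epochs x 95 features (5 bands x 19 channels).
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(6), 10)
y = np.repeat([0, 0, 0, 1, 1, 1], 10)
X = rng.normal(size=(60, 95)) + y[:, None]  # shift class 1 so the task is learnable
total = loso_confusion(X, y, groups)
print(total)
```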

In [67]:
def kNN_cross(rbp_array,num_epochs,target_labels,removed_class,n_neighbors):
    # Labels of the two remaining classes: the first is negative, the second positive.
    label_pairs = {'F': [0,1], 'A': [0,2], 'C': [1,2]}
    if removed_class not in label_pairs:
        raise ValueError("removed_class must be 'A', 'C' or 'F'")
    labels = label_pairs[removed_class]
    confusion_matrices = []
    mod_rbp, mod_num_epochs, mod_target_labels = remove_class(rbp_array,num_epochs,target_labels,removed_class)
    for i in range(len(mod_target_labels)):
        # Train on every subject except i; test on all non-padded epochs of subject i.
        train_X, train_y = partial_flatten(mod_rbp,mod_num_epochs,mod_target_labels,exclude=i,flatten_final=True)
        test_X = mod_rbp[i,0:mod_num_epochs[i],:,:].reshape(mod_num_epochs[i],-1)
        test_y = mod_target_labels[i]*np.ones(mod_num_epochs[i])

        # Fit the scaler on the training subjects only, then apply it to the held-out subject.
        scaler = StandardScaler()
        train_X = scaler.fit_transform(train_X)
        test_X = scaler.transform(test_X)

        knn = KNeighborsClassifier(n_neighbors=n_neighbors)
        knn.fit(train_X, train_y)

        confusion_matrices.append(confusion_matrix(test_y,knn.predict(test_X),labels=labels))
    # Sum the per-subject matrices, then compute the metrics from the total.
    total_confusion = np.sum(confusion_matrices, axis=0)
    return {'acc':accuracy(total_confusion), 'sens':sensitivity(total_confusion), 'spec':specificity(total_confusion), 'f1':f1(total_confusion)}

Below we run leave-one-subject-out cross-validation for a two-class kNN classifier (k = 3) on Alzheimer's vs. healthy, using both the short- and long-epoch data.

First the short version.

In [68]:
short_ThreeNN_metrics = kNN_cross(short_rbp,short_num_epochs,target_labels,removed_class='F',n_neighbors=3)
short_ThreeNN_metrics

{'acc': 0.6473271507200482,
 'sens': 0.638900372054568,
 'spec': 0.6575091575091575,
 'f1': 0.6647073581592058}

Now the long version.

In [69]:
long_ThreeNN_metrics = kNN_cross(long_rbp,long_num_epochs,target_labels,removed_class='F',n_neighbors=3)
long_ThreeNN_metrics

{'acc': 0.6884960880904086,
 'sens': 0.6636652542372882,
 'spec': 0.7184900831733845,
 'f1': 0.6998045238760123}

Note the clear improvement with long epochs: accuracy, sensitivity, specificity, and F1 rise by roughly 4, 2.5, 6, and 3.5 percentage points, respectively. Even so, each metric is still about 2 percentage points below the kNN performance the paper reports with short epochs, and it is not yet clear why.

Below we also look at the performance of the classifier for healthy vs. FTD for the short epoch version and long epoch version. 

In [70]:
short_ThreeNN_metrics = kNN_cross(short_rbp,short_num_epochs,target_labels,removed_class='A',n_neighbors=3)
short_ThreeNN_metrics

{'acc': 0.6413633224819967,
 'sens': 0.5192447349310094,
 'spec': 0.7253579753579753,
 'f1': 0.5412907702984039}

In [71]:
long_ThreeNN_metrics = kNN_cross(long_rbp,long_num_epochs,target_labels,removed_class='A',n_neighbors=3)
long_ThreeNN_metrics

{'acc': 0.7318647930117737,
 'sens': 0.6177570093457944,
 'spec': 0.8099808061420346,
 'f1': 0.6518737672583826}

For FTD vs. healthy, the long-epoch version has fairly good accuracy (0.73) and specificity (0.81), and the short-epoch version is not terrible either, but this comes at the cost of lower sensitivity and F1 scores. The combination of low sensitivity and high specificity in both versions means that the classifier often fails to identify FTD cases, labelling many FTD epochs as healthy.
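One thing worth checking is class balance. The FTD-vs-healthy problem has 23 FTD and 29 healthy subjects (from the label array printed earlier), and kNN's majority vote can lean towards whichever class supplies more training epochs. Whether that explains the low sensitivity here is untested. The hypothetical helper below counts the non-padded epochs per class; it could be applied to the num_epochs and target_labels returned by remove_class(..., 'A').

```python
import numpy as np

def epochs_per_class(num_epochs, target_labels):
    """Total number of non-padded epochs contributed by each class label."""
    return {int(c): int(num_epochs[target_labels == c].sum())
            for c in np.unique(target_labels)}

# Toy usage with made-up epoch counts for three subjects.
print(epochs_per_class(np.array([3, 2, 4]), np.array([0, 2, 0])))  # {0: 7, 2: 2}
```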