# iQ Winter School 2018 on Machine Learning Applied to Quantitative Analysis of Medical Images 

## Hands-on Session 1 - Tutorial
## Part 1 - Machine Learning basics

### 1. Import libraries

First, we are going to import the libraries that will be used in this tutorial.

In [None]:
import numpy as np
import scipy.ndimage as sndim
from pylab import *
from mpl_toolkits.axes_grid1 import ImageGrid
import sklearn.model_selection, sklearn.linear_model, sklearn.metrics, sklearn.pipeline, \
sklearn.decomposition, sklearn.feature_selection, sklearn.ensemble, sklearn.cluster, \
sklearn.svm, sklearn.neighbors, skimage.segmentation

import warnings
warnings.filterwarnings("ignore",category=DeprecationWarning)
  
print("Hooray, no import errors!")

### 2. Load data from the OASIS database.

We are now going to load the dataset with which we are going to work throughout this tutorial. The data was retrieved and adapted from the OASIS database: http://www.oasis-brains.org/ and consists of coronal slices of T1-weighted brain scans acquired from both healthy subjects and patients with cognitive impairment (all diagnosed with "probable Alzheimer's Disease" - AD). The goal is to separate these two classes, which have been defined as follows:

* label 0: cognitively healthy subjects, i.e., a Clinical Dementia Rating (CDR) of 0
* label 1: subjects with "probable AD", i.e., a Clinical Dementia Rating (CDR) of 0.5 or 1

The T1-images have been preprocessed: the skulls have been removed, the bias field that is often present in MR images has been corrected for and the images have been affinely registered to a brain template.

In [None]:
# Loads the data into a numpy array
subjects = np.load(r'data/subjects_masked_bfc_resc.npy')
labels = np.load(r'data/labels.npy')

# Counts the number of healthy subjects and AD patients and the number of features (pixels) per image
nb_labels_class = np.bincount(labels)
N_subjects = subjects.shape[0]
N_features = subjects.shape[1]
print("%d healthy, %d AD" % (nb_labels_class[0], nb_labels_class[1]))
print("number of samples: %d, number of features: %d" % (N_subjects, N_features))

Each row of the variable `subjects` corresponds to a subject and the columns are the individual pixels of the brain slice, all put together in a single row.

The variable `labels` contains the class of each of the 137 subjects: 0 for healthy subjects, 1 for Alzheimer's patients.
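The flattened layout described above can be illustrated with a small synthetic array. The sizes and values below are made up (this is not the OASIS data); the sketch only shows that reshaping a row with the image shape recovers the original 2D image.

```python
import numpy as np

# Toy stand-in for the dataset: 3 "subjects", each a 4x5 image (assumed sizes)
n_subjects, toy_shape = 3, (4, 5)
toy_images = np.arange(n_subjects * 4 * 5, dtype=float).reshape(n_subjects, *toy_shape)

# Flattens every image into a single row: shape (n_subjects, n_pixels)
toy_subjects = toy_images.reshape(n_subjects, -1)
print(toy_subjects.shape)  # (3, 20)

# Reshaping a row with the image shape recovers the original image
recovered = toy_subjects[1].reshape(toy_shape)
print(np.array_equal(recovered, toy_images[1]))  # True
```

The same `reshape(img_shape)` call is what the plotting code in the next section relies on.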

### 3. Visualize some examples

Next, we are going to display some of the T1 images:

In [None]:
# Selects the healthy subjects
healthy_subjects = subjects[labels==0]

# Selects AD patients
ad_subjects = subjects[labels==1]

# Creates figure for plotting
fig = figure(figsize=(16,12))
N = 6
img_shape = (176,176)
grid = ImageGrid(fig, 111,  nrows_ncols=(2, N), axes_pad=0.1) 
for i in range(N):
    grid[i].imshow(healthy_subjects[i].reshape(img_shape), cmap='gray') # first row
    grid[i+N].imshow(ad_subjects[i].reshape(img_shape), cmap='gray') # second row
    grid[i+N].get_xaxis().set_ticks([])

    if i == 0:
        grid[i].get_xaxis().set_ticks([])
        grid[i].get_yaxis().set_ticks([])
        grid[i].set_ylabel('healthy')
        
        grid[i+N].get_yaxis().set_ticks([])
        grid[i+N].set_ylabel('AD')
show()

Often, and especially if we are planning to use the image pixel values directly as features, some standardization needs to be performed. This has already been done in this dataset, but it is still worth confirming that this is the case by taking a look at the image histograms.

In [None]:
Nbins = 100

# Collects the histograms of all subjects
histograms = np.zeros((N_subjects, Nbins))
figure(figsize = (8,8))
for i in range(N_subjects):
    mask = subjects[i,:] > 0 # creates a brain mask (discard all background pixels)
    h,b = np.histogram(subjects[i,:][mask], bins = Nbins, density = True) # normalized histogram
    histograms[i,:] = h
    plot(b[:-1],h)

ylabel("histogram")
xlabel("pixel intensity")
show()

The image intensities are all within the same range and the three histogram modes are approximately aligned, indicating that the different tissue types (gray matter, white matter and cerebrospinal fluid) have similar intensity ranges across all subjects.

### 4. Logistic regression on the pixel values. 

One of the most basic linear classifiers is Logistic Regression. We will start by applying it to the pixel intensities. For a proper evaluation of the classifier, we will perform 3-fold cross-validation on the entire dataset of 137 samples. We opt for stratified folds, as the dataset is slightly imbalanced, thereby ensuring that the class proportions remain the same across all folds. To evaluate the performance of the classifier in each fold, we will use the F1 score, which takes both precision and recall into account.
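As a reminder of how the F1 score combines precision and recall, here is a minimal NumPy sketch. The helper name `f1_from_predictions` and the toy labels are illustrative assumptions; in the tutorial itself the score is computed by scikit-learn through `scoring='f1'`.

```python
import numpy as np

def f1_from_predictions(y_true, y_pred):
    """F1 score for the positive class (label 1) of a binary problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 true positives, 1 false positive, 1 false negative
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0]
toy_f1 = f1_from_predictions(toy_true, toy_pred)
print("F1 = %0.2f" % toy_f1)  # F1 = 0.67
```

Because the F1 score ignores true negatives, it is less flattering than accuracy when the positive class is the minority.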

#### 4.1 Benchmark: Logistic Regression applied to the original pixel values

In [None]:
# Initializes the Logistic Regression (LR) classifier
log_reg = sklearn.linear_model.LogisticRegression()

# Initializes the cross validation scheme with 3 folds
cv = sklearn.model_selection.StratifiedKFold(n_splits = 3)

# Computes the F1 score on the test set for each cross-validation fold
scores_logreg = sklearn.model_selection.cross_val_score(log_reg, subjects, labels, cv=cv, scoring='f1')
print("scores in each CV fold: ", scores_logreg)
print("average F1 score: %0.2f +/- %0.2f" % (scores_logreg.mean(), scores_logreg.std()))

##### ----- What happens when you increase the number of folds in the cross validation scheme? Why is this?
##### ----- What happens if you try a different cross-validation function? (ShuffleSplit)

#### 4.2 Overfitting

Although this simple experiment gives an indication of how the classifier is performing on the test sets (for each cross-validation split), it is worth checking whether we are under- or overfitting (if at all). This will help us decide what to do next.

A simple way of understanding the behavior of the classifier is to observe how the training and test performance scores vary as we increase the number of training samples.

We will therefore plot the so-called "learning curves" for this classification problem. As a reference, we also plot the F1 score we would obtain if we classified the samples randomly.
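One way to read the random baseline: assuming a random classifier that predicts class 1 with a probability equal to the fraction of class-1 samples, independently of the image, its expected precision and recall both equal that fraction, and so does its F1 score. The simulation below is a rough check of this under that assumption; the prevalence value is made up and is not the one of the OASIS subset.

```python
import numpy as np

rng = np.random.default_rng(0)
prevalence = 0.4       # assumed fraction of class-1 samples (illustrative value)
n_samples = 200_000

y_true = rng.random(n_samples) < prevalence  # simulated true labels
y_rand = rng.random(n_samples) < prevalence  # random predictions, independent of y_true

tp = np.sum(y_true & y_rand)                 # true positives
precision = tp / np.sum(y_rand)
recall = tp / np.sum(y_true)
f1_rand = 2 * precision * recall / (precision + recall)
print("simulated random F1: %0.3f (prevalence: %0.3f)" % (f1_rand, prevalence))
```

Other random strategies (for example, always predicting class 1) give a different baseline, so this value is only a reference line, not a lower bound.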

In [None]:
train_sizes = np.array([0.1,0.3,0.6,0.9]) # fraction of samples to be used for training
train_sizes, train_scores, test_scores = sklearn.model_selection.learning_curve(log_reg, \
                                                                                subjects, \
                                                                                labels, \
                                                                                cv = cv, \
                                                                                train_sizes = train_sizes, \
                                                                                scoring = 'f1')

F1_random = nb_labels_class[1]/labels.size # approximate F1 score of a random classifier (equal to the fraction of AD samples)

figure(figsize=(8,8))
title("Logistic Regression on the original pixel values")
xlabel("# Training examples")
ylabel("F1-Score")
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)
plot(train_sizes, train_scores_mean, 'o-', color="r", label="Training")
fill_between(train_sizes, train_scores_mean - train_scores_std, train_scores_mean + train_scores_std, alpha=0.1, color="r")
plot(train_sizes, test_scores_mean, 'o-', color="g", label="Test")
fill_between(train_sizes, test_scores_mean - test_scores_std, test_scores_mean + test_scores_std, alpha=0.1, color="g")
plot(train_sizes, np.ones(train_sizes.size) * F1_random, '--k', label = 'Random')
legend()
ylim(0.48,1.01)
show()

While we obtain perfect training performance (F1 score = 1) for all training set sizes, for the test set the classifier performs only slightly above random. This is a common indication that we are overfitting the data. In fact, we could already anticipate this, as the number of training samples is much smaller than the number of features (in our case, fewer than 137 subjects with ~30k pixel intensities each).

A straightforward way of avoiding overfitting would be to increase the number of training samples. However, just like in many other medical imaging applications, our dataset is limited.

Another common strategy to prevent overfitting is to decrease the dimensionality of the problem, i.e., to reduce the number of features.

We can achieve this by either selecting some of the pre-existing features or by extracting new features from the original ones. We will start by trying feature selection using univariate analysis on the original data. Basically, we will look at each pixel individually and see whether its intensity is statistically significantly different between the two classes.
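To make the idea of a per-feature test concrete, the sketch below applies the ANOVA F-test (`f_classif`, the default scoring function of `SelectKBest`) to synthetic data in which only one feature differs between the two classes. The data, sizes and the shift of 3.0 are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
n_per_class, n_feats = 50, 5

# Synthetic labels and features: pure noise, except for feature 2
toy_labels = np.r_[np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)]
toy_X = rng.normal(size=(2 * n_per_class, n_feats))
toy_X[toy_labels == 1, 2] += 3.0  # only feature 2 differs between the classes

# One F statistic and one p-value per feature, each computed independently
F_values, p_values = f_classif(toy_X, toy_labels)
best_feature = int(np.argmax(F_values))
print("most discriminative feature:", best_feature)
print("p-values:", p_values)
```

Since every feature is tested on its own, this approach cannot detect features that are only informative in combination with others.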

#### 4.3 Feature selection: univariate analysis

In [None]:
features = np.array([1, 10, 100, 1000, 10000, N_features]) # number of features to be selected/extracted
n_plots = len(features)
scores_avg = np.zeros(features.size)
scores_std = np.zeros(features.size)

# Iterates over the number of features 
for i,n_feats in enumerate(features):
    # feature reduction line - change this for other types of feature selection/extraction
    feature_red = sklearn.feature_selection.SelectKBest(k=n_feats) # univariate feature selection

    # Applies a sequence of transforms: feature selection followed by Logistic Regression
    pipeline = sklearn.pipeline.Pipeline([('feat reduction', feature_red), ('log reg', log_reg)])
    
    # Performs cross-validation
    scores = sklearn.model_selection.cross_val_score(pipeline, subjects, labels, cv=cv, scoring='f1')
    scores_avg[i] = scores.mean()
    scores_std[i] = scores.std()
    

figure(figsize=(8,8))
plot(np.log(features), scores_avg,'-o', label='feature reduction')
fill_between(np.log(features), scores_avg - scores_std, scores_avg + scores_std, alpha=0.1, color="b")
plot(np.log(features), np.ones(len(features))*F1_random, '--k', label = 'Random')
ylim(0.48,1.01)
xlabel('# features')
ylabel('F1 score (3-fold CV) on the whole dataset')
legend()
gca().set_xticks(np.log(features))
gca().set_xticklabels(features)
xticks(rotation=70)
show()

In this case, selecting statistically relevant features from the original dataset does not improve the performance. However, we observe that with 100 features the classifier performs only slightly worse than when all features are used, suggesting that redundant information is present in the feature set we are using. We could, then, decide to keep these 100 features and try to improve the performance of the classifier in some other way.

What is typically also worth investigating is the location of the features that have been selected as "most significant". For example, if we search for the 100 most significant features and overlay them on one subject's image...

In [None]:
# Selects 100 features (pixels) 
feature_sel = sklearn.feature_selection.SelectKBest(k=100)
feature_sel.fit(subjects,labels)

sel_features = feature_sel.get_support() # returns a binary mask for the selected features
sel_features_img = sel_features.reshape(img_shape) # reshapes the mask to the natural image shape
sel_features_img = np.ma.masked_where(sel_features_img == 0, sel_features_img) # keeps only the selected features

figure(figsize=(8,8))
imshow(subjects[1,:].reshape(img_shape), cmap='gray')
imshow(sel_features_img, cmap='autumn')
axis('off')
show()

... we observe that they are located near the ventricle boundaries and close to the hippocampus, which are areas typically affected in Alzheimer's patients. The pixel intensities here are capturing the brain shrinkage and the consequent increasing amount of cerebrospinal fluid (dark areas).

We can also extract features from the original set of pixel intensities. This means that new features will be created, which may be more relevant for the classification we want to perform.

#### 4.4 Feature extraction with PCA

If you go back to the code where we performed feature selection, you can change one line to do feature extraction instead. A simple method for that is Principal Component Analysis (PCA), which linearly projects the data points onto the directions of largest variance.
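The projection itself can be sketched in a few lines of plain NumPy using the singular value decomposition. This is only meant to show the mechanics on synthetic data (all names and sizes are assumptions); it is not the scikit-learn routine you would plug into the pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 samples, 3 features, with most of the variance along the first axis
toy_X = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 0.2])

# Centers the data, then takes the SVD; the rows of Vt are the principal directions
X_centered = toy_X - toy_X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Projects onto the first two directions: these are the new (extracted) features
n_components = 2
X_projected = X_centered @ Vt[:n_components].T

# Variance of the data along each principal direction, largest first
explained_var = S**2 / (toy_X.shape[0] - 1)
print(X_projected.shape)  # (100, 2)
print(explained_var)
```

Note that each extracted feature is a weighted sum of all original pixels, so it no longer corresponds to a single image location.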

##### ----- What sklearn function can you use to perform PCA? How different are the results compared to univariate feature selection? What are the disadvantages of extracting rather than selecting features?

Finally, instead of directly taking the pixel intensities as features, we can use the corresponding histograms (displayed at the very beginning of this tutorial) as feature vectors. This is a common way of summarizing the information contained in images.

#### 4.5 Logistic Regression on the histograms

In [None]:
scores_logreg = sklearn.model_selection.cross_val_score(log_reg, histograms, labels, cv=cv, scoring='f1')
print("all scores (histograms): ", scores_logreg)
print("average F1 score (histograms): %0.2f +/- %0.2f" % (scores_logreg.mean(), scores_logreg.std()))


##### ----- What other features do you think would be relevant for this classification task? Modify the code below accordingly.

In [None]:
nb_features = 2
new_features = np.random.randn(N_subjects, nb_features) # random feature vectors
#for i in range(N_subjects):
    #new_features[i,0] = feature of interest 1 
    #new_features[i,1] = feature of interest 2
    #...
scores_logreg = sklearn.model_selection.cross_val_score(log_reg, new_features, labels, cv=cv, scoring='f1')
print("all scores (new features): ", scores_logreg)
print("average F1 score: %0.2f +/- %0.2f" % (scores_logreg.mean(), scores_logreg.std()))

### 5. Support Vector Machine

We will now try a Support Vector Machine (SVM) with a linear kernel. An important parameter in this classifier is the regularization parameter (C). Low C values correspond to highly regularized models (low penalty for training errors and consequently wider margins).

We will take a look at what happens when we vary this parameter. In particular, we will compare the training and the test F1 scores, on a fixed training-test (50/50) split, at increasing C values.

In [None]:
# Splits the dataset into training and testing
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(subjects, \
                                                                            labels, \
                                                                            test_size=0.5, \
                                                                            random_state=0)

# Creates an array of logarithmically spaced C values, from 1e-6 to 1e-1
Cs = 10**np.linspace(-6,-1,6)
training_scores = []
test_scores = []

for i,C in enumerate(Cs):
    # Initializes the SVM model
    linear_svm = sklearn.svm.SVC(kernel='linear', C=C)
    
    # Creates the model by fitting it to the data
    linear_svm.fit(X_train, y_train)
    
    # Predicts the training and test set labels
    training_pred = linear_svm.predict(X_train)
    test_pred = linear_svm.predict(X_test)
    
    # Saves the training and test scores for this iteration (one of the values of C)
    training_scores.append(sklearn.metrics.f1_score(y_train, training_pred)) # f1_score(y_true, y_pred)
    test_scores.append(sklearn.metrics.f1_score(y_test, test_pred))

figure(figsize=(8,8))
plot(np.log(Cs), training_scores,'-r', label='Training')
plot(np.log(Cs), test_scores,'-b', label='Test')
plot(np.log(Cs), np.ones(len(Cs))*F1_random, '--k', label = 'Random')
ylim(0.48,1.01)
xlabel('regularization parameter C')
ylabel('F1 score')
legend()
gca().set_xticks(np.log(Cs))
gca().set_xticklabels(Cs)
xticks(rotation=70)
show()

We see the effect of decreasing the amount of regularization (increasing C) on the training and test scores: at low C values, the training and test scores are similar, which indicates that we are not overfitting. We reach an optimum score on the test set at C=1e-3, after which the test performance deteriorates, despite the increase in the training score.

In practice, we would pick the "best" C parameter by performing a search over a set of possible values in a nested cross-validation within the training set.
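A minimal sketch of such a nested scheme is given below, assuming scikit-learn's `GridSearchCV` for the inner search and synthetic data from `make_classification` in place of the brain images. The grid of C values and the fold counts are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (not the OASIS images)
toy_X, toy_y = make_classification(n_samples=120, n_features=20,
                                   n_informative=5, random_state=0)

# Inner loop: picks C using only the training part of each outer fold
param_grid = {"C": 10.0 ** np.arange(-3, 2)}
inner_cv = StratifiedKFold(n_splits=3)
search = GridSearchCV(SVC(kernel="linear"), param_grid, scoring="f1", cv=inner_cv)

# Outer loop: estimates the performance of the whole "search + fit" procedure
outer_cv = StratifiedKFold(n_splits=3)
nested_scores = cross_val_score(search, toy_X, toy_y, cv=outer_cv, scoring="f1")
print("nested CV F1: %0.2f +/- %0.2f" % (nested_scores.mean(), nested_scores.std()))
```

The key point is that the outer test folds are never seen while C is being chosen, so the reported score is not biased by the parameter search.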

##### ----- We have been using the F1 score as a performance metric, but what are the individual precision and recall values? In which situations would you favor one over the other?
##### ----- How can you find the best parameters for your model when many are available? (GridSearchCV)


### 6. Decision boundaries

When our problem is two- or three-dimensional, we are able to visualize the feature space and the decision boundary determined by the classifier.

For illustration purposes, we will now keep the 2 most "relevant" features (after univariate feature selection) and apply both Logistic Regression and linear SVM to the whole dataset:

In [None]:
# Selects the two "best" features
feature_sel = sklearn.feature_selection.SelectKBest(k=2)
feature_sel.fit(subjects,labels)
features_2D = feature_sel.transform(subjects)

# Builds a 2D feature space in which the decision boundary will be calculated
step_size = (features_2D.max() - features_2D.min())/1000
xmin, xmax = features_2D[:,0].min() - step_size, features_2D[:,0].max() + step_size
ymin, ymax = features_2D[:,1].min() - step_size, features_2D[:,1].max() + step_size
xx, yy = np.meshgrid(np.arange(xmin, xmax, step_size),np.arange(ymin, ymax, step_size))

# Applies the classifiers to the samples in the feature space
log_reg.fit(features_2D, labels)
logreg_labels = log_reg.predict(np.c_[xx.ravel(), yy.ravel()])
logreg_labels = logreg_labels.reshape(xx.shape)

linear_svm = sklearn.svm.SVC(kernel='linear', C=1e5)
linear_svm.fit(features_2D,labels)
linear_svm_labels = linear_svm.predict(np.c_[xx.ravel(), yy.ravel()])
linear_svm_labels = linear_svm_labels.reshape(xx.shape)

# Plots the decision boundaries
figure(figsize=(12,6))
subplot(121)
title('Logistic Regression')
plot(features_2D[labels==0,0], features_2D[labels==0,1],'bo', markeredgecolor='black', label='healthy')
plot(features_2D[labels==1,0], features_2D[labels==1,1],'ro', markeredgecolor='black', label='AD')
contourf(xx, yy, logreg_labels, cmap='Paired', alpha=.5)
xlabel('Feature 1')
ylabel('Feature 2')
legend()
subplot(122)
title('Linear SVM')
plot(features_2D[labels==0,0], features_2D[labels==0,1],'bo', markeredgecolor='black', label='healthy')
plot(features_2D[labels==1,0], features_2D[labels==1,1],'ro', markeredgecolor='black', label='AD')
contourf(xx, yy, linear_svm_labels, cmap='Paired', alpha=.5)
xlabel('Feature 1')
legend()

show()

Both Logistic Regression and SVM with a linear kernel are linear classifiers. Sometimes it may be worth using more complex models that have more degrees of freedom to fit the data. Examples of non-linear classifiers include SVMs with non-linear kernels (e.g., a Radial Basis Function, RBF), Random Forests (RF) or even K-Nearest Neighbors (kNN).

We will now show what the decision boundary may look like for these non-linear classifiers when we apply them to the 2 best features selected above. The exact shapes depend on each classifier's specific hyperparameters.

In [None]:
# Creates the classifiers, trains them and applies them to the samples in the 2D feature space
rbf_svm = sklearn.svm.SVC(kernel='rbf', C=100, gamma = 10)
rbf_svm.fit(features_2D, labels)
rbf_svm_labels = rbf_svm.predict(np.c_[xx.ravel(), yy.ravel()])
rbf_svm_labels = rbf_svm_labels.reshape(xx.shape)

rf = sklearn.ensemble.RandomForestClassifier(n_estimators=10, max_depth=3)
rf.fit(features_2D, labels)
rf_labels = rf.predict(np.c_[xx.ravel(), yy.ravel()])
rf_labels = rf_labels.reshape(xx.shape)

knn = sklearn.neighbors.KNeighborsClassifier(n_neighbors=10)
knn.fit(features_2D, labels)
knn_labels = knn.predict(np.c_[xx.ravel(), yy.ravel()])
knn_labels = knn_labels.reshape(xx.shape)

# Plots the decision boundaries
figure(figsize=(14,6))
subplot(131)
title("RBF SVM")
plot(features_2D[labels==0,0], features_2D[labels==0,1],'bo', markeredgecolor='black', label='healthy')
plot(features_2D[labels==1,0], features_2D[labels==1,1],'ro', markeredgecolor='black', label='AD')
contourf(xx, yy, rbf_svm_labels, cmap='Paired', alpha=.5)
legend()
axis('off')
subplot(132)
title("Random Forest")
plot(features_2D[labels==0,0], features_2D[labels==0,1],'bo', markeredgecolor='black', label='healthy')
plot(features_2D[labels==1,0], features_2D[labels==1,1],'ro', markeredgecolor='black', label='AD')
contourf(xx, yy, rf_labels, cmap='Paired', alpha=.5)
legend()
axis('off')
subplot(133)
title("k-Nearest Neighbors")
plot(features_2D[labels==0,0], features_2D[labels==0,1],'bo', markeredgecolor='black', label='healthy')
plot(features_2D[labels==1,0], features_2D[labels==1,1],'ro', markeredgecolor='black', label='AD')
contourf(xx, yy, knn_labels, cmap='Paired', alpha=.5)
legend()
axis('off')
show()

##### ----- Can you explain the different boundary shapes obtained by these three classifiers? What happens when you change the classifiers' hyperparameters (e.g., $\gamma$ for the RBF-SVM, maximum depth for the RF, number of neighbors for the kNN)?


We can also compare the performance of these classifiers on a fixed training-test split:

In [None]:
# Splits the samples into training (50%) and test (50%)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(features_2D, \
                                                                            labels, \
                                                                            test_size=0.5, \
                                                                            random_state=0)

training_scores = []
test_scores = []
for clf in [log_reg, linear_svm, rbf_svm, rf, knn]: # iterates over the previously defined classifiers
    clf.fit(X_train, y_train)
    train_pred = clf.predict(X_train)
    training_scores.append(sklearn.metrics.f1_score(y_train, train_pred))
    test_pred = clf.predict(X_test)
    test_scores.append(sklearn.metrics.f1_score(y_test, test_pred))

# Plots the performance results
figure(figsize=(8,8))
bar(np.arange(1,6), training_scores, width = 0.1, color = 'r', label = 'training')
bar(np.arange(1,6)+0.2, test_scores, width = 0.1, color = 'b', label = 'test')
legend()
gca().set_xticks(np.arange(1,6)+.1)
gca().set_xticklabels(["Logistic Regression", "Linear SVM", "RBF SVM", "Random Forest", "k-Nearest Neighbors"])
xticks(rotation=70)
ylabel('F1 score')
show()


### 7. 3D downsampled images

Finally, we can take a look at what happens when, instead of using a single 2D slice per subject, we take the whole 3D volume. As we have memory restrictions, we will use 3D images that have been downsampled by a factor of 4 along each axis.
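The tutorial does not state how the volumes were downsampled. As one possible approach, the sketch below averages non-overlapping blocks of voxels; the helper name, the random toy volume and its full-resolution shape are assumptions chosen so that the output has the same shape as the downsampled data.

```python
import numpy as np

def downsample_by_block_mean(volume, factor):
    """Downsamples a 3D array by averaging non-overlapping factor^3 blocks."""
    d0, d1, d2 = (s // factor for s in volume.shape)
    # Crops the volume so that every dimension is a multiple of the factor
    cropped = volume[:d0 * factor, :d1 * factor, :d2 * factor]
    # Splits each axis into (number of blocks, block size) and averages over the block axes
    blocks = cropped.reshape(d0, factor, d1, factor, d2, factor)
    return blocks.mean(axis=(1, 3, 5))

# Random toy volume with an assumed full-resolution shape
toy_volume = np.random.default_rng(0).random((176, 208, 176))
toy_small = downsample_by_block_mean(toy_volume, 4)
print(toy_small.shape)  # (44, 52, 44)
```

Downsampling by 4 along each axis reduces the number of voxels, and therefore the number of features, by a factor of 64.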

In [None]:
# Loads the (previously downsampled) 3D volumes 
subjects_3d = np.load(r'data/subjects_masked_bfc_3D_ds4.npy')
img_shape_3d = (44,52,44)

# Shows three orthogonal slices for one subject
img = subjects_3d[0,:].reshape(img_shape_3d).transpose((2,1,0))
figure(figsize=(12,12))
subplot(131)
imshow(img[22,::-1,:], cmap='gray')
axis('off')
title('transversal slice')
subplot(132)
imshow(img[::-1,36,:], cmap='gray')
axis('off')
title('coronal slice')
subplot(133)
imshow(img[::-1,:,22], cmap='gray')
axis('off')
title('sagittal slice')

show()

If we apply Logistic Regression directly on the pixel intensities, similarly to what we've done at the beginning of this tutorial:

In [None]:
log_reg = sklearn.linear_model.LogisticRegression()
cv = sklearn.model_selection.StratifiedKFold(n_splits = 5) # 5-fold cross-validation
scores_logreg = sklearn.model_selection.cross_val_score(log_reg, subjects_3d, labels, cv=cv, scoring='f1')
print("all scores: ", scores_logreg)
print("average F1 score: %0.2f +/- %0.2f" % (scores_logreg.mean(), scores_logreg.std()))

... the F1-score is slightly higher (0.67) than when using one slice only (0.62), meaning that relevant information is present in other parts of the brain. 

##### ----- Feature selection/extraction would likely improve the performance even further. A more complex classifier would also probably perform better. Can you combine the techniques illustrated above as a final experiment on the 3D images?

### 8. Clustering for image segmentation

We now want to segment the brain images into three separate tissues: white matter, gray matter and cerebrospinal fluid. We don't have labeled data on which we could train a classifier, so we will do unsupervised classification.

For that purpose we will use a basic clustering algorithm: K-Means. Also, in order to extract more information from the image, we will create two feature maps and use them as inputs for the clustering.
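For intuition, K-Means alternates between assigning each point to its nearest cluster center and moving each center to the mean of its assigned points. The function below is a bare-bones sketch of these two steps on synthetic 2D points (the name `kmeans_sketch` and the toy blobs are assumptions); the scikit-learn implementation used in this tutorial is more elaborate, for instance in how it initializes the centers.

```python
import numpy as np

def kmeans_sketch(points, init_centers, n_iter=20):
    """Runs a fixed number of K-Means iterations from the given initial centers."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assignments = dists.argmin(axis=1)
        # Update step: moves each center to the mean of its assigned points
        for k in range(len(centers)):
            if np.any(assignments == k):
                centers[k] = points[assignments == k].mean(axis=0)
    return assignments, centers

# Two well-separated synthetic blobs
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
toy_points = np.vstack([blob_a, blob_b])

# Initializes with one point from each blob (a deliberately easy starting point)
toy_assign, toy_centers = kmeans_sketch(toy_points, init_centers=toy_points[[0, 50]])
print(toy_centers)
```

Because each point is assigned by Euclidean distance, the result depends on the relative scale of the features, which is worth keeping in mind when combining two different filter responses.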

In [None]:
# Selects one image
subj_index = 5
img = subjects[subj_index,:].reshape(img_shape)
brain_mask = img > 0

# Filters the image with a gaussian filter and the laplacian of a gaussian
gaussian_map = sndim.gaussian_filter(img, sigma=1)
laplace_map = sndim.gaussian_laplace(img, sigma=1)

# Shows the two filtered images
figure(figsize=(12,12))
subplot(121)
imshow(gaussian_map, cmap='gray')
title('gaussian filtered image')
axis('off')
subplot(122)
imshow(laplace_map, cmap = 'gray')
title('laplacian filtered image')
axis('off')
show()

In [None]:
# Performs K-Means clustering with 3 classes (clusters) using the gaussian and laplacian filtered 
# pixel intensities as features
kmeans = sklearn.cluster.KMeans(n_clusters=3)
labels_kmeans = kmeans.fit_predict(np.c_[gaussian_map[brain_mask], laplace_map[brain_mask]])
clusters = np.zeros(img_shape, dtype=np.int8)
clusters[brain_mask] = labels_kmeans + 1

# Shows the obtained clusters and corresponding contours
contours = skimage.segmentation.mark_boundaries(img, clusters)
figure(figsize=(12,12))
subplot(121)
imshow(clusters)
axis('off')
subplot(122)
imshow(contours)
axis('off')

# Show the 2D feature space and the clustering result (the colors are the same as in 
# the segmented brain image)
figure(figsize=(8,8))
plot(gaussian_map[clusters==1], laplace_map[clusters==1], 'ob')
plot(gaussian_map[clusters==2], laplace_map[clusters==2], 'og')
plot(gaussian_map[clusters==3], laplace_map[clusters==3], 'oy')
xlabel('gaussian')
ylabel('laplacian')
axis('square')
show()

##### ----- Why do the clusters have this specific shape?

##### ----- Knowing that there is cortical shrinkage in Alzheimer's patients and that the gray matter/cerebrospinal fluid volume ratio is often affected, how could you build a feature set that incorporates that information? Can you test the classifiers used above on these features?

In [None]:
def segment_brain(brain_img):
    # add code here
    brain_labels = np.zeros(brain_img.shape, dtype=bool)
    return brain_labels

def extract_features(brain_labels, nb_features):
    feature_vector = np.random.randn(nb_features)
    # add code here
    return feature_vector

# Extracts features for each subject
nb_features = 2
new_features = np.random.randn(N_subjects, nb_features)
for i in range(N_subjects):
    brain_labels = segment_brain(subjects[i,...].reshape(img_shape))
    new_features[i,:] = extract_features(brain_labels, nb_features)
    #...

# Classifies using logistic regression
scores_logreg = sklearn.model_selection.cross_val_score(log_reg, new_features, labels, cv=cv, scoring='f1')
print("all scores: ", scores_logreg)
print("average F1 score: %0.2f +/- %0.2f" % (scores_logreg.mean(), scores_logreg.std()))