# Unsupervised machine learning

Previously, we took MR images and attempted to assign them labels of normal or diseased, either at the whole-image level (classification) or at the pixel level (segmentation). But what if we don't have labels?

We are going to spend this tutorial exploring how to find patterns in data using unsupervised machine learning. We'll introduce three new tools. The first two require no labels (though we'll keep track of the labels to evaluate our performance), while the third relies on the labels of nearby points:

- Autoencoders: deep neural networks designed to minimize the "reconstruction error" between the input and the output (the target output is the input itself)
- K-means clustering: a way of automatically finding groups of unlabelled data points in space based on the distances between them
- K-nearest neighbours: assigning a new data point a label based on its proximity to other labelled data points
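
To make the two distance-based tools concrete, here is a minimal sketch in plain NumPy. It is not a library implementation: the function names `kmeans` and `knn_predict` are hypothetical, the toy 2D points are made up, and the initial centroids are passed in explicitly to keep the example deterministic (in practice they are often chosen at random).

```python
import numpy as np

def kmeans(points, init_centroids, n_iter=10):
    """Plain K-means: alternate between assigning points and moving centroids."""
    centroids = np.array(init_centroids, dtype=float)  # copy so the input is untouched
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        assignments = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        for j in range(len(centroids)):
            members = points[assignments == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assignments, centroids

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest labelled points."""
    dists = np.linalg.norm(train_points - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = np.bincount(train_labels[nearest])
    return int(votes.argmax())

# Two toy groups of 2D points, one near (0, 0) and one near (10, 10)
group_a = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
points = np.vstack([group_a, group_a + 10.0])

# K-means never sees labels; here it starts from one point in each group
assignments, centroids = kmeans(points, init_centroids=points[[0, -1]])

# K-nearest neighbours does need labels for the reference points
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
predicted = knn_predict(points, labels, query=np.array([9.0, 9.5]))
```

K-means can settle into a poor grouping when the starting centroids are unlucky, which is why the starting points are fixed in this sketch.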

In [10]:
%load_ext autoreload
train_path = 'C:/Users/jxb29/Dropbox (Partners HealthCare)/Teaching/BRATS_10_Updated/*/*.nii.gz'
sequences = ['t1', 't2']

In [3]:
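import numpy as np
from skimage.measure import label, regionprops

def normalize_images(channel_copy):
    """Z-score the volume using only voxels outside the background."""
    # Label connected regions of zero-valued voxels
    label_image = label(channel_copy == 0)

    # The largest zero-valued region is taken to be the background
    largest_label, largest_area = None, 0
    for region in regionprops(label_image):
        if region.area > largest_area:
            largest_area = region.area
            largest_label = region.label

    # Mask out the background so it does not skew the statistics
    mask = label_image == largest_label
    masked_channel = np.ma.masked_where(mask, channel_copy)

    # Subtract the mean and divide by the standard deviation of the unmasked voxels
    masked_channel = masked_channel - np.mean(masked_channel)
    masked_channel = masked_channel / np.std(masked_channel)

    # Return a plain ndarray (drops the mask)
    masked_channel = np.ma.getdata(masked_channel)
    return masked_channel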
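
To see what this normalization does to the foreground intensities, here is a small sketch on a made-up 4x4 array. As a simplifying assumption it treats every zero pixel as background, rather than finding the largest connected zero region as `normalize_images` does.

```python
import numpy as np

# Toy 4x4 "image": zeros are background, non-zeros are tissue
image = np.array([[0, 0, 0, 0],
                  [0, 2, 4, 0],
                  [0, 6, 8, 0],
                  [0, 0, 0, 0]], dtype=float)

# Simplification: treat every zero pixel as background
background = image == 0
masked = np.ma.masked_where(background, image)

# The mean and standard deviation are computed over the unmasked pixels only
normalized = (masked - masked.mean()) / masked.std()
result = np.ma.getdata(normalized)

tissue = result[~background]
print(tissue.mean(), tissue.std())
```

After normalization the tissue pixels have approximately zero mean and unit standard deviation, regardless of how much background surrounds them.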

In [6]:
from glob import glob
import nibabel as nib
from os.path import basename
import numpy as np 

all_images = glob(train_path)
slices = []
labels = []

for nifti_file in all_images:
    
    seq = basename(nifti_file).split('.')[0].split('_')[-1]
    
    if seq not in sequences:
        continue
    
    # Load the NIfTI file as a floating-point array, then normalize it
    vol = nib.load(nifti_file).get_fdata()
    vol = normalize_images(vol)
    
    # Take a middle-ish section of the volume
    halfway_point = vol.shape[2] // 2
    sample = [vol[:,:,i] for i in range(halfway_point-20, halfway_point+20)]
    slices.extend(sample)
    
    # Keep track of the labels (sequence ID: 0 == t1, 1 == t2)
    index = sequences.index(seq)
    index_list = [index] * 40
    labels.extend(index_list)
    # Stop after the first matching volume, so only 40 slices are loaded here
    break
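
The sequence name is recovered from the filename by string splitting. The sketch below walks through that step on a hypothetical path; the path and its naming pattern are made up for illustration, and real BRATS filenames may differ.

```python
from os.path import basename

# Hypothetical path, for illustration only
example_path = "/data/patient_001/patient_001_t2.nii.gz"
sequences = ['t1', 't2']

# "patient_001_t2.nii.gz" -> "patient_001_t2" -> "t2"
seq = basename(example_path).split('.')[0].split('_')[-1]

# Position in the sequences list doubles as the numeric label
index = sequences.index(seq)
print(seq, index)
```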

In [7]:
# (samples: 40 * N, rows: 240, columns: 240, channels: 1)
X = np.expand_dims(np.asarray(slices), axis=-1)
print(X.shape)

(40, 240, 240, 1)
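
The trailing axis is the channel dimension that 2D convolutional layers expect for single-channel images. A small sketch with placeholder arrays (zeros standing in for real slices) shows how the list of 2D slices becomes a 4D array:

```python
import numpy as np

# Stand-ins for 40 MR slices of 240 x 240 pixels
toy_slices = [np.zeros((240, 240)) for _ in range(40)]

stacked = np.asarray(toy_slices)            # (40, 240, 240)
toy_X = np.expand_dims(stacked, axis=-1)    # (40, 240, 240, 1)
print(stacked.shape, toy_X.shape)
```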


In [24]:
# from models import autoencoder
%autoreload 2
import models
ae = models.autoencoder(image_shape=X.shape[1:])
ae.summary()  # summary() prints the table itself and returns None
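
The parameter counts in the summary this cell prints can be checked by hand. The sketch below assumes 3x3 convolution kernels, which is an inference from the counts themselves because the code of `models.autoencoder` is not shown here. The helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # One weight per kernel element, per input channel, per filter, plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

def batchnorm_params(channels):
    # Scale, offset, moving mean and moving variance for each channel
    return 4 * channels

print(conv2d_params(3, 1, 8))     # 80
print(batchnorm_params(8))        # 32
print(conv2d_params(3, 8, 16))    # 1168
print(batchnorm_params(16))       # 64
print(conv2d_params(3, 16, 32))   # 4640
```

Max-pooling layers have no trainable weights, which is why they show 0 parameters while halving the spatial size (240 to 120 to 60).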

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_23 (Conv2D)           (None, 240, 240, 8)       80        
_________________________________________________________________
batch_normalization_22 (Batc (None, 240, 240, 8)       32        
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 120, 120, 8)       0         
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 120, 120, 16)      1168      
_________________________________________________________________
batch_normalization_23 (Batc (None, 120, 120, 16)      64        
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 60, 60, 16)        0         
_________________________________________________________________
conv2d_25 (Conv2D)           (None, 60, 60, 32)        4640      
__________