# Medical Image Classification
## Solution 1: Building a Convnet from Scratch

In this exercise, we will build a classifier model from scratch that is able to distinguish medical images. We will follow these steps:

1. Prepare the data & model hyperparameters
2. Explore the data
3. Build a small convnet from scratch to solve our classification problem
4. Evaluate training and validation accuracy

Let's go!

## Prepare Data for Classification Classes, Set Hyperparameters

In [1]:
import os
import numpy as np
import SimpleITK as sitk
from matplotlib import pyplot as plt
os.environ['CUDA_VISIBLE_DEVICES'] = str(1)
base_dir = '/data/wdm_Projects/X-ray-images-classification-with-Keras-TensorFlow/data'
class_names = ['CT','Probe']
img_width = img_height = 512
color_channels = 3

#layers & optimizer
target_width = target_height = 512
target_size = (128, 128, 128)#(30, 256, 256)#(100,306,386)#(100, 306, 386)
pooling_window = 2
conv_window = 3
kernel_size = 4
dropout=0.01
activation = "relu"
optimizer = 'Adam'
loss='categorical_crossentropy'
#metrics= ['acc']
metrics=['categorical_accuracy']
class_mode = 'categorical'
batch_size = 200
epochs = 100
verbose = 2
lr = 0.0001

## Explore the data

## Building a Small Convnet from Scratch to Get to 72% Accuracy

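The inputs to our convnet are pairs of single-channel 3D volumes of size 128x128x128 (`target_size`); the network operates on the voxel-wise difference of the two volumes in each pair. The data generators in the Data Preprocessing section appear to assume that the stored volumes already have this size, since they do not resize them.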

Let's code up the architecture. We will stack {convolution + relu + maxpooling} modules built from 3D layers. Our convolutions operate on 3x3x3 windows and our maxpooling layers operate on 2x2x2 windows. Our first convolution extracts 4 filters (`kernel_size`) with a stride of 4, the following one extracts 8 filters, and the last three extract 24 filters each. Only the first two convolutions are followed by maxpooling.

**NOTE**: This is a configuration that is widely used and known to work well for image classification. Also, since we have relatively few training examples (1,000), using just three convolutional modules keeps the model small, which lowers the risk of overfitting (which we'll explore in more depth in Exercise 2.)

In [2]:
from tensorflow.keras import layers
from tensorflow.keras import Model


In [17]:
# Each input is a 128x128x128 volume with a single channel: shape = target_size + (1,)
#img_input = layers.Input(shape=(img_width, img_height, color_channels))
img_input1 = layers.Input(shape=target_size + (1,))
img_input2 = layers.Input(shape=target_size + (1,))
img_input = layers.Subtract()([img_input2, img_input1])
#img_input = layers.concatenate([img_input1, img_input2])
#x = layers.concatenate([img_input, img_input1])
#x = layers.concatenate([x, img_input2])
# First convolution extracts 4 filters (kernel_size) that are 3x3x3, applied with stride 4
# Convolution is followed by max-pooling layer with a 2x2x2 window
net = layers.Conv3D(kernel_size, conv_window, activation=activation, padding='valid', strides=4)(img_input)
#x = layers.BatchNormalization()(x)
x = layers.MaxPooling3D(pooling_window)(net)
#x = layers.BatchNormalization()(x)

#x = layers.Conv3D(kernel_size, conv_window, activation=activation, padding='valid')(x)
#branch_1 = x

# Second convolution extracts 8 filters (2*kernel_size) that are 3x3x3
# Convolution is followed by max-pooling layer with a 2x2x2 window
x = layers.Conv3D(2*kernel_size, conv_window, activation=activation, padding='same', strides=1)(x)
#x = layers.BatchNormalization()(x)
x = layers.MaxPooling3D(pooling_window)(x)
#x = layers.BatchNormalization()(x)
#x = layers.Conv3D(2*kernel_size, conv_window, activation=activation, padding='same')(x)

# The remaining convolutions extract 24 filters (6*kernel_size) that are 3x3x3
# No max-pooling layer follows these convolutions
#x = layers.Conv3D(4*kernel_size, conv_window, activation=activation, padding='same')(x)
#x = layers.Conv3D(4*kernel_size, conv_window, activation=activation, padding='valid', strides=2)(x)
#x = layers.Conv3D(4*kernel_size, conv_window, activation=activation)(x)
#x = layers.MaxPooling3D(pooling_window)(x)
x = layers.Conv3D(6*kernel_size, conv_window, activation=activation, padding='same', strides=1)(x)
#branch_2 = x
#x = layers.MaxPooling3D(pooling_window)(x)
x = layers.Conv3D(6*kernel_size, conv_window, activation=activation, padding='same')(x)
x = layers.Conv3D(6*kernel_size, conv_window, activation=activation, padding='same', name='conv_3d')(x)
#x = layers.BatchNormalization()(x)

On top of it we stick two fully-connected layers.

In [18]:
# Flatten feature map to a 1-dim tensor so we can add fully connected layers
x = layers.Flatten(name='Flatten')(x)

# Create a fully connected layer with ReLU activation and 128 hidden units
x = layers.Dense(128, activation=activation, name='Dense_512')(x)

# Add Dropout Regularization (currently disabled)
#x = layers.Dropout(dropout)(x)

# Create output layer with one node per class and softmax activation
#output = layers.Dense(1, activation='sigmoid')(x)
output = layers.Dense(len(class_names), activation = 'softmax', name='Dense_2') (x)


# Create model: inputs = the two input volumes
# output = subtraction of the inputs + stacked convolution/maxpooling layers + fully connected layer + softmax output layer
model = Model(inputs=[img_input1,img_input2], outputs=output)

Let's summarize the model architecture:

In [19]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_8 (InputLayer)            (None, 128, 128, 128 0                                            
__________________________________________________________________________________________________
input_7 (InputLayer)            (None, 128, 128, 128 0                                            
__________________________________________________________________________________________________
subtract_3 (Subtract)           (None, 128, 128, 128 0           input_8[0][0]                    
                                                                 input_7[0][0]                    
__________________________________________________________________________________________________
conv3d_9 (Conv3D)               (None, 32, 32, 32, 4 112         subtract_3[0][0]                 
__________

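The "Output Shape" column shows how the size of your feature map evolves in each successive layer. Here the first convolution uses `'valid'` padding with a stride of 4, which shrinks each spatial dimension from 128 to 32. The later convolutions use `'same'` padding with a stride of 1 and keep the size unchanged, and each pooling layer halves the feature map.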
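
To make the shape arithmetic concrete, here is a small sketch that traces the feature-map size through the layers defined above. It uses the standard output-size formulas for convolution and pooling; the helper names (`conv_out`, `pool_out`, `conv3d_params`) are made up for this illustration and are not part of Keras.

```python
# Helper names are hypothetical; the layer settings mirror the Conv3D and
# MaxPooling3D calls in the model definition cell.

def conv_out(size, kernel, stride, padding):
    """Output length along one axis of a convolution."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # 'valid' padding

def pool_out(size, window):
    """Output length along one axis of max pooling with stride == window."""
    return (size - window) // window + 1

def conv3d_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter for a cubic 3D kernel."""
    return kernel ** 3 * in_channels * filters + filters

size = 128
size = conv_out(size, kernel=3, stride=4, padding="valid")
first_conv_size = size
first_conv_params = conv3d_params(kernel=3, in_channels=1, filters=4)

size = pool_out(size, 2)                                    # first max-pooling
size = conv_out(size, kernel=3, stride=1, padding="same")   # second convolution
size = pool_out(size, 2)                                    # second max-pooling
# The last three convolutions use 'same' padding with stride 1: size unchanged.
flat_units = size ** 3 * 24                                 # 24 filters before Flatten

print(first_conv_size, first_conv_params, flat_units)
```

The first two numbers agree with the `conv3d_9` row of the summary above (spatial size 32 and 112 parameters).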

Next, we'll configure the specifications for model training. We will train our model with the `categorical_crossentropy` loss, because our labels are one-hot encoded over the classes and our final activation is a softmax. (For a refresher on loss metrics, see the [Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/descending-into-ml/video-lecture).) We will use the `Adam` optimizer. Because the optimizer is passed to `compile` by name (`optimizer = 'Adam'`), it runs with the Keras default learning rate of `0.001`; the `lr = 0.0001` value set in the first cell is not applied. During training, we will want to monitor classification accuracy.
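
As a rough illustration of what this loss measures, the sketch below computes a softmax and the categorical cross-entropy for a single two-class example in plain Python. The function names are illustrative only, and the clipping constant `eps=1e-7` is an assumption about how the backend guards against `log(0)`.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Cross-entropy for one example; eps is an assumed clipping constant."""
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    return -sum(t * math.log(p) for t, p in zip(one_hot, clipped))

probs = softmax([2.0, 0.0])                      # model favors class 0
loss_good = categorical_crossentropy([1, 0], probs)
loss_worst = categorical_crossentropy([0, 1], [1.0, 0.0])

print(probs, loss_good, loss_worst)
```

Under the assumed clipping constant, a fully confident wrong prediction gives a loss of about 16.12, which may explain why that value recurs in the checkpoint file names of the training log further down.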

**NOTE**: In this case, using [Adam](https://wikipedia.org/wiki/Stochastic_gradient_descent#Adam) is preferable to [stochastic gradient descent](https://developers.google.com/machine-learning/glossary/#SGD) (SGD), because Adam automates learning-rate tuning for us. (Other optimizers, such as [RMSprop](https://wikipedia.org/wiki/Stochastic_gradient_descent#RMSProp) and [Adagrad](https://developers.google.com/machine-learning/glossary/#AdaGrad), also automatically adapt the learning rate during training, and would work equally well here.)
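
To give a feel for what "automates learning-rate tuning" means, here is a minimal sketch of one Adam update for a single scalar parameter. It follows the update rule from the Adam paper with the paper's default constants; it is not the Keras implementation, and `adam_step` is a name chosen for this example.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (t is the 1-based step count)."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# The same starting point with a tiny and a huge gradient.
x_small, _, _ = adam_step(1.0, grad=0.01, m=0.0, v=0.0, t=1)
x_large, _, _ = adam_step(1.0, grad=100.0, m=0.0, v=0.0, t=1)

print(1.0 - x_small, 1.0 - x_large)
```

Both first steps move the parameter by roughly `lr`, regardless of the gradient's magnitude. Plain SGD would instead scale the step with the gradient.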

In [20]:
from tensorflow.keras.optimizers import Adam

model.compile(loss=loss,
              optimizer=optimizer,
              metrics=metrics)

### Data Preprocessing

Let's set up data generators that will read the volumes in our source folders with SimpleITK, convert them to floating-point tensors, and feed them (with their labels) to our network. We'll have one generator for the training volumes and one for the validation volumes. Each generator step yields one pair of 128x128x128 volumes (a CT volume and a Probe volume, in random order) together with a one-hot label over two classes that encodes which of the two comes first.

As you may already know, data that goes into neural networks should usually be normalized in some way to make it more amenable to processing by the network. (It is uncommon to feed raw intensities into a convnet.) In our case, we will preprocess our volumes by clipping the voxel intensities to the `[0, 1700]` range and dividing by 1700, so that all values fall in the `[0, 1]` range, followed by eight passes of grayscale dilation.
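
As a quick sanity check of the clip-and-scale step, here is a compact NumPy sketch. It is meant to mirror what `threscut` and `normalize_3D` do in the cell below with their default bounds of 0 and 1700. The function name `clip_and_scale` and the sample values are made up for the illustration.

```python
import numpy as np

def clip_and_scale(volume, lo=0.0, hi=1700.0):
    """Clip intensities to [lo, hi], then map that range linearly onto [0, 1]."""
    clipped = np.clip(volume, lo, hi)   # returns a new array, input is untouched
    return ((clipped - lo) / (hi - lo)).astype(np.float32)

# Hypothetical raw intensities, including values outside the clipping range.
raw = np.array([-200.0, 0.0, 850.0, 1700.0, 3000.0])
scaled = clip_and_scale(raw)

print(scaled)
```

Unlike `threscut`, which overwrites its argument in place, this version leaves the input array unchanged.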

For 2D images, this can be done in Keras via the `keras.preprocessing.image.ImageDataGenerator` class using the `rescale` parameter. This `ImageDataGenerator` class allows you to instantiate generators of augmented image batches (and their labels) via `.flow(data, labels)` or `.flow_from_directory(directory)`. Our data are 3D volumes read with SimpleITK, so the cell below defines hand-written Python generators instead. Like the `ImageDataGenerator` generators, they can be used with the Keras model methods that accept data generators as inputs: `fit_generator`, `evaluate_generator`, and `predict_generator`.

In [21]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def threscut(subject_data, threshold_min=0, threshold_max=1700):# 1000,2000
    subject_data[subject_data > threshold_max] = threshold_max
    subject_data[subject_data < threshold_min] = threshold_min

    return subject_data


def normalize_3D(img_3D, nor_min=0, nor_max=1700):
    """ The shape of img_3D should be (depth, width, height)"""
    data_3D = img_3D - nor_min
    data_3D = data_3D / np.float32(nor_max-nor_min)
    #data_3D = img_3D - img_3D.min()
    #data_3D = data_3D / np.float32(img_3D.max())
    return np.asarray(data_3D, np.float32)

def hist_of_3d_array(array_3d):
    array_1d = np.reshape(array_3d, [np.size(array_3d)])
    num_bins=10
    hist = plt.hist(array_1d, num_bins, facecolor='blue', alpha=0.5)
    return(hist)

def data_augumentation_flip(array_3d):
    flip_index = np.random.randint(0,3)
    if flip_index == 0:
        array_3d = array_3d[::-1,:,:]
    if flip_index == 1:
        array_3d = array_3d[:,::-1,:]
    if flip_index == 2:
        array_3d = array_3d[:,:,::-1]
    return array_3d

def data_augumentation_noise(img_3d):
    # Create a list of intensity modifying filters, which we apply to the given images
    filter_list = []
    
    # Smoothing filters
    
    filter_list.append(sitk.SmoothingRecursiveGaussianImageFilter())
    filter_list[-1].SetSigma(2.0)
    
    filter_list.append(sitk.DiscreteGaussianImageFilter())
    filter_list[-1].SetVariance(4.0)
    
    filter_list.append(sitk.BilateralImageFilter())
    filter_list[-1].SetDomainSigma(4.0)
    filter_list[-1].SetRangeSigma(8.0)
    
    filter_list.append(sitk.MedianImageFilter())
    filter_list[-1].SetRadius(8)
    
    # Noise filters using default settings
    
    # Filter control via SetMean, SetStandardDeviation.
    filter_list.append(sitk.AdditiveGaussianNoiseImageFilter())
    filter_list[-1].SetMean(0.2)
    filter_list[-1].SetStandardDeviation(0.7)

    # Filter control via SetProbability
    filter_list.append(sitk.SaltAndPepperNoiseImageFilter())
    
    # Filter control via SetScale
    filter_list.append(sitk.ShotNoiseImageFilter())
    
    # Filter control via SetStandardDeviation
    filter_list.append(sitk.SpeckleNoiseImageFilter())

    filter_list.append(sitk.AdaptiveHistogramEqualizationImageFilter())
    filter_list[-1].SetAlpha(1.0)
    filter_list[-1].SetBeta(0.0)

    filter_list.append(sitk.AdaptiveHistogramEqualizationImageFilter())
    filter_list[-1].SetAlpha(0.0)
    filter_list[-1].SetBeta(1.0)
    
    # Only one filter from the list is applied: index 4, the additive Gaussian noise filter
    f = filter_list[4]
    aug_image = f.Execute(img_3d)
    return aug_image

def dilation(img_3d):
    d = sitk.GrayscaleDilateImageFilter()
    for i in range(8):
        img_3d = d.Execute(img_3d)
    return img_3d

def pre_process(img_3d):
    img_data = sitk.GetArrayFromImage(img_3d)
    img_data = threscut(img_data)
    img_data = normalize_3D(img_data)
    img = sitk.GetImageFromArray(img_data)
    img = dilation(img)
    return img

def generate_single_batch_train_data(data_path, batch_size):
    while 1:
        #img_data_batch = np.zeros([30,512,512,batch_size])
        #label_batch = np.zeros([1,2,batch_size])
        #for i in os.listdir(data_path):
        for j in range(batch_size):
            img_data_sum = np.zeros([1,256,128,128,1])
            size = int(img_data_sum.shape[1]/2)
            #label_sum = np.zeros([1,2])
            class_ind = np.random.randint(0,2)
                        
            #if class_ind == 0:
            data_path_1 = data_path + '/CT_full_size'
            i = np.random.randint(len(os.listdir(data_path_1)))
            
            img = sitk.ReadImage(os.path.join(data_path_1, sorted(os.listdir(data_path_1))[i]))
            img = pre_process(img)
            img_data = sitk.GetArrayFromImage(img)
            img_data = np.expand_dims(img_data, axis=0)
            img_data = np.expand_dims(img_data, axis=4)
                
            img_data_sum[0,:size,:,:,:] = img_data
            #label = np.expand_dims(label, axis=0)
            
                
            #else:
            data_path_1 = data_path + '/Probe_full_size'
            

            img = sitk.ReadImage(os.path.join(data_path_1, sorted(os.listdir(data_path_1))[i]))
            img = pre_process(img)
            img_data = sitk.GetArrayFromImage(img)
            img_data = np.expand_dims(img_data, axis=0)
            img_data = np.expand_dims(img_data, axis=4)
            img_data_sum[0,size:2*size,:,:,:] = img_data
            if class_ind ==0:
                label = np.array([1,0])
            if class_ind == 1:
                label = np.array([0,1])
            label = np.expand_dims(label, axis=0)
            
            if class_ind == 0:
                yield [img_data_sum[:,:size,:,:,:], img_data_sum[:,size:2*size,:,:,:]], label
            else:
                yield [img_data_sum[:,size:2*size,:,:,:], img_data_sum[:,:size,:,:,:]], label
                
def generate_single_batch_validation_data(data_path):
    while 1:
        for j in range(batch_size):
            img_data_sum = np.zeros([1,256,128,128,1])
            size = int(img_data_sum.shape[1]/2)
            #label_sum = np.zeros([1,2])
            class_ind = np.random.randint(0,2)
                        
            #if class_ind == 0:
            data_path_1 = data_path + '/CT_full_size'
            i = np.random.randint(len(os.listdir(data_path_1)))
            
            img = sitk.ReadImage(os.path.join(data_path_1, sorted(os.listdir(data_path_1))[i]))
            img = pre_process(img)
            img_data = sitk.GetArrayFromImage(img)
            img_data = np.expand_dims(img_data, axis=0)
            img_data = np.expand_dims(img_data, axis=4)
                
            img_data_sum[0,:size,:,:,:] = img_data
            #label = np.expand_dims(label, axis=0)
            
                
            #else:
            data_path_1 = data_path + '/Probe_full_size'
            

            img = sitk.ReadImage(os.path.join(data_path_1, sorted(os.listdir(data_path_1))[i]))
            img = pre_process(img)
            img_data = sitk.GetArrayFromImage(img)
            img_data = np.expand_dims(img_data, axis=0)
            img_data = np.expand_dims(img_data, axis=4)
            img_data_sum[0,size:2*size,:,:,:] = img_data
            if class_ind ==0:
                label = np.array([1,0])
            if class_ind == 1:
                label = np.array([0,1])
            label = np.expand_dims(label, axis=0)
            
            if class_ind == 0:
                yield [img_data_sum[:,:size,:,:,:], img_data_sum[:,size:2*size,:,:,:]], label
            else:
                yield [img_data_sum[:,size:2*size,:,:,:], img_data_sum[:,:size,:,:,:]], label
        
train_generator = generate_single_batch_train_data(base_dir+'/training', batch_size)
validation_generator = generate_single_batch_validation_data(base_dir + '/validation')
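
The pairing and labeling logic of the two generators above can be hard to see among the file handling, so here is a stripped-down sketch using small synthetic arrays in place of the SimpleITK volumes. The names `make_pair_sample` and `pair_generator` are hypothetical, and the 4x4x4 volumes exist only to keep the example fast.

```python
import numpy as np

def make_pair_sample(ct_volume, probe_volume, rng):
    """Return ([first, second], one_hot_label) with the order chosen at random."""
    class_ind = int(rng.integers(0, 2))      # 0: CT first, 1: Probe first
    label = np.zeros((1, 2), dtype=np.float32)
    label[0, class_ind] = 1.0
    if class_ind == 0:
        return [ct_volume, probe_volume], label
    return [probe_volume, ct_volume], label

def pair_generator(ct_volume, probe_volume, seed=0):
    """Endless generator in the format expected by a two-input Keras model."""
    rng = np.random.default_rng(seed)
    while True:
        yield make_pair_sample(ct_volume, probe_volume, rng)

# Synthetic stand-ins: zeros for the CT volume, ones for the Probe volume.
ct = np.zeros((1, 4, 4, 4, 1), dtype=np.float32)
probe = np.ones((1, 4, 4, 4, 1), dtype=np.float32)

gen = pair_generator(ct, probe)
samples = [next(gen) for _ in range(20)]
print(samples[0][1])
```

Each yielded item has a batch dimension of 1, which is why the number of steps per epoch (rather than a batch size) controls how many pairs the model sees per epoch.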

## Training
Let's train on the 42.862 images available for 100 epochs (`epochs = 100`), with 200 steps per epoch and one randomly drawn pair of volumes per step, and validate on 2 validation steps per epoch. (This will take a couple of hours to run.)

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint

# Make sure the checkpoint directory exists before training starts
os.makedirs('./models', exist_ok=True)
filepath="./models/weights-improvement-{epoch:02d}-{val_loss:.2f}.h5"
# Save a checkpoint every 2 epochs; mode='min' because a lower val_loss is better
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, mode='min', period=2)
callbacks_list = [checkpoint]
#model.load_weights('./models/work_model.h5')
history = model.fit_generator(
    train_generator,
    steps_per_epoch = batch_size,  # 200 steps per epoch, one pair of volumes per step
    validation_data = validation_generator, 
    validation_steps =2,
    callbacks=callbacks_list,
    epochs = epochs)

Epoch 1/100
Epoch 2/100
Epoch 00002: saving model to ./models/weights-improvement-02-12.04.h5
Epoch 3/100
Epoch 4/100
Epoch 00004: saving model to ./models/weights-improvement-04-12.80.h5
Epoch 5/100
Epoch 6/100
Epoch 00006: saving model to ./models/weights-improvement-06-13.24.h5
Epoch 7/100
Epoch 8/100
Epoch 00008: saving model to ./models/weights-improvement-08-13.56.h5
Epoch 9/100
Epoch 10/100
Epoch 00010: saving model to ./models/weights-improvement-10-13.81.h5
Epoch 11/100
Epoch 12/100
Epoch 00012: saving model to ./models/weights-improvement-12-16.12.h5
Epoch 13/100
Epoch 14/100
Epoch 00014: saving model to ./models/weights-improvement-14-16.12.h5
Epoch 15/100
Epoch 16/100
Epoch 00016: saving model to ./models/weights-improvement-16-12.63.h5
Epoch 17/100
Epoch 18/100
Epoch 00018: saving model to ./models/weights-improvement-18-16.12.h5
Epoch 19/100
Epoch 20/100
Epoch 00020: saving model to ./models/weights-improvement-20-14.67.h5
Epoch 21/100
Epoch 22/100
Epoch 00022: saving mod

Epoch 00032: saving model to ./models/weights-improvement-32-14.80.h5
Epoch 33/100
Epoch 34/100
Epoch 00034: saving model to ./models/weights-improvement-34-15.01.h5
Epoch 35/100
Epoch 36/100
Epoch 00036: saving model to ./models/weights-improvement-36-16.12.h5
Epoch 37/100
Epoch 38/100
Epoch 00038: saving model to ./models/weights-improvement-38-15.80.h5
Epoch 39/100
Epoch 40/100
Epoch 00040: saving model to ./models/weights-improvement-40-15.91.h5
Epoch 41/100
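
The checkpoint file names in the log above come from the `filepath` template, which is an ordinary Python format string filled with the epoch number and the logged metrics. A small sketch (the `val_loss` value here is hypothetical, chosen so that it rounds to the value in the first saved file name):

```python
# Same template as in the training cell.
filepath = "./models/weights-improvement-{epoch:02d}-{val_loss:.2f}.h5"

# Hypothetical metric value for illustration; epoch is 1-based in the file name.
name = filepath.format(epoch=2, val_loss=12.0397)
print(name)
```

So the two-digit field is the epoch and the trailing number is the validation loss at that epoch, rounded to two decimals.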

### Visualizing Intermediate Representations

To get a feel for what kind of features our convnet has learned, one fun thing to do is to visualize how an input gets transformed as it goes through the convnet.

Let's pick a random sample from the training set, and then generate a figure where each row is the output of a layer, and each image in the row is a specific filter in that output feature map. Rerun this cell to generate intermediate representations for a variety of training samples.

In [37]:
import numpy as np

# Let's define a new Model that takes the same pair of volumes as input, and
# outputs the intermediate representations of every layer after the two
# input layers and the Subtract layer.
feature_layers = [layer for layer in model.layers
                  if not isinstance(layer, (layers.InputLayer, layers.Subtract))]
successive_outputs = [layer.output for layer in feature_layers]
visualization_model = Model(model.inputs, successive_outputs)

# Let's draw one random pair of volumes from the training generator.
# The generator already applies pre_process, so no further rescaling is needed.
pair, _ = next(train_generator)

# Let's run the pair through our network, thus obtaining all
# intermediate representations for it.
successive_feature_maps = visualization_model.predict(pair)

# These are the names of the layers, so can have them as part of our plot
layer_names = [layer.name for layer in feature_layers]

# Now let's display our representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
  if len(feature_map.shape) == 5:
    # Just do this for the conv / maxpool layers, not the fully-connected layers
    n_features = feature_map.shape[-1]  # number of features in feature map
    # The feature map has shape (1, size, size, size, n_features);
    # we display the central slice along the first spatial axis
    size = feature_map.shape[1]
    # We will tile our images in this matrix
    display_grid = np.zeros((size, size * n_features))
    for i in range(n_features):
      # Postprocess the feature to make it visually palatable
      x = feature_map[0, size // 2, :, :, i].copy()
      x -= x.mean()
      x /= (x.std() + 1e-5)  # avoid division by zero for constant maps
      x *= 64
      x += 128
      x = np.clip(x, 0, 255).astype('uint8')
      # We'll tile each filter into this big horizontal grid
      display_grid[:, i * size : (i + 1) * size] = x
    # Display the grid
    scale = 20. / n_features
    plt.figure(figsize=(scale * n_features, scale))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

As you can see we go from the raw pixels of the images to increasingly abstract and compact representations. The representations downstream start highlighting what the network pays attention to, and they show fewer and fewer features being "activated"; most are set to zero. This is called "sparsity." Representation sparsity is a key feature of deep learning.


These representations carry increasingly less information about the original pixels of the image, but increasingly refined information about the class of the image. You can think of a convnet (or a deep network in general) as an information distillation pipeline.
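
The "sparsity" mentioned above can be quantified as the fraction of activations that are exactly zero after the ReLU. Here is a minimal sketch on a tiny hand-made array. The function name `activation_sparsity` and the numbers are invented for the example and are not outputs of our model.

```python
import numpy as np

def activation_sparsity(feature_map):
    """Fraction of entries in a feature map that are exactly zero."""
    return float(np.mean(feature_map == 0))

# Hypothetical pre-activation values; ReLU zeroes out the negative ones.
pre_activation = np.array([[-1.5, 0.2], [-0.3, -2.0]])
relu_output = np.maximum(pre_activation, 0.0)

sparsity = activation_sparsity(relu_output)
print(sparsity)
```

Applied to each entry of a list of intermediate feature maps, the same function would show how sparsity changes from layer to layer.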

### Evaluating Accuracy and Loss for the Model

Let's plot the training/validation accuracy and loss as collected during training:

In [38]:
# Retrieve a list of accuracy results on training and validation data
# sets for each training epoch (the keys follow the 'categorical_accuracy' metric)
acc = history.history['categorical_accuracy']
val_acc = history.history['val_categorical_accuracy']

# Retrieve a list of loss results on training and validation data
# sets for each training epoch
train_loss = history.history['loss']
val_loss = history.history['val_loss']

# Get number of epochs (a new name, so the `epochs` hyperparameter is not overwritten)
epoch_range = range(len(acc))

# Plot training and validation accuracy per epoch
plt.plot(epoch_range, acc)
plt.plot(epoch_range, val_acc)
plt.title('Training and validation accuracy')

plt.figure()

# Plot training and validation loss per epoch
plt.plot(epoch_range, train_loss)
plt.plot(epoch_range, val_loss)
plt.title('Training and validation loss')

NameError: name 'history' is not defined

As you can see, we are **overfitting** like it's going out of fashion. Our training accuracy (in blue) gets close to 100% (!) while our validation accuracy (in green) stalls at 70%. Our validation loss reaches its minimum after only five epochs.

Since we have a relatively small number of training examples (2000), overfitting should be our number one concern. Overfitting happens when a model exposed to too few examples learns patterns that do not generalize to new data, i.e. when the model starts using irrelevant features for making predictions. For instance, if you, as a human, only see three images of people who are lumberjacks, and three images of people who are sailors, and among them the only person wearing a cap is a lumberjack, you might start thinking that wearing a cap is a sign of being a lumberjack as opposed to a sailor. You would then make a pretty lousy lumberjack/sailor classifier.

Overfitting is the central problem in machine learning: given that we are fitting the parameters of our model to a given dataset, how can we make sure that the representations learned by the model will be applicable to data never seen before? How do we avoid learning things that are specific to the training data?
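
One simple way to read overfitting off the training curves is to find the epoch where the validation loss is lowest and to compare final training and validation accuracy. The sketch below does this on made-up numbers: the lists are hypothetical and are not results from this notebook, and `best_epoch` is a helper written for this example.

```python
def best_epoch(val_losses):
    """1-based index of the epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# Hypothetical curves for illustration only.
hypothetical_val_loss = [0.69, 0.62, 0.58, 0.56, 0.55, 0.57, 0.61, 0.70]
hypothetical_train_acc = [0.55, 0.68, 0.78, 0.86, 0.92, 0.96, 0.98, 0.99]
hypothetical_val_acc = [0.54, 0.62, 0.66, 0.69, 0.70, 0.70, 0.70, 0.70]

stop_at = best_epoch(hypothetical_val_loss)
gap = hypothetical_train_acc[-1] - hypothetical_val_acc[-1]
print(stop_at, gap)
```

A validation loss that bottoms out early while the accuracy gap keeps widening is the pattern described above.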

In the next exercise, we'll look at ways to prevent overfitting in our classification model.

## Clean Up

Before running the next exercise, run the following cell to terminate the kernel and free memory resources:

import os, signal
os.kill(os.getpid(), signal.SIGKILL)