This notebook trains a single large CNN initialized from pretrained weights. It is expected to perform decently, but it will not match a more complex model.

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import *
import pydicom as dcm
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import os
import math
from PIL import Image

In [3]:
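physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:  # avoid an IndexError on machines without a visible GPU
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
keras.mixed_precision.set_global_policy('mixed_float16')
# note: this variable is typically read when TensorFlow is imported, so setting it here may have no effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'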
%matplotlib inline

INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: A100-SXM4-40GB, compute capability 8.0


In [4]:
model_path = '/home/jupyter/base-cnn-model/checkpoint.ckpt'
train_img_dir = 'rsna-intracranial-hemorrhage-detection/stage_2_train_imgs/'
train_label_path = 'rsna-intracranial-hemorrhage-detection/train_labels.csv'

In [5]:
model = keras.models.load_model(model_path)
model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=False), 
              metrics=['binary_accuracy', 
                       keras.metrics.AUC(multi_label=True, num_labels=6, from_logits=False),
                       keras.metrics.Precision(), keras.metrics.Recall()],
             optimizer=keras.optimizers.Nadam(learning_rate=1e-4))

In [6]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 resnet101 (Functional)      (None, 16, 16, 2048)      42658176  
                                                                 
 global_average_pooling2d (G  (None, 2048)             0         
 lobalAveragePooling2D)                                          
                                                                 
 dense (Dense)               (None, 6)                 12294     
                                                                 
Total params: 42,670,470
Trainable params: 42,565,126
Non-trainable params: 105,344
_________________________________________________________________


In [7]:
labels = pd.read_csv(train_label_path)
labels = {l[0]: l[1:].astype(np.int8) for l in labels.to_numpy()}

In [8]:
def get_img_tensor(img_path):
    img = Image.open(img_path)
    
    return tf.convert_to_tensor(np.asarray(img, dtype=np.float32) / 255.)

In [9]:
class RSNASequence(keras.utils.Sequence):
    def __init__(self, img_dir, data, labels, batch_size):
        '''Serves (image batch, label batch) pairs to Keras.
        data is the list of image IDs (file names without the extension) and labels maps each
        image ID to its label vector. The IDs are reshuffled at the end of every epoch.'''
        
        self.img_dir = img_dir
        self.x = data
        self.labels = labels
        self.batch_size = batch_size
        
        # self.set_x()
        
    # def set_x(self):
        # ind = np.random.choice(list(range(len(os.listdir(self.img_dir)))), size=self.num_train, replace=False)
        # self.x = [img_name.split('.')[0] for img_name in np.array(os.listdir(self.img_dir))[ind]]
    
    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)
    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = [self.labels[img_id] for img_id in batch_x]
        
        return (tf.stack([get_img_tensor(self.img_dir+img_id+'.png') for img_id in batch_x], axis=0), 
               tf.convert_to_tensor(batch_y))
    
    def on_epoch_end(self):
        # self.set_x()
        np.random.shuffle(self.x)
    

In [10]:
batch_size = 32
imgs = list(map(lambda x: x.split('.')[0], os.listdir(train_img_dir)))
train_sequence = RSNASequence(train_img_dir, imgs, labels, batch_size)
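
The batching arithmetic in `__len__` and `__getitem__` can be checked in isolation. The sketch below is a minimal illustration, not part of the training code: `batch_slices` is a hypothetical helper, and the item count of 70 is an arbitrary example value. It shows that rounding up with `math.ceil` yields one extra, smaller batch whenever the number of images is not a multiple of the batch size.

```python
import math


def batch_slices(num_items, batch_size):
    """Return one (start, stop) index pair per batch, covering all items."""
    num_batches = math.ceil(num_items / batch_size)
    return [
        (i * batch_size, min((i + 1) * batch_size, num_items))
        for i in range(num_batches)
    ]


# Hypothetical example: 70 image IDs with the batch size used in this notebook.
slices = batch_slices(num_items=70, batch_size=32)
print(len(slices))  # number of batches per epoch
print(slices[-1])   # the last batch is shorter than batch_size
```

Python list slicing clips an out-of-range stop index automatically, which is why `__getitem__` above does not need the explicit `min`.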

In [11]:
cp_callback = keras.callbacks.ModelCheckpoint(filepath='reproduce_training_2/checkpoint.ckpt',
                                              save_weights_only=False,
                                              verbose=1,
                                              save_freq=1000
                                             )

In [26]:
model.fit(x=train_sequence, epochs=1, callbacks=[cp_callback])

  999/23517 [>.............................] - ETA: 2:59:43 - loss: 0.0648 - binary_accuracy: 0.9769 - auc_1: 0.9703 - precision_1: 0.8561 - recall_1: 0.7034
Epoch 00001: saving model to reproduce_training_2/checkpoint.ckpt
INFO:tensorflow:Assets written to: reproduce_training_2/checkpoint.ckpt/assets


  layer_config = serialize_layer_fn(layer)
  return generic_utils.serialize_keras_object(obj)


 1999/23517 [=>............................] - ETA: 3:01:22 - loss: 0.0648 - binary_accuracy: 0.9767 - auc_1: 0.9704 - precision_1: 0.8560 - recall_1: 0.7049
Epoch 00001: saving model to reproduce_training_2/checkpoint.ckpt
INFO:tensorflow:Assets written to: reproduce_training_2/checkpoint.ckpt/assets


  layer_config = serialize_layer_fn(layer)
  return generic_utils.serialize_keras_object(obj)


 2417/23517 [==>...........................] - ETA: 3:04:22 - loss: 0.0652 - binary_accuracy: 0.9767 - auc_1: 0.9699 - precision_1: 0.8555 - recall_1: 0.7067

KeyboardInterrupt: 

In [13]:
num_eval = 10000
# evaluate on a random subset of the training images (no separate validation split is used here)
eval_ind = np.random.choice(len(imgs), size=num_eval, replace=False)
eval_sequence = RSNASequence(train_img_dir, np.array(imgs)[eval_ind], labels, batch_size)
model.evaluate(x=eval_sequence)



[0.07517775148153305,
 0.9747167229652405,
 0.9538996815681458,
 0.8938660025596619,
 0.6385125517845154]
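
The list above is easier to read with names attached. The sketch below assumes the values come back in the order loss first, then the metrics in the order they were passed to `compile`; the short names are labels chosen for illustration (Keras appends suffixes such as `auc_1`). The F1 score is not computed anywhere in the notebook and is added here only as one possible way to summarize precision and recall in a single number.

```python
# Values copied from the evaluation output above.
eval_values = [
    0.07517775148153305,
    0.9747167229652405,
    0.9538996815681458,
    0.8938660025596619,
    0.6385125517845154,
]
# Assumed order: loss, then the metrics in compile order (illustrative names).
metric_names = ["loss", "binary_accuracy", "auc", "precision", "recall"]
results = dict(zip(metric_names, eval_values))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(results["precision"], results["recall"])
for name, value in results.items():
    print(f"{name}: {value:.4f}")
print(f"f1: {f1:.4f}")
```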

This model is a ResNet101 pretrained on ImageNet, modified to accept DICOM-derived images of size (512, 512, 3) and to feed a custom dense layer that returns 6 predictions corresponding to the probabilities of the subtypes of ICH being present in the image. The entire network was fine-tuned with a low learning rate (10^-4) using the Nadam optimizer, as configured in the compile call above. This model serves only as a baseline for the minimum performance that an ICH detection model should have.
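
The saved model is loaded from a checkpoint, so the code that built it does not appear in this notebook. The sketch below is one way an architecture matching the summary above could be assembled. `build_baseline` is a hypothetical helper, and the sigmoid output activation is an assumption inferred from `from_logits=False` in the loss.

```python
from tensorflow import keras


def build_baseline(weights="imagenet", input_shape=(512, 512, 3), num_labels=6):
    """ResNet101 backbone -> global average pooling -> dense head."""
    backbone = keras.applications.ResNet101(
        include_top=False,  # drop the 1000-class ImageNet classifier
        weights=weights,
        input_shape=input_shape,
    )
    return keras.Sequential([
        keras.Input(shape=input_shape),
        backbone,
        keras.layers.GlobalAveragePooling2D(),
        # Assumption: sigmoid outputs, one independent probability per label.
        keras.layers.Dense(num_labels, activation="sigmoid"),
    ])


# weights=None skips the ImageNet download; use the default to start from pretrained weights.
model = build_baseline(weights=None)
print(model.output_shape)
print(model.count_params())
```

The dense head accounts for 2048 * 6 + 6 = 12,294 parameters, which agrees with the `dense` row in the summary.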

The model performs fairly well. The most important metrics here are precision and recall; binary accuracy is NOT indicative of model performance because most images do not contain ICH. The precision and recall during training are reasonably good (~85% and ~70% respectively): when the model flags ICH it is usually right, and it finds most of the ICH present in a given slice of a CT scan. However, a recall of 70% also means that it misses roughly 30% of the positive labels. Additionally, the evaluation cell above shows that its performance shifts when it is given a random subset of the data (precision ~89%, recall ~64%); this is likely because positive examples are so rare that the model's predictions on them are uncertain.

Also, in hindsight it was probably not necessary to include binary accuracy as an evaluation metric, because the negative labels far outnumber the positive ones.
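
To make the point about binary accuracy concrete, the sketch below scores a trivial classifier that always predicts "no ICH". The 5% positive rate is a made-up figure for illustration, not the prevalence of this dataset, and `summarize` is a hypothetical helper.

```python
def summarize(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    # Report 0.0 when a ratio is undefined (no positive predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical data: 1000 labels, 5% positive, and a model that never predicts ICH.
y_true = [1] * 50 + [0] * 950
y_pred = [0] * 1000

accuracy, precision, recall = summarize(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

The always-negative classifier scores high on accuracy while detecting nothing, which is why precision and recall are the metrics to watch here.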

Note: it is impractical to automate the process of selecting different models and experimenting with hyperparameter settings, because models of this type take a very long time to train (roughly three hours per epoch according to the progress log above) at a very high computational cost, even on a VM with an NVIDIA A100 GPU, one of the fastest GPUs available at the time of writing. Therefore, I was only able to experiment with 3 models.