

Copyright (c) University of Strasbourg. All Rights Reserved.

<a href="https://colab.research.google.com/github/CAMMA-public/ai4surgery/blob/master/ai4surgery.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div>
<a href="http://camma.u-strasbg.fr/">
<img src="https://raw.githubusercontent.com/CAMMA-public/ai4surgery/master/figs/data_camma_logo_tr.png" width="100" align="left"/>
</a>
</div>

<h1><center>AI for surgery: Hands-on material</center></h1>


**Artificial Intelligence in Surgery: An AI Primer for Surgical Practice**

**Neural Networks and Deep Learning**

_Deepak Alapatt and Pietro Mascagni, Vinkle Srivastav, Nicolas Padoy_


# Introduction

This hands-on lab demonstrates how designing and training a neural network works in practice. Specifically, in this notebook we will develop and train a simple neural network for surgical tool detection in laparoscopic images. With the advent of deep learning software libraries like TensorFlow, many of the steps involved have been abstracted away, making the process much simpler.

# Basics

Below is a code cell that assigns the values 3 and 4 to the variables *a* and *b*, respectively. The function *max* then returns the larger of *a* and *b*, which is assigned to a variable *c*. The values stored in all three variables are then printed to the screen.

In [None]:
# Comments following a # can be added and won't affect the code flow

a = 3
b = 4
c = max(a, b)                                             #assigns the maximum value of a and b to c
print(a, b, c)

To run the code in the above cell, select it with a click and then either press the play button on the left of the cell, or use the shortcut "Command/Ctrl+Enter". To edit the code, just type directly into the cell and re-run the cell to see the result.


# Imports

We import some libraries which contain functions that are useful for building and visualizing neural networks. For those unfamiliar with programming, just execute the cell below and move on to the next section.

In [None]:
#Install needed libraries

!pip install -q tensorflow==2.2.0
!pip install -q matplotlib==3.2.2
!pip install -q numpy==1.18.5

#Import needed libraries

# Tensorflow contains functions needed to build and train neural networks
import tensorflow as tf
# matplotlib is a plotting library to help visualize our inputs and outputs
import matplotlib.pyplot as plt
# numpy is a library to simplify operations on matrices and some other mathematical objects
import numpy as np
# random is a library to help generate random numbers, make random choices, etc
import random

print("Libraries successfully imported!")

# Dataset

The neural network will take an input image containing a surgical tool tip and classify the image as showing either a **grasper**, a **hook**, a **clipper** or a **scissor**. For this purpose, we will use **Cholec-tinytools**, a dataset of images of surgical tool tips generated from **Cholec80**. **Cholec80** is a widely used dataset containing 80 laparoscopic cholecystectomy videos annotated with surgical phases and tool presence, released by the [CAMMA research group](http://camma.u-strasbg.fr/) (University of Strasbourg, France).


![](https://raw.githubusercontent.com/CAMMA-public/ai4surgery/master/figs/dataset.png)

In [None]:
# Download and extract the tool tip dataset from an online repository 

DATA_URL = 'https://s3.unistra.fr/camma_public/github/ai4surgery/cholec-tinytools.zip'
path = tf.keras.utils.get_file('cholec-tinytools.zip', DATA_URL, extract=True)  
path = path[:-len('.zip')]                                    # Removes the ".zip" suffix so that "path" points to the extracted dataset folder

print("Dataset successfully extracted!")

Note: The code above will return an error ("name 'tf' is not defined") if you didn't import the libraries!

In [None]:
# Define the dataset characteristics

TRAINING_SET_SIZE = 1200                                      #~60% of the dataset for training
VALIDATION_SET_SIZE = 200                                     #~10% of the dataset for validation
TEST_SET_SIZE = 599                                           #~30% of the dataset for testing
CLASS_NAMES = ['grasper', 'hook', 'scissor', 'clipper']       # The names of surgical tools we want to classify
NUMBER_CLASSES = 4                                            # The number of surgical tool classes we want to distinguish
IMAGE_SIZE = (86, 128)                                        # Each input image will have a resolution of 86x128
CHANNELS = 3                                                  # RGB images are represented with 3 channels (red, green, blue)
BATCH_SIZE = 16                                               # Number of images per batch

print("Dataset characteristics defined!")

Each pixel of an RGB image is represented as a combination of red, green and blue values, each between 0 and 255.
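
As a small illustration of this representation, the sketch below rescales one pixel from the 0-255 range to the 0-1 range, which is what the `rescale=1./255` argument in the preprocessing cell does for every pixel. The pixel value itself is a made-up example, not taken from the dataset.

```python
# Hypothetical example pixel (not from the dataset): an orange colour as (red, green, blue)
pixel = (255, 128, 0)

# Divide each channel by 255 so that all values lie between 0 and 1
rescaled = tuple(value / 255 for value in pixel)

print(rescaled)
```

Neural networks generally train more reliably on small, similarly scaled inputs, which is why this rescaling is applied before the images are fed to the model.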


In [None]:
# Preprocess the images in the dataset

# To artificially add variety to the training set, each training image is randomly rotated by up to 20 degrees in either direction.
train_preprocessing = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, rotation_range=20)
train_set = tf.keras.preprocessing.image.DirectoryIterator(
    path +'/train',
    image_data_generator=train_preprocessing,
    batch_size=BATCH_SIZE,
    target_size=IMAGE_SIZE,
    classes=CLASS_NAMES,
    shuffle=True,
)

# The validation and test images are also rescaled by 1/255 (divided by 255) for consistency.
validation_test_preprocessing = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
validation_set = tf.keras.preprocessing.image.DirectoryIterator(
    path +'/validation',
    image_data_generator=validation_test_preprocessing,
    batch_size=BATCH_SIZE,
    classes=CLASS_NAMES,
    target_size=IMAGE_SIZE
)

test_set = tf.keras.preprocessing.image.DirectoryIterator(
    path +'/test',
    image_data_generator=validation_test_preprocessing,
    batch_size=BATCH_SIZE,
    classes=CLASS_NAMES,
    target_size=IMAGE_SIZE
)

print("\nDataset successfully preprocessed!")

In [None]:
#Preview 4 random images from the training set

fig, axes = plt.subplots(1,4, figsize=(15, 15))               #define the figure using a matplotlib.pyplot function
random_batch =  random.randint(0, len(train_set)-1)
random_images, random_labels = train_set[random_batch]

for axs in axes:                                              #for loops are used to iterate over a sequence   
    i =  random.randint(0, len(random_images)-1)
    img = random_images[i]
    axs.imshow(img)
    axs.axis("off")
    axs.set_title(CLASS_NAMES[np.argmax(random_labels[i])])

Rerun the cell above to visualize the different appearances of the 4 types of tool tips in the dataset.

# Convolutions



<div>
<center>
<img src="https://raw.githubusercontent.com/CAMMA-public/ai4surgery/master/figs/convolution.gif" width="350" /> </center>
</div>

Here we will apply filters for edge detection to demonstrate the kind of information a single convolutional filter can extract from an image.

Kernels for edge detection

<div>
<center>
<img src="https://raw.githubusercontent.com/CAMMA-public/ai4surgery/master/figs/sobel_filter.png" width="450" /> </center>
</div>


The above kernels or filters contain [known weights](https://en.wikipedia.org/wiki/Sobel_operator) for horizontal and vertical edge detection.
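
To make the sliding-window operation concrete, here is a minimal pure-Python sketch of what a single filter computes, assuming no padding and a stride of 1. The helper `correlate2d_valid` and the toy images are illustrative names and data, not part of the notebook. Like `tf.nn.conv2d`, it multiplies the kernel with each image patch without flipping the kernel.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Same weights as the first Sobel kernel used in this notebook
sobel_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# Toy 4x4 images: one with a dark left half and bright right half, one uniform
step_image = [[0, 0, 1, 1] for _ in range(4)]
flat_image = [[1, 1, 1, 1] for _ in range(4)]

step_response = correlate2d_valid(step_image, sobel_kernel)
flat_response = correlate2d_valid(flat_image, sobel_kernel)
print(step_response)  # [[4, 4], [4, 4]]
print(flat_response)  # [[0, 0], [0, 0]]
```

The filter responds strongly where the intensity changes from left to right and gives zero on the uniform image, which is the behaviour the edge maps below visualize on a real tool image.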


In [None]:
# Defining tensors representing edge detection kernels

# Horizontal filter (w_h)
w_h = tf.constant([[-1,0,1],[-2,0,2],[-1,0,1]], dtype=tf.float32, shape=(3, 3, 1, 1))

# Transpose for vertical filter (w_v)
w_v = tf.transpose(w_h, [1,0,2,3])

# Display filters
print('Horizontal filter')
print(w_h[:,:,0,0].numpy())

print('Vertical filter')
print(w_v[:,:,0,0].numpy())

print("Edge detectors tensors successfully defined")

In [None]:
# Apply a convolution for edge detection

# Read image
img = plt.imread(path + "/train/hook/7013_16226.png")

# Convert the image to grayscale
img_gray = tf.image.rgb_to_grayscale(img)

# Expand to have 4 dimensional (4D) image tensor
# This is done because the library expects the input to the convolution to be in the shape 
# Batch x Height x Width x Channels
img_4d = tf.expand_dims(img_gray, 0)

# Convolution on the image with horizontal filter
conv_h = tf.nn.conv2d(input=img_4d, filters=w_h, strides=1, padding="SAME") # The filter is shifted by 1 pixel (strides=1) at every step; zero padding is applied

# Convolution on the image with vertical filter
conv_v = tf.nn.conv2d(input=img_4d, filters=w_v, strides=1, padding="SAME") # The output will have the same spatial dimensions as the input (strides=1, padding="SAME")

# Combine conv_h and conv_v: sqrt(horizontal_edges^2 + vertical_edges^2)
conv_img = tf.sqrt(tf.pow(conv_h,2) + tf.pow(conv_v,2))

# Visualize  
fig = plt.figure(figsize=(30,5))

# Display input
fig.add_subplot(1, 5, 1); plt.imshow(img_4d.numpy()[0,:,:,0], 'gray'); plt.title("Input Gray"); plt.axis("off")

# Display detected horizontal edges
fig.add_subplot(1, 5, 2); plt.imshow(conv_h.numpy()[0,:,:,0], 'gray'); plt.title("Horizontal edges"); plt.axis("off")

# Display detected vertical edges
fig.add_subplot(1, 5, 3); plt.imshow(conv_v.numpy()[0,:,:,0], 'gray'); plt.title("Vertical edges"); plt.axis("off")

# Display combined horizontal and vertical edges
fig.add_subplot(1, 5, 4); plt.imshow(conv_img.numpy()[0,:,:,0], 'gray'); plt.title("Combined"); plt.axis("off")

plt.show()

When building neural networks, filter weights will be learned to extract and aggregate information relevant to the target task.

## Convolution using `tf.keras`



Keras is a high-level interface library running on top of TensorFlow and other machine learning frameworks. It was designed to be more intuitive and user-friendly to enable fast experimentation with deep neural networks.

In [None]:
#Use keras for convolution and visualize the result

# Read image
img      = plt.imread(path + "/train/hook/7013_16226.png")

# Expand to have 4 dimensional (4D) image tensor
# Batch x Height x Width x Channels
img_4d   = tf.expand_dims(img, 0)

# Convolution operation
conv_img = tf.keras.layers.Conv2D(filters=3, kernel_size=(3, 3), strides=1, padding="SAME", input_shape=(86, 128, 3))(img_4d)
conv_img = conv_img.numpy()[0]

# Normalize so that our output values lie within a valid range for display, e.g. 0-1
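# (named min_val/max_val so that the built-in functions min and max are not overwritten)
min_val, max_val = conv_img.min(), conv_img.max()
conv_img = (conv_img - min_val)/(max_val - min_val)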

# Visualize
fig = plt.figure(figsize=(15,5))

# Display input
fig.add_subplot(1, 2, 1); plt.imshow(img);plt.title("Input"); plt.axis("off")

# Display convolution output
fig.add_subplot(1, 2, 2); plt.imshow(conv_img);plt.title("Output"); plt.axis("off")

plt.show()

Here the kernel weights are randomly initialized and not trained to extract any particular feature, hence the output changes at every execution. It demonstrates the kind of information that can be filtered out or emphasized by a single convolution.


## Pooling


<div>
<center>
<img src="https://raw.githubusercontent.com/CAMMA-public/ai4surgery/master/figs/pooling.png" width="600" /> </center>
</div>


Pooling layers apply mathematical operations such as averaging and taking the maximum to reduce the size of the inputs. Such mathematical operations are applied through sliding filters, similar to convolutional filters.
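
The following pure-Python sketch shows both operations on a tiny 4x4 grid of made-up numbers, using a 2x2 window moved in steps of 2. The helper names (`pool2d`, `largest`, `mean`) are illustrative and not part of the notebook or of Keras.

```python
def pool2d(image, size, stride, reduce_fn):
    """Apply `reduce_fn` to each size x size window of `image`, without padding."""
    output = []
    for i in range(0, len(image) - size + 1, stride):
        row = []
        for j in range(0, len(image[0]) - size + 1, stride):
            window = [image[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        output.append(row)
    return output

def largest(values):
    # sorted(...)[-1] picks the largest value in the window
    return sorted(values)[-1]

def mean(values):
    return sum(values) / len(values)

# Made-up 4x4 input
toy = [[1, 3, 2, 0],
       [5, 7, 1, 1],
       [0, 2, 4, 8],
       [2, 0, 6, 6]]

max_pooled = pool2d(toy, size=2, stride=2, reduce_fn=largest)
avg_pooled = pool2d(toy, size=2, stride=2, reduce_fn=mean)
print(max_pooled)  # [[7, 2], [2, 8]]
print(avg_pooled)  # [[4.0, 1.0], [1.0, 6.0]]
```

Each 2x2 block of the input is summarized by a single number, so the 4x4 grid shrinks to 2x2. Max pooling keeps the strongest value in each block, while average pooling smooths the block.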

In [None]:
# Prepare and visualize an input image

# Read image
img = plt.imread(path + "/train/hook/7013_16226.png")

# Expand to have 4 dimensional (4D) image tensor
img_4d = tf.expand_dims(img, 0)

# Visualize input
img = img_4d.numpy()[0]
plt.imshow(img)
plt.title("input")
plt.axis("off")
plt.show()

We will now apply max pooling and average pooling to visualize their different effects on the inputs, and experiment with different filter sizes. Feel free to explore other filter sizes.

In [None]:
# Experimenting with max pooling filter sizes

# Max Pooling using 5x5 kernel, strides of 2, no padding ('valid')
maxpooled_img5x5 =  tf.keras.layers.MaxPool2D(pool_size=(5, 5), strides=2, padding='valid')(img_4d)

# Max Pooling using 10x10 kernel, strides of 2, no padding ('valid')
maxpooled_img10x10 = tf.keras.layers.MaxPool2D(pool_size=(10, 10), strides=2, padding='valid')(img_4d)

# Max Pooling using 20x20 kernel, strides of 2, no padding ('valid')
maxpooled_img20x20 =  tf.keras.layers.MaxPool2D(pool_size=(20, 20), strides=2, padding='valid')(img_4d)

# Visualize max pooling output
fig = plt.figure(figsize=(10,5))

fig.add_subplot(1, 3, 1); plt.imshow(maxpooled_img5x5.numpy()[0]);plt.title("max pool 5x5"); plt.axis("off")

fig.add_subplot(1, 3, 2); plt.imshow(maxpooled_img10x10.numpy()[0]);plt.title("max pool 10x10"); plt.axis("off")

fig.add_subplot(1, 3, 3); plt.imshow(maxpooled_img20x20.numpy()[0]);plt.title("max pool 20x20"); plt.axis("off")

plt.show()

In [None]:
#Experimenting with average pooling filter sizes

# Average Pooling using 5x5 kernel, strides of 2, no padding ('valid')
avgpooled_img5x5 = tf.keras.layers.AveragePooling2D(pool_size=(5, 5), strides=2, padding='valid')(img_4d)

# Average Pooling using 10x10 kernel, strides of 2, no padding ('valid')
avgpooled_img10x10 = tf.keras.layers.AveragePooling2D(pool_size=(10, 10), strides=2, padding='valid')(img_4d)

# Average Pooling using 20x20 kernel, strides of 2, no padding ('valid')
avgpooled_img20x20 = tf.keras.layers.AveragePooling2D(pool_size=(20, 20), strides=2, padding='valid')(img_4d)

# Visualize average pooling output
fig = plt.figure(figsize=(10,5))

fig.add_subplot(1, 3, 1); plt.imshow(avgpooled_img5x5.numpy()[0]);plt.title("avg pool 5x5"); plt.axis("off")

fig.add_subplot(1, 3, 2); plt.imshow(avgpooled_img10x10.numpy()[0]);plt.title("avg pool 10x10"); plt.axis("off")

fig.add_subplot(1, 3, 3); plt.imshow(avgpooled_img20x20.numpy()[0]);plt.title("avg pool 20x20"); plt.axis("off")

plt.show()

Note that the pooled figures have been scaled up for visualization.
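
The actual pooled outputs are smaller than the 86x128 input. As a rough check, the sketch below applies the standard output-size formula for pooling without padding, floor((input - pool) / stride) + 1, to the three pool sizes used above. The helper `pooled_size` is an illustrative name, not a Keras function.

```python
def pooled_size(input_size, pool_size, stride):
    """Output length along one axis for pooling without padding ('valid')."""
    return (input_size - pool_size) // stride + 1

height, width = 86, 128   # image resolution used in this notebook
stride = 2                # stride used in the pooling cells

sizes = {}
for pool in (5, 10, 20):
    sizes[pool] = (pooled_size(height, pool, stride), pooled_size(width, pool, stride))
    print(f"pool {pool}x{pool}: output {sizes[pool][0]}x{sizes[pool][1]}")
```

These values can be compared against the shapes of the pooled tensors in the notebook, for example with `maxpooled_img5x5.shape`.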


# Tool Classification

It is now time to build a neural network for surgical tool classification. The classification model is defined below as a sequential stack of layers (illustrated in the figure).


<div>
<center>
<img src="https://raw.githubusercontent.com/CAMMA-public/ai4surgery/master/figs/cnn.png" width="750" /> </center>
</div>

In [None]:
#Defining the neural network architecture

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(filters=16, kernel_size=5, activation="relu", input_shape=(86, 128, 3))) # Adding a convolution with 16 5x5 filters followed by a ReLU activation
model.add(tf.keras.layers.MaxPooling2D(pool_size=(5,5)))                        # Adding max pooling over 5x5 patches of the previous layer's output
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation="relu")) # Adding a second convolution with 32 3x3 filters followed by a ReLU activation
model.add(tf.keras.layers.Flatten())                                            # Adding a layer to flatten the multidimensional (HxWxC) input
model.add(tf.keras.layers.Dense(units=4096, activation="relu"))                 # Adding a fully connected layer with 4096 outputs followed by a ReLU activation
model.add(tf.keras.layers.Dense(units=2048, activation="relu"))                 # Adding a fully connected layer with 2048 outputs followed by a ReLU activation
model.add(tf.keras.layers.Dense(units=NUMBER_CLASSES, activation="softmax"))    # Adding a fully connected layer with 4 outputs followed by a softmax activation

print("Neural network architecture successfully defined!")

Notes:
*   The output of the convolutional layer is a 3-dimensional HxWxC tensor for each element in the batch. This should be flattened to a single row of `H*W*C` elements before applying a fully connected layer.
*   Fully connected layers are sometimes called dense layers because each output (here referred to as a unit) is densely connected to the previous layer.
*   The model ends with a fully connected layer with 4 outputs and a softmax activation. This activation function is used to convert the output of the previous fully connected layer into a vector of 4 probabilities that sum to one, i.e. the probability of each surgical instrument being in the image.
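
To illustrate the softmax step described in the last note, here is a minimal pure-Python sketch. The four raw scores are invented for the example, and the class order is assumed to follow `CLASS_NAMES` as defined earlier in the notebook.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class_names = ['grasper', 'hook', 'scissor', 'clipper']   # same order as CLASS_NAMES
raw_scores = [2.0, 1.0, 0.1, -1.0]                        # hypothetical outputs of the last dense layer

probabilities = softmax(raw_scores)
for name, p in zip(class_names, probabilities):
    print(f"{name}: {p:.3f}")
print("sum:", sum(probabilities))
```

The largest raw score receives the largest probability, and the ordering of the classes is preserved; softmax only rescales the scores so that they can be read as probabilities.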


In [None]:
#Defining the optimization method, a loss function and a metric

opt = tf.keras.optimizers.SGD(learning_rate=0.01)                                   # The optimization method is stochastic gradient descent (SGD)
model.build([1, 86, 128, 3])                                                        # We provide a sample input shape to build our model: batch size x height x width x channels
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy']) # Categorical cross entropy is a commonly used loss for classification problems
model.summary()                                                                     # model.summary generates a neat summary listing the number of parameters in total and by layer

Note: **None** in the output shapes indicates that any batch size can be passed through the network using the same architecture.
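
The parameter counts in the summary can be reproduced by hand. The sketch below does so for the architecture defined above, assuming the Keras defaults of no padding for the convolutions and a pooling stride equal to the pool size. The helper names are illustrative.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights of all filters plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-output pair plus one bias per unit."""
    return inputs * units + units

# Track the spatial size of the feature maps through the network
h, w = 86 - 5 + 1, 128 - 5 + 1    # 5x5 convolution, no padding -> 82 x 124
h, w = h // 5, w // 5             # 5x5 max pooling -> 16 x 24
h, w = h - 3 + 1, w - 3 + 1       # 3x3 convolution, no padding -> 14 x 22
flattened = h * w * 32            # 32 feature maps are flattened into one row

counts = {
    'conv 5x5, 16 filters': conv_params(5, 3, 16),
    'conv 3x3, 32 filters': conv_params(3, 16, 32),
    'dense 4096': dense_params(flattened, 4096),
    'dense 2048': dense_params(4096, 2048),
    'dense 4 (softmax)': dense_params(2048, 4),
}
for layer, count in counts.items():
    print(f"{layer}: {count:,}")
total = sum(counts.values())
print(f"total: {total:,}")
```

Almost all of the parameters sit in the first fully connected layer, because every one of its 4096 units is connected to every flattened feature.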

Let's run the untrained network on a single image to see what the network input and output look like.

In [None]:
# Read image
img = plt.imread(path + "/train/hook/7013_16226.png")
print('Input:')
plt.imshow(img)
plt.show()

# Expand to have 4 dimensional (4D) image tensor
img_4d = tf.expand_dims(img, 0)

prediction = model.predict(img_4d)                                   # Uses the untrained model defined above to predict  
print('Output:')
print(prediction[0])
print(CLASS_NAMES)

The network predicts 4 probability values, one for each class. Note that since we used a softmax activation in our last layer, the 4 probabilities sum to 1. Right now, the predictions are essentially arbitrary because the weights are randomly initialized; we will train the network to learn parameters that make better predictions.
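
Training needs a number that says how wrong a prediction is. The model above was compiled with categorical cross entropy; the sketch below computes this loss by hand for a single image, under the assumption that the label is one-hot encoded in `CLASS_NAMES` order. Both prediction vectors are invented for illustration.

```python
import math

def categorical_crossentropy(one_hot_label, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probabilities))

label = [0, 1, 0, 0]                     # the image shows a hook
uncertain = [0.25, 0.25, 0.25, 0.25]     # hypothetical untrained-style prediction
confident = [0.05, 0.85, 0.05, 0.05]     # hypothetical prediction after training

loss_uncertain = categorical_crossentropy(label, uncertain)
loss_confident = categorical_crossentropy(label, confident)
print(loss_uncertain, loss_confident)
```

The loss shrinks as the probability assigned to the correct class grows, which is the quantity that training tries to minimize over the whole training set.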

In [None]:
#Training our neural network
history = model.fit(train_set, validation_data=validation_set, epochs=15)       # The model will iterate over the training data 15 times (epochs)

#Plots the training and validation accuracy for each training epoch
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.show()

The above graph plots the neural network's accuracy on the training set and on the validation set over the training epochs. When the two curves stay close together, training is effective; when they diverge (i.e. accuracy on the training set becomes clearly higher than on the validation set), the model is overfitting to the training data. An overfitted model will fail to generalize to unseen data (i.e. it will perform poorly on test data).
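
One simple way to read such curves numerically is to look at the gap between training and validation accuracy at each epoch. The sketch below does this on a made-up history dictionary with the same keys as `history.history`. The threshold of 0.1 is an arbitrary choice for illustration, not a recommended value.

```python
def accuracy_gaps(history):
    """Training accuracy minus validation accuracy, per epoch."""
    return [t - v for t, v in zip(history['accuracy'], history['val_accuracy'])]

# Made-up values for illustration; in the notebook, use history.history instead
toy_history = {
    'accuracy':     [0.50, 0.70, 0.85, 0.95],
    'val_accuracy': [0.48, 0.66, 0.72, 0.71],
}

gap_threshold = 0.1   # arbitrary cut-off for this example
gaps = accuracy_gaps(toy_history)
flagged_epochs = [epoch for epoch, gap in enumerate(gaps, start=1) if gap > gap_threshold]
print(flagged_epochs)  # [3, 4]
```

In this toy example the training accuracy keeps rising after epoch 2 while the validation accuracy stalls, so the later epochs are flagged as likely overfitting.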

In [None]:
model.evaluate(test_set)

The **evaluate** function returns the model loss and accuracy (first and second number in square brackets, respectively) on the test set.
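
For intuition about the accuracy value, the sketch below computes it by hand: the predicted class is the one with the highest probability, and accuracy is the fraction of images for which this matches the true class. The predictions and labels are invented, and `predicted_class` is an illustrative helper rather than a Keras function.

```python
def predicted_class(probabilities):
    """Index of the class with the highest probability."""
    return probabilities.index(sorted(probabilities)[-1])

# Hypothetical softmax outputs for four images (class order as in CLASS_NAMES)
predictions = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.6, 0.2, 0.1, 0.1],
]
true_classes = [0, 1, 2, 0]   # hypothetical ground-truth class indices

predicted = [predicted_class(p) for p in predictions]
correct = sum(1 for p, t in zip(predicted, true_classes) if p == t)
accuracy = correct / len(true_classes)
print(predicted, accuracy)  # [0, 1, 3, 0] 0.75
```

Three of the four toy images are classified correctly, giving an accuracy of 0.75.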

# Design your surgical tool classifier

Here's a copy of the same model to play around with. Your aim should be to design a network architecture and pick its corresponding hyperparameters to maximize the validation accuracy.

*   See what effect increasing and decreasing the number of epochs has on training
*   Play around with the learning rate to see what's the optimal value
*   Try changing the parameters (number of filters, kernel size, etc...)
*   Try out different activation functions ("tanh", "relu", etc...)
*   Try adding or removing layers
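
To build some intuition for the learning rate before experimenting, the sketch below runs plain gradient descent on a toy one-parameter problem, minimizing f(w) = w^2, whose minimum is at w = 0. This is a deliberately simplified stand-in for network training, and the three learning rates are arbitrary examples.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimize f(w) = w**2 by repeatedly stepping against the gradient 2*w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient
    return w

results = {lr: gradient_descent(lr) for lr in (0.01, 0.1, 1.1)}
for lr, final_w in results.items():
    print(f"learning rate {lr}: w after 20 steps = {final_w:.4f}")
```

With a very small learning rate the parameter moves toward the minimum only slowly, with a moderate one it gets close within 20 steps, and with a learning rate that is too large each step overshoots and the value moves further away from the minimum.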


In [None]:
my_model = tf.keras.Sequential()
my_model.add(tf.keras.layers.Conv2D(filters=16, kernel_size=5, activation="relu"))
my_model.add(tf.keras.layers.MaxPooling2D(pool_size=(5,5)))
my_model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation="relu"))
my_model.add(tf.keras.layers.Flatten())
my_model.add(tf.keras.layers.Dense(units=4096, activation="relu"))
my_model.add(tf.keras.layers.Dense(units=2048, activation="relu"))
my_model.add(tf.keras.layers.Dense(units=NUMBER_CLASSES, activation="softmax"))

opt = tf.keras.optimizers.SGD(learning_rate=0.01)
my_model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

history = my_model.fit(train_set, validation_data=validation_set, epochs=15)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.show()

In [None]:
my_model.evaluate(test_set)