# Introduction
We are going to classify PET and CT images using a simple convolutional neural network. The data come from The Cancer Imaging Archive:
> _Martin Vallières, Emily Kay-Rivest, Léo Jean Perrin, Xavier Liem, Christophe Furstoss, Nader Khaouam, Phuc Félix Nguyen-Tan, Chang-Shu Wang, Khalil Sultanem. (2017). Data from Head-Neck-PET-CT. The Cancer Imaging Archive. doi: 10.7937/K9/TCIA.2017.8oje5q00_

Images have been cropped to the head and neck area and converted into .png axial slices. 

Note: make sure your session is connected to a GPU. From the dropdown menu: __Runtime > Change Runtime Type > GPU (or TPU)__

## Download data to Colab environment
First we need to download the data (hosted in cloud storage) into our Colab session. <br>
Then we untar and decompress it "locally" to our /home/ folder.

In [0]:
# download the data from the cloud
! wget -N https://f001.backblazeb2.com/file/snmmi-hands-on-ai/hn_tcia_classification.tar.gz -P /home

# this should take a few seconds, please be patient
print('decompressing...')
! tar xzf "/home/hn_tcia_classification.tar.gz" --directory /home
print('done!')

## Set up the image generators

TensorFlow emits some deprecation warnings related to version changes. We silence them for this session.

In [0]:
from tensorflow.python.util import deprecation
deprecation._PRINT_DEPRECATION_WARNINGS = False

The data are split into train (50%), validation (25%), and test (25%) sets.
We'll use Keras's ImageDataGenerator class to read in the data. The data (.png files) need to be sorted into folders with the following structure: <br>
>__train__/<br>
&ensp;&ensp;Class1/<br>
&ensp;&ensp;&ensp;&ensp;xx1.png<br>
&ensp;&ensp;&ensp;&ensp;xx2.png<br>
&ensp;&ensp;&ensp;&ensp;...<br>
&ensp;&ensp;Class2/<br>
&ensp;&ensp;&ensp;&ensp;yy1.png<br>
&ensp;&ensp;&ensp;&ensp;yy2.png<br>
&ensp;&ensp;&ensp;&ensp;...<br>
__validation__/<br>
&ensp;&ensp;Class1/  ...<br>
&ensp;&ensp;Class2/  ...<br>
__test__/<br>
&ensp;&ensp;Class1/ ...<br>
&ensp;&ensp;Class2/ ...<br>

We tell Keras where the directories are. It counts the number of subfolders and makes each one a class.
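Before handing a directory to Keras, it can help to confirm that the folder layout matches the structure above. The following is a minimal sketch using only the standard library; the helper name `count_images_per_class` is hypothetical and not part of Keras. It lists the class subfolders in sorted order, which mirrors the alphanumeric order in which `flow_from_directory` assigns class indices.

```python
import os

def count_images_per_class(split_dir, extension=".png"):
    """Return {class_name: number_of_image_files} for each subfolder of split_dir.

    Hypothetical helper for sanity-checking the folder layout; not a Keras function.
    """
    counts = {}
    for entry in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, entry)
        if os.path.isdir(class_dir):
            # Count only files with the expected extension
            counts[entry] = sum(
                1 for name in os.listdir(class_dir)
                if name.lower().endswith(extension)
            )
    return counts

# Example usage (assumes the archive was extracted under /home as described earlier):
# print(count_images_per_class('/home/hn_tcia_classification/train'))
```

If a split reports more or fewer than two class folders, the binary setup used in this notebook would not apply as written.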

Next we import the packages we need. From Keras, we first need the image data generator class.

In [0]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

When we define the ImageDataGenerator object, we tell it to normalize the .png images by their maximum possible value (255), so that pixel values fall in the range [0, 1].

In [0]:
train_datagen = ImageDataGenerator(rescale=1./255)
valid_datagen = ImageDataGenerator(rescale=1./255)
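To illustrate what `rescale=1./255` does to the pixel values, here is a small NumPy sketch. The 2x2 array is made-up data standing in for an 8-bit image slice, not a slice from the dataset.

```python
import numpy as np

# Made-up 8-bit "image" (real slices are read from the .png files by Keras)
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Same multiplication that the rescale argument applies to each image
rescaled = pixels * (1.0 / 255)

print(rescaled.dtype, rescaled.min(), rescaled.max())
```

The smallest possible value stays at 0 and the largest possible value (255) maps to approximately 1, which keeps the network inputs in a small, consistent range.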

Keras will read the files continuously from disk. We tell it where to read, how many to read at a time, what dimensions to resample the images to, and how many image channels there are. These generators will then generate batches of images. 

In [0]:
data_home_dir = '/home/hn_tcia_classification/'
train_dir = data_home_dir + 'train'
validation_dir = data_home_dir + 'validation'

In [0]:
dims = 144
batch_size = 32

train_generator =      train_datagen.flow_from_directory(train_dir, batch_size=batch_size, target_size=(dims,dims), class_mode='binary', color_mode='grayscale')
validation_generator = valid_datagen.flow_from_directory(validation_dir,batch_size=batch_size, target_size=(dims,dims), class_mode='binary',color_mode='grayscale')

Let's take a look at some example images, just to make sure they are what we expect them to be!

In [0]:
#import packages
import numpy as np
from matplotlib import pyplot as plt
# set plotting to be in-line
%matplotlib inline

#get a batch from the ImageDataGenerator, together with the labels for that batch
[random_batch, labels] = next(train_generator)
#print first 5 labels
print("Labels: " + str(labels[0:5]))
#concatenate 5 images together and display
fig = plt.figure(figsize=(15,10))
plt.imshow(np.concatenate([random_batch[i, :, :, 0] for i in range(5)], axis=1), cmap='gray')
plt.show()


## Build network 
Import some packages needed to build and train the network.

In [0]:
from tensorflow.keras import optimizers
from tensorflow.keras import Model 
from tensorflow.keras import layers

The first part of the graph is the input. At this point we only need to specify its shape (we'll connect it to the data when we build and train the model later).

In [0]:
img_input = layers.Input(shape=(dims,dims,1))

Now we build the layers of the network. The format is layer_name(_config_info_)(_input_to_layer_).
Try a simple network with one convolution layer, max pooling, and a fully connected layer (these are _not_ the best parameters).

In [0]:
x = layers.Conv2D(15, (3, 3), strides=(4,4), padding='same')(img_input)
x = layers.Activation("relu")(x)
x = layers.MaxPooling2D((2, 2), strides=None)(x)
x = layers.Flatten()(x)     #reshape to 1xN 
x = layers.Dense(20, activation='relu')(x)
x = layers.Dense(1, activation='sigmoid')(x)    #sigmoid for binary
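To see how the image shrinks through these layers and where the trainable parameters sit, the arithmetic can be worked out by hand. The sketch below uses plain Python rather than Keras and assumes the settings from the cell above (144x144 input, 15 filters of size 3x3 with stride 4 and `padding='same'`, then 2x2 pooling whose stride defaults to the pool size). The helper names are hypothetical.

```python
import math

def conv_same_output(size, stride):
    # With padding='same', the output size is ceil(input_size / stride)
    return math.ceil(size / stride)

def pool_valid_output(size, pool, stride):
    # MaxPooling2D uses padding='valid' by default
    return (size - pool) // stride + 1

dims = 144
n_filters = 15

conv_out = conv_same_output(dims, stride=4)               # spatial size after Conv2D
pool_out = pool_valid_output(conv_out, pool=2, stride=2)  # spatial size after MaxPooling2D
flat_units = pool_out * pool_out * n_filters              # length after Flatten

conv_params = (3 * 3 * 1 + 1) * n_filters   # weights + one bias per filter
dense1_params = (flat_units + 1) * 20       # weights + biases of Dense(20)
dense2_params = (20 + 1) * 1                # weights + bias of Dense(1)
total_params = conv_params + dense1_params + dense2_params

print(conv_out, pool_out, flat_units, total_params)
```

Nearly all of the parameters are in the first dense layer. These numbers can be compared against the output of `model.summary()` once the model has been defined.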

## Configure and train model
We define our model by specifying its input(s) and output(s).

In [0]:
model = Model(inputs=img_input, outputs=x)

We then compile it, specifying our loss function, our optimizer, and the metrics we want to calculate. This builds the "graph" of our model and computes the functions needed to train it.

In [0]:
model.compile(loss = "binary_crossentropy", optimizer = optimizers.RMSprop(learning_rate=1e-5), metrics=["accuracy"])
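For intuition about the loss chosen here, binary cross-entropy can be written out in a few lines. This is a plain-Python sketch of the standard formula, not the Keras implementation; the clipping constant `eps` is an assumption added to avoid taking the log of zero.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip the probability so log() is always defined (eps is an assumed value)
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Made-up labels and sigmoid outputs for illustration
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: small loss
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: large loss
```

The loss penalizes confident wrong predictions much more heavily than uncertain ones, which is why it pairs naturally with a sigmoid output.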

This next step kicks off the network training. This is where we actually feed the compiled model the data (in batches).

In [0]:
# model.fit accepts generators directly in recent TensorFlow versions; fit_generator is deprecated
history = model.fit(train_generator, steps_per_epoch=len(train_generator), epochs=10, 
                    validation_data=validation_generator, validation_steps=len(validation_generator))
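The `steps_per_epoch` value controls how many batches make up one epoch. Using `len(train_generator)` means every image is seen once per epoch, because the length of the generator is the number of images divided by the batch size, rounded up. A minimal sketch of that arithmetic follows; the image counts are made up, not the sizes of this dataset.

```python
import math

def steps_per_epoch(num_images, batch_size):
    # The last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(num_images / batch_size)

# Made-up image counts with the batch size used in this notebook
print(steps_per_epoch(1000, 32))
print(steps_per_epoch(64, 32))
```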

## Evaluate performance

Plot the training and validation accuracy using matplotlib.

In [0]:
from matplotlib import pyplot as plt
import numpy as np
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1,len(acc)+1)
plt.plot(epochs,acc,'bo', label='Training acc')
plt.plot(epochs,val_acc,'b', label='Validation acc')
plt.legend()
plt.show()
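Besides reading the plot, it can be useful to pull out the epoch with the highest validation accuracy. The helper below is a hypothetical sketch that works on any list of per-epoch values, such as `history.history['val_accuracy']`. The numbers in the example are made up.

```python
def best_epoch(metric_values):
    """Return (1-based epoch number, value) for the highest value in the list."""
    best_index = max(range(len(metric_values)), key=lambda i: metric_values[i])
    return best_index + 1, metric_values[best_index]

# Made-up validation accuracies for illustration
example_val_acc = [0.61, 0.70, 0.74, 0.73]
epoch, value = best_epoch(example_val_acc)
print("Best epoch:", epoch, "val_acc:", value)
```

If validation accuracy peaks early while training accuracy keeps rising, that gap is a common sign of overfitting.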

Evaluate performance on the test data set.

In [0]:
test_dir = data_home_dir + 'test'
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(test_dir,batch_size=batch_size, target_size=(dims,dims), class_mode='binary',color_mode='grayscale')

#now evaluate the model using the generator
[test_loss, test_acc] = model.evaluate(test_generator, steps=len(test_generator))
print("Test_acc: "+str(test_acc))
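The accuracy reported here is the fraction of images whose predicted class matches the label. For a sigmoid output, the predicted class is obtained by thresholding the probability. The sketch below shows that computation in plain Python with the conventional 0.5 threshold; the function name and example values are hypothetical.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    correct = sum(
        1 for t, p in zip(y_true, y_prob)
        if (p > threshold) == bool(t)
    )
    return correct / len(y_true)

# Made-up labels and sigmoid outputs for illustration
print(binary_accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

Accuracy alone can be misleading when the two classes are unbalanced, so it is worth checking the class counts in the test folder when interpreting the result.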