### Building the Image Dataset

In [None]:
!wget --no-check-certificate \
    https://github.com/SebastianTianying/medical-imaging-analytics/blob/main/train.zip?raw=true \
    -O /tmp/train.zip


In [None]:
!wget --no-check-certificate \
    https://github.com/SebastianTianying/medical-imaging-analytics/blob/main/test.zip?raw=true \
    -O /tmp/test.zip

The following Python code uses the `os` library, which gives you access to the file system, and the `zipfile` library, which lets you unzip the data.

In [None]:
import os
import zipfile

# Extract both archives into /tmp; the with-block closes each archive when done
for local_zip in ['/tmp/train.zip', '/tmp/test.zip']:
    with zipfile.ZipFile(local_zip, 'r') as zip_ref:
        zip_ref.extractall('/tmp')

The archives are extracted to the directories `/tmp/train` and `/tmp/test`, each of which contains `positive` and `negative` subdirectories.
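To make the extraction step concrete, here is a minimal sketch that builds a tiny stand-in archive with the assumed `train/positive` and `train/negative` layout (the file names and placeholder bytes are hypothetical), extracts it with a context manager, and lists what was created. It does not touch the real `/tmp/train.zip`.

```python
import os
import tempfile
import zipfile

def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the top-level entries it contains."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        top_level = sorted({name.split("/")[0] for name in zf.namelist()})
        zf.extractall(dest_dir)
    return top_level

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in for train.zip with the assumed train/<class>/ layout
    zip_path = os.path.join(tmp, "train.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("train/positive/img_001.png", b"placeholder")
        zf.writestr("train/negative/img_002.png", b"placeholder")
    top_level = extract_archive(zip_path, tmp)
    class_dirs = sorted(os.listdir(os.path.join(tmp, "train")))

print(top_level)   # ['train']
print(class_dirs)  # ['negative', 'positive']
```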

We will use the [ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) class to create our datasets automatically from these directories, treating each subdirectory as a class.
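The idea of "subdirectories as classes" can be sketched in plain Python. The `ImageDataGenerator` documentation states that when `classes` is not given, class names are inferred from the subdirectories in alphanumeric order, and that the resulting mapping is available as the generator's `class_indices` attribute. The helper below, `list_labeled_images`, is a hypothetical illustration of that behavior, not Keras's own code; its extension filter is an assumption.

```python
import os
import tempfile

# Assumed extension filter; Keras keeps its own list of accepted formats
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff")

def list_labeled_images(root, classes=None):
    """Map each image under root/<class_name>/ to an integer label.

    With classes=None, class names are the subdirectories sorted alphanumerically;
    otherwise the order of `classes` decides which name gets index 0, 1, ...
    """
    if classes is None:
        classes = sorted(d for d in os.listdir(root)
                         if os.path.isdir(os.path.join(root, d)))
    class_indices = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(IMAGE_EXTS):
                samples.append((os.path.join(class_dir, fname), class_indices[name]))
    return class_indices, samples

# Demo on a throwaway directory with hypothetical file names
with tempfile.TemporaryDirectory() as root:
    for cls, fname in [("positive", "p1.png"), ("negative", "n1.png"), ("negative", "notes.txt")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        open(os.path.join(root, cls, fname), "wb").close()
    default_indices, _ = list_labeled_images(root)
    explicit_indices, samples = list_labeled_images(root, classes=["positive", "negative"])
    labels = [label for _, label in samples]

print(default_indices)   # {'negative': 0, 'positive': 1}
print(explicit_indices)  # {'positive': 0, 'negative': 1}
print(labels)            # [0, 1]  (notes.txt is skipped)
```

This is why the generators in this notebook pass `classes=['positive', 'negative']`: it pins `positive` to label 0 and `negative` to label 1. Printing `train_generator.class_indices` is a quick way to confirm the mapping.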

Let's define each of these subdirectories:

In [None]:
train_positive_dir = os.path.join('/tmp/train', 'positive')

train_negative_dir = os.path.join('/tmp/train', 'negative')

valid_positive_dir = os.path.join('/tmp/test', 'positive')

valid_negative_dir = os.path.join('/tmp/test', 'negative')

Now, let's see what the filenames look like in the `positive` and `negative` training and validation directories:

In [None]:
train_positive_names = os.listdir(train_positive_dir)
print(train_positive_names[:10])

train_negative_names = os.listdir(train_negative_dir)
print(train_negative_names[:10])

validation_positive_names = os.listdir(valid_positive_dir)
print(validation_positive_names[:10])

validation_negative_names = os.listdir(valid_negative_dir)
print(validation_negative_names[:10])

Let's find out the total number of positive and negative images in the directories:

In [None]:
print('total training positive images:', len(os.listdir(train_positive_dir)))
print('total training negative images:', len(os.listdir(train_negative_dir)))
print('total validation positive images:', len(os.listdir(valid_positive_dir)))
print('total validation negative images:', len(os.listdir(valid_negative_dir)))
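The four counts above can also be gathered per split with one small helper, which makes it easy to see whether the classes are balanced. `count_images_per_class` is a hypothetical name; the demo uses a throwaway directory with made-up files rather than the real data.

```python
import os
import tempfile

def count_images_per_class(root):
    """Return {class_name: file_count} for each subdirectory of root."""
    return {
        name: len(os.listdir(os.path.join(root, name)))
        for name in sorted(os.listdir(root))
        if os.path.isdir(os.path.join(root, name))
    }

# Demo: 3 hypothetical positive files and 2 negative files
with tempfile.TemporaryDirectory() as root:
    for cls, n in [("positive", 3), ("negative", 2)]:
        os.makedirs(os.path.join(root, cls))
        for i in range(n):
            open(os.path.join(root, cls, f"{cls}_{i}.png"), "wb").close()
    counts = count_images_per_class(root)

positive_fraction = counts["positive"] / sum(counts.values())
print(counts)             # {'negative': 2, 'positive': 3}
print(positive_fraction)  # 0.6
```

In the notebook, `count_images_per_class('/tmp/train')` and `count_images_per_class('/tmp/test')` would report the same numbers as the cell above.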

Now let's take a look at a few pictures to get a better sense of what they look like. First, configure the matplotlib parameters:

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4

# Index for iterating over images
pic_index = 0

Now, display a batch of 8 negative and 8 positive pictures. You can rerun the cell to see a fresh batch each time:

In [None]:
# Set up matplotlib fig, and size it to fit 4x4 pics
fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)

pic_index += 8
next_positive_pic = [os.path.join(train_positive_dir, fname)
                for fname in train_positive_names[pic_index-8:pic_index]]
next_negative_pic = [os.path.join(train_negative_dir, fname)
                for fname in train_negative_names[pic_index-8:pic_index]]

for i, img_path in enumerate(next_positive_pic + next_negative_pic):
  # Set up subplot; subplot indices start at 1
  sp = plt.subplot(nrows, ncols, i + 1)
  sp.axis('Off') # Don't show axes (or gridlines)

  img = mpimg.imread(img_path)
  plt.imshow(img)

plt.show()
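The display cell pages through the file lists by advancing `pic_index` by 8 and slicing `[pic_index-8:pic_index]` from each class, so 8 positive plus 8 negative images fill the 4x4 grid. The sketch below replays that slicing on a hypothetical list of 20 file names to show what successive reruns display, including the partial batch once the list runs out.

```python
def batch_window(names, pic_index, batch=8):
    """Return the slice shown after pic_index has been advanced by `batch`."""
    return names[pic_index - batch:pic_index]

names = [f"img_{i:02d}.png" for i in range(20)]  # hypothetical file names
pic_index = 0
shown = []
for _ in range(3):  # simulate rerunning the display cell three times
    pic_index += 8
    shown.append(batch_window(names, pic_index))

print(shown[1][0])    # img_08.png
print(len(shown[2]))  # 4: only 4 files remain, so part of the grid stays empty
```

Once `pic_index` passes the end of the list, the slice is empty and the grid shows nothing; resetting `pic_index = 0` starts over.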


#### Data Preprocessing

Now, let's use the `keras.preprocessing.image.ImageDataGenerator` class to create our training and validation datasets and normalize our data.

Normalizing the data going into our network is important because inputs on a small, consistent scale make training more stable and improve overall performance. We will use the `rescale` parameter to scale our image pixel values from [0, 255] to [0, 1].
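According to the `ImageDataGenerator` documentation, `rescale` is a factor that every pixel value is multiplied by (after any other transformations). A minimal NumPy sketch of what `rescale=1/255` does to a hypothetical 2x2 RGB image:

```python
import numpy as np

# Hypothetical 2x2 RGB image with uint8 pixel values in [0, 255]
raw = np.array([[[0, 128, 255], [64, 32, 16]],
                [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

# rescale=1/255 multiplies every pixel value by this factor
scaled = raw.astype(np.float32) * (1 / 255)

print(scaled.dtype)               # float32
print(scaled.min(), scaled.max())
```

Whatever scaling is applied during training has to be applied again at prediction time, otherwise the model sees inputs on a scale it never trained on.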

In each generator, we specify the source directory of our images, the classes, the input image size, the batch size (how many images to process at once), and the class mode.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1/255)
validation_datagen = ImageDataGenerator(rescale=1/255)

# Flow training images in batches of 120 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
        '/tmp/train/',  # This is the source directory for training images
        classes = ['positive', 'negative'],
        target_size=(200, 200),  # All images will be resized to 200x200
        batch_size=120,
        # Use binary labels
        class_mode='binary')

# Flow validation images in batches of 19 using validation_datagen generator
validation_generator = validation_datagen.flow_from_directory(
        '/tmp/test/',  # This is the source directory for validation images
        classes = ['positive', 'negative'],
        target_size = (200, 200),  # All images will be resized to 200x200
        batch_size = 19,
        # Use binary labels
        class_mode='binary',
        shuffle=False)

## Building the Model from Scratch

Before training, let's define the model.

The first step is to import TensorFlow, along with the other libraries used in this notebook.

In [None]:
import tensorflow as tf
import numpy as np
from itertools import cycle

from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from numpy import interp  # scipy.interp was a deprecated alias of numpy.interp
from sklearn.metrics import roc_auc_score

Next, we add a Flatten layer that flattens the input image into a vector, which feeds into a Dense (fully connected) layer with 128 hidden units. Because our goal is binary classification, the final layer is a single unit with a sigmoid activation, so the output of our network is a single scalar between 0 and 1 encoding the probability that the current image is of class 1. With `classes=['positive', 'negative']`, class 1 is `negative`.

In [None]:
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape = (200,200,3)), 
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)])
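To show what this Flatten → Dense(128, ReLU) → Dense(1, sigmoid) stack computes, here is a minimal NumPy sketch of the forward pass. It assumes a small 8x8x3 input instead of 200x200x3 so it runs quickly, and uses arbitrary random weights (Keras itself initializes Dense kernels with Glorot-uniform and biases with zeros by default); it is an illustration, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 8, 8, 3   # small stand-in for the notebook's (200, 200, 3) input
HIDDEN = 128        # matches Dense(128, activation=relu)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary random weights for illustration; zero biases as in Keras' default
W1 = rng.normal(0.0, 0.05, size=(H * W * C, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.05, size=(HIDDEN, 1))
b2 = np.zeros(1)

def forward(images):
    """images: (batch, H, W, C) floats in [0, 1] -> (batch, 1) probabilities."""
    x = images.reshape(images.shape[0], -1)  # Flatten
    h = relu(x @ W1 + b1)                    # Dense(128, relu)
    return sigmoid(h @ W2 + b2)              # Dense(1, sigmoid)

batch = rng.random((4, H, W, C))
probs = forward(batch)
print(probs.shape)  # (4, 1)
```

Each output is a value in (0, 1); later cells threshold it at 0.5 to pick a class.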

The `model.summary()` call prints a summary of the network:

In [None]:
model.summary()

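The "Output Shape" column shows how each layer transforms the dimensions of its input. This model has no convolution or pooling layers: the Flatten layer turns each 200×200×3 image into a vector of 120,000 values, the first Dense layer maps that vector to 128 units, and the final Dense layer reduces it to a single output. The "Param #" column counts the weights and biases in each layer; nearly all of them sit in the first Dense layer.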
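The parameter counts follow from simple arithmetic: a Dense layer has one weight per (input, unit) pair plus one bias per unit, and Flatten has no parameters. A short sketch (the helper name `dense_params` is hypothetical) reproduces the numbers the summary should report for this architecture:

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

flat = 200 * 200 * 3              # Flatten output size: 120000, no parameters
hidden = dense_params(flat, 128)  # first Dense layer
output = dense_params(128, 1)     # final sigmoid unit
total = hidden + output

print(flat, hidden, output, total)  # 120000 15360128 129 15360257
```

Over 15 million of the roughly 15.4 million parameters come from connecting every pixel value to every hidden unit, which is the main cost of feeding flattened images straight into a Dense layer.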

Next, we'll configure the model for training. We will train our model with the `binary_crossentropy` loss, which suits a single sigmoid output, and the `Adam` optimizer. [Adam](https://wikipedia.org/wiki/Stochastic_gradient_descent#Adam) is a sensible optimization algorithm because it adapts the learning rate for each parameter during training, so little manual tuning is needed (alternatively, we could also use [RMSProp](https://wikipedia.org/wiki/Stochastic_gradient_descent#RMSProp) or [Adagrad](https://developers.google.com/machine-learning/glossary/#AdaGrad) for similar results). We will add accuracy to `metrics` so that the model reports accuracy during training.

In [None]:
model.compile(optimizer = tf.optimizers.Adam(),
              loss = 'binary_crossentropy',
              metrics=['accuracy'])
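The following NumPy sketch spells out what the loss and optimizer compute. `binary_crossentropy` here is the standard mean of -[y·log(p) + (1-y)·log(1-p)], with predictions clipped by a small epsilon similar to the one Keras uses; `adam_step` is the textbook Adam update with Keras' default hyperparameters (learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7). Keras' internal implementation may differ in small details such as where epsilon is applied, so treat this as an illustration.

```python
import numpy as np

EPS = 1e-7  # clipping constant, assumed to match Keras' small default epsilon

def binary_crossentropy(y_true, y_pred):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    p = np.clip(y_pred, EPS, 1 - EPS)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Loss: an uninformative 0.5 prediction costs log(2); confident correct ones cost little
loss_unsure = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
loss_good = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.99, 0.01]))

# Adam: on the first step each parameter moves by about lr, whatever the gradient's size
theta = np.array([1.0, 1.0])
grad = np.array([10.0, -0.1])
theta, m, v = adam_step(theta, grad, np.zeros(2), np.zeros(2), t=1)
print(loss_unsure, loss_good, theta)
```

The Adam demo shows the per-parameter scaling mentioned above: a gradient of 10 and a gradient of -0.1 both produce a step of about 0.001, so no single learning rate has to suit both.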

### Training

In [None]:
History = model.fit(train_generator,
      steps_per_epoch=2,
      epochs=20,
      verbose=1,
      validation_data = validation_generator,
      validation_steps= 8)
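`steps_per_epoch` and `validation_steps` count batches, not images. The sketch below works out how many images each setting draws, using the batch sizes set on the generators (120 for training, 19 for validation). Whether 2 training steps cover the whole training set depends on the counts printed earlier; the 1,000-image figure is hypothetical.

```python
import math

def steps_to_cover(num_images, batch_size):
    """Batches needed so every image is seen once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)

def images_seen(steps, batch_size):
    """Upper bound on images drawn when running `steps` batches of `batch_size`."""
    return steps * batch_size

train_per_epoch = images_seen(2, 120)  # steps_per_epoch=2, batch_size=120
valid_per_eval = images_seen(8, 19)    # validation_steps=8, batch_size=19
print(train_per_epoch, valid_per_eval)  # 240 152
print(steps_to_cover(1000, 120))        # 9 for a hypothetical 1,000 training images
```

Plugging the real training count into `steps_to_cover(n_train, 120)` gives the number of steps needed for each epoch to pass over every training image once; if `steps_per_epoch` is omitted, Keras should be able to infer it from the generator's length.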

Save the model.

Note: to load the model on a Raspberry Pi, see https://www.tensorflow.org/tutorials/keras/save_and_load. As that guide shows, you can reload a fresh Keras model from the saved model with:

new_model = tf.keras.models.load_model('saved_model/my_model')

In [None]:
from google.colab import files

!mkdir -p saved_model
model.save('saved_model/my_model')

#!zip -r /content/saved_model.zip saved_model

#files.download('saved_model.zip')
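As an alternative to the shell `zip` command, the saved model directory can be packaged from Python with the standard library's `shutil.make_archive`. `zip_directory` is a hypothetical helper, and the demo uses an empty placeholder file in place of the real SavedModel contents.

```python
import os
import shutil
import tempfile
import zipfile

def zip_directory(src_dir, archive_base):
    """Zip src_dir, keeping its folder name as the top level of the archive.

    Returns the path of the created archive (archive_base + '.zip').
    """
    parent, name = os.path.split(os.path.abspath(src_dir))
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)

# Demo with a placeholder file standing in for the real SavedModel contents
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "saved_model", "my_model")
    os.makedirs(model_dir)
    open(os.path.join(model_dir, "saved_model.pb"), "wb").close()
    archive = zip_directory(os.path.join(tmp, "saved_model"), os.path.join(tmp, "saved_model"))
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    archive_name = os.path.basename(archive)

print(archive_name)  # saved_model.zip
```

In Colab, something like `files.download(zip_directory('saved_model', '/content/saved_model'))` would then download the archive.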

## Accuracy

Let's evaluate the loss and accuracy of our model on the validation set:

In [None]:
model.evaluate(validation_generator)
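With a sigmoid output and `binary_crossentropy`, the `accuracy` metric amounts to thresholding each predicted probability at 0.5 and comparing the result with the true label. A minimal NumPy sketch of that computation on made-up labels and probabilities:

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true).ravel()))

# Made-up labels and predicted probabilities
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.6, 0.8, 0.4]  # thresholds to [0, 1, 1, 0]
acc = binary_accuracy(y_true, y_prob)
print(acc)  # 0.5
```

Because `validation_generator` was created with `shuffle=False`, its `classes` attribute should list the true labels in the same order as `model.predict(validation_generator)` returns predictions, so `binary_accuracy(validation_generator.classes, model.predict(validation_generator))` is one way to cross-check the reported accuracy.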

## Making Predictions

Now, let's use the model to make predictions! Upload an image to see whether the model classifies it as positive or negative.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()

for fn in uploaded.keys():

  # Load the uploaded image at the size the model expects
  path = '/content/' + fn
  img = image.load_img(path, target_size=(200, 200))
  # Rescale to [0, 1], matching the rescale=1/255 used during training
  x = image.img_to_array(img) / 255.
  plt.imshow(x)
  x = np.expand_dims(x, axis=0)  # add a batch dimension: (1, 200, 200, 3)
  classes = model.predict(x)
  print(classes[0])
  # Label 0 is 'positive' and 1 is 'negative' (order set by classes=[...])
  if classes[0] < 0.5:
    print(fn + " is positive")
  else:
    print(fn + " is negative")
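The two steps that matter in the prediction loop, preparing the input the same way as during training and mapping the sigmoid score back to a class name, can be isolated in a small sketch. `to_model_input`, `label_from_score`, and `CLASS_NAMES` are hypothetical names; the class order assumes `classes=['positive', 'negative']` as in the generators above.

```python
import numpy as np

# Index 0 -> positive, 1 -> negative, following classes=['positive', 'negative']
CLASS_NAMES = ("positive", "negative")

def to_model_input(pixels):
    """(H, W, 3) array in [0, 255] -> (1, H, W, 3) float32 batch in [0, 1] (rescale=1/255)."""
    x = np.asarray(pixels, dtype=np.float32) / 255.0
    return np.expand_dims(x, axis=0)

def label_from_score(score, threshold=0.5):
    """A sigmoid score at or above the threshold means class index 1."""
    return CLASS_NAMES[int(score >= threshold)]

batch = to_model_input(np.full((200, 200, 3), 255))
print(batch.shape, batch.max())  # (1, 200, 200, 3) 1.0
print(label_from_score(0.2))     # positive
print(label_from_score(0.9))     # negative
```

Skipping the division by 255 would feed the model values up to 255 although it was trained on values in [0, 1], which can push the sigmoid toward one class regardless of the image.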
 