###  <span style="color:red">**This Notebook can be run from Google Colab:**</span>

https://colab.research.google.com

# **<span style="color:red">Background:</span>**

#### First, we trained a model using positive patches only (patches that actually contain regions of bacterial colonies growing in the petri dish) to differentiate among the 8 bacterial species in our dataset. The confusion matrix of that model showed that it had difficulty differentiating between classes 'C1' and 'C2-3', and between classes 'C4-7' and 'C5'.<br>

#### As a next step, we trained a model specifically to differentiate 'C1' vs 'C2-3' vs 'all_other' classes. This model has only 3 classes.<br>

#### Similarly, we also trained a model to specifically learn to differentiate 'C4-7' vs 'C5' vs 'all_other' classes.<br>

#### We also trained a model to differentiate between positive bacterial colony patches (of any class) and negative patches (petri-dish background, petri-dish border or white image background). For this, we merged all positive patches, regardless of bacterial species, into a single 'positive' class and all negative patches into a single 'negative' class.<br>
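The three auxiliary label spaces described above can be derived from the original species labels by simple relabeling. This is a hypothetical sketch: the helper names are illustrative and the original relabeling code is not shown in this notebook. It assumes 'all_other' covers every species outside the targeted pair, and it uses the patch-folder class names from this notebook.

```python
# Hypothetical relabeling helpers (illustrative names, not the original code).
# Class names follow the patch folders used in this notebook.
SPECIES = ['C1', 'C2-3', 'C4-7', 'C5', 'C6', 'C8', 'C9', 'C10']


def to_c1_c23(label):
    """Map a species label to 'C1', 'C2-3' or 'all_other' (assumed grouping)."""
    return label if label in ('C1', 'C2-3') else 'all_other'


def to_c47_c5(label):
    """Map a species label to 'C4-7', 'C5' or 'all_other' (assumed grouping)."""
    return label if label in ('C4-7', 'C5') else 'all_other'


def to_pos_neg(label):
    """Collapse every species into 'positive'; the 'neg' folder becomes 'negative'."""
    return 'negative' if label == 'neg' else 'positive'


# Example: labels seen by each auxiliary model for every species
for s in SPECIES:
    print(s, to_c1_c23(s), to_c47_c5(s), to_pos_neg(s))
```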

#### This gave us 4 models: the first produces 8 predicted probabilities (one per bacterial species), the second produces 3 ('C1', 'C2-3', 'all_other'), the third also produces 3 ('C4-7', 'C5', 'all_other') and the fourth produces 2 (positive_patch, negative_patch), for a total of 16 predicted probabilities.<br>

#### Using those 4 models on an augmented validation dataset with 9 classes (8 bacterial species + negative patches), we combined the 16 predicted probabilities (as features) and y_true (as labels) into a training dataset, which we used to train a simple kernel SVM that predicts either 'negative' or the correct bacterial species from the probabilities produced by the 4 models above.
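The stacking step described above can be sketched as follows. This is a minimal illustration, not the original training code: it assumes scikit-learn's `SVC` with an RBF kernel as the "simple kernel SVM" (the actual kernel and hyperparameters are not stated), and it uses random Dirichlet draws as placeholders for real CNN softmax outputs, so the fitted model itself is meaningless.

```python
# Sketch of stacking 4 CNN probability outputs into 16 features for an SVM.
# Assumption: scikit-learn SVC with an RBF kernel; random data stands in for
# real softmax outputs and labels.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_patches = 90  # placeholder number of validation patches

# One probability vector per patch from each model (rows sum to 1)
p_eight = rng.dirichlet(np.ones(8), size=n_patches)    # 8 species
p_c1_c23 = rng.dirichlet(np.ones(3), size=n_patches)   # C1, C2-3, all_other
p_c47_c5 = rng.dirichlet(np.ones(3), size=n_patches)   # C4-7, C5, all_other
p_pos_neg = rng.dirichlet(np.ones(2), size=n_patches)  # positive, negative

# Concatenate column-wise: 8 + 3 + 3 + 2 = 16 features per patch
X = np.hstack((p_eight, p_c1_c23, p_c47_c5, p_pos_neg))
y_true = np.arange(n_patches) % 9  # placeholder labels for the 9 classes

combine_model = SVC(kernel='rbf')  # kernel choice is an assumption
combine_model.fit(X, y_true)
y_pred = combine_model.predict(X)
print(X.shape, y_pred.shape)
```

At prediction time the column order of the stacked features has to be the same as during training, since the SVM only sees 16 unnamed numbers.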

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
import zipfile
import shutil
import json
import pickle

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model

from sklearn.metrics import accuracy_score, confusion_matrix, \
                            classification_report, balanced_accuracy_score

# Import PyDrive and associated libraries (to connect with GoogleDrive):
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# disable warnings
import warnings
warnings.simplefilter("ignore")
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

Using TensorFlow backend.


### **Check if we are using GPU:**

In [2]:
from keras import backend as K
if K.backend() == "tensorflow":
    import tensorflow as tf
    device_name = tf.test.gpu_device_name()
    if device_name == '':
        device_name = "None"
    print('Using TensorFlow version:', tf.__version__, ', GPU:', device_name)

Using TensorFlow version: 1.15.0 , GPU: /device:GPU:0


### **Download Validation ('Control') patches from GoogleDrive:**

### This dataset contains 9 classes in total (8 bacterial species + negative patches).

#### *Validation patches were augmented with Patch_Generator, using 'stride=22' and rotations every 20 degrees until a full lap. Patches were then balanced by downsampling the majority classes, so that the model's accuracy can be compared across classes.*
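Patch_Generator itself is not shown in this notebook, so the following is only a hypothetical sketch of the ideas in the paragraph above: the rotation schedule (every 20 degrees until a full lap gives 18 angles, 0 through 340), sliding-window offsets for a given stride, and balancing by random downsampling to the size of the smallest class. The 128-pixel patch size is an assumption read from the '128' in the dataset archive name; `window_starts` and `downsample_to_minority` are illustrative names, not Patch_Generator's API.

```python
# Hypothetical sketch; not Patch_Generator's actual implementation.
import random

PATCH = 128   # assumed patch side length in pixels
STRIDE = 22   # sliding-window stride ('stride=22')

# Rotations every 20 degrees until a full lap: 0, 20, ..., 340
ANGLES = list(range(0, 360, 20))


def window_starts(length, patch=PATCH, stride=STRIDE):
    """Offsets of every full patch along one image axis."""
    return list(range(0, length - patch + 1, stride))


def downsample_to_minority(patches_by_class, seed=0):
    """Randomly keep, for every class, as many patches as the smallest class has."""
    rng = random.Random(seed)
    n_min = min(len(v) for v in patches_by_class.values())
    return {cls: rng.sample(list(v), n_min) for cls, v in patches_by_class.items()}
```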

###  **NOTE: Validation patches were generated from original, non-preprocessed images. This way, validation performance reflects the conditions expected at testing time, when pre-processing may not be feasible; for example, creating masks/image annotations may not be feasible for test data.**



In [3]:
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

#"Control_for_final_training_9_classes_128_vs22_minval_1024_rotate_every_20_full_balance.zip":
file_id = '1bebJEgFoWq04jX-6ZfqZPxAoxH5j4G9e' # Control only, rotate every 20, full balance

downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile(downloaded['title'])
print('Downloaded content: "{}"'.format(downloaded['title']))
print('Root dir content: {}'.format(os.listdir()))
patches_zip = downloaded['title']

Downloaded content: "Control_for_final_training_9_classes_128_vs22_minval_1024_rotate_every_20_full_balance.zip"
Root dir content: ['.config', 'adc.json', 'Control_for_final_training_9_classes_128_vs22_minval_1024_rotate_every_20_full_balance.zip', 'sample_data']


### **Unzip the Validation ('Control') patches:**

In [4]:
# Remove 'Patches' dir if it already exists
if 'Patches' in os.listdir():
    shutil.rmtree('./Patches')
with zipfile.ZipFile(patches_zip, "r") as zip_ref:
    zip_ref.extractall()
os.remove(patches_zip)
print('Root dir content: {}'.format(os.listdir()))

Root dir content: ['.config', 'Patches', 'adc.json', 'sample_data']


### **Let's count patches by type and class:**

In [5]:
classes = ['C1','C2-3','C4-7','C5','C6','C8','C9','C10','neg']
class_weights = {} # empty dictionary to store class weights

grand_total = 0
for type_ in ['Serial', 'Control', 'Streak']:
    print("\nTotal '{}' Patches per location:".format(type_))
    n_type = 0
    class_weights[type_] = {} # nested empty dictionary to store class weights
    for cls in classes:
        if cls != 'neg':
            pos_folder = './Patches/{}/{}_pos'.format(type_,cls)
        else:
            pos_folder = './Patches/{}/{}'.format(type_,cls)
        n_pos = len(os.listdir(pos_folder))
        n_type += n_pos
        #print(pos_folder, n_pos)
        print('total_{}: {}'.format(cls,n_pos))
        class_weights[type_]['{}'.format(cls)] = 1/n_pos if n_pos else 0
    print('Total {}: {}'.format(type_,n_type))
    for loc in class_weights[type_].keys():
        class_weights[type_][loc] *= n_type
    grand_total += n_type
print('\nGRAND TOTAL: {}'.format(grand_total))


Total 'Serial' Patches per location:
total_C1: 0
total_C2-3: 0
total_C4-7: 0
total_C5: 0
total_C6: 0
total_C8: 0
total_C9: 0
total_C10: 0
total_neg: 0
Total Serial: 0

Total 'Control' Patches per location:
total_C1: 6610
total_C2-3: 6610
total_C4-7: 6610
total_C5: 6610
total_C6: 6610
total_C8: 6610
total_C9: 6610
total_C10: 6610
total_neg: 6610
Total Control: 59490

Total 'Streak' Patches per location:
total_C1: 0
total_C2-3: 0
total_C4-7: 0
total_C5: 0
total_C6: 0
total_C8: 0
total_C9: 0
total_C10: 0
total_neg: 0
Total Streak: 0

GRAND TOTAL: 59490
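The cell above weights each class by the total number of patches of that type divided by the class count, i.e. inverse class frequency. A minimal standalone version of that rule follows. With the balanced 'Control' set every weight is 59490 / 6610 = 9, so the weights are uniform and would not change training; `class_weights` is also not used later in this notebook.

```python
# Inverse-frequency class weights: weight_c = n_total / n_c (0 for empty classes),
# the same rule as the counting cell above.
def inverse_frequency_weights(counts):
    n_total = sum(counts.values())
    return {cls: (n_total / n if n else 0) for cls, n in counts.items()}


control_counts = {cls: 6610 for cls in
                  ['C1', 'C2-3', 'C4-7', 'C5', 'C6', 'C8', 'C9', 'C10', 'neg']}
weights = inverse_frequency_weights(control_counts)
print(weights['C1'])  # 9.0
```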


#### **Let's build the validation generator using keras.preprocessing.image.ImageDataGenerator, rescaling image pixel values from [0, 255] to [0, 1]:**

In [6]:
# Read one sample patch to infer the patch size used by the generator
c1_pos_folder = './Patches/Control/C1_pos'
img = plt.imread(os.path.join(c1_pos_folder, os.listdir(c1_pos_folder)[0]))
img_size = img.shape
val_batch_size = 64

val_datagen = ImageDataGenerator(rescale=1./255)

val_generator = val_datagen.flow_from_directory(
        './Patches/Control',
        target_size=(img_size[0],img_size[1]),
        batch_size=val_batch_size,
        class_mode='categorical',
        shuffle=False)  # keep file order so predictions align with val_generator.classes

Found 59490 images belonging to 9 classes.
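According to the Keras documentation, `rescale` is a multiplicative factor applied to each image after any other transformations. A minimal NumPy sketch of that step (the pixel values are arbitrary examples):

```python
# What rescale=1./255 amounts to: multiply each pixel by 1/255.
import numpy as np

raw = np.array([[0, 128, 255]], dtype=np.uint8)   # example 8-bit pixels
scaled = raw.astype(np.float32) * (1.0 / 255)     # values now in [0, 1]
print(scaled)
```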


#### **Let's check the data generator's index for each class:**

In [7]:
print('validation_generator.class_indices:', str(json.dumps(val_generator.class_indices, indent=2, default=str)))

validation_generator.class_indices: {
  "C10_pos": 0,
  "C1_pos": 1,
  "C2-3_pos": 2,
  "C4-7_pos": 3,
  "C5_pos": 4,
  "C6_pos": 5,
  "C8_pos": 6,
  "C9_pos": 7,
  "neg": 8
}
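This ordering is not the order of the `classes` list defined earlier: when `classes` is not passed, `flow_from_directory` assigns indices in alphanumeric order of the sub-folder names. The small sketch below reproduces that with `sorted()` and shows why 'C10_pos' gets index 0 ('0' sorts before '_' in ASCII). Passing an explicit `classes=[...]` list to `flow_from_directory` would pin the order instead; either way, it must match the label order the 'combine' model was trained with.

```python
# Reproduce Keras' alphanumeric class indexing for the patch folders.
folders = ['C1_pos', 'C2-3_pos', 'C4-7_pos', 'C5_pos', 'C6_pos',
           'C8_pos', 'C9_pos', 'C10_pos', 'neg']
class_indices = {name: i for i, name in enumerate(sorted(folders))}
print(class_indices)
```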


### **Let's download the 4 final CNN models and the 'combine' model from GoogleDrive:**

---



In [10]:
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

eight_classes_id = '1w0u_EKaSG8zkMRtYkNjFd3IOnR3IpQsJ' # model_8_classes_0.8465
C1_C2_3_id = '18De1DbqyxD1JlNpue6LIUZXd73VgAN-f' # model C1 vs C2-3 vs all_other
C4_7_C5_id = '1-4e6W-yR13q3ckpgo8O9QVMTncWITHwg' # model_C4-7_vs_C5_vs_all-other
pos_vs_neg_id = '1-BxPnguFXE7PHmzKadW0AnwWO9VqTywR' # model pos-neg

files_ids_dict = {'model_eight_classes': eight_classes_id,
                  'model_C1_C2_3': C1_C2_3_id,
                  'model_C4_7_C5': C4_7_C5_id,
                  'model_pos_vs_neg': pos_vs_neg_id}

models_names_dict = {}
for model_name, file_id in files_ids_dict.items():
    downloaded = drive.CreateFile({'id': file_id})
    downloaded.GetContentFile(downloaded['title'])
    print('Downloaded content: "{}"'.format(downloaded['title']))
    models_names_dict[model_name] = downloaded['title']

print('\ncnn_models_names_dict:', str(json.dumps(models_names_dict, indent=2, default=str)))

file_id = '1aw3Bv1vXDlU_9ZTWeMTEzA44Jo3LlIzA' # 'combine' model_0903
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile(downloaded['title'])
print('\nDownloaded content: "{}"'.format(downloaded['title']))
print('\nRoot dir content: {}'.format(os.listdir()))
model_combine = downloaded['title']

Downloaded content: "model_8_classes_08465.h5"
Downloaded content: "model_C1_C2-3_08983.h5"
Downloaded content: "model_C4-7_C5_083.h5"
Downloaded content: "model_pos_neg_09973.h5"

cnn_models_names_dict: {
  "model_eight_classes": "model_8_classes_08465.h5",
  "model_C1_C2_3": "model_C1_C2-3_08983.h5",
  "model_C4_7_C5": "model_C4-7_C5_083.h5",
  "model_pos_vs_neg": "model_pos_neg_09973.h5"
}

Downloaded content: "combine_model_0.9203.sav"

Root dir content: ['.config', 'Patches', 'model_C1_C2-3_08983.h5', 'adc.json', 'model_C4-7_C5_083.h5', 'model_pos_neg_09973.h5', 'model_8_classes_08465.h5', 'combine_model_0.9203.sav', 'sample_data']


#### **Let's load all 4 CNN models and the 'combine' model from the downloaded files:**

In [11]:
eight_classes_model = load_model(models_names_dict['model_eight_classes'])
#eight_classes_model.summary() # summarize model.

In [0]:
C1_C2_3_model = load_model(models_names_dict['model_C1_C2_3'])
#C1_C2_3_model.summary() # summarize model.

In [0]:
C4_7_C5_model = load_model(models_names_dict['model_C4_7_C5'])
#C4_7_C5_model.summary() # summarize model.

In [0]:
pos_vs_neg_model = load_model(models_names_dict['model_pos_vs_neg'])
#pos_vs_neg_model.summary() # summarize model.

In [0]:
# Load the pickled 'combine' kernel SVM; the with-block closes the file afterwards
with open(model_combine, 'rb') as f:
    combine_model = pickle.load(f)

## **Let's *'evaluate'* the performance of our combined model on the validation dataset:**

In [16]:
eight_classes_scores = eight_classes_model.predict_generator(val_generator)
C1_C2_3_scores = C1_C2_3_model.predict_generator(val_generator)
C4_7_C5_scores = C4_7_C5_model.predict_generator(val_generator)
pos_vs_neg_scores = pos_vs_neg_model.predict_generator(val_generator)

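# Feature order must match the one used to train the 'combine' model: 8 + 3 + 3 + 2 = 16 columns
X = np.hstack((eight_classes_scores, C1_C2_3_scores, C4_7_C5_scores, pos_vs_neg_scores))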
y_pred = combine_model.predict(X)
y_true = val_generator.classes

val_acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
class_names = [k for k in val_generator.class_indices]
c_report = classification_report(y_true, y_pred, target_names=class_names)

print('\nbalanced val_acc:\n', val_acc)
print('\nConfusion Matrix:\n', cm)
print('\nClassification Report:\n', c_report)


balanced val_acc:
 0.9401748192973609

Confusion Matrix:
 [[6610    0    0    0    0    0    0    0    0]
 [   0 6146  454    0    0    0   10    0    0]
 [   0  581 6020    0    0    8    1    0    0]
 [   0    0    0 4473 2128    0    9    0    0]
 [   0    0    0  338 6272    0    0    0    0]
 [   1    0    4    0    0 6605    0    0    0]
 [   0   20    0    0    0    0 6590    0    0]
 [   0    0    0    0    0    0    0 6610    0]
 [   5    0    0    0    0    0    0    0 6605]]

Classification Report:
               precision    recall  f1-score   support

     C10_pos       1.00      1.00      1.00      6610
      C1_pos       0.91      0.93      0.92      6610
    C2-3_pos       0.93      0.91      0.92      6610
    C4-7_pos       0.93      0.68      0.78      6610
      C5_pos       0.75      0.95      0.84      6610
      C6_pos       1.00      1.00      1.00      6610
      C8_pos       1.00      1.00      1.00      6610
      C9_pos       1.00      1.00      1.00      6
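The output is labelled 'balanced val_acc' although `accuracy_score` was used. Because every class has exactly 6610 patches, plain accuracy and balanced accuracy (the mean of per-class recall, which is what `balanced_accuracy_score`, imported in the first cell, computes) coincide. The sketch below checks this directly on the confusion matrix printed above and also recovers per-class precision and recall.

```python
import numpy as np

# Confusion matrix copied from the output above (rows = true, columns = predicted),
# in class_indices order: C10, C1, C2-3, C4-7, C5, C6, C8, C9, neg.
cm = np.array([
    [6610,    0,    0,    0,    0,    0,    0,    0,    0],
    [   0, 6146,  454,    0,    0,    0,   10,    0,    0],
    [   0,  581, 6020,    0,    0,    8,    1,    0,    0],
    [   0,    0,    0, 4473, 2128,    0,    9,    0,    0],
    [   0,    0,    0,  338, 6272,    0,    0,    0,    0],
    [   1,    0,    4,    0,    0, 6605,    0,    0,    0],
    [   0,   20,    0,    0,    0,    0, 6590,    0,    0],
    [   0,    0,    0,    0,    0,    0,    0, 6610,    0],
    [   5,    0,    0,    0,    0,    0,    0,    0, 6605],
])

accuracy = np.trace(cm) / cm.sum()         # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)      # per true class
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class
balanced_accuracy = recall.mean()          # mean per-class recall
print(accuracy, balanced_accuracy)
```

The largest off-diagonal counts are the ones that motivated the specialised models: 2128 'C4-7' patches predicted as 'C5', 581 'C2-3' patches predicted as 'C1' and 454 'C1' patches predicted as 'C2-3'.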

# **Next Steps:**



#### Evaluate the performance of our combined model on the test dataset.