# Modelling and Evaluation Notebook

## Objectives

*   Answer business requirement 2: 
    * The client wants to tell whether or not a given cell is parasitized with malaria.


## Inputs

* inputs/malaria_dataset
* image_shape

## Outputs

* model
* class_indices
* predictions from train, validation and test sets
* Training loss and accuracy plot


## Additional Comments | Insights | Conclusions




---

# Install Packages

In [None]:
! pip install plotly==4.14.0
! pip install tensorflow==2.6.0

# Restart the runtime, which restarts the Colab session
# It is good practice to do this after installing a package in a Colab session
import os
os.kill(os.getpid(), 9)

# Setup GPU

* Go to Edit → Notebook Settings
* In the Hardware accelerator menu, select GPU
* Note: selecting an option (GPU, TPU or None) switches between kernels/sessions

---
* How do I know if I am using the GPU?
  * Run the code below. If the output is a non-empty string, you are using a GPU in this session
  * Typically the output will be /device:GPU:0


In [None]:
import tensorflow as tf
tf.test.gpu_device_name()

# **Connection between: Colab Session and your GitHub Repo**

### Insert your **credentials**

* The variable's content will exist only while the session exists. Once this session terminates, the variable's content will be erased permanently.

In [None]:
from getpass import getpass
import os
from IPython.display import clear_output 

print("=== Insert your credentials === \nType in and hit Enter")
os.environ['UserName'] = getpass('GitHub User Name: ')
os.environ['UserEmail'] = getpass('GitHub User E-mail: ')
os.environ['RepoName'] = getpass('GitHub Repository Name: ')
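# GitHub no longer accepts account passwords for Git over HTTPS; enter a personal access token
os.environ['UserPwd'] = getpass('GitHub Personal Access Token: ')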
clear_output()
print("* Thanks for inserting your credentials!")
print(f"* You may now Clone your Repo to this Session, "
      f"then Connect this Session to your Repo.")

---

### **Clone** your GitHub Repo to your current Colab session

* So you can have access to your project's files

In [None]:
! git clone https://github.com/{os.environ['UserName']}/{os.environ['RepoName']}.git
! rm -rf sample_data   # remove content/sample_data folder, since we don't need it for this project

import os
if os.path.isdir(os.environ['RepoName']):
  print("\n")
  %cd /content/{os.environ['RepoName']}
  print(f"\n\n* Current session directory is:{os.getcwd()}")
  print(f"* You may refresh the session folder to access {os.environ['RepoName']} folder.")
else:
  print(f"\n* The Repo {os.environ['UserName']}/{os.environ['RepoName']} was not cloned."
        f" Please check your Credentials: UserName and RepoName")

---

### **Connect** this Colab session to your GitHub Repo

* So if you need, you can push files generated in this session to your Repo.

In [None]:
! git config --global user.email {os.environ['UserEmail']}
! git config --global user.name {os.environ['UserName']}
! git remote rm origin
! git remote add origin https://{os.environ['UserName']}:{os.environ['UserPwd']}@github.com/{os.environ['UserName']}/{os.environ['RepoName']}.git

# The logic is: create a temporary file in the session and push it to the repo, then delete the file and push again.
# If both pushes work, it is a sign that the session is connected to the repo.
import uuid
file_name = "session_connection_test_" + str(uuid.uuid4()) # generates a unique file name
with open(f"{file_name}.txt", "w") as file: file.write("text")
print("=== Testing Session Connectivity to the Repo === \n")
! git add . ; ! git commit -m {file_name + "_added_file"} ; ! git push origin main 
print("\n\n")
os.remove(f"{file_name}.txt")
! git add . ; ! git commit -m {file_name + "_removed_file"}; ! git push origin main

# delete your Credentials (username and password)
os.environ['UserName'] = os.environ['UserPwd'] = os.environ['UserEmail'] = ""

* If the output above indicates an **authentication failure**, please insert your credentials again.

---

# Set Data Directory

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.image import imread

In [None]:
# my_data_dir = '/content/WalkthroughProject01/inputs/datasets/cell_images/cell_images'
my_data_dir = '/content/WalkthroughProject01/inputs/little_train_set/cell_images'
# my_data_dir = '/content/WalkthroughProject01/inputs/few_train_data/cell_images'
# my_data_dir = f"/content/{os.environ['RepoName']}/inputs/malaria_dataset/cell_images"

labels_train = os.listdir(my_data_dir+ '/train')
labels_val = os.listdir(my_data_dir+ '/validation')
labels_test = os.listdir(my_data_dir+ '/test')
# sort so the label order is deterministic, and include the validation set as well
labels = sorted(set(labels_train + labels_val + labels_test))

print(
    f"Labels on train set: {labels_train}\n"
    f"Labels on validation set: {labels_val}\n"
    f"Labels on test set: {labels_test}\n"
    f"Project Labels: {labels}"
    )

In [None]:
train_path = my_data_dir + '/train'
val_path = my_data_dir + '/validation'
test_path = my_data_dir + '/test'
train_path

---

## Check Labels Frequencies

In [None]:
rows = []
for folder in ['train', 'validation', 'test']:
  for label in labels:
    # count the images stored under <set>/<label>
    n_images = len(os.listdir(f"{my_data_dir}/{folder}/{label}"))
    rows.append({'Set': folder, 'Label': label, 'Frequency': n_images})
    print(f"* {folder} - {label}: {n_images} images")

# build the DataFrame once; DataFrame.append was removed in pandas 2.0
df_freq = pd.DataFrame(rows)

print("\n")
sns.set_style("whitegrid")
plt.figure(figsize=(8,5))
sns.barplot(data=df_freq, x='Set', y='Frequency', hue='Label')
plt.show()

---

# Modelling

## Image shape

In [None]:
image_shape = (132,132,3)

## Create model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D

def create_tf_model():
  model = Sequential()

  # only the first layer needs input_shape; later layers infer it
  model.add(Conv2D(filters=32, kernel_size=(3,3), input_shape=image_shape, activation='relu'))
  model.add(MaxPooling2D(pool_size=(2, 2)))

  model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
  model.add(MaxPooling2D(pool_size=(2, 2)))

  model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
  model.add(MaxPooling2D(pool_size=(2, 2)))

  model.add(Flatten())
  model.add(Dense(128))
  model.add(Activation('relu'))

  model.add(Dropout(0.5))
  model.add(Dense(1))
  model.add(Activation('sigmoid'))

  model.compile(loss='binary_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])
  
  return model

In [None]:
create_tf_model().summary()
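
The sizes reported by the summary can be checked by hand. The sketch below is plain Python arithmetic rather than Keras code. It assumes the Keras defaults for the layers above ('valid' padding and stride 1 for `Conv2D`, 2x2 `MaxPooling2D`), and the helper names are hypothetical.

```python
# Hand-check of feature-map sizes and parameter counts for the model above,
# assuming 'valid' padding, stride 1 and 2x2 pooling (the Keras defaults).

def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling; odd sizes are floored."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

size, channels = 132, 3           # image_shape = (132, 132, 3)
total_params = 0
for filters in (32, 64, 64):      # the three Conv2D + MaxPooling2D blocks
    total_params += conv_params(channels, filters)
    size = pool_out(conv_out(size))
    channels = filters

flat_units = size * size * channels
total_params += dense_params(flat_units, 128) + dense_params(128, 1)

print(f"Flatten output: {flat_units} units")
print(f"Trainable parameters: {total_params}")
```

Most of the parameters sit in the first `Dense` layer, which is why a smaller `image_shape` shrinks the model considerably.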

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_accuracy', mode='max', patience=2)
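
To see what `patience=2` means in practice, here is a simplified sketch of the stopping rule in plain Python. It is an illustration, not the Keras implementation: it ignores options such as `min_delta` and `restore_best_weights`, and the function name and accuracy values are made up.

```python
def early_stop_epoch(val_accuracy, patience=2):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the monitored value has failed to improve on its
    best value for `patience` consecutive epochs (mode='max').
    """
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(val_accuracy, start=1):
        if value > best:
            best = value
            wait = 0          # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None               # never triggered within the given history


# Hypothetical validation accuracies, one per epoch
history = [0.80, 0.85, 0.84, 0.83, 0.90]
print(early_stop_epoch(history, patience=2))
```

With this history the best value is reached at epoch 2, so training stops at epoch 4 and never sees the later improvement. A larger `patience` trades training time for tolerance to such dips.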

---

## Image Data Generator

Does it need scaling?
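* We check the max value of the array for a given image. If it is greater than 1, the data is in the 0 - 255 range. If not, the data is in the 0 - 1 range and is already scaled.
* Note that `matplotlib`'s `imread` returns PNG images as floats in the 0 - 1 range, whereas `flow_from_directory` loads pixel values in the 0 - 255 range, so this check reflects how the file was read rather than what the generator feeds the model.
* If we needed to scale, we would pass `rescale=1./255` to `ImageDataGenerator()`.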
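
The check described above can be sketched in plain Python. This is a minimal illustration on a handful of made-up pixel values, with hypothetical helper names. Dividing by 255 has the same effect as `rescale=1./255`, up to floating-point rounding.

```python
def needs_rescaling(pixel_values):
    """True when the largest value exceeds 1, i.e. the data looks like 0-255."""
    return max(pixel_values) > 1

def rescale(pixel_values, divisor=255.0):
    """Map 0-255 values to the 0-1 range."""
    return [value / divisor for value in pixel_values]


raw_pixels = [0, 64, 128, 255]        # hypothetical 8-bit pixel values
scaled_pixels = rescale(raw_pixels)

print(needs_rescaling(raw_pixels))     # the raw values exceed 1
print(needs_rescaling(scaled_pixels))  # after scaling, the max is 1.0
```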

In [None]:
from matplotlib.image import imread
pointer = 0
img = imread(train_path + '/' + labels[0]+ '/' + os.listdir(train_path + '/' + labels[0])[pointer])
sns.set_style("white")
plt.imshow(img)
plt.show()
print(f'The max value from the array is {img.max()}')

Train set

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
train_datagen = ImageDataGenerator(rotation_range=20,
                                   width_shift_range=0.10, 
                                   height_shift_range=0.10,
                                   shear_range=0.1,
                                   zoom_range=0.1,
                                   horizontal_flip=True,
                                   fill_mode='nearest',     
                              )


In [None]:
batch_size = 16

train_set = train_datagen.flow_from_directory(train_path,
                                              target_size=image_shape[:2],
                                              color_mode='rgb',
                                              batch_size=batch_size,
                                              class_mode='binary',
                                              shuffle=True
                                              )

train_set.class_indices

Validation set

In [None]:
validation_set = ImageDataGenerator().flow_from_directory(val_path,
                                                          target_size=image_shape[:2],
                                                          color_mode='rgb',
                                                          batch_size=batch_size,
                                                          class_mode='binary',
                                                          shuffle=False
                                                          )

validation_set.class_indices

Test set

In [None]:
test_set = ImageDataGenerator().flow_from_directory(test_path,
                                                    target_size=image_shape[:2],
                                                    color_mode='rgb',
                                                    batch_size=batch_size,
                                                    class_mode='binary',
                                                    shuffle=False
                                                    )

test_set.class_indices

## Fit model 

In [None]:
model = create_tf_model()
model.fit(train_set,
          epochs=25,
          steps_per_epoch = len(train_set.classes) // batch_size,
          validation_data=validation_set,
          callbacks=[early_stop]
          )
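
The `steps_per_epoch` value above uses integer division, so a final partial batch is not counted. The sketch below shows the arithmetic for a hypothetical training set of 1000 images; the real count comes from `len(train_set.classes)`.

```python
def epoch_coverage(n_images, batch_size):
    """Return (steps, images covered by full batches, images left over)."""
    steps, leftover = divmod(n_images, batch_size)
    return steps, steps * batch_size, leftover


# 1000 is a made-up dataset size used only for illustration
steps, covered, leftover = epoch_coverage(n_images=1000, batch_size=16)
print(f"steps_per_epoch={steps}, images in full batches={covered}, left over={leftover}")
```

Because the training generator shuffles, the images left over are unlikely to be the same ones in every epoch.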



# from google.colab import output
# output.eval_js('new Audio("https://upload.wikimedia.org/wikipedia/commons/0/05/Beep-09.ogg").play()')

# model.save('/content/WalkthroughProject01/outputs/model/model_wt01_full_train_set.h5')

# CommitMsg = "update"
# !git add .
# !git commit -m {CommitMsg}
# !git push origin main

# started at 
# at 19, reached 90% acc

## Model training history

In [None]:
losses = pd.DataFrame(model.history.history)

sns.set_style("whitegrid")
losses[['loss','val_loss']].plot(style='.-')
plt.title("Loss")
plt.show()

print("\n")
losses[['accuracy','val_accuracy']].plot(style='.-')
plt.title("Accuracy")
plt.show()

Further reading on CNN feature map and filter visualization: https://towardsdatascience.com/convolutional-neural-network-feature-map-and-filter-visualization-f75012a5a49c

---

## Model Evaluation

In [None]:
# model.metrics_names

In [None]:
# model.evaluate(train_set)

In [None]:
# model.evaluate(validation_set)

In [None]:
# model.evaluate(test_set)

In [None]:
# from tensorflow.keras.models import load_model
# model1 = load_model('outputs/model/malaria_detector.h5')

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

def model_evaluation(model, image_generator):
  pred_probabilities = model.predict(image_generator)
  # threshold the sigmoid output and flatten to a 1D array of 0/1 labels
  predictions = (pred_probabilities > 0.5).astype(int).flatten()

  # distribution of the predicted probabilities
  pd.Series(pred_probabilities.flatten()).hist()
  plt.show()
  print("\n")

  # predictions are passed first so that rows are predictions and columns are actual labels
  class_map = image_generator.class_indices
  print(pd.DataFrame(confusion_matrix(predictions, image_generator.classes),
        columns=["Actual " + sub for sub in class_map],
        index=["Prediction " + sub for sub in class_map]))
  print("\n")

  print(classification_report(image_generator.classes, predictions))
  print("\n")
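
To make the table layout concrete, here is a small pure-Python sketch of the same two steps: thresholding the probabilities at 0.5, then counting a matrix whose rows are predictions and whose columns are actual labels. The probabilities and labels are made up and the helper names are hypothetical.

```python
def threshold(probabilities, cutoff=0.5):
    """Turn sigmoid outputs into 0/1 class labels."""
    return [int(p > cutoff) for p in probabilities]

def confusion_counts(predicted, actual, n_classes=2):
    """matrix[p][a] counts samples predicted as class p whose true class is a."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for p, a in zip(predicted, actual):
        matrix[p][a] += 1
    return matrix


probabilities = [0.1, 0.7, 0.4, 0.9, 0.6]   # made-up sigmoid outputs
actual = [0, 1, 1, 1, 0]                    # made-up true labels

predicted = threshold(probabilities)
matrix = confusion_counts(predicted, actual)
correct = sum(matrix[i][i] for i in range(2))

print(predicted)
print(matrix)
print(f"accuracy: {correct / len(actual)}")
```

The diagonal holds the correct predictions, while the off-diagonal cells are the two kinds of error.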



In [None]:
train_set_shuffle_false = ImageDataGenerator().flow_from_directory(train_path,
                                                          target_size=image_shape[:2],
                                                          color_mode='rgb',
                                                          batch_size=batch_size,
                                                          class_mode='binary',
                                                          shuffle=False
                                                          )

In [None]:
import datetime
print(f'Classes indices: {train_set_shuffle_false.class_indices}\n\n')

for image_set in [train_set_shuffle_false, validation_set, test_set]:
  
  print(f'{image_set.directory.split("/")[-1]} set')
  a = datetime.datetime.now()

  model_evaluation(model=model, image_generator=image_set)
  print(f'\nTime for evaluation: {datetime.datetime.now() - a}')
  print("\n\n")

---

## Predict on new data

In [None]:
from tensorflow.keras.models import load_model
model = load_model('outputs/model/malaria_detector.h5')

In [None]:
labels

In [None]:
pointer = 66
label = labels[0]
para_img = imread(test_path + '/'+ label + '/'+ os.listdir(test_path+'/'+ label)[pointer])
print(para_img.shape)
sns.set_style("white")
plt.imshow(para_img)
plt.show()

---

Pick random image 

In [None]:
from tensorflow.keras.preprocessing import image

pointer = 66
label = labels[0]

# target_size expects (height, width), so drop the channel dimension
pil_image = image.load_img(test_path + '/'+ label + '/'+ os.listdir(test_path+'/'+ label)[pointer],
                          target_size=image_shape[:2], color_mode='rgb')
print(f'Image shape: {pil_image.size}, Image mode: {pil_image.mode}')
pil_image

In [None]:
my_image = image.img_to_array(pil_image)
my_image = np.expand_dims(my_image, axis=0)
print(my_image.shape)

In [None]:
pred_proba = model.predict(my_image)[0,0]

target_map = {v: k for k, v in train_set.class_indices.items()}
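# convert the boolean to an int so it matches the integer keys of target_map
pred_class = target_map[int(pred_proba > 0.5)]

# the sigmoid output is the probability of class 1; flip it when class 0 is predicted
if pred_class == target_map[0]: pred_proba = 1 - pred_proba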

print(pred_proba)
print(pred_class)
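
The mapping from a single sigmoid output to a class name and its probability can be sketched without a model. The `class_indices` dictionary below is a hypothetical stand-in for `train_set.class_indices`, and the function name is made up.

```python
def interpret_sigmoid(pred_proba, class_indices):
    """Return (class name, probability of that class) for one sigmoid output.

    The sigmoid output is read as the probability of the class with index 1,
    so the probability of class 0 is its complement.
    """
    target_map = {index: name for name, index in class_indices.items()}
    pred_index = int(pred_proba > 0.5)
    class_proba = pred_proba if pred_index == 1 else 1 - pred_proba
    return target_map[pred_index], class_proba


# Hypothetical mapping; the real one comes from train_set.class_indices
class_indices = {"Parasitized": 0, "Uninfected": 1}

print(interpret_sigmoid(0.9, class_indices))
print(interpret_sigmoid(0.25, class_indices))
```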


In [None]:
prob_per_class = pd.DataFrame(data=[0.0, 0.0], index=list(train_set.class_indices.keys()), columns=['Probability'])

prob_per_class.loc[pred_class] = pred_proba

for x in prob_per_class.index.to_list():
  # compare labels for equality; `not in` on a string would be a substring test
  if x != pred_class: prob_per_class.loc[x] = 1 - pred_proba

prob_per_class = prob_per_class.round(3)
print(prob_per_class)

import plotly.express as px
fig = px.bar(prob_per_class, x = prob_per_class.index, y = prob_per_class['Probability'],range_y=[0,1],
             labels=dict(x="Diagnosis"), width=400, height=500)
fig.show()

# Push files to Repo

In [None]:
import joblib
import os

version = 'v1'
file_path = f'outputs/model/{version}'

# create the output folder; do nothing if it already exists
os.makedirs(file_path, exist_ok=True)

## Training loss and accuracy plot

In [None]:
sns.set_style("whitegrid")
losses[['loss','val_loss']].plot(style='.-')
plt.title("Loss")
plt.savefig(f'{file_path}/training_losses.png', bbox_inches='tight', dpi=150)

print("\n")
losses[['accuracy','val_accuracy']].plot(style='.-')
plt.title("Accuracy")
plt.savefig(f'{file_path}/training_acc.png', bbox_inches='tight', dpi=150)

## Save predictions from train, validation and test sets

In [None]:
for image_set in [train_set_shuffle_false, validation_set, test_set]: 
  pred_probabilities = model.predict(image_set)
  predictions = pred_probabilities > 0.5

  (pd.Series(predictions.flatten())
  .to_csv(f'{file_path}/predictions_{image_set.directory.split("/")[-1]}_set.csv', index=False)
  )

## Save model

In [None]:
model.summary()

In [None]:
model.save(f'{file_path}/malaria_detector_model.h5')  # saves the model as an HDF5 file

## Save class_indices

In [None]:
train_set.class_indices

In [None]:

joblib.dump(value=train_set.class_indices ,
            filename=f"{file_path}/class_indices.pkl")

## Push generated/new files from this Session to GitHub repo

* Git status

In [None]:
! git status

* Git commit

In [None]:
CommitMsg = "update"
!git add .
!git commit -m {CommitMsg}

* Git Push

In [None]:
!git push origin main

---