# Modelling and Evaluation Notebook

## Objectives

*   Answer business requirement 2: 
    * The client is interested to tell whether a given cell contains a malaria parasite or not.


## Inputs

* inputs/malaria_dataset/cell_images/train
* inputs/malaria_dataset/cell_images/test
* inputs/malaria_dataset/cell_images/validation
* image shape embeddings.

## Outputs
* Images distribution plot in train, validation, and test set.
* Image augmentation.
* Class indices to change prediction inference in labels.
* Machine learning model creation and training.
* Save model.
* Learning curve plot for model performance.
* Model evaluation on pickle file.
* Prediction on the random image file.




## Additional Comments | Insights | Conclusions




---

---

# Import regular packages

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.image import imread

---

# Set Working Directory

In [2]:
cwd= os.getcwd()

In [3]:
os.chdir('/workspaces/WalkthroughProject01')
print("You set a new current directory")

FileNotFoundError: [Errno 2] No such file or directory: '/workspaces/WalkthroughProject01'

In [4]:

work_dir = os.getcwd()
work_dir

'/workspaces/project-5-mildew-detection-in-cherry-leaves/jupyter_notebooks'

---

## Set input directories

Set train, validation and test paths

In [5]:
my_data_dir = 'inputs/malaria_dataset/cell_images'
train_path = my_data_dir + '/train'
val_path = my_data_dir + '/validation'
test_path = my_data_dir + '/test'

## Set output directory

In [6]:
version = 'v1'
file_path = f'outputs/{version}'

if 'outputs' in os.listdir(work_dir) and version in os.listdir(work_dir + '/outputs'):
    print('Old version is already available create a new version.')
    pass
else:
    os.makedirs(name=file_path)


## Set labels

In [7]:

labels = os.listdir(train_path)

print(
    f"Project Labels: {labels}"
)


FileNotFoundError: [Errno 2] No such file or directory: 'inputs/malaria_dataset/cell_images/train'

## Set image shape

In [8]:
## Import saved image shape embedding
import joblib
version = 'v1'
image_shape = joblib.load(filename=f"outputs/{version}/image_shape.pkl")
image_shape

FileNotFoundError: [Errno 2] No such file or directory: 'outputs/v1/image_shape.pkl'

---

# Number of images in train, test and validation data

In [9]:
df_freq = pd.DataFrame([])
for folder in ['train', 'validation', 'test']:
    for label in labels:
        df_freq = df_freq.append(
            pd.Series(data={'Set': folder,
                            'Label': label,
                            'Frequency': int(len(os.listdir(my_data_dir + '/' + folder + '/' + label)))}
                      ),
            ignore_index=True
        )

        print(
            f"* {folder} - {label}: {len(os.listdir(my_data_dir+'/'+ folder + '/' + label))} images")

print("\n")
sns.set_style("whitegrid")
plt.figure(figsize=(8, 5))
sns.barplot(data=df_freq, x='Set', y='Frequency', hue='Label')
plt.savefig(f'{file_path}/labels_distribution.png',
            bbox_inches='tight', dpi=150)
plt.show()


NameError: name 'labels' is not defined

---

# Image data augmentation

---

### ImageDataGenerator

In [10]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

* ### Initialize ImageDataGenerator

In [11]:
augmented_image_data = ImageDataGenerator(rotation_range=20,
                                          width_shift_range=0.10,
                                          height_shift_range=0.10,
                                          shear_range=0.1,
                                          zoom_range=0.1,
                                          horizontal_flip=True,
                                          vertical_flip=True,
                                          fill_mode='nearest',
                                          rescale=1./255
                                          )


* ### Augment training image dataset

In [12]:
batch_size = 20  # Set batch size
train_set = augmented_image_data.flow_from_directory(train_path,
                                                     target_size=image_shape[:2],
                                                     color_mode='rgb',
                                                     batch_size=batch_size,
                                                     class_mode='binary',
                                                     shuffle=True
                                                     )

train_set.class_indices


NameError: name 'image_shape' is not defined

* ### Augment validation image dataset

In [13]:
validation_set = ImageDataGenerator(rescale=1./255).flow_from_directory(val_path,
                                                                        target_size=image_shape[:2],
                                                                        color_mode='rgb',
                                                                        batch_size=batch_size,
                                                                        class_mode='binary',
                                                                        shuffle=False
                                                                        )

validation_set.class_indices


NameError: name 'image_shape' is not defined

* ### Augment test image dataset

In [14]:
test_set = ImageDataGenerator(rescale=1./255).flow_from_directory(test_path,
                                                                  target_size=image_shape[:2],
                                                                  color_mode='rgb',
                                                                  batch_size=batch_size,
                                                                  class_mode='binary',
                                                                  shuffle=False
                                                                  )

test_set.class_indices


NameError: name 'image_shape' is not defined

## Plot augmented training image

In [15]:
for _ in range(3):
    img, label = train_set.next()
    print(img.shape)  # (1,256,256,3)
    plt.imshow(img[0])
    plt.show()


NameError: name 'train_set' is not defined

## Plot augmented validation and test images

In [16]:
for _ in range(3):
    img, label = validation_set.next()
    print(img.shape)  # (1,256,256,3)
    plt.imshow(img[0])
    plt.show()


NameError: name 'validation_set' is not defined

In [17]:
for _ in range(3):
    img, label = test_set.next()
    print(img.shape)  # (1,256,256,3)
    plt.imshow(img[0])
    plt.show()


NameError: name 'test_set' is not defined

## Save class_indices

In [18]:
joblib.dump(value=train_set.class_indices,
            filename=f"{file_path}/class_indices.pkl")


NameError: name 'train_set' is not defined

---

# Model creation

---

## ML model

* ### Import model packages

In [19]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D

* ### Model 

In [20]:

def create_tf_model():
    model = Sequential()

    model.add(Conv2D(filters=32, kernel_size=(3, 3),
              input_shape=image_shape, activation='relu', ))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(filters=64, kernel_size=(3, 3),
              input_shape=image_shape, activation='relu', ))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(filters=64, kernel_size=(3, 3),
              input_shape=image_shape, activation='relu', ))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(128, activation='relu'))

    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    return model


* ### Model Summary 

In [21]:
create_tf_model().summary()

2023-06-15 18:55:01.779764: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


NameError: name 'image_shape' is not defined

* ### Early Stopping 

In [22]:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=3)


## Fit model for model training

In [23]:
model = create_tf_model()
model.fit(train_set,
          epochs=25,
          steps_per_epoch=len(train_set.classes) // batch_size,
          validation_data=validation_set,
          callbacks=[early_stop],
          verbose=1
          )


NameError: name 'image_shape' is not defined

## Save model

In [24]:
model.save('outputs/v1/malaria_detector_model.h5')

NameError: name 'model' is not defined

---

# Model Performace

---

## Model learning curve

In [25]:
losses = pd.DataFrame(model.history.history)

sns.set_style("whitegrid")
losses[['loss', 'val_loss']].plot(style='.-')
plt.title("Loss")
plt.savefig(f'{file_path}/model_training_losses.png',
            bbox_inches='tight', dpi=150)
plt.show()

print("\n")
losses[['accuracy', 'val_accuracy']].plot(style='.-')
plt.title("Accuracy")
plt.savefig(f'{file_path}/model_training_acc.png',
            bbox_inches='tight', dpi=150)
plt.show()


NameError: name 'model' is not defined

## Model Evaluation

Load saved model

In [26]:
from keras.models import load_model
model = load_model('outputs/v1/malaria_detector_model.h5')

OSError: SavedModel file does not exist at: outputs/v1/malaria_detector_model.h5/{saved_model.pbtxt|saved_model.pb}

Evaluate model on test set

In [27]:
evaluation = model.evaluate(test_set)


NameError: name 'model' is not defined

### Save evaluation pickle

In [28]:
joblib.dump(value=evaluation,
            filename=f"outputs/v1/evaluation.pkl")


NameError: name 'evaluation' is not defined

## Predict on new data

Load a random image as PIL

In [29]:
from tensorflow.keras.preprocessing import image

pointer = 66
label = labels[0]  # select Uninfected or Parasitised

pil_image = image.load_img(test_path + '/' + label + '/' + os.listdir(test_path+'/' + label)[pointer],
                           target_size=image_shape, color_mode='rgb')
print(f'Image shape: {pil_image.size}, Image mode: {pil_image.mode}')
pil_image


NameError: name 'labels' is not defined

Convert image to array and prepare for prediction

In [30]:
my_image = image.img_to_array(pil_image)
my_image = np.expand_dims(my_image, axis=0)/255
print(my_image.shape)

NameError: name 'pil_image' is not defined

Predict class probabilities

In [31]:
pred_proba = model.predict(my_image)[0, 0]

target_map = {v: k for k, v in train_set.class_indices.items()}
pred_class = target_map[pred_proba > 0.5]

if pred_class == target_map[0]:
    pred_proba = 1 - pred_proba

print(pred_proba)
print(pred_class)


NameError: name 'model' is not defined

---

# Push files to Repo

## Push generated/new files from this Session to your GitHub repo

* .gitignore

In [32]:
!cat .gitignore

cat: .gitignore: No such file or directory


* Git status

In [38]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean


* Git add

In [34]:
!git add .

* Git commit

In [35]:
!git commit -am " Add new plots"

uts/malaria_dataset/cell_images/validation/Uninfected/C217ThinF_IMG_20151106_141500_cell_30.png
 create mode 100644 inputs/malaria_dataset/cell_images/validation/Uninfected/C217ThinF_IMG_20151106_141649_cell_161.png
 create mode 100644 inputs/malaria_dataset/cell_images/validation/Uninfected/C217ThinF_IMG_20151106_141649_cell_63.png
 create mode 100644 inputs/malaria_dataset/cell_images/validation/Uninfected/C217ThinF_IMG_20151106_142147_cell_137.png
 create mode 100644 inputs/malaria_dataset/cell_images/validation/Uninfected/C218ThinF_IMG_20151106_143940_cell_147.png
 create mode 100644 inputs/malaria_dataset/cell_images/validation/Uninfected/C218ThinF_IMG_20151106_143940_cell_17.png
 create mode 100644 inputs/malaria_dataset/cell_images/validation/Uninfected/C218ThinF_IMG_20151106_143940_cell_23.png
 create mode 100644 inputs/malaria_dataset/cell_images/validation/Uninfected/C218ThinF_IMG_20151106_144001_cell_223.png
 create mode 100644 inputs/malaria_dataset/cell_images/validation/U

* Git Push

In [36]:
!git push

Enumerating objects: 1157, done.
Counting objects: 100% (1157/1157), done.
Delta compression using up to 4 threads
Compressing objects: 100% (1145/1145), done.
Writing objects: 100% (1152/1152), 34.70 MiB | 4.38 MiB/s, done.
Total 1152 (delta 2), reused 1130 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.[K
To https://github.com/Dylangroome/WalkthroughProject01
   12581040..833a3e79  main -> main


---