# Modelling and Evaluation Notebook

### Objectives

Answer business requirement 2:
+ The client wants to be able to know whether a cherry leaf is healthy or contains powdery mildew. 

### Inputs

+ inputs/cherry_leaves_dataset/cherry_leaves_images/train
+ inputs/cherry_leaves_dataset/cherry_leaves_images/test
+ inputs/cherry_leaves_dataset/cherry_leaves_images/validation
+ image shape embeddings.

### Outputs

+ Image distribution plot for the train, validation, and test sets.
+ Image augmentation.
+ Class indices to map prediction inference to labels.
+ Machine learning model creation and training.
+ Saved model.
+ Learning curve plot for model performance.
+ Model evaluation saved as a pickle file.
+ Prediction on a random image file.

### Import regular packages

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.image import imread

### Set Working Directory

In [2]:
cwd = os.getcwd()

In [3]:
os.chdir('/workspaces/mildew-detector')
print("'mildew-detector' has been set as the new current directory")

'mildew-detector' has been set as the new current directory


In [4]:
work_dir = os.getcwd()
work_dir

'/workspaces/mildew-detector'

### Set input directories

+ Set train, validation and test paths

In [5]:
my_data_dir = 'inputs/cherry_leaves_dataset/cherry_leaves_images'
train_path = my_data_dir + '/train'
val_path = my_data_dir + '/validation'
test_path = my_data_dir + '/test'
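The paths above are plain strings, so a typo in the folder name only surfaces later when images are read. A minimal sketch of an early check is shown below; `check_split_dirs` is a hypothetical helper (not part of the original notebook), and it assumes the three split folders sit directly under the base directory.

```python
import os


def check_split_dirs(base_dir, splits=('train', 'validation', 'test')):
    """Return {split_name: path} for each split; raise if a folder is missing.

    Hypothetical helper for illustration only.
    """
    paths = {}
    for split in splits:
        # os.path.join avoids hand-written '/' separators
        path = os.path.join(base_dir, split)
        if not os.path.isdir(path):
            raise FileNotFoundError(f"Missing split folder: {path}")
        paths[split] = path
    return paths
```

Calling `check_split_dirs(my_data_dir)` would return the same three paths as the cell above, or fail immediately with the name of the missing folder.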

### Set output directory

In [6]:
version = 'v1'
file_path = f'outputs/{version}'

# Create the versioned output folder unless it already exists
if 'outputs' in os.listdir(work_dir) and version in os.listdir(work_dir + '/outputs'):
    print('Old version is already available create a new version.')
else:
    os.makedirs(name=file_path)

Old version is already available create a new version.
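The same check can be written more compactly with `os.makedirs(..., exist_ok=True)`, which creates the folder if needed and does nothing otherwise. The sketch below is one possible alternative, not the notebook's own code; `ensure_output_dir` and its `root` parameter are hypothetical names.

```python
import os


def ensure_output_dir(version, root='outputs'):
    """Create root/version if absent.

    Returns (path, created), where created is False when the folder
    was already there. Hypothetical helper for illustration only.
    """
    path = os.path.join(root, version)
    already_there = os.path.isdir(path)
    # exist_ok=True makes the call safe to repeat on every notebook run
    os.makedirs(path, exist_ok=True)
    return path, not already_there
```

With this approach the returned flag can drive the "old version is already available" message, while the directory listing checks are no longer needed.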


### Set labels

In [7]:
labels = os.listdir(train_path)

print(
    f"Project Labels: {labels}"
)

Project Labels: ['healthy', 'powdery_mildew']
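`os.listdir` returns entries in arbitrary order and also includes any stray files in the folder, so the label list above is not guaranteed to come out the same on every machine. A minimal sketch that keeps only subfolders and sorts them is shown below; `list_labels` is a hypothetical helper, and it assumes each class has its own subfolder under the split path.

```python
import os


def list_labels(split_path):
    """Return the class subfolder names under split_path, sorted alphabetically.

    Hypothetical helper for illustration only.
    """
    return sorted(
        name
        for name in os.listdir(split_path)
        # skip stray files such as hidden system files
        if os.path.isdir(os.path.join(split_path, name))
    )
```

Sorting gives a stable label order, which helps if a later step maps class names to integer indices alphabetically.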


### Set image shape

In [8]:
# Load the saved image shape embedding
import joblib
version = 'v1'
image_shape = joblib.load(filename=f"outputs/{version}/image_shape.pkl")
image_shape

(100, 100, 3)
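To make the loaded shape concrete, the short sketch below unpacks it and works out how many values one image contributes to the model input. The (height, width, channels) ordering and the 4-byte float32 storage are assumptions for illustration; neither is stated in the notebook.

```python
# Shape as loaded in the cell above
image_shape = (100, 100, 3)

# Assumed ordering: (height, width, colour channels)
height, width, channels = image_shape

# Number of values fed to the model per image
values_per_image = height * width * channels

# Approximate memory per image, assuming 4-byte float32 values
BYTES_PER_FLOAT32 = 4
bytes_per_image = values_per_image * BYTES_PER_FLOAT32

print(values_per_image)  # 30000
print(bytes_per_image)   # 120000
```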

### Number of images in train, test and validation data