# **Data Modelling and Evaluation**
---

## Objectives

* Answer Business Requirements 2 and 3:
  * We will create and fit an ML model to predict whether a leaf is healthy or infected with powdery mildew. This is a binary classification task, as there are only two classes to identify.
  * We want to generate reports on the model that are accessible to all users, so that the results can be interpreted and understood.
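Because the task is binary, one common setup is a single sigmoid output that gives the probability of the second class, thresholded at 0.5. This notebook has not yet defined its model, so the output layer, the threshold and the helper name `prob_to_label` are assumptions. The following is a minimal sketch of how such a probability could be mapped back to a label:

```python
def prob_to_label(prob, labels=("healthy", "powdery_mildew"), threshold=0.5):
    """Map a single sigmoid output to a class name.

    Assumes `prob` is the predicted probability of labels[1]; values
    strictly above `threshold` map to labels[1], everything else to labels[0].
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"prob must lie in [0, 1], got {prob}")
    return labels[1] if prob > threshold else labels[0]


for p in (0.12, 0.5, 0.87):
    print(f"{p:.2f} -> {prob_to_label(p)}")
```

With a strict `>` comparison, a probability of exactly 0.5 falls back to `healthy`.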

## Inputs

* inputs/cherry_leaves_dataset/cherry-leaves/train
* inputs/cherry_leaves_dataset/cherry-leaves/validation
* inputs/cherry_leaves_dataset/cherry-leaves/test
* Image shape embedding (`outputs/v1/img_shape.pkl`)
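Each input split appears to follow a `<split>/<label>/<image>` layout. The train folder contains `healthy` and `powdery_mildew` sub-folders, and validation and test are assumed to follow the same pattern. A minimal sketch of counting images per split and label under that assumption is shown below. It uses a throwaway temporary directory instead of the real dataset, and `count_images` is a hypothetical helper:

```python
import os
import tempfile
from collections import Counter


def count_images(data_dir, splits=("train", "validation", "test")):
    """Return {(split, label): n_files} for a <split>/<label>/<image> layout."""
    counts = Counter()
    for split in splits:
        split_dir = os.path.join(data_dir, split)
        for label in sorted(os.listdir(split_dir)):
            label_dir = os.path.join(split_dir, label)
            if os.path.isdir(label_dir):
                # Count regular files only, skipping any nested folders
                counts[(split, label)] = len(
                    [f for f in os.listdir(label_dir)
                     if os.path.isfile(os.path.join(label_dir, f))]
                )
    return counts


# Demo on a fake dataset with 3 / 1 / 2 images per label in each split
with tempfile.TemporaryDirectory() as tmp:
    for split, n in (("train", 3), ("validation", 1), ("test", 2)):
        for label in ("healthy", "powdery_mildew"):
            os.makedirs(os.path.join(tmp, split, label))
            for i in range(n):
                open(os.path.join(tmp, split, label, f"img_{i}.jpg"), "w").close()
    demo_counts = count_images(tmp)

print(demo_counts[("train", "healthy")])
```

Counts like these are the raw material for the planned plot of the image distribution across the train, validation and test sets.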

## Outputs

* Plot distribution of images in train, validation and test sets
* Image augmentation
* Class indices, used to map the model's prediction inference to labels
* Build ML Model and train it
* Save model
* Plot a learning curve for model performance
* Model evaluation, saved to a pickle file
* Prediction on a random image file
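The "class indices" output is typically a label-to-index mapping, such as the `class_indices` attribute of a Keras directory iterator. The example dictionary below is an assumption, since this section has not created one yet. To turn a predicted index back into a label, the mapping is inverted. The file name `class_indices.pkl` is hypothetical, and the standard-library `pickle` module is used here to keep the sketch self-contained, whereas the notebook itself uses `joblib`:

```python
import os
import pickle
import tempfile

# Assumed example of a label -> index mapping (Keras-style class_indices)
class_indices = {"healthy": 0, "powdery_mildew": 1}

# Invert it so a predicted index can be turned back into a label
index_to_label = {index: label for label, index in class_indices.items()}

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the notebook's actual output name may differ
    path = os.path.join(tmp, "class_indices.pkl")
    with open(path, "wb") as f:
        pickle.dump(class_indices, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

predicted_index = 1  # e.g. a thresholded model output
print(index_to_label[predicted_index])
```

Saving the mapping alongside the model lets a later prediction step report labels without re-reading the training folders.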

## Additional Comments

* No comments
---

### Import Packages

```python
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.image import imread
```

### Set working directory

```python
current_dir = os.getcwd()
current_dir
```

```text
'/workspaces/mildew-detection-in-cherry-leaves-p5/jupyter_notebooks'
```

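```python
# Move up one level, from jupyter_notebooks/ to the project root,
# instead of hard-coding the absolute workspace path
os.chdir(os.path.dirname(current_dir))
print('You set a new current directory')
```

```text
You set a new current directory
```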


```python
current_dir = os.getcwd()
current_dir
```

```text
'/workspaces/mildew-detection-in-cherry-leaves-p5'
```

### Set input directories

```python
my_data_dir = os.path.join(current_dir, 'inputs', 'cherry_leaves_dataset', 'cherry-leaves')
train_path = os.path.join(my_data_dir, 'train')
val_path = os.path.join(my_data_dir, 'validation')
test_path = os.path.join(my_data_dir, 'test')
```
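A missing split folder would otherwise only surface later as a confusing error. One option is to check the three paths up front. The following is a minimal sketch using `pathlib`, with the hypothetical helpers `split_paths` and `missing_splits`, demonstrated on a temporary directory in which the `test` folder is deliberately absent:

```python
import tempfile
from pathlib import Path

SPLITS = ("train", "validation", "test")


def split_paths(project_root):
    """Build train/validation/test paths under the dataset folder used in this notebook."""
    data_dir = Path(project_root) / "inputs" / "cherry_leaves_dataset" / "cherry-leaves"
    return {split: data_dir / split for split in SPLITS}


def missing_splits(paths):
    """Return the names of splits whose folder does not exist."""
    return [name for name, path in paths.items() if not path.is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    paths = split_paths(tmp)
    paths["train"].mkdir(parents=True)
    paths["validation"].mkdir(parents=True)
    missing = missing_splits(paths)

print(missing)
```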

### Set output directories

```python
version = 'v1'
file_path = f'outputs/{version}'

# Create the version folder only if it does not already exist,
# so outputs from an earlier run are never overwritten
if 'outputs' in os.listdir(current_dir) and version in os.listdir(current_dir + '/outputs'):
    print('Old version is already available. Create a new version.')
else:
    os.makedirs(name=file_path)
```

```text
Old version is already available. Create a new version.
```
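The message above asks for a new version to be created by hand. As an alternative, the next free version name could be derived from the folders already present. The following is a minimal sketch, assuming versions are always named `v<number>`. `next_free_version` is a hypothetical helper and is demonstrated on a temporary directory:

```python
import os
import re
import tempfile


def next_free_version(outputs_dir):
    """Return 'v<N+1>', where N is the highest existing 'v<N>' folder, or 'v1' if none exist."""
    if not os.path.isdir(outputs_dir):
        return "v1"
    highest = 0
    for name in os.listdir(outputs_dir):
        match = re.fullmatch(r"v(\d+)", name)
        if match and os.path.isdir(os.path.join(outputs_dir, name)):
            highest = max(highest, int(match.group(1)))
    return f"v{highest + 1}"


with tempfile.TemporaryDirectory() as tmp:
    outputs = os.path.join(tmp, "outputs")
    first = next_free_version(outputs)  # 'v1' (no outputs folder yet)
    for existing in ("v1", "v2", "notes"):
        os.makedirs(os.path.join(outputs, existing))
    new_version = next_free_version(outputs)  # 'v3'
    os.makedirs(os.path.join(outputs, new_version))
    created = sorted(os.listdir(outputs))
```

Folders that do not match the `v<number>` pattern, such as `notes`, are ignored when picking the next number.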


### Set Labels

```python
# Set the labels for the images
labels = os.listdir(train_path)
print('The labels for the images are:', labels)
```

```text
The labels for the images are: ['healthy', 'powdery_mildew']
```
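The Python documentation states that `os.listdir` returns entries in arbitrary order, and it also returns stray files (for example `.DS_Store`) alongside the class folders. Keras' `flow_from_directory` documents that, when classes are inferred from sub-folders, their index order is alphanumeric. If Keras generators are used later, sorting the labels keeps the two orders in agreement. The following is a minimal sketch with the hypothetical helper `image_labels`, shown on a temporary directory:

```python
import os
import tempfile


def image_labels(split_dir):
    """Return class sub-folder names in sorted order, ignoring stray files."""
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


with tempfile.TemporaryDirectory() as tmp:
    for name in ("powdery_mildew", "healthy"):
        os.makedirs(os.path.join(tmp, name))
    open(os.path.join(tmp, ".DS_Store"), "w").close()  # stray file that is not a class
    demo_labels = image_labels(tmp)

print(demo_labels)
```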


### Set Image Shape

```python
# Import saved image shape embedding
import joblib

version = 'v1'
img_shape = joblib.load(filename=f"outputs/{version}/img_shape.pkl")
img_shape
```

```text
(256, 256, 3)
```
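The loaded tuple presumably follows the usual image-array convention of (height, width, channels), with 3 channels for RGB. A Keras-style image generator typically takes only the first two values as its `target_size`. The following is a minimal sketch of validating and splitting the shape before use. `split_img_shape` is a hypothetical helper, and `numbers.Integral` is used so that NumPy integer types loaded from the pickle are also accepted:

```python
import numbers


def split_img_shape(img_shape):
    """Validate a (height, width, channels) shape and return ((height, width), channels)."""
    if len(img_shape) != 3 or not all(
        isinstance(dim, numbers.Integral) and dim > 0 for dim in img_shape
    ):
        raise ValueError(f"expected (height, width, channels), got {img_shape!r}")
    height, width, channels = (int(dim) for dim in img_shape)
    return (height, width), channels


target_size, channels = split_img_shape((256, 256, 3))
print(target_size, channels)
```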