# Modeling and Evaluation
## Objectives
* Answer business requirement 2:
    * An ML system that is capable of predicting whether a cherry leaf is healthy or contains powdery mildew.
## Inputs
* inputs/cherry-leaves/train
* inputs/cherry-leaves/validation
* inputs/cherry-leaves/test
## Outputs
* Image distribution plot for the train, validation and test sets
* Image augmentation
* Class indices to map prediction inference to labels
* Machine learning model creation and training
* Save model
* Learning curve plot for model performance
* Model evaluation saved as a pickle file
* Prediction on a random image file
## Import packages

In [9]:
import joblib
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.image import imread
from tensorflow.keras.preprocessing.image import ImageDataGenerator

## Change the working directory

In [2]:
os.chdir('/workspace/Project_five')
current_directory = os.getcwd()
print(f"Your current directory is '{current_directory}'")

Your current directory is '/workspace/Project_five'


## Set input directories

In [3]:
getdirectory = os.getcwd()
directory = 'inputs/cherry-leaves'
train_path = directory + '/train'
val_path = directory + '/validation'
test_path = directory + '/test'

## Set output directories

In [4]:
# To make a new set of outputs change the version variable
version = 'v1'
file_path = f'outputs/{version}'
if version not in os.listdir("outputs"):
    os.makedirs(name=file_path)
    print('Version made')
else:
    print('Version ready')

Version ready
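
The version check above relies on the `outputs` folder already existing, since `os.listdir` raises an error for a missing folder. A minimal sketch of an alternative that also works on a fresh checkout is shown here. The helper name `ensure_version_dir` and the temporary root folder are assumptions made for illustration and are not part of the project.

```python
import os
import tempfile

def ensure_version_dir(root, version):
    """Create <root>/outputs/<version> if needed and report whether it was new."""
    path = os.path.join(root, "outputs", version)
    created = not os.path.isdir(path)
    # exist_ok=True makes the call safe to repeat; parent folders are created too
    os.makedirs(path, exist_ok=True)
    return path, created

# Hypothetical root: a throwaway temp folder instead of the real project folder
demo_root = tempfile.mkdtemp()
first_path, first_created = ensure_version_dir(demo_root, "v1")
second_path, second_created = ensure_version_dir(demo_root, "v1")
print(first_created, second_created)
```

The first call creates the folder and the second call finds it in place, which mirrors the 'Version made' and 'Version ready' messages.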


## Set labels

In [5]:
labels = os.listdir(train_path)
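
`os.listdir` returns every entry in the folder in arbitrary order, including stray files. The sketch below keeps only sub-folders and sorts them, which matches the alphanumeric order Keras uses when it assigns class indices. The helper `list_class_labels` and the folder names are assumptions for illustration; the real class folders may be named differently.

```python
import os
import tempfile

def list_class_labels(path):
    """Return the sorted names of the sub-folders in `path`, one per class."""
    return sorted(
        entry for entry in os.listdir(path)
        if os.path.isdir(os.path.join(path, entry))
    )

# Hypothetical layout for illustration; the real class folder names may differ
demo_train = tempfile.mkdtemp()
for name in ("powdery_mildew", "healthy"):
    os.makedirs(os.path.join(demo_train, name))
# A stray file that should not be treated as a label
open(os.path.join(demo_train, ".DS_Store"), "w").close()

demo_labels = list_class_labels(demo_train)
print(demo_labels)
```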

## Set image shape

In [8]:
# Make sure that the version matches the version you want
version = 'v1'
# Named image_shape because the generator cells below read image_shape[:2]
image_shape = joblib.load(filename=f"outputs/{version}/image_shape.pkl")

## Initializing the ImageDataGenerator

In [16]:
# These are the augmentations we will apply to our images
augmented_image_data = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.10, 
    height_shift_range=0.10,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest',
    rescale=1./255
    )

# set the batch size for iteration
batch_size = 20
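
To make two of these settings concrete, the sketch below applies `rescale` and the two flips by hand to a tiny made-up array. It only illustrates the effect on pixel values and is not how Keras implements them; in the generator the flips and other transforms are applied at random per image. The array values are hypothetical.

```python
import numpy as np

# Hypothetical 2x3 single-channel "image" with 8-bit pixel values
image = np.array([[0, 51, 102],
                  [153, 204, 255]], dtype=np.float32)

# rescale=1./255 multiplies every pixel by 1/255, mapping [0, 255] to [0, 1]
rescaled = image * (1.0 / 255)

# horizontal_flip mirrors the columns; vertical_flip mirrors the rows
h_flipped = rescaled[:, ::-1]
v_flipped = rescaled[::-1, :]

print(rescaled.min(), rescaled.max())
print(h_flipped[0])
```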

## Augment our training image dataset

In [22]:
train_set = augmented_image_data.flow_from_directory(
    train_path,
    target_size=image_shape[:2],
    color_mode='rgb',
    batch_size=batch_size,
    class_mode='binary',
    shuffle=True
    )

Found 2944 images belonging to 2 classes.


## Load and rescale validation image dataset

In [24]:
validation_set = ImageDataGenerator(rescale=1./255).flow_from_directory(
    val_path,
    target_size=image_shape[:2],
    color_mode='rgb',
    batch_size=batch_size,
    class_mode='binary',
    shuffle=False
    )

Found 420 images belonging to 2 classes.


## Load and rescale test image dataset

In [25]:
test_set = ImageDataGenerator(rescale=1./255).flow_from_directory(
    test_path,
    target_size=image_shape[:2],
    color_mode='rgb',
    batch_size=batch_size,
    class_mode='binary',
    shuffle=False
    )

Found 844 images belonging to 2 classes.
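
With a batch size of 20 and the image counts reported above, the number of batches each generator yields per full pass can be worked out directly. This is a small arithmetic sketch; the dictionary names are chosen here for illustration.

```python
import math

batch_size = 20
# Image counts as reported by flow_from_directory in this notebook
image_counts = {"train": 2944, "validation": 420, "test": 844}

# The last batch may be smaller than batch_size, so round up
batches_per_epoch = {
    name: math.ceil(count / batch_size)
    for name, count in image_counts.items()
}
print(batches_per_epoch)
```

The train and test sets do not divide evenly by 20, so their final batch holds fewer than 20 images.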


## Save class_indices

In [29]:
joblib.dump(
    value=train_set.class_indices,
    filename=f"{file_path}/class_indices.pkl"
    )

['outputs/v1/class_indices.pkl']
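
The saved mapping goes from label to index, while inference needs the opposite direction. Below is a minimal sketch of that lookup, assuming the model ends in a single sigmoid unit (consistent with `class_mode='binary'`) and assuming the class folders are named `healthy` and `powdery_mildew`. Both are assumptions; the real mapping is whatever `train_set.class_indices` contains.

```python
# Assumed mapping for illustration; the real one is whatever
# train_set.class_indices contains (folder names sorted alphanumerically)
class_indices = {"healthy": 0, "powdery_mildew": 1}

# Invert the mapping so an index can be turned back into a label
index_to_label = {index: label for label, index in class_indices.items()}

def label_from_probability(probability, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a class label."""
    predicted_index = int(probability > threshold)
    return index_to_label[predicted_index]

print(label_from_probability(0.93))
print(label_from_probability(0.08))
```

The 0.5 threshold is a common default and not something this notebook specifies; it can be adjusted if one type of error is more costly than the other.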