
### Modeling and Evaluation Notebook

**Objectives**
- Answer business requirement 2:
    + The client is interested in predicting if a cherry tree is healthy or contains powdery mildew.

**Inputs**
- inputs/cherry_leaves_dataset/cherry-leaves/train/
- inputs/cherry_leaves_dataset/cherry-leaves/test/
- inputs/cherry_leaves_dataset/cherry-leaves/validation/
- image_embeddings

**Outputs**
- Image distribution plot for the train, validation, and test sets
- Image augmentation
- Class indices to map model predictions back to label names
- Machine learning model creation and training
- Save model
- Learning curve plot for model performance
- Model evaluation on pickle file
- Prediction on a random image file

**Additional Comments | Insights | Conclusions**

____________________________________________________________________________________________________


### **Set Data Directory**
__________________________________________________________________________________________

**Import libraries**

In [30]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from matplotlib.image import imread

**Set Working Directory**

In [31]:
cwd = os.getcwd()

In [20]:
os.chdir('/workspace/mildew-detection-in-cherry-leaves')
print("You set a new current directory")

You set a new current directory


In [32]:
work_dir = os.getcwd()
work_dir

'/workspace/mildew-detection-in-cherry-leaves'

**Set input directories**

Set train, validation and test paths

In [33]:
my_data_dir = 'inputs/cherry_leaves_dataset/cherry-leaves'
train_path = my_data_dir + '/train'
val_path = my_data_dir + '/validation'
test_path = my_data_dir + '/test'
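
As a hedged alternative to string concatenation, the same paths could be built with `os.path.join` and checked before any images are loaded. The helper name `build_split_paths` is hypothetical and not part of the original notebook; this is only a sketch.

```python
import os

def build_split_paths(data_dir, splits=('train', 'validation', 'test')):
    """Map each split name to its folder under data_dir (hypothetical helper)."""
    return {split: os.path.join(data_dir, split) for split in splits}

split_paths = build_split_paths('inputs/cherry_leaves_dataset/cherry-leaves')

# Report any split folder that is missing instead of failing later during loading
missing = [path for path in split_paths.values() if not os.path.isdir(path)]
if missing:
    print('Missing split folders:', missing)
```

Checking the folders up front tends to give a clearer message than an error raised later in the middle of data loading.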

**Set output directory**


- Outputs are organized by version so that the files from each run stay separate.
- The cell below checks whether `outputs/<version>` already exists; if it does, it prints a reminder to create a new version.
- If the version does not exist, it creates the directory structure for that version.

In [34]:
version = 'v1'
file_path = f'outputs/{version}'

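if 'outputs' in os.listdir(work_dir) and version in os.listdir(work_dir + '/outputs'):
    # This version already exists; change `version` above to avoid overwriting its files
    print('Old version is already available create a new version.')
else:
    # Create outputs/<version>, including the parent 'outputs' folder if needed
    os.makedirs(name=file_path)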

Old version is already available create a new version.
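
The same check can be sketched with `os.path.isdir` and `os.makedirs(..., exist_ok=True)`, which avoids listing directories by hand. The function name `ensure_version_dir` is hypothetical, and the demo runs in a temporary directory (an assumption made so the project's real `outputs` folder is left untouched).

```python
import os
import tempfile

def ensure_version_dir(base_dir, version):
    """Create <base_dir>/outputs/<version> if needed (hypothetical helper).

    Returns the path and a flag that is True only when the folder was newly created.
    """
    file_path = os.path.join(base_dir, 'outputs', version)
    already_exists = os.path.isdir(file_path)
    # exist_ok=True makes the call safe to repeat when the folder is already there
    os.makedirs(file_path, exist_ok=True)
    return file_path, not already_exists

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path, created = ensure_version_dir(tmp, 'v1')        # first call creates the folder
    path, created_again = ensure_version_dir(tmp, 'v1')  # second call finds it in place
    print(created, created_again)
```

Returning a flag rather than printing inside the helper leaves it to the caller to decide whether an existing version should trigger a warning.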


**Set label names**

In [35]:
labels = os.listdir(train_path)
print('Label for the images are', labels)

Label for the images are ['healthy', 'powdery_mildew']
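
`os.listdir` returns entries in arbitrary order and also includes plain files (for example a hidden `.DS_Store`), so the label list can vary between machines. A minimal sketch of a more defensive version follows; `list_labels` is a hypothetical helper, and the temporary folder only stands in for `train_path`.

```python
import os
import tempfile

def list_labels(split_dir):
    """Return sorted class-folder names, skipping files and hidden entries."""
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name)) and not name.startswith('.')
    )

# Stand-in for train_path: two class folders plus a stray hidden file
with tempfile.TemporaryDirectory() as tmp:
    for name in ('powdery_mildew', 'healthy'):
        os.makedirs(os.path.join(tmp, name))
    open(os.path.join(tmp, '.DS_Store'), 'w').close()
    demo_labels = list_labels(tmp)
    print(demo_labels)
```

Sorting keeps the label order stable, which matters if the list is later matched against class indices.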


**Load image embeddings**

In [36]:
# Despite its name, `image_size` holds a dict that maps image paths to embedding vectors
image_size = joblib.load(f'outputs/{version}/image_embeddings.joblib')
image_size

{'inputs/cherry_leaves_dataset/cherry-leaves/train/healthy/0008f3d3-2f85-4973-be9a-1b520b8b59fc___JR_HL 4092.JPG': array([5.4298244e+00, 2.7398005e-04, 7.2061890e-01, ..., 3.8319865e-01,
        9.5334125e-01, 3.4293011e-01], dtype=float32),
 'inputs/cherry_leaves_dataset/cherry-leaves/train/healthy/0008f3d3-2f85-4973-be9a-1b520b8b59fc___JR_HL 4092_flipTB.JPG': array([6.928287  , 0.0500761 , 0.8589183 , ..., 0.5745407 , 0.35898665,
        0.03467656], dtype=float32),
 'inputs/cherry_leaves_dataset/cherry-leaves/train/healthy/002efba9-09b3-43de-93b7-5c2460185cde___JR_HL 9655.JPG': array([3.088735  , 0.01270527, 1.296147  , ..., 1.0502157 , 0.88481504,
        0.03241634], dtype=float32),
 'inputs/cherry_leaves_dataset/cherry-leaves/train/healthy/0048afb8-b950-4c57-9e72-7e26282327ee___JR_HL 9765.JPG': array([1.3175539 , 0.00485275, 1.8822621 , ..., 0.5286861 , 0.1618286 ,
        0.20590004], dtype=float32),
 'inputs/cherry_leaves_dataset/cherry-leaves/train/healthy/0048afb8-b950-4c57-9
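
Echoing the whole dictionary produces a very long output. A compact summary is usually easier to read; the sketch below assumes the loaded object is a dict mapping image paths to 1-D vectors and that the class label is the image's parent folder name. `summarize_embeddings` and the toy data are hypothetical and are not the real embeddings.

```python
import os
from collections import Counter

def summarize_embeddings(embeddings):
    """Summarize a {image_path: vector} dict without printing every vector."""
    # Assumption: the class label is the name of the image's parent folder
    per_label = Counter(os.path.basename(os.path.dirname(p)) for p in embeddings)
    # len() works for 1-D NumPy arrays as well as plain lists
    vector_lengths = sorted({len(v) for v in embeddings.values()})
    return {
        'count': len(embeddings),
        'vector_lengths': vector_lengths,
        'per_label': dict(per_label),
    }

# Toy stand-in data (NOT the real embeddings); in the notebook, pass `image_size`
toy_embeddings = {
    'inputs/cherry_leaves_dataset/cherry-leaves/train/healthy/a.JPG': [0.0, 0.0, 0.0],
    'inputs/cherry_leaves_dataset/cherry-leaves/train/powdery_mildew/b.JPG': [0.0, 0.0, 0.0],
}
summary = summarize_embeddings(toy_embeddings)
print(summary)
```

If `vector_lengths` contains more than one value, the embeddings were probably produced with inconsistent settings and are worth re-checking before modeling.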

____________________________________________________________________________________________________

1. **Image distribution plot for the train, validation, and test sets**