# **Model and Evaluation**

## Objectives

* Answer Business Requirement 2: develop an ML model that classifies leaf images into two categories: healthy or containing powdery mildew.

## Inputs

* inputs/cherryleaves_dataset/cherry-leaves/test
* inputs/cherryleaves_dataset/cherry-leaves/train
* inputs/cherryleaves_dataset/cherry-leaves/validation
* Image shape embeddings

```plaintext
├── inputs
│   └── cherryleaves_dataset
│       └── cherry-leaves
│           ├── test
│           │   ├── healthy
│           │   └── powdery_mildew
│           ├── train
│           │   ├── healthy
│           │   └── powdery_mildew
│           └── validation
│               ├── healthy
│               └── powdery_mildew
└── ...
``` 

## Outputs

* Image distribution plot in train, validation, and test set
* Image augmentation for each set
* Class indices to map model predictions to labels
* Creation of an ML model
* Display ML model summary
* Train ML model
* Save ML model
* Create Learning Curve Plot for model performance
* Model evaluation on pickle file: determine accuracy, plot ROC curve, and calculate classification report
* Plot Confusion Matrix
* Prediction on a random image file


---

# Import packages

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns 
import tensorflow as tf
from matplotlib.image import imread

---

# Set working directory

In [None]:
cwd = os.getcwd()

In [None]:
os.chdir('/workspaces/mildew-detector')
print("You set a new current directory")

In [None]:
work_dir = os.getcwd()
work_dir

---

## Set input directories
Set train, validation and test paths

In [None]:
my_data_dir = 'inputs/cherryleaves_dataset/cherry-leaves'
train_path = my_data_dir + '/train' 
val_path = my_data_dir + '/validation'
test_path = my_data_dir + '/test'

## Set output directory

In [None]:
version = 'v1'
file_path = f'outputs/{version}'

if 'outputs' in os.listdir(work_dir) and version in os.listdir(work_dir + '/outputs'):
    print('Old version is already available. Create a new version.')
else:
    os.makedirs(name=file_path)
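
The check above can also be written so that re-running the cell is harmless. The following is a minimal sketch, not the notebook's own method: `ensure_output_dir` is a hypothetical helper, and the demo runs in a temporary directory rather than the real project folder.

```python
import tempfile
from pathlib import Path


def ensure_output_dir(base_dir, version):
    """Create <base_dir>/outputs/<version> if missing; return (path, created)."""
    out_dir = Path(base_dir) / "outputs" / version
    already_there = out_dir.is_dir()
    # parents=True also creates 'outputs'; exist_ok=True makes a re-run a no-op
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir, not already_there


# Demo in a throwaway directory so the real project tree is not touched
with tempfile.TemporaryDirectory() as tmp:
    first_path, first_created = ensure_output_dir(tmp, "v1")
    second_path, second_created = ensure_output_dir(tmp, "v1")
    print(first_created, second_created)  # True False
```

The returned flag tells the caller whether the version folder is new, which is the same information the printed message conveys.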

## Set label names

In [None]:
labels = os.listdir(train_path)

print(
    f"Project Labels: {labels}"
)
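
`os.listdir` returns entries in arbitrary order and includes hidden files or folders if any exist in the directory. Where that matters, the labels could be read more defensively. This is a sketch under that assumption; `list_class_labels` is a hypothetical helper and the folder contents in the demo are made up.

```python
import os
import tempfile


def list_class_labels(split_dir):
    """Return sorted sub-folder names of split_dir, skipping hidden entries."""
    return sorted(
        entry.name
        for entry in os.scandir(split_dir)
        if entry.is_dir() and not entry.name.startswith(".")
    )


# Demo with a temporary folder that mimics a split directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("powdery_mildew", "healthy", ".ipynb_checkpoints"):
        os.mkdir(os.path.join(tmp, name))
    # A stray hidden file, as some operating systems create
    open(os.path.join(tmp, ".DS_Store"), "w").close()
    demo_labels = list_class_labels(tmp)
    print(demo_labels)  # ['healthy', 'powdery_mildew']
```

Sorting keeps the label order stable between runs, which helps when the list is later matched against class indices.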

## Set image shape

In [None]:
# Import saved image shape embedding
import joblib
version = 'v1'
image_shape = joblib.load(filename=f"outputs/{version}/image_shape.pkl")
image_shape

---

# Images distribution

The plots below show how many images each label contributes to the train, validation, and test sets. Checking this before training reveals any class imbalance or uneven split that could bias the model.

## Count number of images per set and label

In [None]:
import plotly.express as px

# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = []
for folder in ['train', 'test', 'validation']:
    for label in labels:
        n_images = len(os.listdir(my_data_dir + '/' + folder + '/' + label))
        rows.append({'Set': folder, 'Label': label, 'Count': n_images})

        print(f"* {folder} - {label}: {n_images} images")

# One row per set and label, with an integer Count column
df_freq = pd.DataFrame(rows)

print("\n")
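
Raw counts are easier to judge for balance when expressed as a share of each set. The sketch below uses only the standard library and rows shaped like those in `df_freq` (`Set`, `Label`, `Count`). `count_images` and `label_shares` are hypothetical helpers, and the tiny directory tree in the demo is made up for illustration; it does not reflect the real dataset.

```python
import os
import tempfile


def count_images(data_dir, sets, labels):
    """Count the files in data_dir/<set>/<label> for every combination."""
    rows = []
    for folder in sets:
        for label in labels:
            n = len(os.listdir(os.path.join(data_dir, folder, label)))
            rows.append({"Set": folder, "Label": label, "Count": n})
    return rows


def label_shares(rows):
    """Return {(set, label): fraction of that set's images with that label}."""
    totals = {}
    for row in rows:
        totals[row["Set"]] = totals.get(row["Set"], 0) + row["Count"]
    return {
        (row["Set"], row["Label"]): row["Count"] / totals[row["Set"]]
        for row in rows
        if totals[row["Set"]] > 0  # skip empty sets to avoid dividing by zero
    }


# Demo on a made-up tree: train has 3 healthy and 1 mildew image, test has 1 of each
layout = {("train", "healthy"): 3, ("train", "powdery_mildew"): 1,
          ("test", "healthy"): 1, ("test", "powdery_mildew"): 1}
with tempfile.TemporaryDirectory() as tmp:
    for (folder, label), n in layout.items():
        target = os.path.join(tmp, folder, label)
        os.makedirs(target)
        for i in range(n):
            open(os.path.join(target, f"img_{i}.jpg"), "w").close()
    demo_rows = count_images(tmp, ["train", "test"], ["healthy", "powdery_mildew"])
    demo_shares = label_shares(demo_rows)

for (folder, label), share in demo_shares.items():
    print(f"{folder} - {label}: {share:.0%}")
```

A share close to 50% per label in every set suggests a balanced binary dataset; a large deviation would be a reason to consider resampling or class weights.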

## Label Distribution

In [None]:
plt.figure(figsize=(10, 6))
sns.barplot(data=df_freq, x='Label', y='Count', hue='Set')
plt.title('Label Distribution per Set')
plt.xlabel('Label')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.legend(title='Set')
plt.show()

## Set Distribution

In [None]:
plt.figure(figsize=(10, 6))
sns.barplot(data=df_freq, x='Set', y='Count', estimator=sum, ci=None)
plt.title('Total Images per Set')
plt.xlabel('Set')
plt.ylabel('Count')
plt.show()

## Set and Label Combined Distribution

In [None]:
plt.figure(figsize=(10, 6))
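# df_freq holds one aggregated row per set and label, so plot the Count column
# (countplot would count rows and draw every bar with height 1)
sns.barplot(data=df_freq, x='Set', y='Count', hue='Label')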
plt.title('Set and Label Combined Distribution')
plt.xlabel('Set')
plt.ylabel('Count')
plt.legend(title='Label')
plt.show()

---