# **Modelling and Evaluating**

## Objectives

To answer business requirement 2:

* The client is interested in predicting whether a cherry leaf is healthy or infected with powdery mildew.

## Inputs

* inputs/cherry-leaves/cherry-leaves/test
* inputs/cherry-leaves/cherry-leaves/train
* inputs/cherry-leaves/cherry-leaves/validation
* image shape embedding (`outputs/v1/image_shape.pkl`) created in the DataVisualisation Jupyter notebook

## Outputs

* `outputs/v1/label_distribution_bar.png`: bar chart of image counts per set and label
* `outputs/v1/set_distribution_pie.png`: pie chart of the share of images in each set

---

## Import packages

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from matplotlib.image import imread

We need to change the working directory from its current folder to its parent folder, so that the relative `inputs/` and `outputs/` paths used below resolve correctly.
* We access the current directory with os.getcwd()

In [None]:
import os
current_dir = os.getcwd()
current_dir
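Moving to the parent folder amounts to dropping the last component of the current path. A small illustration, assuming a hypothetical notebook folder one level below the project root; the path below is made up for the example:

```python
import os
from pathlib import PurePosixPath

# Hypothetical notebook location one level below the project root
notebook_dir = '/home/user/mildew-detection-in-cherry-leaves/jupyter_notebooks'

# Both approaches drop the last path component to get the parent folder
parent_via_os = os.path.dirname(notebook_dir)
parent_via_pathlib = str(PurePosixPath(notebook_dir).parent)

print(parent_via_os)
print(parent_via_pathlib)
```

Either form avoids hard-coding a user-specific absolute path, so the notebook can run on another machine.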

## Set working directory

In [None]:
cwd = os.getcwd()

In [None]:
# Move up one level to the project root; run this cell only once, as each run moves up another level
os.chdir(os.path.dirname(cwd))
print("You set a new current working directory")

In [None]:
work_dir = os.getcwd()
work_dir

---

## Set input directories

Set train, validation and test paths

In [None]:
my_data_dir = 'inputs/cherry-leaves/cherry-leaves'
train_path = my_data_dir + '/train'
val_path = my_data_dir + '/validation'
test_path = my_data_dir + '/test'
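Before going further it can help to confirm that all three split folders exist. A minimal sketch using a hypothetical helper, `missing_splits`, demonstrated on a throwaway directory rather than the real dataset; `os.path.join` is used instead of string concatenation so the separator matches the operating system:

```python
import os
import tempfile

def missing_splits(data_dir, splits=('train', 'validation', 'test')):
    """Return the split folders that do not exist under data_dir."""
    return [s for s in splits if not os.path.isdir(os.path.join(data_dir, s))]

# Demo on a temporary directory that only contains 'train' and 'test'
with tempfile.TemporaryDirectory() as tmp:
    for split in ('train', 'test'):
        os.makedirs(os.path.join(tmp, split))
    missing = missing_splits(tmp)

print(missing)  # ['validation']
```

In the notebook, `missing_splits(my_data_dir)` should return an empty list once the data is in place.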

## Set output directory

In [None]:
version = 'v1'
file_path = f'outputs/{version}'

if 'outputs' in os.listdir(work_dir) and version in os.listdir(work_dir + '/outputs'):
    print('Old version is already available. Please create a new version.')
else:
    os.makedirs(name=file_path)
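The cell above only warns when the version folder already exists. A hypothetical helper, `next_free_version`, could instead suggest the next unused name; it assumes versions are named `v1`, `v2`, and so on, as in this notebook, and is shown on a temporary folder:

```python
import os
import re
import tempfile

def next_free_version(outputs_dir):
    """Return 'v<N+1>', where N is the highest existing 'v<N>' folder (0 if none)."""
    if not os.path.isdir(outputs_dir):
        return 'v1'
    numbers = []
    for name in os.listdir(outputs_dir):
        match = re.fullmatch(r'v(\d+)', name)
        if match:
            numbers.append(int(match.group(1)))
    return f'v{max(numbers, default=0) + 1}'

with tempfile.TemporaryDirectory() as tmp:
    outputs_dir = os.path.join(tmp, 'outputs')
    first = next_free_version(outputs_dir)       # no outputs folder yet
    os.makedirs(os.path.join(outputs_dir, 'v1'))
    os.makedirs(os.path.join(outputs_dir, 'v2'))
    after_two = next_free_version(outputs_dir)   # v1 and v2 already taken

print(first, after_two)  # v1 v3
```

Note that a new version folder would not yet contain files written by earlier notebooks, such as the saved image shape.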

## Set label names

In [None]:
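# Set label names: keep only sub-folders (skipping stray files such as .DS_Store), sorted for a stable order
labels = sorted(d for d in os.listdir(train_path) if os.path.isdir(os.path.join(train_path, d)))
print('Labels for the images are', labels)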
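`os.listdir` returns entries in arbitrary order and, on macOS, may include hidden files such as `.DS_Store`. According to the Keras documentation, `flow_from_directory` assigns class indices to sub-folders in alphanumeric order, so a filtered, sorted label list lines up with the indices the generator will use. A sketch with hypothetical folder names:

```python
# Hypothetical directory listing; real folder names may differ
listed = ['powdery_mildew', 'healthy', '.DS_Store']

# Keep real class names only (skip hidden files) and sort them
labels_sorted = sorted(name for name in listed if not name.startswith('.'))

# Map each label to an integer index in alphanumeric order
class_indices = {label: idx for idx, label in enumerate(labels_sorted)}
print(class_indices)  # {'healthy': 0, 'powdery_mildew': 1}
```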

## Set image shape

In [None]:
import joblib
version = 'v1'
image_shape = joblib.load(filename=f"outputs/{version}/image_shape.pkl")
image_shape
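The shape is read back with `joblib.load`, which suggests the DataVisualisation notebook stored it with `joblib.dump`. A round-trip sketch in a temporary folder, using a hypothetical shape of `(256, 256, 3)` rather than the value actually saved:

```python
import os
import tempfile
import joblib

# Hypothetical shape; the real value is whatever the DataVisualisation notebook saved
example_shape = (256, 256, 3)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, 'image_shape.pkl')
    joblib.dump(value=example_shape, filename=pkl_path)  # write the pickle
    loaded_shape = joblib.load(filename=pkl_path)        # read it back

print(loaded_shape)  # (256, 256, 3)
```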

---

## Images distribution

In [None]:
import plotly.express as px

# Count the images in every set/label folder, collecting one row per combination
rows = []
for folder in ['train', 'test', 'validation']:
    for label in labels:
        n_images = len(os.listdir(os.path.join(my_data_dir, folder, label)))
        rows.append({'Set': folder, 'Label': label, 'Count': n_images})
        print(f"* {folder} - {label}: {n_images} images")

# DataFrame.append was removed in pandas 2.0, so build the frame once from the rows
df_freq = pd.DataFrame(rows)
print("\n")
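Because `df_freq` is in long format (`Set`, `Label`, `Count`), pivoting it gives a compact table for checking whether each split is balanced between labels. A sketch on placeholder counts that are purely illustrative, not the real dataset figures:

```python
import pandas as pd

# Placeholder counts for illustration only, NOT the real dataset figures
df_example = pd.DataFrame([
    {'Set': 'train', 'Label': 'healthy', 'Count': 700},
    {'Set': 'train', 'Label': 'powdery_mildew', 'Count': 700},
    {'Set': 'test', 'Label': 'healthy', 'Count': 200},
    {'Set': 'test', 'Label': 'powdery_mildew', 'Count': 200},
    {'Set': 'validation', 'Label': 'healthy', 'Count': 100},
    {'Set': 'validation', 'Label': 'powdery_mildew', 'Count': 100},
])

# One row per set, one column per label
table = df_example.pivot(index='Set', columns='Label', values='Count')

# Share of each label within its set; values near 0.5 indicate a balanced split
shares = table.div(table.sum(axis=1), axis=0)
print(table)
print(shares.round(2))
```

Applied to `df_freq`, the same two lines would show whether any split is skewed towards one label.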

## Label distribution - bar chart

In [None]:
fig = px.bar(df_freq,
             x="Set",
             y="Count",
             color="Label",
             title="Cherry Leaves Dataset",
             text_auto=True)
fig.update_layout(
    autosize=False,
    width=800,
    height=600,
    title_font_size=20,
    )
fig.show()
fig.write_image(f'{file_path}/label_distribution_bar.png')

## Set distribution - pie chart

In [None]:
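# Total number of images in each set (summed over labels); skip stray files such as .DS_Store
folders = sorted(f for f in os.listdir(my_data_dir)
                 if os.path.isdir(os.path.join(my_data_dir, f)))
data = []
for folder in folders:
    n_images = 0
    for label in labels:
        n_images += len(os.listdir(os.path.join(my_data_dir, folder, label)))
    data.append(n_images)

# Pixel-to-inch factor; named px_size so it does not overwrite plotly.express imported as px
px_size = 1 / plt.rcParams['figure.dpi']
plt.subplots(figsize=(800 * px_size, 300 * px_size))
colors = sns.color_palette("deep")[0:5]
plt.pie(data, labels=folders, colors=colors, autopct='%.0f%%')
plt.title('Cherry Leaves Dataset Distribution')
plt.savefig(f'{file_path}/set_distribution_pie.png', dpi=300, bbox_inches='tight')
plt.show()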
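The per-set totals used by the pie chart can also be derived from the long-format counts with a `groupby`, without listing the folders a second time. A sketch on placeholder counts (illustrative only):

```python
import pandas as pd

# Placeholder per-label counts in the same long format as df_freq (illustrative only)
df_example = pd.DataFrame({
    'Set': ['train', 'train', 'test', 'test', 'validation', 'validation'],
    'Label': ['healthy', 'powdery_mildew'] * 3,
    'Count': [700, 700, 200, 200, 100, 100],
})

# Total images per set: the values a pie chart of the split needs
totals = df_example.groupby('Set')['Count'].sum()
percentages = (100 * totals / totals.sum()).round(0)
print(totals)
print(percentages)
```

With the real data, `plt.pie(totals.values, labels=totals.index, ...)` would draw the same chart as the loop-based version.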

---

## Image data augmentation

## Import ImageDataGenerator

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
augmented_image_data = ImageDataGenerator(rotation_range=20,
                                          width_shift_range=0.10,
                                          height_shift_range=0.10,
                                          shear_range=0.1,
                                          zoom_range=0.1,
                                          horizontal_flip=True,
                                          vertical_flip=True,
                                          fill_mode='nearest',
                                          rescale=1./255
                                          )
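Two of the settings above are easy to picture directly. The NumPy sketch below illustrates what `rescale=1./255` and the flip options do to pixel values; it is not the Keras implementation, and the rotation, shift, shear and zoom transforms are not shown. Validation and test images would typically receive only the rescaling, not the random augmentation.

```python
import numpy as np

# Tiny fake 2x3 single-channel "image" with 8-bit pixel values
image = np.array([[0, 128, 255],
                  [64, 32, 16]], dtype=np.uint8)

# rescale=1./255 scales every pixel by 1/255, mapping 0..255 onto 0.0..1.0
rescaled = image.astype(np.float32) / 255.0

# horizontal_flip mirrors the columns; vertical_flip mirrors the rows
h_flipped = rescaled[:, ::-1]
v_flipped = rescaled[::-1, :]

print(rescaled.min(), rescaled.max())
print(h_flipped)
print(v_flipped)
```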

## Set batch size

In [None]:
batch_size = 20
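The batch size fixes how many batches make up one pass over a set: the number of images divided by the batch size, rounded up, since the last batch may be partial. A hypothetical helper with a placeholder image count (not the real training-set size):

```python
import math

def steps_per_epoch(n_images, batch_size):
    """Number of batches needed to see every image once (last batch may be partial)."""
    return math.ceil(n_images / batch_size)

# Placeholder image count for illustration, not the real training-set size
n_train_example = 1475
print(steps_per_epoch(n_train_example, batch_size=20))  # 74
```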

# Push files to Repo

* If you don't need to push files to the repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
    # create here your folder
    # os.makedirs(name='')
    pass  # a try block needs at least one statement; replace once a folder is created here
except Exception as e:
    print(e)
