# **Data Visualisation Notebook**

## Objectives

* Fulfill Business Requirement 1 - differentiate between healthy cherry leaves and those with powdery mildew. 

## Inputs

* inputs/cherry_leaves_dataset/cherry_leaves/train
* inputs/cherry_leaves_dataset/cherry_leaves/validation
* inputs/cherry_leaves_dataset/cherry_leaves/test

## Outputs

* Scatterplot of image height and width.
* Image shape embeddings in a pickle file.
* Standard deviation and mean of images per label.
* A plot to show image differences between the two classes.
* Image montage. 


---

# Set Working Directories

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspace/cherry-leaves-mildew-detection/jupyter_notebooks'

In [2]:
os.chdir("/workspace/cherry-leaves-mildew-detection")
print("You set a new current directory.")

You set a new current directory.


In [3]:
current_dir = os.getcwd()
current_dir

'/workspace/cherry-leaves-mildew-detection'

### Set Input Directories

In [4]:
data_dir = "inputs/cherry_leaves_dataset/cherry_leaves"
train_dir = data_dir + "/train"
val_dir = data_dir + "/validation"
test_dir = data_dir + "/test"

### Set Output Directory

In [5]:
version = "v1"
file_path = f"outputs/{version}"
if "outputs" in os.listdir(current_dir) and version in os.listdir(current_dir + "/outputs"):
    print("This directory exists, create a new version.")
else:
    os.makedirs(name=file_path)
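
The check above only reports that the version directory already exists. One possible way to move on to a fresh version automatically is sketched below. The helper name `next_version_dir` and the `v<number>` naming scheme are assumptions for illustration and are not part of the notebook.

```python
from pathlib import Path


def next_version_dir(outputs_root="outputs"):
    """Create and return the first unused <outputs_root>/v<N> directory (hypothetical helper)."""
    root = Path(outputs_root)
    number = 1
    # Step through v1, v2, ... until a name is free
    while (root / f"v{number}").exists():
        number += 1
    new_dir = root / f"v{number}"
    # parents=True also creates the outputs root if it is missing
    new_dir.mkdir(parents=True)
    return new_dir
```

Calling it twice in a row would create `v1` and then `v2`, so earlier outputs are never overwritten.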

### Set Label Names

In [6]:
labels = os.listdir(train_dir)
print("The image labels are", labels)

The image labels are ['healthy', 'powdery_mildew']
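
`os.listdir` returns entries in arbitrary order and also lists stray files or hidden folders if any are present. A minimal sketch of a stricter listing follows, assuming each class is a subdirectory of the training folder; `list_labels` is a hypothetical helper name.

```python
import os


def list_labels(directory):
    """Return the sorted names of the non-hidden subdirectories in `directory`."""
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        # Skip plain files and hidden folders such as .ipynb_checkpoints
        if entry.is_dir() and not entry.name.startswith(".")
    )
```

Sorting keeps the label order stable between runs, which matters if the list is later used to map class names to indices.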


# Image Shape

### Plot Height and Width

In [7]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from matplotlib.image import imread

def plot_height_width():
    """
    Function to plot height and width in a scatterplot and
    return the mean height and mean width.
    """

    image_height, image_width, image_label = [], [], []
    for label in labels:
        for image_file in os.listdir(train_dir + "/" + label):
            image = imread(train_dir + "/" + label + "/" + image_file)
            # Shape is (height, width, channels); only the first two are needed
            height, width = image.shape[:2]
            image_height.append(height)
            image_width.append(width)
            image_label.append(label)

    sns.set_theme(style="darkgrid", palette="bright")
    fig, axes = plt.subplots()
    # hue needs one label per point, not the two-item list of class names
    sns.scatterplot(x=image_width, y=image_height, hue=image_label, alpha=0.5, ax=axes)
    axes.set_xlabel("Width (pixels)")
    axes.set_ylabel("Height (pixels)")
    axes.set_title("Height and Width of Cherry Leaves Images")

    # To calculate the mean height and width
    height_mean = int(np.array(image_height).mean())
    width_mean = int(np.array(image_width).mean())

    # Save the plot if it is not already in the output directory
    if "height_width_plot.png" not in os.listdir(file_path):
        plt.savefig(f"{file_path}/height_width_plot.png")
    else:
        plt.show()
    print(f"The average image height is {height_mean} and the average image width is {width_mean}.")

    return height_mean, width_mean

In [None]:
height_mean, width_mean = plot_height_width()
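
The Outputs list includes the image shape saved to a pickle file. A minimal sketch of that step follows, assuming three colour channels and using the standard-library `pickle` module in place of the `joblib` import above. The helper name `save_image_shape` and the file name `image_shape.pkl` are hypothetical.

```python
import os
import pickle


def save_image_shape(height_mean, width_mean, output_dir, channels=3):
    """Pickle (height, width, channels) into output_dir and return the tuple."""
    # channels=3 is an assumption (RGB images)
    image_shape = (height_mean, width_mean, channels)
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "image_shape.pkl"), "wb") as file:
        pickle.dump(image_shape, file)
    return image_shape
```

In the notebook this would be called with the two means returned by `plot_height_width()` and with `file_path` as the output directory.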


---



# Push files to Repo

* If you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
    # create here your folder
    # os.makedirs(name='')
    pass  # placeholder so the try block is valid Python until a folder is set
except Exception as e:
    print(e)
