# Data Visualisation

---

## Objectives

### Explore the Dataset:

- Understand the distribution of healthy and mildew-labeled cherry leaf images.
- Explore key characteristics of the images, such as size, color distribution, and any noticeable patterns.

### Visualize Annotations:

- Visualize the annotations or labels associated with each image (e.g., healthy or mildew).
- Ensure that the annotations align with the actual content of the images.

### Data Quality Check:

- Check for any anomalies or issues in the dataset that may affect model training.
- Identify and handle any missing or corrupted images.
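
As a rough illustration of the data quality check described above, the sketch below walks a dataset folder and flags files that are empty or that do not start with a JPEG or PNG signature. The helper name `find_suspect_images` and the assumption that the dataset contains only JPEG or PNG files are illustrative and not taken from the project. A signature check does not catch truncated files, so fully decoding each image would still be needed for a stricter test.

```python
import os

# Magic-byte prefixes for the two formats assumed here (JPEG and PNG).
IMAGE_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n")


def find_suspect_images(root_dir):
    """Return (path, reason) pairs for empty files or files without a known image header."""
    suspects = []
    for folder, _, files in os.walk(root_dir):
        for name in sorted(files):
            path = os.path.join(folder, name)
            if os.path.getsize(path) == 0:
                suspects.append((path, "empty file"))
                continue
            # Read only the first bytes; enough to compare against the signatures.
            with open(path, "rb") as fh:
                header = fh.read(8)
            if not header.startswith(IMAGE_SIGNATURES):
                suspects.append((path, "unrecognised header"))
    return suspects
```

Calling `find_suspect_images` on the dataset root would return an empty list when every file passes both checks.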

## Inputs

### Cherry Leaf Dataset:

- The dataset containing annotated images of cherry leaves, specifically labeled as healthy or containing powdery mildew.
- Annotations or labels associated with each image.

### DataCollection.ipynb Output:

- The output generated from the "DataCollection.ipynb" notebook, which includes the collected and organized dataset.

## Outputs

### Descriptive Statistics:

- Descriptive statistics summarizing key features of the dataset, such as the number of healthy and mildew-labeled images.

### Visualizations:

- Histograms or bar charts to visualize the distribution of healthy and mildew-labeled images.
- Sample visualizations of cherry leaf images, showcasing both healthy and mildew-labeled examples.
- Any other relevant visualizations that provide insights into the dataset.

### Data Quality Report:

- A report highlighting any issues or anomalies in the dataset, along with potential solutions or actions.

## Additional Comments

---

## Set data directory

---

## Import libraries

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from matplotlib.image import imread

# Apply the plot style once all imports are done
sns.set_style("white")

## Set working directory

In [2]:
cwd = os.getcwd()

In [3]:
os.chdir('/workspace/Portfolio-project-5-Milldew-detection-in-Cherry-Leaves/notebooks')
print("You set a new current directory")

You set a new current directory


In [4]:
work_dir = os.getcwd()
work_dir

'/workspace/Portfolio-project-5-Milldew-detection-in-Cherry-Leaves/notebooks'

## Set input directories

Set train, validation and test paths.

In [5]:
my_data_dir = 'inputs/cherry_leaves_dataset/cherry-leaves'
train_path = my_data_dir + '/train'
val_path = my_data_dir + '/validation'
test_path = my_data_dir + '/test'
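
The paths above are built by string concatenation. A minimal alternative sketch, assuming the same `train`, `validation` and `test` folder names, builds them with `os.path.join` and reports any split folder that is missing before later cells try to read from it. The helper names `build_split_paths` and `missing_splits` are hypothetical.

```python
import os


def build_split_paths(data_dir, splits=("train", "validation", "test")):
    """Map each split name to its folder path under data_dir."""
    return {split: os.path.join(data_dir, split) for split in splits}


def missing_splits(split_paths):
    """Return the split names whose folder does not exist on disk."""
    return [name for name, path in split_paths.items() if not os.path.isdir(path)]
```

For example, `missing_splits(build_split_paths(my_data_dir))` would return an empty list when all three folders are present.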

## Set output directory

In [6]:
version = 'v1'
file_path = f'outputs/{version}'

# Create the versioned output folder only if it does not exist yet
if 'outputs' in os.listdir(work_dir) and version in os.listdir(work_dir + '/outputs'):
    print('Old version is already available; create a new version.')
else:
    os.makedirs(name=file_path)
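
The same check can be written without listing directories by hand. The sketch below is one possible variant, not the notebook's own method: it relies on `os.makedirs(..., exist_ok=True)` and returns whether the folder was already there. The function name `ensure_output_dir` and its `base_dir` parameter are assumptions made for illustration.

```python
import os


def ensure_output_dir(version, base_dir="outputs"):
    """Create base_dir/version if needed and report whether it already existed."""
    path = os.path.join(base_dir, version)
    already_there = os.path.isdir(path)
    os.makedirs(path, exist_ok=True)  # no error if the folder is already present
    return path, already_there
```

A caller could print the "old version" warning whenever the second returned value is `True`.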

## Set label names

In [7]:
# Set the labels (sorted so the order does not depend on the file system)
labels = sorted(os.listdir(train_path))
print('Labels for the images are', labels)

Labels for the images are ['healthy', 'powdery_mildew']
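
With the labels known, the class distribution named in the objectives can be summarised by counting the files in each label folder. The following is a minimal sketch that assumes the `split/label/image` folder layout used above; the function name `count_images` is hypothetical.

```python
import os


def count_images(data_dir, splits=("train", "validation", "test")):
    """Return rows of (split, label, file_count) for every label folder found."""
    rows = []
    for split in splits:
        split_dir = os.path.join(data_dir, split)
        if not os.path.isdir(split_dir):
            continue  # skip splits that are not present
        for label in sorted(os.listdir(split_dir)):
            label_dir = os.path.join(split_dir, label)
            if os.path.isdir(label_dir):
                rows.append((split, label, len(os.listdir(label_dir))))
    return rows
```

The rows can be loaded with `pd.DataFrame(rows, columns=["split", "label", "count"])` and plotted with `sns.barplot(data=df, x="split", y="count", hue="label")` to obtain the bar chart listed under Outputs.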


---

## Data Visualisation of the image data

---