# Data Collection

## Objectives:

1. **Load and Preprocess Cherry Leaf Images:**
    - Read and load cherry leaf images from the provided dataset.
    - Perform any necessary preprocessing steps, such as resizing and normalization.

2. **Annotate Images with Labels:**
    - Assign a label to each image indicating whether the cherry leaf is healthy or contains powdery mildew.

## Inputs:

1. **Dataset:**
   - A dataset containing cherry leaf images.
   - The dataset should have a structure where images are organized in folders or directories.
   - Each image should be associated with a specific class or label (healthy or mildew).

2. **Paths:**
   - Path to the local dataset on your computer (`dataset_path`).
   - Path to the Gitpod workspace (`workspace_path`).

## Outputs:

1. **Processed Images:**
   - The preprocessed cherry leaf images ready for training.

2. **Labels:**
   - A mapping between image filenames and their corresponding labels (healthy or mildew).
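
The label mapping described above can be derived directly from the folder structure. The following is a minimal sketch, not the notebook's actual implementation: the helper name `build_label_map`, the set of accepted file extensions, and the class folder names used in the usage note are all assumptions made for illustration. Keys are paths relative to the dataset root rather than bare filenames, so that two images with the same name in different class folders do not overwrite each other.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the real dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_label_map(dataset_dir):
    """Map each image (path relative to dataset_dir) to its class folder name.

    Hypothetical helper: assumes one sub-folder per class directly
    under dataset_dir, with the images stored inside those folders.
    """
    dataset_dir = Path(dataset_dir)
    labels = {}
    for class_dir in sorted(dataset_dir.iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                key = image_path.relative_to(dataset_dir).as_posix()
                labels[key] = class_dir.name
    return labels
```

For example, with hypothetical class folders named `healthy` and `powdery_mildew`, calling `build_label_map("inputs/cherry_leaves_dataset/cherry-leaves")` would return entries of the form `"healthy/<filename>": "healthy"`.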

## Additional Comments:

   - no additional comments




---

# Import packages

In [2]:
%pip install -r /workspace/Portfolio-project-5-Milldew-detection-in-Cherry-Leaves/requirements.txt

Collecting pandas==1.1.2
  Using cached pandas-1.1.2-cp38-cp38-manylinux1_x86_64.whl (10.4 MB)
Collecting numpy==1.19.2
  Using cached numpy-1.19.2-cp38-cp38-manylinux2010_x86_64.whl (14.5 MB)
Collecting matplotlib==3.3.1
  Using cached matplotlib-3.3.1-cp38-cp38-manylinux1_x86_64.whl (11.6 MB)
Collecting seaborn==0.11.0
  Using cached seaborn-0.11.0-py3-none-any.whl (283 kB)
Collecting scikit-learn==0.24.2
  Using cached scikit_learn-0.24.2-cp38-cp38-manylinux2010_x86_64.whl (24.9 MB)
Collecting tensorflow==2.6.0
  Using cached tensorflow-2.6.0-cp38-cp38-manylinux2010_x86_64.whl (458.4 MB)
Collecting streamlit==0.85.0
  Using cached streamlit-0.85.0-py2.py3-none-any.whl (7.9 MB)
Collecting keras==2.6.0
  Using cached keras-2.6.0-py2.py3-none-any.whl (1.3 MB)
Collecting protobuf==3.20
  Using cached protobuf-3.20.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB)
Collecting altair<5
  Using cached altair-4.2.2-py3-none-any.whl (813 kB)
Collecting pytz>=2017.2
  Using cache

In [3]:
import os
import numpy

### Change the working directory

In [4]:
current_dir = os.getcwd()
current_dir

'/workspace/Portfolio-project-5-Milldew-detection-in-Cherry-Leaves/notebooks'

In [5]:
os.chdir('/workspace/Portfolio-project-5-Milldew-detection-in-Cherry-Leaves/notebooks')
print("You set a new current directory")

You set a new current directory


In [6]:
current_dir = os.getcwd()
current_dir

'/workspace/Portfolio-project-5-Milldew-detection-in-Cherry-Leaves/notebooks'

### Install Kaggle

In [7]:
# install kaggle package
%pip install kaggle==1.5.12

Collecting kaggle==1.5.12
  Downloading kaggle-1.5.12.tar.gz (58 kB)
     |████████████████████████████████| 58 kB 1.8 MB/s eta 0:00:01
Collecting tqdm
  Downloading tqdm-4.66.1-py3-none-any.whl (78 kB)
     |████████████████████████████████| 78 kB 2.6 MB/s eta 0:00:01
Collecting python-slugify
  Downloading python_slugify-8.0.1-py2.py3-none-any.whl (9.7 kB)
Collecting text-unidecode>=1.3
  Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
     |████████████████████████████████| 78 kB 11.0 MB/s eta 0:00:01
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.12-py3-none-any.whl size=73049 sha256=ceb46959e974d5a786dfe44b18bc28293e10f99c6a73f9eb9f55c70832aed4f8
  Stored in directory: /workspace/.pyenv_mirror/pip_cache/wheels/29/da/11/144cc25aebdaeb4931b231e25fd34b394e6a5725cbb2f50106
Successfully built kaggle
Installing collected packages: text-unidecode, tqdm, 

---

### Set Kaggle API key path

In [8]:
kaggle_key_path = '/workspace/Portfolio-project-5-Milldew-detection-in-Cherry-Leaves/kaggle.json'

### Set Kaggle environment variable

In [9]:
os.environ['KAGGLE_CONFIG_DIR'] = os.path.dirname(kaggle_key_path)

### Change permissions

In [11]:
! chmod 600 {kaggle_key_path}
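
The three credential steps above (locating `kaggle.json`, pointing `KAGGLE_CONFIG_DIR` at its folder, and restricting the file to owner read/write) can also be done in plain Python, with a sanity check on the file contents. This is a minimal sketch under stated assumptions: the helper name `prepare_kaggle_credentials` is hypothetical, and the check assumes the usual `kaggle.json` layout with `username` and `key` fields.

```python
import json
import os
import stat


def prepare_kaggle_credentials(key_path):
    """Validate kaggle.json, restrict its permissions, and set KAGGLE_CONFIG_DIR.

    Hypothetical helper mirroring the notebook cells above.
    Returns the directory that KAGGLE_CONFIG_DIR was set to.
    """
    with open(key_path, encoding="utf-8") as fh:
        creds = json.load(fh)

    # Assumes the token file holds a JSON object with these two fields.
    missing = {"username", "key"}.difference(creds)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")

    # Same effect as `chmod 600`: owner may read and write, nobody else.
    os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)

    config_dir = os.path.dirname(os.path.abspath(key_path))
    os.environ["KAGGLE_CONFIG_DIR"] = config_dir
    return config_dir
```

Calling `prepare_kaggle_credentials(kaggle_key_path)` before the download cell would fail early with a readable error if the token file is missing or malformed.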

### Set Kaggle dataset and download it

In [13]:
KaggleDatasetPath = "codeinstitute/cherry-leaves"
DestinationFolder = "inputs/cherry_leaves_dataset"   
! kaggle datasets download -d {KaggleDatasetPath} -p {DestinationFolder}

Downloading cherry-leaves.zip to inputs/cherry_leaves_dataset
 95%|███████████████████████████████████▉  | 52.0M/55.0M [00:02<00:00, 37.6MB/s]
100%|██████████████████████████████████████| 55.0M/55.0M [00:02<00:00, 26.1MB/s]


### Unzip the downloaded file and delete the zip file

In [14]:
import zipfile

# Extract the archive into the destination folder, then delete the zip file
zip_path = os.path.join(DestinationFolder, 'cherry-leaves.zip')
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(DestinationFolder)

os.remove(zip_path)
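
After extraction it can be useful to confirm that the images actually landed in the expected class folders. The sketch below wraps the unzip-and-delete step in a function and returns a file count per folder. The helper name `extract_and_count` is hypothetical, and the folder layout inside the archive is not assumed: whatever folders the archive contains are simply reported.

```python
import os
import zipfile


def extract_and_count(zip_path, destination):
    """Extract zip_path into destination, delete the archive, and
    return a dict mapping each folder (relative to destination)
    to the number of files it contains.

    Hypothetical helper; only folders that hold files are reported.
    """
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(destination)
    os.remove(zip_path)  # remove the archive before counting

    counts = {}
    for root, _dirs, files in os.walk(destination):
        if files:
            counts[os.path.relpath(root, destination)] = len(files)
    return counts
```

A call such as `extract_and_count(DestinationFolder + '/cherry-leaves.zip', DestinationFolder)` would replace the cell above and make an empty or lopsided class folder visible immediately.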

---