# **1 – Data Collection**


## Objectives

* Authenticate with Kaggle and download the cherry leaf image dataset  

## Inputs

* kaggle.json authentication token 
* Cherry leaf image dataset from Kaggle

## Outputs
 
* Dataset saved to raw data folder
* Folder structure with train, validation and test data


---

# Change working directory

* The notebooks are stored in a subfolder, so the working directory must be changed before the rest of the notebook is run in the editor

We need to change the working directory from its current folder to its parent folder
* os.getcwd() returns the current directory

In [3]:
import os
current_dir = os.getcwd()
current_dir

'c:\\Users\\amyno\\OneDrive\\Documents\\CherryLeafProject\\milestone-project-mildew-detection-in-cherry-leaves\\jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [4]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [5]:
current_dir = os.getcwd()
current_dir

'c:\\Users\\amyno\\OneDrive\\Documents\\CherryLeafProject\\milestone-project-mildew-detection-in-cherry-leaves'
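
One caveat with the cells above: running them a second time moves the working directory up another level. A minimal sketch of a guard against that is shown below. The helper name move_to_project_root and its notebook_folder parameter are hypothetical, and it assumes the notebooks live in a folder named jupyter_notebooks, as in the path printed above.

```python
import os
from pathlib import Path


def move_to_project_root(notebook_folder: str = "jupyter_notebooks") -> Path:
    """Move up one level only if we are still inside the notebooks folder.

    Hypothetical helper: the default folder name is taken from the path
    printed earlier in this notebook.
    """
    current = Path.cwd()
    if current.name == notebook_folder:
        # Still in the notebooks subfolder, so step up to the project root
        os.chdir(current.parent)
    # Otherwise leave the working directory untouched
    return Path.cwd()


project_root = move_to_project_root()
print(f"Working directory: {project_root}")
```

Because the function checks the folder name first, re-running the cell leaves the working directory where it is.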

### Downloading the Cherry Leaves Dataset


This section will:

* Authenticate with Kaggle using the kaggle.json API token  
* Download the cherry leaves image dataset 
* Unzip the dataset into the directory  
* Clean up by deleting the zip file and the Kaggle token for security


Install Kaggle to fetch data

In [6]:
!pip install kaggle

Collecting kaggle
  Downloading kaggle-1.7.4.2-py3-none-any.whl.metadata (16 kB)
Collecting bleach (from kaggle)
  Downloading bleach-6.2.0-py3-none-any.whl.metadata (30 kB)
Collecting python-slugify (from kaggle)
  Downloading python_slugify-8.0.4-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting text-unidecode (from kaggle)
  Downloading text_unidecode-1.3-py2.py3-none-any.whl.metadata (2.4 kB)
Collecting tqdm (from kaggle)
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting webencodings (from kaggle)
  Downloading webencodings-0.5.1-py2.py3-none-any.whl.metadata (2.1 kB)
Downloading kaggle-1.7.4.2-py3-none-any.whl (173 kB)
Downloading bleach-6.2.0-py3-none-any.whl (163 kB)
Downloading python_slugify-8.0.4-py2.py3-none-any.whl (10 kB)
Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Downloading webencodings-0.5.1-py2.py3-none-any.whl (11 kB)
Installing collected packages: webencodings, text-unidecode, tqdm


[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


Set the KAGGLE_CONFIG_DIR environment variable so that Kaggle looks for the kaggle.json API key in the current directory

In [7]:
import os

os.environ['KAGGLE_CONFIG_DIR'] = os.getcwd()
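
Before calling the Kaggle command, it can help to confirm that the token is actually where KAGGLE_CONFIG_DIR points. The sketch below is one possible check, not part of the Kaggle package. The function name check_kaggle_token is hypothetical, and it assumes the token file holds a JSON object with "username" and "key" fields.

```python
import json
import os
from pathlib import Path


def check_kaggle_token(config_dir: str) -> bool:
    """Return True if kaggle.json exists in config_dir and has both fields.

    Hypothetical helper: assumes the token is a JSON object containing
    "username" and "key".
    """
    token_path = Path(config_dir) / "kaggle.json"
    if not token_path.is_file():
        return False
    try:
        with token_path.open(encoding="utf-8") as fh:
            token = json.load(fh)
    except json.JSONDecodeError:
        # The file exists but is not valid JSON
        return False
    return isinstance(token, dict) and {"username", "key"}.issubset(token)


# Fall back to the current directory if the variable has not been set
config_dir = os.environ.get("KAGGLE_CONFIG_DIR", os.getcwd())
print("kaggle.json looks usable:", check_kaggle_token(config_dir))
```

The check only reads the file locally; it does not contact Kaggle, so a True result does not guarantee that the credentials are still valid.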

Use Kaggle command to download the cherry leaf dataset

In [8]:
!kaggle datasets download -d codeinstitute/cherry-leaves

Dataset URL: https://www.kaggle.com/datasets/codeinstitute/cherry-leaves
License(s): unknown


Unzip downloaded dataset into the appropriate folder

In [9]:
import zipfile

with zipfile.ZipFile("cherry-leaves.zip", "r") as zip_ref:
    zip_ref.extractall("inputs/dataset/raw")
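
If the download was interrupted, the unzip step fails with a less obvious error. The following is a minimal sketch of a more defensive version of the same step, using only the standard zipfile module. The function name extract_dataset is hypothetical; the paths in the usage note are the ones used in this notebook.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path: str, dest_dir: str) -> list:
    """Extract zip_path into dest_dir and return the archive's member names.

    Hypothetical helper wrapping the unzip step shown above.
    """
    zip_file = Path(zip_path)
    if not zip_file.is_file():
        raise FileNotFoundError(f"Archive not found: {zip_file}")

    dest = Path(dest_dir)
    # Create the destination folder (and parents) if it does not exist yet
    dest.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_file, "r") as zip_ref:
        # testzip() returns the name of the first corrupt member, or None
        bad_member = zip_ref.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member in archive: {bad_member}")
        zip_ref.extractall(dest)
        return zip_ref.namelist()
```

In this notebook it would be called as extract_dataset("cherry-leaves.zip", "inputs/dataset/raw").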

Delete the zip file and kaggle.json token after use for security

In [10]:
os.remove("cherry-leaves.zip")
os.remove("kaggle.json")
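
Note that os.remove() raises FileNotFoundError if the file is already gone, which happens when this cell is run twice. A small sketch of a tolerant clean-up is shown below; the helper name remove_if_exists is hypothetical.

```python
from pathlib import Path


def remove_if_exists(path: str) -> bool:
    """Delete the file at path if present; return True if it was deleted."""
    target = Path(path)
    if target.is_file():
        target.unlink()
        return True
    # Nothing to delete, so the cell can be re-run safely
    return False


for leftover in ("cherry-leaves.zip", "kaggle.json"):
    if remove_if_exists(leftover):
        print(f"Deleted {leftover}")
    else:
        print(f"{leftover} not found, nothing to delete")
```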

Check that the folder has the expected contents

In [11]:
import os

os.listdir("inputs/dataset/raw")


['cherry-leaves']
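
Listing the top-level folder only confirms that the archive was extracted. To look one level deeper, a sketch like the one below counts the files in each subfolder. The function name count_files_per_folder is hypothetical, and it assumes that the extracted cherry-leaves folder contains one subfolder per image class, which this notebook has not yet verified.

```python
from pathlib import Path


def count_files_per_folder(root: str) -> dict:
    """Return {subfolder name: number of files directly inside it}.

    Hypothetical helper: assumes root contains one subfolder per class.
    """
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            # Count only regular files, not nested folders
            counts[folder.name] = sum(1 for f in folder.iterdir() if f.is_file())
    return counts
```

For example, count_files_per_folder("inputs/dataset/raw/cherry-leaves") would return the number of files found in each subfolder of the extracted dataset.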

#### Credits
* The dataset is provided by Code Institute on Kaggle and can be found [here](https://www.kaggle.com/datasets/codeinstitute/cherry-leaves)
* Code in code blocks 6 and 7 was provided by Code Institute
* Code in code block 9 was inspired by Kaggle (ref. in README)

---


# Push files to Repo

* If you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os

try:
    # create here your folder
    # os.makedirs(name='')
    pass  # placeholder so the try block is valid Python until a folder is created
except Exception as e:
    print(e)
