# **Preparations**

## Kaggle Accessibility

Before collecting the data, make sure you have access to Kaggle; a registered account is required. When downloading data from Kaggle using Python, you must authenticate by providing your credentials.

You can generate an authentication token on the Kaggle site and download it to your computer as a JSON file.
We highly recommend that you generate, download and keep this file in an accessible folder, since we will need it in our walkthrough projects. If you accept our dataset suggestions for the milestone project, you will need this token again.

The process is as follows:

1. Once you are logged in to your Kaggle account, click your profile picture at the top right of the page and then “Account” in the dropdown menu. This takes you to your account settings.

2. Scroll down to the section of the page called “API”.

3. Click “Expire API Token” to remove any previous tokens.

4. To create a new token, click on the “Create New API Token” button. It will generate a fresh authentication token and will download a kaggle.json file to your machine.

If you have any difficulty, see the "Authentication" section of the [Kaggle API documentation](https://www.kaggle.com/docs/api).

At the end of this process, the file should be saved locally on your machine.

**Please make sure this file is named kaggle.json**
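
As a quick sanity check on the downloaded token, the sketch below verifies that the file exists, parses as JSON, and holds non-empty `username` and `key` fields. The helper name `check_kaggle_token` is hypothetical and not part of the Kaggle tooling; the default path assumes the file sits in the current working directory.

```python
import json
from pathlib import Path


def check_kaggle_token(path="kaggle.json"):
    """Return a list of problems found with the token file (empty if none)."""
    token_path = Path(path)
    if not token_path.is_file():
        return [f"{token_path} not found"]
    try:
        token = json.loads(token_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return [f"{token_path} is not valid JSON"]
    if not isinstance(token, dict):
        return [f"{token_path} does not contain a JSON object"]
    # The token file is expected to hold a "username" and a "key" field.
    return [
        f"missing or empty field: {field}"
        for field in ("username", "key")
        if not token.get(field)
    ]


if __name__ == "__main__":
    problems = check_kaggle_token()
    print(problems or "kaggle.json looks usable")
```

The check only inspects the file locally; it does not contact Kaggle, so an expired token would still pass.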

## Correct requirements

Before collecting any data, make sure the required packages are installed.

In [16]:
# installs all required packages for the project
%pip install -r /workspaces/cherry-leaves-mildew-detection/requirements.txt

Note: you may need to restart the kernel to use updated packages.
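
To confirm that the installation worked, one option is to ask Python which packages it can actually see. The sketch below uses the standard library's `importlib.metadata`; the helper name `missing_packages` is hypothetical, and the package list passed to it is only an example that should be replaced with the names listed in `requirements.txt`.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_packages(names):
    """Return the names from `names` that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the package is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


# Example list only: replace it with the packages named in requirements.txt.
print("Missing packages:", missing_packages(["numpy"]))
```

An empty list means every named package was found; it does not check that the installed versions match the ones pinned in the requirements file.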


---

# **Data Collection**

## Objectives

* Fetch data from Kaggle and prepare it for further processing.

## Inputs

* Kaggle JSON file - the authentication token.

## Outputs

* Generate Dataset: inputs/cherry-leaf-dataset

## Additional Comments

* The output dataset should contain valid images of cherry leaves, with or without a mildew infection. These images should be suitable for building models that classify cherry leaves by the presence of mildew.



---

## Import packages

In [11]:
import numpy
import os

## Use the correct working directory

* The notebook is stored in a subfolder, so when running it in the editor you need to change to the correct working directory.

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspaces/cherry-leaves-mildew-detection/jupyter_notebooks'

We want to make the parent of the current directory the new current directory:
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [7]:
os.chdir('/workspaces/cherry-leaves-mildew-detection')
print("Work directory changed to:", os.getcwd())

Work directory changed to: /workspaces/cherry-leaves-mildew-detection
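
The cell above uses an absolute path. As an alternative, the parent-directory approach described earlier can be sketched with `os.path.dirname()`. The helper name `move_to_project_root` is hypothetical, and the folder name `jupyter_notebooks` is taken from the output shown earlier. The guard keeps the cell safe to re-run: without it, each run would climb one directory higher.

```python
import os


def move_to_project_root(notebook_folder="jupyter_notebooks"):
    """Change to the parent directory only when currently inside `notebook_folder`."""
    current_dir = os.getcwd()
    if os.path.basename(current_dir) == notebook_folder:
        # os.path.dirname() returns the parent of the current directory
        os.chdir(os.path.dirname(current_dir))
    return os.getcwd()


print("Working directory:", move_to_project_root())
```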


Confirm the new current directory.

In [9]:
current_dir = os.getcwd()
print("current directory is:", current_dir)

current directory is: /workspaces/cherry-leaves-mildew-detection


### Prepare to download the dataset

In [17]:
# Use the downloaded Kaggle API key to ensure download capabilities.
os.environ['KAGGLE_CONFIG_DIR'] = os.getcwd()
! chmod 600 kaggle.json
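
The same preparation can be sketched in plain Python, which may help where shell commands are unavailable. The helper name `prepare_kaggle_credentials` is hypothetical; it assumes `kaggle.json` sits directly inside the folder passed in, and it fails early with a clear message if the token is missing.

```python
import os
import stat


def prepare_kaggle_credentials(config_dir):
    """Point the Kaggle client at `config_dir` and make kaggle.json owner-only."""
    token_path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(token_path):
        raise FileNotFoundError(f"Expected the token at {token_path}")
    os.environ["KAGGLE_CONFIG_DIR"] = config_dir
    # Same effect as `chmod 600`: read and write for the owner, nothing for others.
    os.chmod(token_path, stat.S_IRUSR | stat.S_IWUSR)
    return token_path
```

In this notebook it would be called as `prepare_kaggle_credentials(os.getcwd())` once the working directory is the project root.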

* Get the dataset path from the [Kaggle URL](https://www.kaggle.com/datasets/codeinstitute/cherry-leaves).
* Set your destination folder.

Set the Kaggle dataset path and download the dataset.

In [18]:
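# Dataset identifier taken from the Kaggle URL (owner/dataset-name)
KaggleDatasetPath = "codeinstitute/cherry-leaves"
# Folder where the downloaded zip file is saved
DestinationFolder = "inputs/cherry-leaf-dataset"
# Download the dataset with the Kaggle command-line tool
! kaggle datasets download -d {KaggleDatasetPath} -p {DestinationFolder}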

Dataset URL: https://www.kaggle.com/datasets/codeinstitute/cherry-leaves
License(s): unknown
Downloading cherry-leaves.zip to inputs/cherry-leaf-dataset
  0%|                                               | 0.00/55.0M [00:00<?, ?B/s]
100%|██████████████████████████████████████| 55.0M/55.0M [00:00<00:00, 1.58GB/s]
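
The download arrives as a zip archive, so a likely next step is to extract it and delete the archive. The sketch below is one way to do that with the standard library; the helper name `extract_and_remove` is hypothetical and this step is an assumption about the workflow, not something the notebook shows.

```python
import os
import zipfile


def extract_and_remove(zip_path, destination):
    """Extract `zip_path` into `destination`, delete the archive, return its member names."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        names = archive.namelist()
        archive.extractall(destination)
    # Remove the archive only after extraction has finished without errors.
    os.remove(zip_path)
    return names
```

Based on the download log above, the call would be `extract_and_remove("inputs/cherry-leaf-dataset/cherry-leaves.zip", "inputs/cherry-leaf-dataset")`.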


---

NOTE

* You may add as many sections as you want, as long as they support your project workflow.
* All notebook cells should run top-down. Do not create a flow in which you need to go back to a previous cell to execute a task, such as refreshing a variable's content.

---

# Push files to Repo

* If you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
    # create your folder here
    # os.makedirs(name='')
    pass  # placeholder so the otherwise empty try block is valid Python
except Exception as e:
    print(e)
