# AWS Machine Learning Engineer Nanodegree Capstone Project

## Plant Disease Detection Using Deep Learning

### Project Overview

Plant diseases are a major cause of yield loss in crops and lead to large economic losses.
According to a study by the Associated Chambers of Commerce and Industry of India, annual
crop losses due to diseases and pests amount to Rs. 50,000 crore in India alone, which is
significant in a country where farmers are responsible for feeding a population of close to
1.3 billion people. The value of plant science is therefore huge.<br>
Accurate identification and diagnosis of plant diseases are very important for food security
in an era of climate change and globalization. Accurate and early identification of plant
diseases could help prevent the spread of invasive pests and pathogens. In addition,
efficient and economical management of plant diseases requires accurate, sensitive and
specific diagnosis.<br>
The growth of GPUs (Graphics Processing Units) has helped academia and industry advance
Deep Learning methods, allowing them to explore deeper and more sophisticated neural
networks. Using image classification and transfer learning, we can train a Deep Learning
model to classify plant leaf images and predict whether a plant is healthy or diseased.
This could enable early detection of plant diseases and help growers take preventive
measures against large crop losses.

### Data Preparation

#### Installing Libraries
* We will be using **split-folders** to split our dataset into train, val and test sets.
* **tqdm** will help give us a visual status of the progress while copying folders.

In [18]:
!pip install split-folders tqdm



#### Download the zipped dataset file from S3.

In [7]:
!aws s3 cp s3://sagemaker-us-east-1-970845818811/CapstoneProposal.zip ./

download: s3://sagemaker-us-east-1-970845818811/CapstoneProposal.zip to ./CapstoneProposal.zip


In [11]:
# Let's write a small utility function to unzip the archive's contents.
import zipfile

# Unzip the archive into the current working directory.
def unzip_data(input_data_path):
    with zipfile.ZipFile(input_data_path, 'r') as input_data_zip:
        input_data_zip.extractall('.')

In [12]:
zipped_filename = "CapstoneProposal.zip"
unzip_data(zipped_filename)
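
Before extracting a large archive, it can help to peek at what it contains. The following is a minimal sketch using only the standard-library `zipfile` module. The helper name `summarize_zip` is hypothetical and not part of the original notebook.

```python
import zipfile
from collections import Counter


def summarize_zip(zip_path):
    """Return (number of file entries, Counter of top-level names) for an archive."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        # Directory entries end with "/"; keep only real files.
        file_names = [name for name in archive.namelist() if not name.endswith("/")]
    # Count how many files sit under each top-level folder (or file) name.
    top_level = Counter(name.split("/", 1)[0] for name in file_names)
    return len(file_names), top_level
```

Calling `summarize_zip(zipped_filename)` should report how many files the archive holds and which top-level folders they belong to, which is a cheap sanity check before running `extractall`.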

In [None]:
input_folder_path = "./CapstoneProposal/dataset/Plant_leave_diseases_dataset_with_augmentation"

##### Quick Overview of the plant disease dataset
The plant disease dataset used for this project consists of **9644 images**. All of the images
are in colour, and their dimensions vary and are not standardized. For our use case, the model
will be trained on the 9 plant image classes in this dataset.
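
The class count and image total quoted above can be checked directly against the extracted folder. The sketch below assumes the dataset root contains one sub-folder per class and that the images use common extensions. Both the helper name `count_images_per_class` and the extension set are assumptions, not taken from the original notebook.

```python
from pathlib import Path

# Assumption: these extensions cover the image files in the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Map each class sub-folder name to the number of image files it contains."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

With `counts = count_images_per_class(input_folder_path)`, `len(counts)` gives the number of classes and `sum(counts.values())` the total number of images, and the per-class values show how balanced the dataset is.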

#### Splitting Dataset into Train, Validation and Test Sets

The dataset consists of **9 classes** and is more or less **balanced**, so we can split it into train, validation and test sets in the ratio
**80:10:10**, that is, 80% for training, 10% for validation and 10% for testing.
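
As a rough worked example, the 80:10:10 ratio can be applied to the 9644 images to estimate the size of each set. This sketch assumes the train and validation shares are rounded down and the remainder goes to the test set. The helper `approximate_split_sizes` is hypothetical, and because a splitting tool may apply the ratio separately within each class folder, the real totals can differ by a few images.

```python
def approximate_split_sizes(n_images, percentages=(80, 10, 10)):
    """Estimate (train, val, test) sizes using integer arithmetic."""
    train_pct, val_pct, _ = percentages
    n_train = n_images * train_pct // 100  # round the training share down
    n_val = n_images * val_pct // 100      # round the validation share down
    n_test = n_images - n_train - n_val    # the remainder goes to the test set
    return n_train, n_val, n_test


print(approximate_split_sizes(9644))  # (7715, 964, 965)
```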

In [14]:
import splitfolders  # or import split_folders
# Split with a ratio.
splitfolders.ratio(input_folder_path, output="plant_disease_dataset", seed=1357, ratio=(.8, .1, .1), group_prefix=None) # 80/10/10 split with a fixed seed for reproducibility

Copying files: 9644 files [00:01, 7039.60 files/s]
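
Once the copy finishes, the result can be checked by counting the files in each split. The sketch below assumes the output folder contains `train`, `val` and `test` sub-folders, each with one folder per class. The helper name `count_files_per_split` is hypothetical.

```python
from pathlib import Path


def count_files_per_split(output_dir, splits=("train", "val", "test")):
    """Count all files found under each split sub-folder of output_dir."""
    counts = {}
    for split in splits:
        split_dir = Path(output_dir) / split
        if split_dir.is_dir():
            # rglob walks into the per-class folders below each split.
            counts[split] = sum(1 for f in split_dir.rglob("*") if f.is_file())
        else:
            counts[split] = 0  # a missing split folder is reported as empty
    return counts
```

For example, `count_files_per_split("plant_disease_dataset")` should return three counts that add up to the number of files copied.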


#### Uploading the split dataset to the S3 bucket

In [21]:
!aws s3 cp plant_disease_dataset s3://sagemaker-us-east-1-970845818811/plant_disease_dataset --recursive > /dev/null 
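
The same recursive upload could also be done from Python. The following is a minimal sketch, assuming `boto3` is installed and AWS credentials are configured. The helpers `build_s3_keys` and `upload_directory` are hypothetical names, and the bucket and prefix are passed in by the caller rather than hard-coded.

```python
from pathlib import Path


def build_s3_keys(local_dir, key_prefix):
    """Map each local file path to the S3 key it would be uploaded under."""
    local_dir = Path(local_dir)
    return {
        path: f"{key_prefix}/{path.relative_to(local_dir).as_posix()}"
        for path in sorted(local_dir.rglob("*"))
        if path.is_file()
    }


def upload_directory(local_dir, bucket, key_prefix):
    """Upload every file under local_dir; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_s3_keys works without boto3

    s3 = boto3.client("s3")
    for path, key in build_s3_keys(local_dir, key_prefix).items():
        s3.upload_file(str(path), bucket, key)
```

Inspecting the output of `build_s3_keys` first is a way to confirm the key layout before any data is transferred.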