# Training a PyTorch deep learning model that detects and classifies plant diseases
We use the [PlantVillage](https://www.plantvillage.org) dataset to train our network. The dataset is open source and available [here](https://github.com/spMohanty/PlantVillage-Dataset).

We run our notebook in Google Colaboratory.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!pip install skorch

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import os
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Subset
from torchvision import datasets, models, transforms


from skorch import NeuralNetClassifier
from skorch.helper import predefined_split

torch.manual_seed(10);

In [None]:
torch.__version__

## The Problem

We are going to train a neural network to classify plant diseases. The dataset consists of 14 crop species subdivided into 38 classes, each corresponding to a crop species and its disease status.
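
Each class pairs a crop species with a disease status, so the species count can be read off the class names. The sketch below assumes the class folders follow a `<Crop>___<condition>` naming pattern (for example `Apple___Apple_scab`) and uses a small hypothetical subset of names rather than the full list of 38.

```python
# Hypothetical subset of class-folder names; the "<Crop>___<condition>"
# pattern is an assumption about how the dataset folders are named.
class_names = [
    "Apple___Apple_scab",
    "Apple___healthy",
    "Grape___Black_rot",
    "Tomato___Early_blight",
    "Tomato___healthy",
]


def crop_species(name: str) -> str:
    """Return the crop part of a '<Crop>___<condition>' class name."""
    return name.split("___", 1)[0]


def condition(name: str) -> str:
    """Return the disease-status part of a '<Crop>___<condition>' class name."""
    return name.split("___", 1)[1]


species = sorted({crop_species(n) for n in class_names})
print(f"{len(class_names)} classes across {len(species)} species: {species}")
# prints: 5 classes across 3 species: ['Apple', 'Grape', 'Tomato']
```

Applied to the full `color` folder, the same split should yield the 38 classes and 14 species described above.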

**Please make sure that you have a GPU**

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

Clone the [GitHub repository](https://github.com/spMohanty/PlantVillage-Dataset) to obtain the PlantVillage dataset.

Then **split the dataset (the `color` images) into an 80% training set and a 20% validation set**:

In [None]:
%cd '/content/drive/My Drive/Datasets/PlantVillage-Dataset/raw'
data_dir = './'
# data processing pipeline
data_transforms = transforms.Compose([transforms.Resize((224,224)), 
                                      transforms.ToTensor(),
                                      transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                           std=[0.229, 0.224, 0.225])
                                     ])


orig_set = datasets.ImageFolder(os.path.join(data_dir, 'color'), data_transforms)  # 'color' images, one subfolder per class

validation_split = .2
shuffle_dataset = True
random_seed = 42

# Creating data indices for training and validation splits:
dataset_size = len(orig_set)
indices = list(range(dataset_size))
split = int(np.floor(validation_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, val_indices = indices[split:], indices[:split]

# split the dataset into training and validation subsets:
train_ds = Subset(orig_set, train_indices)
valid_ds = Subset(orig_set, val_indices)

print("train_len:", len(train_ds))
print("valid_len:", len(valid_ds))
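
The cell above does two things per image: `transforms.ToTensor` scales pixel values to `[0.0, 1.0]` and `transforms.Normalize` then computes `(x - mean) / std` per channel; separately, the index split sends the first `floor(0.2 * n)` shuffled indices to validation. The stdlib-only sketch below mirrors both steps on a hypothetical dataset of 1,000 images. It shuffles with Python's `random` module instead of NumPy, so it follows the same logic but does not reproduce the notebook's exact permutation.

```python
import math
import random


def split_indices(n, validation_split=0.2, seed=42):
    """Shuffle range(n) and reserve the first floor(validation_split * n)
    indices for validation, the rest for training."""
    indices = list(range(n))
    split = int(math.floor(validation_split * n))
    # Python's RNG, not NumPy's: same logic, different permutation
    random.Random(seed).shuffle(indices)
    return indices[split:], indices[:split]


def normalize(value, mean, std):
    """What transforms.Normalize does to one channel value in [0, 1]."""
    return (value - mean) / std


# Hypothetical dataset size of 1000 images
train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))  # 800 200

# A mid-grey pixel (0.5 after ToTensor) in the red channel,
# using the ImageNet statistics from the pipeline above
print(round(normalize(0.5, 0.485, 0.229), 4))  # 0.0655
```

Because the two index lists are disjoint slices of one permutation, no image appears in both the training and validation sets.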

In [None]:
orig_set.idx_to_class = {v: k for k, v in orig_set.class_to_idx.items()}
labels = orig_set.idx_to_class
n_classes = len(labels)
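
`ImageFolder` numbers the classes by sorting the subfolder names, and the cell above inverts that `class_to_idx` mapping so predicted indices can be turned back into class names. A minimal sketch with three hypothetical folder names:

```python
# Hypothetical folder names, in arbitrary (filesystem) order
folders = ["Tomato___healthy", "Apple___Apple_scab", "Grape___Black_rot"]

# ImageFolder assigns indices in sorted folder-name order
class_to_idx = {name: i for i, name in enumerate(sorted(folders))}

# Same inversion as above: index -> class name
idx_to_class = {v: k for k, v in class_to_idx.items()}
n_classes = len(idx_to_class)

print(idx_to_class[0])  # Apple___Apple_scab
print(n_classes)        # 3
```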

## Loading a pretrained neural network model

We use a pretrained `ResNet50` neural network and replace its final fully connected layer (the classifier) with a new one whose outputs correspond to the plant disease classes.

In [None]:
class PretrainedModel(nn.Module):
    def __init__(self, output_features):
        super().__init__()
        model = models.resnet50(pretrained=True)
        num_ftrs = model.fc.in_features
        model.fc = nn.Linear(num_ftrs, output_features)
        self.model = model
        
    def forward(self, x):
        return self.model(x)
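
Replacing the head shrinks it considerably. ResNet-50's `fc` layer receives 2048 features (`model.fc.in_features`), and an `nn.Linear(in_features, out_features)` layer holds `in_features * out_features` weights plus `out_features` biases. The arithmetic below compares the original 1000-class ImageNet head with the new 38-class head:

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of parameters in nn.Linear(in_features, out_features)."""
    return in_features * out_features + (out_features if bias else 0)


resnet50_fc_in = 2048  # model.fc.in_features for ResNet-50
n_classes = 38         # PlantVillage classes

new_head = linear_param_count(resnet50_fc_in, n_classes)
original_head = linear_param_count(resnet50_fc_in, 1000)  # ImageNet head
print(new_head, original_head)  # 77862 2049000
```

With the backbone frozen, these 77,862 parameters are the only ones the optimizer updates.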

**Hyperparameters**

In [None]:
n_epochs = 25
optimizer = optim.SGD
l_rate = 0.001
bs = 4
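
The optimizer is SGD with `momentum=0.9` (set on the classifier below). With no dampening or weight decay, PyTorch's update keeps a velocity `v = momentum * v + grad` and then steps `param -= lr * v`. The scalar sketch below applies that rule with `lr = 0.001` and a constant gradient of 1.0, a made-up input chosen only to show how momentum enlarges successive steps:

```python
def sgd_momentum(param, grads, lr=0.001, momentum=0.9):
    """Apply SGD with momentum (no dampening, no weight decay) to a single
    scalar parameter, one gradient per step; return the parameter history."""
    velocity = 0.0
    history = []
    for g in grads:
        velocity = momentum * velocity + g  # decayed sum of past gradients
        param = param - lr * velocity       # step along the velocity
        history.append(param)
    return history


steps = sgd_momentum(1.0, [1.0] * 3)
print([round(p, 6) for p in steps])  # [0.999, 0.9971, 0.99439]
```

For a constant gradient the velocity approaches `grad / (1 - momentum)`, so the step size grows toward ten times `lr`.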

**Define some callbacks**

First, we create an `LRScheduler` callback that uses `torch.optim.lr_scheduler.StepLR` to multiply the learning rate by `gamma=0.1` every 7 epochs:

In [None]:
from skorch.callbacks import LRScheduler

lrscheduler = LRScheduler(
    policy='StepLR', step_size=7, gamma=0.1)
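
`StepLR` sets the learning rate at epoch `e` to `lr * gamma ** (e // step_size)`. Assuming the scheduler is stepped once per epoch (the default for skorch's `LRScheduler`) and counting epochs from 0, the 25 training epochs fall into four plateaus: 1e-3, 1e-4, 1e-5 and 1e-6.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate StepLR gives at a 0-based epoch index."""
    return base_lr * gamma ** (epoch // step_size)


schedule = [step_lr(0.001, e) for e in range(25)]
for start in (0, 7, 14, 21):
    print(f"epochs {start}+: lr = {schedule[start]:.0e}")
```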

Next, we create a `Checkpoint` callback that saves the best model by monitoring the validation accuracy. It writes the model parameters, the optimizer state and the training history to the files named below:

In [None]:
from skorch.callbacks import Checkpoint

checkpoint = Checkpoint(monitor='valid_acc_best', 
                        f_params='model_params.pt', 
                        f_optimizer='model_optimizer.pt', 
                        f_history='model_history.json')
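
Monitoring `valid_acc_best` means a checkpoint is written only on epochs where the validation accuracy strictly beats every previous epoch. The sketch below reproduces that rule on hypothetical accuracies (not results from this notebook); after training, skorch's `net.load_params(checkpoint=checkpoint)` can restore the saved best weights.

```python
def checkpoint_epochs(valid_accs):
    """Return the 0-based epochs at which a checkpoint monitoring
    'valid_acc_best' would save: where accuracy sets a new maximum."""
    best = float("-inf")
    saved = []
    for epoch, acc in enumerate(valid_accs):
        if acc > best:  # strictly better than all earlier epochs
            best = acc
            saved.append(epoch)
    return saved


# Hypothetical accuracies; the tie at the last epoch does not trigger a save
print(checkpoint_epochs([0.80, 0.85, 0.84, 0.90, 0.90]))  # [0, 1, 3]
```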

Lastly, we create a `Freezer` callback to fine-tune the model: it freezes all weights except those of the final layer, `model.fc`, so only the new classifier is trained:

In [None]:
from skorch.callbacks import Freezer

freezer = Freezer(lambda x: not x.startswith('model.fc'))
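
The `Freezer` predicate receives each parameter name and returns `True` for parameters to freeze. Because `PretrainedModel` stores the ResNet under its `model` attribute, every name starts with `model.`. The sketch below applies the same test to a hypothetical sample of ResNet-50 parameter names:

```python
# Hypothetical sample of parameter names; the "model." prefix comes from
# the self.model attribute in PretrainedModel.
param_names = [
    "model.conv1.weight",
    "model.bn1.weight",
    "model.layer4.2.conv3.weight",
    "model.fc.weight",
    "model.fc.bias",
]


def should_freeze(name: str) -> bool:
    """Same test as the Freezer above: freeze everything outside model.fc."""
    return not name.startswith("model.fc")


frozen = [n for n in param_names if should_freeze(n)]
trainable = [n for n in param_names if not should_freeze(n)]
print(trainable)  # ['model.fc.weight', 'model.fc.bias']
```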

In [None]:
net = NeuralNetClassifier(
    PretrainedModel, 
    criterion=nn.CrossEntropyLoss,
    lr=l_rate,
    batch_size=bs,
    max_epochs=n_epochs,
    module__output_features=n_classes,
    optimizer=optimizer,
    optimizer__momentum=0.9,
    iterator_train__shuffle=True,
    iterator_train__num_workers=4,
    iterator_valid__shuffle=False,
    iterator_valid__num_workers=4,
    train_split=predefined_split(valid_ds),
    callbacks=[lrscheduler, checkpoint, freezer],
    device=device  # 'cuda' when a GPU is available, otherwise 'cpu'
)

In [None]:
net.fit(train_ds, y=None);