**Image classification Final Project LSML2**

The goal of this project is to train a model that predicts the class of a plant from an image.

For this we will use a subset of the PlantCLEF dataset (https://drive.google.com/file/d/14pUv-ZLHtRR-zCYjznr78mytFcnuR_1D/view?usp=sharing).

It contains almost 10.5 thousand training images across 20 class labels, and the archive is almost 1.5 GB in size.

We will use transfer learning: fine-tuning a pretrained Big Transfer (BiT) ResNetV2-101x1 model (`resnetv2_101x1_bitm` from timm) for image classification.

The training loop is based on https://github.com/dusty-nv/jetson-inference/tree/master/python/training and https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

In [2]:
! pip install timm

Collecting timm
  Downloading timm-0.4.12-py3-none-any.whl (376 kB)
[?25l[K     |▉                               | 10 kB 37.7 MB/s eta 0:00:01[K     |█▊                              | 20 kB 39.2 MB/s eta 0:00:01[K     |██▋                             | 30 kB 21.5 MB/s eta 0:00:01[K     |███▌                            | 40 kB 16.0 MB/s eta 0:00:01[K     |████▍                           | 51 kB 7.9 MB/s eta 0:00:01[K     |█████▏                          | 61 kB 8.1 MB/s eta 0:00:01[K     |██████                          | 71 kB 7.3 MB/s eta 0:00:01[K     |███████                         | 81 kB 8.1 MB/s eta 0:00:01[K     |███████▉                        | 92 kB 8.3 MB/s eta 0:00:01[K     |████████▊                       | 102 kB 7.5 MB/s eta 0:00:01[K     |█████████▋                      | 112 kB 7.5 MB/s eta 0:00:01[K     |██████████▍                     | 122 kB 7.5 MB/s eta 0:00:01[K     |███████████▎                    | 133 kB 7.5 MB/s eta 0:00:01[K    

In [83]:
!pip install mlflow
!databricks configure --host https://community.cloud.databricks.com/

Username: anikitin982011@yandex.ru
Password: 
Repeat for confirmation: 


In [84]:
import mlflow
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Users/anikitin982011@yandex.ru/experiment2")

In [17]:
import torch
import torch.nn as nn
import torch.optim as optim
import json
import os
import tqdm
import cv2
from matplotlib import pyplot as plt
import numpy as np
import copy
from torch.nn.modules.loss import _Loss
from torch.autograd import Variable
from torch.utils.data import DataLoader
import torch.nn.functional as F
import matplotlib.colors
import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import time

import timm
from timm.data import create_dataset, create_loader, resolve_data_config, Mixup, FastCollateMixup, AugMixDataset
from timm.models import create_model, safe_model_name, resume_checkpoint, load_checkpoint,\
    convert_splitbn_model, model_parameters
from timm.utils import *
from timm.loss import LabelSmoothingCrossEntropy, SoftTargetCrossEntropy, JsdCrossEntropy
from timm.optim import create_optimizer_v2, optimizer_kwargs
from timm.scheduler import create_scheduler
from timm.utils import ApexScaler, NativeScaler

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
!mkdir -p /data/plantclef
#!unzip /content/drive/MyDrive/PlantCLEF_Subset.tar.gz -d /data/plantclef
!tar -C /data/plantclef -xvf /content/drive/MyDrive/PlantCLEF_Subset.tar.gz

Output was truncated to the last 5000 lines.
PlantCLEF_Subset/train/cedar/248440.jpg
PlantCLEF_Subset/train/sycamore/305024.jpg
PlantCLEF_Subset/train/tulip_tree/259295.jpg
PlantCLEF_Subset/train/cattail/369450.jpg
PlantCLEF_Subset/train/sweetgum/258847.jpg
PlantCLEF_Subset/train/fern/149456.jpg
PlantCLEF_Subset/train/clover/365012.jpg
PlantCLEF_Subset/train/sweetgum/258436.jpg
PlantCLEF_Subset/val/poison_ivy/360246.jpg
PlantCLEF_Subset/train/fern/149550.jpg
PlantCLEF_Subset/train/fern/283365.jpg
PlantCLEF_Subset/train/maple/126804.jpg
PlantCLEF_Subset/train/fig/220702.jpg
PlantCLEF_Subset/train/dandelion/354818.jpg
PlantCLEF_Subset/train/daisy/254228.jpg
PlantCLEF_Subset/val/elm/370229.jpg
PlantCLEF_Subset/train/daisy/254322.jpg
PlantCLEF_Subset/train/sycamore/304592.jpg
PlantCLEF_Subset/train/trout_lily/215422.jpg
PlantCLEF_Subset/train/dandelion/355046.jpg
PlantCLEF_Subset/train/cedar/248571.jpg
PlantCLEF_Subset/train/clover/364991.jpg
PlantCLEF_
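
After extraction it can be useful to confirm that the `train` and `val` folders follow the one-subfolder-per-class layout that `ImageFolder` expects, and to see how many images each class has. The helper below is a minimal sketch using only the standard library; the function name `count_images_per_class` and the list of accepted file suffixes are assumptions made for illustration and are not part of the original notebook.

```python
from pathlib import Path

# Assumed set of image suffixes; the archive listing above only shows .jpg files
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Return {class_name: image_count} for an ImageFolder-style directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_class('/data/plantclef/PlantCLEF_Subset/train')` should return one entry per class folder; the number of keys can then be compared with the expected 20 classes.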

In [66]:
#Defining datasets and dataloaders
#Based on https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

# Data augmentation and normalization for training
# Just normalization for validation

data_dir = '/data/plantclef/PlantCLEF_Subset'
input_size = 224 #Size for RESNET model
batch_size = 8

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

print("Initializing Datasets and Dataloaders...")

# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=2) for x in ['train', 'val']}

# Detect if we have a GPU available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Initializing Datasets and Dataloaders...
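
To make the `ToTensor` + `Normalize` steps above concrete, here is a small pure-Python sketch of the arithmetic they apply to a single RGB pixel: scale to [0, 1], then subtract the per-channel mean and divide by the per-channel standard deviation. The helper names are hypothetical; the inverse function is the kind of thing one would need before showing a normalized image with `plt.imshow`.

```python
# Per-channel statistics used in the transforms above
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_0_255):
    """Mimic ToTensor followed by Normalize for one RGB pixel."""
    scaled = [c / 255.0 for c in rgb_0_255]  # ToTensor: [0, 255] -> [0, 1]
    return [(c - m) / s for c, m, s in zip(scaled, MEAN, STD)]


def denormalize_pixel(normalized):
    """Invert the normalization and return integer values in [0, 255]."""
    return [round((c * s + m) * 255.0) for c, m, s in zip(normalized, MEAN, STD)]


pixel = [10, 128, 255]
restored = denormalize_pixel(normalize_pixel(pixel))
```

A black pixel maps to negative values in every channel (about -2.12 in the red channel), which is why normalized tensors cannot be displayed directly as images.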


In [65]:
#Num classes is 20 in our dataset
num_classes = 20

# Number of epochs to train for
num_epochs = 15

#Learning rate
lr = 0.001
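
The batch size determines how many optimizer steps one epoch takes. As a rough sketch (helper names are hypothetical), the number of batches is the dataset size divided by the batch size, rounded up, because `DataLoader` keeps the final partial batch by default (`drop_last=False`). The same relation can be inverted: the progress bars in the training output report 1307 training and 142 validation iterations per epoch, which bounds the number of images in each split.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields for one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def sample_count_bounds(num_batches, batch_size):
    """Smallest and largest dataset size giving num_batches (drop_last=False)."""
    return (num_batches - 1) * batch_size + 1, num_batches * batch_size


train_bounds = sample_count_bounds(1307, 8)  # (10449, 10456)
val_bounds = sample_count_bounds(142, 8)     # (1129, 1136)
```

The training bound is consistent with the "almost 10.5 thousand" training images mentioned in the introduction.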

In [67]:
model = timm.create_model('resnetv2_101x1_bitm', pretrained=True, num_classes=num_classes)

In [69]:
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in tqdm.tqdm(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # (the same forward pass is used in both phases)
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)


            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
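
One detail of `train_model` worth spelling out: `criterion` returns the mean loss over a batch, so the loop multiplies it by `inputs.size(0)` before accumulating and divides by the dataset size at the end. This gives a per-sample average even when the last batch is smaller than the rest. The sketch below illustrates the difference with made-up numbers; `epoch_average` is a hypothetical helper, not part of the notebook.

```python
def epoch_average(batch_means, batch_sizes):
    """Per-sample average, weighting each batch mean by its batch size."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Illustrative values only: two full batches of 8 and a final batch of 1
batch_means = [1.0, 1.0, 4.0]
batch_sizes = [8, 8, 1]

weighted = epoch_average(batch_means, batch_sizes)  # 20 / 17
naive = sum(batch_means) / len(batch_means)         # 2.0, over-weights the last batch
```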

In [70]:
model = model.to(device)

In [71]:
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
criterion = nn.CrossEntropyLoss()
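
For intuition about these two choices, the following pure-Python sketch reproduces the arithmetic on scalars. It assumes the standard definitions: cross-entropy for one sample is the negative log of the softmax probability of the true class, and SGD with momentum (with zero dampening and no weight decay, as configured above) updates a velocity `v = momentum * v + grad` and then the parameter `p = p - lr * v`. The function names are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One scalar update: v = momentum * v + grad; p = p - lr * v."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


# A model with no preference among 20 classes has loss ln(20), about 2.996
uniform_loss = cross_entropy([0.0] * 20, label=3)

# Two consecutive steps with the same gradient: the second step is larger
p, v = sgd_momentum_step(1.0, 2.0, 0.0)
p, v = sgd_momentum_step(p, 2.0, v)
```

The value ln(20) is a useful reference point when reading the training log: a loss well below it means the model is already doing better than uniform guessing.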

In [73]:
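import mlflow.pytorch

with mlflow.start_run():
  model, hist = train_model(model, dataloaders_dict, criterion, optimizer, num_epochs=num_epochs)
  # log_metric expects a single number, so log the validation accuracy once per epoch
  for epoch, acc in enumerate(hist):
    mlflow.log_metric("val_acc", float(acc), step=epoch)
  mlflow.pytorch.log_model(model, "model")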

Epoch 0/14
----------


100%|██████████| 1307/1307 [03:44<00:00,  5.82it/s]


train Loss: 1.3903 Acc: 0.5974


100%|██████████| 142/142 [00:11<00:00, 11.86it/s]


val Loss: 0.9332 Acc: 0.7163

Epoch 1/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.69it/s]


train Loss: 0.9105 Acc: 0.7206


100%|██████████| 142/142 [00:12<00:00, 11.76it/s]


val Loss: 1.0032 Acc: 0.7101

Epoch 2/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.69it/s]


train Loss: 0.7397 Acc: 0.7692


100%|██████████| 142/142 [00:11<00:00, 11.88it/s]


val Loss: 0.8315 Acc: 0.7383

Epoch 3/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.69it/s]


train Loss: 0.6626 Acc: 0.7899


100%|██████████| 142/142 [00:12<00:00, 11.64it/s]


val Loss: 0.7757 Acc: 0.7692

Epoch 4/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.70it/s]


train Loss: 0.5702 Acc: 0.8181


100%|██████████| 142/142 [00:12<00:00, 11.82it/s]


val Loss: 0.7323 Acc: 0.7815

Epoch 5/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.69it/s]


train Loss: 0.5303 Acc: 0.8285


100%|██████████| 142/142 [00:12<00:00, 11.80it/s]


val Loss: 0.6233 Acc: 0.8035

Epoch 6/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.70it/s]


train Loss: 0.4958 Acc: 0.8455


100%|██████████| 142/142 [00:12<00:00, 11.66it/s]


val Loss: 0.6202 Acc: 0.7956

Epoch 7/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.70it/s]


train Loss: 0.4421 Acc: 0.8595


100%|██████████| 142/142 [00:11<00:00, 11.86it/s]


val Loss: 0.6811 Acc: 0.8062

Epoch 8/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.70it/s]


train Loss: 0.4276 Acc: 0.8628


100%|██████████| 142/142 [00:12<00:00, 11.72it/s]


val Loss: 0.6384 Acc: 0.8185

Epoch 9/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.70it/s]


train Loss: 0.3794 Acc: 0.8810


100%|██████████| 142/142 [00:12<00:00, 11.80it/s]


val Loss: 0.7173 Acc: 0.7921

Epoch 10/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.70it/s]


train Loss: 0.3567 Acc: 0.8871


100%|██████████| 142/142 [00:11<00:00, 11.88it/s]


val Loss: 0.6917 Acc: 0.8000

Epoch 11/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.69it/s]


train Loss: 0.3379 Acc: 0.8924


100%|██████████| 142/142 [00:12<00:00, 11.80it/s]


val Loss: 0.7014 Acc: 0.8018

Epoch 12/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.69it/s]


train Loss: 0.3275 Acc: 0.8958


100%|██████████| 142/142 [00:12<00:00, 11.78it/s]


val Loss: 0.7201 Acc: 0.8044

Epoch 13/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.69it/s]


train Loss: 0.2970 Acc: 0.9051


100%|██████████| 142/142 [00:12<00:00, 11.79it/s]


val Loss: 0.7790 Acc: 0.7947

Epoch 14/14
----------


100%|██████████| 1307/1307 [03:49<00:00,  5.69it/s]


train Loss: 0.2957 Acc: 0.9088


100%|██████████| 142/142 [00:12<00:00, 11.64it/s]

val Loss: 0.7415 Acc: 0.8062

Training complete in 60m 19s
Best val Acc: 0.818502
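
The log above can be summarized with a few lines of Python. The lists below are copied from the printed per-epoch values; the helper `early_stopping_epoch` and the patience of 3 are assumptions added for illustration, since the notebook itself trains for all 15 epochs and only keeps the best weights.

```python
# Per-epoch values copied from the training output above (epochs 0..14)
train_acc = [0.5974, 0.7206, 0.7692, 0.7899, 0.8181, 0.8285, 0.8455, 0.8595,
             0.8628, 0.8810, 0.8871, 0.8924, 0.8958, 0.9051, 0.9088]
val_acc = [0.7163, 0.7101, 0.7383, 0.7692, 0.7815, 0.8035, 0.7956, 0.8062,
           0.8185, 0.7921, 0.8000, 0.8018, 0.8044, 0.7947, 0.8062]
val_loss = [0.9332, 1.0032, 0.8315, 0.7757, 0.7323, 0.6233, 0.6202, 0.6811,
            0.6384, 0.7173, 0.6917, 0.7014, 0.7201, 0.7790, 0.7415]


def early_stopping_epoch(losses, patience):
    """Epoch at which training would stop after `patience` epochs without improvement."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(losses) - 1  # never triggered: ran to the end


best_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__)     # 8
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)  # 6
stop_epoch = early_stopping_epoch(val_loss, patience=3)                # 9
final_gap = train_acc[-1] - val_acc[-1]                                # about 0.10
```

Validation accuracy peaks at epoch 8 and validation loss bottoms out at epoch 6, while training accuracy keeps rising, so the later epochs mostly widen the gap between training and validation performance.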





In [77]:
#Saving the model weights (state_dict only)
torch.save(model.state_dict(), '/content/drive/MyDrive/model.tar')

In [85]:
#Save also in pickle format
import pickle

raw_data = pickle.dumps(model)

with open('/content/drive/MyDrive/model.pickle', 'wb') as f:
    f.write(raw_data)