## Fashion Classification with Monk and Densenet

## Blog Post -- [LINK]()

#### Explanation of Dense blocks and Densenets

[BLOG LINK](https://towardsdatascience.com/review-densenet-image-classification-b6631a8ef803)

This is an excellent read that compares Densenets with other architectures and explains why Dense blocks achieve better accuracy while training fewer parameters.
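
To make the parameter argument a little more concrete, here is a minimal sketch of the channel bookkeeping in a Densenet. It assumes the standard configuration from the original paper (growth rate 32, 64 initial feature maps, transition layers that halve the channel count). The function name `densenet_channels` is purely illustrative and is not part of Monk.

```python
def densenet_channels(block_config, growth_rate=32, init_features=64):
    """Return the number of feature maps after the last dense block.

    Inside a dense block every layer adds `growth_rate` new feature maps,
    which are concatenated with everything produced before it.
    A transition layer between two blocks halves the channel count.
    """
    channels = init_features
    for i, num_layers in enumerate(block_config):
        # Each layer in the block contributes only `growth_rate` new maps.
        channels += num_layers * growth_rate
        if i != len(block_config) - 1:
            # Transition layer between blocks (assumed compression of 0.5).
            channels //= 2
    return channels


# Layers per dense block for the three variants used in this notebook.
configs = {
    "densenet121": (6, 12, 24, 16),
    "densenet169": (6, 12, 32, 32),
    "densenet201": (6, 12, 48, 32),
}
for name, cfg in configs.items():
    print(name, densenet_channels(cfg))
```

Each layer only has to produce 32 new feature maps because it can reuse all the earlier ones, which is the intuition behind the smaller parameter count.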

#### Setup Monk

We begin by setting up Monk and installing its dependencies for Colab.

In [0]:
!git clone https://github.com/Tessellate-Imaging/monk_v1

In [0]:
cd monk_v1

In [0]:
!pip install -r installation/requirements_cu10.txt

In [0]:
cd ..

#### Prepare Dataset

Next we grab the dataset. 
Credits to the original dataset -- [Kaggle](https://www.kaggle.com/paramaggarwal/fashion-product-images-small)

In [0]:
!wget https://www.dropbox.com/s/wzgyr1dx4sejo5u/dataset.zip

In [0]:
%%capture
!unzip dataset.zip

**Note**: The PyTorch backend requires images to have 3 channels when loading, so we prepare a modified copy of the dataset in which every image is saved with 3 channels.

In [0]:
!mkdir mod_dataset
!mkdir mod_dataset/images

In [0]:
import os
import cv2
from glob import glob
from tqdm import tqdm

def convert23channel(imagePath):
  # cv2.imread uses cv2.IMREAD_COLOR by default, so grayscale files are
  # already expanded to a 3-channel BGR array when they are loaded.
  # It returns None if the file cannot be decoded.
  return cv2.imread(imagePath)

imageList = glob("./dataset/images/*.jpg")
for inPath in tqdm(imageList):
  out = convert23channel(inPath)
  if out is None:
    # Skip files that OpenCV cannot decode
    continue
  outPath = os.path.join("./mod_dataset/images", os.path.basename(inPath))
  cv2.imwrite(outPath,out)
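
`cv2.imread` with its default flag returns a 3-channel BGR array even for grayscale files. If images are instead loaded with `cv2.IMREAD_UNCHANGED` (or with another library), grayscale files come back as 2-D arrays and need an explicit conversion. A minimal sketch of that conversion on plain NumPy arrays follows; `to_three_channels` is a hypothetical helper, not part of Monk or OpenCV.

```python
import numpy as np


def to_three_channels(img):
    """Return an H x W x 3 array for grayscale, 3-channel or 4-channel input."""
    if img.ndim == 2:
        # Grayscale: repeat the single plane three times.
        return np.stack([img, img, img], axis=-1)
    if img.shape[2] == 1:
        # Single channel stored with an explicit channel axis.
        return np.repeat(img, 3, axis=2)
    if img.shape[2] == 4:
        # Drop the alpha channel and keep the three colour channels.
        return img[:, :, :3].copy()
    return img


gray = np.zeros((4, 5), dtype=np.uint8)
print(to_three_channels(gray).shape)
```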

#### Data exploration [DOCUMENTATION](https://clever-noyce-f9d43f.netlify.com/#/compare_experiment)

In [0]:
import pandas as pd
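# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0;
# on newer versions pass on_bad_lines="skip" instead.
gt = pd.read_csv("./dataset/styles.csv",error_bad_lines=False)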
gt.head()
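
`error_bad_lines=False` tells pandas to skip rows it cannot parse, which in practice means rows with more fields than the header. To see how many rows are dropped this way, a standard-library sketch can count them. `find_ragged_rows` is a hypothetical helper introduced only for illustration.

```python
import csv


def find_ragged_rows(csv_path):
    """Return (line_number, field_count) for rows with more fields than the header."""
    ragged = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        for row in reader:
            if len(row) > len(header):
                # reader.line_num is the line number of the row just read.
                ragged.append((reader.line_num, len(row)))
    return ragged


# Usage with the labels file from this notebook:
# print(find_ragged_rows("./dataset/styles.csv"))
```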

The labels file contains several classification categories. We will train on the `subCategory` labels.

Extract the sub category label for each image. The image id field must contain the image file name including its extension.

In [0]:
# Take a copy so that editing the id column does not trigger pandas' SettingWithCopyWarning
label_gt = gt[['id','subCategory']].copy()
label_gt['id'] = label_gt['id'].astype(str) + '.jpg'

label_gt.to_csv('./mod_dataset/subCategory.csv',index=False)
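
Before training it can help to check how many images each sub category has, since very rare classes are hard to learn. A minimal standard-library sketch, assuming the CSV written above with a `subCategory` column; `label_counts` is a hypothetical helper name.

```python
import csv
from collections import Counter


def label_counts(csv_path, label_column="subCategory"):
    """Count how many rows carry each label in the given CSV column."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row[label_column] for row in csv.DictReader(f))


# Usage with the file created in this notebook:
# print(label_counts("./mod_dataset/subCategory.csv").most_common(5))
```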

# Pytorch with Monk

## Create an Experiment [DOCS](https://clever-noyce-f9d43f.netlify.com/#/quick_mode/quickmode_pytorch)

Import Monk library

In [0]:
import os
import sys
sys.path.append("./monk_v1/monk/");
import psutil
from pytorch_prototype import prototype

## Experiment 1 with Densenet121

Create a new experiment

In [0]:
ptf = prototype(verbose=1);
ptf.Prototype("fashion", "exp1");

Load the training images and the ground truth labels for sub category classification.
We select **densenet121** as our neural architecture and set the number of epochs to **5**.

In [0]:
ptf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory.csv", model_name="densenet121", freeze_base_network=True, num_epochs=5);

**Note**: The dataset has a few missing images. We can find missing and corrupt images by running Monk's EDA.

## EDA documentation [DOCS](https://clever-noyce-f9d43f.netlify.com/#/aux_functions)

In [0]:
ptf.EDA(check_missing=True, check_corrupt=True);
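
Independently of Monk's EDA, the missing files can be cross-checked with a few lines of standard-library Python. This is only a sketch: `find_missing_images` is a hypothetical helper, it assumes the id column already holds file names with their extension, and it detects missing files only, not corrupt ones.

```python
import csv
import os


def find_missing_images(csv_path, image_dir, id_column="id"):
    """Return image names listed in the CSV that have no file in image_dir."""
    available = set(os.listdir(image_dir))
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row[id_column] not in available:
                missing.append(row[id_column])
    return missing


# Usage with the paths from this notebook:
# print(find_missing_images("./mod_dataset/subCategory.csv", "./mod_dataset/images"))
```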

#### Clean the labels file

In [0]:
corruptImageList = ['39403.jpg','39410.jpg','39401.jpg','39425.jpg','12347.jpg']

In [0]:
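def cleanCSV(csvPath,labelColumnName,imageIdColumnName,appendExtension=False,extension='.jpg',corruptImageList=None):
  # Load the labels, skipping rows that pandas cannot parse
  gt = pd.read_csv(csvPath, error_bad_lines=False)
  print("LABELS\n{}".format(gt[labelColumnName].unique()))
  # Keep only the image id and label columns; copy so the edits below do not touch gt
  label_gt = gt[[imageIdColumnName,labelColumnName]].copy()
  if appendExtension:
    # Use the column name passed in rather than a hard-coded 'id'
    label_gt[imageIdColumnName] = label_gt[imageIdColumnName].astype(str) + extension
  if corruptImageList:
    # Drop the rows that refer to missing or corrupt images
    label_gt = label_gt[~label_gt[imageIdColumnName].isin(corruptImageList)]
  print("Total images : {}".format(label_gt.shape[0]))
  return label_gt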

In [0]:
subCategory_gt = cleanCSV('./dataset/styles.csv','subCategory','id',True,'.jpg',corruptImageList)
subCategory_gt.to_csv("./mod_dataset/subCategory_cleaned.csv",index=False)

## Update the experiment [DOCS](https://clever-noyce-f9d43f.netlify.com/#/update_mode/update_dataset)
Now that we have a clean ground truth labels file and the modified images, we can update the experiment to use them as its inputs.
**Note**: Remember to reload the experiment after any update.

In [0]:
ptf.update_dataset(dataset_path="./mod_dataset/images",path_to_csv="./mod_dataset/subCategory_cleaned.csv");
ptf.Reload()

#### Start Training

In [0]:
ptf.Train()

After training for 5 epochs we reach a validation accuracy of 89%, which is quite good. Let's see whether the deeper Densenet variants can improve on this.

## Experiment 2 with Densenet169

In [0]:
ptf = prototype(verbose=1);
ptf.Prototype("fashion", "exp2");
ptf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory_cleaned.csv", model_name="densenet169", freeze_base_network=True, num_epochs=5);
ptf.Train()

The validation accuracy improves, but only slightly. Next we run the experiment with densenet201.

## Experiment 3 with Densenet201

In [0]:
ptf = prototype(verbose=1);
ptf.Prototype("fashion", "exp3");
ptf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory_cleaned.csv", model_name="densenet201", freeze_base_network=True, num_epochs=5);
ptf.Train()
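
The three experiments differ only in the experiment name and `model_name`, so they could also be run from a loop. A minimal sketch, using only the Monk calls already shown in this notebook (`Prototype`, `Default`, `Train`); `run_experiments` itself is a hypothetical helper and not part of Monk.

```python
def run_experiments(prototype_cls, project, model_names, first_index=1, **default_kwargs):
    """Train one experiment per model name and return the experiment names."""
    experiment_names = []
    for offset, model_name in enumerate(model_names):
        experiment = "exp{}".format(first_index + offset)
        # A fresh prototype object per experiment, as in the cells above.
        tf = prototype_cls(verbose=1)
        tf.Prototype(project, experiment)
        tf.Default(model_name=model_name, **default_kwargs)
        tf.Train()
        experiment_names.append(experiment)
    return experiment_names


# Usage with the prototype class imported earlier in this notebook:
# run_experiments(prototype, "fashion",
#                 ["densenet121", "densenet169", "densenet201"],
#                 dataset_path="./mod_dataset/images",
#                 path_to_csv="./mod_dataset/subCategory_cleaned.csv",
#                 freeze_base_network=True, num_epochs=5)
```

Passing the class in as an argument keeps the helper independent of the backend, so the same loop would work with the prototype class of any backend that exposes these three calls.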

We can see that the 3 versions of Densenet give quite similar results. We can quickly compare the experiments to see how losses and training times vary, and then choose the most suitable one.

## Compare experiments [DOCS](https://clever-noyce-f9d43f.netlify.com/#/compare_experiment)

In [0]:
from compare_prototype import compare

In [0]:
ctf = compare(verbose=1);
ctf.Comparison("Fashion_Pytorch_Densenet");

In [0]:
ctf.Add_Experiment("fashion", "exp1");
ctf.Add_Experiment("fashion", "exp2");
ctf.Add_Experiment("fashion", "exp3");

In [0]:
ctf.Generate_Statistics();

# Gluon with Monk

Let's repeat the same experiments using a different backend framework, **Gluon**.

In [0]:
from gluon_prototype import prototype

#### Experiment 4 with Densenet121

In [0]:
%%capture
gtf = prototype(verbose=1);
gtf.Prototype("fashion", "exp4");
gtf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory_cleaned.csv", model_name="densenet121", freeze_base_network=True, num_epochs=5);
gtf.Train()

#### Experiment 5 with Densenet169

In [0]:
%%capture
gtf = prototype(verbose=1);
gtf.Prototype("fashion", "exp5");
gtf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory_cleaned.csv", model_name="densenet169", freeze_base_network=True, num_epochs=5);
gtf.Train()

#### Experiment 6 with Densenet201

In [0]:
%%capture
gtf = prototype(verbose=1);
gtf.Prototype("fashion", "exp6");
gtf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory_cleaned.csv", model_name="densenet201", freeze_base_network=True, num_epochs=5);
gtf.Train()

Let's compare the performance of the Densenet architectures on the Gluon backend.

In [0]:
ctf = compare(verbose=1);
ctf.Comparison("Fashion_Gluon_Densenet");

In [0]:
ctf.Add_Experiment("fashion", "exp4");
ctf.Add_Experiment("fashion", "exp5");
ctf.Add_Experiment("fashion", "exp6");

In [0]:
ctf.Generate_Statistics();

We could also compare how PyTorch and Gluon fared in training, but before that let's use the Keras backend to train the Densenets and then compare all three frameworks together.

# Keras with Monk



In [0]:
from keras_prototype import prototype

#### Experiment 7 with Densenet121

In [0]:
%%capture
ktf = prototype(verbose=1);
ktf.Prototype("fashion", "exp7");
ktf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory_cleaned.csv", model_name="densenet121", freeze_base_network=True, num_epochs=5);
ktf.Train()

#### Experiment 8 with Densenet169

In [0]:
%%capture
ktf = prototype(verbose=1);
ktf.Prototype("fashion", "exp8");
ktf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory_cleaned.csv", model_name="densenet169", freeze_base_network=True, num_epochs=5);
ktf.Train()

#### Experiment 9 with Densenet201

In [0]:
%%capture
ktf = prototype(verbose=1);
ktf.Prototype("fashion", "exp9");
ktf.Default(dataset_path="./mod_dataset/images", path_to_csv="./mod_dataset/subCategory_cleaned.csv", model_name="densenet201", freeze_base_network=True, num_epochs=5);
ktf.Train()

# Compare experiments


After trying different architectures and backend frameworks, let's compare their performance in terms of accuracy, losses and resource usage.

In [0]:
ctf = compare(verbose=1);
ctf.Comparison("Fashion_Densenet_Compare");

In [0]:
ctf.Add_Experiment("fashion", "exp1");
ctf.Add_Experiment("fashion", "exp2");
ctf.Add_Experiment("fashion", "exp3");
ctf.Add_Experiment("fashion", "exp4");
ctf.Add_Experiment("fashion", "exp5");
ctf.Add_Experiment("fashion", "exp6");
ctf.Add_Experiment("fashion", "exp7");
ctf.Add_Experiment("fashion", "exp8");
ctf.Add_Experiment("fashion", "exp9");

In [0]:
ctf.Generate_Statistics();

You can find the generated plots inside **workspace/comparison/Fashion_Densenet_Compare**
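
To see which plots were written, the folder can be listed directly. A minimal sketch, assuming the plots are saved as PNG files in that folder (as the two file names used in this notebook suggest); `list_comparison_plots` is a hypothetical helper.

```python
import glob
import os


def list_comparison_plots(comparison_name, workspace="workspace"):
    """Return the sorted PNG paths found in a comparison folder."""
    pattern = os.path.join(workspace, "comparison", comparison_name, "*.png")
    return sorted(glob.glob(pattern))


# Usage with the comparison created above:
# for path in list_comparison_plots("Fashion_Densenet_Compare"):
#     print(path)
```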

Let's visualise the training accuracy and training time plots.

In [0]:
from IPython.display import Image
Image('workspace/comparison/Fashion_Densenet_Compare/train_accuracy.png')

In [0]:
Image('workspace/comparison/Fashion_Densenet_Compare/stats_training_time.png')