# How to train a classifier on Cifar-10 using Darknet in a Colab notebook with a ResNet model's config file

## Welcome!

This Colab notebook will demonstrate how to:

* Train a **34-layer ResNet** model using **Darknet** on the **Cifar-10 dataset** using the Colab **12GB-RAM GPU**.
* Turn Colab notebooks into an effective tool to work on real projects.
  * Configure your notebook to install everything you need and start training in a few minutes.
  * Store your trained weights directly on your computer or your drive so that you can use the trained model whenever needed.
  * Convert a model from a graph to a .cfg file and pass it to Darknet in order to train on your dataset. (We use a 34-layer ResNet model with the Cifar-10 dataset for object classification; the same method can be adapted to custom datasets and different models.)
  * Check the accuracy of your model and print a graph to see the behaviour of the hyperparameters throughout training.


  #### This notebook is part of the GitHub repo [Enter_Darknet](https://github.com/Utkarsh2401/Enter_Darknet). We encourage you to visit! You will find a deeper explanation, along with the resources and references we used while understanding and exploring Darknet via Colab.

  ---


  

## STEP 1. Configure runtime to work with GPU

We want to use the **12GB-RAM GPU** hardware acceleration!

Go to **> Menu > Runtime > Change runtime type** and select **GPU** from the **Hardware accelerator** drop-down menu.
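After the runtime restarts, it can be worth confirming that a GPU is actually visible before compiling anything. The following is a minimal sketch, assuming the `nvidia-smi` tool is on the `PATH` of GPU runtimes; the helper name `gpu_available` is ours and not part of Colab or Darknet.

```python
import shutil
import subprocess

def gpu_available():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # tool not found: most likely a CPU-only runtime
    # "nvidia-smi -L" lists the visible GPUs, one per line
    result = subprocess.run([exe, "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU runtime detected:", gpu_available())
```

If this prints `False`, revisit the runtime type setting before moving on.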

---




## STEP 2. Check CUDA release version

Nvidia CUDA is pre-installed on Colab notebooks. Now we'll check the version installed.



In [None]:
# This cell can be commented out once you have checked the current CUDA version
# CUDA: Let's check that Nvidia CUDA is already pre-installed and which version it is. In some time from now maybe you 
!/usr/local/cuda/bin/nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
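If you want to use the version number later in the notebook (for example, to warn when it changes), the release can be pulled out of the text above. This is only a sketch: the function name `parse_cuda_release` is hypothetical, and it assumes the output keeps the `release X.Y` wording shown here.

```python
import re

def parse_cuda_release(nvcc_output):
    """Extract the CUDA release (e.g. '11.1') from `nvcc --version` text."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

# Line copied from the cell output above
sample = "Cuda compilation tools, release 11.1, V11.1.105"
print(parse_cuda_release(sample))
```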


---

## STEP 3. Installing Darknet
Great!! We have everything we need to start working with Darknet.

This notebook works with a slightly modified version of darknet, which is based on the [AlexeyAB Darknet repo](https://github.com/AlexeyAB/darknet/)

* To start, we will:
  * Clone and compile the darknet project.


In [None]:
!rm -rf darknet
!git clone https://github.com/AlexeyAB/darknet.git

Cloning into 'darknet'...
remote: Enumerating objects: 15316, done.
remote: Counting objects: 100% (4/4), done.
remote: Total 15316 (delta 3), reused 3 (delta 3), pack-reused 15312
Receiving objects: 100% (15316/15316), 13.72 MiB | 10.06 MiB/s, done.
Resolving deltas: 100% (10406/10406), done.


In [None]:
cd darknet

/content/darknet


In [None]:
ls

3rdparty/               darknet_video.py        net_cam_v4.sh*
build/                  data/                   README.md
build.ps1*              image_yolov3.sh*        results/
cfg/                    image_yolov4.sh*        scripts/
cmake/                  include/                src/
CMakeLists.txt          json_mjpeg_streams.sh*  vcpkg.json
DarknetConfig.cmake.in  LICENSE                 video_yolov3.sh*
darknet_images.py       Makefile                video_yolov4.sh*
darknet.py              net_cam_v3.sh*


## Change settings to use GPU

Remember to open the `Makefile` in the darknet repository and change the settings as follows:

```
GPU=1  
CUDNN=1  
CUDNN_HALF=0  
OPENCV=1  
AVX=0  
OPENMP=0  
LIBSO=0  
ZED_CAMERA=0  
ZED_CAMERA_v2_8=0 
``` 

When you compile it, the last line of the output should look something like this:

``g++ -std=c++11 -Iinclude/ -I3rdparty/stb/include -DOPENCV `pkg-config --cflags opencv` -DGPU (...)``
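Editing the Makefile by hand on every fresh runtime is easy to forget, so the change can also be scripted. The sketch below assumes each setting sits on its own line in the form `NAME=<number>`; `set_makefile_flags` is a hypothetical helper, not something Darknet provides.

```python
import re

def set_makefile_flags(makefile_text, flags):
    """Return makefile_text with each `NAME=<number>` line set to flags[NAME]."""
    for name, value in flags.items():
        # Anchor on the full name so that CUDNN does not also match CUDNN_HALF
        pattern = rf"^{re.escape(name)}=\d+[ \t]*$"
        makefile_text = re.sub(pattern, f"{name}={value}", makefile_text,
                               flags=re.MULTILINE)
    return makefile_text

# Small stand-in for the top of the Makefile
sample = "GPU=0\nCUDNN=0\nCUDNN_HALF=0\nOPENCV=0\n"
updated = set_makefile_flags(sample, {"GPU": 1, "CUDNN": 1, "OPENCV": 1})
print(updated)
```

In the notebook you would read the real `Makefile` into a string, pass it through this function, and write it back before running `!make`.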

In [None]:
!make

mkdir -p ./obj/
mkdir -p backup
chmod +x *.sh
g++ -std=c++11 -std=c++11 -Iinclude/ -I3rdparty/stb/include -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -c ./src/image_opencv.cpp -o obj/image_opencv.o
g++ -std=c++11 -std=c++11 -Iinclude/ -I3rdparty/stb/include -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -c ./src/http_stream.cpp -o obj/http_stream.o
./src/http_stream.cpp: In member function ‘bool JSON_sender::write(const char*)’:
                 int n = _write(client, outputbuf, outlen);
                     ^
./src/http_stream.cpp: In function ‘void set_track_id(detection*, int, float, float, float, int, int, int)’:
         for (int i = 0; i < v.size(); ++i) {
                         ~~^~~~~~~~~~
     for (int old_id = 0; old_id < old_dets.size(); ++old_id) {

---

## STEP 4. Loading the dataset and creating a backup directory to store our weights

In [None]:
mkdir backup

mkdir: cannot create directory ‘backup’: File exists


In [None]:
cd data

/content/darknet/data


* Load the Cifar-10 dataset into the data folder of darknet
  * Download it from the website

In [None]:
!wget https://pjreddie.com/media/files/cifar.tgz

--2021-10-22 21:19:09--  https://pjreddie.com/media/files/cifar.tgz
Resolving pjreddie.com (pjreddie.com)... 128.208.4.108
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 168584360 (161M) [application/octet-stream]
Saving to: ‘cifar.tgz’


2021-10-22 21:19:17 (20.6 MB/s) - ‘cifar.tgz’ saved [168584360/168584360]




* Unzip the files, and create the train.list and test.list files

In [None]:
!tar xzf cifar.tgz

In [None]:
cd cifar

/content/darknet/data/cifar


In [None]:
!find `pwd`/train -name \*.png > train.list

In [None]:
!find `pwd`/test -name \*.png > test.list

In [None]:
ls

labels.txt  test/  test.list  train/  train.list
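The two `find` commands above simply write one absolute image path per line. If you would rather stay in Python (for example, to build lists for a custom dataset), the same idea can be sketched as follows. The helper `write_image_list` and the demo filenames are hypothetical.

```python
from pathlib import Path
import tempfile

def write_image_list(image_dir, list_path):
    """Write the absolute path of every .png under image_dir, one per line."""
    paths = sorted(str(p.resolve()) for p in Path(image_dir).rglob("*.png"))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return len(paths)

# Demo on a throwaway directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "train"
    train_dir.mkdir()
    for name in ("0_frog.png", "1_truck.png"):
        (train_dir / name).touch()
    count = write_image_list(train_dir, Path(tmp) / "train.list")
    print(count, "images listed")
```

For this notebook the calls would be `write_image_list("train", "train.list")` and `write_image_list("test", "test.list")`, run from inside `data/cifar`.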


---

## STEP 5. Preparing your data and configuration files 

Before going further, let's take a look at which configuration files you need to have in your _`darknet`_ directory.

![Yolov3 configuration files cheat sheet.jpg](http://blog.ibanyez.info/download/B20190410T000000072.png)

You can download the cheat sheet [here](http://blog.ibanyez.info/download/B20190410T000000072.png)

This gives a rough idea of how our .data and .cfg files work.


In [None]:
cd ../..

/content/darknet


In [None]:
cd cfg

/content/darknet/cfg


* Once in the cfg folder in the root darknet directory:
  * Click on the three dots and create a new file named cifar.data. Then double-click on it, paste the text below, and save it.

 ``` 
classes=10  
train  = data/cifar/train.list  
valid  = data/cifar/test.list  
labels = data/cifar/labels.txt  
backup = backup/  
top=2
```  

This is a classic .data file for the Cifar-10 dataset. It can be understood as follows:
* classes=10: the dataset has 10 different classes
* train = ...: where to find the list of training files
* valid = ...: where to find the list of validation files
* labels = ...: where to find the list of possible classes
* backup = ...: where to save backup weight files during training
* top = 2: calculate top-n accuracy at test time (in addition to top-1)
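As an alternative to creating the file through the Colab file browser, the same contents can be generated from a dictionary. This is a sketch only: `render_data_file` is a hypothetical helper, and the values are the ones listed above.

```python
def render_data_file(options):
    """Render a dict of options as `key = value` lines for a .data file."""
    return "".join(f"{key} = {value}\n" for key, value in options.items())

cifar_options = {
    "classes": 10,
    "train": "data/cifar/train.list",
    "valid": "data/cifar/test.list",
    "labels": "data/cifar/labels.txt",
    "backup": "backup/",
    "top": 2,
}

data_text = render_data_file(cifar_options)
print(data_text)
```

Writing `data_text` to `cifar.data` inside the `cfg` folder gives the same file as the manual steps.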

* Next we will create a .cfg file in the same way. We refer to a 34-layer ResNet model for this.
[34-layer ResNet model](http://blog.ibanyez.info/download/B20190410T000000072.png)
  * As above, create a resnet34.cfg file in the cfg folder in the darknet root directory, and copy and paste the following contents into it.
  * This is the network we will be training.




```
[net]
# how many images are in each batch to average the loss over?
batch=32 
# into how many sub-batches shall each batch be divided to handle images in each sub-batch in parallel? 
subdivisions=1
height=28			#can adjust according to dataset
width=28
channels=3			#using rgb images
max_crop=32
min_crop=32

#parameters for data augmentation
hue=.1
saturation=.75
exposure=.75

#parameters for learning
learning_rate=0.1	
policy=poly
power=4
max_batches = 5000		#max number of iterations, corresponding to scale
momentum=0.9
decay=0.0005

[convolutional]			
batch_normalize=1     #purple 1
filters=64
size=7
stride=2
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]			
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky


[convolutional]			
batch_normalize=1     #purple 2
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky


[convolutional]			
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky


[convolutional]			
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]			
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky


[convolutional]			
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky


[convolutional]			
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]		
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]			
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]			
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]			
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]			
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]			
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]			
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]		
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky


[convolutional]			
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=linear

[shortcut]
from=-3
activation=leaky

[convolutional]
filters=10
size=1
stride=1
pad=1
activation=leaky

[avgpool]

[softmax]
```
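The `[net]` section above sets `policy=poly` with `power=4` and `max_batches=5000`. As a rough illustration of what that means for the learning rate, here is a sketch that assumes the polynomial schedule has the form `learning_rate * (1 - batch / max_batches) ** power` and ignores any warm-up; the function name is ours.

```python
def poly_learning_rate(batch, base_lr=0.1, max_batches=5000, power=4):
    """Learning rate after `batch` iterations under polynomial decay."""
    return base_lr * (1 - batch / max_batches) ** power

# The rate starts at learning_rate and decays to zero at max_batches
for batch in (0, 1000, 2500, 5000):
    print(batch, poly_learning_rate(batch))
```

With `power=4` the rate falls quickly at first, so most iterations run at a small fraction of the initial `learning_rate`.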





Our model is inspired by the ‘Deep Residual Learning for Image Recognition’ research paper written by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun of Microsoft Research. 

* It follows the 34-layer ResNet design, stacking convolutional layers with one max pooling layer and multiple shortcuts
* The last convolutional layer has 10 filters because we have 10 classes. It outputs a small feature map with 10 channels, one per class.
* We just want 10 predictions total so we use an average pooling layer to take the average across the image for each channel. This will give us our 10 predictions. 
* We use a softmax to convert the predictions into a probability distribution. This layer also calculates our error as cross-entropy loss.
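To check claims like these against a config file, it helps to count the sections and follow the feature-map size through the strided layers. The sketch below makes two assumptions about the size arithmetic that you should verify against your Darknet build: a convolution with `pad=1` pads by `size // 2`, and max pooling pads by `size - 1` by default. The helper names and the tiny demo config are hypothetical.

```python
from collections import Counter

def parse_cfg(text):
    """Split Darknet-style cfg text into a list of (section, options) pairs."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections

def trace_spatial_size(sections):
    """Follow the feature-map width through conv and maxpool layers."""
    size = int(sections[0][1]["width"])  # first section is expected to be [net]
    for name, opts in sections[1:]:
        if name == "convolutional":
            k, s = int(opts["size"]), int(opts["stride"])
            pad = k // 2 if int(opts.get("pad", 0)) else 0  # assumed padding rule
            size = (size + 2 * pad - k) // s + 1
        elif name == "maxpool":
            k, s = int(opts["size"]), int(opts["stride"])
            size = (size + (k - 1) - k) // s + 1  # assumed default padding size-1
    return size

# Tiny hypothetical config, just to exercise the helpers
demo_cfg = """
[net]
width=28
height=28

[convolutional]
filters=64
size=7
stride=2
pad=1

[maxpool]
size=2
stride=2

[convolutional]
filters=10
size=1
stride=1
pad=1

[avgpool]

[softmax]
"""

sections = parse_cfg(demo_cfg)
layer_counts = Counter(name for name, _ in sections)
print(layer_counts["convolutional"], "convolutional layers")
print("final feature map width:", trace_spatial_size(sections))
```

Running `parse_cfg(open("cfg/resnet34.cfg").read())` on the file above lets you count its sections and trace its output size yourself.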

In [None]:
#Just check that we are in the correct location
pwd

'/content/darknet/cfg'

In [None]:
#Ensure that the two new files we added are saved correctly
ls

9k.labels                                   strided.cfg
9k.names                                    t1.test.cfg
9k.tree                                     tiny.cfg
alexnet.cfg                                 tiny-yolo.cfg
cd53paspp-gamma.cfg                         tiny-yolo-voc.cfg
cifar.cfg                                   tiny-yolo_xnor.cfg
cifar.test.cfg                              vgg-16.cfg
coco9k.map                                  vgg-conv.cfg
coco.data                                   voc.data
coco.names                                  writing.cfg
combine9k.data                              yolo.2.0.cfg
crnn.train.cfg                              yolo9000.cfg
csdarknet53-omega.cfg                       yolo.cfg
cspx-p7-mish.cfg                            yolov1/
cspx-p7-mish_hp.cfg                         yolov2.cfg
cspx-p7-mish-omega.cfg                      yolov2-tiny.cfg
csresnext50-panet-spp.cfg                   yolov2-tiny-voc.cfg
csresnext50-panet

In [None]:
### Let us make some utility functions that we can use later


In [None]:
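#utility function
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

def imShow(path):
  # Load the image; cv2.imread returns None when the file cannot be read
  image = cv2.imread(path)
  if image is None:
    raise FileNotFoundError(f"Could not read image: {path}")

  # Upscale 3x so that small charts and images are easier to inspect
  height, width = image.shape[:2]
  resized_image = cv2.resize(image, (3*width, 3*height), interpolation=cv2.INTER_CUBIC)

  # OpenCV loads images as BGR, while matplotlib expects RGB
  fig = plt.gcf()
  fig.set_size_inches(18, 10)
  plt.axis("off")
  plt.imshow(cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB))
  plt.show()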



---

## STEP 6. Train the model
When you execute the following command, your model will start training.

Darknet prints a log line for each iteration, so you can see how your training is going.

> **TRICK:** Darknet saves a backup of your trained weights every 1000 iterations. It also stores the latest weights as `<cfg_filename>_last.weights` and the final weights as `<cfg_filename>_final.weights`. Once the final file is generated, you should download it and keep it, as the model can then be used without training again.
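Because several weight files accumulate in the backup folder, a small helper for picking the newest one can be handy before downloading or copying it. The following is a sketch under our own assumptions: `latest_weights` is a hypothetical helper, and the demo filenames only imitate the naming pattern described above.

```python
import os
import tempfile
from pathlib import Path

def latest_weights(backup_dir):
    """Return the most recently modified .weights file in backup_dir, or None."""
    candidates = list(Path(backup_dir).glob("*.weights"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)

# Demo with two empty placeholder files and explicit modification times
with tempfile.TemporaryDirectory() as tmp:
    older = Path(tmp) / "resnet34_1000.weights"
    newer = Path(tmp) / "resnet34_last.weights"
    older.touch()
    newer.touch()
    os.utime(older, (1_000, 1_000))  # pretend this file is old
    os.utime(newer, (2_000, 2_000))  # and this one is more recent
    picked = latest_weights(tmp).name
    print(picked)
```

In the notebook, `latest_weights("backup")` would give the file to pass to `shutil.copy`, for example into a mounted Drive folder if you have one set up.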


In [None]:
cd ..

/content/darknet


In [None]:
!pwd

/content/darknet


In [None]:
!./darknet classifier train cfg/cifar.data cfg/resnet34.cfg -dont_show
imShow("/content/darknet/chart_resnet34.png")

/bin/bash: ./darknet: No such file or directory


NameError: ignored

---

## STEP 7. Done! Congratulations, you have successfully trained the model!

Now you can use the following command along with the weights file that was automatically saved in the content/darknet/backup folder. If you saved it separately, you can upload it to the current notebook and specify the path relative to content/darknet.

#### Once the cell runs, you will be asked to provide an input image path, and you can then use the trained model to classify your image into one of Cifar's 10 classes.

In [None]:
!./darknet classifier predict cfg/cifar.data cfg/resnet34.cfg backup/resnet34_final.weights

## PERFORMANCE TIPS & TRICKS

* **Speed up load times of the runtime:** Once you have checked that everything works, you can remove cells or comment out unnecessary lines of code to reduce the loading time on every run. 
Also, once your model is trained, you can skip Step 6 completely and provide the correct path to the saved weights in Step 7.

* **How to keep your notebook alive for more time?:** Keep your browser open with your notebook. If you close your browser, your notebook will reach the idle time limit (90 minutes) and will be removed from the Colab cloud service.
  
* **Re-run your training after reaching the time limit for Colab runtimes (12 hours):** 
  * Open a new notebook or reconnect the current one.
  * Comment out the cell above and uncomment the cell below.
  * Copy the file **backup/resnet34_last.weights** to the **weights/** folder on your local computer. 
  * Execute Run all in the **> menu > Runtime > Run All**
  * _The copy step is not absolutely necessary, but it saves the trouble of training the model again and again._
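Resuming usually comes down to passing the saved weights file as an extra argument after the config file in the training command. The sketch below only builds the argument list; `build_train_command` is a hypothetical helper, and the weights path assumes the `<cfg_filename>_last.weights` naming described in Step 6.

```python
def build_train_command(data_file, cfg_file, weights_file=None):
    """Build the darknet classifier training command as an argument list."""
    cmd = ["./darknet", "classifier", "train", data_file, cfg_file]
    if weights_file:
        cmd.append(weights_file)  # continue from these weights instead of starting fresh
    cmd.append("-dont_show")
    return cmd

fresh = build_train_command("cfg/cifar.data", "cfg/resnet34.cfg")
resumed = build_train_command("cfg/cifar.data", "cfg/resnet34.cfg",
                              "backup/resnet34_last.weights")
print(" ".join(resumed))
```

The printed line can be pasted after a `!` in a notebook cell, or the list can be passed to `subprocess.run` from the darknet root directory.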