## Problems related to open-source Darknet repository
1. Which factor is **dominant** in the uncertainty of CNN object detection applications: the **inherent inaccuracy** of the neural network, or the degradation caused by **faults** in the computing hardware?


2. Extract feature maps in a reasonable way to model **fault propagation**.


3. Purely technical question: fit Darknet into NVBitFI.

## A. Train Darknet for KITTI
To study problem 1 (RQ1), the accuracy of the CNN application itself must also be considered. In addition to the traditional "golden ref" vs. "fault-injected DUT" comparison, the mean Average Precision (mAP) of both the "golden ref" and the "faulty" runs should be measured.

Additionally, because the primary concern of this study is autonomous driving scenarios, Darknet is trained on the KITTI 2D object detection dataset.

This section introduces how to train Darknet for KITTI step by step.

**Note** : Unless otherwise specified, the parent directory of this notebook is ```darknet/```

### 0. Get the KITTI 2D object detection dataset
Register and download the evaluation datasets at: http://www.cvlibs.net/datasets/kitti/eval_object.php

### a. Convert KITTI labels to YOLO style

KITTI label format looks like: 

```Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01```

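The YOLO label format is the class index followed by the box center x, center y, width, and height, all normalized by the image size: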

```4 0.622 0.609 0.08 0.446```

Luckily, several handy open-source tools can do this, for example: https://github.com/ssaru/convert2Yolo

Label conversion is straightforward. The only user-defined file is the ```.names``` file, in which users list the class names of interest.

Users then run the Python script with the necessary arguments, and the convert2Yolo tool handles the rest.
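
To make the conversion concrete, here is a minimal Python sketch of what such a tool does for a single label line. It is not the convert2Yolo implementation. The function name `kitti_to_yolo` is hypothetical, and the image size of 1224 x 370 pixels is an assumption (it is not stated above, but it reproduces the YOLO example line).

```python
# Class order must match the .names file; this is the order used later in this notebook.
CLASS_NAMES = ["Car", "Van", "Truck", "Tram", "Pedestrian", "Cyclist"]


def kitti_to_yolo(kitti_line, img_w, img_h, class_names=CLASS_NAMES):
    """Convert one KITTI label line to (class_id, x_center, y_center, width, height).

    Returns None for classes that are not listed in class_names (e.g. DontCare).
    """
    fields = kitti_line.split()
    name = fields[0]
    if name not in class_names:
        return None
    # KITTI fields 4..7 hold the 2D box in pixels: left, top, right, bottom
    left, top, right, bottom = map(float, fields[4:8])
    x_center = (left + right) / 2.0 / img_w
    y_center = (top + bottom) / 2.0 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h
    return class_names.index(name), x_center, y_center, width, height


line = "Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01"
# Assumed image size (width 1224, height 370); KITTI images vary slightly in size
cls, x, y, w, h = kitti_to_yolo(line, img_w=1224, img_h=370)
print(cls, round(x, 3), round(y, 3), round(w, 3), round(h, 3))  # 4 0.622 0.609 0.08 0.446
```

In practice the image size should be read from each image file, since normalizing with a fixed size would shift the boxes for images of a different resolution.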

### b. Network configuration, training/validation datasets preparation

Following the instructions in AlexeyAB's Darknet fork (https://github.com/AlexeyAB/darknet) and a valuable discussion of how to train YOLO on KITTI more accurately (http://yizhouwang.net/blog/2018/07/29/train-yolov2-kitti/), I successfully trained YOLOv3 and obtained a weights file specific to the KITTI dataset.

1. Prepare your own configuration (```.cfg```) file, which defines the network size and several tunable hyper-parameters. Depending on how many classes were selected in **(a)**, the number of classes in each yolo layer and the number of filters in the convolutional layer before each yolo layer must be adjusted accordingly.


2. Create the ```.data``` and ```.names``` files in the ```data/``` directory

***Example of ```.data``` file***

```
classes = 6
train  = data/train.txt
valid  = data/test.txt
names = data/kitti.names
backup = backup/
```

***Example of ```.names``` file***

```
Car
Van
Truck
Tram
Pedestrian
Cyclist
```

3. Put the image files and label files obtained in **(a)** into the directories ```data/training``` and ```data/testing```

4. Use the helper script ```traingen.py``` to create ```train.txt``` and ```test.txt```, and put these files in the ```data/``` directory

5. Prepare a pre-trained weights file. Training starts from pre-trained weights, which presumably serve as a good starting point for transfer learning


6. Start training with the command below

In [None]:
!./darknet detector train data/kitti.data cfg/yolov3-kitti.cfg yolov3-kitti_last.weights

As with most CNN training, training YOLOv3 is time-consuming even on recent GPUs. I stopped training after approximately 6 hours, at a loss of 0.8.
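
For the configuration change in step 1 above, the AlexeyAB README gives the rule `filters = (classes + 5) * 3` for the convolutional layer in front of each YOLOv3 yolo layer. The small sketch below only evaluates that rule; the helper name `yolo_filters` is hypothetical, and the default of 3 anchors per yolo layer is an assumption that holds for the stock `yolov3.cfg`.

```python
def yolo_filters(num_classes, anchors_per_layer=3):
    """Number of filters in the conv layer that feeds a yolo layer.

    Each predicted box carries 4 coordinates, 1 objectness score and one score
    per class, and every yolo layer predicts `anchors_per_layer` boxes per cell.
    """
    return (num_classes + 5) * anchors_per_layer


print(yolo_filters(6))   # 33 for the six KITTI classes used here
print(yolo_filters(80))  # 255, the value found in the stock COCO yolov3.cfg
```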

### c. Evaluate the accuracy of the trained model

#### 1. Test the YOLOv3 weights we trained for KITTI

The KITTI dataset is split into a training set and a validation set. The training set contains the first **5001 images**, and the validation set contains the remaining **2480 images**.
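
The contents of ```traingen.py``` are not reproduced in this notebook, so the following is only a hypothetical stand-in that shows how such a split could be written to the two list files Darknet expects. It assumes all labelled images sit in one directory and are ordered by file name; the function name `write_split` and the directory name in the usage comment are placeholders.

```python
from pathlib import Path


def write_split(image_dir, train_list, valid_list, n_train=5001, pattern="*.png"):
    """Write Darknet image lists, one image path per line.

    The first n_train images (sorted by name) go to train_list,
    the remaining ones go to valid_list.
    """
    images = sorted(str(p) for p in Path(image_dir).glob(pattern))
    train, valid = images[:n_train], images[n_train:]
    Path(train_list).write_text("".join(p + "\n" for p in train))
    Path(valid_list).write_text("".join(p + "\n" for p in valid))
    return len(train), len(valid)


# Hypothetical usage; "data/images" is a placeholder directory name:
# n_train, n_valid = write_split("data/images", "data/train.txt", "data/test.txt")
```

A fixed, name-ordered split keeps the validation set identical across runs, which matters when golden and fault-injected results are compared later.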

In [6]:
!./darknet detector map data/kitti.data cfg/yolov3-kitti.cfg yolov3-kitti_last.weights -iou_thresh 0.25

 CUDA-version: 11040 (11040), cuDNN: 8.3.1, GPU count: 1  
 OpenCV version: 4.2.0
 0 : compute_capability = 860, cudnn_half = 0, GPU: NVIDIA GeForce RTX 3070 Ti 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    256 x1216 x   3 ->  256 x1216 x  32 0.538 BF
   1 conv     64       3 x 3/ 2    256 x1216 x  32 ->  128 x 608 x  64 2.869 BF
   2 conv     32       1 x 1/ 1    128 x 608 x  64 ->  128 x 608 x  32 0.319 BF
   3 conv     64       3 x 3/ 1    128 x 608 x  32 ->  128 x 608 x  64 2.869 BF
   4 Shortcut Layer: 1,  wt = 0, wn = 0, outputs: 128 x 608 x  64 0.005 BF
   5 conv    128       3 x 3/ 2    128 x 608 x  64 ->   64 x 304 x 128 2.869 BF
   6 conv     64       1 x 1/ 1     64 x 304 x 128 ->   64 x 304 x  64 0.319 BF
   7 conv    128       3 x 3/ 1     64 x 304 x  64 ->   64 x 304 x 128 2.869 BF
   8 Shortcut Layer: 5,  wt = 0, wn = 0, outputs:  64 x 304 x 128 0.002 BF
   9 conv     64       1 x 1/ 1     64 x 304 x 128 ->   64 x

Detection mAP at IoU=0.25

```
class_id = 0, name = Car, ap = 91.49%   	 (TP = 7592, FP = 583)
class_id = 1, name = Van, ap = 85.19%   	 (TP = 679, FP = 143) 
class_id = 2, name = Truck, ap = 94.30%   	 (TP = 287, FP = 13) 
class_id = 3, name = Tram, ap = 87.09%   	 (TP = 116, FP = 13) 
class_id = 4, name = Pedestrian, ap = 49.28%   	 (TP = 474, FP = 170) 
class_id = 5, name = Cyclist, ap = 63.27%   	 (TP = 255, FP = 85)
```

As shown above, the retrained model recognizes vehicles well (AP between 85% and 94%) but performs noticeably worse on pedestrians (49.28%) and cyclists (63.27%).
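
Since the same per-class numbers will be needed for both the golden and the fault-injected runs, it helps to parse them from the `detector map` output. The sketch below assumes the line format shown in the table above; the names `AP_LINE` and `parse_map_output` are hypothetical. The precision TP / (TP + FP) is derived here and is not printed by the command shown.

```python
import re

# Matches lines such as:
# class_id = 0, name = Car, ap = 91.49%   	 (TP = 7592, FP = 583)
AP_LINE = re.compile(
    r"class_id = (\d+), name = (\w+), ap = ([\d.]+)%\s+\(TP = (\d+), FP = (\d+)\)"
)


def parse_map_output(text):
    """Return {class name: (AP in percent, TP, FP)} parsed from the output text."""
    return {
        m.group(2): (float(m.group(3)), int(m.group(4)), int(m.group(5)))
        for m in AP_LINE.finditer(text)
    }


sample = """\
class_id = 0, name = Car, ap = 91.49%   \t (TP = 7592, FP = 583)
class_id = 1, name = Van, ap = 85.19%   \t (TP = 679, FP = 143)
class_id = 2, name = Truck, ap = 94.30%   \t (TP = 287, FP = 13)
class_id = 3, name = Tram, ap = 87.09%   \t (TP = 116, FP = 13)
class_id = 4, name = Pedestrian, ap = 49.28%   \t (TP = 474, FP = 170)
class_id = 5, name = Cyclist, ap = 63.27%   \t (TP = 255, FP = 85)
"""

results = parse_map_output(sample)
for name, (ap, tp, fp) in results.items():
    print(f"{name}: AP = {ap:.2f}%, precision = {tp / (tp + fp):.3f}")

mean_ap = sum(ap for ap, _, _ in results.values()) / len(results)
print(f"mean AP = {mean_ap:.2f}%")  # 78.44% for the six values above
```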

#### 2. Test other pre-trained YOLOv3 weights with the KITTI dataset

One way to test other pre-trained models is to rename the classes of interest so that they are consistent with the KITTI labels; all other labels are treated as "don't care". For example, we can compute the mAP with the default weights provided in the Darknet repo:

In [4]:
!./darknet detector map data/kitti_coco.data cfg/yolov3.cfg weights/yolov3.weights -iou_thresh 0.25

 CUDA-version: 11040 (11040), cuDNN: 8.3.1, GPU count: 1  
 OpenCV version: 4.2.0
 0 : compute_capability = 860, cudnn_half = 0, GPU: NVIDIA GeForce RTX 3070 Ti 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   3 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF
   4 Shortcut Layer: 1,  wt = 0, wn = 0, outputs: 208 x 208 x  64 0.003 BF
   5 conv    128       3 x 3/ 2    208 x 208 x  64 ->  104 x 104 x 128 1.595 BF
   6 conv     64       1 x 1/ 1    104 x 104 x 128 ->  104 x 104 x  64 0.177 BF
   7 conv    128       3 x 3/ 1    104 x 104 x  64 ->  104 x 104 x 128 1.595 BF
   8 Shortcut Layer: 5,  wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
   9 conv     64       1 x 1/ 1    104 x 104 x 128 ->  104 x

```
class_id = 0, name = Pedestrian, ap = 0.51%   	 (TP = 24, FP = 1230) 
class_id = 1, name = Cyclist, ap = 0.00%   	 (TP = 0, FP = 451) 
class_id = 2, name = Car, ap = 0.22%   	 (TP = 13, FP = 11640) 
class_id = 3, name = Cyclist, ap = 0.00%   	 (TP = 0, FP = 48)
class_id = 5, name = Van, ap = 0.00%   	 (TP = 0, FP = 312) 
class_id = 6, name = Tram, ap = 0.00%   	 (TP = 0, FP = 191)
class_id = 7, name = Truck, ap = 0.00%   	 (TP = 0, FP = 1497)
```

The pre-trained weights perform very poorly under this class mapping (AP below 1% for every class). Let's try YOLOv4:

In [None]:
!./darknet detector map data/kitti_coco.data cfg/yolov4.cfg weights/yolov4.weights -iou_thresh 0.01

The result is still very poor; further adjustments may be required.

### d. Summary of retraining and accuracy evaluation
Compared with previous studies by Prof. Paolo Rech's team, this work emphasizes the **inherent accuracy of the YOLOv3 detector**. YOLOv3 was **retrained** on the **KITTI 2D object detection dataset** and achieved acceptable accuracy.


Further experiments will compare the mAP of the YOLOv3 application under different **fault-injected** setups with that of the **golden ref**. The anticipated result is the response of the mAP to hardware faults.

## B. Extract Feature Map

### a. Extract feature map when using GPU
#### 1. The fast way: extract each layer's average
Uncomment the following lines in the function ```forward_network_gpu``` in ```src/network_kernels.cu``` to print the average output value of each layer in the forward pass of the YOLO network. The performance overhead of computing the layer averages is acceptable, so I consider this the preferred way to record feature-map information.

```c
        cuda_pull_array(l.output_gpu, l.output, l.outputs);
        cudaStreamSynchronize(get_cuda_stream());
        float avg_val = 0;
        int k;
        for (k = 0; k < l.outputs; ++k) avg_val += l.output[k];
        printf(" i: %d - avg_val = %f \n", i, avg_val / l.outputs);
```

In [18]:
!./darknet detector test data/kitti.data cfg/yolov3-kitti.cfg yolov3-kitti_last.weights -ext_output -dont_show data/street.png

 CUDA-version: 11040 (11040), cuDNN: 8.3.1, GPU count: 1  
 OpenCV version: 4.2.0
 0 : compute_capability = 860, cudnn_half = 0, GPU: NVIDIA GeForce RTX 3070 Ti 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    256 x1216 x   3 ->  256 x1216 x  32 0.538 BF
   1 conv     64       3 x 3/ 2    256 x1216 x  32 ->  128 x 608 x  64 2.869 BF
   2 conv     32       1 x 1/ 1    128 x 608 x  64 ->  128 x 608 x  32 0.319 BF
   3 conv     64       3 x 3/ 1    128 x 608 x  32 ->  128 x 608 x  64 2.869 BF
   4 Shortcut Layer: 1,  wt = 0, wn = 0, outputs: 128 x 608 x  64 0.005 BF
   5 conv    128       3 x 3/ 2    128 x 608 x  64 ->   64 x 304 x 128 2.869 BF
   6 conv     64       1 x 1/ 1     64 x 304 x 128 ->   64 x 304 x  64 0.319 BF
   7 conv    128       3 x 3/ 1     64 x 304 x  64 ->   64 x 304 x 128 2.869 BF
   8 Shortcut Layer: 5,  wt = 0, wn = 0, outputs:  64 x 304 x 128 0.002 BF
   9 conv     64       1 x 1/ 1     64 x 304 x 128 ->   64 x
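
One way to use the per-layer averages for fault-propagation analysis is to compare the log of a golden run with the log of a fault-injected run and report the first layer whose average differs. The sketch below assumes the `printf` format from the C snippet above (` i: %d - avg_val = %f`); the names `layer_averages` and `first_divergent_layer` are hypothetical, and the two sample logs contain made-up values for illustration only.

```python
import math
import re

# Matches the line printed by: printf(" i: %d - avg_val = %f \n", ...)
AVG_LINE = re.compile(r"i:\s*(\d+)\s*-\s*avg_val\s*=\s*(\S+)")


def layer_averages(log_text):
    """Return {layer index: average output value} parsed from a Darknet log."""
    return {int(i): float(v) for i, v in AVG_LINE.findall(log_text)}


def first_divergent_layer(golden, faulty, rel_tol=1e-6):
    """Return the lowest layer index whose average differs, or None if all match.

    A missing layer or a NaN value in the faulty run counts as a divergence.
    """
    for i in sorted(golden):
        f = faulty.get(i)
        if f is None or not math.isclose(golden[i], f, rel_tol=rel_tol, abs_tol=1e-9):
            return i
    return None


# Made-up sample logs, not measured values
golden_log = " i: 0 - avg_val = 0.102000 \n i: 1 - avg_val = 0.250000 \n i: 2 - avg_val = 0.031000 \n"
faulty_log = " i: 0 - avg_val = 0.102000 \n i: 1 - avg_val = 7.500000 \n i: 2 - avg_val = 0.912000 \n"

print(first_divergent_layer(layer_averages(golden_log), layer_averages(faulty_log)))  # 1
```

A single average per layer can hide faults that cancel out or that affect only a few values, so a matching average does not prove that the feature map is unchanged.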

#### 2. Extract the feature map itself?

Uncomment the following lines in the function ```forward_network_gpu``` in ```src/network_kernels.cu``` to save the feature maps of each layer. 

**Caution:** Saving feature maps incurs significant performance and storage overheads. In addition, to process a series of test images, the source code would have to be reworked so that the feature maps of each image are saved to separate directories.


```c
        // Copy this layer's output from the GPU to host memory
        cuda_pull_array(l.output_gpu, l.output, l.batch*l.outputs);
        if (l.out_w >= 0 && l.out_h >= 1 && l.c >= 3) {
            int j;
            // Save every output channel (slice) as a separate grayscale image
            for (j = 0; j < l.out_c; ++j) {
                image img = make_image(l.out_w, l.out_h, 3);
                // Replicate slice j into the three color planes of the image
                memcpy(img.data, l.output + l.out_w*l.out_h*j, l.out_w*l.out_h * 1 * sizeof(float));
                memcpy(img.data + l.out_w*l.out_h * 1, l.output + l.out_w*l.out_h*j, l.out_w*l.out_h * 1 * sizeof(float));
                memcpy(img.data + l.out_w*l.out_h * 2, l.output + l.out_w*l.out_h*j, l.out_w*l.out_h * 1 * sizeof(float));
                char buff[256];
                sprintf(buff, "layer-%d slice-%d", i, j);
                // show_image(img, buff);
                save_image(img, buff);
                free_image(img); // added here, not among the original commented-out lines: releases the buffer from make_image
            }
            // cvWaitKey(0); // wait press-key in console
            // cvDestroyAllWindows();
        }
```
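
To put the storage overhead mentioned in the caution above into numbers, the sketch below estimates the raw float32 size of the first few layer outputs, using the output shapes from the layer table printed earlier for ```yolov3-kitti.cfg```. The helper name `feature_map_bytes` is hypothetical, and the figures are for uncompressed values; the image files written by `save_image` will have a different size.

```python
def feature_map_bytes(dim_a, dim_b, channels, bytes_per_value=4):
    """Raw size in bytes of one layer output stored as float32 values."""
    return dim_a * dim_b * channels * bytes_per_value


# Output shapes of layers 0-3, taken from the layer table printed above
shapes = [(256, 1216, 32), (128, 608, 64), (128, 608, 32), (128, 608, 64)]

total = 0
for layer, shape in enumerate(shapes):
    size = feature_map_bytes(*shape)
    total += size
    print(f"layer {layer}: {size / 2**20:.1f} MiB")
print(f"layers 0-3 together: {total / 2**20:.1f} MiB per image")  # 85.5 MiB
```

Four layers out of more than a hundred already account for tens of megabytes per image, which is why the per-layer average is the cheaper default.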

## C. Run Darknet detection for a series of images

Now the Darknet repo is set up to print/save the information required for a fault injection campaign on a **single** image. To process a **list of images** and save the **classifications with the bounding-box coordinates**, use the following command:

In [6]:
!./darknet detector test data/kitti.data cfg/yolov3-kitti.cfg yolov3-kitti_last.weights -dont_show -ext_output < data/test.txt > kitti_results.txt

In [7]:
!cat kitti_results.txt

net.optimized_memory = 0 
mini_batch = 1, batch = 64, time_steps = 1, train = 0 
Create CUDA-stream - 0 
 Create cudnn-handle 0 

 seen 64, trained: 448 K-images (7 Kilo-batches_64) 
Enter Image Path:  Detection layer: 82 - type = 28 
 Detection layer: 94 - type = 28 
 Detection layer: 106 - type = 28 
data/kitti_test/005142.png: Predicted in 15.233000 milli-seconds.
Pedestrian: 75%	(left_x:  468   top_y:  164   width:   62   height:  135)
Enter Image Path:  Detection layer: 82 - type = 28 
 Detection layer: 94 - type = 28 
 Detection layer: 106 - type = 28 
data/kitti_test/006877.png: Predicted in 15.061000 milli-seconds.
Car: 81%	(left_x:    2   top_y:  203   width:   65   height:   35)
Car: 58%	(left_x:  127   top_y:  197   width:   66   height:   29)
Car: 80%	(left_x:  333   top_y:  188   width:   41   height:   21)
Car: 55%	(left_x:  381   top_y:  183   width:   32   height:   22)
Car: 95%	(left_x:  425   top_y:  181   width:   38   height:   21)
Car: 99%	(left_x:  503   top_y:  1
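
To compare a golden run with a fault-injected run, the text output above has to be turned into structured data. The sketch below assumes the `-ext_output` line format shown in ```kitti_results.txt```; the names `parse_detections` and `iou` are hypothetical, and this is one possible way to do the comparison rather than the method used in this study.

```python
import re

IMAGE_LINE = re.compile(r"^(\S+): Predicted in")
BOX_LINE = re.compile(
    r"^(\w+): (\d+)%\s+\(left_x:\s*(-?\d+)\s+top_y:\s*(-?\d+)"
    r"\s+width:\s*(-?\d+)\s+height:\s*(-?\d+)\)"
)


def parse_detections(text):
    """Map each image path to a list of (class, confidence %, left, top, width, height)."""
    detections, current = {}, None
    for line in text.splitlines():
        m = IMAGE_LINE.match(line)
        if m:
            current = m.group(1)
            detections[current] = []
            continue
        m = BOX_LINE.match(line)
        if m and current is not None:
            name, *nums = m.groups()
            detections[current].append((name, *map(int, nums)))
    return detections


def iou(a, b):
    """Intersection over union of two boxes given as (left, top, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


sample = (
    "Enter Image Path:  Detection layer: 82 - type = 28 \n"
    "data/kitti_test/005142.png: Predicted in 15.233000 milli-seconds.\n"
    "Pedestrian: 75%\t(left_x:  468   top_y:  164   width:   62   height:  135)\n"
)
print(parse_detections(sample))
```

With both runs parsed, a detection in the faulty run can be matched to the golden detection of the same class with the highest IoU, and unmatched or shifted boxes can be counted as fault effects.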

## D. Modifications to fit Darknet into the NVBitFI framework

#### The notes on patching Darknet were lost; I will go through the steps again and finish this section