# **DIVE INTO CODE COURSE**
## **Faster R-CNN and YOLO v3**
**Student Name**: Doan Anh Tien<br>
**Student ID**: 1852789<br>
**Email**: tien.doan.g0pr0@hcmut.edu.vn

Run the implementation of Faster R-CNN [1].

[1] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems. (2015) 91–99

https://arxiv.org/pdf/1506.01497.pdf

Please use the following. It is an implementation using Keras.

duckrabbits / ObjectDetection at master

### **[Problem 1] Learning and Estimation**
Please refer to the README to run the above implementation.

In [2]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1


In [None]:
# I ran this script from the terminal, because running it inside the notebook does not print the messages.
!python train.py -p annotation.txt

<img src="run_faster_rcnn.png" style="width: 60%;">

In [None]:
!python predict.py -i ./kaggle_simpson_testset -c ./save/train_20220126-205657_config.pickle

Since my computer did not have CUDA and the required libraries installed, the prediction step ran very slowly without GPU acceleration. However, I was still able to run it, as shown in the figure below.

<img src="run_faster_rcnn2.png" style="width: 60%;">

In [17]:
# The dataframe presents the scores for each epoch
score = pd.read_csv('out.csv', header=0)

| Epoch | Accuracy | RPN classifier | RPN regression | Detector classifier | Detector regression | Total |
|---|---|---|---|---|---|---|
| 0 | 0.425 | 3.537379 | 0.195096 | 7.582271 | 4.09385 | 15.408596 |
| 1 | 0.525 | 3.860771 | 0.29368 | 6.85821 | 1.82279 | 12.835451 |
| 2 | 0.5 | 1.416993 | 0.116357 | 2.772097 | 0.79033 | 5.095777 |
| 3 | 0.5 | 0.838187 | 0.182856 | 2.327368 | 0.644791 | 3.993202 |
| 4 | 0.5 | 3.396416 | 0.152709 | 2.410058 | 0.738735 | 6.697919 |
| 5 | 0.5 | 1.377696 | 0.140327 | 2.194178 | 0.595407 | 4.307608 |
| 6 | 0.5 | 2.586178 | 0.096328 | 2.120901 | 0.505435 | 5.308843 |
| 7 | 0.5 | 1.919045 | 0.117681 | 2.260537 | 0.536625 | 4.833888 |
| 8 | 0.5 | 0.594891 | 0.133975 | 2.058122 | 0.378183 | 3.165171 |
| 9 | 0.525 | 2.241241 | 0.08424 | 2.041632 | 0.52815 | 4.895264 |
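
As a quick sanity check on the table above, the Total column appears to be the sum of the four loss columns. The sketch below checks this for epochs 0 and 9 using values copied from the table; the variable names are illustrative and not part of the repository.

```python
import math

# (RPN classifier, RPN regression, Detector classifier, Detector regression, Total)
# Values copied from epochs 0 and 9 of the table above.
rows = {
    0: (3.537379, 0.195096, 7.582271, 4.09385, 15.408596),
    9: (2.241241, 0.08424, 2.041632, 0.52815, 4.895264),
}

for epoch, (*losses, total) in rows.items():
    # Allow a small tolerance because the CSV values are rounded to 6 decimals.
    matches = math.isclose(sum(losses), total, abs_tol=1e-5)
    print(f"epoch {epoch}: sum of losses = {sum(losses):.6f}, total = {total}, match = {matches}")
```

Over these ten epochs the total loss falls from about 15.4 to the 3 to 5 range, while the accuracy stays close to 0.5.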


### **[Problem 2] Code reading**

**Where is the code that realizes the classifier?**

The classifier is built in the <span style='color:cyan'>get_model</span> function of faster_rcnn.py (Line 21) and in the <span style='color:cyan'>get_models</span> function of predict.py (Line 58).

faster_rcnn.py (Line 21)
```python
classifier = nn.classifier(shared_layers, roi_input, C.num_rois, nb_classes=len(classes_count), trainable=True)
```

predict.py (Line 58)
```python
classifier = nn.classifier(feature_map_input, roi_input, C.num_rois, nb_classes=len(C.class_mapping), trainable=True)
```

The code that builds the model from the classifier and loads the weights (starting from Line 60)
```python
model_classifier = Model([feature_map_input, roi_input], classifier)
model_classifier.load_weights(C.model_path, by_name=True)
model_classifier.compile(optimizer='sgd', loss='mse')
```


**Where is the anchor box drawing implemented?**

The drawing is implemented in the <span style='color:cyan'>detect_predict</span> function of predict.py (Line 72). The calls shown here draw the background box for the label and write the label text onto the image.
```python
cv2.rectangle(img, (textOrg[0] - 5, textOrg[1]+baseLine - 5), (textOrg[0]+retval[0] + 5, textOrg[1]-retval[1] - 5), (0, 0, 0), 2)
cv2.rectangle(img, (textOrg[0] - 5,textOrg[1]+baseLine - 5), (textOrg[0]+retval[0] + 5, textOrg[1]-retval[1] - 5), (255, 255, 255), -1)
cv2.putText(img, textLabel, textOrg, cv2.FONT_HERSHEY_DUPLEX, 1, (0, 0, 0), 1)
```

**Where is the code that realizes RPN?**

The RPN is defined in resnet.py and called in faster_rcnn.py.

resnet.py's definition (Line 196)
```python
def rpn(base_layers,num_anchors):

    x = Convolution2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)

    x_class = Convolution2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
    x_regr = Convolution2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)

    return [x_class, x_regr, base_layers]
```

Function called in faster_rcnn.py (Line 18):
```python
rpn = nn.rpn(shared_layers, num_anchors)
```
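
To make the two output heads of `rpn` concrete, the sketch below counts the anchors and the resulting output channels. It assumes the 3 scales and 3 aspect ratios described in the Faster R-CNN paper [1]; the values in this repository's configuration may differ, and `anchor_scales` and `anchor_ratios` are illustrative names.

```python
# Assumed values following the Faster R-CNN paper: 3 scales x 3 aspect ratios.
anchor_scales = [128, 256, 512]           # base anchor side lengths in pixels
anchor_ratios = [(1, 1), (1, 2), (2, 1)]  # (width, height) multipliers

# One anchor per (scale, ratio) pair at every feature-map location.
# Simplified: this does not keep the anchor area constant across ratios.
anchors = [(s * rw, s * rh) for s in anchor_scales for rw, rh in anchor_ratios]
num_anchors = len(anchors)

# rpn_out_class has one objectness score per anchor,
# rpn_out_regress has four box offsets per anchor.
class_channels = num_anchors
regr_channels = num_anchors * 4

print(num_anchors, class_channels, regr_channels)  # 9 9 36
```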

**Where is the code that implements RoI pooling?**

RoI pooling is implemented as a class in RoiPoolingConv.py and is used when the classifier of the Faster R-CNN model is defined.

RoiPoolingConv class (Line 7)
```python
class RoiPoolingConv(Layer):
```

RoI pooling being called (Line 209):
```python
out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])
out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)
```
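
RoI pooling turns a region of arbitrary size into a fixed-size grid, so the layers that follow always receive the same input shape. The following is a minimal pure-Python sketch of RoI max pooling on a single-channel feature map. It only illustrates the idea and is not the code of the `RoiPoolingConv` class, whose internals are not shown here. The function name `roi_max_pool` and the `(x, y, w, h)` RoI layout are assumptions made for this example.

```python
def roi_max_pool(feature_map, roi, pool_size):
    """Max-pool the region roi=(x, y, w, h) of a 2D feature map
    into a pool_size x pool_size grid."""
    x, y, w, h = roi
    pooled = []
    for i in range(pool_size):
        # Row range of bin i (integer bin edges, at least one row per bin).
        r0 = y + (i * h) // pool_size
        r1 = max(y + ((i + 1) * h) // pool_size, r0 + 1)
        row = []
        for j in range(pool_size):
            # Column range of bin j.
            c0 = x + (j * w) // pool_size
            c1 = max(x + ((j + 1) * w) // pool_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# 6x6 toy feature map whose value at (row, col) is 6 * row + col.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = roi_max_pool(feature_map, roi=(1, 1, 4, 4), pool_size=2)
print(pooled)  # [[14, 16], [26, 28]]
```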

### **[Problem 3] Estimation by learned weights**

In [None]:
!python convert.py yolov3.cfg yolov3.weights model_data/yolo.h5

The output results

<img src="yolo.png" style="width: 60%;">

The Keras model saved after the conversion

<img src="saved_model_yolo.png" style="width: 30%;">

In [None]:
!python yolo_video.py --input Battefield4.mp4 --output Battefield4_yolo.mp4

The link for the output is attached here. Since the file is very large, I uploaded it to YouTube for storage.

In [None]:
!python yolo_video.py --model model_data/yolo.h5 --classes model_data/coco_classes.txt --image

Here I chose a film photograph that I took; it contains a car and a potted plant. Unfortunately, the model did not recognize the car, perhaps because it was partly covered by leaves. It did, however, recognize the pottedplant.

The original picture

<img src="car.jpg" style="width: 30%;">

The detected picture

<img src="yolo_car.png" style="width: 30%;">


### **[Problem 4] Create a file for learning**

In [11]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

simpson = pd.read_csv('annotation.txt', header=None)
n_rows, n_cols = simpson.shape  # number of annotated boxes, number of columns
print(n_rows, n_cols)
simpson.head()


7889 6


| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 57 | 72 | 52 | 72 | abraham_grampa_simpson |
| 1 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 80 | 31 | 337 | 354 | abraham_grampa_simpson |
| 2 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 128 | 48 | 285 | 407 | abraham_grampa_simpson |
| 3 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 72 | 126 | 158 | 275 | abraham_grampa_simpson |
| 4 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 123 | 61 | 294 | 416 | abraham_grampa_simpson |


In [12]:
le = LabelEncoder()

classes = simpson[5].unique()
simpson.iloc[:,5] = le.fit_transform(simpson.iloc[:,5])
simpson.head()

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 57 | 72 | 52 | 72 | 0 |
| 1 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 80 | 31 | 337 | 354 | 0 |
| 2 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 128 | 48 | 285 | 407 | 0 |
| 3 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 72 | 126 | 158 | 275 | 0 |
| 4 | simpsons_dataset/abraham_grampa_simpson/pic_00... | 123 | 61 | 294 | 416 | 0 |
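
`LabelEncoder` assigns ids by sorting the unique labels, whereas `simpson[5].unique()` returns them in order of first appearance. The two orders coincide only if the annotation file is already sorted by class name. The plain-Python sketch below mimics both behaviours on a made-up label sequence (the order of the labels is invented for illustration) to show how they can differ.

```python
# Made-up label sequence for illustration only.
labels = ["homer_simpson", "bart_simpson", "homer_simpson", "abraham_grampa_simpson"]

sorted_classes = sorted(set(labels))      # order used by LabelEncoder.classes_
first_seen = list(dict.fromkeys(labels))  # order returned by pandas .unique()

# Plain-Python equivalent of LabelEncoder's id assignment.
class_to_id = {name: idx for idx, name in enumerate(sorted_classes)}
encoded = [class_to_id[name] for name in labels]

print(sorted_classes)  # ['abraham_grampa_simpson', 'bart_simpson', 'homer_simpson']
print(first_seen)      # ['homer_simpson', 'bart_simpson', 'abraham_grampa_simpson']
print(encoded)         # [2, 1, 2, 0]
```

A class-name file for training should therefore list the names in the sorted order, so that line `k` of the file corresponds to class id `k`.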


In [13]:
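import os

anno_input = 'annotation.txt'  # renamed from `input` to avoid shadowing the built-in
anno_output = 'output/simpson_train.txt'
classes_name = 'output/classes_name.txt'
os.makedirs('output', exist_ok=True)

# keras-yolo3 expects one line per image: image_path x_min,y_min,x_max,y_max,class_id
# Mode 'w' (not 'a') so that re-running the cell does not duplicate lines.
with open(anno_input) as f, open(anno_output, mode='w') as out_f:
    for i, line in enumerate(f):
        split_line = line.split(',')
        split_line[0] = '../../faster-rcnn/' + split_line[0]
        # Replace the class name with its integer id from the encoded dataframe
        split_line[-1] = str(simpson.iloc[i, 5]) + '\n'
        out_f.write(','.join(split_line).replace('.jpg,', '.jpg '))

# LabelEncoder sorts the class names, so le.classes_ lists them in class-id order
with open(classes_name, mode='w') as class_f:
    for name in le.classes_:
        class_f.write(name + '\n')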
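
The conversion above produces one line per bounding box in the form `image_path x1,y1,x2,y2,class_id`. As a small, hedged sketch of the same transformation on a single line, the helper below splits off the path and the class name explicitly instead of relying on the `.jpg,` string replacement, so it would also work for other file extensions. The function name `to_yolo_line` and the example path are hypothetical.

```python
def to_yolo_line(line, class_id, prefix=""):
    """Convert 'path,x1,y1,x2,y2,name' into 'path x1,y1,x2,y2,class_id'."""
    path, *box, _name = line.strip().split(",")
    return f"{prefix}{path} {','.join(box)},{class_id}"


# Hypothetical annotation line; the path is made up for illustration.
example = "images/example.jpg,80,31,337,354,abraham_grampa_simpson\n"
converted = to_yolo_line(example, class_id=0)
print(converted)  # images/example.jpg 80,31,337,354,0
```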

### **[Problem 5] Confirmation that learning can be done**

In [None]:
!python train.py

After modifying the train.py of YOLOv3, the script runs and training proceeds. However, even with CUDA installed and the GPU in use, training still takes a long time to finish, so I have pasted the terminal output into the cell below. The output shown covers about 2 hours of training on my computer.

(base) F:\CODE\DIVEINTOCODE\diveintocode-ml\Week6\keras-yolo3-master\keras-yolo3-master>python train.py
2022-01-27 13:28:06.183136: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-27 13:28:06.797939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4486 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:20:00.0, compute capability: 6.1
Create YOLOv3 model with 9 anchors and 18 classes.
WARNING:tensorflow:Skipping loading of weights for layer conv2d_58 due to mismatch in shape ((1, 1, 1024, 69) vs (255, 1024, 1, 1)).
WARNING:tensorflow:Skipping loading of weights for layer conv2d_58 due to mismatch in shape ((69,) vs (255,)).
WARNING:tensorflow:Skipping loading of weights for layer conv2d_66 due to mismatch in shape ((1, 1, 512, 69) vs (255, 512, 1, 1)).
WARNING:tensorflow:Skipping loading of weights for layer conv2d_66 due to mismatch in shape ((69,) vs (255,)).
WARNING:tensorflow:Skipping loading of weights for layer conv2d_74 due to mismatch in shape ((1, 1, 256, 69) vs (255, 256, 1, 1)).
WARNING:tensorflow:Skipping loading of weights for layer conv2d_74 due to mismatch in shape ((69,) vs (255,)).
Load weights model_data/yolo_weights.h5.
Freeze the first 249 layers of total 252 layers.
2022-01-27 13:28:09.524593: I tensorflow/core/profiler/lib/profiler_session.cc:110] Profiler session initializing.
2022-01-27 13:28:09.528251: I tensorflow/core/profiler/lib/profiler_session.cc:125] Profiler session started.
2022-01-27 13:28:09.531430: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1630] Profiler found 1 GPUs
2022-01-27 13:28:09.611018: I tensorflow/core/profiler/lib/profiler_session.cc:143] Profiler session tear down.
2022-01-27 13:28:09.614811: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1764] CUPTI activity buffer flushed
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
C:\Users\Bin\anaconda3\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:367: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  warnings.warn(
Train on 7101 samples, val on 788 samples, with batch size 20.
C:\Users\Bin\anaconda3\lib\site-packages\tensorflow\python\keras\utils\generic_utils.py:494: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  warnings.warn('Custom mask layers require a config and must override '
Epoch 1/50
2022-01-27 13:28:22.893145: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8201
  1/355 [..............................] - ETA: 1:21:57 - loss: 9223.0996
2022-01-27 13:28:27.171232: I tensorflow/core/profiler/lib/profiler_session.cc:110] Profiler session initializing.
2022-01-27 13:28:27.175546: I tensorflow/core/profiler/lib/profiler_session.cc:125] Profiler session started.
  2/355 [..............................] - ETA: 7:57 - loss: 8593.6660
2022-01-27 13:28:28.054954: I tensorflow/core/profiler/lib/profiler_session.cc:67] Profiler session collecting data.
2022-01-27 13:28:28.061347: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1764] CUPTI activity buffer flushed
2022-01-27 13:28:28.205979: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:526]  GpuTracer has collected 2913 callback api events and 2927 activity events.
2022-01-27 13:28:28.322951: I tensorflow/core/profiler/lib/profiler_session.cc:143] Profiler session tear down.
2022-01-27 13:28:28.501442: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logs/000/train\plugins\profile\2022_01_27_06_28_28

2022-01-27 13:28:28.605392: I tensorflow/core/profiler/rpc/client/save_profile.cc:142] Dumped gzipped tool data for trace.json.gz to logs/000/train\plugins\profile\2022_01_27_06_28_28\BIN.trace.json.gz
2022-01-27 13:28:28.789619: I tensorflow/core/profiler/rpc/client/save_profile.cc:136] Creating directory: logs/000/train\plugins\profile\2022_01_27_06_28_28

2022-01-27 13:28:28.832345: I tensorflow/core/profiler/rpc/client/save_profile.cc:142] Dumped gzipped tool data for memory_profile.json.gz to logs/000/train\plugins\profile\2022_01_27_06_28_28\BIN.memory_profile.json.gz
2022-01-27 13:28:29.005446: I tensorflow/core/profiler/rpc/client/capture_profile.cc:251] Creating directory: logs/000/train\plugins\profile\2022_01_27_06_28_28
Dumped tool data for xplane.pb to logs/000/train\plugins\profile\2022_01_27_06_28_28\BIN.xplane.pb
Dumped tool data for overview_page.pb to logs/000/train\plugins\profile\2022_01_27_06_28_28\BIN.overview_page.pb
Dumped tool data for input_pipeline.pb to logs/000/train\plugins\profile\2022_01_27_06_28_28\BIN.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logs/000/train\plugins\profile\2022_01_27_06_28_28\BIN.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logs/000/train\plugins\profile\2022_01_27_06_28_28\BIN.kernel_stats.pb

355/355 [==============================] - 912s 3s/step - loss: 337.8531 - val_loss: 43.3795
Epoch 2/50
355/355 [==============================] - 859s 2s/step - loss: 33.8784 - val_loss: 28.4187
Epoch 3/50
355/355 [==============================] - 928s 3s/step - loss: 25.9111 - val_loss: 24.2163
Epoch 4/50
355/355 [==============================] - 942s 3s/step - loss: 23.0570 - val_loss: 22.4284
Epoch 5/50
355/355 [==============================] - 909s 3s/step - loss: 21.6512 - val_loss: 21.3101
Epoch 6/50
355/355 [==============================] - 918s 3s/step - loss: 20.8120 - val_loss: 20.7309
Epoch 7/50
355/355 [==============================] - 1190s 3s/step - loss: 20.2807 - val_loss: 20.2473
Epoch 8/50
151/355 [===========>..................] - ETA: 8:40 - loss: 19.9065
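
The shape-mismatch warnings near the top of the log (69 vs 255 channels) are consistent with swapping the 80-class COCO output layers for 18-class ones: each of the 3 anchors per scale predicts 4 box offsets, 1 objectness score and one score per class. The sketch below reproduces these numbers, along with the sample and step counts printed in the log. The 10% validation split is an assumption inferred from the counts, and `yolo_output_channels` is an illustrative helper, not a function from the repository.

```python
def yolo_output_channels(num_classes, anchors_per_scale=3):
    # Each anchor predicts 4 box offsets + 1 objectness score + one score per class.
    return anchors_per_scale * (5 + num_classes)


print(yolo_output_channels(18))  # 69  -> the 18-class Simpsons model
print(yolo_output_channels(80))  # 255 -> the COCO-pretrained weights

# Sample and step counts reported in the log, assuming a 10% validation split.
num_lines = 7889
num_val = int(num_lines * 0.1)
num_train = num_lines - num_val
batch_size = 20
steps_per_epoch = max(1, num_train // batch_size)
print(num_train, num_val, steps_per_epoch)  # 7101 788 355
```

Because the output layers do not match the pretrained weights, they are skipped during loading and trained from scratch, which fits the very large loss values in the first steps of epoch 1.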