# YOLOv3

Source: https://machinelearningspace.com/yolov3-tensorflow-2-part-1/

## Architecture

YOLOv3 is built on a 53-layer convolutional backbone called Darknet-53.

<figure style="display: inline-block">
  <img src="img/yolo_structure.png" width="400" height="500">
  <figcaption style="text-align: center"></figcaption>
</figure>

## How it works

YOLOv3 divides the image into an SxS grid and predicts bounding boxes and class probabilities for each grid cell. For each cell it predicts B bounding boxes and C class probabilities for objects <b>whose center falls inside that cell</b>. Each bounding box has (5 + C) attributes. The 5 stands for the bounding box attributes: the center coordinates $(b_x,b_y)$, the shape $(b_h, b_w)$, and the objectness score. C is the number of classes. Objectness expresses how confident the model is that the box contains an object.<br>
YOLOv3 produces a 3D tensor of shape [S, S, B * (5 + C)].

<figure style="display: inline-block">
  <img src="img/yolo_work.png" width="800" height="700">
  <figcaption style="text-align: center"></figcaption>
</figure>
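As a quick check of the tensor shape above, here is a minimal sketch. It assumes the values used elsewhere in this notebook (3 anchor boxes per cell, 80 COCO classes) and a 13x13 grid, which corresponds to a 416x416 input at the coarsest scale.

```python
# Depth of the YOLOv3 output tensor for one detection scale.
S = 13   # grid size (assumed: 416x416 input at the coarsest scale)
B = 3    # anchor boxes per grid cell
C = 80   # number of classes (COCO)

attributes_per_box = 5 + C  # x, y, w, h, objectness + C class scores
output_shape = (S, S, B * attributes_per_box)
print(output_shape)  # (13, 13, 255)
```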

## Anchor box algorithm

The problem we want to solve arises when a single grid cell contains the centers of more than one object, which means that several objects overlap. To overcome this, YOLOv3 uses 3 different anchor boxes at each detection scale.

Anchor boxes are a set of predefined bounding boxes of fixed height and width, used to model the different scales and aspect ratios of the objects we want to detect.
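To make the idea of matching objects to anchors concrete, the sketch below picks the anchor whose shape is closest to a given box. The anchor sizes are the ones listed in the default yolov3.cfg; the helper `best_anchor` and the shape-only IoU criterion are illustrative assumptions, not code from this notebook.

```python
import numpy as np

# Anchor (width, height) pairs in pixels, as listed in the default yolov3.cfg.
ANCHORS = np.array([(10, 13), (16, 30), (33, 23),
                    (30, 61), (62, 45), (59, 119),
                    (116, 90), (156, 198), (373, 326)], dtype=np.float32)

def best_anchor(box_wh, anchors=ANCHORS):
    """Return the index of the anchor whose shape best matches box_wh.

    Hypothetical helper: both boxes are treated as if they shared the same
    center, so the IoU depends only on width and height.
    """
    box_wh = np.asarray(box_wh, dtype=np.float32)
    inter = (np.minimum(anchors[:, 0], box_wh[0]) *
             np.minimum(anchors[:, 1], box_wh[1]))
    union = anchors[:, 0] * anchors[:, 1] + box_wh[0] * box_wh[1] - inter
    return int(np.argmax(inter / union))

print(best_anchor((120, 95)))  # 6, i.e. the (116, 90) anchor
```

Two objects centered in the same cell but with different shapes would map to different anchors under this rule, which is how the overlap problem is avoided.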

## Prediction across scales

YOLOv3 makes detections at 3 different scales to handle objects of different sizes, using strides of 32, 16, and 8. For an input image of 416x416, YOLOv3 therefore makes detections on grids of 13x13, 26x26, and 52x52.

<figure style="display: inline-block">
  <img src="img/yolo_scaling.png" width="800" height="600">
  <figcaption style="text-align: center"></figcaption>
</figure>
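The grid sizes follow directly from dividing the input size by each stride. The short sketch below reproduces that arithmetic and also counts how many boxes are predicted in total, assuming 3 anchor boxes per cell as described earlier.

```python
input_size = 416
strides = (32, 16, 8)
boxes_per_cell = 3  # anchor boxes per grid cell at each scale

# One square grid per stride.
grid_sizes = [input_size // s for s in strides]
# Every cell of every grid predicts boxes_per_cell boxes.
total_boxes = sum(g * g * boxes_per_cell for g in grid_sizes)

print(grid_sizes)   # [13, 26, 52]
print(total_boxes)  # 10647
```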

## Bounding box prediction

For each bounding box, YOLO predicts 4 coordinates, $t_x, t_y, t_w, t_h$. The values $t_x, t_y$ encode the center of the bounding box relative to the grid cell that contains it, and $t_w, t_h$ encode the width and height of the box.

The final box prediction is obtained from:

$b_x = \sigma(t_x) + c_x$ <br>
$b_y = \sigma(t_y) + c_y$ <br>
$b_w = p_w*e^{t_w}$ <br>
$b_h = p_h*e^{t_h}$

The values $p_w, p_h$ are the width and height of the anchor box, and $c_x, c_y$ are the coordinates of the top-left corner of the grid cell.
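The four formulas can be sketched in NumPy as follows. The function name `decode_box` and its argument layout are assumptions made for illustration; the actual decoding in this project lives in the yolo layer code in yolov3.py.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_xy, anchor_wh):
    """Apply the four box formulas to one raw prediction (illustrative helper).

    t         : (t_x, t_y, t_w, t_h) raw network outputs
    cell_xy   : (c_x, c_y) top-left corner of the grid cell, in grid units
    anchor_wh : (p_w, p_h) anchor width and height
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell_xy
    p_w, p_h = anchor_wh
    b_x = sigmoid(t_x) + c_x   # center stays inside the cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * np.exp(t_w)    # anchor size scaled by a positive factor
    b_h = p_h * np.exp(t_h)
    return b_x, b_y, b_w, b_h

box = decode_box((0.0, 0.0, 0.0, 0.0), cell_xy=(6, 4), anchor_wh=(116, 90))
print(box)
```

With all raw outputs equal to zero, the center lands in the middle of cell (6, 4) and the box keeps the anchor's width and height, since $\sigma(0) = 0.5$ and $e^0 = 1$.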

## Non-max suppression

1. An additional objectness output is added to the CNN; it must use a sigmoid activation and is trained with a binary cross-entropy loss. All bounding boxes whose objectness score is below a chosen threshold are then discarded, which removes the boxes that do not contain an object.
2. Find the bounding box with the highest objectness score and discard all other bounding boxes that overlap it significantly (IoU > 60%).
3. Repeat step 2 until there are no more bounding boxes to discard.
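The three steps can be sketched in plain NumPy as below. This is a minimal, single-class illustration with hypothetical helper names (`iou`, `greedy_nms`); the notebook itself relies on `tf.image.combined_non_max_suppression` later on.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.6):
    """Return indices (into the input arrays) of the boxes that survive."""
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    # Step 1: drop boxes with low objectness.
    candidates = np.where(scores >= score_threshold)[0]
    # Order the remaining candidates by descending score.
    candidates = candidates[np.argsort(-scores[candidates])]
    keep = []
    while candidates.size > 0:
        # Step 2: keep the best box, discard those overlapping it too much.
        best, rest = candidates[0], candidates[1:]
        keep.append(int(best))
        overlaps = iou(boxes[best], boxes[rest])
        # Step 3: repeat with whatever is left.
        candidates = rest[overlaps <= iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]]
scores = [0.9, 0.8, 0.7, 0.3]
print(greedy_nms(boxes, scores))  # [0, 2]
```

In the usage example, box 1 is suppressed because its IoU with box 0 is about 0.68, and box 3 is dropped in step 1 because its score is below the threshold.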

# Implementation

## Parsing the configuration

YOLOv3 relies on two important files, yolov3.cfg and yolov3.weights: the first describes the architecture, the second holds the trained parameters.

<a href="https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg">yolov3.cfg</a><br>
<a href="https://pjreddie.com/media/files/yolov3.weights">yolov3.weights</a><br>
<a href="https://github.com/pjreddie/darknet/blob/master/data/coco.names">coco.names</a>

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import BatchNormalization, LeakyReLU, Conv2D, Input, ZeroPadding2D, UpSampling2D

In [3]:
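def parse_cfg(cfgfile):
  # Read the file, skipping blank lines and '#' comments
  with open(cfgfile, 'r') as file:
    lines = [line.rstrip('\n') for line in file if line != '\n' and line[0] != '#']
  holder = {}
  blocks = []
  for line in lines:
    if line[0] == '[':
      # A section header such as [convolutional] starts a new block;
      # its name is stored under the 'type' key
      line = 'type=' + line[1:-1].rstrip()
      if len(holder) != 0:
        blocks.append(holder)
        holder = {}
    key, value = line.split('=')
    holder[key.rstrip()] = value.lstrip()
  # Append the last block, which has no header after it
  blocks.append(holder)
  return blocks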

In [None]:
parse_cfg('cfg/yolov3.cfg')

## Building the YOLOv3 network

In [5]:
def YOLOv3Net(cfgfile, model_size, num_classes):
  blocks = parse_cfg(cfgfile)
  
  outputs = {}
  output_filters = []
  filters = []
  out_pred = []
  scale = 0
  
  inputs = input_image = Input(shape=model_size)
  inputs /= 255.0

YOLOv3 has 5 types of layers:
1. convolutional layer
2. upsample layer
3. route layer
4. shortcut layer
5. yolo layer

### Convolutional layer

There are two kinds of convolutional layers: with and without a batch normalization (BN) layer. A convolutional layer with BN uses LeakyReLU; one without BN uses a linear activation.

In [7]:
def parse_conv(block, inputs, name):
  activation = block['activation']
  filters = int(block['filters'])
  kernel_size = int(block['size'])
  strides = int(block['stride'])
  
  if strides > 1:
    inputs = ZeroPadding2D(((1, 0), (1, 0)))(inputs)
    
  inputs = Conv2D(filters,
                  kernel_size,
                  strides=strides,
                  padding='valid' if strides > 1 else 'same',
                  name='conv_' + str(name),
                  use_bias=False if ('batch_normalize' in block) else True)(inputs)
  if 'batch_normalize' in block:
    inputs = BatchNormalization(name='bnorm_' + str(name))(inputs)
  if activation == 'leaky':
    inputs = LeakyReLU(alpha=0.1, name='leaky_' + str(name))(inputs)
    
  return inputs

### Upsample layer

In [8]:
def parse_upsample(block, inputs):
  stride = int(block['stride'])
  return UpSampling2D(stride)(inputs)

### Route layer

A route layer can have one or two values. With a single value, e.g. -4, we go back 4 layers and output the feature map of that layer. With two values, e.g. -1 and 61, we concatenate the feature maps of the previous layer and of layer 61.

In [9]:
def parse_route(block, current, output_filters, outputs):
  # 'layers' holds one or two comma-separated indices, e.g. "-4" or "-1, 61"
  layers = block['layers'].split(',')
  start = int(layers[0])  # relative to the current layer (negative)
  
  if len(layers) > 1:
    # The second index is absolute; turn it into an offset from the current layer
    end = int(layers[1]) - current
    filters = output_filters[current + start] + output_filters[current + end]
    inputs = tf.concat([outputs[current + start], outputs[current + end]], axis=-1)
  else:
    filters = output_filters[current + start]
    inputs = outputs[current + start]
    
  return (filters, inputs)
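The index arithmetic of a route layer can be isolated in a small sketch. The helper `resolve_route` is hypothetical and assumes that negative entries are relative to the current layer while non-negative entries are absolute, which matches the -4 and -1, 61 examples above.

```python
def resolve_route(layers, current):
    """Turn the 'layers' value of a [route] block into absolute layer indices.

    Assumption: negative entries are offsets from the current layer,
    non-negative entries are already absolute.
    """
    indices = [int(v) for v in layers.split(',')]
    return [current + i if i < 0 else i for i in indices]

# Illustrative values for `current`.
print(resolve_route('-4', 83))      # [79]
print(resolve_route('-1, 61', 86))  # [85, 61]
```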

### Shortcut layer

In [11]:
def parse_shortcut(block, outputs, current):
  from_ = int(block['from'])
  return outputs[current - 1] + outputs[current + from_]

### YOLO layer

The code for the yolo layer, along with all the code from this part, is in the file yolov3.py.

## Parsing the weights

The weights are stored in the binary file yolov3.weights as float values. They belong to the convolutional layers, but since YOLOv3 uses two kinds of convolutional layers (with and without batch normalization), we have to be careful. Because we read nothing but a stream of floats, nothing in the file tells us which value belongs where, so understanding the layout is essential.
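To illustrate why the layout matters, the sketch below reorders a synthetic parameter buffer the same way the loader has to. It assumes the Darknet file order for a batch-normalized layer (beta, gamma, moving mean, moving variance, then the kernel) and the Keras order (gamma, beta, moving mean, moving variance); the numbers are made up for the example.

```python
import numpy as np

filters = 2
# Synthetic stand-in for one batch-normalized layer's parameters in file order.
raw = np.array([0.1, 0.2,   # beta (bias)
                1.0, 1.1,   # gamma (scale)
                0.5, 0.6,   # moving mean
                2.0, 2.1],  # moving variance
               dtype=np.float32)

# Swap the first two rows to get the Keras order: gamma, beta, mean, variance.
bn_weights = raw.reshape((4, filters))[[1, 0, 2, 3]]
gamma, beta, mean, variance = bn_weights

# Darknet stores kernels as (out, in, h, w); Keras Conv2D uses (h, w, in, out).
in_dim, k = 3, 3
kernel = np.arange(filters * in_dim * k * k, dtype=np.float32)
kernel = kernel.reshape((filters, in_dim, k, k)).transpose([2, 3, 1, 0])
print(kernel.shape)  # (3, 3, 3, 2)
```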

In [14]:
import numpy as np
from yolov3 import YOLOv3Net
from yolov3 import parse_cfg

In [15]:
def load_weights(model, cfgfile, weightfile):
  blocks = parse_cfg(cfgfile)
  
  with open(weightfile, 'rb') as fp:
    # The first 5 values are the header
    np.fromfile(fp, dtype=np.int32, count=5)
    
    # blocks[0] is the [net] section, so layer numbering starts at blocks[1]
    for i, block in enumerate(blocks[1:]):
      if block['type'] != 'convolutional':
        continue
      
      conv_layer = model.get_layer('conv_' + str(i))
      print('layer: ', i + 1, conv_layer)
      
      filters = conv_layer.filters
      k_size = conv_layer.kernel_size[0]
      in_dim = conv_layer.input_shape[-1]
      
      if 'batch_normalize' in block:
        norm_layer = model.get_layer('bnorm_' + str(i))
        print('layer: ', i + 1, norm_layer)
        
        # The file stores [beta, gamma, mean, variance];
        # Keras expects [gamma, beta, mean, variance]
        bn_weights = np.fromfile(fp, dtype=np.float32, count=4 * filters)
        bn_weights = bn_weights.reshape((4, filters))[[1, 0, 2, 3]]
      else:
        conv_bias = np.fromfile(fp, dtype=np.float32, count=filters)
      
      # Darknet stores kernels as (out, in, h, w); Keras uses (h, w, in, out)
      conv_shape = (filters, in_dim, k_size, k_size)
      conv_weights = np.fromfile(fp, dtype=np.float32, count=np.prod(conv_shape))
      conv_weights = conv_weights.reshape(conv_shape).transpose([2, 3, 1, 0])
      
      if 'batch_normalize' in block:
        norm_layer.set_weights(list(bn_weights))
        conv_layer.set_weights([conv_weights])
      else:
        conv_layer.set_weights([conv_weights, conv_bias])
    
    assert len(fp.read()) == 0, 'failed to read all data'

In [17]:
def main():
  weightfile = 'cfg/yolov3.weights'
  cfgfile = 'cfg/yolov3.cfg'
  
  model_size = (416, 416, 3)
  num_classes = 80
  
  model = YOLOv3Net(cfgfile, model_size, num_classes)
  load_weights(model, cfgfile, weightfile)
  
  try:
    model.save_weights('weights/yolov3_weights.tf')
    print('\nThe file \'yolov3_weights.tf\' has been saved successfully')
  except IOError:
    print("Couldn't write the file \'yolov3_weights.tf\'.")

In [18]:
main()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 416, 416, 3  0           []                               
                                )]                                                                
                                                                                                  
 tf.math.truediv (TFOpLambda)   (None, 416, 416, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv_0 (Conv2D)                (None, 416, 416, 32  864         ['tf.math.truediv[0][0]']        
                                )                                                                 
                                                                                              

## Helper functions

In [19]:
import tensorflow as tf
import numpy as np
import cv2 as cv

### Non-max suppression

In [21]:
def non_max_suppression(inputs, model_size, max_output_size,
                        max_output_size_per_class, iou_threshold, confidence_threshold):
  bbox, confs, class_probs = tf.split(inputs, [4, 1, -1], axis=-1)
  bbox /= model_size[0]
  
  scores = confs * class_probs
  boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
    boxes=tf.reshape(bbox, (tf.shape(bbox)[0], -1, 1, 4)),
    scores=tf.reshape(scores, (tf.shape(scores)[0], -1, tf.shape(scores)[-1])),
    max_output_size_per_class=max_output_size_per_class,
    max_total_size=max_output_size,
    iou_threshold=iou_threshold,
    score_threshold=confidence_threshold
  )
  
  return boxes, scores, classes, valid_detections

### resize_image()

In [22]:
def resize_image(inputs, modelsize):
  inputs = tf.image.resize(inputs, modelsize)
  return inputs

### load_class_names()

In [23]:
def load_class_names(file_name):
  with open(file_name, 'r') as f:
    class_names = f.read().splitlines()
  return class_names

### output_boxes()

In [24]:
def output_boxes(inputs, model_size, max_output_size, max_output_size_per_class,
                 iou_threshold, confidence_threshold):
  
  center_x, center_y, width, height, confidence, classes = \
    tf.split(inputs, [1, 1, 1, 1, 1, -1], axis=-1)
    
  top_left_x = center_x - width / 2.0
  top_left_y = center_y - height / 2.0
  bottom_right_x = center_x + width / 2.0
  bottom_right_y = center_y + height / 2.0
  
  inputs = tf.concat([top_left_x, top_left_y, bottom_right_x,
                      bottom_right_y, confidence, classes], axis=-1)
  
  boxes_dicts = non_max_suppression(inputs, model_size, max_output_size,
                                    max_output_size_per_class, iou_threshold, confidence_threshold)
  
  return boxes_dicts

In [26]:
def draw_outputs(img, boxes, objectness, classes, nums, class_names):
  boxes, objectness, classes, nums = boxes[0], objectness[0], classes[0], nums[0]
  boxes = np.array(boxes)
  
  for i in range(nums):
    x1y1 = tuple((boxes[i,0:2] * [img.shape[1], img.shape[0]]).astype(np.int32))
    x2y2 = tuple((boxes[i,2:4] * [img.shape[1], img.shape[0]]).astype(np.int32))
    
    img = cv.rectangle(img, (x1y1), (x2y2), (255, 0 ,0), 2)
    
    img = cv.putText(img, '{} {:.4f}'.format(
      class_names[int(classes[i])], objectness[i]),
                     (x1y1), cv.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
    
  return img

## Image processing code

In [33]:
import tensorflow as tf
from utils import load_class_names, output_boxes, draw_outputs, resize_image
import cv2 as cv
import numpy as np
from yolov3 import YOLOv3Net

physical_devices = tf.config.list_physical_devices('GPU')
assert len(physical_devices) > 0, 'Not enough GPU hardware devices available'

model_size = (416, 416, 3)
num_classes = 80
class_names = './cfg/coco.names'
max_output_size = 40
max_output_size_per_class = 20
iou_threshold = 0.5
confidence_threshold = 0.5

cfg_file = 'cfg/yolov3.cfg'
weight_file = 'weights/yolov3_weights.tf'  # same path that was passed to save_weights()
img_path = 'data/images/test.jpg'

In [34]:
def main():
  
  model = YOLOv3Net(cfg_file, model_size, num_classes)
  model.load_weights(weight_file)
  
  # The module-level class_names holds the file path; a separate local name
  # avoids shadowing it, which would raise UnboundLocalError
  names = load_class_names(class_names)
  
  image = cv.imread(img_path)
  image = np.array(image)
  image = tf.expand_dims(image, 0)
  
  resized_frame = resize_image(image, (model_size[0], model_size[1]))
  pred = model.predict(resized_frame)
  
  boxes, scores, classes, nums = output_boxes(
    pred, model_size,
    max_output_size,
    max_output_size_per_class,
    iou_threshold,
    confidence_threshold)
  
  image = np.squeeze(image)
  img = draw_outputs(image, boxes, scores, classes, nums, names)
  
  
  win_name = 'Image detection'
  cv.imshow(win_name, img)
  cv.waitKey(0)
  cv.destroyAllWindows()