# **YOLO-CIANNA Object Detection example on PASCAL-VOC**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Deyht/CIANNA/blob/CIANNA_dev/YOLO_CIANNA_object_detection_example_on_PASCAL_VOC.ipynb)

## **Introduction - Notebook Setup**

**Important notes**:   
1) Due to RAM limits on the free Colab version, the notebook kernel might crash if all cells are run at once or if specific cells are re-run multiple times. Restarting the runtime kernel (Runtime -> Restart runtime) solves the issue without losing locally saved files (datasets, network saves, framework, etc.). Then re-run from the group of cells that crashed.

Each **independent** part of the notebook has been verified to run on the free version of Colab.

2) The Introduction part, which includes the dataset download/formatting and the CIANNA framework installation, must be re-run every time the runtime is fully shut down and disconnected, but not after a simple runtime restart.


---


**Link to slides accompanying a more complete version of the notebook presented at the IRMIA-DL 2022 Summer School**  
https://github.com/Deyht/IRMIA_2022/blob/main/DL_obj_detetion_with_YOLO_slides_full_v2.pdf


<a name="cianna_install"></a>

## **1\. DL Framework (CIANNA) installation**

#### Query GPU allocation and properties


In [None]:
%%shell

nvidia-smi

cd /content/

git clone https://github.com/NVIDIA/cuda-samples/

cd /content/cuda-samples/Samples/1_Utilities/deviceQuery/

make SMS="50 60 70 80"

./deviceQuery | grep Capability | cut -c50- > ~/cuda_infos.txt
./deviceQuery | grep "CUDA Driver Version / Runtime Version" | cut -c57- >> ~/cuda_infos.txt

cd ~/

#### Clone CIANNA git repository

A specific commit is checked out to protect the notebook from incompatibilities introduced by future CIANNA updates.

In [None]:
%%shell

cd /content/

git clone https://github.com/Deyht/CIANNA

cd CIANNA
git checkout 9a9c048

#### Compiling CIANNA for the allocated GPU generation

There is no guaranteed forward or backward compatibility between Nvidia GPU generations, and some capabilities are generation-specific. For these reasons, CIANNA must be given the GPU generation of the platform at compile time.
The following cell automatically updates all the necessary files based on the detected GPU and compiles CIANNA.
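
As a rough illustration of what the shell cell derives from the detected GPU, the same rules can be written in Python. This is only a sketch that mirrors the `awk` expressions of the cell; the function name `cuda_compile_flags` and the example inputs are hypothetical and are not part of CIANNA.

```python
def cuda_compile_flags(comp_cap, cuda_vers, lim="11.1"):
    """Mirror the shell logic of the compile cell: return (sm_val, extra_flags)."""
    # Compute capability "7.5" becomes 75, the value used in -arch=sm_75
    sm_val = int(round(float(comp_cap) * 10))

    extra_flags = []
    # Same numeric comparison as the awk test on the CUDA version
    if float(cuda_vers) < float(lim):
        extra_flags.append("-D CUDA_OLD")
    # Generation flag, only defined from Volta onward
    if sm_val >= 80:
        extra_flags.append("-D GEN_AMPERE")
    elif sm_val >= 70:
        extra_flags.append("-D GEN_VOLTA")
    return sm_val, extra_flags


# Made-up (compute capability, CUDA version) pairs
for cap, vers in [("7.5", "12.2"), ("8.0", "11.0"), ("6.1", "11.8")]:
    sm_val, flags = cuda_compile_flags(cap, vers)
    print(cap, vers, "-> -arch=sm_%d" % sm_val, " ".join(flags))
```

GPUs older than Volta get no generation flag, which matches the `awk` expression printing nothing in that case.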

In [None]:
%%shell

cd /content/CIANNA

mult="10"
cat ~/cuda_infos.txt
comp_cap="$(sed '1!d' ~/cuda_infos.txt)"
cuda_vers="$(sed '2!d' ~/cuda_infos.txt)"

lim="11.1"
old_arg=$(awk '{if ($1 < $2) print "-D CUDA_OLD";}' <<<"${cuda_vers} ${lim}")

sm_val=$(awk '{print $1*$2}' <<<"${mult} ${comp_cap}")

gen_val=$(awk '{if ($1 >= 80) print "-D GEN_AMPERE"; else if($1 >= 70) print "-D GEN_VOLTA";}' <<<"${sm_val}")

sed -i "s/.*arch=sm.*/\\t\tcuda_arg=\"\$cuda_arg -D CUDA -D comp_CUDA -lcublas -lcudart -arch=sm_$sm_val $old_arg $gen_val\"/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" src/python_module_setup.py

pyth_ver=$(python3 -c 'import sys; print("%d.%d"%(sys.version_info[:][0], sys.version_info[:][1]))')

sed -i "s/\/lib.linux-x86_64-[0-9].[0-9]/\/lib.linux-x86_64-$pyth_ver/g" ex_script.py

./compile.cp CUDA PY_INTERF

mv src/build/lib.linux-x86_64-* src/build/lib.linux-x86_64

#### Testing CIANNA installation

**IMPORTANT NOTE**   
CIANNA is mainly used in a script fashion and was not designed to run in notebooks. Every code cell that directly invokes CIANNA functions must be run as a script to avoid possible errors.  
To do so, the cell must have the following structure.

```
%%shell

cd /content/CIANNA

python3 - <<EOF

[... your python code ...]

EOF
```

This syntax makes it easy to edit Python code in the notebook while running the cell as a script. Note that notebook variables cannot be accessed from the cell in this context.
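
Because the script runs in its own process, any value it needs from the notebook has to be passed in explicitly. The sketch below shows one possible way to do this from a regular Python cell with the standard `subprocess` module. It is an alternative illustration, not the `%%shell` pattern used in this notebook, and the variable `notebook_value` is hypothetical.

```python
import subprocess
import sys

notebook_value = 416  # hypothetical variable living in the notebook kernel

# Code executed in a separate Python process, with its own namespace
script = """
import sys
im_size = int(sys.argv[1])  # value received from the command line
print("im_size seen by the script:", im_size)
"""

result = subprocess.run(
    [sys.executable, "-c", script, str(notebook_value)],
    capture_output=True, text=True, check=True,
)
print(result.stdout, end="")
```

The prediction script later in the notebook reads `sys.argv` in a similar way to pick the epoch to load.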


In [None]:
%%shell

cd /content/CIANNA

tar -xvzf mnist.tar.gz

In [None]:
%%shell


#Strictly equivalent to ex_script.py in the CIANNA repo 

cd /content/CIANNA

python3 - <<EOF


import numpy as np
import matplotlib.pyplot as plt
#Use the locally compiled version of CIANNA

import sys
sys.path.insert(0,"/content/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn

############################################################################
##              Data reading (your mileage may vary)
############################################################################

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

print ("Reading inputs ... ", end = "", flush=True)

#Loading binary files
data = np.fromfile("mnist_dat/mnist_input.dat", dtype="float32")
data = np.reshape(data, (80000,28*28))
target = np.fromfile("mnist_dat/mnist_target.dat", dtype="float32")
target = np.reshape(target, (80000,10))


data_train = data[:60000,:]
data_valid = data[60000:70000,:]
data_test  = data[70000:80000,:]

target_train = target[:60000,:]
target_valid = target[60000:70000,:]
target_test  = target[70000:80000,:]

print ("Done !", flush=True)

############################################################################
##               CIANNA network construction and use
############################################################################

#Details about the functions and parameters are given in the GitHub Wiki

cnn.init(in_dim=i_ar([28,28]), in_nb_ch=1, out_dim=10, \
		bias=0.1, b_size=24, comp_meth="C_CUDA", dynamic_load=1, mixed_precision="FP32C_FP32A") #Change to C_BLAS or C_NAIV


cnn.create_dataset("TRAIN", size=60000, input=data_train, target=target_train)
cnn.create_dataset("VALID", size=10000, input=data_valid, target=target_valid)
cnn.create_dataset("TEST", size=10000, input=data_test, target=target_test)

#Used to load a saved network at a given epoch
#With load_step = 0, the network is trained from scratch
load_step = 0
if(load_step > 0):
	cnn.load("net_save/net0_s%04d.dat"%(load_step), load_step)
else:
  cnn.conv(f_size=i_ar([5,5]), nb_filters=8, padding=i_ar([2,2]), activation="RELU")
  cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
  cnn.conv(f_size=i_ar([5,5]), nb_filters=16, padding=i_ar([2,2]), activation="RELU")
  cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
  cnn.dense(nb_neurons=256, activation="RELU", drop_rate=0.5)
  cnn.dense(nb_neurons=128, activation="RELU", drop_rate=0.2)
  cnn.dense(nb_neurons=10, activation="SMAX")

cnn.train(nb_epoch=10, learning_rate=0.0004, momentum=0.9, confmat=1, save_every=0)
#Change save_every in previous function to save network weights
cnn.perf_eval()


#Save network prediction (comment out to skip)
cnn.forward(repeat=1, drop_mode="AVG_MODEL")

del (data, target, data_train, target_train, data_valid, target_valid, data_test, target_test)


EOF



---


<a name="dataset_download"></a>

## **2\. Download and visualize PASCAL-VOC 2007 test dataset**

**Notes:** The following cells are simplified versions of the scripts that were used to format the datasets of PASCAL-VOC 2007 and 2012 and perform the necessary data augmentations for training. Here only the 2007 test dataset is handled.
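
The formatting step pads each image to a square before resizing it, so the annotation boxes must receive the same shift and scale. The arithmetic can be sketched as follows, assuming the same centering convention as the cells of this section. The helper names `square_padding` and `box_to_resized` and the example image size are hypothetical.

```python
def square_padding(width, height, min_size=480):
    """Side of the padded square and offsets of the original image inside it."""
    size = max(min_size, width, height)
    x_offset = int((size - width) * 0.5)
    y_offset = int((size - height) * 0.5)
    return x_offset, y_offset, size


def box_to_resized(box, x_offset, y_offset, size, out_size=480):
    """Map (xmin, ymin, xmax, ymax) from the original image to the resized square."""
    xmin, ymin, xmax, ymax = box
    scale = lambda v, off: int(v + off) * out_size / size
    return (scale(xmin, x_offset), scale(ymin, y_offset),
            scale(xmax, x_offset), scale(ymax, y_offset))


# Made-up 500x375 image with one box
x_off, y_off, size = square_padding(500, 375)
print(x_off, y_off, size)
print(box_to_resized((100, 50, 300, 200), x_off, y_off, size))
```

Only the shorter side receives padding, so a landscape image gets a vertical offset and no horizontal one.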

In [None]:
%%shell

cd /content/

mkdir -p datasets
cd datasets

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

tar -xf VOCtest_06-Nov-2007.tar

<a name="data_format"></a>
#### Format dataset

In [None]:
%cd /content/datasets/

import numpy as np
from tqdm import tqdm
from PIL import Image

def make_square(im, min_size, fill_color=(0, 0, 0, 0)):
    x, y = im.size
    size = max(min_size, x, y)
    new_im = Image.new('RGB', (size, size), fill_color)
    new_im.paste(im, (int((size - x) / 2), int((size - y) / 2)))
    return new_im

test_list_2007  = np.loadtxt("/content/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_test_2007 = 4952
orig_nb_images = nb_test_2007
nb_keep_val = 2000
image_size_raw = 480
#image_size = 416
nb_class = 20

all_im = np.zeros((nb_keep_val, image_size_raw, image_size_raw, 3), dtype="uint8")
all_im_prop = np.zeros((nb_keep_val, 4), dtype="float32")

for i in tqdm(range(0, nb_keep_val)):

	im = Image.open("/content/datasets/VOCdevkit/VOC2007/JPEGImages/"+test_list_2007[i]+".jpg")
	
	width, height = im.size

	im = make_square(im, image_size_raw)
	width2, height2 = im.size

	x_offset = int((width2 - width)*0.5)
	y_offset = int((height2 - height)*0.5)

	all_im_prop[i] = [x_offset, y_offset, width2, height2]

	im = im.resize((image_size_raw,image_size_raw))
	im_array = np.asarray(im)
	for depth in range(0,3):
		all_im[i,:,:,depth] = im_array[:,:,depth]

all_im.tofile("all_im.dat")
all_im_prop.tofile("all_im_prop.dat")

#### Dataset summary statistics and visualization
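
The statistics in this section are read from the PASCAL-VOC XML annotation files. As a minimal sketch of how one annotation can be parsed with `xml.etree.ElementTree`, the example below uses a made-up annotation string that only keeps the fields read in this notebook. The helper `read_objects` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up annotation reduced to the fields used in this notebook
xml_text = """<annotation>
  <object>
    <name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name><difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def read_objects(root, skip_difficult=True):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    objects = []
    for obj in root.findall("object"):
        if skip_difficult and obj.find("difficult").text == "1":
            continue  # objects flagged as difficult are left out
        bndbox = obj.find("bndbox")
        box = tuple(int(float(bndbox.find(tag).text))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.find("name").text, box))
    return objects


root = ET.fromstring(xml_text)
print(read_objects(root))
print(len(read_objects(root, skip_difficult=False)))
```

For a file on disk, `ET.parse(path).getroot()` gives the same kind of root element.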

In [None]:
%cd /content/datasets/

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

test_list_2007  = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_test_2007 = 4952
orig_nb_images = nb_test_2007
nb_keep_val = 4952
image_size = 480
nb_class = 20

object_list = np.zeros((orig_nb_images,1+nb_class))

for i in tqdm(range(0, orig_nb_images)):
	
  tree = ET.parse("VOCdevkit/VOC2007/Annotations/"+test_list_2007[i]+".xml")
  root = tree.getroot()

  im_obj_list = root.findall("object", namespaces=None)
  object_list[i,0] = len(im_obj_list)
  for obj in im_obj_list:
    diff = obj.find("difficult", namespaces=None)
    if(diff.text == "1"):
      object_list[i,0] -= 1
      continue
    oclass = obj.find("name", namespaces=None)
    int_class = np.where(class_list[:] == oclass.text)[0] + 1
    object_list[i,int_class] += 1

plt.rcParams.update({'font.size': 6})

val_dat = np.sum(object_list[orig_nb_images-nb_keep_val:,1:],axis=0)

print("%8d"%np.sum(val_dat),end="")
for k in range(0,nb_class):
  print("%8d"%val_dat[k], end="")
print("")
print("")

plt.subplots(figsize=(6,2),dpi=190, constrained_layout=True)
plt.bar(np.arange(0,nb_class), val_dat, width=0.3, align="center", label="Val")
plt.xticks(np.arange(0,nb_class), class_list[:nb_class], fontsize=6, rotation = 45)
plt.legend()
#plt.yscale('log')
plt.show()



In [None]:
nb_keep_val = 2000
image_size_raw = 480 #Same value as in the formatting cell

all_im = np.fromfile("all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((nb_keep_val, image_size_raw, image_size_raw, 3)))
all_im_prop = np.reshape(all_im_prop,(nb_keep_val, 4))

In [None]:
id_start = 0 #Index of the first image in the series; nb_w * nb_h examples are displayed

nb_w = 4
nb_h = 8

fig, ax = plt.subplots(figsize=(5.0,0.4), dpi=180, constrained_layout=True)
ax.axis('off')
fig.patch.set_facecolor('black')

for k in range(0, nb_class):
	ax.text(k%10*0.12, k//10*0.5, class_list_short[k], color=plt.cm.tab20(k), fontsize=8)

plt.show()
print("")

fig, ax = plt.subplots(nb_h, nb_w, figsize=(1.5*nb_w,1.5*nb_h), dpi=210, constrained_layout=True)

for i in range(0, nb_h):
  for j in range(0, nb_w):
    i_d = j + i*nb_w + id_start

    x_offset, y_offset, width2, height2 = all_im_prop[i_d]

    c_data = all_im[i_d]/255.0
    ax[i,j].imshow(c_data)
    ax[i,j].axis('off')

    tree = ET.parse("VOCdevkit/VOC2007/Annotations/"+test_list_2007[i_d]+".xml")
    root = tree.getroot()
    
    obj_list = root.findall("object", namespaces=None)
    for obj in obj_list:
      diff = obj.find("difficult", namespaces=None)
      if(diff.text == "1"):
        continue
      oclass = obj.find("name", namespaces=None)
      bndbox = obj.find("bndbox", namespaces=None)

      int_class = np.where(class_list[:] == oclass.text)[0][0]
      xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
      ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
      xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
      ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2

      el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.7, ls="--", fill=False, color=plt.cm.tab20(int_class), zorder=3)
      c_patch = ax[i,j].add_patch(el)
      c_text = ax[i,j].text(xmin+6, ymax-10, "%s"%(class_list_short[int_class]), c=plt.cm.tab20(int_class), fontsize=4, clip_on=True)
      c_patch.set_path_effects([path_effects.Stroke(linewidth=1.8, foreground='black'),
                       path_effects.Normal()])
      c_text.set_path_effects([path_effects.Stroke(linewidth=1.0, foreground='black'),
                       path_effects.Normal()])

#plt.savefig("target_mosaic.png", dpi=250)
plt.show()

In [None]:
#Free the RAM before going further in the notebook
#A RUNTIME RESTART IS ADVISED

del (all_im, all_im_prop)


## **3\. The YOLO object detector**

In [None]:
%%shell

cd /content/
mkdir -p yolo_detector
cd yolo_detector

#### Dataset loading functions for dynamic data handling with CIANNA
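
In the loading functions of this section, each image target is a flat vector: the number of boxes, followed by 8 values per box. The sketch below illustrates that layout with plain Python lists, as read from the validation batch code. Interpreting the constant 0.0 and 1.0 entries as bounds of an unused third dimension is an assumption, and the helper `pack_targets` is hypothetical.

```python
MAX_OBJ = 56  # max_nb_obj_per_image in this notebook
BOX_LEN = 8   # class+1, xmin, ymin, 0.0, xmax, ymax, 1.0, difficult flag


def pack_targets(boxes, max_obj=MAX_OBJ):
    """boxes: list of (class_id, xmin, ymin, xmax, ymax, difficult)."""
    row = [0.0] * (1 + max_obj * BOX_LEN)
    boxes = boxes[:max_obj]          # extra boxes are dropped
    row[0] = float(len(boxes))       # first entry: number of boxes
    for k, (cls, xmin, ymin, xmax, ymax, difficult) in enumerate(boxes):
        start = 1 + k * BOX_LEN
        # Class ids are shifted by one, as in the validation batch code
        row[start:start + BOX_LEN] = [cls + 1.0, xmin, ymin, 0.0,
                                      xmax, ymax, 1.0, float(difficult)]
    return row


row = pack_targets([(11, 10.0, 20.0, 110.0, 220.0, 0)])
print(len(row), row[0], row[1:9])
```

The vector length `1 + 56*8` matches the `out_dim` value passed to `cnn.init` in the prediction cell when `nb_param` is 0.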

In [None]:
%%writefile /content/yolo_detector/data_gen.py

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches
import matplotlib.patheffects as path_effects
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image, ImageEnhance, ImageOps
import os #, re, glob
import gc

import imgaug as ia
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage
from imgaug.augmentables.batches import UnnormalizedBatch


class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","diningtable","dog","horse","motorbike",\
    "person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","table","dog","horse", "m-bike",\
    "person","p-plant","sheep","sofa","train","tv","empty"])

test_list_2007  = np.loadtxt("/content/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

## Data augmentation
def init_data_gen():
  global nb_train_2012, nb_train_2007, nb_test_2007, orig_nb_images
  global nb_images_per_batch, nb_keep_val, max_nb_obj_per_image, image_size_raw, image_size, seq_iaa, seq_iaa2
  global input_data, targets, input_val, targets_val, all_im, all_im_prop, nb_class, seq_iaa3

  nb_test_2007 = 4952
  orig_nb_images = nb_test_2007
  nb_keep_val = 2000 #4952
  image_size_raw = 480
  image_size = 416
  nb_class = 20
  max_nb_obj_per_image = 56
  
  seq_iaa3 = iaa.Sequential([
          iaa.Resize((image_size,image_size))])
  
  all_im = np.fromfile("/content/datasets/all_im.dat", dtype="uint8")
  all_im_prop = np.fromfile("/content/datasets/all_im_prop.dat", dtype="float32")
  all_im = np.reshape(all_im, ((nb_keep_val, image_size_raw, image_size_raw, 3)))
  all_im_prop = np.reshape(all_im_prop,(nb_keep_val, 4))

  input_val = np.zeros((nb_keep_val,image_size*image_size*3), dtype="float32")
  targets_val = np.zeros((nb_keep_val,1+max_nb_obj_per_image*(7+1)), dtype="float32")


def create_val_batch(visual_w=0, visual_h=0):
  visual_iter = 0

  for i in range(0, nb_keep_val):
    
    i_d = i

    tree = ET.parse("/content/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[i]+".xml")
    root = tree.getroot()
    
    patch = all_im[i_d]

    x_offset, y_offset, width2, height2 = all_im_prop[i_d]

    obj_list = root.findall("object", namespaces=None)
    nb_box = len(obj_list)
    #for obj in obj_list:
    #	diff = obj.find("difficult", namespaces=None)
    #	if(diff.text == "1"):
    #		nb_box -= 1
    #		continue
    
    bbox_list = np.zeros((nb_box,6))
    
    k = 0
    for obj in obj_list:
      diff = obj.find("difficult", namespaces=None)
      #if(diff.text == "1"):
      #	continue
      oclass = obj.find("name", namespaces=None)
      bndbox = obj.find("bndbox", namespaces=None)
      
      int_class = int(np.where(class_list[:] == oclass.text)[0][0])
      xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_raw/width2
      ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_raw/height2
      xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_raw/width2
      ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_raw/height2
      
      bbox_list[k,:] = np.array([xmin,ymin,xmax,ymax,int_class,0])
      if(diff.text == "1"):
        bbox_list[k,5] = 1
      k += 1
      
    bbs = BoundingBoxesOnImage.from_xyxy_array(bbox_list[:,:4], shape=patch.shape)
    
    patch_aug, bbs_aug = seq_iaa3(image=patch,bounding_boxes=bbs)
    
    for depth in range(0,3):
      input_val[i,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
    
    targets_val[i,:] = 0.0
    targets_val[i,0] = nb_box
    b_pos = 0
    for k in range(0, nb_box):
      l_b = bbs_aug.bounding_boxes[k]
      xmin = l_b.x1
      ymin = l_b.y1
      xmax = l_b.x2
      ymax = l_b.y2
        
      n_xmin = np.clip(xmin, 0, image_size)
      n_ymin = np.clip(ymin, 0, image_size)
      n_xmax = np.clip(xmax, 0, image_size)
      n_ymax = np.clip(ymax, 0, image_size)
      
      frac_in = (abs(n_xmax-n_xmin)*abs(n_ymax-n_ymin))/(abs(xmax-xmin)*abs(ymax-ymin))
      if(frac_in < 0.2 or (abs(n_xmax-n_xmin)*abs(n_ymax-n_ymin) < 16*16)):
        targets_val[i,0] -= 1
        continue
      if(frac_in < 0.5 or (abs(n_xmax-n_xmin)*abs(n_ymax-n_ymin) < 32*32)):
        bbox_list[k,5] = 1
    
      targets_val[i,1+b_pos*8:1+(b_pos+1)*8] = np.array([bbox_list[k,4]+1, n_xmin,n_ymin,0.0,n_xmax,n_ymax,1.0, bbox_list[k,5]])
      b_pos += 1
      
    if(targets_val[i,0] > max_nb_obj_per_image):
      targets_val[i,0] = max_nb_obj_per_image
    
    if(visual_w*visual_h > 0):
      if(visual_iter == 0):
        fig, ax = plt.subplots(visual_h, visual_w, figsize=(2*visual_w,2*visual_h), dpi=180, constrained_layout=True)
      
      c_x = visual_iter // visual_w
      c_y = visual_iter % visual_w
      
      ax[c_x,c_y].imshow(patch_aug)
      ax[c_x,c_y].axis('off')
      
      targ_boxes = targets_val[i]
      for k in range(0, int(targ_boxes[0])):
        xmin = targ_boxes[1+k*8+1]
        ymin = targ_boxes[1+k*8+2]
        xmax = targ_boxes[1+k*8+4]
        ymax = targ_boxes[1+k*8+5]
        p_c = int(targ_boxes[1+k*8+0]) - 1
      
        el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.7, ls="--", fill=False, color=plt.cm.tab20(p_c), zorder=3)
        c_patch = ax[c_x,c_y].add_patch(el)
        c_text = ax[c_x,c_y].text(xmin+6, ymax-10, "%s"%(class_list_short[p_c]), c=plt.cm.tab20(p_c), fontsize=4, clip_on=True)
        c_patch.set_path_effects([path_effects.Stroke(linewidth=1.8, foreground='black'),
                        path_effects.Normal()])
        c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                        path_effects.Normal()])

      
      visual_iter += 1
      if(visual_iter >= visual_w*visual_h):
        plt.savefig("test_mosaic.jpg", dpi=400)
        #plt.show()
        return 
  
  return input_val, targets_val


def free_data_gen():
  global all_im, all_im_prop, input_val, targets_val
  del (all_im, all_im_prop, input_val, targets_val)
  gc.collect()
  return




In [None]:
%%writefile /content/yolo_detector/test_gen.py

import data_gen as gn

gn.init_data_gen()

print("\nOrdered validation examples")
gn.create_val_batch(4,3)

gn.free_data_gen()


In [None]:
# Might need to restart the notebook execution environment to unload a previous data_gen after changes
%cd /content/yolo_detector/

import subprocess
from IPython.display import Image

subprocess.call(["python3", "test_gen.py"])

Image("test_mosaic.jpg")


#### Downloading trained YOLO network

This light network was pre-trained on the ImageNet dataset for classification (simplified to 486 classes with 900 examples of each). It was trained for the most part using 224x224 pixel images, and then refined at a 416x416 resolution. The network reaches 58% Top-1 accuracy on this modified version of the ImageNet classification task.

The first 10 convolutional layers are kept and 3 new randomly initialized convolutional layers are added to form the present YOLO detector. The resulting network was trained for detection using both PASCAL 2012 Trainval and 2007 Trainval images at 416x416 resolution (16551 training images containing a total of 40058 objects, excluding those flagged as difficult).

In [None]:
%%shell

cd /content/yolo_detector/

wget https://share.obspm.fr/s/9LBRTqkpaT5RKK5/download/net_train_pascal_416_bf16_54map_v6.dat


#### Perform network prediction

In [None]:
%%shell

cd /content/yolo_detector/

python3 - <<EOF

import numpy as np
import matplotlib.pyplot as plt
from threading import Thread
import data_gen as gn
import re
import os

import sys
sys.path.insert(0,"/content/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn

load_epoch = 0
if (len(sys.argv) > 1):
	load_epoch = int(sys.argv[1])
	

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

nb_keep_val = 2000
nb_param = 0
nb_class = 20

max_nb_obj_per_image = 56

im_size = 416
nb_box = 5


cnn.init(in_dim=i_ar([im_size,im_size]), in_nb_ch=3, out_dim=1+max_nb_obj_per_image*(7+nb_param+1),
	 b_size=8, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP32C_FP32A")

gn.init_data_gen()

input_val, targets_val = gn.create_val_batch()


cnn.create_dataset("TEST", nb_keep_val, input_val, targets_val)

##### YOLO parameters tuning #####

#Size priors for all possible boxes per grid element
prior_w = f_ar([24.,90.,150.,208.,336.])
prior_h = f_ar([24.,150.,90.,336.,208.])

slopes_and_maxes = cnn.set_slopes_and_maxes(
						position    = cnn.set_sm_single(slope=1.0, fmax=4.5, fmin=-4.5),
						size        = cnn.set_sm_single(slope=0.5, fmax=1.6, fmin=-1.6),
						probability = cnn.set_sm_single(slope=1.0, fmax=4.5, fmin=-4.5),
						objectness  = cnn.set_sm_single(slope=1.0, fmax=4.5, fmin=-4.5),
						classes     = cnn.set_sm_single(slope=1.0, fmax=4.5, fmin=-4.5))

nb_yolo_filters = cnn.set_yolo_params(nb_box = nb_box, nb_class = nb_class, nb_param=nb_param, max_nb_obj_per_image=max_nb_obj_per_image,
				prior_w = prior_w, prior_h = prior_h, slopes_and_maxes = slopes_and_maxes, class_softmax=1, diff_flag=1)

cnn.load("net_train_pascal_416_bf16_54map_v6.dat",0,bin=1)

cnn.forward(repeat=1,no_error=1, saving=2, drop_mode="AVG_MODEL")

EOF

#### Display predicted boxes
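
Before display, each raw prediction is converted from grid-relative offsets to pixel coordinates, and overlapping boxes are compared through an IoU-based score. The sketch below reproduces that arithmetic in plain Python, assuming the offset layout used by the extraction cell (x, y, z, w, h, d). The function names are hypothetical. Note that the score used for NMS in this notebook subtracts the plain center-distance ratio, whereas the usual DIoU definition squares that ratio.

```python
import math


def decode_box(offsets, row, col, prior_w, prior_h, cell_size=32, image_size=416):
    """Convert grid-cell offsets to a clipped (xmin, ymin, xmax, ymax) box."""
    cx = (offsets[0] + col) * cell_size      # box center, x
    cy = (offsets[1] + row) * cell_size      # box center, y
    w = prior_w * math.exp(offsets[3])       # size relative to the prior
    h = prior_h * math.exp(offsets[4])
    return (max(0.0, cx - 0.5 * w), max(0.0, cy - 0.5 * h),
            min(float(image_size), cx + 0.5 * w), min(float(image_size), cy + 0.5 * h))


def iou(a, b):
    """Classical intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def distance_iou(a, b):
    """IoU minus the center distance over the enclosing-box diagonal (unsquared)."""
    enc_w = max(a[2], b[2]) - min(a[0], b[0])
    enc_h = max(a[3], b[3]) - min(a[1], b[1])
    dx = 0.5 * ((a[0] + a[2]) - (b[0] + b[2]))
    dy = 0.5 * ((a[1] + a[3]) - (b[1] + b[3]))
    return iou(a, b) - math.hypot(dx, dy) / math.hypot(enc_w, enc_h)


# Box centered in grid cell (6, 6) with the second size prior of this notebook
print(decode_box((0.5, 0.5, 0.0, 0.0, 0.0, 0.0), row=6, col=6, prior_w=90.0, prior_h=150.0))
print(iou((0, 0, 10, 10), (5, 0, 15, 10)), distance_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Because the distance term is subtracted, this score can be negative for boxes that do not overlap, which is why one NMS condition uses a negative threshold.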

In [None]:
%cd /content/yolo_detector/

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image

import re
import bisect
import os
import sys
from numba import jit

class_list = np.array(["aeroplane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","diningtable","dog","horse", "motorbike",\
    "person","pottedplant","sheep","sofa","train","tvmonitor","background"])
class_list_short = np.array(["plane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","table","dog","horse", "m-bike",\
    "person","p-plant","sheep","sofa","train","tv","background"])

test_list = np.loadtxt("/content/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_test_2007 = 4952
orig_nb_images = 2000
nb_keep_val = 2000

image_size_raw = 480
image_size = 416
nb_box = 5
nb_class = 20
nb_param = 0

max_nb_obj_per_image = 56

yolo_nb_reg = int(image_size/32)
c_size = 32

all_im = np.fromfile("/content/datasets/all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("/content/datasets/all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((nb_keep_val, image_size_raw, image_size_raw, 3)))
all_im_prop = np.reshape(all_im_prop,(nb_keep_val, 4))

load_epoch = 0

prior_w = np.array([24.,90.,150.,208.,336.])
prior_h = np.array([24.,150.,90.,336.,208.])

pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%load_epoch, dtype="float32")
predict = np.reshape(pred_raw, (nb_keep_val, nb_box*(8+nb_param+nb_class),yolo_nb_reg,yolo_nb_reg))

@jit(nopython=True, cache=True, fastmath=False)
def global_to_tile_coord(offset_tab, tile_coords, priors, c_size):
	bx = (offset_tab[0] + tile_coords[1])*c_size
	by = (offset_tab[1] + tile_coords[0])*c_size
	bw = priors[0]*np.exp(offset_tab[3])
	bh = priors[1]*np.exp(offset_tab[4])
	return float(bx), float(by), float(bw), float(bh)
 
@jit(nopython=True, cache=True, fastmath=False)
def box_extraction(c_pred, c_box, c_tile, prob_obj_cases, class_soft_limit):
  c_nb_box = 0
  for i in range(0,yolo_nb_reg):
    for j in range(0,yolo_nb_reg):
      for k in range(0,nb_box):
        offset = int(k*(8+nb_param+nb_class)) #no +1 for box prior in prediction
        c_box[4] = c_pred[offset+6,i,j]
        c_box[5] = c_pred[offset+7,i,j]
        p_c = np.max(c_pred[offset+8:offset+8+nb_class,i,j])
        cl = np.argmax(c_pred[offset+8:offset+8+nb_class,i,j]) 
        
        if(c_box[5]*p_c >= prob_obj_cases[k] and p_c > class_soft_limit[0]):
          bx, by, bw, bh = global_to_tile_coord(c_pred[offset:offset+6,i,j], \
                    np.array([i,j]), np.array([prior_w[k], prior_h[k]]), c_size)
          c_box[0] = max(0,bx - bw*0.5)
          c_box[1] = max(0,by - bh*0.5)
          c_box[2] = min(image_size,bx + bw*0.5)
          c_box[3] = min(image_size,by + bh*0.5)
          
          c_box[6] = k
          c_box[7:] = c_pred[offset+8:offset+8+nb_param+nb_class,i,j]
          c_tile[c_nb_box,:] = c_box[:]
          c_nb_box +=1

  return c_nb_box

@jit(nopython=True, cache=True, fastmath=False)
def fct_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d
	enclose_w = (max(box1[2], box2[2]) - min(box1[0], box2[0]))
	enclose_h = (max(box1[3], box2[3]) - min(box1[1],box2[1]))
	enclose_2d = enclose_w*enclose_h

	cx_a = (box1[2] + box1[0])*0.5; cx_b = (box2[2] + box2[0])*0.5
	cy_a = (box1[3] + box1[1])*0.5; cy_b = (box2[3] + box2[1])*0.5
	dist_cent = np.sqrt((cx_a - cx_b)*(cx_a - cx_b) + (cy_a - cy_b)*(cy_a - cy_b))
	diag_enclose = np.sqrt(enclose_w*enclose_w + enclose_h*enclose_h)

	# DIoU
	return float(inter_2d)/float(uni_2d) - float(dist_cent)/float(diag_enclose)
	# GIoU
	#return float(inter_2d)/float(uni_2d) - float(enclose_2d - uni_2d)/float(enclose_2d)
	
@jit(nopython=True, cache=True, fastmath=False)
def fct_classical_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d

	return float(inter_2d)/float(uni_2d)

#@jit(nopython=True, cache=True, fastmath=False)
def apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold):
  c_nb_box_final = 0
  is_match = 1
  c_box_size_prev = c_nb_box

  while(c_nb_box > 0):
    max_objct = np.argmax(c_tile[:c_box_size_prev,5]*np.amax(c_tile[:c_box_size_prev,7:], axis=1))
    c_box = np.copy(c_tile[max_objct])
    c_tile[max_objct,5] = 0.0
    c_tile_kept[c_nb_box_final] = c_box
    c_nb_box_final += 1
    c_nb_box -= 1
    i = 0
    
    for i in range(0,c_box_size_prev):
      if(c_tile[i,5] < 0.00000001):
        continue
      IoU = fct_IoU(c_box[:4], c_tile[i,:4])
      c_score = c_tile[i,5]*np.max(c_tile[i,7:])
      
      if((IoU > 0.2 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score >= 0.7)
         or (IoU > 0.1 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score < 0.7 and c_score >= 0.1)
         or (IoU > 0.4 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score >= 0.7)
         or (IoU > 0.3 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score < 0.7 and c_score >= 0.1)
         or (IoU > -0.1 and c_score < 0.1)):
        c_tile[i] = 0.0
        c_nb_box -= 1
     
  return c_nb_box_final

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")
patch = np.zeros((image_size, image_size), dtype="float32")

final_boxes = []

#Choice of filters that produce visually appealing results (!= best mAP )
obj_threshold = 3.5*np.array([0.1,0.1,0.1,0.1,0.1])
class_soft_limit = np.array([3.0/nb_class])

nms_threshold = 0.1
#Not used here, context-dependent thresholds are defined in the NMS function



for l in tqdm(range(0,nb_keep_val)):
	c_tile[:,:] = 0.0
	c_tile_kept[:,:] = 0.0

	c_pred = predict[l,:,:,:]
	c_nb_box = box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit)			

	c_nb_box_final = c_nb_box
	c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold)
	final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))
	
targets = np.zeros((nb_keep_val,1+max_nb_obj_per_image*7), dtype="float32")

class_count = np.zeros((nb_class))

for i in tqdm(range(0, nb_keep_val)):
	
	tree = ET.parse("/content/datasets/VOCdevkit/VOC2007/Annotations/"+test_list[i]+".xml")
	root = tree.getroot()
	
	x_offset, y_offset, width2, height2 = all_im_prop[i]
	
	k = 0
	obj_list = root.findall("object", namespaces=None)
	targets[i,0] = len(obj_list)
	for obj in obj_list:
		diff = obj.find("difficult", namespaces=None)
		#if(diff.text == "1"):
		#	targets[i,0] -= 1
		#	continue
		oclass = obj.find("name", namespaces=None)
		bndbox = obj.find("bndbox", namespaces=None)

		int_class = np.where(class_list[:] == oclass.text)
		xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
		ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
		xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
		ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2

		targets[i,1+k*7:1+(k+1)*7] = np.array([int_class[0][0]+1, xmin,ymin,0.0,xmax,ymax,1.0])
		class_count[int_class[0][0]] += 1
		
		k += 1



id_start = 0

nb_w = 4
nb_h = 8

fig, ax = plt.subplots(nb_h, nb_w, figsize=(1.5*nb_w,1.5*nb_h), dpi=210, constrained_layout=True)

for i in range(0, nb_h):
	for j in range(0, nb_w):
		i_d = i*nb_w + j + id_start
		
		c_data = all_im[i_d]/255.0
		ax[i,j].imshow(c_data)
		ax[i,j].axis('off')
		
		im_boxes = final_boxes[i_d]
		
		targ_boxes = targets[i_d]
		for k in range(0, int(targ_boxes[0])):
			xmin = (targ_boxes[1+k*7+1]) *(image_size_raw/image_size)
			ymin = (targ_boxes[1+k*7+2]) *(image_size_raw/image_size)
			xmax = (targ_boxes[1+k*7+4]) *(image_size_raw/image_size)
			ymax = (targ_boxes[1+k*7+5]) *(image_size_raw/image_size)
			p_c = int(targ_boxes[1+k*7+0]) - 1
			
			el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.4, ls="--", fill=False, color=plt.cm.tab20(p_c), zorder=3)
			c_patch = ax[i,j].add_patch(el)
			c_patch.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'),
												path_effects.Normal()])
			

		for k in range(0, np.shape(im_boxes)[0]):
			xmin = max(-0.5,(im_boxes[k,0])*(image_size_raw/image_size) - 0.5)
			ymin = max(-0.5,(im_boxes[k,1])*(image_size_raw/image_size) - 0.5)
			xmax = min(image_size_raw-0.5,(im_boxes[k,2])*(image_size_raw/image_size) - 0.5)
			ymax = min(image_size_raw-0.5,(im_boxes[k,3])*(image_size_raw/image_size) - 0.5)
			
			p_c = np.argmax(im_boxes[k,7:])
			
			el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.4, fill=False, color=plt.cm.tab20(p_c), zorder=3)
			c_patch = ax[i,j].add_patch(el)
			c_text = ax[i,j].text(xmin+5, ymax-4, "%s:%d-%0.2f-%0.2f"%(class_list[p_c],im_boxes[k,6],im_boxes[k,5],np.max(im_boxes[k,7:])), c=plt.cm.tab20(p_c), fontsize=2,clip_on=True)
			c_patch.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'),
												path_effects.Normal()])
			c_text.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'),
												path_effects.Normal()])

#plt.savefig("pred_mosaic.png",dpi=500, bbox_inches='tight')
plt.show()

#### Compute mean Average Precision

mAP@50 is computed only on the first 2000 images of the test dataset.
The mAP on the full PASCAL VOC 2007 test dataset using this network is ~53.94%.
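
As a rough illustration of the metric, the sketch below computes the AP of a ranked detection list in the same spirit as the cell that follows: detections are sorted by score, cumulative true and false positives give precision and recall, the precision curve is replaced by its monotone envelope, and the area is integrated over recall. The names `average_precision`, `scores`, `is_tp` and `nb_targets` are hypothetical and only serve this example; the notebook itself ranks boxes by objectness times the best class probability.

```python
import numpy as np

def average_precision(scores, is_tp, nb_targets):
    """Area under the monotone precision envelope of a ranked detection list."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # best score first
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)         # true positives found so far
    cum_fp = np.cumsum(1.0 - tp)   # false positives found so far
    precision = cum_tp / (cum_tp + cum_fp)
    recall = cum_tp / nb_targets
    # Each precision becomes the best precision reached at equal or higher recall
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    # Trapezoidal integration over recall, starting at the first recall point
    area = np.sum(0.5 * (envelope[1:] + envelope[:-1]) * np.diff(recall))
    return area, recall, envelope

# Toy example (made-up values): 4 detections, 4 ground-truth objects
scores = [0.9, 0.8, 0.7, 0.6]
is_tp = [1, 0, 1, 1]
ap, recall, envelope = average_precision(scores, is_tp, nb_targets=4)
print("AP = %.3f" % ap)
```

Averaging this quantity over the 20 classes, with each class restricted to its own detections and its own target count, gives the mAP.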

In [None]:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image

import re
import bisect
import os
import sys
from numba import jit

class_list = np.array(["aeroplane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","diningtable","dog","horse", "motorbike",\
    "person","pottedplant","sheep","sofa","train","tvmonitor","background"])
class_list_short = np.array(["plane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","table","dog","horse", "m-bike",\
    "person","p-plant","sheep","sofa","train","tv","background"])

test_list = np.loadtxt("/content/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")


nb_test_2007 = 4952
orig_nb_images = 2000
nb_keep_val = 2000

image_size_raw = 480
image_size = 416
nb_box = 5
nb_class = 20
nb_param = 0

max_nb_obj_per_image = 56

yolo_nb_reg = int(image_size/32)
c_size = 32

all_im_prop = np.fromfile("/content/datasets/all_im_prop.dat", dtype="float32")
all_im_prop = np.reshape(all_im_prop,(nb_keep_val, 4))


load_epoch = 0

prior_w = np.array([24.,90.,150.,208.,336.])
prior_h = np.array([24.,150.,90.,336.,208.])


repeat = 1
pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%load_epoch, dtype="float32")
predict_raw = np.reshape(pred_raw, (nb_keep_val*repeat,nb_box*(8+nb_param+nb_class),yolo_nb_reg,yolo_nb_reg))

predict = np.zeros((nb_keep_val,nb_box*(8+nb_param+nb_class),yolo_nb_reg,yolo_nb_reg))

batch_size = 8
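for i in range(0, nb_keep_val//batch_size + 1):
	# Number of images in this batch; only the last one can be partial or empty.
	# (The previous test skipped the last full batch when nb_keep_val is a multiple of batch_size.)
	length = min(batch_size, nb_keep_val - i*batch_size)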
	
	l_pred = np.reshape(predict_raw[i*batch_size*repeat:(i*batch_size+length)*repeat],(repeat, length, nb_box*(8+nb_param+nb_class),yolo_nb_reg,yolo_nb_reg))
	predict[i*batch_size:i*batch_size+length] = np.mean(l_pred, axis=0)
	
	
targets = np.zeros((nb_keep_val,1+max_nb_obj_per_image*8), dtype="float32")

class_count = np.zeros((nb_class))

for i in range(0, nb_keep_val):
	i_d = i
	
	tree = ET.parse("/content/datasets/VOCdevkit/VOC2007/Annotations/"+test_list[i_d]+".xml")
	root = tree.getroot()
	
	x_offset, y_offset, width2, height2 = all_im_prop[i_d]
	
	k = 0
	obj_list = root.findall("object", namespaces=None)
	targets[i,0] = len(obj_list)
	for obj in obj_list:
		diff = obj.find("difficult", namespaces=None)
		#if(diff.text == "1"):
		#	targets[i,0] -= 1
		#	continue
		oclass = obj.find("name", namespaces=None)
		bndbox = obj.find("bndbox", namespaces=None)

		int_class = np.where(class_list[:] == oclass.text)
		xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_raw/width2 *(image_size/image_size_raw)
		ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_raw/height2 *(image_size/image_size_raw)
		xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_raw/width2 *(image_size/image_size_raw)
		ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_raw/height2 *(image_size/image_size_raw)

		targets[i,1+k*8:1+(k+1)*8] = np.array([int_class[0][0]+1, xmin,ymin,0.0,xmax,ymax,1.0,0.0])
		if(diff.text != "1"):
			class_count[int_class[0][0]] += 1
		else:
			targets[i,1+k*8+7] = 1.0
			
		k += 1


@jit(nopython=True, cache=True, fastmath=False)
def global_to_tile_coord(offset_tab, tile_coords, priors, c_size):
	bx = (offset_tab[0] + tile_coords[1])*c_size
	by = (offset_tab[1] + tile_coords[0])*c_size
	bw = priors[0]*np.exp(offset_tab[3])
	bh = priors[1]*np.exp(offset_tab[4])
	return float(bx), float(by), float(bw), float(bh)
 
@jit(nopython=True, cache=True, fastmath=False)
def box_extraction(c_pred, c_box, c_tile, prob_obj_cases, class_soft_limit):
  c_nb_box = 0
  for i in range(0,yolo_nb_reg):
    for j in range(0,yolo_nb_reg):
      for k in range(0,nb_box):
        offset = int(k*(8+nb_param+nb_class)) #no +1 for box prior in prediction
        c_box[4] = c_pred[offset+6,i,j]
        c_box[5] = c_pred[offset+7,i,j]
        p_c = np.max(c_pred[offset+8:offset+8+nb_class,i,j])
        cl = np.argmax(c_pred[offset+8:offset+8+nb_class,i,j])
        
        #print (np.sum(c_pred[offset+8:offset+8+nb_class,i,j]),c_pred[offset+8:offset+8+nb_class,i,j])
        
        if(c_box[5] >= prob_obj_cases[k] and p_c > class_soft_limit[0]):
          bx, by, bw, bh = global_to_tile_coord(c_pred[offset:offset+6,i,j], \
                    np.array([i,j]), np.array([prior_w[k], prior_h[k]]), c_size)
          c_box[0] = max(0,bx - bw*0.5)
          c_box[1] = max(0,by - bh*0.5)
          c_box[2] = min(image_size,bx + bw*0.5)
          c_box[3] = min(image_size,by + bh*0.5)
          
          c_box[6] = k
          c_box[7:] = c_pred[offset+8:offset+8+nb_param+nb_class,i,j]
          c_tile[c_nb_box,:] = c_box[:]
          c_nb_box +=1

  return c_nb_box

@jit(nopython=True, cache=True, fastmath=False)
def fct_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d
	enclose_w = (max(box1[2], box2[2]) - min(box1[0], box2[0]))
	enclose_h = (max(box1[3], box2[3]) - min(box1[1],box2[1]))
	enclose_2d = enclose_w*enclose_h

	cx_a = (box1[2] + box1[0])*0.5; cx_b = (box2[2] + box2[0])*0.5
	cy_a = (box1[3] + box1[1])*0.5; cy_b = (box2[3] + box2[1])*0.5
	dist_cent = np.sqrt((cx_a - cx_b)*(cx_a - cx_b) + (cy_a - cy_b)*(cy_a - cy_b))
	diag_enclose = np.sqrt(enclose_w*enclose_w + enclose_h*enclose_h)

	# DIoU
	return float(inter_2d)/float(uni_2d) - float(dist_cent)/float(diag_enclose)
	# GIoU
	#return float(inter_2d)/float(uni_2d) - float(enclose_2d - uni_2d)/float(enclose_2d)


@jit(nopython=True, cache=True, fastmath=False)
def fct_classical_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d

	return float(inter_2d)/float(uni_2d)


#@jit(nopython=True, cache=True, fastmath=False)
def apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold):
  c_nb_box_final = 0
  c_box_size_prev = c_nb_box

  while(c_nb_box > 0):
    max_objct = np.argmax(c_tile[:c_box_size_prev,5]*np.amax(c_tile[:c_box_size_prev,7:], axis=1))
    c_box = np.copy(c_tile[max_objct])
    c_tile[max_objct,5] = 0.0
    c_tile_kept[c_nb_box_final] = c_box
    c_nb_box_final += 1
    c_nb_box -= 1
    i = 0
    for i in range(0,c_box_size_prev):
      if(c_tile[i,5] < 0.00000001):
        continue
      IoU = fct_IoU(c_box[:4], c_tile[i,:4])
      c_score = c_tile[i,5]*np.max(c_tile[i,7:])
      
      if((IoU > 0.2 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score >= 0.7)
         or (IoU > 0.1 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score < 0.7 and c_score >= 0.1)
         or (IoU > 0.4 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score >= 0.7)
         or (IoU > 0.3 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score < 0.7 and c_score >= 0.1)
         or (IoU > -0.1 and c_score < 0.1)):
         c_tile[i] = 0.0
         c_nb_box -= 1
     
  return c_nb_box_final


c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")

final_boxes = []

#Choice of filters that produce visually appealing results (!= best mAP )
obj_threshold = 1.0*np.array([0.1,0.1,0.1,0.1,0.1])
class_soft_limit = np.array([4.0/nb_class])

nms_threshold = 0.1
#Not used here, context-dependent thresholds are defined in the NMS fct

for l in range(0,nb_keep_val):
	c_tile[:,:] = 0.0
	c_tile_kept[:,:] = 0.0

	c_pred = predict[l,:,:,:]
	c_nb_box = box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit)			

	c_nb_box_final = c_nb_box
	c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold)
	final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))

flat_pred_boxes = np.vstack(final_boxes)

AP_IoU_val = 0.5

recall_precision = np.empty((nb_keep_val), dtype="object")

print("Find associations ...", flush=True)

for i_d in range(0, nb_keep_val):
 
     recall_precision[i_d] = np.zeros((np.shape(final_boxes[i_d])[0], 6))
     
     if(np.shape(final_boxes[i_d])[0] == 0):
         continue
     
     recall_precision[i_d][:,0] = np.amax(final_boxes[i_d][:,7:], axis=1)
     recall_precision[i_d][:,1] = final_boxes[i_d][:,5]
     
     recall_precision[i_d][:,5] = np.argmax(final_boxes[i_d][:,7:], axis=1)
     
     kept_boxes = targets[i_d]
     
     IoU_table = np.zeros((int(kept_boxes[0]),np.shape(final_boxes[i_d])[0])) - 2.0
     
     for i in range(0,int(kept_boxes[0])):
         for j in range(0,np.shape(final_boxes[i_d])[0]):
             xmin = (kept_boxes[1+i*8+1])
             ymin = (kept_boxes[1+i*8+2])
             xmax = (kept_boxes[1+i*8+4])
             ymax = (kept_boxes[1+i*8+5])
             c_kept_box = np.array([xmin, ymin, xmax, ymax])
             IoU_table[i,j] = fct_classical_IoU(c_kept_box, final_boxes[i_d][j,:4])
             
     # Loop over the true boxes to find best prediction associated
     for i in range(0,int(kept_boxes[0])):
         best_match_id = np.unravel_index(np.argmax(IoU_table),np.shape(IoU_table))
         
         best_match_IoU = IoU_table[best_match_id]
         
         IoU_table[best_match_id[0],:] = -2.0
         
         if (best_match_IoU >= AP_IoU_val and np.argmax(final_boxes[i_d][best_match_id[1],7:]) == int(kept_boxes[1+best_match_id[0]*8+0]-1)):
         #if(c_IoU >= AP_IoU_val):
             recall_precision[i_d][best_match_id[1],2] = 1
             recall_precision[i_d][best_match_id[1],3] = best_match_id[1]
             recall_precision[i_d][best_match_id[1],4] = best_match_IoU
             IoU_table[:,best_match_id[1]] = -2.0
             # A matched "difficult" target is added to the class count here, so difficult objects only count when detected
             if(kept_boxes[1+best_match_id[0]*8+7] > 0.99):
                 class_count[int(kept_boxes[1+best_match_id[0]*8+0]-1)] += 1
	

print("Process and flatten the mAP result")
flatten = np.vstack(recall_precision.flatten())

recall_precision_f = np.zeros((np.shape(flatten)[0], 10))
recall_precision_f[:,:6] = flatten[:,:]

recall_precision_fs = (recall_precision_f[(recall_precision_f[:,1]*recall_precision_f[:,0]).argsort()])[::-1]

recall_precision_fs[:,6] = np.cumsum(recall_precision_fs[:,2])
recall_precision_fs[:,7] = np.cumsum(1.0 - recall_precision_fs[:,2])
recall_precision_fs[:,8] = recall_precision_fs[:,6] / (recall_precision_fs[:,6]+recall_precision_fs[:,7])
recall_precision_fs[:,9] = recall_precision_fs[:,6] / np.sum(class_count)

np.savetxt("rec_prec.txt", recall_precision_fs)

interp_curve = np.maximum.accumulate(recall_precision_fs[::-1,8])[::-1]

AP_all = np.trapz(interp_curve, recall_precision_fs[:,9])
print ("AP_all (%.2f): %f%%"%(AP_IoU_val, AP_all*100.0))

    
plt.figure(figsize=(4*1.0,3*1.0), dpi=200, constrained_layout=True)
plt.plot(recall_precision_fs[:,9], recall_precision_fs[:,8])
plt.plot(recall_precision_fs[:,9], interp_curve, label="New")
plt.xlabel(r"Recall")
plt.ylabel(r"Precision")
plt.title("All classes as one AP curve", fontsize=8)

#print (class_count)
sumAP = 0
print ("**** Per class AP ****")
fig, ax = plt.subplots(figsize=(4*1.3,3*1.3), dpi=200, constrained_layout=True)
plt.xlabel(r"Recall")
plt.ylabel(r"Precision")
for k in range(0, nb_class):
	index = np.where(recall_precision_fs[:,5] == k)
	l_recall_precision_fs = recall_precision_fs[index[0]]
	l_recall_precision_fs[:,6] = np.cumsum(l_recall_precision_fs[:,2])
	l_recall_precision_fs[:,7] = np.cumsum(1.0 - l_recall_precision_fs[:,2])
	l_recall_precision_fs[:,8] = l_recall_precision_fs[:,6] / (l_recall_precision_fs[:,6]+l_recall_precision_fs[:,7])
	l_recall_precision_fs[:,9] = l_recall_precision_fs[:,6] / class_count[k]
	 
	interp_curve = np.maximum.accumulate(l_recall_precision_fs[::-1,8])[::-1]
	
	AP = np.trapz(interp_curve, l_recall_precision_fs[:,9])
	sumAP += AP
	
	plt.plot(l_recall_precision_fs[:,9], interp_curve, label=class_list_short[k],c=plt.cm.tab20(k))
	
	print("AP %-8s: %5.2f%%     Total: %4d - T: %4d - F: %4d"%(class_list_short[k], AP*100.0, class_count[k], l_recall_precision_fs[-1,6], l_recall_precision_fs[-1,7]))
plt.legend(bbox_to_anchor=(1.02,0.98), fontsize=8)
plt.title("Per class AP curve", fontsize=8)

print ("\n**** mAP (%.2f): %f%% ****"%(AP_IoU_val, sumAP/nb_class*100.0))

plt.savefig("AP_curve_@%.2f_per_class.jpg"%(AP_IoU_val))
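
To make the decoding step used by `global_to_tile_coord` and `box_extraction` above more concrete, here is a minimal standalone sketch for a single predicted box. The function name `decode_cell` and its arguments are hypothetical; the formulas mirror the ones in the cell above (position offset plus grid index times the cell size, prior size scaled by an exponential, then clipping to the image).

```python
import numpy as np

def decode_cell(offsets, row, col, prior_wh, cell_size=32, image_size=416):
    """Turn one predicted box of grid cell (row, col) into clipped corner coordinates."""
    # offsets[0], offsets[1]: box centre inside the cell, in cell units
    cx = (offsets[0] + col) * cell_size
    cy = (offsets[1] + row) * cell_size
    # offsets[3], offsets[4]: log-scale factors applied to the prior size
    # (indices 2 and 5 are left unused, as in the notebook's function)
    w = prior_wh[0] * np.exp(offsets[3])
    h = prior_wh[1] * np.exp(offsets[4])
    xmin = max(0.0, cx - 0.5 * w)
    ymin = max(0.0, cy - 0.5 * h)
    xmax = min(float(image_size), cx + 0.5 * w)
    ymax = min(float(image_size), cy + 0.5 * h)
    return np.array([xmin, ymin, xmax, ymax])

# Box centred in cell (6, 6) with the 90x150 prior and no size correction
box = decode_cell(np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0]), 6, 6, (90.0, 150.0))
# Box in the corner cell with the 336x208 prior, clipped at the image border
corner = decode_cell(np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0]), 0, 0, (336.0, 208.0))
print(box, corner)
```

With a 416-pixel input and 32-pixel cells the grid is 13x13, so cell (6, 6) is the central one and the first box is centred on pixel (208, 208).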

## **4 - The YOLO object detector**

Prediction on an external image
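
An external image rarely matches the 416x416 network input, so the script below resizes it while preserving the aspect ratio, pads it with black borders, and later maps the predicted boxes back to the original pixel frame. The sketch below isolates that coordinate bookkeeping, assuming the same conventions as the script (`ratio`, `top`, `left`); the helper names `letterbox_params` and `to_original` are hypothetical, and the image size used is made up for the example.

```python
def letterbox_params(height, width, target=416):
    """Scale factor and top/left padding used to fit an image in a square input."""
    ratio = float(target) / max(height, width)
    new_h, new_w = int(height * ratio), int(width * ratio)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    return ratio, top, left

def to_original(box, ratio, top, left):
    """Map (xmin, ymin, xmax, ymax) from the padded network frame to the original image."""
    xmin, ymin, xmax, ymax = box
    return (int((xmin - left) / ratio), int((ymin - top) / ratio),
            int((xmax - left) / ratio), int((ymax - top) / ratio))

# Hypothetical 832x416 (width x height) image: scaled by 0.5, padded above and below
ratio, top, left = letterbox_params(height=416, width=832)
orig_box = to_original((100, 154, 200, 254), ratio, top, left)
print(ratio, top, left, orig_box)
```

Removing the padding before dividing by the scale factor is what keeps the drawn rectangles aligned with the objects in the original image.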

In [None]:
%%shell

cd /content/datasets/

wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/dog.jpg

In [None]:
%%shell

cd /content/yolo_detector/

python3 - <<EOF


import numpy as np
import cv2
import matplotlib.pyplot as plt
from PIL import Image  # required by make_square below

import sys
sys.path.insert(0,"/content/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn

class_list_short = np.array(["plane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","table","dog","horse", "m-bike",\
    "person","p-plant","sheep","sofa","train","tv","background"])

img_path = "/content/datasets/dog.jpg"

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

def make_square(im, min_size, fill_color=(0, 0, 0, 0)):
    x, y = im.size
    size = max(min_size, x, y)
    new_im = Image.new('RGB', (size, size), fill_color)
    new_im.paste(im, (int((size - x) / 2), int((size - y) / 2)))
    return new_im

#@jit(nopython=True, cache=True, fastmath=False)
def global_to_tile_coord(offset_tab, tile_coords, priors, c_size):
	bx = (offset_tab[0] + tile_coords[1])*c_size
	by = (offset_tab[1] + tile_coords[0])*c_size
	bw = priors[0]*np.exp(offset_tab[3])
	bh = priors[1]*np.exp(offset_tab[4])
	return float(bx), float(by), float(bw), float(bh)
 
#@jit(nopython=True, cache=True, fastmath=False)
def box_extraction(c_pred, c_box, c_tile, prob_obj_cases, class_soft_limit):
  c_nb_box = 0
  for i in range(0,yolo_nb_reg):
    for j in range(0,yolo_nb_reg):
      for k in range(0,nb_box):
        offset = int(k*(8+nb_param+nb_class)) #no +1 for box prior in prediction
        c_box[4] = c_pred[offset+6,i,j]
        c_box[5] = c_pred[offset+7,i,j]
        p_c = np.max(c_pred[offset+8:offset+8+nb_class,i,j])
        cl = np.argmax(c_pred[offset+8:offset+8+nb_class,i,j]) 
        
        if(c_box[5]*p_c >= prob_obj_cases[k] and p_c > class_soft_limit[0]):
          bx, by, bw, bh = global_to_tile_coord(c_pred[offset:offset+6,i,j], \
                    np.array([i,j]), np.array([prior_w[k], prior_h[k]]), c_size)
          c_box[0] = max(0,bx - bw*0.5 - 1)
          c_box[1] = max(0,by - bh*0.5 - 1)
          c_box[2] = min(image_size,bx + bw*0.5 + 1)
          c_box[3] = min(image_size,by + bh*0.5 + 1)
          
          c_box[6] = k
          c_box[7:] = c_pred[offset+8:offset+8+nb_param+nb_class,i,j]
          c_tile[c_nb_box,:] = c_box[:]
          c_nb_box +=1

  return c_nb_box

#@jit(nopython=True, cache=True, fastmath=False)
def fct_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d
	enclose_w = (max(box1[2], box2[2]) - min(box1[0], box2[0]))
	enclose_h = (max(box1[3], box2[3]) - min(box1[1],box2[1]))
	enclose_2d = enclose_w*enclose_h

	cx_a = (box1[2] + box1[0])*0.5; cx_b = (box2[2] + box2[0])*0.5
	cy_a = (box1[3] + box1[1])*0.5; cy_b = (box2[3] + box2[1])*0.5
	dist_cent = np.sqrt((cx_a - cx_b)*(cx_a - cx_b) + (cy_a - cy_b)*(cy_a - cy_b))
	diag_enclose = np.sqrt(enclose_w*enclose_w + enclose_h*enclose_h)

	return float(inter_2d)/float(uni_2d) - float(enclose_2d - uni_2d)/float(enclose_2d)


#@jit(nopython=True, cache=True, fastmath=False)
def apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold):
  c_nb_box_final = 0
  is_match = 1
  c_box_size_prev = c_nb_box

  while(c_nb_box > 0):
    max_objct = np.argmax(c_tile[:c_box_size_prev,5]*np.amax(c_tile[:c_box_size_prev,7:], axis=1))
    c_box = np.copy(c_tile[max_objct])
    c_tile[max_objct,5] = 0.0
    c_tile_kept[c_nb_box_final] = c_box
    c_nb_box_final += 1
    c_nb_box -= 1
    i = 0
    
    for i in range(0,c_box_size_prev):
      if(c_tile[i,5] < 0.00000001):
        continue
      IoU = fct_IoU(c_box[:4], c_tile[i,:4])
      c_score = c_tile[i,5]*np.max(c_tile[i,7:])
      
      if((IoU > 0.2 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score >= 0.7)
        or (IoU > 0.1 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score < 0.7 and c_score >= 0.1)
        or (IoU > 0.4 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score >= 0.7)
        or (IoU > 0.3 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score < 0.7 and c_score >= 0.1)
        or (IoU > -0.1 and c_score < 0.1)):
        c_tile[i] = 0.0
        c_nb_box -= 1
     
  return c_nb_box_final


img = cv2.imread(img_path, 1)

image_size = 416
nb_box = 5
nb_class = 20
nb_param = 0

max_nb_obj_per_image = 56

yolo_nb_reg = int(image_size/32)
c_size = 32

input_t = np.zeros((1,image_size*image_size*3), dtype="float32")
targets_t = np.zeros((1,1+max_nb_obj_per_image*(7+1)), dtype="float32")

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=1+max_nb_obj_per_image*(7+nb_param+1),
	 b_size=1, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP32C_FP32A")

prior_w = f_ar([24.,90.,150.,208.,336.])
prior_h = f_ar([24.,150.,90.,336.,208.])

slopes_and_maxes = cnn.set_slopes_and_maxes(
						position    = cnn.set_sm_single(slope=1.0, fmax=4.5, fmin=-4.5),
						size        = cnn.set_sm_single(slope=0.5, fmax=1.6, fmin=-1.6),
						probability = cnn.set_sm_single(slope=1.0, fmax=4.5, fmin=-4.5),
						objectness  = cnn.set_sm_single(slope=1.0, fmax=4.5, fmin=-4.5),
						classes     = cnn.set_sm_single(slope=1.0, fmax=4.5, fmin=-4.5))

nb_yolo_filters = cnn.set_yolo_params(nb_box = nb_box, nb_class = nb_class, nb_param=nb_param, max_nb_obj_per_image=max_nb_obj_per_image,
				prior_w = prior_w, prior_h = prior_h, slopes_and_maxes = slopes_and_maxes, class_softmax=1, diff_flag=1)


cnn.load("net_train_pascal_416_bf16_54map_v6.dat",0,bin=1)

obj_threshold = 3.0*np.array([0.1,0.1,0.1,0.1,0.1])
class_soft_limit = np.array([0.5])

nms_threshold = 0.0

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")
patch = np.zeros((image_size, image_size), dtype="float32")


old_size = img.shape[:2]
ratio = float(image_size)/max(old_size)
new_size = tuple([int(x*ratio) for x in old_size])

resized_img = cv2.resize(img, (new_size[1], new_size[0]), interpolation=cv2.INTER_AREA)

delta_w = image_size - new_size[1]
delta_h = image_size - new_size[0]
top, bottom = delta_h//2, delta_h-(delta_h//2)
left, right = delta_w//2, delta_w-(delta_w//2)

color = [0, 0, 0]
proc_img = cv2.copyMakeBorder(resized_img, top, bottom, left, right,
	cv2.BORDER_CONSTANT, value=color)


for depth in range(0,3):
	input_t[0,depth*image_size*image_size:(depth+1)*image_size*image_size] = proc_img[:,:,2-depth].flatten("C")/255.0

cnn.create_dataset("TEST", 1, input_t, targets_t, silent=1)

cnn.forward(no_error=1, saving=2, silent=1)

cnn.delete_dataset("TEST", silent=1)

pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%(0), dtype="float32")
predict = np.reshape(pred_raw, (1, nb_box*(8+nb_param+nb_class),yolo_nb_reg,yolo_nb_reg))

c_tile[:,:] = 0.0
c_tile_kept[:,:] = 0.0

c_pred = predict[0,:,:,:]
c_nb_box = box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit)

c_nb_box_final = c_nb_box
c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold)
#c_tile_kept = c_tile
final_boxes = np.copy(c_tile_kept[0:c_nb_box_final])

print (final_boxes)

for k in range(0,c_nb_box_final):
	# Column 6 stores the prior index (see box_extraction), so this test skips boxes from the first (24x24) prior
	if(final_boxes[k,6] < 0.3):
		continue
	
	xmin = int((final_boxes[k,0] - left)/ratio)
	ymin = int((final_boxes[k,1] - top)/ratio)
	xmax = int((final_boxes[k,2] - left)/ratio)
	ymax = int((final_boxes[k,3] - top)/ratio)
	
	p_c = np.argmax(final_boxes[k,7:])
		
	color = ((np.array(plt.cm.tab20(p_c))*255.0).astype("uint8")[0:3]).tolist()
	
	text = class_list_short[p_c]+" %0.2f"%(final_boxes[k,5])
	cv2.rectangle(img,(xmin,ymin),(xmax,ymax),color,2)
	(w, h), _ = cv2.getTextSize(text=text, 
		fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, thickness=1)
	cv2.rectangle(img,(xmin,ymax-30),(xmin+w+15,ymax),color,-1)
	cv2.putText(img, text=text, org=(xmin+10,ymax-8),
		fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=1)
		
cv2.imwrite("/content/yolo_detector/pred_img.png", img)


EOF

In [None]:
from IPython.display import Image

Image("/content/yolo_detector/pred_img.png")
