# **DL IRMIA summer school : Object Detection with YOLO**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Deyht/IRMIA_2022/blob/main/IRMIA_DL_Summer_school_2022_Object_Detection_with_YOLO_full_v2.ipynb)

## **Introduction - Notebook Setup**

**Important notes**:   
1) Due to RAM limits on the free Colab version, the notebook kernel might crash at some points if running it all at once or if re-running specific cells multiple times. A simple restart of the runtime kernel (Runtime -> Restart runtime) will solve the issue without losing the locally saved files (datasets, network saves, framework, etc.). Then simply re-run from the group of cells that crashed.

Each **independent** part of the notebook has been verified to run on the free version of Colab.

2) The Introduction part, which includes dataset download/formatting and the CIANNA framework installation, must be run every time the runtime is fully shut down and disconnected, as it is used in all parts A, B, and C.


---


**Link to the slides accompanying the notebook**  
https://github.com/Deyht/IRMIA_2022/blob/main/DL_obj_detetion_with_YOLO_slides_full_v2.pdf


<a name="repo_cloning"></a>
### **1\. Clone the associated Git repository**

In [None]:
%%shell

git clone https://github.com/Deyht/IRMIA_2022

cd /content/IRMIA_2022/pre_trained_nets/
tar -xvzf pre_trained_nets_v2.tar.gz


<a name="data_download"></a>
### **2\. PASCAL VOC 2012 and 2007**



####  Dataset download


In [None]:
%%shell

cd IRMIA_2022/

mkdir -p datasets
cd datasets

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

tar -xf VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar


<a name="data_format"></a>
#### Format dataset

In [None]:
%cd /content/IRMIA_2022/datasets/

import numpy as np
from tqdm import tqdm
from PIL import Image

def make_square(im, min_size, fill_color=(0, 0, 0, 0)):
    x, y = im.size
    size = max(min_size, x, y)
    new_im = Image.new('RGB', (size, size), fill_color)
    new_im.paste(im, (int((size - x) / 2), int((size - y) / 2)))
    return new_im

train_list_2012 = np.loadtxt("VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007  = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
nb_keep_val = 4952
image_size = 288
nb_class = 20

all_im = np.zeros((orig_nb_images, image_size, image_size, 3), dtype="uint8")
all_im_prop = np.zeros((orig_nb_images, 4), dtype="float32")

for i in tqdm(range(0, orig_nb_images)):

	if(i < nb_train_2012):
		im = Image.open("VOCdevkit/VOC2012/JPEGImages/"+train_list_2012[i]+".jpg")
	elif(i < nb_train_2012+nb_train_2007):
		im = Image.open("VOCdevkit/VOC2007/JPEGImages/"+train_list_2007[i - nb_train_2012]+".jpg")
	else:
		im = Image.open("VOCdevkit/VOC2007/JPEGImages/"+test_list_2007[i - nb_train_2012 - nb_train_2007]+".jpg")
	
	width, height = im.size

	im = make_square(im, image_size)
	width2, height2 = im.size

	x_offset = int((width2 - width)*0.5)
	y_offset = int((height2 - height)*0.5)

	all_im_prop[i] = [x_offset, y_offset, width2, height2]

	im = im.resize((image_size,image_size))
	im_array = np.asarray(im)
	for depth in range(0,3):
		all_im[i,:,:,depth] = im_array[:,:,depth]

all_im.tofile("all_im.dat")
all_im_prop.tofile("all_im_prop.dat")
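
The formatting cell pads each image to a square, resizes it to `image_size`, and stores `[x_offset, y_offset, width2, height2]` in `all_im_prop`. The later cells use these values to move annotation boxes into the resized frame. As a minimal sketch of that mapping (the helper name `rescale_box` is hypothetical, and unlike the notebook code it does not truncate to integers before scaling):

```python
def rescale_box(box, x_offset, y_offset, padded_w, padded_h, image_size=288):
    """Map an (xmin, ymin, xmax, ymax) box from original-image pixels
    to pixels in the padded and resized square image."""
    xmin, ymin, xmax, ymax = box
    sx = image_size / padded_w   # horizontal scale after padding
    sy = image_size / padded_h   # vertical scale after padding
    return ((xmin + x_offset) * sx, (ymin + y_offset) * sy,
            (xmax + x_offset) * sx, (ymax + y_offset) * sy)

# Example: a 500x375 image is padded to 500x500,
# so x_offset = 0 and y_offset = int((500 - 375) * 0.5) = 62
print(rescale_box((100, 50, 300, 200), 0, 62, 500, 500))
```

Only the offsets and the padded size are needed, which is why the original image width and height are not saved.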


#### Dataset summary statistics

In [None]:
%cd /content/IRMIA_2022/datasets/

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

train_list_2012 = np.loadtxt("VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007  = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
nb_keep_val = 4952
image_size = 288
nb_class = 20

object_list = np.zeros((orig_nb_images,1+nb_class))

for i in tqdm(range(0, orig_nb_images)):
	
  if(i < nb_train_2012):
    tree = ET.parse("VOCdevkit/VOC2012/Annotations/"+train_list_2012[i]+".xml")
  elif(i < nb_train_2012+nb_train_2007):
    tree = ET.parse("VOCdevkit/VOC2007/Annotations/"+train_list_2007[i - nb_train_2012]+".xml")
  else:
    tree = ET.parse("VOCdevkit/VOC2007/Annotations/"+test_list_2007[i - nb_train_2012 - nb_train_2007]+".xml")
  root = tree.getroot()

  im_obj_list = root.findall("object", namespaces=None)
  object_list[i,0] = len(im_obj_list)
  for obj in im_obj_list:
    diff = obj.find("difficult", namespaces=None)
    if(diff.text == "1"):
      object_list[i,0] -= 1
      continue
    oclass = obj.find("name", namespaces=None)
    int_class = np.where(class_list[:] == oclass.text)[0] + 1
    object_list[i,int_class] += 1

plt.rcParams.update({'font.size': 6})

all_dat = np.sum(object_list[:,1:],axis=0)
train_dat = np.sum(object_list[:orig_nb_images-nb_keep_val,1:],axis=0)
val_dat = np.sum(object_list[orig_nb_images-nb_keep_val:,1:],axis=0)

print("%8s"%("Total"),end="")
for k in range(0,nb_class):
  print("%8s"%class_list_short[k],end="")
print("")
print("%8d"%np.sum(all_dat),end="")
for k in range(0,nb_class):
  print("%8d"%all_dat[k], end="")
print("")
print("%8d"%np.sum(train_dat),end="")
for k in range(0,nb_class):
  print("%8d"%train_dat[k], end="")
print("")
print("%8d"%np.sum(val_dat),end="")
for k in range(0,nb_class):
  print("%8d"%val_dat[k], end="")
print("")
print("")

plt.subplots(figsize=(6,2),dpi=190, constrained_layout=True)
plt.bar(np.arange(0,nb_class)-0.2, all_dat, width=0.2, align="center", label="All")
plt.bar(np.arange(0,nb_class), train_dat, width=0.2, align="center", label="Train")
plt.bar(np.arange(0,nb_class)+0.2, val_dat, width=0.2, align="center", label="Val")
plt.xticks(np.arange(0,nb_class), class_list[:nb_class], fontsize=6, rotation = 45)
plt.legend()
#plt.yscale('log')
plt.show()

all_dat = all_dat / np.max(all_dat)
train_dat = train_dat / np.max(train_dat)
val_dat = val_dat / np.max(val_dat)

plt.subplots(figsize=(6,2),dpi=190, constrained_layout=True)
plt.bar(np.arange(0,nb_class)-0.2, all_dat, width=0.2, align="center", label="All")
plt.bar(np.arange(0,nb_class), train_dat, width=0.2, align="center", label="Train")
plt.bar(np.arange(0,nb_class)+0.2, val_dat, width=0.2, align="center", label="Val")
plt.xticks(range(0,nb_class), class_list[:nb_class], fontsize=6, rotation = 45)
plt.legend()
#plt.yscale('log')
plt.show()
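
The statistics above count, for each annotation file, the objects of each class while skipping those flagged as `difficult`. The same logic can be sketched on a single annotation. The XML string below is a hypothetical, heavily shortened stand-in for a PASCAL VOC annotation file, and `count_objects` is an illustrative helper, not part of the notebook:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, shortened annotation following the PASCAL VOC layout
SAMPLE_XML = """
<annotation>
  <object><name>dog</name><difficult>0</difficult></object>
  <object><name>person</name><difficult>0</difficult></object>
  <object><name>person</name><difficult>1</difficult></object>
</annotation>
"""

def count_objects(xml_text):
    """Count non-difficult objects per class name in one annotation."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for obj in root.findall("object"):
        if obj.find("difficult").text == "1":
            continue  # difficult objects are ignored, as in the cell above
        counts[obj.find("name").text] += 1
    return counts

counts = count_objects(SAMPLE_XML)
print(dict(counts))  # {'dog': 1, 'person': 1}
```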


In [None]:

all_im = np.fromfile("all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size, image_size, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))
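
The `.dat` files are raw binary dumps: `tofile` writes only the array bytes, so the `dtype` and the shape have to be supplied again when reading. A small round trip on a stand-in array (the file name and sizes are chosen for illustration only) shows the pattern:

```python
import os
import tempfile
import numpy as np

# Small stand-in for all_im: 2 images of 4x4 pixels with 3 channels
images = np.arange(2 * 4 * 4 * 3, dtype="uint8").reshape(2, 4, 4, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "images.dat")
    images.tofile(path)                       # raw bytes only, no header
    flat = np.fromfile(path, dtype="uint8")   # read back as a 1D array

restored = flat.reshape(2, 4, 4, 3)           # the shape must be given by hand
print(flat.shape, np.array_equal(images, restored))  # (96,) True
```

Reading with the wrong `dtype` or reshaping with the wrong dimensions would not necessarily raise an error, so the constants used here must match those of the formatting cell.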


In [None]:
id_start = 0 #define the beginning of the series, then display nb_w * nb_h examples

nb_w = 4
nb_h = 8

fig, ax = plt.subplots(figsize=(5.0,0.4), dpi=180, constrained_layout=True)
ax.axis('off')
fig.patch.set_facecolor('black')

for k in range(0, nb_class):
	ax.text(k%10*0.12, k//10*0.5, class_list_short[k], color=plt.cm.tab20(k), fontsize=8)

plt.show()
print("")

fig, ax = plt.subplots(nb_h, nb_w, figsize=(1.5*nb_w,1.5*nb_h), dpi=210, constrained_layout=True)

for i in range(0, nb_h):
  for j in range(0, nb_w):
    i_d = j + i*nb_w + id_start

    x_offset, y_offset, width2, height2 = all_im_prop[orig_nb_images - nb_keep_val + i_d]

    c_data = all_im[orig_nb_images - nb_keep_val + i_d]/255.0
    ax[i,j].imshow(c_data)
    ax[i,j].axis('off')

    tree = ET.parse("VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007 - nb_keep_val + i_d]+".xml")
    root = tree.getroot()
    
    obj_list = root.findall("object", namespaces=None)
    for obj in obj_list:
      diff = obj.find("difficult", namespaces=None)
      if(diff.text == "1"):
        continue
      oclass = obj.find("name", namespaces=None)
      bndbox = obj.find("bndbox", namespaces=None)

      int_class = np.where(class_list[:] == oclass.text)[0][0]
      xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
      ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
      xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
      ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2

      el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.8, ls="--", fill=False, color=plt.cm.tab20(int_class), zorder=3)
      c_patch = ax[i,j].add_patch(el)
      c_text = ax[i,j].text(xmin+4, ymin+15, "%s"%(class_list_short[int_class]), c=plt.cm.tab20(int_class), fontsize=6, clip_on=True)
      c_patch.set_path_effects([path_effects.Stroke(linewidth=2.0, foreground='black'),
                       path_effects.Normal()])
      c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                       path_effects.Normal()])

#plt.savefig("target_mosaic.png", dpi=250)
plt.show()

In [None]:
#Free the RAM before going further in the notebook
#A RUNTIME RESTART IS ADVISED

del (all_im, all_im_prop)

<a name="cianna_install"></a>

### **3\. DL Framework (CIANNA) installation**

#### Query GPU allocation and properties


In [None]:
%%shell

nvidia-smi

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

make 

./deviceQuery | grep Capability | cut -c50- > ~/cuda_infos.txt
./deviceQuery | grep "CUDA Driver Version / Runtime Version" | cut -c57- >> ~/cuda_infos.txt

cd ~/

#### Clone CIANNA git repository

A specific commit is checked out to protect the notebook from incompatibilities introduced by future CIANNA updates.

In [None]:
%%shell

cd /content/IRMIA_2022/

git clone https://github.com/Deyht/CIANNA

cd CIANNA
git checkout 93058ec

#### Compiling CIANNA for the allocated GPU generation

There is no guaranteed forward or backward compatibility between Nvidia GPU generations, and some capabilities are generation-specific. For these reasons, CIANNA must be given the GPU generation of the platform at compile time.
The following cell will automatically update all the necessary files based on the detected GPU, and compile CIANNA.
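
To make the flag selection easier to follow, here is a plain Python sketch of the arithmetic done with `awk` in the compilation cell. The function name `cuda_compile_flags` is hypothetical, and the thresholds are simply copied from that cell:

```python
def cuda_compile_flags(comp_cap, cuda_vers, old_limit=11.1):
    """Mirror the flag selection of the shell cell in plain Python."""
    sm_val = int(round(float(comp_cap) * 10))   # e.g. "7.5" -> 75
    if sm_val >= 80:
        gen_flag = "-D GEN_AMPERE"
    elif sm_val >= 70:
        gen_flag = "-D GEN_VOLTA"
    else:
        gen_flag = ""                            # older generations: no flag
    # CUDA versions below the limit get the compatibility flag
    old_flag = "-D CUDA_OLD" if float(cuda_vers) < old_limit else ""
    return "sm_%d" % sm_val, gen_flag, old_flag

print(cuda_compile_flags("7.5", "11.2"))  # ('sm_75', '-D GEN_VOLTA', '')
```

For instance, a GPU with compute capability 7.5 would be compiled with `-arch=sm_75` and the Volta flag.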

In [None]:
%%shell

cd /content/IRMIA_2022/CIANNA

mult="10"
cat ~/cuda_infos.txt
comp_cap="$(sed '1!d' ~/cuda_infos.txt)"
cuda_vers="$(sed '2!d' ~/cuda_infos.txt)"

lim="11.1"
old_arg=$(awk '{if ($1 < $2) print "-D CUDA_OLD";}' <<<"${cuda_vers} ${lim}")

sm_val=$(awk '{print $1*$2}' <<<"${mult} ${comp_cap}")

gen_val=$(awk '{if ($1 >= 80) print "-D GEN_AMPERE"; else if($1 >= 70) print "-D GEN_VOLTA";}' <<<"${sm_val}")

sed -i "s/.*arch=sm.*/\\t\tcuda_arg=\"\$cuda_arg -D CUDA -D comp_CUDA -lcublas -lcudart -arch=sm_$sm_val $old_arg $gen_val\"/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" src/python_module_setup.py

pyth_ver=$(python3 -c 'import sys; print("%d.%d"%(sys.version_info[:][0], sys.version_info[:][1]))')

sed -i "s/\/lib.linux-x86_64-[0-9].[0-9]/\/lib.linux-x86_64-$pyth_ver/g" ex_script.py

./compile.cp CUDA PY_INTERF

mv src/build/lib.linux-x86_64-* src/build/lib.linux-x86_64

#### Testing CIANNA installation

**IMPORTANT NOTE**   
CIANNA is mainly used in a script fashion and was not designed to run in notebooks. Every code cell that directly invokes CIANNA functions must be run as a script to avoid possible errors.  
To do so, the cell must have the following structure.

```
%%shell

cd /content/IRMIA_2022/CIANNA

python3 - <<EOF

[... your python code ...]

EOF
```

This syntax makes it easy to edit Python code in the notebook while running the cell as a script. Note that none of the notebook variables can be accessed from the cell in this context.
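
The isolation between the notebook and a script can be illustrated without CIANNA. The sketch below uses `subprocess` as a stand-in for the `%%shell` heredoc (an assumption made for illustration; the notebook itself uses the structure shown above): the child process starts with a fresh interpreter and cannot see variables defined in the parent.

```python
import subprocess
import sys

notebook_variable = 42  # lives only in the current (notebook) process

# Code run as a separate script, like the body of the heredoc
script = """
try:
    print(notebook_variable)
except NameError:
    print("notebook_variable is not defined in the script")
"""

result = subprocess.run([sys.executable, "-c", script],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())  # notebook_variable is not defined in the script
```

Any value needed by the script therefore has to be redefined inside it or read from a file.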


In [None]:
%%shell

cd /content/IRMIA_2022/CIANNA

tar -xvzf mnist.tar.gz

In [None]:
%%shell


#Strictly equivalent to ex_script.py in the CIANNA repo 

cd /content/IRMIA_2022/CIANNA

python3 - <<EOF


import numpy as np
import matplotlib.pyplot as plt
#Access the locally compiled version

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn

############################################################################
##              Data reading (your mileage may vary)
############################################################################

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

print ("Reading inputs ... ", end = "", flush=True)

#Loading binary files
data = np.fromfile("mnist_dat/mnist_input.dat", dtype="float32")
data = np.reshape(data, (80000,28*28))
target = np.fromfile("mnist_dat/mnist_target.dat", dtype="float32")
target = np.reshape(target, (80000,10))


data_train = data[:60000,:]
data_valid = data[60000:70000,:]
data_test  = data[70000:80000,:]

target_train = target[:60000,:]
target_valid = target[60000:70000,:]
target_test  = target[70000:80000,:]

print ("Done !", flush=True)

############################################################################
##               CIANNA network construction and use
############################################################################

#Details about the functions and parameters are given in the GitHub Wiki

cnn.init(in_dim=i_ar([28,28]), in_nb_ch=1, out_dim=10, \
		bias=0.1, b_size=24, comp_meth="C_CUDA", dynamic_load=1, mixed_precision="FP32C_FP32A") #Change to C_BLAS or C_NAIV


cnn.create_dataset("TRAIN", size=60000, input=data_train, target=target_train)
cnn.create_dataset("VALID", size=10000, input=data_valid, target=target_valid)
cnn.create_dataset("TEST", size=10000, input=data_test, target=target_test)

#Used to load a saved network at a given epoch
#With load_step = 0, the network is trained from scratch
load_step = 0
if(load_step > 0):
	cnn.load("net_save/net0_s%04d.dat"%(load_step), load_step)
else:
  cnn.conv(f_size=i_ar([5,5]), nb_filters=32, padding=i_ar([2,2]), activation="RELU")
  cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
  cnn.conv(f_size=i_ar([5,5]), nb_filters=64, padding=i_ar([2,2]), activation="RELU")
  cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
  cnn.dense(nb_neurons=256, activation="RELU", drop_rate=0.5)
  cnn.dense(nb_neurons=128, activation="RELU", drop_rate=0.2)
  cnn.dense(nb_neurons=10, activation="SMAX")

cnn.train(nb_epoch=10, learning_rate=0.0004, momentum=0.9, confmat=1, save_every=0)
#Change save_every in previous function to save network weights
cnn.perf_eval()


#Run a forward pass to save the network predictions (comment out to skip)
cnn.forward(repeat=1, drop_mode="AVG_MODEL")

del (data_train, target_train, data_valid, target_valid, data_test, target_test)


EOF
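
As a quick check of the MNIST architecture above, the spatial size after each layer can be computed by hand. The sketch below assumes a convolution stride of 1 and a pooling stride equal to the pooling size; these are assumptions, since neither is set explicitly in the cell, and `conv_out` is a hypothetical helper:

```python
def conv_out(size, f_size, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - f_size) // stride + 1

size = 28
size = conv_out(size, f_size=5, padding=2)   # conv 5x5, padding 2 -> 28
size = conv_out(size, f_size=2, stride=2)    # max pool 2x2        -> 14
size = conv_out(size, f_size=5, padding=2)   # conv 5x5, padding 2 -> 14
size = conv_out(size, f_size=2, stride=2)    # max pool 2x2        -> 7

nb_flat = size * size * 64   # 64 filters in the last convolution
print(size, nb_flat)         # 7 3136
```

Under these assumptions the first dense layer receives 3136 values per image.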



---



## **A - Simple classifier on PASCAL VOC**

### **1\. Train and valid data generation**

In [None]:
%%shell

cd /content/IRMIA_2022/
mkdir -p classifier
cd classifier

#### Dynamic data generator

In [None]:
%%writefile /content/IRMIA_2022/classifier/data_gen.py

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image
import os

import imgaug as ia
import imgaug.augmenters as iaa

class_list = np.array(["aeroplane", "bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse", "motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor"])
class_list_short = np.array(["plane", "bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv"])

train_list_2012 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007	= np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

def roll_zeropad(a, shift):
	a = np.roll(a, shift[0], axis = 1)
	if(shift[0] >= 0):
		a[:,0:shift[0]] = 0
	else:
		a[:,image_size_orig+shift[0]:] = 0
	a = np.roll(a, shift[1], axis = 0)
	if(shift[1] >= 0):
		a[0:shift[1],:] = 0
	else:
		a[image_size_orig+shift[1]:,:] = 0
	return a


def init_data_gen():
	global nb_train_2012, nb_train_2007, nb_test_2007, orig_nb_images, nb_class
	global nb_images_per_batch, nb_keep_val, nb_obj_val, image_size, image_size_orig, seq_iaa
	global input_data, targets, input_val, targets_val, all_im, all_im_prop

	nb_train_2012 = 11540
	nb_train_2007 = 5011
	nb_test_2007 = 4952
	orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
	nb_keep_val = 4952 #keep in 2007 test
	nb_images_per_batch = 4000
	nb_obj_val = 11831

	nb_class = 20
	image_size_orig = 288
	image_size = 96


	seq_iaa = iaa.Sequential([
			iaa.Fliplr(0.5),
			iaa.Flipud(0.1),
			iaa.Sometimes(0.1, iaa.GaussianBlur(sigma=(0, 0.5))),
			iaa.LinearContrast((0.75, 1.5)),
			iaa.Sometimes(0.1, iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255), per_channel=0.5)),
			iaa.Multiply((0.8,1.2), per_channel=0.2),
			iaa.Affine(translate_percent={"x": (-0.1,0.1), "y": (-0.1,0.1)}),
			iaa.Sometimes(0.5,iaa.Affine(scale={"x": (0.9,1.1), "y": (0.9,1.1)})),
			iaa.Sometimes(0.2,iaa.Affine(rotate=(-10,10),shear=(-6,6))),
			iaa.Sometimes(0.02,iaa.Grayscale(alpha=(0.0, 1.0)))
		])

	all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
	all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
	all_im = np.reshape(all_im, ((orig_nb_images, image_size_orig, image_size_orig, 3)))
	all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

	input_data = np.zeros((nb_images_per_batch,image_size*image_size*3), dtype="float32")
	targets = np.zeros((nb_images_per_batch,nb_class), dtype="float32")

	input_val = np.zeros((nb_obj_val,image_size*image_size*3), dtype="float32")
	targets_val = np.zeros((nb_obj_val,nb_class), dtype="float32")


def create_train_batch(visual_w=0,visual_h=0):
	visual_iter = 0
	for i in range(0, nb_images_per_batch):
		
		i_d = np.random.randint(0,orig_nb_images - nb_keep_val)
		if(i_d < nb_train_2012):
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/Annotations/"+train_list_2012[i_d]+".xml")
		elif(i_d < nb_train_2012+nb_train_2007):
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+train_list_2007[i_d - nb_train_2012]+".xml")
		else:
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[i_d - nb_train_2012 - nb_train_2007]+".xml")
		root = tree.getroot()
		
		patch = np.copy(all_im[i_d])
		x_offset, y_offset, width2, height2 = all_im_prop[i_d]

		im_obj_list = root.findall("object", namespaces=None)
		k = 0
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			else:
				bndbox = obj.find("bndbox", namespaces=None)
				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				
				width = (xmax-xmin); height = (ymax-ymin)
				if(width*height < 196):
					continue
				
				k += 1
				
		nb_obj = k
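		if(nb_obj == 0):
			# Note: "i -= 1" has no effect inside a Python "for ... in range" loop,
			# so this slot is skipped and keeps its previous content instead of being redrawn
			i -= 1
			continue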
		
		
		obj_id = np.random.randint(0,nb_obj)
		k = 0
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			else:
				bndbox = obj.find("bndbox", namespaces=None)
				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				
				width = (xmax-xmin); height = (ymax-ymin)
				if(width*height < 196):
					continue
			if(obj_id == k):
				break
			else:
				k += 1
		
		oclass = obj.find("name", namespaces=None)
		int_class = int(np.where(class_list[:] == oclass.text)[0][0])
		l_targ = np.zeros(nb_class)
		l_targ[int_class] = 1
		targets[i,:] = np.copy(l_targ)
		
		im = Image.fromarray(patch)
		max_size = max((xmax-xmin),(ymax-ymin))
		c_x = (xmin+xmax)/2.0; c_y = (ymin+ymax)/2.0
		xmin = max(0,int(c_x - 0.5*max_size)); xmax = min(image_size_orig,int(c_x + 0.5*max_size))
		ymin = max(0,int(c_y - 0.5*max_size)); ymax = min(image_size_orig,int(c_y + 0.5*max_size))
		
		im_loc = im.crop((xmin,ymin,xmax,ymax))	
		im_loc = im_loc.resize((image_size,image_size), Image.NEAREST)
		im_array = np.asarray(im_loc)
		
		patch_aug = seq_iaa(image=im_array)
		
		if(visual_w*visual_h > 0):
			if(visual_iter == 0):
				fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
			
			c_x = visual_iter // visual_w
			c_y = visual_iter % visual_w
			
			ax[c_x,c_y].imshow(patch_aug)
			ax[c_x,c_y].axis('off')
			c_text = ax[c_x,c_y].text(image_size/2, 8, "%s"%(class_list_short[int_class]),
				ha="center", fontsize=10, clip_on=True, color="white")
			c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                       path_effects.Normal()])
			
			visual_iter += 1
			if(visual_iter >= visual_w*visual_h):
				plt.show()
				return
		
		for depth in range(0,3):
			input_data[i,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
		
	return input_data, targets


def create_val_batch(visual_w=0, visual_h=0):
	visual_iter = 0

	k = 0
	for i in range(0, nb_keep_val):
				
		tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007 - nb_keep_val + i]+".xml")
		root = tree.getroot()
		
		patch = np.copy(all_im[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i])
		x_offset, y_offset, width2, height2 = all_im_prop[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i]

		im = Image.fromarray(patch)

		im_obj_list = root.findall("object", namespaces=None)
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			else:
				bndbox = obj.find("bndbox", namespaces=None)
				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				
				width = (xmax-xmin); height = (ymax-ymin)
				if(width*height < 196):
					continue
			
			oclass = obj.find("name", namespaces=None)
			int_class = int(np.where(class_list[:] == oclass.text)[0][0])
			l_targ = np.zeros(nb_class)
			l_targ[int_class] = 1
			targets_val[k,:] = np.copy(l_targ)

			max_size = max((xmax-xmin),(ymax-ymin))
			c_x = (xmin+xmax)/2.0; c_y = (ymin+ymax)/2.0
			xmin = max(0,int(c_x - 0.5*max_size)); xmax = min(image_size_orig,int(c_x + 0.5*max_size))
			ymin = max(0,int(c_y - 0.5*max_size)); ymax = min(image_size_orig,int(c_y + 0.5*max_size))
		
			im_loc = im.crop((xmin,ymin,xmax,ymax))	
			im_loc = im_loc.resize((image_size,image_size), Image.NEAREST)
		
			im_array = np.asarray(im_loc)
		
			if(visual_w*visual_h > 0):
				if(visual_iter == 0):
					fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
				
				c_x = visual_iter // visual_w
				c_y = visual_iter % visual_w
				
				ax[c_x,c_y].imshow(im_array)
				ax[c_x,c_y].axis('off')
				c_text = ax[c_x,c_y].text(image_size/2, 8, "%s"%(class_list_short[int_class]),
					ha="center", fontsize=10, clip_on=True, color="white")
				c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                       path_effects.Normal()])
				
				visual_iter += 1
				if(visual_iter >= visual_w*visual_h):
					plt.show()
					return
		
			for depth in range(0,3):
				input_val[k,depth*image_size*image_size:(depth+1)*image_size*image_size] = im_array[:,:,depth].flatten("C")/255.0
			k+=1
			#print (k)
		
	return input_val, targets_val

def free_data_gen():
  global all_im, all_im_prop, input_data, targets, input_val, targets_val
  del (all_im, all_im_prop, input_data, targets, input_val, targets_val)
  return
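
Both generator functions cut a square window around each object before resizing it to `image_size`. The window computation can be isolated as follows; this is a sketch that copies the arithmetic of the generator into a hypothetical helper named `square_crop_box`:

```python
def square_crop_box(xmin, ymin, xmax, ymax, image_size_orig=288):
    """Square window centred on a box, clamped to the image borders."""
    max_size = max(xmax - xmin, ymax - ymin)
    c_x = (xmin + xmax) / 2.0
    c_y = (ymin + ymax) / 2.0
    return (max(0, int(c_x - 0.5 * max_size)),
            max(0, int(c_y - 0.5 * max_size)),
            min(image_size_orig, int(c_x + 0.5 * max_size)),
            min(image_size_orig, int(c_y + 0.5 * max_size)))

print(square_crop_box(100, 120, 140, 220))  # tall box well inside the image
print(square_crop_box(10, 20, 210, 60))     # wide box near the top edge
```

The first call gives `(70, 120, 170, 220)`, a 100x100 window. The second gives `(10, 0, 210, 140)`: once clamped at the border the window is no longer square, so the subsequent resize stretches the content.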


#### Training image examples

In [None]:
%%writefile /content/IRMIA_2022/classifier/test_gen.py

import data_gen as gn1

gn1.init_data_gen()

print("Random augmented training examples")
gn1.create_train_batch(4,3)

print("\nOrdered validation examples")
gn1.create_val_batch(4,3)

gn1.free_data_gen()


In [None]:
# Might need to restart the notebook execution environment to unload the previous data_gen after changes
%cd /content/IRMIA_2022/classifier/

%run test_gen.py
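
The generator stores each image as a flat vector in which all values of the first channel come first, then the second channel, then the third. The following sketch, on a small random stand-in patch, checks that the per-channel loop used in `data_gen.py` is equivalent to moving the channel axis first and flattening:

```python
import numpy as np

image_size = 4  # small stand-in for the 96-pixel patches
patch = np.random.randint(0, 256, (image_size, image_size, 3), dtype="uint8")

# Loop version, as in the generator
flat_loop = np.zeros(image_size * image_size * 3, dtype="float32")
for depth in range(3):
    start = depth * image_size * image_size
    stop = start + image_size * image_size
    flat_loop[start:stop] = patch[:, :, depth].flatten("C") / 255.0

# Equivalent vectorised version: move channels first, then flatten
flat_vec = (np.transpose(patch, (2, 0, 1)).reshape(-1) / 255.0).astype("float32")

print(np.allclose(flat_loop, flat_vec))  # True
```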


### **2\. Training the classifier**


In [None]:
%%shell

cd /content/IRMIA_2022/classifier/

python3 - <<EOF

import numpy as np
from threading import Thread
import data_gen as gn1

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn


def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

def data_augm():
	input_data, targets = gn1.create_train_batch()
	cnn.delete_dataset("TRAIN_buf", silent=1)
	cnn.create_dataset("TRAIN_buf", nb_images_per_batch, input_data[:,:], targets[:,:], silent=1)
	return

nb_images_per_batch = 4000
nb_obj_val = 11831
nb_class = 20
image_size = 96

nb_augm = 1000
epoch_per_augm = 5

# -1 will load the provided pre-trained network.
# Switch to 0 to train from scratch,
# or to the value corresponding to an existing network save.
load_epoch = -1
# Increase the number of augmentations (nb_augm) below
# to continue training the pre-trained network
if(load_epoch == -1):
	nb_augm = 2

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=nb_class,
	 b_size=16, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP32C_FP32A")

print ("Loading the dataset ...")

gn1.init_data_gen()

input_data, targets = gn1.create_train_batch()
input_val, targets_val = gn1.create_val_batch()

cnn.create_dataset("TRAIN", nb_images_per_batch, input_data[:,:], targets[:,:])
cnn.create_dataset("VALID", nb_obj_val, input_val[:,:], targets_val[:,:])

if(load_epoch == -1):
	cnn.load("/content/IRMIA_2022/pre_trained_nets/classifier_net0_s8000.dat",8000, bin=1)
elif(load_epoch > 0):
	cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:
	cnn.conv(f_size=i_ar([3,3]), nb_filters=16, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=32, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=64, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=128, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=128, padding=i_ar([1,1]), activation="RELU")
	cnn.dense(nb_neurons=256, activation="RELU", drop_rate=0.2)
	cnn.dense(nb_neurons=nb_class, activation="SMAX")


for batch_augm in range(0,nb_augm): #runs nb_augm x epoch_per_augm epochs (1000 x 5 when training from scratch)
		
	t = Thread(target=data_augm)
	t.start()
	
	cnn.train(nb_epoch=epoch_per_augm, learning_rate=0.003, end_learning_rate=0.00005, 
				decay=0.001, momentum=0.5, shuffle_every=1, confmat=1, 
				control_interv=5, save_every=100, silent=1, TC_scale_factor=16.0, save_bin=1)
	if(batch_augm == 0):
		cnn.perf_eval()

	t.join()
	
	cnn.swap_data_buffers("TRAIN")


gn1.free_data_gen()
del (input_data, targets, input_val, targets_val)

EOF
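
The training loop above overlaps data augmentation and training: while the network trains on the current batch, a background thread prepares the next one, and the two are exchanged at the end of each iteration. The pattern can be sketched without CIANNA. Here `buffers`, `make_batch`, and `fill_buffer` are hypothetical stand-ins, and the tuple swap only mimics the role that `cnn.swap_data_buffers("TRAIN")` appears to play in the cell above:

```python
from threading import Thread

buffers = {"TRAIN": None, "TRAIN_buf": None}

def make_batch(batch_id):
    """Stand-in for the data generator: returns a labelled dummy batch."""
    return "batch_%d" % batch_id

def fill_buffer(batch_id):
    buffers["TRAIN_buf"] = make_batch(batch_id)

used = []
buffers["TRAIN"] = make_batch(0)
for step in range(3):
    t = Thread(target=fill_buffer, args=(step + 1,))
    t.start()                      # next batch is prepared in the background
    used.append(buffers["TRAIN"])  # stand-in for training on the current batch
    t.join()                       # wait until the next batch is ready
    # Exchange the current and the freshly generated batch
    buffers["TRAIN"], buffers["TRAIN_buf"] = buffers["TRAIN_buf"], buffers["TRAIN"]

print(used)  # ['batch_0', 'batch_1', 'batch_2']
```

The `join` before the swap matters: it guarantees that the background buffer is complete before it becomes the training set.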



---



## **B - Sliding window detector**

### **1\. Train and valid data generation**


In [None]:
%%shell

cd /content/IRMIA_2022/
mkdir -p sliding_window
cd sliding_window

#### Adding a "background class" to the data generator

In [None]:
%%writefile /content/IRMIA_2022/sliding_window/data_gen.py

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image
import os

import imgaug as ia
import imgaug.augmenters as iaa

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

train_list_2012 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007	= np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")


def fct_inter(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h

	return float(inter_2d)


def init_data_gen():
	global nb_train_2012, nb_train_2007, nb_test_2007, orig_nb_images, nb_class
	global nb_images_per_batch, nb_keep_val, nb_empty_val, nb_obj_val, image_size, image_size_orig, seq_iaa
	global input_data, targets, input_val, targets_val, all_im, all_im_prop

	nb_train_2012 = 11540
	nb_train_2007 = 5011
	nb_test_2007 = 4952
	orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
	nb_keep_val = 4952 #keep in 2007 test
	nb_images_per_batch = 4000
	nb_obj_val = 11831
	nb_empty_val = 1000

	nb_class = 20
	image_size_orig = 288
	image_size = 96


	seq_iaa = iaa.Sequential([
			iaa.Fliplr(0.5),
			iaa.Flipud(0.1),
			iaa.Sometimes(0.1, iaa.GaussianBlur(sigma=(0, 0.5))),
			iaa.LinearContrast((0.75, 1.5)),
			iaa.Sometimes(0.1, iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255), per_channel=0.5)),
			iaa.Multiply((0.8,1.2), per_channel=0.2),
			iaa.Affine(translate_percent={"x": (-0.1,0.1), "y": (-0.1,0.1)}),
			iaa.Sometimes(0.5,iaa.Affine(scale={"x": (0.9,1.1), "y": (0.9,1.1)})),
			iaa.Sometimes(0.2,iaa.Affine(rotate=(-10,10),shear=(-6,6))),
			iaa.Sometimes(0.02,iaa.Grayscale(alpha=(0.0, 1.0)))
		])

	all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
	all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
	all_im = np.reshape(all_im, ((orig_nb_images, image_size_orig, image_size_orig, 3)))
	all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

	input_data = np.zeros((nb_images_per_batch,image_size*image_size*3), dtype="float32")
	targets = np.zeros((nb_images_per_batch,nb_class+1), dtype="float32")

	input_val = np.zeros((nb_obj_val+nb_empty_val,image_size*image_size*3), dtype="float32")
	targets_val = np.zeros((nb_obj_val+nb_empty_val,nb_class+1), dtype="float32")


def create_train_batch(visual_w=0,visual_h=0):
	visual_iter = 0
	for i in range(0, nb_images_per_batch):
		
		i_d = np.random.randint(0,orig_nb_images - nb_keep_val)
		if(i_d < nb_train_2012):
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/Annotations/"+train_list_2012[i_d]+".xml")
		elif(i_d < nb_train_2012+nb_train_2007):
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+train_list_2007[i_d - nb_train_2012]+".xml")
		else:
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[i_d - nb_train_2012 - nb_train_2007]+".xml")
		root = tree.getroot()
		
		patch = np.copy(all_im[i_d])
		x_offset, y_offset, width2, height2 = all_im_prop[i_d]

		# classical object cutout
		if(np.random.random() > 0.2):
			im_obj_list = root.findall("object", namespaces=None)
			k = 0
			for obj in im_obj_list:
				diff = obj.find("difficult", namespaces=None)
				if(diff.text == "1"):
					continue
				else:
					bndbox = obj.find("bndbox", namespaces=None)
					xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
					ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
					xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
					ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
					
					width = (xmax-xmin); height = (ymax-ymin)
					if(width*height < 196):
						continue
					
					k += 1
					
			nb_obj = k
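			if(nb_obj == 0):
				# Note: reassigning i has no effect inside a Python "for ... in range" loop,
				# so this slot is simply skipped and keeps its previous input/target content
				i -= 1
				continue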
			
			obj_id = np.random.randint(0,nb_obj)
			k = 0
			for obj in im_obj_list:
				diff = obj.find("difficult", namespaces=None)
				if(diff.text == "1"):
					continue
				else:
					bndbox = obj.find("bndbox", namespaces=None)
					xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
					ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
					xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
					ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
					
					width = (xmax-xmin); height = (ymax-ymin)
					if(width*height < 196):
						continue
				if(obj_id == k):
					break
				else:
					k += 1
			
			oclass = obj.find("name", namespaces=None)
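			# Take the first matching index explicitly (int() on a 1-element array is deprecated in recent NumPy)
			int_class = int(np.where(class_list[:] == oclass.text)[0][0])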
			l_targ = np.zeros(nb_class+1)
			l_targ[int_class] = 1
			targets[i,:] = np.copy(l_targ)
			
			im = Image.fromarray(patch)
			max_size = max((xmax-xmin),(ymax-ymin))
			c_x = (xmin+xmax)/2.0; c_y = (ymin+ymax)/2.0
			xmin = max(0,int(c_x - 0.5*max_size)); xmax = min(image_size_orig,int(c_x + 0.5*max_size))
			ymin = max(0,int(c_y - 0.5*max_size)); ymax = min(image_size_orig,int(c_y + 0.5*max_size))
			
			im_loc = im.crop((xmin,ymin,xmax,ymax))	
			im_loc = im_loc.resize((image_size,image_size), Image.NEAREST)
			im_array = np.asarray(im_loc)
		
		else:
			found = 0
			l_size = 160
			try_per_size = 10

			int_class = 20
			l_targ = np.zeros(nb_class+1)
			l_targ[nb_class] = 1
			targets[i,:] = np.copy(l_targ)

			im_obj_list = root.findall("object", namespaces=None)
			box_list = np.zeros((len(im_obj_list),4))
			k = 0
			for obj in im_obj_list:
				bndbox = obj.find("bndbox", namespaces=None)

				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				box_list[k,:] = np.array([xmin,ymin,xmax,ymax])
				k += 1

			count_per_size = 0
			while((not found) and (l_size >= 0)):
				size = l_size + 32

				c_x = np.random.random()*(image_size_orig - size) + size/2
				c_y = np.random.random()*(image_size_orig - size) + size/2

				xmin = int(c_x - 0.5*size); xmax = int(c_x + 0.5*size)
				ymin = int(c_y - 0.5*size); ymax = int(c_y + 0.5*size)

				c_box = np.array([xmin, ymin, xmax, ymax])

				im_obj_list = root.findall("object", namespaces=None)
				inter_count = 0
				for l in range(0,len(im_obj_list)):
					loc_inter = fct_inter(c_box, box_list[l,:])
					if(loc_inter > 0.0):
						inter_count += 1

				if(inter_count == 0):
					found = 1

				count_per_size += 1
				if(count_per_size >= try_per_size):
					count_per_size = 0
					l_size -= 32

			if(not found):
				im_array = np.zeros((image_size,image_size,3),dtype="uint8")

			else:

				im = Image.fromarray(patch)

				im_loc = im.crop((xmin,ymin,xmax,ymax))
				im_loc = im_loc.resize((image_size,image_size), Image.NEAREST)

				im_array = np.asarray(im_loc)
  
		patch_aug = seq_iaa(image=im_array)
		
		if(visual_w*visual_h > 0):
			if(visual_iter == 0):
				fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
			
			c_x = visual_iter // visual_w
			c_y = visual_iter % visual_w
			
			ax[c_x,c_y].imshow(patch_aug)
			ax[c_x,c_y].axis('off')
			c_text = ax[c_x,c_y].text(image_size/2, 8, class_list_short[int_class],
				ha="center", fontsize=10, clip_on=True, color="white")
			c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
											 path_effects.Normal()])
			
			visual_iter += 1
			if(visual_iter >= visual_w*visual_h):
				plt.show()
				return
		
		for depth in range(0,3):
			input_data[i,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
		
	return input_data, targets


def create_val_batch(visual_w=0, visual_h=0):
	visual_iter = 0

	loc = 0
	for i in range(0, nb_keep_val):
				
		tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007 - nb_keep_val + i]+".xml")
		root = tree.getroot()
		
		patch = np.copy(all_im[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i])
		x_offset, y_offset, width2, height2 = all_im_prop[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i]

		im = Image.fromarray(patch)

		im_obj_list = root.findall("object", namespaces=None)
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			else:
				bndbox = obj.find("bndbox", namespaces=None)
				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				
				width = (xmax-xmin); height = (ymax-ymin)
				if(width*height < 196):
					continue
			
			oclass = obj.find("name", namespaces=None)
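			# Take the first matching index explicitly (int() on a 1-element array is deprecated in recent NumPy)
			int_class = int(np.where(class_list[:] == oclass.text)[0][0])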
			l_targ = np.zeros(nb_class+1)
			l_targ[int_class] = 1
			targets_val[loc,:] = np.copy(l_targ)

			max_size = max((xmax-xmin),(ymax-ymin))
			c_x = (xmin+xmax)/2.0; c_y = (ymin+ymax)/2.0
			xmin = max(0,int(c_x - 0.5*max_size)); xmax = min(image_size_orig,int(c_x + 0.5*max_size))
			ymin = max(0,int(c_y - 0.5*max_size)); ymax = min(image_size_orig,int(c_y + 0.5*max_size))
		
			im_loc = im.crop((xmin,ymin,xmax,ymax))	
			im_loc = im_loc.resize((image_size,image_size), Image.NEAREST)
		
			im_array = np.asarray(im_loc)
		
			if(visual_w*visual_h > 0):
				if(visual_iter == 0):
					fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
				
				c_x = visual_iter // visual_w
				c_y = visual_iter % visual_w
				
				ax[c_x,c_y].imshow(im_array)
				ax[c_x,c_y].axis('off')
				c_text = ax[c_x,c_y].text(image_size/2, 8, class_list_short[int_class],
					ha="center", fontsize=10, clip_on=True, color="white")
				c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
											 path_effects.Normal()])
				
				visual_iter += 1
				if(visual_iter >= visual_w*visual_h):
					plt.show()
					return
		
			for depth in range(0,3):
				input_val[loc,depth*image_size*image_size:(depth+1)*image_size*image_size] = im_array[:,:,depth].flatten("C")/255.0
			loc+=1
	print (loc)
	
	for i in range(0, nb_empty_val):

		i_d = np.random.randint(0,nb_keep_val)

		patch = np.copy(all_im[orig_nb_images-nb_keep_val + i_d])

		x_offset, y_offset, width2, height2 = all_im_prop[orig_nb_images - nb_keep_val + i_d]

		tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007 - nb_keep_val + i_d]+".xml")
		root = tree.getroot()

		im = Image.fromarray(patch)

		found = 0
		l_size = 160
		try_per_size = 10

		int_class = 20
		l_targ = np.zeros(nb_class+1)
		l_targ[nb_class] = 1
		targets_val[loc+i,:] = np.copy(l_targ)

		im_obj_list = root.findall("object", namespaces=None)
		box_list = np.zeros((len(im_obj_list),4))
		k = 0
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			
			bndbox = obj.find("bndbox", namespaces=None)
			
			xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
			ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
			xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
			ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
			box_list[k,:] = np.array([xmin,ymin,xmax,ymax])
			k += 1

		count_per_size = 0
		while((not found) and (l_size >= 0)):
			size = l_size + 32
			
			c_x = np.random.random()*(image_size_orig - size) + size/2
			c_y = np.random.random()*(image_size_orig - size) + size/2
			
			xmin = int(c_x - 0.5*size); xmax = int(c_x + 0.5*size)
			ymin = int(c_y - 0.5*size); ymax = int(c_y + 0.5*size)
			
			c_box = np.array([xmin, ymin, xmax, ymax])
			
			im_obj_list = root.findall("object", namespaces=None)
			inter_count = 0
			for l in range(0,len(im_obj_list)):
				loc_inter = fct_inter(c_box, box_list[l,:])
				if(loc_inter > 0.0):
					inter_count += 1
			
			if(inter_count == 0):
				found = 1
			
			count_per_size += 1
			if(count_per_size >= try_per_size):
				count_per_size = 0
				l_size -= 32

		if(not found):
			im_array = np.zeros((image_size,image_size,3), dtype="float32")
			
		else:
			
			im = Image.fromarray(patch)

			im_loc = im.crop((xmin,ymin,xmax,ymax))
			im_loc = im_loc.resize((image_size,image_size), Image.NEAREST)
			
			im_array = np.asarray(im_loc)
			
		
		for depth in range(0,3):
			input_val[loc+i,depth*image_size*image_size:(depth+1)*image_size*image_size] = im_array[:,:,depth].flatten("C")/255.0
		
	return input_val, targets_val


def free_data_gen():
  global all_im, all_im_prop, input_data, targets, input_val, targets_val
  del (all_im, all_im_prop, input_data, targets, input_val, targets_val)
  return
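
The "empty" class examples produced by this generator are drawn by rejection sampling: a square window is placed at random, kept only if it overlaps no annotated box, and shrunk by 32 pixels after every 10 failed attempts. Below is a minimal standalone sketch of that logic, assuming boxes are `(xmin, ymin, xmax, ymax)` tuples in the 288-pixel frame. The helper name `sample_background_box` and the seeded `random.Random` are illustrative and not part of the notebook.

```python
import random

def fct_inter(box1, box2):
    # Intersection area of two (xmin, ymin, xmax, ymax) boxes
    inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    return float(inter_w * inter_h)

def sample_background_box(boxes, image_size=288, start_size=192, step=32,
                          tries_per_size=10, rng=None):
    """Return a square window overlapping no box, or None if none is found."""
    rng = rng or random.Random()
    size = start_size
    while size >= step:
        for _ in range(tries_per_size):
            # Random centre such that the window stays inside the image
            c_x = rng.random() * (image_size - size) + size / 2
            c_y = rng.random() * (image_size - size) + size / 2
            cand = (int(c_x - 0.5 * size), int(c_y - 0.5 * size),
                    int(c_x + 0.5 * size), int(c_y + 0.5 * size))
            if all(fct_inter(cand, b) == 0.0 for b in boxes):
                return cand
        size -= step  # shrink the window after too many failures
    return None

rng = random.Random(0)
objects = [(0, 0, 144, 288)]                      # one object covering the left half
window = sample_background_box(objects, rng=rng)  # a window in the right half, if found
full = sample_background_box([(0, 0, 288, 288)], rng=rng)  # no free space: None
```

When no free window exists the generator falls back to an all-zero patch, which is what the `None` return stands for here.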


#### Training image examples

In [None]:
%%writefile /content/IRMIA_2022/sliding_window/test_gen.py

import data_gen as gn2

gn2.init_data_gen()

print("Random augmented training examples")
gn2.create_train_batch(4,3)

print("\nOrdered validation examples")
gn2.create_val_batch(4,3)

gn2.free_data_gen()


In [None]:
# You might need to restart the notebook runtime to unload a previous data_gen after changes
%cd /content/IRMIA_2022/sliding_window/

%run test_gen.py
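
Each patch is stored as one flat vector in which the three colour planes follow one another (all values of channel 0, then channel 1, then channel 2), scaled to [0, 1]. The sketch below checks, on a small random array, that the per-channel loop used in the generator is equivalent to a single transpose-and-flatten. The function names are illustrative only.

```python
import numpy as np

def flatten_loop(img, size):
    # Same per-channel copy as in the generator
    out = np.zeros(size * size * 3, dtype="float32")
    for depth in range(3):
        out[depth*size*size:(depth+1)*size*size] = img[:, :, depth].flatten("C") / 255.0
    return out

def flatten_vectorized(img):
    # Move channels first, (H, W, 3) -> (3, H, W), then flatten row-major
    return (img.transpose(2, 0, 1).reshape(-1) / 255.0).astype("float32")

size = 4
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8)
a = flatten_loop(img, size)
b = flatten_vectorized(img)
print(np.array_equal(a, b))
```

The loop form writes directly into a pre-allocated batch array, which avoids an extra temporary per image; the vectorized form is mainly useful for checking the layout.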

### **2\. Training the detection classifier**

In [None]:
%%shell

cd /content/IRMIA_2022/sliding_window/

python3 - <<EOF

import numpy as np
from threading import Thread
import data_gen as gn2

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn


def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

def data_augm():
	input_data, targets = gn2.create_train_batch()
	cnn.delete_dataset("TRAIN_buf", silent=1)
	cnn.create_dataset("TRAIN_buf", nb_images_per_batch, input_data[:,:], targets[:,:], silent=1)
	return

nb_images_per_batch = 4000
nb_obj_val = 11831
nb_empty_val = 1000
nb_class = 20
image_size = 96

nb_augm = 1000
epoch_per_augm = 5

# -1 will load the provided pre trained network.
# Switch to 0 for training from scratch, 
# or to the value corresponding to an existing network save.
load_epoch = -1
# Increase the number of augmentations (nb_augm) below
# to continue training the pre-trained network
if(load_epoch == -1):
	nb_augm = 2

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=nb_class+1,
	 b_size=32, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP32C_FP32A")

print ("Loading the dataset ...")

gn2.init_data_gen()

input_data, targets = gn2.create_train_batch()
input_val, targets_val = gn2.create_val_batch()

cnn.create_dataset("TRAIN", nb_images_per_batch, input_data[:,:], targets[:,:])
cnn.create_dataset("VALID", nb_obj_val + nb_empty_val, input_val[:,:], targets_val[:,:])


if(load_epoch == -1):
	cnn.load("/content/IRMIA_2022/pre_trained_nets/sliding_window_net0_s8000.dat",8000, bin=1)
elif(load_epoch > 0):
	cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:
	cnn.conv(f_size=i_ar([3,3]), nb_filters=16, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=32, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=64, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=128, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=128, padding=i_ar([1,1]), activation="RELU")
	cnn.dense(nb_neurons=256, activation="RELU", drop_rate=0.2)
	cnn.dense(nb_neurons=nb_class+1, activation="SMAX")


for batch_augm in range(0,nb_augm):
		
	t = Thread(target=data_augm)
	t.start()
	
	cnn.train(nb_epoch=epoch_per_augm, learning_rate=0.0015, end_learning_rate=0.00005, 
				decay=0.001, momentum=0.5, shuffle_every=1, confmat=1, 
				control_interv=5, save_every=100, silent=1, TC_scale_factor=16.0, save_bin=1)
	if(batch_augm == 0):
		cnn.perf_eval()

	t.join()
	
	cnn.swap_data_buffers("TRAIN")


gn2.free_data_gen()
del (input_data, targets, input_val, targets_val)

EOF
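
The training loop overlaps data augmentation with training: a worker thread fills the spare `"TRAIN_buf"` dataset while `cnn.train` consumes the current one, and `cnn.swap_data_buffers` exchanges the two once the thread has joined. The pattern can be sketched without CIANNA. Here `make_batch` and `train_on` are hypothetical stand-ins for `create_train_batch` and `cnn.train`, and a plain dict plays the role of the two dataset slots.

```python
from threading import Thread

buffers = {"TRAIN": None, "TRAIN_buf": None}
log = []

def make_batch(tag):
    # Stand-in for the augmentation step: returns a small labelled batch
    return ["%s-%d" % (tag, j) for j in range(3)]

def train_on(batch):
    # Stand-in for the training call: records which batch was consumed
    log.append(batch[0])

def fill_buffer(tag):
    buffers["TRAIN_buf"] = make_batch(tag)

buffers["TRAIN"] = make_batch("b0")
for step in range(1, 4):
    worker = Thread(target=fill_buffer, args=("b%d" % step,))
    worker.start()                 # prepare the next batch in the background
    train_on(buffers["TRAIN"])     # train on the current batch meanwhile
    worker.join()                  # the next batch must be complete before swapping
    buffers["TRAIN"], buffers["TRAIN_buf"] = buffers["TRAIN_buf"], buffers["TRAIN"]

print(log)
```

Joining before the swap is the important part: it guarantees that the buffer being promoted to `"TRAIN"` is never half-written.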

### **3\. Sliding window prediction**

#### Regions definition and network inference

In [None]:
%%shell

cd /content/IRMIA_2022/sliding_window/

python3 - <<EOF

import numpy as np
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image
import re
import os

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn

load_epoch = 0
if (len(sys.argv) > 1):
	load_epoch = int(sys.argv[1])

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
nb_keep_val = 400 # Lower than the actual number of examples, to keep RAM usage low enough

nb_class = 20
image_size_orig = 288
image_size = 96

frac_size = np.array([288,144,72])
frac_stride = np.array([0,72,36])

print ("Loading the dataset ...")

all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size_orig, image_size_orig, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

nb_regions_per_im = 1
for l in range(1,np.size(frac_size)):
	nb_regions_per_im += ((image_size_orig-frac_size[l])/frac_stride[l] + 1)**2

print (nb_regions_per_im)
all_nb_test_images = int(nb_regions_per_im*nb_keep_val)

print (all_nb_test_images)

input_test = np.zeros((all_nb_test_images,image_size*image_size*3), dtype="float32")
targets_test = np.zeros((all_nb_test_images,nb_class+1), dtype="float32")

k = 0
for i in tqdm(range(0, nb_keep_val)):
	
	i_d = orig_nb_images - nb_keep_val + i
	
	patch = np.copy(all_im[i_d])
	
	x_offset, y_offset, width2, height2 = all_im_prop[i_d]
	
	im = Image.fromarray(patch)
	
	for l in range(0, np.size(frac_size)):
		
		if(l == 0):
			nb_reg = 1
		else:
			nb_reg = int((image_size_orig-frac_size[l])/frac_stride[l] + 1)
		
		for l_x in range(0, nb_reg):
			for l_y in range(0, nb_reg):
				
				xmin = l_x * frac_stride[l]
				ymin = l_y * frac_stride[l]
				xmax = xmin + frac_size[l]
				ymax = ymin + frac_size[l]
				
				im_loc = im.crop((xmin,ymin,xmax,ymax))
				im_loc = im_loc.resize((image_size,image_size), Image.NEAREST)
				
				im_array = np.asarray(im_loc)
				
				for depth in range(0,3):
					input_test[k,depth*image_size*image_size:(depth+1)*image_size*image_size] = im_array[:,:,depth].flatten("C")/255.0
				k += 1

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=nb_class+1,
	 b_size=32, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP32C_FP32A")

cnn.create_dataset("TEST", all_nb_test_images, input_test[:,:], targets_test[:,:])

load_epoch = -1
if(load_epoch == -1):
	cnn.load("/content/IRMIA_2022/pre_trained_nets/sliding_window_net0_s8000.dat",8000, bin=1)
	load_epoch = 8000
elif(load_epoch > 0):
	cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:
	files = os.listdir("net_save/")
	paths = [os.path.join("net_save/", basename) for basename in files]
	path = max(paths, key=os.path.getctime)
	r_load_epoch = [int(s) for s in re.split('[s.]',path) if s.isdigit()]
	print (r_load_epoch)
	print("Epoch unspecified, loading most recent save : " + path)
	
	cnn.load(path, r_load_epoch[0], bin=1)
	
cnn.forward(no_error=1, saving=2)

del (all_im, all_im_prop, input_test, targets_test)

EOF
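
The region grid is fully determined by `frac_size` and `frac_stride`: a window of side `s` moved with stride `d` over a 288-pixel image fits `(288 - s)/d + 1` positions per axis. The standalone sketch below uses an illustrative helper, `sliding_regions`, which is not part of the notebook, to list the crop boxes in the same order as the inference loop.

```python
def sliding_regions(image_size, sizes, strides):
    """List (xmin, ymin, xmax, ymax) crops, scale by scale, x index outermost."""
    regions = []
    for size, stride in zip(sizes, strides):
        # A zero stride denotes the single full-image region
        n = 1 if stride == 0 else (image_size - size) // stride + 1
        for l_x in range(n):
            for l_y in range(n):
                xmin, ymin = l_x * stride, l_y * stride
                regions.append((xmin, ymin, xmin + size, ymin + size))
    return regions

regions = sliding_regions(288, [288, 144, 72], [0, 72, 36])
print(len(regions))        # 1 + 3*3 + 7*7 = 59 regions per image
print(len(regions) * 400)  # 23600 rows in input_test for nb_keep_val = 400
```

Every region is resized to 96x96 before inference, so the network evaluates 59 patches per image with these settings. That is the main cost of the sliding-window approach.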

#### Prediction visualization

In [None]:
%cd /content/IRMIA_2022/sliding_window/

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image
import matplotlib.patheffects as path_effects

import re
import bisect
import os

import sys

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
nb_keep_val = 400 #keep in 2007 test

nb_class = 20
image_size_orig = 288
image_size = 96

frac_size = np.array([288,144,72])
frac_stride = np.array([0,72,36])
nb_reg_per_frac = np.array([1,0,0])
cumul_nb_per_frac = np.array([1,0,0])

all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size_orig, image_size_orig, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

nb_regions_per_im = 1
for l in range(1,np.size(frac_size)):
	nb_reg_per_frac[l] = ((image_size_orig-frac_size[l])/frac_stride[l] + 1)
	nb_regions_per_im += nb_reg_per_frac[l]**2
	cumul_nb_per_frac[l] = nb_regions_per_im

print (nb_reg_per_frac, cumul_nb_per_frac)

load_epoch = 0
if(load_epoch == 0):
	files = os.listdir("fwd_res/")
	paths = [os.path.join("fwd_res/", basename) for basename in files]
	path = max(paths, key=os.path.getctime)
	r_load_epoch = [int(s) for s in re.split('[_s.]',path) if s.isdigit()]
	print (r_load_epoch)
	print("Epoch unspecified, loading most recent prediction : " + path)
	
	load_epoch = r_load_epoch[0]

pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%load_epoch, dtype="float32")

pred_data = np.reshape(pred_raw,(nb_keep_val, int(nb_regions_per_im), 22))

width_list = np.array([2.0, 1.5, 1.0])


In [None]:

i_d = 0

nb_w = 4
nb_h = 8

fig, ax = plt.subplots(nb_h, nb_w, figsize=(2*nb_w,2*nb_h), dpi=210, constrained_layout=True)

for l_h in range(0, nb_h):
  for l_w in range(0, nb_w):
    loc = i_d + l_w + l_h*nb_w
    patch = np.copy(all_im[orig_nb_images - nb_keep_val + loc])
    
    ax[l_h,l_w].imshow(patch)
    ax[l_h,l_w].axis('off')

    for l in range(0,int(nb_regions_per_im)):
      max_loc = np.argmax(pred_data[loc,l,:])
      max_val = np.max(pred_data[loc,l,:])
      if(l == 0 or (max_val > 0.9 and max_loc < nb_class)):
        
        index = bisect.bisect(cumul_nb_per_frac, l)
        
        if(l > 0):
          i_l = l - cumul_nb_per_frac[index-1]
        else:
          i_l = 0
        i_x = i_l // nb_reg_per_frac[index]
        i_y = i_l % nb_reg_per_frac[index]
        
        xmin = i_x * frac_stride[index] - 0.5 + 2*index; ymin = i_y * frac_stride[index] - 0.5 + 2*index
        xmax = xmin + frac_size[index] - 4*index; ymax = ymin + frac_size[index] - 4*index
        el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth= width_list[index], fill=False, color=plt.cm.tab20(max_loc), zorder=3)
        c_patch = ax[l_h,l_w].add_patch(el)
        c_text = ax[l_h,l_w].text(xmin+4, ymin+15, "%s-%0.2f"%(class_list_short[max_loc], max_val), c=plt.cm.tab20(max_loc), fontsize=6, clip_on=True)
        c_patch.set_path_effects([path_effects.Stroke(linewidth=width_list[index]+1.5, foreground='black'),
                       path_effects.Normal()])
        c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                       path_effects.Normal()])

plt.show()
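
The plotting loop recovers, from a flat region index `l`, which scale the region belongs to and where it sits in that scale's grid: `bisect.bisect` on the cumulative region counts gives the scale, then integer division and modulo give the two grid indices. Below is a minimal sketch with the counts produced by the settings above. `region_box` is a hypothetical helper, and the small inward shift used for drawing is left out.

```python
import bisect

frac_size = [288, 144, 72]
frac_stride = [0, 72, 36]
nb_reg_per_frac = [1, 3, 7]      # positions per axis at each scale
cumul_nb_per_frac = [1, 10, 59]  # running total of regions

def region_box(l):
    # Scale index: number of cumulative counts that are <= l
    scale = bisect.bisect(cumul_nb_per_frac, l)
    # Index of the region within its own scale
    i_l = l - cumul_nb_per_frac[scale - 1] if l > 0 else 0
    i_x = i_l // nb_reg_per_frac[scale]
    i_y = i_l % nb_reg_per_frac[scale]
    xmin = i_x * frac_stride[scale]
    ymin = i_y * frac_stride[scale]
    return scale, (xmin, ymin, xmin + frac_size[scale], ymin + frac_size[scale])

print(region_box(0))   # (0, (0, 0, 288, 288))
print(region_box(9))   # (1, (144, 144, 288, 288))
print(region_box(10))  # (2, (0, 0, 72, 72))
```

This mapping only holds if the regions were generated scale by scale with the x index as the outer loop, as in the inference cell.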

In [None]:
#Free the RAM before going further in the notebook
#A RUNTIME RESTART IS ADVISED

del (all_im, all_im_prop)




---



## **C - The YOLO object detector**
(YOLO - You Only Look Once)

### **1\. Train and valid data generation**

In [None]:
%%shell

cd /content/IRMIA_2022/
mkdir -p yolo_detector
cd yolo_detector

#### Dynamic Image augmentation and bounding box targets

In [None]:
%%writefile /content/IRMIA_2022/yolo_detector/data_gen.py

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches
import matplotlib.patheffects as path_effects
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image, ImageEnhance, ImageOps
import os

import imgaug as ia
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage
from imgaug.augmentables.batches import UnnormalizedBatch


class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

train_list_2012 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007  = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")


## Generator initialization (constants, augmentation pipeline, buffers)
def init_data_gen():
	global nb_train_2012, nb_train_2007, nb_test_2007, orig_nb_images
	global nb_images_per_batch, nb_keep_val, max_nb_obj_per_image, image_size, seq_iaa
	global input_data, targets, input_val, targets_val, all_im, all_im_prop

	nb_train_2012 = 11540
	nb_train_2007 = 5011
	nb_test_2007 = 4952
	orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
	nb_keep_val = 1000
	# The pre-trained network used all the 2007 test examples as its validation set.
	# To stay consistent, none of them are used for training here, and only the
	# last 1000 are kept for validation to save RAM.
	nb_images_per_batch = 1000
	# The pre-trained network used 4000 images per batch, so reaching similar
	# performance would require about 4 times more epochs.
	max_nb_obj_per_image = 48
	image_size = 288
	
	forced_regen = False
	
	seq_iaa = iaa.Sequential([
			iaa.Fliplr(0.5),
			iaa.Flipud(0.2),
			iaa.Sometimes(0.1, iaa.GaussianBlur(sigma=(0, 0.5))),
			iaa.LinearContrast((0.75, 1.5)),
			iaa.Sometimes(0.1, iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255), per_channel=0.5)),
			iaa.Multiply((0.8,1.2), per_channel=0.2),
			iaa.Affine(translate_percent={"x": (-0.2,0.2), "y": (-0.2,0.2)}),
			iaa.Sometimes(0.5,iaa.Affine(scale={"x": (0.8,1.2), "y": (0.8,1.2)})),
			iaa.Sometimes(0.2,iaa.Affine(rotate=(-10,10),shear=(-6,6))),
			iaa.Sometimes(0.02,iaa.Grayscale(alpha=(0.0, 1.0)))
		])
	
	all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
	all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
	all_im = np.reshape(all_im, ((orig_nb_images, image_size, image_size, 3)))
	all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

	input_data = np.zeros((nb_images_per_batch,image_size*image_size*3), dtype="float32")
	targets = np.zeros((nb_images_per_batch,1+max_nb_obj_per_image*7), dtype="float32")

	input_val = np.zeros((nb_keep_val,image_size*image_size*3), dtype="float32")
	targets_val = np.zeros((nb_keep_val,1+max_nb_obj_per_image*7), dtype="float32")

## Data augmentation
def create_train_batch(visual_w=0, visual_h=0):
	visual_iter = 0

	for i in range(0, nb_images_per_batch):
		
		i_d = np.random.randint(0,orig_nb_images - nb_test_2007)
		
		if(i_d < nb_train_2012):
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/Annotations/"+train_list_2012[i_d]+".xml")
		elif(i_d < nb_train_2012+nb_train_2007):
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+train_list_2007[i_d - nb_train_2012]+".xml")
		else:
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[i_d - nb_train_2012 - nb_train_2007]+".xml")
		
		root = tree.getroot()
		
		x_offset, y_offset, width2, height2 = all_im_prop[i_d]

		patch = np.copy(all_im[i_d])

		obj_list = root.findall("object", namespaces=None)
		nb_box = len(obj_list)
		for obj in obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				nb_box -= 1
				continue
		
		bbox_list = np.zeros((nb_box,5))
		
		k = 0
		for obj in obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			oclass = obj.find("name", namespaces=None)
			bndbox = obj.find("bndbox", namespaces=None)
			
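			# Take the first matching index explicitly (int() on a 1-element array is deprecated in recent NumPy)
			int_class = int(np.where(class_list[:] == oclass.text)[0][0])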
			xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
			ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
			xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
			ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2
			
			bbox_list[k,:] = np.array([xmin,ymin,xmax,ymax,int_class])
			k += 1
			
		bbs = BoundingBoxesOnImage.from_xyxy_array(bbox_list[:,:4], shape=patch.shape)
		
		patch_aug, bbs_aug = seq_iaa(image=patch,bounding_boxes=bbs)
		
		for depth in range(0,3):
			input_data[i,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
		
		targets[i,0] = nb_box
		b_pos = 0
		for k in range(0, nb_box):
			l_b = bbs_aug.bounding_boxes[k]
			xmin = l_b.x1
			ymin = l_b.y1
			xmax = l_b.x2
			ymax = l_b.y2
				
			n_xmin = max(0, xmin)
			n_ymin = max(0, ymin)
			n_xmax = min(image_size, xmax)
			n_ymax = min(image_size, ymax)
			
			frac_in = (abs(n_xmax-n_xmin)*abs(n_ymax-n_ymin))/(abs(xmax-xmin)*abs(ymax-ymin))
			
			if(frac_in < 0.35 or (frac_in < 0.5 and (abs(n_xmax-n_xmin)*abs(n_ymax-n_ymin)) < 160) or (abs(xmax-xmin)*abs(ymax-ymin) < 160)):
				targets[i,0] -= 1
				continue
		
			targets[i,1+b_pos*7:1+(b_pos+1)*7] = np.array([bbox_list[k,4]+1, n_xmin,n_ymin,0.0,n_xmax,n_ymax,1.0])
			b_pos += 1

		
		if(visual_w*visual_h > 0):
			if(visual_iter == 0):
				fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
			
			c_x = visual_iter // visual_w
			c_y = visual_iter % visual_w
			
			ax[c_x,c_y].imshow(patch_aug)
			ax[c_x,c_y].axis('off')
			
			targ_boxes = targets[i]
			for k in range(0, int(targ_boxes[0])):
				xmin = targ_boxes[1+k*7+1]
				ymin = targ_boxes[1+k*7+2]
				xmax = targ_boxes[1+k*7+4]
				ymax = targ_boxes[1+k*7+5]
				p_c = int(targ_boxes[1+k*7+0]) - 1
			
				el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.8, ls="--", fill=False, color=plt.cm.tab20(p_c), zorder=3)
				c_patch = ax[c_x,c_y].add_patch(el)
				c_text = ax[c_x,c_y].text(xmin+4, ymin+15, "%s"%(class_list_short[p_c]), c=plt.cm.tab20(p_c), fontsize=6, clip_on=True)
				c_patch.set_path_effects([path_effects.Stroke(linewidth=2.0, foreground='black'),
												path_effects.Normal()])
				c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
												path_effects.Normal()])

			
			visual_iter += 1
			if(visual_iter >= visual_w*visual_h):
				plt.show()
				return

	return input_data, targets


def create_val_batch(visual_w=0, visual_h=0):
	visual_iter = 0

	for i in range(0, nb_keep_val):
		
		i_d = nb_train_2012+nb_train_2007+nb_test_2007-nb_keep_val+i

		tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007-nb_keep_val+i]+".xml")
		root = tree.getroot()
		
		patch = np.copy(all_im[i_d])

		x_offset, y_offset, width2, height2 = all_im_prop[i_d]

		for depth in range(0,3):
			input_val[i,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch[:,:,depth].flatten("C")/255.0
		
		k = 0
		obj_list = root.findall("object", namespaces=None)
		targets_val[i,0] = len(obj_list)
		for obj in obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				targets_val[i,0] -= 1
				continue
			oclass = obj.find("name", namespaces=None)
			bndbox = obj.find("bndbox", namespaces=None)

			int_class = np.where(class_list[:] == oclass.text)
			xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
			ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
			xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
			ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2
			
			n_xmin = max(0, xmin)
			n_ymin = max(0, ymin)
			n_xmax = min(image_size, xmax)
			n_ymax = min(image_size, ymax)
			
			frac_in = (abs(n_xmax-n_xmin)*abs(n_ymax-n_ymin))/(abs(xmax-xmin)*abs(ymax-ymin))
			
			if(frac_in < 0.35 or (frac_in < 0.5 and (abs(n_xmax-n_xmin)*abs(n_ymax-n_ymin)) < 192) or (abs(xmax-xmin)*abs(ymax-ymin) < 192)):
				targets_val[i,0] -= 1
				#print ("Removed", frac_in)
				continue

			targets_val[i,1+k*7:1+(k+1)*7] = np.array([int_class[0][0]+1, n_xmin,n_ymin,0.0,n_xmax,n_ymax,1.0])
			k += 1
			#print (class_list[int_class])
		
		
		if(visual_w*visual_h > 0):
			if(visual_iter == 0):
				fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
			
			c_x = visual_iter // visual_w
			c_y = visual_iter % visual_w
			
			ax[c_x,c_y].imshow(patch)
			ax[c_x,c_y].axis('off')
			
			targ_boxes = targets_val[i]
			for k in range(0, int(targ_boxes[0])):
				xmin = targ_boxes[1+k*7+1]
				ymin = targ_boxes[1+k*7+2]
				xmax = targ_boxes[1+k*7+4]
				ymax = targ_boxes[1+k*7+5]
				p_c = int(targ_boxes[1+k*7+0]) - 1
			
				el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.8, ls="--", fill=False, color=plt.cm.tab20(p_c), zorder=3)
				c_patch = ax[c_x,c_y].add_patch(el)
				c_text = ax[c_x,c_y].text(xmin+4, ymin+15, "%s"%(class_list_short[p_c]), c=plt.cm.tab20(p_c), fontsize=6, clip_on=True)
				c_patch.set_path_effects([path_effects.Stroke(linewidth=2.0, foreground='black'),
												path_effects.Normal()])
				c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
												path_effects.Normal()])

			
			visual_iter += 1
			if(visual_iter >= visual_w*visual_h):
				plt.show()
				return
	
	return input_val, targets_val


def free_data_gen():
  global all_im, all_im_prop, input_data, targets, input_val, targets_val
  del (all_im, all_im_prop, input_data, targets, input_val, targets_val)
  return



#### Training image examples

In [None]:
%%writefile /content/IRMIA_2022/yolo_detector/test_gen.py

import data_gen as gn3

gn3.init_data_gen()

print("Random augmented training examples")
gn3.create_train_batch(4,3)

print("\nOrdered validation examples")
gn3.create_val_batch(4,3)

gn3.free_data_gen()

In [None]:
# The notebook runtime may need to be restarted to unload a previous data_gen after changes
%cd /content/IRMIA_2022/yolo_detector/

%run test_gen.py

### **2\. Training the YOLO detector**

In [None]:
%%shell

cd /content/IRMIA_2022/yolo_detector/

python3 - <<EOF

import numpy as np
from threading import Thread
import data_gen as gn3

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn


def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

def data_augm():
	input_data, targets = gn3.create_train_batch()
	cnn.delete_dataset("TRAIN_buf", silent=1)
	cnn.create_dataset("TRAIN_buf", nb_images, input_data[:,:], targets[:,:], silent=1)
	return

nb_keep_val = 1000
# Pre trained net was trained using all the 2007 test examples as validation dataset
# To remain homogeneous, here we exclude all these examples and use only 1000 as validation to save some RAM
nb_images = 1000
# Pre trained net was trained using 4000, so achieving similar performance would require x4 epochs (should also lower the decay value)
nb_param = 0
nb_class = 20

max_nb_obj_per_image = 48

im_size = 288
nb_box = 5

nb_epoch_per_augm = 2
load_pre_trained = 0

load_epoch = -1

if(load_epoch == -1):
	fit_parts = i_ar([1, 1, 1, 1, -1])
	load_epoch = 6500
	total_epochs = 6500 + 10
	load_pre_trained = 1
elif(load_epoch < 100):
	# PRE TRAINING
	fit_parts = i_ar([0, 1, 1, 1, -1])
	total_epochs = 100
else:
	# REGULAR TRAINING
	fit_parts = i_ar([1, 1, 1, 1, -1])
	total_epochs = 8000

# Pre trained net was trained using b_size=32, here the learning_rate value has been increased accordingly
# The b_size value has been lowered so the network fits in memory
cnn.init(in_dim=i_ar([im_size,im_size]), in_nb_ch=3, out_dim=1+max_nb_obj_per_image*(7+nb_param),
	 b_size=16, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP32C_FP32A")

gn3.init_data_gen()

input_data, targets = gn3.create_train_batch()
input_val, targets_val = gn3.create_val_batch()

cnn.create_dataset("TRAIN", nb_images, input_data[:,:], targets[:,:])
cnn.create_dataset("VALID", nb_keep_val, input_val[:,:], targets_val[:,:])

##### YOLO parameters tuning #####
# Note : Using square priors and default values for most of the
# following parameters is sufficient to reach an mAP > 30%.
# This specific setup was optimized to get the most out of this specific "light" YOLO architecture. 

prior_w = f_ar([24.,48.,96.,144.,256.])
prior_h = f_ar([24.,64.,176.,80.,192.])

prior_noobj_prob = f_ar([0.15,0.15,0.15,0.15,0.15])

#Relative scaling of each error "type" : 
#[Position, Size, Probability, Objectness, Class, Ex. Param]
error_scales = f_ar([4.0, 2.0, 1.0, 5.0, 3.0, 1.0])

#Various IoU limit conditions
#[Good but not best boxes, Prob. fit, Obj. fit, class fit, param fit] 
IoU_limits = f_ar([0.4, -0.5, -1.0, -1.0, -0.3, -0.3])

slopes_and_maxes = f_ar([[1.0, 4.5, -4.5],\
						 [0.5, 1.2, -1.4],\
						 [1.0, 4.5, -4.5],\
						 [1.0, 4.5, -4.5],\
						 [1.0, 4.5, -4.5],\
						 [1.0, 2.0, -0.2]])

# A value of 1 might be better here if the choice of priors is appropriate 
strict_box_size = 2

start_block = int(load_epoch / nb_epoch_per_augm)

nb_yolo_filters = cnn.set_yolo_params(nb_box = nb_box, prior_w = prior_w, prior_h = prior_h, nb_class = nb_class, nb_param=nb_param,
				prior_noobj_prob = prior_noobj_prob, IoU_type = "GIoU", error_scales = error_scales,
				slopes_and_maxes = slopes_and_maxes, IoU_limits = IoU_limits, fit_parts = fit_parts, strict_box_size=strict_box_size)

if(load_pre_trained):
	cnn.load("/content/IRMIA_2022/pre_trained_nets/yolo_detector_net0_s6500.dat",6500, bin=1)
elif(load_epoch > 0):
	cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:
	# This specific architecture might be too difficult to train in a free Colab environment
	# Using half the number of filters for the first 5 conv layers can still provide good results with an mAP ~ 30%
	cnn.conv(f_size=i_ar([3,3]), nb_filters=24, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=48, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=96, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=128, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=256, padding=i_ar([1,1]), activation="RELU")
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=256, padding=i_ar([1,1]), activation="RELU")
	cnn.conv(f_size=i_ar([3,3]), nb_filters=256, padding=i_ar([1,1]), activation="RELU")
	cnn.conv(f_size=i_ar([1,1]), nb_filters=512, padding=i_ar([0,0]), activation="RELU", drop_rate=0.1)
	cnn.conv(f_size=i_ar([1,1]), nb_filters=nb_yolo_filters, padding=i_ar([0,0]), activation="YOLO")


for batch_augm in range(start_block,int(total_epochs/nb_epoch_per_augm)): 
	
	t = Thread(target=data_augm)
	t.start()
	
	cnn.train(nb_epoch=nb_epoch_per_augm, learning_rate=0.0003, end_learning_rate=0.000, shuffle_every=0,\
			 momentum=0.8, decay=0.0010, save_every=50, silent=1, save_bin=1, TC_scale_factor=32.0)
	if(batch_augm == 0):
		cnn.perf_eval()

	t.join()
	
	cnn.swap_data_buffers("TRAIN")

gn3.free_data_gen()
del (input_data, targets, input_val, targets_val)

EOF

### **3\. Post process the prediction**

Simple network forward

In [None]:
%%shell

cd /content/IRMIA_2022/yolo_detector/

python3 - <<EOF

import numpy as np
from threading import Thread
import data_gen as gn3
import re
import os

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

nb_keep_val = 1000
nb_param = 0
nb_class = 20

max_nb_obj_per_image = 48

im_size = 288
nb_box = 5


cnn.init(in_dim=i_ar([im_size,im_size]), in_nb_ch=3, out_dim=1+max_nb_obj_per_image*(7+nb_param),
	 b_size=16, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP32C_FP32A")

gn3.init_data_gen()

input_val, targets_val = gn3.create_val_batch()

cnn.create_dataset("TEST", nb_keep_val, input_val[:,:], targets_val[:,:])

##### YOLO parameters tuning #####

#Size priors for all possible boxes per grid. element 
#prior = f_ar([12.,16.,32.,64.,128.,192.])

prior_w = f_ar([24.,48.,96.,144.,256.])
prior_h = f_ar([24.,64.,176.,80.,192.])

prior_noobj_prob = f_ar([0.15,0.15,0.15,0.15,0.15])

#Relative scaling of each error "type" : 
#[Position, Size, Probability, Objectness, Class, Ex. Param]
error_scales = f_ar([4.0, 2.0, 1.0, 5.0, 3.0, 1.0])

#Various IoU limit conditions
#[Good but not best boxes, Prob. fit, Obj. fit, class fit, param fit] 
IoU_limits = f_ar([0.4, -0.5, -1.0, -1.0, -0.3, -0.3])

slopes_and_maxes = f_ar([[1.0, 4.5, -4.5],\
						 [0.5, 1.2, -1.4],\
						 [1.0, 4.5, -4.5],\
						 [1.0, 4.5, -4.5],\
						 [1.0, 4.5, -4.5],\
						 [1.0, 2.0, -0.2]])
				 
nb_yolo_filters = cnn.set_yolo_params(nb_box = nb_box, prior_w = prior_w, prior_h = prior_h, nb_class = nb_class, nb_param=nb_param,
                                        prior_noobj_prob = prior_noobj_prob, IoU_type = "GIoU", error_scales = error_scales,
                                        slopes_and_maxes = slopes_and_maxes, IoU_limits = IoU_limits, strict_box_size=1)

load_epoch = -1
if(load_epoch == -1):
	cnn.load("/content/IRMIA_2022/pre_trained_nets/yolo_detector_net0_s6500.dat",6500, bin=1)
	load_epoch = 6500
elif(load_epoch > 0):
	cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:
	files = os.listdir("/content/IRMIA_2022/yolo_detector/net_save/")
	paths = [os.path.join("/content/IRMIA_2022/yolo_detector/net_save/", basename) for basename in files]
	path = max(paths, key=os.path.getctime)
	r_load_epoch = [int(s) for s in re.split('[s.]',path) if s.isdigit()]
	print(r_load_epoch)
	print("Epoch unspecified, loading most recent save : " + path)
	
	cnn.load(path, r_load_epoch[0], bin=1)

cnn.forward(no_error=1, saving=2)

gn3.free_data_gen()
del (input_val, targets_val)

EOF

#### Loading raw Network prediction
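
The cell below converts each raw prediction into pixel coordinates: the predicted position is an offset inside a grid cell, and the predicted size is a log-ratio applied to the box prior. As a minimal sketch of that decoding for one box, in plain Python and with made-up offset values (the function name `decode_box` and the example numbers are illustrative assumptions, not part of the notebook):

```python
import math

def decode_box(offsets, row, col, prior_w, prior_h, cell_size=32):
    """Convert one raw prediction into (xmin, ymin, xmax, ymax) in pixels.

    offsets uses the same indices as global_to_tile_coord:
    0 -> x offset in the cell, 1 -> y offset in the cell,
    3 -> log width ratio, 4 -> log height ratio (2 and 5 are not used here).
    """
    cx = (offsets[0] + col) * cell_size   # box center, x axis follows the column index
    cy = (offsets[1] + row) * cell_size   # box center, y axis follows the row index
    w = prior_w * math.exp(offsets[3])    # size is a multiplicative correction of the prior
    h = prior_h * math.exp(offsets[4])
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# Hypothetical raw values for the grid cell at row 2, column 5, with the 96 x 176 prior
example_box = decode_box([0.5, 0.25, 0.0, 0.0, 0.0, 0.0], row=2, col=5,
                         prior_w=96.0, prior_h=176.0)
print(example_box)  # (128.0, -16.0, 224.0, 160.0)
```

The negative `ymin` in this example shows why decoded boxes are clipped to the image before being displayed or filtered.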



In [None]:
%cd /content/IRMIA_2022/yolo_detector/

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image

import re
import bisect
import os
import sys
from numba import jit

class_list = np.array(["aeroplane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","diningtable","dog","horse", "motorbike",\
    "person","pottedplant","sheep","sofa","train","tvmonitor","background"])
class_list_short = np.array(["plane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","table","dog","horse", "m-bike",\
    "person","p-plant","sheep","sofa","train","tv","background"])

test_list = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = 11540 + 5011 + 4952
nb_keep_val = 1000

image_size = 288
nb_box = 5
nb_class = 20
nb_param = 0

max_nb_obj_per_image = 48

yolo_nb_reg = int(image_size/32)
c_size = 32

all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size, image_size, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

load_epoch = 0	
if(load_epoch == 0):
	files = os.listdir("fwd_res/")
	paths = [os.path.join("fwd_res/", basename) for basename in files]
	path = max(paths, key=os.path.getctime)
	r_load_epoch = [int(s) for s in re.split('[_s.]',path) if s.isdigit()]
	print (r_load_epoch)
	print("Epoch unspecified, loading most recent prediction : " + path)
	
	load_epoch = r_load_epoch[0]

prior_w = np.array([24.,48.,96.,144.,256.])
prior_h = np.array([24.,64.,176.,80.,192.])

pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%load_epoch, dtype="float32")
predict = np.reshape(pred_raw, (nb_keep_val, nb_box*(8+nb_param+nb_class),yolo_nb_reg,yolo_nb_reg))

@jit(nopython=True, cache=True, fastmath=False)
def global_to_tile_coord(offset_tab, tile_coords, priors, c_size):
	bx = (offset_tab[0] + tile_coords[1])*c_size
	by = (offset_tab[1] + tile_coords[0])*c_size
	bw = priors[0]*np.exp(offset_tab[3])
	bh = priors[1]*np.exp(offset_tab[4])
	return float(bx), float(by), float(bw), float(bh)
 
@jit(nopython=True, cache=True, fastmath=False)
def box_extraction(c_pred, c_box, c_tile):
  c_nb_box = 0
  for i in range(0,yolo_nb_reg):
    for j in range(0,yolo_nb_reg):
      for k in range(0,nb_box):
        offset = int(k*(8+nb_param+nb_class)) #no +1 for box prior in prediction
        c_box[4] = c_pred[offset+6,i,j]
        c_box[5] = c_pred[offset+7,i,j]
        p_c = np.max(c_pred[offset+8:offset+8+nb_class,i,j])
        cl = np.argmax(c_pred[offset+8:offset+8+nb_class,i,j]) 
        
        bx, by, bw, bh = global_to_tile_coord(c_pred[offset:offset+6,i,j], \
                  np.array([i,j]), np.array([prior_w[k], prior_h[k]]), c_size)
        c_box[0] = bx - bw*0.5
        c_box[1] = by - bh*0.5
        c_box[2] = bx + bw*0.5
        c_box[3] = by + bh*0.5
        
        c_box[6] = k
        c_box[7:] = c_pred[offset+8:offset+8+nb_param+nb_class,i,j]
        c_tile[c_nb_box,:] = c_box[:]
        c_nb_box +=1
  return c_nb_box

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")
patch = np.zeros((image_size, image_size), dtype="float32")

final_boxes = []

for l in range(0,nb_keep_val):
	c_tile[:,:] = 0.0
	c_tile_kept[:,:] = 0.0

	c_pred = predict[l,:,:,:]
	c_nb_box = box_extraction(c_pred, c_box, c_tile_kept)			
	final_boxes.append(np.copy(c_tile_kept[0:c_nb_box]))


Display raw prediction of the YOLO network (all boxes)

In [None]:
i_d = 0

fig, ax = plt.subplots(1, 1, figsize=(5,5), dpi=160, constrained_layout=True)

c_data = all_im[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i_d]/255.0
ax.imshow(c_data)
ax.axis('off')

im_boxes = final_boxes[i_d]

for k in range(0, np.shape(im_boxes)[0]):
			xmin = max(-0.5,(im_boxes[k,0]) - 0.5)
			ymin = max(-0.5,(im_boxes[k,1]) - 0.5)
			xmax = min(image_size-0.5,(im_boxes[k,2]) - 0.5)
			ymax = min(image_size-0.5,(im_boxes[k,3]) - 0.5)
			
			p_c = np.argmax(im_boxes[k,7:])
			
			el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=2.0*im_boxes[k,5]+0.5, fill=False, color=plt.cm.tab20(p_c), zorder=3)
			ax.add_patch(el)
   
plt.show()

#### Objectness filtering

In [None]:
@jit(nopython=True, cache=True, fastmath=False)
def box_filter(c_pred, c_box, c_tile, obj_limit, class_limit):
  c_nb_box = 0
  for i in range(0,yolo_nb_reg):
    for j in range(0,yolo_nb_reg):
      for k in range(0,nb_box):
        offset = int(k*(8+nb_param+nb_class)) #no +1 for box prior in prediction
        c_box[4] = c_pred[offset+6,i,j]
        c_box[5] = c_pred[offset+7,i,j]
        p_c = np.max(c_pred[offset+8:offset+8+nb_class,i,j])
        cl = np.argmax(c_pred[offset+8:offset+8+nb_class,i,j]) 
        
        if(c_box[5] >= obj_limit and p_c > class_limit):
          bx, by, bw, bh = global_to_tile_coord(c_pred[offset:offset+6,i,j], \
                    np.array([i,j]), np.array([prior_w[k], prior_h[k]]), c_size)
          c_box[0] = max(0,bx - bw*0.5 - 1)
          c_box[1] = max(0,by - bh*0.5 - 1)
          c_box[2] = min(image_size,bx + bw*0.5 + 1)
          c_box[3] = min(image_size,by + bh*0.5 + 1)
          
          c_box[6] = k
          c_box[7:] = c_pred[offset+8:offset+8+nb_param+nb_class,i,j]
          c_tile[c_nb_box,:] = c_box[:]
          c_nb_box +=1
  return c_nb_box

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")
patch = np.zeros((image_size, image_size), dtype="float32")

final_boxes = []

obj_limit = 0.3
class_limit = 0.6

for l in range(0,nb_keep_val):
	c_tile[:,:] = 0.0
	c_tile_kept[:,:] = 0.0

	c_pred = predict[l,:,:,:]
	c_nb_box = box_filter(c_pred, c_box, c_tile_kept, obj_limit, class_limit)
	final_boxes.append(np.copy(c_tile_kept[0:c_nb_box]))

In [None]:
i_d = 0

fig, ax = plt.subplots(1, 1, figsize=(5,5), dpi=160, constrained_layout=True)

c_data = all_im[nb_train_2007 + nb_test_2007 + nb_train_2012 - nb_keep_val + i_d]/255.0
ax.imshow(c_data)
ax.axis('off')

im_boxes = final_boxes[i_d]

for k in range(0, np.shape(im_boxes)[0]):
  xmin = max(-0.5,(im_boxes[k,0]) - 0.5)
  ymin = max(-0.5,(im_boxes[k,1]) - 0.5)
  xmax = min(image_size-0.5,(im_boxes[k,2]) - 0.5)
  ymax = min(image_size-0.5,(im_boxes[k,3]) - 0.5)
  
  p_c = np.argmax(im_boxes[k,7:])
  
  el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=2.0*im_boxes[k,5]+0.5, fill=False, color=plt.cm.tab20(p_c), zorder=3)
  ax.add_patch(el)
  ax.text(xmin+2, ymin-3, "%s:%0.2f-%0.2f"%(class_list_short[p_c],im_boxes[k,5],np.max(im_boxes[k,7:])), c=plt.cm.tab20(p_c), fontsize=9,clip_on=True)
   
plt.show()

#### Non-Max suppression
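
Before reading the Numba implementation below, it may help to see the greedy procedure on a toy case. The following sketch is written in plain Python and uses the classical IoU for readability, whereas the notebook cell uses GIoU; the helper names and the three boxes are illustrative assumptions.

```python
def iou(a, b):
    """Classical IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # A box survives only if it does not overlap too much with any better box already kept
        if all(iou(boxes[i], boxes[j]) <= threshold for j in kept):
            kept.append(i)
    return kept

# Hypothetical boxes: the second one strongly overlaps the first, the third is isolated
toy_boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (150, 150, 250, 250)]
toy_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(toy_boxes, toy_scores, threshold=0.3)
print(kept)  # [0, 2]
```

The first two boxes have an IoU of about 0.68, so only the higher-scoring one is kept, while the isolated third box is unaffected.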

In [None]:

@jit(nopython=True, cache=True, fastmath=False)
def fct_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d
	enclose_w = (max(box1[2], box2[2]) - min(box1[0], box2[0]))
	enclose_h = (max(box1[3], box2[3]) - min(box1[1],box2[1]))
	enclose_2d = enclose_w*enclose_h

	cx_a = (box1[2] + box1[0])*0.5; cx_b = (box2[2] + box2[0])*0.5
	cy_a = (box1[3] + box1[1])*0.5; cy_b = (box2[3] + box2[1])*0.5
	dist_cent = np.sqrt((cx_a - cx_b)*(cx_a - cx_b) + (cy_a - cy_b)*(cy_a - cy_b))
	diag_enclose = np.sqrt(enclose_w*enclose_w + enclose_h*enclose_h)

  # DIoU
	#return float(inter_2d)/float(uni_2d) - float(dist_cent)/float(diag_enclose)
  # GIoU
	return float(inter_2d)/float(uni_2d) - float(enclose_2d - uni_2d)/float(enclose_2d)
	
@jit(nopython=True, cache=True, fastmath=False)
def fct_classical_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d

	return float(inter_2d)/float(uni_2d)

@jit(nopython=True, cache=True, fastmath=False)
def apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold):
  c_nb_box_final = 0
  is_match = 1
  c_box_size_prev = c_nb_box

  while(c_nb_box > 0):
    max_objct = np.argmax(c_tile[:c_box_size_prev,5])
    c_box = np.copy(c_tile[max_objct])
    c_tile[max_objct,5] = 0.0 # zero the objectness so the kept box is skipped below and not counted twice
    c_tile_kept[c_nb_box_final] = c_box
    c_nb_box_final += 1
    c_nb_box -= 1
    i = 0
    for i in range(0,c_box_size_prev):
      if(c_tile[i,5] < 0.0000001):
        continue
      IoU = fct_IoU(c_box[:4], c_tile[i,:4])
      if(IoU > nms_threshold):
        c_tile[i] = 0.0
        c_nb_box -= 1
     
  return c_nb_box_final

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")
patch = np.zeros((image_size, image_size), dtype="float32")

final_boxes = []

obj_limit = 0.3
class_limit = 0.5

nms_threshold = 0.3 #here using GIoU in the interval [-1,1]
#lower value is more strict

for l in range(0,nb_keep_val):
  c_tile[:,:] = 0.0
  c_tile_kept[:,:] = 0.0

  c_pred = predict[l,:,:,:]
  c_nb_box = box_filter(c_pred, c_box, c_tile, obj_limit, class_limit)

  c_nb_box_final = c_nb_box
  c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold)
  final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))


In [None]:

fig, ax = plt.subplots(1, 1, figsize=(5,5), dpi=210, constrained_layout=True)

c_data = all_im[nb_train_2007 + nb_test_2007 + nb_train_2012 - nb_keep_val + i_d]/255.0
ax.imshow(c_data)
ax.axis('off')

im_boxes = final_boxes[i_d]

for k in range(0, np.shape(im_boxes)[0]):
  xmin = max(-0.5,(im_boxes[k,0]) - 0.5)
  ymin = max(-0.5,(im_boxes[k,1]) - 0.5)
  xmax = min(image_size-0.5,(im_boxes[k,2]) - 0.5)
  ymax = min(image_size-0.5,(im_boxes[k,3]) - 0.5)
  
  p_c = np.argmax(im_boxes[k,7:])
  
  el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=1.5, fill=False, color=plt.cm.tab20(p_c), zorder=3)
  c_patch = ax.add_patch(el)
  c_text = ax.text(xmin+2, ymin-3, "%s:%0.2f-%0.2f"%(class_list_short[p_c],im_boxes[k,5],np.max(im_boxes[k,7:])), c=plt.cm.tab20(p_c), fontsize=9,clip_on=True)
  c_patch.set_path_effects([path_effects.Stroke(linewidth=2.5, foreground='black'),
                        path_effects.Normal()])
  c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                        path_effects.Normal()])

plt.show()

Load the target boxes

In [None]:
targets = np.zeros((nb_keep_val,1+max_nb_obj_per_image*7), dtype="float32")

class_count = np.zeros((nb_class))

for i in tqdm(range(0, nb_keep_val)):
	
	tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list[nb_test_2007 - nb_keep_val + i]+".xml")
	root = tree.getroot()
	
	x_offset, y_offset, width2, height2 = all_im_prop[orig_nb_images - nb_keep_val + i]
	
	k = 0
	obj_list = root.findall("object", namespaces=None)
	targets[i,0] = len(obj_list)
	for obj in obj_list:
		diff = obj.find("difficult", namespaces=None)
		if(diff.text == "1"):
			targets[i,0] -= 1
			continue
		oclass = obj.find("name", namespaces=None)
		bndbox = obj.find("bndbox", namespaces=None)

		int_class = np.where(class_list[:] == oclass.text)
		xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
		ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
		xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
		ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2

		targets[i,1+k*7:1+(k+1)*7] = np.array([int_class[0][0]+1, xmin,ymin,0.0,xmax,ymax,1.0])
		class_count[int_class[0][0]] += 1
		
		k += 1


In [None]:
id_start = 0

nb_w = 4
nb_h = 8

fig, ax = plt.subplots(nb_h, nb_w, figsize=(1.5*nb_w,1.5*nb_h), dpi=210, constrained_layout=True)

for i in range(0, nb_h):
	for j in range(0, nb_w):
		i_d = i*nb_w + j + id_start
		
		c_data = all_im[nb_train_2007 + nb_test_2007 + nb_train_2012 - nb_keep_val + i_d]/255.0
		ax[i,j].imshow(c_data)
		ax[i,j].axis('off')
		
		im_boxes = final_boxes[i_d]
		
		targ_boxes = targets[i_d]
		for k in range(0, int(targ_boxes[0])):
			xmin = (targ_boxes[1+k*7+1])
			ymin = (targ_boxes[1+k*7+2])
			xmax = (targ_boxes[1+k*7+4])
			ymax = (targ_boxes[1+k*7+5])
			p_c = int(targ_boxes[1+k*7+0]) - 1
			el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=1.0, ls="--", fill=False, color=plt.cm.tab20(p_c), zorder=3)
			c_patch = ax[i,j].add_patch(el)
			c_patch.set_path_effects([path_effects.Stroke(linewidth=2.0, foreground='black'),
												path_effects.Normal()])

		for k in range(0, np.shape(im_boxes)[0]):
			xmin = max(-0.5,(im_boxes[k,0]) - 0.5)
			ymin = max(-0.5,(im_boxes[k,1]) - 0.5)
			xmax = min(image_size-0.5,(im_boxes[k,2]) - 0.5)
			ymax = min(image_size-0.5,(im_boxes[k,3]) - 0.5)
			
			p_c = np.argmax(im_boxes[k,7:])
			
			el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=1.0, fill=False, color=plt.cm.tab20(p_c), zorder=3)
			c_patch = ax[i,j].add_patch(el)
			c_text = ax[i,j].text(xmin+5, ymin+18, "%s:%0.2f-%0.2f"%(class_list[p_c],im_boxes[k,5],np.max(im_boxes[k,7:])), c=plt.cm.tab20(p_c), fontsize=4,clip_on=True)
			c_patch.set_path_effects([path_effects.Stroke(linewidth=2.0, foreground='black'),
												path_effects.Normal()])
			c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
												path_effects.Normal()])

plt.show()


### **4\. Performance metric with mAP**

When measuring AP performance, the objectness (or any other sensitivity) threshold must be low,
since the metric integrates over the whole confidence range.
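
To illustrate this point, here is a minimal plain-Python sketch of the all-point interpolated AP on a toy ranked list of detections. It mirrors the logic of the cell below (cumulative precision and recall, backward precision envelope, trapezoidal integration), but the function name and the toy hit lists are assumptions made for the example.

```python
def all_point_ap(is_true_positive, nb_targets):
    """AP from detections already sorted by decreasing confidence (1 = match, 0 = false positive)."""
    precision, recall = [], []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += hit
        precision.append(tp / rank)
        recall.append(tp / nb_targets)
    # Precision envelope: going backward, keep the best precision seen so far
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Trapezoidal integration of the envelope over recall
    ap = 0.0
    for i in range(1, len(recall)):
        ap += 0.5 * (precision[i] + precision[i - 1]) * (recall[i] - recall[i - 1])
    return ap

# Four target objects; detections kept with a high confidence threshold
ap_high_threshold = all_point_ap([1, 1, 0, 1, 0], nb_targets=4)
# Same detections plus two lower-confidence ones recovered with a lower threshold
ap_low_threshold = all_point_ap([1, 1, 0, 1, 0, 0, 1], nb_targets=4)
print(ap_high_threshold, ap_low_threshold)
```

In this toy case the AP goes from 0.4375 to about 0.58 when the two low-confidence detections are included: extra detections at the end of the ranking can extend the curve toward higher recall but cannot reduce the area already accumulated.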

In [None]:
c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")
patch = np.zeros((image_size, image_size), dtype="float32")

final_boxes = []

# The mAP metric integrates the sensitivity of the network over the full range of objectness,
# therefore the obj_limit can be very low to sample the low-objectness part of the AP curves
# without lowering the final mAP score (it is kept non-zero only to save computation time).

obj_limit = 0.05
class_limit = 0.4

nms_threshold = 0.2 #here using GIoU (as returned by fct_IoU) in the interval [-1,1]
#lower value is more strict

AP_IoU_val = 0.5

for l in range(0,nb_keep_val):
  c_tile[:,:] = 0.0
  c_tile_kept[:,:] = 0.0

  c_pred = predict[l,:,:,:]
  c_nb_box = box_filter(c_pred, c_box, c_tile, obj_limit, class_limit)

  c_nb_box_final = c_nb_box
  c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold)
  final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))


recall_precision = np.empty((nb_keep_val), dtype="object")

print("Find associations ...", flush=True)

for i_d in tqdm(range(0, nb_keep_val)):
				
	recall_precision[i_d] = np.zeros((np.shape(final_boxes[i_d])[0], 6))
	
	if(np.shape(final_boxes[i_d])[0] == 0):
		continue
	
	recall_precision[i_d][:,0] = np.amax(final_boxes[i_d][:,7:], axis=1)
	recall_precision[i_d][:,1] = final_boxes[i_d][:,5]
	
	recall_precision[i_d][:,5] = np.argmax(final_boxes[i_d][:,7:], axis=1)
	
	kept_boxes = targets[i_d]
	
	IoU_table = np.zeros((int(kept_boxes[0]),np.shape(final_boxes[i_d])[0])) - 1.0
	
	for i in range(0,int(kept_boxes[0])):
		for j in range(0,np.shape(final_boxes[i_d])[0]):
			xmin = (kept_boxes[1+i*7+1])
			ymin = (kept_boxes[1+i*7+2])
			xmax = (kept_boxes[1+i*7+4])
			ymax = (kept_boxes[1+i*7+5])
			c_kept_box = np.array([xmin, ymin, xmax, ymax])
			IoU_table[i,j] = fct_classical_IoU(c_kept_box, final_boxes[i_d][j,:4])
			
	# Loop over the true boxes to find best prediction associated
	for i in range(0,int(kept_boxes[0])):
		best_match_id = np.unravel_index(np.argmax(IoU_table),np.shape(IoU_table))
		
		best_match_IoU = IoU_table[best_match_id]
		
		IoU_table[best_match_id[0],:] = -1.0
		
		if (best_match_IoU >= AP_IoU_val and np.argmax(final_boxes[i_d][best_match_id[1],7:]) == int(kept_boxes[1+best_match_id[0]*7+0]-1)):
		#if(c_IoU >= AP_IoU_val):
			recall_precision[i_d][best_match_id[1],2] = 1
			recall_precision[i_d][best_match_id[1],3] = best_match_id[1]
			recall_precision[i_d][best_match_id[1],4] = best_match_IoU
			IoU_table[:,best_match_id[1]] = -1.0
		

print("Process and flatten the mAP result")
flatten = np.vstack(recall_precision.flatten())

recall_precision_f = np.zeros((np.shape(flatten)[0], 10))
recall_precision_f[:,:6] = flatten[:,:]

recall_precision_fs = (recall_precision_f[(recall_precision_f[:,1]*recall_precision_f[:,0]).argsort()])[::-1]

recall_precision_fs[:,6] = np.cumsum(recall_precision_fs[:,2])
recall_precision_fs[:,7] = np.cumsum(1.0 - recall_precision_fs[:,2])
recall_precision_fs[:,8] = recall_precision_fs[:,6] / (recall_precision_fs[:,6]+recall_precision_fs[:,7])
recall_precision_fs[:,9] = recall_precision_fs[:,6] / np.sum(class_count)


interp_curve = np.zeros((np.shape(recall_precision_fs)[0],2))

interp_curve[:,0] = recall_precision_fs[:,9]
#Go in reverse to set the value for the all point interpolation
c_max_val = np.min(recall_precision_fs[:,8])
for i in range(0, np.shape(recall_precision_fs)[0]):
    i_d = np.shape(recall_precision_fs)[0] - i - 1
    if(recall_precision_fs[i_d,8] > c_max_val):
        c_max_val = recall_precision_fs[i_d,8]
    interp_curve[i_d,1] = c_max_val
    

AP_all = np.trapz(interp_curve[:,1], interp_curve[:,0])
print ("AP_all (%.2f): %f%%"%(AP_IoU_val, AP_all*100.0))

    
plt.figure(figsize=(4*1.0,3*1.0), dpi=200, constrained_layout=True)
plt.plot(recall_precision_fs[:,9], recall_precision_fs[:,8])
plt.plot(interp_curve[:,0], interp_curve[:,1], label="New")
plt.xlabel(r"Recall")
plt.ylabel(r"Precision")
plt.title("All classes as one AP curve", fontsize=8)

#print (class_count)
sumAP = 0
print ("**** Per class AP ****")
fig, ax = plt.subplots(figsize=(4*1.3,3*1.3), dpi=200, constrained_layout=True)
plt.xlabel(r"Recall")
plt.ylabel(r"Precision")
for k in range(0, nb_class):
	index = np.where(recall_precision_fs[:,5] == k)
	l_recall_precision_fs = recall_precision_fs[index[0]]
	l_recall_precision_fs[:,6] = np.cumsum(l_recall_precision_fs[:,2])
	l_recall_precision_fs[:,7] = np.cumsum(1.0 - l_recall_precision_fs[:,2])
	l_recall_precision_fs[:,8] = l_recall_precision_fs[:,6] / (l_recall_precision_fs[:,6]+l_recall_precision_fs[:,7])
	l_recall_precision_fs[:,9] = l_recall_precision_fs[:,6] / class_count[k]
	
	interp_curve = np.zeros((np.shape(index[0])[0],2))

	interp_curve[:,0] = l_recall_precision_fs[:,9]
	#Go in reverse to set the value for the all point interpolation
	c_max_val = np.min(l_recall_precision_fs[:,8])
	for i in range(0, np.shape(l_recall_precision_fs)[0]):
		i_d = np.shape(l_recall_precision_fs)[0] - i - 1
		if(l_recall_precision_fs[i_d,8] > c_max_val):
			c_max_val = l_recall_precision_fs[i_d,8]
		interp_curve[i_d,1] = c_max_val
	
	AP = np.trapz(interp_curve[:,1], interp_curve[:,0])
	sumAP += AP
	
	plt.plot(interp_curve[:,0], interp_curve[:,1], label=class_list_short[k],c=plt.cm.tab20(k))
	#plt.plot(l_recall_precision_fs[:,9], l_recall_precision_fs[:,8], label=class_list[k], c=plt.cm.tab20(k))
	
	print("AP %-8s: %5.2f%%     Total: %4d - T: %4d - F: %4d"%(class_list_short[k], AP*100.0, class_count[k], l_recall_precision_fs[-1,6], l_recall_precision_fs[-1,7]))
plt.legend(bbox_to_anchor=(1.02,0.98), fontsize=8)
plt.title("Per class AP curve", fontsize=8)

print ("\n**** mAP (%.2f): %f%% ****"%(AP_IoU_val, sumAP/nb_class*100.0))

plt.show()







In [None]:
# A runtime restart is advised before going to section 6

### **5\. Practical work**

**How to improve the detection results?**

---
**1. Improve the prediction post-process**  
**Note:** *The present "post-school" version of the notebook includes an optimized post-process in a new section 6*


  *   Use a more complex object filtering
      *   CIANNA predicts separate Probability, Objectness, and class score  
        
          The current selection is based on [Prob x Obj.]  
          -- Try [Prob x Obj. x Class] instead  
          -- Try defining different thresholds for each of them
      *   The previous elements are both box and class dependent  

          Single values selection can be replaced by lists of values corresponding to each box/class  
          -- Try having a different Probability / Objectness threshold for each box size prior  
          -- Try having a different class score threshold for each class. Update the class association for the boxes (e.g., set class scores below the selected threshold to zero before selecting the class)  
          -- Advanced: try having a list of Probability / Objectness / Class score thresholds for each box prior

  *   Have a more advanced NMS  
      -- Try different types of IoU for NMS (Classical, GIoU, DIoU, CIoU)  
      -- Try having a different IoU threshold for boxes of the same class and boxes of different classes  
      -- For the two previous cases, try having a different IoU threshold depending on the quality of the prediction (try Prob. only at first, and then any combination of Prob. / Obj. / class)  
      -- As before, these values might also be different for each box prior (e.g., being stricter when boxes from large size priors overlap than for smaller size priors)  

  *   Try MC dropout prediction  
      -- Choose what to do with the output list for each box element (pos, size, Prob., Obj., class.). E.g., average, percentile, keep maximum, keep minimum, etc.  
      -- Try computing the STD of the prediction on some output parts (e.g. Prob.) and use them as new filtering criteria  
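
As a possible starting point for the first suggestions, the sketch below filters boxes on the combined [Prob x Obj. x Class] score with one threshold per box prior. It is written in plain Python and assumes the box layout built by `box_filter`; the function name, the threshold values and the toy boxes are hypothetical and would need tuning on the validation set.

```python
def select_boxes(boxes, score_limit_per_prior):
    """Keep boxes whose Prob x Obj. x best class score passes the threshold of their prior.

    Each box follows the layout built by box_filter:
    [xmin, ymin, xmax, ymax, prob, obj, prior_id, class_score_0, ...].
    """
    kept = []
    for box in boxes:
        prob, obj, prior_id = box[4], box[5], int(box[6])
        score = prob * obj * max(box[7:])
        if score >= score_limit_per_prior[prior_id]:
            kept.append(box)
    return kept

# Hypothetical thresholds, one per box prior (stricter for the largest priors)
score_limits = [0.10, 0.10, 0.15, 0.15, 0.20]

# Two hypothetical boxes with only three classes, to keep the example short
toy_boxes = [
    [10.0, 10.0, 60.0, 60.0, 0.8, 0.7, 0, 0.1, 0.9, 0.0],    # combined score 0.504, prior 0
    [50.0, 40.0, 250.0, 200.0, 0.5, 0.4, 4, 0.6, 0.2, 0.2],  # combined score 0.120, prior 4
]
selected = select_boxes(toy_boxes, score_limits)
print(len(selected))  # 1
```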
---
**2. Improve the network itself**  
**Note:** *The following suggestions are already included in the present "post-school" notebook version. Still, the choice of parameters and priors might remain suboptimal, and one can still explore how changing these parameters affects the training results.*

Most suggested changes would work better on a new training from scratch. Still, continuing the training from the pre-trained network might be possible for some of them.
  *   Change the number of box size priors or their sizes  
      More box priors would help in crowded areas. It might also be useful to have less distance between two size priors.  
      However, training is more difficult with many box priors. Each box prior is optimized independently of the others, so they effectively "share" the list of training examples. This might produce poorly constrained boxes or even cause unstable training. Splitting the box size space into non-square priors might be a better way to distribute the objects over the independent box priors.
  * Have a different "no object" scaling for each size-prior to better follow the target object size distribution  
  * Rebalance the loss by applying a different scaling to each element  
  * Adjust some IoU limits, mainly the "Good but not best association" limit, but also the limits for probability, objectness and classification inclusion (cascading loss).
  * Change the output activation function parameters. For example, choosing a greater slope for the classification should reduce the number of epochs required while getting closer to a binary behavior.
  * Finally, modify the network architecture. Note that there is a strong interplay between all the previous parameters and the selected architecture.  
    -- Try increasing the input size  
    -- Try changing the output grid size (either from the input size or by changing the spatial reduction factor)  
    -- Try replacing the pooling layers with stride 2 convolution filters  
    -- Play with the network depth and number of filters in the various layers  
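
To get a feel for how the input size and the spatial reduction factor set the output grid, the following sketch reproduces the arithmetic used elsewhere in this notebook (`yolo_nb_reg = int(image_size/32)` and `nb_box*(8+nb_param+nb_class)` output channels). The helper `output_shape` is hypothetical, and requiring the input size to be a multiple of the reduction factor is an assumption made here for simplicity.

```python
def output_shape(image_size, reduction=32, nb_box=5, nb_class=20, nb_param=0):
    """Return (channels, grid, grid) of the raw prediction for one image."""
    # Assumption of this sketch: the grid must tile the image exactly
    if image_size % reduction != 0:
        raise ValueError("image_size should be a multiple of the reduction factor")
    grid = image_size // reduction
    # 8 values per box precede the class scores, as in the notebook's reshape
    channels = nb_box * (8 + nb_param + nb_class)
    return channels, grid, grid

for size in (288, 352, 416):
    channels, grid, _ = output_shape(size)
    nb_candidates = grid * grid * 5  # 5 box priors per grid cell
    print("%d px -> %dx%d grid, %d candidate boxes, %d channels"
          % (size, grid, grid, nb_candidates, channels))
```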
---
**3. Improve the training dataset**
  * The present training dataset is strongly imbalanced. Achieving real balance would be difficult (and most probably not very efficient) because images often contain several objects of different classes. Still, adding a scaling to each class loss, or to the probability of drawing a random image depending on the objects it contains, could strongly improve the results if done carefully.
  * Some training images are unnecessarily confusing, either due to the images themselves or to their labeling. Removing such "outliers" (either the image or the target box) might in fact improve the overall detection. It is also possible to adjust which objects are flagged as difficult depending on the context, or to filter out boxes that are too small or that overlap too much.
  * Finally, it is common to first pre-train a detection network as a classifier, since large datasets like **ImageNet** are available for such tasks. After training the classifier, the last few layers are removed and replaced by the last layers of a detection network before further training.
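
One possible way to implement the image-draw rebalancing mentioned in the first point is sketched below on toy data. The weighting rule (inverse class frequency, averaged over the objects of each image) and the helper name `image_draw_probabilities` are assumptions made for illustration. They are not what the notebook or CIANNA does.

```python
import numpy as np

def image_draw_probabilities(image_classes, nb_class):
    """image_classes: one list of class ids per image (one id per object)."""
    class_count = np.zeros(nb_class)
    for classes in image_classes:
        for c in classes:
            class_count[c] += 1
    # Inverse-frequency weight per class (classes that never appear get 0)
    class_weight = np.zeros(nb_class)
    seen = class_count > 0
    class_weight[seen] = 1.0 / class_count[seen]
    # Image weight = mean weight of its objects (one choice among many)
    image_weight = np.array([
        np.mean(class_weight[classes]) if len(classes) > 0 else 0.0
        for classes in image_classes
    ])
    return image_weight / np.sum(image_weight)

# Toy dataset: class 0 is frequent, classes 1 and 2 are rare
toy_images = [[0, 0], [0], [0, 1], [2]]
p = image_draw_probabilities(toy_images, nb_class=3)
print("draw probabilities:", p)

# Draw a batch of image indices following these probabilities
rng = np.random.default_rng(0)
batch = rng.choice(len(toy_images), size=8, p=p)
print("sampled image indices:", batch)
```

Images that contain only rare classes are drawn more often, but frequent classes that share an image with a rare one are over-sampled as well, which is why perfect balance is out of reach.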




### **6\. Proposed post-process optimization (Post-School update)**


#### Visually appealing


In [None]:
%cd /content/IRMIA_2022/yolo_detector/

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image

import re
import bisect
import os
import sys
from numba import jit

class_list = np.array(["aeroplane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","diningtable","dog","horse", "motorbike",\
    "person","pottedplant","sheep","sofa","train","tvmonitor","background"])
class_list_short = np.array(["plane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","table","dog","horse", "m-bike",\
    "person","p-plant","sheep","sofa","train","tv","background"])

test_list = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = 11540 + 5011 + 4952
nb_keep_val = 1000

image_size = 288
nb_box = 5
nb_class = 20
nb_param = 0

max_nb_obj_per_image = 48

yolo_nb_reg = int(image_size/32)
c_size = 32

all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size, image_size, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

load_epoch = 0	
if(load_epoch == 0):
	files = os.listdir("fwd_res/")
	paths = [os.path.join("fwd_res/", basename) for basename in files]
	path = max(paths, key=os.path.getctime)
	r_load_epoch = [int(s) for s in re.split('[_s.]',path) if s.isdigit()]
	print (r_load_epoch)
	print("Epoch unspecified, loading most recent prediction : " + path)
	
	load_epoch = r_load_epoch[0]

prior_w = np.array([24.,48.,96.,144.,256.])
prior_h = np.array([24.,64.,176.,80.,192.])

pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%load_epoch, dtype="float32")
predict = np.reshape(pred_raw, (nb_keep_val, nb_box*(8+nb_param+nb_class),yolo_nb_reg,yolo_nb_reg))

@jit(nopython=True, cache=True, fastmath=False)
def global_to_tile_coord(offset_tab, tile_coords, priors, c_size):
	bx = (offset_tab[0] + tile_coords[1])*c_size
	by = (offset_tab[1] + tile_coords[0])*c_size
	bw = priors[0]*np.exp(offset_tab[3])
	bh = priors[1]*np.exp(offset_tab[4])
	return float(bx), float(by), float(bw), float(bh)
 
@jit(nopython=True, cache=True, fastmath=False)
def box_extraction(c_pred, c_box, c_tile, prob_obj_cases, class_soft_limit):
  c_nb_box = 0
  for i in range(0,yolo_nb_reg):
    for j in range(0,yolo_nb_reg):
      for k in range(0,nb_box):
        offset = int(k*(8+nb_param+nb_class)) #no +1 for box prior in prediction
        c_box[4] = c_pred[offset+6,i,j]
        c_box[5] = c_pred[offset+7,i,j]
        p_c = np.max(c_pred[offset+8:offset+8+nb_class,i,j])
        cl = np.argmax(c_pred[offset+8:offset+8+nb_class,i,j]) 
        
        if(c_box[5]*p_c >= prob_obj_cases[k] and p_c > class_soft_limit[0]):
          bx, by, bw, bh = global_to_tile_coord(c_pred[offset:offset+6,i,j], \
                    np.array([i,j]), np.array([prior_w[k], prior_h[k]]), c_size)
          c_box[0] = max(0,bx - bw*0.5 - 1)
          c_box[1] = max(0,by - bh*0.5 - 1)
          c_box[2] = min(image_size,bx + bw*0.5 + 1)
          c_box[3] = min(image_size,by + bh*0.5 + 1)
          
          c_box[6] = k
          c_box[7:] = c_pred[offset+8:offset+8+nb_param+nb_class,i,j]
          c_tile[c_nb_box,:] = c_box[:]
          c_nb_box +=1

  return c_nb_box

@jit(nopython=True, cache=True, fastmath=False)
def fct_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d
	enclose_w = (max(box1[2], box2[2]) - min(box1[0], box2[0]))
	enclose_h = (max(box1[3], box2[3]) - min(box1[1],box2[1]))
	enclose_2d = enclose_w*enclose_h

	cx_a = (box1[2] + box1[0])*0.5; cx_b = (box2[2] + box2[0])*0.5
	cy_a = (box1[3] + box1[1])*0.5; cy_b = (box2[3] + box2[1])*0.5
	dist_cent = np.sqrt((cx_a - cx_b)*(cx_a - cx_b) + (cy_a - cy_b)*(cy_a - cy_b))
	diag_enclose = np.sqrt(enclose_w*enclose_w + enclose_h*enclose_h)

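	# DIoU (the standard penalty is the squared center distance over the squared enclosing diagonal)
	#return float(inter_2d)/float(uni_2d) - float(dist_cent*dist_cent)/float(diag_enclose*diag_enclose)
	# GIoU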
	return float(inter_2d)/float(uni_2d) - float(enclose_2d - uni_2d)/float(enclose_2d)
	
@jit(nopython=True, cache=True, fastmath=False)
def fct_classical_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d

	return float(inter_2d)/float(uni_2d)

#@jit(nopython=True, cache=True, fastmath=False)
def apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold):
  c_nb_box_final = 0
  is_match = 1
  c_box_size_prev = c_nb_box

  while(c_nb_box > 0):
    max_objct = np.argmax(c_tile[:c_box_size_prev,5]*np.amax(c_tile[:c_box_size_prev,7:], axis=1))
    c_box = np.copy(c_tile[max_objct])
    c_tile[max_objct,5] = 0.0
    c_tile_kept[c_nb_box_final] = c_box
    c_nb_box_final += 1
    c_nb_box -= 1
    i = 0
    
    for i in range(0,c_box_size_prev):
      if(c_tile[i,5] < 0.00000001):
        continue
      IoU = fct_IoU(c_box[:4], c_tile[i,:4])
      c_score = c_tile[i,5]*np.max(c_tile[i,7:])
      
      if((IoU > 0.2 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score >= 0.9)
        or (IoU > 0.2 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score < 0.9 and c_score >= 0.1)
        or (IoU > 0.5 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score >= 0.9)
        or (IoU > 0.3 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score < 0.9 and c_score >= 0.1)
        or (IoU > -0.6 and c_score < 0.1)):
        c_tile[i] = 0.0
        c_nb_box -= 1
     
  return c_nb_box_final

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")
patch = np.zeros((image_size, image_size), dtype="float32")

final_boxes = []

#Choice of filters that produce visually appealing results (!= best mAP )
obj_threshold = 6*np.array([0.1,0.1,0.1,0.1,0.1])
class_soft_limit = np.array([0.7])

nms_threshold = 0.1
#nms_threshold is not used here, context-dependent thresholds are hard-coded in apply_NMS

for l in tqdm(range(0,nb_keep_val)):
	c_tile[:,:] = 0.0
	c_tile_kept[:,:] = 0.0

	c_pred = predict[l,:,:,:]
	c_nb_box = box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit)			

	c_nb_box_final = c_nb_box
	c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold)
	final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))
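
Note that `fct_IoU` in the cell above returns a GIoU rather than a classical IoU, so the thresholds inside `apply_NMS` live on a scale from -1 to 1. The toy sketch below recomputes both quantities in plain Python for a few box pairs to show what these values mean. The helper `iou_and_giou` is written for this illustration only.

```python
def iou_and_giou(box1, box2):
    """Boxes are (x_min, y_min, x_max, y_max). Returns (IoU, GIoU)."""
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = inter_w * inter_h
    union = ((box1[2] - box1[0]) * (box1[3] - box1[1])
             + (box2[2] - box2[0]) * (box2[3] - box2[1]) - inter)
    # Smallest box enclosing both inputs
    enclose = ((max(box1[2], box2[2]) - min(box1[0], box2[0]))
               * (max(box1[3], box2[3]) - min(box1[1], box2[1])))
    iou = inter / union
    return iou, iou - (enclose - union) / enclose

ref = (0.0, 0.0, 10.0, 10.0)
cases = {
    "identical":    (0.0, 0.0, 10.0, 10.0),
    "half overlap": (5.0, 0.0, 15.0, 10.0),
    "touching":     (10.0, 0.0, 20.0, 10.0),
    "far apart":    (40.0, 0.0, 50.0, 10.0),
}
for name, box in cases.items():
    iou, giou = iou_and_giou(ref, box)
    print("%-12s IoU = %5.2f   GIoU = %5.2f" % (name, iou, giou))
```

In the last case the two boxes are separated by three box widths and the GIoU is about -0.6, which is the value used in the final clause of `apply_NMS`: a box with a combined score below 0.1 is removed whenever its GIoU with a kept box exceeds -0.6, even without any overlap.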


In [None]:
targets = np.zeros((nb_keep_val,1+max_nb_obj_per_image*7), dtype="float32")

class_count = np.zeros((nb_class))

for i in tqdm(range(0, nb_keep_val)):
	
	tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list[nb_test_2007 - nb_keep_val + i]+".xml")
	root = tree.getroot()
	
	x_offset, y_offset, width2, height2 = all_im_prop[orig_nb_images - nb_keep_val + i]
	
	k = 0
	obj_list = root.findall("object", namespaces=None)
	targets[i,0] = len(obj_list)
	for obj in obj_list:
		diff = obj.find("difficult", namespaces=None)
		if(diff.text == "1"):
			targets[i,0] -= 1
			continue
		oclass = obj.find("name", namespaces=None)
		bndbox = obj.find("bndbox", namespaces=None)

		int_class = np.where(class_list[:] == oclass.text)
		xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
		ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
		xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
		ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2

		targets[i,1+k*7:1+(k+1)*7] = np.array([int_class[0][0]+1, xmin,ymin,0.0,xmax,ymax,1.0])
		class_count[int_class[0][0]] += 1
		
		k += 1
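
The `targets` array built above packs a variable number of objects into fixed-length rows: the first value is the number of non-difficult objects, followed by 7 values per object (`class + 1, xmin, ymin, 0.0, xmax, ymax, 1.0`). Difficult objects are skipped without advancing `k`, so the kept objects are stored contiguously. The hypothetical helper below unpacks one such row; the toy values are made up for the illustration.

```python
import numpy as np

def unpack_targets(target_row):
    """Return a list of (class_id, x_min, y_min, x_max, y_max) from one row."""
    nb_obj = int(target_row[0])
    objects = []
    for k in range(nb_obj):
        # 7 values per object; the 4th and 7th are constant in this notebook
        cls_p1, x_min, y_min, _, x_max, y_max, _ = target_row[1 + k*7 : 1 + (k+1)*7]
        # Class ids are stored with a +1 shift
        objects.append((int(cls_p1) - 1, float(x_min), float(y_min),
                        float(x_max), float(y_max)))
    return objects

toy_row = np.zeros(1 + 2*7, dtype="float32")
toy_row[0] = 2
toy_row[1:8] = [15, 10.0, 20.0, 0.0, 110.0, 220.0, 1.0]  # index 14 is "person" in class_list
toy_row[8:15] = [12, 5.0, 6.0, 0.0, 50.0, 60.0, 1.0]     # index 11 is "dog" in class_list
print(unpack_targets(toy_row))
```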


In [None]:
id_start = 0

nb_w = 4
nb_h = 8

fig, ax = plt.subplots(nb_h, nb_w, figsize=(1.5*nb_w,1.5*nb_h), dpi=210, constrained_layout=True)

for i in range(0, nb_h):
	for j in range(0, nb_w):
		i_d = i*nb_w + j + id_start
		
		c_data = all_im[nb_train_2007 + nb_test_2007 + nb_train_2012 - nb_keep_val + i_d]/255.0
		ax[i,j].imshow(c_data)
		ax[i,j].axis('off')
		
		im_boxes = final_boxes[i_d]
		
		targ_boxes = targets[i_d]
		for k in range(0, int(targ_boxes[0])):
			xmin = (targ_boxes[1+k*7+1])
			ymin = (targ_boxes[1+k*7+2])
			xmax = (targ_boxes[1+k*7+4])
			ymax = (targ_boxes[1+k*7+5])
			p_c = int(targ_boxes[1+k*7+0]) - 1
			el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=1.0, ls="--", fill=False, color=plt.cm.tab20(p_c), zorder=3)
			c_patch = ax[i,j].add_patch(el)
			c_patch.set_path_effects([path_effects.Stroke(linewidth=2.0, foreground='black'),
												path_effects.Normal()])

		for k in range(0, np.shape(im_boxes)[0]):
			xmin = max(-0.5,(im_boxes[k,0]) - 0.5)
			ymin = max(-0.5,(im_boxes[k,1]) - 0.5)
			xmax = min(image_size-0.5,(im_boxes[k,2]) - 0.5)
			ymax = min(image_size-0.5,(im_boxes[k,3]) - 0.5)
			
			p_c = np.argmax(im_boxes[k,7:])
			
			el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=1.0, fill=False, color=plt.cm.tab20(p_c), zorder=3)
			c_patch = ax[i,j].add_patch(el)
			c_text = ax[i,j].text(xmin+5, ymin+18, "%s:%0.2f-%0.2f"%(class_list[p_c],im_boxes[k,5],np.max(im_boxes[k,7:])), c=plt.cm.tab20(p_c), fontsize=4,clip_on=True)
			c_patch.set_path_effects([path_effects.Stroke(linewidth=2.0, foreground='black'),
												path_effects.Normal()])
			c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
												path_effects.Normal()])

plt.show()


In [None]:
## A runtime restart is advised

#### mAP maximization

**Note:** Using the following selection, the pre-trained network reaches an mAP of 36% when applied to the complete test set of 4952 images. The achieved mAP is only 33% on the last 1000 test examples used here, due to a selection effect.
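
The AP values are computed in the next cell by ranking the kept boxes by score, accumulating true and false positives, and integrating the precision envelope over recall (all-point interpolation). The sketch below isolates that computation on five toy detections. The function name `average_precision` and the toy values are illustrative only. Like the notebook, it integrates with the trapezoidal rule starting from the first recall point.

```python
import numpy as np

def average_precision(scores, is_true_positive, nb_targets):
    """All-point interpolated AP from per-detection scores and match flags."""
    order = np.argsort(scores)[::-1]              # highest score first
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    precision = cum_tp / (cum_tp + cum_fp)
    recall = cum_tp / nb_targets
    # Precision envelope: running maximum taken from the right
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    # Trapezoidal integration of the envelope over recall
    ap = np.sum(0.5 * (envelope[1:] + envelope[:-1]) * np.diff(recall))
    return ap, recall, envelope

# Five toy detections (1 = matched to a target), 4 targets in total
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
matches = np.array([1, 0, 1, 1, 0])
ap, recall, envelope = average_precision(scores, matches, nb_targets=4)
print("recall  :", recall)
print("envelope:", envelope)
print("AP = %.3f" % ap)
```

The mAP is then the mean of this quantity computed separately for each class, using only the detections and targets of that class.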

In [None]:
%cd /content/IRMIA_2022/yolo_detector/

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image

import re
import bisect
import os
import sys
from numba import jit

class_list = np.array(["aeroplane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","diningtable","dog","horse", "motorbike",\
    "person","pottedplant","sheep","sofa","train","tvmonitor","background"])
class_list_short = np.array(["plane", "bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","table","dog","horse", "m-bike",\
    "person","p-plant","sheep","sofa","train","tv","background"])

test_list = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = 11540 + 5011 + 4952
nb_keep_val = 1000

image_size = 288
nb_box = 5
nb_class = 20
nb_param = 0

max_nb_obj_per_image = 48

yolo_nb_reg = int(image_size/32)
c_size = 32

all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size, image_size, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

load_epoch = 0	
if(load_epoch == 0):
	files = os.listdir("fwd_res/")
	paths = [os.path.join("fwd_res/", basename) for basename in files]
	path = max(paths, key=os.path.getctime)
	r_load_epoch = [int(s) for s in re.split('[_s.]',path) if s.isdigit()]
	print (r_load_epoch)
	print("Epoch unspecified, loading most recent prediction : " + path)
	
	load_epoch = r_load_epoch[0]

prior_w = np.array([24.,48.,96.,144.,256.])
prior_h = np.array([24.,64.,176.,80.,192.])

pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%load_epoch, dtype="float32")
predict = np.reshape(pred_raw, (nb_keep_val, nb_box*(8+nb_param+nb_class),yolo_nb_reg,yolo_nb_reg))

targets = np.zeros((nb_keep_val,1+max_nb_obj_per_image*7), dtype="float32")

class_count = np.zeros((nb_class))

for i in tqdm(range(0, nb_keep_val)):
	i_d = nb_test_2007 - nb_keep_val + i
	
	tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list[i_d]+".xml")
	root = tree.getroot()
	
	x_offset, y_offset, width2, height2 = all_im_prop[nb_train_2012 + nb_train_2007 + i_d ]
	
	k = 0
	obj_list = root.findall("object", namespaces=None)
	targets[i,0] = len(obj_list)
	for obj in obj_list:
		diff = obj.find("difficult", namespaces=None)
		if(diff.text == "1"):
			targets[i,0] -= 1
			continue
		oclass = obj.find("name", namespaces=None)
		bndbox = obj.find("bndbox", namespaces=None)

		int_class = np.where(class_list[:] == oclass.text)
		xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
		ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
		xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
		ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2

		targets[i,1+k*7:1+(k+1)*7] = np.array([int_class[0][0]+1, xmin,ymin,0.0,xmax,ymax,1.0])
		class_count[int_class[0][0]] += 1
		
		k += 1

@jit(nopython=True, cache=True, fastmath=False)
def global_to_tile_coord(offset_tab, tile_coords, priors, c_size):
	bx = (offset_tab[0] + tile_coords[1])*c_size
	by = (offset_tab[1] + tile_coords[0])*c_size
	bw = priors[0]*np.exp(offset_tab[3])
	bh = priors[1]*np.exp(offset_tab[4])
	return float(bx), float(by), float(bw), float(bh)
 
@jit(nopython=True, cache=True, fastmath=False)
def box_extraction(c_pred, c_box, c_tile, prob_obj_cases, class_soft_limit):
  c_nb_box = 0
  for i in range(0,yolo_nb_reg):
    for j in range(0,yolo_nb_reg):
      for k in range(0,nb_box):
        offset = int(k*(8+nb_param+nb_class)) #no +1 for box prior in prediction
        c_box[4] = c_pred[offset+6,i,j]
        c_box[5] = c_pred[offset+7,i,j]
        p_c = np.max(c_pred[offset+8:offset+8+nb_class,i,j])
        cl = np.argmax(c_pred[offset+8:offset+8+nb_class,i,j]) 
        
        if(c_box[5]*p_c >= prob_obj_cases[k] and p_c > class_soft_limit[0]):
          bx, by, bw, bh = global_to_tile_coord(c_pred[offset:offset+6,i,j], \
                    np.array([i,j]), np.array([prior_w[k], prior_h[k]]), c_size)
          c_box[0] = max(0,bx - bw*0.5 - 1)
          c_box[1] = max(0,by - bh*0.5 - 1)
          c_box[2] = min(image_size,bx + bw*0.5 + 1)
          c_box[3] = min(image_size,by + bh*0.5 + 1)
          
          c_box[6] = k
          c_box[7:] = c_pred[offset+8:offset+8+nb_param+nb_class,i,j]
          c_tile[c_nb_box,:] = c_box[:]
          c_nb_box +=1

  return c_nb_box

@jit(nopython=True, cache=True, fastmath=False)
def fct_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d
	enclose_w = (max(box1[2], box2[2]) - min(box1[0], box2[0]))
	enclose_h = (max(box1[3], box2[3]) - min(box1[1],box2[1]))
	enclose_2d = enclose_w*enclose_h

	cx_a = (box1[2] + box1[0])*0.5; cx_b = (box2[2] + box2[0])*0.5
	cy_a = (box1[3] + box1[1])*0.5; cy_b = (box2[3] + box2[1])*0.5
	dist_cent = np.sqrt((cx_a - cx_b)*(cx_a - cx_b) + (cy_a - cy_b)*(cy_a - cy_b))
	diag_enclose = np.sqrt(enclose_w*enclose_w + enclose_h*enclose_h)

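	# DIoU (the standard penalty is the squared center distance over the squared enclosing diagonal)
	#return float(inter_2d)/float(uni_2d) - float(dist_cent*dist_cent)/float(diag_enclose*diag_enclose)
	# GIoU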
	return float(inter_2d)/float(uni_2d) - float(enclose_2d - uni_2d)/float(enclose_2d)
	
@jit(nopython=True, cache=True, fastmath=False)
def fct_classical_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0])*abs(box1[3] - box1[1]) + \
		abs(box2[2]-box2[0])*abs(box2[3] - box2[1]) - inter_2d

	return float(inter_2d)/float(uni_2d)


#@jit(nopython=True, cache=True, fastmath=False)
def apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold):
  c_nb_box_final = 0
  c_box_size_prev = c_nb_box

  while(c_nb_box > 0):
    max_objct = np.argmax(c_tile[:c_box_size_prev,5]*np.amax(c_tile[:c_box_size_prev,7:], axis=1))
    c_box = np.copy(c_tile[max_objct])
    c_tile[max_objct,5] = 0.0
    c_tile_kept[c_nb_box_final] = c_box
    c_nb_box_final += 1
    c_nb_box -= 1
    i = 0
    for i in range(0,c_box_size_prev):
      if(c_tile[i,5] < 0.00000001):
        continue
      IoU = fct_IoU(c_box[:4], c_tile[i,:4])
      c_score = c_tile[i,5]*np.max(c_tile[i,7:])
      
      if((IoU > 0.3 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score >= 0.9)
        or (IoU > 0.3 and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]) and c_score < 0.9 and c_score >= 0.1)
        or (IoU > 0.4 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score >= 0.8)
        or (IoU > 0.4 and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]) and c_score < 0.8 and c_score >= 0.1)
        or (IoU > -0.6 and c_score < 0.1)):
        c_tile[i] = 0.0
        c_nb_box -= 1
     
  return c_nb_box_final


c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_param+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_param+nb_class),dtype="float32")
patch = np.zeros((image_size, image_size), dtype="float32")

final_boxes = []

#Choice of filters that maximize the mAP (!= most visually appealing results)
obj_threshold = 0.5*np.array([0.1,0.1,0.1,0.1,0.1])
class_soft_limit = np.array([0.4])

nms_threshold = 0.1
#nms_threshold is not used here, context-dependent thresholds are hard-coded in apply_NMS

for l in tqdm(range(0,nb_keep_val)):
	c_tile[:,:] = 0.0
	c_tile_kept[:,:] = 0.0

	c_pred = predict[l,:,:,:]
	c_nb_box = box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit)			

	c_nb_box_final = c_nb_box
	c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, nms_threshold)
	final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))
 
flat_pred_boxes = np.vstack(final_boxes)

AP_IoU_val = 0.5

recall_precision = np.empty((nb_keep_val), dtype="object")

print("Find associations ...", flush=True)

for i_d in tqdm(range(0, nb_keep_val)):
				
	recall_precision[i_d] = np.zeros((np.shape(final_boxes[i_d])[0], 6))
	
	if(np.shape(final_boxes[i_d])[0] == 0):
		continue
	
	recall_precision[i_d][:,0] = np.amax(final_boxes[i_d][:,7:], axis=1)
	recall_precision[i_d][:,1] = final_boxes[i_d][:,5]
	
	recall_precision[i_d][:,5] = np.argmax(final_boxes[i_d][:,7:], axis=1)
	
	kept_boxes = targets[i_d]
	
	IoU_table = np.zeros((int(kept_boxes[0]),np.shape(final_boxes[i_d])[0])) - 1.0
	
	for i in range(0,int(kept_boxes[0])):
		for j in range(0,np.shape(final_boxes[i_d])[0]):
			xmin = (kept_boxes[1+i*7+1])
			ymin = (kept_boxes[1+i*7+2])
			xmax = (kept_boxes[1+i*7+4])
			ymax = (kept_boxes[1+i*7+5])
			c_kept_box = np.array([xmin, ymin, xmax, ymax])
			IoU_table[i,j] = fct_classical_IoU(c_kept_box, final_boxes[i_d][j,:4])
			
	# Loop over the true boxes to find best prediction associated
	for i in range(0,int(kept_boxes[0])):
		best_match_id = np.unravel_index(np.argmax(IoU_table),np.shape(IoU_table))
		
		best_match_IoU = IoU_table[best_match_id]
		
		IoU_table[best_match_id[0],:] = -1.0
		
		if (best_match_IoU >= AP_IoU_val and np.argmax(final_boxes[i_d][best_match_id[1],7:]) == int(kept_boxes[1+best_match_id[0]*7+0]-1)):
		#if(c_IoU >= AP_IoU_val):
			recall_precision[i_d][best_match_id[1],2] = 1
			recall_precision[i_d][best_match_id[1],3] = best_match_id[1]
			recall_precision[i_d][best_match_id[1],4] = best_match_IoU
			IoU_table[:,best_match_id[1]] = -1.0
		

print("Process and flatten the mAP result")
flatten = np.vstack(recall_precision.flatten())

recall_precision_f = np.zeros((np.shape(flatten)[0], 10))
recall_precision_f[:,:6] = flatten[:,:]

recall_precision_fs = (recall_precision_f[(recall_precision_f[:,1]*recall_precision_f[:,0]).argsort()])[::-1]

recall_precision_fs[:,6] = np.cumsum(recall_precision_fs[:,2])
recall_precision_fs[:,7] = np.cumsum(1.0 - recall_precision_fs[:,2])
recall_precision_fs[:,8] = recall_precision_fs[:,6] / (recall_precision_fs[:,6]+recall_precision_fs[:,7])
recall_precision_fs[:,9] = recall_precision_fs[:,6] / np.sum(class_count)


interp_curve = np.zeros((np.shape(recall_precision_fs)[0],2))

interp_curve[:,0] = recall_precision_fs[:,9]
#Go in reverse to set the value for the all point interpolation
c_max_val = np.min(recall_precision_fs[:,8])
for i in range(0, np.shape(recall_precision_fs)[0]):
    i_d = np.shape(recall_precision_fs)[0] - i - 1
    if(recall_precision_fs[i_d,8] > c_max_val):
        c_max_val = recall_precision_fs[i_d,8]
    interp_curve[i_d,1] = c_max_val
    

AP_all = np.trapz(interp_curve[:,1], interp_curve[:,0])
print ("AP_all (%.2f): %f%%"%(AP_IoU_val, AP_all*100.0))

    
plt.figure(figsize=(4*1.0,3*1.0), dpi=200, constrained_layout=True)
plt.plot(recall_precision_fs[:,9], recall_precision_fs[:,8])
plt.plot(interp_curve[:,0], interp_curve[:,1], label="New")
plt.xlabel(r"Recall")
plt.ylabel(r"Precision")
plt.title("All classes as one AP curve", fontsize=8)

#print (class_count)
sumAP = 0
print ("**** Per class AP ****")
fig, ax = plt.subplots(figsize=(4*1.3,3*1.3), dpi=200, constrained_layout=True)
plt.xlabel(r"Recall")
plt.ylabel(r"Precision")
for k in range(0, nb_class):
	index = np.where(recall_precision_fs[:,5] == k)
	l_recall_precision_fs = recall_precision_fs[index[0]]
	l_recall_precision_fs[:,6] = np.cumsum(l_recall_precision_fs[:,2])
	l_recall_precision_fs[:,7] = np.cumsum(1.0 - l_recall_precision_fs[:,2])
	l_recall_precision_fs[:,8] = l_recall_precision_fs[:,6] / (l_recall_precision_fs[:,6]+l_recall_precision_fs[:,7])
	l_recall_precision_fs[:,9] = l_recall_precision_fs[:,6] / class_count[k]
	
	interp_curve = np.zeros((np.shape(index[0])[0],2))

	interp_curve[:,0] = l_recall_precision_fs[:,9]
	#Go in reverse to set the value for the all point interpolation
	c_max_val = np.min(l_recall_precision_fs[:,8])
	for i in range(0, np.shape(l_recall_precision_fs)[0]):
		i_d = np.shape(l_recall_precision_fs)[0] - i - 1
		if(l_recall_precision_fs[i_d,8] > c_max_val):
			c_max_val = l_recall_precision_fs[i_d,8]
		interp_curve[i_d,1] = c_max_val
	
	AP = np.trapz(interp_curve[:,1], interp_curve[:,0])
	sumAP += AP
	
	plt.plot(interp_curve[:,0], interp_curve[:,1], label=class_list_short[k],c=plt.cm.tab20(k))
	
	print("AP %-8s: %5.2f%%     Total: %4d - T: %4d - F: %4d"%(class_list_short[k], AP*100.0, class_count[k], l_recall_precision_fs[-1,6], l_recall_precision_fs[-1,7]))
plt.legend(bbox_to_anchor=(1.02,0.98), fontsize=8)
plt.title("Per class AP curve", fontsize=8)

print ("\n**** mAP (%.2f): %f%% ****"%(AP_IoU_val, sumAP/nb_class*100.0))

plt.show()
