# **Object Detection with YOLO**

DL IRMIA summer school / ESPCI 2023 Update

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Deyht/IRMIA_2022/blob/main/IRMIA_DL_Summer_school_2022_Object_Detection_with_YOLO_full_v2.ipynb)

## **Introduction - Notebook Setup**

**Important notes**:   
1) Due to RAM limits on the free Colab version, the notebook kernel might crash at some points if running it all at once or if re-running specific cells multiple times. A simple restart of the runtime kernel (Runtime -> Restart runtime) will solve the issue without losing the locally saved files (datasets, network saves, framework, etc.). Then simply re-run from the group of cells that crashed.

Each **independent** part of the notebook has been verified to run on the free version of Colab.

2) The Introduction part, which includes dataset download/formatting and the CIANNA framework installation, must be run every time the runtime is fully shut down and disconnected, as it is used in all parts A, B, and C.


---


**Link to the slides accompanying the notebook**  
https://github.com/Deyht/IRMIA_2022/blob/main/DL_obj_detetion_with_YOLO_slides_full_v3.pdf


<a name="repo_cloning"></a>
### **1\. Clone the associated Git repository**

In [None]:
%%shell

git clone https://github.com/Deyht/IRMIA_2022



<a name="data_download"></a>
### **2\. PASCAL VOC 2012 and 2007**



####  Dataset download


In [None]:
%%shell

cd IRMIA_2022/

mkdir datasets
cd datasets

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

tar -xf VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar


<a name="data_format"></a>
#### Format dataset

In [None]:
%cd /content/IRMIA_2022/datasets/

import numpy as np
from tqdm import tqdm
from PIL import Image

def make_square(im, min_size, fill_color=(0, 0, 0, 0)):
    x, y = im.size
    size = max(min_size, x, y)
    new_im = Image.new('RGB', (size, size), fill_color)
    new_im.paste(im, (int((size - x) / 2), int((size - y) / 2)))
    return new_im

train_list_2012 = np.loadtxt("VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007  = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
nb_keep_val = 4952
image_size = 288
nb_class = 20

all_im = np.zeros((orig_nb_images, image_size, image_size, 3), dtype="uint8")
all_im_prop = np.zeros((orig_nb_images, 4), dtype="float32")

for i in tqdm(range(0, orig_nb_images)):

	if(i < nb_train_2012):
		im = Image.open("VOCdevkit/VOC2012/JPEGImages/"+train_list_2012[i]+".jpg")
	elif(i < nb_train_2012+nb_train_2007):
		im = Image.open("VOCdevkit/VOC2007/JPEGImages/"+train_list_2007[i - nb_train_2012]+".jpg")
	else:
		im = Image.open("VOCdevkit/VOC2007/JPEGImages/"+test_list_2007[i - nb_train_2012 - nb_train_2007]+".jpg")
	
	width, height = im.size

	im = make_square(im, image_size)
	width2, height2 = im.size

	x_offset = int((width2 - width)*0.5)
	y_offset = int((height2 - height)*0.5)

	all_im_prop[i] = [x_offset, y_offset, width2, height2]

	im = im.resize((image_size,image_size))
	im_array = np.asarray(im)
	for depth in range(0,3):
		all_im[i,:,:,depth] = im_array[:,:,depth]

all_im.tofile("all_im.dat")
all_im_prop.tofile("all_im_prop.dat")
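
`tofile` writes raw bytes only: neither the shape nor the dtype is stored in the `.dat` files, so every later reader has to supply both, as the reload cells further down do. The sketch below illustrates that round trip on a small dummy array. The file name `demo_im.dat` and the array sizes are placeholders chosen for illustration, not the notebook's values.

```python
import os
import tempfile

import numpy as np

# Small stand-in for all_im: 4 images of 8x8 pixels with 3 channels (hypothetical sizes)
nb_images, size = 4, 8
nb_values = nb_images * size * size * 3
demo = (np.arange(nb_values) % 256).astype("uint8").reshape(nb_images, size, size, 3)

path = os.path.join(tempfile.mkdtemp(), "demo_im.dat")
demo.tofile(path)  # raw bytes only, no header

# Reading back yields a flat array: dtype and shape must be given by the reader
flat = np.fromfile(path, dtype="uint8")
restored = flat.reshape(nb_images, size, size, 3)

# The file size is exactly the number of elements times the item size
expected_bytes = demo.size * demo.itemsize
print(flat.shape, restored.shape, os.path.getsize(path) == expected_bytes)
```

Reading the file with the wrong dtype would not raise an error; it would silently reinterpret the bytes, which is why the constants are repeated in every cell that reloads the data.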


#### Dataset summary statistics

In [None]:
%cd /content/IRMIA_2022/datasets/

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv"])

train_list_2012 = np.loadtxt("VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007  = np.loadtxt("VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
nb_keep_val = 4952
image_size = 288
nb_class = 20

object_list = np.zeros((orig_nb_images,1+nb_class))

for i in tqdm(range(0, orig_nb_images)):
	
  if(i < nb_train_2012):
    tree = ET.parse("VOCdevkit/VOC2012/Annotations/"+train_list_2012[i]+".xml")
  elif(i < nb_train_2012+nb_train_2007):
    tree = ET.parse("VOCdevkit/VOC2007/Annotations/"+train_list_2007[i - nb_train_2012]+".xml")
  else:
    tree = ET.parse("VOCdevkit/VOC2007/Annotations/"+test_list_2007[i - nb_train_2012 - nb_train_2007]+".xml")
  root = tree.getroot()

  k = 0
  im_obj_list = root.findall("object", namespaces=None)
  object_list[i,0] = len(im_obj_list)
  for obj in im_obj_list:
    diff = obj.find("difficult", namespaces=None)
    if(diff.text == "1"):
      object_list[i,0] -= 1
      continue
    oclass = obj.find("name", namespaces=None)
    int_class = np.where(class_list[:] == oclass.text)[0] + 1
    object_list[i,int_class] += 1

plt.rcParams.update({'font.size': 6})

all_dat = np.sum(object_list[:,1:],axis=0)
train_dat = np.sum(object_list[:orig_nb_images-nb_keep_val,1:],axis=0)
val_dat = np.sum(object_list[orig_nb_images-nb_keep_val:,1:],axis=0)

print("%8s"%("Total"),end="")
for k in range(0,nb_class):
  print("%8s"%class_list_short[k],end="")
print("")
print("%8d"%np.sum(all_dat),end="")
for k in range(0,nb_class):
  print("%8d"%all_dat[k], end="")
print("")
print("%8d"%np.sum(train_dat),end="")
for k in range(0,nb_class):
  print("%8d"%train_dat[k], end="")
print("")
print("%8d"%np.sum(val_dat),end="")
for k in range(0,nb_class):
  print("%8d"%val_dat[k], end="")
print("")
print("")

plt.subplots(figsize=(6,2),dpi=190, constrained_layout=True)
plt.bar(np.arange(0,nb_class)-0.2, all_dat, width=0.2, align="center", label="All")
plt.bar(np.arange(0,nb_class), train_dat, width=0.2, align="center", label="Train")
plt.bar(np.arange(0,nb_class)+0.2, val_dat, width=0.2, align="center", label="Val")
plt.xticks(np.arange(0,nb_class), class_list, fontsize=6, rotation = 45)
plt.legend()
#plt.yscale('log')
plt.show()

all_dat = all_dat / np.max(all_dat)
train_dat = train_dat / np.max(train_dat)
val_dat = val_dat / np.max(val_dat)

plt.subplots(figsize=(6,2),dpi=190, constrained_layout=True)
plt.bar(np.arange(0,nb_class)-0.2, all_dat, width=0.2, align="center", label="All")
plt.bar(np.arange(0,nb_class), train_dat, width=0.2, align="center", label="Train")
plt.bar(np.arange(0,nb_class)+0.2, val_dat, width=0.2, align="center", label="Val")
plt.xticks(range(0,nb_class), class_list, fontsize=6, rotation = 45)
plt.legend()
#plt.yscale('log')
plt.show()
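
The per-class counting above can be illustrated on a single annotation. This is a minimal sketch: `SAMPLE_XML` is a hypothetical, heavily trimmed annotation that only keeps the tags the statistics cell reads, and `count_objects` is an illustrative helper, not a function from the notebook.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed annotation following the PASCAL VOC XML layout
SAMPLE_XML = """
<annotation>
  <object><name>dog</name><difficult>0</difficult></object>
  <object><name>person</name><difficult>0</difficult></object>
  <object><name>person</name><difficult>1</difficult></object>
</annotation>
""".strip()

def count_objects(xml_text):
    """Count the non-difficult objects of one annotation, per class name."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for obj in root.findall("object"):
        if obj.find("difficult").text == "1":
            continue  # same filter as in the statistics cell
        counts[obj.find("name").text] += 1
    return counts

counts = count_objects(SAMPLE_XML)
print(dict(counts))
```

Objects flagged as difficult are excluded from both the total and the per-class counts, which mirrors the `object_list[i,0] -= 1` correction used above.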


In [None]:

all_im = np.fromfile("all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size, image_size, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))


In [None]:
id_start = 0 #index of the first validation image of the series; nb_w * nb_h examples are displayed

nb_w = 4
nb_h = 8

fig, ax = plt.subplots(figsize=(5.0,0.4), dpi=180, constrained_layout=True)
ax.axis('off')
fig.patch.set_facecolor('black')

for k in range(0, nb_class):
	ax.text(k%10*0.12, k//10*0.5, class_list_short[k], color=plt.cm.tab20(k), fontsize=8)

plt.show()
print("")

fig, ax = plt.subplots(nb_h, nb_w, figsize=(1.5*nb_w,1.5*nb_h), dpi=210, constrained_layout=True)

for i in range(0, nb_h):
  for j in range(0, nb_w):
    i_d = j + i*nb_w + id_start

    x_offset, y_offset, width2, height2 = all_im_prop[orig_nb_images - nb_keep_val + i_d]

    c_data = all_im[orig_nb_images - nb_keep_val + i_d]/255.0
    ax[i,j].imshow(c_data)
    ax[i,j].axis('off')

    tree = ET.parse("VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007 - nb_keep_val + i_d]+".xml")
    root = tree.getroot()
    
    obj_list = root.findall("object", namespaces=None)
    for obj in obj_list:
      diff = obj.find("difficult", namespaces=None)
      if(diff.text == "1"):
        continue
      oclass = obj.find("name", namespaces=None)
      bndbox = obj.find("bndbox", namespaces=None)

      int_class = np.where(class_list[:] == oclass.text)[0][0]
      xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size/width2
      ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size/height2
      xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size/width2
      ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size/height2

      el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.8, ls="--", fill=False, color=plt.cm.tab20(int_class), zorder=3)
      c_patch = ax[i,j].add_patch(el)
      c_text = ax[i,j].text(xmin+4, ymin+15, "%s"%(class_list_short[int_class]), c=plt.cm.tab20(int_class), fontsize=6, clip_on=True)
      c_patch.set_path_effects([path_effects.Stroke(linewidth=2.0, foreground='black'),
                       path_effects.Normal()])
      c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                       path_effects.Normal()])

#plt.savefig("target_mosaic.png", dpi=250)
plt.show()
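
The box coordinates drawn above go through the same two steps as the images: a shift by the padding offsets, then a rescale from the padded square to `image_size`. The helper below is a sketch of that mapping under the assumptions of the formatting cell (centered padding, minimum square size of 288). The name `to_square_coords` and the example box are hypothetical.

```python
def to_square_coords(box, orig_size, image_size=288, min_size=288):
    """Map (xmin, ymin, xmax, ymax) from the original image to the
    padded and resized square image."""
    width, height = orig_size
    padded = max(min_size, width, height)    # side of the padded square
    x_offset = int((padded - width) * 0.5)   # padding added on the left
    y_offset = int((padded - height) * 0.5)  # padding added on the top
    xmin, ymin, xmax, ymax = box
    return (int(xmin + x_offset) * image_size / padded,
            int(ymin + y_offset) * image_size / padded,
            int(xmax + x_offset) * image_size / padded,
            int(ymax + y_offset) * image_size / padded)

# A 500x375 image is padded vertically to 500x500, then resized to 288x288
mapped = to_square_coords((100, 50, 300, 200), orig_size=(500, 375))
print(mapped)
```

Because the padded image is always square, the same scale factor applies to both axes, which is why `width2` and `height2` are interchangeable in the cell above.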

In [None]:
#Free the RAM before going further in the notebook
#A RUNTIME RESTART IS ADVISED

del (all_im, all_im_prop)

<a name="cianna_install"></a>

### **3\. DL Framework (CIANNA) installation**

#### Query GPU allocation and properties


In [None]:
%%shell

nvidia-smi

cd /content/

git clone https://github.com/NVIDIA/cuda-samples/

cd /content/cuda-samples/Samples/1_Utilities/deviceQuery/

make SMS="50 60 70 80"

./deviceQuery | grep Capability | cut -c50- > ~/cuda_infos.txt
./deviceQuery | grep "CUDA Driver Version / Runtime Version" | cut -c57- >> ~/cuda_infos.txt

cd ~/

#### Clone CIANNA git repository

A specific commit is checked out to protect the notebook from incompatibilities introduced by future CIANNA updates.

In [None]:
%%shell

cd /content/IRMIA_2022/

git clone https://github.com/Deyht/CIANNA

cd CIANNA
git checkout 484354c

#### Compiling CIANNA for the allocated GPU generation

There is no guaranteed forward or backward compatibility between Nvidia GPU generations, and some capabilities are generation-specific. For these reasons, CIANNA must be given the GPU generation of the platform at compile time.
The following cell automatically updates all the necessary files based on the detected GPU and compiles CIANNA.

In [None]:
%%shell

cd /content/IRMIA_2022/CIANNA

mult="10"
cat ~/cuda_infos.txt
comp_cap="$(sed '1!d' ~/cuda_infos.txt)"
cuda_vers="$(sed '2!d' ~/cuda_infos.txt)"

lim="11.1"
old_arg=$(awk '{if ($1 < $2) print "-D CUDA_OLD";}' <<<"${cuda_vers} ${lim}")

sm_val=$(awk '{print $1*$2}' <<<"${mult} ${comp_cap}")

gen_val=$(awk '{if ($1 >= 80) print "-D GEN_AMPERE"; else if($1 >= 70) print "-D GEN_VOLTA";}' <<<"${sm_val}")

sed -i "s/.*arch=sm.*/\\t\tcuda_arg=\"\$cuda_arg -D CUDA -D comp_CUDA -lcublas -lcudart -arch=sm_$sm_val $old_arg $gen_val\"/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" src/python_module_setup.py

pyth_ver=$(python3 -c 'import sys; print("%d.%d"%(sys.version_info[:][0], sys.version_info[:][1]))')

sed -i "s/\/lib.linux-x86_64-[0-9].[0-9]/\/lib.linux-x86_64-$pyth_ver/g" ex_script.py

./compile.cp CUDA PY_INTERF

mv src/build/lib.linux-x86_64-* src/build/lib.linux-x86_64
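
The flag selection done with `awk` in the cell above can be restated in Python to make the thresholds easier to read. This is only a sketch of the same decision logic: `cuda_flags` is a hypothetical helper, and the real line written into `compile.cp` contains additional flags that are not reproduced here.

```python
def cuda_flags(comp_cap, cuda_version, old_limit=11.1):
    """Return the architecture-related flags for a given compute capability
    (for example "7.5") and CUDA runtime version (for example "11.8")."""
    sm_val = round(float(comp_cap) * 10)  # 7.5 -> 75, as in sm_75
    if sm_val >= 80:
        gen = "-D GEN_AMPERE"
    elif sm_val >= 70:
        gen = "-D GEN_VOLTA"
    else:
        gen = ""  # older generations get no generation flag
    old = "-D CUDA_OLD" if float(cuda_version) < old_limit else ""
    parts = ["-arch=sm_%d" % sm_val, old, gen]
    return " ".join(p for p in parts if p)

for cap, vers in [("7.5", "11.8"), ("8.0", "12.2"), ("6.1", "10.2")]:
    print(cap, vers, "->", cuda_flags(cap, vers))
```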

#### Testing CIANNA installation

**IMPORTANT NOTE**   
CIANNA is mainly used in a script fashion and was not designed to run in notebooks. Every code cell that directly invokes CIANNA functions must be run as a script to avoid possible errors.  
To do so, the cell must have the following structure.

```
%%shell

cd /content/IRMIA_2022/CIANNA

python3 - <<EOF

[... your python code ...]

EOF
```

This syntax makes it easy to edit Python code in the notebook while still running the cell as a script. Note that none of the notebook variables are accessible from the cell in this context.
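
Since the script runs in a separate Python process, one possible way to hand values over is through files on disk, which is also how the notebook shares `all_im.dat` between cells. The sketch below uses `subprocess` as a stand-in for the `%%shell` heredoc. The file name `params.json` and the parameter values are placeholders for illustration.

```python
import json
import os
import subprocess
import sys
import tempfile

# Values that live in the "notebook" process
params = {"image_size": 128, "nb_class": 20}
param_path = os.path.join(tempfile.mkdtemp(), "params.json")
with open(param_path, "w") as f:
    json.dump(params, f)

# Code executed in a separate interpreter: it cannot see `params`,
# so it reloads the values from the file given on the command line
script = """
import json, sys
with open(sys.argv[1]) as f:
    params = json.load(f)
print(params["image_size"] * params["nb_class"])
"""

result = subprocess.run([sys.executable, "-c", script, param_path],
                        capture_output=True, text=True, check=True)
output = result.stdout.strip()
print(output)
```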


In [None]:
%%shell

cd /content/IRMIA_2022/CIANNA

tar -xvzf mnist.tar.gz

In [None]:
%%shell


#Strictly equivalent to ex_script.py in the CIANNA repo 

cd /content/IRMIA_2022/CIANNA

python3 - <<EOF


import numpy as np
import matplotlib.pyplot as plt
#Access the locally compiled version

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn

############################################################################
##              Data reading (your mileage may vary)
############################################################################

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

print ("Reading inputs ... ", end = "", flush=True)

#Loading binary files
data = np.fromfile("mnist_dat/mnist_input.dat", dtype="float32")
data = np.reshape(data, (80000,28*28))
target = np.fromfile("mnist_dat/mnist_target.dat", dtype="float32")
target = np.reshape(target, (80000,10))


data_train = data[:60000,:]
data_valid = data[60000:70000,:]
data_test  = data[70000:80000,:]

target_train = target[:60000,:]
target_valid = target[60000:70000,:]
target_test  = target[70000:80000,:]

print ("Done !", flush=True)

############################################################################
##               CIANNA network construction and use
############################################################################

#Details about the functions and parameters are given in the GitHub Wiki

cnn.init(in_dim=i_ar([28,28]), in_nb_ch=1, out_dim=10, \
		bias=0.1, b_size=24, comp_meth="C_CUDA", dynamic_load=1, mixed_precision="FP32C_FP32A") #Change to C_BLAS or C_NAIV


cnn.create_dataset("TRAIN", size=60000, input=data_train, target=target_train)
cnn.create_dataset("VALID", size=10000, input=data_valid, target=target_valid)
cnn.create_dataset("TEST", size=10000, input=data_test, target=target_test)

#Used to load a saved network at a given epoch
#With load_step = 0, the network is trained from scratch
load_step = 0
if(load_step > 0):
	cnn.load("net_save/net0_s%04d.dat"%(load_step), load_step)
else:
  cnn.conv(f_size=i_ar([5,5]), nb_filters=32, padding=i_ar([2,2]), activation="RELU")
  cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
  cnn.conv(f_size=i_ar([5,5]), nb_filters=64, padding=i_ar([2,2]), activation="RELU")
  cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
  cnn.dense(nb_neurons=256, activation="RELU", drop_rate=0.5)
  cnn.dense(nb_neurons=128, activation="RELU", drop_rate=0.2)
  cnn.dense(nb_neurons=10, activation="SMAX")

cnn.train(nb_epoch=10, learning_rate=0.0004, momentum=0.9, confmat=1, save_every=0)
#Change save_every in previous function to save network weights
cnn.perf_eval()


#Save the network predictions (comment out to skip)
cnn.forward(repeat=1, drop_mode="AVG_MODEL")

del (data_train, target_train, data_valid, target_valid, data_test, target_test)


EOF



---



## **A - Simple classifier on PASCAL VOC**

### **1\. Train and valid data generation**

In [None]:
%%shell

cd /content/IRMIA_2022/
mkdir classifier
cd classifier

#### Dynamic data generator

In [None]:
%%writefile /content/IRMIA_2022/classifier/data_gen.py

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
import xml.etree.ElementTree as ET
from tqdm import tqdm
import os

import albumentations as A
import cv2

class_list = np.array(["aeroplane", "bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse", "motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor"])
class_list_short = np.array(["plane", "bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv"])

train_list_2012 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007	= np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

def roll_zeropad(a, shift):
	a = np.roll(a, shift[0], axis = 1)
	if(shift[0] >= 0):
		a[:,0:shift[0]] = 0
	else:
		a[:,image_size_orig+shift[0]:] = 0
	a = np.roll(a, shift[1], axis = 0)
	if(shift[1] >= 0):
		a[0:shift[1],:] = 0
	else:
		a[image_size_orig+shift[1]:,:] = 0
	return a


def init_data_gen():
	global nb_train_2012, nb_train_2007, nb_test_2007, orig_nb_images, nb_class
	global nb_images_per_batch, nb_keep_val, nb_obj_val, image_size, image_size_orig
	global input_data, targets, input_val, targets_val, all_im, all_im_prop

	nb_train_2012 = 11540
	nb_train_2007 = 5011
	nb_test_2007 = 4952
	orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
	nb_keep_val = 4952 #keep in 2007 test
	nb_images_per_batch = 2000
	nb_obj_val = 5000 # max 9986

	nb_class = 20
	image_size_orig = 288
	image_size = 128

	all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
	all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
	all_im = np.reshape(all_im, ((orig_nb_images, image_size_orig, image_size_orig, 3)))
	all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

	input_data = np.zeros((nb_images_per_batch,image_size*image_size*3), dtype="float32")
	targets = np.zeros((nb_images_per_batch,nb_class), dtype="float32")

	input_val = np.zeros((nb_obj_val,image_size*image_size*3), dtype="float32")
	targets_val = np.zeros((nb_obj_val,nb_class), dtype="float32")


def create_train_batch(visual_w=0,visual_h=0):
	visual_iter = 0
	for i in range(0, nb_images_per_batch):
		
		# Redraw a random image until it contains at least one usable object,
		# so that every slot of the batch is filled
		nb_obj = 0
		while(nb_obj == 0):
			i_d = np.random.randint(0,orig_nb_images - nb_keep_val)
			if(i_d < nb_train_2012):
				tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/Annotations/"+train_list_2012[i_d]+".xml")
			elif(i_d < nb_train_2012+nb_train_2007):
				tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+train_list_2007[i_d - nb_train_2012]+".xml")
			else:
				tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[i_d - nb_train_2012 - nb_train_2007]+".xml")
			root = tree.getroot()
			
			x_offset, y_offset, width2, height2 = all_im_prop[i_d]
			
			# Count the non-difficult objects covering at least 1024 pixels
			im_obj_list = root.findall("object", namespaces=None)
			k = 0
			for obj in im_obj_list:
				diff = obj.find("difficult", namespaces=None)
				if(diff.text == "1"):
					continue
				else:
					bndbox = obj.find("bndbox", namespaces=None)
					xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
					ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
					xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
					ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
					
					width = (xmax-xmin); height = (ymax-ymin)
					if(width*height < 1024):
						continue
					k += 1
			nb_obj = k
		
		patch = np.copy(all_im[i_d])
		
		obj_id = np.random.randint(0,nb_obj)
		k = 0
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			else:
				bndbox = obj.find("bndbox", namespaces=None)
				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				
				width = (xmax-xmin); height = (ymax-ymin)
				if(width*height < 1024):
					continue
			if(obj_id == k):
				break
			else:
				k += 1
		
		oclass = obj.find("name", namespaces=None)
		int_class = np.where(class_list[:] == oclass.text)[0][0]
		l_targ = np.zeros(nb_class)
		l_targ[int_class] = 1
		targets[i,:] = np.copy(l_targ)
		
		max_size = max((xmax-xmin),(ymax-ymin))
		c_x = (xmin+xmax)/2.0; c_y = (ymin+ymax)/2.0
		xmin = max(0,int(c_x - 0.5*max_size)); xmax = min(image_size_orig,int(c_x + 0.5*max_size))
		ymin = max(0,int(c_y - 0.5*max_size)); ymax = min(image_size_orig,int(c_y + 0.5*max_size))
		
		transform1 = A.Compose([
			A.Crop(x_min=xmin, y_min=ymin, x_max=xmax, y_max=ymax, p=1.0),
			A.Resize(width=image_size,height=image_size, interpolation=1, p=1.0),
			A.Affine(scale=(0.9,1.1), translate_percent=(-0.1,0.1), rotate=(-5,5), interpolation=1, p=1.0),
			A.HorizontalFlip(p=0.5),
			A.ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1, p=1.0),
		])

		transformed = transform1(image=patch)
		patch_aug = transformed['image']

		if(visual_w*visual_h > 0):
			if(visual_iter == 0):
				fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
			
			c_x = visual_iter // visual_w
			c_y = visual_iter % visual_w
			
			ax[c_x,c_y].imshow(patch_aug)
			ax[c_x,c_y].axis('off')
			c_text = ax[c_x,c_y].text(image_size/2, image_size/8, "%s"%(class_list_short[int_class]),
				ha="center", fontsize=10, clip_on=True, color="white")
			c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                       path_effects.Normal()])
			
			visual_iter += 1
			if(visual_iter >= visual_w*visual_h):
				plt.show()
				return
		
		for depth in range(0,3):
			input_data[i,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
		
	return input_data, targets


def create_val_batch(visual_w=0, visual_h=0):
	visual_iter = 0

	k = 0
	val_nb_count = 0
	for i in range(0, nb_keep_val):
				
		tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007 - nb_keep_val + i]+".xml")
		root = tree.getroot()
		
		patch = np.copy(all_im[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i])
		x_offset, y_offset, width2, height2 = all_im_prop[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i]

		im_obj_list = root.findall("object", namespaces=None)
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			else:
				bndbox = obj.find("bndbox", namespaces=None)
				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				
				width = (xmax-xmin); height = (ymax-ymin)
				if(width*height < 1024):
					continue
				# Stop once nb_obj_val examples have been stored
				if(val_nb_count >= nb_obj_val):
					return input_val, targets_val
				val_nb_count += 1

			oclass = obj.find("name", namespaces=None)
			int_class = np.where(class_list[:] == oclass.text)[0][0]
			l_targ = np.zeros(nb_class)
			l_targ[int_class] = 1
			targets_val[k,:] = np.copy(l_targ)

			max_size = max((xmax-xmin),(ymax-ymin))
			c_x = (xmin+xmax)/2.0; c_y = (ymin+ymax)/2.0
			xmin = max(0,int(c_x - 0.5*max_size)); xmax = min(image_size_orig,int(c_x + 0.5*max_size))
			ymin = max(0,int(c_y - 0.5*max_size)); ymax = min(image_size_orig,int(c_y + 0.5*max_size))
		
			transform2 = A.Compose([
				A.Crop(x_min=xmin, y_min=ymin, x_max=xmax, y_max=ymax, p=1.0),
				A.Resize(width=image_size,height=image_size, interpolation=1, p=1.0),
			])

			transformed = transform2(image=patch)
			patch_aug = transformed['image']

			if(visual_w*visual_h > 0):
				if(visual_iter == 0):
					fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
				
				c_x = visual_iter // visual_w
				c_y = visual_iter % visual_w
				
				ax[c_x,c_y].imshow(patch_aug)
				ax[c_x,c_y].axis('off')
				c_text = ax[c_x,c_y].text(image_size/2, image_size/8, "%s"%(class_list_short[int_class]),
					ha="center", fontsize=10, clip_on=True, color="white")
				c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                       path_effects.Normal()])
				
				visual_iter += 1
				if(visual_iter >= visual_w*visual_h):
					plt.show()
					return
		
			for depth in range(0,3):
				input_val[k,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
			k+=1
			#print (k)
	print(val_nb_count)

	return input_val, targets_val

def free_data_gen():
  global all_im, all_im_prop, input_data, targets, input_val, targets_val
  del (all_im, all_im_prop, input_data, targets, input_val, targets_val)
  return
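
Both generators cut a square region around each selected object before resizing it to `image_size`. The sketch below isolates that step so its behavior near the image border is easier to see. `square_crop_bounds` is a hypothetical helper that restates the clamping used in `data_gen.py`, and the example boxes are made up.

```python
def square_crop_bounds(xmin, ymin, xmax, ymax, image_size_orig=288):
    """Return (x0, y0, x1, y1) of a square crop centered on the box,
    clamped to the borders of the image."""
    max_size = max(xmax - xmin, ymax - ymin)  # side of the square
    c_x = (xmin + xmax) / 2.0
    c_y = (ymin + ymax) / 2.0
    x0 = max(0, int(c_x - 0.5 * max_size))
    x1 = min(image_size_orig, int(c_x + 0.5 * max_size))
    y0 = max(0, int(c_y - 0.5 * max_size))
    y1 = min(image_size_orig, int(c_y + 0.5 * max_size))
    return x0, y0, x1, y1

# A wide box well inside the image: the crop is extended vertically
inner = square_crop_bounds(100, 120, 200, 160)
# A tall box touching the left border: the crop is clipped on that side
border = square_crop_bounds(0, 100, 50, 250)
print(inner, border)
```

When the square is clipped by a border, the crop is no longer square, so the subsequent resize to `image_size` x `image_size` stretches the object slightly.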


#### Training image examples

In [None]:
%%writefile /content/IRMIA_2022/classifier/test_gen.py

import data_gen as gn1

gn1.init_data_gen()

print("Random augmented training examples")
gn1.create_train_batch(4,3)

print("\nOrdered validation examples")
gn1.create_val_batch(4,3)

gn1.free_data_gen()


In [None]:
# The notebook execution environment might need to be reloaded to unload the previous data_gen after changes
%cd /content/IRMIA_2022/classifier/

%run test_gen.py


### **2\. Training the classifier**


Get the pretrained networks

In [None]:
%%shell

cd /content/IRMIA_2022/classifier/

wget https://share.obspm.fr/s/Pk529j42XRsonky/download/classifier_imagenet_pretrained_224_acc57.dat
wget https://share.obspm.fr/s/oPDe2rMWDTbiMwo/download/classifier_trained_acc84.dat

In [None]:
%%shell

cd /content/IRMIA_2022/classifier/

python3 - <<EOF

import numpy as np
from threading import Thread
import data_gen as gn1

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn


def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

def data_augm():
	input_data, targets = gn1.create_train_batch()
	cnn.delete_dataset("TRAIN_buf", silent=1)
	cnn.create_dataset("TRAIN_buf", nb_images_per_batch, input_data[:,:], targets[:,:], silent=1)
	return

nb_images_per_batch = 2000
nb_obj_val = 5000 # max 9986
nb_class = 20
image_size = 128

nb_augm = 1000
epoch_per_augm = 1

# -2 will load the provided imagenet pre-trained network.
# -1 will load a network trained for 300 epochs based on the imagenet pre-network 
# Switch to 0 for training from scratch, 
# or to the value corresponding to an existing network save.
load_epoch = -2
# Increase the number of augmentations (nb_augm)
# to continue training the pre-trained network

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=nb_class,
	 b_size=32, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP16C_FP32A")

print ("Loading the dataset ...")

gn1.init_data_gen()

input_data, targets = gn1.create_train_batch()
input_val, targets_val = gn1.create_val_batch()

cnn.create_dataset("TRAIN", nb_images_per_batch, input_data[:,:], targets[:,:])
cnn.create_dataset("VALID", nb_obj_val, input_val[:,:], targets_val[:,:])

if(load_epoch == -2):
	nb_augm = 300
	cnn.load("/content/IRMIA_2022/classifier/classifier_imagenet_pretrained_224_acc57.dat",0, nb_layers=20, bin=1)
	cnn.conv(f_size=i_ar([1,1]), nb_filters=nb_class, padding=i_ar([0,0]), activation="LIN")
	cnn.pool(p_size=i_ar([1,1]), p_global=1, p_type="AVG", activation="SMAX")
elif(load_epoch == -1):
	nb_augm = 100
	cnn.load("/content/IRMIA_2022/classifier/classifier_trained_acc84.dat",0, bin=1)
elif(load_epoch > 0):
	cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:
	cnn.conv(f_size=i_ar([3,3]), nb_filters=16  , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=4)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=32  , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=4)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=64 , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=8)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=128 , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=8)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=256 , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=16)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=512 , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=16)
	cnn.conv(f_size=i_ar([3,3]), nb_filters=1024, padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=32)
	cnn.conv(f_size=i_ar([3,3]), nb_filters=1024, padding=i_ar([1,1]), activation="RELU")
	cnn.conv(f_size=i_ar([1,1]), nb_filters=nb_class, padding=i_ar([0,0]), activation="LIN")
	cnn.pool(p_size=i_ar([1,1]), p_global=1, p_type="AVG", activation="SMAX")


for batch_augm in range(0,nb_augm):
		
	t = Thread(target=data_augm)
	t.start()
	
	cnn.train(nb_epoch=epoch_per_augm, learning_rate=0.00005, end_learning_rate=0.000001, 
				lr_decay=0.0015, momentum=0.8, shuffle_every=0, confmat=1, weight_decay=0.015,
				control_interv=5, save_every=100, TC_scale_factor=16.0, save_bin=1)
	if(batch_augm == 0):
		cnn.perf_eval()

	t.join()
	
	cnn.swap_data_buffers("TRAIN")


gn1.free_data_gen()
del (input_data, targets, input_val, targets_val)

EOF
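
The training loop above overlaps data augmentation with training: a thread prepares the next batch while the network trains on the current one, and the two are swapped once both are done. The sketch below shows the same pattern with plain Python stand-ins. `make_batch`, `train_on`, and `fill_buffer` are hypothetical placeholders for `gn1.create_train_batch()`, `cnn.train()`, and `data_augm()`, and no CIANNA call is involved.

```python
import threading

def make_batch(step):
    """Stand-in for the data generator: returns a dummy batch tagged by step."""
    return [step] * 4

def train_on(batch):
    """Stand-in for one training call: returns a summary of the batch seen."""
    return sum(batch)

current = make_batch(0)  # initial "TRAIN" dataset
buffer = {}              # plays the role of the "TRAIN_buf" dataset
history = []

def fill_buffer(step):
    buffer["next"] = make_batch(step)

for step in range(1, 4):
    t = threading.Thread(target=fill_buffer, args=(step,))
    t.start()                           # prepare the next batch in the background
    history.append(train_on(current))   # train on the current batch meanwhile
    t.join()                            # wait until the next batch is ready
    current = buffer["next"]            # swap, as cnn.swap_data_buffers("TRAIN") does

print(history, current)
```

The `join` before the swap is what guarantees that training never starts on a partially filled buffer.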



---



## **B - Sliding window detector**

### **1\. Train and valid data generation**


In [None]:
%%shell

cd /content/IRMIA_2022/
mkdir sliding_window
cd sliding_window

#### Adding a "background class" to the data generator

In [None]:
%%writefile /content/IRMIA_2022/sliding_window/data_gen.py

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
import xml.etree.ElementTree as ET
from tqdm import tqdm
import os

import albumentations as A
import cv2

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

train_list_2012 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/ImageSets/Main/trainval.txt", dtype="str")
train_list_2007 = np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", dtype="str")
test_list_2007	= np.loadtxt("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")


def fct_inter(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
	inter_2d = inter_w*inter_h

	return float(inter_2d)


def init_data_gen():
	global nb_train_2012, nb_train_2007, nb_test_2007, orig_nb_images, nb_class
	global nb_images_per_batch, nb_keep_val, nb_empty_val, nb_obj_val, image_size, image_size_orig
	global input_data, targets, input_val, targets_val, all_im, all_im_prop

	nb_train_2012 = 11540
	nb_train_2007 = 5011
	nb_test_2007 = 4952
	orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
	nb_keep_val = 4952 #keep in 2007 test
	nb_images_per_batch = 2000
	nb_obj_val = 5000 #max 9986
	nb_empty_val = 1000

	nb_class = 21
	image_size_orig = 288
	image_size = 128

	all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
	all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
	all_im = np.reshape(all_im, ((orig_nb_images, image_size_orig, image_size_orig, 3)))
	all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

	input_data = np.zeros((nb_images_per_batch,image_size*image_size*3), dtype="float32")
	targets = np.zeros((nb_images_per_batch,nb_class), dtype="float32")

	input_val = np.zeros((nb_obj_val+nb_empty_val,image_size*image_size*3), dtype="float32")
	targets_val = np.zeros((nb_obj_val+nb_empty_val,nb_class), dtype="float32")


def create_train_batch(visual_w=0,visual_h=0):
	visual_iter = 0
	for i in range(0, nb_images_per_batch):
		
		i_d = np.random.randint(0,orig_nb_images - nb_keep_val)
		if(i_d < nb_train_2012):
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2012/Annotations/"+train_list_2012[i_d]+".xml")
		elif(i_d < nb_train_2012+nb_train_2007):
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+train_list_2007[i_d - nb_train_2012]+".xml")
		else:
			tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[i_d - nb_train_2012 - nb_train_2007]+".xml")
		root = tree.getroot()
		
		patch = np.copy(all_im[i_d])
		x_offset, y_offset, width2, height2 = all_im_prop[i_d]

		# classical object cutout
		if(np.random.random() > 0.2):
			im_obj_list = root.findall("object", namespaces=None)
			k = 0
			for obj in im_obj_list:
				diff = obj.find("difficult", namespaces=None)
				if(diff.text == "1"):
					continue
				else:
					bndbox = obj.find("bndbox", namespaces=None)
					xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
					ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
					xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
					ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
					
					width = (xmax-xmin); height = (ymax-ymin)
					if(width*height < 1024):
						continue
					
					k += 1
					
			nb_obj = k
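			if(nb_obj == 0):
				# Note: decrementing i has no effect inside a Python for-range loop,
				# so this slot is skipped and keeps whatever it already held (zeros on the first call)
				i -= 1
				continue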
			
			obj_id = np.random.randint(0,nb_obj)
			k = 0
			for obj in im_obj_list:
				diff = obj.find("difficult", namespaces=None)
				if(diff.text == "1"):
					continue
				else:
					bndbox = obj.find("bndbox", namespaces=None)
					xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
					ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
					xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
					ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
					
					width = (xmax-xmin); height = (ymax-ymin)
					if(width*height < 1024):
						continue
				if(obj_id == k):
					break
				else:
					k += 1
			
			oclass = obj.find("name", namespaces=None)
			int_class = int(np.where(class_list == oclass.text)[0][0])
			l_targ = np.zeros(nb_class)
			l_targ[int_class] = 1
			targets[i,:] = np.copy(l_targ)
			
			max_size = max((xmax-xmin),(ymax-ymin))
			c_x = (xmin+xmax)/2.0; c_y = (ymin+ymax)/2.0
			xmin = max(0,int(c_x - 0.5*max_size)); xmax = min(image_size_orig,int(c_x + 0.5*max_size))
			ymin = max(0,int(c_y - 0.5*max_size)); ymax = min(image_size_orig,int(c_y + 0.5*max_size))
		
			transform1 = A.Compose([
				A.Crop(x_min=xmin, y_min=ymin, x_max=xmax, y_max=ymax, p=1.0),
				A.Resize(width=image_size,height=image_size, interpolation=1, p=1.0),
				A.Affine(scale=(0.9,1.1), translate_percent=(-0.1,0.1), rotate=(-5,5), interpolation=1, p=1.0),
				A.HorizontalFlip(p=0.5),
				A.ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1, p=1.0),
			])

			transformed = transform1(image=patch)
			patch_aug = transformed['image']

		else:
			found = 0
			l_size = 160
			try_per_size = 10

			int_class = 20
			l_targ = np.zeros(nb_class)
			l_targ[nb_class-1] = 1
			targets[i,:] = np.copy(l_targ)

			im_obj_list = root.findall("object", namespaces=None)
			box_list = np.zeros((len(im_obj_list),4))
			k = 0
			for obj in im_obj_list:
				bndbox = obj.find("bndbox", namespaces=None)

				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				box_list[k,:] = np.array([xmin,ymin,xmax,ymax])
				k += 1

			count_per_size = 0
			while((not found) and (l_size >= 0)):
				size = l_size + 32

				c_x = np.random.random()*(image_size_orig - size) + size/2
				c_y = np.random.random()*(image_size_orig - size) + size/2

				xmin = int(c_x - 0.5*size); xmax = int(c_x + 0.5*size)
				ymin = int(c_y - 0.5*size); ymax = int(c_y + 0.5*size)

				c_box = np.array([xmin, ymin, xmax, ymax])

				im_obj_list = root.findall("object", namespaces=None)
				inter_count = 0
				for l in range(0,len(im_obj_list)):
					loc_inter = fct_inter(c_box, box_list[l,:])
					if(loc_inter > 0.0):
						inter_count += 1

				if(inter_count == 0):
					found = 1

				count_per_size += 1
				if(count_per_size >= try_per_size):
					count_per_size = 0
					l_size -= 32

			if(not found):
				patch_aug = np.zeros((image_size,image_size,3),dtype="uint8")

			else:
				transform1 = A.Compose([
					A.Crop(x_min=xmin, y_min=ymin, x_max=xmax, y_max=ymax, p=1.0),
					A.Resize(width=image_size,height=image_size, interpolation=1, p=1.0),
					A.Affine(scale=(0.9,1.1), translate_percent=(-0.1,0.1), rotate=(-5,5), interpolation=1, p=1.0),
					A.HorizontalFlip(p=0.5),
					A.ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1, p=1.0),
				])

				transformed = transform1(image=patch)
				patch_aug = transformed['image']
		
		if(visual_w*visual_h > 0):
			if(visual_iter == 0):
				fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
			
			c_x = visual_iter // visual_w
			c_y = visual_iter % visual_w
			
			ax[c_x,c_y].imshow(patch_aug)
			ax[c_x,c_y].axis('off')
			c_text = ax[c_x,c_y].text(image_size/2, image_size/8, class_list_short[int_class],
				ha="center", fontsize=10, clip_on=True, color="white")
			c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
											 path_effects.Normal()])
			
			visual_iter += 1
			if(visual_iter >= visual_w*visual_h):
				plt.show()
				return
		
		for depth in range(0,3):
			input_data[i,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
		
	return input_data, targets


def create_val_batch(visual_w=0, visual_h=0):
	visual_iter = 0

	loc = 0
	val_nb_count = 0
	for i in range(0, nb_keep_val):
				
		tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007 - nb_keep_val + i]+".xml")
		root = tree.getroot()
		
		patch = np.copy(all_im[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i])
		x_offset, y_offset, width2, height2 = all_im_prop[nb_train_2007 + nb_train_2012 + nb_test_2007 - nb_keep_val + i]

		im_obj_list = root.findall("object", namespaces=None)
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			else:
				bndbox = obj.find("bndbox", namespaces=None)
				xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
				ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
				xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
				ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
				
				width = (xmax-xmin); height = (ymax-ymin)
				if(width*height < 1024):
					continue
				val_nb_count += 1
				if(val_nb_count > nb_obj_val):
					break
			
			oclass = obj.find("name", namespaces=None)
			int_class = int(np.where(class_list == oclass.text)[0][0])
			l_targ = np.zeros(nb_class)
			l_targ[int_class] = 1
			targets_val[loc,:] = np.copy(l_targ)

			max_size = max((xmax-xmin),(ymax-ymin))
			c_x = (xmin+xmax)/2.0; c_y = (ymin+ymax)/2.0
			xmin = max(0,int(c_x - 0.5*max_size)); xmax = min(image_size_orig,int(c_x + 0.5*max_size))
			ymin = max(0,int(c_y - 0.5*max_size)); ymax = min(image_size_orig,int(c_y + 0.5*max_size))
		
			transform2 = A.Compose([
				A.Crop(x_min=xmin, y_min=ymin, x_max=xmax, y_max=ymax, p=1.0),
				A.Resize(width=image_size,height=image_size, interpolation=1, p=1.0),
			])

			transformed = transform2(image=patch)
			patch_aug = transformed['image']
		
			if(visual_w*visual_h > 0):
				if(visual_iter == 0):
					fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)
				
				c_x = visual_iter // visual_w
				c_y = visual_iter % visual_w
				
				ax[c_x,c_y].imshow(patch_aug)
				ax[c_x,c_y].axis('off')
				c_text = ax[c_x,c_y].text(image_size/2, image_size/8, class_list_short[int_class],
					ha="center", fontsize=10, clip_on=True, color="white")
				c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
											 path_effects.Normal()])
				
				visual_iter += 1
				if(visual_iter >= visual_w*visual_h):
					plt.show()
					return
		
			for depth in range(0,3):
				input_val[loc,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
			loc+=1
	print (loc)
	
	for i in range(0, nb_empty_val):

		i_d = np.random.randint(0,nb_keep_val)

		patch = np.copy(all_im[orig_nb_images-nb_keep_val + i_d])

		x_offset, y_offset, width2, height2 = all_im_prop[orig_nb_images - nb_keep_val + i_d]

		tree = ET.parse("/content/IRMIA_2022/datasets/VOCdevkit/VOC2007/Annotations/"+test_list_2007[nb_test_2007 - nb_keep_val + i_d]+".xml")
		root = tree.getroot()

		found = 0
		l_size = 160
		try_per_size = 10

		int_class = 20
		l_targ = np.zeros(nb_class)
		l_targ[nb_class-1] = 1
		targets_val[loc+i,:] = np.copy(l_targ)

		im_obj_list = root.findall("object", namespaces=None)
		box_list = np.zeros((len(im_obj_list),4))
		k = 0
		for obj in im_obj_list:
			diff = obj.find("difficult", namespaces=None)
			if(diff.text == "1"):
				continue
			
			bndbox = obj.find("bndbox", namespaces=None)
			
			xmin = int(float(bndbox.find("xmin").text)+x_offset)*image_size_orig/width2
			ymin = int(float(bndbox.find("ymin").text)+y_offset)*image_size_orig/height2
			xmax = int(float(bndbox.find("xmax").text)+x_offset)*image_size_orig/width2
			ymax = int(float(bndbox.find("ymax").text)+y_offset)*image_size_orig/height2
			box_list[k,:] = np.array([xmin,ymin,xmax,ymax])
			k += 1

		count_per_size = 0
		while((not found) and (l_size >= 0)):
			size = l_size + 32
			
			c_x = np.random.random()*(image_size_orig - size) + size/2
			c_y = np.random.random()*(image_size_orig - size) + size/2
			
			xmin = int(c_x - 0.5*size); xmax = int(c_x + 0.5*size)
			ymin = int(c_y - 0.5*size); ymax = int(c_y + 0.5*size)
			
			c_box = np.array([xmin, ymin, xmax, ymax])
			
			im_obj_list = root.findall("object", namespaces=None)
			inter_count = 0
			for l in range(0,len(im_obj_list)):
				loc_inter = fct_inter(c_box, box_list[l,:])
				if(loc_inter > 0.0):
					inter_count += 1
			
			if(inter_count == 0):
				found = 1
			
			count_per_size += 1
			if(count_per_size >= try_per_size):
				count_per_size = 0
				l_size -= 32

		if(not found):
			patch_aug = np.zeros((image_size,image_size,3), dtype="float32")
			
		else:
			
			transform2 = A.Compose([
				A.Crop(x_min=xmin, y_min=ymin, x_max=xmax, y_max=ymax, p=1.0),
				A.Resize(width=image_size,height=image_size, interpolation=1, p=1.0),
			])

			transformed = transform2(image=patch)
			patch_aug = transformed['image']
			
		
		for depth in range(0,3):
			input_val[loc+i,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
		
	return input_val, targets_val


def free_data_gen():
  global all_im, all_im_prop, input_data, targets, input_val, targets_val
  del (all_im, all_im_prop, input_data, targets, input_val, targets_val)
  return
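
The "empty" examples produced by the generator above come from a rejection-sampling loop: a random square is drawn, kept only if it overlaps no annotated object, and the square size is reduced after a fixed number of failed tries. The following is a minimal standalone sketch of that idea, not the generator's exact code. The function names `box_intersection` and `sample_background_box` and their parameters are hypothetical; the default sizes (192 down to 32 in steps of 32, 10 tries per size) are chosen to mirror the values used in `data_gen.py`.

```python
import numpy as np

def box_intersection(box1, box2):
    """Intersection area of two [xmin, ymin, xmax, ymax] boxes."""
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    return float(inter_w * inter_h)

def sample_background_box(object_boxes, image_size=288, start_size=192,
                          min_size=32, step=32, tries_per_size=10, rng=None):
    """Return a square box that overlaps no object box, or None if none is found."""
    rng = np.random.default_rng() if rng is None else rng
    size = start_size
    while size >= min_size:
        for _ in range(tries_per_size):
            # Draw the top-left corner so that the square stays inside the image
            xmin = int(rng.integers(0, image_size - size + 1))
            ymin = int(rng.integers(0, image_size - size + 1))
            candidate = [xmin, ymin, xmin + size, ymin + size]
            if all(box_intersection(candidate, b) == 0.0 for b in object_boxes):
                return candidate
        # Too many failures at this size: try a smaller square
        size -= step
    return None
```

When no valid square is found, the generator above falls back to an all-zero patch; in this sketch that case is signalled by `None` and left to the caller.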


#### Training image examples

In [None]:
%%writefile /content/IRMIA_2022/sliding_window/test_gen.py

import data_gen as gn2

gn2.init_data_gen()

print("Random augmented training examples")
gn2.create_train_batch(4,3)

print("\nOrdered validation examples")
gn2.create_val_batch(4,3)

gn2.free_data_gen()


In [None]:
# You might need to restart the notebook runtime to unload a previous data_gen after changes
%cd /content/IRMIA_2022/sliding_window/

%run test_gen.py
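
The generator stores every patch as one flat vector in which the three colour channels follow each other (all of channel 0, then channel 1, then channel 2), with pixel values rescaled to [0, 1]. The sketch below illustrates that layout and its inverse on a random patch. The helper names `flatten_channel_first` and `unflatten_channel_first` are hypothetical and only serve the illustration.

```python
import numpy as np

def flatten_channel_first(patch):
    """Flatten an (H, W, C) uint8 patch into a C*H*W float32 vector in [0, 1]."""
    h, w, c = patch.shape
    flat = np.empty(c * h * w, dtype="float32")
    for depth in range(c):
        # Each channel occupies one contiguous block of h*w values
        flat[depth * h * w:(depth + 1) * h * w] = patch[:, :, depth].flatten("C") / 255.0
    return flat

def unflatten_channel_first(flat, h, w, c=3):
    """Inverse operation: rebuild an (H, W, C) float array from the flat vector."""
    return np.transpose(flat.reshape(c, h, w), (1, 2, 0))

patch = np.random.default_rng(0).integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
flat = flatten_channel_first(patch)
restored = unflatten_channel_first(flat, 128, 128)
```

The inverse is handy for displaying a network input with `imshow`, which expects the channel axis last.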

### **2\. Training the detection classifier**

Download the pre-trained sliding window detector:

In [None]:
%%shell

cd /content/IRMIA_2022/sliding_window/

wget https://share.obspm.fr/s/t6Kxs3bGb9ZJqH8/download/sliding_window_trained_acc84.dat

In [None]:
%%shell

cd /content/IRMIA_2022/sliding_window/

python3 - <<EOF

import numpy as np
from threading import Thread
import data_gen as gn2

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn


def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

def data_augm():
	input_data, targets = gn2.create_train_batch()
	cnn.delete_dataset("TRAIN_buf", silent=1)
	cnn.create_dataset("TRAIN_buf", nb_images_per_batch, input_data[:,:], targets[:,:], silent=1)
	return

nb_images_per_batch = 2000
nb_obj_val = 5000+1000 # max 9986
nb_class = 21
image_size = 128

nb_augm = 1000
epoch_per_augm = 1

# load_epoch selects the starting point:
#  -2 loads the provided ImageNet pre-trained network,
#  -1 loads the provided network already trained for 300 epochs from the ImageNet pre-trained network,
#   0 trains from scratch,
#  >0 loads the existing network save for that epoch.
load_epoch = -2
# Increase the number of augmentations (nb_augm) to continue training the pre-trained network

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=nb_class,
	 b_size=32, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP16C_FP32A")

print ("Loading the dataset ...")

gn2.init_data_gen()

input_data, targets = gn2.create_train_batch()
input_val, targets_val = gn2.create_val_batch()

cnn.create_dataset("TRAIN", nb_images_per_batch, input_data[:,:], targets[:,:])
cnn.create_dataset("VALID", nb_obj_val, input_val[:,:], targets_val[:,:])

if(load_epoch == -2):
	nb_augm = 300
	cnn.load("/content/IRMIA_2022/classifier/classifier_imagenet_pretrained_224_acc57.dat",0, nb_layers=20, bin=1)
	cnn.conv(f_size=i_ar([1,1]), nb_filters=nb_class, padding=i_ar([0,0]), activation="LIN")
	cnn.pool(p_size=i_ar([1,1]), p_global=1, p_type="AVG", activation="SMAX")
elif(load_epoch == -1):
	nb_augm = 100
	cnn.load("/content/IRMIA_2022/sliding_window/sliding_window_trained_acc84.dat",0, bin=1)
elif(load_epoch > 0):
	cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:

	cnn.conv(f_size=i_ar([3,3]), nb_filters=16  , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=4)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=32  , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=4)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=64 , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=8)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=128 , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=8)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=256 , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=16)
	cnn.pool(p_size=i_ar([2,2]), p_type="MAX")
	
	cnn.conv(f_size=i_ar([3,3]), nb_filters=512 , padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=16)
	cnn.conv(f_size=i_ar([3,3]), nb_filters=1024, padding=i_ar([1,1]), activation="RELU")
	cnn.norm(group_size=32)
	cnn.conv(f_size=i_ar([3,3]), nb_filters=1024, padding=i_ar([1,1]), activation="RELU")
	cnn.conv(f_size=i_ar([1,1]), nb_filters=nb_class, padding=i_ar([0,0]), activation="LIN")
	cnn.pool(p_size=i_ar([1,1]), p_global=1, p_type="AVG", activation="SMAX")


for batch_augm in range(0,nb_augm):
		
	t = Thread(target=data_augm)
	t.start()
	
	cnn.train(nb_epoch=epoch_per_augm, learning_rate=0.00005, end_learning_rate=0.000001, 
				lr_decay=0.0015, momentum=0.8, shuffle_every=0, confmat=1, weight_decay=0.015,
				control_interv=5, save_every=100, TC_scale_factor=16.0, save_bin=1)
	if(batch_augm == 0):
		cnn.perf_eval()

	t.join()
	
	cnn.swap_data_buffers("TRAIN")


gn2.free_data_gen()
del (input_data, targets, input_val, targets_val)

EOF
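
The training loop above overlaps data augmentation with training: a worker thread fills a second buffer (`data_augm`) while `cnn.train` runs on the current one, and the two buffers are exchanged once both are done (`cnn.swap_data_buffers`). The pattern can be sketched without CIANNA as follows. Here `make_batch` and `train_on` are hypothetical stand-ins for the augmentation and training calls, and the `buffers` dictionary is an assumption made for the illustration.

```python
import time
from threading import Thread

buffers = {"active": None, "pending": None}

def make_batch(batch_id):
    """Stand-in for the augmentation step (slow, CPU-bound in practice)."""
    time.sleep(0.01)
    return [batch_id] * 4

def fill_pending(batch_id):
    """Worker target: prepare the next batch in the pending buffer."""
    buffers["pending"] = make_batch(batch_id)

def train_on(batch):
    """Stand-in for one training pass over the active buffer."""
    time.sleep(0.01)
    return batch[0]

buffers["active"] = make_batch(0)
seen = []
for step in range(3):
    worker = Thread(target=fill_pending, args=(step + 1,))
    worker.start()                      # the next batch is generated in the background
    seen.append(train_on(buffers["active"]))
    worker.join()                       # wait for the new batch before swapping
    buffers["active"], buffers["pending"] = buffers["pending"], None
```

The `join` before the swap matters: without it, training could start on a buffer that is still being written.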

### **3\. Sliding window prediction**

#### Regions definition and network inference

In [None]:
%%shell

cd /content/IRMIA_2022/sliding_window/

python3 - <<EOF

import numpy as np
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
from tqdm import tqdm
import re
import os

import albumentations as A
import cv2

import sys
sys.path.insert(0,"/content/IRMIA_2022/CIANNA/src/build/lib.linux-x86_64")
import CIANNA as cnn

load_epoch = 0
if (len(sys.argv) > 1):
	load_epoch = int(sys.argv[1])

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
nb_keep_val = 300 # Lower than the actual number of examples, to keep RAM usage low enough

nb_class = 21
image_size_orig = 288
image_size = 128

frac_size = np.array([288,144,72])
frac_stride = np.array([0,72,36])

print ("Loading the dataset ...")

all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size_orig, image_size_orig, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

nb_regions_per_im = 1
for l in range(1,np.size(frac_size)):
	nb_regions_per_im += ((image_size_orig-frac_size[l])/frac_stride[l] + 1)**2

print (nb_regions_per_im)
all_nb_test_images = int(nb_regions_per_im*nb_keep_val)

print (all_nb_test_images)

input_test = np.zeros((all_nb_test_images,image_size*image_size*3), dtype="float32")
targets_test = np.zeros((all_nb_test_images,nb_class), dtype="float32")

k = 0
for i in tqdm(range(0, nb_keep_val)):
	
	i_d = orig_nb_images - nb_keep_val + i
	
	patch = np.copy(all_im[i_d])
	
	x_offset, y_offset, width2, height2 = all_im_prop[i_d]
		
	for l in range(0, np.size(frac_size)):
		
		if(l == 0):
			nb_reg = 1
		else:
			nb_reg = int((image_size_orig-frac_size[l])/frac_stride[l] + 1)
		
		for l_x in range(0, nb_reg):
			for l_y in range(0, nb_reg):
				
				xmin = l_x * frac_stride[l]
				ymin = l_y * frac_stride[l]
				xmax = xmin + frac_size[l]
				ymax = ymin + frac_size[l]
				
				transform2 = A.Compose([
					A.Crop(x_min=xmin, y_min=ymin, x_max=xmax, y_max=ymax, p=1.0),
					A.Resize(width=image_size,height=image_size, interpolation=1, p=1.0),
				])

				transformed = transform2(image=patch)
				patch_aug = transformed['image']
				
				for depth in range(0,3):
					input_test[k,depth*image_size*image_size:(depth+1)*image_size*image_size] = patch_aug[:,:,depth].flatten("C")/255.0
				k += 1

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=nb_class,
	 b_size=32, comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP16C_FP32A")

cnn.create_dataset("TEST", all_nb_test_images, input_test[:,:], targets_test[:,:])

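# Epoch of the network save to load (overrides the value read above);
# -1 loads the provided trained network, 0 loads the most recent save
load_epoch = 300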
if(load_epoch == -1):
	cnn.load("/content/IRMIA_2022/sliding_window/sliding_window_trained_acc84.dat",0, bin=1)
	load_epoch = 0
elif(load_epoch > 0):
	cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:
	files = os.listdir("net_save/")
	paths = [os.path.join("net_save/", basename) for basename in files]
	path = max(paths, key=os.path.getctime)
	r_load_epoch = [int(s) for s in re.split('[s.]',path) if s.isdigit()]
	print (r_load_epoch)
	print("Epoch unspecified, loading most recent save : " + path)
	
	cnn.load(path, r_load_epoch[0], bin=1)
	
cnn.forward(no_error=1, saving=2)

del (all_im, all_im_prop, input_test, targets_test)

EOF
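
The inference cell above cuts each 288x288 image into square regions at three scales, given by `frac_size` and `frac_stride`: the full image, 144-pixel windows with a stride of 72, and 72-pixel windows with a stride of 36. As a rough standalone sketch, the region list can be enumerated like this. The function name `sliding_window_regions` is hypothetical; the sizes and strides are those of the cell.

```python
def sliding_window_regions(image_size=288, sizes=(288, 144, 72), strides=(0, 72, 36)):
    """List (xmin, ymin, xmax, ymax) for every window, scale by scale."""
    regions = []
    for size, stride in zip(sizes, strides):
        # A stride of 0 marks the single full-image region
        n = 1 if stride == 0 else (image_size - size) // stride + 1
        for i_x in range(n):
            for i_y in range(n):
                xmin, ymin = i_x * stride, i_y * stride
                regions.append((xmin, ymin, xmin + size, ymin + size))
    return regions

regions = sliding_window_regions()
print(len(regions))  # 1 + 3**2 + 7**2 = 59 regions per image
```

With 300 validation images this gives 17700 patches to classify, which is why the number of images is kept low.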

#### Prediction visualization

In [None]:
%cd /content/IRMIA_2022/sliding_window/

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches
import xml.etree.ElementTree as ET
from tqdm import tqdm
from PIL import Image
import matplotlib.patheffects as path_effects

import re
import bisect
import os

import sys

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor","empty"], dtype="str")
class_list_short = np.array(["plane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","table","dog","horse", "m-bike",\
		"person","p-plant","sheep","sofa","train","tv","empty"])

nb_train_2012 = 11540
nb_train_2007 = 5011
nb_test_2007 = 4952
orig_nb_images = nb_train_2012 + nb_train_2007 + nb_test_2007
nb_keep_val = 300 #keep in 2007 test

nb_class = 21
image_size_orig = 288
image_size = 128

frac_size = np.array([288,144,72])
frac_stride = np.array([0,72,36])
nb_reg_per_frac = np.array([1,0,0])
cumul_nb_per_frac = np.array([1,0,0])

all_im = np.fromfile("/content/IRMIA_2022/datasets/all_im.dat", dtype="uint8")
all_im_prop = np.fromfile("/content/IRMIA_2022/datasets/all_im_prop.dat", dtype="float32")
all_im = np.reshape(all_im, ((orig_nb_images, image_size_orig, image_size_orig, 3)))
all_im_prop = np.reshape(all_im_prop,(orig_nb_images, 4))

nb_regions_per_im = 1
for l in range(1,np.size(frac_size)):
	nb_reg_per_frac[l] = ((image_size_orig-frac_size[l])/frac_stride[l] + 1)
	nb_regions_per_im += nb_reg_per_frac[l]**2
	cumul_nb_per_frac[l] = nb_regions_per_im

print (nb_reg_per_frac, cumul_nb_per_frac)

load_epoch = 0
if(load_epoch == 0):
	files = os.listdir("fwd_res/")
	paths = [os.path.join("fwd_res/", basename) for basename in files]
	path = max(paths, key=os.path.getctime)
	r_load_epoch = [int(s) for s in re.split('[_s.]',path) if s.isdigit()]
	print (r_load_epoch)
	print("Epoch unspecified, loading most recent prediction : " + path)
	
	load_epoch = r_load_epoch[0]

pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%load_epoch, dtype="float32")

pred_data = np.reshape(pred_raw,(nb_keep_val, int(nb_regions_per_im), nb_class))

width_list = np.array([2.0, 1.5, 1.0])


In [None]:

i_d = 0

nb_w = 4
nb_h = 8

fig, ax = plt.subplots(nb_h, nb_w, figsize=(2*nb_w,2*nb_h), dpi=210, constrained_layout=True)

for l_h in range(0, nb_h):
  for l_w in range(0, nb_w):
    loc = i_d + l_w + l_h*nb_w
    patch = np.copy(all_im[orig_nb_images - nb_keep_val + loc])
    
    ax[l_h,l_w].imshow(patch)
    ax[l_h,l_w].axis('off')

    for l in range(0,int(nb_regions_per_im)):
      max_loc = np.argmax(pred_data[loc,l,:])
      max_val = np.max(pred_data[loc,l,:])
      if(l == 0 or (max_val > 0.9 and max_loc < nb_class-1)):
        
        index = bisect.bisect(cumul_nb_per_frac, l)
        
        if(l > 0):
          i_l = l - cumul_nb_per_frac[index-1]
        else:
          i_l = 0
        i_x = i_l // nb_reg_per_frac[index]
        i_y = i_l % nb_reg_per_frac[index]
        
        xmin = i_x * frac_stride[index] - 0.5 + 2*index; ymin = i_y * frac_stride[index] - 0.5 + 2*index
        xmax = xmin + frac_size[index] - 4*index; ymax = ymin + frac_size[index] - 4*index
        el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth= width_list[index], fill=False, color=plt.cm.tab20(max_loc), zorder=3)
        c_patch = ax[l_h,l_w].add_patch(el)
        c_text = ax[l_h,l_w].text(xmin+4, ymin+15, "%s-%0.2f"%(class_list_short[max_loc], max_val), c=plt.cm.tab20(max_loc), fontsize=6, clip_on=True)
        c_patch.set_path_effects([path_effects.Stroke(linewidth=width_list[index]+1.5, foreground='black'),
                       path_effects.Normal()])
        c_text.set_path_effects([path_effects.Stroke(linewidth=1.5, foreground='black'),
                       path_effects.Normal()])

plt.show()
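
The plotting cell above has to turn the flat region index `l` back into a scale and a window position. It does so with `bisect` on the cumulative number of regions per scale, then a division and a remainder. A minimal sketch of that decoding, assuming the same three scales, is given below. The function name `region_from_index` and the array names are hypothetical, and the small drawing offsets used in the cell are left out.

```python
import bisect
import numpy as np

sizes = np.array([288, 144, 72])
strides = np.array([0, 72, 36])
per_side = np.array([1, 3, 7])            # windows per axis at each scale
cumulative = np.cumsum(per_side ** 2)     # [1, 10, 59]

def region_from_index(l):
    """Map a flat region index to (scale, xmin, ymin, xmax, ymax)."""
    scale = bisect.bisect(cumulative.tolist(), l)
    # Index of the region within its own scale
    local = l - (cumulative[scale - 1] if scale > 0 else 0)
    i_x, i_y = divmod(int(local), int(per_side[scale]))
    xmin, ymin = i_x * strides[scale], i_y * strides[scale]
    return (int(scale), int(xmin), int(ymin),
            int(xmin + sizes[scale]), int(ymin + sizes[scale]))

print(region_from_index(5))   # (1, 72, 72, 216, 216)
```

The order of the division and remainder must match the loop order used when the regions were generated (x index outermost), otherwise boxes are drawn transposed.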

In [None]:
#Free the RAM before going further in the notebook
#A RUNTIME RESTART IS ADVISED

del (all_im, all_im_prop)




---



# /!\ YOLO PART NOT UPDATED FOR ESPCI COURSE YET /!\

In the meantime, see the general YOLO notebook example in the CIANNA repository:
https://github.com/Deyht/CIANNA

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Deyht/CIANNA/blob/CIANNA_dev/YOLO_CIANNA_object_detection_example_on_PASCAL_VOC.ipynb)


## **C - The YOLO object detector**
(YOLO - You Only Look Once)