# **CIANNA PASCAL example script**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Deyht/CIANNA/blob/CIANNA/examples/PASCAL/pascal_pred_notebook.ipynb)

---


**Link to the CIANNA GitHub repository**
https://github.com/Deyht/CIANNA

### **CIANNA installation**

#### Query GPU allocation and properties

If nvidia-smi fails, it might indicate that you launched the Colab session without a GPU reservation.  
To change the type of reservation, go to "Runtime"->"Change runtime type" and select "GPU" as your hardware accelerator.

In [None]:
%%shell

nvidia-smi

cd /content/

git clone https://github.com/NVIDIA/cuda-samples/

cd /content/cuda-samples/Samples/1_Utilities/deviceQuery/

make SMS="50 60 70 80"

./deviceQuery | grep Capability | cut -c50- > ~/cuda_infos.txt
./deviceQuery | grep "CUDA Driver Version / Runtime Version" | cut -c57- >> ~/cuda_infos.txt

cd ~/

If you are granted a GPU that does not support FP16 computation, it is advised to change the mixed precision method to FP32C_FP32A in the corresponding cells.
See the detailed description of mixed precision support in CIANNA on the [System Requirements](https://github.com/Deyht/CIANNA/wiki/1\)-System-Requirements) wiki page.
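
As a rough illustration of that choice, the sketch below picks a mixed precision string from the compute capability reported by `deviceQuery`. The helper name `pick_mixed_precision` and the 7.0 threshold are assumptions made for this example (the threshold mirrors the Volta check used in the compile cell); the wiki page remains the reference for what each GPU actually supports.

```python
def pick_mixed_precision(comp_cap: str, fp16_min_sm: int = 70) -> str:
    """Return a CIANNA mixed_precision string for a compute capability such as "7.5".

    fp16_min_sm is an assumed threshold (Volta and newer), not a value taken
    from the CIANNA documentation.
    """
    sm_val = int(round(float(comp_cap) * 10))  # "7.5" -> 75
    return "FP16C_FP32A" if sm_val >= fp16_min_sm else "FP32C_FP32A"


for cap in ("6.1", "7.5", "8.0"):
    print(cap, "->", pick_mixed_precision(cap))
```

The returned string would then be passed as the `mixed_precision` argument of `cnn.init` in the prediction cells.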

#### Clone CIANNA git repository

In [None]:
%%shell

cd /content/

git clone https://github.com/Deyht/CIANNA

cd CIANNA

#### Compiling CIANNA for the allocated GPU generation

There is no guaranteed forward or backward compatibility between Nvidia GPU generations, and some capabilities are generation-specific. For these reasons, CIANNA must be given the platform GPU generation at compile time.
The following cell will automatically update all the necessary files based on the detected GPU, and compile CIANNA.
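
For readers less familiar with `awk` and `sed`, the flag selection done by the cell below can be restated in Python. This is only a sketch of the same logic: the function name `arch_flags` is hypothetical, and the actual build still relies on the shell commands in the cell.

```python
def arch_flags(comp_cap: str, cuda_vers: str, lim: float = 11.1):
    """Mirror the shell logic: compute capability and CUDA version -> compile flags."""
    sm_val = int(round(float(comp_cap) * 10))  # "8.0" -> 80
    # CUDA versions older than the limit get the CUDA_OLD define
    old_arg = "-D CUDA_OLD" if float(cuda_vers) < lim else ""
    if sm_val >= 80:
        gen_val = "-D GEN_AMPERE"
    elif sm_val >= 70:
        gen_val = "-D GEN_VOLTA"
    else:
        gen_val = ""  # older generations get no generation define
    return f"sm_{sm_val}", old_arg, gen_val


print(arch_flags("7.5", "12.2"))  # ('sm_75', '', '-D GEN_VOLTA')
```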

In [None]:
%%shell

cd /content/CIANNA

mult="10"
cat ~/cuda_infos.txt
comp_cap="$(sed '1!d' ~/cuda_infos.txt)"
cuda_vers="$(sed '2!d' ~/cuda_infos.txt)"

lim="11.1"
old_arg=$(awk '{if ($1 < $2) print "-D CUDA_OLD";}' <<<"${cuda_vers} ${lim}")

sm_val=$(awk '{print $1*$2}' <<<"${mult} ${comp_cap}")

gen_val=$(awk '{if ($1 >= 80) print "-D GEN_AMPERE"; else if($1 >= 70) print "-D GEN_VOLTA";}' <<<"${sm_val}")

sed -i "s/.*arch=sm.*/\\t\tcuda_arg=\"\$cuda_arg -D CUDA -D comp_CUDA -lcublas -lcudart -arch=sm_$sm_val $old_arg $gen_val\"/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" compile.cp
sed -i "s/\/cuda-[0-9][0-9].[0-9]/\/cuda-$cuda_vers/g" src/python_module_setup.py

./compile.cp CUDA PY_INTERF

mv src/build/lib.linux-x86_64-* src/build/lib.linux-x86_64

#### CIANNA notebook guideline

**IMPORTANT NOTE**   
CIANNA is mainly used in a script fashion and was not designed to run in notebooks. Every code cell that directly invokes CIANNA functions must be run as a script to avoid possible errors.  
To do so, the cell must have the following structure.

```
%%shell

cd /content/CIANNA

python3 - <<EOF

[... your python code ...]

EOF
```

This syntax allows one to easily edit Python code in the notebook while running the cell as a script. Note that none of the notebook variables can be accessed by the cell in this context.
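
The isolation can be checked with a small experiment. The sketch below uses `subprocess` as a stand-in for the `%%shell` heredoc (an assumption made so the example runs anywhere): a child Python process does not see a variable defined by its parent, and a value has to be handed over explicitly, here through a hypothetical `IMAGE_SIZE` environment variable.

```python
import os
import subprocess
import sys

image_size = 416  # defined in the parent process only (think: notebook kernel)

# The child script checks for the parent's variable and reads the environment
child_code = (
    "import os\n"
    "print('image_size' in globals())\n"
    "print(os.environ.get('IMAGE_SIZE'))\n"
)

env = dict(os.environ, IMAGE_SIZE=str(image_size))  # explicit hand-over
result = subprocess.run([sys.executable, "-c", child_code],
                        capture_output=True, text=True, env=env, check=True)
lines = result.stdout.split()
print(lines)  # ['False', '416']
```

Files written to disk, as the cells of this notebook do with the `.dat` blocks, are another way to pass data from one cell to the next.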


## PASCAL-VOC prediction network

The present notebook uses a network trained on the PASCAL 2012 trainval and PASCAL 2007 trainval datasets. The training dataset comprises 16551 images, each associated with target bounding boxes with 20 possible classes. Python training scripts are provided in the corresponding example directory of CIANNA. The network architecture is similar to Darknet-19, with a few adjustments to account for the current CIANNA capabilities. The network was first pre-trained on ImageNet for classification on 1000 classes at a 224x224 resolution and then further pre-trained at a 448x448 resolution. Finally, the network was trained on the PASCAL dataset for detection at a 416x416 resolution.

In this notebook, we apply the trained network to the 4952 images of the PASCAL 2007 test dataset, and also provide a simplified script that uses this network to perform detection on an external image. Before prediction, all the test images are pre-processed to be square-padded and centered at a 480x480 resolution.
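
To make the padding step concrete, the sketch below maps a bounding box from the original image frame to the square-padded 480x480 frame, using the same offsets and scaling as the preparation cells. The function name `map_box_to_padded` and the example box are illustrative assumptions.

```python
def map_box_to_padded(box, width, height, image_size_raw=480):
    """Map (xmin, ymin, xmax, ymax) from the original image to the padded square frame."""
    size = max(image_size_raw, width, height)  # side of the padded square
    x_offset = int((size - width) / 2)         # padding added on the left
    y_offset = int((size - height) / 2)        # padding added on the top
    scale = image_size_raw / size              # resize of the square to 480x480
    xmin, ymin, xmax, ymax = box
    return ((xmin + x_offset) * scale, (ymin + y_offset) * scale,
            (xmax + x_offset) * scale, (ymax + y_offset) * scale)


# A 500x375 image is padded to 500x500 (62 pixels on top), then scaled by 480/500
print(map_box_to_padded((100, 50, 300, 200), width=500, height=375))
```

The prediction cell then rescales these coordinates once more when the image is resized from 480 to 416 pixels.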



### Downloading and preparing PASCAL data

In [None]:
%%shell

cd /content/CIANNA/examples/PASCAL/

python3 - <<EOF

import numpy as np
from tqdm import tqdm
from PIL import Image
import os, glob

data_path = "./"

def make_square(im, min_size, fill_color=(0, 0, 0, 0)):
  x, y = im.size
  size = max(min_size, x, y)
  new_im = Image.new('RGB', (size, size), fill_color)
  new_im.paste(im, (int((size - x) / 2), int((size - y) / 2)))
  return new_im

if(not os.path.isdir(data_path+"VOCdevkit")):
  os.system("wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar")
  os.system("tar -xf VOCtest_06-Nov-2007.tar")

test_list_2007  = np.loadtxt(data_path+"VOCdevkit/VOC2007/ImageSets/Main/test.txt"    , dtype="str")

nb_test_2007 = 4952
nb_keep_val = 4952
block_size = 2000
image_size_raw = 480
nb_class = 20

for block_id in range(0, (nb_keep_val + block_size - 1)//block_size):

  all_im = np.zeros((min(block_size, nb_keep_val - block_id*block_size), image_size_raw, image_size_raw, 3), dtype="uint8")
  all_im_prop = np.zeros((min(block_size, nb_keep_val - block_id*block_size), 4), dtype="float32")

  for i in tqdm(range(0, min(block_size, nb_keep_val - block_id*block_size))):
    im = Image.open(data_path+"VOCdevkit/VOC2007/JPEGImages/"+test_list_2007[block_id*block_size + i]+".jpg")

    width, height = im.size

    im = make_square(im, image_size_raw)
    width2, height2 = im.size

    x_offset = int((width2 - width)*0.5)
    y_offset = int((height2 - height)*0.5)

    all_im_prop[i] = [x_offset, y_offset, width2, height2]

    im = im.resize((image_size_raw,image_size_raw), resample=Image.BILINEAR)
    im_array = np.asarray(im)
    for depth in range(0,3):
      all_im[i,:,:,depth] = im_array[:,:,depth]

  all_im.tofile("all_im_b%d.dat"%(block_id))
  all_im_prop.tofile("all_im_prop_b%d.dat"%(block_id))

  del (all_im, all_im_prop)

EOF

### Performing network prediction

In [None]:
%%shell

cd /content/CIANNA/examples/PASCAL/

python3 - <<EOF

import numpy as np
import xml.etree.ElementTree as ET
import albumentations as A
import cv2

#Comment to access system wide install
import sys, glob, os
sys.path.insert(0,glob.glob('../../src/build/lib.*/')[-1])
import CIANNA as cnn

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

def prep_data(block_id):
  print("Preparing data for block %d ..."%(block_id))

  l_b_size = min(block_size, nb_keep_val - block_id*block_size)
  input_val = np.zeros((l_b_size,flat_image_slice*3), dtype="float32")
  targets_val = np.zeros((l_b_size,1+max_nb_obj_per_image*(7+1)), dtype="float32")

  all_im = np.fromfile(data_path+"all_im_b%d.dat"%(block_id), dtype="uint8")
  all_im_prop = np.fromfile(data_path+"all_im_prop_b%d.dat"%(block_id), dtype="float32")
  all_im = np.reshape(all_im, ((l_b_size, image_size_raw, image_size_raw, 3)))
  all_im_prop = np.reshape(all_im_prop,(l_b_size, 4))

  for i in range(0, min(block_size, nb_keep_val - block_id*block_size)):

    tree = ET.parse(data_path+"VOCdevkit/VOC2007/Annotations/"+test_list_2007[block_id*block_size + i]+".xml")
    root = tree.getroot()
    obj_list = root.findall("object", namespaces=None)

    patch = np.copy(all_im[i])
    x_offset, y_offset, width, height = all_im_prop[i]
    max_dim = max(width, height)

    bbox_list = np.zeros((len(obj_list),7))
    k = 0
    for obj in obj_list:
      diff = obj.find("difficult", namespaces=None)
      oclass = obj.find("name", namespaces=None)
      bndbox = obj.find("bndbox", namespaces=None)

      int_class = int(np.where(class_list[:] == oclass.text)[0][0])
      xmin = (float(bndbox.find("xmin").text)+x_offset)*image_size_raw/width
      ymin = (float(bndbox.find("ymin").text)+y_offset)*image_size_raw/height
      xmax = (float(bndbox.find("xmax").text)+x_offset)*image_size_raw/width
      ymax = (float(bndbox.find("ymax").text)+y_offset)*image_size_raw/height

      bbox_list[k,:] = np.array([xmin,ymin,xmax,ymax,int_class,0,k])
      if(diff.text != "1"):
        class_count_val[int_class] += 1
      else:
        bbox_list[k,5] = 1
      k += 1

    bbs = bbox_list[:,:]
    transformed = transform_val(image=patch,bboxes=bbs)
    patch_aug = transformed['image']
    bbs_aug = np.asarray(transformed['bboxes'])

    for depth in range(0,3):
      input_val[i,depth*flat_image_slice:(depth+1)*flat_image_slice] = (patch_aug[:,:,depth].flatten("C")-100.0)/155.0

    targets_val[i,:] = 0.0
    targets_val[i,0] = np.shape(bbs_aug)[0]
    for k in range(0, np.shape(bbs_aug)[0]):
      xmin = bbs_aug[k,0]
      ymin = bbs_aug[k,1]
      xmax = bbs_aug[k,2]
      ymax = bbs_aug[k,3]
      orig_box = bbox_list[int(bbs_aug[k,6])]
      diff = bbs_aug[k,5]
      targets_val[i,1+k*8:1+(k+1)*8] = np.array([bbs_aug[k,4]+1,xmin,ymin,0.0,xmax,ymax,1.0,diff])

    if(targets_val[i,0] > max_nb_obj_per_image):
      targets_val[i,0] = max_nb_obj_per_image

  del(all_im, all_im_prop)
  return input_val, targets_val



data_path = "./"

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor"], dtype="str")

test_list_2007 = np.loadtxt(data_path+"VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

image_size_raw = 480
image_size = 416
flat_image_slice = image_size*image_size
nb_class = 20
max_nb_obj_per_image = 56

nb_test_2007 = 4952
nb_keep_val = 4952
block_size = 2000

transform_val = A.Compose([
  A.Resize(width=image_size,height=image_size,interpolation=1)
  ], bbox_params=A.BboxParams(format='pascal_voc'))

class_count_val = np.zeros((nb_class))

load_epoch = 0
if (len(sys.argv) > 1):
  load_epoch = int(sys.argv[1])

cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=1, b_size=32,
  comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP16C_FP32A", inference_only=1)

cnn.set_yolo_params()

if(load_epoch > 0):
  cnn.load("net_save/net0_s%04d.dat"%load_epoch,load_epoch, bin=1)
else:
  if(not os.path.isfile("net_train_pascal_416_fp16_75.3map.dat")):
    os.system("wget https://share.obspm.fr/s/XxY3gXnpXgsxA24/download/net_train_pascal_416_fp16_75.3map.dat")
  cnn.load("net_train_pascal_416_fp16_75.3map.dat", 0, bin=1)

#cnn.print_arch_tex("./", "arch", activation=1, dropout=0)

for block_id in range(0, (nb_keep_val + block_size - 1)//block_size):

  input_val, targets_val = prep_data(block_id)
  cnn.create_dataset("TEST", min(block_size, nb_keep_val - block_id*block_size), input_val, targets_val)
  del(input_val, targets_val)

  cnn.forward(repeat=1, no_error=1, saving=2, drop_mode="AVG_MODEL")
  os.system("mv fwd_res/net0_%04d.dat fwd_res/net0_%04d_b%d.dat"%(load_epoch, load_epoch, block_id))
  cnn.delete_dataset("TEST")

EOF

### Compute PASCAL 2007 testset mAP
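
As a compact view of what the cell below computes, the sketch here derives an average precision from a list of detections already sorted by decreasing score. It follows the same steps as the notebook code (cumulative counts, monotone precision envelope, trapezoidal integration over recall) in plain Python; the function name `average_precision` and the toy input are assumptions for illustration.

```python
def average_precision(flags, nb_targets):
    """flags: 1 for a true positive, 0 for a false positive, sorted by decreasing score."""
    tp, fp = 0, 0
    precision, recall = [], []
    for flag in flags:
        tp += flag
        fp += 1 - flag
        precision.append(tp / (tp + fp))
        recall.append(tp / nb_targets)

    # Monotone envelope: each precision becomes the maximum of all later values
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])

    # Trapezoidal integration over recall; as with np.trapz in the cell,
    # the integral starts at the first recall value, not at zero
    ap = 0.0
    for i in range(1, len(recall)):
        ap += 0.5 * (precision[i] + precision[i - 1]) * (recall[i] - recall[i - 1])
    return ap


print(average_precision([1, 1, 0, 1], nb_targets=4))  # 0.4375
```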

In [None]:
%%shell

cd /content/CIANNA/examples/PASCAL/

python3 - <<EOF

from aux_fct import *
#Use auxiliary functions from aux_fct.py

data_path = "./"

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
    "cat","chair","cow","diningtable","dog","horse","motorbike",\
    "person","pottedplant","sheep","sofa","train","tvmonitor"], dtype="str")

test_list_2007 = np.loadtxt(data_path+"VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

image_size_raw = 480
image_size = 416
flat_image_slice = image_size*image_size
nb_class = 20
max_nb_obj_per_image = 56

nb_test_2007 = 4952
nb_keep_val = 4952
block_size = 2000

load_epoch = 0
obj_threshold=0.03
class_soft_limit=0.3
nms_threshold_same=0.4
nms_threshold_diff=0.95

transform_val = A.Compose([
  A.Resize(width=image_size,height=image_size,interpolation=1)
  ], bbox_params=A.BboxParams(format='pascal_voc'))

#####################################################
# Construct validation target
#####################################################

class_count_val = np.zeros((nb_class))

def prep_data_targ(block_id):
  print("Preparing data for block %d ..."%(block_id))

  l_b_size = min(block_size, nb_keep_val - block_id*block_size)
  targets_val = np.zeros((l_b_size,1+max_nb_obj_per_image*(7+1)), dtype="float32")

  all_im = np.fromfile(data_path+"all_im_b%d.dat"%(block_id), dtype="uint8")
  all_im_prop = np.fromfile(data_path+"all_im_prop_b%d.dat"%(block_id), dtype="float32")
  all_im = np.reshape(all_im, ((l_b_size, image_size_raw, image_size_raw, 3)))
  all_im_prop = np.reshape(all_im_prop,(l_b_size, 4))

  for i in range(0, min(block_size, nb_keep_val - block_id*block_size)):

    tree = ET.parse(data_path+"VOCdevkit/VOC2007/Annotations/"+test_list_2007[block_id*block_size + i]+".xml")
    root = tree.getroot()
    obj_list = root.findall("object", namespaces=None)

    patch = np.copy(all_im[i])
    x_offset, y_offset, width, height = all_im_prop[i]
    max_dim = max(width, height)

    bbox_list = np.zeros((len(obj_list),7))
    k = 0
    for obj in obj_list:
      diff = obj.find("difficult", namespaces=None)
      oclass = obj.find("name", namespaces=None)
      bndbox = obj.find("bndbox", namespaces=None)

      int_class = int(np.where(class_list[:] == oclass.text)[0][0])
      xmin = (float(bndbox.find("xmin").text)+x_offset)*image_size_raw/width
      ymin = (float(bndbox.find("ymin").text)+y_offset)*image_size_raw/height
      xmax = (float(bndbox.find("xmax").text)+x_offset)*image_size_raw/width
      ymax = (float(bndbox.find("ymax").text)+y_offset)*image_size_raw/height

      bbox_list[k,:] = np.array([xmin,ymin,xmax,ymax,int_class,0,k])
      if(diff.text != "1"):
        class_count_val[int_class] += 1
      else:
        bbox_list[k,5] = 1
      k += 1

    bbs = bbox_list[:,:]
    transformed = transform_val(image=patch,bboxes=bbs)
    patch_aug = transformed['image']
    bbs_aug = np.asarray(transformed['bboxes'])

    targets_val[i,:] = 0.0
    targets_val[i,0] = np.shape(bbs_aug)[0]
    for k in range(0, np.shape(bbs_aug)[0]):
      xmin = bbs_aug[k,0]
      ymin = bbs_aug[k,1]
      xmax = bbs_aug[k,2]
      ymax = bbs_aug[k,3]
      orig_box = bbox_list[int(bbs_aug[k,6])]
      diff = bbs_aug[k,5]
      targets_val[i,1+k*8:1+(k+1)*8] = np.array([bbs_aug[k,4]+1,xmin,ymin,0.0,xmax,ymax,1.0,diff])

    if(targets_val[i,0] > max_nb_obj_per_image):
      targets_val[i,0] = max_nb_obj_per_image

  del(all_im, all_im_prop)
  return targets_val

targets_val = np.zeros((nb_keep_val,1+max_nb_obj_per_image*(7+1)), dtype="float32")

for block_id in range(0, (nb_keep_val + block_size - 1)//block_size):

  b_targets_val = prep_data_targ(block_id)
  targets_val[block_id*block_size:(block_id+1)*block_size,:] = b_targets_val[:,:]
  del (b_targets_val)


#####################################################
# Filter network predictions (objectness, NMS, etc)
#####################################################

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_class),dtype="float32")

final_boxes = []

for block_id in range(0, (nb_keep_val + block_size - 1)//block_size):

  l_b_size = min(block_size, nb_keep_val - block_id*block_size)
  pred_raw = np.fromfile("fwd_res/net0_%04d_b%d.dat"%(load_epoch, block_id), dtype="float32")
  predict = np.reshape(pred_raw, (l_b_size,nb_box*(8+nb_class),yolo_nb_reg,yolo_nb_reg))

  for l in tqdm(range(0, l_b_size)):

    c_tile[:,:] = 0.0
    c_tile_kept[:,:] = 0.0

    c_pred = predict[l,:,:,:]
    c_nb_box = box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit)

    c_nb_box_final = c_nb_box
    amax_array = np.amax(c_tile[:,7:], axis=1)
    c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, amax_array, nms_threshold_same, nms_threshold_diff)

    final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))


#####################################################
# Compute the mAP for the validation sample
#####################################################

AP_IoU_val=0.5

for l in range(0,nb_keep_val):
  p_c = np.amax(final_boxes[l][:,7:], axis=1)
  final_boxes[l] = (final_boxes[l][(final_boxes[l][:,5]*p_c[:]**1).argsort()])[::-1]

recall_precision = np.empty((nb_keep_val), dtype="object")

print("Find associations ...", flush=True)

for i_d in range(0, nb_keep_val):

  recall_precision[i_d] = np.zeros((np.shape(final_boxes[i_d])[0], 6))

  if(np.shape(final_boxes[i_d])[0] == 0):
    continue

  recall_precision[i_d][:,0] = np.amax(final_boxes[i_d][:,7:], axis=1)
  recall_precision[i_d][:,1] = final_boxes[i_d][:,5]

  recall_precision[i_d][:,5] = np.argmax(final_boxes[i_d][:,7:], axis=1)

  kept_boxes = targets_val[i_d]
  kept_mask = np.zeros(int(kept_boxes[0]), dtype="int")

  for i in range(0,np.shape(final_boxes[i_d])[0]):
    best_IoU = -2.0
    best_targ = -1
    for j in range(0,int(kept_boxes[0])):
      xmin = (kept_boxes[1+j*8+1])
      ymin = (kept_boxes[1+j*8+2])
      xmax = (kept_boxes[1+j*8+4])
      ymax = (kept_boxes[1+j*8+5])
      c_kept_box = np.array([xmin, ymin, xmax, ymax])
      c_IoU = fct_classical_IoU(c_kept_box, final_boxes[i_d][i,:4])
      if(c_IoU > best_IoU and np.argmax(final_boxes[i_d][i,7:]) == int(kept_boxes[1+j*8+0]-1) and kept_mask[j] == 0):
        best_IoU = c_IoU
        best_targ = j

    if (best_IoU >= AP_IoU_val):
      if(kept_boxes[1+best_targ*8+7] > 0.99):
        recall_precision[i_d][i,2] = -1
      else:
        recall_precision[i_d][i,2] = 1
        recall_precision[i_d][i,3] = best_targ
        recall_precision[i_d][i,4] = best_IoU
        kept_mask[best_targ] = 1


print("Process and flatten the mAP result")
flatten = np.vstack(recall_precision.flatten())

recall_precision_f = np.zeros((np.shape(flatten)[0], 10))
recall_precision_f[:,:6] = flatten[:,:]

recall_precision_fs = (recall_precision_f[(recall_precision_f[:,1]*recall_precision_f[:,0]**1).argsort()])[::-1]

ignore_index = np.where(recall_precision_fs[:,2] == -1)[0]

recall_precision_fs = np.delete(recall_precision_fs,ignore_index, axis=0)

recall_precision_fs[:,6] = np.cumsum(recall_precision_fs[:,2])
recall_precision_fs[:,7] = np.cumsum(1.0 - recall_precision_fs[:,2])
recall_precision_fs[:,8] = recall_precision_fs[:,6] / (recall_precision_fs[:,6]+recall_precision_fs[:,7])
recall_precision_fs[:,9] = recall_precision_fs[:,6] / np.sum(class_count_val)

interp_curve = np.maximum.accumulate(recall_precision_fs[::-1,8])[::-1]

AP_all = np.trapz(interp_curve, recall_precision_fs[:,9])
print ("AP_all (%.2f): %f%%"%(AP_IoU_val, AP_all*100.0))

plt.figure(figsize=(4*1.0,3*1.0), dpi=200, constrained_layout=True)
plt.plot(recall_precision_fs[:,9], recall_precision_fs[:,8])
plt.plot(recall_precision_fs[:,9], interp_curve, label="New")
plt.xlabel(r"Recall")
plt.ylabel(r"Precision")
plt.title("All classes as one AP curve", fontsize=8)

sumAP = 0
print ("**** Per class AP ****")
fig, ax = plt.subplots(figsize=(4*1.3,3*1.3), dpi=200, constrained_layout=True)
plt.xlabel(r"Recall")
plt.ylabel(r"Precision")

for k in range(0, nb_class):
  index = np.where(recall_precision_fs[:,5] == k)
  l_recall_precision_fs = recall_precision_fs[index[0]]

  l_recall_precision_fs[:,6] = np.cumsum(l_recall_precision_fs[:,2])
  l_recall_precision_fs[:,7] = np.cumsum(1.0 - l_recall_precision_fs[:,2])
  l_recall_precision_fs[:,8] = l_recall_precision_fs[:,6] / (l_recall_precision_fs[:,6]+l_recall_precision_fs[:,7])
  l_recall_precision_fs[:,9] = l_recall_precision_fs[:,6] / class_count_val[k]

  interp_curve = np.maximum.accumulate(l_recall_precision_fs[::-1,8])[::-1]

  AP = np.trapz(interp_curve, l_recall_precision_fs[:,9])
  sumAP += AP

  plt.plot(l_recall_precision_fs[:,9], interp_curve, label=class_list[k],c=plt.cm.tab20(k))

  print("AP %-15s: %5.2f%%   Total: %4d - T: %4d - F: %4d"%(class_list[k], AP*100.0, class_count_val[k], l_recall_precision_fs[-1,6], l_recall_precision_fs[-1,7]))

print ("\n**** mAP (%.2f): %f%% ****"%(AP_IoU_val, sumAP/nb_class*100.0))

plt.legend(bbox_to_anchor=(1.02,0.98), fontsize=8)
plt.title("Per class AP curve", fontsize=8)
plt.savefig("AP_curve_@%.2f_per_class.jpg"%(AP_IoU_val))

del (targets_val, final_boxes)

EOF

### Visualize network predictions

In [None]:
%%shell

cd /content/CIANNA/examples/PASCAL/

python3 - <<EOF

from aux_fct import *
#Use auxiliary functions from aux_fct.py

data_path = "./"

class_list = np.array(["aeroplane","bicycle","bird","boat","bottle","bus","car",\
		"cat","chair","cow","diningtable","dog","horse","motorbike",\
		"person","pottedplant","sheep","sofa","train","tvmonitor"], dtype="str")

test_list_2007 = np.loadtxt(data_path+"VOCdevkit/VOC2007/ImageSets/Main/test.txt", dtype="str")

image_size_raw = 480
image_size = 416
flat_image_slice = image_size*image_size
nb_class = 20
max_nb_obj_per_image = 56

transform_val = A.Compose([
  A.Resize(width=image_size,height=image_size,interpolation=1)
  ], bbox_params=A.BboxParams(format='pascal_voc'))

nb_test_2007 = 4952
nb_keep_val = 4952
block_size = 2000

load_epoch = 0
obj_threshold=0.3 #Remove low objectness boxes for display
class_soft_limit=0.3
nms_threshold_same=0.4
nms_threshold_diff=0.95

#####################################################
# Construct validation target
#####################################################

class_count_val = np.zeros((nb_class))

def prep_data_targ(block_id):
  print("Preparing data for block %d ..."%(block_id))

  l_b_size = min(block_size, nb_keep_val - block_id*block_size)
  targets_val = np.zeros((l_b_size,1+max_nb_obj_per_image*(7+1)), dtype="float32")

  all_im = np.fromfile(data_path+"all_im_b%d.dat"%(block_id), dtype="uint8")
  all_im_prop = np.fromfile(data_path+"all_im_prop_b%d.dat"%(block_id), dtype="float32")
  all_im = np.reshape(all_im, ((l_b_size, image_size_raw, image_size_raw, 3)))
  all_im_prop = np.reshape(all_im_prop,(l_b_size, 4))

  for i in range(0, min(block_size, nb_keep_val - block_id*block_size)):

    tree = ET.parse(data_path+"VOCdevkit/VOC2007/Annotations/"+test_list_2007[block_id*block_size + i]+".xml")
    root = tree.getroot()
    obj_list = root.findall("object", namespaces=None)

    patch = np.copy(all_im[i])
    x_offset, y_offset, width, height = all_im_prop[i]
    max_dim = max(width, height)

    bbox_list = np.zeros((len(obj_list),7))
    k = 0
    for obj in obj_list:
      diff = obj.find("difficult", namespaces=None)
      oclass = obj.find("name", namespaces=None)
      bndbox = obj.find("bndbox", namespaces=None)

      int_class = int(np.where(class_list[:] == oclass.text)[0][0])
      xmin = (float(bndbox.find("xmin").text)+x_offset)*image_size_raw/width
      ymin = (float(bndbox.find("ymin").text)+y_offset)*image_size_raw/height
      xmax = (float(bndbox.find("xmax").text)+x_offset)*image_size_raw/width
      ymax = (float(bndbox.find("ymax").text)+y_offset)*image_size_raw/height

      bbox_list[k,:] = np.array([xmin,ymin,xmax,ymax,int_class,0,k])
      if(diff.text != "1"):
        class_count_val[int_class] += 1
      else:
        bbox_list[k,5] = 1
      k += 1

    bbs = bbox_list[:,:]
    transformed = transform_val(image=patch,bboxes=bbs)
    patch_aug = transformed['image']
    bbs_aug = np.asarray(transformed['bboxes'])

    targets_val[i,:] = 0.0
    targets_val[i,0] = np.shape(bbs_aug)[0]
    for k in range(0, np.shape(bbs_aug)[0]):
      xmin = bbs_aug[k,0]
      ymin = bbs_aug[k,1]
      xmax = bbs_aug[k,2]
      ymax = bbs_aug[k,3]
      orig_box = bbox_list[int(bbs_aug[k,6])]
      diff = bbs_aug[k,5]
      targets_val[i,1+k*8:1+(k+1)*8] = np.array([bbs_aug[k,4]+1,xmin,ymin,0.0,xmax,ymax,1.0,diff])

    if(targets_val[i,0] > max_nb_obj_per_image):
      targets_val[i,0] = max_nb_obj_per_image

  del(all_im, all_im_prop)
  return targets_val

targets_val = np.zeros((nb_keep_val,1+max_nb_obj_per_image*(7+1)), dtype="float32")

for block_id in range(0, (nb_keep_val + block_size - 1)//block_size):

  b_targets_val = prep_data_targ(block_id)
  targets_val[block_id*block_size:(block_id+1)*block_size,:] = b_targets_val[:,:]
  del (b_targets_val)


#####################################################
# Filter network predictions (objectness, NMS, etc)
#####################################################

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_class),dtype="float32")

final_boxes = []

for block_id in range(0, (nb_keep_val + block_size - 1)//block_size):

  l_b_size = min(block_size, nb_keep_val - block_id*block_size)
  pred_raw = np.fromfile("fwd_res/net0_%04d_b%d.dat"%(load_epoch, block_id), dtype="float32")
  predict = np.reshape(pred_raw, (l_b_size,nb_box*(8+nb_class),yolo_nb_reg,yolo_nb_reg))

  for l in tqdm(range(0, l_b_size)):

    c_tile[:,:] = 0.0
    c_tile_kept[:,:] = 0.0

    c_pred = predict[l,:,:,:]
    c_nb_box = box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit)

    c_nb_box_final = c_nb_box
    amax_array = np.amax(c_tile[:,7:], axis=1)
    c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, amax_array, nms_threshold_same, nms_threshold_diff)

    final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))


visual_w = 6
visual_h = 4
display_target=1
block_id = 0
id_start=0

l_b_size = min(block_size, nb_keep_val - block_id*block_size)
all_im = np.fromfile(data_path+"all_im_b%d.dat"%(block_id), dtype="uint8")
all_im_prop = np.fromfile(data_path+"all_im_prop_b%d.dat"%(block_id), dtype="float32")
all_im = np.reshape(all_im, ((l_b_size, image_size_raw, image_size_raw, 3)))
all_im_prop = np.reshape(all_im_prop,(l_b_size, 4))

fig, ax = plt.subplots(visual_h, visual_w, figsize=(1.5*visual_w,1.5*visual_h), dpi=210, constrained_layout=True)

for i in range(0, visual_h):
  for j in range(0, visual_w):
    i_d = block_id*block_size + i*visual_w + j + id_start

    c_data = all_im[i_d - block_id*block_size]/255.0
    ax[i,j].imshow(c_data)
    ax[i,j].axis('off')

    im_boxes = final_boxes[i_d]

    if(display_target):
      targ_boxes = targets_val[i_d]
      for k in range(0, int(targ_boxes[0])):
        xmin = targ_boxes[1+k*8+1] *(image_size_raw/image_size)
        ymin = targ_boxes[1+k*8+2] *(image_size_raw/image_size)
        xmax = targ_boxes[1+k*8+4] *(image_size_raw/image_size)
        ymax = targ_boxes[1+k*8+5] *(image_size_raw/image_size)
        p_c = int(targ_boxes[1+k*8+0]) - 1

        el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.4, ls="--", fill=False, color=plt.cm.tab20(p_c), zorder=3)
        c_patch = ax[i,j].add_patch(el)
        c_text  = ax[i,j].text(xmin+4, ymin+10, "%s"%(class_list[p_c]), c=plt.cm.tab20(p_c), fontsize=2, clip_on=True)
        c_patch.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'), path_effects.Normal()])
        c_text.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'), path_effects.Normal()])


    for k in range(0, np.shape(im_boxes)[0]):
      xmin = max(-0.5,(im_boxes[k,0])*(image_size_raw/image_size) - 0.5)
      ymin = max(-0.5,(im_boxes[k,1])*(image_size_raw/image_size) - 0.5)
      xmax = min(image_size_raw-0.5,(im_boxes[k,2])*(image_size_raw/image_size) - 0.5)
      ymax = min(image_size_raw-0.5,(im_boxes[k,3])*(image_size_raw/image_size) - 0.5)

      p_c = np.argmax(im_boxes[k,7:])

      el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.4, fill=False, color=plt.cm.tab20(p_c), zorder=3)
      c_patch = ax[i,j].add_patch(el)
      c_text = ax[i,j].text(xmin+5, ymax-4, "%s:%d-%0.2f-%0.2f"%(class_list[p_c],im_boxes[k,6],im_boxes[k,5],np.max(im_boxes[k,7:])), c=plt.cm.tab20(p_c), fontsize=2,clip_on=True)
      c_patch.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'),
                        path_effects.Normal()])
      c_text.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'),
                        path_effects.Normal()])

plt.savefig("pred_mosaic.jpg",dpi=500, bbox_inches='tight')

del (targets_val, all_im, all_im_prop)

EOF

In [None]:
%cd /content/CIANNA/examples/PASCAL/

#Display the produced JPG
from IPython.display import Image
Image("pred_mosaic.jpg", width=1280)

## External image prediction

A minimal example of how to use the network to run a prediction on an external .jpg image.
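
Since the external image is resized along its longest side and padded to a square, box coordinates expressed in the padded network frame need to be shifted and rescaled to land on the original image. The sketch below shows one way to do this with the same `ratio` and offset quantities that the following cell defines; the helper name `to_original_coords` is hypothetical, and rounding of the resized dimensions is ignored.

```python
def to_original_coords(box, im_size, image_size):
    """Map (xmin, ymin, xmax, ymax) from the padded network frame to the original image.

    im_size is (width, height) of the original image, image_size the square network input.
    """
    width, height = im_size
    ratio = image_size / max(width, height)              # longest side -> image_size
    x_off = max(0.0, image_size - width * ratio) / 2.0   # horizontal padding
    y_off = max(0.0, image_size - height * ratio) / 2.0  # vertical padding
    xmin, ymin, xmax, ymax = box
    return ((xmin - x_off) / ratio, (ymin - y_off) / ratio,
            (xmax - x_off) / ratio, (ymax - y_off) / ratio)


# A 1280x720 image in a 480x480 input: ratio 0.375, 105 pixels of padding top and bottom
print(to_original_coords((48.0, 105.0, 480.0, 375.0), (1280, 720), 480))
```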

In [None]:

%%shell

cd /content/CIANNA/examples/PASCAL/

python3 - <<EOF


import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches
import matplotlib.patheffects as path_effects
from PIL import Image

from numba import jit

import albumentations as A
import cv2

#Comment to access system wide install
import os, sys, glob
sys.path.insert(0,glob.glob('../../src/build/lib.*/')[-1])
import CIANNA as cnn

#Minimum deployment setup for prediction on a single image

def i_ar(int_list):
	return np.array(int_list, dtype="int")

def f_ar(float_list):
	return np.array(float_list, dtype="float32")

@jit(nopython=True, cache=False, fastmath=False)
def fct_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]) + 1)
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]) + 1)
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0] + 1)*abs(box1[3] - box1[1] + 1) + \
		abs(box2[2]-box2[0] + 1)*abs(box2[3] - box2[1] + 1) - inter_2d
	enclose_w = (max(box1[2], box2[2]) - min(box1[0], box2[0]))
	enclose_h = (max(box1[3], box2[3]) - min(box1[1],box2[1]))
	enclose_2d = enclose_w*enclose_h

	cx_a = (box1[2] + box1[0])*0.5; cx_b = (box2[2] + box2[0])*0.5
	cy_a = (box1[3] + box1[1])*0.5; cy_b = (box2[3] + box2[1])*0.5
	dist_cent = np.sqrt((cx_a - cx_b)*(cx_a - cx_b) + (cy_a - cy_b)*(cy_a - cy_b))
	diag_enclose = np.sqrt(enclose_w*enclose_w + enclose_h*enclose_h)

  # DIoU
	return float(inter_2d)/float(uni_2d) - float(dist_cent)/float(diag_enclose)
  # GIoU
	#return float(inter_2d)/float(uni_2d) - float(enclose_2d - uni_2d)/float(enclose_2d)


@jit(nopython=True, cache=False, fastmath=False)
def fct_classical_IoU(box1, box2):
	inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]) + 1)
	inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]) + 1)
	inter_2d = inter_w*inter_h
	uni_2d = abs(box1[2]-box1[0] + 1)*abs(box1[3] - box1[1] + 1) + \
		abs(box2[2]-box2[0] + 1)*abs(box2[3] - box2[1] + 1) - inter_2d

	return float(inter_2d)/float(uni_2d)


@jit(nopython=True, cache=False, fastmath=False)
def box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit):
	c_nb_box = 0
	for i in range(0,yolo_nb_reg):
		for j in range(0,yolo_nb_reg):
			for k in range(0,nb_box):
				offset = int(k*(8+nb_class)) #no +1 for box prior in prediction
				c_box[4] = c_pred[offset+6,i,j]
				c_box[5] = c_pred[offset+7,i,j]
				p_c = np.max(c_pred[offset+8:offset+8+nb_class,i,j])
				cl = np.argmax(c_pred[offset+8:offset+8+nb_class,i,j])

				if(c_box[5] >= obj_threshold and c_box[5]*p_c**1 >= 0.01 and p_c > class_soft_limit):
					c_box[0] = c_pred[offset,i,j]
					c_box[1] = c_pred[offset+1,i,j]
					c_box[2] = c_pred[offset+3,i,j]
					c_box[3] = c_pred[offset+4,i,j]

					c_box[6] = k
					c_box[7:] = c_pred[offset+8:offset+8+nb_class,i,j]
					c_tile[c_nb_box,:] = c_box[:]
					c_nb_box +=1

	return c_nb_box


@jit(nopython=True, cache=False, fastmath=False)
def apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, amax_array, nms_threshold_same, nms_threshold_diff):
	c_nb_box_final = 0
	c_box_size_prev = c_nb_box

	while(c_nb_box > 0):
		max_objct = np.argmax(c_tile[:c_box_size_prev,5]*amax_array[:c_box_size_prev])
		c_box = np.copy(c_tile[max_objct])
		c_tile[max_objct,5] = 0.0
		c_tile_kept[c_nb_box_final] = c_box
		c_nb_box_final += 1
		c_nb_box -= 1
		i = 0

		for i in range(0,c_box_size_prev):
			if(c_tile[i,5] < 0.00000001):
				continue
			IoU = fct_IoU(c_box[:4], c_tile[i,:4])

			if((IoU > nms_threshold_same and np.argmax(c_box[7:]) == np.argmax(c_tile[i,7:]))
				or (IoU > nms_threshold_diff and np.argmax(c_box[7:]) != np.argmax(c_tile[i,7:]))):
				c_tile[i] = 0.0
				c_nb_box -= 1

	return c_nb_box_final



class_list = np.array(["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
	"cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
	"person", "pottedplant", "sheep", "sofa", "train", "tvmonitor", "background"])

#The network is resilient to a slight increase in image resolution, which increases the mAP
#We recommend changing image_size in steps of 64 (2 grid elements)
#Here the training resolution was 416

image_size = 416 + 64*1
flat_image_slice = image_size*image_size
nb_class = 20
nb_box = 5
yolo_reg_size = 32
yolo_nb_reg = int(image_size/yolo_reg_size)

load_epoch = 0


if(not os.path.isfile("office_1.jpg")):
	os.system("wget https://share.obspm.fr/s/GynmcyDtkrsbyLe/download/office_1.jpg")

im = Image.open("office_1.jpg", mode='r')

#im.format holds the file format (e.g. JPEG); the pixel layout is given by im.mode
if(im.mode != "RGB"):
	im = im.convert('RGB')

patch = np.asarray(im)

dim_long = np.argmax(im.size)
ratio = image_size/im.size[dim_long]

other_dim = int(np.mod(dim_long+1,2))
offset = np.zeros((2))
offset[dim_long] = 0.0
offset[other_dim] = max(0.0,image_size - im.size[other_dim]*ratio)/2.0

transform = A.Compose([
	A.LongestMaxSize(max_size=image_size, interpolation=1, p=1.0),
	A.PadIfNeeded(min_width=image_size, min_height=image_size, border_mode=cv2.BORDER_CONSTANT, p=1.0),
])

transformed = transform(image=patch)
patch_aug = transformed['image']

input_data = f_ar(np.zeros((1,3*image_size*image_size)))
empty_target = f_ar(np.zeros((1,1)))

for depth in range(0,3):
	input_data[0,depth*flat_image_slice:(depth+1)*flat_image_slice] = (patch_aug[:,:,depth].flatten("C") - 100.0)/155.0


cnn.init(in_dim=i_ar([image_size,image_size]), in_nb_ch=3, out_dim=1, b_size=1,
	comp_meth='C_CUDA', dynamic_load=1, mixed_precision="FP16C_FP32A", inference_only=1)

cnn.create_dataset("TEST", 1, input_data, empty_target)

cnn.set_yolo_params()

cnn.load("net_train_pascal_416_fp16_75.3map.dat",0, bin=1)

cnn.print_arch_tex("./", "arch", activation=1, dropout=0)

cnn.forward(repeat=1, no_error=1, saving=2, drop_mode="AVG_MODEL")



obj_threshold=0.3
class_soft_limit=0.3
nms_threshold_same=0.3
nms_threshold_diff=0.95

pred_raw = np.fromfile("fwd_res/net0_%04d.dat"%load_epoch, dtype="float32")
predict = np.reshape(pred_raw, (1, nb_box*(8+nb_class),yolo_nb_reg,yolo_nb_reg))

c_tile = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_class)),dtype="float32")
c_tile_kept = np.zeros((yolo_nb_reg*yolo_nb_reg*nb_box,(6+1+nb_class)),dtype="float32")
c_box = np.zeros((6+1+nb_class),dtype="float32")

final_boxes = []

c_tile[:,:] = 0.0
c_tile_kept[:,:] = 0.0

c_pred = predict[0,:,:,:]
c_nb_box = box_extraction(c_pred, c_box, c_tile, obj_threshold, class_soft_limit)

c_nb_box_final = c_nb_box
amax_array = np.amax(c_tile[:,7:], axis=1)
c_nb_box_final = apply_NMS(c_tile, c_tile_kept, c_box, c_nb_box, amax_array, nms_threshold_same, nms_threshold_diff)

final_boxes.append(np.copy(c_tile_kept[0:c_nb_box_final]))


#The image is displayed at full resolution. To visualize the prediction at the resolution seen by the network,
#pass patch_aug to imshow and skip the offset/ratio rescaling of the box coordinates below.
fig, ax = plt.subplots(1,1, dpi=210, constrained_layout=True)

ax.imshow(patch)
ax.axis('off')

im_boxes = final_boxes[0]

for k in range(0, np.shape(im_boxes)[0]):
	xmin = (im_boxes[k,0]-offset[0])/ratio
	ymin = (im_boxes[k,1]-offset[1])/ratio
	xmax = (im_boxes[k,2]-offset[0])/ratio
	ymax = (im_boxes[k,3]-offset[1])/ratio

	p_c = np.argmax(im_boxes[k,7:])

	el = patches.Rectangle((xmin,ymin), (xmax-xmin), (ymax-ymin), linewidth=0.4, fill=False, color=plt.cm.tab20(p_c), zorder=3)
	c_patch = ax.add_patch(el)
	c_text = ax.text(xmin+5, ymax-4, "%s:%d-%0.2f-%0.2f"%(class_list[p_c],im_boxes[k,6],im_boxes[k,5],np.max(im_boxes[k,7:])), c=plt.cm.tab20(p_c), fontsize=2,clip_on=True)
	c_patch.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'),
										path_effects.Normal()])
	c_text.set_path_effects([path_effects.Stroke(linewidth=0.8, foreground='black'),
										path_effects.Normal()])

plt.savefig("pred_on_image.jpg",dpi=400, bbox_inches='tight')

EOF
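
The post-processing in the script above ranks the extracted boxes, keeps the best one, and removes the remaining boxes that overlap it too much, with a strict threshold for boxes of the same class and a loose one for boxes of different classes. The following is a minimal pure-Python sketch of that greedy loop, not the CIANNA implementation: the names `iou` and `greedy_nms` and the dictionary layout are hypothetical, the overlap is the classical pixel-inclusive IoU (the script calls `fct_IoU` instead), and the ranking uses a single `score` field where the script uses objectness times the best class score.

```python
def iou(a, b):
    """Classical IoU of two (xmin, ymin, xmax, ymax) boxes, pixel-inclusive (+1)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(area_a + area_b - inter)


def greedy_nms(dets, thr_same=0.3, thr_diff=0.95):
    """Keep the best-scoring box, drop overlapping ones, repeat until none remain."""
    remaining = sorted(dets, key=lambda d: d["score"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        survivors = []
        for d in remaining:
            # Strict threshold for the same class, loose one across classes
            thr = thr_same if d["cls"] == best["cls"] else thr_diff
            if iou(best["box"], d["box"]) <= thr:
                survivors.append(d)
        remaining = survivors
    return kept


# Hypothetical detections used only for illustration
dets = [
    {"box": (10, 10, 109, 109), "score": 0.9, "cls": "dog"},
    {"box": (20, 20, 119, 119), "score": 0.6, "cls": "dog"},
    {"box": (20, 20, 119, 119), "score": 0.5, "cls": "cat"},
    {"box": (300, 300, 349, 349), "score": 0.4, "cls": "person"},
]
kept = greedy_nms(dets)
for d in kept:
    print(d["cls"], d["score"], d["box"])
```

With these thresholds, the second "dog" box is removed because its overlap with the best "dog" box exceeds 0.3, while the "cat" box at the same location survives because the cross-class limit of 0.95 is not reached.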

In [None]:
%cd /content/CIANNA/examples/PASCAL/

#Display the produced JPG
from IPython.display import Image
Image("pred_on_image.jpg", width=960)
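
The boxes drawn on the image are predicted in the padded square frame seen by the network, and the script maps them back to the original image with `(coord - offset)/ratio`. As a worked illustration of that arithmetic, here is a small pure-Python sketch. It assumes centered padding, as in the script's `offset` computation; the helper names `letterbox_params` and `to_original` and the 640x480 example image are hypothetical.

```python
def letterbox_params(width, height, image_size):
    """Return the resize ratio and the (x, y) padding offset of a centered letterbox."""
    size = (width, height)
    dim_long = 0 if width >= height else 1
    ratio = image_size / float(size[dim_long])
    other_dim = (dim_long + 1) % 2
    offset = [0.0, 0.0]
    # Only the shorter side is padded, half of the padding on each side
    offset[other_dim] = max(0.0, image_size - size[other_dim] * ratio) / 2.0
    return ratio, offset


def to_original(box, ratio, offset):
    """Map (xmin, ymin, xmax, ymax) from network coordinates to original pixels."""
    xmin, ymin, xmax, ymax = box
    return ((xmin - offset[0]) / ratio, (ymin - offset[1]) / ratio,
            (xmax - offset[0]) / ratio, (ymax - offset[1]) / ratio)


# Hypothetical 640x480 image resized to a 480x480 network input
ratio, offset = letterbox_params(640, 480, 480)
box_orig = to_original((120.0, 150.0, 240.0, 270.0), ratio, offset)
print(ratio, offset, box_orig)
```

In this example the image is scaled by 0.75 to 480x360 and padded by 60 pixels above and below, so a network-space box of (120, 150, 240, 270) corresponds to (160, 120, 320, 280) in the original image.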