# Evaluation process

-    After training our model, we evaluate it against the test samples. Since this is an object detection problem, we use two common metrics: Average Precision (AP) and Average Recall (AR). We calculate these metrics using the Python COCO API. COCO is a well-known large-scale dataset for object detection, segmentation, and related tasks. The COCO API comes with tools for interacting with the COCO dataset, though we only use its evaluation package.

-    **The COCO API** is available at the following link: https://github.com/cocodataset/cocoapi

-    We should download it and follow the installation instructions.

-    Once installed, we need to provide the toolkit with two JSON files. The first JSON file contains all ground truth Annotations and the second JSON file contains our predictions. The main difference between the contents of the JSON files and our XML Annotations is the bounding box format. Whereas the XML format follows the (xmin,ymin,xmax,ymax) order, the JSON files follow the (x,y,w,h) format, where x,y are the xmin, ymin coordinates, and w,h are the width and height of the bounding box.
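
As a rough illustration of the two inputs, the sketch below builds a minimal ground truth dictionary and a matching prediction list in Python. The image name, image size, category and box values are made up for the example; only the key names are meant to follow the COCO bounding box format.

```python
import json

def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert (xmin, ymin, xmax, ymax) into the COCO (x, y, w, h) order."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]

# Hypothetical box taken from an XML annotation.
bbox = voc_to_coco_bbox(48, 240, 195, 371)

# Ground truth file: images, annotations and categories.
ground_truth = {
    "images": [{"id": "example_image", "file_name": "example_image.png",
                "width": 500, "height": 375}],
    "annotations": [{"id": 1, "image_id": "example_image", "category_id": 1,
                     "bbox": bbox, "area": bbox[2] * bbox[3], "iscrowd": 0}],
    "categories": [{"id": 1, "name": "example_class"}],
}

# Predictions file: a flat list with one entry per detected box.
predictions = [{"image_id": "example_image", "category_id": 1,
                "bbox": [50.0, 238.5, 144.0, 133.0], "score": 0.87}]

print(json.dumps(ground_truth["annotations"][0]))
print(json.dumps(predictions[0]))
```

Note that each prediction carries a `score`, while each ground truth annotation carries an `area` and an `iscrowd` flag instead.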


-    We will perform the whole evaluation process in 4 stages:

    1. First, we transform the XML Annotations corresponding to the ground truth to obtain the first JSON file.

    2. Second, we show how to transform the outputs of the inference stage, obtained with the YOLO algorithm, into the COCO format.

    3. Third, we run the trained network to perform inference on the test samples and create the second JSON file using the function from the second step.

    4. Finally, using both JSON files, we evaluate the predictions against the ground truth.



# 1. Transforming XML Annotations into JSON COCO format

-     Our evaluation metrics are obtained by comparing the ground truth objects against our predictions and their confidence scores. Since we are using the COCO API, we need to transform our XML Annotations into a JSON format where the bounding box annotations follow the format: **(xmin,ymin,w,h)**.

-     For this purpose, we are going to use the **voc2coco** tool, available at the following link: https://github.com/Tony607/voc2coco

-    Once installed, we should be able to generate our JSON file using the following command:

            python voc2coco.py ./data/voc2012_raw/VOCdevkit/VOC2012/AnnotationsVal ./data/coco/output.json
    

<img src="./validation_json.png" width="600" />

The resulting JSON is shown in the image above. The transformed bounding box coordinates are stored under the 'bbox' key.
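
To make the conversion more concrete, here is a minimal sketch of how one VOC-style XML annotation could be turned into COCO-style boxes using only the standard library. It is not the voc2coco implementation, which may differ in details such as coordinate offsets, and the XML content below is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation, shortened to the fields used here.
VOC_XML = """
<annotation>
  <filename>example_image.png</filename>
  <object>
    <name>example_class</name>
    <bndbox>
      <xmin>79</xmin><ymin>105</ymin><xmax>109</xmax><ymax>142</ymax>
    </bndbox>
  </object>
</annotation>
"""

def voc_objects_to_coco(xml_text):
    """Return one {'name', 'bbox'} dict per <object>, with bbox as (x, y, w, h)."""
    root = ET.fromstring(xml_text.strip())
    converted = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(float(box.find(tag).text))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        converted.append({
            "name": obj.find("name").text,
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
        })
    return converted

objects = voc_objects_to_coco(VOC_XML)
print(objects)
```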

# 2. Transforming YOLO detections into COCO format

Since we are using the COCO API, we need to convert the YOLO detections to the COCO format. The main difference is the bounding box format: YOLO outputs each bounding box as (xmin,ymin,xmax,ymax) coordinates, whereas COCO expects (x,y,w,h).

We should transform outputs from all our images and create a JSON file that will serve as the input to the COCO API evaluation function.

The transformation will be done by the **write_eval_file** function:

In [4]:
def xyxy2xywh(x1y1, x2y2):
    x = x1y1[0]
    y = x1y1[1]
    w = x2y2[0] - x1y1[0]
    h = x2y2[1] - x1y1[1]

    return x,y,w,h

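def write_eval_file(img, image_name, outputs, class_names):
    """Convert the detections for one image into COCO result dictionaries.

    img must have shape (height, width, channels) so that the normalized
    boxes are scaled back to pixel coordinates.
    """
    boxes, objectness, classes, nums = outputs
    # Drop the batch dimension: a single image is processed per call.
    boxes, objectness, classes, nums = boxes[0], objectness[0], classes[0], nums[0]
    wh = np.flip(img.shape[0:2])  # (width, height)

    image_id = image_name

    tmp = list()
    for i in range(nums):
        x1y1 = tuple((np.array(boxes[i][0:2]) * wh).astype(np.float64))
        x2y2 = tuple((np.array(boxes[i][2:4]) * wh).astype(np.float64))

        x,y,w,h = xyxy2xywh(x1y1, x2y2)

        # Cast to a plain float so that the score is JSON serializable.
        score = float(objectness[i].numpy())

        tmp.append({"image_id":image_id,
                    "category_id":int(classes[i]),
                    "bbox":[round(x,2),round(y,2),round(w,2),round(h,2)],
                    "score": score})

    return tmp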
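
The scaling and conversion performed for each box can be checked in isolation. The sketch below assumes, as the multiplication by the image width and height in the function above suggests, that the network returns boxes normalized to the [0, 1] range. The helper name and the numbers are hypothetical.

```python
def normalized_xyxy_to_pixel_xywh(box, image_width, image_height):
    """Scale a normalized (xmin, ymin, xmax, ymax) box to pixels and
    return it as (x, y, w, h), rounded to two decimals."""
    xmin, ymin, xmax, ymax = box
    x1, y1 = xmin * image_width, ymin * image_height
    x2, y2 = xmax * image_width, ymax * image_height
    return [round(v, 2) for v in (x1, y1, x2 - x1, y2 - y1)]

# Hypothetical detection on an image that is 400 pixels wide and 300 pixels high.
pixel_box = normalized_xyxy_to_pixel_xywh([0.25, 0.5, 0.75, 1.0], 400, 300)
print(pixel_box)
```

The width must multiply the x coordinates and the height the y coordinates, which is why the image dimensions have to be those of the image the ground truth boxes refer to.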

Now we need the filenames of the images used for evaluation. These images are listed in the val.txt file. In step 3 we read that file and store the filenames in the **names_val** list.

# 3. Inference using trained model

Model definition

In [None]:
WEIGHTS = 'checkpoints/yolov3_train_75.tf'

In [None]:
import sys
from absl import app, logging, flags
from absl.flags import FLAGS
import time
import cv2
import numpy as np
import tensorflow as tf
from yolov3_tf2.models import (
    YoloV3, YoloV3Tiny
)
from yolov3_tf2.dataset import transform_images, load_tfrecord_dataset
from yolov3_tf2.utils import draw_outputs

flags.DEFINE_string('classes', './data/coco.names', 'path to classes file')
flags.DEFINE_string('weights', WEIGHTS,
                    'path to weights file')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_string('image', './data/girl.png', 'path to input image')
flags.DEFINE_string('tfrecord', None, 'tfrecord instead of image')
flags.DEFINE_string('output', './output.jpg', 'path to output image')
flags.DEFINE_integer('num_classes', 80, 'number of classes in the model')

flags.DEFINE_string('video','./data/2020-07-07-185827.mp4' ,'path to video file or number for webcam)')
flags.DEFINE_string('output_format', 'XVID', 'codec used in VideoWriter when saving video to file')

#LOAD WEIGHTS AND CREATE MODEL
FLAGS.output = '/content/drive/My Drive/yolov3-tf2/data/temp.avi'
FLAGS.num_classes = 3
FLAGS.classes = 'data/voc2012.names'
FLAGS.weights = WEIGHTS
FLAGS.tiny = False

# Lower threshold due to insufficient training
FLAGS.yolo_iou_threshold = 0.2
FLAGS.yolo_score_threshold = 0.2

app._run_init(['yolov3'], app.parse_flags_with_usage)

if FLAGS.tiny:
    yolo = YoloV3Tiny(classes=FLAGS.num_classes)
else:
    yolo = YoloV3(classes=FLAGS.num_classes)

yolo.load_weights(FLAGS.weights).expect_partial()
logging.info('weights loaded')

class_names = [c.strip() for c in open(FLAGS.classes).readlines()]
logging.info('classes loaded')

Now we read the val.txt file to extract the validation filenames for inference:

In [None]:
path_val = 'data/voc2012_raw/VOCdevkit/VOC2012/ImageSets/Main/'
names_val = []
# Each line of val.txt starts with an image name (without extension).
with open(path_val + 'val.txt', 'r') as val:
    for line in val:
        fields = line.split()
        if fields:  # skip blank lines
            names_val.append(fields[0])

Now we iterate through the samples and perform inference using the previously trained weights. We then transform the inference outputs into the COCO JSON format using the **write_eval_file** function from step 2.

In [None]:
from IPython.display import Image, display
from yolov3_tf2.utils import draw_outputs

import os

image_path = '/content/drive/My Drive/yolov3-tf2/data/voc2012_raw/VOCdevkit/VOC2012/JPEGImages/'
names = names_val
eval_files = []

for i, filename in enumerate(names):
  if (i % 50) == 0:
    print("Image ", i)

  # val.txt lists names without extension, so try the supported ones in turn.
  img_raw = None
  for ext in ('.png', '.jpg', '.jpeg'):
    FLAGS.image = filename + ext
    try:
      with open(image_path + FLAGS.image, 'rb') as f:
        img_raw = tf.image.decode_image(f.read(), channels=3)
      break
    except (OSError, tf.errors.InvalidArgumentError):
      continue
  if img_raw is None:
    print("wrong image:", filename)
    continue

  img = tf.expand_dims(img_raw, 0)
  img = transform_images(img, FLAGS.size)

  t1 = time.time()
  outputs = yolo(img)
  t2 = time.time()
  # Pass the original image (height, width, channels), not the resized batch,
  # so that boxes are scaled to the pixel coordinates of the ground truth.
  eval_file = write_eval_file(img_raw.numpy(), filename, outputs, class_names)
  eval_files.append(eval_file)

In [None]:
print("Number of validation samples: ",len(eval_files))

Dump the JSON predictions into a **results.json** file:

In [None]:
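import json
# eval_files holds one list of detections per image, while the COCO API
# expects a single flat list of detection dictionaries.
detections = [det for image_dets in eval_files for det in image_dets]
with open('results.json', 'w', encoding='utf-8') as f:
    json.dump(detections, f, ensure_ascii=False, indent=4)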
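
Before handing the predictions to the COCO API, it can help to check that they form a flat list of detection dictionaries whose `image_id` values all appear in the ground truth. The helper below is a hypothetical sanity check, not part of pycocotools, and the sample data is made up. In practice the known ids could come from `cocoGt.getImgIds()`.

```python
def check_detections(detections, ground_truth_image_ids):
    """Return a list of human-readable problems found in the detections."""
    if not isinstance(detections, list):
        return ["top-level object must be a list of detections"]
    problems = []
    required = ("image_id", "category_id", "bbox", "score")
    known_ids = set(ground_truth_image_ids)
    for index, det in enumerate(detections):
        if not isinstance(det, dict):
            problems.append(f"entry {index} is not a dictionary")
            continue
        missing = [key for key in required if key not in det]
        if missing:
            problems.append(f"entry {index} is missing {missing}")
            continue
        if det["image_id"] not in known_ids:
            problems.append(f"entry {index} has unknown image_id {det['image_id']!r}")
        if len(det["bbox"]) != 4 or det["bbox"][2] < 0 or det["bbox"][3] < 0:
            problems.append(f"entry {index} has an invalid bbox {det['bbox']}")
    return problems

# Hypothetical data: the second entry is a nested list and the third one
# refers to an image that is not in the ground truth.
example = [
    {"image_id": "img_a", "category_id": 0, "bbox": [1.0, 2.0, 3.0, 4.0], "score": 0.9},
    [{"image_id": "img_a", "category_id": 0, "bbox": [1.0, 2.0, 3.0, 4.0], "score": 0.5}],
    {"image_id": "img_b", "category_id": 1, "bbox": [0.0, 0.0, 5.0, 5.0], "score": 0.4},
]
problems = check_detections(example, ["img_a"])
print(problems)
```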

# 4. Evaluating predictions

In [24]:
%matplotlib inline
import matplotlib.pyplot as plt
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import numpy as np
import skimage.io as io
import pylab
pylab.rcParams['figure.figsize'] = (10.0, 8.0)

In [25]:
annType = ['segm','bbox','keypoints']
annType = annType[1]      #specify type here
prefix = 'person_keypoints' if annType=='keypoints' else 'instances'
print ('Running demo for *%s* results.'%(annType))

Running demo for *bbox* results.


Initialize the path containing the JSON file for the ground truth Annotations:

In [26]:
#initialize COCO ground truth api
annFile = './valtiny.json'
cocoGt=COCO(annFile)

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!


Initialize the path containing the JSON file for the predicted Annotations:

In [27]:
#initialize COCO detections api
resFile = './valpredtiny.json'
cocoDt=cocoGt.loadRes(resFile)

Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!


In [28]:
imgIds=cocoGt.getImgIds()
imgIds=imgIds[0:100]
print(imgIds)
#imgId = imgIds[np.random.randint(100)]

['maksssksksss117']
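
The evaluation summary reports AP and AR at different IoU (intersection over union) thresholds, where `IoU=0.50:0.95` denotes the ten thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below is a simplified illustration of how the IoU of two (x,y,w,h) boxes can be computed. It is not the COCO API implementation, which additionally handles crowd annotations and detection limits, and the boxes are made up.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

# The ten thresholds behind "IoU=0.50:0.95".
thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]

# Hypothetical ground truth box and a prediction shifted 10 pixels to the right.
gt_box = [10.0, 10.0, 100.0, 100.0]
pred_box = [20.0, 10.0, 100.0, 100.0]

iou = iou_xywh(gt_box, pred_box)
passed = [t for t in thresholds if iou >= t]
print(round(iou, 3), passed)
```

A prediction counts as a match at a given threshold only if its IoU with a ground truth box of the same category reaches that threshold, so a slightly shifted box matches at the looser thresholds but not at the strictest ones.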


Running evaluation

In [29]:
cocoEval = COCOeval(cocoGt,cocoDt,annType)
cocoEval.params.imgIds  = imgIds
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()

Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.00s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=