# This notebook: Build evaluation method
* Aim at >.90 accuracy

Currently it is tested with YOLOv5 prediction results, but it is compatible with any prediction output in the form of `.pandas().xywh` (see section `1.3` for examples).

Using `Google Colab` to view this notebook is highly recommended.

### Questions:
* Want the **big TACO**? i.e. the **unofficial TACO** that contains 5,000+ images. The label quality of the big TACO might be poor: I experimented with it and already found a dozen errors in the labels (annotations).

* **Reduce target classes**? There are 60 categories and 28 super-categories. Currently we predict 60 classes, which might be too many considering that we have fewer than 1500 training images. Should we use the 28 super-categories as the classes to be predicted? Or, more radically, 5~10 classes such as plastic, metal, glass, etc.?

* Better **Train/Test split**? Currently I do a fully random 1300/100/100 split for train/val/test. This is not the most common choice. Also, the current split is not stratified, so the class (category) distribution may differ between the training and test sets, which might be a problem. Input is greatly welcome!
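
One possible way to approach the stratification question is to key each image by the rarest category it contains and split within each key. The sketch below is only an illustration under stated assumptions: `image_labels` (a dict from image id to a list of class ids), the function name `stratified_split`, and the 10%/10% fractions are all hypothetical and not part of this notebook.

```python
import random
from collections import Counter, defaultdict

def stratified_split(image_labels, val_frac=0.1, test_frac=0.1, seed=4):
    """Split image ids into train/val/test, stratifying on each image's rarest class.

    image_labels: dict {image_id: [class_id, ...]} (hypothetical input format).
    """
    # Count how many images contain each class.
    class_freq = Counter(c for labels in image_labels.values() for c in set(labels))

    # Group images by the rarest class they contain (unlabeled images share the key None).
    groups = defaultdict(list)
    for img_id, labels in image_labels.items():
        key = min(set(labels), key=lambda c: (class_freq[c], c)) if labels else None
        groups[key].append(img_id)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for key in sorted(groups, key=lambda k: (k is None, k)):
        ids = sorted(groups[key])
        rng.shuffle(ids)
        n_val = round(len(ids) * val_frac)
        n_test = round(len(ids) * test_frac)
        val += ids[:n_val]
        test += ids[n_val:n_val + n_test]
        train += ids[n_val + n_test:]
    return train, val, test

# Toy example: 20 images with class 0 only, 10 images with classes 0 and 1.
toy = {i: [0] for i in range(20)}
toy.update({i: [0, 1] for i in range(20, 30)})
train_ids_s, val_ids_s, test_ids_s = stratified_split(toy)
print(len(train_ids_s), len(val_ids_s), len(test_ids_s))
```

Because images can contain several categories, this only balances the rarest class per image; very rare classes with a handful of images may still end up entirely in the training set.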

In [1]:
mount_drive = True #mount only if you have weights and TACO images in your drive already

In [2]:
!nvidia-smi

Tue Oct 18 11:55:50 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   67C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# 0. Prep work: install yolov5, download and partition datasets

In [3]:
%cd /content/

/content


In [4]:
%%bash
find . \! -name 'rotated2.zip' -delete

In [5]:
%%capture
!git clone https://github.com/ultralytics/yolov5 
%cd yolov5
!pip install -r requirements.txt #wandb
%cd ..

In [6]:
from PIL import Image, ExifTags
from pycocotools.coco import COCO
from matplotlib.patches import Polygon, Rectangle
from matplotlib.collections import PatchCollection
import colorsys
import random
import pylab

import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from tqdm import tqdm

import shutil
import os
import re

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

In [7]:
if not mount_drive:
  # gdown a gdrive file too frequently triggers google's control and makes the file un-gdown-able
  # in this case, go to 1hq0KcSM31yrR4YlWqM_P29Y3YTuvuIom and 1X3O2v3GIPveq3ylWF6o1qHI5uzbN1vWA, manually
  # make a copy of them to your own drive and mount your drive to the colab instance, then you can manipulate freely
 
  !gdown 151cUWIawXdRkVPg5M-aFvlKD67_gENGh # download best trained yolov5x6 weights on original classes
  !gdown 1X3O2v3GIPveq3ylWF6o1qHI5uzbN1vWA # download organized TACO images (TACO itself, 1500 images, without unofficial images)

if mount_drive:
  from google.colab import drive
  drive.mount('/gdrive')
  %cp /gdrive/MyDrive/best_yolov5/exp/weights/best.pt /content/yolov5x6_best_weights.pt #get trained weights
  if not os.path.isfile('/content/rotated2.zip'):
    %cp /gdrive/MyDrive/rotated2_og.zip /content/rotated2.zip #get images

  


Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


In [8]:
!unzip -qq /content/rotated2.zip 
%mv /content/content/* /content/

In [9]:
%%capture
!wget https://raw.githubusercontent.com/pedropro/TACO/master/data/annotations.json
!wget https://raw.githubusercontent.com/pedropro/TACO/master/data/annotations_unofficial.json

In [10]:
nr_imgs=None
for root, dirnames, filenames in os.walk('./yoloTACO/labels/'):
  nr_imgs = len(filenames)
  break
print('Number of all images:\n'+str(nr_imgs))

## train test split
'''
train: images/train
val: images/val
test: images/test
'''
np.random.seed(4)
id_list=[i for i in range(nr_imgs)]
np.random.shuffle(id_list)
train_ids = id_list[:1300]
val_ids = id_list[1300:1400]
test_ids = id_list[1400:]

def move_helper(ids, desti):
  for id in ids:
    img_name = os.path.join( './yoloTACO/images', str(id)+'.jpg' )
    lbl_name = os.path.join( './yoloTACO/labels', str(id)+'.txt' )
    print(img_name)
    if os.path.isfile(img_name):
        shutil.copy( img_name, './yoloTACO/images/'+desti)
        shutil.copy( lbl_name, './yoloTACO/labels/'+desti)
    else :
        print('file does not exist', img_name)

Number of all images:
1500


In [11]:
%%capture
!mkdir yoloTACO/images/train
!mkdir yoloTACO/images/val
!mkdir yoloTACO/images/test
!mkdir yoloTACO/labels/train
!mkdir yoloTACO/labels/val
!mkdir yoloTACO/labels/test
move_helper(test_ids,'test')
move_helper(train_ids,'train')
move_helper(val_ids,'val')

In [12]:
%%bash
mkdir ./datasets
mv yoloTACO datasets/

In [13]:
reduced=False #True if using reduced classes (28 categories)
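
If the reduced class set is used, the class ids in the label files would also need to be remapped from the 60 categories to the 28 super-categories. A minimal sketch of such a remapping follows, assuming each entry of `categories` in `annotations.json` carries `id`, `name` and `supercategory` fields as in the COCO format. The helper names and the four-entry sample list are illustrative only.

```python
def build_supercategory_map(categories):
    """Map each fine-grained category id to a super-category index.

    Super-category indices are assigned in order of first appearance.
    Returns (dict cat_id -> super index, list of super-category names).
    """
    super_names = []
    cat_to_super = {}
    for cat in sorted(categories, key=lambda c: c["id"]):
        sup = cat["supercategory"]
        if sup not in super_names:
            super_names.append(sup)
        cat_to_super[cat["id"]] = super_names.index(sup)
    return cat_to_super, super_names

def remap_label_line(line, cat_to_super):
    """Rewrite the class id of one YOLO label line ('class xc yc w h')."""
    fields = line.split()
    fields[0] = str(cat_to_super[int(fields[0])])
    return " ".join(fields)

# Illustrative subset of categories, not the full TACO list.
sample_categories = [
    {"id": 0, "name": "Aluminium foil", "supercategory": "Aluminium foil"},
    {"id": 1, "name": "Battery", "supercategory": "Battery"},
    {"id": 2, "name": "Aluminium blister pack", "supercategory": "Blister pack"},
    {"id": 3, "name": "Carded blister pack", "supercategory": "Blister pack"},
]
cat_to_super, super_names = build_supercategory_map(sample_categories)
print(cat_to_super)
print(remap_label_line("3 0.5 0.5 0.2 0.1", cat_to_super))
```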

In [14]:
#@title yml

if reduced == True:

  with open('/content/yolov5/data/yoloTACO.yaml', mode='w') as fp:
    lines = '''path: ../datasets/yoloTACO  # dataset root dir
train: images/train  # train images 
val: images/val  # val images 
test: images/test # test images (optional)

# Classes
names:
  0: Aluminium foil
  1: Battery
  2: Blister pack
  3: Bottle
  4: Bottle cap
  5: Broken glass
  6: Can
  7: Carton
  8: Cup
  9: Food waste
  10: Glass jar
  11: Lid
  12: Other plastic
  13: Paper
  14: Paper bag
  15: Plastic bag & wrapper
  16: Plastic container
  17: Plastic glooves
  18: Plastic utensils
  19: Pop tab
  20: Rope & strings
  21: Scrap metal
  22: Shoe
  23: Squeezable tube
  24: Straw
  25: Styrofoam piece
  26: Unlabeled litter
  27: Cigarette'''
    fp.write(lines)

else: 
  with open('/content/yolov5/data/yoloTACO.yaml', mode='w') as fp:
    lines = '''path: ../datasets/yoloTACO  # dataset root dir
train: images/train  # train images (relative to 'path')
val: images/val  # val images (relative to 'path')
test: images/test # test images (optional)

# Classes
names:
  0: Aluminium foil
  1: Battery
  2: Aluminium blister pack
  3: Carded blister pack
  4: Other plastic bottle
  5: Clear plastic bottle
  6: Glass bottle
  7: Plastic bottle cap
  8: Metal bottle cap
  9: Broken glass
  10: Food Can
  11: Aerosol
  12: Drink can
  13: Toilet tube
  14: Other carton
  15: Egg carton
  16: Drink carton
  17: Corrugated carton
  18: Meal carton
  19: Pizza box
  20: Paper cup
  21: Disposable plastic cup
  22: Foam cup
  23: Glass cup
  24: Other plastic cup
  25: Food waste
  26: Glass jar
  27: Plastic lid
  28: Metal lid
  29: Other plastic
  30: Magazine paper
  31: Tissues
  32: Wrapping paper
  33: Normal paper
  34: Paper bag
  35: Plastified paper bag
  36: Plastic film
  37: Six pack rings
  38: Garbage bag
  39: Other plastic wrapper
  40: Single-use carrier bag
  41: Polypropylene bag
  42: Crisp packet
  43: Spread tub
  44: Tupperware
  45: Disposable food container
  46: Foam food container
  47: Other plastic container
  48: Plastic glooves
  49: Plastic utensils
  50: Pop tab
  51: Rope & strings
  52: Scrap metal
  53: Shoe
  54: Squeezable tube
  55: Plastic straw
  56: Paper straw
  57: Styrofoam piece
  58: Unlabeled litter
  59: Cigarette'''
    fp.write(lines)
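
The class list above is typed by hand. As an alternative, the YAML text could be generated from an ordered list of class names, which avoids index typos. This is a sketch only: `make_dataset_yaml` is a hypothetical helper, and the three names passed in are just an example.

```python
def make_dataset_yaml(class_names, root="../datasets/yoloTACO"):
    """Build the text of a YOLOv5 dataset YAML from an ordered list of class names."""
    header = [
        f"path: {root}  # dataset root dir",
        "train: images/train  # train images (relative to 'path')",
        "val: images/val  # val images (relative to 'path')",
        "test: images/test  # test images (optional)",
        "",
        "# Classes",
        "names:",
    ]
    # One "  index: name" line per class, in list order.
    body = [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    return "\n".join(header + body) + "\n"

yaml_text = make_dataset_yaml(["Aluminium foil", "Battery", "Blister pack"])
print(yaml_text)
```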

In [15]:
%cd ./yolov5
!ls

/content/yolov5
benchmarks.py	 detect.py   models	       setup.cfg       val.py
classify	 export.py   README.md	       train.py
CONTRIBUTING.md  hubconf.py  requirements.txt  tutorial.ipynb
data		 LICENSE     segment	       utils


In [16]:
%pwd

'/content/yolov5'

# 1. Evaluate with our best trained weights so far

## 1.1 detect and eval with yolo default scripts

In [17]:
!python val.py --data yoloTACO.yaml --task test --weights /content/yolov5x6_best_weights.pt
#!python detect.py --weights /content/yolov5x6_best_weights.pt --source /content/datasets/yoloTACO/images/test

[34m[1mval: [0mdata=/content/yolov5/data/yoloTACO.yaml, weights=['/content/yolov5x6_best_weights.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=test, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.2-199-gf1482b0 Python-3.7.15 torch-1.12.1+cu113 CUDA:0 (Tesla T4, 15110MiB)

Fusing layers... 
Model summary: 416 layers, 140537980 parameters, 0 gradients, 209.1 GFLOPs
[34m[1mtest: [0mScanning '/content/datasets/yoloTACO/labels/test' images and labels...100 found, 0 missing, 0 empty, 0 corrupt: 100% 100/100 [00:00<00:00, 453.47it/s]
[34m[1mtest: [0mNew cache created: /content/datasets/yoloTACO/labels/test.cache
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 4/4 [00:23<00:00,  5.77s/it]
                   all        100        286     

Note that the default `mAP` is not the metric wanted for our project: our sponsor specifically requested a metric named "accuracy" with a target score of >.90.

## 1.2 detect with torch framework manually

This is a necessary step to use our accuracy evaluator.

In [18]:
model = torch.hub.load('ultralytics/yolov5', 'custom', path='/content/yolov5x6_best_weights.pt',force_reload=True)  # load our local model

Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /root/.cache/torch/hub/master.zip
YOLOv5 🚀 2022-10-18 Python-3.7.15 torch-1.12.1+cu113 CUDA:0 (Tesla T4, 15110MiB)

Fusing layers... 
Model summary: 416 layers, 140537980 parameters, 0 gradients, 209.1 GFLOPs
Adding AutoShape... 


In [19]:
# Load test imgs
test_dir = '/content/datasets/yoloTACO/images/test/'
test_list = test_ids # [i[2] for i in os.walk(test_dir)][0] # or alternatively read from files # test_list = [re.findall(r'\d+',i)[0] for i in test_list]

test_read_img_list = [Image.open(test_dir+str(i)+'.jpg') for i in test_list] # alternatively use cv2: cv2.imread('target_path')[..., ::-1]  # OpenCV image (BGR to RGB)

In [20]:
# Inference
results = model(test_read_img_list) # batch of images
pred_pd = results.pandas().xywh

for j,i in enumerate(pred_pd):
  i=i.assign(image_id=[test_list[j]]*i.shape[0])
  pred_pd[j]=i

In [21]:
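# clear GPU mem
def free_memory(to_delete: list, debug=False):
    """Delete the named variables from the caller's namespace and release cached GPU memory."""
    import gc
    import inspect
    calling_namespace = inspect.currentframe().f_back
    if debug:
        print('Before:', torch.cuda.memory_allocated(), 'bytes allocated')

    for _var in to_delete:
        calling_namespace.f_locals.pop(_var, None)  # expects variable names (strings)
    gc.collect()
    torch.cuda.empty_cache()
    if debug:
        print('After:', torch.cuda.memory_allocated(), 'bytes allocated')

free_memory(['model'])  # pass the variable name, not the object itself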

In [22]:
%%capture
!wget -O data/annotations.json https://raw.githubusercontent.com/pedropro/TACO/master/data/annotations.json
anno_path = './data/annotations.json'
annos = COCO(annotation_file=anno_path)
with open(anno_path, 'r') as f:
    annos_json = json.loads(f.read())
no_to_clname = {i:j for i,j in enumerate([i['name'] for i in annos_json['categories']])}


In [23]:
truth_pd = []
for i in test_list:
  img_info = annos.loadImgs(i)[0]    
  img_height = img_info['height']
  img_width = img_info['width']

  cache = pd.read_csv('/content/datasets/yoloTACO/labels/'+str(i)+'.txt',header=None,
                      names = ['class','xcenter','ycenter','width','height'],delimiter=' ')
  cache["xcenter"] = img_width * cache["xcenter"]
  cache["ycenter"] = img_height * cache["ycenter"]
  cache["width"] = img_width * cache["width"]
  cache["height"] = img_height * cache["height"]

  cache = cache.assign(confidence = [1]*cache.shape[0])
  cache = cache.reindex(columns=['xcenter','ycenter','width','height','confidence','class'])
  cache = cache.assign(image_id = [i]*cache.shape[0])

  # cache = cache.assign(img_width = [width]*cache.shape[0])
  # cache = cache.assign(img_height = [height]*cache.shape[0])

  truth_pd.append(cache)

## 1.3 example prediction and truth

In [24]:
pred_pd[:2] 
# predictions for first two images
# there will be a list of two dataframes

[       xcenter      ycenter       width      height  confidence  class  \
 0  1160.030762  2049.695557  682.435669  740.454224    0.885453     36   
 
            name  image_id  
 0  Plastic film        86  ,
        xcenter     ycenter       width      height  confidence  class  \
 0  1587.685059  496.165466  143.758911  260.257538    0.824172     29   
 1  1053.790283  664.041016   94.993835   78.028870    0.748516     29   
 2  1585.460449  496.756775  138.394287  270.051270    0.541024     51   
 
              name  image_id  
 0   Other plastic       171  
 1   Other plastic       171  
 2  Rope & strings       171  ]

In [25]:
pred_pd[1]

| | xcenter | ycenter | width | height | confidence | class | name | image_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 1587.685059 | 496.165466 | 143.758911 | 260.257538 | 0.824172 | 29 | Other plastic | 171 |
| 1 | 1053.790283 | 664.041016 | 94.993835 | 78.02887 | 0.748516 | 29 | Other plastic | 171 |
| 2 | 1585.460449 | 496.756775 | 138.394287 | 270.05127 | 0.541024 | 51 | Rope & strings | 171 |


In [26]:
truth_pd[1]

| | xcenter | ycenter | width | height | confidence | class | image_id |
|---|---|---|---|---|---|---|---|
| 0 | 1045.8 | 669.300384 | 110.0 | 81.000192 | 1 | 58 | 171 |
| 1 | 1634.8 | 500.399808 | 240.0 | 268.00032 | 1 | 51 | 171 |
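
The evaluator in section 2 reads boxes by column position (`xcenter, ycenter, width, height, confidence, class`), so output from another detector could be brought into the same layout. The sketch below assumes a hypothetical input of corner-format tuples `(x1, y1, x2, y2, confidence, class_id)` in pixels; the function name is illustrative.

```python
def xyxy_to_xywh_rows(detections, image_id):
    """Convert corner-format detections to rows in the xywh column layout.

    detections: iterable of (x1, y1, x2, y2, confidence, class_id) in pixels
                (hypothetical input format, for illustration).
    """
    rows = []
    for x1, y1, x2, y2, conf, cls in detections:
        rows.append({
            "xcenter": (x1 + x2) / 2,
            "ycenter": (y1 + y2) / 2,
            "width": x2 - x1,
            "height": y2 - y1,
            "confidence": conf,
            "class": cls,
            "image_id": image_id,
        })
    return rows

rows = xyxy_to_xywh_rows([(100, 200, 300, 260, 0.9, 36)], image_id=86)
print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` should give a dataframe with the columns in the order shown, since the dict keys are inserted in that order.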


# 2. Accuracy evaluation

Usually, `Object Detection` tasks are measured by mAP, which is also the default metric YOLOv5 uses. You can also check YOLOv5's Precision and Recall metrics.

However, if an `accuracy` metric is specifically needed, the following code computes it.

For each object with a truth bounding box in an image, if a prediction bounding box has an IOU > threshold with that truth bounding box, it is counted as `detected`.

Overall model `accuracy` is the total number of `detected` across all images divided by the total number of `predictions` across all images.

**Definition of `Accuracy`:**

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

A comparison of `Accuracy, Precision, Recall`:
$$
\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}
$$

$$\text{Precision} = \frac{TP}{TP+FP}
$$

$$\text{Recall} = \frac{TP}{TP+FN}
$$
CC: https://developers.google.com/machine-learning/crash-course/classification/accuracy
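
To make the comparison concrete, the three formulas can be evaluated on a small set of counts. Object detection has no natural count of true negatives, so the sketch below sets `TN = 0` by default. The counts are hypothetical, chosen only to keep the arithmetic easy to follow, and `detection_metrics` is an illustrative name.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Return (accuracy, precision, recall) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical counts: 60 correct boxes, 20 spurious boxes, 40 missed objects.
acc, prec, rec = detection_metrics(tp=60, fp=20, fn=40)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Note that the ratio used in this notebook, detections over total predictions, has the same form as the precision formula rather than the four-term accuracy formula.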

In [27]:
def bbox_iou(box1, box2, eps=1e-7):
  """
  CITATION: adapted from YOLOV5 utils, author, cr: ultralytics
  Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)
  Get the coordinates of bounding boxes, transform from xywh to xyxy
  """
  (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, 1), box2.chunk(4, 1)
  w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
  b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
  b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_

  inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
          (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)
  union = w1 * h1 + w2 * h2 - inter + eps
  return inter / union  # return IoU
  
def each_pic(pred_df,truth_df,iou_th,must_class):
  """
  returns number of predictions and number of detections for one image
  e.g. if the model predicted 5 boxes and 2 of them match a truth box, it will return 5,2
  """
  pred_df_ = pred_df.assign(matched=[0]*pred_df.shape[0])
  nr_preds = pred_df.shape[0]
  nr_dets = 0
  for _, trow in truth_df.iterrows():
    tbox_tensor = torch.tensor([trow.tolist()[:4]])
    tlabel = trow.tolist()[5]

    # enumerate keeps the positional index in step with the row being inspected
    for row_idx, (_, prow) in enumerate(pred_df_.iterrows()):
      pbox_tensor = torch.tensor([prow.tolist()[:4]])
      plabel = prow.tolist()[5]
      matched = prow.tolist()[-1]
      # when must_class is True the detection also has to carry the correct class
      class_ok = (tlabel == plabel) if must_class else True
      if bbox_iou(tbox_tensor,pbox_tensor).item()>iou_th and matched==0 and class_ok:
        nr_dets+=1
        pred_df_.iat[row_idx,-1]=1 # mark matched bbox, so one prediction bbox wont be counted as "detected" for two different objects
  return nr_preds,nr_dets

def get_accuracy(pred,truth,iou_th=0.5,must_class=False):
  """
  pred: list of prediction dataframes, one per image
  truth: list of truth dataframes, in the same image order as pred
  iou_th: IOU threshold above which a prediction counts as a detection
  must_class: controls whether the category needs to be predicted correctly
              when set to False, only consider whether the predicted bbox bounds the object correctly,
              without considering if the correct class is identified
  """
  preds,dets=0,0
  for i in tqdm(range(len(truth))):
    p,d=each_pic(pred[i],truth[i],iou_th,must_class) # use the arguments, not the notebook globals
    preds+=p
    dets+=d
  return np.round(dets/preds,6)
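
As a sanity check on the IoU computation, the same xywh-to-corner conversion can be written in plain Python and applied to the boxes of image 171 from section `1.3`. The sketch also shows a best-IoU one-to-one matching, which is a different and stricter technique than the evaluator above (each truth box claims at most one prediction, and class labels are ignored). Both function names are illustrative.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as (xcenter, ycenter, width, height)."""
    ax1, ax2 = box_a[0] - box_a[2] / 2, box_a[0] + box_a[2] / 2
    ay1, ay2 = box_a[1] - box_a[3] / 2, box_a[1] + box_a[3] / 2
    bx1, bx2 = box_b[0] - box_b[2] / 2, box_b[0] + box_b[2] / 2
    by1, by2 = box_b[1] - box_b[3] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_match(truth_boxes, pred_boxes, iou_th=0.5):
    """Class-agnostic one-to-one matching: each truth box claims its best unmatched prediction."""
    used = set()
    matches = []
    for t_idx, t_box in enumerate(truth_boxes):
        best_iou, best_p = 0.0, None
        for p_idx, p_box in enumerate(pred_boxes):
            if p_idx in used:
                continue
            iou = iou_xywh(t_box, p_box)
            if iou > iou_th and iou > best_iou:
                best_iou, best_p = iou, p_idx
        if best_p is not None:
            used.add(best_p)
            matches.append((t_idx, best_p))
    return matches

# Two 100x100 boxes offset by half a width: intersection 5000, union 15000.
simple_iou = iou_xywh((50, 50, 100, 100), (100, 50, 100, 100))

# Boxes of image 171, copied from the example outputs in section 1.3.
truth_171 = [(1045.8, 669.300384, 110.0, 81.000192),
             (1634.8, 500.399808, 240.0, 268.00032)]
pred_171 = [(1587.685059, 496.165466, 143.758911, 260.257538),
            (1053.790283, 664.041016, 94.993835, 78.02887),
            (1585.460449, 496.756775, 138.394287, 270.05127)]
matches_171 = greedy_match(truth_171, pred_171)
print(simple_iou, matches_171)
```

With one-to-one matching, the third prediction for image 171 stays unmatched even though it overlaps the second truth box, because that truth box has already claimed a better-overlapping prediction.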

**Accuracy**

In [28]:
accuracy = get_accuracy(pred_pd,truth_pd,iou_th=0.5,must_class=True)
print('\nOur trained model has an accuracy of: '+str(accuracy*100)+'%')

100%|██████████| 100/100 [00:00<00:00, 265.55it/s]


Our trained model has an accuracy of: 37.5691%





**pseudo-Accuracy**

Some detection tasks care only about having a bounding box over the target object; they do not care whether the model labels the object with the correct class. This accuracy can be obtained by setting `must_class` to `False`.

Below is an example:

In [29]:
accuracy = get_accuracy(pred_pd,truth_pd,iou_th=0.5,must_class=False)
print('\nOur trained model has an accuracy of: '+str(accuracy*100)+'%')

100%|██████████| 100/100 [00:00<00:00, 290.03it/s]


Our trained model has an accuracy of: 77.3481%



