# Offline evaluation of ONNX Pose Estimation models

This example shows how to evaluate **exported** pose estimation models using the pycocotools and onnxruntime packages.
Although we provide a ready-to-use metric class to compute average precision (AP) and average recall (AR) scores, the 
evaluation protocol we use during validation differs slightly from the one pycocotools uses for academic evaluation.

In particular:

## SG

* In SG, during training and validation, all images are resized to a fixed size (640x640 by default) using an aspect-ratio-preserving resize of the longest side followed by padding.
* Our metric evaluates AP/AR in the resolution of the resized and padded images, **not in the resolution of the original image**.
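As a rough illustration of this resize-and-pad geometry, the sketch below computes the scale factor and padding for a single image. It assumes padding is added only at the bottom and right (as in the `KeypointsBottomRightPadding` step of the evaluation pipeline in this example) and that resized sizes are rounded to the nearest pixel; `letterbox_params` is a hypothetical helper, not an SG function.

```python
def letterbox_params(height, width, target=640):
    """Scale factor, resized size and bottom/right padding for a longest-side resize + pad."""
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # Padding only at the bottom and right keeps the top-left corner at (0, 0)
    return scale, new_h, new_w, target - new_h, target - new_w


scale, new_h, new_w, pad_bottom, pad_right = letterbox_params(500, 375)
print(scale, new_h, new_w, pad_bottom, pad_right)  # 1.28 640 480 0 160
```

For a 500x375 (height x width) image this gives a scale of 1.28 and 160 pixels of right padding, which agrees with the `metadata` printed for the sample image further down.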


## COCOEval

* In COCOEval, images are not resized, and pose predictions are evaluated in the resolution of the original image.

Because of this discrepancy, the metrics reported by the `PoseEstimationMetrics` class are usually slightly lower (typically by ~1 AP) than the ones 
you would get for the same model with COCOEval.

For this reason, this example shows how to compute metrics with COCOEval for the pose estimation models available in SuperGradients.
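For reference, COCOEval matches keypoint predictions to ground truth using Object Keypoint Similarity (OKS). The sketch below mirrors the per-pair formula in pycocotools' `COCOeval.computeOks` for the 17 COCO person keypoints. `oks` is a hypothetical helper name, and it omits the pycocotools special case for ground-truth objects that have no labeled keypoints.

```python
import numpy as np

# Per-keypoint sigmas that pycocotools uses for the 17 COCO person keypoints
COCO_SIGMAS = np.array(
    [0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72, 0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]
) / 10.0


def oks(pred_xy, gt_xy, gt_visibility, gt_area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one predicted pose and one ground-truth pose.

    pred_xy, gt_xy: arrays of shape (17, 2) with keypoint coordinates in pixels.
    gt_visibility: array of shape (17,); only entries > 0 (labeled keypoints) are scored.
    gt_area: area of the ground-truth object in pixels^2.
    """
    variances = (2 * sigmas) ** 2
    squared_dist = np.sum((np.asarray(pred_xy, dtype=float) - np.asarray(gt_xy, dtype=float)) ** 2, axis=1)
    # Normalize squared distances by the object area and a per-keypoint constant
    e = squared_dist / variances / (gt_area + np.spacing(1)) / 2
    labeled = np.asarray(gt_visibility) > 0
    return float(np.mean(np.exp(-e[labeled])))


rng = np.random.default_rng(0)
gt_xy = rng.uniform(0, 100, size=(17, 2))
print(oks(gt_xy, gt_xy, np.full(17, 2), gt_area=5000.0))  # 1.0 for a perfect prediction
```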

## Instantiate the model for evaluation

First, let's instantiate the model we are going to evaluate.
You can use either pretrained weights or the path to your own trained checkpoint.

```python
from super_gradients.common.object_names import Models
from super_gradients.training import models

# This is how you can load your custom checkpoint instead of the pretrained one
model = models.get(
    Models.YOLO_NAS_POSE_L,
    num_classes=17,
    checkpoint_path="G:/super-gradients/checkpoints/coco2017_yolo_nas_pose_l_ckpt_best.pth",
)
```
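In this example, we use pretrained weights for simplicity. After loading the model, we export it to ONNX with the preprocessing and NMS postprocessing steps embedded into the graph, using a low `confidence_threshold` so that low-scoring poses are still kept for evaluation.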

```python
from super_gradients.conversion import DetectionOutputFormatMode
from super_gradients.common.object_names import Models
from super_gradients.training import models

# Load YOLO-NAS-Pose N with COCO pose weights (replace with checkpoint_path=... for your own model)
model = models.get(
    Models.YOLO_NAS_POSE_N,
    num_classes=17,
    pretrained_weights="coco_pose",
).cuda()

# Export to ONNX with preprocessing and NMS baked in; predictions are returned in flat format
result = model.export(
    "yolo_nas_pose_n.onnx",
    confidence_threshold=0.01,
    max_predictions_per_image=20,
    nms_threshold=0.7,
    output_predictions_format=DetectionOutputFormatMode.FLAT_FORMAT,
)
result
```



```
Model exported successfully to yolo_nas_pose_n.onnx
Model expects input image of shape [1, 3, 640, 640]
Input image dtype is torch.uint8

Exported model already contains preprocessing (normalization) step, so you don't need to do it manually.
Preprocessing steps to be applied to input image are:
Sequential(
  (0): CastTensorTo(dtype=torch.float32)
  (1): ChannelSelect(channels_indexes=tensor([2, 1, 0], device='cuda:0'))
  (2): ApplyMeanStd(mean=[0.], scale=[255.])
)

Exported model contains postprocessing (NMS) step with the following parameters:
    num_pre_nms_predictions=1000
    max_predictions_per_image=20
    nms_threshold=0.7
    confidence_threshold=0.01
    output_predictions_format=flat

Exported model is in ONNX format and can be used with ONNXRuntime
```
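The export summary lists the preprocessing baked into the ONNX graph. A NumPy equivalent, assuming NCHW input and that `ChannelSelect` indexes the channel axis, looks roughly like this. `embedded_preprocessing` is a hypothetical name used only for illustration; you do not need to run it, because the exported model already applies these steps.

```python
import numpy as np


def embedded_preprocessing(batch_uint8: np.ndarray) -> np.ndarray:
    """NumPy equivalent of the preprocessing steps listed in the export summary (NCHW input)."""
    x = batch_uint8.astype(np.float32)  # CastTensorTo(dtype=torch.float32)
    x = x[:, [2, 1, 0], :, :]           # ChannelSelect(channels_indexes=[2, 1, 0]): reverse channel order
    return (x - 0.0) / 255.0            # ApplyMeanStd(mean=[0.], scale=[255.])


batch = np.random.randint(0, 256, size=(1, 3, 640, 640), dtype=np.uint8)
print(embedded_preprocessing(batch).shape)  # (1, 3, 640, 640)
```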

## Prepare COCO validation data

Next, we obtain the list of images in the COCO2017 validation set and load their annotations.
Either set the `COCO_ROOT_DIR` environment variable to the location of the COCO2017 data on your machine, or edit the default path directly.
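If the directory listing below does not show the expected folders, a quick check like the following can tell you which of the paths used in this example are missing. `check_coco_layout` is a hypothetical helper that assumes the layout the code in this example relies on: `images/val2017` and `annotations/person_keypoints_val2017.json` under the COCO root.

```python
import os


def check_coco_layout(coco_root: str) -> list:
    """Return the COCO2017 paths used in this example that do not exist under coco_root."""
    expected = [
        os.path.join(coco_root, "images", "val2017"),
        os.path.join(coco_root, "annotations", "person_keypoints_val2017.json"),
    ]
    return [path for path in expected if not os.path.exists(path)]


missing = check_coco_layout(os.environ.get("COCO_ROOT_DIR", "g:/coco2017"))
if missing:
    print("Missing:", missing)
```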

```python
import os

COCO_DATA_DIR = os.environ.get("COCO_ROOT_DIR", "g:/coco2017")
os.listdir(COCO_DATA_DIR)
```

```
['annotations', 'images']
```

Once the data location is set, we can load the ground-truth keypoint annotations:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

images_path = os.path.join(COCO_DATA_DIR, "images/val2017")
image_files = [os.path.join(images_path, x) for x in os.listdir(images_path)]

gt_annotations_path = os.path.join(COCO_DATA_DIR, "annotations/person_keypoints_val2017.json")
gt = COCO(gt_annotations_path)
```

```
loading annotations into memory...
```


Next, create an ONNX Runtime session for the exported model and look up its input and output names:

```python
import onnxruntime
import numpy as np

session = onnxruntime.InferenceSession(
    "yolo_nas_pose_n.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
inputs = [o.name for o in session.get_inputs()]
outputs = [o.name for o in session.get_outputs()]
inputs, outputs
```

```
(['onnx::Cast_0'], ['graph2_flat_predictions'])
```
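The single output, `graph2_flat_predictions`, holds one row per detected person. Based on the column slicing used in the code that follows, the layout is assumed to be: column 0, then `x1, y1, x2, y2`, then the score, then 17 `(x, y, confidence)` triplets, i.e. 57 columns for COCO. Column 0 is not used in this example and is presumably the index of the image within the batch. `split_flat_predictions` is a hypothetical helper that performs this split and also handles images with zero detections.

```python
import numpy as np

NUM_JOINTS = 17  # COCO person keypoints


def split_flat_predictions(flat_predictions: np.ndarray, num_joints: int = NUM_JOINTS):
    """Split an [N, 6 + 3 * num_joints] flat prediction array into boxes, scores and poses."""
    if flat_predictions.shape[1] != 6 + 3 * num_joints:
        raise ValueError(f"Expected {6 + 3 * num_joints} columns, got {flat_predictions.shape[1]}")
    # Column 0 is skipped (assumed to be the image index within the batch)
    bboxes_xyxy = flat_predictions[:, 1:5]
    scores = flat_predictions[:, 5]
    # reshape(-1, ...) also works for N == 0, unlike reshape(N, -1, 3)
    poses = flat_predictions[:, 6:].reshape(-1, num_joints, 3)
    return bboxes_xyxy, scores, poses


dummy = np.zeros((20, 6 + 3 * NUM_JOINTS), dtype=np.float32)
boxes, scores, poses = split_flat_predictions(dummy)
print(boxes.shape, scores.shape, poses.shape)  # (20, 4) (20,) (20, 17, 3)
```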

Before processing the whole validation set, let's run a single image through the pipeline. The preprocessing mirrors the validation-time geometry in SG: resize the longest side to 640, pad the bottom and right to 640x640, and convert HWC to CHW.

```python
import cv2
import numpy as np
from tqdm import tqdm

from super_gradients.module_interfaces import PoseEstimationPredictions
from super_gradients.training.processing import ComposeProcessing, ImagePermute
from super_gradients.training.processing.processing import (
    KeypointsBottomRightPadding,
    KeypointsLongestMaxSizeRescale,
)
from super_gradients.training.utils.predict import PoseEstimationPrediction

image_processor = ComposeProcessing(
    [
        KeypointsLongestMaxSizeRescale(output_shape=(640, 640)),
        KeypointsBottomRightPadding(output_shape=(640, 640), pad_value=127),
        ImagePermute(permutation=(2, 0, 1)),
    ]
)

image_original = cv2.imread(image_files[3])[..., ::-1]  # BGR -> RGB

# Resize and pad the image to 640x640; metadata records how to undo it later
image_resized, metadata = image_processor.preprocess_image(image_original)
model_input = np.expand_dims(image_resized, 0)  # [1, 3, 640, 640], uint8
[flat_predictions] = session.run(outputs, {inputs[0]: model_input})

# Flat format: one row per detected person
pred_bboxes = flat_predictions[:, 1:5]  # x1, y1, x2, y2
pred_scores = flat_predictions[:, 5]
pred_joints = flat_predictions[:, 6:].reshape(-1, 17, 3)  # 17 keypoints x (x, y, confidence)

p = PoseEstimationPrediction(
    poses=pred_joints,
    scores=pred_scores,
    bboxes_xyxy=pred_bboxes,
    # Visualization-only fields; placeholder values are enough for evaluation
    edge_links=np.zeros((0, 2)),
    edge_colors=np.zeros((0, 3)),
    keypoint_colors=np.random.randint(0, 255, (pred_joints.shape[1], 3), dtype=np.uint8),
    image_shape=(model_input.shape[2], model_input.shape[3]),
)
metadata
```

```
ComposeProcessingMetadata(metadata_lst=[RescaleMetadata(original_shape=(500, 375), scale_factor_h=1.28, scale_factor_w=1.28), DetectionPadToSizeMetadata(padding_coordinates=PaddingCoordinates(top=0, bottom=0, left=0, right=160)), None])
```

The raw predictions are in the coordinate frame of the 640x640 padded model input:

```python
p
```

```
PoseEstimationPrediction(poses=array([[[ 9.33892746e+01,  3.71910980e+02,  2.40387648e-01],
        [ 8.84057465e+01,  3.70263672e+02,  2.12954700e-01],
        [ 9.35746536e+01,  3.70570801e+02,  2.05087841e-01],
        ...,
        [ 9.88854828e+01,  4.10569458e+02,  2.41989732e-01],
        [ 8.94517212e+01,  4.17730042e+02,  1.46408051e-01],
        [ 1.00782990e+02,  4.17586487e+02,  1.42147750e-01]],

       [[ 1.18740906e+02,  3.46397156e+02,  4.15875465e-01],
        [ 1.18329468e+02,  3.43987854e+02,  2.89114654e-01],
        [ 1.17013855e+02,  3.44044250e+02,  4.05241132e-01],
        ...,
        [ 1.19035332e+02,  4.14153992e+02,  2.14025617e-01],
        [ 1.07612869e+02,  4.25690491e+02,  1.13747567e-01],
        [ 1.17378853e+02,  4.25458984e+02,  1.11210644e-01]],

       [[ 4.15635132e+02,  5.14472900e+02,  2.96225727e-01],
        [ 4.18026947e+02,  5.10613037e+02,  3.20348173e-01],
        [ 4.17845551e+02,  5.10465118e+02,  1.34563357e-01],
        ...,
        [ 4
```

`postprocess_predictions` uses the recorded `metadata` to undo the padding and rescaling, so the poses end up in the original image resolution that COCOEval expects:

```python
p = image_processor.postprocess_predictions(p, metadata)
p
```

```
PoseEstimationPrediction(poses=array([[[ 7.29603729e+01,  2.90555450e+02,  2.40387648e-01],
        [ 6.90669861e+01,  2.89268494e+02,  2.12954700e-01],
        [ 7.31052017e+01,  2.89508423e+02,  2.05087841e-01],
        ...,
        [ 7.72542801e+01,  3.20757385e+02,  2.41989732e-01],
        [ 6.98841553e+01,  3.26351593e+02,  1.46408051e-01],
        [ 7.87367096e+01,  3.26239441e+02,  1.42147750e-01]],

       [[ 9.27663345e+01,  2.70622772e+02,  4.15875465e-01],
        [ 9.24449005e+01,  2.68740509e+02,  2.89114654e-01],
        [ 9.14170761e+01,  2.68784576e+02,  4.05241132e-01],
        ...,
        [ 9.29963531e+01,  3.23557800e+02,  2.14025617e-01],
        [ 8.40725555e+01,  3.32570709e+02,  1.13747567e-01],
        [ 9.17022324e+01,  3.32389832e+02,  1.11210644e-01]],

       [[ 3.24714935e+02,  4.01931946e+02,  2.96225727e-01],
        [ 3.26583557e+02,  3.98916443e+02,  3.20348173e-01],
        [ 3.26441833e+02,  3.98800873e+02,  1.34563357e-01],
        ...,
        [ 3
```
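With bottom-right padding the top-left offset is zero, so undoing the preprocessing amounts to dividing the x and y coordinates by the scale factor. Below is a minimal sketch, assuming the metadata printed above (scale 1.28, no top or left padding); it is not SG's actual implementation, and `unletterbox_keypoints` is a hypothetical helper.

```python
import numpy as np


def unletterbox_keypoints(poses: np.ndarray, scale: float, pad_left: float = 0.0, pad_top: float = 0.0) -> np.ndarray:
    """Map keypoints from the padded model-input frame back to original image coordinates."""
    restored = poses.copy()
    restored[..., 0] = (restored[..., 0] - pad_left) / scale  # x
    restored[..., 1] = (restored[..., 1] - pad_top) / scale  # y
    return restored  # the confidence column is left untouched


first_keypoint = np.array([[[93.3892746, 371.910980, 0.240387648]]], dtype=np.float32)
print(unletterbox_keypoints(first_keypoint, scale=1.28))
```

Applied to the first raw keypoint, (93.39, 371.91) maps to approximately (72.96, 290.56), in line with the `postprocess_predictions` output above, while the confidence stays the same.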

Now run the same pipeline over every validation image and collect the predictions in original image coordinates:

```python
predictions = []

for image_file in tqdm(image_files):
    image_original = cv2.imread(image_file)[..., ::-1]  # BGR -> RGB

    # Resize and pad the image to 640x640
    image_resized, metadata = image_processor.preprocess_image(image_original)
    model_input = np.expand_dims(image_resized, 0)  # [1, 3, 640, 640]
    [flat_predictions] = session.run(outputs, {inputs[0]: model_input})

    pred_bboxes = flat_predictions[:, 1:5]
    pred_scores = flat_predictions[:, 5]
    pred_joints = flat_predictions[:, 6:].reshape(-1, 17, 3)  # also works when there are no detections

    p = PoseEstimationPrediction(
        poses=pred_joints,
        scores=pred_scores,
        bboxes_xyxy=pred_bboxes,
        edge_links=np.zeros((0, 2)),
        edge_colors=np.zeros((0, 3)),
        keypoint_colors=np.random.randint(0, 255, (pred_joints.shape[1], 3), dtype=np.uint8),
        image_shape=(model_input.shape[2], model_input.shape[3]),
    )
    # Map poses back to the original image resolution
    p = image_processor.postprocess_predictions(p, metadata)

    predictions.append(
        PoseEstimationPredictions(
            poses=p.poses,
            scores=p.scores,
            bboxes_xyxy=p.bboxes_xyxy,
        )
    )
```



```
100%|█████████▉| 4999/5000 [07:33<00:00, 11.00it/s]
```

Finally, convert the predictions into the COCO keypoint results format, save them to a JSON file, and evaluate them with COCOeval:

```python
import json
import os
import tempfile

import numpy as np
from pycocotools.cocoeval import COCOeval

PERSON_CATEGORY_ID = 1  # "person" category id in COCO keypoint annotations


def predictions_to_coco(predictions, image_files, num_joints=17):
    """Convert per-image pose predictions to a list of COCO keypoint result dicts."""
    coco_results = []
    for image_file, image_predictions in zip(image_files, predictions):
        # COCO file names encode the image id, e.g. ".../000000000139.jpg" -> 139
        image_id = int(os.path.splitext(os.path.basename(image_file))[0])

        for pose, score in zip(image_predictions.poses, image_predictions.scores):
            pose = np.asarray(pose, dtype=np.float32).reshape(num_joints, 3)  # (x, y, confidence)
            x_min, y_min = pose[:, :2].min(axis=0)
            x_max, y_max = pose[:, :2].max(axis=0)
            coco_results.append(
                {
                    "image_id": image_id,
                    "category_id": PERSON_CATEGORY_ID,
                    "keypoints": pose.reshape(-1).tolist(),  # flat [x1, y1, c1, x2, y2, c2, ...]
                    "score": float(score),
                    # Tight box around the keypoints in COCO [x, y, width, height] format
                    "bbox": [float(x_min), float(y_min), float(x_max - x_min), float(y_max - y_min)],
                }
            )
    return coco_results


coco_pred = predictions_to_coco(predictions, image_files)

with tempfile.TemporaryDirectory() as td:
    res_file = os.path.join(td, "keypoints_coco2017_results.json")
    with open(res_file, "w") as f:
        json.dump(coco_pred, f)  # values are plain Python types, so the standard json module suffices

    coco_dt = gt.loadRes(res_file)  # loadRes builds a new COCO object holding the results

    coco_evaluator = COCOeval(gt, coco_dt, iouType="keypoints")
    coco_evaluator.evaluate()  # run per-image evaluation
    coco_evaluator.accumulate()  # accumulate per-image results
    coco_evaluator.summarize()  # print summary metrics
```



```
Loading and preparing results...
DONE (t=4.60s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
DONE (t=17.84s).
Accumulating evaluation results...
```


```python
coco_evaluator.stats
```

```
array([0.59636437, 0.83149004, 0.65547537, 0.53969412, 0.68500305,
       0.65593514, 0.87893577, 0.71095718, 0.5970773 , 0.73823857])
```
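`COCOeval.stats` is a plain array of ten numbers. For keypoints, pycocotools fills it in a fixed order: five AP values followed by five AR values, all computed with at most 20 detections per image. The hypothetical helper below pairs each entry with a readable name; with the values above, the headline AP over OKS 0.50:0.95 is 0.596.

```python
import numpy as np

# Order of COCOeval.stats for iouType="keypoints" (as printed by COCOeval.summarize())
KEYPOINT_STAT_NAMES = [
    "AP @ OKS=0.50:0.95 (all)",
    "AP @ OKS=0.50 (all)",
    "AP @ OKS=0.75 (all)",
    "AP @ OKS=0.50:0.95 (medium)",
    "AP @ OKS=0.50:0.95 (large)",
    "AR @ OKS=0.50:0.95 (all)",
    "AR @ OKS=0.50 (all)",
    "AR @ OKS=0.75 (all)",
    "AR @ OKS=0.50:0.95 (medium)",
    "AR @ OKS=0.50:0.95 (large)",
]


def named_keypoint_stats(stats) -> dict:
    """Pair each entry of COCOeval.stats with a readable metric name."""
    return dict(zip(KEYPOINT_STAT_NAMES, np.asarray(stats).tolist()))


# In the notebook, pass coco_evaluator.stats; here the values are copied from the output above
stats = [0.59636437, 0.83149004, 0.65547537, 0.53969412, 0.68500305,
         0.65593514, 0.87893577, 0.71095718, 0.5970773, 0.73823857]
for name, value in named_keypoint_stats(stats).items():
    print(f"{name}: {value:.3f}")
```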