Pose Estimation Models

List of Pose Estimation Models

The table below shows the pose estimation models available for each task category.

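==========  =======  ====================
Category    Model    Documentation
==========  =======  ====================
Whole body  HRNet    :mod:`model.hrnet`
Whole body  PoseNet  :mod:`model.posenet`
Whole body  MoveNet  :mod:`model.movenet`
==========  =======  ====================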

Benchmarks

Inference Speed

The table below shows the frames per second (FPS) of each model type.

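============  ====================  =====================  ============  ==============  ============  ==============
Model         Type                  Size                   CPU (single)  CPU (multiple)  GPU (single)  GPU (multiple)
============  ====================  =====================  ============  ==============  ============  ==============
PoseNet       50                    225                    64.46         51.95           136.31        89.37
PoseNet       75                    225                    57.62         47.01           132.84        83.73
PoseNet       100                   225                    44.70         37.60           132.73        81.24
PoseNet       resnet                225                    18.77         17.21           73.15         51.65
HRNet (YOLO)  (v4tiny)              256 × 192 (416)        5.86          1.09            21.91         13.86
MoveNet       SinglePose Lightning  192                    40.78         40.54           99.47         --
MoveNet       SinglePose Thunder    256                    25.13         24.87           92.05         --
MoveNet       MultiPose Lightning   256 or multiple of 32  25.33         24.90           80.64         79.32
============  ====================  =====================  ============  ==============  ============  ==============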

Hardware

The following hardware was used to run the FPS benchmarks:
- CPU: 2.8 GHz 4-Core Intel Xeon (2020, Cascade Lake) CPU and 16GB RAM
- GPU: NVIDIA A100, paired with 2.2 GHz 6-Core Intel Xeon CPU and 85GB RAM

Test Conditions

The benchmarks were run under the following conditions:
- :mod:`input.visual`, the model of interest, and :mod:`dabble.fps` nodes were used to perform inference on videos
- Two videos were used to benchmark each model: one with a single human (single) and one with multiple humans (multiple)
- Each video is about 1 minute long and recorded at ~30 FPS, which gives about 1,800 frames to process per video
- 1280×720 (HD ready) resolution was used, as a bridge between 640×480 (VGA) of poorer quality webcams, and 1920×1080 (Full HD) of CCTVs
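
To make the frame-count arithmetic above concrete, the sketch below estimates how long one benchmark video would take to process at a given throughput. The helper names are illustrative and not part of PeekingDuck; the FPS figures are copied from the Inference Speed table, and the estimate assumes a steady inference rate.

```python
def frame_count(duration_s: float, recorded_fps: float) -> int:
    """Approximate number of frames in a video of the given length."""
    return round(duration_s * recorded_fps)


def processing_time_s(num_frames: int, inference_fps: float) -> float:
    """Seconds needed to process num_frames at a steady inference rate."""
    if inference_fps <= 0:
        raise ValueError("inference_fps must be positive")
    return num_frames / inference_fps


# About 1 minute of video recorded at about 30 FPS.
frames = frame_count(60, 30)

# FPS values taken from the benchmark table (CPU, single-person video).
posenet_50_cpu = processing_time_s(frames, 64.46)  # PoseNet 50
hrnet_cpu = processing_time_s(frames, 5.86)        # HRNet (YOLO)

print(frames, round(posenet_50_cpu, 1), round(hrnet_cpu, 1))
```

A model running faster than the recording rate (~30 FPS) finishes the clip in under a minute, while a slower model takes proportionally longer.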

Model Accuracy

The table below shows the performance of our pose estimation models on the COCO keypoint evaluation metrics. Descriptions of these metrics can be found in the COCO keypoint evaluation documentation.

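============  ====================  ===============  ====  ==========  ==========  =========  ========  ====  ==========  ==========  =========  ========
Model         Type                  Size             AP    AP OKS=.50  AP OKS=.75  AP medium  AP large  AR    AR OKS=.50  AR OKS=.75  AR medium  AR large
============  ====================  ===============  ====  ==========  ==========  =========  ========  ====  ==========  ==========  =========  ========
PoseNet       50                    225              5.2   15.5        2.7         0.8        11.8      9.6   22.7        7.1         1.4        20.7
PoseNet       75                    225              7.2   19.7        3.6         1.3        15.9      12.1  26.5        9.3         2.2        25.5
PoseNet       100                   225              7.7   20.8        4.4         1.5        17.1      12.6  27.7        10.1        2.3        26.5
PoseNet       resnet                225              11.9  27.4        8.3         2.2        25.3      17.3  32.5        15.9        2.9        36.8
HRNet (YOLO)  (v4tiny)              256 × 192 (416)  35.8  61.5        37.5        30.1       44.0      40.2  64.4        42.7        33.0       50.2
MoveNet       singlepose_lightning  256 × 256        7.3   15.7        5.7         1.3        15.4      8.8   17.6        7.7         1.1        19.2
MoveNet       singlepose_thunder    256 × 256        11.6  21.3        10.7        3.0        23.1      13.1  22.5        12.8        2.8        27.1
MoveNet       multipose_lightning   256 × 256        18.7  36.8        16.3        9.0        31.8      21.0  38.5        19.2        9.3        37.0
============  ====================  ===============  ====  ==========  ==========  =========  ========  ====  ==========  ==========  =========  ========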
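
The AP and AR columns above are thresholded on Object Keypoint Similarity (OKS). As a rough illustration of how OKS scores a single predicted pose against its ground truth, here is a minimal sketch that follows the COCO definition. The function name `oks` and its argument layout are our own choices for illustration; the per-keypoint constants are the ones used by the COCO keypoint evaluation for the 17 keypoints.

```python
import math

# Per-keypoint falloff constants from the COCO keypoint evaluation, in
# keypoint ID order (nose, eyes, ears, shoulders, elbows, wrists, hips,
# knees, ankles).
COCO_SIGMAS = [
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089,
]


def oks(pred, gt, visibility, area):
    """Object Keypoint Similarity between one predicted and one true pose.

    pred, gt:    sequences of (x, y) pixel coordinates, one per keypoint
    visibility:  COCO visibility flags for the ground truth (0 = unlabelled)
    area:        ground-truth object area in square pixels (must be positive)
    """
    total, labelled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, COCO_SIGMAS):
        if v <= 0:
            continue  # unlabelled keypoints are excluded from the average
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2 * sigma
        total += math.exp(-d2 / (2 * area * k ** 2))
        labelled += 1
    return total / labelled if labelled else 0.0


ground_truth = [(100.0, 100.0)] * 17
shifted = [(105.0, 100.0)] * 17  # every keypoint off by 5 pixels
flags = [2] * 17                 # all keypoints labelled and visible

perfect_score = oks(ground_truth, ground_truth, flags, area=10000.0)
shifted_score = oks(shifted, ground_truth, flags, area=10000.0)
```

A prediction counts as a match for "AP OKS=.50" when its OKS against a ground-truth pose is at least 0.50, and for "AP OKS=.75" when it is at least 0.75.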

Dataset

The MS COCO (val 2017) dataset is used. We integrated the COCO API into the PeekingDuck pipeline for loading the annotations and evaluating the outputs from the models. All values are reported in percentages.

All images from the "person" category in the MS COCO (val 2017) dataset were processed.

Test Conditions

The evaluation was run under the following conditions:
- The tests were performed using pycocotools on the MS COCO dataset
- The resulting metrics were compared against those reported in each model's original repository for consistency
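
As a rough sketch of the evaluation described above, keypoint metrics can be produced with the `COCOeval` class from pycocotools. This shows the general pycocotools workflow under our own assumptions and is not PeekingDuck's actual integration code: the function names and file paths are hypothetical, and pycocotools is imported inside the function so that the labelling helper can be used without it.

```python
# Column names in the order pycocotools reports keypoint summary values.
METRIC_NAMES = [
    "AP", "AP OKS=.50", "AP OKS=.75", "AP medium", "AP large",
    "AR", "AR OKS=.50", "AR OKS=.75", "AR medium", "AR large",
]


def label_stats(stats):
    """Pair the 10 keypoint summary values with names, as percentages."""
    if len(stats) != len(METRIC_NAMES):
        raise ValueError("expected 10 keypoint summary values")
    return {
        name: round(100 * float(value), 1)
        for name, value in zip(METRIC_NAMES, stats)
    }


def evaluate_keypoints(annotation_file, results_file):
    """Run COCO keypoint evaluation; both paths are placeholders."""
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(annotation_file)          # ground-truth annotations
    coco_dt = coco_gt.loadRes(results_file)  # detections in COCO results format
    evaluator = COCOeval(coco_gt, coco_dt, "keypoints")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()                    # prints the AP/AR summary
    return label_stats(evaluator.stats)
```

Calling `evaluate_keypoints` with the val 2017 person keypoint annotations and a results file written by the model under test (the file name is up to you) would return a dictionary keyed by the column names used in the accuracy table.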

Keypoint IDs

Whole Body

==============  ==  ===========  ==
Keypoint        ID  Keypoint     ID
==============  ==  ===========  ==
nose            0   left wrist   9
left eye        1   right wrist  10
right eye       2   left hip     11
left ear        3   right hip    12
right ear       4   left knee    13
left shoulder   5   right knee   14
right shoulder  6   left ankle   15
left elbow      7   right ankle  16
right elbow     8
==============  ==  ===========  ==
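
The keypoint IDs above can be captured in a small lookup structure. The sketch below is illustrative only: the names `KEYPOINT_NAMES`, `KEYPOINT_IDS`, and `name_keypoints` are hypothetical, and it assumes a pose is available as 17 (x, y) pairs ordered by keypoint ID.

```python
# Keypoint names indexed by keypoint ID, as listed in the table.
KEYPOINT_NAMES = [
    "nose", "left eye", "right eye", "left ear", "right ear",
    "left shoulder", "right shoulder", "left elbow", "right elbow",
    "left wrist", "right wrist", "left hip", "right hip",
    "left knee", "right knee", "left ankle", "right ankle",
]

# Reverse lookup from keypoint name to keypoint ID.
KEYPOINT_IDS = {name: idx for idx, name in enumerate(KEYPOINT_NAMES)}


def name_keypoints(keypoints):
    """Label a 17-element sequence of (x, y) pairs by keypoint name."""
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError(
            f"expected {len(KEYPOINT_NAMES)} keypoints, got {len(keypoints)}"
        )
    return dict(zip(KEYPOINT_NAMES, keypoints))


# Example: a dummy pose whose x coordinate equals the keypoint ID.
pose = [(float(i), 0.0) for i in range(17)]
named = name_keypoints(pose)
left_wrist_id = KEYPOINT_IDS["left wrist"]
```

Looking keypoints up by name rather than by bare index keeps downstream code readable, for example when checking whether a wrist is above a shoulder.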