Based on "Human Pose Regression with Residual Log-likelihood Estimation"

Junhojuno/rle-human-pose-regression


Human Pose Regression with Tensorflow

Human Pose Regression (HPR) is a simple way to estimate human keypoints, since it needs no postprocessing step to transform heatmaps into coordinates. HPR's drawback is that its accuracy is much lower than that of heatmap-based models. Recently, however, with a flow-based model, HPR has improved so much that it can be worth replacing heatmap-based models.
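The contrast can be sketched in a few lines of pure Python (shapes and names here are illustrative, not taken from this repo): a heatmap head needs an argmax-plus-rescale postprocess, while a regression head's output already *is* the coordinate.

```python
def decode_heatmap(heatmap, input_w, input_h):
    """Heatmap-based: find the argmax cell, then rescale to input pixels."""
    h = len(heatmap)
    w = len(heatmap[0])
    best, best_xy = float("-inf"), (0, 0)
    for y in range(h):
        for x in range(w):
            if heatmap[y][x] > best:
                best, best_xy = heatmap[y][x], (x, y)
    # postprocess: map the grid cell back to the input resolution
    return (best_xy[0] * input_w / w, best_xy[1] * input_h / h)


def decode_regression(mu, input_w, input_h):
    """Regression-based: the network output is already a (normalized) coordinate."""
    return (mu[0] * input_w, mu[1] * input_h)
```

Note how the regression decode is a single rescale with no spatial search, which is exactly why HPR models are simple to deploy.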

Human Pose Regression with Residual Log-likelihood Estimation
Jiefeng Li, Siyuan Bian, Ailing Zeng, Can Wang, Bo Pang, Wentao Liu, Cewu Lu
ICCV 2021 Oral


Looking into the official repository, there are only basic sources for reproducing the scores reported in the paper. Those are important too, but practical experiments should also be run, such as tests with mobile backbones, mobile deployment, etc. Let's have these!

Results

COCO Validation Set

To compare with the official results, the regression model (TensorFlow) was trained on MSCOCO with the official configuration.

| Model | input shape | #Params (M) | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benchmark (ResNet50) | 256x192 | 23.6 | 4.0 | 0.713 | 0.889 | 0.783 | - | - | - | - | - | - | - |
| Ours (ResNet50) | 256x192 | 23.6 | 3.78 | 0.694 | 0.904 | 0.760 | 0.668 | 0.736 | 0.727 | 0.912 | 0.786 | 0.695 | 0.776 |
  • AP is calculated with flip_test=True.
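flip_test averages the prediction on the original image with the un-mirrored prediction on its horizontal flip. A minimal sketch, assuming a generic `predict(image) -> [(x, y), ...]` function and toy left/right joint pairs (the repo's actual pairs follow the COCO skeleton):

```python
FLIP_PAIRS = [(1, 2), (3, 4)]  # toy skeleton: (left, right) joint indices


def flip_test(predict, image, width):
    """Average keypoints predicted on an image and on its horizontal mirror."""
    kpts = predict(image)
    # mirror the image left-right, predict, then un-mirror the x coordinates
    flipped = [row[::-1] for row in image]
    kpts_f = [(width - 1 - x, y) for (x, y) in predict(flipped)]
    # swap left/right joints so indices match the original image
    for l, r in FLIP_PAIRS:
        kpts_f[l], kpts_f[r] = kpts_f[r], kpts_f[l]
    return [((x1 + x2) / 2, (y1 + y2) / 2)
            for (x1, y1), (x2, y2) in zip(kpts, kpts_f)]
```

On mobile this doubles the number of forward passes, which is why the lightweight-backbone tables below report flip=False as well.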

Look into more: lightweight backbones

The backbones used in the paper, ResNet50 and HRNet, are not suitable for mobile devices. Here are some tests applying lightweight backbones to this model. The backbones are listed below.

  • MobileNetV2, the most widely used backbone network.
  • EfficientNet-B0, which achieves a considerable score with fast inference.
  • GhostNetV2, which has more parameters but is more efficient than the other backbones.

After training, something noticeable is that the gap between flip=True and flip=False is small, much smaller than that of heatmap-based models.

| Model | input shape | #Params (M) | GFLOPs | model size (MB) | latency (fps) | AP (flip=True) | AP (flip=False) |
|---|---|---|---|---|---|---|---|
| Ours (MobileNetV2) | 256x192 | 2.31 | 0.2935 | 4.7 | 10~11 | 0.614 | 0.600 |
| Ours (EfficientNet-B0) | 256x192 | 4.09 | 0.3854 | 8.3 | 5~6 | 0.671 | 0.665 |
| Ours (GhostNetV2 1.0x) | 256x192 | 3.71 | 0.1647 | 7.6 | 9~10 | 0.632 | 0.624 |
  • AP is calculated with flip=False, because flip inference is not used on mobile.
  • The model is tested on a Galaxy Tab A7 with num_threads=4.
  • GFLOPs affects FPS less than model size and parameter count do.
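For reference, a device-agnostic way to get rough fps numbers like those above is to time repeated invocations after a short warm-up. This is only a sketch; on device, the TFLite interpreter's `invoke()` would be the callable being timed:

```python
import time


def measure_fps(invoke, n_warmup=5, n_runs=50):
    """Time repeated invocations of a model and report frames per second."""
    # warm-up runs let caches, threads, and delegates settle first
    for _ in range(n_warmup):
        invoke()
    start = time.perf_counter()
    for _ in range(n_runs):
        invoke()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed
```

Averaging over many runs matters on mobile, where per-frame latency can vary with thermal throttling and thread scheduling.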

Look into more: small inputs

Since the Galaxy Tab A7 is less powerful than recent devices or iPads, it is hard to reach real-time latency even though our models are this lightweight. These models should have much lower latency on a Galaxy Tab S7 or newer, or on iPad Pros.

| Model | input shape | #Params (M) | GFLOPs | fps | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GhostNetV2 | 224x160 | 3.71 | 0.1187 | 10~11 | 0.597 | 0.859 | 0.670 | 0.574 | 0.638 | 0.635 | 0.871 | 0.701 | 0.604 | 0.681 |
| EfficientNetB0 | 224x160 | 4.09 | 0.2810 | 6~7 | 0.645 | 0.882 | 0.717 | 0.623 | 0.680 | 0.680 | 0.893 | 0.746 | 0.651 | 0.723 |
| GhostNetV2 | 192x128 | 3.71 | 0.0832 | 12~13 | 0.565 | 0.839 | 0.627 | 0.549 | 0.594 | 0.605 | 0.853 | 0.666 | 0.580 | 0.643 |
| EfficientNetB0 | 192x128 | 4.09 | 0.1929 | 8~9 | 0.608 | 0.862 | 0.675 | 0.586 | 0.644 | 0.645 | 0.875 | 0.710 | 0.614 | 0.690 |

Setup

environment

Everything in this repo is based on Ubuntu 18.04. Before starting, Docker and nvidia-docker should be installed.

docker build -t rle:tf .

project tree

Before cloning this repo, you have to set up the directory tree like below. Otherwise, all the code will throw errors.

root
├── datasets
│   └── mscoco
│        ├── annotations
│        └── images
├── $project_dir
│   ├── src/
│   ├── train.py
│   ├── evaluate.py
│   ├── README
│   └── ...
└── ...

data

Training & evaluation operate on TFRecord files, so download the raw dataset from https://cocodataset.org and convert it to .tfrecord.

# after running the command below, a `tfrecords` directory is made.
root
├── datasets
│   └── mscoco
│        ├── annotations
│        └── images
│        └── **tfrecords**
├── $project_dir
│   └── ...
└── ...

If your directory tree matches the one above, conversion is easy: just run the command below. If it does not, change the root directory with the -c option on the command line.

python write_tfrecord.py

training

python train.py -c config/256x192_res50_regress-flow.yaml

export

python export.py -b ${BACKBONE_TYPE} -w ${WEIGHT_PATH}

# e.g.
python export.py -b resnet50 -w results/resnet50/ckpt/best_model.tf

More to improve accuracy

More to get faster

References
