Based on "Human Pose Regression with Residual Log-likelihood Estimation"

Junhojuno/rle-human-pose-regression


Human Pose Regression with Tensorflow

Human Pose Regression (HPR) is a simple way to estimate human keypoints, since it needs no postprocessing step to transform heatmaps into coordinates. HPR's drawback is that its accuracy is much lower than that of heatmap-based models. Recently, however, with a flow-based model, HPR has improved so much that it can be worth replacing heatmap-based models.
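The contrast can be sketched in a few lines of pure Python (shapes and names here are illustrative, not taken from this repo): a heatmap head needs an argmax-plus-rescale postprocess, while a regression head's output already *is* the coordinate.

```python
def decode_heatmap(heatmap, input_w, input_h):
    """Heatmap-based: find the argmax cell, then rescale to input pixels."""
    h = len(heatmap)
    w = len(heatmap[0])
    best, best_xy = float("-inf"), (0, 0)
    for y in range(h):
        for x in range(w):
            if heatmap[y][x] > best:
                best, best_xy = heatmap[y][x], (x, y)
    # postprocess: map the grid cell back to the input resolution
    return (best_xy[0] * input_w / w, best_xy[1] * input_h / h)


def decode_regression(mu, input_w, input_h):
    """Regression-based: the network output is already a (normalized) coordinate."""
    return (mu[0] * input_w, mu[1] * input_h)
```

Note how the regression decode is a single rescale with no spatial search, which is exactly why HPR models are simple to deploy.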

Human Pose Regression with Residual Log-likelihood Estimation
Jiefeng Li, Siyuan Bian, Ailing Zeng, Can Wang, Bo Pang, Wentao Liu, Cewu Lu
ICCV 2021 Oral


Looking into the official repository, there are only basic sources for reproducing the scores reported in the paper. Those are important too, but practical experiments should also be run, such as tests with mobile backbones, mobile deployment, etc. Let's have these!

Results

COCO Validation Set

To compare with the official results, the regression model (TensorFlow) was trained on MSCOCO with the official configuration.

| Model | input shape | #Params (M) | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benchmark (ResNet50) | 256x192 | 23.6 | 4.0 | 0.713 | 0.889 | 0.783 | - | - | - | - | - | - | - |
| Ours (ResNet50) | 256x192 | 23.6 | 3.78 | 0.694 | 0.904 | 0.760 | 0.668 | 0.736 | 0.727 | 0.912 | 0.786 | 0.695 | 0.776 |
  • AP is calculated with flip_test=True.
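flip_test averages the prediction on the original image with the un-mirrored prediction on its horizontal flip. A minimal sketch, assuming a generic `predict(image) -> [(x, y), ...]` function and toy left/right joint pairs (the repo's actual pairs follow the COCO skeleton):

```python
FLIP_PAIRS = [(1, 2), (3, 4)]  # toy skeleton: (left, right) joint indices


def flip_test(predict, image, width):
    """Average keypoints predicted on an image and on its horizontal mirror."""
    kpts = predict(image)
    # mirror the image left-right, predict, then un-mirror the x coordinates
    flipped = [row[::-1] for row in image]
    kpts_f = [(width - 1 - x, y) for (x, y) in predict(flipped)]
    # swap left/right joints so indices match the original image
    for l, r in FLIP_PAIRS:
        kpts_f[l], kpts_f[r] = kpts_f[r], kpts_f[l]
    return [((x1 + x2) / 2, (y1 + y2) / 2)
            for (x1, y1), (x2, y2) in zip(kpts, kpts_f)]
```

On mobile this doubles the number of forward passes, which is why the lightweight-backbone tables below report flip=False as well.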

Look into more: lightweight backbones

The backbones used in the paper, ResNet50 and HRNet, are not suitable for mobile devices. Here are some tests applying lightweight backbones to this model. The backbones are listed below.

  • MobileNetV2, the most widely used backbone network.
  • EfficientNet-B0, which achieves a considerable score with fast inference.
  • GhostNetV2, which has more parameters but is more efficient than the other backbones.

After training, something noticeable is that the gap between flip=True and flip=False is small, much smaller than that of heatmap-based models.

| Model | input shape | #Params (M) | GFLOPs | model size (MB) | latency (fps) | AP (flip=True) | AP (flip=False) |
|---|---|---|---|---|---|---|---|
| Ours (MobileNetV2) | 256x192 | 2.31 | 0.2935 | 4.7 | 10~11 | 0.614 | 0.600 |
| Ours (EfficientNet-B0) | 256x192 | 4.09 | 0.3854 | 8.3 | 5~6 | 0.671 | 0.665 |
| Ours (GhostNetV2 1.0x) | 256x192 | 3.71 | 0.1647 | 7.6 | 9~10 | 0.632 | 0.624 |
  • AP is calculated with flip=False, because flip inference is not used on mobile.
  • The model is tested on a Galaxy Tab A7 with num_threads=4.
  • GFLOPs affects FPS less than model size and parameter count do.
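For reference, a device-agnostic way to get rough fps numbers like those above is to time repeated invocations after a short warm-up. This is only a sketch; on device, the TFLite interpreter's `invoke()` would be the callable being timed:

```python
import time


def measure_fps(invoke, n_warmup=5, n_runs=50):
    """Time repeated invocations of a model and report frames per second."""
    # warm-up runs let caches, threads, and delegates settle first
    for _ in range(n_warmup):
        invoke()
    start = time.perf_counter()
    for _ in range(n_runs):
        invoke()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed
```

Averaging over many runs matters on mobile, where per-frame latency can vary with thermal throttling and thread scheduling.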

Look into more: small inputs

Since the Galaxy Tab A7 is less powerful than recent devices or iPads, it is hard to reach real-time latency even though our models are this lightweight. These models should have much lower latency on a Galaxy Tab S7 or newer, or on iPad Pros.

| Model | input shape | #Params (M) | GFLOPs | fps | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GhostNetV2 | 224x160 | 3.71 | 0.1187 | 10~11 | 0.597 | 0.859 | 0.670 | 0.574 | 0.638 | 0.635 | 0.871 | 0.701 | 0.604 | 0.681 |
| EfficientNetB0 | 224x160 | 4.09 | 0.2810 | 6~7 | 0.645 | 0.882 | 0.717 | 0.623 | 0.680 | 0.680 | 0.893 | 0.746 | 0.651 | 0.723 |
| GhostNetV2 | 192x128 | 3.71 | 0.0832 | 12~13 | 0.565 | 0.839 | 0.627 | 0.549 | 0.594 | 0.605 | 0.853 | 0.666 | 0.580 | 0.643 |
| EfficientNetB0 | 192x128 | 4.09 | 0.1929 | 8~9 | 0.608 | 0.862 | 0.675 | 0.586 | 0.644 | 0.645 | 0.875 | 0.710 | 0.614 | 0.690 |

Setup

environment

Everything in this repo is based on Ubuntu 18.04. Before starting, Docker and nvidia-docker should be installed.

docker build -t rle:tf .

project tree

Before cloning this repo, you have to set up the directory tree like below. Otherwise, all the code will throw errors.

root
├── datasets
│   └── mscoco
│        ├── annotations
│        └── images
├── $project_dir
│   ├── src/
│   ├── train.py
│   ├── evaluate.py
│   ├── README
│   └── ...
└── ...

data

Training & evaluation operate on TFRecord files, so download the raw dataset from https://cocodataset.org and convert it to .tfrecord.

# after running the command below, a `tfrecords` directory is made.
root
├── datasets
│   └── mscoco
│        ├── annotations
│        └── images
│        └── **tfrecords**
├── $project_dir
│   └── ...
└── ...

If your directory tree matches the one above, conversion is easy: just run the command below. If it does not, change the root directory with the -c option on the command line.

python write_tfrecord.py

training

python train.py -c config/256x192_res50_regress-flow.yaml

export

python export.py -b ${BACKBONE_TYPE} -w ${WEIGHT_PATH}

# e.g.
python export.py -b resnet50 -w results/resnet50/ckpt/best_model.tf

More to improve accuracy

More to get faster

References
