
Extending towards HRNet as 2D joint detector #4


Closed
timtensor opened this issue Oct 29, 2019 · 8 comments
Labels
feature A Requested feature

Comments

@timtensor

Hi, first of all, great work. I was wondering if it could be extended to HRNet as the 2D joint detector, since it is supposed to be highly accurate? There is an implementation of it, and I think it is possible to dump the keypoints into a per-frame JSON file. It is based on COCO keypoints. Link to the repo:
simpleHRNET

There is a demo script here demo_script

The keypoints are output here: keypoints
The keypoint array is of type Nx17x3, where N is the number of persons. Please let me know what you think about it.
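For reference, a minimal sketch of the per-frame JSON dump I have in mind, assuming the detector returns an Nx17x3 array per frame as described above (the file layout and names here are purely illustrative):

```python
import json
import os

import numpy as np

def dump_frame_keypoints(frame_index, keypoints, out_dir="keypoints_json"):
    """Write one frame's N x 17 x 3 keypoint array (N persons, 17 COCO joints,
    each joint as a 2D coordinate plus a confidence value) to a per-frame JSON file."""
    os.makedirs(out_dir, exist_ok=True)
    payload = {
        "frame": frame_index,
        "people": [person.tolist() for person in np.asarray(keypoints)],
    }
    with open(os.path.join(out_dir, f"frame_{frame_index:06d}.json"), "w") as f:
        json.dump(payload, f)
```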

@AmmarkoV
Collaborator

AmmarkoV commented Oct 30, 2019

Hello!
Thank you for your kind words!

Any source of 2D joints can be used "out of the box" as long as it provides the following joints:
HIP, NECK, HEAD, RSHOULDER, RELBOW, RHAND, LSHOULDER, LELBOW, LHAND, RHIP, RKNEE, RFOOT, LHIP, LKNEE, LFOOT, since these are the joints used to generate the NSDM matrices internally used by the neural network, as seen in the following illustrations.

[illustrations: NSDM matrix construction]
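For a COCO-style 17-keypoint detector like HRNet, HIP and NECK do not exist directly, so they would have to be approximated, e.g. as midpoints, and HEAD could be approximated by the nose. A rough sketch of that derivation (just an illustration, not conversion code from this repository):

```python
import numpy as np

# Standard COCO-17 keypoint indices
COCO = {"nose": 0, "l_shoulder": 5, "r_shoulder": 6, "l_elbow": 7, "r_elbow": 8,
        "l_wrist": 9, "r_wrist": 10, "l_hip": 11, "r_hip": 12,
        "l_knee": 13, "r_knee": 14, "l_ankle": 15, "r_ankle": 16}

def coco_to_mocapnet_joints(kps):
    """kps: 17x3 array (x, y, confidence) for one person.
    Returns a dict with the 15 joints MocapNET needs for its NSDM matrices.
    HIP and NECK are approximated as midpoints (confidence is averaged too),
    HEAD is approximated by the nose keypoint."""
    p = np.asarray(kps, dtype=float)
    mid = lambda a, b: (p[COCO[a]] + p[COCO[b]]) / 2.0
    return {
        "HIP": mid("l_hip", "r_hip"),
        "NECK": mid("l_shoulder", "r_shoulder"),
        "HEAD": p[COCO["nose"]],
        "RSHOULDER": p[COCO["r_shoulder"]], "RELBOW": p[COCO["r_elbow"]], "RHAND": p[COCO["r_wrist"]],
        "LSHOULDER": p[COCO["l_shoulder"]], "LELBOW": p[COCO["l_elbow"]], "LHAND": p[COCO["l_wrist"]],
        "RHIP": p[COCO["r_hip"]], "RKNEE": p[COCO["r_knee"]], "RFOOT": p[COCO["r_ankle"]],
        "LHIP": p[COCO["l_hip"]], "LKNEE": p[COCO["l_knee"]], "LFOOT": p[COCO["l_ankle"]],
    }
```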

Right now the use case is a real-time demo of a single person, but thanks to the fast evaluation speed of the 2D-to-3D part, multiple persons could be handled by running it iteratively for every detected skeleton (the framerate should be fine for 1-3 persons, but it degrades gradually after that).

I think the easiest way to convert the output of an arbitrary 2D joint estimator is to use the CSV file format -> https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/dataset/sample.csv

If the output is dumped to a CSV file using this format, it can be tested very quickly through MocapNET using:

./MocapNETJSON --from YourDataset.csv --visualize

The CSV file format is very easy to write and parse (especially from Python). The only caveat and possible pitfall is that the CSV file contains normalized coordinates that are expected to have a 1.777 aspect ratio, since the original cameras I am targeting are GoPro cameras configured for 1920x1080@120fps+. If you have a different video input resolution, the normalization step will have to respect this aspect ratio. Of course the code I use to preserve the aspect ratio regardless of input is included in the repository and can be used for reference, https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/MocapNETLib/jsonMocapNETHelpers.cpp#L498, using the normalizeWhileAlsoMatchingTrainingAspectRatio call https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/MocapNETLib/jsonMocapNETHelpers.cpp#L174
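A rough Python sketch of that normalization idea (not a port of the actual normalizeWhileAlsoMatchingTrainingAspectRatio code, just the concept of mapping pixel coordinates to [0,1] as if the frame had been letterboxed/pillarboxed to 16:9):

```python
TRAINING_ASPECT = 16.0 / 9.0  # 1.777..., matching the 1920x1080 footage the network targets

def normalize_to_training_aspect(x_px, y_px, width, height):
    """Map a pixel coordinate from a width x height frame to [0,1] coordinates
    inside a virtual 16:9 frame, padding (letterbox/pillarbox) instead of stretching."""
    if width / height >= TRAINING_ASPECT:
        # Wider than 16:9 -> pad top and bottom
        padded_w, padded_h = float(width), width / TRAINING_ASPECT
        off_x, off_y = 0.0, (padded_h - height) / 2.0
    else:
        # Narrower than 16:9 -> pad left and right
        padded_w, padded_h = height * TRAINING_ASPECT, float(height)
        off_x, off_y = (padded_w - width) / 2.0, 0.0
    return (x_px + off_x) / padded_w, (y_px + off_y) / padded_h
```

The point of padding rather than stretching is to keep limb proportions consistent with the 16:9 training data.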

That being said, I will clone HRNET and try it out :)
2D accuracy is very important in two-stage 3D pose estimation!
Multiple person 3D tracking would also be very cool!

@timtensor
Author

Thank you for the reply. I am trying out different permutations as well. Not really a pro, learning by trying. I must mention that with the YOLOv3 implementation the accuracy has greatly improved. One thing to note is that the 2D detection part of it is computationally quite heavy.

@AmmarkoV
Collaborator

The last version of YOLO I had checked out was YOLOv2, but only for detecting objects, not persons. In any case, testing with HRNet would initially be more of an offline experiment, especially since HRNet is Python/PyTorch while this repo is C++/TensorFlow.

@timtensor
Author

Yes, I totally agree; as a start it should be done on locally saved videos. If I understand correctly (I might be wrong), you need the keypoints per frame as input to the MocapNET module, right?

@AmmarkoV
Collaborator

AmmarkoV commented Oct 30, 2019

Yes, you need at least the hip, neck, head, rshoulder, relbow, rhand, lshoulder, lelbow, lhand, rhip, rknee, rfoot, lhip, lknee, lfoot joint 2D positions, organized as 2DXhip, 2DYhip, Vhip, ..., where V is a visibility flag that is 1 when the joint is visible and 0 when the joint is invisible.

The sample CSV file shows the full joint list received from OpenPose Body+Hands 2D Output

The full list of input has 171 elements ( 57 triplets of X2D,Y2D,VisibilityFlag )
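One hypothetical way to fill those 171 values without hard-coding the joint order is to take the column order straight from the header row of dataset/sample.csv and write zero triplets for joints your detector does not provide. The column naming used below is only a guess for illustration; take the real names and order from the sample file:

```python
import csv

def build_csv_row(header, joints2d):
    """header:   the first row of dataset/sample.csv, assumed here to look like
                 ["2DX_hip", "2DY_hip", "visible_hip", ...] (illustrative naming).
    joints2d:   dict mapping an upper-case joint name (HIP, NECK, ...) to a
                (x, y, visibility) triplet with already-normalized coordinates.
    Joints the detector does not provide are written as 0, 0, 0 (invisible)."""
    row = []
    for column in header:
        prefix, _, joint = column.partition("_")          # e.g. "2DX", "hip"
        x, y, v = joints2d.get(joint.upper(), (0.0, 0.0, 0.0))
        row.append({"2DX": x, "2DY": y}.get(prefix, v))   # pick x, y or visibility
    return row

# Hypothetical usage: copy the header from sample.csv, then append one row per frame.
# with open("YourDataset.csv", "w", newline="") as f:
#     writer = csv.writer(f)
#     writer.writerow(header)
#     writer.writerows(build_csv_row(header, joints) for joints in per_frame_joints)
```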

By populating an std::vector with 171 values in the correct order and running the runMocapNET call, you get back another vector with the full-body BVH configuration, which needs no inverse kinematics and can be directly used to animate a model.

This can also be visualized from the main application, of course: http://ammar.gr/mocapnet/mocapnetogl.ogv

@timtensor
Author

Thanks a lot for the information. I still couldn't manage to extend it to simple-HRNet as the 2D detector. I am doing everything offline at the moment.

@AmmarkoV
Collaborator

Hello, if you have a small sample CSV file you generated (like this one), I can take a look at it and maybe help you resolve the problem.

@AmmarkoV AmmarkoV added the feature A Requested feature label Jun 19, 2020
@AmmarkoV
Collaborator

I have given a CSV file example that can be used to package any 2D estimator output and enable its processing by MocapNET. Adding native support for multiple 2D estimators is beyond the scope of this repository, so I am closing this issue! :)
In case of questions on how to package 2D input for MocapNET, feel free to open a new issue!
