Does anyone reproduce the plausible result using this source code? #17
Open · JECULAI opened this issue Apr 17, 2021 · 48 comments

@JECULAI

JECULAI commented Apr 17, 2021

Has anyone reproduced plausible results using this source code?
I have tried several times, but the generated skeletons always look weird. Can anyone help me figure out what is going wrong?

@Eddie-Hwang

Hi, I actually re-implemented Ben's work using PyTorch Lightning. I can train the model successfully but have some issues with prediction.

@JECULAI
Author

JECULAI commented May 21, 2021

My partners and I also trained the model successfully, but it seems that the proposed method just doesn't work. The inference results look bad, and the author seems unwilling to share the pretrained model or even the log files. I question the quality of this paper.

Repository owner deleted a comment from florinshen May 21, 2021
@BenSaunders27
Owner

BenSaunders27 commented May 21, 2021

Hi all,

I am sorry that you are finding this code difficult to work with and are not getting great results. I apologise for the late response also.

I am looking into why the inference results are looking bad, as I have not found this before. Can I check that everyone is using Gaussian noise augmentation, by setting gaussian_noise: True and noise_rate: 5? Without this, the inference will be bad.

Do the Ground Truth skeletons look ok? If not, the issue may be with the data preparation. Please follow the Data section in the README and start with the example data provided in /Data/tmp. Unfortunately I cannot share the full data.

Thanks,
Ben
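
For anyone unsure what that augmentation does, here is a rough sketch of adding Gaussian noise to the target poses fed to the decoder during training. The function name and the way noise_rate maps to a noise magnitude are assumptions for illustration, not the repository's actual code:

```python
import torch

def add_gaussian_noise(trg_poses: torch.Tensor, noise_std: float) -> torch.Tensor:
    # trg_poses: (batch, frames, joint_values) ground-truth skeletons fed to the decoder
    # during training. Perturbing them stops the decoder from simply copying the
    # teacher-forced input pose. How the repository maps noise_rate to an actual noise
    # magnitude is not shown here; noise_std is treated as a plain standard deviation.
    return trg_poses + torch.randn_like(trg_poses) * noise_std
```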

@Eddie-Hwang

Eddie-Hwang commented May 21, 2021

Is this the right form of greedy decoding?
In the source code, I found that the greedy decoding method is different from what I expected. As you can see in your code, the model keeps taking the reference pose and predicting the next pose (https://github.com/BenSaunders27/ProgressiveTransformersSLP/blob/master/search.py).

I have tested two different options (your greedy decoding vs. the original greedy decoding). Please see the attached video files.

Can you please explain how you decoded the model outputs?

25October_2010_Monday_tagesschau-17_normal_greedy.2.mp4
25October_2010_Monday_tagesschau-17_ben_greedy_.1.mp4
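
For reference, the "original greedy decoding" I expected looks roughly like the sketch below, where the model's own predictions are fed back and nothing is taken from the reference sequence. The names, shapes and call signature are illustrative, not the repository's API:

```python
import torch

@torch.no_grad()
def greedy_pose_decode(model, src, max_frames: int, trg_dim: int) -> torch.Tensor:
    # Illustrative autoregressive decoding for continuous pose outputs: start from a
    # zero "start" frame and repeatedly feed the model's own predictions back in.
    decoded = torch.zeros(1, 1, trg_dim)
    for _ in range(max_frames):
        next_frame = model(src, decoded)[:, -1:, :]   # predict the next pose + counter value
        decoded = torch.cat([decoded, next_frame], dim=1)
        if next_frame[0, 0, -1].item() >= 1.0:        # counter reaching 1 marks the end of the sequence
            break
    return decoded[:, 1:]                             # drop the start frame
```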

@JECULAI
Author

JECULAI commented May 21, 2021

It should be noted that the greedy decode code is right. Have you tried the setting called "just counter"?

@BenSaunders27
Owner

Eddie,

I do not follow your issue with the greedy decoding. The _ben_greedy video looks to have decoded correctly to me. As JECULAI mentions, I believe the greedy decode code is correct, as I can get valid output from it. Please ensure you use an augmentation method, such as Gaussian noise or Just Counter.

Thanks,
Ben

@BenSaunders27
Owner

BenSaunders27 commented May 21, 2021

I have shared a Progressive Transformer checkpoint at https://www.dropbox.com/s/l4xmnybp7luz0l3/PreTrained_PTSLP_Model.ckpt?dl=0.

This model has a size of num_layers: 2, num_heads: 4 and embedding_dim: 512, as outlined in ./Configs/Base.yaml. It has been pre-trained on the full PHOENIX14T dataset with the data format as above.

I have updated the code to enable checkpoint loading, so please pull the latest version. To initialise a model from this checkpoint, pass the --ckpt ./PreTrained_PTSLP_Model.ckpt argument to either train or test modes. Additionally, to initialise the correct src_embed size, the config argument src_vocab: "./Configs/src_vocab.txt" must be set to the location of the src_vocab.txt, found under ./Configs.

Please let me know if this checkpoint cannot be downloaded or loaded correctly.

Thanks,
Ben
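
If the --ckpt route fails, a quick way to sanity-check the downloaded file is the generic PyTorch pattern below; the key names inside the checkpoint are a guess, so inspect them before restoring anything:

```python
import torch

checkpoint = torch.load("./PreTrained_PTSLP_Model.ckpt", map_location="cpu")
# Inspect what the file actually stores before trying to restore anything.
print(checkpoint.keys() if isinstance(checkpoint, dict) else type(checkpoint))
# model.load_state_dict(checkpoint["model_state"])  # "model_state" is a guess; use the printed key names
```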

@BenSaunders27
Owner

The dataset is the PHOENIX14T dataset, which contains 7096 training, 519 dev and 642 test sequences. Further info can be found in https://openaccess.thecvf.com/content_cvpr_2018/papers/Camgoz_Neural_Sign_Language_CVPR_2018_paper.pdf.

@JECULAI
Author

JECULAI commented May 23, 2021

The pre-trained model link is invalid for me (ERR_CONNECTION_TIMED_OUT). Can you share the log files (and pre-trained models) for the other two augmentations? And could you just upload the ckpt file to GitHub?

@JECULAI
Author

JECULAI commented May 24, 2021

I fine-tuned the pre-trained model on my data, but the results still look bad; the DTW score is never less than 12. And I processed the data in the correct way, dividing it by 3 * length_of_shoulder for normalisation.
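
For concreteness, a minimal sketch of that normalisation step; the joint indices and the absence of any centring step are assumptions for illustration, not the repository's preprocessing:

```python
import numpy as np

def normalize_by_shoulder(joints: np.ndarray, l_shoulder: int = 5, r_shoulder: int = 2) -> np.ndarray:
    # joints: (frames, num_joints, 3) skeleton. Indices 2 and 5 are the OpenPose BODY_25
    # right/left shoulders, used here as an illustrative choice.
    shoulder_len = np.linalg.norm(joints[:, l_shoulder] - joints[:, r_shoulder], axis=-1).mean()
    return joints / (3.0 * shoulder_len)
```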

@JECULAI
Author

JECULAI commented May 26, 2021

@Eddie-Hwang Hi Euijun, have you got plausible results now? Can you share some info about your training or testing?

@Eddie-Hwang

@JECULAI I tested the implementation with joint values including facial landmarks, so the DTW score is completely different from what is in the training log.
I also tested with 150 joint-value inputs and ran a few iterations to see how the DTW score was changing; the score definitely improved. However, I need at least 5000 epochs to achieve the desired output.
Now I am wondering what an output with a DTW score of about 10 looks like (can you please share your rendered output?).
The sample from the conference video shows a pretty accurate sign pose, and I think that output is around DTW 5 or less.

I hope the author releases the back-translation code sooner or later.
He did not report DTW scores in his paper.

@JECULAI
Author

JECULAI commented May 27, 2021

@Eddie-Hwang Hi Euijun, what does the desired output look like? Is it the output of inference? Can you show me? I cannot get a decent sign pose via inference at all. When I test skeletons that come from training (not inference), the DTW score is about 2 or less, and those skeletons look the same as the output shown in the paper (with the edge details and noise filtered out). The rendered inference output can be seen at this link: https://www.yuque.com/docs/share/b9dc10ba-a7bc-43b1-8164-037ed9f606b8.

@cripac-sjx

Hi,
I have a problem with processing the data.
Can you share how you convert the 2D poses from OpenPose into the 150 3D keypoint values?

Thanks!

@JECULAI
Author

JECULAI commented May 28, 2021

@cripac-sjx Hi Xinjian, you can refer to https://github.com/gopeith/SignLanguageProcessing for more details.

@cripac-sjx

@cripac-sjx Hi Xinjian, you can refer to https://github.com/gopeith/SignLanguageProcessing for more details.

Thanks!

I extracted 137 keypoints for each frame using OpenPose, including "pose_keypoints_2d", "face_keypoints_2d", "hand_right_keypoints_2d" and "hand_left_keypoints_2d". But the examples provide 150 joint values; what is the difference between them?

@JECULAI
Author

JECULAI commented Jun 2, 2021

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

@cripac-sjx

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

Why do I extract 25 keypoints in "pose_keypoints_2d"?

@JECULAI
Author

JECULAI commented Jun 2, 2021

You should just use the joints of the upper body trunk; there are 8 of them.

@cripac-sjx

You should just use the joints of the upper body trunk; there are 8 of them.

Got it! Thanks a lot!

@cripac-sjx

You should just use the joints of the upper body trunk; there are 8 of them.

Excuse me, how do you extract the 8 upper-body keypoints?

@JECULAI
Author

JECULAI commented Jun 2, 2021

Slice the whole extracted keypoints list.
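
Roughly, assuming the standard OpenPose JSON layout (x, y, confidence triples per keypoint), the slicing looks like this sketch; the function name is illustrative:

```python
import json
import numpy as np

def load_50_joints(openpose_json_path: str) -> np.ndarray:
    # Keep the first 8 body joints (upper-body trunk and arms) plus 21 joints per hand,
    # dropping the per-keypoint confidence values. Result: (50, 2) per frame, i.e. 100
    # values, before any 2D-to-3D lifting.
    with open(openpose_json_path) as f:
        person = json.load(f)["people"][0]
    body = np.array(person["pose_keypoints_2d"]).reshape(-1, 3)[:8, :2]
    left = np.array(person["hand_left_keypoints_2d"]).reshape(-1, 3)[:, :2]
    right = np.array(person["hand_right_keypoints_2d"]).reshape(-1, 3)[:, :2]
    return np.concatenate([body, left, right], axis=0)
```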

@BenSaunders27
Owner

Hi cripac-sjx,

Just a reminder that the 150 joints required for the base model are 2D OpenPose coordinates lifted to 3D using the Inverse Kinematics approach at https://github.com/gopeith/SignLanguageProcessing under 3DposeEstimator.

If you just use 2D OpenPose coordinates (resulting in 100 values = 50 joints * 2), you will need to change the plot_videos function in the code to animate OpenPose skeletons, and set trg_size to 100 in the config file.

@BenSaunders27
Owner

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

Remember that OpenPose only provides 2D joints, so if you are using just OpenPose coordinates, the feature dimension is actually 100 = 50 * 2.

@cripac-sjx

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

Remember that OpenPose only provides 2D joints, so if you are using just OpenPose coordinates, the feature dimension is actually 100 = 50 * 2.

Thanks. Why are the ".skels" files in the examples provided in "./Data/tmp" divisible by 151 instead of 150?

@cripac-sjx

Slice the whole extracted keypoints list.

Thanks for your patience. And why are the ".skels" files in the examples provided in "./Data/tmp" divisible by 151 instead of 150?

@JECULAI
Author

JECULAI commented Jun 3, 2021

The last of the 151 values is the timing (counter) number.
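
A minimal sketch of how that timing value can be appended, assuming a counter that runs linearly from 0 to 1 over the sequence (the function name is illustrative):

```python
import numpy as np

def append_counter(frames: np.ndarray) -> np.ndarray:
    # frames: (num_frames, 150) array of joint values. The counter runs linearly from
    # 0 to 1 over the sequence, giving rows of length 151 as in the example .skels files.
    counter = np.linspace(0.0, 1.0, num=len(frames)).reshape(-1, 1)
    return np.concatenate([frames, counter], axis=1)
```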

@BenSaunders27
Owner

All,

Glad you've had more success with this implementation code. Pre-processed Phoenix14T data can now be requested via email at b.saunders@surrey.ac.uk if required.

Thanks,
Ben

@Tejaswini2612

Hello Ben,

I used the 3D pose data provided by you and the recommended settings as discussed above and on the main page. However, I wasn't able to reproduce the results. I obtain the best checkpoint at step 80, with a DTW of around 11.9 on the validation set, contrary to step 100000 and DTW 10 as seen in your log files.

I am not sure why I am not able to reproduce the results. The test video plots are not good either; I see the same plot in almost all the videos.

On the other hand, if I use the checkpoint provided by you, the test videos are much better and quite decent.

I am not sure why the training is not working as well, given that the checkpoint was obtained using the same data and the same settings.

Can you please suggest what I could possibly be missing (FYI - I had set Gaussian noise to true, noise rate to 5 while training)?

Thanks,
Tej

@clviegas

@Tejaswini2612: Have you tried to change the validation_freq in the config file? I was able to reach 10 DTW by changing it to 10000.

@JECULAI
Author

JECULAI commented Jul 28, 2021

@clviegas Can you show me some of your best results? I have also trained models that reach about 10 DTW (not exactly 10, but less than 11), but the output skeletons do not meet my expectations.

@divyachhipani

Hi,

I changed validation_freq = 10000, gaussian_noise = True and noise_rate = 5. I am using the pre-trained model checkpoint on the preprocessed dataset provided by Ben, but I am getting "Best validation result at step 90: 12.53 dtw". Is there any way to achieve better results?

Thanks,
Divya

@divyachhipani

Hi, I actually re-implemented Ben's work using PyTorch Lightning. I can train the model successfully but have some issues with prediction.

Hi Eddie,

Is it possible to share your code base?

Thanks,
Divya

@cripac-sjx

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

Hi
I have converted the 2D keypoints extracted by OpenPose to 3D keypoints using 3DposeEstimator.

First, I concatenate the first 8 points of "pose_keypoints_2d" with "hand_left_keypoints_2d" and "hand_right_keypoints_2d", then process them into 3D keypoints with "demo.py" in "3DposeEstimator", but this results in random and messy ground-truth poses when visualizing.

Can you tell me what went wrong?

Thanks!

@clviegas

@JECULAI Please find attached a validation video produced after training for 310000 steps. As mentioned before, when you change the validation frequency, the model keeps training for longer.

WIND_SCHWACH_MAESSIG_11_37.mp4

@thiagomcoutinho

thiagomcoutinho commented Sep 23, 2021

Hi @Eddie-Hwang, I also find it strange that in the greedy_decode function the ground-truth counter values and the first ground-truth frame are fed to the network. Are your findings the same, or did you come to a different conclusion?

P.S.: I wonder, if this is the case, then we couldn't do real inference based solely on the text, right?

Looking forward to the answer, thanks.

@Eddie-Hwang

Eddie-Hwang commented Sep 23, 2021 via email

@cripac-sjx

Hi,

I changed validation_freq = 10000, gaussian_noise = True and noise_rate = 5. I am using the pre-trained model checkpoint on the preprocessed dataset provided by Ben, but I am getting "Best validation result at step 90: 12.53 dtw". Is there any way to achieve better results?

Thanks,
Divya

Hi, I am running into the same issue as you; did you manage to solve it?

@thiagomcoutinho

@Eddie-Hwang yes, I'm also wondering if the same decoding is used in the subsequent papers.

When you say that it only works in a lab environment, you are saying that we need the GT counter and first frame in order to make a prediction, right?

Best Regards,

@cripac-sjx

Is this the right form of greedy decoding?
In the source code, I found that the greedy decoding method is different from what I expected. As you can see in your code, the model keeps taking the reference pose and predicting the next pose (https://github.com/BenSaunders27/ProgressiveTransformersSLP/blob/master/search.py).

I have tested two different options (your greedy decoding vs. the original greedy decoding). Please see the attached video files.

Can you please explain how you decoded the model outputs?

25October_2010_Monday_tagesschau-17_normal_greedy.2.mp4
25October_2010_Monday_tagesschau-17_ben_greedy_.1.mp4

Hi, how did you synthesize such perfect results? Did you make any changes to the code, and can you share them?

Thanks!

@cripac-sjx

Hello Ben,

I used the 3D pose data provided by you and the recommended settings as discussed above and on the main page. However, I wasn't able to reproduce the results. I obtain the best checkpoint at step 80, with a DTW of around 11.9 on the validation set, contrary to step 100000 and DTW 10 as seen in your log files.

I am not sure why I am not able to reproduce the results. The test video plots are not good either; I see the same plot in almost all the videos.

On the other hand, if I use the checkpoint provided by you, the test videos are much better and quite decent.

I am not sure why the training is not working as well, given that the checkpoint was obtained using the same data and the same settings.

Can you please suggest what I could possibly be missing (FYI - I had set Gaussian noise to true, noise rate to 5 while training)?

Thanks,
Tej

I am running into the same issues as you; did you manage to fix them?

Thanks

@ziangchengg

The last of the 151 values is the timing (counter) number.

Hi JECULAI, could you tell me how to get the timing number? I processed my own dataset and got 150 values per frame, but I have no idea about the 151st. How do I get it, or how can I change the program input to 150 values?
Thanks.

@jianzfb

jianzfb commented Nov 8, 2021

I have shared a Progressive Transformer checkpoint at https://www.dropbox.com/s/l4xmnybp7luz0l3/PreTrained_PTSLP_Model.ckpt?dl=0.

This model has a size of num_layers: 2, num_heads: 4 and embedding_dim: 512, as outlined in ./Configs/Base.yaml. It has been pre-trained on the full PHOENIX14T dataset with the data format as above.

I have updated the code to enable checkpoint loading, so please pull the latest version. To initialise a model from this checkpoint, pass the --ckpt ./PreTrained_PTSLP_Model.ckpt argument to either train or test modes. Additionally, to initialise the correct src_embed size, the config argument src_vocab: "./Configs/src_vocab.txt" must be set to the location of the src_vocab.txt, found under ./Configs.

Please let me know if this checkpoint cannot be downloaded or loaded correctly.

Thanks, Ben

@BenSaunders27, I loaded your shared checkpoint, but the predicted sign pose is not accurate enough, like this:

REGEN_SCHNEE_REGION_10_79_h264.mp4

The data is the test data you provided in the folder Data/tmp.

Is this result normal? Thanks.

@lukedalmau

Greetings, pals.
@jianzfb or anyone else: does anyone have a ckpt of the model to use as a pretrained starting point? I want to check whether a model pretrained on another language is useful for improving performance on new languages.

Kind regards to all, and thanks for sharing your knowledge.

@16NightTimeRain

Is this the right form of greedy decoding?
In the source code, I found that the greedy decoding method is different from what I expected. As you can see in your code, the model keeps taking the reference pose and predicting the next pose (https://github.com/BenSaunders27/ProgressiveTransformersSLP/blob/master/search.py).

I have tested two different options (your greedy decoding vs. the original greedy decoding). Please see the attached video files.

Can you please explain how you decoded the model outputs?

25October_2010_Monday_tagesschau-17_normal_greedy.2.mp4
25October_2010_Monday_tagesschau-17_ben_greedy_.1.mp4

How do you add facial keypoints to the network? How do you handle the face in the 2D-to-3D preprocessing? I am having problems at this step; please give me some advice.

@FangSen9000

FangSen9000 commented Feb 2, 2023

@jianzfb @JECULAI The checkpoint file is no longer available for download. If you have saved a copy, could you send it to me? Thank you very much. fangsen2024@gmai.com

@hacker009-sudo

@jianzfb @BenSaunders27 @JECULAI, can you please share the pre-trained model at ruchisharma11448000@gmail.com?

@hacker009-sudo

@thiagomcoutinho @jianzfb @Tejaswini2612 @clviegas, please share the pretrained model if you have stored it somewhere. I would really appreciate your help; it is quite urgent.
