Does anyone reproduce the plausible result using this source code? #17
Open · JECULAI opened this issue Apr 17, 2021 · 48 comments

@JECULAI

JECULAI commented Apr 17, 2021

Has anyone reproduced plausible results using this source code?
I have tried several times, but the generated skeletons always look weird. Can anyone help me figure out what is going wrong?

@Eddie-Hwang

Hi, I actually re-implemented Ben's work using PyTorch Lightning. I can train the model successfully but have some issues with prediction.

@JECULAI
Author

JECULAI commented May 21, 2021

My partners and I also trained the model successfully, but it seems that the proposed method just doesn't work. The inference results look bad, and the author seems unwilling to share the pretrained model or even the log files. I question the quality of this paper.

Repository owner deleted a comment from florinshen May 21, 2021
@BenSaunders27
Owner

BenSaunders27 commented May 21, 2021

Hi all,

I am sorry that you are finding this code difficult to work with and are not getting great results. I apologise for the late response also.

I am looking into why the inference results are looking bad, as I have not found this before. Can I check that everyone is using Gaussian noise augmentation, by setting gaussian_noise: True and noise_rate: 5? Without this, the inference will be bad.

Do the Ground Truth skeletons look ok? If not, the issue may be with the data preparation. Please follow the Data section in the README and start with the example data provided in /Data/tmp. Unfortunately I cannot share the full data.

Thanks,
Ben
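
For anyone unsure what that augmentation does, here is a rough sketch of adding Gaussian noise to the target poses fed to the decoder during training. The function name and the way noise_rate maps to a noise magnitude are assumptions for illustration, not the repository's actual code:

```python
import torch

def add_gaussian_noise(trg_poses: torch.Tensor, noise_std: float) -> torch.Tensor:
    # trg_poses: (batch, frames, joint_values) ground-truth skeletons fed to the decoder
    # during training. Perturbing them stops the decoder from simply copying the
    # teacher-forced input pose. How the repository maps noise_rate to an actual noise
    # magnitude is not shown here; noise_std is treated as a plain standard deviation.
    return trg_poses + torch.randn_like(trg_poses) * noise_std
```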

@Eddie-Hwang

Eddie-Hwang commented May 21, 2021

Is this the right form of greedy decoding?
In the source code, I found that the greedy decoding method is different from what I expected. As you can see in your code, the model keeps taking the reference pose and predicting the next pose (https://github.com/BenSaunders27/ProgressiveTransformersSLP/blob/master/search.py).

I have tested two different options (your greedy decoding vs. the original greedy decoding). Please see the attached video files.

Can you please explain how you decoded the model outputs?

25October_2010_Monday_tagesschau-17_normal_greedy.2.mp4
25October_2010_Monday_tagesschau-17_ben_greedy_.1.mp4
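
For reference, the "original greedy decoding" I expected looks roughly like the sketch below, where the model's own predictions are fed back and nothing is taken from the reference sequence. The names, shapes and call signature are illustrative, not the repository's API:

```python
import torch

@torch.no_grad()
def greedy_pose_decode(model, src, max_frames: int, trg_dim: int) -> torch.Tensor:
    # Illustrative autoregressive decoding for continuous pose outputs: start from a
    # zero "start" frame and repeatedly feed the model's own predictions back in.
    decoded = torch.zeros(1, 1, trg_dim)
    for _ in range(max_frames):
        next_frame = model(src, decoded)[:, -1:, :]   # predict the next pose + counter value
        decoded = torch.cat([decoded, next_frame], dim=1)
        if next_frame[0, 0, -1].item() >= 1.0:        # counter reaching 1 marks the end of the sequence
            break
    return decoded[:, 1:]                             # drop the start frame
```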

@JECULAI
Author

JECULAI commented May 21, 2021

It should be noted that the greedy decode code is right. Have you tried the setting called "just counter"?

@BenSaunders27
Owner

Eddie,

I do not follow your issue with the greedy decoding. The _ben_greedy video looks to have decoded correctly to me. As JECULAI mentions, I believe the greedy decode code is correct, as I can get valid output from it. Please ensure you use an augmentation method, such as Gaussian noise or Just Counter.

Thanks,
Ben

@BenSaunders27
Owner

BenSaunders27 commented May 21, 2021

I have shared a Progressive Transformer checkpoint at https://www.dropbox.com/s/l4xmnybp7luz0l3/PreTrained_PTSLP_Model.ckpt?dl=0.

This model has a size of num_layers: 2, num_heads: 4 and embedding_dim: 512, as outlined in ./Configs/Base.yaml. It has been pre-trained on the full PHOENIX14T dataset with the data format as above.

I have updated the code to enable checkpoint loading, so please pull the latest version. To initialise a model from this checkpoint, pass the --ckpt ./PreTrained_PTSLP_Model.ckpt argument to either train or test modes. Additionally, to initialise the correct src_embed size, the config argument src_vocab: "./Configs/src_vocab.txt" must be set to the location of the src_vocab.txt, found under ./Configs.

Please let me know if this checkpoint cannot be downloaded or loaded correctly.

Thanks,
Ben
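
If the --ckpt route fails, a quick way to sanity-check the downloaded file is the generic PyTorch pattern below; the key names inside the checkpoint are a guess, so inspect them before restoring anything:

```python
import torch

checkpoint = torch.load("./PreTrained_PTSLP_Model.ckpt", map_location="cpu")
# Inspect what the file actually stores before trying to restore anything.
print(checkpoint.keys() if isinstance(checkpoint, dict) else type(checkpoint))
# model.load_state_dict(checkpoint["model_state"])  # "model_state" is a guess; use the printed key names
```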

@BenSaunders27
Owner

The dataset is the PHOENIX14T dataset, which contains 7096 training, 519 dev and 642 test sequences. Further info can be found in https://openaccess.thecvf.com/content_cvpr_2018/papers/Camgoz_Neural_Sign_Language_CVPR_2018_paper.pdf.

@JECULAI
Author

JECULAI commented May 23, 2021

The pre-trained model link is invalid for me (ERR_CONNECTION_TIMED_OUT). Can you share the log files (and pre-trained models) for the other two augmentations? And could you just upload the ckpt file to GitHub?

@JECULAI
Author

JECULAI commented May 24, 2021

I fine-tuned the pre-trained model on my data, but the results still look bad; the DTW score is never less than 12. And I processed the data in the correct way, dividing it by 3 * length_of_shoulder for normalisation.
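
For concreteness, a minimal sketch of that normalisation step; the joint indices and the absence of any centring step are assumptions for illustration, not the repository's preprocessing:

```python
import numpy as np

def normalize_by_shoulder(joints: np.ndarray, l_shoulder: int = 5, r_shoulder: int = 2) -> np.ndarray:
    # joints: (frames, num_joints, 3) skeleton. Indices 2 and 5 are the OpenPose BODY_25
    # right/left shoulders, used here as an illustrative choice.
    shoulder_len = np.linalg.norm(joints[:, l_shoulder] - joints[:, r_shoulder], axis=-1).mean()
    return joints / (3.0 * shoulder_len)
```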

@JECULAI
Author

JECULAI commented May 26, 2021

@Eddie-Hwang Hi Euijun, have you got plausible results now? Can you share some info about your training or testing?

@Eddie-Hwang

@JECULAI I tested the implementation with joint values including facial landmarks, so the DTW score is completely different from what is in the training log.
I also tested with 150 joint-value inputs and ran a few iterations to see how the DTW score was changing; the score definitely improved. However, I need at least 5000 epochs to achieve the desired output.
Now I am wondering what an output with a DTW score of about 10 looks like (can you please share your rendered output?).
The sample from the conference video shows a pretty accurate sign pose, and I think that output is around DTW 5 or less.

I hope the author releases the back-translation code sooner or later.
He did not report DTW scores in his paper.

@JECULAI
Author

JECULAI commented May 27, 2021

@Eddie-Hwang Hi Euijun, what does the desired output look like? Is it the output of inference? Can you show me? I cannot get a decent sign pose via inference at all. When I test skeletons that come from training (not inference), the DTW score is about 2 or less, and those skeletons look the same as the output shown in the paper (with the edge details and noise filtered out). The rendered inference output can be seen at this link: https://www.yuque.com/docs/share/b9dc10ba-a7bc-43b1-8164-037ed9f606b8.

@cripac-sjx

Hi,
I have a problem with processing the data.
Can you share how you convert the 2D poses from OpenPose into the 150 3D keypoint values?

Thanks!

@JECULAI
Author

JECULAI commented May 28, 2021

@cripac-sjx Hi Xinjian, you can refer to https://github.com/gopeith/SignLanguageProcessing for more details.

@cripac-sjx

@cripac-sjx Hi Xinjian, you can refer to https://github.com/gopeith/SignLanguageProcessing for more details.

Thanks!

I extracted 137 keypoints for each frame using OpenPose, including "pose_keypoints_2d", "face_keypoints_2d", "hand_right_keypoints_2d" and "hand_left_keypoints_2d". But the examples provide 150 joint values; what is the difference between them?

@JECULAI
Author

JECULAI commented Jun 2, 2021

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

@cripac-sjx

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

Why do I extract 25 keypoints in "pose_keypoints_2d"?

@JECULAI
Author

JECULAI commented Jun 2, 2021

You should just use the joints of the upper body trunk; there are 8 of them.

@cripac-sjx

You should just use the joints of the upper body trunk; there are 8 of them.

Got it! Thanks a lot!

@cripac-sjx

You should just use the joints of the upper body trunk; there are 8 of them.

Excuse me, how do you extract the 8 upper-body keypoints?

@JECULAI
Author

JECULAI commented Jun 2, 2021

Slice the whole extracted keypoints list.
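
Roughly, assuming the standard OpenPose JSON layout (x, y, confidence triples per keypoint), the slicing looks like this sketch; the function name is illustrative:

```python
import json
import numpy as np

def load_50_joints(openpose_json_path: str) -> np.ndarray:
    # Keep the first 8 body joints (upper-body trunk and arms) plus 21 joints per hand,
    # dropping the per-keypoint confidence values. Result: (50, 2) per frame, i.e. 100
    # values, before any 2D-to-3D lifting.
    with open(openpose_json_path) as f:
        person = json.load(f)["people"][0]
    body = np.array(person["pose_keypoints_2d"]).reshape(-1, 3)[:8, :2]
    left = np.array(person["hand_left_keypoints_2d"]).reshape(-1, 3)[:, :2]
    right = np.array(person["hand_right_keypoints_2d"]).reshape(-1, 3)[:, :2]
    return np.concatenate([body, left, right], axis=0)
```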

@BenSaunders27
Owner

Hi cripac-sjx,

Just a reminder that the 150 joints required for the base model are 2D OpenPose coordinates lifted to 3D using the Inverse Kinematics approach at https://github.com/gopeith/SignLanguageProcessing under 3DposeEstimator.

If you just use 2D OpenPose coordinates (resulting in 100 values = 50 joints * 2), you will need to change the plot_videos function in the code to animate OpenPose skeletons, and set trg_size to 100 in the config file.

@BenSaunders27
Owner

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

Remember that OpenPose only provides 2D joints, so if you are using just OpenPose coordinates, the feature dimension is actually 100 = 50 * 2.

@cripac-sjx

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

Remember that OpenPose only provides 2D joints, so if you are using just OpenPose coordinates, the feature dimension is actually 100 = 50 * 2.

Thanks. Why are the ".skels" files in the examples provided in "./Data/tmp" divisible by 151 instead of 150?

@cripac-sjx

Slice the whole extracted keypoints list.

Thanks for your patience. And why are the ".skels" files in the examples provided in "./Data/tmp" divisible by 151 instead of 150?

@JECULAI
Author

JECULAI commented Jun 3, 2021

The last of the 151 values is the timing (counter) number.
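
A minimal sketch of how that timing value can be appended, assuming a counter that runs linearly from 0 to 1 over the sequence (the function name is illustrative):

```python
import numpy as np

def append_counter(frames: np.ndarray) -> np.ndarray:
    # frames: (num_frames, 150) array of joint values. The counter runs linearly from
    # 0 to 1 over the sequence, giving rows of length 151 as in the example .skels files.
    counter = np.linspace(0.0, 1.0, num=len(frames)).reshape(-1, 1)
    return np.concatenate([frames, counter], axis=1)
```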

@BenSaunders27
Owner

All,

Glad you've had more success with this implementation code. Pre-processed Phoenix14T data can now be requested via email at b.saunders@surrey.ac.uk if required.

Thanks,
Ben

@Tejaswini2612

Hello Ben,

I used the 3D pose data provided by you and the recommended settings as discussed above and on the main page. However, I wasn't able to reproduce the results. I obtain the best checkpoint at step 80, with a DTW of around 11.9 on the validation set, contrary to step 100000 and DTW 10 as seen in your log files.

I am not sure why I am not able to reproduce the results. The test video plots are not good either; I see the same plot in almost all the videos.

On the other hand, if I use the checkpoint provided by you, the test videos are much better and quite decent.

I am not sure why the training is not working as well, given that the checkpoint was obtained using the same data and the same settings.

Can you please suggest what I could possibly be missing (FYI - I had set Gaussian noise to true, noise rate to 5 while training)?

Thanks,
Tej

@clviegas

@Tejaswini2612: Have you tried to change the validation_freq in the config file? I was able to reach 10 DTW by changing it to 10000.

@JECULAI
Author

JECULAI commented Jul 28, 2021

@clviegas Can you show me some of your best results? I have also trained models that reach about 10 DTW (not exactly 10, but less than 11), but the output skeletons do not meet my expectations.

@divyachhipani

Hi,

I changed validation_freq = 10000, gaussian_noise = True and noise_rate = 5. I am using the pre-trained model checkpoint on the preprocessed dataset provided by Ben, but I am getting "Best validation result at step 90: 12.53 dtw". Is there any way to achieve better results?

Thanks,
Divya

@divyachhipani

Hi, I actually re-implemented Ben's work using PyTorch Lightning. I can train the model successfully but have some issues with prediction.

Hi Eddie,

Is it possible to share your code base?

Thanks,
Divya

@cripac-sjx

"pose_keypoints_2d" contributes 8 joints, and "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints; combining them all gives 50 points, so the feature dimension is 50 * 3 = 150.

Hi
I have converted the 2D keypoints extracted by OpenPose to 3D keypoints using 3DposeEstimator.

First, I concatenate the first 8 points of "pose_keypoints_2d" with "hand_left_keypoints_2d" and "hand_right_keypoints_2d", then process them into 3D keypoints with "demo.py" in "3DposeEstimator", but this results in random and messy ground-truth poses when visualizing.

Can you tell me what went wrong?

Thanks!

@clviegas

@JECULAI Please find attached a validation video produced after training for 310000 steps. As mentioned before, when you change the validation frequency, the model keeps training for longer.

WIND_SCHWACH_MAESSIG_11_37.mp4

@thiagomcoutinho

thiagomcoutinho commented Sep 23, 2021

Hi @Eddie-Hwang, I also find it strange that in the greedy_decode function the ground-truth counter values and the first ground-truth frame are fed to the network. Are your findings the same, or did you come to a different conclusion?

P.S.: I wonder, if this is the case, then we couldn't do real inference based solely on the text, right?

Looking forward to the answer, thanks.

@Eddie-Hwang

Eddie-Hwang commented Sep 23, 2021 via email

@cripac-sjx

Hi,

I changed validation_freq = 10000, gaussian_noise = True and noise_rate = 5. I am using the pre-trained model checkpoint on the preprocessed dataset provided by Ben, but I am getting "Best validation result at step 90: 12.53 dtw". Is there any way to achieve better results?

Thanks,
Divya

Hi, I am running into the same issue as you; did you manage to solve it?

@thiagomcoutinho

@Eddie-Hwang yes, I'm also wondering if the same decoding is used in the subsequent papers.

When you say that it only works in a lab environment, you are saying that we need the GT counter and first frame in order to make a prediction, right?

Best Regards,

@cripac-sjx

Is this the right form of greedy decoding?
In the source code, I found that the greedy decoding method is different from what I expected. As you can see in your code, the model keeps taking the reference pose and predicting the next pose (https://github.com/BenSaunders27/ProgressiveTransformersSLP/blob/master/search.py).

I have tested two different options (your greedy decoding vs. the original greedy decoding). Please see the attached video files.

Can you please explain how you decoded the model outputs?

25October_2010_Monday_tagesschau-17_normal_greedy.2.mp4
25October_2010_Monday_tagesschau-17_ben_greedy_.1.mp4

Hi, how did you synthesize such perfect results? Did you make any changes to the code, and can you share them?

Thanks!

@cripac-sjx

Hello Ben,

I used the 3D pose data provided by you and the recommended settings as discussed above and on the main page. However, I wasn't able to reproduce the results. I obtain the best checkpoint at step 80, with a DTW of around 11.9 on the validation set, contrary to step 100000 and DTW 10 as seen in your log files.

I am not sure why I am not able to reproduce the results. The test video plots are not good either; I see the same plot in almost all the videos.

On the other hand, if I use the checkpoint provided by you, the test videos are much better and quite decent.

I am not sure why the training is not working as well, given that the checkpoint was obtained using the same data and the same settings.

Can you please suggest what I could possibly be missing (FYI - I had set Gaussian noise to true, noise rate to 5 while training)?

Thanks,
Tej

I am running into the same issues as you; did you manage to fix them?

Thanks

@ziangchengg

The last of the 151 values is the timing (counter) number.

Hi JECULAI, could you tell me how to get the timing number? I processed my own dataset and got 150 values per frame, but I have no idea about the 151st. How do I get it, or how can I change the program input to 150 values?
Thanks.

@jianzfb

jianzfb commented Nov 8, 2021

I have shared a Progressive Transformer checkpoint at https://www.dropbox.com/s/l4xmnybp7luz0l3/PreTrained_PTSLP_Model.ckpt?dl=0.

This model has a size of num_layers: 2, num_heads: 4 and embedding_dim: 512, as outlined in ./Configs/Base.yaml. It has been pre-trained on the full PHOENIX14T dataset with the data format as above.

I have updated the code to enable checkpoint loading, so please pull the latest version. To initialise a model from this checkpoint, pass the --ckpt ./PreTrained_PTSLP_Model.ckpt argument to either train or test modes. Additionally, to initialise the correct src_embed size, the config argument src_vocab: "./Configs/src_vocab.txt" must be set to the location of the src_vocab.txt, found under ./Configs.

Please let me know if this checkpoint cannot be downloaded or loaded correctly.

Thanks, Ben

@BenSaunders27, I loaded your shared checkpoint, but the predicted sign pose is not accurate enough, like this:

REGEN_SCHNEE_REGION_10_79_h264.mp4

The data is the test data you provided in the folder Data/tmp.

Is this result normal? Thanks.

@lukedalmau

Greetings, pals.
@jianzfb or anyone else: does anyone have a ckpt of the model to use as a pretrained starting point? I want to check whether a model pretrained on another language is useful for improving performance on new languages.

Kind regards to all, and thanks for sharing your knowledge.

@16NightTimeRain

Is this the right form of greedy decoding?
In the source code, I found that the greedy decoding method is different from what I expected. As you can see in your code, the model keeps taking the reference pose and predicting the next pose (https://github.com/BenSaunders27/ProgressiveTransformersSLP/blob/master/search.py).

I have tested two different options (your greedy decoding vs. the original greedy decoding). Please see the attached video files.

Can you please explain how you decoded the model outputs?

25October_2010_Monday_tagesschau-17_normal_greedy.2.mp4
25October_2010_Monday_tagesschau-17_ben_greedy_.1.mp4

How do you add facial keypoints to the network? How do you handle the face in the 2D-to-3D preprocessing? I am having problems at this step; please give me some advice.

@FangSen9000

FangSen9000 commented Feb 2, 2023

@jianzfb @JECULAI The checkpoint file is no longer available for download. If you have saved a copy, could you send it to me? Thank you very much. fangsen2024@gmai.com

@hacker009-sudo

@jianzfb @BenSaunders27 @JECULAI, can you please share the pre-trained model at ruchisharma11448000@gmail.com?

@hacker009-sudo

@thiagomcoutinho @jianzfb @Tejaswini2612 @clviegas, please share the pretrained model if you have stored it somewhere. I would really appreciate your help; it is quite urgent.
