Does anyone reproduce the plausible result using this source code? #17
Comments
Hi, I actually re-implemented Ben's work using PyTorch Lightning. I can train the model successfully, but I have some issues with prediction.
My partners and I also trained the model successfully, but it seems that the proposed method just doesn't work. The inference results look bad, and the author seems unwilling to share the pretrained model or even the log files. I question the quality of this paper.
Hi all, I am sorry that you are finding this code difficult to work with and are not getting great results. I apologise for the late response also. I am looking into why the inference results are looking bad, as I have not found this before. Can I check that everyone is using Gaussian Noise augmentation? By setting Do the Ground Truth skeletons look ok? If not, the issue may be with the data preparation. Please follow the Data section in the README and start with the example data provided in /Data/tmp. Unfortunately I cannot share the full data. Thanks,
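The Gaussian Noise augmentation mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code; in particular, scaling the noise standard deviation by `noise_rate` percent of the data's standard deviation is an assumption, so check the repo's implementation for the real scheme.

```python
import numpy as np

def add_gaussian_noise(skels, noise_rate=5.0, rng=None):
    """Perturb ground-truth joint values with Gaussian noise during training.

    skels: (frames, joints * dims) array of target skeleton values.
    noise_rate: scales the noise std -- this exact scaling is an
    assumption for illustration, not the repository's formula.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    std = skels.std() * noise_rate * 0.01
    return skels + rng.normal(0.0, std, size=skels.shape)
```

The idea is simply that noisy targets during training regularise the model and help it recover at inference time, when its own (imperfect) predictions are fed back in.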
Is this the right form of greedy decoding? I have tested two different options (your greedy decoding vs. the original greedy decoding). Can you please explain how you decoded the model outputs? 25October_2010_Monday_tagesschau-17_normal_greedy.2.mp4 25October_2010_Monday_tagesschau-17_ben_greedy_.1.mp4
It should be noted that the greedy decode code is correct. Have you tried the setting called Just Counter?
Eddie, I do not follow your issue with the greedy decoding. The _ben_greedy video looks to me to have decoded correctly. As JECULAI mentions, I believe the greedy decode code is correct, as I can get valid output from it. Please ensure you use an augmentation method, such as Gaussian Noise or Just Counter. Thanks,
I have shared a Progressive Transformer checkpoint at https://www.dropbox.com/s/l4xmnybp7luz0l3/PreTrained_PTSLP_Model.ckpt?dl=0. This model has a size of I have updated the code to enable checkpoint loading, so please pull the latest version. To initialise a model from this checkpoint, pass the Please let me know if this checkpoint cannot be downloaded or loaded correctly. Thanks,
The dataset is the PHOENIX14T dataset, which contains 7096 training, 519 dev and 642 test sequences. Further info can be found at https://openaccess.thecvf.com/content_cvpr_2018/papers/Camgoz_Neural_Sign_Language_CVPR_2018_paper.pdf.
The pre-trained model link is invalid for me (ERR_CONNECTION_TIMED_OUT). Can you share the log files (and pre-trained models) of the other two augmentations? And can you just upload the ckpt file to GitHub?
I fine-tuned the pre-trained model on my data, but the results still look bad; the DTW score is never less than 12. And I processed the data the correct way: it is divided by 3 * length_of_shoulder for normalisation.
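The shoulder-based normalisation described here can be sketched like this. The shoulder joint indices below default to the OpenPose BODY_25 values (2 and 5); whether those indices match your sliced joint layout is an assumption you should verify against your own preprocessing.

```python
import numpy as np

def normalise_skeleton(joints, r_shoulder=2, l_shoulder=5):
    """Scale a (num_joints, dims) skeleton by 3 * shoulder length.

    r_shoulder / l_shoulder are assumed OpenPose BODY_25 shoulder
    indices -- adjust them to match your own joint ordering.
    """
    shoulder_len = np.linalg.norm(joints[r_shoulder] - joints[l_shoulder])
    return joints / (3.0 * shoulder_len)
```

Normalising every sequence by the same body-relative length makes skeletons from different signers (and camera distances) comparable before training.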
@Eddie-Hwang Hi, Euijun, do you get plausible results now? Can you share some info about your training or testing?
@JECULAI I tested the implementation with joint values including facial landmarks, so the DTW score is completely different from what is in the train log. I hope the author releases the back-translation code sooner or later.
@Eddie-Hwang Hi, Euijun, what does the desired output look like? Is it the output of inference? Can you show me? I cannot get a decent sign pose via inference at all. When I test skeletons that come from training (not inference), the DTW score is about 2 or less, and those skeletons seem to match the output shown in the paper (edge details and noise filtered out). The rendered output of inference can be seen at this link: https://www.yuque.com/docs/share/b9dc10ba-a7bc-43b1-8164-037ed9f606b8.
Hi, Thanks!
@cripac-sjx Hi, Xinjian, you can refer to https://github.com/gopeith/SignLanguageProcessing for more details.
Thanks! I extracted 137 keypoints for each frame using OpenPose, including "pose_keypoints_2d", "face_keypoints_2d", "hand_right_keypoints_2d" and "hand_left_keypoints_2d". But the examples provide 150 values per frame; what's the difference between them?
"pose_keypoints_2d" contributes 8 joints, while "hand_right_keypoints_2d" and "hand_left_keypoints_2d" each contribute 21 keypoints. Combining them all, we get 50 joints, so the feature dimension is 50 * 3 = 150.
Why do I extract 25 keypoints in "pose_keypoints_2d"?
You should use only the joints of the upper body trunk; there are 8 of them.
Got it! Thanks a lot! |
Excuse me, how do you extract the 8 keypoints of the upper body?
Slice the whole extracted keypoints list.
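The slicing discussed above can be sketched as follows. OpenPose emits each keypoint group as a flat [x, y, confidence] list, and the BODY_25 model gives 25 body keypoints; the assumption here is that the "upper body trunk" joints are the first 8 entries of that list.

```python
def select_50_joints(pose, left_hand, right_hand):
    """Build the 50-joint set from raw OpenPose 2D output.

    pose: 25 body keypoints as a flat [x, y, confidence] list (75 values);
    each hand: 21 keypoints (63 values). Keeps only the first 8 body
    joints (assumed to be the upper trunk), so 8 + 21 + 21 = 50 joints.
    """
    upper_body = pose[:8 * 3]                    # first 8 body joints
    return upper_body + left_hand + right_hand   # 50 joints * 3 values
```

Note that the third value per joint here is OpenPose's confidence, not a z coordinate; the 3D lifting step (3DposeEstimator) is what turns these into actual x, y, z triples.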
Hi cripac-sjx, just a reminder that the 150 joints required for the base model are 2D OpenPose coordinates lifted to 3D using the Inverse Kinematics approach at https://github.com/gopeith/SignLanguageProcessing under 3DposeEstimator. If you just use 2D OpenPose coordinates (resulting in 100 values: 50 joints * 2), you will need to change the plot_videos function in the code to animate OpenPose skeletons, and set trg_size to 100 in the config file.
Remember that OpenPose only provides 2D joints, so if you use raw OpenPose coordinates, the feature dimension is actually 100 (50 joints * 2).
Thanks for your patience. Why is the length of the ".skels" files in the examples provided in "./Data/tmp" divisible by 151 instead of 150?
The last value of every 151 is the counter (timing) number.
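So a line of a .skels file can be split into frames of 151 values, where each frame is 150 joint values (50 joints * 3) followed by its counter. A minimal parsing sketch, assuming one whitespace-separated sequence per line:

```python
def parse_skels_line(line):
    """Parse one .skels line into (joints, counter) pairs per frame.

    Each frame is 151 floats: 150 joint values (50 joints * 3)
    followed by the counter / timing number.
    """
    values = [float(v) for v in line.split()]
    assert len(values) % 151 == 0, "line length must be divisible by 151"
    frames = [values[i:i + 151] for i in range(0, len(values), 151)]
    return [(f[:150], f[150]) for f in frames]
```

If you build your own data with only 150 values per frame, you either append a counter yourself or change the target size in the config, which is what the questions below are about.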
All, Glad you've had more success with this implementation code. Pre-processed Phoenix14T data can now be requested via email at b.saunders@surrey.ac.uk if required. Thanks,
Hello Ben, I used the 3D pose data provided by you and the recommended settings as discussed above and on the main page. However, I wasn't able to reproduce the results. I obtain the best checkpoint at step 80, with a DTW of around 11.9 as the best on the validation set, in contrast to step 100000 and a DTW of 10 as seen in your log files. I am not sure why I cannot reproduce the results. The test video plots are not good either; I see the same plot in almost all the videos. On the other hand, if I use the checkpoint provided by you, the test videos are much better and quite decent. I am not sure why the training does not work as well, given that the checkpoint was obtained using the same data and the same settings. Can you please suggest what I could be missing (FYI: I set Gaussian noise to true and the noise rate to 5 while training)? Thanks,
@Tejaswini2612: Have you tried changing the validation_freq in the config file? I was able to reach 10 DTW by changing it to 10000.
@clviegas Can you show me some of your best results? I have also trained models that can reach 10 DTW (not exactly 10, but less than 11), but the output skeletons do not meet my expectations.
Hi, I changed validation_freq = 10000, gaussian_noise = True, noise_rate = 5. I am using the pre-trained model checkpoint on the preprocessed dataset provided by Ben, but I am getting: Best validation result at step 90: 12.53 DTW. Is there any way to achieve better results? Thanks,
Hi Eddie, Is it possible to share your code base? Thanks,
Hi, firstly I concatenate the first 8 points of "pose_keypoints_2d" with "hand_left_keypoints_2d" and "hand_right_keypoints_2d", then convert them to 3D keypoints with "demo.py" in "3DposeEstimator", but this results in random, messy ground-truth poses when visualised. Can you tell me what went wrong? Thanks!
@JECULAI Please find attached a validation video produced after training for 310000 steps. As mentioned before, when you change the validation frequency, the model keeps training for longer. WIND_SCHWACH_MAESSIG_11_37.mp4
Hi @Eddie-Hwang, I also find it strange that in the greedy_decode function the ground-truth counter values and the first ground-truth frame are fed to the network. Are your findings the same, or did you come to a different conclusion? P.S.: If this is the case, then we couldn't do real inference based solely on text, right? Looking forward to the answer, thanks.
Hi,
I'm glad you found the same problem as me.
This is a kind of weakly supervised decoding and cannot be used in a real-world setting (it only works in a lab environment).
It is unclear whether the author uses the same decoding method in the subsequent studies.
Best,
Eddie
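The concern raised in this exchange can be illustrated with a sketch of such a decoding loop. This is hypothetical illustrative code, not the repository's greedy_decode: the point is only that the first frame is seeded from ground truth and each predicted frame's counter is overwritten with the ground-truth counter, so this decoding needs information that would not exist for text-only inference.

```python
def greedy_decode(model, src, gt_first_frame, gt_counters, max_len):
    """Sketch of a counter-conditioned greedy decode (illustrative only).

    gt_first_frame / gt_counters come from the ground truth -- the
    weak-supervision issue discussed above: real text-only inference
    would have to predict the counter itself.
    """
    outputs = [gt_first_frame]                 # seeded with the GT first frame
    for step in range(1, max_len):
        frame = model(src, outputs)            # predict the next joint frame
        frame[-1] = gt_counters[step]          # GT counter leaks into decoding
        outputs.append(frame)
    return outputs
```

Under this scheme the model's joint predictions are autoregressive, but the sequence timing is supplied externally, which is why results can look far better than a fully self-contained decode would.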
Hi, I have met the same question as you; did you deal with it?
@Eddie-Hwang Yes, I'm also wondering whether the same decoding is used in the subsequent papers. When you say that it only works in a lab environment, you are saying that we need the GT counter and first frame in order to make a prediction, right? Best regards,
Hi, how did you synthesise such perfect results? Did you make any changes to the code, and can you share them? Thanks!
I have met the same questions as you; did you fix it? Thanks
Hi JECULAI, could you tell me how to get the timing number? I processed my own dataset and got 150 points, but I have no idea about the 151st. How do I get it, or how do I change the program input to 150 points?
@BenSaunders27 I loaded your shared checkpoint, but the predicted sign pose is not accurate enough, like this: REGEN_SCHNEE_REGION_10_79_h264.mp4. The data is the test data you provided in the folder Data/tmp. Is this result normal? Thanks
Greetings, pals. Kind regards to all, and thanks for sharing your knowledge.
How do you add facial keypoints to the network? How do you deal with the face in the 2D-to-3D preprocessing? I have problems at this step; please give me some advice.
@jianzfb @JECULAI The checkpoint file is no longer available for download. If you have saved copies, could you send me one? Thank you very much. fangsen2024@gmai.com
@jianzfb @BenSaunders27 @JECULAI Can you please share the pre-trained model at ruchisharma11448000@gmail.com?
@thiagomcoutinho @jianzfb @Tejaswini2612 @clviegas Please share the pretrained model if you have stored it somewhere. Please help; it is really urgent.
I have tried several times, but the generated skeleton always looks weird. Can anyone help figure this out?