Number of motions seems less than the real one #29

Open
rd20karim opened this issue Jul 25, 2023 · 4 comments

@rd20karim

Hello, I have a question regarding the method used to compute the number of motions. While investigating your MotionDataset and dataloader, it appears that the number of motions during training is counted as the total number of motion snippets across the training subset, divided by the batch size. I understand the reasoning behind this approach, but it leads to a significant reduction in the effective data size. Specifically, the number of motions in the training set was initially 23,384. After removing motions shorter than the window_size of 64 frames, it decreased to 20,942, and it was further reduced to 14,435 training motions by the aforementioned method.
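
For illustration, here is a minimal sketch of the counting behaviour I am describing (counting snippets as the number of valid window start positions is my assumption, and the names are illustrative rather than your exact code):

```python
import numpy as np

def reported_loader_length(motion_lengths, window_size=64, batch_size=128):
    """What len(train_loader) would report under snippet-based counting."""
    # Drop motions shorter than one window (23,384 -> 20,942 in my run).
    kept = [n for n in motion_lengths if n >= window_size]
    # Count snippets per motion; treating them as the number of valid window
    # start positions is an assumption, not necessarily the exact rule used.
    snippet_counts = [n - window_size for n in kept]
    total_snippets = int(np.cumsum([0] + snippet_counts)[-1])
    # The loader length is the total snippet count divided by the batch size,
    # which is much smaller than the raw number of motions in the split.
    return total_snippets // batch_size
```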

Your clarification on this behavior would be greatly appreciated. Thank you.

@EricGuo5513
Owner

EricGuo5513 commented Jul 25, 2023 via email

@rd20karim
Author

The final value of cumsum is the total number of motion snippets for a given split:

[screenshot: cumsum computation in the dataset code]

For example, when the dataloader is constructed for validation, the real number of motions is 1300, but displaying len(val_loader) shows 911:

[screenshot: len(val_loader) output]

The __len__ method of the batch sampler is called (line 240), which returns roughly 116698 / 128 ≈ 911; this is the value reported as the number of motions when training the VQ-VAE on HumanML3D:

[screenshot: the batch sampler's __len__ method]

Also, the total number of motions here is counted before the augmentation process, so the mirrored motions are not included.
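
For concreteness, the numbers line up as follows, assuming __len__ simply floor-divides the final cumsum value by the batch size:

```python
total_val_snippets = 116_698   # final cumsum value for the validation split
batch_size = 128

print(total_val_snippets // batch_size)  # -> 911, the value len(val_loader) shows
# versus the raw count of 1300 validation motions (mirrored motions excluded)
```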

@EricGuo5513
Owner

EricGuo5513 commented Jul 27, 2023 via email

@rd20karim
Author

Okay, thanks for clarifying. So you trained the autoencoder on individually sampled 64-frame snippets drawn from all the training data.
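
If so, I imagine the snippet lookup works roughly like the sketch below; the searchsorted-based mapping is my assumption, not necessarily your implementation:

```python
import numpy as np

def get_snippet(motions, cumsum, idx, window_size=64):
    """Map a flat snippet index back to a 64-frame window.

    `cumsum` is the running total of snippet counts per motion, as in the
    earlier sketch; this lookup is an assumption about how individual windows
    could be drawn, not necessarily the repository's code.
    """
    # Find which motion the flat index falls into, then the frame offset.
    motion_id = int(np.searchsorted(cumsum, idx, side="right")) - 1
    offset = idx - cumsum[motion_id]
    return motions[motion_id][offset: offset + window_size]
```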

Regarding your work on text generation using TM2T with HumanML3D, the preprocessing was limited to filtering out motions with fewer than 3 text descriptions, and the motion length was constrained to be between 40 and 200 frames. I wanted to know whether I am missing any other details. Is there any constraint on the maximum text length generated during inference or training?
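
To make sure I have the preprocessing right, this is the filter rule I have in mind (whether the bounds are inclusive is my guess):

```python
def keep_motion(num_descriptions, num_frames,
                min_descriptions=3, min_frames=40, max_frames=200):
    """Keep a motion only if it has enough text descriptions and its length
    falls inside the allowed frame range (thresholds restated from above)."""
    return (num_descriptions >= min_descriptions
            and min_frames <= num_frames <= max_frames)
```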

I was also curious about how the provided split was generated, because changing the split seems to significantly impact the overall performance.
