Transition between generated gesture #1

Open
YoungSeng opened this issue May 9, 2023 · 2 comments

@YoungSeng (Owner)

The segments we trained on are all 4 s long, and it is difficult to generalize to arbitrary-length gestures by positional encoding alone. MDM-based models that need time-awareness (arbitrarily long inference) require smooth transitions between the generated sequences. The following practices may be useful references:

  1. Our approach adds seed poses for smooth transitions.
  2. MDM's follow-up work PriorMDM uses DoubleTake for long motion generation.
  3. EDGE enforces temporal consistency across multiple sequences.
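A rough sketch of how a seed-pose overlap can be used to smooth the junction between two consecutive segments, via a linear crossfade over the overlapping frames. The function name, shapes, and crossfade weights here are my own illustration, not the actual code of any of the repos above:

```python
import numpy as np

def blend_segments(prev_seg, next_seg, n_seed):
    """Crossfade the last n_seed frames of prev_seg into the first
    n_seed frames of next_seg. Frames are on the last axis; earlier
    axes (batch, joints, ...) are broadcast unchanged."""
    out = next_seg.copy()
    for j in range(n_seed):
        w = (j + 1) / (n_seed + 1)  # weight of the new segment, rising with j
        out[..., j] = (prev_seg[..., -n_seed + j] * (1.0 - w)
                       + next_seg[..., j] * w)
    return out

# Toy usage: 2 feature channels, 6 frames per segment, 3 overlapping frames.
prev = np.ones((2, 6))
nxt = np.zeros((2, 6))
blended = blend_segments(prev, nxt, n_seed=3)
print(blended[0])  # first 3 frames fade 0.75 -> 0.5 -> 0.25, rest stay 0
```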
@sh-taheri commented Oct 25, 2023

Hi, thanks for the great work!

Regarding your approach:

I am wondering if this is a bug in sample.py when smoothing the transitions here:

As you have commented yourself, the shape of the variable last_poses is (1, model.njoints, 1, args.n_seed), so len(last_poses) is always 1. I think len(last_poses) should be replaced with np.size(last_poses, axis=-1), which is args.n_seed (30 frames by default). This way it blends the first frames of the new prediction with the last frames of the previous prediction, something like this:

```python
n = np.size(last_poses, axis=-1)  # args.n_seed frames
for j in range(n):
    prev = last_poses[..., j]     # j-th of the previous segment's last n frames
    nxt = sample[..., j]          # j-th of the new segment's first frames
    sample[..., j] = prev * (n - j) / (n + 1) + nxt * (j + 1) / (n + 1)
```
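For concreteness, a tiny self-contained check of the len() vs. np.size() distinction, with made-up joint counts rather than the repo's actual tensors:

```python
import numpy as np

# Toy tensor with the shape described above: (1, njoints, 1, n_seed)
last_poses = np.zeros((1, 47, 1, 30))

print(len(last_poses))               # 1  -- len() only sees the batch axis
print(np.size(last_poses, axis=-1))  # 30 -- the number of seed frames
```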

Am I right? Would appreciate your feedback.
Thanks a lot

@YoungSeng (Owner, Author)

Yes, when I reproduced it later I noticed there was a minor problem in this region, but it didn't seem to have much effect on the results. Also:

  1. The length of last_poses is not 1 but n_seed: in the shape (1, model.njoints, 1, n_seed), the first 1 is the batch size and the second 1 is an expanded dimension with no real meaning.
  2. The follow-up DiffuseStyleGesture+ fixed this, see: here.
