Questions about conditional generation #61

Open
AI-Guru opened this issue Apr 8, 2023 · 2 comments


@AI-Guru
Contributor

AI-Guru commented Apr 8, 2023

Hi!

I have worked with unconditional generation using this fine repo. It is a lot of fun! I will do latent diffusion next. I am already looking forward to it.

Text conditional generation promises a lot of fun. I have a few questions.

  • The conditional section of the README says "Text conditioning, one element per batch". Does this mean "one text per waveform", and thus "a batch of texts for a batch of waveforms", rather than "one text for a batch of waveforms"?

  • I believe latent diffusion and text conditioning to be orthogonal. Is it safe to assume that DiffuserAE would work with text conditioning by just adding the right kwargs?

  • What would be necessary in order to replace the T5 embeddings with something else?

  • What would be the consequences of extending the number of tokens for T5?

This is so cool!

Best,
Tristan

@flavioschneider
Member

  1. That's correct
  2. Yes
  3. You'd have to use use_text_conditioning=False and provide your own embedding with embedding=.... See here if you want to make your own plugin for the UNet
  4. With more tokens, each sequence at each layer in the UNet would have to cross-attend to a longer embedding. This would be somewhat slower, depending on how many tokens you add, but the embedding could carry more information for the UNet.
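
To make point 4 a bit more concrete, here is a rough back-of-the-envelope sketch. It is not code from this repo. It only counts the two large matrix products in a single cross-attention step, ignoring projections, softmax, and multiple heads. The function name `cross_attention_cost` and the sizes used (`query_len = 1024`, `features = 768`, 64 vs. 128 tokens) are assumptions chosen for illustration.

```python
def cross_attention_cost(query_len, num_tokens, features):
    """Rough multiply-accumulate count for one cross-attention step.

    Counts only the two big matrix products:
      scores = Q @ K^T               -> query_len * num_tokens * features
      output = softmax(scores) @ V   -> query_len * num_tokens * features
    Projections and the softmax itself are ignored.
    """
    score_macs = query_len * num_tokens * features
    output_macs = query_len * num_tokens * features
    return score_macs + output_macs


# Hypothetical sizes, chosen only for illustration.
query_len = 1024  # length of the UNet feature sequence at some layer
features = 768    # embedding width (assumed here; T5-base uses 768)

base = cross_attention_cost(query_len, num_tokens=64, features=features)
longer = cross_attention_cost(query_len, num_tokens=128, features=features)

# Ratio of the two estimates: this part of the cost grows linearly with tokens.
print(longer / base)
```

Under these assumptions, the attention cost at each cross-attending layer grows linearly with the number of text tokens, which matches the "a bit slower" expectation above. How much it matters overall depends on how many layers have cross-attention enabled and how large the rest of the UNet is.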

@SuperiorDtj

The number of parameters in the text-conditional model is only 562M, rather than the 857M reported in the Moûsai paper. Is there any extra config needed for the text-conditional model?
