Question about tT_loss #18

Open

ccchobits opened this issue Jan 8, 2023 · 2 comments

Comments

ccchobits commented Jan 8, 2023

I am still confused about issue #17. The content of that issue has been duplicated as follows:

There is a tT_loss term in the final loss (DiffuSeq/diffuseq/gaussian_diffusion.py, lines 629 to 630 at commit 901f860):

```python
out_mean, _, _ = self.q_mean_variance(x_start, th.LongTensor([self.num_timesteps - 1]).to(x_start.device))
tT_loss = mean_flat(out_mean ** 2)
```

What is this? I cannot find it in the paper. According to the code, out_mean looks like the mean of $q(x_T \mid x_0)$ from the diffusion forward process, i.e. $\sqrt{\bar{\alpha}_T} x_0$, and out_mean ** 2 should then be $\bar{\alpha}_T ||x_0||^2$. Also, there seem to be no learnable params in the compute graph of tT_loss. I wonder what this term is for, what it means, and where it comes from.
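
For reference, here is a minimal sketch of what q_mean_variance computes under the usual DDPM parameterization, $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t} x_0, (1 - \bar{\alpha}_t) I)$. This is not the repository's exact code; the alphas_cumprod argument is an assumed precomputed schedule of $\bar{\alpha}_t$ values:

```python
import torch as th

def q_mean_variance(x_start, t, alphas_cumprod):
    """Mean and (log-)variance of q(x_t | x_0) for a standard DDPM
    forward process. alphas_cumprod is a 1-D tensor of alpha_bar_t
    values indexed by timestep (an assumed precomputed schedule)."""
    # Reshape alpha_bar_t so it broadcasts over x_start's trailing dims.
    alpha_bar = alphas_cumprod[t].view(-1, *([1] * (x_start.dim() - 1)))
    mean = th.sqrt(alpha_bar) * x_start    # sqrt(alpha_bar_t) * x_0
    variance = 1.0 - alpha_bar             # (1 - alpha_bar_t) * I
    log_variance = th.log(variance)
    return mean, variance, log_variance
```

With t = T - 1 (the last index), the returned mean is $\sqrt{\bar{\alpha}_T} x_0$, which is exactly the quantity being squared in tT_loss.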

As for the comment from @yjc4 in #17, I don't think it explains these issues, because that term has clearly been dropped between step 1 and step 2 of equation (17) in the paper. Please give me some hints. Thanks.

@summmeer
Collaborator

Hi,
Yes, tT_loss does not pass through the transformer layers, but there are still learnable params involved, namely the word-embedding parameters (which produce x_start). We can regard it as a kind of regularization, which is why in Eq. 17 we fold it into the $R(||x_0||^2)$ term.
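
A minimal, self-contained sketch of this point (the toy sizes and the alpha_bar_T value here are assumptions for illustration, not DiffuSeq's actual configuration): because x_start is produced by the embedding table, the gradient of tT_loss flows into the embedding weights even though no transformer layer is involved.

```python
import torch as th
import torch.nn as nn

vocab_size, emb_dim, alpha_bar_T = 100, 16, 1e-4  # assumed toy values
embedding = nn.Embedding(vocab_size, emb_dim)

input_ids = th.tensor([[3, 7, 42]])
x_start = embedding(input_ids)                # x_0 = Emb(w), learnable

out_mean = (alpha_bar_T ** 0.5) * x_start     # mean of q(x_T | x_0)
tT_loss = (out_mean ** 2).mean()              # ~ alpha_bar_T * ||x_0||^2

tT_loss.backward()
# The embedding rows used above receive nonzero gradients:
print(embedding.weight.grad.abs().sum() > 0)  # tensor(True)
```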

@Dawn-LX

Dawn-LX commented Feb 21, 2023

> I am still confused about issue #17. […]

Just by the way, it's issue #16, not #17.
