What is this? I cannot find it in the paper. According to the code, out_mean looks like the mean of $q(x_T|x_0)$, i.e. $\sqrt{\bar{\alpha}_T} x_0$ from the diffusion forward procedure, and out_mean ** 2 should then be $\bar{\alpha}_T x_0^2$. Also, there seem to be no learnable params in the compute graph of tT_loss. I wonder what this term is for, what it means, and where it comes from?
As for the comment from @yjc4 in #17, I think that term doesn't explain these issues, because it has obviously been dropped between step 1 and step 2 of equation (17) in the paper. Please give me some hints. Thanks.
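For context, a minimal numpy sketch of the forward-process mean the question refers to (the function name `q_mean` and all values are illustrative, not the repo's actual API): since $q(x_T|x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_T} x_0, (1-\bar{\alpha}_T) I)$, squaring the mean and averaging gives $\bar{\alpha}_T$ times the mean squared embedding.

```python
import numpy as np

# Illustrative sketch: the Gaussian forward process gives
#   q(x_T | x_0) = N(sqrt(alpha_bar_T) * x_0, (1 - alpha_bar_T) * I),
# so the mean of x_T given x_0 is sqrt(alpha_bar_T) * x_0.
def q_mean(x_start, alpha_bar_T):
    return np.sqrt(alpha_bar_T) * x_start

x0 = np.array([0.5, -1.0, 2.0])   # a toy embedding vector
alpha_bar_T = 1e-3                # small for large T; illustrative value

out_mean = q_mean(x0, alpha_bar_T)

# tT_loss = mean(out_mean ** 2) = alpha_bar_T * mean(x0 ** 2)
tT_loss = np.mean(out_mean ** 2)
assert np.isclose(tT_loss, alpha_bar_T * np.mean(x0 ** 2))
```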
Hi,
Yes, tT_loss does not pass through the transformer layers, but there are still learnable params, i.e. the params of the word embedding (from x_start). We can regard it as a kind of regularization, so in Eq. 17 we move it into $R(||x_0||^2)$.
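The regularization reading above can be checked with a small numpy sketch (assuming x_0 stands for the learned word embedding; all names and values are illustrative): since tT_loss(x_0) = $\bar{\alpha}_T \cdot \mathrm{mean}(x_0^2)$, its gradient with respect to the embedding is nonzero and pulls the embedding norm toward zero, exactly like an L2 penalty.

```python
import numpy as np

# tT_loss(x0) = a * mean(x0 ** 2), with a = alpha_bar_T.
# Analytic gradient: d tT_loss / d x0 = 2 * a * x0 / n,
# i.e. a weight-decay-style pull of the embedding toward 0.
a = 1e-3                          # alpha_bar_T, illustrative value
x0 = np.array([0.5, -1.0, 2.0])   # a toy embedding vector
n = x0.size

def tT_loss(v):
    return a * np.mean(v ** 2)

analytic_grad = 2 * a * x0 / n

# Finite-difference check that the gradient w.r.t. the embedding is nonzero.
eps = 1e-6
num_grad = np.array([
    (tT_loss(x0 + eps * np.eye(n)[i]) - tT_loss(x0 - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)
])
assert np.allclose(analytic_grad, num_grad, atol=1e-8)
```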
I am still confused about issue #17. The content of that issue is duplicated as follows:
There is a tT_loss term in the final loss, computed at DiffuSeq/diffuseq/gaussian_diffusion.py, lines 629 to 630 (commit 901f860).