What is this? I cannot find it in the paper. According to the code, out_mean looks like the mean of $q(x_T|x_0)$, i.e. $\sqrt{\bar{\alpha}_T} x_0$ from the diffusion forward procedure, and out_mean ** 2 should then be $\bar{\alpha}_T x_0^2$. Also, there seem to be no learnable params in the compute graph of tT_loss. I wonder what this term is for, what it means, and where it comes from?
As for the comment from @yjc4 in #17, I think that term doesn't explain these issues, because it has obviously been dropped between step 1 and step 2 of equation (17) in the paper. Please give me some hints. Thanks.
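For context, a minimal numpy sketch of the forward-process mean the question refers to (the function name `q_mean` and all values are illustrative, not the repo's actual API): since $q(x_T|x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_T} x_0, (1-\bar{\alpha}_T) I)$, squaring the mean and averaging gives $\bar{\alpha}_T$ times the mean squared embedding.

```python
import numpy as np

# Illustrative sketch: the Gaussian forward process gives
#   q(x_T | x_0) = N(sqrt(alpha_bar_T) * x_0, (1 - alpha_bar_T) * I),
# so the mean of x_T given x_0 is sqrt(alpha_bar_T) * x_0.
def q_mean(x_start, alpha_bar_T):
    return np.sqrt(alpha_bar_T) * x_start

x0 = np.array([0.5, -1.0, 2.0])   # a toy embedding vector
alpha_bar_T = 1e-3                # small for large T; illustrative value

out_mean = q_mean(x0, alpha_bar_T)

# tT_loss = mean(out_mean ** 2) = alpha_bar_T * mean(x0 ** 2)
tT_loss = np.mean(out_mean ** 2)
assert np.isclose(tT_loss, alpha_bar_T * np.mean(x0 ** 2))
```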
Hi,
Yes, tT_loss does not pass through the transformer layers, but there are still learnable params, i.e. the params of the word embedding (from x_start). We can regard it as a kind of regularization, so in Eq. 17 we move it into $R(||x_0||^2)$.
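The regularization reading above can be checked with a small numpy sketch (assuming x_0 stands for the learned word embedding; all names and values are illustrative): since tT_loss(x_0) = $\bar{\alpha}_T \cdot \mathrm{mean}(x_0^2)$, its gradient with respect to the embedding is nonzero and pulls the embedding norm toward zero, exactly like an L2 penalty.

```python
import numpy as np

# tT_loss(x0) = a * mean(x0 ** 2), with a = alpha_bar_T.
# Analytic gradient: d tT_loss / d x0 = 2 * a * x0 / n,
# i.e. a weight-decay-style pull of the embedding toward 0.
a = 1e-3                          # alpha_bar_T, illustrative value
x0 = np.array([0.5, -1.0, 2.0])   # a toy embedding vector
n = x0.size

def tT_loss(v):
    return a * np.mean(v ** 2)

analytic_grad = 2 * a * x0 / n

# Finite-difference check that the gradient w.r.t. the embedding is nonzero.
eps = 1e-6
num_grad = np.array([
    (tT_loss(x0 + eps * np.eye(n)[i]) - tT_loss(x0 - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)
])
assert np.allclose(analytic_grad, num_grad, atol=1e-8)
```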
I am still confused about issue #17. The content of that issue is duplicated as follows:
There is a tT_loss term in the final loss, computed at DiffuSeq/diffuseq/gaussian_diffusion.py, lines 629 to 630 (commit 901f860).