
Paper & implementation differences #6

Open
man-sean opened this issue Feb 5, 2023 · 4 comments

Comments

man-sean commented Feb 5, 2023

Hi,
There are a few differences between the paper and this repository, and it would be wonderful if you could clarify the reasons behind them:

  1. The Gaussian-noise experiments reported in the paper use sigma_y=0.05, and indeed the config files set config['noise']['sigma']=0.05.
    But while the images are stretched from [0,1] to [-1,1], sigma is left unchanged – meaning that in practice the noise is added with std sigma/2 relative to the [0,1] range, i.e. y_n is cleaner than the settings reported in the paper.
    This can easily be checked by computing torch.std(y - y_n) after y and y_n are created in sample_condition.py (see the sketch below).
  2. The paper defines the step-size scalar as a constant divided by the norm of the gradient (Appendix C.2), meaning that the gradient is always normalized before scaling.
    In the code, the constant is defined in config['conditioning']['params']['scale'] and used in PosteriorSampling.conditioning() to scale the gradient, but the gradient is never normalized in the first place (e.g. in PosteriorSampling.grad_and_value()).
    Adding the gradient normalization seems to break the method.
  3. For the Gaussian FFHQ-SRx4 case, Appendix D.1 defines the scale as 1.0, but configs/super_resolution_config.yaml uses 0.3.

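For reference, here is a minimal, self-contained sketch of the check in (1). It assumes an identity forward operator and illustrative variable names rather than the repository's actual pipeline, but it shows the scaling issue in isolation:

```python
# Sketch of point (1): adding noise with std sigma in the [-1, 1] range
# corresponds to std sigma/2 once measured back in the [0, 1] range.
import torch

sigma = 0.05                       # config['noise']['sigma']
x = torch.rand(1, 3, 256, 256)     # stand-in image in [0, 1]

x_scaled = 2 * x - 1               # stretched to [-1, 1]
y = x_scaled                       # identity operator for simplicity
y_n = y + sigma * torch.randn_like(y)

print(torch.std(y_n - y))                       # ~0.05 in the [-1, 1] range
print(torch.std((y_n + 1) / 2 - (y + 1) / 2))   # ~0.025 back in the [0, 1] range
```
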
Thank you for your time and effort!

@berthyf96

For (2), I think the normalization factor ends up being applied through the gradient itself. If you look at ConditioningMethod.grad_and_value (here), they take the gradient of the norm, not the squared norm, and by the chain rule that already contributes a $1/\|y - \mathcal{A}(\widehat{x}_0)\|$ factor.
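
To make that concrete, here is a toy autograd check (a random linear operator standing in for the forward model, not the repository's code) verifying that the gradient of the plain norm equals the gradient of the squared norm divided by $2\|y - Ax\|$, so the $1/\|\cdot\|$ factor from Appendix C.2 is already baked in:

```python
# Chain rule check: d||r||/dx = (d||r||^2/dx) / (2 ||r||), with r = y - A x.
import torch

torch.manual_seed(0)
A = torch.randn(8, 16)
y = torch.randn(8)

x = torch.randn(16, requires_grad=True)
residual = y - A @ x
grad_of_norm = torch.autograd.grad(torch.linalg.norm(residual), x)[0]

x2 = x.detach().clone().requires_grad_(True)
residual2 = y - A @ x2
grad_of_sq_norm = torch.autograd.grad((residual2 ** 2).sum(), x2)[0]
rescaled = grad_of_sq_norm / (2 * torch.linalg.norm(residual2).detach())

print(torch.allclose(grad_of_norm, rescaled, atol=1e-6))  # True
```

If that reading is right, the config scale already plays the role of the constant in Appendix C.2, and no extra normalization is needed.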

I believe there's another difference between Alg. 1 of the paper and the code. In EpsilonXMeanProcessor.predict_xstart (here), the coefficient applied to the score-model output is different from the coefficient in line 4 of Alg. 1. In the paper, the coefficient is $(1-\bar{\alpha}_i)/\sqrt{\bar{\alpha}_i}$, but in the code, it is $-\sqrt{1/\bar{\alpha}_i - 1}$.


claroche-r commented Mar 2, 2023

@berthyf96, for your second point regarding "EpsilonXMeanProcessor.predict_xstart", I also did not understand the difference until I realized that the score function $\widehat{s}(x_t)$ associated with a noise predictor $\epsilon_\theta(x_t)$ is:
$$\widehat{s}(x_t) = \nabla_{x_t} \log p_\theta(x_t) = - \frac{1}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t)$$
See Equation (11) here.
Substituting this into the expression for $\widehat{x}_0$ in Alg. 1 gives exactly what is implemented.
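
Spelling out the substitution for anyone reading along: plugging $\widehat{s}(x_i) = -\epsilon_\theta(x_i)/\sqrt{1-\bar{\alpha}_i}$ into line 4 of Alg. 1 gives
$$\widehat{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_i}}\left(x_i + (1-\bar{\alpha}_i)\,\widehat{s}(x_i)\right) = \frac{1}{\sqrt{\bar{\alpha}_i}}\,x_i - \sqrt{\frac{1-\bar{\alpha}_i}{\bar{\alpha}_i}}\;\epsilon_\theta(x_i) = \frac{1}{\sqrt{\bar{\alpha}_i}}\,x_i - \sqrt{\frac{1}{\bar{\alpha}_i}-1}\;\epsilon_\theta(x_i),$$
which matches the coefficients discussed above for EpsilonXMeanProcessor.predict_xstart.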

@berthyf96

@claroche-r thanks so much for clarifying that!

@Mally-cj

thank you!
