
Paper & implementation differences #6

Open
man-sean opened this issue Feb 5, 2023 · 4 comments

Comments

man-sean commented Feb 5, 2023

Hi,
There are a few differences between the paper and this repository, and it would be wonderful if you could clarify the reasons behind them:

  1. The Gaussian-noise experiments reported in the paper use sigma_y=0.05, and indeed the config files set config['noise']['sigma']=0.05.
    But while the images are stretched from [0,1] to [-1,1], sigma is left unchanged – meaning that in practice the noise is added with std sigma/2 relative to the [0,1] range, i.e. y_n is cleaner than the settings reported in the paper.
    This can easily be checked by computing torch.std(y - y_n) after y and y_n are created in sample_condition.py (see the sketch below).
  2. The paper defines the step-size scalar as a constant divided by the norm of the gradient (Appendix C.2), meaning that the gradient is always normalized before scaling.
    In the code, the constant is defined in config['conditioning']['params']['scale'] and used in PosteriorSampling.conditioning() to scale the gradient, but the gradient is never normalized in the first place (e.g. in PosteriorSampling.grad_and_value()).
    Adding the gradient normalization seems to break the method.
  3. For the Gaussian FFHQ-SRx4 case, Appendix D.1 defines the scale as 1.0, but configs/super_resolution_config.yaml uses 0.3.

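For reference, here is a minimal, self-contained sketch of the check in (1). It assumes an identity forward operator and illustrative variable names rather than the repository's actual pipeline, but it shows the scaling issue in isolation:

```python
# Sketch of point (1): adding noise with std sigma in the [-1, 1] range
# corresponds to std sigma/2 once measured back in the [0, 1] range.
import torch

sigma = 0.05                       # config['noise']['sigma']
x = torch.rand(1, 3, 256, 256)     # stand-in image in [0, 1]

x_scaled = 2 * x - 1               # stretched to [-1, 1]
y = x_scaled                       # identity operator for simplicity
y_n = y + sigma * torch.randn_like(y)

print(torch.std(y_n - y))                       # ~0.05 in the [-1, 1] range
print(torch.std((y_n + 1) / 2 - (y + 1) / 2))   # ~0.025 back in the [0, 1] range
```
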
Thank you for your time and effort!

@berthyf96

For (2), I think the normalization factor ends up being applied through the gradient itself. If you look at ConditioningMethod.grad_and_value (here), they take the gradient of the norm, not the squared norm, and by the chain rule that already contributes a $1/\|y - \mathcal{A}(\widehat{x}_0)\|$ factor.
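
To make that concrete, here is a toy autograd check (a random linear operator standing in for the forward model, not the repository's code) verifying that the gradient of the plain norm equals the gradient of the squared norm divided by $2\|y - Ax\|$, so the $1/\|\cdot\|$ factor from Appendix C.2 is already baked in:

```python
# Chain rule check: d||r||/dx = (d||r||^2/dx) / (2 ||r||), with r = y - A x.
import torch

torch.manual_seed(0)
A = torch.randn(8, 16)
y = torch.randn(8)

x = torch.randn(16, requires_grad=True)
residual = y - A @ x
grad_of_norm = torch.autograd.grad(torch.linalg.norm(residual), x)[0]

x2 = x.detach().clone().requires_grad_(True)
residual2 = y - A @ x2
grad_of_sq_norm = torch.autograd.grad((residual2 ** 2).sum(), x2)[0]
rescaled = grad_of_sq_norm / (2 * torch.linalg.norm(residual2).detach())

print(torch.allclose(grad_of_norm, rescaled, atol=1e-6))  # True
```

If that reading is right, the config scale already plays the role of the constant in Appendix C.2, and no extra normalization is needed.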

I believe there's another difference between Alg. 1 of the paper and the code. In EpsilonXMeanProcessor.predict_xstart (here), the coefficient applied to the score-model output is different from the coefficient in line 4 of Alg. 1. In the paper, the coefficient is $(1-\bar{\alpha}_i)/\sqrt{\bar{\alpha}_i}$, but in the code, it is $-\sqrt{1/\bar{\alpha}_i - 1}$.


claroche-r commented Mar 2, 2023

@berthyf96, for your second point regarding "EpsilonXMeanProcessor.predict_xstart", I also did not understand the difference until I realized that the score function $\widehat{s}(x_t)$ associated with a noise predictor $\epsilon_\theta(x_t)$ is:
$$\widehat{s}(x_t) = \nabla_{x_t} \log p_\theta(x_t) = - \frac{1}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t)$$
See Equation (11) here.
Substituting this into the expression for $\widehat{x}_0$ in Alg. 1 gives exactly what is implemented.
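
Spelling out the substitution for anyone reading along: plugging $\widehat{s}(x_i) = -\epsilon_\theta(x_i)/\sqrt{1-\bar{\alpha}_i}$ into line 4 of Alg. 1 gives
$$\widehat{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_i}}\left(x_i + (1-\bar{\alpha}_i)\,\widehat{s}(x_i)\right) = \frac{1}{\sqrt{\bar{\alpha}_i}}\,x_i - \sqrt{\frac{1-\bar{\alpha}_i}{\bar{\alpha}_i}}\;\epsilon_\theta(x_i) = \frac{1}{\sqrt{\bar{\alpha}_i}}\,x_i - \sqrt{\frac{1}{\bar{\alpha}_i}-1}\;\epsilon_\theta(x_i),$$
which matches the coefficients discussed above for EpsilonXMeanProcessor.predict_xstart.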

@berthyf96

@claroche-r thanks so much for clarifying that!

@Mally-cj

thank you!
