
Why is a negative_slope=5 for LeakyReLU introduced here? #71

Open · fjiang9 opened this issue Jan 15, 2021 · 1 comment

fjiang9 commented Jan 15, 2021

pulse/PULSE.py

Line 44 in 40cacb9

latent_out = torch.nn.LeakyReLU(5)(mapping(latent))


MikeLasz commented Oct 5, 2023

Hi @fjiang9,
I stumbled across the same question. What I found to be most likely related to it is Section 2.4 in the appendix of the paper:

> StyleGAN begins with a uniform distribution on $S^{511} \subset \mathbb{R}^{512}$, which is
> pushed forward by the mapping network to a transformed probability distribution
> over $\mathbb{R}^{512}$. Therefore, another requirement to ensure that $S([v_1 , ..., v_{18}], \eta)$ is a
> realistic image is that each $v_i$ is sampled from this pushforward distribution.
> While analyzing this distribution, we found that we could transform this back to a distribution on
> the unit sphere without the mapping network by simply applying a single linear layer with a leaky-ReLU activation–an entirely invertible transformation.
> We therefore inverted this function to obtain a sampling procedure for this
> distribution. First, we generate a latent w from $S^{511}$, and then apply the inverse of our transformation.

While I do not fully understand this paragraph, I believe the LeakyReLU(5) is here because it is the inverse of the LeakyReLU(0.2) used, for instance, here: a LeakyReLU with negative_slope $a$ is undone by a LeakyReLU with negative_slope $1/a$, and $1/0.2 = 5$.
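As a quick sanity check on that inverse claim (a minimal sketch, not code from this repo):

```python
import torch

lrelu_02 = torch.nn.LeakyReLU(negative_slope=0.2)
lrelu_5 = torch.nn.LeakyReLU(negative_slope=5)

x = torch.randn(1000)

# For x >= 0 both activations are the identity; for x < 0 we get
# 5 * (0.2 * x) = x, so LeakyReLU(5) exactly undoes LeakyReLU(0.2)
# (and vice versa).
assert torch.allclose(lrelu_5(lrelu_02(x)), x)
assert torch.allclose(lrelu_02(lrelu_5(x)), x)
```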

And the reason they are doing this seems to be that samples

$$\text{lReLU}_{0.2} \biggl( \text{avg} \bigl( \text{lReLU}_{5}(w) \bigr) + w_{\text{input}} * \text{std} \bigl( \text{lReLU}_{5}(w) \bigr) \biggr)$$

are closer to samples from the actual distribution of the mapping network. According to the cited text in the paper, however, this seems to be more of an empirical observation.
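For concreteness, here is a minimal sketch of how I read that sampling procedure. The mapping network is replaced by a dummy linear layer, and the names `latent`, `latent_out`, and `w_input` are placeholders that only loosely follow PULSE.py:

```python
import torch

lrelu_02 = torch.nn.LeakyReLU(negative_slope=0.2)
lrelu_5 = torch.nn.LeakyReLU(negative_slope=5)

# Dummy stand-in for the StyleGAN mapping network (NOT the real one).
mapping = torch.nn.Linear(512, 512)

with torch.no_grad():
    # Push many latents through the mapping network, then pull the result
    # back with LeakyReLU(5), the inverse of LeakyReLU(0.2), and fit
    # per-dimension statistics of that transformed distribution.
    latent = torch.randn(100_000, 512)
    latent_out = lrelu_5(mapping(latent))
    mean, std = latent_out.mean(0), latent_out.std(0)

# A new latent is then mapped into (approximate) W-space by rescaling with
# the fitted statistics and applying LeakyReLU(0.2), i.e. the formula above.
w_input = torch.randn(1, 512)
latent_in = lrelu_02(w_input * std + mean)
```

If that reading is right, the LeakyReLU(5) in PULSE.py is only there to fit the mean and std in the "unfolded" space where the transformation can be inverted.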

I hope this helps, and I would be happy about any further clarifications on this.

Best regards
Mike
