
PatchGAN loss calculated incorrectly (perhaps) #26

Open
alexander-soare opened this issue May 21, 2020 · 3 comments

Comments

@alexander-soare

Thanks for the great work @aitorzip
I think I've spotted a mistake in the way the GAN loss is calculated.

In the forward of your Discriminator class you average the activations first, and then calculate the loss on that single pooled value. I think you should instead calculate the per-neuron (per-patch) loss first, and then average the results. If you work through the math both ways, you will find the losses are not the same.
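For reference, here is a minimal sketch of the two orderings (tensor shapes and variable names are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

# Illustrative patch map from a PatchGAN-style discriminator: one score per
# spatial patch. Shapes and names here are hypothetical, not the repo's code.
patch_scores = torch.randn(4, 1, 16, 16)   # (N, C, H, W)
target_real = torch.ones(4, 1)

# Ordering used in the repo (roughly): global-average the patch activations
# first, then take a single MSE per image.
pooled = F.avg_pool2d(patch_scores, patch_scores.shape[2:]).view(4, -1)
loss_pool_then_mse = F.mse_loss(pooled, target_real)

# Ordering the PatchGAN formulation calls for: per-patch MSE, averaged after.
loss_mse_then_mean = F.mse_loss(patch_scores, torch.ones_like(patch_scores))

print(loss_pool_then_mse.item(), loss_mse_then_mean.item())  # generally differ
```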

I noticed you haven't committed since 2017 but if you see this and agree I'd be happy to make a PR.

@naga-karthik

@alexander-soare I was wondering whether it makes much of a difference in the results. Have you tried comparing the two?

@alexander-soare
Author

@naga-karthik it's been a while, but I think I remember concluding that the impact was equivalent to a scalar multiple of the gradient, so it would only affect the learning rate. Don't take my word for it though, as it's been a while.

@jbeck9

jbeck9 commented Jan 24, 2022

> @naga-karthik it's been a while, but I think I remember concluding that the impact was equivalent to a scalar multiple of the gradient, so it would only affect the learning rate. Don't take my word for it though, as it's been a while.

Late to the party here, but you are correct. This really needs to be changed. The current implementation misses the point of the PatchGAN paper entirely. If you are going to average the patches before taking the MSE, you might as well add a dense layer and make a global discriminator, because you have just discarded all the local information.

Unfortunately, this is not just a matter of scaling the learning rate. It definitely affects the gradient propagation.
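A quick way to see this (a toy check, not tied to the repo's training code): for MSE against a target t, pooling first gives (mean(d) − t)², while the per-patch loss gives mean((d − t)²) = (mean(d) − t)² + Var(d). That extra variance term means each patch's gradient depends on its own score rather than only on the mean:

```python
import torch

# Toy gradient check with hypothetical values, unrelated to the repo's code.
scores = torch.randn(8, requires_grad=True)   # 8 patch scores, real target = 1

# Pool first, then MSE: every patch receives the same gradient.
loss_avg_first = (scores.mean() - 1.0) ** 2
(grad_avg_first,) = torch.autograd.grad(loss_avg_first, scores)

# MSE per patch, then average: each patch's gradient depends on its own score.
scores2 = scores.detach().clone().requires_grad_(True)
loss_patch_first = ((scores2 - 1.0) ** 2).mean()
(grad_patch_first,) = torch.autograd.grad(loss_patch_first, scores2)

print(grad_avg_first)    # constant vector: 2 * (scores.mean() - 1) / 8
print(grad_patch_first)  # varies per element: 2 * (scores2 - 1) / 8
```

Since the two gradient fields are not scalar multiples of each other, no learning-rate adjustment can make the pooled version equivalent to the per-patch one.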
