Gradient flow #27
Comments
Thanks! We use two different optimizers for the autoencoder part and the discriminator part, respectively. Furthermore, both optimizers have different optimization steps, selected by the optimizer index that Lightning passes to the training step.
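For readers less familiar with this Lightning pattern, here is a minimal sketch of the `optimizer_idx` dispatch being described (older Lightning versions pass `optimizer_idx` directly to `training_step`). The class and the two loss methods are placeholders, not the repo's actual code; see `taming/models/vqgan.py` for that.

```python
import pytorch_lightning as pl

class TwoOptimizerSketch(pl.LightningModule):
    """Illustrative only: everything except training_step/optimizer_idx is a placeholder."""

    def training_step(self, batch, batch_idx, optimizer_idx):
        # Lightning calls training_step once per configured optimizer and
        # passes its index, so each branch is a different optimization step.
        if optimizer_idx == 0:
            # Autoencoder step: reconstruction loss plus an adversarial term
            # obtained by running the (un-detached) reconstructions through
            # the discriminator.
            return self.autoencoder_loss(batch)     # placeholder method
        if optimizer_idx == 1:
            # Discriminator step: real/fake logits computed on detached
            # reconstructions.
            return self.discriminator_loss(batch)   # placeholder method
```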
Thanks for getting back to me on this one. I understand the procedure you described, the optimizer index being owed to the fact that you use Lightning and all. However, I am still convinced that the critic receives gradients in the backward pass before the first optimizer's step, since its gradients are not disabled anywhere before that. That would mean that when it is the critic's turn to have its weights updated, it would do so with gradients both from the backward pass through it all the way down to the VQ-VAE and from its own backward pass in the second optimization stage. Unless I am missing something here (maybe Lightning does something behind the curtain), I would find this rather peculiar, since usually the critic's weights are frozen while it is just judging the generator's output...
That is not correct, since the reconstructions are detached from the computation graph before being passed to the discriminator:

```python
if cond is None:
    logits_real = self.discriminator(inputs.contiguous().detach())
    logits_fake = self.discriminator(reconstructions.contiguous().detach())
else:
    logits_real = self.discriminator(torch.cat((inputs.contiguous().detach(), cond), dim=1))
    logits_fake = self.discriminator(torch.cat((reconstructions.contiguous().detach(), cond), dim=1))
```

So the gradients just flow through the discriminator and then stop at the point where the inputs were detached; there is no need to freeze the model. (And of course, when training the VQ-VAE you cannot freeze the discriminator, since you want the gradients to flow up to the model.)
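A tiny, self-contained demonstration of that point, with toy modules standing in for the VQ-VAE and the discriminator (not the repo's code):

```python
import torch
import torch.nn as nn

generator = nn.Linear(4, 4)       # stand-in for the VQ-VAE
discriminator = nn.Linear(4, 1)   # stand-in for the patch discriminator

x = torch.randn(2, 4)
reconstructions = generator(x)

# Discriminator step: reconstructions are detached, so backprop stops at the
# detach point and the generator receives no gradients from this loss.
discriminator(reconstructions.detach()).mean().backward()
print(generator.weight.grad)              # None -> flow stopped at .detach()
print(discriminator.weight.grad)          # populated

# Autoencoder step: no detach, so gradients flow *through* the discriminator
# back into the generator. The discriminator's weights do pick up .grad here,
# but this step only calls the autoencoder's optimizer, and in Lightning's
# default loop the discriminator's gradients are zeroed again before its own
# update, so nothing stale is applied.
generator.zero_grad(); discriminator.zero_grad()
(-discriminator(generator(x)).mean()).backward()
print(generator.weight.grad is not None)  # True
```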
@rromb Simple question: in which order are the two optimizers called during training?
I was asking myself the same thing. From what I understand from the Lightning docs, they are called sequentially, one after another in turn, at least unless users choose to override this behaviour, which I did not find in this repo.
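In plain PyTorch, that default sequential behaviour amounts to something like the following runnable toy loop (all modules and losses here are illustrative stand-ins, not the repo's code):

```python
import torch
import torch.nn as nn

# Toy stand-ins so the loop is runnable end to end.
autoencoder = nn.Linear(4, 4)
discriminator = nn.Linear(4, 1)
opt_ae = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def training_step(batch, optimizer_idx):
    recon = autoencoder(batch)
    if optimizer_idx == 0:
        # Autoencoder turn: reconstruction loss plus adversarial term;
        # gradients flow through the discriminator into the autoencoder.
        return (batch - recon).pow(2).mean() - discriminator(recon).mean()
    else:
        # Discriminator turn: reconstructions detached, gradients stop there.
        return discriminator(recon.detach()).mean() - discriminator(batch).mean()

for batch in (torch.randn(8, 4) for _ in range(2)):
    # One batch, two optimizers, called one after another in turn.
    for optimizer_idx, opt in enumerate([opt_ae, opt_disc]):
        loss = training_step(batch, optimizer_idx)
        opt.zero_grad()   # clears any stray grads left over from the other turn
        loss.backward()
        opt.step()
```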
@CDitzel Umm, I see. Thank you for your answer :)
The code defining the two optimizers is in the VQGAN Lightning module, just in case someone has the same issue: taming-transformers/taming/models/vqgan.py, line 121 (commit 2426893).
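For context, the definition at that line follows Lightning's two-optimizer convention; a rough sketch of its shape is given below. The parameter groups, learning rate, and betas shown here are paraphrased and may differ from the repo, so check the linked line for the authoritative code.

```python
import torch

def configure_optimizers(self):
    lr = self.learning_rate
    # One Adam for the autoencoder side (encoder, decoder, codebook, conv layers) ...
    opt_ae = torch.optim.Adam(
        list(self.encoder.parameters())
        + list(self.decoder.parameters())
        + list(self.quantize.parameters())
        + list(self.quant_conv.parameters())
        + list(self.post_quant_conv.parameters()),
        lr=lr, betas=(0.5, 0.9),
    )
    # ... and a second Adam for the discriminator living inside the loss module.
    opt_disc = torch.optim.Adam(
        self.loss.discriminator.parameters(),
        lr=lr, betas=(0.5, 0.9),
    )
    # Returning two optimizers is what makes Lightning pass optimizer_idx 0/1
    # to training_step.
    return [opt_ae, opt_disc], []
```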
PyTorch Lightning does…
Hi guys, first of all, impressive work you have done here.
Skimming through the repo I noticed that the critic/discriminator receives gradients through both losses, on account of its gradients not being frozen when the autoencoder part is optimized. Do I see that correctly? And if so, why did you choose to do that?