
Why does the style encoder map all the images to the same style code? #28

Closed
ammar-deep opened this issue Dec 8, 2021 · 7 comments

@ammar-deep

Hello. Could you please give some explanation and advice for the scenario described below?

I am not using TUNIT but a very similar architecture where I have a GAN jointly trained with a style encoder. The style encoder is trained as a classifier in a supervised setting and its middle layer features are injected as a style code in the generator. During training the validation results are pretty good (which I assume is down to the L1 loss I am using because of the paired supervision).
During inference, the style encoder maps all images to the same style code.

The exact same scenario is mentioned in section 3.3 of your paper, where you define the style contrastive loss for the generator. Quoting from your paper: "This loss guides the generated image G(x, s̃) to have a style similar to the reference image x̃ and dissimilar to negative (other) samples. By doing so, we avoid the degenerated solution where the encoder maps all the images to the same style code of the reconstruction loss [5] based on L1 or L2 norm." Here reference [5] is StarGAN-v2.

In my case I am also using a style classification loss on the style image for the generator; however, the generator seems to completely ignore the style code during inference.

Could you please explain

  1. Why did you refer to StarGAN-v2 for this particular scenario?
  2. Can you give any advice so that my style encoder doesn't ignore the style code at inference?
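For concreteness, the style contrastive loss being discussed can be sketched as a toy InfoNCE over plain Python lists. This is not the TUNIT implementation — the function name, the temperature value, and the list-based "style codes" are all hypothetical stand-ins:

```python
import math

def style_contrastive_loss(anchor, positive, negatives, tau=0.07):
    """Toy InfoNCE over unit-normalized style codes (plain lists of floats).

    anchor:    style code of the generated image, E(G(x, s~))
    positive:  style code of the reference image, s~ = E(x~)
    negatives: style codes of other images in the batch/queue
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        n = math.sqrt(dot(v, v)) or 1.0
        return [x / n for x in v]

    a = normalize(anchor)
    # Positive similarity first, then the negatives.
    logits = [dot(a, normalize(positive)) / tau]
    logits += [dot(a, normalize(n)) / tau for n in negatives]
    # Cross-entropy with the positive at index 0 (stable log-sum-exp).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

Because the loss compares the anchor against negatives, a collapsed encoder (all codes identical) cannot drive it to zero, which is the property the paper's quote refers to.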
@FriedRonaldo
Collaborator

  1. Because StarGAN-v2 uses L1 norm for the reconstruction loss. I think that other works that use L1 or L2 norms might be cited in the same line.

  2. Well... I think it is not an issue with the training scheme or the objectives if the results at the very last training iteration look good. I mean, the L1 loss is not the problem; the issue is more likely in the inference.

It might be:
i) The checkpoint is not properly loaded (especially the batch normalization layers)
ii) The images are not preprocessed the same way as in the training phase (e.g. resize size, normalization, ...)
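On point ii), one way to rule out a silent train/inference mismatch is to route both phases through a single preprocessing function. A tiny sketch (all names and constants here are hypothetical, not from any real codebase):

```python
# Toy sketch: one shared preprocessing function for training and inference,
# so the resize size and normalization constants cannot silently diverge.

IMG_SIZE = 128            # must match the size used during training
MEAN, STD = 0.5, 0.5      # must match the training-time normalization

def preprocess(pixels):
    """pixels: flat list of floats in [0, 1], standing in for an image
    already resized to IMG_SIZE x IMG_SIZE."""
    return [(p - MEAN) / STD for p in pixels]

# Both phases call the exact same function:
assert preprocess([0.0, 0.5, 1.0]) == [-1.0, 0.0, 1.0]
```

If inference instead re-implements the normalization (or skips it), the encoder sees inputs from a different distribution than it was trained on, which alone can make its outputs collapse.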

@ammar-deep
Author

> 1. Because StarGAN-v2 uses L1 norm for the reconstruction loss. I think that other works that use L1 or L2 norms might be cited in the same line.

Are you talking about the style reconstruction loss used in StarGAN-v2? If so: if we used the style reconstruction loss in TUNIT as a replacement for G's style contrastive loss, what behavior would you expect? (Would it have the same problem I am facing, as you mentioned in the paper: "By doing so, we avoid the degenerated solution where the encoder maps all the images to the same style code of the reconstruction loss [5] based on L1 or L2 norm.")

  2. Actually my inference looks fine:
    a) the style encoder gives around 99% test accuracy on the classification task when loaded from the checkpoint
    b) the images are perfectly processed :)

@ammar-deep
Author

@FriedRonaldo BTW the quality of the generated samples at inference are great they only lack the style 😄

@FriedRonaldo
Collaborator

  1. Yes, the style reconstruction loss. In many cases the results will be similar, because the optimization does not always reach the degenerated solution; but the contrastive loss can avoid the degenerated solution when it would otherwise occur.

  2. I might not have understood the issue exactly before. You mean that the outputs reflect the style image very well (if you use a black cat as a style image, the output is a black cat?) but the style codes from different images map to the same point. Is that right? The scenario sounds very weird.


> BTW the quality of the generated samples at inference are great they only lack the style

The meaning of "if the results at the very last iteration look good" is not just about the quality but about how well the samples reflect the style images.

> If you use a black cat as a style image, then, is the output a black cat?

If yes in training and no at inference, more information should be provided to diagnose the issue. Blind diagnosis is somewhat difficult...
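The degenerate solution from point 1 can be made concrete with a toy sketch (pure Python, hypothetical numbers): once the encoder collapses to a constant code, an L1 style reconstruction loss is exactly zero, so nothing in that objective pushes back against the collapse.

```python
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Degenerate encoder: every image is mapped to the same style code.
CONSTANT_CODE = [0.3, 0.3]

def collapsed_encoder(image):
    return list(CONSTANT_CODE)

style_image = [0.1, 0.9]        # x~ (toy "image" as a flat list)
generated = [0.8, 0.2]          # stands in for G(x, s~)

target_code = collapsed_encoder(style_image)   # s~ = E(x~)
re_extracted = collapsed_encoder(generated)    # E(G(x, s~))

# The L1 style reconstruction loss is exactly zero even though the
# encoder carries no style information at all:
print(l1(target_code, re_extracted))  # 0.0
```

A contrastive loss avoids this because it also requires codes of different images to be dissimilar; a constant encoder cannot satisfy that.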

@ammar-deep
Author

Sorry for causing some confusion.

> If you use a black cat as a style image, then, is the output a black cat?

Yes (in training only, not at inference). As you mentioned, during the later iterations of the training phase the output reflects the black cat. However, during inference the style codes from different images map to the same point.

I ran an experiment where I tried to overfit the model at test time, i.e. I gave it a seen content image and a seen style image that it had generated perfectly during training. Surprisingly, it failed at inference as well: the style code doesn't match the style image, although the content is fine.

> If yes in the training and if not in the inference, more information should be provided to diagnose the issue. Blind diagnosis is somewhat difficult

Could you please let me know what information I should provide?

@FriedRonaldo
Collaborator

In this case, the problem is just about the inference, so using the style contrastive loss would not solve it.

In my experience, there can be an issue in the batch normalization layers: sometimes if I use "eval()" the model does not work, but without it the model works well. So it might be related to the normalization layers. Alternatively, EMA might cause the problem; I recommend using the non-EMA version of the generator.
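The eval() symptom can be reproduced with a toy 1-feature batch norm (a sketch, not PyTorch's implementation): in training mode the layer normalizes with per-batch statistics, while eval mode switches to the running statistics. If those running stats are stale or under-trained, the same inputs produce very different activations after eval(), and everything downstream degrades.

```python
import math

class ToyBatchNorm:
    """Minimal 1-feature batch norm: just enough to show the train/eval gap."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.running_mean, self.running_var = 0.0, 1.0
        self.momentum, self.eps = momentum, eps
        self.training = True

    def __call__(self, batch):
        if self.training:
            # Normalize with this batch's own statistics...
            m = sum(batch) / len(batch)
            v = sum((x - m) ** 2 for x in batch) / len(batch)
            # ...and only nudge the running stats toward them.
            self.running_mean += self.momentum * (m - self.running_mean)
            self.running_var += self.momentum * (v - self.running_var)
        else:
            # What eval() switches on: use the (possibly stale) running stats.
            m, v = self.running_mean, self.running_var
        return [(x - m) / math.sqrt(v + self.eps) for x in batch]

bn = ToyBatchNorm()
train_out = bn([10.0, 12.0])   # zero-mean output; running stats barely move
bn.training = False            # the eval() switch
eval_out = bn([10.0, 12.0])    # same inputs, outputs far from zero mean
```

With only one training step, the running mean has moved from 0.0 to just 1.1, so the eval-mode outputs are nowhere near the zero-mean activations the rest of the network was trained on.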

Because it is not related to our source code, I am sorry that I cannot help you more.

@ammar-deep
Copy link
Author

@FriedRonaldo Thank you for helping me this far and for the ideas. I will try your suggestions and hopefully resolve it.
