
MASK LOSS to 0 #32

Closed
imlixinyang opened this issue Oct 1, 2019 · 20 comments

@imlixinyang

I applied your code to CelebA-HQ, but found that the attention loss goes to 0, so the networks do nothing. Why?
The d_cls is about 10 while g_cls goes to 30, but the networks never seem to optimize this.
Another question: why do you keep the parameters of DE fixed?

@affromero
Collaborator

Hello,
I need more details here.

  1. Are you using CelebA-HQ at 256x256?
  2. We discuss keeping DE fixed in Section 5.1 of our paper.

@imlixinyang
Author

Yes, I use 256x256, and I only fixed a few things in Celeba.py to account for the format differences. The output shows that the mask → 0 and G_cls is around 30.

@imlixinyang
Author

I've tried running for several epochs, but this problem never seems to improve.

@affromero
Collaborator

May I know what kind of labels you are using for that dataset?

@imlixinyang
Author

Yes, I use 'Black_Hair', 'Blond_Hair', 'Brown_Hair', 'Male', 'Young', 'Smiling', 'Eyeglasses', 'Goatee', 'Pale_Skin', and 'Bangs', the same setting as my model. I'm ready to cite your paper and do some comparisons, so it's a little urgent.

@affromero
Collaborator

Yes, but as far as I am aware, CelebA-HQ (at least the original implementation from here) does not have any labels.

@affromero
Collaborator

Can you send me a screenshot or something regarding the problem, so I can give you more detailed feedback?

@imlixinyang
Author

Someone has collected the mapping from CelebA-HQ to CelebA, so the labels are available somewhere.
I don't know how to upload pictures on GitHub. I can just see that after some iterations, the samples show that the mask is all white and the translation does nothing but keep the source image.

@imlixinyang
Author

imlixinyang commented Oct 1, 2019

Actually, because of the different format, I did manage to run successfully once, but the order of the labels was wrong (the csv begins at 1 while the txt begins at 0, i.e. the first line of all the labels). I fixed it, and then the training began to fail.

@affromero
Collaborator

affromero commented Oct 1, 2019

Now I see. So it is not that the attention loss goes to zero, but that it remains saturated: if λ_mask=0.1, then L_attn saturates at 0.2 and never goes down. This means that the mask is always 1.0, so from the attention equation we always get the real image.

Normally it is a problem of random initialization that should be easily fixed by setting --seed to a different number, or by trying a larger λ_mask=1.0. For fine-grained translations (tested on CelebA, EmotioNet, and BP4D), it is saturated mostly during the first epoch, and L_attn goes down before the end of the first epoch.
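
For reference, a minimal sketch of the saturation described above, not the repository's exact code: the blending convention, the tensor shapes, and the name `lambda_mask` are assumptions for illustration.

```python
import torch

# Assumed attention-style blending: mask = 1 copies the real image.
lambda_mask = 0.1                      # weight quoted above (assumption)

x_real = torch.rand(1, 3, 256, 256)    # input image
x_gen  = torch.rand(1, 3, 256, 256)    # raw generator output
mask   = torch.ones(1, 1, 256, 256)    # saturated attention mask (all 1.0)

# With a saturated mask the output is just the input, so the generator
# effectively does nothing.
x_fake = (1 - mask) * x_gen + mask * x_real
print(torch.allclose(x_fake, x_real))  # True

# The mask penalty then sits at lambda_mask * mean(mask) per mask term and
# cannot decrease while the mask stays saturated at 1.0.
loss_attn = lambda_mask * mask.mean()
print(loss_attn.item())                # 0.1
```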

Please try this and let me know.

@imlixinyang
Author

Thank you, I'll run it for one night and see if this problem can be solved.

@imlixinyang
Author

Hello, I've now run for one night, around 18 epochs with another seed, but this problem still has not improved. This time G_cls is about 40 and D_cls is about 4. Why?

@imlixinyang
Author

I refreshed all the files and tried to find the reason.
SMIT-master/solver.py", line 373, in label2embedding
assert target.max() == 1 and target.min() == 0
Last time, I just deleted this. What is it for? Does it matter?

@imlixinyang
Author

imlixinyang commented Oct 2, 2019

And I think the attention loss is not what I thought it was. In GANimation, the output is:
$$x_f = (1 - A) x_f + A x_r$$
and the attention loss is $|A|$,
so the changed part is forced to be small, but in your paper it seems to be the opposite. I don't know if there is something I missed.
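
For concreteness, the limiting cases of the blend quoted above (writing $x_g$ for the raw generator output, which the equation reuses $x_f$ for):

$$x_f = (1 - A)\,x_g + A\,x_r, \qquad A \equiv 1 \;\Rightarrow\; x_f = x_r, \qquad A \equiv 0 \;\Rightarrow\; x_f = x_g$$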

@affromero
Collaborator

I do not know what you are doing.

assert target.max() == 1 and target.min() == 0

This is to ensure you are passing a one-hot encoded vector. If you are not, may I know what kind of labels you are using? If that were the problem, it would have raised an error.
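
As a side note, here is a minimal illustration of what that assert checks; the tensors below are made up for illustration and this is not the repo's label2embedding.

```python
import torch

# A 0/1 attribute vector (e.g. CelebA-style binary labels): passes the check.
target = torch.tensor([1., 0., 0., 1., 0.])
assert target.max() == 1 and target.min() == 0

# Raw class indices instead of a 0/1 encoding: max() is not 1, so this raises.
bad = torch.tensor([3., 7., 1.])
assert bad.max() == 1 and bad.min() == 0  # AssertionError
```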

And I think the attention loss is not what I thought it was. In GANimation, the output is:
$$x_f = (1 - A) x_f + A x_r$$
and the attention loss is $|A|$.

Why do you say so? This is exactly what we do, as we mention in the paper in Section 3.2.1 and in the code here and here.

I do not know if you were able to fix it. Without screenshots or more detailed instructions to reproduce it, I cannot help you much.

@imlixinyang
Author

Yes, I know your code is doing what you show in the paper, but this is different from the attention loss in other models, e.g. GANimation. In their papers, the changed part should be minimized.
I have fixed this by setting $x_f = M x_f + (1 - M) x_r$, and the training seems to have improved.
Please check this point carefully.
Actually, when I first used this loss function, I made a mistake too, so it doesn't matter, and I really appreciate your work!
I will email you if I still cannot reproduce it on this dataset.
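
A minimal sketch of the flipped blend described above; the variable names and the L1 penalty are assumptions for illustration, not the repository's code.

```python
import torch

x_real = torch.rand(1, 3, 256, 256)   # source image
x_gen  = torch.rand(1, 3, 256, 256)   # raw generator output
mask   = torch.rand(1, 1, 256, 256)   # attention/mask values in [0, 1]

# Flipped convention: the mask now gates the *generated* pixels, so copying
# the source corresponds to mask = 0 rather than mask = 1.
x_fake = mask * x_gen + (1 - mask) * x_real

# An L1 penalty on the mask therefore shrinks the edited region instead of
# the copied one, so it can decrease without the output collapsing to x_real.
loss_mask = mask.abs().mean()
```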

@affromero
Collaborator

Perfect. Let me know if anything good or bad happens and feel free to close the issue or keep it open until we fix this.

@imlixinyang
Author

Okay, I will give feedback promptly. And thank you so much for your timely reply, too.

@imlixinyang
Author

Hello, this time the results are good enough for me to compare.
It's interesting that I changed the mask loss to minimize the covered part, yet the covered part becomes larger than when maximizing it.
And I do think it's necessary for you to correct this in your official code, too. It is simple but benefits the training a lot.

@affromero
Collaborator

Thanks for your nice feedback. I will keep that in mind from now on.
