
Tips on Small Complex Datasets #25

Closed · jlmarrugom opened this issue Dec 11, 2021 · 14 comments

jlmarrugom commented Dec 11, 2021

Hi, I'm very impressed with the results of this paper, and also with the insightful approach to gaining a significant boost in computational efficiency.

Right now I'm testing the model on a custom dataset of humans in various poses, families, and people in general. I've noticed that the textures, the colors, and the images overall are really good compared with other models, and it trains in 1/10 of the time. However, the generated faces don't look as good as the other aspects of the image. Here is an example of a generated grid at kimg 200:

[image: sample grid at kimg 200]

My question is: How can I improve the results, especially on the faces?

Currently, I'm using the FastGAN backbone because the dataset is small: around 2,100 images at 256x256, trained on 1 GPU with mirror=1 and the other parameters at their default values.
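
For concreteness, the run can be launched with something like the sketch below; the flag names follow the repo's train.py interface as I understand it, and the dataset path is just a placeholder, so treat this as a sketch rather than my exact command:

```python
# Hedged sketch of the training invocation described above. Flags follow the
# projected-gan README's train.py as I understand it; the dataset path is a
# placeholder.
import subprocess

subprocess.run([
    "python", "train.py",
    "--outdir=./training-runs/",
    "--cfg=fastgan",                # FastGAN backbone
    "--data=./data/people256.zip",  # placeholder: ~2,100 images at 256x256
    "--gpus=1",
    "--mirror=1",                   # x-flips double the effective dataset
    # all other parameters left at their defaults
], check=True)
```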

@Mut1nyJD

Funny, I'm seeing the same effect. I have a dataset of single people posing, though, and mine is a lot bigger (40k images). But so far the faces are all a total mess, while the rest is not bad. And the FID score feels off to me.

Especially when I compare it to other GANs that have a higher FID but whose overall images look better (especially the faces).

Btw I am wondering if some local attention in the generator would help.

jlmarrugom commented Dec 11, 2021

I've read in another issue that using the StyleGAN backbone helps with the quality of the faces, but its sec/kimg is a bit higher, which adds up over the whole run and makes training slower; also, in the paper it seems to need more data and training time to converge, so I'm not sure that's the only solution.

Here is some progress on the generation with the FastGAN backbone at kimg=576; it looks like a small improvement with more training time.

[image: sample grid at kimg 576]

xl-sr commented Dec 13, 2021

Hi :)

First of all, the FastGAN backbone seems to have a harder time with faces as we showed in the paper. The StyleGAN samples already look better to me, so maybe it is simply a matter of training longer.

Your dataset seems to be pretty hard, as you have unaligned persons, high diversity, and only a few images. Getting details such as faces right on such a dataset is hard, and might not be possible (yet). You can see this when looking at, e.g., BigGAN samples on ImageNet classes with humans, which is a setting similar to yours.

One thing you could try is to initialize the SG2 backbone with pretrained weights, e.g. from FFHQ.

@jlmarrugom

Can I use pretrained models that were trained without the projected architecture, i.e. the ones from the StyleGAN2 repo? Is it the same for FastGAN?

xl-sr commented Dec 13, 2021

Yes, you can simply use the models from the official StyleGAN2 repo; they are compatible.

Of course, you should only copy the weights for G and G_ema; the discriminator should be initialized randomly. I haven't tried this myself, but the PG discriminators should be able to catch up very quickly since they are so lightweight.
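
A minimal sketch of what that copy could look like, assuming the StyleGAN2-ADA-style loading utilities this codebase builds on (legacy.load_network_pkl, torch_utils.misc.copy_params_and_buffers); verify against the actual training loop before relying on it:

```python
# Sketch: warm-start G/G_ema from an official StyleGAN2 pickle while leaving
# the projected discriminator randomly initialized. Assumes the utilities
# inherited from the StyleGAN2-ADA codebase.
import dnnlib
import legacy
from torch_utils import misc

def load_pretrained_generator(resume_pkl, G, G_ema):
    """Copy only generator weights; D stays random so the lightweight
    projected discriminators can adapt from scratch."""
    with dnnlib.util.open_url(resume_pkl) as f:
        resume_data = legacy.load_network_pkl(f)
    for name, module in [("G", G), ("G_ema", G_ema)]:
        misc.copy_params_and_buffers(resume_data[name], module, require_all=False)

# e.g. load_pretrained_generator("ffhq-res256.pkl", G, G_ema)  # placeholder path
```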

xl-sr commented Dec 29, 2021

closing this now, feel free to update/reopen with new results.

xl-sr closed this as completed on Dec 29, 2021
@Mut1nyJD

> First of all, the FastGAN backbone seems to have a harder time with faces as we showed in the paper. The StyleGAN samples already look better to me, so maybe it is simply a matter of training longer.

Any idea why it struggles with faces so much? It seems very odd. Is it because of the symmetry? But then I would expect similar struggles on LSUN Bedroom, too.

I tried a few different settings and even added local self-attention to the generator, but unfortunately that did not help much at all.
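
For concreteness, here is a minimal sketch of the kind of self-attention block one might splice into a generator (SAGAN-style global attention; an illustration only, not the exact local variant I tried):

```python
# Illustrative SAGAN-style self-attention block for a convolutional generator.
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key   = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as an identity mapping

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)           # (b, hw, c//8)
        k = self.key(x).flatten(2)                             # (b, c//8, hw)
        v = self.value(x).flatten(2)                           # (b, c, hw)
        attn = torch.softmax(q @ k / (c // 8) ** 0.5, dim=-1)  # (b, hw, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out  # residual, gated by learned gamma
```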

So my guess is that the problem is somewhere in the discriminator, maybe in the feature network it uses?

xl-sr commented Dec 29, 2021

If you look at samples of the original FastGAN (without PG), the samples on FFHQ are already quite a bit worse than the ones of StyleGAN. Simply adding attention layers does not lead to improvements in my experience.

It is definitely possible that the problem is in the feature network itself, so finding a better alternative is an interesting research direction. However, in this case, it might just be a very difficult dataset.

@Mut1nyJD

> If you look at samples of the original FastGAN (without PG), the samples on FFHQ are already quite a bit worse than the ones of StyleGAN. Simply adding attention layers does not lead to improvements in my experience.
>
> It is definitely possible that the problem is in the feature network itself, so finding a better alternative is an interesting research direction. However, in this case, it might just be a very difficult dataset.

Hmm, interesting, good point. I will have a look at the pure FastGAN implementation and see what happens. It is just odd: I've not seen such behavior with any GAN architecture before, which is extremely weird. Usually either the whole image is completely bad or it is okay/good; but in yours the faces are completely bad while the rest is actually fine.

Another reason why I think the discriminator is the problem: I added lightweight-gan as an alternative generator, and coupled with your discriminator architecture I see the same behavior there too. But I will give pure FastGAN a try.

Maybe also worth trying: the same feature network, but trained on CelebA instead.

xl-sr commented Dec 30, 2021

yes, as I said, looking into specialized feature networks is definitely worth a try :)
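
For anyone who wants to experiment: a rough, illustrative sketch of pulling a frozen multi-scale feature pyramid from a timm backbone (the kind of network the projected discriminators attach to; the actual wiring in this repo is more involved, so treat this purely as a starting point):

```python
# Illustrative only: a frozen multi-scale feature pyramid from a timm
# backbone. A face-specific checkpoint could be loaded into the same
# interface for the experiment discussed above.
import timm
import torch

feature_net = timm.create_model(
    "tf_efficientnet_lite0",  # mirrors the EfficientNet-Lite family
    pretrained=True,
    features_only=True,       # return feature maps at several scales
)
feature_net.eval().requires_grad_(False)  # the feature network stays frozen

with torch.no_grad():
    feats = feature_net(torch.randn(1, 3, 256, 256))
for f in feats:
    print(f.shape)  # one feature map per scale
```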

@jlmarrugom

Hi, I'm back with some results.

I've confirmed that the projected architecture, despite achieving lower (better) metric values than NVIDIA's StyleGAN2-ADA or StyleGAN3, doesn't produce images of correspondingly better quality.

For example, a Projected FastGAN-Lite model with kid50k_full = 0.002046 produces the following results:

[image: Projected FastGAN-Lite samples]

A StyleGAN2-ADA model from the NVIDIA repo with kid50k_full = 0.007476 produces the following results:

[image: StyleGAN2-ADA samples]

I may be wrong, but to me the quality of the StyleGAN2-ADA faces is a little better despite the higher KID.

Another thing: I chose KID (Kernel Inception Distance) over FID because the StyleGAN3 paper says it better reflects image quality on small datasets, and that matches my experience with the outputs I see: the lower the KID, the better the results.
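
For reference, a sketch of how the metric can be recomputed for a snapshot; the calc_metrics.py interface is inherited from the StyleGAN2-ADA codebase as I understand it, and the paths below are placeholders:

```python
# Sketch: recompute kid50k_full for a trained checkpoint. Flags follow the
# StyleGAN2-ADA-style calc_metrics.py; paths are placeholders.
import subprocess

subprocess.run([
    "python", "calc_metrics.py",
    "--metrics=kid50k_full",        # KID between 50k fakes and the full dataset
    "--data=./data/people256.zip",  # placeholder dataset path
    "--network=./training-runs/00000/network-snapshot.pkl",  # placeholder
], check=True)
```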

Neither Projected GANs nor StyleGAN2-ADA nor StyleGAN3-T gave me the results I wanted on the faces; this could be due to the size of my dataset (2k images, 4k with mirror=1).

So the final approach I used in my project was to pick the best model of each kind by its own metrics (the best StyleGAN2-ADA, the best Projected GAN) and compare the two outputs. This time I chose StyleGAN2-ADA because of the results shown above. After that, I found this model: https://github.com/yangxy/GPEN. It performs facial reconstruction, so I used it to restore the faces. To get a good result with the synthetic faces, you should downscale the images from 256 to 128, pass them through the GPEN model, and upscale them back to 256.
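
In code, the trick is just the resizing around the GPEN call; enhance_faces() below is a hypothetical stand-in for however you invoke GPEN, and only the resizing logic is the point:

```python
# Sketch of the downscale -> GPEN -> upscale procedure described above.
# enhance_faces() is a hypothetical placeholder for the actual GPEN entry
# point (https://github.com/yangxy/GPEN).
from PIL import Image

def restore_faces(path_in, path_out, enhance_faces):
    img = Image.open(path_in).convert("RGB")            # 256x256 GAN sample
    small = img.resize((128, 128), Image.LANCZOS)       # downscale 256 -> 128
    enhanced = enhance_faces(small)                     # GPEN face restoration
    final = enhanced.resize((256, 256), Image.LANCZOS)  # upscale back to 256
    final.save(path_out)
```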

With the above procedure I got the following results on the full images:

[image: full images after GPEN face restoration]

Also, the GPEN model enhances the image face by face and produces outputs like these:

[images: individual enhanced face crops]

I still have doubts about the metric calculation; I don't yet know why the same metric gives such different results on the same dataset.

I hope this helps your work if you are facing the same problem I was; for now, I'm happy with the combined result.


woctezuma commented Jan 3, 2022

Very interesting! Thanks!

For info, I like GPEN as well. However, there is another model which works quite well, neither necessarily better nor worse, as it is hard to say (GPEN tends to add black marks on the face, but performs better on teeth in my experience). It is called TencentARC/GFPGAN. You could try it on a sample and see which of GPEN and GFPGAN you prefer.


Mut1nyJD commented Jan 3, 2022

@jlmarrugom
Interesting hack!

Yes, I agree, something feels a bit off with the metrics in Projected FastGAN. I got the same low metric values, which do not reflect the overall quality compared to other GANs with much higher metrics. Anyway, my tries with pure FastGAN so far have been rather unsuccessful; it hasn't produced anything reasonable at all, so I might have to play a bit with the hyperparameters.

xl-sr commented Jan 4, 2022

thanks for the insights, very interesting! :)

My thoughts on the metrics: on the one hand, it is hard to judge the diversity of the samples, which is where Projected GANs usually get their gains; FID/KID favor diverse samples over high fidelity. On the other hand, the samples for PG show distorted faces, whereas with StyleGAN2-ADA you seem to get disembodied floating faces quite frequently. It seems that KID penalizes these artifacts differently. By the way, the disconnect between metric and sample quality on face datasets is something we addressed in the limitations section of the paper.

As I mentioned earlier, it would be interesting to try a face/body-specific feature network in your case. As you have shown, a network trained specifically for face reconstruction yields, unsurprisingly, better results.
