
Tips on Small Complex Datasets #25

Closed · jlmarrugom opened this issue Dec 11, 2021 · 14 comments

jlmarrugom commented Dec 11, 2021

Hi, I'm very impressed with the results of this paper, and also with the insightful approach to gaining a significant boost in computational efficiency.

Right now I'm testing the model on a custom dataset of humans in various poses, families, and people in general. I've noticed that the textures, the colors, and the images overall are really good compared with other models, and it trains in 1/10 of the time. However, the generated faces don't look as good as the other aspects of the image. Here is an example of a generated grid at kimg 200:

[image: sample grid at kimg 200]

My question is: How can I improve the results, especially on the faces?

Currently, I'm using the FastGAN backbone because the dataset is small: around 2,100 images at 256x256, trained on 1 GPU with mirror=1 and the other parameters at their default values.
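
For concreteness, the run can be launched with something like the sketch below; the flag names follow the repo's train.py interface as I understand it, and the dataset path is just a placeholder, so treat this as a sketch rather than my exact command:

```python
# Hedged sketch of the training invocation described above. Flags follow the
# projected-gan README's train.py as I understand it; the dataset path is a
# placeholder.
import subprocess

subprocess.run([
    "python", "train.py",
    "--outdir=./training-runs/",
    "--cfg=fastgan",                # FastGAN backbone
    "--data=./data/people256.zip",  # placeholder: ~2,100 images at 256x256
    "--gpus=1",
    "--mirror=1",                   # x-flips double the effective dataset
    # all other parameters left at their defaults
], check=True)
```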

@Mut1nyJD

Funny, I'm seeing the same effect. I have a dataset of single people posing, though, and mine is a lot bigger (40k images). But so far the faces are all a total mess, while the rest is not bad. And the FID score feels off to me.

Especially when I compare it to other GANs that have a higher FID but whose overall images look better (especially the faces).

Btw I am wondering if some local attention in the generator would help.

jlmarrugom commented Dec 11, 2021

I've read in another issue that using the StyleGAN backbone helps with the quality of the faces, but its sec/kimg is a bit higher, which adds up over the whole run and makes training slower; also, in the paper it seems to need more data and training time to converge, so I'm not sure that's the only solution.

Here is some progress on the generation with the FastGAN backbone at kimg=576; it looks like a small improvement with more training time.

[image: sample grid at kimg 576]

xl-sr commented Dec 13, 2021

Hi :)

First of all, the FastGAN backbone seems to have a harder time with faces as we showed in the paper. The StyleGAN samples already look better to me, so maybe it is simply a matter of training longer.

Your dataset seems to be pretty hard, as you have unaligned persons, high diversity, and only a few images. Getting details such as faces right on such a dataset is hard, and might not be possible (yet). You can see this when looking at, e.g., BigGAN samples on ImageNet classes with humans, which is a setting similar to yours.

One thing you could try is to initialize the SG2 backbone with pretrained weights, e.g. from FFHQ.

@jlmarrugom

Can I use pretrained models that were trained without the projected architecture, i.e. the ones from the StyleGAN2 repo? Is it the same for FastGAN?

xl-sr commented Dec 13, 2021

Yes, you can simply use the models from the official StyleGAN2 repo; they are compatible.

Of course, you should only copy the weights for G and G_ema; the discriminator should be initialized randomly. I haven't tried this myself, but the PG discriminators should be able to catch up very quickly since they are so lightweight.
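
A minimal sketch of what that copy could look like, assuming the StyleGAN2-ADA-style loading utilities this codebase builds on (legacy.load_network_pkl, torch_utils.misc.copy_params_and_buffers); verify against the actual training loop before relying on it:

```python
# Sketch: warm-start G/G_ema from an official StyleGAN2 pickle while leaving
# the projected discriminator randomly initialized. Assumes the utilities
# inherited from the StyleGAN2-ADA codebase.
import dnnlib
import legacy
from torch_utils import misc

def load_pretrained_generator(resume_pkl, G, G_ema):
    """Copy only generator weights; D stays random so the lightweight
    projected discriminators can adapt from scratch."""
    with dnnlib.util.open_url(resume_pkl) as f:
        resume_data = legacy.load_network_pkl(f)
    for name, module in [("G", G), ("G_ema", G_ema)]:
        misc.copy_params_and_buffers(resume_data[name], module, require_all=False)

# e.g. load_pretrained_generator("ffhq-res256.pkl", G, G_ema)  # placeholder path
```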

xl-sr commented Dec 29, 2021

closing this now, feel free to update/reopen with new results.

xl-sr closed this as completed on Dec 29, 2021
@Mut1nyJD

> First of all, the FastGAN backbone seems to have a harder time with faces as we showed in the paper. The StyleGAN samples already look better to me, so maybe it is simply a matter of training longer.

Any idea why it struggles with faces so much? It seems very odd. Is it because of the symmetry? But then I would expect similar struggles on LSUN Bedroom, too.

I tried a few different settings and even added local self-attention to the generator, but unfortunately that did not help much at all.
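
For concreteness, here is a minimal sketch of the kind of self-attention block one might splice into a generator (SAGAN-style global attention; an illustration only, not the exact local variant I tried):

```python
# Illustrative SAGAN-style self-attention block for a convolutional generator.
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key   = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as an identity mapping

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)           # (b, hw, c//8)
        k = self.key(x).flatten(2)                             # (b, c//8, hw)
        v = self.value(x).flatten(2)                           # (b, c, hw)
        attn = torch.softmax(q @ k / (c // 8) ** 0.5, dim=-1)  # (b, hw, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out  # residual, gated by learned gamma
```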

So my guess is that the problem is somewhere in the discriminator, maybe in the feature network it uses?

xl-sr commented Dec 29, 2021

If you look at samples of the original FastGAN (without PG), the samples on FFHQ are already quite a bit worse than the ones of StyleGAN. Simply adding attention layers does not lead to improvements in my experience.

It is definitely possible that the problem is in the feature network itself, so finding a better alternative is an interesting research direction. However, in this case, it might just be a very difficult dataset.

@Mut1nyJD

> If you look at samples of the original FastGAN (without PG), the samples on FFHQ are already quite a bit worse than the ones of StyleGAN. Simply adding attention layers does not lead to improvements in my experience.
>
> It is definitely possible that the problem is in the feature network itself, so finding a better alternative is an interesting research direction. However, in this case, it might just be a very difficult dataset.

Hmm, interesting, good point. I will have a look at the pure FastGAN implementation and see what happens. It is just odd: I've not seen such behavior with any GAN architecture before, which is extremely weird. Usually either the whole image is completely bad or it is okay/good; but in yours the faces are completely bad while the rest is actually fine.

Another reason why I think the discriminator is the problem: I added lightweight-gan as an alternative generator, and coupled with your discriminator architecture I see the same behavior there too. But I will give pure FastGAN a try.

Maybe also worth trying: the same feature network, but trained on CelebA instead.

xl-sr commented Dec 30, 2021

yes, as I said, looking into specialized feature networks is definitely worth a try :)
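
For anyone who wants to experiment: a rough, illustrative sketch of pulling a frozen multi-scale feature pyramid from a timm backbone (the kind of network the projected discriminators attach to; the actual wiring in this repo is more involved, so treat this purely as a starting point):

```python
# Illustrative only: a frozen multi-scale feature pyramid from a timm
# backbone. A face-specific checkpoint could be loaded into the same
# interface for the experiment discussed above.
import timm
import torch

feature_net = timm.create_model(
    "tf_efficientnet_lite0",  # mirrors the EfficientNet-Lite family
    pretrained=True,
    features_only=True,       # return feature maps at several scales
)
feature_net.eval().requires_grad_(False)  # the feature network stays frozen

with torch.no_grad():
    feats = feature_net(torch.randn(1, 3, 256, 256))
for f in feats:
    print(f.shape)  # one feature map per scale
```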

@jlmarrugom

Hi, I'm back with some results.

I've confirmed that the projected architecture, despite achieving lower (better) metric values than NVIDIA's StyleGAN2-ADA or StyleGAN3, doesn't produce images of correspondingly better quality.

For example, a Projected FastGAN-Lite model with kid50k_full = 0.002046 produces the following results:

[image: Projected FastGAN-Lite samples]

A StyleGAN2-ADA model from the NVIDIA repo with kid50k_full = 0.007476 produces the following results:

[image: StyleGAN2-ADA samples]

I may be wrong, but to me the quality of the StyleGAN2-ADA faces is a little better despite the higher KID.

Another thing: I chose KID (Kernel Inception Distance) over FID because the StyleGAN3 paper says it better reflects image quality on small datasets, and that matches my experience with the outputs I see: the lower the KID, the better the results.
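
For reference, a sketch of how the metric can be recomputed for a snapshot; the calc_metrics.py interface is inherited from the StyleGAN2-ADA codebase as I understand it, and the paths below are placeholders:

```python
# Sketch: recompute kid50k_full for a trained checkpoint. Flags follow the
# StyleGAN2-ADA-style calc_metrics.py; paths are placeholders.
import subprocess

subprocess.run([
    "python", "calc_metrics.py",
    "--metrics=kid50k_full",        # KID between 50k fakes and the full dataset
    "--data=./data/people256.zip",  # placeholder dataset path
    "--network=./training-runs/00000/network-snapshot.pkl",  # placeholder
], check=True)
```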

Neither Projected GANs nor StyleGAN2-ADA nor StyleGAN3-T gave me the results I wanted on the faces; this could be due to the size of my dataset (2k images, 4k with mirror=1).

So the final approach I used in my project was to pick the best model of each kind by its own metrics (the best StyleGAN2-ADA, the best Projected GAN) and compare the two outputs. This time I chose StyleGAN2-ADA because of the results shown above. After that, I found this model: https://github.com/yangxy/GPEN. It performs facial reconstruction, so I used it to restore the faces. To get a good result with the synthetic faces, you should downscale the images from 256 to 128, pass them through the GPEN model, and upscale them back to 256.
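
In code, the trick is just the resizing around the GPEN call; enhance_faces() below is a hypothetical stand-in for however you invoke GPEN, and only the resizing logic is the point:

```python
# Sketch of the downscale -> GPEN -> upscale procedure described above.
# enhance_faces() is a hypothetical placeholder for the actual GPEN entry
# point (https://github.com/yangxy/GPEN).
from PIL import Image

def restore_faces(path_in, path_out, enhance_faces):
    img = Image.open(path_in).convert("RGB")            # 256x256 GAN sample
    small = img.resize((128, 128), Image.LANCZOS)       # downscale 256 -> 128
    enhanced = enhance_faces(small)                     # GPEN face restoration
    final = enhanced.resize((256, 256), Image.LANCZOS)  # upscale back to 256
    final.save(path_out)
```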

With the above procedure I got the following results on the full images:

[image: full images after GPEN face restoration]

Also, the GPEN model enhances the image face by face and produces outputs like these:

[images: individual enhanced face crops]

I still have doubts about the metric calculation; I don't yet know why the same metric gives such different results on the same dataset.

I hope this helps your work if you are facing the same problem I was; for now, I'm happy with the combined result.


woctezuma commented Jan 3, 2022

Very interesting! Thanks!

For info, I like GPEN as well. However, there is another model which works quite well, neither necessarily better nor worse, as it is hard to say (GPEN tends to add black marks on the face, but performs better on teeth in my experience). It is called TencentARC/GFPGAN. You could try it on a sample and see which of GPEN and GFPGAN you prefer.


Mut1nyJD commented Jan 3, 2022

@jlmarrugom
Interesting hack!

Yes, I agree, something feels a bit off with the metrics in Projected FastGAN. I got the same low metric values, which do not reflect the overall quality compared to other GANs with much higher metrics. Anyway, my tries with pure FastGAN so far have been rather unsuccessful; it hasn't produced anything reasonable at all, so I might have to play a bit with the hyperparameters.

xl-sr commented Jan 4, 2022

thanks for the insights, very interesting! :)

My thoughts on the metrics: on the one hand, it is hard to judge the diversity of the samples, which is where Projected GANs usually get their gains; FID/KID favor diverse samples over high fidelity. On the other hand, the samples for PG show distorted faces, whereas with StyleGAN2-ADA you seem to get disembodied floating faces quite frequently. It seems that KID penalizes these artifacts differently. By the way, the disconnect between metric and sample quality on face datasets is something we addressed in the limitations section of the paper.

As I mentioned earlier, it would be interesting to try a face/body-specific feature network in your case. As you have shown, a network trained specifically for face reconstruction yields, unsurprisingly, better results.
