Tips on Small Complex Datasets #25
Comments
Funny, I'm seeing the same effect. My dataset is also single people posing, though it's a lot bigger (40k images). So far the faces are a total mess, while the rest is not bad. The FID score doesn't feel right to me either, especially compared to other GANs that have a higher FID but whose overall images look better (especially the faces). By the way, I'm wondering if some local attention in the generator would help.
Hi :) First of all, the FastGAN backbone seems to have a harder time with faces, as we showed in the paper. The StyleGAN samples already look better to me, so maybe it is simply a matter of training longer. Your dataset seems to be pretty hard, as you have unaligned persons, high diversity, and only a few images. Getting details such as faces right on such a dataset is hard, and might not be possible (yet). You can notice this when looking at, e.g., samples by BigGAN on ImageNet classes with humans, which is similar to your setting. One thing you could try is to initialize the SG2 backbone with pretrained weights, e.g. from FFHQ.
Can I use pretrained models that were trained without a projected architecture, i.e. the ones from the StyleGAN2 repo? Is it the same for FastGAN?
Yes, you can simply use the models from the official StyleGAN2 repo; they are compatible. Of course, you should only copy the weights for
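The idea of reusing pretrained StyleGAN2 weights can be sketched as copying only the parameters whose names and shapes match between the two checkpoints (the StyleGAN2-ADA codebase does something similar when resuming). This is a generic, hypothetical sketch using plain dicts of arrays rather than the repo's actual pickle format; `copy_matching_weights` is an illustrative helper, not a function from the repo.

```python
import numpy as np

def copy_matching_weights(src, dst):
    """Copy entries from src into dst when both the name and the
    shape match; leave everything else in dst untouched.
    src/dst: dicts mapping parameter names to numpy arrays."""
    copied = []
    for name, w in src.items():
        if name in dst and dst[name].shape == w.shape:
            dst[name] = w.copy()
            copied.append(name)
    return copied

# Example: only the parameter present in both with the same shape is copied.
src = {"synthesis.conv.w": np.ones((2, 2)), "mapping.fc.w": np.zeros(3)}
dst = {"synthesis.conv.w": np.zeros((2, 2)), "discriminator.b": np.zeros(2)}
copied = copy_matching_weights(src, dst)
```

With real checkpoints you would load both pickles, apply this to the generator's state, and keep the projected discriminator initialized from scratch.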
Closing this now; feel free to update/reopen with new results.
Any idea why it struggles with faces so much? It seems very odd. Is it because of the symmetry? But then I would expect similar struggles in LSUN Bedroom, too. I tried a few different settings and even added local self-attention to the generator, but unfortunately that did not help much at all. So my guess is that the problem is somewhere in the discriminator, maybe in the feature network it uses?
If you look at samples of the original FastGAN (without PG), the samples on FFHQ are already quite a bit worse than those of StyleGAN. Simply adding attention layers does not lead to improvements in my experience. It is definitely possible that the problem is in the feature network itself, so finding a better alternative is an interesting research direction. However, in this case, it might just be a very difficult dataset.
Hmm, interesting, good point. I will have a look at the pure FastGAN implementation and see what happens. It is just funny that I've not seen such behavior with any GAN architecture before, which is extremely weird: usually either the whole image is completely bad, or it is ok/good. But here the faces are completely bad while the rest is actually fine. Another reason I think the discriminator is the problem: I added lightweight-gan as an additional generator, and coupled with your discriminator architecture I see the same behavior there too. But I will give pure FastGAN a try. Maybe also worth trying is the same feature network, but trained on CelebA instead.
Yes, as I said, looking into specialized feature networks is definitely worth a try :)
Hi, I'm back with some results. I've confirmed that the metrics of the projected architecture, despite having lower values than the NVIDIA StyleGAN2-ADA or StyleGAN3 models, don't correspond to images of matching quality. For example, a Projected FastGAN-Lite model with kid50k_full of 0.002046 produces the following results. A StyleGAN2-ADA from the NVIDIA repo with kid50k_full of 0.007476 produces the following results. I may be wrong, but to me the quality of the faces from StyleGAN2-ADA is a little better despite the higher KID.

Another thing: I chose to use KID (Kernel Inception Distance) instead of FID because the StyleGAN3 paper states that it better represents image quality on small datasets, and that matches my experience with the outputs I see: the lower the KID, the better the results.

Neither Projected GANs nor StyleGAN2-ADA nor StyleGAN3-T gave me the results I wanted on the faces; this could be due to the size of my dataset (2k images, 4k using mirror=1). So the final approach in my project was to pick each model with the best metrics against itself (the best StyleGAN2-ADA and the best Projected GAN) and compare the two outputs. This time I chose StyleGAN2-ADA because of the results shown above.

After that I found this model: https://github.com/yangxy/GPEN. It does facial reconstruction, so I use it to reconstruct the faces. To get a good result on the synthetic faces, you should downscale the images from 256 to 128, pass the image through the GPEN model, and upscale it back to 256. With that procedure I got the following results on the full image. The GPEN model also enhances face by face and produces output.

I still have doubts about the metric calculation; I don't know yet why the same metric gives different results on the same dataset.
I hope this facilitates your work if you are facing the same problem I was; for now, the combined result pleases me.
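The downscale-enhance-upscale trick described above can be sketched as follows. Note that `enhance_face` here is only a placeholder standing in for a call to GPEN inference (the actual GPEN API is not shown in this thread), and the choice of bicubic resampling is an assumption.

```python
from PIL import Image

def enhance_face(img: Image.Image) -> Image.Image:
    # Placeholder for GPEN inference on a 128x128 crop/image.
    # In practice this would call the loaded GPEN model instead.
    return img

def restore_faces(img_256: Image.Image) -> Image.Image:
    """Downscale 256 -> 128, run the face-restoration model,
    then upscale back to 256, as described in the comment above."""
    img_128 = img_256.resize((128, 128), Image.BICUBIC)
    enhanced = enhance_face(img_128)
    return enhanced.resize((256, 256), Image.BICUBIC)
```

The downscale step hides the GAN's face artifacts from the restoration model, which was presumably trained to hallucinate plausible detail from low-resolution faces.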
Very interesting, thanks! For info, I like GPEN as well. However, there is another model that works quite well; it is hard to say whether it is better or worse (GPEN tends to add black marks on the face, but performs better on teeth in my experience). It is called
@jlmarrugom Yes, I agree, something feels a bit off with the metrics in projected-fastgan. I got the same low metrics, which do not reflect the overall quality I've seen from other GANs with much higher metrics. Anyway, my tries with pure FastGAN so far have been rather unsuccessful; it hasn't produced anything reasonable at all, so I might have to play a bit with the hyperparameters.
Thanks for the insights, very interesting! :) My thoughts on the metrics: on one hand, it is hard to judge the diversity of the samples, which is where Projected GANs usually get their gains, and FID/KID favor diverse samples over high fidelity. On the other hand, the samples for PG show distorted faces, whereas with StyleGAN2-ADA you seem to get disembodied floating faces quite frequently; it seems that KID penalizes these artifacts differently. By the way, the disconnect between metric and sample quality on face datasets is something we addressed in the limitations section of the paper. As I mentioned earlier, it would be interesting to try a face/body-specific feature network in your case. As you have shown, a network trained specifically for face reconstruction unsurprisingly yields better results.
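For reference, KID is the unbiased squared MMD between real and fake Inception features under a cubic polynomial kernel, k(x, y) = (x·y/d + 1)^3. A minimal numpy sketch (without the subset averaging that `kid50k_full` performs, and assuming the features have already been extracted):

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3):
    # Cubic polynomial kernel used by KID: k(x, y) = (x.y / d + 1)^3
    d = X.shape[1]
    return (X @ Y.T / d + 1.0) ** degree

def kid(real_feats, fake_feats):
    """Unbiased MMD^2 estimate between two feature sets.
    real_feats, fake_feats: arrays of shape (n_samples, feat_dim),
    e.g. Inception activations. Can be slightly negative for
    near-identical distributions (unbiased estimator)."""
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    m, n = len(real_feats), len(fake_feats)
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    term_rf = k_rf.mean()
    return term_rr + term_ff - 2.0 * term_rf
```

Because the estimator is unbiased, the same dataset split differently (or different sample subsets) can produce slightly different values, which may partly explain the run-to-run variation mentioned above.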
Hi, I'm very impressed with the results of this paper and also the insightful approach to gain a significant boost in computational efficiency.
Right now I'm testing the model on a custom dataset of humans in various poses, families, and people in general, and I noticed that the textures, the colors, and the overall image are really good compared with other models; it also trains in 1/10 of the time. But the generated faces don't look as good as the other aspects of the image. Here is an example of a generated grid at kimg 200:
My question is: How can I improve the results, especially on the faces?
Currently, I'm using the FastGAN backbone because the dataset is around 2100 images at 256x256, with 1 GPU, mirror=1, and the other parameters at their default values.
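For context, a setup like the one described would roughly correspond to a training invocation along these lines (the output/data paths are placeholders, and exact flag names may differ between versions of the projected-gan repo):

```shell
# Hypothetical invocation: FastGAN-Lite backbone, single GPU, x-flip augmentation.
python train.py \
  --outdir=./training-runs \
  --cfg=fastgan_lite \
  --data=./data/my-dataset-256.zip \
  --gpus=1 \
  --mirror=1
```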