
Problem in reproducing fid score #22

Closed · howtowhy opened this issue Apr 6, 2023 · 9 comments
Labels: help wanted (Extra attention is needed)

Comments


howtowhy commented Apr 6, 2023

Thank you for the amazing work again.
I am having some trouble reproducing the FID score.

  1. I failed to download the tfrecord files, so I used the public 7 GB 256 dataset as a zip file
     (https://www.kaggle.com/datasets/denislukovnikov/ffhq256-images-only).
  2. I assume the ffhq256x256.zip file format is the same as the preprocessed one.
  3. I trained with this data and checked the training score.
  4. The FID goes up during training, from 20 to 25.

Could you give me some advice about this situation? I will try again with the 1024 (90 GB) version following your advice. Also, if you could give me some personal help, please email me at howtowhy@gmail.com; I would really appreciate it. I really want to reference your work, but the dataset problem is not easy for me.

Thank you so much.

Xiaoming-Zhao (Collaborator) commented Apr 7, 2023

> I assume the ffhq256x256.zip file format is the same as the preprocessed one.

I am not sure about this one, as I have never tried the link you referred to.

> I trained with this data and checked the training score.

This is strange. Do you mind sharing your hardware setup, e.g., the number of GPUs? For GAN training, the batch size matters a lot. If possible, please use the same batch size as specified in the paper.

> I really want to reference your work, but the dataset problem is not easy for me.

I am sorry, and I definitely understand that this is a headache. By the way, I recently came across a tool, rclone, which interacts well with Google Drive (on a remote server, etc.). Please refer to its documentation for how to set it up. The only part that needs some effort is setting up a Google Cloud API, which is not hard if you follow the instructions.
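For reference, a minimal sketch of pulling the data with rclone from Python, assuming you have already run `rclone config` and created a Google Drive remote named "gdrive" (the remote path below is a placeholder, not the actual dataset location):

```python
# Minimal sketch: copy a Google Drive folder to a local directory with rclone.
# Assumes an rclone remote named "gdrive" has already been configured; the
# source path is a placeholder.
import subprocess

subprocess.run(
    ["rclone", "copy", "gdrive:path/to/ffhq_tfrecords", "./data/ffhq", "--progress"],
    check=True,  # raise if rclone exits with a non-zero status
)
```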

Hope these help.

Xiaoming-Zhao added the help wanted label on Apr 8, 2023

parkjh688 commented Apr 11, 2023

Hi, @Xiaoming-Zhao!

I'm facing a similar issue. I trained my model with the following configuration: 256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002}, using the ffhq256x256 dataset from Kaggle. However, my FID score keeps increasing after the first 1000 steps, as shown in the graph below:

[Image: FID curve during training]

In the original paper, the model was trained with a batch size of 64, so I also tried that, but the FID score still increased. Previously, when I trained with a batch size of 8, the FID score was around 20 and remained stable until the end of 5000 steps. However, this time, even though I started with a lower FID score of 18, it kept increasing after around 1000 steps.

I'm wondering if using ffhq256x256 data from Kaggle instead of real 256-sized data could be causing overfitting, since I think the Kaggle set is a little smaller. Are there any other possible reasons for this behavior?

Thanks!

Xiaoming-Zhao (Collaborator) commented Apr 11, 2023

Hi @parkjh688, I need a bit more information if possible.

> In the original paper, the model was trained with a batch size of 64, so I also tried that

How many GPUs did you use to train GMPI? The reason I am asking is that the batch_size specified in curriculums.py is the batch size per GPU. The batch size of 64 stated in the paper comes from 8 (per GPU) x 8 (#GPUs) = 64.
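As a quick sanity check, here is a tiny sketch (with hypothetical values) of how the per-GPU batch_size in curriculums.py relates to the batch size reported in the paper:

```python
# Hypothetical values: batch_size in curriculums.py is per GPU, so the
# effective (paper) batch size is per_gpu_batch_size * num_gpus.
per_gpu_batch_size = 8
num_gpus = 8
effective_batch_size = per_gpu_batch_size * num_gpus
print(effective_batch_size)  # 64, the batch size stated in the paper
```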

> Previously, when I trained with a batch size of 8, the FID score was around 20 and remained stable until the end of 5000 steps.

What dataset did you use for that training? And how many GPUs were you using?

> I'm wondering if using ffhq256x256 data from Kaggle instead of real 256-sized data could be causing overfitting.

One caveat I can see is that the Kaggle dataset uses a different resizing method from the one the pre-trained StyleGAN2 uses. Specifically:
a. The Kaggle one uses bicubic, as specified on its webpage.
b. StyleGAN2 is trained on images downscaled with Lanczos (see here).

So you may want to process the dataset following the official instructions to double-check.
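For reference, a minimal sketch (not the repo's official preprocessing script; the folder names are placeholders) of downscaling FFHQ images to 256x256 with Lanczos via Pillow, matching the filter StyleGAN2 was trained with:

```python
# Minimal sketch: resize 1024x1024 FFHQ images to 256x256 with Lanczos.
# "ffhq1024" and "ffhq256_lanczos" are placeholder folder names.
from pathlib import Path
from PIL import Image

src_dir = Path("ffhq1024")
dst_dir = Path("ffhq256_lanczos")
dst_dir.mkdir(exist_ok=True)

for path in sorted(src_dir.glob("*.png")):
    img = Image.open(path).convert("RGB")
    img = img.resize((256, 256), resample=Image.LANCZOS)  # Lanczos, not bicubic
    img.save(dst_dir / path.name)
```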

> I trained my model with the following configuration: 256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002}

I noticed that you have a batch_split of 16. This means that you split the 64 images into 16 mini-batches and accumulate gradients 16 times. Theoretically, this should be fine. However, I would recommend using a single forward pass if possible to avoid any hidden issues.
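To illustrate what batch_split does, here is a minimal sketch of gradient accumulation (a toy model and loss, not the repo's actual training loop):

```python
# Toy illustration of batch_split: a per-GPU batch is cut into `batch_split`
# chunks, gradients are accumulated, and one optimizer step is taken.
import torch

batch_size, batch_split = 64, 16
chunk = batch_size // batch_split  # 4 images per forward pass

model = torch.nn.Linear(128, 1)      # stand-in for the generator/discriminator
opt = torch.optim.Adam(model.parameters(), lr=2e-3)
data = torch.randn(batch_size, 128)  # stand-in batch of features

opt.zero_grad()
for i in range(batch_split):
    x = data[i * chunk:(i + 1) * chunk]
    loss = model(x).mean() / batch_split  # scale so gradients match one big pass
    loss.backward()                       # accumulate gradients across chunks
opt.step()                                # single update for the whole batch
```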

Hope these help.

parkjh688 commented:

> The reason I am asking is that the batch_size specified in curriculums.py is the batch size per GPU. The batch size of 64 stated in the paper comes from 8 (per GPU) x 8 (#GPUs) = 64.

Oh, I didn't know that. I used 6 GPUs, and my configuration in curriculums.py was 256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002}. So if I want to train the model with a batch size of 64, I should train with 4 GPUs and the configuration should be 256: {'batch_size': 16, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002}, since 16 (per GPU) x 4 (#GPUs) = 64.

> What dataset did you use for that training? And how many GPUs were you using?

I used the Kaggle dataset linked above.

Thanks!

Xiaoming-Zhao (Collaborator) commented:

Got it. So the Kaggle dataset is indeed able to reproduce the FID, based on your previous statement:

> Previously, when I trained with a batch size of 8, the FID score was around 20 and remained stable until the end of 5000 steps.

Then I would recommend reducing the batch_split to see whether it is the culprit for the strange FID curve you showed before. As I mentioned, theoretically, having batch_split = 16 should be fine, but I am not sure whether there are some hidden issues there.

Hope this helps.

howtowhy (Author) commented Apr 19, 2023

Hello, thank you for your detailed help.
I downloaded the 1024 images and preprocessed them to 256 with the script.
I used the following options with 8 GPUs:

"res_dict": {
256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002},
512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
},

"res_dict_learnable_param": {
    256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002},
    512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
    1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
},

But the FID goes up, and the result was like this.
Could you give me advice on this situation?

[Image: FID curve]

Xiaoming-Zhao (Collaborator) commented Apr 19, 2023

Do you mind trying the default configuration:

"res_dict": {
256: {'batch_size': 8, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
},

See the discussion above in this issue. Essentially:

  1. We use a batch_size of 8 because it is the batch size per GPU.
    a. One caveat is that a larger batch size does not always mean better results. Although a larger batch size could help the generator's learning, it could also give the discriminator more power and break the balance between the generator and the discriminator.
    b. I have not tried a batch size of 64 x 8 = 512, so I am not sure whether it would work.
  2. Maybe reduce the batch_split from 16 to 1 (or 2) if your GPU memory allows it. Theoretically, having batch_split = 16 should be fine, but I am not sure whether there are some hidden issues there.

Hope these help.

howtowhy (Author) commented:

Hello! Thank you for your kind help.
I ran the script with 8 GPUs and the batch size you suggested.

"res_dict": {
        256: {'batch_size': 8, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
        512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
        1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
    },

    "res_dict_learnable_param": {
        256: {'batch_size': 8, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
        512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
        1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
    },

The FID score at 256 was 18.92-21.52 (with one peak perturbation).
However, the paper reports an FID of 11.4.
Is there anything I missed?
Thank you so much for your quick help.

[Image: FID curve]

Xiaoming-Zhao (Collaborator) commented Apr 21, 2023

This curve looks reasonable to me. I am not sure about the peak, but I guess it may be due to some randomness.

Regarding the FID: the FID score largely depends on the number of images used to compute it. The more images you use, the more likely you are to obtain a lower score.

However, FID with many images is costly to compute. Therefore, during training, we use a small number of images to get a sense of the FID trend.

During the full evaluation, we use 50k fake and real images, as stated in the paper; this follows StyleGAN's papers:

N_IMGS=50000
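For reference, a minimal sketch of computing FID between two image folders with the pytorch-fid package (which may differ from this repo's internal FID code; the folder names are placeholders, and each folder would hold N_IMGS images for the final number):

```python
# Minimal sketch using pytorch-fid; folder names are placeholders.
# For the paper's final numbers, each folder would contain 50,000 images.
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

fid = calculate_fid_given_paths(
    ["real_images_50k", "fake_images_50k"],
    batch_size=50,
    device="cuda" if torch.cuda.is_available() else "cpu",
    dims=2048,  # standard InceptionV3 pool3 feature dimension
)
print(f"FID: {fid:.2f}")
```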

Hope this resolves your confusion.
