
about evaluation #38

Closed
SUDA-HLT-ywfang opened this issue Oct 20, 2021 · 11 comments

Comments

@SUDA-HLT-ywfang

Hi,
How did you get the "26.0" FID on MSCOCO using DM-GAN? The official result reported in https://github.com/MinfengZhu/DM-GAN is 26.55.
I ran DM-GAN myself and got a similar result (26.54), not "26.0".

@Sleepychord
Contributor

We actually follow the evaluation protocol of DALL-E. Since the 30,000 captions are sampled at random, I think the difference is normal; it is possible that DM-GAN performs better on our sampled subset. Maybe I should change the reported number to the official one. Thank you.

@SUDA-HLT-ywfang
Author

SUDA-HLT-ywfang commented Oct 20, 2021

Thank you for your quick reply.
Could you please share more details about the sampling procedure? For example, MSCOCO val has about 5 captions per image, so do you sample 30,000 from the 5 × 40,504 captions?

@Sleepychord
Contributor

@FrankCast1e Hi, after comparing details with previous works, we found that our sampling is slightly different from theirs, but it should be equivalent from the standpoint of evaluation:
we remove the duplicated captions from COCO, but sample from the merged set of train and validation (~120,000). Since CogView is never trained on COCO and the two sets are split at random, the expectation should be the same.
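
For concreteness, a minimal sketch of that sampling as I understand it from this comment (illustrative only, not the actual CogView evaluation script; the file names, seed, and helper name are assumptions, and the dedup granularity follows the description above):

```python
import json
import random

# Illustrative sketch of the sampling described above, NOT the actual
# CogView evaluation code. Assumes standard COCO captions_*.json files.
def sample_captions(train_json, val_json, n=30000, seed=0):
    captions = []
    for path in (train_json, val_json):        # merge train + validation
        with open(path) as f:
            captions += [a["caption"].strip()
                         for a in json.load(f)["annotations"]]
    unique = sorted(set(captions))             # remove duplicated captions
    random.Random(seed).shuffle(unique)
    return unique[:n]                          # sample 30,000 captions

prompts = sample_captions("captions_train2014.json", "captions_val2014.json")
```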

@SUDA-HLT-ywfang
Author

SUDA-HLT-ywfang commented Oct 21, 2021

1. So the sampling process you describe should be as follows:
   a. mix all the captions from the MSCOCO train and validation sets
   b. remove the duplicated ones
   c. sample 30,000 captions
   Am I correct?
2. In that case, the sample may contain multiple captions that belong to the same image. Am I correct?
3. For FID-(1,2,4,8), one set is the 30,000 blurred images that the model generates; what is the other set? (train + val) or val only?

@Sleepychord
Contributor

@FrankCast1e

1. Yes.
2. No. After the removal there will not be duplicated ones.
3. I think you misunderstand the evaluation process: the blurred images are the blurred original images, and the other set is the generated images. Their captions are the same.

@SUDA-HLT-ywfang
Author

Sorry, I'm confused.
2. For example, suppose there are two images, A and B. A has 3 captions (no.1, no.2, no.3) and B has 3 captions (no.4, no.5, no.6). If we sample two captions from the 6 (3 + 3) captions, the sampled set may be (no.1, no.2), which both belong to the same image A.
3. Following DALL-E, shouldn't the generated images also have a Gaussian filter applied?

@Sleepychord
Contributor

@FrankCast1e Hi,
2. As discussed above, we removed the duplicated images, which means only no.1 & no.4 remain.
3. Yes, the generated samples are blurred too; sorry for forgetting to mention it in the last reply.
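
To make the protocol explicit, a minimal sketch of the FID-k setup as described in this thread (illustrative; it assumes PIL for the blurring and the pytorch-fid package for the metric, neither of which is confirmed to be what the authors actually used):

```python
from pathlib import Path
from PIL import Image, ImageFilter

# Illustrative sketch of FID-k: blur BOTH the original and the generated
# images with a Gaussian filter of radius k, then compute FID between the
# two blurred sets with any standard implementation.
def blur_folder(src_dir, dst_dir, radius):
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for p in Path(src_dir).glob("*.png"):      # assumes .png images
        img = Image.open(p).convert("RGB")
        img.filter(ImageFilter.GaussianBlur(radius)).save(dst / p.name)

for k in (1, 2, 4, 8):                         # FID-1, FID-2, FID-4, FID-8
    blur_folder("originals", f"originals_blur{k}", k)
    blur_folder("generated", f"generated_blur{k}", k)
    # then e.g.: python -m pytorch_fid originals_blur1 generated_blur1
```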

@SUDA-HLT-ywfang
Author

SUDA-HLT-ywfang commented Oct 25, 2021

Hi, thanks a lot.
But I still can't reproduce the DM-GAN results reported in your paper, and I don't know what I'm doing wrong. Could you please share your DM-GAN test code and the sampled data?

@Sleepychord
Contributor

@FrankCast1e, you can email me at the address given in the paper.

@SUDA-HLT-ywfang
Author

OK, an email has been sent.

@SUDA-HLT-ywfang
Author

Hi, sorry to bother you again. Have you received my email? Looking forward to hearing from you.
