
Could you provide code for calculating FID, Time (s/im), and Memory (GB/im)? #55

Open
Doraemonzm opened this issue Jan 23, 2024 · 3 comments

Comments

@Doraemonzm

Thanks for your excellent work. Could you please provide code for calculating FID, Time (s/im), and Memory (GB/im)?

@dbolya
Owner

dbolya commented Jan 24, 2024

For FID, I used pytorch-fid (see the details in the paper for what sets I compared).
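pytorch-fid is normally run from the command line on two image folders (`python -m pytorch_fid path/to/dataset1 path/to/dataset2`). It extracts Inception-v3 features for every image, fits a mean and covariance to each set, and reports the Fréchet distance between the two Gaussians: ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below only illustrates that final step with numpy, assuming the features are already extracted (random arrays stand in for them). It is not pytorch-fid's own code: pytorch-fid uses `scipy.linalg.sqrtm`, while this version uses an eigendecomposition to get the matrix square-root trace.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets.

    FID = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Symmetric square root of cov_a via eigendecomposition (clip tiny negatives).
    w, v = np.linalg.eigh(cov_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    # S_a S_b has the same eigenvalues as sqrt_a S_b sqrt_a, which is symmetric
    # PSD, so Tr((S_a S_b)^{1/2}) is the sum of square roots of its eigenvalues.
    m = sqrt_a @ cov_b @ sqrt_a
    eig = np.linalg.eigvalsh((m + m.T) / 2.0)
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    fake = rng.normal(loc=0.5, size=(500, 8))
    print(frechet_distance(real, real))  # close to 0.0 (identical sets)
    print(frechet_distance(real, fake))  # larger, mostly from the mean shift
```

Lower is better, and the value depends on the feature extractor and on the sample counts, so FID numbers are only comparable when both sets are built the same way.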

For time, I simply timed a full 2000-image generation run (though 100-200 images would probably suffice) at the largest batch size I could fit, then divided the total time by the number of images generated.
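A minimal sketch of that measurement, assuming a hypothetical `generate_batch(n)` callable that produces `n` images with your pipeline. The optional `sync` hook is there because CUDA work is asynchronous; with PyTorch you would pass `torch.cuda.synchronize` so that queued GPU work is included in the timing.

```python
import time
from typing import Callable, Optional


def seconds_per_image(
    generate_batch: Callable[[int], None],
    total_images: int,
    batch_size: int,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Wall-clock seconds per image for one full generation run.

    `generate_batch(n)` is a hypothetical callable that generates n images.
    `sync` (e.g. torch.cuda.synchronize) waits for pending GPU work.
    """
    if sync is not None:
        sync()  # don't let earlier queued work leak into the measurement
    start = time.perf_counter()
    remaining = total_images
    while remaining > 0:
        n = min(batch_size, remaining)  # the last batch may be smaller
        generate_batch(n)
        remaining -= n
    if sync is not None:
        sync()  # make sure the last batch has actually finished
    return (time.perf_counter() - start) / total_images
```

For example, `seconds_per_image(my_generate, total_images=200, batch_size=16, sync=torch.cuda.synchronize)`, where `my_generate` is your own (hypothetical) wrapper around the sampler.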

Finally, for memory, the easiest way is to keep increasing the batch size until you run out of memory, then divide the peak memory used by PyTorch (as reported by `nvidia-smi`) by the largest batch size that fit to get GB/im. Note that maximizing the batch size is important because PyTorch can allocate more memory than it needs, which inflates the per-image figure at small batch sizes.
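The search can be sketched as below. `run_batch(bs)` and `peak_memory_gb()` are hypothetical callables you would supply: the first runs one generation step at batch size `bs`, the second returns the peak usage observed for that run (for example the highest value seen from `nvidia-smi --query-gpu=memory.used --format=csv` during it). `MemoryError` is only a stand-in default; with recent PyTorch releases you would pass `oom_errors=(torch.cuda.OutOfMemoryError,)`.

```python
from typing import Callable, Tuple, Type


def gb_per_image(
    run_batch: Callable[[int], None],
    peak_memory_gb: Callable[[], float],
    start: int = 1,
    step: int = 1,
    max_batch: int = 4096,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> Tuple[int, float]:
    """Grow the batch size until OOM; return (largest batch that fit, GB/im)."""
    best_bs, best_peak = 0, 0.0
    bs = start
    while bs <= max_batch:
        try:
            run_batch(bs)
        except oom_errors:
            break  # this batch size no longer fits
        best_bs, best_peak = bs, peak_memory_gb()  # peak of the last run that fit
        bs += step
    if best_bs == 0:
        raise RuntimeError("the starting batch size already runs out of memory")
    return best_bs, best_peak / best_bs


if __name__ == "__main__":
    # Toy model: 1 GB fixed overhead + 0.7 GB per image on a 10 GB card.
    used = {"gb": 0.0}

    def fake_run(bs: int) -> None:
        need = 1.0 + 0.7 * bs
        if need > 10.0:
            raise MemoryError
        used["gb"] = need

    print(gb_per_image(fake_run, lambda: used["gb"]))  # (12, ~0.78)
```

In the toy model, dividing at batch size 1 would give 1.7 GB/im, while the largest batch that fits (12) gives about 0.78 GB/im: any memory that does not scale with the batch gets spread over more images, which is one way to see why pushing the batch size up matters.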

It was a fairly manual process, so there's no code, but it should be simple enough to replicate.

@Doraemonzm
Author

Thanks a lot for your reply. I am new to Stable Diffusion, and I now understand that FID is computed between two datasets.
However, I still don't know where these two datasets come from. Could you provide any scripts or guidance for generating them?
Moreover, I recently noticed Agent Attention for Stable Diffusion, which replaces the softmax attention used in the ToMeSD model with agent attention to further improve speed.
If I have some other attention modules, how should I compare them in the same way as AgentSD does?

@dbolya
Owner

dbolya commented Jan 25, 2024

As stated in the paper:
(screenshot of the relevant excerpt from the paper)
where [19] is just the pytorch-fid package. For the "5000 class balanced samples", I took the ImageNet validation set and used the first 5 images of each class (5 × 1000 classes = 5000 images). The other set of 2000 images was generated with those settings, using the prompt "A high quality photograph of a {_cls}." where `{_cls}` is an ImageNet class name (2 samples per class).
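Both sets can be assembled with a few lines of Python. This is a sketch under assumptions, not the script used for the paper: it assumes the ImageNet val set has been sorted into one folder per class (`val_root/<class_id>/<images>`), takes "first" to mean first in sorted filename order, and expects you to supply the ImageNet class names yourself. Both function names are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def select_reference_images(val_root: str, per_class: int = 5) -> List[Path]:
    """First `per_class` images (sorted by filename) from each class folder.

    Assumes a hypothetical layout val_root/<class_id>/<image files>.
    """
    selected: List[Path] = []
    for class_dir in sorted(p for p in Path(val_root).iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        selected.extend(images[:per_class])
    return selected


def build_prompts(class_names: Sequence[str], samples_per_class: int = 2) -> List[str]:
    """One prompt per generated image, using the template quoted above."""
    return [
        f"A high quality photograph of a {name}."
        for name in class_names
        for _ in range(samples_per_class)
    ]
```

With 1000 classes this yields 5000 reference images and 2000 prompts. Copy the selected files into one folder (e.g. with `shutil.copy`), save the generated images into another, and run pytorch-fid on the two folders.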

As for AgentSD, I'm not sure. I haven't worked on this much since ToMeSD's release, so you should ask the AgentSD authors if you have questions about it.
