
Could the author share the code for calculating the model parameters (Params) and the computational complexity (MACs) of the pipeline? #53

Closed
StormArcher opened this issue Feb 29, 2024 · 7 comments · Fixed by #54

Comments

@StormArcher

Could the author share the code for calculating the model parameters (Params) and the computational complexity (MACs) of the pipeline? Thank you very much!

@bokyeong1015
Member

Hi, we've added the code. Please run:

pip install thop==0.1.1.post2209072238
python src/count_macs_params.py
Results:

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
[Text Enc] MACs: 6.5G = 6545882112
[Text Enc] Params: 123.1M = 123060480
[U-Net] MACs: 338.7G = 338749194240
[U-Net] Params: 859.5M = 859520964
[Img Dec] MACs: 1240.1G = 1240079532032
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 1585.4G = 1585374608384
[Total] Params: 1032.1M = 1032071623

== nota-ai/bk-sdm-base | 512x512 img generation ==
[Text Enc] MACs: 6.5G = 6545882112
[Text Enc] Params: 123.1M = 123060480
[U-Net] MACs: 223.8G = 223755632640
[U-Net] Params: 579.4M = 579384964
[Img Dec] MACs: 1240.1G = 1240079532032
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 1470.4G = 1470381046784
[Total] Params: 751.9M = 751935623

== nota-ai/bk-sdm-small | 512x512 img generation ==
[Text Enc] MACs: 6.5G = 6545882112
[Text Enc] Params: 123.1M = 123060480
[U-Net] MACs: 217.7G = 217727959040
[U-Net] Params: 482.3M = 482346884
[Img Dec] MACs: 1240.1G = 1240079532032
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 1464.4G = 1464353373184
[Total] Params: 654.9M = 654897543

== nota-ai/bk-sdm-tiny | 512x512 img generation ==
[Text Enc] MACs: 6.5G = 6545882112
[Text Enc] Params: 123.1M = 123060480
[U-Net] MACs: 205.0G = 205035274240
[U-Net] Params: 323.4M = 323384964
[Img Dec] MACs: 1240.1G = 1240079532032
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 1451.7G = 1451660688384
[Total] Params: 495.9M = 495935623

== runwayml/stable-diffusion-v1-5 | 512x512 img generation ==
[Text Enc] MACs: 6.5G = 6545882112
[Text Enc] Params: 123.1M = 123060480
[U-Net] MACs: 338.7G = 338749194240
[U-Net] Params: 859.5M = 859520964
[Img Dec] MACs: 1240.1G = 1240079532032
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 1585.4G = 1585374608384
[Total] Params: 1032.1M = 1032071623

== stabilityai/stable-diffusion-2-1-base | 512x512 img generation ==
[Text Enc] MACs: 22.3G = 22299160576
[Text Enc] Params: 340.4M = 340387840
[U-Net] MACs: 339.2G = 339241205760
[U-Net] Params: 865.9M = 865910724
[Img Dec] MACs: 1240.1G = 1240079532032
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 1601.6G = 1601619898368
[Total] Params: 1255.8M = 1255788743

== stabilityai/stable-diffusion-2-1 | 768x768 img generation ==
[Text Enc] MACs: 22.3G = 22299160576
[Text Enc] Params: 340.4M = 340387840
[U-Net] MACs: 760.8G = 760797839360
[U-Net] Params: 865.9M = 865910724
[Img Dec] MACs: 1240.1G = 1240079532032
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 2023.2G = 2023176531968
[Total] Params: 1255.8M = 1255788743

@StormArcher
Author

We followed the author's code for testing, but the MACs results differ greatly from those in the paper.
Is there a problem with how we ran the code?

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
[Text Enc] MACs: 6.5G = 6545882112
[Text Enc] Params: 123.1M = 123060480
[U-Net] MACs: 0.2G = 232980480
[U-Net] Params: 859.5M = 859520964
[Img Dec] MACs: 1.0G = 981467136
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 7.8G = 7760329728
[Total] Params: 1032.1M = 1032071623

bokyeong1015 reopened this Mar 1, 2024
@bokyeong1015
Member

We obtained the results below with the refactored and uploaded code, which are identical to those presented in our paper.

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
[Text Enc] MACs: 6.5G = 6545882112
[Text Enc] Params: 123.1M = 123060480
[U-Net] MACs: 338.7G = 338749194240
[U-Net] Params: 859.5M = 859520964
[Img Dec] MACs: 1240.1G = 1240079532032
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 1585.4G = 1585374608384
[Total] Params: 1032.1M = 1032071623

Could you share the output of pip show thop so we can check the version and confirm that you installed thop==0.1.1.post2209072238? Sharing the exact procedure or code you ran would also help us reproduce your issue.

@StormArcher
Author

StormArcher commented Mar 1, 2024

1. I ran your code directly, just for "CompVis/stable-diffusion-v1-4", as follows:
   get_macs_params(model_id="CompVis/stable-diffusion-v1-4", img_size=512, txt_emb_size=768, device=device)

2. The thop version is the same as yours:
   import thop
   print(thop.__version__)
   # 0.1.1

3. Why does the profiling run only a single step?
   batch_size = 1
   dummy_timesteps = torch.zeros(batch_size).to(device)

How do we control the number of sampling steps?

The code is as follows:

# ------------------------------------------------------------------------------------
# Copyright 2024. Nota Inc. All Rights Reserved.
# ------------------------------------------------------------------------------------
import torch
from diffusers import StableDiffusionPipeline
from thop import profile


def count_params(model):
    return sum(p.numel() for p in model.parameters())


def get_macs_params(model_id, img_size=512, txt_emb_size=768, device="cuda", batch_size=1):
    pipeline = StableDiffusionPipeline.from_pretrained(model_id).to(device)
    text_encoder = pipeline.text_encoder
    unet = pipeline.unet
    vae_decoder = pipeline.vae.decoder

    # text encoder
    dummy_input_ids = torch.zeros(batch_size, 77).long().to(device)  # (1, 77)
    macs_txt_enc, _ = profile(text_encoder, inputs=(dummy_input_ids,))
    macs_txt_enc = macs_txt_enc / batch_size
    params_txt_enc = count_params(text_encoder)

    # unet
    dummy_noisy_latents = torch.zeros(batch_size, 4, int(img_size/8), int(img_size/8)).to(device)  # (1, 4, 64, 64) for 512x512
    dummy_timesteps = torch.zeros(batch_size).to(device)  # (1,)
    dummy_text_emb = torch.zeros(batch_size, 77, txt_emb_size).to(device)  # (1, 77, 768)
    macs_unet, _ = profile(unet, inputs=(dummy_noisy_latents, dummy_timesteps, dummy_text_emb))
    macs_unet = macs_unet / batch_size
    params_unet = count_params(unet)

    # image decoder
    dummy_latents = torch.zeros(batch_size, 4, 64, 64).to(device)  # (1, 4, 64, 64)
    macs_img_dec, _ = profile(vae_decoder, inputs=(dummy_latents,))
    macs_img_dec = macs_img_dec / batch_size
    params_img_dec = count_params(vae_decoder)

    # total
    macs_total = macs_txt_enc + macs_unet + macs_img_dec
    params_total = params_txt_enc + params_unet + params_img_dec

    # print
    print(f"== {model_id} | {img_size}x{img_size} img generation ==")
    print(f"  [Text Enc] MACs: {(macs_txt_enc/1e9):.1f}G = {int(macs_txt_enc)}")
    print(f"  [Text Enc] Params: {(params_txt_enc/1e6):.1f}M = {int(params_txt_enc)}")
    print(f"  [U-Net] MACs: {(macs_unet/1e9):.1f}G = {int(macs_unet)}")
    print(f"  [U-Net] Params: {(params_unet/1e6):.1f}M = {int(params_unet)}")
    print(f"  [Img Dec] MACs: {(macs_img_dec/1e9):.1f}G = {int(macs_img_dec)}")
    print(f"  [Img Dec] Params: {(params_img_dec/1e6):.1f}M = {int(params_img_dec)}")
    print(f"  [Total] MACs: {(macs_total/1e9):.1f}G = {int(macs_total)}")
    print(f"  [Total] Params: {(params_total/1e6):.1f}M = {int(params_total)}")


if __name__ == "__main__":
    device = "cuda:0"
    get_macs_params(model_id="CompVis/stable-diffusion-v1-4", img_size=512, txt_emb_size=768, device=device)
    # get_macs_params(model_id="nota-ai/bk-sdm-base", img_size=512, txt_emb_size=768, device=device)
    # get_macs_params(model_id="nota-ai/bk-sdm-small", img_size=512, txt_emb_size=768, device=device)
    # get_macs_params(model_id="nota-ai/bk-sdm-tiny", img_size=512, txt_emb_size=768, device=device)

    # get_macs_params(model_id="runwayml/stable-diffusion-v1-5", img_size=512, txt_emb_size=768, device=device)
    # get_macs_params(model_id="stabilityai/stable-diffusion-2-1-base", img_size=512, txt_emb_size=1024, device=device)
    # get_macs_params(model_id="stabilityai/stable-diffusion-2-1", img_size=768, txt_emb_size=1024, device=device)

@bokyeong1015
Member

bokyeong1015 commented Mar 1, 2024

1 - Thanks for checking.
3 - Do you mean how to control the number of denoising steps? We calculated the MACs for a single denoising step and then multiplied that by the total number of steps, which is 25. dummy_timesteps is a single scalar value for the timestep index.
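
For instance, with the SD-v1.4 numbers above, the rough per-image arithmetic (a minimal sketch, assuming the text encoder and image decoder each run once per image while the U-Net runs once per denoising step) is:

# Rough sketch: MACs for generating one 512x512 image with SD-v1.4,
# assuming 1 text-encoder pass, 25 U-Net steps, and 1 image-decoder pass.
macs_txt_enc = 6_545_882_112          # single text-encoder pass
macs_unet_per_step = 338_749_194_240  # single U-Net (denoising) step
macs_img_dec = 1_240_079_532_032      # single image-decoder pass
num_steps = 25

macs_per_image = macs_txt_enc + num_steps * macs_unet_per_step + macs_img_dec
print(f"{macs_per_image / 1e12:.2f} TMACs per image")  # ~9.72 TMACs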


2 - Umm, when we ran pip install thop==0.1.1.post2209072238 and then your code, we obtained the log below. Please share:

  • the output of pip show thop (0.1.1 seems to have multiple tags)
  • the output of pip show diffusers
  • the full log you've obtained

The log we obtained:

`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["bos_token_id"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["eos_token_id"]` will be overriden.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register count_normalization() for <class 'torch.nn.modules.normalization.LayerNorm'>.
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register count_normalization() for <class 'torch.nn.modules.normalization.LayerNorm'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
  [Text Enc] MACs: 6.5G = 6545882112
  [Text Enc] Params: 123.1M = 123060480
  [U-Net] MACs: 338.7G = 338749194240
  [U-Net] Params: 859.5M = 859520964
  [Img Dec] MACs: 1240.1G = 1240079532032
  [Img Dec] Params: 49.5M = 49490179
  [Total] MACs: 1585.4G = 1585374608384
  [Total] Params: 1032.1M = 1032071623

Output of pip show thop:

Name: thop
Version: 0.1.1.post2209072238
Summary: A tool to count the FLOPs of PyTorch model.
Home-page: https://github.com/Lyken17/pytorch-OpCounter/

Output of pip show diffusers:

Name: diffusers
Version: 0.15.0
Summary: Diffusers
Home-page: https://github.com/huggingface/diffusers

@StormArcher
Author

StormArcher commented Mar 1, 2024

1. I think the author may have forgotten to apply "/8" to img_size for the U-Net input, which is why the U-Net computational complexity comes out to 339G; with "img_size/8" applied, it should match mine (0.2G):

   if img_size/8, it is 0.2G
   if img_size/4, it is 0.9G
   if img_size/2, it is 3.7G

2. I think my versions of thop and diffusers are the same as yours, but the U-Net MACs of my run for "runwayml/stable-diffusion-v1-5" are as follows:

-> the output of pip show thop
Name: thop
Version: 0.1.1.post2209072238
Summary: A tool to count the FLOPs of PyTorch model.
Home-page: https://github.com/Lyken17/pytorch-OpCounter/
Author: Ligeng Zhu
Author-email: ligeng.zhu+github@gmail.com
License: MIT
Location: /opt/conda/lib/python3.8/site-packages
Requires: torch
Required-by:

-> the output of pip show diffusers
Name: diffusers
Version: 0.15.0.dev0
Summary: Diffusers
Home-page: https://github.com/huggingface/diffusers
Author: The HuggingFace team
Author-email: patrick@huggingface.co
License: Apache
Location: /opt/conda/lib/python3.8/site-packages
Editable project location: /home/pansiyuan/.jupyter/diffusers
Requires: filelock, huggingface-hub, importlib-metadata, numpy, Pillow, regex, requests
Required-by:

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
[Text Enc] MACs: 6.5G = 6545882112
[Text Enc] Params: 123.1M = 123060480
[U-Net] MACs: 0.2G = 232980480
[U-Net] Params: 859.5M = 859520964
[Img Dec] MACs: 1.0G = 981467136
[Img Dec] Params: 49.5M = 49490179
[Total] MACs: 7.8G = 7760329728
[Total] Params: 1032.1M = 1032071623

@bokyeong1015
Member

I think the author may have forgotten to add “/8” for img_size of input of UNet, so the computational complexity of unet is (339G), if "img_size/8" should be the same as mine (0.2G)

We don't quite follow this point. The division by 8 ("/8") is already applied correctly in our code.

  • If we put img_size=512, the latent input size for the U-Net becomes 1x4x64x64, which is correct.
dummy_noisy_latents = torch.zeros(batch_size, 4, int(img_size/8), int(img_size/8)).to(device)
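
As a quick sanity check (a minimal sketch, separate from count_macs_params.py), diffusers exposes the VAE downsampling factor, so the 512-to-64 latent mapping can be verified directly:

# Verify the factor-8 VAE downsampling used for the U-Net latent input.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
print(pipe.vae_scale_factor)         # 8 for SD v1.x
print(512 // pipe.vae_scale_factor)  # 64 -> latent spatial size for 512x512 images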

Furthermore, when we changed img_size as you mentioned, we obtained the following results and were not able to reproduce your 0.2G result for img_size/8:

# get_macs_params(model_id="CompVis/stable-diffusion-v1-4", img_size=512, txt_emb_size=768, device=device)

== CompVis/stable-diffusion-v1-4 | 512x512 img generation ==
  [U-Net] MACs: 338.7G = 338749194240
  [U-Net] Params: 859.5M = 859520964
# get_macs_params(model_id="CompVis/stable-diffusion-v1-4", img_size=64, txt_emb_size=768, device=device)

== CompVis/stable-diffusion-v1-4 | 64x64 img generation ==
  [U-Net] MACs: 6.8G = 6773345280
  [U-Net] Params: 859.5M = 859520964

Thanks for checking. Unfortunately, we are not sure we can provide further assistance at this moment, as we were unable to reproduce the issue you described.
