
Clarification about constants in node_profilers #7

Closed
erelon opened this issue Sep 28, 2022 · 3 comments
Labels
question Further information is requested

Comments


erelon commented Sep 28, 2022

Hi,

I'm trying to understand the MAC counts produced by the tool.

Where are all of the constants coming from?

For example:

  1. Addition is said to cost 0.5 MAC, but in reality the GPU or CPU still spends a full MAC operation on that single calculation.
  2. Why does POW cost 32 MACs? Is this the worst-case scenario of 2^32?
  3. Why do SIN/COS cost 14?
    etc.

Is there a reference somewhere on the web?

Thanks in advance.

@ThanatosShinji
Owner

They are all estimated from x86 assembly latency, so they are not accurate for every platform's ISA.
You can refer to the repo https://github.com/reyoung/avx_mathfun for the assembly instructions used to compute POW or SIN.
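
For intuition (not this repo's actual derivation), a rough tally of where a constant like 14 MACs for SIN could come from, assuming a Cephes-style SIMD kernel similar to avx_mathfun, might look like the sketch below; every step count is an illustrative assumption, not a measurement of that code:

```python
# Back-of-the-envelope tally of MAC-equivalents for a vectorized sin(x),
# modeled loosely on a Cephes/avx_mathfun-style kernel.
# All step counts are illustrative assumptions.
SIN_STEPS = {
    "range reduction (scale by 4/pi, round, fold back)": 4,
    "z = x * x": 1,
    "Horner polynomial, sin branch (3 FMAs)": 3,
    "Horner polynomial, cos branch (3 FMAs)": 3,
    "recombine branches and apply sign": 3,
}

total = sum(SIN_STEPS.values())
print(f"estimated MAC-equivalents for sin: {total}")  # 14, the same ballpark as the profiler constant
```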

ThanatosShinji added the question (Further information is requested) label Sep 28, 2022

erelon commented Sep 29, 2022

Understood.

In any case, because this is the best repo for MAC calculation on ONNX models, I really want to discuss some issues.

I'll just name a few of the other repos I have encountered (not only for onnx):
(a) onnx-opcounter
(b) pytorch-OpCounter [for pytorch, has ONNX too]
(c) pytorch-estimate-flops [for pytorch, seems to use ONNX]
(d) fvcore [from facebook, for pytorch using tracing]
(e) tensorflow flop counter [from TF, only FLOPs]
(f) pytorch profiler [for pytorch, only mm and 2dConv]

Most of them only cover a very small number of ONNX operators, which is a problem, especially for transformers. This repo is much richer and works straight on ONNX, so we can (theoretically) profile any NN from any framework.

So, let's discuss the issues I see.

  1. Even if we restrict the discussion to x86, ADD or any other simple operation can't be half a MAC. A MAC is one full assembly instruction; we can't cut it in two. You can see this in the other repos: when they do count ADD, they give it one entire MAC. Although I agree that 0.5 MAC is technically true, I believe the operation should be counted as 1 MAC (even if it is also 1 FLOP).
  2. The same goes for the ReLU operation, to an even greater extent (0.2 MAC). Most repos give it a full MAC. Besides, can we really say that, per pixel, a ReLU is 1 FLOP but only 0.2 MAC?
  3. x86 is not always the architecture used for NN inference. If a GPU is used, will those constants change? Maybe this should be a configurable parameter (see the sketch after this list)?
  4. onnxruntime added some operators of their own here and mainly here. Are you planning to support them, or to provide a way to add operators without editing the repo?
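
To make point 3 concrete, a configurable constant table could look roughly like the sketch below; the table, function names, and values are hypothetical illustrations, not this repo's actual API:

```python
# Hypothetical per-element MAC constants with per-target overrides.
# None of these names exist in the repo; this only illustrates the idea.
DEFAULT_MACS_PER_ELEMENT = {
    "Add": 0.5,   # current x86-derived estimate
    "Relu": 0.2,
    "Pow": 32,
    "Sin": 14,
}

def macs_for_op(op_type, num_elements, overrides=None):
    """Return a MAC estimate for one elementwise op, allowing per-target overrides."""
    table = {**DEFAULT_MACS_PER_ELEMENT, **(overrides or {})}
    return table.get(op_type, 1.0) * num_elements

# Example: a GPU-oriented profile could simply round every simple op up to 1 MAC.
gpu_overrides = {"Add": 1.0, "Relu": 1.0}
print(macs_for_op("Add", 1 << 20, gpu_overrides))  # 1048576.0
```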

@ThanatosShinji
Owner

1&2. The constant values you see in the Python script are mostly estimated from x86 instruction latency and throughput. There is no good way to estimate every op precisely, since the cost depends heavily on how the op is computed. But as you suggested, 1 MAC would be a more appropriate value, because the vectorized instructions are almost identical, e.g. AVX's vaddps, vfmadd231ps, and vcmpps.
3. GPUs are even more complicated and less transparent. So for low-level developers I think a better approach is to report the FMA, CMP, and ADD instruction counts separately. But for most researchers, one simple standard should work.
4. I think this repo is driven by AI models: if a popular model appears that isn't supported, I will add support for it, as with Stable Diffusion. I also provide NODEPROFILER_REGISTRY, so anyone can add a profiler for a new op without editing this repo. See profile.md for the details; a rough sketch follows below.
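
As a rough idea of how a registry-based extension could look (the profiler shape, profile() signature, and registration call below are assumptions for illustration; profile.md documents the real interface):

```python
# Hedged sketch of registering a profiler for an onnxruntime contrib op.
# The class shape, profile() signature, and registry dict are illustrative
# assumptions, not the repo's exact NODEPROFILER_REGISTRY API.
from math import prod

class BiasGeluProfiler:
    op_type = "com.microsoft.BiasGelu"  # hypothetical contrib op name

    def profile(self, input_shapes, attributes):
        # Assume one MAC per output element as a placeholder estimate.
        num_elements = prod(input_shapes[0])
        macs = float(num_elements)
        params = 0
        return macs, params

NODEPROFILER_REGISTRY = {}  # stand-in for the registry the repo exposes
NODEPROFILER_REGISTRY[BiasGeluProfiler.op_type] = BiasGeluProfiler()
```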

Lastly, thanks for your valuable suggestions and interest.
