
Clarification about constants in node_profilers #7

Closed
erelon opened this issue Sep 28, 2022 · 3 comments
Labels
question Further information is requested

Comments


erelon commented Sep 28, 2022

Hi,

I'm trying to understand the MAC counts produced by the tool.

Where are all of the constants coming from?

For example:

  1. Addition is said to cost 0.5 MAC, but in reality the GPU or CPU still spends a full MAC operation on that single calculation.
  2. Why does POW cost 32 MACs? Is this the worst-case scenario of 2^32?
  3. Why do SIN/COS cost 14?
    etc.

Is there a reference somewhere on the web?

Thanks in advance.

@ThanatosShinji
Owner

They are all estimated from x86 assembly latency, so they are not accurate for every platform's ISA.
You can refer to the repo https://github.com/reyoung/avx_mathfun for the assembly instructions used to compute POW or SIN.
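
For intuition (not this repo's actual derivation), a rough tally of where a constant like 14 MACs for SIN could come from, assuming a Cephes-style SIMD kernel similar to avx_mathfun, might look like the sketch below; every step count is an illustrative assumption, not a measurement of that code:

```python
# Back-of-the-envelope tally of MAC-equivalents for a vectorized sin(x),
# modeled loosely on a Cephes/avx_mathfun-style kernel.
# All step counts are illustrative assumptions.
SIN_STEPS = {
    "range reduction (scale by 4/pi, round, fold back)": 4,
    "z = x * x": 1,
    "Horner polynomial, sin branch (3 FMAs)": 3,
    "Horner polynomial, cos branch (3 FMAs)": 3,
    "recombine branches and apply sign": 3,
}

total = sum(SIN_STEPS.values())
print(f"estimated MAC-equivalents for sin: {total}")  # 14, the same ballpark as the profiler constant
```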

ThanatosShinji added the question (Further information is requested) label Sep 28, 2022

erelon commented Sep 29, 2022

Understood.

In any case, because this is the best repo for MAC calculation on ONNX models, I really want to discuss some issues.

I'll just name a few of the other repos I have encountered (not only for onnx):
(a) onnx-opcounter
(b) pytorch-OpCounter [for pytorch, has ONNX too]
(c) pytorch-estimate-flops [for pytorch, seems to use ONNX]
(d) fvcore [from facebook, for pytorch using tracing]
(e) tensorflow flop counter [from TF, only FLOPs]
(f) pytorch profiler [for pytorch, only mm and 2dConv]

Most of them only cover a very small number of ONNX operators, which is a problem, especially for transformers. This repo is much richer and works straight on ONNX, so we can (theoretically) profile any NN from any framework.

So, let's discuss the issues I see.

  1. Even if we restrict the discussion to x86, ADD or any other simple operation can't be half a MAC. A MAC is one full assembly instruction; we can't cut it in two. You can see this in the other repos: when they do count ADD, they give it one entire MAC. Although I agree that 0.5 MAC is technically true, I believe the operation should be counted as 1 MAC (even if it is also 1 FLOP).
  2. The same goes for the ReLU operation, to an even greater extent (0.2 MAC). Most repos give it a full MAC. Besides, can we really say that, per pixel, a ReLU is 1 FLOP but only 0.2 MAC?
  3. x86 is not always the architecture used for NN inference. If a GPU is used, will those constants change? Maybe this should be a configurable parameter (see the sketch after this list)?
  4. onnxruntime added some operators of their own here and mainly here. Are you planning to support them, or to provide a way to add operators without editing the repo?
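
To make point 3 concrete, a configurable constant table could look roughly like the sketch below; the table, function names, and values are hypothetical illustrations, not this repo's actual API:

```python
# Hypothetical per-element MAC constants with per-target overrides.
# None of these names exist in the repo; this only illustrates the idea.
DEFAULT_MACS_PER_ELEMENT = {
    "Add": 0.5,   # current x86-derived estimate
    "Relu": 0.2,
    "Pow": 32,
    "Sin": 14,
}

def macs_for_op(op_type, num_elements, overrides=None):
    """Return a MAC estimate for one elementwise op, allowing per-target overrides."""
    table = {**DEFAULT_MACS_PER_ELEMENT, **(overrides or {})}
    return table.get(op_type, 1.0) * num_elements

# Example: a GPU-oriented profile could simply round every simple op up to 1 MAC.
gpu_overrides = {"Add": 1.0, "Relu": 1.0}
print(macs_for_op("Add", 1 << 20, gpu_overrides))  # 1048576.0
```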

@ThanatosShinji
Owner

1&2. The constant values you see in the Python script are mostly estimated from x86 instruction latency and throughput. There is no good way to estimate every op precisely, since the cost depends heavily on how the op is computed. But as you suggested, 1 MAC would be a more appropriate value, because the vectorized instructions are almost identical, e.g. AVX's vaddps, vfmadd231ps, and vcmpps.
3. GPUs are even more complicated and less transparent. So for low-level developers I think a better approach is to report the FMA, CMP, and ADD instruction counts separately. But for most researchers, one simple standard should work.
4. I think this repo is driven by AI models: if a popular model appears that isn't supported, I will add support for it, as with Stable Diffusion. I also provide NODEPROFILER_REGISTRY, so anyone can add a profiler for a new op without editing this repo. See profile.md for the details; a rough sketch follows below.
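
As a rough idea of how a registry-based extension could look (the profiler shape, profile() signature, and registration call below are assumptions for illustration; profile.md documents the real interface):

```python
# Hedged sketch of registering a profiler for an onnxruntime contrib op.
# The class shape, profile() signature, and registry dict are illustrative
# assumptions, not the repo's exact NODEPROFILER_REGISTRY API.
from math import prod

class BiasGeluProfiler:
    op_type = "com.microsoft.BiasGelu"  # hypothetical contrib op name

    def profile(self, input_shapes, attributes):
        # Assume one MAC per output element as a placeholder estimate.
        num_elements = prod(input_shapes[0])
        macs = float(num_elements)
        params = 0
        return macs, params

NODEPROFILER_REGISTRY = {}  # stand-in for the registry the repo exposes
NODEPROFILER_REGISTRY[BiasGeluProfiler.op_type] = BiasGeluProfiler()
```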

Lastly, thanks for your valuable suggestions and interest.
