Clarification about constants in node_profilers #7
Comments
They are all estimated from x86 assembly instruction latency, so they are not accurate for every platform ISA.
Understood. In any case, because this is the best repo for MAC calculation for ONNX, I really want to discuss some issues. I'll just name a few of the other repos I have encountered (not only for ONNX): Most of them support only a small number of ONNX operators, which is a problem, especially for transformers. This repo is much richer and works directly on ONNX, so we can (theoretically) calculate MACs for any NN from any framework. So, let's discuss the issues I see.
1&2. The constant values you saw in the Python script are mostly estimated from x86 instruction latency and throughput. There is no good way to estimate every op properly, since the cost depends heavily on how the op is computed. But as you suggested, 1 MAC should be a more appropriate value, because vectorized instructions cost almost the same as one another, for example AVX's vaddps, vfmadd231ps, and vcmpps. Finally, thanks for your valuable suggestions and interest.
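To make the two conventions concrete, here is a minimal sketch, not this repo's actual code. The function name `elementwise_macs`, the table `LATENCY_WEIGHTED_COST`, and every number in it are hypothetical placeholders chosen for illustration only. It contrasts a latency-weighted per-element constant with the uniform "1 MAC per output element" convention discussed above.

```python
import math

# Hypothetical per-element cost table. These numbers are placeholders for
# illustration; they are NOT the constants used in node_profilers.
LATENCY_WEIGHTED_COST = {
    "Add": 1.0,
    "Mul": 1.0,
    "Div": 4.0,
    "Exp": 16.0,
}


def elementwise_macs(op_type, output_shape, cost_table=None):
    """Estimate the MACs of an elementwise op.

    With cost_table=None every output element counts as 1 MAC (the uniform
    convention). Otherwise the per-element cost is looked up in cost_table,
    falling back to 1 MAC for ops that are not listed.
    """
    num_elements = math.prod(output_shape)
    per_element = 1.0 if cost_table is None else cost_table.get(op_type, 1.0)
    return num_elements * per_element


# Same op and shape under both conventions.
shape = (2, 3)
uniform = elementwise_macs("Exp", shape)
weighted = elementwise_macs("Exp", shape, LATENCY_WEIGHTED_COST)
print(uniform, weighted)
```

The uniform convention is platform independent, while a weighted table is only as good as the ISA it was measured on.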
Hi,
I'm trying to understand the number of MACs reported by the tool.
Where are all of the constants coming from?
For example -
etc.
Is there a reference somewhere on the web?
Thanks in advance.