Different op types / weighted operations count #152
Comments
Hi Tom, thanks for your constructive suggestions. I do agree that different operators should have different complexity. However, it is non-trivial to set such weights: MACs do not always reflect latency, and neither does the THOP package. I am afraid that even if we introduce the configuration, it still cannot precisely estimate latency. Maybe a good way is to report detailed operator counts (e.g., how many adds, muls, divs, subs, log- and exp-operations are used in the model) and let users choose how to use them.
Yes, that was my thinking exactly. The first step would be to sub-divide the operations and report them as a table. The weighting can then be applied as desired with a simple mapping function. The list of weights above is just something which is used by ITU-T in the standardization of protocols for mobile communication, so I think it is a fairly credible compromise in that particular area of application, but it is not directly generalizable to GPUs. It really depends on the target application - CPUs and GPUs vary a lot depending on the area of application. That's why any mapping has to be a separate configuration file which the user can choose.
Got your point, Tom. It may take some time to support all operators. I will add this to my Todoist and gradually support them.
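The subdivision-plus-mapping approach discussed above could be sketched roughly as follows. This is only an illustration: the names (`op_counts`, `WEIGHTS`, `weighted_ops`) and the count values are assumptions, not part of THOP's actual API, and the weight values are placeholders for whatever table the user's configuration supplies.

```python
# Hypothetical sketch, not THOP's API: collapse a detailed per-operator
# breakdown into one weighted total using a user-supplied weight table.

# Detailed operator counts as the profiler might report them (illustrative).
op_counts = {"mul": 1_000_000, "add": 1_000_000, "exp": 40_000, "log": 10_000}

# Per-operator cost relative to one multiply-accumulate; exp/log weighted
# 25x a MAC, in line with the CPU figures mentioned in this thread.
WEIGHTS = {"mul": 1.0, "add": 1.0, "exp": 25.0, "log": 25.0}

def weighted_ops(counts, weights):
    """Apply the weight mapping; unknown operators default to weight 1."""
    return sum(weights.get(op, 1.0) * n for op, n in counts.items())

print(weighted_ops(op_counts, WEIGHTS))  # → 3250000.0
```

Keeping the raw `op_counts` table as the profiler's output and the `WEIGHTS` table in a separate, swappable configuration is exactly what lets the same counts serve both CPU- and GPU-oriented cost models.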
Hey, an exciting project you have here — looking forward to using it!
A question: I'd be interested in a more detailed estimation of complexity, in particular such that different non-linearities would have different weights. Typically, for example, a log- or exp-operation performed on a CPU is 25 times more expensive than a regular multiply-accumulate (MAC). So, basically, I'm thinking that there could be a configuration file for the profiler, with a table stating the proportional complexity of different non-linearities. Such weighting would make the profiler output better reflect the true cost of execution. Alternatively, the profiler could give as an optional output a subdivision of the operation types which have been used.
So, what do you think?
For a short example, see Table 1 in my wiki at https://wiki.aalto.fi/display/ITSP/Other+performance+measures
The more detailed information which I use can be found on page 259 of https://www.itu.int/rec/T-REC-G.191-200911-S/en
A newer version of that is available on page 277 of the file STLmanual.pdf from https://www.itu.int/rec/T-REC-G.191-201901-I/en
I'm aware that for best accuracy we should also count for-loops and if-statements, but I would assume that they have less impact in big models. I'm primarily interested in the nonlinear operations, since in my experience they make a large contribution to the overall complexity of the models I use.
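The configuration-file idea proposed here could look something like the following sketch. The file format (JSON), the operator names, and every weight value are assumptions made up for illustration — only the exp/log figure of 25x a MAC comes from this thread; the real weights would come from the ITU-T tables linked above or from the user's own measurements.

```python
# Hypothetical sketch of a profiler weight-configuration file: per-operator
# costs stored as JSON so users can swap tables for different CPU/GPU targets.
import json

# In practice this string would be read from a user-chosen config file.
config = json.loads("""
{
  "mac": 1.0,
  "exp": 25.0,
  "log": 25.0
}
""")

# Subdivided counts as the profiler might report them (illustrative numbers).
breakdown = {"mac": 500_000, "exp": 2_000}

# Operators missing from the config fall back to unit cost.
total = sum(config.get(op, 1.0) * n for op, n in breakdown.items())
print(total)  # → 550000.0
```

A separate config file keeps the profiler itself target-agnostic: the same operator breakdown can be re-weighted for a DSP, a desktop CPU, or a GPU without re-running the model.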
If this has wider interest, then I can contribute something to it, but I don't have time to do it myself completely.
cheers,
Tom
https://research.aalto.fi/en/persons/tom-b%C3%A4ckstr%C3%B6m