See how much bloat is generated by template class #17

stgatilov · 2018-05-11T06:03:36Z

A lot of code bloat comes from generic template classes. For instance, take a class MyVector defined in MyVector.h. It would be great if SymbolSort could show how much code was generated by such a class.

Right now it is possible to analyze object files (COMDAT), but in that case there is no way to group symbols by class or by header file. It is also possible to analyze a PDB, but then duplication of symbols across object files is not taken into account (and that duplication matters when analyzing build times).
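
To illustrate why duplication across object files matters, here is a minimal Python sketch of count/total aggregation over object-file dumps. It is not SymbolSort's actual code; the record layout `(object_file, symbol_name, size)` and the sample data are assumptions made up for the example.

```python
from collections import defaultdict


def aggregate_symbols(records):
    """Aggregate per-symbol stats over all object files.

    records: iterable of (object_file, symbol_name, size_in_bytes).
    Returns {symbol_name: (count, total_size)}, where count is the number
    of copies of the symbol found across the object files.
    """
    stats = defaultdict(lambda: [0, 0])
    for _obj, name, size in records:
        stats[name][0] += 1
        stats[name][1] += size
    return {name: (count, total) for name, (count, total) in stats.items()}


# Hypothetical input: the same template method emitted in two object files.
records = [
    ("a.obj", "MyVector<int>::push_back", 120),
    ("b.obj", "MyVector<int>::push_back", 120),
    ("a.obj", "main", 40),
]
stats = aggregate_symbols(records)
```

A PDB would only show one surviving copy of `MyVector<int>::push_back`, whereas the object files show that it was compiled twice.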

I see two approaches to implement this feature:

1. Extract classes from symbol names. Ideally, they would be extracted together with namespaces, e.g. std::_XTree, and then grouped the way SymbolSort groups paths. This is perhaps the best approach, but given how many special kinds of symbols exist, it is very hard to get right. In fact, doing it properly requires a full-fledged parser of symbol names (and decorated symbols are perhaps even easier to parse than undecorated ones).
2. Attribute each symbol to the source file where its code is located. This information is absent from object files, but it is present in PDB files. So it is possible to read object file dumps for the main data, then read PDB files solely to assign a code location to each symbol. This approach has some disadvantages: mainly, not all symbols are present in the PDB, and not all symbols have a location in source code.
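
A rough Python sketch of the second approach, assuming the object-file symbols and the PDB locations have already been parsed into plain Python structures. The function name, the `[unknown]` bucket, and the sample data are hypothetical; the sketch only shows the join, not how the dumps are read.

```python
def attribute_to_sources(obj_symbols, pdb_locations, unknown="[unknown]"):
    """Sum symbol sizes per source file.

    obj_symbols:   iterable of (symbol_name, size_in_bytes) from object files.
    pdb_locations: {symbol_name: source_path} taken from the PDB.
    Symbols without a PDB location land in the `unknown` bucket.
    """
    totals = {}
    for name, size in obj_symbols:
        path = pdb_locations.get(name, unknown)
        totals[path] = totals.get(path, 0) + size
    return totals


# Hypothetical data: one symbol has no location in the PDB.
obj_symbols = [
    ("MyVector<int>::push_back", 120),
    ("MyVector<int>::push_back", 120),  # second copy from another .obj
    ("`string'", 16),
]
pdb_locations = {"MyVector<int>::push_back": r"C:\proj\MyVector.h"}
totals = attribute_to_sources(obj_symbols, pdb_locations)
```

Keeping an explicit bucket for unlocated symbols makes the main disadvantage of this approach visible in the output instead of silently dropping data.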

stgatilov · 2018-05-11T06:04:56Z

I have implemented the second approach in my fork. You can see the full set of changes here.

Please let me know if a pull request is welcome.

P.S. Approach 2 has some additional advantages. For instance, in theory it is possible to produce an annotated version of the source files, where count/total stats are added as a comment before each function.
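
A minimal Python sketch of what such an annotation pass could look like, assuming the stats have already been keyed by the line number where each function starts. The comment format and the input layout are invented for illustration and are not part of the fork.

```python
def annotate(source_lines, function_stats):
    """Insert a stats comment before each function's first line.

    source_lines:   list of source lines without trailing newlines.
    function_stats: {line_number (1-based): (name, count, total_bytes)}.
    """
    out = []
    for lineno, line in enumerate(source_lines, start=1):
        if lineno in function_stats:
            name, count, total = function_stats[lineno]
            out.append(f"// [bloat] {name}: count={count}, total={total} bytes")
        out.append(line)
    return out


# Hypothetical header content and stats.
source = [
    "template<class T> struct MyVector {",
    "    void push_back(const T& v);",
    "};",
]
annotated = annotate(source, {2: ("MyVector<T>::push_back", 2, 240)})
```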

stgatilov · 2018-05-12T13:01:53Z

I have also implemented the first approach, i.e. extracting the classpath from the symbol name. It works like this:

1. Take the raw symbol name (i.e. the mangled/decorated one).
2. Undecorate it partially, omitting the return value and function parameters (and probably something else).
3. Parse the undecorated name using several templates, regexes, and other dirty stuff like that.
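
The parsing step can be sketched roughly as follows in Python. This is a simplified illustration, not the code from the branch: it assumes the name is already undecorated without return type and parameters, it splits on `::` only outside template brackets, and it deliberately ignores hard cases such as `operator<`, `operator->`, and lambdas, which are the reason the real parsing needs extra special-casing.

```python
def split_classpath(name):
    """Split an undecorated name on '::' that lies outside template brackets."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(name):
        ch = name[i]
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif depth == 0 and name.startswith("::", i):
            parts.append(name[start:i])
            i += 2
            start = i
            continue
        i += 1
    parts.append(name[start:])
    return parts


def strip_template_args(component):
    """Drop template arguments: 'vector<int>' becomes 'vector'."""
    cut = component.find("<")
    return component if cut <= 0 else component[:cut]


def classpath(name):
    """Return namespaces and classes enclosing the symbol, outermost first."""
    # The last component is the member itself, so it is not part of the path.
    return [strip_template_args(p) for p in split_classpath(name)[:-1]]


path = classpath("std::vector<int,std::allocator<int> >::push_back")
```

Stripping template arguments merges all instantiations of a class template into one group, which is what per-class statistics need.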

First I tried to use UnDecorateSymbolName for step 2, but it is located in dbghelp.dll, which has not been updated for quite a long time. It cannot handle C++11 features like rvalue references. This implementation is currently in the classpath branch.

Then I switched to calling the undname.exe utility from the MSVC distribution. It works perfectly (it is perhaps the only official way to demangle MSVC symbols today). The code is in the classpath2 branch. All the differences can be seen here.

adrianstone55 · 2018-05-17T21:13:41Z

Hi, sorry for the slow response, but I've been away on vacation. I think your analysis of the problem is spot on. PDBs are interesting, but to analyze code bloat from weak instantiations you need to look at the OBJ files. I would probably lean towards the second approach, because trying to correlate input from two different sources could get messy, but there are advantages and disadvantages both ways.

If you want to put together a pull request, I'll happily consider it, but I might be a bit slow because I'm not actively maintaining the code anymore and I haven't even used it more than a couple times in the past five years.

stgatilov · 2018-05-19T17:23:23Z

Both approaches already work for me. Of course, both have pluses and minuses.

- In the classpath approach, the analysis relies on hacky regexes for parsing symbol names. Despite that, almost all symbols are taken into account.
- In the PDB filepath approach, not all symbols actually have a location in the PDB. About 20-30% of symbols are usually implicitly generated stuff or some data. On the plus side, it gives per-directory stats, so it is very simple to see the code bloat from the whole STL.
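
For the per-directory stats, a minimal Python sketch of the rollup is shown here. It assumes per-file totals are already available and simply adds each file's size to every ancestor directory; the paths are hypothetical and this is not the code from the fork.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def rollup_by_directory(file_sizes):
    """Add each file's size to all of its ancestor directories.

    file_sizes: {source_path: total_size_in_bytes}.
    Returns {directory_path: total_size_in_bytes}.
    """
    totals = defaultdict(int)
    for path, size in file_sizes.items():
        # PureWindowsPath parses MSVC-style paths on any host OS.
        for parent in PureWindowsPath(path).parents:
            totals[str(parent)] += size
    return dict(totals)


# Hypothetical per-file totals.
file_sizes = {
    r"C:\msvc\include\vector": 300,
    r"C:\msvc\include\xtree": 500,
    r"C:\proj\MyVector.h": 200,
}
dir_totals = rollup_by_directory(file_sizes)
```

With this shape, the bloat of a whole library is just the entry for its include directory.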

My plan is to write a blog article about these two options. Then it will be easier to make a decision.
P.S. For now, I will continue posting small pull requests...

stgatilov · 2018-05-26T17:26:45Z

OK, the article is finished.

Here is the full article.
To save time, I suggest starting from the Improvements section.

Now I'll prepare pull requests for both features.

stgatilov changed the title ~~See how much bloat is generated by template~~ See how much bloat is generated by template class May 11, 2018

This was referenced May 27, 2018

Analyze .obj files with source file info taken from .pdb #25

Open

Extract classpath from symbol names for per-class/per-namespace statistics #26

Open

parbo mentioned this issue Sep 14, 2018

Running SymbolSort with VS2017? #27

Open
