Time trace profiler output support (-ftime-trace) #2
I have written about how existing
This change adds hierarchical "time trace" profiling blocks that can be visualized in Chrome, in a "flame chart" style. Each profiling block can have a "detail" string that for example indicates the file being processed, template name being instantiated, function being optimized etc.
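To make the idea concrete, here is a minimal sketch of nested profiling blocks that serialize to the Chrome Trace Event JSON format as "complete" (`"ph":"X"`) events. It is an illustration only, not the code in TimeProfiler.cpp: `TraceEvent`, `TraceRecorder`, `TraceScope` and `escapeJson` are hypothetical names, and the fixed `pid`/`tid` values are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the PR's actual implementation.
struct TraceEvent {
  std::string name;      // kind of block, e.g. "Source" or "InstantiateClass"
  std::string detail;    // e.g. file being processed or template name
  std::int64_t startUs;  // start time, microseconds since recorder creation
  std::int64_t durUs;    // duration in microseconds
};

// Escapes quotes and backslashes; control characters become spaces to keep
// the sketch short.
inline std::string escapeJson(const std::string& s) {
  std::string out;
  for (char c : s) {
    if (c == '"' || c == '\\') { out += '\\'; out += c; }
    else if (static_cast<unsigned char>(c) < 0x20) out += ' ';
    else out += c;
  }
  return out;
}

class TraceRecorder {
public:
  using Clock = std::chrono::steady_clock;

  // Opens a new block nested inside whatever block is currently open.
  void begin(std::string name, std::string detail) {
    open_.push_back({std::move(name), std::move(detail), nowUs(), 0});
  }

  // Closes the innermost open block and records its duration.
  void end() {
    assert(!open_.empty() && "end() without matching begin()");
    TraceEvent e = std::move(open_.back());
    open_.pop_back();
    e.durUs = nowUs() - e.startUs;
    done_.push_back(std::move(e));
  }

  // Serializes finished blocks as Chrome Trace Event "complete" events.
  std::string toJson() const {
    std::ostringstream os;
    os << "{\"traceEvents\":[";
    for (std::size_t i = 0; i < done_.size(); ++i) {
      const TraceEvent& e = done_[i];
      if (i) os << ",";
      os << "{\"pid\":1,\"tid\":0,\"ph\":\"X\",\"ts\":" << e.startUs
         << ",\"dur\":" << e.durUs
         << ",\"name\":\"" << escapeJson(e.name)
         << "\",\"args\":{\"detail\":\"" << escapeJson(e.detail) << "\"}}";
    }
    os << "]}";
    return os.str();
  }

  std::size_t depth() const { return open_.size(); }
  const std::vector<TraceEvent>& events() const { return done_; }

private:
  std::int64_t nowUs() const {
    return std::chrono::duration_cast<std::chrono::microseconds>(
               Clock::now() - start_).count();
  }

  Clock::time_point start_ = Clock::now();
  std::vector<TraceEvent> open_;  // stack of blocks that have not ended yet
  std::vector<TraceEvent> done_;  // finished blocks, in order of completion
};

// RAII helper: the block lasts exactly as long as the enclosing C++ scope.
struct TraceScope {
  TraceScope(TraceRecorder& r, std::string name, std::string detail) : rec(r) {
    rec.begin(std::move(name), std::move(detail));
  }
  ~TraceScope() { rec.end(); }
  TraceScope(const TraceScope&) = delete;
  TraceScope& operator=(const TraceScope&) = delete;
  TraceRecorder& rec;
};
```

Declaring a `TraceScope` at the top of a function or loop body is enough to get a block in the trace. The trace viewer works out the nesting from the timestamps and durations, so no explicit parent pointers are stored.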
Here are some examples of how the output looks in the Chrome trace viewer:
Here's one that found a really bad case of recursive macros in our codebase, which made Clang spend ~8s just on including a seemingly simple file. I would never have guessed this was a problem without this view; almost all the time is spent in that one header file, parsing (not instantiating!) templates that are declared from a series of recursive macro expansions.
Here's one from the range-v3 "pythagorean triples via list comprehensions" example. The bottleneck here is class instantiations that cause very deep chains of other classes to get instantiated. Parsing the headers also takes a significant share of the time.
Here are the actual
I did not want to change existing
For less overhead, it only samples the wall time via a single
The implementation in TimeProfiler.cpp/.h should probably be updated to match Clang/LLVM coding standards better, i.e. use StringRef, SmallString, SmallVector etc.; this is my first time in this codebase and I'm not familiar with how things are done yet.
The implementation could be optimized a bit to avoid creating string objects in each profiling sample (for example, by having a single "arena style" allocation buffer that holds all strings). However, so far I haven't measured any significant overhead from the current simple implementation on the various source files I tried.
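As a rough idea of what the "arena style" buffer could look like, here is a sketch that appends every string to one growing buffer and hands back an offset/length pair instead of a separate `std::string` per sample. `StringArena`, `Ref`, `store` and `get` are hypothetical names, and `std::string_view` (C++17) is used only for brevity; an LLVM version would presumably use `StringRef`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch; not part of the PR.
class StringArena {
public:
  // Handle to a stored string. Offsets stay valid when the buffer grows,
  // which raw pointers into the buffer would not.
  struct Ref {
    std::size_t offset;
    std::size_t size;
  };

  explicit StringArena(std::size_t reserveBytes = 1 << 16) {
    buf_.reserve(reserveBytes);
  }

  // Copies the string into the shared buffer; no per-string heap allocation
  // happens unless the buffer itself has to grow.
  Ref store(std::string_view s) {
    Ref r{buf_.size(), s.size()};
    buf_.append(s.data(), s.size());
    return r;
  }

  // The returned view is only valid until the next call to store().
  std::string_view get(Ref r) const {
    assert(r.offset + r.size <= buf_.size());
    return std::string_view(buf_).substr(r.offset, r.size);
  }

  std::size_t bytesUsed() const { return buf_.size(); }

private:
  std::string buf_;  // all strings, stored back to back
};
```

Each profiling sample would then hold a `Ref` rather than an owning string. Whether this is worth the extra complexity depends on measurements, which so far show no significant overhead.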
I haven't written tests or added the flag to the documentation-related files yet.
This is great! In the backend, there's a gap in the time trace before the first module gets processed, and this gap seems to scale with the complexity of the file. Any idea what that does? Would be great to add tracing there even if it's just one block that hints at what's happening.
The gap is "generating LLVM IR" time. I tried adding a time scope around that, but then often for some reason Chrome gets confused and displays it overlapped with the parent block, making it very confusing. Will see what I can do there.
Is that time on the x-axis? If so, it's a flame chart. Flame graphs put the alphabet on the x-axis, where the stack trace is sorted alphabetically from root to leaf, which maximizes frame merging. If you're emitting timestamped events and having the Chrome dev tools display them, it's a flame chart. It's also useful, just a different name. Chrome doesn't support both yet (https://bugs.chromium.org/p/chromium/issues/detail?id=452624).
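To illustrate the difference, here is a small sketch of the merging step that turns timestamped samples into flame graph input: timestamps are dropped, identical stacks are summed, and the result is ordered alphabetically by stack path. `TimedStack` and `foldStacks` are hypothetical names, and the `;`-joined key is just one common way to write a "folded" stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch; not tied to any particular tool.
struct TimedStack {
  std::vector<std::string> frames;  // root first, leaf last
  std::int64_t startUs;             // what a flame chart puts on the x-axis
  std::int64_t durUs;               // time spent in this exact stack
};

// Flame graph view: ignore startUs, merge identical stacks, and let std::map
// keep the keys sorted so equal prefixes end up next to each other.
std::map<std::string, std::int64_t> foldStacks(
    const std::vector<TimedStack>& samples) {
  std::map<std::string, std::int64_t> folded;
  for (const TimedStack& s : samples) {
    assert(s.durUs >= 0);
    std::string key;
    for (std::size_t i = 0; i < s.frames.size(); ++i) {
      if (i) key += ';';
      key += s.frames[i];
    }
    folded[key] += s.durUs;
  }
  return folded;
}
```

A flame chart would instead draw each `TimedStack` at its `startUs`, so two visits to the same header at different times stay separate blocks rather than being merged.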