New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time trace profiler output support (-ftime-trace) #2

Open
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@aras-p
Copy link
Owner

aras-p commented Jan 11, 2019

Adds -ftime-trace option to clang that produces Chrome chrome://tracing compatible JSON profiling output dumps.

I have written about how existing -ftime-report is not very useful, when one is not a compiler developer (see blog post). As a user, when I'm investigating compile times, I'm most interested in "What and where in my code things are slow to compile?" The existing -ftime-report only partially answers the "what" part, and does not answer the "where" part at all.

This change adds hierarchical "time trace" profiling blocks that can be visualized in Chrome, in a "flame chart" style. Each profiling block can have a "detail" string that for example indicates the file being processed, template name being instantiated, function being optimized etc.

Here are some examples in how the output looks like in Chrome trace viewer:

Catch2 test examples, all put into one file, compiled in without optimizations. This shows that curiously enough, the LLVM backend part still takes quite a large amount of time:
catch2

Here's one that found really bad recursive macros case in our codebase, that made Clang spend ~8s just at including a seemingly simple file. I would have never guessed it being a problem without this view; almost all the time is spent in that one header file, parsing (not instantiating!) templates that are declared from series of recursive macro expansions.
clang-trace-format-slow

Here's one from range-v3 pythagorean triples via list comprehensions example. The bottleneck here is class instantiations that cause very deep chains of other classes to get instantiated. Parsing the headers is quite some cost too.
clang-trace-rangesv3

Here are the actual .json trace profiler dumps that are produced; in Chrome go to chrome://tracing URL, and drag the file into the page content area.

TimeTraceProfileExamples.zip

Implementation details

I did not want to change existing -ftime-report functionality (sounds useful for compiler developers), so instead this one is an unrelated implementation in TimeProfiler.cpp/.h.

For less overhead, it only samples the wall time via a single std::chrono call; from compiler user's POV I don't think it's useful or necessary to sample memory usage, user/system times etc.

The implementation in TimeProfiler.cpp/.h should probably be updated to match Clang/LLVM coding standards better, i.e. use StringRef, SmallString, SmallVector etc.; this is my first time in this codebase and I'm not familiar with how things are done yet.

Implementation could be optimized a bit, to avoid creating string objects in each profiling sample (by having a single "arena style" allocation buffer to hold all strings etc.). However right now I haven't measured any significant overhead of the current simple implementation, on various source files I tried.

I haven't written tests or added the flag to documentation related files yet.

aras-p added some commits Jan 11, 2019

[Profiling] Add -ftime-trace option to Clang.
When passed, it enables the "time trace" profiler, and writes result JSON file next to the output file, with JSON extension
[Profiling] Add TIME_TRACE_OR_NULL macro, and tweak profiler scopes.
Removed namespace scope, since seemingly it switches to another preprocessor file
from within namespace parsing block somehow.
@zeux

This comment has been minimized.

Copy link

zeux commented Jan 14, 2019

This is great! In the backend, there's a gap in the time trace before the first module gets processed, and this gap seems to scale with the complexity of the file. Any idea what that does? Would be great to add tracing there even if it's just one block that hints at what's happening.

@aras-p

This comment has been minimized.

Copy link
Owner Author

aras-p commented Jan 14, 2019

This is great! In the backend, there's a gap in the time trace before the first module gets processed, and this gap seems to scale with the complexity of the file. Any idea what that does?

The gap is "generating LLVM IR" time. I tried adding a time scope around that, but then often for some reason Chrome gets confused and displays it overlapped with the parent block, making it very confusing. Will see what I can do there.

@brendangregg

This comment has been minimized.

Copy link

brendangregg commented Jan 14, 2019

Nice!

Is that time on the x-axis? If so, it's a flame chart. Flame graphs put the alphabet on the x-axis, where the stack trace is sorted alphabetically from root to leaf, which maximizes frame merging. If you're emitting timestamped events and having the chrome dev tools display it, it's a flame chart. It's also useful, just a different name. Chrome dosen't support both yet (https://bugs.chromium.org/p/chromium/issues/detail?id=452624).

@aras-p

This comment has been minimized.

Copy link
Owner Author

aras-p commented Jan 15, 2019

Is that time on the x-axis? If so, it's a flame chart.

Indeed it is, I got my terminology mixed up.

@aganea

This comment has been minimized.

Copy link

aganea commented Jan 17, 2019

@aras-p Aras, are you planning on submiting a review on Phabricator? Or some else is?

@aras-p

This comment has been minimized.

Copy link
Owner Author

aras-p commented Jan 17, 2019

Aras, are you planning on submiting a review on Phabricator?

Yes, once I take into account the current comments about this from clang mailing list.


// If any timers were active but haven't been destroyed yet, print their
// results now. This happens in -disable-free mode.
llvm::TimerGroup::printAll(llvm::errs());

if (llvm::TimeTraceProfilerEnabled()) {
SmallString<128> Path(Clang->getFrontendOpts().OutputFile);

This comment has been minimized.

@mrexodia

mrexodia Jan 21, 2019

When using clang++ -ftime-trace main.cpp -o main on Windows this will generate a file in %TEMP%\main-xxx.json is that the intended behavior?

This comment has been minimized.

@aras-p

aras-p Jan 21, 2019

Author Owner

I think so -- basically this produces a .json file for each object file.

So where that is placed is fairly natural if you're just invoking clang to compile one file (with -c argument). However if you're invoking it to both compile and link, then it's a good question at where should the json files be placed (since there can be more than one!), or whether they should be merged somehow, or should just be an error. Today that's not very intuitive as you noticed.

This comment has been minimized.

@mrexodia

mrexodia Jan 21, 2019

My use case is that I'd like to profile compile times of individual files in my Visual Studio solution with the help Clang Power Tools. Since it uses the -fsyntax-only flag the output path will be an empty string, so it outputs a .json file. For my use case I can work around it, but perhaps it's good to consider these questions.

Thanks a lot for your work, I needed this for a long time!

This comment has been minimized.

@aras-p

aras-p Jan 21, 2019

Author Owner

Ah. My use case was "I know which files are slow, I want to look what's inside of each file". The "which files are slow" in my case is already answered by the build system that we use, since it just invokes clang for each.

But yeah "some sort of output" when one clang invocation processes multiple input files would be useful. Will have to think about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment