Optimize Compilation Times #1805
Filed as internal issue #USD-7270
Hi @christophercrouzet - just wanted to say thanks, this is really interesting work and impressive numbers! It may be a while before we're able to act on it, but it'd be cool if others are able to take these experiments even further.
Totally agree! Even though I don't yet understand all of it, this is totally awesome, and I am really enjoying learning from your write-up and PRs!
@meshula, @sunyab, and I were discussing earlier this week, and wondering if some further digging could help identify where the biggest bangs for the buck are... e.g. we know some of the boost stuff is heavy, so what do we get if And then also seeing if there are some smaller set of modules that have the greatest impact on build time in switching to unity build... and/or generally other strategies to deploying improvements in smaller bites. Thanks again, @christophercrouzet - really thought-provoking work!
Awesome! It would be interesting to see the results of these investigations, thank you for sharing and looking into it! I don't know how much work that would require but maybe another thing worth considering would be to experiment with converting the Python bindings from Boost.Python to nanobind? nanobind's author claims that it can be ~2-5x faster to compile in comparison to Boost.Python, and is ~2x more performant at runtime. Not only that, but it also seems to come with a ~8-9x reduction in binary size in comparison to Boost.Python, which is something to take into consideration if there is an intention to push USD towards runtime environments, as @mirror2mask mentioned during the recent panel at GTC that was titled “Exploring USD: The HTML for 3D Virtual World [S42112]”.
Thanks for the pointer to nanobind - it looks pretty fantastic! One of our engineers pushed pretty hard a couple years back to try to get USD to use pybind11 without causing the ripple effect of needing to switch the entire rest of our million+ loc codebase built on top of USD from boost to pybind11 also. Alas, it did not seem possible. Given how many of boost::python's features we use, switching to nanobind seems like it would be even more involved than switching to pybind11, and since we can't just upgrade USD without upgrading all of Presto and our vendor DCC plugins, it's going to take some mountain moving to fund such a project. We definitely would like to, though!
Good to know, thanks for the explanation @spiffmon!
@christophercrouzet, this continues to be such an interesting PR :) I was wondering if you could say a little bit about how you configured your build to work with
Hi @meshula! Aras' tool is only shipped as part of Clang 9.0+ so the main thing was to get USD to compile using Clang, which required a tiny change in the codebase as described in #1696. The rest is basically only a matter of following what is described in https://github.com/aras-p/ClangBuildAnalyzer:
If it can be of any help, I streamlined these steps in the following
Ah, thanks for the pointers to your Makefile, that's very helpful!
I thought that I'd just leave this experiment here in case it can be of any help.
That being said, I understand that the code diff can look a bit overwhelming and scary, so no worries if it doesn't land! 😅
What?
After seeing USD's codebase taking around 90 minutes to compile on my laptop (Intel i7-7700HQ, Ubuntu 20.04, single thread), I was curious to understand what was going on and whether the compilation times could be somewhat improved. This pull request is a first iteration of that work.
As it stands, this pull request is still lacking (see the to-do list below) but I'm happy to collaborate if there is any interest in merging it at all.
Why?
With USD being so widespread in the film and tech industries, it seemed like optimizing compilation times could help many developers around the world speed up iterations and save on computing resources.
How?
When compiling USD on a single thread using Clang's -ftime-trace compiler flag, here is what the profiling output looks like (after being slightly pruned, see the full log):
We can see that the number one bottleneck clearly comes from the compiler spending around 15 minutes (900,000 ms) for each header that it has to include in hundreds (if not thousands) of other files.
One solution that is a good fit for tackling this issue is unity builds.
Why Not Use Precompiled Headers?
USD seems to already support PCH for MSVC, so it would have been natural to try extending that approach to other compilers, but then I looked at the list of conditions that need to be met for GCC and it seemed... too constraining? So I didn't really look into this.
Furthermore, I prefer a solution that is not compiler-specific so I thought that I'd try to address the issue with what seemed like a simple approach to reason about and to implement (although simple ain't always easy).
Methodology
The obvious requirement for a codebase to support unity builds is to ensure that all symbols are uniquely identified.
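As a minimal sketch of why unique symbols matter, assume two hypothetical source files, a.cpp and b.cpp, that each keep a file-local function named Helper inside an anonymous namespace. Compiled separately, the two helpers never meet. Concatenated into a single unity translation unit, the second definition would be a redefinition error. The block below simulates that concatenated unit and shows one possible fix: giving each file's private scope a distinct name. All names here (a_cpp_private, b_cpp_private, Helper, ComputeA, ComputeB) are made up for illustration and are not taken from USD.

```cpp
// Simulated contents of a hypothetical a.cpp.
// Before: namespace { int Helper() { return 1; } }
// After: the anonymous namespace gets a name unique to this file.
namespace a_cpp_private {
int Helper() { return 1; }
}  // namespace a_cpp_private

// Callers inside a.cpp now spell out the namespace explicitly.
int ComputeA() { return a_cpp_private::Helper() + 10; }

// Simulated contents of a hypothetical b.cpp, included in the same
// unity translation unit. With two anonymous namespaces, this second
// Helper() would clash with the first one.
namespace b_cpp_private {
int Helper() { return 2; }
}  // namespace b_cpp_private

int ComputeB() { return b_cpp_private::Helper() + 20; }
```

Both helpers keep their original short name and only the enclosing scope changes, which keeps a transformation of this kind mechanical.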
Additionally, it helps to have symbols fully namespaced whenever possible (e.g.: when not relying on an ADL idiom), otherwise there might be ambiguities around symbols relying on using declarations, for example:
I could have performed all the required refactoring manually, first by updating the project's CMake with a unity build approach, and finally fixing all the compilation errors one by one. But then it would have been:
Instead, I went (mostly) with a programmatic approach through Clang's AST API.
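To illustrate the kind of ambiguity mentioned above, here is a small sketch that assumes it comes from using directives such as `using namespace`. Two hypothetical files each pull in a different namespace and call an unqualified Describe(). Once both files share one unity translation unit, both directives are visible at the same time and the unqualified call no longer resolves. Fully qualifying the call sites removes the dependency on the directives. The namespaces lib_a and lib_b and the functions below are invented for this example.

```cpp
#include <string>

namespace lib_a {
inline std::string Describe() { return "lib_a"; }
}  // namespace lib_a

namespace lib_b {
inline std::string Describe() { return "lib_b"; }
}  // namespace lib_b

// Simulated contents of a hypothetical first.cpp.
// Before: using namespace lib_a;  ...  return Describe();
// After: the call is fully qualified and the directive is gone.
std::string FromFirst() { return lib_a::Describe(); }

// Simulated contents of a hypothetical second.cpp.
// Before: using namespace lib_b;  ...  return Describe();
// If both directives were active in the same translation unit, an
// unqualified Describe() would be an ambiguous call.
std::string FromSecond() { return lib_b::Describe(); }
```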
Caveats
“Surely it should be easy to find all the symbols in a codebase and prefix them with a namespace using the AST”, or so I thought. Alas, C++ is a complex language and it turns out that this complexity is fairly well reflected in Clang's AST API.
Because of that, the refactoring tools that I built do a good chunk of the work but it's not 100% there—after running the tools, there is a need to manually patch/fix some things here and there.
The most obvious (and unfortunate) limitation is that the refactoring tools work at the AST level, after C++'s preprocessor has finished evaluating. This means that code wrapped in #if/#ifdef directives might be discarded and left untouched by the tools. Since I ran these tools under Ubuntu, the code paths specific to Windows, macOS, Metal, PRMan, Python 2, and others might require some further attention.
Results
Note: the timing for g++ 11.1.0 (unity=ON) with 8 threads is only an estimate since my laptop runs out of memory for that one.
As for the results from -ftime-trace, they look a bit more nuanced (full log):
The Actual Refactoring Tools
There are 2 of them:
inline-namespaces: removes the using declarations and fully declares the namespace for the symbols that were relying on them.
disambiguate-symbols: gives a name to all the anonymous namespaces found and declares the namespace for the symbols that were relying on them.
For reference, I've made these tools available in this repository: https://github.com/christophercrouzet/pxr-usd-unity-build.
To emphasize this again: this was developed using Linux—this might run on macOS if the stars are aligned, but probably won't on Windows without a few touches.
The Refactoring Steps
Several steps were involved in applying the required changes, and I created the first commits to reflect these steps:
make usd-inline-namespaces -> commit “Inline namespaces using Clang's AST API”.
make usd-patch-inline-namespaces -> commit “Apply manual changes to the namespaces inlining step”.
make usd-disambiguate-symbols -> commit “Disambiguate symbols using Clang's AST API”.
make usd-patch-disambiguate-symbols -> commit “Apply manual changes to the symbols disambiguation step”.
make usd-patch-misc -> commit “Apply some miscellaneous manual fixes”.
Build Configuration Used
See https://github.com/christophercrouzet/pxr-usd-unity-build/blob/3da4696dc2bd3212e27c57eee65d00ec56f1e914/Makefile#L47-L110.
Additional Goals
To-Do
pxr/usd/usd/codegenTemplates.
Notes
It should be possible to build some linters on top of Clang's AST API that could be run as part of the CI to enforce certain rules, such as flagging free private functions that do not belong to any named namespace.
Also, compiling USD with Clang requires the changes described in this other pull request: #1696.
Credits
This work was only possible thanks to @aras-p who implemented the -ftime-trace compiler flag for Clang (see https://aras-p.info/blog/2019/01/16/time-trace-timeline-flame-chart-profiler-for-Clang).