Performance problems/regression with many library dependencies #184
Comments
I believe this is the most likely explanation, but it would be helpful to confirm that this is definitely not a regression. One way to do it would be to try to re-create the same setup using the
Even if it turns out that this is not a regression, there are still some tricks we can try. But I would like to first determine conclusively that this is not a regression.
Ok, I've reinstalled 0.14 and built the project in a new configuration with it. Regardless, it's clear that the fundamental issue relates to the number of library dependencies/boost packages, and is not a regression. I'm curious what the high-level algorithm is for
There is some background on this in issue #114, but the gist of it is that for a heavily inter-dependent library like Boost, where most of the dependencies are interface dependencies, we could end up with a lot of visits to each library (as in millions). We've done quite a bit of work on this for
What I suspect is happening in your case is that, due to the large number of libraries involved and those libraries having Boost libraries as interface dependencies, you are in a sense "breeding" duplicate interface dependencies. To test this hypothesis, would you be able to add the
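To make this "breeding" effect concrete, here is a small standalone sketch (hypothetical code, not build2's internals) that counts traversal visits on a layered graph where every node re-exports all nodes of the previous layer as interface dependencies. Without pruning, the visit count grows exponentially with the number of layers, while a pruned walk visits each library once.

```cpp
// Hypothetical sketch (not build2's internals): count traversal visits on a
// layered dependency graph where every node re-exports all nodes of the
// previous layer as interface dependencies.
#include <cstddef>
#include <iostream>
#include <set>
#include <vector>

struct lib
{
  std::vector<const lib*> interface_deps; // re-exported dependencies
};

// Unpruned walk: a library is re-visited once per path that reaches it.
std::size_t visit_all (const lib& l)
{
  std::size_t n (1);
  for (const lib* d: l.interface_deps)
    n += visit_all (*d);
  return n;
}

// Pruned walk: each library is visited at most once.
std::size_t visit_once (const lib& l, std::set<const lib*>& seen)
{
  if (!seen.insert (&l).second)
    return 0;

  std::size_t n (1);
  for (const lib* d: l.interface_deps)
    n += visit_once (*d, seen);
  return n;
}

int main ()
{
  const std::size_t layers (12), width (3);

  // Layer 0 plays the role of Boost-like leaves; every node in layer i > 0
  // is a "breeder" that depends on all nodes of layer i - 1.
  std::vector<std::vector<lib>> g (layers);
  for (std::size_t i (0); i != layers; ++i)
  {
    g[i].resize (width);
    if (i != 0)
      for (lib& l: g[i])
        for (lib& d: g[i - 1])
          l.interface_deps.push_back (&d);
  }

  std::size_t unpruned (0), pruned (0);
  std::set<const lib*> seen;
  for (const lib& l: g.back ())
  {
    unpruned += visit_all (l);
    pruned += visit_once (l, seen);
  }

  std::cout << "unpruned visits: " << unpruned << '\n'  // 797160 here
            << "pruned visits:   " << pruned << '\n';   // 36 (layers * width)
}
```

With just 12 layers of width 3 the unpruned walk already performs hundreds of thousands of visits, which is the same effect, at a larger scale, behind the millions of visits mentioned above.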
So it certainly made a difference. Here is a breakdown of the various tests I've done in case the extra info is useful.
Is there an obvious reason why (2) would fall so far short of (3) in its effect? One other thing I noticed, somewhat unrelated but perhaps worth looking into: adding/removing the deduplication directive resulted in recompilation of translation units. To quote you from issue #114:
I'm not sure what approach you ended up taking regarding the
Yes, that's better but still not usable. Do you think you would be able to describe your project hierarchy (or better yet, create and make available a test one) so that I can try to reproduce this? On my machine an up-to-date check for the whole of Boost (including a test for every library) takes a bit over a second, which is still not great but probably usable. So I am wondering if there is something special about your project, your hardware/OS, or something else.
I don't think this is surprising: in (3) you've chopped off a large part of your build graph and replaced it with an assumption that
I don't think this is unexpected: deduplication is likely to change the order in which libraries are traversed, which in turn may alter the order of
Yep, when I get a chance I'll do some investigation to see if I can repro this in a simpler project. I'll also look into generating a dependency graph for my project, since that may be instructive.
Thanks for the graph, quite elaborate, I must say ;-). I've created a test repository that attempts to simulate this by picking (1) nodes that introduce boost dependencies as well as (2) "breeder" nodes that would cause the multiplication of such dependencies. Because I've only recreated a small fraction of the nodes, I compensated for that by duplicating the
To test, I measured the time it took to do an up-to-date check (i.e., when everything is already up to date) of
Yes, we are aware of this (there is also
I think I mentioned the reason in #114, but in a nutshell, we prune the graph traversal everywhere except when collecting libraries to link. And there we cannot prune because the library should appear on the command line at the position (relative to other libraries) of the last encounter as opposed to the first (this is because the order in which libraries are linked can be important).
I am going to take a look at this again (based on the above test, provided it is representative of the times for your real project) and see if there is anything more we can optimize. One promising idea is to prune recursively header-only libraries since they don't link anything.
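For illustration, here is a minimal sketch of the "last encounter wins" constraint described above (hypothetical types, not build2's actual process_libraries). When a library is encountered again it has to move to the later position, so the traversal cannot simply stop at already-seen libraries the way a pruned walk would.

```cpp
// Hypothetical sketch: building the link line with "last encounter wins"
// semantics.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct lib
{
  std::string name;
  std::vector<const lib*> deps; // libraries this one links against
};

// Append l and its dependencies to order, moving any library that is
// encountered again to the later position. Note that there is no pruning at
// all here, which is exactly where the repeated visits come from.
void collect (const lib& l, std::vector<const lib*>& order)
{
  order.erase (std::remove (order.begin (), order.end (), &l), order.end ());
  order.push_back (&l);

  for (const lib* d: l.deps)
    collect (*d, order);
}

int main ()
{
  lib c {"libc", {}};
  lib a {"liba", {&c}};
  lib b {"libb", {&c}};

  std::vector<const lib*> order;
  collect (a, order);
  collect (b, order);

  // Prints "liba libb libc": libc ends up after both libraries that
  // reference it. Pruning at the first encounter would give
  // "liba libc libb", which can fail to link with static libraries.
  for (const lib* l: order)
    std::cout << l->name << ' ';
  std::cout << '\n';
}
```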
Added the
To confirm, running
Ah okay. I'd assumed the cause was the same and therefore you hadn't been aware of it, but perhaps it's not doing quite the same thing then.
Yep. Have you considered the topological sort approach I suggested in that thread? It has algorithmic complexity linear in graph size:
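For reference, here is a sketch of the kind of linear-time approach being referred to (Kahn's algorithm over the dependency graph; the types are hypothetical, not build2's). Each library and dependency edge is processed once, and the resulting order still places every library after all of the libraries that reference it.

```cpp
// Hypothetical sketch of a linear-time link order via topological sort.
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct lib
{
  std::string name;
  std::vector<const lib*> deps; // libraries this one links against
};

std::vector<const lib*> link_order (const std::vector<const lib*>& roots)
{
  // Discover every reachable library once and count how many reachable
  // libraries depend on it (its in-degree in the dependent -> dependency
  // graph).
  std::map<const lib*, std::size_t> indeg;
  std::vector<const lib*> stack (roots);
  while (!stack.empty ())
  {
    const lib* l (stack.back ());
    stack.pop_back ();
    if (indeg.emplace (l, 0).second)      // first time we see this library
      for (const lib* d: l->deps)
        stack.push_back (d);
  }

  for (const auto& p: indeg)
    for (const lib* d: p.first->deps)
      ++indeg[d];

  // Kahn's algorithm: repeatedly emit a library that no remaining library
  // depends on. Each node and edge is processed once: O(V + E).
  std::vector<const lib*> ready, order;
  for (const auto& p: indeg)
    if (p.second == 0)
      ready.push_back (p.first);

  while (!ready.empty ())
  {
    const lib* l (ready.back ());
    ready.pop_back ();
    order.push_back (l);

    for (const lib* d: l->deps)
      if (--indeg[d] == 0)
        ready.push_back (d);
  }

  return order; // dependents come before their dependencies
}

int main ()
{
  lib c {"libc", {}};
  lib a {"liba", {&c}};
  lib b {"libb", {&c}};

  for (const lib* l: link_order ({&a, &b}))
    std::cout << l->name << ' '; // e.g. "libb liba libc"
  std::cout << '\n';
}
```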
Ok, I've made a bunch of optimizations (including automatic deduplication as part of
Could you give it a try and see what difference it makes for you? I am interested both in my synthetic test and, more importantly, in your original project (without manual deduplication) that took 4m. Also, the
I have, but the difficulties of implementing it (which I mentioned in this comment) are still there. More importantly, with the above optimizations,
I've optimized things some more, and with the latest staged version I get 0.3s on the synthetic test on my machine, with the load/match/execute splits being 0.16/0.10/0.04. Out of curiosity, I also ran this test on an M1 Mac Mini (8G version), where it takes 0.51s (0.36/0.11/0.04).
Thanks for the update. Definitely an improvement but still quite slow. Do you mind sharing what kind of hardware this is on (specifically, CPU, RAM, and disk)? That would help me understand whether this is due to hardware or software (Windows).
Yes, this makes sense: |
Ok thanks. Just wanted to check there wasn't something unintentional going on there.
Sure. I'm using an Intel Hades Canyon NUC:
Just to add, although clearly there is still room for improvement, with these optimizations it's now definitely back in the realm of being usable for my project, and in any case there are local workarounds I can apply if I need to speed things up further. So I completely understand if you feel you're at the point where you need to shelve further optimization work for the moment. But of course let me know if you want me to test anything further. I'll make sure to keep a copy of my project in its current state for future comparisons.
Ok, thanks, so it's a somewhat dated (2018) 4-core mobile CPU.
Yes, thanks, that will definitely be helpful. I think I've now picked most of the high-impact, low-hanging fruit. One idea I have for further optimization is to get rid of wildcards in package archives (i.e., in all the Boost libraries) by rewriting them to the expanded list during distribution. I did a quick test by simply disabling the expansion, and that reduced the load part of the synthetic test on my machine from 0.16s to 0.11s, so we are likely looking at a ~20% overall time reduction (likely more on Mac OS and especially Windows).
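As an illustration of that idea only (made-up file names and "manifest" format, not the actual bpkg/build2 distribution mechanism): the wildcard is expanded once when the package is prepared and the result is recorded, so consumers read a plain list instead of traversing directories on every load.

```cpp
// Hypothetical sketch: pre-expand a wildcard at distribution time and read
// the recorded list at load time.
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// At distribution time: expand something like include/**.hpp once and
// record the result.
void write_expanded_list (const fs::path& dir, const fs::path& manifest)
{
  std::ofstream out (manifest);
  for (const auto& e: fs::recursive_directory_iterator (dir))
    if (e.is_regular_file () && e.path ().extension () == ".hpp")
      out << e.path ().generic_string () << '\n';
}

// At load time: read the pre-expanded list, avoiding one directory
// traversal per wildcard per load.
std::vector<fs::path> read_expanded_list (const fs::path& manifest)
{
  std::vector<fs::path> r;
  std::ifstream in (manifest);
  for (std::string l; std::getline (in, l); )
    r.emplace_back (l);
  return r;
}

int main ()
{
  if (fs::exists ("include"))
    write_expanded_list ("include", "headers.manifest");

  for (const fs::path& p: read_expanded_list ("headers.manifest"))
    std::cout << p.generic_string () << '\n';
}
```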
Another optimization idea: similar to rewriting wildcards, we could add real directory


Observed behaviour
b.exe is maxing out multiple cores for long periods during a b update, even when the project is already up to date. Same behaviour is seen for --dry-run mode.
Details/prior behaviour
Previously, running b update on an up-to-date configuration would take a little while to count through 'targets to update', but then when done would very quickly reach '100% of targets updated' and complete. It now spends minutes going from 96% to 100%.
Unfortunately, having not worked with my project much for a number of months, I'm not able to pinpoint exactly when or what change is responsible for this issue (though I can investigate further given some direction).
boost as a bpkg dependency (my code hasn't changed, but previously was using boost via a manually added include path).
In case it's relevant, my project is by no means especially large, but it does have a high ratio of libraries to code. It's split up into ~40 libraries (mostly shared) in ~15 packages, with ~15 further 3rd party dependency libraries, plus a few boost dependencies (which transitively ends up being a lot of boost dependencies).
Profiling
I did some very minimal profiling, which showed 5 worker threads spending about 50% of their time in dynamic_cast, with the remainder being map lookups/path_traits comparisons. This suggests the real problem is likely just algorithmic complexity: generally, execution is ~30 recursive calls deep in process_libraries.
b-update-profiler-call-graph.txt