
Merkle tree cache becomes pathologically slow for deep depsets #20862

Closed
DavidANeil opened this issue Jan 12, 2024 · 19 comments
Labels: P2 (We'll consider working on this in future; assignee optional), team-Performance (Issues for Performance teams), type: bug

Comments

DavidANeil (Contributor) commented Jan 12, 2024

Description of the bug:

67021ff (#20358) causes the time to spawn certain actions with large numbers of inputs to increase from under 1 second to more than 20 seconds.
They display as [Prepa] in the UI during this time.

I can work on creating a repro if necessary.

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

67021ff

DavidANeil (Contributor Author) commented:

CC: @tjgq (author), @meteorcloudy (reviewer), and @lberki (merged conflicting code since)

@lberki added the "P2 (We'll consider working on this in future; assignee optional)" and "team-Performance (Issues for Performance teams)" labels and removed the "untriaged" label Jan 12, 2024
lberki (Contributor) commented Jan 12, 2024

I'll assign this to @tjgq because he was the original author of this change, but I'm deeply interested in how this could possibly be.

I'd like to take you up on your offer of a reproduction; if we have one, it's much faster to figure out whether whatever fix we come up with actually fixes the issue, and to gather profiles ourselves instead of asking you to (@tjgq might know off the bat what went wrong, but I sure don't).

fmeum (Collaborator) commented Jan 12, 2024

@bazel-io fork 7.0.1

tjgq (Contributor) commented Jan 12, 2024

The only theory I've got is that delaying the flattening somehow moves it into the critical path, whereas before it had a chance to run in parallel with something else. Once a repro is available I can do some profiling and find out.

@tjgq added the "more data needed" and "awaiting-user-response (Awaiting a response from the author)" labels Jan 15, 2024
meteorcloudy (Member) commented:

I'm making this a soft release blocker for 7.0.1, which means it won't hold up 7.0.1 once the other blockers clear out.

DavidANeil (Contributor Author) commented:

My first attempts at a repro were unsuccessful. Does anyone know of an "easy" way to get Bazel to output the shape of a NestedSet so that I can just reproduce that shape from our main repo for the repro case?

lberki (Contributor) commented Jan 16, 2024

If you can plumb it into an output group, the structure of NamedSetOfFiles in the OutputGroup of the TargetComplete event in the build event protocol corresponds exactly to the structure of the nested set. Dunno if that counts as "easy".
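For reference, a minimal Starlark sketch of what "plumbing it into an output group" could look like; the rule name, attribute, and output group name here are hypothetical, not from the repro:

```starlark
# Hypothetical rule: expose a depset via an output group so that its structure
# shows up as NamedSetOfFiles in the build event protocol.
def _expose_inputs_impl(ctx):
    # For illustration, the depset of interest is just the transitive files of
    # this rule's deps; substitute whatever depset you want to inspect.
    inputs = depset(transitive = [dep[DefaultInfo].files for dep in ctx.attr.deps])
    return [OutputGroupInfo(depset_shape = inputs)]

expose_inputs = rule(
    implementation = _expose_inputs_impl,
    attrs = {"deps": attr.label_list()},
)
```

Building the target with --output_groups=depset_shape --build_event_json_file=bep.json should then emit namedSetOfFiles events whose nesting mirrors the depset structure.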

lberki (Contributor) commented Jan 16, 2024

If you can't reproduce it, you can try uploading a profile taken by --experimental_command_profile (it'll dump a file called $(bazel info output_base)/profile.jfr). That's better than nothing, but considerably worse for us than a clean repro.

DavidANeil added a commit to DavidANeil/repro-bazel-nested-set that referenced this issue Jan 16, 2024
DavidANeil (Contributor Author) commented:

Thanks.
I have published a reproduction repo here: https://github.com/DavidANeil/repro-bazel-nested-set

It seems that this repro never finishes without --experimental_remote_merkle_tree_cache_size=0. With that flag it takes about 80s on my machine, so it is possible that this is related to #18686.

lberki (Contributor) commented Jan 17, 2024

Thanks! From a cursory glance, it looks like you indeed replicated the problematic nested set structure verbatim :)

@tjgq @meteorcloudy do you think it's worth holding Bazel 7.0.1 until we know the root cause?

meteorcloudy (Member) commented:

7.0.1 has been pending for too long. I would prefer not to block it any longer.

lberki (Contributor) commented Jan 17, 2024

I agree. It's been too long indeed.

tjgq (Contributor) commented Jan 17, 2024

@DavidANeil Thanks for the very nice repro! I want to highlight the fact that --experimental_remote_merkle_tree_cache is required to reproduce the pathological build time, so it can (hopefully) be worked around by disabling it. (Please let me know if this is incorrect.)

One thing I'd like to know is: before Bazel 7, by how much did --experimental_remote_merkle_tree_cache speed up the end-to-end build time (I mean a real full build, not this repro nor the single isolated action it was distilled from)? I ask because, in my own experience, the Merkle tree cache is often not an optimization at all; most depset topologies aren't able to leverage it effectively.

As to why this issue was introduced by 67021ff:

Before that change, the eager depset flattening meant that, for most actions (roughly everything except C++ and Java), the depset of inputs as observed by the Merkle tree cache had a single node (!), so the only opportunity for reusing cached Merkle trees was in the presence of tree artifacts, which are specially handled by the cache. The cache was otherwise pure overhead (since approximately no two actions have the exact same set of inputs), but the single node could be computed very quickly.

After that change, the Merkle tree cache is able to traverse the original depset structure, compute subtrees from individual nodes, and merge them. Unfortunately, what that reveals is that a lot of time is spent merging a large number of very small nodes. Merging is not an efficient operation because the Merkle tree structure is at odds with the depset structure: Merkle tree nodes must correspond to points in the directory hierarchy, whereas depset nodes don't correspond to anything in particular; they are just arbitrary sets of files. I believe the example you provided is pathological because of the large depset depth, but I can definitely see the overhead in other situations, though not nearly as much. (In non-pathological multi-action builds the overhead might, of course, be paid off by the ability to reuse previously computed nodes; in a single-action build nothing can be reused, so the cache is always pure overhead.)
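(For illustration, here is a hypothetical Starlark pattern, not taken from the actual repro, that produces the kind of deep, narrow depset described above: each rule in a long dependency chain contributes one direct file and wraps its dependency's depset, so the depset depth grows linearly with the chain length.)

```starlark
# Hypothetical rule (illustrative only): a chain of N `link` targets yields a
# depset whose depth is roughly N, with a single direct file at each level.
def _link_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")
    ctx.actions.write(out, "")
    # Each level adds one direct file on top of the dependency's depset,
    # so the depth of `files` tracks the length of the dependency chain.
    files = depset(
        direct = [out],
        transitive = [dep[DefaultInfo].files for dep in ctx.attr.deps],
    )
    return [DefaultInfo(files = files)]

link = rule(
    implementation = _link_impl,
    attrs = {"deps": attr.label_list()},
)
```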

Given the above, and because deferring depset flattening is worth it for other reasons (reduced memory pressure, ability for other downstream logic to exploit the depset structure), I’m not convinced that reverting 67021ff is the right path forward.

While there might be some gains to be had in optimizing the Merkle tree merge operation, it might be unsalvageable for very deep depsets. I think we need to rethink/redesign the entire way the cache works so that it actually provides a benefit in most cases without failing catastrophically for the rest (for example, by selectively applying it only to subtrees that can be merged cheaply, such as tree artifacts and runfiles).

Another angle we can explore is optimizing the (uncached) computation of the Merkle tree so that the cache becomes unnecessary. (This is something I've suspected we can optimize for a while; I just haven't had the time to look into it properly.)

One thing I still don’t understand is why the run time explodes when the cache size is reduced; I agree that this is likely to be the same issue as observed in #18686. I didn’t have access to a good repro at that time, but now that I do, I intend to look into it further.

@tjgq removed the "more data needed" and "awaiting-user-response (Awaiting a response from the author)" labels Jan 17, 2024
@tjgq changed the title from "Performance regression caused by not flattening NestedSets" to "Merkle tree cache becomes pathologically slow for deep depsets" Jan 17, 2024
DavidANeil (Contributor Author) commented:

--experimental_remote_merkle_tree_cache was created in part for us (see #10875, filed by my coworker). It was enabled too long ago for us to still have the build time metrics from when we enabled it, but I remember the improvement was noticeable.
That said, I have started an internal discussion about disabling it now, since this change has probably made it an overall regression.

tjgq (Contributor) commented Jan 17, 2024

In that case, I feel like I should ask for one further clarification: is the pathological action (that the repro was distilled from) a Starlark action, or a native (C++/Java) one?

My analysis above assumed a Starlark action because that's what I see in the repro, but judging from the discussion in #10875, it seems that the cache was originally introduced with the objective of speeding up C++ actions (which would also explain why its ineffectiveness for Starlark actions wasn't noticed at the time).

DavidANeil (Contributor Author) commented:

It is a Starlark action using a nodejs_binary as the tool.

tjgq (Contributor) commented Jan 18, 2024

Thanks for confirming.

Regarding the effect of the cache size: the depset in this repro has 7383 nodes and 953936 edges (it's a DAG, not a tree). The graph search relies on the cache to avoid visiting the same node multiple times. If the cache size is smaller than 7383, some nodes can be visited multiple times because they fall out of the cache before a subsequent visit. As the cache size gets smaller and smaller, the number of steps approaches the number of edges, leading to a catastrophic blowup in run time.

I think the fix is to either (1) ensure the cache size is large enough to fit the entire depset during a search (trivial for a single action, less so when multiple arbitrary actions can run in parallel), or (2) keep a separate visited set for each individual search. I'd lean towards (2), but before we invest further in the cache implementation we really need some hard evidence that it benefits anyone in its current form. (I'll also comment on #18686 to that effect.)
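To make the difference concrete, here is a minimal Python-style sketch (purely illustrative; Bazel's cache is implemented in Java and differs in detail) of a DAG traversal whose "already computed?" check goes through a bounded LRU cache, versus one that keeps a per-search visited set as in option (2). The `children` callable and node representation are hypothetical:

```python
from collections import OrderedDict

# Illustrative sketch only, not Bazel's actual code. Recursion is used for
# brevity; a real traversal over a depset this deep would be iterative.

def expansions_with_lru(root, children, cache_size):
    """Counts subtree (re)computations when the LRU cache doubles as the visited set."""
    cache = OrderedDict()  # node -> computed subtree (values elided here)
    expansions = 0

    def visit(node):
        nonlocal expansions
        if node in cache:
            cache.move_to_end(node)  # refresh LRU position; no recomputation
            return cache[node]
        expansions += 1  # cache miss: this subtree is computed (again)
        result = [visit(child) for child in children(node)]
        cache[node] = result
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently used entry
        return result

    visit(root)
    return expansions

def expansions_with_visited_set(root, children):
    """Counts subtree computations with a per-search memo: each node at most once."""
    visited = {}
    expansions = 0

    def visit(node):
        nonlocal expansions
        if node in visited:
            return visited[node]
        expansions += 1
        visited[node] = [visit(child) for child in children(node)]
        return visited[node]

    visit(root)
    return expansions

# Example (hypothetical shape): a diamond-heavy DAG where node i depends on
# nodes i+1 and i+2. With a tiny cache the LRU count should be far larger:
#   kids = lambda i: [i + 1, i + 2] if i < 20 else []
#   expansions_with_lru(0, kids, cache_size=2)   # much larger than below
#   expansions_with_visited_set(0, kids)         # equals the node count
```

On a DAG with heavily shared nodes, the LRU variant re-expands nodes after they are evicted, so its work grows well beyond the node count as the cache shrinks, whereas the visited-set variant expands each node exactly once regardless of any cache size.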

tjgq (Contributor) commented Feb 16, 2024

I'll consider this to be a duplicate of #18686, since the issue is not the depset size per se, but the fact that the cache is too small for it to fit.

@tjgq closed this as not planned Feb 16, 2024
DavidANeil (Contributor Author) commented:

This one reproduces with the cache enabled and unlimited in size, which is a bit different from #18686 (which was opened before the commit that caused this regression), but I agree that we can consolidate the discussion.
