Ensure process_runnables is not too eager in the presence of multiple splits #11367
Well, this isn't fixing anything visible yet, but it fixes an internal branch of the algorithm that has been acting too eagerly. I'm still working on a test.
This relates to #11363
Critical path before:
After:
So, the initial version would start executing along the critical path until it hit the first wall. After this, it would try to process runnable tasks as well as possible. Since the root task is shared by all branches, this allows all computation branches to be considered as candidates (this is the essential dilemma of widely shared dependencies).
Following these candidates effectively causes a depth-first search. However, if we don't process them at all, this typically leaves many intermediates in memory that we could otherwise release. Therefore, to allow some progress, these runnable paths are allowed to execute if they reduce to a task: we speculatively walk down those branches, and if a speculatively executed branch reduces into a task, it is allowed to execute for real.
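The speculative walk can be sketched roughly like this. This is a toy illustration, not dask's actual `process_runnables` code; all names (`speculative_walk`, `is_reducer`, the example task keys) are invented for this sketch.

```python
def is_reducer(task, dependencies):
    # A task that consumes more than one dependency "reduces" memory:
    # executing it lets us release several intermediates for one output.
    return len(dependencies.get(task, ())) > 1

def speculative_walk(start, dependents, dependencies):
    """Walk a linear runnable chain from `start`.

    Return the visited path if it ends in a reducer (branch may execute
    for real), else None (branch is left alone).
    """
    path = [start]
    current = start
    while True:
        nxt = dependents.get(current, [])
        if is_reducer(current, dependencies):
            return path            # branch reduces -> worth executing
        if len(nxt) != 1:
            return None            # dead end (or split) without a reduction
        current = nxt[0]
        path.append(current)

# A chain a1 -> a2 -> merge, where merge also depends on x, reduces:
dependents = {"a1": ["a2"], "a2": ["merge"]}
dependencies = {"a1": {"root"}, "a2": {"a1"}, "merge": {"a2", "x"}}
speculative_walk("a1", dependents, dependencies)   # ["a1", "a2", "merge"]

# A chain that never merges anything is not executed speculatively:
speculative_walk("b1", {"b1": ["b2"]}, {"b1": {"root"}, "b2": {"b1"}})  # None
```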
What happens here is that those executable branches encounter multiple splits (we follow splits as well, since they often allow us to reduce tasks). After the second split we actually find an intermediate reducer, which is where the algorithm on main decided to commit to the branch. However, this still leaves us with one more task in memory than if we hadn't executed that branch, which is a suboptimal decision. We should only execute the branch if all of those splits reduce.
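The change to the decision can be summarized as a predicate flip, sketched below under invented names (`should_execute`, `splits_reduce`); this is a simplification of the behavior described above, not the real implementation.

```python
def should_execute(splits_reduce, require_all=True):
    """Decide whether a speculatively walked branch should run for real.

    splits_reduce[i] is True if the i-th split encountered along the
    walk eventually reduces into a single task.
    """
    if require_all:
        # Behavior after this PR: every split must reduce, otherwise the
        # branch would leave extra intermediates in memory.
        return all(splits_reduce)
    # Old behavior on main: one reducer anywhere was enough to commit.
    return any(splits_reduce)

# Branch with two splits where only the second one reduces:
outcomes = [False, True]
should_execute(outcomes, require_all=False)  # old: True, branch executes
should_execute(outcomes)                     # new: False, branch is skipped
```

With the old predicate, the partially reducing branch still ran and pinned one more task in memory than necessary; with the new one, the branch only runs when executing it is a net win.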
In this specific example the change doesn't alter the outcome, but I believe it makes the algorithm more predictable.
Simplified graph