feat: use tree reduction to aggregate files in preprocessing #1079
Preprocessing used to fail with `RecursionError` when enough files were included (#1078). Following helpful advice from @lgray, this applies the exact same method as dask-contrib/dask-awkward#479 to resolve the problem by using a tree reduction strategy.

This seems to have worked as a rather clean copy/paste with no real changes apart from the paths to all the imported objects and functions. Perhaps that is not so surprising given that the approach is generic, but it did make me slightly suspicious at first. I have not found anything wrong with it myself so far. I was also testing with `sys.setrecursionlimit(200)` to hit the error more quickly. I don't think that this is an issue, but wanted to mention it to be sure.

I am not very familiar with the internals here and only have a rough understanding of everything happening. In addition, my test case is running distributed on a facility (which works since the relevant code is only needed on the head node). I can confirm that these changes do resolve the issue in my test setup, but a critical look / other tests would be very welcome.
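
For reviewers less familiar with the idea, here is a minimal, generic sketch of why a tree reduction keeps nesting shallow. It is not the code from this PR or from dask-contrib/dask-awkward#479; `tree_reduce` and `depth_combine` are hypothetical names used only for illustration. The sketch assumes that each combine step adds one level of nesting, so chaining N inputs linearly nests about N levels deep, while pairing them up nests about log2(N) levels deep.

```python
import functools
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(items: Sequence[T], combine: Callable[[T, T], T]) -> T:
    """Combine items pairwise, level by level, until one result remains."""
    level: List[T] = list(items)
    if not level:
        raise ValueError("tree_reduce needs at least one item")
    while len(level) > 1:
        # Combine neighbours (0, 1), (2, 3), ... into the next level.
        paired = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        # An odd element out is carried to the next level unchanged.
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0]


# Illustrative stand-in for a per-file result: (nesting depth, number of files).
Node = Tuple[int, int]


def depth_combine(a: Node, b: Node) -> Node:
    """Merging two results adds one nesting level on top of the deeper one."""
    return (max(a[0], b[0]) + 1, a[1] + b[1])


leaves: List[Node] = [(0, 1)] * 1000

linear = functools.reduce(depth_combine, leaves)  # depth grows with every file
tree = tree_reduce(leaves, depth_combine)         # depth grows with every level

print("linear:", linear)  # (999, 1000)
print("tree:  ", tree)    # (10, 1000)
```

Both strategies aggregate all 1000 inputs, but only the linear chain produces nesting deep enough to plausibly run into a lowered recursion limit such as the `sys.setrecursionlimit(200)` used for testing here.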
resolves #1078