Avoid mutating collections unless absolutely necessary #14048
Merged
Today, Prefect uses the `visit_collections` function to resolve futures found in task arguments. This is because the future might be passed directly (e.g. `f(x=Future())`) or in a deeply nested collection or BaseModel attribute (or both).

Currently, `visit_collections` recursively visits every member of every collection in a DFS, calling the `visit_fn` on each one, and then rebuilds a copy of the data structure. For example, if the `visit_fn` resolves futures to results and you pass `dict(x=[1, 2], y=[1, 2, Future(3)])` as the arg, then a new object `dict(x=[1, 2], y=[1, 2, 3])` will be generated and passed to the task.

The problem is that the object is copied even if it doesn't have any futures in it.
This PR modifies the behavior of `visit_collections` to make the smallest number of copies possible. It does this by examining the return value of the `visit_fn`. If `visit_fn(x) is x` for all values of a collection, then the collection is left unchanged. But if one of those values changes, say because `x` is a Future and `visit_fn` resolves it to a value, then the collection is copied. So in addition to passing the test above, with this strategy and a hypothetical `resolve_futures` fn, a collection that contains no futures comes back as the very same object, but a collection that contains a future comes back as a new object.
Note that this only affects identity, not equality!
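
The copy-on-change strategy described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `visit_minimal_copy`, the `Future` stand-in class, and `resolve_futures` are hypothetical names, and only plain lists, tuples, and dicts are handled.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Future:
    """Hypothetical stand-in for a Prefect future; not the real class."""
    value: Any


def resolve_futures(obj: Any) -> Any:
    """Hypothetical visit_fn: unwrap futures, return everything else as-is."""
    return obj.value if isinstance(obj, Future) else obj


def visit_minimal_copy(obj: Any, visit_fn: Callable[[Any], Any]) -> Any:
    """Apply visit_fn depth-first; copy a collection only if a member changed."""
    visited = visit_fn(obj)
    if visited is not obj:
        # visit_fn replaced the object outright (assumption: this sketch
        # does not descend into the replacement).
        return visited
    if isinstance(obj, (list, tuple)):
        items = [visit_minimal_copy(item, visit_fn) for item in obj]
        if all(new is old for new, old in zip(items, obj)):
            return obj  # nothing changed: hand back the original object
        return items if isinstance(obj, list) else tuple(items)
    if isinstance(obj, dict):
        items = {k: visit_minimal_copy(v, visit_fn) for k, v in obj.items()}
        if all(items[k] is obj[k] for k in obj):
            return obj
        return items
    return obj


plain = {"x": [1, 2], "y": [1, 2, 3]}
same = visit_minimal_copy(plain, resolve_futures)
assert same is plain  # no futures anywhere, so no copy is made

with_future = {"x": [1, 2], "y": [1, 2, Future(3)]}
resolved = visit_minimal_copy(with_future, resolve_futures)
assert resolved == {"x": [1, 2], "y": [1, 2, 3]}
assert resolved is not with_future        # the changed path is copied
assert resolved["x"] is with_future["x"]  # the untouched branch is shared
```

In this sketch only the containers on the path to a resolved future are rebuilt; sibling branches keep their identity, which is what lets mutations made in one task stay visible in another.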
What brought this to my attention is I was modifying mutable objects in one task that then appeared, unmodified, in a different task, despite the fact that everything was in-memory. Since I wasn't using futures, I had no expectation that I was working with copies of data.
This implementation of `visit_collections` also fixes the following behaviors:

- `visit_collections` will no longer iterate over a generator. Previously, it automatically converted generators to lists. Since generators can only be consumed once, and may be infinite, this is almost always going to be the wrong behavior: at best silently exhausting the generator before the user can consume it, and at worst blocking forever!
- `visit_collections` will no longer recurse endlessly on collections with mutual references, allowing us to fix two xfail tests.
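
One way to get both behaviors is to treat generators as opaque leaves and to remember which containers have already been entered. The sketch below is an assumption about the general technique, not the code in this PR: `visit_safely` is a hypothetical, read-only visitor that tracks container identities with `id()`.

```python
import types
from typing import Any, Callable, Optional, Set


def visit_safely(
    obj: Any,
    visit_fn: Callable[[Any], Any],
    _seen: Optional[Set[int]] = None,
) -> None:
    """Call visit_fn on obj and its members without consuming generators
    and without looping forever on self-referential collections."""
    seen = set() if _seen is None else _seen
    is_collection = isinstance(obj, (list, tuple, set, dict))
    if is_collection:
        if id(obj) in seen:
            return  # already entered this container: break the cycle
        seen.add(id(obj))
    visit_fn(obj)
    if isinstance(obj, types.GeneratorType) or not is_collection:
        return  # leaves, including generators, are never iterated
    members = obj.values() if isinstance(obj, dict) else obj
    for member in members:
        visit_safely(member, visit_fn, seen)


# Two lists that reference each other, plus a generator.
a = [1]
b = [2, a]
a.append(b)
gen = (n for n in range(3))

visited = []
visit_safely({"cycle": a, "gen": gen}, visited.append)

# The traversal terminated and the generator is still untouched.
remaining = list(gen)
```

Here `visited` ends up holding the outer dict, `a`, `1`, `b`, `2`, and the generator object itself; the second encounter with `a` is skipped, and the generator's items are only produced when the caller iterates it afterwards.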