#19851: Proof-of-concept to implement deferred side inputs for combiners #30743
#19851
This is a proof-of-concept draft for fixing/implementing deferred side inputs with Combiners. I could use feedback on how to proceed and then I'll clean it up as well.
Issue:
Currently if you try to use a side input with a combine function, you end up with a traceback during the pipeline translation step:
Fix?:
The problem is that the code that decides whether or not to lift a combiner assumes the combiner has a single input pcoll, which is not true when the combiner has a deferred side input.
I started fixing this and then realized that we also don't have plumbing for deferred side inputs implemented at the `operations.py` layer. It seemed like a lot of work to duplicate all the side input plumbing that ParDo has when the phases of the combiners can be expressed as ParDos anyway.

Inside `direct/helper_transforms.py` we already define a ParDo-based version of a lifted combiner. I tried adding an override to use it only when a combiner has side inputs, but by that point we've already substituted `ArgumentPlaceholder`s for the side inputs. I instead updated it and returned it in `CombinePerKey.__new__`. I couldn't put it in `CombinePerKey.expand`, since `CombinePerKey` has a special URN that would have resulted in the subtransforms being ignored.
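To illustrate the idea that the phases of a lifted combiner can be written as ordinary per-element steps that receive a side input, here is a minimal plain-Python sketch. It does not use Beam and is not the code in this PR: `ScaledSumFn`, `partial_combine`, `group_by_key`, `merge_and_extract`, and `lifted_combine_per_key` are hypothetical names chosen for illustration, and the side input is modeled as an already-materialized value passed as an extra argument.

```python
from collections import defaultdict


class ScaledSumFn:
    """Hypothetical CombineFn-style class: sums values scaled by a side input."""

    def create_accumulator(self, scale):
        return 0

    def add_input(self, accumulator, element, scale):
        return accumulator + element * scale

    def merge_accumulators(self, accumulators, scale):
        return sum(accumulators)

    def extract_output(self, accumulator, scale):
        return accumulator


def partial_combine(bundle, combine_fn, side):
    """Phase 1 (before the shuffle): fold one bundle into per-key accumulators."""
    accumulators = {}
    for key, value in bundle:
        if key not in accumulators:
            accumulators[key] = combine_fn.create_accumulator(side)
        accumulators[key] = combine_fn.add_input(accumulators[key], value, side)
    return list(accumulators.items())


def group_by_key(pairs):
    """Stand-in for the shuffle: collect all accumulators for each key."""
    grouped = defaultdict(list)
    for key, accumulator in pairs:
        grouped[key].append(accumulator)
    return list(grouped.items())


def merge_and_extract(grouped, combine_fn, side):
    """Phases 2 and 3 (after the shuffle): merge accumulators, extract output."""
    for key, accumulators in grouped:
        merged = combine_fn.merge_accumulators(accumulators, side)
        yield key, combine_fn.extract_output(merged, side)


def lifted_combine_per_key(bundles, combine_fn, side):
    """Chain the phases; every phase sees the same side input value."""
    partials = [
        pair
        for bundle in bundles
        for pair in partial_combine(bundle, combine_fn, side)
    ]
    return dict(merge_and_extract(group_by_key(partials), combine_fn, side))


# Two bundles of (key, value) pairs and a side input value of 10.
bundles = [[("a", 1), ("b", 2)], [("a", 3), ("b", 4), ("a", 5)]]
result = lifted_combine_per_key(bundles, ScaledSumFn(), side=10)
print(result)
```

Because each phase is a function of an element plus a side value, it has the same shape as a ParDo's processing function, which is why reusing ParDo's existing side input handling is attractive here.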