New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-3419] Flesh out iterable side inputs and key enumeration for multimaps in shared libraries #10147
Conversation
d319089
to
38cc4bc
Compare
…ltimaps in shared libraries This now removed the byte[] that was used as the key and exposed the SDKs coder specifically using the structural value for comparison. Update portable Python to use the iterable state key. Note that this doesn't effect Dataflow since dataflow_runner.py converts all iterable side inputs into multimap right now and no SDK performs key enumeration yet. Update both Flink and Spark to support iterable API and also key enumeration for multimaps. To minimize the extent of this change, I did the minimal modification for Dataflow. A follow-up PR will do the same for Dataflow and then enable multimap side input key enumeration and iterable lookup within various SDKs.
Run Portable_Python PreCommit |
Run Python PreCommit |
Run Java PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see the side input access patterns more accurately represented in the Proto, as well as in the shared libraries. I've made a pass. It looks good to me.
Run Flink ValidatesRunner |
I think this breaks :runners:spark:compileJava on master. @lukecwik can you please take a look? |
|
@ibzib has a fix in flight for this. |
…ltimaps in shared libraries (apache#10147) This now removed the byte[] that was used as the key and exposed the SDKs coder specifically using the structural value for comparison. Update portable Python to use the iterable state key. Note that this doesn't effect Dataflow since dataflow_runner.py converts all iterable side inputs into multimap right now and no SDK performs key enumeration yet. Update both Flink and Spark to support iterable API and also key enumeration for multimaps. To minimize the extent of this change, I did the minimal modification for Dataflow. A follow-up PR will do the same for Dataflow and then enable multimap side input key enumeration and iterable lookup within various SDKs.
This now removed the byte[] that was used as the key and exposed the SDKs coder specifically using the structural value for comparison.
Update portable Python to use the iterable state key. Note that this doesn't effect Dataflow since dataflow_runner.py converts all iterable side inputs into multimap right now and no SDK performs key enumeration yet.
Update both Flink and Spark to support iterable API and also key enumeration for multimaps. To minimize the extent of this change, I did the minimal modification for Dataflow. A follow-up PR will do the same for Dataflow and then enable multimap side input key enumeration and iterable lookup within various SDKs.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.