[BEAM-10933] Adjust GBK type before creating the pipeline proto#12880
Conversation
Codecov Report
@@ Coverage Diff @@
## master #12880 +/- ##
==========================================
- Coverage 82.32% 82.32% -0.01%
==========================================
Files 452 453 +1
Lines 54016 54042 +26
==========================================
+ Hits 44471 44492 +21
- Misses 9545 9550 +5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
@CraigChambersG I was under the impression that this wasn't true and that Dataflow v2 handled this.
|
Yes, that should be the case, on v2. Dataflow v2 can handle
heterogeneously typed Flattens.
…On Mon, Sep 21, 2020 at 9:10 AM Lukasz Cwik ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
<#12880 (comment)>:
> @@ -430,6 +430,15 @@ def _check_for_unsupported_fnapi_features(self, pipeline_proto):
components.coders[windowing_strategy.window_coder_id].spec.urn,
windowing_strategy.window_fn.urn))
+ def _adjust_types_for_dataflow(self, pipeline):
+ # Dataflow runner requires a KV type for GBK inputs, hence we enforce that
+ # here.
+ pipeline.visit(self.group_by_key_input_visitor())
+
+ # Dataflow runner requires output type of the Flatten to be the same as the
@CraigChambersG <https://github.com/CraigChambersG> I was under the
impression that this wasn't true and that Dataflow v2 handled this.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12880 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKXWJXCTMD5YB2IIFZ5PAHLSG53HVANCNFSM4RTFTKVA>
.
|
|
I see. I just moved Flatten update out of caution. I was not hitting this in any tests. GBK update seems to be needed though. I was hitting this in test_gbk_side_input (where we were failing while try to determine the coder for native shuffle reader). |
|
Yeah, the GBK one makes sense since it needs to say its a pair coder. |
|
Moved the Flatten change back and added a comment about Runner v2. PTAL. |
There was a problem hiding this comment.
Why do the type encodings not survive the round trip (pipeline -> proto -> pipeline)?
There was a problem hiding this comment.
Verified that types are preserved in the pipeline->proto->pipeline roundtrip and removed this.
There was a problem hiding this comment.
_adjust_pipeline_for_dataflow_v2?
There was a problem hiding this comment.
Verified that types are preserved in the pipeline->proto->pipeline roundtrip and removed this.
|
Run Python PreCommit |
|
Run Python_PVR_Flink PreCommit |
1 similar comment
|
Run Python_PVR_Flink PreCommit |
38a941d to
83a4d41
Compare
These have to be done before creating the pipeline proto for Dataflow to be able to correctly generate job requests from portable protos.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.