Skip to content

[BEAM-10933] Adjust GBK type before creating the pipeline proto#12880

Merged
chamikaramj merged 1 commit intoapache:masterfrom
chamikaramj:dataflow_iterable_flatten_proto_fixes_2
Sep 22, 2020
Merged

[BEAM-10933] Adjust GBK type before creating the pipeline proto#12880
chamikaramj merged 1 commit intoapache:masterfrom
chamikaramj:dataflow_iterable_flatten_proto_fixes_2

Conversation

@chamikaramj
Copy link
Contributor

These have to be done before creating the pipeline proto for Dataflow to be able to correctly generate job requests from portable protos.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented Sep 19, 2020

Codecov Report

Merging #12880 into master will decrease coverage by 0.00%.
The diff coverage is 92.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12880      +/-   ##
==========================================
- Coverage   82.32%   82.32%   -0.01%     
==========================================
  Files         452      453       +1     
  Lines       54016    54042      +26     
==========================================
+ Hits        44471    44492      +21     
- Misses       9545     9550       +5     
Impacted Files Coverage Δ
...python/apache_beam/examples/wordcount_dataframe.py 91.66% <91.66%> (ø)
...on/apache_beam/runners/dataflow/dataflow_runner.py 77.24% <100.00%> (+0.06%) ⬆️
sdks/python/apache_beam/runners/common.py 88.75% <0.00%> (-0.45%) ⬇️
sdks/python/apache_beam/transforms/combiners.py 91.97% <0.00%> (-0.19%) ⬇️
sdks/python/apache_beam/io/iobase.py 84.33% <0.00%> (+0.28%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 096f695...83a4d41. Read the comment docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CraigChambersG I was under the impression that this wasn't true and that Dataflow v2 handled this.

@CraigChambersG
Copy link
Contributor

CraigChambersG commented Sep 21, 2020 via email

@chamikaramj
Copy link
Contributor Author

I see. I just moved Flatten update out of caution. I was not hitting this in any tests.

GBK update seems to be needed though. I was hitting this in test_gbk_side_input (where we were failing while try to determine the coder for native shuffle reader).

@lukecwik
Copy link
Member

Yeah, the GBK one makes sense since it needs to say its a pair coder.

@chamikaramj
Copy link
Contributor Author

Moved the Flatten change back and added a comment about Runner v2. PTAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do the type encodings not survive the round trip (pipeline -> proto -> pipeline)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified that types are preserved in the pipeline->proto->pipeline roundtrip and removed this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_adjust_pipeline_for_dataflow_v2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified that types are preserved in the pipeline->proto->pipeline roundtrip and removed this.

@chamikaramj chamikaramj changed the title [BEAM-10933] Adjust GBK and Flatten types before creating the pipeline proto [BEAM-10933] Adjust GBK type before creating the pipeline proto Sep 21, 2020
@chamikaramj
Copy link
Contributor Author

Run Python PreCommit

@chamikaramj
Copy link
Contributor Author

Run Python_PVR_Flink PreCommit

1 similar comment
@chamikaramj
Copy link
Contributor Author

Run Python_PVR_Flink PreCommit

@chamikaramj chamikaramj force-pushed the dataflow_iterable_flatten_proto_fixes_2 branch from 38a941d to 83a4d41 Compare September 22, 2020 00:56
@chamikaramj chamikaramj merged commit 0e90411 into apache:master Sep 22, 2020
@chamikaramj chamikaramj deleted the dataflow_iterable_flatten_proto_fixes_2 branch September 22, 2020 01:35
@chamikaramj chamikaramj restored the dataflow_iterable_flatten_proto_fixes_2 branch September 22, 2020 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants