[BEAM-9117] Sets the correct coder when clustering is enabled for the multi-partition path #10584

Merged
merged 1 commit into from Jan 15, 2020


chamikaramj
Contributor


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@chamikaramj
Contributor Author

R: @jklukas

Contributor

@jklukas jklukas left a comment

This change looks correct in that I don't see it doing any harm, but can you provide some more context on how this was discovered?

We are running jobs in production that should be hitting both the multi-partition and the single-partition paths for loading into clustered tables, and we are not seeing any clustering-related errors, so it's not clear to me that the change here will have any effect on pipeline behavior.

My read here is that the clustering part of the TableDestination never actually gets used past this point. In particular, WriteRename issues copy jobs that don't need to explicitly specify the clustering of the destination table. Thus, using TableDestinationCoderV3 has no practical effect (but I still think this change is a good thing in case WriteRename ever evolves to reference clustering). Does that all sound correct to you, or do we have evidence of issues caused by lack of TableDestinationCoderV3 here?

I am now wondering whether BatchLoads currently creates a clustered table correctly when we have CreateDisposition.CREATE_IF_NEEDED. AFAICT, copy jobs do not accept a clustering configuration; I assume that means that if the destination table needs to be created, the copy job applies the same clustering as the source table(s), but I don't think the docs specify as much.

@chamikaramj
Contributor Author

A Dataflow user was seeing the following error when writing to a clustered BQ table with dynamic destinations:
"Failed to copy Natural partitioned table to Natural partitioned clustering meta table: not supported."

I don't have a reproduction, so I'm not sure whether this fixes the root cause. I just noticed while reading the code that we are using the V2 coder here.
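
To illustrate why the coder version could matter, here is a minimal, self-contained sketch. It does not use Beam's actual classes: `Destination`, `encodeV2`/`decodeV2`, and `encodeV3`/`decodeV3` are hypothetical stand-ins, under the assumption that the V2 coder serializes the table spec and time partitioning only, while the V3 coder also serializes clustering. Under that assumption, a destination that round-trips through the V2-style coder comes back without its clustering.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical, simplified stand-in. NOT Beam's TableDestination or its coders. */
public class CoderVersionSketch {

  /** Simplified destination: table spec plus optional JSON partitioning and clustering. */
  public static final class Destination {
    public final String tableSpec;
    public final String jsonTimePartitioning; // may be null
    public final String jsonClustering; // may be null

    public Destination(String tableSpec, String jsonTimePartitioning, String jsonClustering) {
      this.tableSpec = tableSpec;
      this.jsonTimePartitioning = jsonTimePartitioning;
      this.jsonClustering = jsonClustering;
    }
  }

  // Writes a presence flag followed by the value, so null fields survive a round trip.
  private static void writeNullable(DataOutputStream out, String value) throws IOException {
    out.writeBoolean(value != null);
    if (value != null) {
      out.writeUTF(value);
    }
  }

  private static String readNullable(DataInputStream in) throws IOException {
    return in.readBoolean() ? in.readUTF() : null;
  }

  /** V2-style encoding (assumed): table spec and time partitioning only. */
  public static byte[] encodeV2(Destination d) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeUTF(d.tableSpec);
      writeNullable(out, d.jsonTimePartitioning);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** V2-style decoding: clustering was never written, so it comes back as null. */
  public static Destination decodeV2(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      String spec = in.readUTF();
      String partitioning = readNullable(in);
      return new Destination(spec, partitioning, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** V3-style encoding (assumed): the V2 fields followed by clustering. */
  public static byte[] encodeV3(Destination d) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeUTF(d.tableSpec);
      writeNullable(out, d.jsonTimePartitioning);
      writeNullable(out, d.jsonClustering);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static Destination decodeV3(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      String spec = in.readUTF();
      String partitioning = readNullable(in);
      String clustering = readNullable(in);
      return new Destination(spec, partitioning, clustering);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Illustrative values only; the table name and fields are made up.
    Destination d =
        new Destination("project:dataset.table", "{\"type\":\"DAY\"}", "{\"fields\":[\"country\"]}");
    System.out.println("clustering after V2 round trip: " + decodeV2(encodeV2(d)).jsonClustering);
    System.out.println("clustering after V3 round trip: " + decodeV3(encodeV3(d)).jsonClustering);
  }
}
```

Whether the dropped field matters in practice depends on whether any downstream step reads the clustering from the decoded destination, which is the open question in this thread.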

@chamikaramj
Contributor Author

Run Java PreCommit

@chamikaramj chamikaramj merged commit 148cc71 into apache:master Jan 15, 2020