[BEAM-1301] Support for BigQuery table description #1821
asfgit merged 2 commits into apache:master
Refer to this link for build results (access rights to CI server needed):
Build result: FAILURE
[...truncated 6594 lines...]
Caused by: org.apache.maven.plugin.MojoFailureException: You have 1 Checkstyle violation.
    at org.apache.maven.plugin.checkstyle.CheckstyleViolationCheckMojo.execute(CheckstyleViolationCheckMojo.java:588)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    ... 31 more
Setting status of 6950238 to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/6750/ and message: 'Build finished.'
Using context: Jenkins: Maven clean install
649a690 to 4dcf59e
|
Changes Unknown when pulling 4dcf59e on ravwojdyla:tbl_desc into apache:master.
|
R: @dhalperi
Sorry, did not see this. Thanks Rav! Will take a look tomorrow.
    null /* jsonSchema */,
    CreateDisposition.CREATE_IF_NEEDED,
    WriteDisposition.WRITE_EMPTY,
    null /*tableDescription */,
nit: /* table (with space)
    .withLabel("Validation Enabled"), true)
    .addIfNotNull(DisplayData.item("tableDescription", tableDescription)
    .withLabel("Table description"));

    switch (jobStatus) {
    case SUCCEEDED:
    if (tableDescription != null) {
    datasetService.patchTableDescription(
Note to self and reviewers: confirmed that you can't set the table description from the load job itself: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load
    ref.getDatasetId(),
    ref.getTableId(),
    tableDescription);
    }
You probably want this to run after the copy job too, because for a very large table we load into temp tables and then copy to the final table.
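A minimal sketch of that suggestion, assuming hypothetical stand-ins rather than Beam's actual classes: `DescriptionPatcher`, `Status`, and `onFinalTableJobDone` are names invented for illustration, and the argument list of `patchTableDescription` is simplified. The idea is that the patch runs after whichever job produces the final table, whether that is the direct load job or the copy job from the temp tables.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the dataset service in the diff; not Beam's real interface. */
interface DescriptionPatcher {
  void patchTableDescription(String datasetId, String tableId, String description);
}

class DescriptionPatchSketch {
  enum Status { SUCCEEDED, FAILED }

  /**
   * Called once the job that writes the final table has finished. In this sketch
   * that is the load job for a direct write and the copy job when temp tables are used.
   */
  static void onFinalTableJobDone(
      Status status,
      DescriptionPatcher patcher,
      String datasetId,
      String tableId,
      String tableDescription) {
    // Patch only after success, and only when a description was configured.
    if (status == Status.SUCCEEDED && tableDescription != null) {
      patcher.patchTableDescription(datasetId, tableId, tableDescription);
    }
  }

  public static void main(String[] args) {
    List<String> patched = new ArrayList<>();
    DescriptionPatcher recording =
        (dataset, table, description) -> patched.add(dataset + "." + table + ": " + description);

    onFinalTableJobDone(Status.SUCCEEDED, recording, "my_dataset", "final_table", "Daily events");
    onFinalTableJobDone(Status.FAILED, recording, "my_dataset", "final_table", "Daily events");
    onFinalTableJobDone(Status.SUCCEEDED, recording, "my_dataset", "final_table", null);

    System.out.println(patched);
  }
}
```

Routing both the load path and the copy path through one method like this keeps the null check and the success check in a single place.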
    "Unable to patch table description: %s, aborting after %d retries.",
    tableId, MAX_RPC_RETRIES),
    Sleeper.DEFAULT,
    backoff,
How do you want this feature to interact with per-window tables created in streaming mode? I think the new table description is not applied in those cases.
Good question. Would it make sense to patch the description of each table we are streaming to, right after createdTables.add(tableSpec)?
I would probably do it here: https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2724 based on whether this worker successfully created the table (return value != null).
Nope, that's not the return type of the createTable API.
Maybe move description into the createTable API proper, where it actually can be embedded in the request we're already sending? https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#description
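A minimal sketch of embedding the description in the create request, under stated assumptions: it builds the request body as a plain `Map` instead of using the BigQuery client library, the helper name `buildCreateTableBody` is hypothetical, and the schema is left out for brevity. The field names `tableReference`, `projectId`, `datasetId`, `tableId`, and `description` follow the tables resource documented at the link above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CreateTableBodySketch {
  /**
   * Builds a table-creation request body as nested maps (hypothetical helper).
   * The description is included only when one was configured.
   */
  static Map<String, Object> buildCreateTableBody(
      String projectId, String datasetId, String tableId, String tableDescription) {
    Map<String, Object> tableReference = new LinkedHashMap<>();
    tableReference.put("projectId", projectId);
    tableReference.put("datasetId", datasetId);
    tableReference.put("tableId", tableId);

    Map<String, Object> body = new LinkedHashMap<>();
    body.put("tableReference", tableReference);
    if (tableDescription != null) {
      // Sent in the same request as the rest of the table definition,
      // so no separate patch call is needed afterwards.
      body.put("description", tableDescription);
    }
    return body;
  }

  public static void main(String[] args) {
    System.out.println(buildCreateTableBody("my-project", "my_dataset", "events", "Daily events"));
    System.out.println(buildCreateTableBody("my-project", "my_dataset", "events", null));
  }
}
```

Because the description travels with the create request, a table that already exists is left untouched, which differs from the patch-after-job approach.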
Pushed commit (49f125a) with the suggestion above - please let me know what you think.
Sorry, missed the comments above. Yes, the createTable approach sounds like the best idea! Will go ahead and add that.
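For the per-window tables raised earlier in this thread, the createTable approach could look roughly like the sketch below. Everything here is a hypothetical stand-in (`PerWindowTableSketch`, `TableCreator`, `ensureTable`), not Beam's streaming write code; only the `createdTables` set mirrors the name mentioned in the discussion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PerWindowTableSketch {
  /** Hypothetical stand-in for a createTable call that accepts a description. */
  interface TableCreator {
    void createTable(String tableSpec, String tableDescription);
  }

  private final Set<String> createdTables = new HashSet<>();
  private final TableCreator creator;
  private final String tableDescription; // may be null when no description is configured

  PerWindowTableSketch(TableCreator creator, String tableDescription) {
    this.creator = creator;
    this.tableDescription = tableDescription;
  }

  /** Creates the table the first time a table spec is seen, passing the description along. */
  synchronized void ensureTable(String tableSpec) {
    if (!createdTables.contains(tableSpec)) {
      creator.createTable(tableSpec, tableDescription);
      // Record the table only after creation returned, so a failed attempt can be retried.
      createdTables.add(tableSpec);
    }
  }

  public static void main(String[] args) {
    List<String> requests = new ArrayList<>();
    PerWindowTableSketch sketch =
        new PerWindowTableSketch(
            (spec, description) -> requests.add(spec + " <- " + description), "Hourly events");

    sketch.ensureTable("my_dataset.events_2017013100");
    sketch.ensureTable("my_dataset.events_2017013100"); // already created: no second request
    sketch.ensureTable("my_dataset.events_2017013101");

    System.out.println(requests);
  }
}
```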
|
49f125a to 89419aa
|
Refer to this link for build results (access rights to CI server needed):
Build result: FAILURE
[...truncated 5107 lines...]
Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOTest.java:[1605,38] constructor StreamingWriteFn in class org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.StreamingWriteFn cannot be applied to given types;
  required: org.apache.beam.sdk.options.ValueProvider<com.google.api.services.bigquery.model.TableSchema>,org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition,java.lang.String,org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices
  found: ,org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition,org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOTest.FakeBigQueryServices
  reason: actual and formal argument lists differ in length
Setting status of 49f125a to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/6940/ and message: 'Build finished.'
Using context: Jenkins: Maven clean install
|
Refer to this link for build results (access rights to CI server needed):
Build result: FAILURE
[...truncated 5108 lines...]
Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOTest.java:[1605,38] constructor StreamingWriteFn in class org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.StreamingWriteFn cannot be applied to given types;
  required: org.apache.beam.sdk.options.ValueProvider<com.google.api.services.bigquery.model.TableSchema>,org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition,java.lang.String,org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices
  found: ,org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition,org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOTest.FakeBigQueryServices
  reason: actual and formal argument lists differ in length
Setting status of 89419aa to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/6942/ and message: 'Build finished.'
Using context: Jenkins: Maven clean install
|
Looks great to me! Just a few minor nits. I can also fix those in the merge so feel free to leave as-is; I will fixup and merge tomorrow if you haven't made the fixups. |
|
@dhalperi cleaned up a bit:
I am happy to fix anything that is still not right. |
|
dhalperi left a comment
Apparently I did not successfully comment the nits last night. GitHub UI issues... sorry!
    /** TableSchema in JSON. Use String to make the class Serializable. */
    @Nullable private final ValueProvider<String> jsonTableSchema;

    private final String tableDescription;
nit: please annotate member variable @Nullable :)
    @Nullable ValueProvider<TableSchema> tableSchema,
    Write.CreateDisposition createDisposition,
    @Nullable String tableDescription,
    BigQueryServices bqServices) {
nit: in Beam, we don't align parameters usually. but ok.
    BigQueryServices bqServices) {
    Write.CreateDisposition createDisposition,
    @Nullable String tableDescription,
    BigQueryServices bqServices) {
nit: in Beam, we don't align parameters usually. but ok.
    createDisposition,
    tableDescription,
    bqServices
    )));
nit: move to previous line.
|
@dhalperi there definitely was something going on with GitHub yesterday. Anyway, no problem. Fixed your comments.
|
Thanks Rav! |