Description
Our pipeline gets this error from BigQuery when using BigQueryIO.Write.Method.FILE_LOADS, BigQueryIO.Write.CreateDisposition.CREATE_NEVER, and field-based time partitioning together (full exception at the bottom of this note):
Table with field based partitioning must have a schema.
We do supply a schema when we build the pipeline, by calling BigQueryIO.Write.withSchema, but that schema is ignored: the processElement method (in WriteTables$WriteTablesDoFn, see the stack trace below) always passes a null schema to the load job when CREATE_NEVER is used.
I would expect Beam to use the provided schema regardless of the CreateDisposition setting.
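For reference, a minimal sketch of the write configuration that triggers the error. This is not our actual pipeline; the project/dataset/table name, field names, and the `rows` PCollection are hypothetical placeholders:

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.api.services.bigquery.model.TimePartitioning;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

// Hypothetical schema: a timestamp field used for field-based partitioning.
TableSchema schema = new TableSchema().setFields(Arrays.asList(
    new TableFieldSchema().setName("event_ts").setType("TIMESTAMP"),
    new TableFieldSchema().setName("payload").setType("STRING")));

// `rows` is assumed to be a PCollection<TableRow>.
rows.apply("WriteToBQ",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.events")  // hypothetical, pre-created table
        .withSchema(schema)                  // supplied here, but dropped for the load job
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withTimePartitioning(
            new TimePartitioning().setType("DAY").setField("event_ts")));
```

With CREATE_TRUNCATE or CREATE_IF_NEEDED the same configuration works, which is why the null schema under CREATE_NEVER looks like the culprit.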
Full exception:
java.io.IOException: Unable to insert job: 078646f70a664daaa1ed96832b233036_19e873cd24cf1968559515e49b3d868d_00001_00000-0, aborting after 9 .
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:236)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:204)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startLoadJob(BigQueryServicesImpl.java:144)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteTables.load(WriteTables.java:259)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteTables.access$600(WriteTables.java:77)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.processElement(WriteTables.java:155)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Table with field based partitioning must have a schema.",
    "reason" : "invalid"
  } ],
  "message" : "Table with field based partitioning must have a schema."
}
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:218)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:204)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startLoadJob(BigQueryServicesImpl.java:144)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteTables.load(WriteTables.java:259)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteTables.access$600(WriteTables.java:77)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.processElement(WriteTables.java:155)
    at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:177)
    at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:138)
    at com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunner.startBundle(StreamingSideInputDoFnRunner.java:60)
    at com.google.cloud.dataflow.worker.SimpleParDoFn.reallyStartBundle(SimpleParDoFn.java:300)
    at com.google.cloud.dataflow.worker.SimpleParDoFn.startBundle(SimpleParDoFn.java:226)
    at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.start(ParDoOperation.java:35)
    at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:67)
    at com.google.cloud.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1197)
    at com.google.cloud.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:137)
    at com.google.cloud.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:940)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Imported from Jira BEAM-4486. Original Jira may contain additional context.
Reported by: GlennAmmons.