Open
Description
Steps to reproduce:
PROJECT=$(gcloud config get-value project)
BUCKET=${USER}_gcs_bucket
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
bq mk --project=$PROJECT $BQ_DATASET
gsutil mb gs://$BUCKET
PATH_TO_REPO_CLONE=/path/to/beam
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.beam \
    -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
    -DarchetypeVersion=2.8.0 \
    -DgroupId=org.example \
    -DartifactId=word-count-beam \
    -Dversion="0.1" \
    -Dpackage=org.apache.beam.examples \
    -DinteractiveMode=false
cd word-count-beam/
mkdir src/main/java/org/apache/beam/examples/cookbook
cp $PATH_TO_REPO_CLONE/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java \
    ./src/main/java/org/apache/beam/examples/cookbook
mvn compile exec:java \
    -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoes \
    -Dexec.args="--runner=DataflowRunner --project=$PROJECT --input=clouddataflow-readonly:samples.weather_stations --gcpTempLocation=gs://$BUCKET/tmp --output=$BQ_DATASET.$TABLE_NAME" \
    -Pdataflow-runner
This fails with:
java.lang.IllegalArgumentException: BigQueryIO.Read needs a GCS temp location to store temp files.
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead.validate(BigQueryIO.java:662)
    at org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:641)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:645)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
    at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
    at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
    at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:577)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:312)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at org.apache.beam.examples.cookbook.BigQueryTornadoes.runBigQueryTornadoes(BigQueryTornadoes.java:166)
    at org.apache.beam.examples.cookbook.BigQueryTornadoes.main(BigQueryTornadoes.java:172)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
    at java.lang.Thread.run(Thread.java:748)
Ironically, the example works if we remove --gcpTempLocation. The logs show that in that case the pipeline uses a bucket that looks like gs://dataflow-staging-us-central1-927334603519.
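One possible explanation, which this report does not confirm, is that the failing validation in BigQueryIO reads the generic --tempLocation option rather than --gcpTempLocation. Under that assumption, passing --tempLocation as well might serve as a workaround. The sketch below only assembles and prints such a command. The placeholder defaults (my-project, my_gcs_bucket, my_bq_dataset) and the TEMP_LOCATION and EXEC_ARGS variable names are illustrative and not part of the original steps.

```shell
# Untested workaround sketch: set --tempLocation alongside --gcpTempLocation.
# Placeholder defaults let the snippet run standalone; reuse the values
# from the reproduction steps when they are already set in the shell.
PROJECT="${PROJECT:-my-project}"
BUCKET="${BUCKET:-my_gcs_bucket}"
BQ_DATASET="${BQ_DATASET:-my_bq_dataset}"
TABLE_NAME="${TABLE_NAME:-out}"
TEMP_LOCATION="gs://$BUCKET/tmp"

# Same arguments as in the failing run, plus --tempLocation (assumption:
# this is the option that the BigQueryIO.Read validation inspects).
EXEC_ARGS="--runner=DataflowRunner --project=$PROJECT --input=clouddataflow-readonly:samples.weather_stations --tempLocation=$TEMP_LOCATION --gcpTempLocation=$TEMP_LOCATION --output=$BQ_DATASET.$TABLE_NAME"

# Print the command for review; drop the leading "echo" to actually run it.
echo mvn compile exec:java \
    -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoes \
    "-Dexec.args=$EXEC_ARGS" \
    -Pdataflow-runner
```

If the job passes validation with this change, that would point at the two temp-location options not being reconciled when only --gcpTempLocation is given.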
Imported from Jira BEAM-6069. Original Jira may contain additional context.
Reported by: tvalentyn.