This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

BigQueryIO.Read.fromQuery breaks on EU datasets #405

Closed
davidpadbury opened this issue Aug 26, 2016 · 8 comments · Fixed by #411

Comments

@davidpadbury

The query creates a temporary dataset whose location defaults to the US. Executing a query against an EU dataset therefore throws this exception:

Caused by: java.io.IOException: Executing query ... failed: Cannot read and write in different locations: source: EU, destination: US
    at com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator.executeQueryAndWaitForCompletion(BigQueryTableRowIterator.java:414)
    at com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator.open(BigQueryTableRowIterator.java:138)
    at com.google.cloud.dataflow.sdk.util.BigQueryServicesImpl$BigQueryJsonReaderImpl.start(BigQueryServicesImpl.java:513)
    at com.google.cloud.dataflow.sdk.io.BigQueryIO$BigQuerySourceBase$BigQueryReader.start(BigQueryIO.java:1124)
    at com.google.cloud.dataflow.sdk.io.Read$Bounded$1.evaluateReadHelper(Read.java:178)

Closed issue #86 reported a fix for an almost identical error, but the problem still seems to occur in 1.7.0-SNAPSHOT.
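The error arises because BigQuery requires a query's source tables and its destination (here, the temporary dataset) to be in the same location. The following is a minimal, hypothetical sketch of a location-aware choice for the temporary dataset. The class and method names are illustrative only, not Dataflow SDK or BigQuery API code. The sketch assumes the source tables' locations are already known, for example from looking up the tables the query references.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not SDK code): choose the location of the temporary
 * dataset that receives query results from the locations of the source tables.
 */
public class TempDatasetLocation {

  // Location used when nothing is known about the sources; "US" matches the default described in the issue.
  public static final String DEFAULT_LOCATION = "US";

  /** Returns the single location shared by all source tables, or the default if the list is empty. */
  public static String chooseTempLocation(List<String> sourceLocations) {
    Set<String> distinct = new LinkedHashSet<>(sourceLocations);
    if (distinct.isEmpty()) {
      return DEFAULT_LOCATION;
    }
    if (distinct.size() > 1) {
      // Assumption: a query spanning several locations cannot be served by one temp dataset.
      throw new IllegalArgumentException("Query reads from multiple locations: " + distinct);
    }
    return distinct.iterator().next();
  }

  /** Models the constraint behind the reported error: source and destination must share a location. */
  public static void checkSameLocation(String source, String destination) {
    if (!source.equals(destination)) {
      throw new IllegalStateException(
          "Cannot read and write in different locations: source: " + source
              + ", destination: " + destination);
    }
  }

  public static void main(String[] args) {
    // Behavior described in the issue: the temp dataset always uses the default location.
    try {
      checkSameLocation("EU", DEFAULT_LOCATION);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    // Location-aware behavior: the temp dataset follows the source tables.
    String temp = chooseTempLocation(Arrays.asList("EU"));
    checkSameLocation("EU", temp);
    System.out.println("temp dataset location: " + temp);
  }
}
```

Running `main` first prints the same mismatch message as the stack trace, then `temp dataset location: EU`. This also suggests why a US-hosted table such as the public shakespeare sample is unaffected: its location already matches the default.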

@dhalperi
Contributor

@davidpadbury can you confirm whether this is the DirectPipelineRunner or the InProcessPipelineRunner?

@peihe -- can you take a look?

@davidpadbury
Author

@dhalperi Ah, sorry: it's the DirectPipelineRunner.

@dhalperi
Contributor

Yeah. So I bet that per #86 this is fixed only in the DataflowPipelineRunner. We will fix this for the Direct/InProcess runners in the next cut.

@dhalperi
Contributor

Thanks for the bug report!

@nevillelyh
Contributor

We can reproduce the same issue when querying an EU table. The same query works against the public shakespeare table.

import com.google.api.services.bigquery.Bigquery;
import com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator;

import java.io.IOException;

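public class Test {
  public static void main(String[] args) throws IOException, InterruptedException {
    // BigQueryClient is not part of the Dataflow SDK and is not imported above; it is
    // assumed to be an external helper that supplies an authenticated Bigquery client.
    Bigquery bq = BigQueryClient.defaultInstance().bigquery();
    // Query a table in an EU dataset, running the job in the data-integration-test project.
    BigQueryTableRowIterator iterator = BigQueryTableRowIterator.fromQuery(
        "SELECT word FROM [data-integration-test:samples_eu.shakespeare]",
        "data-integration-test", bq, false);
    iterator.open();  // executes the query; this is where the location mismatch is thrown
    if (iterator.advance()) {
      System.out.println(iterator.getCurrent());
    }
  }
}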

@dhalperi
Contributor

dhalperi commented Sep 4, 2016

The fix will be released in 1.7.0.

@MingweiSamuel

MingweiSamuel commented Aug 24, 2017

I'm still having this issue with version 2.0.0 using the renamed DataflowRunner.

It could also be caused by some other staging option, though; there are a lot of pipeline options.

@lukecwik
Contributor

Can you provide a simple repro?
