[BEAM-3983][SQL] Add BigQuery table provider #5220

apilloud · 2018-04-24T23:08:15Z

This adds a BigQuery table provider to Beam SQL.


apilloud · 2018-04-25T16:52:18Z

run java precommit

apilloud · 2018-04-25T19:59:58Z

R: @kennknowles This is the SQL part of BigQuery support.

kennknowles · 2018-04-30T00:07:14Z

...c/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamBigQueryTable.java

+import org.apache.beam.sdk.values.Row;
+
+/**
+ * {@code BeamBigQueryTable} represent a BigQuery table as source or target.


Mention lack of support for read in the javadoc?

Updated comment to reflect the lack of read (source) support.

kennknowles · 2018-04-30T00:07:15Z

...c/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamBigQueryTable.java

+ * {@code BeamBigQueryTable} represent a BigQuery table as source or target.
+ *
+ */
+public class BeamBigQueryTable extends BaseBeamTable implements Serializable {


Good to put @Experimental here, even though the whole SQL module is still experimental anyhow.

Added @Experimental

kennknowles · 2018-04-30T00:07:15Z

sdks/java/extensions/sql/build.gradle

@@ -65,6 +65,7 @@ dependencies {
  shadow library.java.joda_time
  shadow project(path: ":beam-runners-direct-java", configuration: "shadow")
  provided project(path: ":beam-sdks-java-io-kafka", configuration: "shadow")
+  provided project(path: ":beam-sdks-java-io-google-cloud-platform", configuration: "shadow")


Commented on the pom, but since it is deprecated, consider that comment to apply to this line.

kennknowles · 2018-04-30T00:07:15Z

sdks/java/extensions/sql/pom.xml

+    <dependency>
+      <groupId>org.apache.beam</groupId>
+      <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
+      <scope>provided</scope>


Noting that I think we are going to have to find a different way to manage the various providers. For now this follows the pattern set by Kafka so that's totally fine. But we should track how to reorg the modules so we don't have to bake in everything.

Each provider depends on SQL and on its respective IO module, so we need to break the providers out into their own modules, one per IO module. The SQL CLI also depends on the providers, so it needs some way to discover which providers are actually available. I created a JIRA issue: https://issues.apache.org/jira/browse/BEAM-4190
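One way a CLI could find out which providers are on the classpath is Java's standard `java.util.ServiceLoader`. The sketch below only illustrates that idea under stated assumptions; it is not what BEAM-4190 prescribes, and `SqlTableProvider` and `ProviderDiscovery` are hypothetical names rather than Beam's actual interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for a table provider interface; not Beam's real TableProvider.
interface SqlTableProvider {
  // Short identifier for the kind of table served, e.g. "bigquery" or "kafka".
  String getTableType();
}

class ProviderDiscovery {
  // Collects the table types of every provider registered on the classpath
  // through a META-INF/services entry named after the interface.
  // Provider modules that are absent contribute nothing, so the core module
  // needs no compile-time dependency on any of them.
  static List<String> availableTableTypes() {
    List<String> types = new ArrayList<>();
    for (SqlTableProvider provider : ServiceLoader.load(SqlTableProvider.class)) {
      types.add(provider.getTableType());
    }
    return types;
  }

  public static void main(String[] args) {
    // With no provider modules registered, the list is empty.
    System.out.println(availableTableTypes());
  }
}
```

With this layout, each provider module would ship its own service registration file, and the CLI would list only the providers whose jars are actually present.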

kennknowles · 2018-04-30T00:07:15Z

...ava/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTableProviderTest.java

+    assertEquals("project:dataset.table", bqTable.getTableSpec());
+  }
+
+  private static Table mockTable(String name) {


Nit: Is mock the right term? It has a particular denotation at this point, of a weird magic object that doesn't go through real code paths. Since a table is basically a data struct, I'd just say it is "fake".

I agree, replaced mock with fake.
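The distinction can be sketched as follows: a fake is an ordinary instance of the data class filled with test values, so the test goes through the same getters as production code. `TableMeta`, its fields, and the `"bigquery"` type string are hypothetical stand-ins for illustration; only the `project:dataset.table` spec comes from the test shown above.

```java
// Hypothetical stand-in for the table metadata class used in the test; the
// real class and its fields may differ.
final class TableMeta {
  private final String name;
  private final String type;
  private final String location;

  TableMeta(String name, String type, String location) {
    this.name = name;
    this.type = type;
    this.location = location;
  }

  String getName() {
    return name;
  }

  String getType() {
    return type;
  }

  String getLocation() {
    return location;
  }
}

class FakeTables {
  // Builds a real TableMeta populated with test values. Nothing is stubbed or
  // intercepted, which is why "fake" describes it better than "mock".
  static TableMeta fakeTable(String name) {
    return new TableMeta(name, "bigquery", "project:dataset.table");
  }

  public static void main(String[] args) {
    TableMeta table = fakeTable("hello");
    System.out.println(table.getName() + " -> " + table.getLocation());
  }
}
```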

kennknowles · 2018-04-30T00:07:15Z

...ql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/package-info.java

+ */
+
+/**
+ * table schema for BigQuery.


nit: capitalization

Fixed capitalization here and in Kafka.

kennknowles · 2018-04-30T20:22:48Z

...in/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTableProvider.java

+  }
+
+  @Override public void dropTable(String tableName) {
+    // empty


A verbose comment here about our intentions might keep anyone from later implementing this in a way that loses data.
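A comment of the kind requested might look like the sketch below. The stated intent (that dropping the SQL table must never delete the external data) is an assumption inferred from the review remark, not something the PR confirms, and `MetadataOnlyProvider` is a hypothetical class, not the actual `BigQueryTableProvider`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical provider that only records which external tables are known to SQL.
class MetadataOnlyProvider {
  private final Map<String, String> tableSpecsByName = new HashMap<>();

  void createTable(String tableName, String tableSpec) {
    tableSpecsByName.put(tableName, tableSpec);
  }

  /**
   * Intentionally does nothing.
   *
   * <p>Assumed intent: the external table is owned by the user, not by the SQL
   * layer. Dropping the SQL table definition must therefore never delete or
   * truncate the underlying data. If this method is ever implemented, it
   * should only remove local metadata.
   */
  void dropTable(String tableName) {
    // Deliberately empty; see the Javadoc above before adding any logic here.
  }

  String getTableSpec(String tableName) {
    return tableSpecsByName.get(tableName);
  }

  public static void main(String[] args) {
    MetadataOnlyProvider provider = new MetadataOnlyProvider();
    provider.createTable("orders", "project:dataset.table");
    provider.dropTable("orders");
    // The no-op leaves the recorded spec untouched.
    System.out.println(provider.getTableSpec("orders"));
  }
}
```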


kennknowles reviewed Apr 30, 2018


[BEAM-3983][SQL] Add BigQuery table provider

744fb5b

apilloud force-pushed the bigquery branch from a96a74c to 744fb5b April 30, 2018 16:59

kennknowles approved these changes Apr 30, 2018


kennknowles merged commit 1ea4db6 into apache:master Apr 30, 2018

apilloud mentioned this pull request May 1, 2018

[BEAM-4044] [SQL] Simplify TableProvider interface #5254

Closed


apilloud deleted the bigquery branch May 1, 2018 18:19

apilloud added a commit to apilloud/beam that referenced this pull request May 5, 2018

Revert "Merge pull request apache#5220: [BEAM-3983][SQL] Add BigQuery…

8447b53

… table provider" This reverts commit 1ea4db6, reversing changes made to ddf7353.
