New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-3983][SQL] Add BigQuery table provider #5220
Conversation
run java precommit |
R: @kennknowles This is the SQL part of bigquery support. |
import org.apache.beam.sdk.values.Row; | ||
|
||
/** | ||
* {@code BeamBigQueryTable} represent a BigQuery table as source or target. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention lack of support for read in the javadoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated comment to reflect the lack of read (source) support.
* {@code BeamBigQueryTable} represent a BigQuery table as source or target. | ||
* | ||
*/ | ||
public class BeamBigQueryTable extends BaseBeamTable implements Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to put @Experimental
here, even though the whole SQL module is still experimental anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added @Experimental
@@ -65,6 +65,7 @@ dependencies { | |||
shadow library.java.joda_time | |||
shadow project(path: ":beam-runners-direct-java", configuration: "shadow") | |||
provided project(path: ":beam-sdks-java-io-kafka", configuration: "shadow") | |||
provided project(path: ":beam-sdks-java-io-google-cloud-platform", configuration: "shadow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented on the pom, but since it is deprecated, consider that comment to apply to this line.
<dependency> | ||
<groupId>org.apache.beam</groupId> | ||
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId> | ||
<scope>provided</scope> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting that I think we are going to have to find a different way to manage the various providers. For now this follows the pattern set by Kafka so that's totally fine. But we should track how to reorg the modules so we don't have to bake in everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each provider has a dependency on SQL and its respective IO module, so we need to break providers out into their own module by IO module. The SQL CLI also depends on the providers, so we need some way for it to discover providers are actually available. I created a jira: https://issues.apache.org/jira/browse/BEAM-4190
assertEquals("project:dataset.table", bqTable.getTableSpec()); | ||
} | ||
|
||
private static Table mockTable(String name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is mock
the right term? It has a particular denotation at this point, of a weird magic object that doesn't go through real code paths. Since a table is basically a data struct, I'd just say it is "fake".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, replaced mock with fake.
*/ | ||
|
||
/** | ||
* table schema for BigQuery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: capitalization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed capitalization here and in kafka.
} | ||
|
||
@Override public void dropTable(String tableName) { | ||
// empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A verbose comment here about our intentions might avoid anyone implementing this in a data lossy way later.
This adds a bigquery table provider to Beam SQL.
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue../gradlew build
to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.