Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-3983][SQL] Add BigQuery table provider #5220

Merged
merged 1 commit into from Apr 30, 2018

Conversation

apilloud
Copy link
Member

This adds a bigquery table provider to Beam SQL.


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand:
    • What the pull request does
    • Why it does it
    • How it does it
    • Why this approach
  • Each commit in the pull request should have a meaningful subject line and body.
  • Run ./gradlew build to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

@apilloud
Copy link
Member Author

run java precommit

@apilloud
Copy link
Member Author

R: @kennknowles This is the SQL part of bigquery support.

import org.apache.beam.sdk.values.Row;

/**
* {@code BeamBigQueryTable} represent a BigQuery table as source or target.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention lack of support for read in the javadoc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment to reflect the lack of read (source) support.

* {@code BeamBigQueryTable} represent a BigQuery table as source or target.
*
*/
public class BeamBigQueryTable extends BaseBeamTable implements Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to put @Experimental here, even though the whole SQL module is still experimental anyhow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added @Experimental

@@ -65,6 +65,7 @@ dependencies {
shadow library.java.joda_time
shadow project(path: ":beam-runners-direct-java", configuration: "shadow")
provided project(path: ":beam-sdks-java-io-kafka", configuration: "shadow")
provided project(path: ":beam-sdks-java-io-google-cloud-platform", configuration: "shadow")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented on the pom, but since it is deprecated, consider that comment to apply to this line.

<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
<scope>provided</scope>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that I think we are going to have to find a different way to manage the various providers. For now this follows the pattern set by Kafka so that's totally fine. But we should track how to reorg the modules so we don't have to bake in everything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each provider has a dependency on SQL and its respective IO module, so we need to break providers out into their own module by IO module. The SQL CLI also depends on the providers, so we need some way for it to discover providers are actually available. I created a jira: https://issues.apache.org/jira/browse/BEAM-4190

assertEquals("project:dataset.table", bqTable.getTableSpec());
}

private static Table mockTable(String name) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is mock the right term? It has a particular denotation at this point, of a weird magic object that doesn't go through real code paths. Since a table is basically a data struct, I'd just say it is "fake".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, replaced mock with fake.

*/

/**
* table schema for BigQuery.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: capitalization

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed capitalization here and in kafka.

}

@Override public void dropTable(String tableName) {
// empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A verbose comment here about our intentions might avoid anyone implementing this in a data lossy way later.

@kennknowles kennknowles merged commit 1ea4db6 into apache:master Apr 30, 2018
@apilloud apilloud deleted the bigquery branch May 1, 2018 18:19
apilloud added a commit to apilloud/beam that referenced this pull request May 5, 2018
… table provider"

This reverts commit 1ea4db6, reversing
changes made to ddf7353.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants