Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-7832] Add ZetaSQL as a dialect in BeamSQL #9210

Merged
merged 2 commits into from
Aug 21, 2019
Merged

Conversation

amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Jul 31, 2019

Support ZetaSQL(https://github.com/google/zetasql) as a dialect in BeamSQL.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@amaliujia
Copy link
Contributor Author

Run Java PreCommit

@amaliujia
Copy link
Contributor Author

Run RAT PreCommit

@amaliujia
Copy link
Contributor Author

Run SQL PostCommit

2 similar comments
@amaliujia
Copy link
Contributor Author

Run SQL PostCommit

@amaliujia
Copy link
Contributor Author

Run SQL PostCommit

@amaliujia
Copy link
Contributor Author

Run Java PreCommit

@amaliujia
Copy link
Contributor Author

Run SQL PostCommit

2 similar comments
@amaliujia
Copy link
Contributor Author

Run SQL PostCommit

@amaliujia
Copy link
Contributor Author

Run SQL PostCommit

@amaliujia amaliujia requested a review from apilloud August 1, 2019 19:08
@@ -309,6 +312,10 @@ private QueryPlanner instantiatePlanner(JdbcConnection jdbcConnection, RuleSet[]
return new CalciteQueryPlanner(jdbcConnection, ruleSets);
}

if (queryPlannerClassName.equals(ZETASQL_PLANNER)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this with a reflection version and move it into its own PR:

try {
        return Class.forName(queryPlannerClassName)
                .getConstructor(QueryPlanner.class)
                .newInstance(jdbcConnection, ruleSets);
      } catch (Exception e) {
        throw new RuntimeException(
            String.format("Cannot construct query planner %s", queryPlannerClassName), e);
      }

@apilloud
Copy link
Member

apilloud commented Aug 1, 2019

Please move the ZetaSQL code into its own package. "sdks/java/extensions/sql/zetasql" would be a good choice. Once that is done, you can send an email to dev@ proposing this as a code drop. If that proposal is accepted I'll give it a quick code review. If not, we can discuss breaking this up into smaller pieces.

@amaliujia
Copy link
Contributor Author

amaliujia commented Aug 2, 2019

@apilloud
During moving ZetaSQL planner to another package, I find a dependency problem: Calcite.

BeamSQL depends on Calcite and does relocation on it.
ZetaSQL planner also depends on Calcite.

The problem is, in order to initialize an instance of ZetaSQL planner, we expect BeamSQL pass in a config to ZetaSQL planner, which unfortunately is CalciteConnectionConfig. Clearly, it causes a conflict and code fails to compile.

The right solution is vendored Calcite, which is in progress: #9189

So I suggest we still keep all code in BeamSQL, but in the dedicated zetasql directory. Once vendored Calcite effort is done, we can move ZetaSQL code to a separate module.

What do you think? Do you have a better idea?

@kanterov
Copy link
Member

kanterov commented Aug 2, 2019

I tried to run our internal test suite against this branch. There is a problem with gRPC dependency from jni_channel. It doesn't seem to be using vendored gRPC, or relocating gRPC classes properly.

@kanterov
Copy link
Member

kanterov commented Aug 2, 2019

It was causing:

java.lang.ClassNotFoundException: io.grpc.StatusRuntimeException

It seems there is an issue with Netty as well:

java.lang.NoClassDefFoundError: io/netty/channel/socket/nio/NioSocketChannel
	at org.apache.beam.repackaged.sql.com.google.zetasql.io.grpc.netty.NettyChannelBuilder.<init>(NettyChannelBuilder.java:73)
	at org.apache.beam.repackaged.sql.com.google.zetasql.io.grpc.netty.NettyChannelBuilder.forAddress(NettyChannelBuilder.java:94)

@amaliujia
Copy link
Contributor Author

During the process of moving zetasql planner to another package, I also experienced missing grpc dependency.

@apilloud
Copy link
Member

apilloud commented Aug 2, 2019

The grpc issue might be a ZetaSQL bug, I'll work with you to get an isolated repro on Monday. As for moving this to a separate package, I would consider that a blocker to this merging. ZetaSQL currently only works on a limited subset of linux machines, we don't want to break other users and developers. I see #9189 is waiting for my review, so I'm going to do that now to help move this forward.

Copy link
Member

@apilloud apilloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too large to review, but it is a code drop so we are OK with that. I reviewed the changes outside of zetasql then skimmed the rest.

LGTM

PrecommitJobBuilder builder = new PrecommitJobBuilder(
scope: this,
nameBase: 'JavaBeamZetaSQL',
gradleTask: ':javaPreCommitBeamZetaSQL',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this justifies its own jenkins precommit. Should this just be added to the existing SQL precommit?

If you do plan to keep, please test that this works in the PR prior to merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no SQL precommit. There is a SQL postcommit. I am actually happy to add to an existing precommit if there is any suitable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the SQL unit tests run as part of the Java precommit.

@@ -35,6 +35,8 @@ applyJavaNature(
include(dependency("org.apache.calcite:.*"))
include(dependency("org.apache.calcite.avatica:.*"))
include(dependency("org.codehaus.janino:.*"))
include(dependency("com.google.api.grpc:.*"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole block of things goes away in #9189. We will probably need to vendor ZetaSQL. (No action required now, just want to make sure you are aware of the conflict.) cc: @vectorijk

@@ -82,6 +92,10 @@ dependencies {
compile "org.apache.calcite:calcite-core:$calcite_version"
compile "org.apache.calcite:calcite-linq4j:$calcite_version"
compile "org.apache.calcite.avatica:avatica-core:$avatica_version"
compile "com.google.api.grpc:proto-google-common-protos:1.12.0" // Interfaces with ZetaSQL use this
compile "com.google.zetasql:zetasql-jni-channel:2019.07.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make the version 2019.07.1 a variable.

// Disable Gradle cache (it should not be used because the IT's won't run).
outputs.upToDateWhen { false }

include '**/*ZetaSQL.class'
Copy link
Member

@apilloud apilloud Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make this *TestZetaSQL.class instead?

@@ -132,6 +132,8 @@ public String explain(String sqlString) throws ParseException {
public static class BeamSqlEnvBuilder {
private static final String CALCITE_PLANNER =
"org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner";
private static final String ZETASQL_PLANNER =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't belong here. Can you just delete it?

FunctionSignatureId.FN_CASE_NO_VALUE,
FunctionSignatureId.FN_CASE_WITH_VALUE,
FunctionSignatureId.FN_TIMESTAMP_ADD,
// FunctionSignatureId.FN_TIMESTAMP_SUB,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a whole bunch of these commented out values, should they just be deleted? (I'm guessing they are effectively TODOs, but want to confirm.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are effectively TODO or partially supported. I will add comment to mark it.

@amaliujia
Copy link
Contributor Author

Run javaPreCommitBeamZetaSQL

@amaliujia
Copy link
Contributor Author

Run Seed Job

@amaliujia
Copy link
Contributor Author

Run JavaBeamZetaSQL Precommit

@amaliujia
Copy link
Contributor Author

Run Python PreCommit

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some comments. Since this is so huge, it is best to merge first and then fix the issues after, I think. The main thing that probably is new to this PR is the integration with Beam's gradle and Jenkins configurations. I think you should not have to do any new configuration to run your tests. Maybe I missed something special about them?

PrecommitJobBuilder builder = new PrecommitJobBuilder(
scope: this,
nameBase: 'JavaBeamZetaSQL',
gradleTask: ':javaPreCommitBeamZetaSQL',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the SQL unit tests run as part of the Java precommit.

@@ -187,6 +208,19 @@ task runPojoExample(type: JavaExec) {
args = ["--runner=DirectRunner"]
}

task runZetaSQLTest(type: Test) {
// Disable Gradle cache (it should not be used because the IT's won't run).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are these ITs? I was scanning the files but did not see them.


/** ZetaSQLDialectSpecTest. */
@RunWith(JUnit4.class)
public class ZetaSQLDialectSpecTestZetaSQL {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally this class would end with Test. It has a strange name. But more important is that people may not notice it if they are reading quickly for test classes, or using find command line tool, etc.

It is also far too big. Should be broken up into areas of the dialect to make it easier to navigate and control which tests execute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Tests should be split.

For naming, ending Test will be matched as normal SQL tests (which we should not allow it happen due to less platform support of ZetaSQL). It is why IT is the end for integration tests. I followed the same convention to add ZetaSQL after Test to indicate what kind of tests it is.

@amaliujia amaliujia merged commit 68734a2 into master Aug 21, 2019
@amaliujia amaliujia deleted the rw_beam_zetasql branch August 21, 2019 19:15
@apilloud
Copy link
Member

Looks like the SQL post commit didn't get run on this. It started hard failing immediately after this went in: https://issues.apache.org/jira/browse/BEAM-8036

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants