[BEAM-4044] [SQL] Add tables via TableStore in Schema, execute DDL in Calcite model #5224

apilloud · 2018-04-25T17:06:38Z

This PR moves our TableStore into the Calcite Schema as the only way to provide tables, which removes the need to copy tables between the two. It also moves our DDL execution into the rel node, which allows Calcite to execute the DDL directly.

akedin · 2018-04-25T19:41:31Z

LGTM

apilloud · 2018-04-25T23:12:29Z

R: @xumingmin @xumingming This is a trivial change, but it does change the public interface around BeamQueryPlanner.

apilloud · 2018-05-03T17:35:12Z

R: @kennknowles This gets us 90% of the way to sqlline.

akedin

Looks good, mostly nits.
Are there any new tests needed?

akedin · 2018-05-03T17:33:55Z

...tensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropObject.java

+    final boolean existed;
+    switch (getKind()) {
+    case DROP_TABLE:
+    case DROP_MATERIALIZED_VIEW:


Do we support views? If we don't have concrete plans to support them, I'd rather remove all related code.

Interesting question for the future. For pure SQL REPL use, we would probably want something to name queries for reuse. Does Calcite manage these for us and only delegate materialized views?

Deleted. Calcite does a lot of things for us when we get out of the way; views are probably one of them.

akedin · 2018-05-03T17:42:55Z

...va/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java

+          inputPCollections.getPipeline().apply("left", leftRelNode.toPTransform());
      PCollection<Row> rightRows =
-          inputPCollections.apply("right", rightRelNode.toPTransform());
+          inputPCollections.getPipeline().apply("right", rightRelNode.toPTransform());


Not sure if it's the right thing to access the pipeline directly. Who knows what's there? Does it have a source so that it can produce elements? With a PCollection I would at least expect it to contain elements.

Agree. Does this actually work? If so, is there a different data path whereby the left and right collections are passed as input? This invocation will not register them as inputs to the transform.

akedin · 2018-05-03T17:44:01Z

...sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/BeamSqlTableProvider.java

+/**
+ * A {@code BeamSqlTableProvider} provides read only set of {@code BeamSqlTable}.
+ */
+public class BeamSqlTableProvider implements TableProvider {


make it @AutoValue+Builder?

akedin · 2018-05-03T17:49:35Z

...sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/BeamSqlTableProvider.java

+          .type(getTableType())
+          .name(table.getKey())
+          .columns(Collections.emptyList())
+          .build());


nit: I would rewrite it this way:

    tables
        .entrySet()
        .stream()
        .map(sqlTable ->
            Table
                .builder()
                .type(getTableType())
                .name(sqlTable.getKey())
                .columns(Collections.emptyList())
                .build())
        .collect(toList());

And I would write this exactly how I did. I find for loops to be far easier to read than Java streams.
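For readers weighing the two styles, they can be put side by side. This is a minimal sketch using only JDK types: `TableInfo`, `withLoop`, and `withStream` are hypothetical stand-ins for illustration, not the Beam `Table` builder shown in the diff. Both methods are meant to produce the same list in the same order, so the choice is one of readability only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LoopVsStream {
  /** Hypothetical stand-in for the table metadata object built in the diff. */
  static final class TableInfo {
    final String type;
    final String name;

    TableInfo(String type, String name) {
      this.type = type;
      this.name = name;
    }
  }

  /** For-loop form, in the style the author prefers. */
  static List<TableInfo> withLoop(Map<String, Object> tables, String type) {
    List<TableInfo> result = new ArrayList<>();
    for (Map.Entry<String, Object> table : tables.entrySet()) {
      result.add(new TableInfo(type, table.getKey()));
    }
    return result;
  }

  /** Stream form, in the style the reviewer suggests. */
  static List<TableInfo> withStream(Map<String, Object> tables, String type) {
    return tables.entrySet().stream()
        .map(table -> new TableInfo(type, table.getKey()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so both forms see the same sequence.
    Map<String, Object> tables = new LinkedHashMap<>();
    tables.put("orders", new Object());
    tables.put("users", new Object());
    for (TableInfo info : withLoop(tables, "pcollection")) {
      System.out.println(info.type + ":" + info.name);
    }
    for (TableInfo info : withStream(tables, "pcollection")) {
      System.out.println(info.type + ":" + info.name);
    }
  }
}
```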

akedin · 2018-05-03T18:01:32Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/QueryTransform.java

+      for (Map.Entry<TupleTag<?>, PValue> input : inputs.expand().entrySet()) {
+        tables.put(input.getKey().getId(),
+            new BeamPCollectionTable(toRows(input.getValue())));
+      }


nit: I would avoid stateful if/else combined with loops and generics; it hurts readability. Consider extracting something like this:

    if (inputs instanceof PCollection) {
      return ImmutableMap.of(
          PCOLLECTION_NAME, new BeamPCollectionTable(toRows(inputs)));
    }
    return inputs
        .expand()
        .entrySet()
        .stream()
        .collect(
            toMap(
                keyedPCollection -> keyedPCollection.getKey().getId(),
                keyedPCollection -> keyedPCollection.getValue()));

and then create BeamSqlTableProvider outside

Incidentally, here (or maybe it is nearby beneath a fold) seems like a good place to (possibly redundantly) explain that a PCollection makes a single magic table while any other kind of input uses expand() to make many tables using the tags as names.

I agree this is hard to read, I disagree that java streams makes it more readable. I've restructured it like you've suggested otherwise and added the comment.
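The input-to-table mapping discussed in this thread can be sketched without any Beam dependency. Everything below is a hypothetical stand-in: `Input`, `Rows`, and `TaggedRows` loosely play the roles of `PInput`, `PCollection`, and a tagged tuple, and the value `"PCOLLECTION"` for `PCOLLECTION_NAME` is an assumption rather than something stated in this PR. The sketch only illustrates the rule that a single collection becomes one implicitly named table, while any other input is expanded into one table per tag.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class InputTables {
  /** Assumed name of the single implicit table; the real constant's value may differ. */
  static final String PCOLLECTION_NAME = "PCOLLECTION";

  /** Hypothetical stand-in for a pipeline input that expands into tagged parts. */
  interface Input {
    Map<String, Rows> expand();
  }

  /** Hypothetical stand-in for a single collection of rows. */
  static final class Rows implements Input {
    final String label;

    Rows(String label) {
      this.label = label;
    }

    @Override
    public Map<String, Rows> expand() {
      // A single collection expands to itself under a tag the caller never sees.
      return Collections.singletonMap("ignored-tag", this);
    }
  }

  /** Hypothetical stand-in for a tuple of tagged collections. */
  static final class TaggedRows implements Input {
    private final Map<String, Rows> parts = new LinkedHashMap<>();

    TaggedRows and(String tag, Rows rows) {
      parts.put(tag, rows);
      return this;
    }

    @Override
    public Map<String, Rows> expand() {
      return parts;
    }
  }

  /** A single collection makes one magic table; anything else makes one table per tag. */
  static Map<String, Rows> toTables(Input input) {
    if (input instanceof Rows) {
      return Collections.singletonMap(PCOLLECTION_NAME, (Rows) input);
    }
    Map<String, Rows> tables = new LinkedHashMap<>();
    for (Map.Entry<String, Rows> part : input.expand().entrySet()) {
      tables.put(part.getKey(), part.getValue());
    }
    return tables;
  }

  public static void main(String[] args) {
    System.out.println(toTables(new Rows("single")).keySet());
    System.out.println(
        toTables(new TaggedRows().and("orders", new Rows("o")).and("users", new Rows("u")))
            .keySet());
  }
}
```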

akedin · 2018-05-03T18:03:11Z

...a/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteTable.java

+
+  @Override
+  public RelDataType getRowType(RelDataTypeFactory typeFactory) {
+    return CalciteUtils.toCalciteRowType(this.beamTable.getSchema(), BeamQueryPlanner.TYPE_FACTORY);


Create this in constructor?

akedin · 2018-05-03T18:05:12Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlCli.java

-      handleDropTable((SqlDropTable) sqlNode);
+    if (sqlNode instanceof SqlExecutableStatement) {
+      ((SqlExecutableStatement) sqlNode).execute(env.getContext());
    } else {


nit: add a comment explaining what is an executable statement and what is not?

Comment added: DDL nodes are SqlExecutableStatement
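The dispatch described here can be sketched in plain Java. The types below (`Node`, `ExecutableStatement`, `Context`, `CreateTable`, `DropTable`, `Query`) are hypothetical stand-ins chosen for illustration; they mirror the shape of the `instanceof SqlExecutableStatement` check in the diff but are not Calcite's or Beam's classes. DDL nodes execute themselves against a context, and everything else falls through to be planned as a query.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class DdlDispatch {
  /** Hypothetical stand-in for the execution context handed to DDL nodes. */
  static final class Context {
    final Set<String> tables = new LinkedHashSet<>();
  }

  /** Hypothetical stand-in for a parsed SQL node. */
  interface Node {}

  /** DDL nodes implement this; query nodes do not. */
  interface ExecutableStatement extends Node {
    void execute(Context context);
  }

  static final class CreateTable implements ExecutableStatement {
    private final String name;

    CreateTable(String name) {
      this.name = name;
    }

    @Override
    public void execute(Context context) {
      context.tables.add(name);
    }
  }

  static final class DropTable implements ExecutableStatement {
    private final String name;

    DropTable(String name) {
      this.name = name;
    }

    @Override
    public void execute(Context context) {
      context.tables.remove(name);
    }
  }

  /** A non-DDL node; in this sketch it carries only its SQL text. */
  static final class Query implements Node {
    final String sql;

    Query(String sql) {
      this.sql = sql;
    }
  }

  /** Returns true if the node was DDL and has been executed, false if it still needs planning. */
  static boolean executeIfDdl(Node node, Context context) {
    if (node instanceof ExecutableStatement) {
      ((ExecutableStatement) node).execute(context);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    Context context = new Context();
    System.out.println(executeIfDdl(new CreateTable("orders"), context));
    System.out.println(executeIfDdl(new Query("SELECT * FROM orders"), context));
    System.out.println(context.tables);
  }
}
```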

akedin · 2018-05-03T18:09:43Z

.../extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchema.java

+      if (table.getName().equals(name)) {
+        return new BeamCalciteTable(tableProvider.buildBeamSqlTable(table));
+      }
+    }


nit: looks like it would be better to convert this to a Map<String, BeamCalciteTable> once in constructor, this way you wouldn't need to implement map.keySet() or map.get()

I'm assuming this comment is on the line with tableProvider.listTables(), not the closing }? If so, the output of that function cannot be cached. I do, however, agree that it makes sense to change the return value of that API to Map<String, Table>. That simplifies a lot of code all over, so I'll do that.
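A rough sketch of the outcome of this exchange, using only JDK types: the provider returns a map keyed by table name, and the schema asks the provider on every lookup instead of caching the result, so tables created or dropped later remain visible. `Table`, `TableProvider`, `InMemoryProvider`, and `Schema` are hypothetical stand-ins here, and the method names are assumptions rather than the final Beam interface.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SchemaLookup {
  /** Hypothetical stand-in for table metadata. */
  static final class Table {
    final String name;

    Table(String name) {
      this.name = name;
    }
  }

  /** Hypothetical provider interface returning tables keyed by name. */
  interface TableProvider {
    Map<String, Table> getTables();
  }

  /** Mutable provider, so results must not be cached by callers. */
  static final class InMemoryProvider implements TableProvider {
    private final Map<String, Table> tables = new LinkedHashMap<>();

    void createTable(Table table) {
      tables.put(table.name, table);
    }

    void dropTable(String name) {
      tables.remove(name);
    }

    @Override
    public Map<String, Table> getTables() {
      // Return a read-only snapshot so callers cannot mutate provider state.
      return Collections.unmodifiableMap(new LinkedHashMap<>(tables));
    }
  }

  /** Schema that delegates every lookup to the provider instead of scanning a list. */
  static final class Schema {
    private final TableProvider provider;

    Schema(TableProvider provider) {
      this.provider = provider;
    }

    Table getTable(String name) {
      return provider.getTables().get(name);
    }

    Set<String> getTableNames() {
      return provider.getTables().keySet();
    }
  }

  public static void main(String[] args) {
    InMemoryProvider provider = new InMemoryProvider();
    Schema schema = new Schema(provider);
    provider.createTable(new Table("orders"));
    System.out.println(schema.getTableNames());
    provider.dropTable("orders");
    System.out.println(schema.getTable("orders") == null);
  }
}
```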

akedin · 2018-05-03T18:11:27Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java

+  private class ContextImpl implements CalcitePrepare.Context {
+    @Override
+    public JavaTypeFactory getTypeFactory() {
+      throw new UnsupportedOperationException();


Would it be wrong to return BeamQueryPlanner.TYPE_FACTORY?

Nope, changed.

akedin · 2018-05-03T18:14:43Z

...sions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStore.java


-  public InMemoryMetaStore() {
+  @Override public String getTableType() {
+    return "";


I think it should have its own table type still

Ok, I added type of store.

kennknowles

Nice. And I have said similar on a bunch of lines where the new stuff is more readable. Good cleanup along the way to this addition.

kennknowles · 2018-05-04T18:00:54Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/QueryTransform.java

-    PCollectionTuple inputTuple = toPCollectionTuple(input);
-
-    BeamSqlEnv sqlEnv = new BeamSqlEnv();
+    BeamSqlEnv sqlEnv = new BeamSqlEnv(toTableProvider(input));


kennknowles · 2018-05-04T18:01:01Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlCli.java

      options.setJobName("BeamPlanCreator");
      Pipeline pipeline = Pipeline.create(options);
-      compilePipeline(sqlString, pipeline, env);
+      env.getPlanner().compileBeamPipeline(sqlString, pipeline);


kennknowles · 2018-05-04T18:01:08Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlCli.java

-    } else if (sqlNode instanceof SqlDropTable) {
-      handleDropTable((SqlDropTable) sqlNode);
+    if (sqlNode instanceof SqlExecutableStatement) {
+      ((SqlExecutableStatement) sqlNode).execute(env.getContext());


kennknowles · 2018-05-04T18:01:14Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlCli.java

-    for (Table table : tables) {
-      env.registerTable(table.getName(), metaStore.buildBeamSqlTable(table.getName()));
-    }
+    this.env = new BeamSqlEnv(metaStore);


kennknowles · 2018-05-04T18:03:18Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/QueryTransform.java

+      for (Map.Entry<TupleTag<?>, PValue> input : inputs.expand().entrySet()) {
+        tables.put(input.getKey().getId(),
+            new BeamPCollectionTable(toRows(input.getValue())));
+      }


Incidentally, here (or maybe it is nearby beneath a fold) seems like a good place to (possibly redundantly) explain that a PCollection makes a single magic table while any other kind of input uses expand() to make many tables using the tags as names.

kennknowles · 2018-05-04T18:04:22Z

.../extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchema.java

+
+  @Override
+  public boolean isMutable() {
+    return true;


Curious - why? Is it that the underlying TableProvider is mutable? Or does this simply mean that the DDL is allowed to introduce new tables?

This means the DDL is allowed to introduce new tables, but I don't know of anywhere it is actually checked or set to anything but true in Calcite.

kennknowles · 2018-05-04T18:06:56Z

...tensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/SqlDropObject.java

+    final boolean existed;
+    switch (getKind()) {
+    case DROP_TABLE:
+    case DROP_MATERIALIZED_VIEW:


Interesting question for the future. For pure SQL REPL use, we would probably want something to name queries for reuse. Does Calcite manage these for us and only delegate materialized views?

kennknowles · 2018-05-04T18:10:17Z

...va/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java

+          inputPCollections.getPipeline().apply("left", leftRelNode.toPTransform());
      PCollection<Row> rightRows =
-          inputPCollections.apply("right", rightRelNode.toPTransform());
+          inputPCollections.getPipeline().apply("right", rightRelNode.toPTransform());


Agree. Does this actually work? If so, is there a different data path whereby the left and right collections are passed as input? This invocation will not register them as inputs to the transform.

kennknowles · 2018-05-04T18:11:27Z

...va/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java

-      PCollectionTuple inputPCollections) {
-    PCollection<Row> factStream = inputPCollections.apply(leftRelNode.toPTransform());
+      PInput inputPCollections) {
+    PCollection<Row> factStream = inputPCollections.getPipeline().apply(leftRelNode.toPTransform());


getPipeline().apply() is probably not what you want here, either. It is actually a bad method - it is the same as getPipeline().begin().apply() so it always starts a new initial pipeline segment.

The current model is wrong, the new model is wrong. I've dropped this change and just added another PCollectionTuple.empty(input.getPipeline()). We can discuss the right way to do it as a followup.

kennknowles · 2018-05-04T18:11:45Z

...ns/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java

+  public PCollection<Row> buildBeamPipeline(PInput inputPCollections) {
    PCollection<Row> leftRows =
-        inputPCollections.apply(
+        inputPCollections.getPipeline().apply(


and here, and throughout

apilloud · 2018-05-04T23:19:28Z

run java precommit

apilloud · 2018-05-04T23:19:42Z

Got a Beam 1 Disconnected failure.

kennknowles · 2018-05-07T00:09:13Z

run java precommit

apilloud · 2018-05-07T21:19:05Z

run java precommit

apilloud · 2018-05-07T21:55:24Z

run java precommit

apilloud · 2018-05-07T22:35:27Z

run java precommit

apilloud force-pushed the cleanup branch from 3eaa0d8 to ef78f04 Compare April 25, 2018 19:56

apilloud force-pushed the cleanup branch 2 times, most recently from f7c251f to 2b0476e Compare May 3, 2018 16:24

apilloud changed the title ~~[BEAM-4044] [SQL] Cleanout unneeded sqlEnv~~ [BEAM-4044] [SQL] Add tables via TableStore in Schema, execute DDL in Calcite model May 3, 2018

apilloud mentioned this pull request May 3, 2018

[BEAM-4044] [SQL] Simplify TableProvider interface #5254

Closed

10 tasks

akedin reviewed May 3, 2018

kennknowles reviewed May 4, 2018

apilloud force-pushed the cleanup branch from 229fb77 to 99c8399 Compare May 4, 2018 22:34

apilloud added 6 commits May 7, 2018 10:34

[SQL] Cleanout unneeded sqlEnv

343c880

[SQL] Hide details of BeamSqlEnv from rel test

b18c813

[SQL] Simplify TableProvider interface

ac0b65c

[SQL] Return map from TableProvider.getTables

f67d4aa

[SQL] Add tables via TableStore in CalciteSchema

bc08f46

[SQL] Move Create and Drop Table inline with ddl

4bf7741

apilloud force-pushed the cleanup branch from 99c8399 to 4bf7741 Compare May 7, 2018 17:34

kennknowles merged commit 378ca94 into apache:master May 8, 2018
