
[FLINK-3640] Add support for SQL in DataSet programs #1862

Closed
wants to merge 3 commits

Conversation

@vasia (Contributor) commented Apr 7, 2016

This PR adds basic support for batch SQL queries embedded in Table API programs.
In order to run a SQL query, a DataSet or Table first needs to be registered in the TableEnvironment; the query is then executed with the sql() method:

val tEnv = getScalaTableEnvironment
val t = getDataSet(env).toTable
tEnv.registerTable("MyTable", t)
val sqlQuery = "SELECT * FROM MyTable"
val result = tEnv.sql(sqlQuery)

The result of the sql() method is a Table, which can be used in subsequent Table API or SQL queries.
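For illustration, the example could continue like this (a sketch only; the field name a and the table name FilteredTable are illustrative and not part of the PR):

// Sketch: feed the SQL result into the Table API and back into SQL.
// "a" is an assumed field name of MyTable; "FilteredTable" is an illustrative name.
val filtered = result.filter("a > 10")         // Table API call on the SQL result
tEnv.registerTable("FilteredTable", filtered)  // register it so SQL can see it again
val result2 = tEnv.sql("SELECT a FROM FilteredTable")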

- add EnumerableToLogicalScan rule
- in order to be able to mix TableAPI and SQL, we need our own copy of PlannerImpl
@rmetzger (Contributor) commented Apr 7, 2016

Awesome! Really cool to see this coming to Flink.
I played around with it a bit and it seems to work ;)

Table table = tableEnv.fromDataSet(input);
tableEnv.registerTable("tab", table);
tableEnv.registerTable("tab1", table);

// self-join of the two registered views of the same table
Table res = tableEnv.sql("SELECT COUNT(tab1.acount) AS acount, tab.word " +
    "FROM tab, tab1 " +
    "WHERE tab.word = tab1.word GROUP BY tab.word");
res = res.filter("acount > 2");

// a second query over a nested SELECT
res = tableEnv.sql("SELECT COUNT(acount) AS acount, word " +
    "FROM (SELECT * FROM tab WHERE acount = 1) " +
    "GROUP BY word");

@@ -83,4 +84,17 @@ class AbstractTableEnvironment {
)
TranslationContext.registerTable(dataSetTable, name)
}

/**
* Execute a SQL query on a batch [[Table]].

Keep the description of this method more generic as it will be the entry point for stream SQL queries as well.
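A more generic phrasing might look roughly like this (just a sketch of possible wording; the parameter name and the sentences are assumptions, not the doc comment that was merged):

/**
 * Evaluates a SQL query on registered tables and returns the result as a [[Table]].
 * Intended as the single entry point for SQL queries, covering batch tables now
 * and streaming tables in the future.
 */
def sql(query: String): Table = ???  // body omitted in this sketch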

import scala.collection.JavaConverters._

@RunWith(classOf[Parameterized])
class AggregationsITCase(

Do we need each test for DataSet and Table? Wouldn't it be sufficient to test for Table and have one or two tests for DataSet?

@fhueske (Contributor) commented Apr 8, 2016

Thanks for the PR. I had a few minor comments but otherwise it looks really good.

There are a few follow-up issues, IMO:

  • Check if we can somehow get around the EnumerableToLogicalTableScan. Maybe the Calcite community can help. I will open a JIRA for this once the PR is merged.
  • Check how we can exclude unsupported SQL features such as outer joins, intersection, etc. Here too, the Calcite community should be able to help (one possible direction is sketched below). I will open a JIRA for this once the PR is merged.
  • Refactor TranslationContext and TableEnvironment to prevent the same planner from being used several times. I'll start a discussion about this soon.
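On the second point, one conceivable direction (purely a sketch, not what this PR or Calcite actually does; the checker class, its name, and the exception type are invented for illustration) is to walk the logical plan and reject operators that the DataSet translation cannot handle yet:

import org.apache.calcite.rel.{RelNode, RelVisitor}
import org.apache.calcite.rel.core.JoinRelType
import org.apache.calcite.rel.logical.LogicalJoin

// Hypothetical checker: traverses the plan and fails on non-inner joins.
class UnsupportedFeatureChecker extends RelVisitor {
  override def visit(node: RelNode, ordinal: Int, parent: RelNode): Unit = {
    node match {
      case join: LogicalJoin if join.getJoinType != JoinRelType.INNER =>
        throw new IllegalArgumentException(
          "Unsupported SQL feature: " + join.getJoinType + " join")
      case _ => // all other operators are accepted in this sketch
    }
    super.visit(node, ordinal, parent) // continue walking the children
  }
}

// Usage, assuming logicalPlan is the RelNode produced by parsing and validating the query:
// new UnsupportedFeatureChecker().go(logicalPlan)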

@vasia (Contributor, Author) commented Apr 9, 2016

Thanks for the review @fhueske. I've addressed your comments :)


// initialize RelBuilder
frameworkConfig = Frameworks
.newConfigBuilder
.defaultSchema(tables)
.parserConfig(parserConfig)
.costFactory(new DataSetCostFactory)
.traitDefs(ConventionTraitDef.INSTANCE)
.programs(Programs.ofRules(FlinkRuleSets.DATASET_OPT_RULES))

I think this line can be removed because we set the rules explicitly before calling the optimizer.
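For context, the suggested change amounts to building the FrameworkConfig without the programs(...) call and handing the rule set to the optimizer directly. A rough sketch of that shape (variable names and the trailing build() are assumptions, not the merged code):

// Sketch: config without programs(...), since the rules are applied explicitly later.
frameworkConfig = Frameworks
  .newConfigBuilder
  .defaultSchema(tables)
  .parserConfig(parserConfig)
  .costFactory(new DataSetCostFactory)
  .traitDefs(ConventionTraitDef.INSTANCE)
  .build()

// At optimization time the same rule set is supplied directly to a Calcite program,
// which the planner then runs over the logical plan with the required output traits.
val optProgram = Programs.ofRules(FlinkRuleSets.DATASET_OPT_RULES)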

@fhueske (Contributor) commented Apr 11, 2016

+1 to merge after resolving one last minor comment.

@vasia (Contributor, Author) commented Apr 11, 2016

Thanks, I will make the change and merge.

@asfgit closed this in ed1e52a on Apr 11, 2016
fijolekProjects pushed a commit to fijolekProjects/flink that referenced this pull request May 1, 2016
- add EnumerableToLogicalScan rule
- in order to be able to mix TableAPI and SQL, we need our own copy of PlannerImpl
- create a dummy RelNode in the reset() method, in order to retrieve the RelOptPlanner

This closes apache#1862
hequn8128 pushed a commit to hequn8128/flink that referenced this pull request Jun 22, 2017
- add EnumerableToLogicalScan rule
- in order to be able to mix TableAPI and SQL, we need our own copy of PlannerImpl
- create a dummy RelNode in the reset() method, in order to retrieve the RelOptPlanner

This closes apache#1862