[FLINK-3640] Add support for SQL in DataSet programs #1862
Conversation
- add EnumerableToLogicalScan rule - in order to be able to mix TableAPI and SQL, we need our own copy of PlannerImpl
Awesome! Really cool to see that coming to Flink.

```java
Table table = tableEnv.fromDataSet(input);
tableEnv.registerTable("tab", table);
tableEnv.registerTable("tab1", table);
Table res = tableEnv.sql("SELECT COUNT(tab1.acount) AS acount, tab.word " +
    "FROM tab, tab1 " +
    "WHERE tab.word = tab1.word GROUP BY tab.word");
res = res.filter("acount > 2");
```

```java
Table res = tableEnv.sql("SELECT COUNT(acount) AS acount, word " +
    "FROM (SELECT * FROM tab WHERE acount = 1) " +
    "GROUP BY word");
```
```scala
@@ -83,4 +84,17 @@ class AbstractTableEnvironment {
    )
    TranslationContext.registerTable(dataSetTable, name)
  }

  /**
   * Execute a SQL query on a batch [[Table]].
```
Keep the description of this method more generic as it will be the entry point for stream SQL queries as well.
```scala
import scala.collection.JavaConverters._

@RunWith(classOf[Parameterized])
class AggregationsITCase(
```
Do we need each test for `DataSet` and `Table`? Wouldn't it be sufficient to test for `Table` and have one or two tests for `DataSet`?
Thanks for the PR. I had a few minor comments, but otherwise it looks really good. There are a few follow-up issues, IMO.
Thanks for the review @fhueske. I've addressed your comments :)
```scala
// initialize RelBuilder
frameworkConfig = Frameworks
  .newConfigBuilder
  .defaultSchema(tables)
  .parserConfig(parserConfig)
  .costFactory(new DataSetCostFactory)
  .traitDefs(ConventionTraitDef.INSTANCE)
  .programs(Programs.ofRules(FlinkRuleSets.DATASET_OPT_RULES))
```
I think this line can be removed because we set the rules explicitly before calling the optimizer.
+1 to merge after resolving one last minor comment.

Thanks, I will make the change and merge.
- add EnumerableToLogicalScan rule
- in order to be able to mix Table API and SQL, we need our own copy of PlannerImpl
- create a dummy RelNode in the reset() method, in order to retrieve the RelOptPlanner

This closes apache#1862
This PR adds basic support for batch SQL queries embedded in Table API programs. In order to run a SQL query, a `DataSet` or `Table` needs to be registered in the `TableEnvironment`, and the query is then executed using the `sql` method. The result of the `sql` method is a `Table`, which can be used in subsequent Table API or SQL queries.
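To illustrate the workflow described above, here is a minimal hedged sketch assembled from the API calls shown in this conversation (`fromDataSet`, `registerTable`, `sql`, `filter`). The `WC` POJO and the `input` data are hypothetical placeholders, not part of this PR, and running the snippet requires the Flink batch and Table API dependencies:

```java
// Sketch only: assumes flink-java and flink-table on the classpath.
// WC is a hypothetical POJO with fields `word` (String) and `acount` (long).
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
TableEnvironment tableEnv = new TableEnvironment();

DataSet<WC> input = env.fromElements(
    new WC("hello", 1L), new WC("hello", 1L), new WC("ciao", 1L));

// Register the DataSet as a Table so SQL queries can refer to it by name.
Table table = tableEnv.fromDataSet(input);
tableEnv.registerTable("wc", table);

// Execute a SQL query; the result is again a Table.
Table result = tableEnv.sql(
    "SELECT word, COUNT(acount) AS acount FROM wc GROUP BY word");

// The result can feed into further Table API or SQL operations.
Table filtered = result.filter("acount > 1");
```

The key design point, as the PR description notes, is that `sql` returns a `Table`, so SQL and Table API operations compose freely in either direction.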