
[New feature] Federated SQL query and Query optimization are going to sail out. #8284

Open
24 of 33 tasks
tristaZero opened this issue Nov 22, 2020 · 51 comments · Fixed by #11079
Comments

@tristaZero
Contributor

tristaZero commented Nov 22, 2020

Hi community,

As you know, ShardingSphere has put a lot of effort into SQL parsing and provides a great standalone SQL parser to help users parse SQL.

Based on this substantial work, we plan to perform query optimization on users' input SQL and produce an optimized query plan to improve query efficiency. In addition, the federated SQL query feature (e.g., join queries across different database instances) is another essential highlight of our next release. : )

We will leverage Apache Calcite, an excellent framework, to implement these two features. The main work items are presented here.

  • The investigation of RBO (Rule-Based Optimization) and CBO (Cost-Based Optimization), i.e., the Hep planner engine and the Volcano planner engine (mechanism, usage, pros and cons).
  • The investigation of the Calcite adaptor, especially the TranslatableTable API.
  • How to transform the parsed result of ShardingSphere into Calcite's relational algebra.
  • Calling some of the optimization rules to process the relational expression.
  • Implementing the Calcite adaptor to join SQLs from different instances.
  • Combining sharding tables with Calcite adaptors.
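To make the RBO item above concrete, here is a toy, pure-Java sketch of the Hep-style planning loop: apply rules repeatedly until none of them changes the plan. Every name below is invented for illustration; the real Calcite classes are HepPlanner, RelOptRule, and RelNode (the Volcano planner adds a cost model on top of this match-and-rewrite idea).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy illustration of the Hep-style (rule-based) planning loop: the plan is a
// simple operator chain from root to leaf, and each "rule" rewrites it.
final class HepStyleRewriter {

    // Apply every rule until no rule changes the plan (a fixpoint), which is
    // how a heuristic (RBO) planner works: match-and-rewrite, no cost model.
    static List<String> toFixpoint(List<String> plan, List<UnaryOperator<List<String>>> rules) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (UnaryOperator<List<String>> rule : rules) {
                List<String> rewritten = rule.apply(plan);
                if (!rewritten.equals(plan)) {
                    plan = rewritten;
                    changed = true;
                }
            }
        }
        return plan;
    }

    // Rule 1: merge two adjacent Filters into one.
    static List<String> mergeFilters(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        for (int i = 0; i + 1 < out.size(); i++) {
            if (out.get(i).equals("Filter") && out.get(i + 1).equals("Filter")) {
                out.remove(i + 1);
                return out;
            }
        }
        return plan;
    }

    // Rule 2: drop a projection that keeps every column (marked "Project*").
    static List<String> dropNoopProject(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        return out.remove("Project*") ? out : plan;
    }

    public static void main(String[] args) {
        List<String> plan = List.of("Project*", "Filter", "Filter", "Scan");
        System.out.println(toFixpoint(plan,
                List.of(HepStyleRewriter::mergeFilters, HepStyleRewriter::dropNoopProject)));
        // -> [Filter, Scan]
    }
}
```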

Actually, there is plenty of work to do on this issue. We are in the investigation phase now and will seek contributors for this issue later. If you are interested in it, please give it a watch. 😉


10th January 2021 Task Update

Hi, community,

Here is the progress update so far.

Functions

  • The research on the source code and API of Apache Calcite
  • The design plan on Apache ShardingSphere and Apache Calcite
  • The implementation of CalciteLogicSchemaFactory, CalciteLogicSchema, and CalciteFilterableTable
  • The Calcite JDBC executor
  • The work of embedding initialization, routing, and execution into ShardingSphere
  • Use ShardingSphere parser instead of the Calcite parser to parse SQL (SqlNodeConverter, @guimingyue) (Q2)
  • Custom planner for SQL Optimization(PlannerInitializer, @guimingyue ) (Q2)
  • SQL Optimization for CalciteFilterableTable (CalciteExecutionSQLGenerator) (Q2)
  • The implementation of CalciteRawExecutor (especially ResultSetMetadata) (Q2)
  • The SQLRouter of ShardingSphere to route SQL to CalciteExecutor (Q2)

Unit test (Q2)

SQL Federation

  • Binding tables (Native)
  • Single table and Broadcast table (Native)
  • Sharding table and Broadcast table (Native)
  • Broadcast table and Broadcast table (Native)
  • Single table and Single table + UT
  • Single table and Sharding table + UT(Q2)
  • Sharding table and Sharding table + UT (Q2)

Scenario

  • Sharding
  • Sharding + ReplicaQuery (Q2)
  • Sharding + Encryption (Q2)
  • Sharding + Encryption + ReplicaQuery (Q2)
@tristaZero tristaZero added this to the 5.0.0-beta milestone Nov 22, 2020
@tristaZero tristaZero self-assigned this Nov 22, 2020
@guimingyue
Contributor

Maybe this task suits me well: how to transform the parsed result of ShardingSphere into Calcite's relational algebra.
I did some related work a few months ago, and I'm familiar with the AST produced by the ShardingSphere parser.
I am also investigating Calcite's query optimization module, but so far I think there is still a long way to go.

@junwen12221

current level of mycat2

@tristaZero
Contributor Author

@guimingyue Welcome!
Yep, it looks like this issue will take a lot of time, but we can take it step by step. : )

@tristaZero
Contributor Author

tristaZero commented Nov 23, 2020

Hi @junwen12221 ,
I gave a brief look at the doc. It looks like you focus on query optimization rather than federated SQL, is that right?
You have put a lot of work into it, veteran. :-)
BTW, if you would like to join this community, you are welcome anytime!

@junwen12221

@tristaZero
I have already implemented the federated SQL query feature (like join queries across different instances). We can talk about some aspects of Calcite.

@tristaZero
Contributor Author

@junwen12221 That's great!
Your link above shows that you have put a lot of effort into your community. I am not sure whether you would like to put your mind and experience to best use on this wonderful feature. :-) Currently, we have another veteran, @guimingyue .

BTW, one issue has confused me a lot recently. Could I hear your thoughts?
The basic process is:

SQL Statement (parsed by the ShardingSphere parser) -> RelNode (translated from the SQL statement, needs developing) -> Optimized RelNode (optimized by CBO)

My question is: how do we tell the Calcite adaptor to use these optimized RelNodes?

AFAIK, Driver.getConnection(CALCITE_URL) is the only way to trigger Calcite to use the custom adaptor, but that brings in another round of SQL parsing by Calcite instead of using our optimized RelNode, doesn't it?

How can we use the custom adaptor without calcite parser?

Your GitHub activity tells me you are skilled with Calcite; I look forward to your ideas. : )
Indeed, we welcome comments from anyone in the community!

@junwen12221

junwen12221 commented Nov 24, 2020

@tristaZero @guimingyue

Look at the example below:

https://github.com/zabetak/calcite/blob/c7646fa6d3054b9e6b9a800e08d18f3cc89922e4/core/src/test/java/org/apache/calcite/examples/foodmart/java/EndToEndExampleEnumerable.java

On line 127 you can replace the SQL parser.
On line 156 you can build the RelNode with RelBuilder instead.
Lines 165-179 are the optimizer code.
After line 187, the plan runs.
So you need a converter that builds a RelNode from the ShardingSphere parser, i.e., converts the ShardingSphere AST to a RelNode, or converts the ShardingSphere AST to an org.apache.calcite.sql.SqlNode.
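The converter described above can be sketched in shape only: walk a parsed SELECT and build a relational-operator tree bottom-up, the way Calcite's RelBuilder chains scan/filter/project. MiniSelect and MiniRel below are invented stand-ins for ShardingSphere's SelectStatement and Calcite's RelNode.

```java
import java.util.List;

// Shape-only sketch of an AST-to-relational-algebra converter. All types here
// are invented; real code would consume ShardingSphere's SelectStatement and
// emit Calcite RelNodes, typically via RelBuilder.
final class AstToRelSketch {

    record MiniSelect(String table, String whereExpr, List<String> projections) {}

    record MiniRel(String op, String detail, MiniRel input) {}

    static MiniRel convert(MiniSelect select) {
        MiniRel rel = new MiniRel("TableScan", select.table(), null);     // FROM
        if (select.whereExpr() != null) {                                 // WHERE
            rel = new MiniRel("Filter", select.whereExpr(), rel);
        }
        // SELECT list on top, mirroring RelBuilder's scan().filter().project()
        return new MiniRel("Project", String.join(",", select.projections()), rel);
    }

    public static void main(String[] args) {
        MiniSelect ast = new MiniSelect("t_order", "order_id < 10", List.of("order_id"));
        MiniRel rel = convert(ast);
        System.out.println(rel.op() + " <- " + rel.input().op() + " <- " + rel.input().input().op());
        // -> Project <- Filter <- TableScan
    }
}
```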

@tristaZero
Contributor Author

tristaZero commented Nov 26, 2020

Hi @junwen12221 ,

Sorry for my late response; I have been writing a demo to implement the query optimization and federated SQL these days.
Your explanation makes sense to me. Many thanks for sharing!
From 104-optimizer, I can tell you specialize in Calcite and query optimization. I wonder whether you would be interested in participating in this feature in our community? You are most welcome. : )

BTW, what do you think of ProjectableFilterableTable? I am considering using it instead of TranslatableTable to get data from the actual databases. Given the push-down rules, I suppose ProjectableFilterableTable is enough to filter rows. Besides, TranslatableTable seems to offer the same function but is more complicated. In which scenario do you think we should choose TranslatableTable rather than ProjectableFilterableTable?

Looking forward to any of your innovative ideas.

Best wishes,
Trista

@junwen12221

junwen12221 commented Nov 27, 2020

@tristaZero

I'm glad that more open-source projects use Calcite to do data sharding based on the SQL language. I don't care which project or community; this technology already exists in closed-source software.

Calcite converts SQL to a LogicalTableScan before it applies the other table interfaces.

ProjectableFilterableTable (like FilterableTable and ScannableTable) has built-in rules in Calcite, so you can use it to push down projection and filtering, and correspondingly generate something like select ... from ... where .... But if you want to define more RelNodes (your own SQL-generating rules), you can use TranslatableTable (or perform the conversion at another stage). It lets your table produce its own RelNode instead of a LogicalTableScan. Correspondingly, you need to write your own rules to optimize it. In fact, a table can be a ProjectableFilterableTable, TranslatableTable, FilterableTable, ScannableTable, or a custom interface at the same time.

When the result of TranslatableTable is a LogicalTableScan, it is trivial. The result can also be another ProjectableFilterableTable RelNode.

A conservative approach is to separate multiple converters or optimizers. In the first stage, when the SQL generates a RelNode, use Calcite's built-in table interfaces and rules; then try your own rules or do the next transformation in a second stage.

If you know all the rules, you can perform all transformations in one phase.
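What the filter/projection pushdown buys can be shown in plain Java. The class below is invented for this sketch; Calcite's real interface is ProjectableFilterableTable with `Enumerable<Object[]> scan(DataContext root, List<RexNode> filters, int[] projects)`, where the engine hands the table the residual filters and the column indexes to keep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Pure-Java sketch of projection/filter pushdown: the caller passes the
// predicate and the wanted column indexes, and the table returns only
// matching rows with only those columns.
final class PushdownTableSketch {

    private final List<Object[]> rows;

    PushdownTableSketch(List<Object[]> rows) { this.rows = rows; }

    // filter stands in for Calcite's List<RexNode> filters;
    // projects has the same meaning as Calcite's int[] projects.
    List<Object[]> scan(Predicate<Object[]> filter, int[] projects) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            if (!filter.test(row)) continue;        // filter pushdown
            Object[] projected = new Object[projects.length];
            for (int i = 0; i < projects.length; i++) {
                projected[i] = row[projects[i]];    // projection pushdown
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        PushdownTableSketch t = new PushdownTableSketch(List.of(
                new Object[] {1, "init"}, new Object[] {11, "paid"}));
        // SELECT status FROM t WHERE order_id < 10
        for (Object[] row : t.scan(r -> (int) r[0] < 10, new int[] {1})) {
            System.out.println(row[0]);  // -> init
        }
    }
}
```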

@guimingyue
Contributor

Before we start the work on this issue, we need to design the schema definition based on Calcite. Right now, I'm investigating Calcite's schema system.

@tristaZero
Contributor Author

@guimingyue +1. BTW, if your investigation makes progress, you are welcome to share it here. A demo is a good way for discussion and presentation.

@tristaZero
Contributor Author

Hi @junwen12221 ,

I really appreciate your explanation.
Following your last comment, I took a detailed look at TranslatableTable and ProjectableFilterableTable.

Here is my understanding:
ProjectableFilterableTable provides some simple interfaces (nothing to do with rules) that let users focus on projection and filter optimization. In contrast, TranslatableTable gives users the most flexibility to control all the RelNode conversions (optimizations), like project, count(), limit, etc. The function of a Rule is to match and convert RelNodes.

Later, I tried to learn more about planner rules, which are a headache for me. For example, what's the difference between ConverterRule and RelRule? What are CoreRules and EnumerableRules for?

Worse still, the Calcite documentation is too sparse to answer these questions. :( I wonder how you mastered this complicated tool? Are there any links or docs that can teach users more?

BTW, do you mind exchanging WeChat accounts? If possible, could you send your WeChat ID to panjuan@apache.org? Of course, talking here is still a good way if you prefer. :)

Best,
Trista

@junwen12221

@tristaZero

Generally, a logical RelNode (src/main/java/org/apache/calcite/rel/logical)
starts from org.apache.calcite.plan.Convention#NONE,
for example:

  public static LogicalTableScan create(RelOptCluster cluster,
      final RelOptTable relOptTable, List<RelHint> hints) {
    final Table table = relOptTable.unwrap(Table.class);
    final RelTraitSet traitSet =
        cluster.traitSetOf(Convention.NONE)
            .replaceIfs(RelCollationTraitDef.INSTANCE, () -> {
              if (table != null) {
                return table.getStatistic().getCollations();
              }
              return ImmutableList.of();
            });
    return new LogicalTableScan(cluster, traitSet, hints, relOptTable);
  }

Then it goes through a (rule) converter to be transformed into another RelNode with a new org.apache.calcite.plan.Convention.
The class that specializes in this conversion is org.apache.calcite.rel.convert.ConverterRule,
so it is also a kind of org.apache.calcite.plan.RelOptRule.

org.apache.calcite.plan.RelRule is a recently added class, introduced so that a rule's behavior can be changed through configuration.
You can see:
CALCITE-3923

CoreRules is also recently added, from a refactoring. Simply put, it collects the rules for the classes in src/main/java/org/apache/calcite/rel/logical.
Its input is a logical RelNode and its output is also a logical RelNode;
they are all Convention.NONE. This is unlike ConverterRule, which converts a RelNode of one Convention into a RelNode of another.

Following the same idea, we can guess the purpose of org.apache.calcite.adapter.enumerable.EnumerableRules:
it implements the conversion from Convention.NONE to EnumerableConvention.INSTANCE.
EnumerableConvention marks the (EnumerableRel) RelNodes of Calcite's code-generating executor.
After the conversion is complete, you can use org.apache.calcite.adapter.enumerable.EnumerableRel#implement to generate the executor.

Despite all the distinctions above, they are all just special cases of org.apache.calcite.plan.RelOptRule.
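The distinction can be restated as a toy example (all types below are invented; the Calcite originals are RelOptRule, ConverterRule, CoreRules, and EnumerableRules): a "core rule" rewrites the logical shape and keeps Convention.NONE, while a "converter rule" keeps the operator and changes the convention.

```java
// Toy restatement of core rule vs. converter rule.
final class ConventionSketch {

    enum Convention { NONE, ENUMERABLE }

    record Node(String op, Convention convention) {}

    // "Core rule": logical-to-logical rewrite, convention unchanged.
    static Node projectToCalc(Node n) {
        return n.op().equals("LogicalProject")
                ? new Node("LogicalCalc", n.convention()) : n;
    }

    // "Converter rule": same operator, new convention, so a physical
    // (code-generating) executor can run the node.
    static Node toEnumerable(Node n) {
        return new Node("Enumerable" + n.op().replace("Logical", ""), Convention.ENUMERABLE);
    }

    public static void main(String[] args) {
        Node logical = new Node("LogicalProject", Convention.NONE);
        Node calc = projectToCalc(logical);   // still Convention.NONE
        Node physical = toEnumerable(calc);   // now ENUMERABLE
        System.out.println(calc + " -> " + physical);
    }
}
```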

@tristaZero
Contributor Author

Hi, @junwen12221

Its input is logical RelNode, and its output is also logical RelNode, they are all Convention.NONE. Unlike ConverterRule from one Convention RelNode to another Convention RelNode.

If CoreRules stays within Convention.NONE and has nothing to do with calling conventions, what is its role in RelNode optimization?

Also, here is some of my thinking about this issue. If I missed something, your corrections are welcome. :)

  1. It is possible to implement the custom adaptor for federated queries.
  2. We need to consider SQL optimization. If we use the Calcite JDBC Driver, the only way to affect the optimization process is to implement the interfaces for the different kinds of tables, like TranslatableTable, and provide RelOptRules.
  3. On the other hand, we may try to skip the Calcite JDBC Driver and call the parse, validate, optimize, and execute functions in our own custom execute driver, which needs an in-depth understanding of the Calcite source code plus coding work.
  4. By 3, we can use our own parser engine instead of the Calcite parser, providing broad SQL support, since our SQL parser can parse the SQL of different database dialects.
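Point 3 can be sketched as a skeleton pipeline. Everything below is a stand-in: in real code, parse would call ShardingSphere's parser, and validate/optimize/execute would delegate to Calcite's validator, planner (findBestExp), and the generated executable plan.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Skeleton of a custom "executor driver" that runs the four phases itself
// instead of going through the Calcite JDBC Driver. All bodies are stubs.
final class RawExecutorSketch {

    record Plan(String description) {}

    static Plan parse(String sql) {                // ShardingSphere parser stand-in
        return new Plan("parsed:" + sql);
    }

    static Plan validate(Plan plan) {              // Calcite validator stand-in
        return new Plan(plan.description() + "|validated");
    }

    static Plan optimize(Plan plan, List<UnaryOperator<Plan>> rules) {
        for (UnaryOperator<Plan> rule : rules) {   // planner-rules stand-in
            plan = rule.apply(plan);
        }
        return plan;
    }

    static String execute(Plan plan) {             // executable-plan stand-in
        return "rows for [" + plan.description() + "]";
    }

    public static void main(String[] args) {
        List<UnaryOperator<Plan>> rules =
                List.of(p -> new Plan(p.description() + "|optimized"));
        System.out.println(execute(optimize(validate(parse("SELECT 1")), rules)));
    }
}
```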

@tristaZero
Contributor Author

tristaZero commented Dec 2, 2020

My greetings to @junwen12221 @guimingyue .

Based on the points @junwen12221 gave before, I wrote a demo for this issue, i.e., SQL federation and SQL query optimization (looking forward to your collaborative effort) ✊ .

In this demo, SQL federation using the Calcite JDBC driver works well. Nonetheless, I also wrote a raw executor (as point 3 above describes) with parsing, validating, optimizing, and executing, which you can view as a custom executor driver. Supposing this custom executor driver runs through the whole process, we can replace the Calcite parser with our SQL parser and add more planner rules to findBestExp. That matches @guimingyue 's expectation, right?

Unfortunately, this CalciteRawExecutor succeeds in parsing, validating, and optimizing SQL but is blocked at execution. 😟

I tried every method I could, but failed, so I sincerely ask for your help and pointers! The exception info is presented below; you can also run assertSingleExecute() in CalciteRawExecutorTest to reproduce it.

My guess is that the reason is related to using a custom schema, since it differs from the example's new ReflectiveSchema().

But I have no way to debug it or figure out a solution. The only similar question I found is apache-calcite-querying-without-using-the-jdbc-api, FYI.

15:01:08.019 [main] DEBUG org.apache.calcite.plan.RelOptPlanner - Provenance:
rel#19:EnumerableCalc.ENUMERABLE(input=EnumerableCalc#18,expr#0..2={inputs},0=$t0)
  direct
    rel#17:EnumerableCalc.ENUMERABLE(input=RelSubset#16,expr#0..2={inputs},0=$t0)
      call#9 rule [EnumerableCalcRule(in:NONE,out:ENUMERABLE)]
        rel#14:LogicalCalc.NONE(input=RelSubset#6,expr#0..2={inputs},0=$t0)
          call#5 rule [ProjectToCalcRule]
            rel#7:LogicalProject.NONE(input=RelSubset#6,inputs=0)
              no parent
rel#18:EnumerableCalc.ENUMERABLE(input=EnumerableTableScan#11,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
  direct
    rel#15:EnumerableCalc.ENUMERABLE(input=RelSubset#12,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
      call#7 rule [EnumerableCalcRule(in:NONE,out:ENUMERABLE)]
        rel#13:LogicalCalc.NONE(input=RelSubset#4,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
          call#3 rule [FilterToCalcRule]
            rel#5:LogicalFilter.NONE(input=RelSubset#4,condition=<($0, 10))
              no parent
rel#11:EnumerableTableScan.ENUMERABLE(table=[sharding, t_order])
  call#1 rule [EnumerableTableScanRule(in:NONE,out:ENUMERABLE)]
    rel#1:LogicalTableScan.NONE(table=[sharding, t_order])
      no parent


java.lang.NullPointerException
	at Baz.bind(Unknown Source)
	at federated.sql.executor.CalciteRawExecutor.execute(CalciteRawExecutor.java:184)
	at federated.sql.executor.CalciteRawExecutor.execute(CalciteRawExecutor.java:173)
	at federated.sql.executor.CalciteRawExecutorTest.assertSingleExecute(CalciteRawExecutorTest.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)

@guimingyue
Contributor

I will check out this demo, and debug it tomorrow.

@junwen12221

@guimingyue @tristaZero
I fixed it.
tristaZero/federatedSQL#2

@guimingyue
Contributor


@junwen12221 I added the VM options you mentioned in https://github.com/tristaZero/federatedSQL/pull/2, and I found the line that threw the NPE. It's root.getRootSchema().getSubSchema("sharding").getTable("t_order")
@tristaZero
This is because the sharding sub-schema you defined in the constructor of CalciteRawExecutor is assigned to the CalciteRawExecutor#schema field. Then a DataContext object with that same sub-schema is bound to executablePlan, so the root.getRootSchema() invocation in the generated code returns the sharding sub-schema, which does not have any sub-schema of its own.
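The mix-up reduces to a few lines. MiniSchema below is an invented stand-in for Calcite's SchemaPlus: when the sub-schema is bound as the root, the generated code's getSubSchema("sharding") lookup yields null, and the NPE follows.

```java
import java.util.Map;

// Minimal reproduction of the root-schema vs. sub-schema binding bug.
final class RootSchemaSketch {

    record MiniSchema(String name, Map<String, MiniSchema> subSchemas) {
        MiniSchema getSubSchema(String n) { return subSchemas.get(n); }
    }

    public static void main(String[] args) {
        MiniSchema sharding = new MiniSchema("sharding", Map.of());
        MiniSchema root = new MiniSchema("root", Map.of("sharding", sharding));

        // Correct binding: the root schema can resolve its sub-schema.
        System.out.println(root.getSubSchema("sharding") != null);     // true

        // Buggy binding: the sub-schema has no sub-schemas of its own, so
        // the generated code's lookup yields null -> NullPointerException.
        System.out.println(sharding.getSubSchema("sharding") == null); // true
    }
}
```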

@tristaZero
Contributor Author

Hi @guimingyue @junwen12221 ,

After adding the VM opts, I found the program didn't enter the core part executablePlan.bind() (generated code), though the generated code was shown in the IDEA console. How did you proceed with debugging in Baz@5153?

@junwen12221

@tristaZero @guimingyue

Read this:

janino

Debugging
The generated classes can be debugged interactively, even though they were created on-the-fly......

-Dorg.codehaus.janino.source_debugging.dir=federatedSQL\target\

target is the Maven target folder.

If it doesn't work, you can copy the code of the generated function in place of the generated object for debugging.

@junwen12221

junwen12221 commented Dec 7, 2020

@tristaZero

CoreRules: just rules that perform logical (not physical/Iterable) transformations on relational expressions. They are generally used for RBO: this step mainly applies heuristic rules, i.e., a rule-based optimizer, so it is often called the RBO stage.

It is possible to implement the custom adaptor for federated queries.
Yes.

We need to consider SQL optimization. If we use Calcite JDBC Driver, the only way to affect the optimization process is to override the interface for different kinds of tables like TranslatableTable and provide RelOptRule.
Yes.

On the other hand, we may try to skip Calcite JDBC Driver and call the parse, validate, optimize and execute functions in our new custom execute Driver, which needs a in-depth understanding about the source code of Calcite and coding work.

The Calcite JDBC Driver provides some advanced features by default, including lattices and materialized views. Alternatively, you can use:

org.apache.calcite.tools.FrameworkConfig
org.apache.calcite.tools.Frameworks
org.apache.calcite.prepare.PlannerImpl

These can retain the functionality of the Calcite JDBC driver, but without the JDBC interface.

(adapter)https://calcite.apache.org/docs/adapter.html
(lattice)https://calcite.apache.org/docs/lattice.html
(Materialized Views)https://calcite.apache.org/docs/materialized_views.html#materialized-views-maintained-by-calcite

By 3, we can use our parser engine to parse SQL instead of Calcite parser, providing a broad SQL support as our SQL parser can parse SQLs from different databases dialects.

On the whole, it can be achieved.

@tristaZero
Contributor Author

Hi @junwen12221 ,
Glad to see your explanation.

Calcite JDBC Driver provides some default advanced features including lattice,Materialized Views.Or you can use

I suppose these features are far from our needs for now, since we currently just focus on the adaptor, rule optimization, and the conversion between the parsed result and RelNode. Nevertheless, Frameworks is a handy tool, including the parser, validator, and RelConverter.

@guimingyue @junwen12221

Speaking of the adaptor (federated SQL), I suggest we implement the ProjectableFilterableTable interface initially to run through the whole process, and then introduce TranslatableTable for further optimization.

When it comes to query optimization, from my perspective there are two aspects for us to consider (please take note, @guimingyue ).

  • Apply some of the CoreRules and EnumerableRules to perform logical transformations on relational expressions.
  • Implement TranslatableTable to register custom rules (like CoreRules).

Note: the conversion from the SQLStatement of the ShardingSphere parser to the SqlNode of Calcite is a prerequisite for broad SQL support.

Please feel free to share your views on the content above. : )

BTW, I updated the federatedSQL project with a non-generated-code converter to avoid janino debugging; welcome to check it out. Moreover, I am going to post a summary or blog about "How does Calcite handle a SQL?" afterward. But... please leave me some more time...

@guimingyue How is it going? If you want to write something about query optimization, our federatedSQL repo is the right place, I guess. :)

@guimingyue
Contributor

guimingyue commented Dec 8, 2020

@tristaZero
Sorry for the late reply. I have created two converter classes for converting a SQLStatement to a Calcite SqlNode and then to a Calcite RelNode. You can see my progress in this repository: guimingyue/shardingsphere
Now I'm trying to add some rules to rewrite the RelNode with the Hep planner.

@junwen12221

@guimingyue @tristaZero

org.apache.calcite.schema.ProjectableFilterableTable is indeed a quick-start interface.

guimingyue added a commit to guimingyue/shardingsphere that referenced this issue May 30, 2021
This commit adds a new executor for executing the physical plan of a relational expression; this executor is based on the volcano model. The details of this commit follow below.

* add a new package exec in , in this package the main part is the Executors, which are the implementations of physical RelNodes. For example, NestedLoopJoinExecutor is the implementation of the physical RelNode SSNestedLoopJoin.
* Some Executors need to execute functions to filter or aggregate rows; that is what BuiltinFunction does. When building an Executor instance, a BuiltinFunction may be created and passed to the Executor instance; for example, org.apache.shardingsphere.infra.executor.exec.CalcExecutor#build creates a BuiltinFunction for the filter operator to filter rows.
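A self-contained sketch of such a nested-loop join executor (types simplified here; the real NestedLoopJoinExecutor works on iterators in the volcano model): for every outer row, scan all inner rows and emit the concatenation when the join condition holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Simplified nested-loop join: O(n*m), which is why it is the baseline
// physical join implementation before smarter joins (hash, merge) exist.
final class NestedLoopJoinSketch {

    static List<Object[]> join(List<Object[]> outer, List<Object[]> inner,
                               BiPredicate<Object[], Object[]> condition) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] left : outer) {
            for (Object[] right : inner) {
                if (condition.test(left, right)) {
                    // Emit left columns followed by right columns.
                    Object[] joined = new Object[left.length + right.length];
                    System.arraycopy(left, 0, joined, 0, left.length);
                    System.arraycopy(right, 0, joined, left.length, right.length);
                    out.add(joined);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> orders = List.of(new Object[] {1, 100}, new Object[] {2, 200});
        List<Object[]> users = List.<Object[]>of(new Object[] {100, "u1"});
        // t_order JOIN t_user ON t_order.user_id = t_user.id
        List<Object[]> rows = join(orders, users, (o, u) -> o[1].equals(u[0]));
        System.out.println(rows.size());  // -> 1
    }
}
```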
@tristaZero tristaZero modified the milestones: 5.0.0-beta, 5.0.0-RC1 Jun 10, 2021
guimingyue added a commit to guimingyue/shardingsphere that referenced this issue Jun 20, 2021
add a SqlNodeConverter class to convert the ShardingSphere AST to the Calcite AST.
tristaZero pushed a commit that referenced this issue Jun 28, 2021
* to #8284
add a SqlNodeConverter class to convert the ShardingSphere AST to the Calcite AST.

* use ast SqlNodeConverter implementation to convert ast instead of all in one

* refactor sql node converter

* modify according to code style

* refactor offset and row count sql node converter

* fix code style checking error
@tristaZero tristaZero reopened this Sep 6, 2021
@strongduanmu strongduanmu removed this from the 5.0.0-RC1 milestone Oct 14, 2021
@strongduanmu strongduanmu self-assigned this Oct 14, 2021
@github-actions

github-actions bot commented Oct 8, 2022

Hello , this issue has not received a reply for several days.
This issue is supposed to be closed.


There hasn't been any activity on this issue recently, and in order to prioritize active issues, it will be marked as stale.

@github-actions github-actions bot added the stale label Mar 28, 2024