[IOTDB-28] Calcite Integration for IoTDB #902

Alima777 · 2020-03-11T13:01:30Z

IoTDB-Calcite Adapter.

IoTDB - Calcite Adapter Feature Documentation

Relational table structure

The IoTDB - Calcite Adapter uses the following relational table structure:

time	device	sensor1	sensor2	sensor3	...

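Each storage group in IoTDB is exposed as one table. Its columns are time, device, and the union of all sensors across every device in that storage group. Sensors that share a name across different devices must have the same data type.

For example, take the IoTDB storage group root.sg, whose devices and their sensors are: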

d1 -> s1, s2
d2 -> s2, s3
d3 -> s1, s4

The corresponding table in the IoTDB - Calcite Adapter is named root.sg, and its structure is:

time	device	s1	s2	s3	s4
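The mapping above can be sketched in a few lines of Java. This is only an illustration, not the adapter's code: the class and method names are hypothetical, and sorting the sensor names is an assumption made here to get a stable column order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class TableSchemaSketch {
  /** Returns time, device, then the sorted union of all sensor names. */
  static List<String> columns(Map<String, List<String>> sensorsByDevice) {
    TreeSet<String> sensors = new TreeSet<>();
    for (List<String> deviceSensors : sensorsByDevice.values()) {
      sensors.addAll(deviceSensors);
    }
    List<String> columns = new ArrayList<>(List.of("time", "device"));
    columns.addAll(sensors);
    return columns;
  }

  public static void main(String[] args) {
    // The root.sg example from the text above.
    Map<String, List<String>> sg = new LinkedHashMap<>();
    sg.put("d1", List.of("s1", "s2"));
    sg.put("d2", List.of("s2", "s3"));
    sg.put("d3", List.of("s1", "s4"));
    System.out.println(columns(sg)); // [time, device, s1, s2, s3, s4]
  }
}
```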

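How it works

This section briefly describes how the IoTDB - Calcite Adapter works.

After Calcite parses and validates the input SQL statement, the plan is matched against the optimization (pushdown) rules defined in IoTDBRules. Nodes that can be pushed down are converted into a SQL statement that IoTDB can execute, and that query is run on the IoTDB side to fetch the source data. Nodes that cannot be pushed down are executed with Calcite's default physical plan. Finally, IoTDBEnumerator iterates over the result set to produce the results.

Query overview

The pushdown rules currently defined in IoTDBRules are IoTDBProjectRule, IoTDBFilterRule, and IoTDBLimitRule.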

IoTDBProjectRule

IoTDBProjectRule pushes the projected columns of a query down to IoTDB for execution.

For example (all SQL statements below come from the tests):

Wildcard

select * from "root.vehicle"

The wildcard * is kept as is during conversion rather than being expanded into column names, so the resulting IoTDB query is:

select * from root.vehicle.* align by device

Non-wildcard sensor columns

select s0 from "root.vehicle"

is converted to:

select s0 from root.vehicle.* align by device

Non-wildcard, non-sensor columns

select "time", device, s2 from "root.vehicle"

The time and device columns in this statement do not need to appear in the IoTDB query, so the conversion drops both of them. The resulting IoTDB query is:

select s2 from root.vehicle.* align by device

In particular, if the query selects only the time and device columns, the projection is converted to the wildcard *.
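The projection cases above can be summarized with a small sketch. It is not the adapter's implementation: the class and method names are hypothetical, and it works on plain column names rather than on Calcite's plan nodes.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectPushdownSketch {
  /** Builds the IoTDB query text for a list of projected column names. */
  static String toIoTDBSql(String storageGroup, List<String> projectedColumns) {
    List<String> sensors = new ArrayList<>();
    for (String column : projectedColumns) {
      // time and device are not listed explicitly in the IoTDB query.
      if (!column.equals("time") && !column.equals("device")) {
        sensors.add(column);
      }
    }
    // Only time/device were requested: fall back to the wildcard.
    String select = sensors.isEmpty() ? "*" : String.join(", ", sensors);
    return "select " + select + " from " + storageGroup + ".* align by device";
  }

  public static void main(String[] args) {
    System.out.println(toIoTDBSql("root.vehicle", List.of("time", "device", "s2")));
    // select s2 from root.vehicle.* align by device
    System.out.println(toIoTDBSql("root.vehicle", List.of("time", "device")));
    // select * from root.vehicle.* align by device
  }
}
```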

Alias

The IoTDB - Calcite Adapter currently supports renaming projected columns only in the SELECT clause; the renamed names cannot be used in later clauses.

select "time" AS t, device AS d, s2 from "root.vehicle"

In the result, the time column is named t and the device column is named d.

IoTDBFilterRule

IoTDBFilterRule pushes the WHERE clause of a query down to IoTDB for execution.

WHERE clause that does not restrict the device column

select * from "root.vehicle" where "time" < 10 AND s0 >= 150

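The time column is left unchanged. Because no specific device is restricted, sensor columns are not concatenated with a device name, so the resulting IoTDB query is: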

select * from root.vehicle.* where time < 10 AND s0 >= 150

WHERE clause that restricts the device column

Restricting a single device

select * from "root.vehicle" where device = 'root.vehicle.d0' AND "time" > 10 AND s0 <= 100

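If the WHERE clause restricts only a single device and all other conditions apply to that device, the query is converted into a query on that device in IoTDB. The query above is converted to: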

select * from root.vehicle.d0 where time > 10 AND s0 <= 100

Restricting multiple devices

select * from "root.vehicle" where (device = 'root.vehicle.d0' AND "time" <= 1) OR (device = 'root.vehicle.d1' AND s0 < 100)

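If the WHERE clause restricts multiple devices, it is converted into multiple queries, one per device, each using that device's conditions.

The query above is converted into two SQL statements that are executed in IoTDB: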

select * from root.vehicle.d0 where time <= 1
select * from root.vehicle.d1 where s0 < 100

Both device-specific conditions and a global condition

select * from "root.vehicle" where (device = 'root.vehicle.d0' AND "time" <= 1) OR s0 = 999

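In the SQL statement above, besides the condition specific to device root.vehicle.d0 there is also the condition s0 = 999. It is treated as a global condition: rows from any device that satisfy it belong to the result.

The query above is therefore converted into queries over all devices in the storage group. Devices with their own conditions are handled individually, and the remaining devices are queried together using the global condition.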

select * from root.vehicle.d0 where time <= 1 OR s0 = 999
select * from root.vehicle.d1 where s0 = 999

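Note: the test data happens to contain only two devices. If there were another device d2, root.vehicle.d2 would be added to the FROM clause instead of issuing a separate query for d2.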
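The device-splitting behaviour described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the code of IoTDBFilterRule: the class, method, and parameter names are hypothetical, conditions are passed as already-translated strings, and joining the remaining devices with commas in the FROM clause is an assumption about the generated query text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FilterSplitSketch {
  /**
   * @param allDevices       every device in the storage group
   * @param deviceConditions device path -> condition that applies only to it
   * @param globalCondition  condition valid for any device, or null if absent
   */
  static List<String> toIoTDBSql(List<String> allDevices,
      Map<String, String> deviceConditions, String globalCondition) {
    List<String> queries = new ArrayList<>();
    // One query per device that has its own condition.
    for (Map.Entry<String, String> e : deviceConditions.entrySet()) {
      String where = globalCondition == null
          ? e.getValue()
          : e.getValue() + " OR " + globalCondition;
      queries.add("select * from " + e.getKey() + " where " + where);
    }
    // All other devices share a single query using the global condition.
    if (globalCondition != null) {
      List<String> remaining = new ArrayList<>(allDevices);
      remaining.removeAll(deviceConditions.keySet());
      if (!remaining.isEmpty()) {
        queries.add("select * from " + String.join(", ", remaining)
            + " where " + globalCondition);
      }
    }
    return queries;
  }

  public static void main(String[] args) {
    Map<String, String> perDevice = new LinkedHashMap<>();
    perDevice.put("root.vehicle.d0", "time <= 1");
    List<String> devices = List.of("root.vehicle.d0", "root.vehicle.d1");
    toIoTDBSql(devices, perDevice, "s0 = 999").forEach(System.out::println);
  }
}
```

With the two test devices this yields the two statements shown above; with a third device d2, the second statement would list both root.vehicle.d1 and root.vehicle.d2 in its FROM clause.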


vesense · 2020-05-11T10:42:14Z

@Alima777 Thanks for your contribution, I will have a review this month.

Alima777 · 2020-05-11T13:58:50Z

> @Alima777 Thanks for your contribution, I will have a review this month.

Thank you. I'm looking forward to your review.

Alima777 · 2020-09-14T03:21:26Z

Hi, @yuqi1129 would you like to have a review of this PR? :D

yuqi1129 · 2020-09-14T05:12:59Z

@Test

  public void testFilter7() {
    CalciteAssert.that()
        .with(MODEL)
        .with("UnQuotedCasing", IoTDBConstant.UnQuotedCasing)
        .query("select * from \"root.vehicle\" " +
            "where (device = 'root.vehicle.d0' AND \"time\" <= 1) OR s2 = 2.22 and 1 = 1")
        .returns("time=1; device=root.vehicle.d0; s0=101; s1=1101; s2=null; s3=null; s4=null\n" +
            "time=2; device=root.vehicle.d0; s0=10000; s1=40000; s2=2.22; s3=null; s4=null\n" +
            "time=2222; device=root.vehicle.d1; s0=null; s1=null; s2=2.22; s3=null; s4=null\n")
        .explainContains("PLAN=IoTDBToEnumerableConverter\n" +
            "  IoTDBFilter(condition=[OR(AND(=($1, 'root.vehicle.d0'), <=($0, 1)), =(CAST($4):DOUBLE NOT NULL, 2.22))])\n" +
            "    IoTDBTableScan(table=[[IoTDBSchema, root.vehicle]])");
  }

We should add rules for constant reduction.

For example, where 1 = 1 and a = 1 should be folded to where a = 1; see the test above.
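To make the requested folding concrete, here is a minimal, self-contained sketch. It does not use Calcite (which has its own ReduceExpressionsRule family of planner rules for this purpose); the class and method names are hypothetical, predicates are plain strings, and only equality between two integer literals is treated as a constant.

```java
import java.util.ArrayList;
import java.util.List;

final class ConstantFoldSketch {
  /** Returns true/false if "lhs = rhs" compares two integer literals, else null. */
  static Boolean evalConstantEquality(String predicate) {
    String[] sides = predicate.split("=");
    if (sides.length != 2) {
      return null;
    }
    try {
      return Long.parseLong(sides[0].trim()) == Long.parseLong(sides[1].trim());
    } catch (NumberFormatException e) {
      return null; // at least one side is not an integer literal
    }
  }

  /** Folds the conjuncts of an AND: drops constant-true ones, short-circuits on false. */
  static List<String> foldConjunction(List<String> conjuncts) {
    List<String> kept = new ArrayList<>();
    for (String conjunct : conjuncts) {
      Boolean constant = evalConstantEquality(conjunct);
      if (constant == null) {
        kept.add(conjunct);        // not a constant, keep as is
      } else if (!constant) {
        return List.of("FALSE");   // the whole conjunction is false
      }                            // constant true: nothing to keep
    }
    return kept.isEmpty() ? List.of("TRUE") : kept;
  }

  public static void main(String[] args) {
    System.out.println(foldConjunction(List.of("1 = 1", "a = 1"))); // [a = 1]
  }
}
```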

calcite/pom.xml

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBConstant.java

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBEnumerator.java

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBSchema.java

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBEnumerator.java

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBTable.java

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBRel.java

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBFilter.java

Alima777 · 2020-09-14T09:03:31Z

@yuqi1129 Hi, thank you very much for your patient review!

This was the first large module I implemented, and I neglected many coding standards, such as exception handling, error logging, and consistent code style. It also has not been maintained for a long time.

So thanks again! I've fixed these issues; please have a look.

yuqi1129 · 2020-09-15T11:52:27Z

calcite/src/test/java/org/apache/iotdb/calcite/IoTDBAdapterTest.java

+    CalciteAssert.that()
+        .with(MODEL)
+        .with("UnQuotedCasing", IoTDBConstant.UNQUOTED_CASING)
+        .query("select * from \"root.vehicle\"")


What's the SQL dialect here? Oracle, MySQL, or something else?

According to the calcite doc:

SQL conformance level. Values: DEFAULT (the default, similar to PRAGMATIC_2003)

It can be customized through properties.

yuqi1129 · 2020-09-15T12:10:01Z

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBTable.java

+
+  public Enumerable<Object> query(final Connection connection) {
+    return query(connection, ImmutableList.of(), ImmutableList.of(), ImmutableList.of(),
+        ImmutableList.of(), 0, 0);


Cassandra does not support pushing limit and offset down to the table scan, so just set 0 here and do the logic in the upper RelNode.

yuqi1129 · 2020-09-15T12:23:50Z

calcite/IoTDB-Calcite-Adapter.md

+
+### IoTDBLimitRule
+
+IoTDBLimitRule pushes the LIMIT and OFFSET clauses of a query down to IoTDB for execution.


In fact, IoTDBLimitRule can't push limit and offset down to IoTDBTable. The limit and offset logic is done in code generated by Calcite; see IoTDBToEnumerableConverter#implement, which calls IoTDBTable#query by reflection.

You can debug the query method to see it.

Actually, when debugging I can see that it does push the limit and offset down to IoTDBTable.query(). I don't understand exactly what you mean...

private IoTDBLimitRule() {
  super(operand(EnumerableLimit.class, operand(IoTDBToEnumerableConverter.class, any())),
      "IoTDBLimitRule");
}

Maybe you can take a look at this code.

JulianFeinauer

First off, this is really impressive work. I think we can merge it soon, and I would really like to use it more and more (especially with the powerful query planner). I found some discussion points where I'd like to hear your opinion or discuss the implementation. Thanks!

JulianFeinauer · 2020-09-26T08:47:49Z

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBSchema.java

+   * @param storageGroup the table name
+   * @return the columns' names and data types
+   */
+  RelProtoDataType getRelDataType(String storageGroup) throws SQLException, QueryProcessException {


I would pass the RelDataTypeFactory in as a parameter (from the calling IoTDBTable) rather than creating a new one. WDYT?

Hi, it's explained as follows in Calcite:

// Temporary type factory, just for the duration of this method. Allowable
// because we're creating a proto-type, not a type; before being used, the
// proto-type will be copied into a real type factory.

But honestly I don't fully understand the reasoning, so I just followed the same implementation approach.

JulianFeinauer · 2020-09-26T08:54:54Z

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBSchema.java

+   * @param storageGroup the table name
+   * @return the columns' names and data types
+   */
+  RelProtoDataType getRelDataType(String storageGroup) throws SQLException, QueryProcessException {


A rather general question about the data model itself. Generally, if I understand it correctly, we map the whole storage group to a single table where we have one column for the device and then unroll all possible sensors of each device as a column, is that right?
And if so, this could lead to issues in cases where two devices have the same sensors but with different types (see Line 119 here). This could be avoided in situations where the query is something like
SELECT * FROM 'root.vehicles' where device = 'mycar' as we only have this single device there. What do you think: should we, perhaps in a later PR, try to push the query further down to evaluate the RelDataType with further information and avoid the situation described above?

> Generally, if I understand it correctly, we map the whole storage group to a single table where we have one column for the device and then unroll all possible sensors of each device as a column, is that right?

Yes, that's right.

> This could be avoided in situations where the query is something like
> SELECT * FROM 'root.vehicles' where device = 'mycar' as we only have this single device there.

As you mentioned, it could indeed be avoided in this situation. But before querying, we have to build a relational table in which each column has exactly one data type. The table is validated by Calcite, so which sensor type should we adopt in this situation?
So I think sensors with the same name but different types should be avoided. If such queries are really needed, we can define a view in the model file.

JulianFeinauer · 2020-09-26T09:44:20Z

calcite/src/main/java/org/apache/iotdb/calcite/IoTDBRules.java

+
+  public static final RelOptRule[] RULES = {
+      IoTDBFilterRule.INSTANCE,
+      IoTDBProjectRule.INSTANCE,


I am very unsure about this rule, or about the use of a custom Project rule in general. By default Calcite will transform a LogicalProject into an EnumerableCalc, which in particular implements all Rex operations (like addition, subtraction, ...).

In the current implementation, IoTDBProject uses the default RexVisitorImpl, which translates any Rex expression to its first operand. So, for example, a + b, which is the Rex +(a, b), will simply be transformed to a.

If we remove this rule by commenting out L51, Calcite will use the default EnumerableCalc and Rex expressions will work as expected. You can easily see that by adding a test case like the following:

  @Test
  public void testProject() {
    CalciteAssert.that()
        .with(MODEL)
        .with("UnQuotedCasing", IoTDBConstant.UNQUOTED_CASING)
        .query("select *, s0 + s1 as test from \"root.vehicle\" " +
            "where s0 <= 10")
        .limit(1)
        .returns("time=1000; device=root.vehicle.d1; s0=10; s1=5; s2=null; s3=thousand; s4=null; test=15\n")
        .explainContains("PLAN=EnumerableCalc(expr#0..6=[{inputs}], expr#7=[+($t2, $t3)], proj#0..7=[{exprs}])\n" +
            "  IoTDBToEnumerableConverter\n" +
            "    IoTDBFilter(condition=[<=($2, 10)])\n" +
            "      IoTDBTableScan(table=[[IoTDBSchema, root.vehicle]])");
  }

If the IoTDBProjectRule is active, this gives the wrong result 5 (only s1); without this rule (commenting out L51) it gives the correct result 15 and the test passes.

Is there a specific reason why you decided to implement the Project rel explicitly? We could of course do this partially, at least to push down aggregate queries, but I would do that in a second step, as I personally consider Rex expressions "more important", so to say.

WDYT?

Hi,

Good point. I implemented the Project rel to push more operations down to IoTDB, so that we avoid fetching all data in every query.

For example, suppose I want to see column s1 of the table. If I remove the ProjectRule, the query select s1 from "root.vehicle" is transformed to select * from root.vehicle align by device, so I have to query ALL DATA in storage group root.vehicle. The time cost would be unacceptable.

But as you mentioned, a + b gives the wrong result while the rule is active, so we can try to modify the Project rule: push down plain columns, and leave expressions such as a + b to Calcite where possible.
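One way to picture that split is the sketch below. It is only an illustration of the idea, not a proposal for the actual rule: Expr is a hypothetical stand-in for a projected expression (it is not Calcite's RexNode), and all names are made up for this example. The idea is to fetch from IoTDB only the columns the projection references, and to let Calcite evaluate anything that is not a plain column reference.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ProjectSplitSketch {
  /** Hypothetical projected expression: an operator applied to input columns. */
  static final class Expr {
    final String operator;      // null for a plain column reference
    final List<String> columns; // input columns the expression reads

    Expr(String operator, List<String> columns) {
      this.operator = operator;
      this.columns = columns;
    }

    boolean isColumnRef() {
      return operator == null && columns.size() == 1;
    }
  }

  /** Columns to request from IoTDB so that every projection can be evaluated. */
  static Set<String> columnsToPushDown(List<Expr> projections) {
    Set<String> needed = new LinkedHashSet<>();
    for (Expr projection : projections) {
      needed.addAll(projection.columns);
    }
    return needed;
  }

  /** True when IoTDB alone can produce the projection (no expression left for Calcite). */
  static boolean fullyPushable(List<Expr> projections) {
    return projections.stream().allMatch(Expr::isColumnRef);
  }

  public static void main(String[] args) {
    List<Expr> sum = List.of(new Expr("+", List.of("s0", "s1")));
    System.out.println(columnsToPushDown(sum)); // [s0, s1]
    System.out.println(fullyPushable(sum));     // false
  }
}
```

For select s1 the projection is fully pushable, so only s1 is read; for s0 + s1 the sketch still limits the scan to s0 and s1 but leaves the addition to Calcite.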

By the way, aggregation is not pushed down in this PR because aggregation in IoTDB works a bit differently from aggregation in relational databases. We can try to push it down partially in a later PR.

wangchao316 · 2021-02-02T01:43:46Z

Hi all, this project was developed on top of Calcite and Avatica. Calcite has some problems:

1. For simple query optimization, Calcite is too heavyweight.
2. When Calcite generates the execution plan tree, it also generates some class files, which may cause a memory overflow when the search criteria contain too many fields.
   For example, select * from a in (1, 2, 3, 4.....500)
   will cause an OOM.

yuqi1129 · 2021-02-02T02:27:29Z

@wangchao316 Hi, regarding your problems, as far as I know we could do the following:

1. Skip the optimization stage. That is, use only the parser to convert the SQL to an AST, then validate it and convert the AST to a logical RelNode. Avatica's built-in logic is indeed heavy for simple queries, especially for time-scale SQL. Solving this may require changing the Calcite source code and maintaining our own version.
2. This seems to be a problem with the SQL itself. We do not recommend putting too many values in an IN clause; because of the limit on fields in the Java language, we can't solve it well.

The above is just my personal view; discussion is welcome.

coveralls · 2021-08-31T03:45:31Z

Coverage increased (+0.005%) to 67.367% when pulling 77eede7 on Alima777:dev_calcite1 into fb18357 on apache:master.

Alima777 added 26 commits January 6, 2020 10:07

Version 1.0: implement all function and project pushing down rule

5bf413f

Merge branch 'master' into dev_calcite1

7bf10d2

Version 1.1: Add IoTDBLimitRule to push down Limit function

b90090b

Version 1.2 SNAPSHOT: Add IoTDBFilterRule to push down filter and lea…

f203d42

…ve some bugs

Merge branch 'master' into dev_calcite1

1b7e17c

Merge branch 'master' into dev_calcite1

c7ecfcb

Version 1.2: Leave the global restriction bug

ab2e5c1

Merge branch 'master' into dev_calcite1

d454536

Version 1.3: Fix filter rule bugs

bad9f01

modify the way creating table and data type

5591069

fix some bugs

486a760

Add apache license

d2b917e

Merge branch 'master' into dev_calcite1

630f073

Add more comments and logback.xml

672bfb9

Fix a bug

4471494

Fix a bug

2f1ce53

Merge branch 'master' into dev_calcite1

d1483ec

Fix type

6a964cb

Merge branch 'master' into dev_calcite1

ff43860

IoTDB - Calcite Adapter

7348c21

Change POM

d848565

Fix typo

dcdf32e

Fix a little bug

198e3c9

Support constant column

d3621d7

Fix conflict

d73d7ee

Fix bug: show timeseries alias column

c24fbda

Alima777 force-pushed the dev_calcite1 branch from 33dcb3c to c24fbda April 27, 2020 13:22

Merge branch 'master' into dev_calcite1

bcb0d23

fix conflict

04db0cc

yuqi1129 reviewed Sep 14, 2020


Alima777 added 2 commits September 14, 2020 16:25

add exception processing and modify code style

1968107

add some test comment

6262d2b

yuqi1129 reviewed Sep 15, 2020


JulianFeinauer requested changes Sep 26, 2020


Alima777 and others added 3 commits September 27, 2020 16:51

fix conflict and modify pom

609d53e

merge master

6b4825c

Merge branch 'master' into dev_calcite1

e324d26

Merge branch 'master' into dev_calcite1

77eede7


