[CALCITE-4034] Implement a MySQL InnoDB adapter by neoremind · Pull Request #1996 · apache/calcite

neoremind · 2020-05-31T15:49:02Z

https://issues.apache.org/jira/browse/CALCITE-4034

Calcite’s InnoDB adapter allows you to query the data based on InnoDB data files directy, data files are also known as .ibd files. It leverages innodb-java-reader. This adapter is different from JDBC adapter which maps a schema in a JDBC data source and requires a MySQL server to serve response. With .ibd files and the corresponding DDLs, InnoDB adapter is able to work like a simple "MySQL server", it accepts SQL query and attempts to compile the query based on InnoDB file accessing APIs provided by innodb-java-reader, it will exploit projecting, filtering and sorting directly in InnoDB data file where possible.

What’s more, with DDLs, the adapter is "index aware", it leverages rules to choose the right index to scan, for example, using primary key or secondary keys to look up data, then it tries to push down some conditions into storage engine. Also, the adapter leaves option to provide hint as well, so that user can indicate the optimizer to force use one specific index.

The InnoDB adapter can,

Full scan a table.
Scan by index such as primary key or secondary key, single column or composite keys.
Push down filtering condition to storage engine, support both point query and range query.
Push down projection.
Push down ordering if it matches the natural collation of an index.
Support almost all the commonly used data types.
Work as a simple MySQL server being able to read .ibd data files directly through file system, this can be a tool to query or dump table data by offloading from MySQL process under some conditions.

A basic example of a model file is given below, this schema reads from a MySQL "scott" database:

{
  "version": "1.0",
  "defaultSchema": "scott",
  "schemas": [
    {
      "name": "scott",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.innodb.InnodbSchemaFactory",
      "operand": {
        "sqlFilePath": [ "/path/scott.sql" ],
        "ibdDataFileBasePath": "/usr/local/mysql/data/scott"
      }
    }
  ]
}

sqlFilePath is a list of DDL files, you can generate table definitions by executing mysqldump -d -u<username> -p<password> -h <hostname> <dbname> in command-line.

The file content of /path/scott.sql is given below:

CREATE TABLE `DEPT`(
`DEPTNO` TINYINT NOT NULL,
`DNAME` VARCHAR(50) NOT NULL,
`LOC` VARCHAR(20),
UNIQUE KEY `DEPT_PK` (`DEPTNO`)
)ENGINE=InnoDB DEFAULT CHARSET=latin1;

{% highlight bash %}
CREATE TABLE `EMP`(
`EMPNO` INT(11) NOT NULL,
`ENAME` VARCHAR(100) NOT NULL,
`JOB` VARCHAR(15) NOT NULL,
`AGE` SMALLINT,
`MGR` BIGINT,
`HIREDATE` DATE,
`SAL` DECIMAL(8,2) NOT NULL,
`COMM` DECIMAL(6,2),
`DEPTNO` TINYINT,
`EMAIL` VARCHAR(100) DEFAULT NULL,
`CREATE_DATETIME` DATETIME,
`CREATE_TIME` TIME,
`UPSERT_TIME` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`EMPNO`),
KEY `ENAME_KEY` (`ENAME`),
KEY `HIREDATE_KEY` (`HIREDATE`),
KEY `CREATE_DATETIME_JOB_KEY` (`CREATE_DATETIME`, `JOB`),
KEY `CREATE_TIME_KEY` (`CREATE_TIME`),
KEY `UPSERT_TIME_KEY` (`UPSERT_TIME`),
KEY `DEPTNO_JOB_KEY` (`DEPTNO`, `JOB`),
KEY `DEPTNO_SAL_COMM_KEY` (`DEPTNO`, `SAL`, `COMM`),
KEY `DEPTNO_MGR_KEY` (`DEPTNO`, `MGR`),
KEY `AGE_KEY` (`AGE`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

ibdDataFileBasePath is the parent file path of .ibd files.

Assuming the model file is stored as model.json, you can connect to InnoDB data file to perform query via sqlline as follows:

sqlline> !connect jdbc:calcite:model=model.json admin admin

We can query all employees by writing standard SQL:

sqlline> select empno,ename,job,age,mgr from "EMP";
+-------+--------+-----------+-----+------+
| EMPNO | ENAME  |    JOB    | AGE | MGR  |
+-------+--------+-----------+-----+------+
| 7369  | SMITH  | CLERK     | 30  | 7902 |
| 7499  | ALLEN  | SALESMAN  | 24  | 7698 |
| 7521  | WARD   | SALESMAN  | 41  | 7698 |
| 7566  | JONES  | MANAGER   | 28  | 7839 |
| 7654  | MARTIN | SALESMAN  | 27  | 7698 |
| 7698  | BLAKE  | MANAGER   | 38  | 7839 |
| 7782  | CLARK  | MANAGER   | 32  | 7839 |
| 7788  | SCOTT  | ANALYST   | 45  | 7566 |
| 7839  | KING   | PRESIDENT | 22  | null |
| 7844  | TURNER | SALESMAN  | 54  | 7698 |
| 7876  | ADAMS  | CLERK     | 35  | 7788 |
| 7900  | JAMES  | CLERK     | 40  | 7698 |
| 7902  | FORD   | ANALYST   | 28  | 7566 |
| 7934  | MILLER | CLERK     | 32  | 7782 |
+-------+--------+-----------+-----+------+

While executing this query, the InnoDB adapter scans the InnoDB data file EMP.ibd using primary key, also known as clustering B+ tree index in MySQL, and is able to
push down projection to underlying storage engine. Projection can reduce the size of data fetched from the storage engine.

We can look up one employee by filtering. The InnoDB adapter retrieves all indexes through DDL file provided in model.json.

sqlline> select empno,ename,job,age,mgr from "EMP" where empno = 7782;
+-------+-------+---------+-----+------+
| EMPNO | ENAME |   JOB   | AGE | MGR  |
+-------+-------+---------+-----+------+
| 7782  | CLARK | MANAGER | 32  | 7839 |
+-------+-------+---------+-----+------+

The InnoDB adapter is able to recognize that empno is the primary key and do a point-lookup by using clustering index instead of a full table scan.

We can do range query on primary key as well.

sqlline> select empno,ename,job,age,mgr from "EMP" where empno > 7782 and empno < 7900;

Note that such query with acceptable range is usually efficient in MySQL with InnoDB storage engine, because for clustering B+ tree index, records close in index are close in data file, which is good for scanning.

We can look up employee by secondary key. For example, the filtering condition will be on a VARCHAR field ename.

sqlline> select empno,ename,job,age,mgr from "EMP" where ename = 'smith';
+-------+-------+-------+-----+------+
| EMPNO | ENAME |  JOB  | AGE | MGR  |
+-------+-------+-------+-----+------+
| 7369  | SMITH | CLERK | 30  | 7902 |
+-------+-------+-------+-----+------+

The InnoDB adapter works well on almost all the commonly used data types in MySQL, for more information on supported data types, please refer to innodb-java-reader.

We can query by composite key. For example, given secondary index of DEPTNO_MGR_KEY.

sqlline> select empno,ename,job,age,mgr from "EMP" where deptno = 20 and mgr = 7566;
+-------+-------+---------+-----+------+
| EMPNO | ENAME |   JOB   | AGE | MGR  |
+-------+-------+---------+-----+------+
| 7788  | SCOTT | ANALYST | 45  | 7566 |
| 7902  | FORD  | ANALYST | 28  | 7566 |
+-------+-------+---------+-----+------+

The InnoDB adapter will leverage the matched key DEPTNO_MGR_KEY to push down filtering condition of deptno = 20 and mgr = 7566.

In some cases, only part of the conditions can be pushed down since there is a limitation in the underlying storage engine API, leaving unpushed remainder conditions in the rest of the plan. Given the below SQL, only deptno = 20 is pushed down.

select empno,ename,job,age,mgr from "EMP" where deptno = 20 and upsert_time > '2018-01-01 00:00:00';

innodb-java-reader only supports range query with lower and upper bound using an index, not fully Index Condition Pushdown (ICP). The storage engine returns a range of rows and Calcite will evaluates the rest of WHERE condition from the rows fetched.

For the below SQL, there are multiple indexes satisfying the left-prefix index rule, the possible indexes are DEPTNO_JOB_KEY, DEPTNO_SAL_COMM_KEY and DEPTNO_MGR_KEY, the Innod adapter will choose one of them according to the ordinal defined in DDL, only deptno = 20 condition is pushed down, leaving the rest of WHERE condition handled by Calcite built-in execution engine.

sqlline> select empno,deptno,sal from "EMP" where deptno = 20 and sal > 2000;
+-------+--------+---------+
| EMPNO | DEPTNO |   SAL   |
+-------+--------+---------+
| 7788  | 20     | 3000.00 |
| 7902  | 20     | 3000.00 |
| 7566  | 20     | 2975.00 |
+-------+--------+---------+

Accessing rows through secondary key requires scanning by secondary index and retrieving records back to clustering index in InnoDB, for a "big" scan, that would introduce many random I/O operations, so performance is usually not good enough. Note that the query above can be more performant by using EPTNO_SAL_COMM_KEY index, because covering index does not need to retrieve back to clustering index. We can force using DEPTNO_SAL_COMM_KEY index by hint as below.

sqlline> select empno,deptno,sal from "EMP"/*+ index(DEPTNO_SAL_COMM_KEY) */ where deptno = 20 and sal > 2000;

Hint can be configured in SqlToRelConverter, to enable hint, you should register index HintStrategy on TableScan in SqlToRelConverter.ConfigBuilder. Index hint takes effect on the base TableScan relational node, if there are conditions matching the index, index condition can be pushed down as well. For the below SQL, although none of the indexes can be used, but by leveraging covering index, the performance is better than full table scan, we can force to use DEPTNO_MGR_KEY to scan in secondary index.

sqlline> select empno,mgr from "EMP"/*+ index(DEPTNO_MGR_KEY) */ where mgr = 7839;

Ordering can be pushed down if it matches the natural collation of the index used.

sqlline> select deptno,ename,hiredate from "EMP" where hiredate < '2020-01-01' order by hiredate desc;
+--------+--------+------------+
| DEPTNO | ENAME  |  HIREDATE  |
+--------+--------+------------+
| 20     | ADAMS  | 1987-05-23 |
| 20     | SCOTT  | 1987-04-19 |
| 10     | MILLER | 1982-01-23 |
| 20     | FORD   | 1981-12-03 |
| 30     | JAMES  | 1981-12-03 |
| 10     | KING   | 1981-11-17 |
| 30     | MARTIN | 1981-09-28 |
| 30     | TURNER | 1981-09-08 |
| 10     | CLARK  | 1981-06-09 |
| 30     | WARD   | 1981-02-22 |
| 30     | ALLEN  | 1981-02-20 |
| 20     | JONES  | 1981-02-04 |
| 30     | BLAKE  | 1981-01-05 |
| 20     | SMITH  | 1980-12-17 |
+--------+--------+------------+

Limitations

innodb-java-reader has some prerequisites for .ibd files, please refer to Prerequisites.

You can think of the adapter as a simple MySQL server, with the ability to query, dump data by offloading from MySQL process under some conditions. If pages are not flushed from InnoDB Buffer Pool to disk, then the result may be inconsistent (the LSN in .ibd file might smaller than in-memory pages). InnoDB leverages write ahead log in terms of performance, so there is no command available to flush all dirty pages. Only internal mechanism manages when and where to persist pages to disk, like Page Cleaner thread, adaptive flushing, etc.

Currently the InnoDB adapter does not aware row count and cardinality of a .ibd data file, so it will only rely on simple rules to perform optimization, once underlying storage engine could provide such metrics and metadata, this can be integrated in Calcite by leveraging cost based optimization in the future.

XuQianJin-Stars · 2020-06-01T14:25:25Z

hi @neoremind，Thank for you work, I will take the time to review this pr.

neoremind · 2020-06-01T15:02:27Z

@XuQianJin-Stars many thanks 😃

neoremind · 2020-06-04T16:36:55Z

@XuQianJin-Stars I have rebased master, did some refinement and updated innodb_adapter.md, for how the adapter works, please refer to innodb_adapter.md, I think this doc explains most of the design and use cases.

XuQianJin-Stars · 2020-06-09T13:08:01Z

please

hi @neoremind Thanks for adding the documentation description, the whole PR looks good, I need to take a moment to take a look at it as a whole.

neoremind · 2020-06-09T14:06:29Z

@XuQianJin-Stars No hurry, take your time, thanks very much!

zinking · 2020-06-29T15:05:24Z

LGTM, how about adding a test that isn't within mysql's SQL syntax support but gets supported through this adapter ?

neoremind · 2020-06-30T02:11:16Z

@zinking Thanks for reviewing! Could you give me some testing SQL examples and maybe explain the meaning behind this?

XuQianJin-Stars · 2020-06-30T13:57:57Z

hi @neoremind
Have you considered that row_format has some other values:
default fixed dynamic compressed extra compact

neoremind · 2020-06-30T14:34:08Z

In MySQL 5.6, COMPACT is the default row format. After MySQL 5.7 (include 8.0), DYNAMIC is the default row format. The two are the most popular row formats.

For COMPRESSED, it is not supported yet. Users who cares about storage size rather than CPU load might choose this format. But IMHO, most MySQL users do not specify row format when creating table.

For FIXED row format, it is rarely used. Refer to https://dev.mysql.com/doc/refman/5.7/en/create-table.html

ROW_FORMAT=FIXED is not supported. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is disabled, InnoDB issues a warning and assumes ROW_FORMAT=DYNAMIC. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is enabled, which is the default, InnoDB returns an error.

For REDUNDANT row format, it is an very old format before MySQL 5.1.

For extra, there is no such row format. Valid row formats are {DEFAULT | DYNAMIC | FIXED | COMPRESSED | REDUNDANT | COMPACT}

To conclude, the adapter supports COMPACT and DYNAMIC format which are most commonly used nowadays. I can add explanations in Limitation section.

XuQianJin-Stars

In MySQL 5.6, COMPACT is the default row format. After MySQL 5.7 (include 8.0), DYNAMIC is the default row format. The two are the most popular row formats.

For COMPRESSED, it is not supported yet. Users who cares about storage size rather than CPU load might choose this format. But IMHO, most MySQL users do not specify row format when creating table.

For FIXED row format, it is rarely used. Refer to https://dev.mysql.com/doc/refman/5.7/en/create-table.html
ROW_FORMAT=FIXED is not supported. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is disabled, InnoDB issues a warning and assumes ROW_FORMAT=DYNAMIC. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is enabled, which is the default, InnoDB returns an error.
For REDUNDANT row format, it is an very old format before MySQL 5.1.

For extra, there is no such row format. Valid row formats are {DEFAULT | DYNAMIC | FIXED | COMPRESSED | REDUNDANT | COMPACT}

To conclude, the adapter supports COMPACT and DYNAMIC format which are most commonly used nowadays. I can add explanations in Limitation section.

well, I suggest to add the currently supported format in the document.

XuQianJin-Stars · 2020-06-30T14:17:58Z

innodb/src/main/java/org/apache/calcite/adapter/innodb/QueryType.java

+  /** Scanning table fully with secondary key. */
+  SK_FULL_SCAN(5);
+
+  private int priority;


private int priority -> private final int priority ?

XuQianJin-Stars · 2020-07-01T01:40:01Z

hi @neoremind What is the production usage scenario of this MySQL InnoDB Java Reader?

XuQianJin-Stars · 2020-07-01T01:43:36Z

hi @neoremind Sorry I haven't finished the review yet, I will continue to take the time to complete. This PR looks pretty good overall.

neoremind · 2020-07-01T10:15:12Z

@XuQianJin-Stars I have addressed the comments above.

For the question: What is the production usage scenario of this MySQL InnoDB Java Reader?
Calcite's InnoDB adapter allows user to query the InnoDB data based on ibd files directly as illustrated below. innodb-java-reader provides some basic storage APIs, for example, queryByPrimaryKey, queryAll and queryBySk, etc. With this adapter, user is able to query through SQL, and the internal embedded execution engine can work pretty well under most cases.

                      SQL Query 
                       |     |
                      /       \
                ------         ------
               |                     | 
               v                     v
      MySQL Server                    
+-----------------------+   +----------------------+        
| SQL/Parser/Optimizer..|   |Calcite innodb adapter|   
+-----------------------+   +----------------------+
+-----------------------+    +--------------------+         
| InnoDB Storage Engine |    | innodb-java-reader |
+-----------------------+    +--------------------+

---------------------File System--------------------

	+------------+      +-----+
	| .ibd files | ...  |     |    InnoDB Data files
	+------------+      +-----+

The by-pass querying capability can benefit the following scenarios:

Query by offloading mysql-server process, even without mysql-server running. Backup MySQL without disturbing mysql-server process, this is a good alternative to dump data besides mysql-dump and mysql -e command.
InnoDB data file can be distributed to other storage. Based on this adapter, users can transform as what ever they wanted, for example, transforming data to columnar format (ORC, Parquet) for OLAP engines; dumping data to heterogeneous NewSQL systems, so that Data Lake analytics can leverage this. Although real-time is hard to achieve because we only get a snapshot of the table, it is still useful in Lambda architecture.

neoremind · 2020-08-17T07:48:16Z

I have refactored some of the code to use the new API (in 1.25.0), to create and parameterized innodb planner rules. Please refer to https://issues.apache.org/jira/browse/CALCITE-3923.

neoremind · 2020-08-17T08:25:57Z

@XuQianJin-Stars I have addressed the comments from Julian (discussion in JIRA), and made the latest code compatible with 1.25.0, the binary files are not a concern anymore. Are there any other works to be done for this PR? I 'd very much like to push forward the work. Many thanks!

…r how planner rules are parameterized).

Add Holder.accept

A better implementation of Sarg is possible. The current implementation can only handle Sargs that result in an AND, e.g. x >= 10 AND x <= 20). But we ought to handle Sargs that can result in an OR of ANDs. E.g. the SQL x BETWEEN 10 AND 20 OR c > 30 becomes a single RexCall SEARCH(x, Sarg([10, 20], (30, +inf))) and results in an OR of ANDs, '(x >= 10 AND x <= 20) OR (x > 30)'.

…able them before merge to master.

(It is not a good practice to use Optional for fields or parameters.)

…nsform

…den methods

InnoDB is a storage engine for MySQL, but it can also be used as a standlone file format. This adapter adds a SQL interface to InnoDB that uses Calcite rather than MySQL. This adapter handles Sarg by expanding to an OR of ranges. A better implementation of Sarg is probably possible. The current implementation can only handle Sargs that result in an AND, e.g. x >= 10 AND x <= 20). But we ought to handle Sargs that can result in an OR of ANDs. E.g. the SQL x BETWEEN 10 AND 20 OR c > 30 becomes a single RexCall SEARCH(x, Sarg([10, 20], (30, +inf))) and results in an OR of ANDs, '(x >= 10 AND x <= 20) OR (x > 30)'. Tweaks (Julian Hyde): * Add Holder.accept * Make IndexCondition immutable * Move computation out of InnodbFilter's constructor Close apache#1996

neoremind force-pushed the innodb branch from 4c2f1e2 to eab1537 Compare June 1, 2020 13:39

neoremind force-pushed the innodb branch from eab1537 to 7055bfa Compare June 4, 2020 16:26

neoremind force-pushed the innodb branch from 1912941 to 8e3250b Compare June 14, 2020 08:43

neoremind force-pushed the innodb branch 2 times, most recently from 84e64ca to 1fbe785 Compare June 27, 2020 09:35

XuQianJin-Stars reviewed Jul 1, 2020

View reviewed changes

neoremind force-pushed the innodb branch 2 times, most recently from fbfa506 to b5e1622 Compare July 13, 2020 08:20

neoremind force-pushed the innodb branch from d6b613b to 85b39d5 Compare July 16, 2020 09:00

vlsi force-pushed the master branch from e4458fa to 5462be9 Compare July 17, 2020 20:19

neoremind force-pushed the innodb branch 2 times, most recently from 2d24440 to 08f42f7 Compare July 21, 2020 02:54

neoremind force-pushed the innodb branch 2 times, most recently from 1ca1bde to ea2e0fc Compare July 27, 2020 08:49

neoremind force-pushed the innodb branch from ea2e0fc to aa81073 Compare August 17, 2020 03:28

neoremind force-pushed the innodb branch from a025939 to 7792c98 Compare September 15, 2020 13:01

julianhyde and others added 12 commits September 28, 2020 08:34

Refactor: move CassandraRules.reverseDirection into Direction

554019b

squash commits and rebase master.

256ad18

Make changes of rules based on 1.25.0 new API ([CALCITE-3923] Refacto…

e0da108

…r how planner rules are parameterized).

fixup

858acaa

Add Holder.accept

Make HintStrategyTable immutable

9260ef1

Disable 5 tests. It seems that index hints are being ignored. Must en…

9d8c712

…able them before merge to master.

Make IndexCondition immutable

39bb7ba

Remove Optional

9ac63a6

(It is not a good practice to use Optional for fields or parameters.)

Use suppliers in InnodbTable

fadd963

Move computation out of InnodbFilter's constructor

7eb1f99

Fix test cases related to hint

5701459

neoremind force-pushed the innodb branch from 7792c98 to 5701459 Compare October 10, 2020 06:58

use org.apache.calcite.util.Util#transform instead of guava Lists.tra…

112a6fb

…nsform

neoremind force-pushed the innodb branch 2 times, most recently from 1b233f9 to 43b8513 Compare October 10, 2020 08:29

Fix CI failure of Error Prone (JDK 11), add (at)overwrite for overrid…

265cb35

…den methods

neoremind force-pushed the innodb branch from 43b8513 to 265cb35 Compare October 10, 2020 09:11

Fix CI failure of Error Prone (JDK 11), add (at)overwrite for overrid…

cfc332d

…den methods

neoremind force-pushed the innodb branch from 4068109 to cfc332d Compare October 10, 2020 09:28

Suppress Warnings for Method class

c348304

vlsi force-pushed the master branch from 5a338ff to 8df4df0 Compare October 10, 2020 16:14

julianhyde closed this in ebefe52 Oct 11, 2020

Conversation

neoremind commented May 31, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Limitations

Uh oh!

XuQianJin-Stars commented Jun 1, 2020

Uh oh!

neoremind commented Jun 1, 2020

Uh oh!

neoremind commented Jun 4, 2020

Uh oh!

XuQianJin-Stars commented Jun 9, 2020

Uh oh!

neoremind commented Jun 9, 2020

Uh oh!

zinking commented Jun 29, 2020

Uh oh!

neoremind commented Jun 30, 2020

Uh oh!

XuQianJin-Stars commented Jun 30, 2020

Uh oh!

neoremind commented Jun 30, 2020

Uh oh!

XuQianJin-Stars left a comment

Choose a reason for hiding this comment

Uh oh!

XuQianJin-Stars Jun 30, 2020

Choose a reason for hiding this comment

Uh oh!

XuQianJin-Stars commented Jul 1, 2020

Uh oh!

XuQianJin-Stars commented Jul 1, 2020

Uh oh!

neoremind commented Jul 1, 2020

Uh oh!

neoremind commented Aug 17, 2020

Uh oh!

neoremind commented Aug 17, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

neoremind commented May 31, 2020 •

edited

Loading