[CALCITE-4034] Implement a MySQL InnoDB adapter #1996
neoremind wants to merge 16 commits into apache:master from
Conversation
|
hi @neoremind, thank you for your work. I will take the time to review this PR. |
|
@XuQianJin-Stars many thanks 😃 |
|
@XuQianJin-Stars I have rebased master, did some refinement and updated |
hi @neoremind Thanks for adding the documentation description, the whole PR looks good, I need to take a moment to take a look at it as a whole. |
|
@XuQianJin-Stars No hurry, take your time, thanks very much! |
84e64ca to 1fbe785
|
LGTM. How about adding a test for SQL that is not within MySQL's syntax support but is supported through this adapter? |
|
@zinking Thanks for reviewing! Could you give me some testing SQL examples and maybe explain the meaning behind this? |
|
hi @neoremind |
|
XuQianJin-Stars
left a comment
In MySQL 5.6, `COMPACT` is the default row format. From MySQL 5.7 onward (including 8.0), `DYNAMIC` is the default row format. These two are the most popular row formats.

For `COMPRESSED`, it is not supported yet. Users who care about storage size rather than CPU load might choose this format. But IMHO, most MySQL users do not specify a row format when creating a table.

For the `FIXED` row format, it is rarely used. Refer to https://dev.mysql.com/doc/refman/5.7/en/create-table.html: "ROW_FORMAT=FIXED is not supported. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is disabled, InnoDB issues a warning and assumes ROW_FORMAT=DYNAMIC. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is enabled, which is the default, InnoDB returns an error."

For the `REDUNDANT` row format, it is a very old format from before MySQL 5.1.

For `extra`, there is no such row format. Valid row formats are {DEFAULT | DYNAMIC | FIXED | COMPRESSED | REDUNDANT | COMPACT}.

To conclude, the adapter supports the `COMPACT` and `DYNAMIC` formats, which are the most commonly used nowadays. I can add explanations in the `Limitation` section.

Well, I suggest adding the currently supported formats to the documentation.
    /** Scanning table fully with secondary key. */
    SK_FULL_SCAN(5);

    private int priority;
private int priority -> private final int priority ?
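The suggestion above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the enum name `ScanType` and the `priority()` accessor are hypothetical, and only the `SK_FULL_SCAN(5)` constant and the `priority` field come from the reviewed snippet.

```java
/** Hypothetical enum; only SK_FULL_SCAN(5) and the priority field come from the diff. */
enum ScanType {
  /** Scanning table fully with secondary key. */
  SK_FULL_SCAN(5);

  // 'final' makes the compiler enforce that the priority set in the
  // constructor is never reassigned, which suits an enum constant.
  private final int priority;

  ScanType(int priority) {
    this.priority = priority;
  }

  /** Hypothetical accessor, added so the field can be read in the example. */
  int priority() {
    return priority;
  }
}
```

Since enum constants are shared singletons, an immutable field avoids accidental changes that would be visible to every user of the constant.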
|
hi @neoremind What is the production usage scenario of this MySQL InnoDB Java Reader? |
|
hi @neoremind Sorry I haven't finished the review yet, I will continue to take the time to complete. This PR looks pretty good overall. |
|
@XuQianJin-Stars I have addressed the comments above. For the question: What is the production usage scenario of this MySQL InnoDB Java Reader? The by-pass querying capability can benefit the following scenarios:
|
fbfa506 to b5e1622
2d24440 to 08f42f7
1ca1bde to ea2e0fc
|
I have refactored some of the code to use the new API (in 1.25.0) to create and parameterize InnoDB planner rules. Please refer to https://issues.apache.org/jira/browse/CALCITE-3923. |
|
@XuQianJin-Stars I have addressed the comments from Julian (discussion in JIRA) and made the latest code compatible with 1.25.0; the binary files are not a concern anymore. Is there any other work to be done for this PR? I'd very much like to push the work forward. Many thanks! |
…r how planner rules are parameterized).
A better implementation of Sarg is possible. The current implementation can only handle Sargs that result in an AND (e.g. x >= 10 AND x <= 20). But we ought to handle Sargs that can result in an OR of ANDs. E.g. the SQL x BETWEEN 10 AND 20 OR x > 30 becomes a single RexCall SEARCH(x, Sarg([10, 20], (30, +inf))) and results in an OR of ANDs, '(x >= 10 AND x <= 20) OR (x > 30)'.
…able them before merge to master.
(It is not a good practice to use Optional for fields or parameters.)
1b233f9 to 43b8513
InnoDB is a storage engine for MySQL, but it can also be used as a standalone file format. This adapter adds a SQL interface to InnoDB that uses Calcite rather than MySQL.

This adapter handles Sarg by expanding to an OR of ranges. A better implementation of Sarg is probably possible. The current implementation can only handle Sargs that result in an AND (e.g. x >= 10 AND x <= 20). But we ought to handle Sargs that can result in an OR of ANDs. E.g. the SQL x BETWEEN 10 AND 20 OR x > 30 becomes a single RexCall SEARCH(x, Sarg([10, 20], (30, +inf))) and results in an OR of ANDs, '(x >= 10 AND x <= 20) OR (x > 30)'.

Tweaks (Julian Hyde):
* Add Holder.accept
* Make IndexCondition immutable
* Move computation out of InnodbFilter's constructor

Close apache#1996
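The "OR of ANDs" expansion described in the commit message can be illustrated with a small stand-alone sketch. It does not use Calcite's `Sarg` or `RexCall` classes: `SargExpansion` and `Interval` are hypothetical names, bounds are plain integers, and the output is just a predicate string.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone sketch; SargExpansion and Interval are hypothetical, not Calcite classes. */
final class SargExpansion {
  /** One interval on a column; a null bound means unbounded on that side. */
  static final class Interval {
    final Integer lower;
    final boolean lowerInclusive;
    final Integer upper;
    final boolean upperInclusive;

    Interval(Integer lower, boolean lowerInclusive, Integer upper, boolean upperInclusive) {
      this.lower = lower;
      this.lowerInclusive = lowerInclusive;
      this.upper = upper;
      this.upperInclusive = upperInclusive;
    }
  }

  /** Expands a list of intervals on one column into an OR of ANDs. */
  static String expand(String column, List<Interval> intervals) {
    if (intervals.isEmpty()) {
      return "FALSE"; // no interval can match any value
    }
    List<String> disjuncts = new ArrayList<>();
    for (Interval r : intervals) {
      List<String> conjuncts = new ArrayList<>();
      if (r.lower != null) {
        conjuncts.add(column + (r.lowerInclusive ? " >= " : " > ") + r.lower);
      }
      if (r.upper != null) {
        conjuncts.add(column + (r.upperInclusive ? " <= " : " < ") + r.upper);
      }
      // An interval unbounded on both sides matches everything.
      String conjunction = conjuncts.isEmpty() ? "TRUE" : String.join(" AND ", conjuncts);
      disjuncts.add("(" + conjunction + ")");
    }
    return String.join(" OR ", disjuncts);
  }
}
```

Under these assumptions, the intervals `[10, 20]` and `(30, +inf)` on column `x` expand to the predicate quoted in the commit message.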
https://issues.apache.org/jira/browse/CALCITE-4034
Calcite’s InnoDB adapter allows you to query data directly from InnoDB data files, also known as `.ibd` files. It leverages innodb-java-reader. This adapter is different from the JDBC adapter, which maps a schema in a JDBC data source and requires a MySQL server to serve responses. With `.ibd` files and the corresponding DDLs, the InnoDB adapter is able to work like a simple "MySQL server": it accepts SQL queries and attempts to compile each query against the InnoDB file access APIs provided by innodb-java-reader, exploiting projection, filtering and sorting directly in the InnoDB data file where possible.

What’s more, with DDLs, the adapter is "index aware": it leverages rules to choose the right index to scan, for example using the primary key or a secondary key to look up data, and then tries to push down some conditions into the storage engine. The adapter also accepts hints, so that the user can tell the optimizer to force the use of one specific index.

The InnoDB adapter can,

A basic example of a model file is given below; this schema reads from a MySQL "scott" database:

`sqlFilePath` is a list of DDL files. You can generate table definitions by executing `mysqldump -d -u<username> -p<password> -h <hostname> <dbname>` on the command line.

The file content of `/path/scott.sql` is given below:

`ibdDataFileBasePath` is the parent file path of the `.ibd` files.

Assuming the model file is stored as `model.json`, you can connect to the InnoDB data files and perform queries via sqlline as follows:

We can query all employees by writing standard SQL:

While executing this query, the InnoDB adapter scans the InnoDB data file `EMP.ibd` using the primary key, also known as the clustering B+ tree index in MySQL, and is able to push down projection to the underlying storage engine. Projection can reduce the size of data fetched from the storage engine.

We can look up one employee by filtering. The InnoDB adapter retrieves all indexes through the DDL file provided in `model.json`.

The InnoDB adapter is able to recognize that `empno` is the primary key and do a point lookup using the clustering index instead of a full table scan.

We can do range queries on the primary key as well.

Note that such a query with a reasonably bounded range is usually efficient in MySQL with the InnoDB storage engine, because in a clustering B+ tree index, records that are close in the index are also close in the data file, which is good for scanning.
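The difference between a point lookup and a range scan on a clustering index can be pictured with an ordered map. The sketch below is only an analogy: a `TreeMap` stands in for the B+ tree, the class name `ClusteredIndexSketch` is hypothetical, and nothing here calls the adapter or innodb-java-reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Analogy only: a TreeMap stands in for InnoDB's clustering B+ tree index. */
final class ClusteredIndexSketch {
  // Rows are kept in primary-key order, as they are in a clustering index.
  private final NavigableMap<Integer, String> rowsByEmpno = new TreeMap<>();

  void insert(int empno, String ename) {
    rowsByEmpno.put(empno, ename);
  }

  /** Point lookup: one descent through the ordered structure, no full scan. */
  String pointLookup(int empno) {
    return rowsByEmpno.get(empno);
  }

  /** Range scan: find the lower bound once, then read neighbouring keys in order. */
  List<String> rangeScan(int lowerInclusive, int upperInclusive) {
    return new ArrayList<>(
        rowsByEmpno.subMap(lowerInclusive, true, upperInclusive, true).values());
  }
}
```

Because neighbouring keys sit next to each other in the ordered structure, a bounded range touches only the entries between the two bounds, which is the property the note above relies on.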
We can look up an employee by secondary key. For example, the filtering condition will be on a `VARCHAR` field `ename`.

The InnoDB adapter works well with almost all the commonly used data types in MySQL. For more information on supported data types, please refer to innodb-java-reader.

We can query by composite key. For example, given the secondary index `DEPTNO_MGR_KEY`.

The InnoDB adapter will leverage the matched key `DEPTNO_MGR_KEY` to push down the filtering condition `deptno = 20 and mgr = 7566`.

In some cases, only part of the conditions can be pushed down, since there is a limitation in the underlying storage engine API, leaving the remaining conditions in the rest of the plan. Given the below SQL, only `deptno = 20` is pushed down.

`innodb-java-reader` only supports range queries with a lower and an upper bound using an index, not full `Index Condition Pushdown (ICP)`. The storage engine returns a range of rows and Calcite evaluates the rest of the `WHERE` condition on the rows fetched.

For the below SQL, there are multiple indexes satisfying the left-prefix index rule. The possible indexes are `DEPTNO_JOB_KEY`, `DEPTNO_SAL_COMM_KEY` and `DEPTNO_MGR_KEY`; the InnoDB adapter will choose one of them according to the ordinal defined in the DDL. Only the `deptno = 20` condition is pushed down, leaving the rest of the `WHERE` condition to be handled by Calcite's built-in execution engine.

Accessing rows through a secondary key requires scanning the secondary index and then looking the records up in the clustering index. For a "big" scan, that introduces many random I/O operations, so performance is usually not good enough. Note that the query above can be more performant using the `DEPTNO_SAL_COMM_KEY` index, because a covering index does not need to go back to the clustering index. We can force the use of the `DEPTNO_SAL_COMM_KEY` index with a hint, as below.

Hints can be configured in `SqlToRelConverter`. To enable hints, you should register an `index` HintStrategy on `TableScan` in `SqlToRelConverter.ConfigBuilder`. An index hint takes effect on the base `TableScan` relational node; if there are conditions matching the index, the index condition can be pushed down as well. For the below SQL, although none of the indexes can be used for filtering, scanning a covering index performs better than a full table scan, so we can force the use of `DEPTNO_MGR_KEY` to scan the secondary index.

Ordering can be pushed down if it matches the natural collation of the index used.
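The left-prefix rule and the choice among several matching indexes can be sketched as below. This is not the adapter's actual rule implementation: the class `LeftPrefixIndexChooser` is hypothetical, only equality filters are considered, and it assumes "according to the ordinal defined in DDL" means the earliest matching index wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of left-prefix matching; not the adapter's real planner rule. */
final class LeftPrefixIndexChooser {
  /** Returns the leading index columns that all have an equality filter. */
  static List<String> pushablePrefix(List<String> indexColumns, Set<String> equalityColumns) {
    List<String> prefix = new ArrayList<>();
    for (String column : indexColumns) {
      if (!equalityColumns.contains(column)) {
        break; // a gap in the prefix stops push-down for the remaining columns
      }
      prefix.add(column);
    }
    return prefix;
  }

  /**
   * Picks the first index, in the iteration order of the map (assumed to be DDL
   * order), whose leftmost column is filtered. Returns null if no index matches.
   */
  static String choose(Map<String, List<String>> indexesInDdlOrder, Set<String> equalityColumns) {
    for (Map.Entry<String, List<String>> entry : indexesInDdlOrder.entrySet()) {
      if (!pushablePrefix(entry.getValue(), equalityColumns).isEmpty()) {
        return entry.getKey();
      }
    }
    return null;
  }
}
```

For example, if `DEPTNO_MGR_KEY` is assumed to cover `(deptno, mgr)`, filters on both columns give a pushable prefix of both, while a filter on `deptno` alone matches every index that starts with `deptno` and the first one in DDL order is chosen. A hint, as described above, would override that default choice.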
Limitations

`innodb-java-reader` has some prerequisites for `.ibd` files; please refer to Prerequisites.

You can think of the adapter as a simple MySQL server, with the ability to query and dump data by offloading work from the MySQL process under some conditions. If pages have not been flushed from the InnoDB Buffer Pool to disk, then the result may be inconsistent (the LSN in the `.ibd` file might be smaller than that of the in-memory pages). InnoDB uses a write-ahead log for performance, so there is no command available to flush all dirty pages. Only internal mechanisms, such as the Page Cleaner thread and adaptive flushing, manage when and where to persist pages to disk.

Currently the InnoDB adapter is not aware of the row count and cardinality of a `.ibd` data file, so it relies only on simple rules to perform optimization. Once the underlying storage engine can provide such metrics and metadata, they can be integrated into Calcite to enable cost-based optimization in the future.