MySQLOnRocksDB/mysql-5.6
forked from facebook/mysql-5.6

Loading…
Inefficient index condition pushdown #67
Debugging, I see these calls:
Breakpoint 2, ha_rocksdb::index_read_map (this=0x7fff28008640, buf=0x7fff28008b20 "", key=0x7fff28036150 "", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:2835
...
Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
Breakpoint 1, ha_rocksdb::index_next (this=0x7fff28008640, buf=0x7fff28008b20 "") at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:3487
Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
....
it looks like ICP code attempts to check the index condition after walking out of the range, and ends up scanning the index until the end of the table.
ICP code has logic to prevent this, but it is not working for some reason.
The mentioned logic is here in ha_rocksdb::check_index_cond:
if (end_range && compare_key_icp(end_range) > 0)
{
/* caller should return HA_ERR_END_OF_FILE already */
The problem is, for equality lookups, end_range=NULL and the logic is not working.
Looking at how it is done in InnoDB: row_search_for_mysql() has this piece:
if (match_mode == ROW_SEL_EXACT) {
/* Test if the index record matches completely to search_tuple
in prebuilt: if not, then we return with DB_RECORD_NOT_FOUND */
/* fputs("Comparing rec and search tuple\n", stderr); */
if (0 != cmp_dtuple_rec(search_tuple, rec, offsets)) {
...
err = DB_RECORD_NOT_FOUND;
...
goto normal_return;
They also have a similar block for handling match_mode == ROW_SEL_EXACT_PREFIX.
ROW_SEL_.... are innodb's internal constants that are assigned like so:
if (find_flag == HA_READ_KEY_EXACT) {
match_mode = ROW_SEL_EXACT;
} else if (find_flag == HA_READ_PREFIX
|| find_flag == HA_READ_PREFIX_LAST) {
match_mode = ROW_SEL_EXACT_PREFIX;
}
https://gist.github.com/spetrunia/a16d337608e1f30095ca -- the patch is here. What is currently missing is the testcases. I'll have to revisit the problem of counters not counting index scan accesses.
Storage API and its implementation are weird. The API for index_next_same passes key/keylen:
virtual int index_next_same(uchar *buf, const uchar *key, uint keylen);
However, InnoDB chooses to ignore it and reuse the lookup tuple it has saved in ::index_read_map :
ha_innobase::index_next_same(
/*=========================*/
uchar* buf, /*!< in/out: buffer for the row */
const uchar* key, /*!< in: key value */
uint keylen) /*!< in: key value length */
{
ha_statistic_increment(&SSV::ha_read_next_count);
return(general_fetch(buf, ROW_SEL_NEXT, last_match_mode));
}
note how key/key_len are ignored.
My patch follows InnoDB's approach.
Found a way to add a testcase. The patch is here: https://reviews.facebook.net/D38769
|
|
spetrunia |
Issue #67: Inefficient index condition pushdown
…
Summary: Inside index_next_same() call, we should 1. first check whether the record matches the index lookup prefix, 2. then check pushed index condition. If we try to check #2 without checking #1 first, we may walk off the index lookup prefix and scan till the end of the index. Test Plan: Run mtr Reviewers: hermanlee4, maykov, jtolmer, yoshinorim Reviewed By: yoshinorim Differential Revision: https://reviews.facebook.net/D38769 |
0fe186f
|
|
|
spetrunia |
Issue #67: Inefficient index condition pushdown
…
Summary: Inside index_next_same() call, we should 1. first check whether the record matches the index lookup prefix, 2. then check pushed index condition. If we try to check #2 without checking #1 first, we may walk off the index lookup prefix and scan till the end of the index. Test Plan: Run mtr Reviewers: hermanlee4, maykov, jtolmer, yoshinorim Reviewed By: yoshinorim Differential Revision: https://reviews.facebook.net/D38769 |
32c3ac8
|
Here is a test case to repeat inefficient index condition push down.
CREATE TABLE w (
idbigint(10) NOT NULL AUTO_INCREMENT,c1bigint(20) NOT NULL DEFAULT '0',c2bigint(20) NOT NULL DEFAULT '0',c3int(1) NOT NULL DEFAULT '0',c4int(10) NOT NULL DEFAULT '0',c5text CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,c6int(11) NOT NULL DEFAULT '0',c7tinyint(1) unsigned NOT NULL DEFAULT '0',c8varchar(40) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,c9bigint(20) NOT NULL DEFAULT '0',c10bigint(20) NOT NULL DEFAULT '0',c11bigint(20) NOT NULL DEFAULT '0',c12text CHARACTER SET latin1 COLLATE latin1_bin,c13int(10) unsigned DEFAULT NULL,c14bigint(20) unsigned DEFAULT NULL,c15int(10) unsigned DEFAULT '0',PRIMARY KEY (
id) COMMENT 'cf_default',KEY
i1(c2,c1,c6) COMMENT 'cf_default',KEY
i2(c8) COMMENT 'cf_default',KEY
i3(c14) COMMENT 'cf_default',KEY
i4(c1,id) COMMENT 'cf_default',KEY
i5(c1,c6) COMMENT 'cf_default') ENGINE=ROCKSDB;
Loading 10 million rows
for(my $i= 1; $i < 1000*10000; $i++) {
print "$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i\n";
}
load data local infile '/tmp/10m.dat' into table w fields terminated by ',';
optimize table w;
mysql> explain select * from w where c14=1;
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
| 1 | SIMPLE | w | ref | i3 | i3 | 9 | const | 10 | Using index condition |
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
1 row in set (0.00 sec)
mysql> select * from w where c14=1;
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 |
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
1 row in set (2.37 sec) <-- this is too long.
If not using index condition pushdown, query finished quickly.
mysql> select id from w where c14=1;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)
mysql> select * from w where id=1;
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 |
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
1 row in set (0.00 sec)