Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Clone in Desktop Download ZIP

Loading…

Inefficient index condition pushdown #67

Closed
yoshinorim opened this Issue · 5 comments

2 participants

@yoshinorim
Owner

Here is a test case to repeat inefficient index condition push down.

CREATE TABLE w (
id bigint(10) NOT NULL AUTO_INCREMENT,
c1 bigint(20) NOT NULL DEFAULT '0',
c2 bigint(20) NOT NULL DEFAULT '0',
c3 int(1) NOT NULL DEFAULT '0',
c4 int(10) NOT NULL DEFAULT '0',
c5 text CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
c6 int(11) NOT NULL DEFAULT '0',
c7 tinyint(1) unsigned NOT NULL DEFAULT '0',
c8 varchar(40) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
c9 bigint(20) NOT NULL DEFAULT '0',
c10 bigint(20) NOT NULL DEFAULT '0',
c11 bigint(20) NOT NULL DEFAULT '0',
c12 text CHARACTER SET latin1 COLLATE latin1_bin,
c13 int(10) unsigned DEFAULT NULL,
c14 bigint(20) unsigned DEFAULT NULL,
c15 int(10) unsigned DEFAULT '0',
PRIMARY KEY (id) COMMENT 'cf_default',
KEY i1 (c2,c1,c6) COMMENT 'cf_default',
KEY i2 (c8) COMMENT 'cf_default',
KEY i3 (c14) COMMENT 'cf_default',
KEY i4 (c1,id) COMMENT 'cf_default',
KEY i5 (c1,c6) COMMENT 'cf_default'
) ENGINE=ROCKSDB;

Loading 10 million rows
for(my $i= 1; $i < 1000*10000; $i++) {
print "$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i,$i\n";
}

load data local infile '/tmp/10m.dat' into table w fields terminated by ',';
optimize table w;

mysql> explain select * from w where c14=1;
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
| 1 | SIMPLE | w | ref | i3 | i3 | 9 | const | 10 | Using index condition |
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
1 row in set (0.00 sec)

mysql> select * from w where c14=1;
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 |
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
1 row in set (2.37 sec) <-- this is too long.

If not using index condition pushdown, query finished quickly.
mysql> select id from w where c14=1;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)
mysql> select * from w where id=1;
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 |
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
+----+----+----+----+----+----+----+----+------+----+-----+-----+------+------+------+------+
1 row in set (0.00 sec)

@spetrunia
Collaborator

Debugging, I see these calls:

 Breakpoint 2, ha_rocksdb::index_read_map (this=0x7fff28008640, buf=0x7fff28008b20 "", key=0x7fff28036150 "", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:2835
 ... 
  Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111

  Breakpoint 1, ha_rocksdb::index_next (this=0x7fff28008640, buf=0x7fff28008b20 "") at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:3487

  Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
  Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
  Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
  Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
  Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
  Breakpoint 3, ha_rocksdb::check_index_cond (this=0x7fff28008640) at /home/psergey/dev-git/mysql-5.6-rocksdb-look-4/storage/rocksdb/ha_rocksdb.cc:5111
....

it looks like ICP code attempts to check the index condition after walking out of the range, and ends up scanning the index until the end of the table.

ICP code has logic to prevent this, but it is not working for some reason.

@spetrunia
Collaborator

The mentioned logic is here in ha_rocksdb::check_index_cond:

  if (end_range && compare_key_icp(end_range) > 0)
  {
    /* caller should return HA_ERR_END_OF_FILE already */

The problem is, for equality lookups, end_range=NULL and the logic is not working.

Looking at how it is done in InnoDB: row_search_for_mysql() has this piece:

    if (match_mode == ROW_SEL_EXACT) {
        /* Test if the index record matches completely to search_tuple
        in prebuilt: if not, then we return with DB_RECORD_NOT_FOUND */

        /* fputs("Comparing rec and search tuple\n", stderr); */

        if (0 != cmp_dtuple_rec(search_tuple, rec, offsets)) {
...
            err = DB_RECORD_NOT_FOUND;
...
            goto normal_return;

They also have a similar block for handling match_mode == ROW_SEL_EXACT_PREFIX.

ROW_SEL_.... are innodb's internal constants that are assigned like so:

    if (find_flag == HA_READ_KEY_EXACT) {

        match_mode = ROW_SEL_EXACT;

    } else if (find_flag == HA_READ_PREFIX
           || find_flag == HA_READ_PREFIX_LAST) {

        match_mode = ROW_SEL_EXACT_PREFIX;
    }
@spetrunia
Collaborator

https://gist.github.com/spetrunia/a16d337608e1f30095ca -- the patch is here. What is currently missing is the testcases. I'll have to revisit the problem of counters not counting index scan accesses.

@spetrunia
Collaborator

Storage API and its implementation are weird. The API for index_next_same passes key/keylen:

  virtual int index_next_same(uchar *buf, const uchar *key, uint keylen);

However, InnoDB chooses to ignore it and reuse the lookup tuple it has saved in ::index_read_map :

ha_innobase::index_next_same(
/*=========================*/
    uchar*      buf,    /*!< in/out: buffer for the row */
    const uchar*    key,    /*!< in: key value */
    uint        keylen) /*!< in: key value length */
{
    ha_statistic_increment(&SSV::ha_read_next_count);

    return(general_fetch(buf, ROW_SEL_NEXT, last_match_mode));
}

note how key/key_len are ignored.

My patch follows InnoDB's approach.

@spetrunia
Collaborator

Found a way to add a testcase. The patch is here: https://reviews.facebook.net/D38769

@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #67: Inefficient index condition pushdown
Summary:
Inside index_next_same() call, we should
1. first check whether the record matches the index
   lookup prefix,
2. then check pushed index condition.

If we try to check #2 without checking #1 first, we may walk
off the index lookup prefix and scan till the end of the index.

Test Plan: Run mtr

Reviewers: hermanlee4, maykov, jtolmer, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D38769
0fe186f
@spetrunia spetrunia closed this
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #67: Inefficient index condition pushdown
Summary:
Inside index_next_same() call, we should
1. first check whether the record matches the index
   lookup prefix,
2. then check pushed index condition.

If we try to check #2 without checking #1 first, we may walk
off the index lookup prefix and scan till the end of the index.

Test Plan: Run mtr

Reviewers: hermanlee4, maykov, jtolmer, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D38769
32c3ac8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.