Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Clone in Desktop Download ZIP

Loading…

SELECT ... FROM rocksdb_table ORDER BY primary_key uses sorting #33

Closed
spetrunia opened this Issue · 4 comments

1 participant

@spetrunia
Collaborator

Originally from discussion on facebook.com:

create table r0 (id int primary key, value int) engine=rocksdb;
insert into r0 values (1,1),(2,2),(3,3);
explain select * from r0 ORDER BY id;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | r0    | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+

Actually, the records are read in the PK order, so filesort is redundant.

Properties of RocksDB-SE table are very similar to properties of InnoDB table with primary key. With InnoDB table and the above example, one gets:

+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
|  1 | SIMPLE      | r0    | index | NULL          | PRIMARY | 4       | NULL |    3 | NULL  |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
@spetrunia
Collaborator

Debugging both queries. The difference seems to start from here:

  #0  test_if_cheaper_ordering (tab=0x7fffa8013fd0, order=0x7fffa8005e28, table=0x7fffa8023690, usable_keys=..., ref_key=-1, select_limit=18446744073709551615, new_key=0x7fffec5095c8, new_key_direction=0x7fffec5095d4, new_select_limit=0x7fffec509598, new_used_key_parts=0x7fffec5095cc, saved_best_key_parts=0x7fffec5095d0) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-r10/sql/sql_select.cc:5506
  #1  0x00000000008b3283 in test_if_skip_sort_order (tab=0x7fffa8013fd0, order=0x7fffa8005e28, select_limit=18446744073709551615, no_changes=false, map=0x7fffa8023728, clause_type=0x11ddf34 "ORDER BY") at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-r10/sql/sql_select.cc:4096
  #2  0x0000000000a5dc8f in JOIN::optimize (this=0x7fffa8006710) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-r10/sql/sql_optimizer.cc:977
  #3  0x00000000008ac57a in mysql_execute_select (thd=0x1cff150, select_lex=0x1d02878, free_join=true) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-r10/sql/sql_select.cc:1119
  #4  0x00000000008ac8a9 in mysql_select (thd=0x1cff150, tables=0x7fffa8005768, wild_num=1, fields=..., conds=0x0, order=0x1d02a40, group=0x1d02978, having=0x0, select_options=2147748612, result=0x7fffa8005f10, unit=0x1d02228, select_lex=0x1d02878) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-r10/sql/sql_select.cc:1254
  #5  0x00000000009f6bae in mysql_explain_unit (thd=0x1cff150, unit=0x1d02228, 
...

Here,
the first suspect is the same for both tables:

(gdb) p table->covering_keys
  $10 = {map = 0}

the difference is here: InnoDB returns:

(gdb) p table->file->keys_to_use_for_scanning()
  $14 = (const key_map *) 0x19a94d8 <key_map_full>

RocksDB-SE returns:

(gdb) p (table)->file->keys_to_use_for_scanning()
  $11 = (const key_map *) 0x1bbe250 <key_map_empty>
@spetrunia spetrunia self-assigned this
@spetrunia
Collaborator

Looking at how various engines implement keys_to_use_for_scanning(). It looks like RocksDB-SE should follow InnoDB and return all keys.

@spetrunia
Collaborator

After keys_to_use_for_scanning() is fixed, I can proceed till I hit the second difference:

      bool is_covering= table->covering_keys.is_set(nr) ||
                        (nr == table->s->primary_key &&
                        table->file->primary_key_is_clustered());

Here,
table->covering_keys.is_set(nr)==FALSE for both engines

table->file->primary_key_is_clustered() is TRUE for InnoDB and FALSE for RocksDB-SE.

AFAIU RocksDB-SE meets all criteria that SQL layer has about storage engines that have primary_key_is_clustered()=TRUE.

@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #33: SELECT ... FROM rocksdb_table ORDER BY primary_key uses so…
…rting

Summary:
Fix the index properties returned by the storage engine:
- ha_rocksdb::primary_key_is_clustered()= true;
- ha_rocksdb::keys_to_use_for_scanning()= &key_map_full;

Test Plan: mtr

Reviewers: maykov, jonahcohen, hermanlee4, yoshinorim

Reviewed By: yoshinorim

Subscribers: jtolmer

Differential Revision: https://reviews.facebook.net/D33957
ae98cf0
@spetrunia spetrunia closed this
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #33: SELECT ... FROM rocksdb_table ORDER BY primary_key uses so…
…rting

Summary:
Fix the index properties returned by the storage engine:
- ha_rocksdb::primary_key_is_clustered()= true;
- ha_rocksdb::keys_to_use_for_scanning()= &key_map_full;

Test Plan: mtr

Reviewers: maykov, jonahcohen, hermanlee4, yoshinorim

Reviewed By: yoshinorim

Subscribers: jtolmer

Differential Revision: https://reviews.facebook.net/D33957
3d85af3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.