
index_read_map(HA_READ_PREFIX_LAST) does not work in reverse CF #48

Closed
spetrunia opened this Issue · 4 comments

1 participant

@spetrunia
Collaborator

This is a continuation of issue #16 and covers its last remaining part.

Let's work with the same dataset:

create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table t1(a int);
insert into t1 select A.a + B.a* 10 + C.a * 100 from t0 A, t0 B, t0 C;

create table t2 (
  pk int not null,
  a  int not null,
  b  int not null,
  primary key(pk),
  key(a) comment 'rev:cf1'
) engine=rocksdb;
insert into t2 select A.a, FLOOR(A.a/10), A.a from t1 A;

Try this:

MySQL [j5]>  select * from t2 where a between 98 and 2000 order by a desc; 
Empty set (4.83 sec)

The correct result, obtained by disabling index use with use index(), is:

MySQL [j5]>  select * from t2 use index() where a between 98 and 2000 order by a desc; 
+-----+----+-----+
| pk  | a  | b   |
+-----+----+-----+
| 999 | 99 | 999 |
| 991 | 99 | 991 |
...
| 980 | 98 | 980 |
+-----+----+-----+
20 rows in set (0.00 sec)
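To make the two find_flag values concrete, here is a minimal C++ sketch. It is not MyRocks code (the actual handling is described in storage/rocksdb/rocksdb-range-access.txt). It models key(a) as a sorted vector of hypothetical "aaaa:pppp" strings (zero-padded a, then pk), and it models a reverse CF simply as the same keys stored in descending order. The helper names encode, make_t2_index, prefix_last_or_prev and prefix_last are illustrative. The point is that the same logical lookup needs a different positioning step depending on the physical order: in ascending order you find the first key past the prefix and step back, while in descending order the first qualifying key is already the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical key encoding: "aaaa:pppp". With fixed widths, bytewise
// string order equals (a, pk) order.
std::string encode(int a, int pk) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%04d:%04d", a, pk);
  return buf;
}

// Same data as t2: pk = 0..999, a = pk / 10. Ascending layout models a
// forward CF; descending layout models a reverse CF (an assumption).
std::vector<std::string> make_t2_index(bool reverse_cf) {
  std::vector<std::string> keys;
  for (int pk = 0; pk < 1000; ++pk) keys.push_back(encode(pk / 10, pk));
  std::sort(keys.begin(), keys.end());
  if (reverse_cf) std::reverse(keys.begin(), keys.end());
  return keys;
}

// HA_READ_PREFIX_LAST_OR_PREV semantics: the logically largest key whose
// first prefix.size() bytes compare <= prefix.
std::optional<std::string> prefix_last_or_prev(
    const std::vector<std::string>& keys, const std::string& prefix,
    bool reverse_cf) {
  // Precondition: keys are in the physical order of the chosen layout.
  assert(reverse_cf ? std::is_sorted(keys.rbegin(), keys.rend())
                    : std::is_sorted(keys.begin(), keys.end()));
  const std::size_t n = prefix.size();
  auto head = [n](const std::string& k) { return k.substr(0, n); };
  if (!reverse_cf) {
    // Ascending: find the first key whose head is past the prefix,
    // then step back one entry.
    auto it = std::partition_point(
        keys.begin(), keys.end(),
        [&](const std::string& k) { return head(k) <= prefix; });
    if (it == keys.begin()) return std::nullopt;
    return *std::prev(it);
  }
  // Descending: the first key whose head is <= prefix is already the
  // answer; stepping back here, as in the ascending case, would land on
  // a key that is past the prefix.
  auto it = std::partition_point(
      keys.begin(), keys.end(),
      [&](const std::string& k) { return head(k) > prefix; });
  if (it == keys.end()) return std::nullopt;
  return *it;
}

// HA_READ_PREFIX_LAST semantics: same lookup, but the key found must
// actually start with the prefix.
std::optional<std::string> prefix_last(const std::vector<std::string>& keys,
                                       const std::string& prefix,
                                       bool reverse_cf) {
  auto k = prefix_last_or_prev(keys, prefix, reverse_cf);
  if (k && k->compare(0, prefix.size(), prefix) == 0) return k;
  return std::nullopt;
}
```

For the upper bound of the failing query, prefix_last_or_prev(keys, "2000", reverse_cf) returns "0099:0999" for either layout, i.e. the pk=999 row the descending scan should start from, while prefix_last(keys, "2000", reverse_cf) finds nothing because no key starts with "2000".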
@spetrunia spetrunia self-assigned this
@spetrunia spetrunia added this to the high-pri milestone
@spetrunia spetrunia added the bug label
@spetrunia
Collaborator

Adding tests for the fix. HA_READ_PREFIX_LAST_OR_PREV is easy to hit: it is used by the MIN/MAX optimizer and by range scans that produce ORDER BY DESC.
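A descending range scan like the query in the issue (a BETWEEN 98 AND 2000 ORDER BY a DESC) can be pictured as one positioning step followed by backward steps. The sketch below is a simplified model over an in-memory vector of (a, pk) pairs, not the server's code; make_t2_index and range_desc are hypothetical names. The positioning step plays the role of HA_READ_PREFIX_LAST_OR_PREV on the upper bound, and each further row comes from stepping backward, roughly like index_prev().

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for index key(a) on t2: (a, pk) pairs in
// ascending order, built from the same data as the INSERTs above.
using Entry = std::pair<int, int>;  // (a, pk)

std::vector<Entry> make_t2_index() {
  std::vector<Entry> idx;
  for (int pk = 0; pk < 1000; ++pk) idx.emplace_back(pk / 10, pk);
  std::sort(idx.begin(), idx.end());
  return idx;
}

// Returns the pks for "WHERE a BETWEEN lo AND hi ORDER BY a DESC".
std::vector<int> range_desc(const std::vector<Entry>& idx, int lo, int hi) {
  assert(std::is_sorted(idx.begin(), idx.end()));
  std::vector<int> pks;
  // Positioning: first entry with a > hi; the entry just before it (if
  // any) is the last one with a <= hi, i.e. the "last or prev" of hi.
  auto it = std::partition_point(
      idx.begin(), idx.end(), [hi](const Entry& e) { return e.first <= hi; });
  // Walk backward until a drops below the lower bound.
  while (it != idx.begin()) {
    --it;
    if (it->first < lo) break;
    pks.push_back(it->second);
  }
  return pks;
}
```

In this model, range_desc(make_t2_index(), 98, 2000) yields the 20 pks from 999 down to 980, matching the row count of the correct result above.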

It seems to be impossible to hit HA_READ_PREFIX_LAST: it is only used by Loose Index Scan, which is never chosen for MyRocks tables because of the following chain:

  • cost_group_min_max() will always return cur_read_cost=INF to get_best_group_min_max()
  • INF comes from tree_traversal_cost (other components have finite values)
  • tree_traversal_cost gets it from this expression:
  const double tree_traversal_cost= 
    ceil(log(static_cast<double>(table_records))/
         log(static_cast<double>(keys_per_block))) * ROWID_COMPARE_COST; 

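Here, keys_per_block=1, so the divisor log(keys_per_block) is log(1)=0, and for table_records > 1 the quotient (and therefore the cost) is infinite.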

  • the value 1 comes from here:
  keys_per_block= (table->file->stats.block_size / 2 /
                   (index_info->key_length + table->file->ref_length)
                        + 1);

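Since table->file->stats.block_size=0 for MyRocks, the integer division evaluates to 0 regardless of key_length and ref_length, so keys_per_block is always 1.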
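The arithmetic can be checked in isolation. This C++ sketch copies the two quoted expressions into standalone functions. Treating the operands as unsigned int and passing ROWID_COMPARE_COST in as a plain parameter are assumptions made for the sketch, and key_length=4, ref_length=4 in the usage below are illustrative values, not measured ones.

```cpp
#include <cassert>
#include <cmath>

// Same integer expression as the keys_per_block snippet above. With
// unsigned operands, block_size=0 makes the first term 0.
unsigned keys_per_block(unsigned block_size, unsigned key_length,
                        unsigned ref_length) {
  assert(key_length + ref_length > 0);  // avoid integer division by zero
  return block_size / 2 / (key_length + ref_length) + 1;
}

// Same floating-point expression as the tree_traversal_cost snippet above.
// With keys_per_block=1, log(1)=0.0, so for table_records > 1 the quotient
// is +infinity on IEEE 754 platforms, and so is the product.
double tree_traversal_cost(double table_records, unsigned keys_per_block,
                           double rowid_compare_cost) {
  return std::ceil(std::log(table_records) /
                   std::log(static_cast<double>(keys_per_block))) *
         rowid_compare_cost;
}
```

With block_size=0, tree_traversal_cost(1000, keys_per_block(0, 4, 4), 1.0) is infinite. With a nonzero block size such as MyISAM's 1024, keys_per_block(1024, 4, 4) is 65 and the same call returns 2.0.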

@spetrunia
Collaborator

Checking what other storage engines set block_size to:
myisam: myisam_block_size= MI_KEY_BLOCK_LENGTH= 1024
innodb:

#define UNIV_PAGE_SIZE      ((ulint) srv_page_size)
srv_page_size=16384

Users of ha_statistics::block_size:

  • handler::index_only_read_time (the same keys_per_block calculation)
  • cost_group_min_max
@spetrunia
Collaborator

Actually, there is a way to get HA_READ_PREFIX_LAST called. For many (or all?) engines, MyRocks included, h->index_read_last[_map]() will call index_read[_map]() with HA_READ_PREFIX_LAST. Here is the call stack for MyRocks, reached from join_read_last_key():

  #0  0x0000000000fe27ce in ha_rocksdb::index_read_map (this=0x7fffcc1e3110, buf=0x7fffcc1d6280 "\374\001", key=0x7fffcc4f5028 "\001", keypart_map=3, find_flag=HA_READ_PREFIX_LAST) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-splitthefix/storage/rocksdb/ha_rocksdb.cc:2891
  #1  0x0000000000fe2f22 in ha_rocksdb::index_read_last_map (this=0x7fffcc1e3110, buf=0x7fffcc1d6280 "\374\001", key=0x7fffcc4f5028 "\001", keypart_map=3) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-splitthefix/storage/rocksdb/ha_rocksdb.cc:2977
  #2  0x0000000000729ec7 in handler::ha_index_read_last_map (this=0x7fffcc1e3110, buf=0x7fffcc1d6280 "\374\001", key=0x7fffcc4f5028 "\001", keypart_map=3) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-splitthefix/sql/handler.cc:2939
  #3  0x00000000009e6012 in join_read_last_key (tab=0x7fffcc4f4870) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-splitthefix/sql/sql_executor.cc:2276
  #4  0x00000000009e1b6e in sub_select (join=0x7fffcc006660, join_tab=0x7fffcc4f4870, end_of_records=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-splitthefix/sql/sql_executor.cc:1294
  #5  0x00000000009e0b91 in do_select (join=0x7fffcc006660) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-splitthefix/sql/sql_executor.cc:950
  #6  0x00000000009dc928 in JOIN::exec (this=0x7fffcc006660) at /home/psergey/dev-git/mysql-5.6-rocksdb-issue16-splitthefix/sql/sql_executor.cc:207
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #48: index_read_map(HA_READ_PREFIX_LAST) does not work in reverse CF

Summary:
Make ha_rocksdb::index_read_map() correctly handle find_flag values
HA_READ_PREFIX_LAST and HA_READ_PREFIX_LAST_OR_PREV.

Explanations of how they should be handled are provided in
storage/rocksdb/rocksdb-range-access.txt

Test Plan:
mtr, mtr --gcov rocksdb_range; made sure that all of the new
code is covered.

Reviewers: maykov, hermanlee4, jtolmer, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D35415
f394be7
@spetrunia spetrunia closed this
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #48: index_read_map(HA_READ_PREFIX_LAST) does not work in reverse CF (same commit message as f394be7)
f848227