Range scan with equal predicates returns wrong results with bloom filter #109

Closed
yoshinorim opened this Issue · 7 comments

3 participants

@yoshinorim
Owner

Suppose there is a multi-column index on (id1, id2), and a query selects rows with "id1=1 AND id2 >= 2". Both id1 and id2 are BIGINT, and the prefix bloom filter length is defined as 20 bytes. In this case the bloom filter shouldn't be used, but it actually is -- with the filtering rule (id1=1 AND id2=2). Any file that doesn't contain "id2=2" may be filtered out by the bloom filter, so the query results end up missing rows.

Herman found the bug. Here is a test case.

$ cat mysql-test/suite/rocksdb/t/bloomfilter2-master.opt
--rocksdb_default_cf_options=write_buffer_size=64k;block_based_table_factory={filter_policy=bloomfilter:10:true;whole_key_filtering=0;};prefix_extractor=capped:24

$ cat mysql-test/suite/rocksdb/t/bloomfilter2.test
--source include/have_rocksdb.inc

CREATE TABLE t1 (id1 BIGINT, id2 INT, id3 BIGINT, value INT, PRIMARY KEY (id1, id2, id3)) ENGINE=rocksdb;

--disable_query_log
let $i = 1;
while ($i <= 10000) {
  let $insert = INSERT INTO t1 VALUES(1, 1, $i, $i);
  inc $i;
  eval $insert;
}
--enable_query_log

SELECT COUNT(*) FROM t1 WHERE id1=1 AND id2=1 AND id3=1;
SELECT COUNT(*) FROM t1 WHERE id1=1 AND id2=1 AND id3>=2;
SELECT COUNT(*) FROM t1 WHERE id1=1 AND id2>=1 AND id3>=2;

DROP TABLE t1;

$ cat mysql-test/suite/rocksdb/r/bloomfilter2.result
CREATE TABLE t1 (id1 BIGINT, id2 INT, id3 BIGINT, value INT, PRIMARY KEY (id1, id2, id3)) ENGINE=rocksdb;
SELECT COUNT(*) FROM t1 WHERE id1=1 AND id2=1 AND id3=1;
COUNT(*)
1
SELECT COUNT(*) FROM t1 WHERE id1=1 AND id2=1 AND id3>=2;
COUNT(*)
9999
SELECT COUNT(*) FROM t1 WHERE id1=1 AND id2>=1 AND id3>=2;
COUNT(*)
9999
DROP TABLE t1;

In my environment, both of the last two SELECTs returned 6600, not 9999.

can_use_bloom_filter() was called and returned true when it should have returned false. eq_cond.size was 24 bytes in both SELECTs (it should have been 16 and 12, respectively).

The problem was that eq_cond contained both "=" and ">=" condition columns; it should contain only "=" condition columns. The call stack was handler::multi_range_read_next => handler::read_range_first => ha_rocksdb::index_read_map => ha_rocksdb::setup_index_scan => can_use_bloom_filter. It seems the "=" condition was included in mrr_cur_range->end_key (in handler::multi_range_read_next), but that variable is not passed down to ha_rocksdb::index_read_map. I'm not sure if there is any way to pass the "=" column values without modifying the storage engine API.

@yoshinorim yoshinorim added the bug label
@yoshinorim yoshinorim changed the title from Bloom filter works wrong with range scan with equal predicates to Range scan with equal predicates returns wrong results with bloom filter
@yoshinorim
Owner

@spetrunia In the above example, is there any way to know the length of the equal conditions at ha_rocksdb::index_read_map()? ("id1=1 AND id2=1 AND id3>=2" => equal cond length is 4+8+4=16 (index_id + 8-byte id1 + 4-byte id2), and "id1=1 AND id2>=1 AND id3>=2" => equal cond length is 4+8=12 (index_id + 8-byte id1))

@spetrunia spetrunia self-assigned this
@spetrunia
Collaborator

One can't see the end of the range inside the ha_rocksdb::index_read_map() call.

We need to overload handler::read_range_first(). It sees both ends of the range, so we can compare the two endpoints and find the length of their common prefix.

@spetrunia spetrunia removed their assignment
@yoshinorim yoshinorim self-assigned this
@yoshinorim
Owner

Thanks @spetrunia! I'm going to fix it.

@rven1
Collaborator
@yoshinorim
Owner

Handling IN clause is also interesting.

example: select * from r1 where id1=1 and id2 in (1,2,3);

read_range_first() is called three times, and in each call start_key and end_key match on both id1 and id2. can_use_bloom_filter() is called only once, at ha_rocksdb::setup_index_scan(), because the Iterator is reused within this statement. MyRocks needs either of the following, I think.

  1. Do not include id2 in the bloom filter key; only (index_id + id1=1) should be used. The problem is that this is less efficient, and I'm not sure how to derive it from read_range_first().

  2. Decide whether the bloom filter can be used on each setup_index_scan() call.

@yoshinorim
Owner

https://reviews.facebook.net/D46851

@spetrunia Could you please review this diff? It calls kd->pack_index_tuple() twice (once for the end key and once for the start key), and I'm still not 100% confident that this is safe.

    end_key_packed_size= kd->pack_index_tuple(table, pack_buffer,
                                              end_key_packed_tuple, end_key->key,
                                              end_key->keypart_map);

    packed_size= kd->pack_index_tuple(table, pack_buffer,
                                      sec_key_packed_tuple, key,
                                      keypart_map);
@spetrunia
Collaborator

@yoshinorim, reviewed, OK to push. pack_index_tuple() modifies table->record[0], but this should be fine because the column values will be decoded back from the storage format when the record is read.

@yoshinorim yoshinorim closed this