Make MyRocksTablePropertiesCollector trigger range compaction if there are many delete-marked entries #71

Closed
yoshinorim opened this Issue · 5 comments

4 participants

@yoshinorim
Owner

LinkBench showed that some (id1, link_type) pairs in the id1_type index had a huge number of delete-marked entries in the sst files. This made point lookups on (id1, link_type) much slower, because Next() has to scan a huge number of delete-marked keys. We need to optimize further so that delete-marked entries are compacted away.

I'm thinking of making MyRocksTablePropertiesCollector trigger range compaction asynchronously if some index prefixes have a decent number of delete-marked keys. MyRocksTablePropertiesCollector is called when new sst files are created, and it knows the index definitions (key parts), so it should be possible to determine which key ranges have many delete-marked keys. It would then be easy to trigger CompactRange asynchronously. Siying suggested using an experimental API, MarkForCompaction(), for that (https://reviews.facebook.net/D37083).
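For illustration only, here is a minimal sketch (not the actual MyRocks code) of how a TablePropertiesCollector could count delete-marked entries per index-id prefix while an sst file is being built. The class name, the 4-byte index-id assumption, the property names, and the 50% threshold are all placeholders of mine; the real policy for when to trigger compaction would live elsewhere.

#include <map>
#include <string>
#include <rocksdb/table_properties.h>

// Sketch only: count tombstones per index-id prefix as keys are added to a
// new sst file, and export the "mostly deleted" prefixes as user properties
// so a background thread can later mark those key ranges for compaction.
class DeleteTrackingCollector : public rocksdb::TablePropertiesCollector {
 public:
  rocksdb::Status AddUserKey(const rocksdb::Slice& key,
                             const rocksdb::Slice& /*value*/,
                             rocksdb::EntryType type,
                             rocksdb::SequenceNumber /*seq*/,
                             uint64_t /*file_size*/) override {
    if (key.size() >= kIndexIdSize) {
      Stats& s = per_index_[std::string(key.data(), kIndexIdSize)];
      s.total++;
      if (type == rocksdb::kEntryDelete) s.deleted++;
    }
    return rocksdb::Status::OK();
  }

  rocksdb::Status Finish(rocksdb::UserCollectedProperties* props) override {
    for (const auto& p : per_index_) {
      // >50% tombstones: made-up threshold, for illustration only.
      if (p.second.deleted * 2 > p.second.total) {
        (*props)["sketch.deleted." + p.first] = std::to_string(p.second.deleted);
      }
    }
    return rocksdb::Status::OK();
  }

  rocksdb::UserCollectedProperties GetReadableProperties() const override {
    return rocksdb::UserCollectedProperties();
  }

  const char* Name() const override { return "DeleteTrackingCollector"; }

 private:
  // Assumption: MyRocks keys start with a 4-byte index number.
  static constexpr size_t kIndexIdSize = 4;
  struct Stats { uint64_t total = 0; uint64_t deleted = 0; };
  std::map<std::string, Stats> per_index_;
};

A factory for such a collector would presumably be registered through ColumnFamilyOptions::table_properties_collector_factories, and the exported properties could then feed whatever logic calls MarkForCompaction()/SuggestCompactRange() on the affected ranges.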

@mdcallag
Owner

Will we be able to remove tombstones before they reach the max level? Looking at the current compaction code and the call to KeyNotExistsBeyondOutputLevel, my guess is that most tombstones are not dropped prior to compaction into the max level, because the KeyNotExists check isn't exact. But compacting to the max level means we use much more IO and get more write amplification.
https://github.com/facebook/rocksdb/blob/master/db/compaction_job.cc#L763
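To make the concern concrete, this is a simplified, self-contained restatement of the drop condition as I read the linked code (illustrative only, not the real compaction_job.cc logic): a Delete can only be discarded above the bottom level when the compaction can prove the key does not exist in any level below the output level.

// Simplified illustration of the tombstone-drop decision discussed above
// (not the actual RocksDB code).
bool can_drop_tombstone(bool older_than_earliest_snapshot,
                        bool output_is_bottommost_level,
                        bool key_not_exists_beyond_output_level) {
  if (!older_than_earliest_snapshot) return false;  // a snapshot may still need it
  if (output_is_bottommost_level) return true;      // nothing below can be masked
  return key_not_exists_beyond_output_level;        // conservative, inexact check
}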

@maykov maykov was assigned by yoshinorim
@siying
Collaborator

SuggestCompactRange() is the function to use.

@mdcallag these special range compactions will also reduce the size of the source levels and have the same effect as normal compactions, so they are not a purely extra cost on top of automatic compactions. L0->L1 compactions may be triggered more frequently, which would be an extra cost.
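For reference, the experimental API lives in rocksdb/experimental.h; a hedged usage sketch (the wrapper name and the begin/end keys are placeholders) could look like:

#include <rocksdb/db.h>
#include <rocksdb/experimental.h>
#include <rocksdb/slice.h>
#include <rocksdb/status.h>

// Ask RocksDB to compact the files overlapping the given key range. Unlike
// CompactRange(), this only flags the files and lets the normal background
// compaction picker schedule the work, so the caller is not blocked.
rocksdb::Status suggest_range(rocksdb::DB* db,
                              const rocksdb::Slice& begin,
                              const rocksdb::Slice& end) {
  return rocksdb::experimental::SuggestCompactRange(db, &begin, &end);
}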

@mdcallag
Owner

If they drop tombstones the way normal compactions do, then my guess is they are unlikely to drop tombstones prior to compacting into the max level.

@siying
Collaborator

@mdcallag strictly speaking you are right. But in practice every level's size is always larger than, or only slightly smaller than, the level target, so doing an extra compaction per level (assuming it's a small percentage of the level) shouldn't change the LSM dynamics much.

@yoshinorim
Owner

Here is a reproducible test case.

  • Configure my.cnf as follows: a smaller write buffer size and sst file size, and no compression, so that compactions happen easily.
    loose_rocksdb_default_cf_options="write_buffer_size=64k;target_file_size_base=64k;max_bytes_for_level_base=512k;compression_per_level=kNoCompression"

  • Create two tables, each with a primary key and a secondary key, and a dedicated column family (CF) for the secondary key.

create table r1 (
 id1 int,
 id2 int,
 type int,
 value varchar(100),
 value2 int,
 value3 int,
 primary key (type, id1, id2),
 index id1_type (id1, type, value2, value, id2) COMMENT 'cf1'
) engine=rocksdb collate latin1_bin;

create table r2 like r1;
  • Insert 50,000 rows into each table and compact the tables. Make sure to compact r1 before r2.
# generate 50,000 rows like below.
#!/usr/bin/perl

# One CSV row per line, matching the column order: id1,id2,type,value,value2,value3
for (my $i = 1; $i <= 50000; $i++) {
  my $value = 'x' x 50;
  print "$i,$i,$i,$value,$i,$i\n";
}

Then


load data local infile 'foo' into table r1 fields terminated by ',';
optimize table r1;

load data local infile 'foo' into table r2 fields terminated by ',';
optimize table r2;
  • Update a secondary index column of the r1 table 10,000 times. You can generate the update statements like this.
#!/usr/bin/perl
for(my $i= 1; $i <= 10000; $i++) {
  print "update r1 set value2=value2+1 where id1=500;\n";
}

Then

mysql> source bar
  • Scan all sst files with rocksdb/sst_dump. You will find that some sst files contain mostly deleted entries. Example:
# Count delete-marked entries (internal key type 0) and live entries
# (internal key type 1) in each sst file.
for f in `ls /data/mysql/3306/data/.rocksdb/*.sst`
do
  DELETED=`./sst_dump --command=scan --output_hex --file=$f | grep " : 0" | wc -l`
  EXISTS=`./sst_dump --command=scan --output_hex --file=$f | grep " : 1" | wc -l`
  echo "$f $DELETED $EXISTS"
done

=>
/data/mysql/3306/data/.rocksdb/000651.sst 289 1

Our goal is to eventually eliminate files like this one, which holds 289 delete-marked entries but only one live entry.

@yoshinorim yoshinorim closed this