MyRocks data size is greater than InnoDB #80

Closed
BohuTANG opened this Issue · 11 comments

5 participants

@BohuTANG

From our benchmarks on the same dataset for MyRocks/InnoDB/TokuDB, the data sizes are:

MyRocks: 43GB (the .rocksdb dir)
InnoDB:  33GB (without compression)
TokuDB:  15GB (zlib compression, compression ratio about 2, so the raw data is about 30GB)

All MyRocks configuration is at the defaults; the output of 'show engine rocksdb status' is as follows:

mysql> show engine rocksdb status\G;
*************************** 1. row ***************************
  Type: DBSTATS
  Name: rocksdb
Status: 
** DB Stats **
Uptime(secs): 79985.6 total, 1704.4 interval
Cumulative writes: 54K writes, 280M keys, 54K batches, 1.0 writes per batch, ingest: 27.78 GB, 0.36 MB/s
Cumulative WAL: 54K writes, 54K syncs, 1.00 writes per sync, written: 27.78 GB, 0.36 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 0 writes, 0 keys, 0 batches, 0.0 writes per batch, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

*************************** 2. row ***************************
  Type: CF_COMPACTION
  Name: __system__
Status: 
** Compaction Stats [__system__] **
Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)  KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      3.3         0       222    0.002          0       0      0
  L1      1/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.9     12.8      6.1         0        56    0.004          0    110K   110K
 Sum      1/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.9      4.2      4.2         1       278    0.002          0    110K   110K
 Int      0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000          0       0      0
Flush(GB): cumulative 0.002, interval 0.000
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

*************************** 3. row ***************************
  Type: CF_COMPACTION
  Name: default
Status: 
** Compaction Stats [default] **
Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)  KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      2/0          1   0.5      0.0     0.0      0.0      27.7     27.7       0.0   0.0      0.0     62.3       456      7841    0.058          0       0      0
  L1      8/0          8   0.8     30.8    27.8      3.0      30.7     27.7       0.0   1.1     41.8     41.7       753      3239    0.233          0    280M      0
  L2     68/0         99   1.0      0.0     0.0      0.0       0.0      0.0      27.7   0.0      0.0      0.0         0         0    0.000          0       0      0
  L3    543/0        998   1.0      0.0     0.0      0.0       0.0      0.0      27.7   0.0      0.0      0.0         0         0    0.000          0       0      0
  L4   5078/0       9998   1.0      4.7     3.2      1.5       4.5      3.1      24.5   1.4     30.1     29.3       158       786    0.202          0     57M   669K
  L5  15910/0      31843   0.3     45.9    24.3     21.5      45.3     23.7       3.3   1.9     19.4     19.1      2427      4600    0.528          0    252M  1024K
 Sum  21609/0      42947   0.0     81.3    55.3     26.0     108.2     82.2      83.1   3.9     21.9     29.2      3794     16466    0.230          0    589M  1693K
 Int      0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000          0       0      0
Flush(GB): cumulative 27.749, interval 0.000
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

3 rows in set (0.00 sec)

and

mysql> select * from ROCKSDB_CF_OPTIONS where value like '%snappy%'\G;
*************************** 1. row ***************************
    CF_NAME: __system__
OPTION_TYPE: COMPRESSION_TYPE
      VALUE: kSnappyCompression
*************************** 2. row ***************************
    CF_NAME: default
OPTION_TYPE: COMPRESSION_TYPE
      VALUE: kSnappyCompression
2 rows in set (0.00 sec)

ERROR: 
No query specified
@mdcallag
Owner

Excellent. Can you also send me the values of any rocksdb config options set in my.cnf? It will take me a few hours to respond.

@BohuTANG

There is no rocksdb configuration in my.cnf, so everything is at the defaults.

@mdcallag
Owner

Can you tell me what is in your RocksDB LOG file (named "LOG") for "Compression algorithms supported"? Mine shows:

2015/06/04-04:33:23.366528 7faff86d38c0 Compression algorithms supported:
2015/06/04-04:33:23.366530 7faff86d38c0 Snappy supported: 1
2015/06/04-04:33:23.366531 7faff86d38c0 Zlib supported: 1
2015/06/04-04:33:23.366540 7faff86d38c0 Bzip supported: 1
2015/06/04-04:33:23.366542 7faff86d38c0 LZ4 supported: 1

@yoshinorim
Owner

Could you try the following my.cnf settings and share the results?
This makes MyRocks use zlib level 2 compression for most levels (compression_per_level and compression_opts) and places files across levels more efficiently (level_compaction_dynamic_level_bytes). The default RocksDB block size is 4KB; increasing it to 16KB will save some space.

rocksdb_block_size=16384
rocksdb_max_total_wal_size=4096000000
rocksdb_block_cache_size=12G
rocksdb_default_cf_options=write_buffer_size=128m;target_file_size_base=32m;max_bytes_for_level_base=512m;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;max_write_buffer_number=4;compression_per_level=kNoCompression:kNoCompression:kSnappyCompression:kZlibCompression:kZlibCompression:kZlibCompression:kZlibCompression;compression_opts=-14:2:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0;};prefix_extractor=capped:20;level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true
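
Once the server is restarted with these settings, a quick sanity check is to confirm the compression options actually changed, reusing the ROCKSDB_CF_OPTIONS query from earlier in this thread (a sketch; it assumes the per-level compression settings are also exposed there):

select CF_NAME, OPTION_TYPE, VALUE
from information_schema.ROCKSDB_CF_OPTIONS
where OPTION_TYPE like '%COMPRESSION%';

If the settings took effect, kZlibCompression should show up for the deeper levels of the default column family.
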
@mdcallag
Owner

Yoshi - before trying to tune, we need to confirm that compression was enabled in his RocksDB build; then we can tune. MyRocks has a lousy default RocksDB configuration, and this issue can be kept open for that. On my test server the defaults are:

Started an instance locally with default my.cnf:
Options.max_open_files: 5000
Options.max_background_compactions: 1
Options.max_background_flushes: 1
--> max_open_files should be larger, max_background_compactions and max_background_flushes should be >= 4 for many systems

Compression algorithms supported:
Snappy supported: 1
Zlib supported: 1
Bzip supported: 1
LZ4 supported: 1

cache_index_and_filter_blocks: 1
index_type: 0
hash_index_allow_collision: 1
checksum: 1
no_block_cache: 0
block_cache: 0x7faff4c88078
block_cache_size: 8388608
block_cache_compressed: (nil)
block_size: 4096
block_size_deviation: 10
block_restart_interval: 16
filter_policy: nullptr
format_version: 2
--> block_size should be larger: many blocks will be compressed, and a 4kb block that compresses to much less than 4kb can waste IO when the file system page size is 4kb

Options.write_buffer_size: 4194304
Options.max_write_buffer_number: 2
Options.compression: Snappy
Options.num_levels: 7
--> should use a larger value for write_buffer_size, default is 4M, maybe 64M

Options.min_write_buffer_number_to_merge: 1
--> probably OK

Options.level0_file_num_compaction_trigger: 4
Options.level0_slowdown_writes_trigger: 20

Options.level0_stop_writes_trigger: 24
--> probably OK

Options.target_file_size_base: 2097152
Options.max_bytes_for_level_base: 10485760
--> ugh, maybe 32MB for target_file_size_base and 512MB for max_bytes_for_level_base. Default here means that sizeof(L0) is 10M

Options.level_compaction_dynamic_level_bytes: 0
--> we want this to be 1

Options.soft_rate_limit: 0.00
Options.hard_rate_limit: 0.00
--> want these to be set, maybe 2.5 for soft and 3.0 for hard

@BohuTANG

This issue was due to *.sst files not being properly cleaned up on DROP DATABASE.
I cleaned the .rocksdb dir, re-installed the database, and ran the same benchmark with yoshinorim's configuration; it looks OK to me now:
data size: 19GB (snappy)

Here is an sst_dump of one 33MB sst file:

$./sst_dump --show_properties --file=../../myrocks_mysql/data/.rocksdb/002120.sst
from [] to []
Process ../../myrocks_mysql/data/.rocksdb/002120.sst
Sst file format: block-based
Table Properties:
------------------------------
  # data blocks: 3840
  # entries: 310960
  raw key size: 4975360
  raw average key size: 16.000000
  raw value size: 57838560
  raw average value size: 186.000000
  data block size: 33559362
  index block size: 126490
  filter block size: 0
  (estimated) table size: 33685852
  filter policy name: rocksdb.BuiltinBloomFilter
  # deleted keys: 0

(4975360 + 57838560) / 33559362 ~ 1.87X compression ratio

Another question: how can I find the mapping between tables and *.sst files?

@mdcallag
Owner
@yoshinorim
Owner

We have not started implementing mappings between table and *.sst files yet. I'll file another task to track this.

@hermanlee
Owner

We can also dump some of the rocksdb configuration options through the information schema, rather than having to look through the rocksdb LOG file:

select * from information_schema.rocksdb_cf_options;

The db options are mostly available through:

show global variables like 'rocksdb%';
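
For example, the DB-level settings suggested earlier in this thread can be spot-checked with a narrower pattern (a sketch; the variable names are the ones used in the my.cnf above):

show global variables like 'rocksdb_block%';
show global variables like 'rocksdb_max_total_wal_size';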

@yoshinorim
Owner

@BohuTANG : BTW, you can get the size of each MyRocks table via the usual MySQL commands (SHOW TABLE STATUS or SELECT FROM information_schema.tables). Use these commands to compare compression ratios between tables. MyRocks recalculates the statistics every 600 seconds, which can be configured via the rocksdb_stats_dump_period_sec global variable. Also note that SHOW TABLE STATUS / I_S do not include the size in the Memstore (work is in progress to include the size from the Memstore, not only from *.sst files).
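
As a concrete sketch (assuming the engine shows up as 'ROCKSDB' in information_schema.tables), per-table sizes can be compared with:

select table_schema, table_name,
       round((data_length + index_length) / 1024 / 1024) as size_mb
from information_schema.tables
where engine = 'ROCKSDB'
order by size_mb desc;

Keep in mind these numbers can lag by up to rocksdb_stats_dump_period_sec seconds and exclude data still in the Memstore.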

@maykov
Owner

BohuTANG, optimize table t1; will run a manual compaction for the table. However, if you have already dropped the table, I can't think of an easy way to trigger compaction. One thing you can do is stop mysql and then use the ldb tool to run a compaction. If space is more important than deletion speed, maybe you can do truncate table t1; optimize table t1; drop table t1; after https://reviews.facebook.net/D39579 is pushed.
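
Spelled out, that space-first deletion sequence would be (a sketch; only worthwhile once the referenced patch is in):

truncate table t1;   -- remove the rows first
optimize table t1;   -- manual compaction reclaims the sst space for the now-empty table
drop table t1;       -- finally drop the (now small) table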

There is no correspondence between .sst files and tables or databases. The data is spread out among sst files in the order of insertion and then intermixed through the compaction process.

Yoshi, I have task #55 to expose what is stored in each sst file through the information schema.

@hermanlee referenced this issue in facebook/mysql-5.6: MyRocks data size is greater than InnoDB #44 (Open)

@hermanlee closed this