
Support Value Handler in KV separation #199

Open

yapple opened this issue Dec 31, 2021 · 7 comments

@yapple
Collaborator

yapple commented Dec 31, 2021

[Enhancement]

Problem

If we support the value handler in KV separation, the cost of the blob file's index block is expected to be reduced.

Solution

  1. Support the read path when the value handler is invalid (sketched below).
  2. Support updating the value handler during compaction.
  3. Ensure that read performance in the bad case is no worse than it is now.
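
To make item 1 concrete, here is a minimal sketch of the intended read path, assuming a hypothetical ValueHandle that caches the blob file number, offset, and size next to the key; all names here are illustrative, not TerarkDB's actual API:

```cpp
#include <cstdint>
#include <string>

// Hypothetical handle stored inline with the key in the SST: it remembers
// which blob file held the value, and where, when the SST was built.
struct ValueHandle {
  uint64_t file_number;  // blob file that held the value at build time
  uint64_t offset;       // byte offset of the record inside that file
  uint64_t size;         // record size, so no index lookup is needed
};

// Illustrative interface, not TerarkDB's actual API.
class BlobStorage {
 public:
  virtual ~BlobStorage() = default;
  // True iff h.file_number is still the newest blob file for this value,
  // i.e. compaction has not rewritten it since the SST was built.
  virtual bool HandleIsValid(const ValueHandle& h) const = 0;
  // Fast path: read directly at (file_number, offset, size); no index block.
  virtual bool ReadAt(const ValueHandle& h, std::string* value) const = 0;
  // Slow path: resolve the key through the blob file's index block.
  virtual bool ReadViaIndex(const std::string& key,
                            std::string* value) const = 0;
};

// Item 1 of the solution: try the cached handle first, and fall back to
// the index block only when the handle has gone stale.
bool GetValue(const BlobStorage& storage, const std::string& key,
              const ValueHandle& handle, std::string* value) {
  if (storage.HandleIsValid(handle) && storage.ReadAt(handle, value)) {
    return true;  // no index block touched at all
  }
  return storage.ReadViaIndex(key, value);  // bad case: same cost as today
}
```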
@skyzh
Collaborator

skyzh commented Dec 31, 2021

This is really interesting!

yapple added a commit to yapple/terarkdb that referenced this issue Jan 6, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 6, 2022
@yapple
Collaborator Author

yapple commented Jan 6, 2022

I ran the readrandomwriterandom benchmark with db_bench to collect blob-read statistics.
The db_bench command looks like this:

```
common='--benchmarks=readrandomwriterandom --use_terark_table=false --statistics=true --threads=20 --enable_lazy_compaction=false --num=167772160 --key_size=8 --value_size=1000 --duration=36000 --use_existing_db=true --histogram=true --cache_size=4294967296'
./db_bench $common --readwritepercent=10 --db=/data02/wangyi/11_4
```

After this heavy-write benchmark (readwritepercent=10, i.e. roughly 10% reads and 90% writes) had run for one hour, the key statistics were as follows:

```
rocksdb.num.read.blob_valid COUNT : 3165404
rocksdb.num.read.blob_invalid COUNT : 258019
2022/01/06-21:16:42.632655 7f090c3fe640 (Original Log Time 2022/01/06-21:16:42.632127) EVENT_LOG_v1 {"time_micros": 1641475002632109, "job": 782, "cf_name": "default", "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [9, 1, 3, 9, 10, 0, 0], "edge_state": [9, 21, 1383, 6818, 2535, 0, 0], "blob_count": 1776, "invalid_file_cnt": 405, "valid_file_cnt": 1734, "invalid_entry_cnt": 6155508, "valid_entry_cnt": 110549454, "immutable_memtables": 1}
```

read.blob_invalid counts the reads in the readrandomwriterandom workload that query a blob file to fetch the value but find that the stored file number is no longer the newest.

| read.blob_invalid | read.blob_valid | invalid ratio |
| --- | --- | --- |
| 258019 | 3165404 | 7.53% |

invalid_file_cnt counts the blob files referenced from SST key-values whose file number is no longer the newest.

| blob_cnt | invalid_file_cnt | valid_file_cnt | invalid ratio |
| --- | --- | --- | --- |
| 1776 | 405 | 1734 | 22.8% |

invalid_entry_cnt counts the entries in SST key-values whose referenced blob file is no longer the newest.

| invalid_entry_cnt | valid_entry_cnt | invalid ratio |
| --- | --- | --- |
| 6155508 | 110549454 | 5.27% |

So most queries do not need to consult the blob file's index block.
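
Incidentally, the valid/invalid split above reduces to the same dependence-map check that the statistics code later in this thread uses. A minimal sketch, assuming dependence_map maps a blob file number stored in an SST entry to the file number that currently holds those values (types simplified for illustration):

```cpp
#include <cstdint>
#include <unordered_map>

// Simplified stand-in for the per-version dependence map: it maps the blob
// file number recorded in an SST entry to the file that currently holds
// those values (compaction may have rewritten the original blob file).
using DependenceMap = std::unordered_map<uint64_t, uint64_t>;

// A stored handle is "valid" iff the file it points at is still the newest
// one; otherwise the read has to go through the index block, which is what
// rocksdb.num.read.blob_invalid counts.
bool StoredHandleIsValid(const DependenceMap& dependence_map,
                         uint64_t stored_file_number) {
  auto it = dependence_map.find(stored_file_number);
  return it != dependence_map.end() && it->second == stored_file_number;
}
```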

@yapple
Collaborator Author

yapple commented Jan 6, 2022

```
2022/01/06-20:30:33.826999 7f090c3fe640 (Original Log Time 2022/01/06-20:30:33.826487) EVENT_LOG_v1 {"time_micros": 1641472233826469, "job": 14, "cf_name": "default", "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [2, 1, 3, 9, 9, 0, 0], "edge_state": [2, 232, 1499, 5244, 660, 0, 0], "blob_count": 1558, "invalid_file_cnt": 99, "valid_file_cnt": 1545, "invalid_entry_cnt": 2046902, "valid_entry_cnt": 114094763, "immutable_memtables": 1}
```

From the log at 2022/01/06-21:16:42.632655 to this one at 2022/01/06-20:30:33.826999, invalid_file_cnt decreased from 405 to 99 as a result of the LSM-tree's compaction.

| blob_cnt | invalid_file_cnt | valid_file_cnt | invalid ratio |
| --- | --- | --- | --- |
| 1558 | 99 | 1545 | 6.35% |

So the cost of loading blob files' index blocks shrinks in proportion to the invalid_file_cnt ratio.

@yapple
Collaborator Author

yapple commented Jan 7, 2022

```
2022/01/07-06:26:46.446339 7f090c3fe640 (Original Log Time 2022/01/07-06:26:46.445294) EVENT_LOG_v1 {"time_micros": 1641508006445274, "job": 6876, "cf_name": "default", "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [21, 1, 3, 27, 13, 0, 0], "edge_state": [21, 21, 1535, 52059, 4912, 0, 0], "blob_count": 3272, "invalid_file_cnt": 3248, "valid_file_cnt": 3075, "invalid_entry_cnt": 28257342, "valid_entry_cnt": 179659877, "immutable_memtables": 0}
```

After the benchmark had run for 10 hours, the engine's blob statistics show that invalid_file_cnt increased from 99 to 3248.

| blob_cnt | invalid_file_cnt | valid_file_cnt | invalid ratio |
| --- | --- | --- | --- |
| 3272 | 3248 | 3075 | 99.26% |

The invalid_file_cnt ratio rises to 99.26% under this heavy-write workload.

| invalid_entry_cnt | valid_entry_cnt | invalid ratio |
| --- | --- | --- |
| 28257342 | 179659877 | 13.59% |

The invalid_entry_cnt ratio rises to 13.59%, which is still considered low.

yapple added a commit to yapple/terarkdb that referenced this issue Jan 7, 2022
@yapple
Collaborator Author

yapple commented Jan 7, 2022

I fixed the invalid_file_cnt statistics logic as follows:

```cpp
void VersionStorageInfo::CalculateBlobInfo() {
  valid_file_cnt_ = 0;
  valid_entry_cnt_ = 0;
  invalid_file_cnt_ = 0;
  invalid_entry_cnt_ = 0;
  std::unordered_map<uint64_t, uint64_t> file_map;
  std::unordered_set<uint64_t> invalid_file_set;
  // Accumulate, per referenced blob file number, how many entries across
  // all SSTs depend on it.
  for (int i = 0; i < num_levels_; i++) {
    for (auto& f : LevelFiles(i)) {
      for (auto fn : f->prop.dependence) {
        file_map[fn.file_number] += fn.entry_count;
      }
    }
  }
  for (auto f : file_map) {
    // A reference is stale when the dependence map resolves the stored
    // file number to a different (newer) blob file.
    if (dependence_map_.find(f.first)->second->fd.GetNumber() != f.first) {
      // Deduplicate: many stale references may resolve to the same new
      // file, whose index block only needs to be loaded once.
      invalid_file_set.insert(
          dependence_map_.find(f.first)->second->fd.GetNumber());
      invalid_entry_cnt_ += f.second;
    } else {
      valid_file_cnt_++;
      valid_entry_cnt_ += f.second;
    }
  }
  invalid_file_cnt_ = invalid_file_set.size();
}
```

So invalid_file_cnt is now the real number of blob file index blocks that need to be loaded: the unordered_set deduplicates stale references that resolve to the same rewritten file, which only needs its index block loaded once.
We reran the benchmark and got the engine status below:

```
2022/01/07-11:06:13.504100 7fc0eadfe640 (Original Log Time 2022/01/07-11:06:13.503938) EVENT_LOG_v1 {"time_micros": 1641524773503912, "job": 39, "cf_name": "default", "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [6, 1, 3, 25, 14, 0, 0], "edge_state": [6, 20, 2386, 51141, 6540, 0, 0], "blob_count": 3232, "invalid_file_cnt": 826, "valid_file_cnt": 3102, "invalid_entry_cnt": 25118108, "valid_entry_cnt": 175587901, "immutable_memtables": 0}
```

| blob_cnt | invalid_file_cnt | valid_file_cnt | invalid ratio |
| --- | --- | --- | --- |
| 3232 | 826 | 3102 | 25.55% |

So we only need to load about a quarter as many blob index blocks as before.

yapple added a commit to yapple/terarkdb that referenced this issue Jan 13, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 13, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 13, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 13, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 17, 2022
… will benefit from value handle bytedance#199

add tickers of read blob valid bytedance#199

fix invalid_file_cnt statistic bytedance#199

fix invalid_file_cnt_ statistic bytedance#199

refine code style bytedance#199

refactor LOG lsm_state bytedance#199

code format and add some assert

code defend
mm304321141 pushed a commit that referenced this issue Jan 17, 2022
@yapple
Collaborator Author

yapple commented Jan 18, 2022

I benchmarked BlobDB to observe the per-entry size when building an SST file.

```json
{"file_number": 10, "file_size": 18351744,
 "table_properties":
  {"data_size": 18213147, "index_size": 200391, "raw_key_size": 35792208, "raw_average_key_size": 36, "raw_value_size": 8940580, "raw_average_value_size": 8, "num_data_blocks": 10266, "num_entries": 994228, "compression": "Snappy"}}
```

18351744 / 994228 ≈ 18.458 bytes per key, i.e. after KV separation each entry costs about 18.5 bytes in the SST, since the stored value is only the small blob handle (raw_average_value_size is 8).

noobpwnftw pushed a commit to noobpwnftw/terarkdb that referenced this issue Jan 18, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Mar 6, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Mar 13, 2022
@yapple
Collaborator Author

yapple commented Mar 14, 2022

We need to stay backward compatible with historical data, for example in the behavior of decoding value_meta.
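
A minimal sketch of one way to do that, assuming new records are prefixed with a format byte that historical value_meta data never starts with; the layout and names here are invented for illustration, not TerarkDB's actual encoding:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Invented for illustration: a tag byte assumed never to start a legacy
// value_meta record, so its presence marks the new handle-carrying format.
constexpr uint8_t kHandleFormat = 0xFF;

struct DecodedValueMeta {
  bool has_handle = false;   // false => legacy record, no value handle
  uint64_t file_number = 0;  // blob file the handle points at
  uint64_t offset = 0;       // offset of the record inside that file
};

// Dispatch on the leading format byte so records written before this
// feature still decode along the old path.
bool DecodeValueMeta(const std::string& raw, DecodedValueMeta* out) {
  if (raw.empty()) return false;
  if (static_cast<uint8_t>(raw[0]) == kHandleFormat) {
    if (raw.size() < 1 + 2 * sizeof(uint64_t)) return false;  // truncated
    std::memcpy(&out->file_number, raw.data() + 1, sizeof(uint64_t));
    std::memcpy(&out->offset, raw.data() + 1 + sizeof(uint64_t),
                sizeof(uint64_t));
    out->has_handle = true;
    return true;
  }
  out->has_handle = false;  // legacy path: decode value_meta as before
  return true;
}
```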
