
Support Value Handler in KV separation #199

Open

yapple opened this issue Dec 31, 2021 · 7 comments

@yapple
Collaborator

yapple commented Dec 31, 2021

[Enhancement]

Problem

If we support the value handler in KV separation, the cost of the blob file's index block is expected to be reduced.

Solution

  1. Support the read path when the value handler is invalid (sketched below).
  2. Support updating the value handler during compaction.
  3. Ensure that read performance in the bad case is no worse than it is now.
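
To make item 1 concrete, here is a minimal sketch of the intended read path, assuming a hypothetical ValueHandle that caches the blob file number, offset, and size next to the key; all names here are illustrative, not TerarkDB's actual API:

```cpp
#include <cstdint>
#include <string>

// Hypothetical handle stored inline with the key in the SST: it remembers
// which blob file held the value, and where, when the SST was built.
struct ValueHandle {
  uint64_t file_number;  // blob file that held the value at build time
  uint64_t offset;       // byte offset of the record inside that file
  uint64_t size;         // record size, so no index lookup is needed
};

// Illustrative interface, not TerarkDB's actual API.
class BlobStorage {
 public:
  virtual ~BlobStorage() = default;
  // True iff h.file_number is still the newest blob file for this value,
  // i.e. compaction has not rewritten it since the SST was built.
  virtual bool HandleIsValid(const ValueHandle& h) const = 0;
  // Fast path: read directly at (file_number, offset, size); no index block.
  virtual bool ReadAt(const ValueHandle& h, std::string* value) const = 0;
  // Slow path: resolve the key through the blob file's index block.
  virtual bool ReadViaIndex(const std::string& key,
                            std::string* value) const = 0;
};

// Item 1 of the solution: try the cached handle first, and fall back to
// the index block only when the handle has gone stale.
bool GetValue(const BlobStorage& storage, const std::string& key,
              const ValueHandle& handle, std::string* value) {
  if (storage.HandleIsValid(handle) && storage.ReadAt(handle, value)) {
    return true;  // no index block touched at all
  }
  return storage.ReadViaIndex(key, value);  // bad case: same cost as today
}
```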
@skyzh
Collaborator

skyzh commented Dec 31, 2021

This is really interesting!

yapple added a commit to yapple/terarkdb that referenced this issue Jan 6, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 6, 2022
@yapple
Collaborator Author

yapple commented Jan 6, 2022

I ran the readrandomwriterandom benchmark with db_bench to collect blob-read statistics.
The db_bench command looks like this:

```
common='--benchmarks=readrandomwriterandom --use_terark_table=false --statistics=true --threads=20 --enable_lazy_compaction=false --num=167772160 --key_size=8 --value_size=1000 --duration=36000 --use_existing_db=true --histogram=true --cache_size=4294967296'
./db_bench $common --readwritepercent=10 --db=/data02/wangyi/11_4
```

After this heavy-write benchmark (readwritepercent=10, i.e. roughly 10% reads and 90% writes) had run for one hour, the key statistics were as follows:

```
rocksdb.num.read.blob_valid COUNT : 3165404
rocksdb.num.read.blob_invalid COUNT : 258019
2022/01/06-21:16:42.632655 7f090c3fe640 (Original Log Time 2022/01/06-21:16:42.632127) EVENT_LOG_v1 {"time_micros": 1641475002632109, "job": 782, "cf_name": "default", "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [9, 1, 3, 9, 10, 0, 0], "edge_state": [9, 21, 1383, 6818, 2535, 0, 0], "blob_count": 1776, "invalid_file_cnt": 405, "valid_file_cnt": 1734, "invalid_entry_cnt": 6155508, "valid_entry_cnt": 110549454, "immutable_memtables": 1}
```

read.blob_invalid counts the reads in the readrandomwriterandom workload that query a blob file to fetch the value but find that the stored file number is no longer the newest.

| read.blob_invalid | read.blob_valid | invalid ratio |
| --- | --- | --- |
| 258019 | 3165404 | 7.53% |

invalid_file_cnt counts the blob files referenced from SST key-values whose file number is no longer the newest.

| blob_cnt | invalid_file_cnt | valid_file_cnt | invalid ratio |
| --- | --- | --- | --- |
| 1776 | 405 | 1734 | 22.8% |

invalid_entry_cnt counts the entries in SST key-values whose referenced blob file is no longer the newest.

| invalid_entry_cnt | valid_entry_cnt | invalid ratio |
| --- | --- | --- |
| 6155508 | 110549454 | 5.27% |

So most queries do not need to consult the blob file's index block.
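
Incidentally, the valid/invalid split above reduces to the same dependence-map check that the statistics code later in this thread uses. A minimal sketch, assuming dependence_map maps a blob file number stored in an SST entry to the file number that currently holds those values (types simplified for illustration):

```cpp
#include <cstdint>
#include <unordered_map>

// Simplified stand-in for the per-version dependence map: it maps the blob
// file number recorded in an SST entry to the file that currently holds
// those values (compaction may have rewritten the original blob file).
using DependenceMap = std::unordered_map<uint64_t, uint64_t>;

// A stored handle is "valid" iff the file it points at is still the newest
// one; otherwise the read has to go through the index block, which is what
// rocksdb.num.read.blob_invalid counts.
bool StoredHandleIsValid(const DependenceMap& dependence_map,
                         uint64_t stored_file_number) {
  auto it = dependence_map.find(stored_file_number);
  return it != dependence_map.end() && it->second == stored_file_number;
}
```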

@yapple
Collaborator Author

yapple commented Jan 6, 2022

```
2022/01/06-20:30:33.826999 7f090c3fe640 (Original Log Time 2022/01/06-20:30:33.826487) EVENT_LOG_v1 {"time_micros": 1641472233826469, "job": 14, "cf_name": "default", "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [2, 1, 3, 9, 9, 0, 0], "edge_state": [2, 232, 1499, 5244, 660, 0, 0], "blob_count": 1558, "invalid_file_cnt": 99, "valid_file_cnt": 1545, "invalid_entry_cnt": 2046902, "valid_entry_cnt": 114094763, "immutable_memtables": 1}
```

From the log at 2022/01/06-21:16:42.632655 to this one at 2022/01/06-20:30:33.826999, invalid_file_cnt decreased from 405 to 99 as a result of the LSM-tree's compaction.

| blob_cnt | invalid_file_cnt | valid_file_cnt | invalid ratio |
| --- | --- | --- | --- |
| 1558 | 99 | 1545 | 6.35% |

So the cost of loading blob files' index blocks shrinks in proportion to the invalid_file_cnt ratio.

@yapple
Collaborator Author

yapple commented Jan 7, 2022

```
2022/01/07-06:26:46.446339 7f090c3fe640 (Original Log Time 2022/01/07-06:26:46.445294) EVENT_LOG_v1 {"time_micros": 1641508006445274, "job": 6876, "cf_name": "default", "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [21, 1, 3, 27, 13, 0, 0], "edge_state": [21, 21, 1535, 52059, 4912, 0, 0], "blob_count": 3272, "invalid_file_cnt": 3248, "valid_file_cnt": 3075, "invalid_entry_cnt": 28257342, "valid_entry_cnt": 179659877, "immutable_memtables": 0}
```

After the benchmark had run for 10 hours, the engine's blob statistics show that invalid_file_cnt increased from 99 to 3248.

| blob_cnt | invalid_file_cnt | valid_file_cnt | invalid ratio |
| --- | --- | --- | --- |
| 3272 | 3248 | 3075 | 99.26% |

The invalid_file_cnt ratio rises to 99.26% under this heavy-write workload.

| invalid_entry_cnt | valid_entry_cnt | invalid ratio |
| --- | --- | --- |
| 28257342 | 179659877 | 13.59% |

The invalid_entry_cnt ratio rises to 13.59%, which is still considered low.

yapple added a commit to yapple/terarkdb that referenced this issue Jan 7, 2022
@yapple
Collaborator Author

yapple commented Jan 7, 2022

I fixed the invalid_file_cnt statistics logic as follows:

```cpp
void VersionStorageInfo::CalculateBlobInfo() {
  valid_file_cnt_ = 0;
  valid_entry_cnt_ = 0;
  invalid_file_cnt_ = 0;
  invalid_entry_cnt_ = 0;
  std::unordered_map<uint64_t, uint64_t> file_map;
  std::unordered_set<uint64_t> invalid_file_set;
  // Accumulate, per referenced blob file number, how many entries across
  // all SSTs depend on it.
  for (int i = 0; i < num_levels_; i++) {
    for (auto& f : LevelFiles(i)) {
      for (auto fn : f->prop.dependence) {
        file_map[fn.file_number] += fn.entry_count;
      }
    }
  }
  for (auto f : file_map) {
    // A reference is stale when the dependence map resolves the stored
    // file number to a different (newer) blob file.
    if (dependence_map_.find(f.first)->second->fd.GetNumber() != f.first) {
      // Deduplicate: many stale references may resolve to the same new
      // file, whose index block only needs to be loaded once.
      invalid_file_set.insert(
          dependence_map_.find(f.first)->second->fd.GetNumber());
      invalid_entry_cnt_ += f.second;
    } else {
      valid_file_cnt_++;
      valid_entry_cnt_ += f.second;
    }
  }
  invalid_file_cnt_ = invalid_file_set.size();
}
```

So invalid_file_cnt is now the real number of blob file index blocks that need to be loaded: the unordered_set deduplicates stale references that resolve to the same rewritten file, which only needs its index block loaded once.
We reran the benchmark and got the engine status below:

```
2022/01/07-11:06:13.504100 7fc0eadfe640 (Original Log Time 2022/01/07-11:06:13.503938) EVENT_LOG_v1 {"time_micros": 1641524773503912, "job": 39, "cf_name": "default", "event": "flush_finished", "output_compression": "Snappy", "lsm_state": [6, 1, 3, 25, 14, 0, 0], "edge_state": [6, 20, 2386, 51141, 6540, 0, 0], "blob_count": 3232, "invalid_file_cnt": 826, "valid_file_cnt": 3102, "invalid_entry_cnt": 25118108, "valid_entry_cnt": 175587901, "immutable_memtables": 0}
```

| blob_cnt | invalid_file_cnt | valid_file_cnt | invalid ratio |
| --- | --- | --- | --- |
| 3232 | 826 | 3102 | 25.55% |

So we only need to load about a quarter as many blob index blocks as before.

yapple added a commit to yapple/terarkdb that referenced this issue Jan 13, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 13, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 13, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 13, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Jan 17, 2022
… will benefit from value handle bytedance#199

add tickers of read blob valid bytedance#199

fix invalid_file_cnt statistic bytedance#199

fix invalid_file_cnt_ statistic bytedance#199

refine code style bytedance#199

refactor LOG lsm_state bytedance#199

code format and add some assert

code defend
mm304321141 pushed a commit that referenced this issue Jan 17, 2022
@yapple
Collaborator Author

yapple commented Jan 18, 2022

I benchmarked BlobDB to observe the per-entry size when building an SST file.

```json
{"file_number": 10, "file_size": 18351744,
 "table_properties":
  {"data_size": 18213147, "index_size": 200391, "raw_key_size": 35792208, "raw_average_key_size": 36, "raw_value_size": 8940580, "raw_average_value_size": 8, "num_data_blocks": 10266, "num_entries": 994228, "compression": "Snappy"}}
```

18351744 / 994228 ≈ 18.458 bytes per key, i.e. after KV separation each entry costs about 18.5 bytes in the SST, since the stored value is only the small blob handle (raw_average_value_size is 8).

noobpwnftw pushed a commit to noobpwnftw/terarkdb that referenced this issue Jan 18, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Mar 6, 2022
yapple added a commit to yapple/terarkdb that referenced this issue Mar 13, 2022
@yapple
Collaborator Author

yapple commented Mar 14, 2022

We need to stay backward compatible with historical data, for example in the behavior of decoding value_meta.
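
A minimal sketch of one way to do that, assuming new records are prefixed with a format byte that historical value_meta data never starts with; the layout and names here are invented for illustration, not TerarkDB's actual encoding:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Invented for illustration: a tag byte assumed never to start a legacy
// value_meta record, so its presence marks the new handle-carrying format.
constexpr uint8_t kHandleFormat = 0xFF;

struct DecodedValueMeta {
  bool has_handle = false;   // false => legacy record, no value handle
  uint64_t file_number = 0;  // blob file the handle points at
  uint64_t offset = 0;       // offset of the record inside that file
};

// Dispatch on the leading format byte so records written before this
// feature still decode along the old path.
bool DecodeValueMeta(const std::string& raw, DecodedValueMeta* out) {
  if (raw.empty()) return false;
  if (static_cast<uint8_t>(raw[0]) == kHandleFormat) {
    if (raw.size() < 1 + 2 * sizeof(uint64_t)) return false;  // truncated
    std::memcpy(&out->file_number, raw.data() + 1, sizeof(uint64_t));
    std::memcpy(&out->offset, raw.data() + 1 + sizeof(uint64_t),
                sizeof(uint64_t));
    out->has_handle = true;
    return true;
  }
  out->has_handle = false;  // legacy path: decode value_meta as before
  return true;
}
```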
