feat(metrics): Add bloom filter related metrics #521

Merged
merged 3 commits into from
Apr 24, 2020

Conversation

acelyc111
Member

@acelyc111 acelyc111 commented Apr 23, 2020

What problem does this PR solve?

Ref #496

  • Support monitoring more RocksDB metrics to observe how the storage system behaves
  • Provide a basis for optimizing RocksDB configurations and user workloads

New metrics

  • replica*app.pegasus*rdb.bf_seek_total@<gpid>

Aka rocksdb::Tickers::BLOOM_FILTER_PREFIX_CHECKED. Number of times bloom was checked before creating iterator on a file.

  • replica*app.pegasus*rdb.bf_seek_negatives@<gpid>

Aka rocksdb::Tickers::BLOOM_FILTER_PREFIX_USEFUL. The number of times the check was useful in avoiding iterator creation (and thus likely IOPs).

  • replica*app.pegasus*rdb.bf_point_positive_true@<gpid>

Aka rocksdb::Tickers::BLOOM_FILTER_FULL_TRUE_POSITIVE. Number of times the bloom FullFilter has not avoided the reads and the data actually exists.

  • replica*app.pegasus*rdb.bf_point_positive_total@<gpid>

Aka rocksdb::Tickers::BLOOM_FILTER_FULL_POSITIVE. Number of times the bloom FullFilter has not avoided the reads.

  • replica*app.pegasus*rdb.bf_point_negatives@<gpid>

Aka rocksdb::Tickers::BLOOM_FILTER_USEFUL. Number of times the bloom filter has avoided file reads, i.e., negatives.

  • collector*app.pegasus*app.stat.rdb_bf_seek_negatives_rate#<app_name>

Rate of avoided iterator creations (and thus likely IOPs) after checking prefix bloom filter.

value = SUM(bf_seek_negatives) / SUM(bf_seek_total)

  • collector*app.pegasus*app.stat.rdb_bf_point_negatives_rate#<app_name>

Rate of avoided point lookups after checking full key bloom filter.

value = SUM(bf_point_negatives) / (SUM(bf_point_negatives) + SUM(bf_point_positive_total))

  • collector*app.pegasus*app.stat.rdb_bf_point_false_positive_rate#<app_name>

False positive rate of checking full key bloom filter.

value = (SUM(bf_point_positive_total) - SUM(bf_point_positive_true)) / (SUM(bf_point_positive_total) - SUM(bf_point_positive_true) + SUM(bf_point_negatives))
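
As a rough illustration of the three formulas above, the following Python sketch computes the app-level rates from per-partition counter values. The function name, the dict layout, the zero-denominator handling, and the sample numbers are all assumptions made for illustration; they are not taken from the Pegasus collector code.

```python
def bloom_filter_rates(partitions):
    """Compute the three app-level rates from per-partition counter dicts.

    Each dict is assumed to carry the five replica-level counters under
    their short names (bf_seek_total, bf_seek_negatives, ...).
    """
    def total(name):
        # SUM(...) over all partitions of the table.
        return sum(p.get(name, 0) for p in partitions)

    def ratio(num, den):
        # Report 0.0 when nothing has been counted yet (an assumption;
        # the real collector may treat an empty denominator differently).
        return float(num) / den if den else 0.0

    seek_total = total("bf_seek_total")
    seek_neg = total("bf_seek_negatives")
    point_neg = total("bf_point_negatives")
    point_pos = total("bf_point_positive_total")
    point_true_pos = total("bf_point_positive_true")
    point_false_pos = point_pos - point_true_pos

    return {
        "rdb_bf_seek_negatives_rate": ratio(seek_neg, seek_total),
        "rdb_bf_point_negatives_rate": ratio(point_neg, point_neg + point_pos),
        "rdb_bf_point_false_positive_rate": ratio(
            point_false_pos, point_false_pos + point_neg),
    }


# Hypothetical counter values for two partitions.
sample = [
    {"bf_seek_total": 100, "bf_seek_negatives": 50, "bf_point_negatives": 990,
     "bf_point_positive_total": 10, "bf_point_positive_true": 5},
    {"bf_seek_total": 100, "bf_seek_negatives": 50, "bf_point_negatives": 990,
     "bf_point_positive_total": 10, "bf_point_positive_true": 5},
]
rates = bloom_filter_rates(sample)
print(rates)
```

Note that the counters are summed across partitions before dividing, as the formulas state, rather than averaging per-partition ratios.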

The names of the above metrics follow the RocksDB documentation on bloom filters:

image

What is changed and how it works?

  • Add 3 columns (seek_n_rate, point_n_rate, point_fp_rate) in shell command app_stat

table level:

>>> app_stat
[app_stat]
app_name     app_id  pcount   GET  MGET   PUT  MPUT   DEL  MDEL  INCR   CAS   CAM  SCAN   RCU   WCU  expire  filter  abnormal  delay  reject  file_mb  file_num  mem_tbl_mb  mem_idx_mb  hit_rate  seek_n_rate  point_n_rate  point_fp_rate
temp             10      16  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00      0.00   0.00    0.00     0.00        16      108.18        0.03      0.99         0.50          0.99           0.01
(total:1)         0      16  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00      0.00   0.00    0.00     0.00        16      108.18        0.03      0.99         0.50          0.99           0.01

partition level:

>>> app_stat -a temp
[app_stat]
pidx           GET  MGET    PUT  MPUT   DEL  MDEL  INCR   CAS   CAM  SCAN   RCU     WCU  expire  filter  abnormal  delay  reject  file_mb  file_num  mem_tbl_mb  mem_idx_mb  hit_rate  seek_n_rate  point_n_rate  point_fp_rate
0             0.00  0.00   5.98  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00   60.00    0.00    0.00      0.00   0.00    0.00     0.00         1        6.62        0.00      0.99         0.49          0.99           0.01
1             0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        5.54        0.00      0.99         0.50          0.99           0.01
2             0.00  0.00  24.03  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  241.00    0.00    0.00      0.00   0.00    0.00     0.00         1        7.48        0.00      0.99         0.49          0.99           0.01
3             0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        6.45        0.00      0.99         0.50          0.99           0.01
4             0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        7.49        0.00      0.99         0.50          0.99           0.01
5             0.00  0.00   4.68  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00   47.00    0.00    0.00      0.00   0.00    0.00     0.00         1        7.78        0.00      0.99         0.49          0.99           0.01
6             0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        6.39        0.00      0.99         0.50          0.99           0.01
7             0.00  0.00  22.04  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  221.00    0.00    0.00      0.00   0.00    0.00     0.00         1        6.18        0.00      0.99         0.50          0.99           0.01
8             0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        5.70        0.00      0.99         0.50          0.99           0.01
9             0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        5.61        0.00      0.99         0.49          0.99           0.01
10            0.00  0.00   3.78  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00   38.00    0.00    0.00      0.00   0.00    0.00     0.00         1        6.35        0.00      0.99         0.50          0.99           0.01
11            0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        8.06        0.00      0.99         0.49          0.99           0.01
12            0.00  0.00  22.23  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  223.00    0.00    0.00      0.00   0.00    0.00     0.00         1        7.69        0.00      0.99         0.50          0.99           0.01
13            0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        5.46        0.00      0.99         0.50          0.99           0.01
14            0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00    0.00    0.00    0.00      0.00   0.00    0.00     0.00         1        6.81        0.00      0.99         0.49          0.99           0.01
15            0.00  0.00   5.28  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00   53.00    0.00    0.00      0.00   0.00    0.00     0.00         1        7.95        0.00      0.99         0.49          0.99           0.01
(total:16)    0.00  0.00  88.02  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  883.00    0.00    0.00      0.00   0.00    0.00     0.00        16      107.56        0.03      0.99         0.50          0.99           0.01
  • Add 3 metrics (rdb_bf_point_false_positive_rate, rdb_bf_point_negatives_rate, rdb_bf_seek_negatives_rate) to the app-level entity
    image

Check List

Tests

  • Manual test (add detailed scripts or steps below)
#!/usr/bin/env python
# coding:utf-8

from pypegasus.pgclient import *

from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks, Deferred


@inlineCallbacks
def basic_test():
    # init
    c = Pegasus(['meta1:port', 'meta2:port'], 'temp')

    suc = yield c.init()
    if not suc:
        reactor.stop()
        print('ERROR: connect pegasus server failed')
        return

    kCount = 10000
    # write test data set A
    print("start to set")
    for i in range(kCount):
        (ret, ign) = yield c.set('hkey_' + str(i), 'skey', 'value_' + str(i), 0, 500)
        #if ret == error_types.ERR_OK.value: continue
        #print('set hkey_' + str(i) + ' : skey => value_' + str(i) + ' ' + str(ret))

    # read data sets A and B; B is the same size as A but was never written
    print("start to get")
    for i in range(2*kCount):
        (ret, v) = yield c.get('hkey_' + str(i), 'skey')
        #if ret != error_types.ERR_OK.value:
        #    print('hkey_' + str(i) + ' : skey => ' + v)

    # scan data sets A and B; B is the same size as A but was never written, so
    # 'seek_n_rate' in the shell and 'rdb_bf_seek_negatives_rate' in the metrics
    # should both be around 0.5
    print("start to scan")
    o = ScanOptions()
    o.batch_size = 1
    for i in range(2*kCount):
        s = c.get_scanner('hkey_' + str(i), '6', '8', o)
        while True:
            try:
                ret = yield s.get_next()
                #print('get_next ret: ', ret)
            except Exception as e:
                print(e)
                break

            if not ret:
                break
        s.close()

    reactor.stop()


if __name__ == "__main__":
    reactor.callWhenRunning(basic_test)
    reactor.run()
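
The value of about 0.5 mentioned in the scan comment can be reasoned through with a small standalone sketch. It assumes an idealized prefix bloom filter (exactly one check per scan and no false positives), which is a simplification of what RocksDB actually does; the function name is hypothetical and no Pegasus cluster is needed to run it.

```python
def expected_seek_negatives_rate(written_keys, scanned_keys):
    """Fraction of scans whose hash key was never written.

    Assumes an idealized prefix bloom filter: one check per scan and
    no false positives. Real RocksDB behaviour will deviate from this.
    """
    written = set(written_keys)
    total = 0
    negatives = 0
    for key in scanned_keys:
        total += 1
        if key not in written:
            # The filter would report "definitely absent" for this prefix.
            negatives += 1
    return float(negatives) / total if total else 0.0


k_count = 10000
set_a = ['hkey_' + str(i) for i in range(k_count)]            # written keys
set_a_and_b = ['hkey_' + str(i) for i in range(2 * k_count)]  # scanned keys
rate = expected_seek_negatives_rate(set_a, set_a_and_b)
print(rate)  # 0.5 under the idealized assumptions above
```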

Related changes

  • Need to cherry-pick to the release branch
    Yes
  • Need to update the documentation
    Yes
  • Need to be included in the release note
    Yes

@acelyc111 acelyc111 force-pushed the rocksdb_bf_metrics branch 2 times, most recently from 0a5b8d8 to b0104b9 Compare April 23, 2020 10:46
@acelyc111 acelyc111 changed the title [metrics] Add bloom filter related metrics from rocksdb [metrics] Add bloom filter related metrics Apr 23, 2020
@acelyc111 acelyc111 changed the title [metrics] Add bloom filter related metrics [feat] Add bloom filter related metrics Apr 23, 2020
@acelyc111 acelyc111 closed this Apr 23, 2020
@acelyc111 acelyc111 reopened this Apr 23, 2020
@acelyc111 acelyc111 changed the title [feat] Add bloom filter related metrics feat(metrics): Add bloom filter related metrics Apr 23, 2020
@acelyc111 acelyc111 marked this pull request as ready for review April 23, 2020 12:28
hycdong
hycdong previously approved these changes Apr 24, 2020
@neverchanje neverchanje added the type/perf-counter PR that made modification on perf-counter, which should be noted in release note. label Apr 24, 2020
@neverchanje neverchanje linked an issue Apr 24, 2020 that may be closed by this pull request
@neverchanje
Contributor

neverchanje commented Apr 24, 2020

Why was BLOOM_FILTER_PREFIX_USEFUL called "seek_negatives"? I prefer what RocksDB calls it: "useful". "bf_seek_useful_rate" is definitely easier to remember than "bf_seek_negatives_rate".

@acelyc111
Member Author

acelyc111 commented Apr 24, 2020

> Why was BLOOM_FILTER_PREFIX_USEFUL called "seek_negatives"? I prefer what RocksDB calls it: "useful". "bf_seek_useful_rate" is definitely easier to remember than "bf_seek_negatives_rate".

Of course, 'negative' is longer than 'useful', but when 'negative' stands alongside 'positive' or 'false negative', it is clearer IMO.
It is like a test result being 'negative' or 'positive' for coronavirus.

@neverchanje
Contributor

Yes. But if you want to make it clear, you should call the metrics "false_negative" rather than "negative".

    INIT_COUNTER(rdb_bf_seek_negatives_rate);
    INIT_COUNTER(rdb_bf_point_negatives_rate);
    INIT_COUNTER(rdb_bf_point_false_positive_rate);

Some of these names are prefixed with "false" but others are not.

@acelyc111
Member Author

acelyc111 commented Apr 24, 2020

> Yes. But if you want to make it clear, you should call the metrics "false_negative" rather than "negative".
>
>     INIT_COUNTER(rdb_bf_seek_negatives_rate);
>     INIT_COUNTER(rdb_bf_point_negatives_rate);
>     INIT_COUNTER(rdb_bf_point_false_positive_rate);
>
> Some of these names are prefixed with "false" but others are not.

rdb_bf_point_negatives_rate doesn't mean rdb_bf_point_false_negatives_rate; it means rdb_bf_point_[true]_negatives_rate. That is to say, the BF says this key definitely does not exist (negative).
On the other hand, for rdb_bf_point_false_positive_rate, the BF says this key may exist (positive), but it turns out not to exist after the data file is read, so it is a false positive.
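
To make the distinction above concrete, here is a toy bloom filter in Python. It is only a sketch: the class, its sizes, and the hashing scheme are assumptions chosen for brevity and have nothing to do with RocksDB's real filter implementation. It shows that a 'negative' answer is always a true negative, while a 'positive' answer has to be confirmed against the stored data and may turn out to be false.

```python
import hashlib
from collections import Counter


class ToyBloomFilter:
    """A deliberately small bloom filter so that false positives can occur."""

    def __init__(self, num_bits=64, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            data = (str(seed) + ':' + key).encode('utf-8')
            yield int(hashlib.sha256(data).hexdigest(), 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def may_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def classify(bf, stored, key):
    """Label a lookup the way the point-lookup metrics distinguish outcomes."""
    if not bf.may_contain(key):
        return 'negative'            # read avoided, key cannot exist
    if key in stored:
        return 'true_positive'       # read happened and found the key
    return 'false_positive'          # read happened but found nothing


stored = set('hkey_' + str(i) for i in range(20))
bf = ToyBloomFilter()
for k in stored:
    bf.add(k)

queries = ['hkey_' + str(i) for i in range(200)]
counts = Counter(classify(bf, stored, k) for k in queries)
print(counts)
```

A bloom filter never produces false negatives, which is why the 'negatives' counters need no 'true' qualifier, whereas the positives are split into 'positive_true' and the remaining false positives.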

@neverchanje neverchanje merged commit fb811ba into apache:master Apr 24, 2020
@neverchanje neverchanje mentioned this pull request May 14, 2020
@neverchanje neverchanje mentioned this pull request Jun 10, 2020
Labels
type/perf-counter PR that made modification on perf-counter, which should be noted in release note. v2.0.0
Development

Successfully merging this pull request may close these issues.

Monitor and optimize options for bloom filter
4 participants