KAFKA-9924: Add docs for RocksDB properties-based metrics #9895
Call for review: @guozhangwang @ableegoldman @lct45
Thanks for the PR, @cadonna! Just left a few nits.
each metric reports an aggregation over the RocksDB instances of the state store.
RocksDB Metrics are grouped into statistics-based metrics and properties-based metrics.
The former are recorded from statistics that a RocksDB state store collects whereas the latter are recorded from
properties that RocksDB exposes.
I'm still a little confused about the difference between the two after reading this. Maybe an example of something that is exposed by RocksDB but not collected by RocksDB would help
I elaborated on the two types of metrics. Let me know if it is better now.
Thanks for the explanation, I also didn't understand the difference at first
Ahh I see now, thanks for the additional info! LGTM
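For readers who hit the same confusion, here is a toy model of the distinction. It relies on how the published Kafka Streams monitoring documentation describes the two groups: statistics are cumulative measurements over time (for example, bytes written), while properties are current measurements (for example, memory currently in use). The class and method names below are hypothetical and do not correspond to the RocksDB or Kafka Streams APIs.

```java
// Toy model only: NOT the RocksDB or Kafka Streams API. All names are hypothetical.
final class MetricKindsSketch {
    // Statistic-like value: accumulates over the lifetime of the store.
    private long bytesWrittenTotal = 0;
    // Property-like value: reflects only the current state of the store.
    private long currentMemTableBytes = 0;

    /** Simulates a write that lands in the in-memory table. */
    void write(int bytes) {
        bytesWrittenTotal += bytes;
        currentMemTableBytes += bytes;
    }

    /** Simulates a flush: the in-memory table is emptied, the cumulative count is untouched. */
    void flush() {
        currentMemTableBytes = 0;
    }

    /** Cumulative measurement, analogous to a statistics-based metric. */
    long statisticBytesWritten() {
        return bytesWrittenTotal;
    }

    /** Point-in-time measurement, analogous to a properties-based metric. */
    long propertyCurrentMemTableBytes() {
        return currentMemTableBytes;
    }
}
```

After two writes of 100 and 50 bytes followed by a flush, the statistic-like value stays at 150 while the property-like value drops back to 0.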
Co-authored-by: leah <lthomas@confluent.io>
If a state store consists of multiple RocksDB instances, as is the case for aggregations over time and session windows,
each metric reports the sum over all the RocksDB instances of the state store, except for the block cache metrics
<code>block-cache-*</code>. The block cache metrics report the sum over all RocksDB instances if each instance uses its
own block cache, and they report the recorded value from only one instance if a single block cache is shared
Technically I think it's possible for users to share a single block cache among only some stores, but not others. Or they could have two block caches shared across different state stores, etc. Would we detect this case and report the correct block cache for a given state store?
(Idk how common that pattern could possibly be, but this made me wonder)
Good point! Currently, we treat this mixed pattern as an illegal state and throw an IllegalStateException. Probably not the best way to handle it. Allowing such a mixed pattern complicates the measurement of the cache metrics. I opened KAFKA-12223 to document this.
Thanks!
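The aggregation rule discussed in this thread can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Kafka Streams implementation: the class `BlockCacheAggregationSketch`, the nested `Instance` type, and the `aggregate` method are hypothetical, and cache sharing is detected here simply by object identity. The metric names in the usage note are only illustrative.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the documented rule; not code from Kafka Streams.
final class BlockCacheAggregationSketch {

    /** Stand-in for one RocksDB instance (segment) of a state store. */
    static final class Instance {
        final Object cache; // compared by identity: the same object means a shared block cache
        final long value;   // value recorded for the metric on this instance

        Instance(Object cache, long value) {
            this.cache = cache;
            this.value = value;
        }
    }

    static long aggregate(String metricName, List<Instance> instances) {
        long sum = 0;
        // Identity-based set, so two caches count as one only if they are the same object.
        Set<Object> distinctCaches =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        for (Instance instance : instances) {
            sum += instance.value;
            distinctCaches.add(instance.cache);
        }
        if (!metricName.startsWith("block-cache-")) {
            return sum; // all other metrics: sum over all instances
        }
        if (distinctCaches.size() == instances.size()) {
            return sum; // each instance uses its own block cache
        }
        if (distinctCaches.size() == 1) {
            return instances.get(0).value; // single shared cache: report one instance only
        }
        // Mixed pattern (some instances share a cache, others do not): treated as illegal.
        throw new IllegalStateException(
            "Block caches are neither all distinct nor all the same instance");
    }
}
```

With two instances that each own a cache and record 10 and 20, `aggregate("block-cache-usage", ...)` returns 30. With two instances sharing one cache and both recording 10, it returns 10, whereas a non-cache metric such as `estimate-num-keys` still returns the sum, 20. Three instances of which only two share a cache trigger the `IllegalStateException` mentioned above.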
</tr>
<tr>
<td>compaction-pending</td>
<td>This metric reports 1 if at least one compaction is pending, otherwise it reports 0.</td>
Do you know if this applies only to a single rocksdb instance, or across all instances? By default rocksdb has a single shared Environment (basically a thread pool) for compactions so it seems like it could go either way. But it's ok if you don't know, I wouldn't waste hours trying to understand the rocksdb code trying to figure it out 😉
No, I do not know. I assumed that it applies to only a single RocksDB instance. I am wondering how RocksDB can share a thread pool between state stores that are started independently of each other.
I think it has to do with the Env class which specifies the thread pool. By default it uses a static Env, so the Env -- and also the underlying threads -- are shared between all stores within the process. Something like that
Co-authored-by: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
LGTM
Merged to trunk and cherrypicked to 2.7
Document the new properties-based metrics for RocksDB Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Looks like this broke the build, see #9935. I can see a number of test failures in the last Jenkins job; let's please verify the tests before merging.
@ijuma I'm confused, I don't see any failures for the test that this broke (RocksDBMetricsTest) in the build on the last commit. I saw some test failures, but all unrelated. Am I looking in the wrong place?