Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-9924: Add docs for RocksDB properties-based metrics #9895

Merged
merged 4 commits into from
Jan 19, 2021

Conversation

cadonna
Copy link
Contributor

@cadonna cadonna commented Jan 14, 2021

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@cadonna
Copy link
Contributor Author

cadonna commented Jan 14, 2021

Call for review: @guozhangwang @ableegoldman @lct45

Copy link
Contributor

@lct45 lct45 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pr @cadonna ! Just left a few nits

each metric reports an aggregation over the RocksDB instances of the state store.
RocksDB Metrics are grouped into statistics-based metrics and properties-based metrics.
The former are recorded from statistics that a RocksDB state store collects whereas the latter are recorded from
properties that RocksDB exposes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a little confused about the difference between the two after reading this. Maybe an example of something that is exposed by RocksDB but not collected by RocksDB would help

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I elaborated on the two types of metrics. Let me know if it is better now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, I also didn't understand the difference at first

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I see now, thanks for the additional info! LGTM

docs/ops.html Outdated Show resolved Hide resolved
docs/ops.html Outdated Show resolved Hide resolved
docs/ops.html Outdated Show resolved Hide resolved
docs/ops.html Outdated Show resolved Hide resolved
each metric reports an aggregation over the RocksDB instances of the state store.
RocksDB Metrics are grouped into statistics-based metrics and properties-based metrics.
The former are recorded from statistics that a RocksDB state store collects whereas the latter are recorded from
properties that RocksDB exposes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, I also didn't understand the difference at first

docs/ops.html Outdated Show resolved Hide resolved
docs/ops.html Outdated Show resolved Hide resolved
If a state store consists of multiple RocksDB instances, as is the case for aggregations over time and session windows,
each metric reports the sum over all the RocksDB instances of the state store, except for the block cache metrics
<code>block-cache-*</code>. The block cache metrics report the sum over all RocksDB instances if each instance uses its
own block cache, and they report the recorded value from only one instance if a single block cache is shared
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically I think it's possible for users to share a single block cache among only some stores, but not others. Or they could have two block caches shared across different state stores, etc. Would we detect this case and report the correct block cache for a given state store?
(Idk how common that pattern could possibly be, but this made me wonder)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Currently, we treat this mixed pattern as an illegal state and throw an IllegalStateException. Probably not the best way to handle it. Allowing such a mixed pattern complicates the measurement of the cache metrics. I opened KAFKA-12223 to document this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

</tr>
<tr>
<td>compaction-pending</td>
<td>This metric reports 1 if at least one compaction is pending, otherwise it reports 0.</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if this applies only to a single rocksdb instance, or across all instances? By default rocksdb has a single shared Environment (basically a thread pool) for compactions so it seems like it could go either way. But it's ok if you don't know, I wouldn't waste hours trying to understand the rocksdb code trying to figure it out 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I do not know. I supposed that it only applies to one single RocksDB instance. I am wondering how RocksDB can share a thread pool between state stores that are started independently from each other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it has to do with the Env class which specifies the thread pool. By default it uses a static Env, so the Env -- and also the underlying threads -- are shared between all stores within the process. Something like that

Co-authored-by: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Copy link
Contributor

@ableegoldman ableegoldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ableegoldman ableegoldman merged commit add160d into apache:trunk Jan 19, 2021
@ableegoldman
Copy link
Contributor

Merged to trunk and cherrypicked to 2.7

ableegoldman pushed a commit that referenced this pull request Jan 19, 2021
Document the new properties-based metrics for RocksDB

Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
@ijuma
Copy link
Contributor

ijuma commented Jan 20, 2021

Looks like this broke the build, see #9935.

I can see a number of test failures in the last Jenkins job, let's please verify the tests before merging.

@ableegoldman
Copy link
Contributor

@ijuma I'm confused, I don't see any failures for the test that this broke (RocksDBMetricsTest) in the build on the last commit. I saw some test failures, but all unrelated. Am I looking in the wrong place?
(I followed the link on the "Apply suggestions from code review" commit build --> https://github.com/apache/kafka/runs/1722608276 )

@ijuma
Copy link
Contributor

ijuma commented Jan 20, 2021

Jenkins is the best place to look:

Screenshot from 2021-01-20 10-36-13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants