
KAFKA-8580: Compute RocksDB metrics #7263

Merged
merged 7 commits into from Sep 24, 2019

Conversation

cadonna
Contributor

@cadonna cadonna commented Aug 28, 2019

A metric recorder runs in its own thread and regularly records RocksDB metrics from
RocksDB's statistics. For segmented state stores the metrics are aggregated over the
segments.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@@ -209,7 +210,8 @@ private void setUpMetrics(final ProcessorContext context, final Map<String, Obje
(StreamsMetricsImpl) context.metrics(),
context.taskId()
);
removeStatisticsFromMetricsRecorder = true;
metricsRecorder.startRecording(Duration.ofMinutes(10));
Contributor Author

Here, I set RocksDB's standard interval for its own regular statistics dump. If anybody has a better idea, please speak up. In the future, we could provide a user configuration to set this.

@@ -461,19 +461,13 @@ private void checkRocksDBMetricsByTag(final String tag) {
checkMetricByName(listMetricStore, MEMTABLE_BYTES_FLUSHED_RATE, 1);
checkMetricByName(listMetricStore, MEMTABLE_BYTES_FLUSHED_TOTAL, 1);
checkMetricByName(listMetricStore, MEMTABLE_HIT_RATIO, 1);
checkMetricByName(listMetricStore, MEMTABLE_FLUSH_TIME_AVG, 1);
checkMetricByName(listMetricStore, MEMTABLE_FLUSH_TIME_MIN, 1);
checkMetricByName(listMetricStore, MEMTABLE_FLUSH_TIME_MAX, 1);
Contributor Author

Unfortunately, I could not compute all metrics proposed in KIP-471 because some require functionality of RocksDB 6+ and Streams currently uses 5.18.

checkMetricByName(listMetricStore, WRITE_STALL_DURATION_AVG, 1);
checkMetricByName(listMetricStore, WRITE_STALL_DURATION_TOTAL, 1);
checkMetricByName(listMetricStore, BLOCK_CACHE_DATA_HIT_RATIO, 1);
checkMetricByName(listMetricStore, BLOCK_CACHE_INDEX_HIT_RATIO, 1);
checkMetricByName(listMetricStore, BLOCK_CACHE_FILTER_HIT_RATIO, 1);
checkMetricByName(listMetricStore, BYTES_READ_DURING_COMPACTION_RATE, 1);
checkMetricByName(listMetricStore, BYTES_WRITTEN_DURING_COMPACTION_RATE, 1);
checkMetricByName(listMetricStore, COMPACTION_TIME_AVG, 1);
checkMetricByName(listMetricStore, COMPACTION_TIME_MIN, 1);
checkMetricByName(listMetricStore, COMPACTION_TIME_MAX, 1);
Contributor Author

Same as above

@cadonna
Contributor Author

cadonna commented Aug 28, 2019

@cadonna
Contributor Author

cadonna commented Aug 29, 2019

For JDK 11/Scala 2.12 the following tests failed:

kafka.api.PlaintextConsumerTest.testLowMaxFetchSizeForRequestAndPartition
kafka.api.SaslGssapiSslEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

The other two builds exceeded the timeout.

Retest this, please

@cadonna
Contributor Author

cadonna commented Aug 30, 2019

In JDK8/Scala 2.11 the following tests failed:

kafka.network.DynamicConnectionQuotaTest.testDynamicConnectionQuota

@cadonna
Contributor Author

cadonna commented Aug 30, 2019

Retest this, please

Contributor

@guozhangwang guozhangwang left a comment

@cadonna thanks for your PR, I have a meta comment plus some minor ones on the non-testing code.

ping @mjsax @vvcephei for another look.

this.recordingInterval = recordingInterval;
if (!recordingAlreadyStarted) {
final Thread thread = new Thread(this::recordLoop);
thread.setName(storeName.replace(" ", "-") + "-RocksDB-metrics-recorder");
Contributor

Why do we want to replace spaces in the thread name?

Contributor Author

Overthought it. Will remove the replacement.

@@ -471,7 +473,8 @@ private void closeOpenIterators() {
private void closeOrUpdateMetricsRecorder() {
if (closeMetricsRecorder) {
metricsRecorder.close();
} else if (removeStatisticsFromMetricsRecorder) {
metricsRecorderIsRunning = false;
Contributor

The logic of closeMetricsRecorder and metricsRecorderIsRunning is a bit awkward here, since the logic of RocksDBMetricsRecorder#removeStatistics and #close is actually the same except for stopRecording. How about simplifying it to the following:

  1. Just have a hard-coded recording interval inside RocksDBMetricsRecorder instead of letting the caller pass in the value. Also maintain a reference to the background thread, but do not start it at construction time.

  2. In addStatistics, if the map is no longer empty afterwards (in practice it should be), start the thread if it has not been started yet.

  3. In removeStatistics, if the map becomes empty afterwards, stop the thread.

Then in the caller, we only need addStatistics / removeStatistics and do not need the flags for closing any more, and the logic of this function can also be simplified.

For segmented stores, we need to make sure that if the last segment was indeed removed, the store would not try to create a new one, which means that at least two segments are maintained at the same time, which I think can be guaranteed programmatically. To be safe, we can also maintain a flag or an enum state for the recorder, created -> (addStatistics) -> started -> (removeStatistics) -> stopped, and throw an exception if addStatistics is called after the recorder has transitioned to stopped.
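For illustration, a minimal, self-contained sketch of the suggested lifecycle (class, field, and method names here are hypothetical and simplified; this is not the actual PR code):

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import org.rocksdb.Statistics;

// Hypothetical sketch: start the recording thread when the first Statistics is
// added, stop it when the last one is removed, and fail fast if statistics are
// added after the recorder has been stopped.
public class MetricsRecorderLifecycleSketch {

    private enum State { CREATED, STARTED, STOPPED }

    // hard-coded recording interval, as suggested above (the value is illustrative)
    private static final Duration RECORDING_INTERVAL = Duration.ofMinutes(10);

    private final Map<String, Statistics> statisticsToRecord = new HashMap<>();
    private State state = State.CREATED;
    private Thread recordingThread;

    public synchronized void addStatistics(final String segmentName, final Statistics statistics) {
        if (state == State.STOPPED) {
            throw new IllegalStateException("Statistics added after the recorder was stopped");
        }
        statisticsToRecord.put(segmentName, statistics);
        if (state == State.CREATED) {
            recordingThread = new Thread(this::recordLoop);
            recordingThread.start();
            state = State.STARTED;
        }
    }

    public synchronized void removeStatistics(final String segmentName) {
        statisticsToRecord.remove(segmentName);
        if (state == State.STARTED && statisticsToRecord.isEmpty()) {
            recordingThread.interrupt();
            state = State.STOPPED;
        }
    }

    private void recordLoop() {
        // sleep for RECORDING_INTERVAL between recordings until interrupted;
        // the actual reading of the statistics is omitted in this sketch
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(RECORDING_INTERVAL.toMillis());
            } catch (final InterruptedException e) {
                return;
            }
        }
    }
}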

Contributor Author

That is really a good point. I changed the design of the recorder. Now it starts a thread when the first Statistics object is added, stops the thread when the last Statistics object is removed, and starts a new thread when a Statistics object is re-added. This was necessary since a RocksDB instance may be opened and closed a couple of times during its life cycle due to restoration.

log.debug("Removed Statistics for store segment {}", storeName);
}

public void startRecording(final Duration recordingInterval) {
Contributor

If startRecording is called multiple times with different recording intervals, we would change the sleeping interval. Is that intentional? Or should we just stick with the first passed-in value?

compactionTimeAvgSensor = RocksDBMetrics.compactionTimeAvgSensor(streamsMetrics, metricContext);
compactionTimeMinSensor = RocksDBMetrics.compactionTimeMinSensor(streamsMetrics, metricContext);
compactionTimeMaxSensor = RocksDBMetrics.compactionTimeMaxSensor(streamsMetrics, metricContext);
bytesReadDuringCompactionSensor = RocksDBMetrics.bytesReadDuringCompactionSensor(streamsMetrics, metricContext);
Contributor

Some metrics are removed here, is that intentional?

Contributor Author

Yes, it was intentional. Unfortunately, some metrics require functionality of RocksDB 6+ and Streams currently uses 5.18.

Contributor

So there are metrics we would like to add but can't until we upgrade RocksDB? Can we create a 3.0 blocker ticket to add them back in when we bump rocks (and/or maybe a separate ticket to consider a major version bump of rocks with the next major version bump of Kafka)?

Contributor Author

@ableegoldman I created ticket https://issues.apache.org/jira/browse/KAFKA-8897
I did not specify it as a blocker for 3.0 because a major version increase of Streams is not required for a major version increase of RocksDB. If I am missing something here, please feel free to comment on the ticket and change it to a blocker.

for (final Statistics statistics : statisticsToRecord.values()) {
statistics.close();
}
statisticsToRecord.clear();
log.debug("Closed", storeName);
Contributor

Should we check that the thread has eventually joined, say when the Streams instance is closed? With a 10-minute interval we should not block on waiting, but we should also make sure we are not accidentally leaking threads.

Contributor Author

When the last Statistics object is removed, the recording thread is interrupted and stopped, and the removing thread waits for it to die. That should ensure no leak. Please review that code with special care.

@bbejeck bbejeck added the streams label Sep 3, 2019
@cadonna cadonna force-pushed the AK8580-Compute_RocksDB_metrics branch from a024d01 to 62c57ee Compare September 5, 2019 21:24

bytesReadFromDatabase += statistics.getTickerCount(TickerType.BYTES_READ);
memtableBytesFlushed += statistics.getTickerCount(TickerType.FLUSH_WRITE_BYTES);
memtableHits += statistics.getAndResetTickerCount(TickerType.MEMTABLE_HIT);
memtableMisses += statistics.getAndResetTickerCount(TickerType.MEMTABLE_MISS);
Contributor Author

At the moment, hit ratios consist only of the current value measured in the recording interval. Should we change this to AVG, MIN, and MAX?
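For context, a sketch of how such a ratio could be derived from the accumulated ticker counts of one recording interval (the sensor name memtableHitRatioSensor and the lookup variable are assumed here for illustration):

// Sketch: memtable hit ratio over one recording interval, aggregated across all
// segments of the store; names other than memtableHits/memtableMisses are assumed.
final long memtableLookups = memtableHits + memtableMisses;
final double memtableHitRatio = memtableLookups == 0 ? 0d : (double) memtableHits / memtableLookups;
memtableHitRatioSensor.record(memtableHitRatio);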

Contributor

@guozhangwang guozhangwang left a comment

Made another pass, all are nit comments. @vvcephei could you take a look at this PR also?

private final String metricsScope;
private final String storeName;

private enum State { NEW, RUNNING, NOT_RUNNING, ERROR, MANUAL }
Contributor

nit: maybe we can just merge NEW into NOT_RUNNING? I.e. the initialized state is just NOT_RUNNING.

return state == State.RUNNING;
}

public boolean error() {
Contributor

Is this function only used for testing?

}

// visible for testing
RocksDBMetricsRecorder(final String metricsScope, final String storeName, final boolean startRecordingThread) {
Contributor

nit: the last parameter seems to always be true in non-testing code. I'm wondering if we can add a couple of test-only functions (e.g. the following error() seems test-only as well), like a setManual which would force the state to be MANUAL instead of setting it in the constructor.

Then we can remove the startRecordingThread flag, and in tests, after we've created the recorder object, call setManual so that later add/removeStatistics calls would not start or stop recording.

if (state == State.RUNNING && statisticsToRecord.isEmpty()) {
state = State.NOT_RUNNING;
thread.interrupt();
waitForThreadToDie();
Contributor

One caveat is that when we are closing the Kafka Streams instance with a specified timeout value, this function may violate that timeout and wait for a longer time, since we call thread.join() without a timeout value.

Contributor

Just to clarify: add/removeStatistics do not need to be thread-safe since they would only be called from a single thread at a given time, right?

Contributor

@vvcephei vvcephei left a comment

Hey @cadonna ,

Thanks for the PR!

I'm curious about the threading model. Is there an advantage to having one thread per store? It seems like we could also just have one thread per Streams instance and have it iterate over all the RocksDBs. I'm not sure, but it seems like the management might be a little simpler, not to mention avoiding all those extra threads for an amount of work that is really quite small.

WDYT?
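For comparison, a rough, self-contained sketch of that alternative: one scheduled task per Streams instance that iterates over all registered per-store recorders. All names here are illustrative and do not claim to match the PR's final API:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of a single per-instance recording trigger.
public class RecordingTriggerSketch implements Runnable {

    // minimal stand-in for the per-store metrics recorder
    public interface Recorder {
        String name();
        void record();  // read the RocksDB statistics and update the sensors
    }

    private final Map<String, Recorder> recorders = new ConcurrentHashMap<>();

    public void addRecorder(final Recorder recorder) {
        recorders.put(recorder.name(), recorder);
    }

    public void removeRecorder(final Recorder recorder) {
        recorders.remove(recorder.name());
    }

    @Override
    public void run() {
        // one pass over all registered stores, executed on the single scheduled thread
        recorders.values().forEach(Recorder::record);
    }

    // schedule the trigger once per Streams instance
    public static ScheduledExecutorService schedule(final RecordingTriggerSketch trigger) {
        final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(trigger, 0, 1, TimeUnit.MINUTES);
        return executor;
    }
}

The PR eventually moved to this model (a RocksDBMetricsRecordingTrigger scheduled by the Streams instance; see the scheduling code further down and the commit message near the end), although the details of the final implementation differ from this sketch.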

}
boolean wait = true;
log.debug("Wait for recording thread to die");
while (wait) {
Contributor

This seems suspicious...

  • Shouldn't all blocking operations have timeouts?
  • Should we be swallowing and ignoring InterruptedExceptions?

It seems like a recipe for Streams to hang forever un-killably. But I feel like I'm missing something.
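For comparison, a common pattern that addresses both points by bounding the wait and restoring the interrupt status instead of swallowing it. This is a method-level sketch assuming an slf4j-style log field; it is not the code in this PR:

// Sketch: bounded wait for the recording thread; not the code in this PR.
private void waitForRecordingThread(final Thread recordingThread, final Duration timeout) {
    try {
        recordingThread.join(timeout.toMillis());
        if (recordingThread.isAlive()) {
            log.warn("Recording thread did not terminate within {}", timeout);
        }
    } catch (final InterruptedException e) {
        // restore the interrupt status instead of swallowing the exception
        Thread.currentThread().interrupt();
    }
}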

for (final Statistics statistics : statisticsToRecord.values()) {
statistics.close();
bytesWrittenToDatabase += statistics.getTickerCount(TickerType.BYTES_WRITTEN);
Contributor

some of these don't reset the ticker, but they all use +=. Is this right?

Contributor Author

Oh, thanks! I totally overlooked that. Now all ticker counts are reset after each recording except for open files and file errors. Open files and file errors are recorded as values, i.e., no aggregation is applied to them on Kafka's metrics side. All other metrics apply some aggregation, so we have to record only the delta at each recording.
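Condensed, the two patterns described above look roughly like this (simplified from the description; the variable names follow the snippets shown elsewhere in this PR):

// aggregated metrics: read and reset the ticker, so only the delta since the
// last recording is added
bytesWrittenToDatabase += statistics.getAndResetTickerCount(TickerType.BYTES_WRITTEN);
memtableBytesFlushed += statistics.getAndResetTickerCount(TickerType.FLUSH_WRITE_BYTES);

// value-style metrics (open files, file errors): read without resetting and
// record the plain number
numberOfOpenFiles += statistics.getTickerCount(TickerType.NO_FILE_OPENS)
    - statistics.getTickerCount(TickerType.NO_FILE_CLOSES);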

@cadonna cadonna force-pushed the AK8580-Compute_RocksDB_metrics branch from 62c57ee to 4987afa Compare September 23, 2019 13:33
Contributor

@vvcephei vvcephei left a comment

Hey @cadonna , this LGTM, mod a few minor thoughts you can take or leave.

Thanks!
-John

rocksDBMetricsRecordingTriggerThread.scheduleAtFixedRate(
rocksDBMetricsRecordingTrigger,
Duration.ZERO.toMinutes(),
Duration.ofMinutes(1).toMinutes(),
Contributor

(nit): not sure if this is any clearer than 1 (and 0) above.

Contributor Author

I stored the values for delay and interval in variables named recordingDelay and recordingInterval, respectively.
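So the scheduling call reads roughly as follows (a sketch; the time unit argument and the surrounding class are assumed and may differ from the PR):

// sketch with named scheduling values
final long recordingDelay = Duration.ZERO.toMinutes();
final long recordingInterval = Duration.ofMinutes(1).toMinutes();
rocksDBMetricsRecordingTriggerThread.scheduleAtFixedRate(
    rocksDBMetricsRecordingTrigger,
    recordingDelay,
    recordingInterval,
    TimeUnit.MINUTES
);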

private void maybeRemoveStatisticsFromMetricsRecorder() {
if (isRecordingLevelDebug) {
metricsRecorder.removeStatistics(name);
isRecordingLevelDebug = false;
Contributor

This seems a little mis-named, since the recording level doesn't actually change when we remove the stats. Maybe something like isStatsRegistered?

On the other hand, would it be possible to just idempotently "remove" the stats even if they weren't registered? I.e., just no-op in that case?

Contributor Author

It would be possible to idempotently remove the stats, and I also thought about it. However, I have IllegalStateExceptions in place in RocksDBMetricsRecorder and RocksDBMetricsRecordingTrigger that ensure that the contract is satisfied to only remove stats that were added before. I thought that would improve the correctness of the code.

Contributor

I think it should be safe to call remove here multiple times, the important thing is just to make sure it's actually at least once

Contributor

But @cadonna , will this remove the metrics whenever any store on the instance is closed? That seems to make sense with the current eager rebalancing, but with KIP-429 we will only close the ones that are migrated to another consumer.

Contributor

We need to be able to remove just a few specific metrics without disrupting the others, while also making sure to actually close/cleanup during an actual shutdown

Contributor

After closer look, I don't think there's a conflict between 429 and how the metrics are currently closed 😄

Contributor Author

Metrics of a store within a task are removed from the streams metrics whenever the store of this task is closed. The metrics of the stores that are still open remain. That should be fine with KIP-429.

this.streamsMetrics = streamsMetrics;
isInitialized = true;
}
if (this.taskId != taskId) {
Contributor

I like this parity check. 👍

Contributor Author

Thank you!

bytesWrittenDuringCompaction += statistics.getAndResetTickerCount(TickerType.COMPACT_WRITE_BYTES);
bytesReadDuringCompaction += statistics.getAndResetTickerCount(TickerType.COMPACT_READ_BYTES);
numberOfOpenFiles += statistics.getTickerCount(TickerType.NO_FILE_OPENS)
- statistics.getTickerCount(TickerType.NO_FILE_CLOSES);
Contributor

Hopefully, there's no danger of these counters overflowing, but regardless, I think the math would still work out if we did reset them here.

Contributor Author

What if RocksDB opened 10 files and closed 0 files during a recording interval? The metric would record 10. Then the counters are reset to 0. In the next recording interval, 0 files are opened and 10 are closed. The metric would record -10. However, the metric is defined as a Value(), i.e., the value is recorded as a plain number. Your proposal would work if the metric were defined as a CumulativeSum(). I will try to do that.

Contributor

Ah, you're right. I was thinking that the numberOfOpenFiles variable was a field (i.e., persistent).

Contributor Author

Metrics are now reset after each read and accumulated with CumulativeSum().

Contributor

Thanks!

Contributor

@guozhangwang guozhangwang left a comment

Made another pass; only nits are left, and I think after that it's good to merge.

@@ -380,7 +379,7 @@ public static Sensor numberOfOpenFilesSensor(final StreamsMetricsImpl streamsMet
public static Sensor numberOfFileErrorsSensor(final StreamsMetricsImpl streamsMetrics,
final RocksDBMetricContext metricContext) {
final Sensor sensor = createSensor(streamsMetrics, metricContext, NUMBER_OF_FILE_ERRORS);
addSumMetricToSensor(
addValueMetricToSensor(
Contributor

Ah good catch.

Contributor

Actually, should the previous sensors like bytesReadDuringCompactionSensor not be an addRateOfSumMetricToSensor metric rather than a rate? My understanding is that the values read from RocksDB are already sums, so we probably only care about avg / max?

Contributor Author

The statistics ticker that we read from RocksDB is a monotonically increasing sum of bytes that we reset after each read. I chose a rate because it should give an impression about the IO that is produced by compactions.


public RocksDBMetricsRecorder(final String metricsScope, final String storeName) {
this.metricsScope = metricsScope;
this.storeName = storeName;
final LogContext logContext = new LogContext(String.format("[RocksDB Metrics Recorder for %s] ", storeName));
Contributor

It's a bit confusing to have two storeName variables: the first one is actually the logical storeName (which would be the same as the physical store name if it is a kv-store) and the second is the physical storeName (e.g. for a segmented store, its name is storeName-segmentID).

Maybe rename the second to rocksDBName?

Contributor Author

Renaming is a good idea. I renamed the physical storeName to segmentName.

@Test
public void shouldSetStatsLevelToExceptDetailedTimers() {
public void shouldSetStatsLevelToExceptDetailedTimersWhenStatisticsIsAdded() {
Contributor

Just wanted to say I really like the way this unit test is written! With the right usage of mocks we would avoid having any time-dependent flakiness.

Contributor Author

Thank you

statisticsToRecord.put(storeName, statistics);
}


Contributor

nit: extra linebreak?

Contributor Author

Ack

@@ -665,6 +671,7 @@ private KafkaStreams(final InternalTopologyBuilder internalTopologyBuilder,
reporters.add(new JmxReporter(JMX_PREFIX));
metrics = new Metrics(metricConfig, reporters, time);


Contributor

nit: extra linebreak?

Contributor Author

Ack

A metric recorder runs in its own thread and regularly records RocksDB metrics from
RocksDB's statistics. For segmented state stores the metrics are aggregated over the
segments.
Changes the threading model from one recording thread per logical
RocksDB instance to one recording thread per Kafka Streams instance.
@cadonna cadonna force-pushed the AK8580-Compute_RocksDB_metrics branch from 4987afa to 262b663 Compare September 23, 2019 22:15
Contributor

@vvcephei vvcephei left a comment

Thanks again, @cadonna !

@guozhangwang guozhangwang merged commit ad3b843 into apache:trunk Sep 24, 2019
@guozhangwang
Contributor

Merged to trunk, thanks @cadonna ! Please make sure we have another PR for docs change as well :)

ijuma added a commit to confluentinc/kafka that referenced this pull request Sep 29, 2019
Conflicts:
* .gitignore: addition of clients/src/generated-test was near
local additions for support-metrics.
* checkstyle/suppressions.xml: upstream refactoring of exclusions
for generator were near the local changes for support-metrics.
* gradle.properties: scala version bump caused a minor conflict
due to the kafka version change locally.
* gradle/dependencies.gradle: bcpkix version bump was near avro
additions in the local version.

* apache-github/trunk: (49 commits)
  KAFKA-8471: Replace control requests/responses with automated protocol (apache#7353)
  MINOR: Don't generate unnecessary strings for debug logging in FetchSessionHandler (apache#7394)
  MINOR:fixed typo and removed outdated varilable name (apache#7402)
  KAFKA-8934: Create version file during build for Streams (apache#7397)
  KAFKA-8319: Make KafkaStreamsTest a non-integration test class (apache#7382)
  KAFKA-6883: Add toUpperCase support to sasl.kerberos.principal.to.local rule (KIP-309)
  KAFKA-8907; Return topic configs in CreateTopics response (KIP-525) (apache#7380)
  MINOR: Address review comments for KIP-504 authorizer changes (apache#7379)
  MINOR: add versioning to request and response headers (apache#7372)
  KAFKA-7273: Extend Connect Converter to support headers (apache#6362)
  MINOR: improve the Kafka RPC code generator (apache#7340)
  MINOR: Improve the org.apache.kafka.common.protocol code (apache#7344)
  KAFKA-8880: Docs on upgrade-guide (apache#7385)
  KAFKA-8179: do not suspend standby tasks during rebalance (apache#7321)
  KAFKA-8580: Compute RocksDB metrics (apache#7263)
  KAFKA-8880: Add overloaded function of Consumer.committed (apache#7304)
  HOTFIX: fix Kafka Streams upgrade note for broker backward compatibility (apache#7363)
  KAFKA-8848; Update system tests to use new AclAuthorizer (apache#7374)
  MINOR: remove unnecessary null check (apache#7299)
  KAFKA-6958: Overload methods for group and windowed stream to allow to name operation name using the new Named class (apache#6413)
  ...
@mjsax mjsax added the kip Requires or implements a KIP label Jun 12, 2020