KAFKA-16811 Sliding window approach to calculate non-zero punctuate-ratio metric #16162
base: trunk
I have completed the implementation using the SlidingWindow approach, with x = 30 seconds for testing. Here are the changes: #16162. I followed these steps to test the changes, but I still see the punctuate-ratio as zero for all instances of the example Demo class.
5. Open JConsole and watch the metrics.

I see that all the metrics are being calculated. When I run the debugger, I see that `tasks.activeTasks()` in `TaskExecutor.java` is an empty list. Because of that, the punctuated value becomes zero, and hence so does the punctuate ratio.

Is there a way to make the active tasks list non-empty, so that I can test the changes and write some unit tests? Is this behaviour normal in a local environment?
.../src/main/java/org/apache/kafka/streams/processor/internals/PunctuateRatioSlidingWindow.java
streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java
Following up from Jira. With 3 input topic partitions, there should be 3 tasks, one for each partition. But the application must first join the consumer group, and the rebalance must finish before any tasks are created. You might want to wait until the app goes into …

However, this condition should never be true (https://github.com/apache/kafka/pull/16162/files#diff-76f629d0df8bd30b2593cbcf4a2dc80de3167ebf55ef8b5558e6e6285a057496R1028) without a punctuator, and thus the metric should stay at zero. In the end, shouldn't we have some unit tests anyway?

The build is failing right now due to a missing license header in the newly added files. You need to fix this. Did you run the tests locally? I would hope we already have some test coverage, and I would expect those tests to need updating, too.
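```java
public double getAverageRatio() {
    // Unweighted mean of the per-sample ratios currently held in the window
    return ratioQueue.stream()
            .mapToDouble(RatioTimeStamp::getRatio)
            .average()
            .orElse(0.0); // assumed fallback for an empty window; the review excerpt ends at average()
}
```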
It seems we are computing an average of ratios. Is this mathematically sound? I believe not.
When we record the punctuate ratio, the "time frame" over which the ratio is computed is not guaranteed to be of a fixed size. Thus, wouldn't the individual ratios need to be weighted differently for a correct computation?
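To make the weighting concern concrete, here is a small worked example with made-up numbers (not measurements): three loop iterations lasting 10 ms, 10 ms, and 100 ms, where only the last one spends 90 ms punctuating. The unweighted mean of the per-iteration ratios is (0 + 0 + 0.9) / 3 = 0.3, while the share of loop time actually spent punctuating is 90 / 120 = 0.75. The class and method names are hypothetical and for illustration only.

```java
/**
 * Hypothetical illustration (not Kafka code): compares the unweighted mean of
 * per-iteration punctuate/runOnce ratios with the ratio of the summed latencies.
 */
public class RatioAverageDemo {

    // Unweighted mean: every iteration counts equally, regardless of its duration.
    static double averageOfRatios(long[][] samples) {
        double sum = 0.0;
        for (long[] s : samples) {
            sum += (double) s[0] / s[1];
        }
        return sum / samples.length;
    }

    // Ratio of sums: each iteration is implicitly weighted by its runOnce latency.
    static double ratioOfSums(long[][] samples) {
        long punctuate = 0L;
        long runOnce = 0L;
        for (long[] s : samples) {
            punctuate += s[0];
            runOnce += s[1];
        }
        return (double) punctuate / runOnce;
    }

    public static void main(String[] args) {
        // Made-up {punctuateLatencyMs, runOnceLatencyMs} pairs for three iterations.
        long[][] samples = { {0, 10}, {0, 10}, {90, 100} };
        System.out.println("average of ratios = " + averageOfRatios(samples)); // (0 + 0 + 0.9) / 3
        System.out.println("ratio of sums     = " + ratioOfSums(samples));     // 90 / 120
    }
}
```

Only the ratio of sums answers "what fraction of the loop's time went to punctuation"; the unweighted mean over-counts short iterations.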
It seems that, instead of keeping a window of ratio samples, we should rather keep the raw latency and runtime values, i.e., a window of `totalPunctuateLatency` and `runOnceLatency` pairs, together with pre-computed `sumTotalPunctuateLatency` and `sumRunOnceLatency`, and compute the result as `sumTotalPunctuateLatency / sumRunOnceLatency`?
When updating this queue, we can also update both running sums by adding new and removing old values.
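A minimal sketch of that suggestion, assuming a time-based window (30 seconds, as in the PR description), millisecond latencies, and non-decreasing timestamps. The class `PunctuateLatencyWindow` and its methods are hypothetical; this is not the PR's `PunctuateRatioSlidingWindow` and not an existing Kafka API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: keeps raw (timestamp, punctuateLatency, runOnceLatency)
 * samples for a fixed time window and maintains both running sums incrementally.
 */
public class PunctuateLatencyWindow {

    // One raw sample per runOnce() iteration.
    private static final class Sample {
        final long timestampMs;
        final long punctuateLatencyMs;
        final long runOnceLatencyMs;

        Sample(long timestampMs, long punctuateLatencyMs, long runOnceLatencyMs) {
            this.timestampMs = timestampMs;
            this.punctuateLatencyMs = punctuateLatencyMs;
            this.runOnceLatencyMs = runOnceLatencyMs;
        }
    }

    private final long windowMs;
    private final Deque<Sample> samples = new ArrayDeque<>();
    private long sumTotalPunctuateLatency = 0L;
    private long sumRunOnceLatency = 0L;

    public PunctuateLatencyWindow(long windowMs) {
        this.windowMs = windowMs;
    }

    // Adds a sample and updates both running sums; timestamps are assumed non-decreasing.
    public void record(long nowMs, long punctuateLatencyMs, long runOnceLatencyMs) {
        samples.addLast(new Sample(nowMs, punctuateLatencyMs, runOnceLatencyMs));
        sumTotalPunctuateLatency += punctuateLatencyMs;
        sumRunOnceLatency += runOnceLatencyMs;
        evictExpired(nowMs);
    }

    // Removes samples that fell out of the window and subtracts them from the sums.
    private void evictExpired(long nowMs) {
        while (!samples.isEmpty() && samples.peekFirst().timestampMs <= nowMs - windowMs) {
            Sample expired = samples.pollFirst();
            sumTotalPunctuateLatency -= expired.punctuateLatencyMs;
            sumRunOnceLatency -= expired.runOnceLatencyMs;
        }
    }

    // sumTotalPunctuateLatency / sumRunOnceLatency over the window; 0.0 if the window is empty.
    public double ratio(long nowMs) {
        evictExpired(nowMs);
        return sumRunOnceLatency == 0L ? 0.0 : (double) sumTotalPunctuateLatency / sumRunOnceLatency;
    }
}
```

Evicting on read as well as on write means a gauge backed by `ratio(now)` decays to 0 after a quiet period instead of reporting stale data. Each sample is added and removed exactly once, so updates are amortized O(1).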
Do we also need to update/refine the description of the metric in …
I agree with @mjsax that this would need a KIP. KIP-444 states that this metric is a dynamic gauge. I was also wondering whether it would be possible to use a …
@mjsax @cadonna Thanks for the insights. I will make the necessary changes and would love to write a KIP for this change. I recently requested a Confluence account for another KIP, and it was approved, so this will be one of my first two KIPs. If there is anything you would like to share that would help me, please do. Thank you!
This pull request changes how the `punctuate-ratio` metric is calculated. The current implementation calculates the metric after the last record of the poll loop. Right after a punctuate, the value is close to 1, but there is little chance that the metric is sampled at that moment, so its value is almost always 0. The updated implementation calculates the metric value over a window of the last 30 seconds.
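A toy simulation of why a per-iteration value is almost always read as 0, using made-up numbers: 1,000 loop iterations of 10 ms each, where every 100th iteration spends 9 ms punctuating. It assumes the metric is read at a uniformly random moment and that all iterations have equal length, so the chance of a non-zero read is simply the share of iterations that punctuated. Names are hypothetical.

```java
/**
 * Toy simulation with made-up numbers (not Kafka code): contrasts a gauge that
 * only reflects the last iteration with a ratio computed over a whole window.
 */
public class GaugeSamplingDemo {

    static final int ITERATIONS = 1_000;
    static final int PUNCTUATE_EVERY = 100;
    static final long RUN_ONCE_MS = 10L;
    static final long PUNCTUATE_MS = 9L;

    static boolean punctuates(int iteration) {
        return iteration % PUNCTUATE_EVERY == 0;
    }

    // Share of iterations after which a "last iteration only" gauge reads non-zero.
    static double fractionOfNonZeroReads() {
        int nonZero = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            if (punctuates(i)) {
                nonZero++;
            }
        }
        return (double) nonZero / ITERATIONS;
    }

    // Ratio over the whole simulated window: summed punctuate time / summed loop time.
    static double windowRatio() {
        long punctuateMs = 0L;
        long runOnceMs = 0L;
        for (int i = 0; i < ITERATIONS; i++) {
            punctuateMs += punctuates(i) ? PUNCTUATE_MS : 0L;
            runOnceMs += RUN_ONCE_MS;
        }
        return (double) punctuateMs / runOnceMs;
    }

    public static void main(String[] args) {
        System.out.println("non-zero gauge reads: " + fractionOfNonZeroReads()); // 10 / 1000
        System.out.println("windowed ratio:       " + windowRatio());            // 90 / 10000
    }
}
```

With these numbers only about 1% of reads of the per-iteration value would be non-zero, whereas the windowed value (90 / 10000) is non-zero on every read as long as at least one punctuation falls inside the window.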