Added new metric to report real time missing top state for partition #2344
Just a small nit: since we have an ask, you can create an issue with all the details and then, in the Issues section of the PR description, say 'Fixes #'.
helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java
for (Long missingTopStateStartTime : resourcePartitionEntry.getValue().values()) {
  if (_missingTopStateDurationThreshold < Long.MAX_VALUE && System.currentTimeMillis() - missingTopStateStartTime > _missingTopStateDurationThreshold) {
You could use Java's parallel lambda (parallel stream) feature here.
That's a good suggestion, but even if we parallelize this loop, all of those threads would still be updating the same gauge value, so the updates would be sequential anyway, right?
Yes, but it will still save sequential computation time, since this update is synchronized in the regular Helix pipelines.
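To make the suggestion concrete, here is a minimal sketch of the same threshold check written with a parallel stream. The class and method names (`MissingTopStateCounter`, `countOverThreshold`) are hypothetical and not part of the PR; the map shape and the `Long.MAX_VALUE` guard follow the snippet quoted above. The filtering runs in parallel and the result is reduced to a single count, so the shared gauge would only need one update.

```java
import java.util.Map;

// Hypothetical helper, not part of the PR: counts partitions whose top state
// has been missing for longer than the configured threshold.
final class MissingTopStateCounter {

  // missingTopStateResourceMap: resourceName -> (partitionName -> start time in ms)
  static long countOverThreshold(Map<String, Map<String, Long>> missingTopStateResourceMap,
      long thresholdMs, long nowMs) {
    // Mirrors the guard in the quoted loop: Long.MAX_VALUE means "no threshold set".
    if (thresholdMs == Long.MAX_VALUE) {
      return 0L;
    }
    return missingTopStateResourceMap.values().parallelStream()
        // Flatten every resource's partitions into one stream of start times.
        .flatMap(partitions -> partitions.values().stream())
        // Keep only partitions that have been missing longer than the threshold.
        .filter(startMs -> nowMs - startMs > thresholdMs)
        .count();
  }
}
```

Whether this is actually faster depends on the map size; for a handful of partitions the plain loop is likely just as good.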
Thanks for the PR. I have several general questions:
- Is there a reason why we need a long-running thread for AsyncMissingTopStateMonitor? Can we couple the update of ResourceMonitor with the update of updateMissingTopStateResourceMap in ClusterStatusMonitor, instead of checking the map periodically?
- Can we make this metric optional, so that it can be turned on and off by config?
/**
 * Missing top state resource map: resourceName-><PartitionName->startTimeOfMissingTopState>
 */
private final Map<String, Map<String, Long>> _missingTopStateResourceMap = new ConcurrentHashMap<>();
Can this line and the next line be part of the ClusterStatusMonitor constructor? Just declare the variable type here, and do the instantiation in the constructor.
That's a good point! The reason for defining it (allocating memory) in the main process is that the main process (ClusterStatusMonitor) is the one updating this map, while the async thread only reads from it.
 * This thread will keep on iterating over resources and report missingTopStateDuration for them until there are at
 * least one resource with missing top state for a cluster.
 */
_asyncMissingTopStateMonitor.start();
Do we need to join this thread eventually?
Good question! I think it definitely has to be stopped/interrupted by ClusterStatusMonitor when it cleans up all state. But the main process (ClusterStatusMonitor) does not have to wait until it has finished, because it is an async long-running thread and metrics reporting is a never-ending job, so its lifecycle can be tied to the main process. Let me know if that makes sense.
Sorry, I did not follow. When ClusterStatusMonitor terminates, we need to join this thread, don't we?
I checked the flow, and the async thread should be tied to the ClusterStatusMonitor lifecycle. Currently, all beans are registered in the active() method and unregistered or reset in the reset() method. So whenever the Helix controller activates the cluster status monitor, it should start this async thread, and whenever the Helix controller stops/resets the cluster status monitor, it should stop this async thread. To reiterate: after the cluster status monitor is reset(), the caller has to activate it again first, which ensures that this async thread is started again.
Does this mean after reset() we should kill this metrics reporting thread?
Yup, that's what is done here. In reset() we unregister everything, clear the map, and then stop/interrupt the thread. The async thread will be started again in the active() call.
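The lifecycle described in this thread can be sketched roughly as follows. This is only an illustration under assumptions: `MonitorLifecycleSketch` and `isRunning()` are hypothetical, `active()` and `reset()` borrow the method names mentioned above, and the worker body is a placeholder rather than the PR's reporting loop. It interrupts the worker in `reset()` and then joins it with a bounded wait, which addresses the "do we need to join" question without blocking the caller indefinitely.

```java
// Hypothetical sketch of tying a reporter thread to active()/reset().
final class MonitorLifecycleSketch {
  private Thread worker;

  // Start the reporter thread if it is not already running.
  synchronized void active() {
    if (worker != null && worker.isAlive()) {
      return;
    }
    worker = new Thread(() -> {
      // Placeholder loop: a real reporter would iterate over the missing top state map here.
      while (!Thread.currentThread().isInterrupted()) {
        try {
          Thread.sleep(10);
        } catch (InterruptedException e) {
          // Restore the interrupt flag so the loop condition sees it and exits.
          Thread.currentThread().interrupt();
        }
      }
    }, "missing-top-state-reporter");
    worker.setDaemon(true);
    worker.start();
  }

  // Stop the reporter thread and wait briefly for it to exit.
  synchronized void reset() {
    if (worker == null) {
      return;
    }
    worker.interrupt();
    try {
      worker.join(1000); // bounded wait, so reset() cannot hang forever
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    worker = null;
  }

  synchronized boolean isRunning() {
    return worker != null && worker.isAlive();
  }
}
```

Calling `active()` again after `reset()` creates a fresh thread, since a `Thread` object cannot be restarted once it has terminated.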
TFTR @mgao0 please find answers inline!
A general question here. To me it seems the new metric uses some arbitrary non-zero number to indicate that at least one partition has no top state (and 0 means all partitions are good). I feel like we could have a more logically meaningful number.
--> The reason of decoupling metric reporting with existing main process of clusterstatusmonitor is to report metric irrespective of any event is being handled or not. Please correct me if I am wrong. In your implementation, I think the update happens in "TopStateHandoffReportStage", which is triggered by an event (update on current stage etc.). We didn't quite decouple these two...?
@mgao0 synced up offline. Thanks a lot for your inputs. I think you are correct that it won't make sense if we just have one counter with an increasing value.
Thanks @rahulrane50 for the update. To add more details, the conclusion is that if we only want the count of partitions with missing top state, we don't need an async thread; we can just couple it with ClusterStatusMonitor. But if we want a real-time measurement of how long the top state has been missing, then it makes sense to use an async thread. Thanks for changing from reporting the count to measuring the duration, and from a gauge to a histogram, which shows the distribution of missing top state duration across partitions. I think it makes sense. I'll take another look at your updated PR.
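As a rough illustration of the count-versus-duration distinction, the sketch below derives per-resource durations from the start-time map; these are the values a histogram would be fed. `MissingTopStateDurations` and `byResource` are hypothetical names, and the actual histogram type used by the PR is not shown in this thread, so the sketch stops at producing the raw durations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns missing-top-state start times into durations per resource.
final class MissingTopStateDurations {

  // startTimes: resourceName -> (partitionName -> start time in ms)
  static Map<String, List<Long>> byResource(Map<String, Map<String, Long>> startTimes, long nowMs) {
    Map<String, List<Long>> result = new TreeMap<>();
    for (Map.Entry<String, Map<String, Long>> resource : startTimes.entrySet()) {
      List<Long> durations = new ArrayList<>();
      for (long startMs : resource.getValue().values()) {
        // Clamp at zero in case the start time is slightly ahead of "now".
        durations.add(Math.max(0L, nowMs - startMs));
      }
      Collections.sort(durations); // sorted only to make the output deterministic
      result.put(resource.getKey(), durations);
    }
    return result;
  }
}
```

A plain count would simply be the size of each list, which is why the count alone does not need a separate thread: it only changes when the map changes.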
@@ -85,6 +85,9 @@ private void updateTopStateStatus(ResourceControllerDataProvider cache,
    if (cache.getClusterConfig() != null) {
      durationThreshold = cache.getClusterConfig().getMissTopStateDurationThreshold();
    }
    if (clusterStatusMonitor != null) {
      clusterStatusMonitor.updateMissingTopStateDurationThreshold(durationThreshold);
If durationThreshold is Long.MAX_VALUE, our monitoring will never finish; the thread will keep spinning without doing anything.
That's correct: if the threshold is not set, then it's a no-op thread. But in that case most of the metrics related to top state are not reported either, so I'm assuming that this config is commonly set.
I am not sure we can assume that. Normally we have a default value, and the user can set -1 if they want to disable this.
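One way to read this suggestion is a small guard that decides up front whether reporting is enabled, so the thread never needs to spin on a threshold that can never be exceeded. The sketch below is an assumption, not Helix's actual config semantics: `ThresholdPolicy` is a hypothetical class, the negative-value convention comes from the comment above, and treating `Long.MAX_VALUE` as "unset" follows the guard in the loop quoted earlier in this thread.

```java
// Hypothetical policy helper; the conventions here are assumptions for illustration.
final class ThresholdPolicy {

  // Reporting is disabled when the threshold is negative (e.g. -1) or left at Long.MAX_VALUE.
  static boolean isReportingEnabled(long thresholdMs) {
    return thresholdMs >= 0 && thresholdMs != Long.MAX_VALUE;
  }

  // True when a partition has been missing its top state for longer than the threshold.
  static boolean shouldReport(long startMs, long nowMs, long thresholdMs) {
    return isReportingEnabled(thresholdMs) && nowMs - startMs > thresholdMs;
  }
}
```

With a check like this, the monitor could skip starting the reporter thread entirely while reporting is disabled, instead of starting a thread that does nothing.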
  }
}
sleep(50); // Instead of providing a stream of duration values to the histogram, the thread can sleep in between to save some CPU cycles.
I am still concerned about this number being hard-coded here; that would not be a good thing. Could you explain why adding this number helps?
Had an offline discussion with @junkaixue. After giving it some thought, I came up with the solution below; please let me know if it looks okay. Summary:
- The original sleep here was meant to save a few computational cycles in this tight for loop, but it's difficult to justify a correct sleep duration.
- In the new solution, the thread reports a metric for a partition only if it has not already been reported within the last sliding window reset interval. This has a benefit over sleeping: a sleep stops the thread from reporting metrics for all resources and all partitions, which may be wrong if a resource gets new partitions with missing top state during the sleep. Ideally, once the thread has reported a duration at least once for a partition, it can skip that partition until its sliding window has finished.
@desaikomal hence I didn't add a sleep here for the sliding window reset time, but used that value to determine whether the duration should be reported for that partition.
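The per-partition throttling described above could look roughly like the sketch below. It is a simplified illustration under assumptions: `SlidingWindowReporter`, `shouldReport`, and `clear` are hypothetical names, the window length is passed in rather than read from the histogram's reset interval, and the class is not thread-safe as written.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: report a partition at most once per sliding window interval.
final class SlidingWindowReporter {
  private final long windowMs;
  // partition key -> time (ms) at which a duration was last reported for it
  private final Map<String, Long> lastReportedAt = new HashMap<>();

  SlidingWindowReporter(long windowMs) {
    this.windowMs = windowMs;
  }

  // Returns true, and records the report time, if the partition has not been
  // reported within the current window; returns false otherwise.
  boolean shouldReport(String partition, long nowMs) {
    Long last = lastReportedAt.get(partition);
    if (last != null && nowMs - last < windowMs) {
      return false;
    }
    lastReportedAt.put(partition, nowMs);
    return true;
  }

  // Forget a partition once its top state has recovered.
  void clear(String partition) {
    lastReportedAt.remove(partition);
  }
}
```

Unlike a global `sleep`, a partition that starts missing its top state mid-window is still reported on the next pass, because the check is keyed per partition.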
…thin sliding window time.
@@ -55,6 +55,81 @@
import org.slf4j.LoggerFactory;

public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
  private class AsyncMissingTopStateMonitor extends Thread {
This will only be meaningful if our pipeline is very fast. Otherwise it will not help, because the cached state is refreshed once per pipeline run. I don't believe we need to build this thread; we can just rely on the pipeline call as we did before.
But we need to remove the constraint of doing only final reporting.
We can discuss it tomorrow face to face.
After an internal sync-up, it makes sense not to have this real-time metric. Helix will emit a metric as part of the current default pipeline (which is triggered on any Helix event) to report the number of partitions whose top state has been missing beyond the configured threshold. I will close this PR and create a new one to address this.
Issues
Fixes Add new metric to report missing top state for partition #2345 : Adds a new metric to report real time missing top state for any partition for each resource in a cluster.
Description
Issue :
Currently, if the WAGED rebalancer is used, there could be a situation where multiple leader replicas reside on the same node. If that node goes down for maintenance or for any other reason, there is currently no metric that reports the missing top state until a rebalancer event is triggered.
Ask :
Ask here is to add a metric which satisfies following conditions :
Solution :
The solution proposed in this PR starts an async thread as soon as the first partition of any resource has its top state missing. The thread continuously reports the missing top state duration (at least once per sliding window interval) for partitions at the resource level, as long as that resource has at least one partition with missing top state. If no resource has any partition with missing top state, the async thread sleeps.
The metric reported by this thread is a histogram, which won't give exact values but will approximate how the duration is increasing, and it is reset when all top state replicas for that resource have recovered.
Tests
CI pipeline and added new tests for verifying metrics.
CI is failing with helix-rest tests. Verified locally that tests are running
Failing test results :
Verified locally :
Changes that Break Backward Compatibility (Optional)
Documentation (Optional)
Commits
Code Quality