clusterloader2 - Empty scheduler latency metrics on cluster with multiple masters #2505
Labels
kind/bug
Categorizes issue or PR as related to a bug.
lifecycle/rotten
Denotes an issue or PR that has aged beyond stale and will be auto-closed.
What happened:
At the end of measuring scheduler metrics in a multi-master cluster, I got empty results for scheduler latency.
What you expected to happen:
I expected to get meaningful scheduler latency metrics from this measurement.
How to reproduce it (as minimally and precisely as possible):
To reproduce, restart the active scheduler (in my case, I changed the scheduler manifest) so that the scheduler on another node becomes the active one. clusterloader2 will then pick the wrong node to scrape metrics from, and the scheduler latency values will be empty.
Anything else we need to know?:
I investigated this issue and found that the bug is related to how masterName is chosen. When clusterloader2 chooses masterName, it takes the first ControlPlane node from the node list and returns its name, but the active scheduler may be running on a different ControlPlane node.
My suggestion is to use Lease resources to find the node on which the active scheduler is running.
If possible, I'm ready to implement a fix for this bug.
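To illustrate the Lease-based idea, here is a rough sketch rather than the actual clusterloader2 code. It assumes the scheduler's leader election uses a Lease named kube-scheduler in the kube-system namespace, and that spec.holderIdentity has the form "<hostname>_<uuid>". The function names active_scheduler_host and pick_master are hypothetical, and the sketch also assumes that the hostname matches the node name, which may not hold on every cluster.

```python
import json
from typing import List, Optional


def active_scheduler_host(lease_json: str) -> Optional[str]:
    """Return the host part of spec.holderIdentity from a Lease object.

    Assumption: lease_json is the JSON form of the kube-scheduler Lease
    (for example, what `kubectl get lease kube-scheduler -n kube-system
    -o json` would print), and holderIdentity looks like "<hostname>_<uuid>".
    """
    lease = json.loads(lease_json)
    holder = lease.get("spec", {}).get("holderIdentity")
    if not holder:
        return None
    # Split on the last underscore: the UUID suffix contains only hyphens.
    host, sep, _ = holder.rpartition("_")
    return host if sep else holder


def pick_master(control_plane_nodes: List[str], lease_json: str) -> Optional[str]:
    """Prefer the node holding the scheduler lease; otherwise fall back to
    the first control-plane node (the behavior described in this issue)."""
    host = active_scheduler_host(lease_json)
    if host in control_plane_nodes:
        return host
    return control_plane_nodes[0] if control_plane_nodes else None
```

With a lease whose holderIdentity starts with "master-2_", pick_master(["master-1", "master-2"], lease_json) would select master-2 instead of always taking the first node in the list. The fallback keeps the current behavior when the lease is missing or its holder is not among the known control-plane nodes.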
Environment:
kubectl version:
cat /etc/os-release:
uname -a: