Add Resource Group Overview tables to the Monitor Home page #6240
DomGarguilo wants to merge 4 commits into apache:main
private final Set<String> resourceGroups = ConcurrentHashMap.newKeySet();
private final Set<ServerId> problemHosts = ConcurrentHashMap.newKeySet();
private final Set<ServerId> metricProblemHosts = ConcurrentHashMap.newKeySet();
I added a new collection here. Servers that didn't respond to the metrics poll are now collected into the new metricProblemHosts set, and the size of that set is what the table uses for the "Not Responding" count. I made this separation because problemHosts could contain hosts that are still responding but have other problems.
Yeah, it's possible that a Compactor doesn't respond to the call to get the Metrics, but does respond to the call to get the currently running compaction. Seems odd, but could happen.
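To make the separation concrete, here is a minimal sketch of the two-set idea. `HostId`, `ProblemHostTracker`, and the method names are hypothetical stand-ins and not the PR's actual classes; the sketch also assumes that a metrics failure is recorded in both sets, which the PR may or may not do.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Accumulo's ServerId; only the fields this sketch needs.
record HostId(String type, String resourceGroup, String hostPort) {}

class ProblemHostTracker {
  // Hosts with any kind of problem; they may still answer some calls.
  private final Set<HostId> problemHosts = ConcurrentHashMap.newKeySet();
  // Hosts that specifically failed the metrics poll.
  private final Set<HostId> metricProblemHosts = ConcurrentHashMap.newKeySet();

  // Assumption: a metrics failure also counts as a general problem.
  void recordMetricsFailure(HostId host) {
    metricProblemHosts.add(host);
    problemHosts.add(host);
  }

  // Any other problem, e.g. a failed call that is not the metrics poll.
  void recordOtherProblem(HostId host) {
    problemHosts.add(host);
  }

  // "Not Responding" is derived only from the metrics-specific set.
  long notRespondingCount(String type, String resourceGroup) {
    return metricProblemHosts.stream()
        .filter(h -> h.type().equals(type) && h.resourceGroup().equals(resourceGroup))
        .count();
  }

  int problemCount() {
    return problemHosts.size();
  }
}
```

With this shape, a Compactor that fails some other call but still answers the metrics poll shows up in `problemHosts` without inflating the "Not Responding" column.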
/**
* If manager is down, tserver status will be ERROR. Add a banner to indicate
* Show a page banner that matches the tablet server status shown in the navbar.
The changes in this file are unrelated, but I aligned the status LED with the presence of the banner that explains what the error/warning is on the tserver page.
For the table in the UI, I wonder if there is a way to save a bunch of horizontal space. Some thoughts:
+ " contains the total, responding, and not responding server counts.")
public DeploymentOverview getDeploymentOverview() {
var summary = monitor.getInformationFetcher().getSummaryForEndpoint();
return DeploymentOverview.fromSummary(summary.getDeploymentOverview(), summary.getTimestamp());
In #6235 I removed the DTO object in favor of returning the objects that are used in the SystemInformation class. This reduces the object creation that is performed for each user that hits this endpoint. I would suggest doing the same here. The deployment map in SystemInformation isn't really used for anything except this endpoint. Can we change its definition there to be more UI friendly, so that we can just return it here with no translation?
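A rough sketch of what "return it with no translation" could look like. All names here (`ServerCounts`, `SummarySketch`, `EndpointSketch`) are hypothetical and are not the types in SystemInformation; the assumption is that the summary is built once per refresh and that its deployment map is already in a shape the UI can consume.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, UI-friendly counts that could be serialized as-is.
record ServerCounts(int total, int responding, int notResponding) {}

// Hypothetical stand-in for the summary object built once per metrics refresh.
class SummarySketch {
  // resource group -> server type -> counts
  private final Map<String, Map<String, ServerCounts>> deployment;

  SummarySketch(Map<String, Map<String, ServerCounts>> deployment) {
    // Copy once at build time so the shared view is safe to hand out repeatedly.
    Map<String, Map<String, ServerCounts>> copy = new TreeMap<>();
    deployment.forEach((rg, byType) -> copy.put(rg, Map.copyOf(byType)));
    this.deployment = Collections.unmodifiableMap(copy);
  }

  Map<String, Map<String, ServerCounts>> getDeploymentOverview() {
    return deployment;
  }
}

// Hypothetical endpoint: no per-request DTO, just the shared read-only map.
class EndpointSketch {
  private final SummarySketch summary;

  EndpointSketch(SummarySketch summary) {
    this.summary = summary;
  }

  Map<String, Map<String, ServerCounts>> getDeploymentOverview() {
    return summary.getDeploymentOverview();
  }
}
```

Every request then receives the same read-only instance, so the only allocation happens when the summary is rebuilt.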
My personal preference would be to invert the server type and resource groups on this page. So that each group is a server type like
I am fine with this suggestion. @dlmarion or anyone else have any objections or other ideas here?
You are suggesting that a user may use two resource groups in the cluster.yaml file with the same name and not think of them as the same? For example, the
They will share the same resource group configuration in ZooKeeper, right?
In some situations it may make sense to display the servers grouped by RG, if the user names and configures them appropriately. However, there are situations where it does not make sense to group by RG first on the monitor. One is that the manager is in the default group, but it manages servers in all RGs for all server types. So it seems misleading to show the manager with a set of tservers that also happen to be in the default RG, as if it is only going to interact with those. The same is true for the GC: it is going to GC files for tables assigned to any RG, but the GC is in the default RG.
Another situation where I think it's misleading to group on RG first is with complex graphs of RGs. For example, for efficiency I might need scan server group A for a certain query, scan server group B for another query, and scan server group C for all other queries. Then I also need tserver group A and compactor groups A (small compactions) and B (large compactions). This is a complex configuration that serves different query and ingest needs. Grouping on RG first would show three slices of this larger picture, and those three slices are not really useful in conveying anything about the larger graph.
So grouping by RG first may mislead in multiple situations and completely misrepresent what is actually happening. In some situations it may be correct in that it matches intention. Since it's not always going to be the correct thing to do, I'm thinking it's not the best thing to do. In some situations grouping by RG first will be benign and just add a lot of noise to the page that adds no value. If grouping by server type first, the page layout will be more stable: manager would always be the first group, followed by tablet servers second, etc.
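The "stable layout" point can be illustrated with a small sketch. `ServerKind`, `ServerEntry`, and `TypeFirstGrouping` are hypothetical names, and the enum order is an assumed display order, not something taken from the Monitor code. Grouping into an `EnumMap` keyed by server type keeps the outer order fixed regardless of which resource groups exist.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical server types; declaration order is the assumed display order.
enum ServerKind { MANAGER, TABLET_SERVER, SCAN_SERVER, COMPACTOR, GARBAGE_COLLECTOR }

// Hypothetical minimal description of one running server.
record ServerEntry(ServerKind kind, String resourceGroup) {}

class TypeFirstGrouping {
  // server type -> resource group -> number of servers
  static Map<ServerKind, Map<String, Integer>> group(List<ServerEntry> servers) {
    // EnumMap iterates in enum declaration order, so MANAGER always comes first.
    Map<ServerKind, Map<String, Integer>> out = new EnumMap<>(ServerKind.class);
    for (ServerEntry s : servers) {
      out.computeIfAbsent(s.kind(), k -> new TreeMap<>())
          .merge(s.resourceGroup(), 1, Integer::sum);
    }
    return out;
  }
}
```

In the complex example above, scan server groups A, B, and C would all appear together under one `SCAN_SERVER` entry instead of being split across three per-RG slices.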
Could also have a single table with a first column of server type and a second column of resource group for the overview. Then one can sort by server type or resource group. This could be nice for a situation where, say, 7 related resource groups have the same prefix, like
After some discussion it seems like a two table approach might be good here: One static table displaying the count per Server Type
And a second with the resource group as a column with filters to easily group by resource group
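As a sketch of the data behind the two tables: one flat row list can feed both the static per-type totals and the filterable per-RG view. `OverviewRow` and `TwoTableSketch` are hypothetical names; deriving "not responding" as total minus responding, and filtering by prefix, are assumptions made for illustration (in the Monitor the filtering would more likely happen client-side in the table widget).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical flat row: one per (server type, resource group) pair.
record OverviewRow(String serverType, String resourceGroup, int total, int responding) {
  // Assumption: every server is either responding or not responding.
  int notResponding() {
    return total - responding;
  }
}

class TwoTableSketch {
  // Table 1: static totals per server type, summed across all resource groups.
  static Map<String, Integer> totalsByType(List<OverviewRow> rows) {
    return rows.stream().collect(Collectors.groupingBy(
        OverviewRow::serverType, TreeMap::new, Collectors.summingInt(OverviewRow::total)));
  }

  // Table 2: rows whose resource group starts with the prefix, sorted by group then type.
  static List<OverviewRow> filterByGroupPrefix(List<OverviewRow> rows, String prefix) {
    return rows.stream()
        .filter(r -> r.resourceGroup().startsWith(prefix))
        .sorted(Comparator.comparing(OverviewRow::resourceGroup)
            .thenComparing(OverviewRow::serverType))
        .collect(Collectors.toList());
  }
}
```

Keeping the rows flat means related resource groups that share a prefix can be pulled together with one filter, while the per-type table stays the same size no matter how many groups exist.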
Fixes #6187
Adds a new deployment overview section to the Monitor Overview page. A table per resource group is now rendered to the page showing `Total`, `Responding` and `Not Responding` counts of servers per server type.
Here is an example of these new tables on the overview page with 3 resource groups:
