Add Resource Group Overview tables to the Monitor Home page by DomGarguilo · Pull Request #6240 · apache/accumulo

DomGarguilo · 2026-03-24T15:55:55Z

Adds a new deployment overview section to the Monitor Overview page. A table per resource group is now rendered to the page showing Total, Responding and Not Responding counts of servers per server type.

Here is an example of these new tables on the overview page with 3 resource groups:

DomGarguilo · 2026-03-24T15:59:14Z

server/monitor/src/main/java/org/apache/accumulo/monitor/next/SystemInformation.java


  private final Set<String> resourceGroups = ConcurrentHashMap.newKeySet();
  private final Set<ServerId> problemHosts = ConcurrentHashMap.newKeySet();
+  private final Set<ServerId> metricProblemHosts = ConcurrentHashMap.newKeySet();


I added a new collection here. Servers that didn't respond to the metrics poll are now collected into this new metricProblemHosts set, and its size is what the table uses for the "Not Responding" count. I made this separation since problemHosts could contain hosts that are still responding but just have other problems.
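
A minimal sketch of the separation described above, assuming plain strings in place of ServerId. The class and method names (MetricProblemSketch, recordOtherProblem, recordMetricFailure, notRespondingCount, respondingCount) are hypothetical and are not taken from the PR; whether a metrics failure is also recorded in problemHosts is not shown here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: hosts are plain strings standing in for ServerId.
class MetricProblemSketch {
  // Hosts with some kind of problem; they may still be responding.
  private final Set<String> problemHosts = ConcurrentHashMap.newKeySet();
  // Hosts that did not answer the metrics poll.
  private final Set<String> metricProblemHosts = ConcurrentHashMap.newKeySet();

  void recordOtherProblem(String host) {
    problemHosts.add(host);
  }

  void recordMetricFailure(String host) {
    metricProblemHosts.add(host);
  }

  // "Not Responding" is driven only by the metrics-poll failures.
  int notRespondingCount() {
    return metricProblemHosts.size();
  }

  // "Responding" is whatever remains of the known total.
  int respondingCount(int total) {
    return total - metricProblemHosts.size();
  }
}
```

With four known servers, one metrics failure, and one unrelated problem, this sketch would report 3 responding and 1 not responding.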

DomGarguilo · 2026-03-24T16:03:01Z

server/monitor/src/main/resources/org/apache/accumulo/monitor/resources/js/tservers.js


 /**
- * If manager is down, tserver status will be ERROR. Add a banner to indicate
+ * Show a page banner that matches the tablet server status shown in the navbar.


The changes in this file are unrelated but I aligned the status LED with the existence of the banner that explains what the error/warning is on the tserver page.

dlmarion · 2026-03-24T17:12:07Z

For the table in the UI, I wonder if there is a way to save a bunch of horizontal space. Some thoughts:

1. The total is the sum of responding + not responding, so maybe we don't need to show it?
2. I wonder if we can use visual cues and a tooltip to show the responding and not responding counts in one column. For example, the format of the column value could be "X/Y" where X is the number of responding servers in green and Y is the number of not responding servers in red. The tooltip could just say "responding / not responding".
3. With the changes in 1 & 2, maybe it's possible to show multiple tables horizontally, and centered in case there is only one table?
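
A rough sketch of the combined column value suggested above, assuming the counts are available as two integers. The class name, method names, and tooltip constant are hypothetical; the coloring would be handled by the page's styling and is not shown.

```java
// Hypothetical sketch of the "X/Y" cell: X responding, Y not responding.
class CountCellSketch {
  // Tooltip text as proposed in the comment above.
  static final String TOOLTIP = "responding / not responding";

  static String formatCell(int responding, int notResponding) {
    return responding + "/" + notResponding;
  }

  // The total can be derived, which is why it need not be its own column.
  static int total(int responding, int notResponding) {
    return responding + notResponding;
  }
}
```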

dlmarion · 2026-03-24T17:18:48Z

server/monitor/src/main/java/org/apache/accumulo/monitor/next/SystemInformation.java


  private final Set<String> resourceGroups = ConcurrentHashMap.newKeySet();
  private final Set<ServerId> problemHosts = ConcurrentHashMap.newKeySet();
+  private final Set<ServerId> metricProblemHosts = ConcurrentHashMap.newKeySet();


Yeah, it's possible that a Compactor doesn't respond to the call to get the Metrics, but does respond to the call to get the currently running compaction. Seems odd, but could happen.

dlmarion · 2026-03-24T17:27:06Z

server/monitor/src/main/java/org/apache/accumulo/monitor/next/Endpoints.java

+      + " contains the total, responding, and not responding server counts.")
+  public DeploymentOverview getDeploymentOverview() {
+    var summary = monitor.getInformationFetcher().getSummaryForEndpoint();
+    return DeploymentOverview.fromSummary(summary.getDeploymentOverview(), summary.getTimestamp());


In #6235 I removed the DTO object in favor of returning the objects that are used in the SystemInformation class. This reduces the object creation that is performed for each user that hits this endpoint. I would suggest doing the same here. The deployment map in SystemInformation isn't really used for anything except this endpoint. Can we change its definition there to be more UI-friendly, such that we can just return it here with no translation?
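
One way this suggestion could look, as a sketch only: the snapshot is built once into an immutable, UI-friendly shape and the same instance is handed to every caller. The names DirectReturnSketch, Counts, and finalizeSnapshot are hypothetical and do not reflect the actual SystemInformation or Endpoints code.

```java
import java.util.Map;

// Hypothetical sketch: build the overview once, return it without translation.
class DirectReturnSketch {
  // Immutable value type, so sharing one instance across requests is safe.
  record Counts(int total, int responding, int notResponding) {}

  // Replaced as a whole each time a new snapshot is finalized.
  private volatile Map<String, Counts> deployment = Map.of();

  // Called once per refresh cycle, not once per request.
  void finalizeSnapshot(Map<String, Counts> collected) {
    deployment = Map.copyOf(collected);
  }

  // Every caller gets the same prebuilt map; no per-request objects are created.
  Map<String, Counts> getDeploymentOverview() {
    return deployment;
  }
}
```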

added in ca6af6b

keith-turner · 2026-03-24T18:42:10Z

My personal preference would be to invert the server type and resource groups on this page, so that each group is a server type like Tablet Server and the first column in each table is resource group. The reason for this preference is that there is nothing built into Accumulo that will do anything special with a compactor group and tserver group that have the same name. How resource groups are used is all driven by user config, and that config is specialized for each server type. So to me, the monitor view grouping different server types by resource group implies there is a relationship that may not exist, and it feels misleading to me.

DomGarguilo · 2026-03-25T16:58:53Z

My personal preference would be to invert the server type and resource groups on this page, so that each group is a server type like Tablet Server and the first column in each table is resource group. The reason for this preference is that there is nothing built into Accumulo that will do anything special with a compactor group and tserver group that have the same name. How resource groups are used is all driven by user config, and that config is specialized for each server type. So to me, the monitor view grouping different server types by resource group implies there is a relationship that may not exist, and it feels misleading to me.

I am fine with this suggestion. @dlmarion or anyone else have any objections or other ideas here?

dlmarion · 2026-03-25T17:16:00Z

My personal preference would be to invert the server type and resource groups on this page. So that each group is a server type like Tablet server and the first column in each table is resource group. The reason for this preference is that there is nothing built into Accumulo that will do anything special with a compactor group and tserver group that have the same name.

You are suggesting that a user may use two resource groups in the cluster.yaml file with the same name and not think of them as the same? For example, the test resource groups in the example below might represent two distinct sets of resources?

tserver:
  default:
    servers_per_host: 2
    hosts:
      - localhost
  test:
    servers_per_host: 1
    hosts:
      - localhost

compactor:
  default:
    servers_per_host: 2
    hosts:
      - localhost
  test:
    servers_per_host: 1
    hosts:
      - localhost

The reason for this preference is that there is nothing built into Accumulo that will do anything special with a compactor group and tserver group that have the same name.

They will share the same resource group configuration in ZooKeeper, right?

keith-turner · 2026-03-25T18:52:00Z

You are suggesting that a user may use two resource groups in the cluster.yaml file with the same name and not think of them as the same?

For some situations it may make sense to display the servers grouped by RG, if the user names and configures them appropriately.

However, there are situations where it does not make sense to group by RG first on the monitor. One situation is that the manager is in the default group, but it manages servers in all RGs for all the server types. So it seems misleading to show the manager with a set of tservers that also happen to be in the default RG, as if it's only going to interact with those. The same is true for the GC: it's going to GC files for tables assigned to any RG, but the GC is in the default RG.

Another situation where I think it's misleading to group on RG first is with complex graphs of RGs. For example, suppose that for efficiency I need scan server group A for a certain query, scan server group B for another query, and scan server group C for all other queries. Then I also need tserver group A, and compactor groups A (small compactions) and B (large compactions). This is a complex configuration that serves different query and ingest needs. Grouping on RG first would show three slices of this larger picture. Those three slices are not really useful in conveying anything about the larger graph.

So when grouping by RG first, it may mislead in multiple situations and completely misrepresent what is actually happening. In some situations it may be correct in that it matches intention. Since it's not always going to be the correct thing to do, I'm thinking it's not the best thing to do.

In some situations grouping by RG first will be benign and just add a lot of noise to the page that adds no value. If grouping by server type first, then the page layout will be more stable. For example, manager would always be the first group, followed by tablet servers second, etc.

keith-turner · 2026-03-26T15:38:57Z

Could also have a single table with a first column of server type and a second column of resource group for the overview. Then can sort by server type or resource group. This could be nice for a situation where like 7 related resource groups have the same prefix, like testabc_.*, they could all be sorted together in the view.
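
A small sketch of the sortable single-table idea, assuming each row carries a server type and a resource group. The class, record, and method names are hypothetical, and a real UI would more likely sort on the client side; this only illustrates that plain lexicographic ordering keeps groups with a shared prefix together.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: one table, sortable by either of its first two columns.
class SortableOverviewSketch {
  record Row(String serverType, String resourceGroup) {}

  // Sort by resource group, breaking ties by server type.
  static List<Row> sortByResourceGroup(List<Row> rows) {
    List<Row> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.comparing(Row::resourceGroup).thenComparing(Row::serverType));
    return sorted;
  }

  // Sort by server type, breaking ties by resource group.
  static List<Row> sortByServerType(List<Row> rows) {
    List<Row> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.comparing(Row::serverType).thenComparing(Row::resourceGroup));
    return sorted;
  }
}
```

Sorting by resource group places names such as testabc_1 and testabc_2 next to each other regardless of server type.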

DomGarguilo · 2026-03-26T18:38:04Z

After some discussion, it seems like a two-table approach might be good here:

One static table displaying the count per Server Type

Server Type	Responding / Total
Manager	1 / 1
Garbage Collector	1 / 1
Tablet Server	3 / 4
Scan Server	4 / 4
Compactor	5 / 6

And a second with the resource group as a column with filters to easily group by resource group

Resource Group	Server Type	Responding / Total
default	Tablet Server	2 / 3
group1	Tablet Server	1 / 1
default	Scan Server	3 / 3
group1	Scan Server	1 / 1
default	Compactor	4 / 5
group2	Compactor	1 / 1
default	Manager	1 / 1
default	Garbage Collector	1 / 1
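
As a sketch of how the two tables relate, the per-server-type table can be derived by summing the per-resource-group rows. The names OverviewRollupSketch, Row, Counts, and byServerType are hypothetical and not part of the PR; the row values in the usage note are the ones from the example tables above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: roll detail rows up into one row per server type.
class OverviewRollupSketch {
  record Row(String resourceGroup, String serverType, int responding, int total) {}

  record Counts(int responding, int total) {
    // Renders the "Responding / Total" cell.
    String cell() {
      return responding + " / " + total;
    }
  }

  static Map<String, Counts> byServerType(List<Row> rows) {
    // LinkedHashMap keeps server types in first-seen order.
    Map<String, Counts> out = new LinkedHashMap<>();
    for (Row r : rows) {
      out.merge(r.serverType(), new Counts(r.responding(), r.total()),
          (a, b) -> new Counts(a.responding() + b.responding(), a.total() + b.total()));
    }
    return out;
  }
}
```

Fed the Tablet Server rows above (default 2 / 3 and group1 1 / 1), the rollup yields 3 / 4, matching the first table.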

Add Resource Group Overview tables to the Monitor Home page

2eab4a0

DomGarguilo added this to the 4.0.0 milestone Mar 24, 2026

DomGarguilo self-assigned this Mar 24, 2026

Merge remote-tracking branch 'upstream/main' into monitorOverview

3f9d16d

build deployment view on finalize

ca6af6b

Merge remote-tracking branch 'upstream/main' into monitorOverview

0fed858

3 participants