Add service status as command option to admin command #4567

Merged: 6 commits, May 24, 2024

Conversation

EdColeman
Contributor

Adds an admin command, serviceStatus, that prints service counts and hosts to stdout, with resource groups when appropriate.

To run the command:

> accumulo admin serviceStatus [--noHosts] [--json]

The --noHosts option prints a summary with service counts only.
The --json option prints unformatted JSON output (the --noHosts option is ignored).

Sample output:

Summary (service counts only)
> accumulo admin serviceStatus --noHosts
Report time: 2024-05-16T20:05:45.493Z
ZooKeeper read errors: 0
Managers: count: 1
Monitors: count: 1
Garbage Collectors: count: 1
Tablet Servers: count: 2
Scan Servers: count: 4
Coordinators: count: 1
Compactors: count: 4
Full status
> accumulo admin serviceStatus

Report time: 2024-05-16T20:04:09.556Z
ZooKeeper read errors: 0
Managers: count: 1
  localhost:9999
Monitors: count: 1
  localhost:9995
Garbage Collectors: count: 1
  localhost:9998
Tablet Servers: count: 2
  localhost:10000
  localhost:9997
Scan Servers: count: 4
  resource groups:
    default
    sg1
  hosts (by group):
    default: localhost:10003
    default: localhost:9996
    sg1: localhost:10004
    sg1: localhost:10005
Coordinators: count: 1
  localhost:9132
Compactors: count: 4
  resource groups:
    q1
    q2
  hosts (by group):
    q1: localhost:9133
    q1: localhost:9134
    q2: localhost:9135
    q2: localhost:9136

Full status formatted as json
> accumulo admin serviceStatus --json | jq   (normally the output is unformatted)

{
  "reportTime": "2024-05-16T20:01:39.593Z",
  "zkReadErrors": 0,
  "noHosts": false,
  "summaryMap": {
    "COMPACTOR": {
      "reportKey": "COMPACTOR",
      "resourceGroups": [
        "q1",
        "q2"
      ],
      "serviceNames": [
        "q1: localhost:9133",
        "q1: localhost:9134",
        "q2: localhost:9135",
        "q2: localhost:9136"
      ],
      "serviceCount": 4,
      "errorCount": 0
    },
    "COORDINATOR": {
      "reportKey": "COORDINATOR",
      "resourceGroups": [],
      "serviceNames": [
        "localhost:9132"
      ],
      "serviceCount": 1,
      "errorCount": 0
    },
    "GC": {
      "reportKey": "GC",
      "resourceGroups": [],
      "serviceNames": [
        "localhost:9998"
      ],
      "serviceCount": 1,
      "errorCount": 0
    },
    "MANAGER": {
      "reportKey": "MANAGER",
      "resourceGroups": [],
      "serviceNames": [
        "localhost:9999"
      ],
      "serviceCount": 1,
      "errorCount": 0
    },
    "MONITOR": {
      "reportKey": "MONITOR",
      "resourceGroups": [],
      "serviceNames": [
        "localhost:9995"
      ],
      "serviceCount": 1,
      "errorCount": 0
    },
    "S_SERVER": {
      "reportKey": "S_SERVER",
      "resourceGroups": [
        "default",
        "sg1"
      ],
      "serviceNames": [
        "default: localhost:10003",
        "default: localhost:9996",
        "sg1: localhost:10004",
        "sg1: localhost:10005"
      ],
      "serviceCount": 4,
      "errorCount": 0
    },
    "T_SERVER": {
      "reportKey": "T_SERVER",
      "resourceGroups": [],
      "serviceNames": [
        "localhost:10000",
        "localhost:9997"
      ],
      "serviceCount": 2,
      "errorCount": 0
    }
  }
}

Fixes #4495

Comment on lines 159 to 166
// process resource groups
var payload = r.getSecond();
String[] tokens = payload.split(",");
String groupSeparator = "";
if (tokens.length == 2) {
  groupNames.add(tokens[1]);
  groupSeparator = tokens[1] + ": ";
}
Contributor

Is this parsing the lock content represented by the ServerServices object? If so, be aware that this changed in 3.1, ref: https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L664
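
For readers following the review, the split logic in the snippet above can be sketched as a small standalone method. This is only an illustration: the class name, the method name, and the "address,group" payload layout are assumptions inferred from the snippet, and, as the comment above points out, the lock format differs in 3.1 and later.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the Accumulo code base.
class LockPayloadSketch {

  /**
   * Returns the display prefix for a host entry: "group: " when the payload has
   * exactly two comma-separated tokens, otherwise an empty string. The second
   * token is treated as the resource group name (an assumption taken from the
   * snippet under review) and is recorded in groupNames.
   */
  static String groupPrefix(String payload, Set<String> groupNames) {
    String[] tokens = payload.split(",");
    if (tokens.length == 2) {
      groupNames.add(tokens[1]);
      return tokens[1] + ": ";
    }
    return "";
  }

  public static void main(String[] args) {
    Set<String> groups = new TreeSet<>();
    // Payload values here are made up for the example.
    System.out.println(groupPrefix("localhost:9133,q1", groups) + "localhost:9133");
    System.out.println(groupPrefix("localhost:9997", groups) + "localhost:9997");
    System.out.println("groups seen: " + groups);
  }
}
```

A payload without a comma yields no prefix, which corresponds to the services that are listed without a resource group in the sample output.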

Comment on lines +132 to +133
sb.append(I2).append("resource groups:\n");
summary.getResourceGroups().forEach(g -> sb.append(I4).append(g).append("\n"));
Contributor

Do we need to print the groups twice? I wonder if we could print something like:

  Group: DEFAULT
    localhost:1234
    localhost:1235
  Group: Group1
    localhost:1236

Contributor Author

Addressed in f5f9221

This addresses the PR comment to avoid displaying resource groups multiple
times.

It also changes the json output to use group name and a set
of hosts. This makes it easier to pull out resources by resource group.

Also includes some code clean-up
@EdColeman
Contributor Author

The output changes in f5f9221 organize services by resource group. This slightly modifies the command and the json output.

text output
> accumulo admin serviceStatus

Report time: 2024-05-21T23:06:21.084Z
ZooKeeper read errors: 0
Managers: count: 1
  localhost:9999
Monitors: count: 1
  localhost:9995
Garbage Collectors: count: 1
  localhost:9998
Tablet Servers: count: 2
  localhost:10000
  localhost:9997
Scan Servers: count: 4
  resource groups:
    default
    sg1
  hosts (by group):
    default (2):
      localhost:10003
      localhost:9996
    sg1 (2):
      localhost:10004
      localhost:10005
Coordinators: count: 1
  localhost:9132
Compactors: count: 4
  resource groups:
    q1
    q2
  hosts (by group):
    q1 (2):
      localhost:9133
      localhost:9134
    q2 (2):
      localhost:9135
      localhost:9136
json output (formatted with jq)
> accumulo admin serviceStatus --json | jq

{
  "reportTime": "2024-05-21T23:06:31.596Z",
  "zkReadErrors": 0,
  "noHosts": false,
  "summaries": {
    "COMPACTOR": {
      "serviceType": "COMPACTOR",
      "resourceGroups": [
        "q1",
        "q2"
      ],
      "serviceByGroups": {
        "q1": [
          "localhost:9133",
          "localhost:9134"
        ],
        "q2": [
          "localhost:9135",
          "localhost:9136"
        ]
      },
      "serviceCount": 4,
      "errorCount": 0
    },
    "COORDINATOR": {
      "serviceType": "COORDINATOR",
      "resourceGroups": [],
      "serviceByGroups": {
        "NO_GROUP": [
          "localhost:9132"
        ]
      },
      "serviceCount": 1,
      "errorCount": 0
    },
    "GC": {
      "serviceType": "GC",
      "resourceGroups": [],
      "serviceByGroups": {
        "NO_GROUP": [
          "localhost:9998"
        ]
      },
      "serviceCount": 1,
      "errorCount": 0
    },
    "MANAGER": {
      "serviceType": "MANAGER",
      "resourceGroups": [],
      "serviceByGroups": {
        "NO_GROUP": [
          "localhost:9999"
        ]
      },
      "serviceCount": 1,
      "errorCount": 0
    },
    "MONITOR": {
      "serviceType": "MONITOR",
      "resourceGroups": [],
      "serviceByGroups": {
        "NO_GROUP": [
          "localhost:9995"
        ]
      },
      "serviceCount": 1,
      "errorCount": 0
    },
    "S_SERVER": {
      "serviceType": "S_SERVER",
      "resourceGroups": [
        "default",
        "sg1"
      ],
      "serviceByGroups": {
        "default": [
          "localhost:10003",
          "localhost:9996"
        ],
        "sg1": [
          "localhost:10004",
          "localhost:10005"
        ]
      },
      "serviceCount": 4,
      "errorCount": 0
    },
    "T_SERVER": {
      "serviceType": "T_SERVER",
      "resourceGroups": [],
      "serviceByGroups": {
        "NO_GROUP": [
          "localhost:10000",
          "localhost:9997"
        ]
      },
      "serviceCount": 2,
      "errorCount": 0
    }
  }
}
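
The grouped layout shown in the text and json samples above can be sketched as a map from group name to a sorted set of hosts. This is a minimal illustration under stated assumptions, not the code merged in this PR: the class and method names are hypothetical, the "group: host" input form is borrowed from the earlier output format, and NO_GROUP is reused from the json sample as the bucket for services without a resource group.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; names are illustrative and not taken from Accumulo.
class GroupedHostsSketch {

  // Bucket name for services without a resource group, as seen in the json sample.
  static final String NO_GROUP = "NO_GROUP";

  /** Groups "group: host" or bare "host" entries into a sorted map of group to hosts. */
  static Map<String, Set<String>> byGroup(List<String> entries) {
    Map<String, Set<String>> grouped = new TreeMap<>();
    for (String entry : entries) {
      int sep = entry.indexOf(": ");
      String group = sep < 0 ? NO_GROUP : entry.substring(0, sep);
      String host = sep < 0 ? entry : entry.substring(sep + 2);
      grouped.computeIfAbsent(group, k -> new TreeSet<>()).add(host);
    }
    return grouped;
  }

  /** Renders each group as "name (n):" followed by its hosts, one per line. */
  static String render(Map<String, Set<String>> grouped) {
    StringBuilder sb = new StringBuilder();
    grouped.forEach((group, hosts) -> {
      sb.append("    ").append(group).append(" (").append(hosts.size()).append("):\n");
      hosts.forEach(h -> sb.append("      ").append(h).append("\n"));
    });
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String> compactors = List.of("q1: localhost:9133", "q1: localhost:9134",
        "q2: localhost:9135", "q2: localhost:9136");
    System.out.print(render(byGroup(compactors)));
  }
}
```

Keeping hosts in a set per group is what makes it straightforward to pull out one resource group from the json, for example with a jq filter on serviceByGroups.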

Contributor

@dlmarion dlmarion left a comment

LGTM. On the merge upstream, note the following:

  1. The class ServiceLockData can be used to parse the lock data in main
  2. The Coordinator was merged into the Manager in elasticity

@EdColeman EdColeman merged commit e3c540d into apache:2.1 May 24, 2024
8 checks passed
EdColeman pushed a commit that referenced this pull request May 24, 2024
- the merge of #4567 needs additional changes because of ServiceLock
  data in ZooKeeper.  This will be done as a separate PR.
@ctubbsii ctubbsii modified the milestones: 4.0.0, 3.1.0, 2.1.3 Jul 12, 2024