Run a central Monitor service #258

amrc-benmorrow · 2024-04-23T15:42:54Z

Run a central instance of the Monitor. This monitors the other monitors and raises Alerts if a cluster goes offline altogether, or if the cluster monitor is not functioning correctly for some reason.

Closes: #233

Waiting for packets with rx.timeout() has the problem that once you've timed out you stop watching for packets. So if a packet comes in during our jitter delay, we won't see it. It also causes a lot of subscription and unsubscription if nothing else is watching the device. Use switchMap instead, like the offline logic.
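The behaviour described above can be illustrated without RxJS. This is a minimal sketch, assuming a hypothetical `makeWatchdog` helper with an injectable timer; neither is code from this PR. The packet listener stays attached for the whole lifetime of the watch and each packet simply restarts the timer, which is roughly the effect of using `switchMap` onto a fresh timer rather than letting a timeout end the subscription.

```javascript
// Hypothetical helper, not code from this PR.
// Each packet restarts the timer; the packet entry point is never removed,
// so a packet arriving after a timeout is still seen.
function makeWatchdog({ timeoutMs, onOffline, onOnline, timers = globalThis }) {
    let handle = null;
    let online = false;

    // (Re)start the offline timer, cancelling any timer already running.
    const arm = () => {
        if (handle !== null) timers.clearTimeout(handle);
        handle = timers.setTimeout(() => {
            handle = null;
            online = false;
            onOffline();
        }, timeoutMs);
    };

    // Start timing immediately: no packet within timeoutMs means offline.
    arm();

    return {
        // Call this for every packet received from the device.
        packet() {
            if (!online) {
                online = true;
                onOnline();
            }
            arm();
        },
        // Cancel the pending timer when the watch is no longer needed.
        stop() {
            if (handle !== null) timers.clearTimeout(handle);
            handle = null;
        },
    };
}
```

Because the timer is only ever restarted, there is no window in which packets go unobserved, and nothing subscribes or unsubscribes per timeout.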

amrc-benmorrow added 4 commits April 22, 2024 08:47

Rename files in preparation for a central monitor

596f4e9

This will use the same NodeMonitors as the edge monitor, but will be driven from the ConfigDB instead of from k8s.

Rename acs-edge-monitor to acs-monitor

e448bf9

It's not just running at the edge now. This requires a corresponding change in edge-helm-charts.

Refactor matrix build YAML

5558bf1

This will make diffs easier to understand in the future.

I missed a file rename

6447cc9

amrc-benmorrow requested a review from AlexGodbehere April 23, 2024 15:42

amrc-benmorrow self-assigned this Apr 23, 2024

amrc-benmorrow added 17 commits April 24, 2024 08:39

Create a central Monitor

940391b

This watches the ConfigDB for Cluster Status entries.

Deploy the central Monitor

1c38e71

Grant the Central Monitor some permissions

72f4b0d

Let's try creating this service principal the New Way, via a krbkey.

Start NodeMonitors for each cluster

69bcb4c

Update deps

182befc

Pull in the new rx-util; we need it for mapStartStops. Replace utilities with service-client, as we don't need any server-side code.
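The idea of turning a changing list of clusters into start and stop actions can be sketched in plain JavaScript. This is only an illustration, assuming hypothetical `diffStartStops` and `reconcile` functions; it is not the `mapStartStops` API from rx-util, whose signature is not shown in this PR.

```javascript
// Hypothetical illustration; not the rx-util implementation.
// Given the previous and current cluster identifiers, work out which
// monitors need starting and which need stopping.
function diffStartStops(previous, current) {
    const prev = new Set(previous);
    const curr = new Set(current);
    return {
        start: [...curr].filter(id => !prev.has(id)),
        stop: [...prev].filter(id => !curr.has(id)),
    };
}

// Apply a new cluster list to a Map of running monitors (id -> monitor).
// createMonitor is supplied by the caller and must return an object
// with a stop() method.
function reconcile(running, clusters, createMonitor) {
    const { start, stop } = diffStartStops(running.keys(), clusters);
    for (const id of stop) {
        running.get(id).stop();
        running.delete(id);
    }
    for (const id of start) {
        running.set(id, createMonitor(id));
    }
    return running;
}
```

Calling `reconcile` each time the cluster list changes leaves monitors for unchanged clusters untouched and only starts or stops the ones that differ.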

Don't allow HTTP caching

82a5b31

This is too big a hammer really, but I'm not making a lot of redundant requests here. Without this the ConfigDB notifications are useless, as we just get back a cached result.
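One way to bypass HTTP caching from JavaScript is sketched below. It assumes requests are made with a Fetch-style API and that headers are passed as a plain object; the `withoutHttpCache` helper is hypothetical, and the PR does not show how the change was actually made.

```javascript
// Hypothetical helper, not code from this PR.
// Merge caller-supplied fetch options with settings that bypass HTTP caches.
// Assumes init.headers, if present, is a plain object.
function withoutHttpCache(init = {}) {
    return {
        ...init,
        // Fetch-standard cache mode: do not read from or write to the HTTP cache.
        cache: "no-store",
        headers: {
            ...(init.headers ?? {}),
            // Ask any intermediate cache to revalidate with the origin server.
            "Cache-Control": "no-cache",
        },
    };
}
```

The result would be passed as the second argument to `fetch`. Applying it to every request is the "big hammer" trade-off: no request is ever answered from a cache, so a fetch made in response to a change notification sees the new value.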

Move creation of git-version.js into run container

f2c2c85

Passing the revision build arg into the environment causes cache invalidation for all RUN lines in the container. (Docker can't tell that nothing uses the environment.)

Tidy up a bit

84bcd7b

* Move central config fetching into a method.
* Remove some unnecessary logging.

Improve Sparkplug handling

1400e2d

* Make sure we don't publish before DBIRTH or after DDEATH.
* Accept Rebirth DCMD.
* Publish Rebirth metrics in our BIRTHs.
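The three rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not the PR's code: the `SparkplugDevice` class and its `send` callback are hypothetical, and the control metric name `Device Control/Rebirth` is assumed by analogy with the Sparkplug `Node Control/Rebirth` metric.

```javascript
// Hypothetical sketch, not code from this PR.
// Tracks whether a Sparkplug Device is "born" and refuses to publish data
// outside the DBIRTH..DDEATH window.
class SparkplugDevice {
    constructor({ group, node, device, send }) {
        this.prefix = `spBv1.0/${group}`;
        this.node = node;
        this.device = device;
        this.send = send;           // (topic, metrics) => void, supplied by the caller
        this.alive = false;
        this.metrics = new Map();   // latest value of each metric, by name
    }

    topic(type) {
        return `${this.prefix}/${type}/${this.node}/${this.device}`;
    }

    birth() {
        const metrics = [
            // Advertise the Rebirth control metric in every BIRTH (assumed name).
            { name: "Device Control/Rebirth", type: "Boolean", value: false },
            ...this.metrics.values(),
        ];
        this.alive = true;
        this.send(this.topic("DBIRTH"), metrics);
    }

    death() {
        if (!this.alive) return;    // never publish DDEATH twice
        this.alive = false;
        this.send(this.topic("DDEATH"), []);
    }

    // Record the latest value; only publish DDATA while the device is alive.
    update(name, type, value) {
        const metric = { name, type, value };
        this.metrics.set(name, metric);
        if (this.alive) this.send(this.topic("DDATA"), [metric]);
    }

    // Handle the metrics from a DCMD addressed to this device.
    command(metrics) {
        const rebirth = metrics.some(
            m => m.name === "Device Control/Rebirth" && m.value === true);
        if (rebirth && this.alive) this.birth();
    }
}
```

Values that arrive before the birth are remembered rather than published, so the eventual DBIRTH carries the current state.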

Make the central Monitor a Sparkplug Node

533c28d

Use monitored Group name for central Devices

3e2aace

Make sure we don't claim to be monitoring a cluster when we are not.

Handle clusters without Group address yet

d70f7ca

Publish DDEATH correctly

98dd475

Remove some unnecessary logging

4f35535

Split Sparkplug code into files

15be362

Split the Monitors into separate files

c0268b4

Move the factory function into the NodeSpec; it appears to be impossible to sensibly resolve a circular import in JS.

amrc-benmorrow force-pushed the bmz/central-monitor branch from d107476 to c0268b4 April 24, 2024 07:43

amrc-benmorrow marked this pull request as ready for review April 24, 2024 07:54

AlexGodbehere approved these changes Apr 25, 2024

amrc-benmorrow merged commit 1b3891c into main Apr 25, 2024
1 check passed

amrc-benmorrow deleted the bmz/central-monitor branch April 25, 2024 07:35
