Detector: Rolling upgrade using artifactory mode

Shishir edited this page Jun 16, 2021 · 2 revisions

How do I add a new health check, and do a rolling upgrade on detector?

  • Add the new health check to your GitHub health check repo.
  • Update your master config (config.json) to include it.
    Hint: use npd config generate --root-dir <dir> to regenerate your master config.
  • Push all your changes to your GitHub health check repo.
  • Make sure the artifact stanza in the detector job file points to the right source, i.e. your health check repo.

Before rolling out the new detector version, pause the aggregator. This ensures that the aggregator does not try to collect node health results while the detector is being upgraded.

$ nomad job status aggregator   # copy the allocation ID
$ nomad alloc signal -s SIGUSR1 <alloc_id>
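
Copying the allocation ID by hand can be scripted. The following is a minimal sketch, not part of the project: first_running_alloc is a hypothetical helper name, and it assumes that nomad job status prints its usual Allocations table with the allocation ID in the first column and the status in the sixth.

```shell
# Hypothetical helper: read `nomad job status <job>` output on stdin and
# print the ID of the first allocation whose Status column is "running".
# Assumes the default table layout (ID, Node ID, Task Group, Version,
# Desired, Status, Created, Modified) and task group names without spaces.
first_running_alloc() {
  awk '
    $1 == "Allocations"            { in_table = 1; next }  # section header
    in_table && $1 == "ID"         { next }                # column header row
    in_table && $6 == "running"    { print $1; exit }      # first running allocation
  '
}

# Usage (requires a reachable Nomad cluster):
#   alloc_id=$(nomad job status aggregator | first_running_alloc)
#   nomad alloc signal -s SIGUSR1 "$alloc_id"
```

If the helper prints nothing, no running allocation was found and the signal should not be sent.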

In the allocation logs, you should see the aggregator being paused.

time="2021-06-09T01:17:51Z" level=info msg="Collect and aggregate nodes health"
time="2021-06-09T01:18:06Z" level=info msg="Collect and aggregate nodes health"
time="2021-06-09T01:18:21Z" level=info msg="Received signal SIGUSR1, pausing aggregator."

Now, upgrade the detector.

$ nomad job stop -purge detector
$ nomad job plan detector-artifact.nomad
$ nomad job run detector-artifact.nomad
$ nomad job status detector
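
The upgrade commands above can be wrapped so that a failure stops the sequence. This is a sketch under assumptions: upgrade_detector is a hypothetical function name, and the job file defaults to the detector-artifact.nomad used on this page. One detail worth handling is that nomad job plan exits with 1 when allocations would be created or destroyed, which is expected here and should not be treated as an error; 255 signals a real planning error.

```shell
# Hypothetical wrapper around the upgrade commands on this page.
# `nomad job plan` exit codes: 0 = no allocation changes,
# 1 = allocations would be created or destroyed, 255 = error.
upgrade_detector() {
  jobfile=${1:-detector-artifact.nomad}

  nomad job stop -purge detector || return 1

  # Capture the exit code without tripping `set -e` in the caller.
  plan_rc=0
  nomad job plan "$jobfile" || plan_rc=$?
  if [ "$plan_rc" -ne 0 ] && [ "$plan_rc" -ne 1 ]; then
    echo "nomad job plan failed (exit $plan_rc)" >&2
    return 1
  fi

  nomad job run "$jobfile" || return 1
  nomad job status detector
}

# Usage: upgrade_detector            # uses detector-artifact.nomad
#        upgrade_detector other.nomad
```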

The detector is now upgraded and running the new health check that you added to your health check repo. Next, unpause the aggregator by sending it the same signal again.

$ nomad job status aggregator   # copy the allocation ID
$ nomad alloc signal -s SIGUSR1 <alloc_id>

In the allocation logs, you should see the aggregator being unpaused.

time="2021-06-09T01:22:57Z" level=info msg="Collect and aggregate nodes health"
time="2021-06-09T01:23:12Z" level=info msg="Collect and aggregate nodes health"
time="2021-06-09T01:23:24Z" level=info msg="Received signal SIGUSR1, pausing aggregator."
time="2021-06-09T01:23:29Z" level=info msg="Received signal SIGUSR1, unpausing aggregator."
time="2021-06-09T01:23:29Z" level=info msg="Collect and aggregate nodes health"
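
Because the same SIGUSR1 signal both pauses and unpauses the aggregator, it helps to confirm the current state from the logs before sending it. The sketch below is an illustration only: aggregator_state is a hypothetical helper, and it relies solely on the two log messages shown on this page.

```shell
# Hypothetical helper: read aggregator log lines on stdin and print the state
# implied by the most recent pause/unpause message ("unknown" if none seen).
aggregator_state() {
  awk '
    # Check "unpausing" first: the string "unpausing" contains "pausing".
    /unpausing aggregator/ { state = "running"; next }
    /pausing aggregator/   { state = "paused" }
    END { print (state == "" ? "unknown" : state) }
  '
}

# Usage (add -stderr if the aggregator writes its logs to stderr):
#   nomad alloc logs <alloc_id> | aggregator_state
```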