
No interface to juju-reboot hook tool in ops #929

Closed
NucciTheBoss opened this issue Apr 25, 2023 · 4 comments · Fixed by #1041
@NucciTheBoss

Hello there! I am currently working through a predicament in the SLURM charms where the machines need to be rebooted after package installation to apply security updates. Originally, the SLURM charms were deployed manually, so the human operator would issue a machine reboot command appropriate to the substrate being used (e.g. openstack server reboot <machine_id>). We are now working to make deploying SLURM an automatic process; however, the machines still need to be rebooted to apply updates.

My idea was to use the juju-reboot hook tool located in /var/lib/juju/tools/machine-<#>/; however, ops does not seem to have a direct method for invoking juju-reboot. It would be nice if ops had a defined method for scheduling a machine reboot at the end of hook execution:

```python
# Assume the charm class is fully defined elsewhere.
import logging
import pathlib

from ops.charm import InstallEvent

logger = logging.getLogger(__name__)


def _on_install(self, _: InstallEvent) -> None:
    # Install some packages, update the base image, etc.
    ...

    if pathlib.Path("/var/run/reboot-required").exists():
        logger.info("Scheduling machine for reboot...")
        self.unit.reboot()
```
Currently I can circumvent my reboot issue by directly invoking the juju-reboot hook tool, but I thought this might be a nice addition to ops. Let me know what your thoughts are!
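
For illustration, directly invoking the hook tool looks roughly like the sketch below. This is not an ops API; the helper name is made up, and it assumes juju-reboot is on PATH while a hook is running (otherwise the full path under /var/lib/juju/tools/ would be needed):

```python
# Minimal sketch of invoking the juju-reboot hook tool directly (no ops API).
# Assumes the hook tool is on PATH during hook execution; otherwise use the
# full path under /var/lib/juju/tools/.
import subprocess


def request_reboot(now: bool = False) -> None:
    """Ask Juju to reboot this machine, optionally right away."""
    cmd = ["juju-reboot"]
    if now:
        cmd.append("--now")
    subprocess.run(cmd, check=True)
```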

@benhoyt (Collaborator) commented Apr 28, 2023

It's a reasonable idea -- thanks. We're thinking of doing this as a top-level function, say ops.reboot(), rather than an Unit.reboot() method, because it only applies to "our" unit (and not other units), so it seems a bit awkward to keep adding stuff to Unit only to have to test at runtime that it's our unit and fail if not.

It could be defined in ops/model.py, or even a new ops/functions.py or similar.

Thoughts?
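
For comparison, here is roughly how the two shapes would read from charm code. Both are hypothetical at this point in the thread; neither ops.reboot() nor Unit.reboot() existed yet:

```python
# Hypothetical comparison of the two proposed shapes (neither existed yet).
import ops


class MyCharm(ops.CharmBase):
    def _reboot_via_function(self) -> None:
        # Option A: a top-level function, e.g. ops.reboot() (hypothetical name).
        ops.reboot()

    def _reboot_via_unit(self) -> None:
        # Option B: a method on the current unit, as proposed in the issue.
        self.unit.reboot()
```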

@NucciTheBoss (Author)

You bring up a good point. I was in pylibjuju mode, where you need to "pull down" the unit to run a command on it 😅

I think something like ops/functions.py would be a good addition. I don't think ops/model.py would be a good place for reboot functionality, since ops/model.py is more focused on representing Juju-specific data structures such as models, applications, and units. My only concern with ops/functions.py is that the name functions might be too ambiguous: are they functions used throughout ops to mask complex operations in the framework, or are they meant for use within charms?

Perhaps ops/tools.py or ops/hooktools.py would be a good place to define a reboot method? Let me know your thoughts.

@benhoyt added the "feature (New feature or request)" and "small item" labels on May 1, 2023
@benhoyt (Collaborator) commented Sep 29, 2023

Thinking about this further, ops.reboot() might make sense in some respects, but it isn't a good idea -- we need the "backend" object to be attached to something, like the Unit instance. We are rebooting the unit, so Unit.reboot() does make sense -- it would just need to do an is_our_unit check like several of the other methods.
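
A rough sketch of that shape, not the actual ops source: reboot() lives on Unit, delegates to the model backend's hook-tool wrapper, and refuses to act on any unit other than the one the charm is running as.

```python
# Sketch only -- attribute and method names are illustrative, not ops source.
class Unit:
    def reboot(self, now: bool = False) -> None:
        """Reboot the machine this unit runs on, via the juju-reboot hook tool."""
        if not self._is_our_unit:
            raise RuntimeError(f"cannot reboot a remote unit {self}")
        self._backend.reboot(now)
```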

@benhoyt (Collaborator) commented Oct 11, 2023

FWIW, here's a use of juju-reboot in the wild: https://github.com/canonical/github-runner-operator/blob/2462acae4002c1c68dad73362ebbc5f98a17587b/src/charm.py#L374

We could potentially submit a PR to that charm after this work is in a released ops version to update it to use Unit.reboot.
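
Such an update would presumably look something like the sketch below (class and method names are illustrative, not taken from that charm):

```python
import subprocess

import ops


class RunnerCharm(ops.CharmBase):  # illustrative name, not the real charm class
    def _reboot_old(self) -> None:
        # Current approach in the wild: shell out to the hook tool directly.
        subprocess.run(["juju-reboot", "--now"], check=True)

    def _reboot_new(self) -> None:
        # With the new API: ops wraps juju-reboot for us.
        self.unit.reboot(now=True)
```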

benhoyt pushed a commit that referenced this issue Oct 20, 2023
Adds a new `reboot` method to `Unit` that is a wrapper around the `juju-reboot` command.

`juju-reboot` is only available for machine charms. When it's run for a K8s charm, Juju logs an error message but doesn't return a response with the error (or exit with a non-zero code), which means that we have to fail silently.

Adds a corresponding `reboot()` method to the harness backend.

From the point of view of the harness, we assume that everything is essentially the same after rebooting without `--now`, so this is a no-op. The most likely change that would impact the harness would be a leadership change, but it seems better to let the Charmer model that explicitly in their tests.

When rebooting with `--now`, what should happen is that everything after that point in the event handler is skipped, there's a simulated reboot (again, roughly a no-op), and then the event is re-emitted. I went down a long rabbit-hole trying to get the harness to do this, but it ended up way too messy for minimal benefit. Instead, a `SystemExit` exception is raised and the Charmer is expected to handle it in their tests, for example:

```python
def test_installation(self):
    self.harness.begin()
    with self.assertRaises(SystemExit):
        self.harness.charm.on.install.emit()
    # More asserts here that the first part of installation was done.
    self.harness.charm.on.install.emit()
    # More asserts here that the installation continued appropriately.
```

Fixes #929