
No interface to juju-reboot hook tool in ops #929

Closed
NucciTheBoss opened this issue Apr 25, 2023 · 4 comments · Fixed by #1041
@NucciTheBoss

Hello there! I am currently working through a predicament in the SLURM charms where the machines need to be rebooted after package installation to apply security updates. Originally, the SLURM charms were deployed manually, so the human operator would issue a machine reboot command appropriate to the substrate being used (e.g. openstack server reboot <machine_id>). We are now working to make deploying SLURM an automatic process; however, the machines still need to be rebooted to apply updates.

My idea was to use the juju-reboot hook tool located in /var/lib/juju/tools/machine-<#>/; however, ops does not seem to have a direct method for invoking juju-reboot. It would be nice if ops had a defined method for scheduling a machine reboot at the end of hook execution:

```python
# Assume the charm class is fully defined elsewhere.
import logging
import pathlib

from ops.charm import InstallEvent

logger = logging.getLogger(__name__)


def _on_install(self, _: InstallEvent) -> None:
    # Install some packages, update the base image, etc.
    ...

    if pathlib.Path("/var/run/reboot-required").exists():
        logger.info("Scheduling machine for reboot...")
        self.unit.reboot()
```
Currently I can circumvent my reboot issue by directly invoking the juju-reboot hook tool, but I thought this might be a nice addition to ops. Let me know what your thoughts are!
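
For illustration, directly invoking the hook tool looks roughly like the sketch below. This is not an ops API; the helper name is made up, and it assumes juju-reboot is on PATH while a hook is running (otherwise the full path under /var/lib/juju/tools/ would be needed):

```python
# Minimal sketch of invoking the juju-reboot hook tool directly (no ops API).
# Assumes the hook tool is on PATH during hook execution; otherwise use the
# full path under /var/lib/juju/tools/.
import subprocess


def request_reboot(now: bool = False) -> None:
    """Ask Juju to reboot this machine, optionally right away."""
    cmd = ["juju-reboot"]
    if now:
        cmd.append("--now")
    subprocess.run(cmd, check=True)
```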

@benhoyt (Collaborator) commented Apr 28, 2023

It's a reasonable idea -- thanks. We're thinking of doing this as a top-level function, say ops.reboot(), rather than an Unit.reboot() method, because it only applies to "our" unit (and not other units), so it seems a bit awkward to keep adding stuff to Unit only to have to test at runtime that it's our unit and fail if not.

It could be defined in ops/model.py, or even a new ops/functions.py or similar.

Thoughts?
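
For comparison, here is roughly how the two shapes would read from charm code. Both are hypothetical at this point in the thread; neither ops.reboot() nor Unit.reboot() existed yet:

```python
# Hypothetical comparison of the two proposed shapes (neither existed yet).
import ops


class MyCharm(ops.CharmBase):
    def _reboot_via_function(self) -> None:
        # Option A: a top-level function, e.g. ops.reboot() (hypothetical name).
        ops.reboot()

    def _reboot_via_unit(self) -> None:
        # Option B: a method on the current unit, as proposed in the issue.
        self.unit.reboot()
```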

@NucciTheBoss (Author)

You bring up a good point. I was in pylibjuju mode, where you need to "pull down" the unit to run a command on it 😅

I think something like ops/functions.py would be a good addition. I don't think ops/model.py would be a good place for reboot functionality, since ops/model.py is more focused on representing Juju-specific data structures such as models, applications, and units. My only concern with ops/functions.py is that the name functions might be too ambiguous: are they functions used throughout ops to mask complex operations in the framework, or are they meant for use within charms?

Perhaps ops/tools.py or ops/hooktools.py would be a good place to define a reboot method? Let me know your thoughts.

@benhoyt added the "feature (New feature or request)" and "small item" labels on May 1, 2023
@benhoyt (Collaborator) commented Sep 29, 2023

Thinking about this further, ops.reboot() might make sense in some respects, but it isn't a good idea -- we need the "backend" object to be attached to something, like the Unit instance. We are rebooting the unit, so Unit.reboot() does make sense -- it would just need to do an is_our_unit check like several of the other methods.
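
A rough sketch of that shape, not the actual ops source: reboot() lives on Unit, delegates to the model backend's hook-tool wrapper, and refuses to act on any unit other than the one the charm is running as.

```python
# Sketch only -- attribute and method names are illustrative, not ops source.
class Unit:
    def reboot(self, now: bool = False) -> None:
        """Reboot the machine this unit runs on, via the juju-reboot hook tool."""
        if not self._is_our_unit:
            raise RuntimeError(f"cannot reboot a remote unit {self}")
        self._backend.reboot(now)
```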

@benhoyt (Collaborator) commented Oct 11, 2023

FWIW, here's a use of juju-reboot in the wild: https://github.com/canonical/github-runner-operator/blob/2462acae4002c1c68dad73362ebbc5f98a17587b/src/charm.py#L374

We could potentially submit a PR to that charm after this work is in a released ops version to update it to use Unit.reboot.
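
Such an update would presumably look something like the sketch below (class and method names are illustrative, not taken from that charm):

```python
import subprocess

import ops


class RunnerCharm(ops.CharmBase):  # illustrative name, not the real charm class
    def _reboot_old(self) -> None:
        # Current approach in the wild: shell out to the hook tool directly.
        subprocess.run(["juju-reboot", "--now"], check=True)

    def _reboot_new(self) -> None:
        # With the new API: ops wraps juju-reboot for us.
        self.unit.reboot(now=True)
```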

benhoyt pushed a commit that referenced this issue Oct 20, 2023
Adds a new `reboot` method to `Unit` that is a wrapper around the `juju-reboot` command.

`juju-reboot` is only available for machine charms. When it's run for a K8s charm, Juju logs an error message but doesn't return a response with the error (or exit with a non-zero code), which means that we have to fail silently.

Adds a corresponding `reboot()` method to the harness backend.

From the point of view of the harness, we assume that everything is essentially the same after rebooting without `--now`, so this is a no-op. The most likely change that would impact the harness would be a leadership change, but it seems better to let the Charmer model that explicitly in their tests.

When rebooting with `--now`, what should happen is that everything after that point in the event handler is skipped, there's a simulated reboot (again, roughly a no-op), and then the event is re-emitted. I went down a long rabbit-hole trying to get the harness to do this, but it ended up way too messy for minimal benefit. Instead, a `SystemExit` exception is raised and the Charmer is expected to handle it in their tests, for example:

```python
def test_installation(self):
    self.harness.begin()
    with self.assertRaises(SystemExit):
        self.harness.charm.on.install.emit()
    # More asserts here that the first part of installation was done.
    self.harness.charm.on.install.emit()
    # More asserts here that the installation continued appropriately.
```

Fixes #929