Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle non-existing systemd units deactivation #1614

Closed
tormath1 opened this issue Apr 20, 2023 · 4 comments · Fixed by #1621
Closed

handle non-existing systemd units deactivation #1614

tormath1 opened this issue Apr 20, 2023 · 4 comments · Fixed by #1621
Assignees

Comments

@tormath1
Copy link
Contributor

tormath1 commented Apr 20, 2023

Feature Request

Hello, between systemd-250 and systemd-252, it seems there is a change in the codebase that makes the systemd --root... disable my-unit.service output different. I highly suspect this commit: systemd/systemd@7a6c73d but it would require more investigation.

On Flatcar, between two releases that includes a systemd bump from 250 to 252, we have the following:

  • new stable-3510.2.0:
bash-5.1# systemctl --root=/sysroot disable docker.socket
Failed to disable unit, unit docker.socket does not exist.
bash-5.1# echo $?
1
  • previous stable-3374.2.5:
bash-5.1# systemctl --root=/sysroot disable docker.socket
Failed to disable unit, unit docker.socket does not exist.
bash-5.1# echo $?
0

As Ignition relies on the return code of systemctl command (only when disabling units):

args := []string{"--root", ut.DestDir, "disable", disabledUnit}
if output, err := exec.Command(distro.SystemctlCmd(), args...).CombinedOutput(); err != nil {
return fmt.Errorf("cannot remove symlink(s) for %s: %v: %q", disabledUnit, err, string(output))
}

It leads to a boot failure.

Environment

What hardware/cloud provider/hypervisor is being used to run Ignition? Flatcar on Qemu and OpenStack

Desired Feature

I think Ignition should provide a best effort to disable symlinks but it should let the machine booting even if the symlinks do not exist to at least write the preset value.

This issue is limited to Flatcar with Torcx as docker.{service,socket} is provided by Torcx but we can think about a user that would extend the OS with systemd-sysext images for examples.

Other Information

@tormath1 tormath1 changed the title handle non-existing systemd units handle non-existing systemd units deactivation Apr 20, 2023
@bgilbert
Copy link
Contributor

Interestingly, on systemd 251, systemctl disable enoent.service exits 1 but systemctl --root=/ disable enoent.service exits 0.

On the one hand, Ignition is supposed to fail whenever it can't do something requested by the config. On the other hand, Ignition has never worked this way for systemd units (because Ignition just writes out a preset and trusts systemd to handle the enablement later in boot, #587). It makes some sense that disabling an already-disabled unit would be a no-op, and indeed the original PR added a blackbox test to ensure that (#1352 (comment)). After CI switched to Fedora 38, this test started failing in CI.

I think we should treat this as a regression for now. We might eventually want to change the semantics, but I don't think it makes sense to do that only for this single case on specific systemd releases.

@prestist prestist self-assigned this May 1, 2023
@prestist
Copy link
Collaborator

prestist commented May 1, 2023

@bgilbert What were we fishing for with the error check previously? since it would return 0 in this case?

While more feedback is nice, it will break previously working configs. So I suspect to keep things good we can check the return error and if its error is "...does not exist" then we can ignore it ?

@bgilbert
Copy link
Contributor

bgilbert commented May 1, 2023

I don't think there was a specific error we were concerned about. We're just checking for errors because it's the right thing to do.

It's not a good practice to compare error strings (they might change, or in other circumstances might be localized into another language), so we should try to avoid that if there's any possible alternative. In this case, we could probably run a systemctl command to check whether the unit exists, and only try to disable it if so.

@prestist
Copy link
Collaborator

prestist commented May 1, 2023

Ah yeah that makes waaay more sense. Thank you for the feedback.

prestist added a commit to prestist/ignition that referenced this issue May 8, 2023
Prior to systemd 252 and the change found in commit 7a6c73d. When a unit
did not exist and a `systemctl disable` comand was ran it would not error.
However, since the change 7a6c73d the exit code now returns a 1. Which changes
Ignition's functionality, as it depends on `systemd`. To restore previous
functionality,check if the unit is enabled using systemctl's `is-enabled`
before disabling the unit to avoid unnecessary errors. Fixes coreos#1614
prestist added a commit to prestist/ignition that referenced this issue May 8, 2023
Prior to systemd 252 and the change found in commit 7a6c73d. When a unit
did not exist and a `systemctl disable` comand was ran it would not error.
However, since the change 7a6c73d the exit code now returns a 1. Which changes
Ignition's functionality, as it depends on `systemd`. To restore previous
functionality, check if the unit is enabled using systemctl's `is-enabled`
before disabling the unit to avoid unnecessary errors. Fixes coreos#1614
prestist added a commit to prestist/ignition that referenced this issue May 9, 2023
Due to a change found in systemd/systemd@7a6c73d systemd 252 and newer will
return an exit code 1 when attempting to disable a non-existent unit. Since
Ignition does not expect an exit code of 1 this causes a regression on a
system with systemd 252 or newer.

Add preemptive check to see if the unit is enabled using systemctl's
`is-enabled`before disabling the unit to avoid unnecessary errors.
Fixes coreos#1614
prestist added a commit to prestist/ignition that referenced this issue May 9, 2023
Due to a change found in systemd/systemd@7a6c73d systemd 252 and newer will
return an exit code 1 when attempting to disable a non-existent unit. Since
Ignition does not expect an exit code of 1 this causes a regression on a
system with systemd 252 or newer.

Add preemptive check to see if the unit is enabled using systemctl's
`is-enabled` before disabling the unit to avoid unnecessary errors.
Fixes coreos#1614
prestist added a commit to prestist/ignition that referenced this issue May 9, 2023
Ignition depends on `systemctl disable` for disabling units. Currently
if the unit does not exist `systemctl disable` exits 1; however, before
252 `systemctrl disable` exits 0 if `--root` is specified. Since Ignition
depends on systemctrl's exit code this change caused a regression, and causing
the unit.remove.symlinks blackbox test to fail with:

    removing enablement symlink(s) for "enoent.service": cannot remove symlink(s) for enoent.service: exit status 1: "Failed to disable unit, unit enoent.service does not exist.\n"

Before disabling a unit, use `systemctl is-enabled` to verify that the
unit exists and is enabled.

Fixes coreos#1614.
prestist added a commit to prestist/ignition that referenced this issue May 9, 2023
Ignition depends on `systemctl disable` for disabling units. Currently
if the unit does not exist `systemctl disable` exits 1; however, before
systemd 252 `systemctl disable` exits 0 if `--root` is specified. Since
Ignition depends on systemctl's exit code the new behavior caused a
regression, causing the unit.remove.symlinks blackbox test to fail with:

    removing enablement symlink(s) for "enoent.service": cannot remove symlink(s) for enoent.service: exit status 1: "Failed to disable unit, unit enoent.service does not exist.\n"

Before disabling a unit, use `systemctl is-enabled` to verify that the
unit exists and is enabled.

Fixes coreos coreos#1614
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants