Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NewSystemd must not ignore UnitExists #279

Closed
kolyshkin opened this issue Apr 3, 2023 · 3 comments · Fixed by #290
Closed

NewSystemd must not ignore UnitExists #279

kolyshkin opened this issue Apr 3, 2023 · 3 comments · Fixed by #290

Comments

@kolyshkin
Copy link
Contributor

This is the same issue as described in opencontainers/runc#3780

In short, NewSystemd ignores UnitExists error from systemd, as a result the PID is not added to the proper systemd unit and thus the cgroup.

See the above issue for details as to how that may happen.

See the opencontainers/runc#3782 as to how this might be fixed.

@danishprakash
Copy link

I'll take a look at this

@mmerkes
Copy link
Contributor

mmerkes commented Apr 22, 2023

I assume that this is the block that you're talking about refactoring? In a nutshell, are you proposing the following:

  1. If you get a unit exists error, reset the systemd unit like here and try again.
  2. If the retry fails, return an error rather than quietly complete.

I'll see if I can throw together a PR. Seems straightforward enough, if my understanding is correct...

mmerkes added a commit to mmerkes/cgroups that referenced this issue Apr 23, 2023
As done in this runc PR
opencontainers/runc#3782,
UnitExists errors are no longer ignored when starting units. If it
already exists, we attempt to reset the failed unit and retry once.
Otherwise, we fail. See containerd#279
for more context.

Signed-off-by: Matt Merkes <matt.merkes@gmail.com>
@kolyshkin
Copy link
Contributor Author

I assume that this is the block that you're talking about refactoring? In a nutshell, are you proposing the following:

  1. If you get a unit exists error, reset the systemd unit like here and try again.
  2. If the retry fails, return an error rather than quietly complete.

I'll see if I can throw together a PR. Seems straightforward enough, if my understanding is correct...

@mmerkes yes, your understanding is correct. The most straightforward fix would be to merely return a "unit already exists" error and when fix the callers who do not expect it. The slightly more elaborated one is to call reset-failed (which will remove the failed/non-running unit, if exists) and retry, like I did in runc/libcontainer/cgroups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants