Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid handling of pids.limit=0 from runtime spec #4023

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Sep 22, 2023

This is an alternative to #4015 (smaller set of code changes, no libct API breakage).

It has been pointed out in #4014 that runc incorrectly ignores pids.limit=0 set
in the runtime spec. This happens because runtime-spec says "default is
unlimited" and also allows for Pids to not be set at all, thus
distinguishing unset (Resources.Pids == nil) from unlimited
(Resources.Pids.Limit <= 0).

Internally, runc also distinguishes unset from unlimited, but since we
do not use a pointer, we treat 0 as unset and -1 as unlimited.

Add a conversion code to libcontainer/specconv.

Add a test case to check that starting a container with pids.limit=0
results in no pids limit (the test fails before the fix when systemd
cgroup manager is used, as systemd apparently uses parent's TasksMax;
see #4022 for CI runs).

NOTE that runc update still treats 0 as "unset".

Finally, fix/update the documentation here and there.

Fixes: #4014.

@kolyshkin kolyshkin added this to the 1.2.0 milestone Sep 22, 2023
@kolyshkin kolyshkin added area/systemd kind/bug backport/todo/1.1 A PR in main branch which needs to be backported to release-1.1 labels Sep 22, 2023
@kolyshkin kolyshkin marked this pull request as ready for review September 22, 2023 03:07
@kolyshkin
Copy link
Contributor Author

@haircommander PTAL

},
cli.IntFlag{
Name: "pids-limit",
Usage: "Maximum number of pids allowed in the container",
Usage: "Maximum number of tasks; use '-1' for unlimited",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some users will be confused with -r and runtime-spec?

runc update -r - containerid
{
  "pids": { "limit": 0 }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This is the way it was always working.
  2. It seems that the behavior of runc update is not in spec.

if r.Pids != nil {
c.Resources.PidsLimit = r.Pids.Limit
if r.Pids.Limit > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether upstream projects should write such codes or not when they update the container's pids limit?

if r.Pids.Limit > 0 {
    c.Resources.PidsLimit = r.Pids.Limit
} else {
    c.Resources.PidsLimit = -1
}

@AkihiroSuda Do you know what's the situation in nerdctl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I still think we should fix it in pids.go, because others always use Manager.Set directly without use specconv.
But we have to solve 2 problems if we choose solutions like #4015 :

  1. Don't always write pids.max if PidsLimit == 0;
  2. Don't introduce break changes of libcontainer API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at git history, and it seems that libcontainer always treated 0 as unset for pids limit, since commit db3159c from January 2016 that added the initial pids limit support (alas, the very same commit incorrectly documents the field (db3159c#diff-b71b6973c045d22e41e802cf65cade82c573a844190bc0d06a6ade9f21cb2c5c), and this PR fixes that).

Thus, I am pretty sure libcontainer users are aware of what 0 means for pids limit in libcontainer; yet I am going to look into kubernetes and nerdctl to confirm.

Now, we have two problems to solve:

  1. runc violates the runtime spec by treating pids.limit==0 as unset rather than unlimited (Undefined (and potentially incorrect) behavior when pids limit is set to 0 #4014);
  2. libcontainer's handling of pids limit value differs from runtime spec.

I think that 2 is much less of a problem than 1, and can be solved by properly documenting the PidsLimit field (this is what this PR does).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: case 1 is the only case I care about fixing

Copy link
Contributor

@haircommander haircommander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@lifubang
Copy link
Member

If we say that we just only need to fix it in ‘runc create’ and ‘runc run’, for the uses using ‘runc update’ and ‘libcontainer’, they should fix this issue in their own projects according to runc’s doc. This one LGTM.

@kolyshkin
Copy link
Contributor Author

for the uses using ‘runc update’ and ‘libcontainer’,

Yes, both runc update and libcontainer API is not actually covered by runtime-spec (which is sort of a problem, but out of scope for this).

Copy link
Member

@lifubang lifubang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, to fix this issue completely, it needs to open PR by their own contributors in projects using ‘runc update’ or ‘runc libcontainer API’, for example, Kubernetes, containerd, and cri-o etc.

@lifubang
Copy link
Member

lifubang commented Oct 7, 2023

@opencontainers/runc-maintainers PTAL
It's really a bug for runc violates the runtime spec by treating pids.limit==0 as unset rather than unlimited. (#4023 (comment))

After this one backported to 1.1, we can release 1.1.10.

@lifubang
Copy link
Member

@AkihiroSuda PTAL
I have checked docker cli and nerdctl, they tell users to use -1 for unlimited pids limit.

@thaJeztah
Copy link
Member

I'd yet have to have a close look, but would like @cyphar to have a look; I think there's still some ambiguity in the description of the PR that updated the specs opencontainers/runtime-spec#234 ("there is no limit: meaning: don't (set) a limit (and use the active max)" vs "explicitly set it to max/unlimited")

The mention of "Go's unset is zero value" in that description may indicate that 0 == unset and thus the former may have been the intent?

And it wouldn't be the first time that specs were update to reflect the implementation, and now (due to ambiguity on the wording), the implementation being update to reflect the spec (going full circle)

@kolyshkin
Copy link
Contributor Author

Rebased

This test case checks that unified resources override those set by
conventional means, but it does not set conventional pids limit.

Fix this.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It has been pointed out that runc incorrectly ignores pids.limit=0 set
in the runtime spec. This happens because runtime-spec says "default is
unlimited" and also allows for Pids to not be set at all, thus
distinguishing unset (Resources.Pids == nil) from unlimited
(Resources.Pids.Limit <= 0).

Internally, runc also distinguishes unset from unlimited, but since we
do not use a pointer, we treat 0 as unset and -1 as unlimited.

Add a conversion code to libcontainer/specconv.

Add a test case to check that starting a container with pids.limit=0
results in no pids limit (the test fails before the fix when systemd
cgroup manager is used, as systemd apparently uses parent's TasksMax).

NOTE that runc update still treats 0 as "unset".

Finally, fix/update the documentation here and there.

Should fix issue 4014.

Reported-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@cyphar
Copy link
Member

cyphar commented Oct 23, 2023

Reading through my (8-year-old 😰) comments, I think that there were three issues at play here:

  1. A value of 0 being identical to 1 is a quirk of the way pids limits work (a process needs to be in the cgroup to fork into the cgroup, so a limit of 0 and 1 are identical in terms of behaviour -- though with CLONE_INTO_CGROUP this is no longer true).
  2. When I wrote that PR, I had sent a patch to make 0 an invalid value of a pids cgroup (due to point 1). It was never merged, but I think the logic was that having two values that mean the same thing was not ideal (and the cgroup maintainer had made a similar comment before merging the cgroup).
  3. At the time, we hadn't fully settled on the whole pointers-in-Go-JSON thing, and so I just used the zero value of int64 as a temporary stopgap until we did a tree-wide fixup. It seems we never did a proper tree-wide fixup (or we skipped PidsLimit).

Looking at it now, I think the ideal option would be to change the spec so that PidsLimit is an *int64 and then we can have the behaviour of most other cgroups (unset/nil means to not touch the limit, any value means "use that value" with -1 meaning unlimited). However, given the disparate behaviour of runtimes when dealing with 0, the spec should have a note saying that (prior to whatever release has the new change) the behaviour was not always consistent...

@kolyshkin kolyshkin removed the backport/todo/1.1 A PR in main branch which needs to be backported to release-1.1 label Oct 24, 2023
@kolyshkin
Copy link
Contributor Author

Removing the backport/todo/1.1 as we're not sure how to move forward with this (maybe we'll change the spec instead, since runc never accepted 0 as unlimited.

@lifubang
Copy link
Member

maybe we'll change the spec instead, since runc never accepted 0 as unlimited.

👍

@haircommander
Copy link
Contributor

haircommander commented Nov 21, 2023

in the meantime, CRI-O can just set -1 where it originally set 0 cri-o/cri-o#7503

@AkihiroSuda
Copy link
Member

What's the next step?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undefined (and potentially incorrect) behavior when pids limit is set to 0
6 participants