Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.1] Fix set nofile rlimit error #4277

Merged

Conversation

lifubang
Copy link
Member

@lifubang lifubang commented May 9, 2024

This is backport of #4268, #4265.

Fix: #4195
Close: #4237

1. Fix a get/set race between runc exec and syscall.rlimit.init()

As @ls-ggg has given the detailed steps to reproduce the issue #4195 , and has given the core reason is that the go runtime will cache the value of rlimit nofile. [1]
When we are running runc exec with nofile limit, runc set it in the parent process, it may cause a race with go runtime init. The race condition is in the time when runc parent process set the container child process's nofile limit just after go runtime init has fetched the nofile limit. [2]
So, we should set nofile limit after syscall.rlimit.init() completed.

2. Fix an edge case caused by nofile rlimit cache in go stdlib

As @kolyshkin have found an edge case for set nofile rlimit by runc, which is also caused by nofile rlimit cache in go stdlib. Although we have a way to resolve the above get/set race, but if the hard value of nofile rlimit configured in config.json is bigger than this value of runc create/run/exec, it will also be incorrect restored by syscall.Exec in runc init. So we need to clear the nofile rlimit cache before we start the container initial process if we need to set nofile rlimit for the container.

[1] golang/go@f5eef58#diff-ec665e9789f8cf5cd1828ad7fa9f0ff4ebc1f5b5dd0fc82a296da5c07da7ece6
[2] https://github.com/golang/go/blob/f5eef58e4381259cbd84b3f2074c79607fb5c821/src/syscall/rlimit.go#L34-L35


CHANGELOG:
deprecated: libct.system.Execv is not used in runc anymore, it will be removed in v1.2.0.

@lifubang lifubang added the backport/1.1-pr A backport to 1.1.x release. label May 9, 2024
@lifubang
Copy link
Member Author

lifubang commented May 9, 2024

Note that we don’t support go version less than 1.19 anymore once we merge this PR.

@kolyshkin kolyshkin added this to the 1.1.13 milestone May 9, 2024
@kolyshkin
Copy link
Contributor

kolyshkin commented May 9, 2024

Note that we don’t support go version less than 1.19 anymore once we merge this PR.

This is a questionable decision. I think what we should do instead in 1.1 is add two versions of func ClearRlimitNofileCache -- one for Go 1.19+ which does what it does now, and the other is for !go1.19, which does nothing.

@lifubang lifubang force-pushed the backport-4265-nofilerlimit branch 4 times, most recently from 4adca11 to f67218a Compare May 10, 2024 10:48
@lifubang
Copy link
Member Author

I think what we should do instead in 1.1 is add two versions of func ClearRlimitNofileCache -- one for Go 1.19+ which does what it does now, and the other is for !go1.19, which does nothing.

Thanks your suggestion.

Comment on lines 5 to 11
// As reported in issue #4195, the new version of go runtime(since 1.19)
// will cache rlimit-nofile. Before executing execve, the rlimit-nofile
// of the process will be restored with the cache. In runc, this will
// cause the rlimit-nofile setting by the parent process for the container
// to become invalid. It can be solved by clearing this cache. But
// unfortunately, go stdlib doesn't provide such function, so we need to
// link to the private var `origRlimitNofile` in package syscall to hack.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to repeat the comment I guess.

@kolyshkin
Copy link
Contributor

@lifubang thanks for the changes! Can you tell why are you still bumping go version?

@lifubang lifubang force-pushed the backport-4265-nofilerlimit branch 2 times, most recently from 7d7ef39 to cb01f3b Compare May 10, 2024 23:37
@lifubang
Copy link
Member Author

The complete generic features are supported since go 1.18

The complete generic features are supported since go 1.18, because go 1.19 and above need this change.

@kolyshkin
Copy link
Contributor

The complete generic features are supported since go 1.18

The complete generic features are supported since go 1.18, because go 1.19 and above need this change.

I'm sorry, I still don't understand what do you mean :(

This code works with Go 1.17 AFAICS. Can you explain why you changed go version to 1.18 in go.mod in the last commit?

@lifubang
Copy link
Member Author

This code works with Go 1.17 AFAICS.

Yes, this works with go 1.17, but if we use go 1.19, because of the ‘go 1.17’ defined in ‘go.mod’, it can’t work.

@kolyshkin
Copy link
Contributor

Ah, I see! This is what happens without the commit bumping the go version in go.mod:

[kir@kir-rhat runc]$ make all GO=go1.20
go1.20 build -trimpath "-buildmode=pie"  -tags "seccomp" -ldflags "-X main.gitCommit=v1.1.12-21-ge36b3ae7 -X main.version=1.1.12+dev " -o runc .
# github.com/opencontainers/runc/libcontainer/system
libcontainer/system/rlimit_go119.go:13:43: type instantiation requires go1.18 or later (-lang was set to go1.17; check go.mod)
make: *** [Makefile:61: runc] Error 1

So, despite libcontainer/system/rlimit_go119.go being guarded with //go:build go1.19, go build still complains. To me this looks like a bug, and there's an issue opened about this (golang/go#52880), but it looks like it won't be fixed.

This is why the go version bump is needed.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

So, despite libcontainer/system/rlimit_go119.go being guarded with //go:build go1.19, go build still complains. To me this looks like a bug, and there's an issue opened about this (golang/go#52880), but it looks like it won't be fixed.

This is why the go version bump is needed.

Wait, so we're using go1.20, but the go version in go.mod causes it to downgrade to language version 1.17, but ... then it does not use the < go1.19 fallback (through //go:build constraints) implementation? So what's the value of go build-constraints now if the only version used is the version specified in go.mod, and consumers of the module MUST be using that version (or above)? Does that mean it's no longer possible to provide fallback implementations?

This stuff is getting more and more confusing.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why we're back porting this one;

Technically that's a breaking change so not something were could include in a patch release. Perhaps we're not using it ourselves (so it would be "dead code" for us), but someone using the module could be using it.

I couldn't find external consumers, but if we have no strong reason, perhaps we should skip it to reduce the diff for path releases?

libcontainer/system/rlimit_go119.go Outdated Show resolved Hide resolved
@lifubang
Copy link
Member Author

Curious why we're back porting this one;

Technically that's a breaking change so not something were could include in a patch release. Perhaps we're not using it ourselves (so it would be "dead code" for us), but someone using the module could be using it.

I couldn't find external consumers, but if we have no strong reason, perhaps we should skip it to reduce the diff for path releases?

I agree. In fact we have no strong reason to back port this commit, we just only need a comment change from this commit.
I'll remove this backport later.
Thanks.

@thaJeztah
Copy link
Member

Thanks! Yes, it's probably fine, but if we don't have a strong need, we may as well skip it (just in case).

@thaJeztah
Copy link
Member

^ I'd be fine adding a Deprecated: comment on it (for the 1.1 branch) if we want to raise visibility for it going away. I think there's some other exported functions that we removed in 1.2 as well, and I guess every bit could help to raise awareness.

@lifubang
Copy link
Member Author

Wait, so we're using go1.20, but the go version in go.mod causes it to downgrade to language version 1.17, but ... then it does not use the < go1.19 fallback (through //go:build constraints) implementation?

No, because go version still be go1.20.

There are two explanations related to go directive in go.mod:

  1. The go directive affects use of new language features
    This means that if we using go 1.19 to compile runc, we can’t use any new language features provided after 1.17, for example generic type;

  2. Before Go 1.21, the directive was advisory only
    This means that we can support go 1.17 like before.

Refer:
https://go.dev/doc/modules/gomod-ref#go

kolyshkin and others added 3 commits May 16, 2024 21:47
This is not used since commit dac4171.
It will be removed in v1.2.0

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit dbd0c33)
Signed-off-by: lifubang <lifubang@acmcoder.com>
Do not refer to the function which was removed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit bac5064)
Signed-off-by: lifubang <lifubang@acmcoder.com>
Issue: opencontainers#4195
Since https://go-review.googlesource.com/c/go/+/476097, there is
a get/set race between runc exec and syscall.rlimit.init, so we
need to call setupRlimits after syscall.rlimit.init() completed.

Signed-off-by: lifubang <lifubang@acmcoder.com>
(cherry picked from commit a853a82)
Signed-off-by: lifubang <lifubang@acmcoder.com>
@lifubang lifubang force-pushed the backport-4265-nofilerlimit branch from cb01f3b to ed13815 Compare May 16, 2024 13:57
@lifubang
Copy link
Member Author

^ I'd be fine adding a Deprecated: comment on it (for the 1.1 branch) if we want to raise visibility for it going away. I think there's some other exported functions that we removed in 1.2 as well, and I guess every bit could help to raise awareness.

Switched to this suggection, make it deprecated in 1.1 .

@kolyshkin
Copy link
Contributor

A single nit: I think libct: clean cached rlimit nofile in go runtime should come AFTER use go 1.18 in go.mod, as otherwise we're breaking git-bisect.

lifubang and others added 4 commits May 17, 2024 18:18
The complete generic features are supported since go 1.18

Signed-off-by: lfbzhm <lifubang@acmcoder.com>
As reported in issue opencontainers#4195, the new version(since 1.19) of go runtime
will cache rlimit-nofile. Before executing execve, the rlimit-nofile
of the process will be restored with the cache. In runc, this will
cause the rlimit-nofile set by the parent process for the container
to become invalid. It can be solved by clearing the cache.

Signed-off-by: ls-ggg <335814617@qq.com>
(cherry picked from commit f9f8abf)
Signed-off-by: lifubang <lifubang@acmcoder.com>
(cherry picked from commit da68c8e)
Signed-off-by: lifubang <lifubang@acmcoder.com>
issues:
opencontainers#4195
opencontainers#4265 (comment)

Signed-off-by: lifubang <lifubang@acmcoder.com>
(cherry picked from commit 4ea0bf8)
Signed-off-by: lfbzhm <lifubang@acmcoder.com>
Signed-off-by: lifubang <lifubang@acmcoder.com>
(cherry picked from commit a35f7d8)
Signed-off-by: lifubang <lifubang@acmcoder.com>
@lifubang lifubang force-pushed the backport-4265-nofilerlimit branch from ed13815 to c918058 Compare May 17, 2024 10:18
@lifubang
Copy link
Member Author

A single nit: I think libct: clean cached rlimit nofile in go runtime should come AFTER use go 1.18 in go.mod, as otherwise we're breaking git-bisect.

done

@@ -1,6 +1,6 @@
module github.com/opencontainers/runc

go 1.17
go 1.18
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Ideally we should not bump up Go version in backport releases, but it's not a huge deal, as Go 1.17 has already reached EOL)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed here because we rely on a feature that is only available since Go 1.18, and despite the feature being guarded by go1.20 build tag, this bump is still required (see #4277 (comment) above).

Nevertheless, go 1.17 can still be used just fine to compile this (as you can see in CI).

All that doesn't make much sense to me either, but Go developers seem to disagree (golang/go#52880).

@kolyshkin kolyshkin merged commit 9244703 into opencontainers:release-1.1 May 20, 2024
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1-pr A backport to 1.1.x release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants