
task: don't close() io before cancel() #8643

Merged
merged 1 commit into containerd:main on Jun 7, 2023

Conversation

laurazard
Member

@laurazard laurazard commented Jun 5, 2023

The contract for cio/io.go/IO states that a call to Close() will always be preceded by a call to Cancel() -

// Close cleans up all open io resources. Cancel() is always called before
which isn't being upheld here.

Furthermore, the call to Close() here makes the subsequent Wait() moot, and causes issues for consumers (see: moby/moby#45689) – basically, if we close before Wait()ing, there will be times when the pipes get closed while task IO (stdin/stdout/whatever) is still being piped, so we lose task output.

It seems from

// Only cleanup the IO after a successful Delete
that Close() should be called there, so the call removed in this commit is unnecessary/erroneous.
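
To spell out the ordering this contract implies for a consumer, here's a minimal sketch; cleanupTaskIO and the deleted flag are hypothetical names used only for illustration, and only the Cancel()/Wait()/Close() methods come from cio.IO:

package taskio

import "github.com/containerd/containerd/cio"

// cleanupTaskIO sketches the order the contract implies: Cancel() first,
// then Wait() for the copying goroutines to drain, and Close() only after
// the task has been successfully deleted.
func cleanupTaskIO(taskIO cio.IO, deleted bool) {
	taskIO.Cancel() // stop goroutines still waiting on fifo opens
	taskIO.Wait()   // let in-flight stdout/stderr copies finish
	if deleted {
		_ = taskIO.Close() // release IO resources only after a successful Delete
	}
}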

@k8s-ci-robot

Hi @laurazard. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@laurazard
Member Author

/cc @dmcgowan @thaJeztah @corhere

@k8s-ci-robot

@laurazard: GitHub didn't allow me to request PR reviews from the following users: corhere.

Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @dmcgowan @thaJeztah @corhere

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thaJeztah
Member

/ok-to-test

@thaJeztah
Member

Looks like this was introduced in 55faa5e (#5974), which was created to address #5621.

@laurazard
Member Author

laurazard commented Jun 5, 2023

OH! I should've noticed that. I guess we might need a slightly more complex solution here. Closing task io before waiting doesn't seem correct to me, but maybe we can change Wait() to ensure it doesn't lock in the situation described in #5621.

Other than that, it still seems incorrect (albeit maybe harmless) to always call t.io.Close() twice ("maybe" since the comment

// Only cleanup the IO after a successful Delete
seems to imply that Close() shouldn't be called before a successful task delete)

@laurazard
Member Author

laurazard commented Jun 5, 2023

Investigating #5621 (and the fix in 55faa5e), it seems to me that adding the Close() call there was a bad solution. If the call to Wait() hangs on Windows in certain states, the root cause for that should be fixed.


Warning:
This is a lot of speculation and I didn't get to test everything out, but hopefully it can point someone towards the root cause of the issue on Windows.

Since this happens only on restore, we can follow the thread and we find that:

The specific IO that the Wait() is blocking on is created in

containerIO, err = cio.NewContainerIO(id,
which looks like
func NewContainerIO(id string, opts ...ContainerIOOpts) (_ *ContainerIO, err error) {
	c := &ContainerIO{
		id:          id,
		stdoutGroup: cioutil.NewWriterGroup(),
		stderrGroup: cioutil.NewWriterGroup(),
	}
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	if c.fifos == nil {
		return nil, errors.New("fifos are not set")
	}
	// Create actual fifos.
	stdio, closer, err := newStdioPipes(c.fifos)
	if err != nil {
		return nil, err
	}
	c.stdioPipes = stdio
	c.closer = closer
	return c, nil
}

and the Wait() being called is
c.closer.Wait()
where the closer there comes from
func newStdioPipes(fifos *cio.FIFOSet) (_ *stdioPipes, _ *wgCloser, err error) {
	var (
		f           io.ReadWriteCloser
		set         []io.Closer
		ctx, cancel = context.WithCancel(context.Background())
		p           = &stdioPipes{}
	)
	defer func() {
		if err != nil {
			for _, f := range set {
				f.Close()
			}
			cancel()
		}
	}()
	if fifos.Stdin != "" {
		if f, err = openPipe(ctx, fifos.Stdin, syscall.O_WRONLY|syscall.O_CREAT|syscall.O_NONBLOCK, 0700); err != nil {
			return nil, nil, err
		}
		p.stdin = f
		set = append(set, f)
	}
	if fifos.Stdout != "" {
		if f, err = openPipe(ctx, fifos.Stdout, syscall.O_RDONLY|syscall.O_CREAT|syscall.O_NONBLOCK, 0700); err != nil {
			return nil, nil, err
		}
		p.stdout = f
		set = append(set, f)
	}
	if fifos.Stderr != "" {
		if f, err = openPipe(ctx, fifos.Stderr, syscall.O_RDONLY|syscall.O_CREAT|syscall.O_NONBLOCK, 0700); err != nil {
			return nil, nil, err
		}
		p.stderr = f
		set = append(set, f)
	}
	return p, &wgCloser{
		wg:     &sync.WaitGroup{},
		set:    set,
		ctx:    ctx,
		cancel: cancel,
	}, nil
}
for which the Wait() is
g.wg.Wait()
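
wgCloser itself isn't quoted above, but from the fields it's constructed with, it presumably looks roughly like the sketch below; only Wait() is confirmed by the snippet, the Close() and Cancel() bodies are my assumption:

package sketch

import (
	"context"
	"io"
	"sync"
)

// Rough reconstruction of wgCloser based on how newStdioPipes constructs it
// above (the exact implementation may differ).
type wgCloser struct {
	ctx    context.Context
	wg     *sync.WaitGroup
	set    []io.Closer
	cancel context.CancelFunc
}

// Wait blocks until every goroutine registered on wg (the copiers started
// by Pipe(), shown below) has called wg.Done().
func (g *wgCloser) Wait() {
	g.wg.Wait()
}

// Close presumably closes every pipe collected in set.
func (g *wgCloser) Close() {
	for _, f := range g.set {
		f.Close()
	}
}

// Cancel presumably cancels the context the pipes were opened with.
func (g *wgCloser) Cancel() {
	g.cancel()
}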

The wg there is referenced in
func (c *ContainerIO) Pipe() {

which gets called in containerd/pkg/cri/server/restart.go – which fits in with our theory.

Pipe() doesn't look suspicious, and doesn't break on unix systems:

func (c *ContainerIO) Pipe() {
	wg := c.closer.wg
	if c.stdout != nil {
		wg.Add(1)
		go func() {
			if _, err := io.Copy(c.stdoutGroup, c.stdout); err != nil {
				log.L.WithError(err).Errorf("Failed to pipe stdout of container %q", c.id)
			}
			c.stdout.Close()
			c.stdoutGroup.Close()
			wg.Done()
			log.L.Debugf("Finish piping stdout of container %q", c.id)
		}()
	}
	if !c.fifos.Terminal && c.stderr != nil {
		wg.Add(1)
		go func() {
			if _, err := io.Copy(c.stderrGroup, c.stderr); err != nil {
				log.L.WithError(err).Errorf("Failed to pipe stderr of container %q", c.id)
			}
			c.stderr.Close()
			c.stderrGroup.Close()
			wg.Done()
			log.L.Debugf("Finish piping stderr of container %q", c.id)
		}()
	}
}

We can see then that the likely culprit here is some blocking in

if _, err := io.Copy(c.stdoutGroup, c.stdout); err != nil {

which prevents the subsequent wg.Done() from ever being reached, locking the Wait() out. This is also consistent with the issue only appearing on Windows, since there's a platform-specific implementation for the pipe's read/write:

func (p *pipe) Write(b []byte) (int, error) {
	p.conWg.Wait()
	if p.conErr != nil {
		return 0, fmt.Errorf("connection error: %w", p.conErr)
	}
	return p.con.Write(b)
}

func (p *pipe) Read(b []byte) (int, error) {
	p.conWg.Wait()
	if p.conErr != nil {
		return 0, fmt.Errorf("connection error: %w", p.conErr)
	}
	return p.con.Read(b)
}
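
To make the suspected failure mode concrete, here is a minimal, self-contained sketch (not containerd code; blockingReader is a stand-in for a restored pipe whose Read never returns) showing how a single stuck io.Copy keeps wg.Done() from running and leaves wg.Wait() hanging:

package main

import (
	"fmt"
	"io"
	"sync"
	"time"
)

// blockingReader models a pipe whose connection never completes, like
// p.conWg.Wait() above blocking forever on a broken restored pipe.
type blockingReader struct{}

func (blockingReader) Read(p []byte) (int, error) {
	<-make(chan struct{}) // block forever
	return 0, nil
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Mirrors the copy loop in Pipe(): it never finishes, so wg.Done()
		// is never reached.
		_, _ = io.Copy(io.Discard, blockingReader{})
	}()

	done := make(chan struct{})
	go func() {
		wg.Wait() // the Wait() that hangs in the restore path
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("Wait() returned")
	case <-time.After(time.Second):
		fmt.Println("Wait() still blocked after 1s: the suspected deadlock")
	}
}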

It's likely that this is only an issue in this case since we're "re-attaching" existing IO for stopped tasks on Windows, and those pipes are somehow broken and blocking. This happens around

containerd/container.go

Lines 407 to 413 in f92e576

if ioAttach != nil && response.Process.Status != tasktypes.Status_UNKNOWN {
	// Do not attach IO for task in unknown state, because there
	// are no fifo paths anyway.
	if i, err = attachExistingIO(response, ioAttach); err != nil {
		return nil, err
	}
}


Calling Close() first "fixes" the issue, but breaks a bunch of other contracts and makes it impossible for a consumer to properly handle an exit.

@thaJeztah
Member

@cpuguy83 just brought up #8334, which is related to this area and was just merged (in case it's useful).

// io.Wait locks for restored tasks on Windows unless we call
// io.Close first (https://github.com/containerd/containerd/issues/5621)
// in other cases, preserve the contract and let IO finish before closing
if t.client.runtime == fmt.Sprintf("%s.%s", plugin.RuntimePlugin, "windows") {
Member Author

@laurazard laurazard Jun 6, 2023


In order not to reintroduce #5621, which was fixed by #5974, we leave the io.Close() call here on Windows only (for now, until the underlying issue re: io.Wait() locking for restored tasks on Windows is fixed).

Member


Looks good to me. The Delete should be called after the task has been stopped.
The kernel or shim will ensure that all the processes created by this task are killed. We just need to ensure that t.io.Cancel() won't close the fifo file (ref: #8334).

@laurazard
Member Author

/cc @AkihiroSuda @fuweid


// in other cases, preserve the contract and let IO finish before closing
if t.client.runtime == fmt.Sprintf("%s.%s", plugin.RuntimePlugin, "windows") {
	t.io.Close()
}
t.io.Cancel()
Member


maybe we should add a comment like "Cancel is used to cancel the goroutine which is still in the fifo-opening state (the container hasn't been started yet). It's not trying to stop the pipe, because the pipe should be stopped by the shim side. Otherwise we might lose data from the container"

Member Author


That sounds good! From this and the other PRs, it looks like there's a lot of confusion around what Cancel does/should do. I'll add the comment.

Member


Thanks @laurazard! The comment can be handled in a follow-up.

Member Author


Oop, I was a bit too fast, I've added it in this commit. Lmk if you think that looks good @fuweid

The contract for `cio/io.go/IO` states that a call to `Close()`
will always be preceded by a call to `Cancel()` -
https://github.com/containerd/containerd/blob/f3a07934b49bf142925a5913e3e19f3528eda0d2/cio/io.go#L59
which isn't being upheld here.

Furthermore, the call to `Close()` here makes the subsequent `Wait()`
moot, and causes issues to consumers (see: moby/moby#45689)

It seems from
https://github.com/containerd/containerd/blob/f3a07934b49bf142925a5913e3e19f3528eda0d2/task.go#L338
that `Close()` should be called there, so the call removed in this
commit is unnecessary/erroneous.

We leave the `Close()` call on Windows only since this was introduced
in containerd#5974 to address
containerd#5621.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
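
Putting the two snippets quoted in the review threads together with the suggested comment, the cleanup path presumably ends up looking roughly like this fragment (not the verbatim merged code; the comment wording is paraphrased and the trailing Wait() is inferred from the PR description):

// Rough reconstruction of the task cleanup after this change.
// io.Wait locks for restored tasks on Windows unless we call
// io.Close first (https://github.com/containerd/containerd/issues/5621);
// in other cases, preserve the contract and let IO finish before closing.
if t.client.runtime == fmt.Sprintf("%s.%s", plugin.RuntimePlugin, "windows") {
	t.io.Close()
}
// Cancel only stops the goroutines still waiting to open the fifos (the
// container may never have started); it does not stop the copying itself,
// which is the shim's responsibility, so no task output is lost.
t.io.Cancel()
t.io.Wait()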
Member

@fuweid fuweid left a comment


LGTM!

Member

@dcantah dcantah left a comment


Thanks LGTM

@dmcgowan dmcgowan merged commit 67349c1 into containerd:main Jun 7, 2023
45 checks passed
@fuweid fuweid added the cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch label Jun 8, 2023
@mikebrow mikebrow added the cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch label Jun 8, 2023
@thaJeztah thaJeztah added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch and removed cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Jul 14, 2023