*: fix leaked shim caused by high IO pressure #8954

fuweid · 2023-08-11T09:10:45Z

integration: add case to reproduce #7496

When the shim unmounts the overlayfs rootfs, the kernel forces a syncfs unless the volatile option is set. To reproduce the high IO pressure, this patch uses strace to delay the umount2 syscall.

NOTE: I didn't merge the three commits into one, so that they are easier to backport to v1.6.

Fixes: #7496 #8931

integration: add ShouldRetryShutdown case based on #7496

With the current design, if the shim is killed before the task-service.Delete API call, the callback on connection close will send exit code 137, because the callback has no context about the container's exit code.

containerd/runtime/v2/shim.go

Lines 170 to 184 in 70a2c95

    if response != nil {
        pid = response.Pid
        exitStatus = response.Status
        exitedAt = response.Timestamp
    } else {
        exitStatus = 255
        exitedAt = time.Now()
    }
    events.Publish(ctx, runtime.TaskExitEventTopic, &eventstypes.TaskExit{
        ContainerID: id,
        ID:          id,
        Pid:         pid,
        ExitStatus:  exitStatus,
        ExitedAt:    protobuf.ToTimestamp(exitedAt),
    })

moby/moby doesn't handle duplicate exit events well. For example, if moby first receives exit code 0 and then a duplicate exit event with a different code, 137, the duplicate can override the 0, as described in #4769.

I think the best way to avoid the leak is to redesign the task-service.Delete API: remove the async callback and let the caller retry. Since a task in the shim can't be restarted, we can cache the exit code in the container bundle so that the shim.Delete binary call can read it and return the correct exit code. However, that doesn't work with a running shim server.

To prevent a regression like #4769, I added a skipped integration case as a TODO item; we should rethink how to handle the task/shim lifecycle.

@mikebrow @dmcgowan @mxpv @AkihiroSuda @thaJeztah @laurazard

Signed-off-by: Wei Fu <fuweid89@gmail.com>


cpuguy83

LGTM

I have a feeling moby needs a similar patch.

fuweid · 2023-08-16T02:54:02Z

> I have a feeling moby needs a similar patch.

Basically, yes. By the way, the leaked shim can be cleaned up after containerd restarts.

Thanks for the review.

fuweid added the ok-to-test label Aug 11, 2023

fuweid force-pushed the fix-shim-leak branch from dc0588b to c362a3b August 11, 2023 09:36

fuweid added 4 commits August 11, 2023 17:41

integration: add case to reproduce containerd#7496

5bdd9ca

Signed-off-by: Wei Fu <fuweid89@gmail.com>

pkg/cri/server: fix leaked shim issue

72bc63d

Fixes: containerd#7496 containerd#8931

Signed-off-by: Wei Fu <fuweid89@gmail.com>

pkg/cri/sbserver: fix leaked shim issue for podsandbox mode

8dcb2a6

Fixes: containerd#7496 containerd#8931

Signed-off-by: Wei Fu <fuweid89@gmail.com>

fuweid force-pushed the fix-shim-leak branch from c362a3b to 601699a August 11, 2023 09:44

Vagrantfile: add strace tool

00ef8ba

Signed-off-by: Wei Fu <fuweid89@gmail.com>

fuweid mentioned this pull request Aug 13, 2023

libcontainerd: consider to use task.Wait to update the container's exit code instead of task.Event moby/moby#46212

Open

cpuguy83 approved these changes Aug 15, 2023


dmcgowan approved these changes Aug 16, 2023


fuweid merged commit ba852fa into containerd:main Aug 17, 2023
45 checks passed

fuweid added the cherry-pick/1.6.x (change to be cherry-picked to the release/1.6 branch), cherry-pick/1.7.x (change to be cherry-picked to the release/1.7 branch), and area/cri (Container Runtime Interface) labels Aug 17, 2023

fuweid deleted the fix-shim-leak branch August 17, 2023 00:17

johannesfrey mentioned this pull request Oct 30, 2023

shim process leaked #9309

Closed

mikebrow mentioned this pull request Nov 2, 2023

[release/1.7] Update hcsshim tag to v0.11.4 #9326

Merged

marcodalcin mentioned this pull request Feb 6, 2024

OCI runtime create failed #9222

Closed
