Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: fix leaked shim caused by high IO pressure #8954

Merged
merged 5 commits into from
Aug 17, 2023

Conversation

fuweid
Copy link
Member

@fuweid fuweid commented Aug 11, 2023

integration: add case to reproduce #7496

When the shim unmounts overlayfs rootfs, kernel will force syncfs if there is no volatile option. In order to reproduce the high IO pressure, this patch uses strace to delay the umount2 syscall.

NOTE: I don't merge three commits into one because it's easy to backport to v1.6.

Fixes: #7496 #8931

integration: add ShouldRetryShutdown case based on #7496

Within current design, if the shim is killed before task-service.Delete API call, the callback on connect close will send 137 exit code because the callback doesn't have any context about container's exit code.

if response != nil {
pid = response.Pid
exitStatus = response.Status
exitedAt = response.Timestamp
} else {
exitStatus = 255
exitedAt = time.Now()
}
events.Publish(ctx, runtime.TaskExitEventTopic, &eventstypes.TaskExit{
ContainerID: id,
ID: id,
Pid: pid,
ExitStatus: exitStatus,
ExitedAt: protobuf.ToTimestamp(exitedAt),
})

And the moby/moby can't handle duplicate exit event well. Let's say that the moby receives exit code 0 at first and then the duplicate exit event with different code 137 can override 0, as the #4769 described.

I think the best solution to avoid leaky issue is to redesign the task-service.Delete API, remove the async callback and then let the caller retry. Since the task in shim can't be restart, we can cache the exit code in container bundle so that the shim.Delete binary call and read it and return exit code correctly. However, it doesn't work with running shim server.

In order to prevent from regression like #4769, I add skipped
integration case as TODO item and we should rethink about how to handle
the task/shim lifecycle.

@mikebrow @dmcgowan @mxpv @AkihiroSuda @thaJeztah @laurazard

Signed-off-by: Wei Fu <fuweid89@gmail.com>
Fixes: containerd#7496 containerd#8931

Signed-off-by: Wei Fu <fuweid89@gmail.com>
Since the moby/moby can't handle duplicate exit event well, it's hard
for containerd to retry shutdown if there is error, like context
canceled.

In order to prevent from regression like containerd#4769, I add skipped
integration case as TODO item and we should rethink about how to handle
the task/shim lifecycle.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I have a feeling moby needs a similar patch.

@fuweid
Copy link
Member Author

fuweid commented Aug 16, 2023

I have a feeling moby needs a similar patch.

Basically, yes. The leaky shim can be cleanup after containerd restart by the way.

Thanks for the review.

@fuweid fuweid merged commit ba852fa into containerd:main Aug 17, 2023
45 checks passed
@fuweid fuweid added cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch area/cri Container Runtime Interface (CRI) labels Aug 17, 2023
@fuweid fuweid deleted the fix-shim-leak branch August 17, 2023 00:17
@fuweid fuweid added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch and removed cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cri Container Runtime Interface (CRI) cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch ok-to-test
Projects
Development

Successfully merging this pull request may close these issues.

shim process leaked when the host having high disk i/o usage
3 participants