client: add synchronize between userCloseFunc and rpc call #87
Conversation
When the containerd runtime plugin exits abnormally, the ttrpc connection is closed and userCloseFunc is called to clean up the resources created by the containerd shim. The current rpc call also returns an error. But these two steps are asynchronous: after the rpc call returns an error, an upper application such as k8s may restart the container, but the start may fail because cleanup has not finished and some resources have not yet been released. These leaked resources then cause the in-place update of the pod to fail again. Fixes containerd#88
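A minimal Go sketch of the ordering this patch enforces. The field names userCloseFunc and userCloseWaitCh mirror the patch, but the surrounding client, run, and call are illustrative stand-ins, not the real ttrpc implementation:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// client is a stand-in for the ttrpc client; only the pieces relevant to
// the shutdown ordering are modeled here.
type client struct {
	userCloseFunc   func()        // cleanup supplied by the user
	userCloseWaitCh chan struct{} // closed once userCloseFunc has returned
	errCh           chan error    // delivers the connection error to callers
}

// run stands in for the receive loop: when the connection dies, it runs the
// user cleanup first and only then releases callers waiting on the channel.
func (c *client) run() {
	defer func() {
		c.userCloseFunc()
		close(c.userCloseWaitCh)
	}()
	// Pretend the shim died and the receive loop observed the broken connection.
	c.errCh <- errors.New("ttrpc: connection closed")
}

// call stands in for an rpc call that fails because the connection died.
// It does not surface the error until the user cleanup has finished, so the
// caller can never act on the failure while resources are still held.
func (c *client) call() error {
	err := <-c.errCh
	<-c.userCloseWaitCh
	return err
}

func main() {
	c := &client{
		userCloseFunc: func() {
			time.Sleep(50 * time.Millisecond) // slow cleanup
			fmt.Println("cleanup finished")
		},
		userCloseWaitCh: make(chan struct{}),
		errCh:           make(chan error, 1),
	}
	go c.run()
	// Always prints after "cleanup finished", never before.
	fmt.Println("call returned:", c.call())
}
```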
```diff
@@ -263,7 +265,6 @@ func (c *Client) run() {
 	defer func() {
 		c.conn.Close()
 		c.userCloseFunc()
 		close(c.userCloseWaitCh)
```
This line needs to be moved too?
Also, can we have a unit test?
Yes. I think the UserOnCloseWait function may not be necessary, because userCloseFunc() and the rpc call are already synchronous in this situation.
I think the problem mentioned in #68 can also be fixed by the current patch: the rpc call will not return an error until userCloseFunc() finishes, and then the Task.Delete goroutine works.
If you have to wait for the closeFunc, you can call UserOnCloseWait. But when the shim is killed unexpectedly, userCloseFunc() is called in the async defer. If we put userCloseFunc() in Close(), no one will call it. Hope it can help.
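For reference, a hedged sketch of that call pattern. NewClient, WithOnClose, Close, and UserOnCloseWait are real ttrpc client APIs; the socket path and the cleanup body are made-up placeholders:

```go
package main

import (
	"context"
	"log"
	"net"

	"github.com/containerd/ttrpc"
)

func main() {
	// Hypothetical socket path; in containerd this would be the shim's socket.
	conn, err := net.Dial("unix", "/run/example/shim.sock")
	if err != nil {
		log.Fatal(err)
	}

	client := ttrpc.NewClient(conn, ttrpc.WithOnClose(func() {
		// User-supplied cleanup; runs when the connection is closed,
		// including when the peer dies unexpectedly.
		log.Println("releasing resources")
	}))

	// ... issue calls ...

	client.Close()
	// Block until the WithOnClose callback above has finished.
	if err := client.UserOnCloseWait(context.Background()); err != nil {
		log.Fatal(err)
	}
}
```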
I didn't find a suitable place to add UserOnCloseWait. I see that UserOnCloseWait is called in the shim's Delete, but that function will not be called when the shim is killed unexpectedly.
When the task is killed unexpectedly, the cleanup will be called, and there is no need to sync with it. UserOnCloseWait is only called when a task is deleted manually, because that path needs the sync.
From #88:

> …and userCloseFunc will be called to clean up the resources created by the containerd shim. The current rpc call also returns an error. But these two steps are asynchronous: after the rpc call returns an error, an upper application such as k8s may restart the container, but the start may fail because cleanup has not finished and some resources have not yet been released. These leaked resources then cause the in-place update of the pod to fail again.

If cleanupAfterDeadShim doesn't finish, I think the task record is still there, and recreating with the same ID will fail. Please provide more information about your case; it will help. Thanks.
If cleanupAfterDeadShim doesn't finish, the restart will fail because the bundle path still exists. k8s will continuously restart the container until it succeeds. That works, but we think it may be better to solve this problem in ttrpc.
I don't think it is a problem, because the bundle data is still there and the record is still there. And one more thing: there is no way to restart a container in kubernetes; it can only recreate one.