[proposal] Shim API v2 #2426
Comments
Right now I think the shim uses ttrpc. Would you expect this to move to standard gRPC? |
LGTM overall other than 3 questions:
|
@jstarks we talked about this at our Dockercon meetup. We want to support both ttrpc and grpc. One thing that we need to do on the windows side is get ttrpc working over named pipes but this should be pretty effortless. |
|
@crosbymichael @jterry75 already has named pipes working with ttrpc. I think he can prepare a PR soon. |
@jstarks perfect! |
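For reference, serving ttrpc over a Windows named pipe roughly amounts to handing a pipe-backed listener to the ttrpc server. A minimal sketch, assuming go-winio's `ListenPipe` and a recent ttrpc release (the exact `Serve` signature, pipe name, and service registration are assumptions, not what the PR will look like):

```go
//go:build windows

package main

import (
	"context"
	"log"

	winio "github.com/Microsoft/go-winio"
	"github.com/containerd/ttrpc"
)

func main() {
	// ListenPipe returns a net.Listener backed by a Windows named pipe.
	l, err := winio.ListenPipe(`\\.\pipe\containerd-shim-example`, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()

	srv, err := ttrpc.NewServer()
	if err != nil {
		log.Fatal(err)
	}
	// A real shim would register its generated task service here, e.g.
	// RegisterTaskService(srv, &service{}) (name is illustrative).

	// Serve the ttrpc service over the named pipe listener; the Serve
	// signature varies between ttrpc versions.
	if err := srv.Serve(context.Background(), l); err != nil {
		log.Fatal(err)
	}
}
```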
I like the proposal. A few questions:
Thanks! |
|
We have an ongoing branch for this development here: |
One thing we have to figure out is how to handle OOM events. Should we use the events API for this or a grpc streaming endpoint for oom events? @Random-Liu @dmcgowan?

service Task {
    rpc OOM(OOMRequest) returns (stream OOMResponse);
} |
@crosbymichael It seems that ttrpc doesn't support streaming yet. https://github.com/containerd/ttrpc/blob/fa6c68143601db58b3636ee9948aad2fe08ed1ea/services.go#L36 The current event api + containerd binary approach works for me as well. At least we understand its performance better. :) |
@crosbymichael how many different event types are there? I think streaming is always better. Does the callback end up connecting to containerd through grpc and sending the event? I wonder how expensive implementing streaming is... any ideas @stevvooe? |
@dmcgowan the shim only does exit events right now, oom would be the second type the shim emits |
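If the existing events path is kept, emitting OOM would look much like the exit events the shim already sends: publish an event on an OOM topic instead of opening a stream. A rough sketch with a hypothetical `Publisher` interface and event struct (the topic string and type names are assumptions, not the actual containerd definitions):

```go
package shimevents

import "context"

// TaskOOM mirrors the shape of an OOM event a shim could emit; the
// type and field names are illustrative, not the exact containerd
// events definitions.
type TaskOOM struct {
	ContainerID string
}

// Publisher stands in for whatever event sink the shim is wired to
// (today the runc shim forwards exit events back to containerd).
type Publisher interface {
	Publish(ctx context.Context, topic string, event interface{}) error
}

// reportOOM is the call a shim would make when the kernel reports an
// OOM kill for one of its containers, mirroring how exit events are
// emitted today.
func reportOOM(ctx context.Context, p Publisher, id string) error {
	// The "/tasks/oom" topic name is used here for illustration.
	return p.Publish(ctx, "/tasks/oom", &TaskOOM{ContainerID: id})
}
```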
@jterry75 for your second question the other day, I just worked on a solution for it: crosbymichael@5c678b6#diff-cd9adfde13a79bd4e2b11470dbabf55fR36 Basically we need the shim to have two commands in its binary: one that serves the rpc api and one that can clean up/delete when a previous shim died. This way the runtime can unmount or clean up any internal state from a crashed shim. |
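A minimal sketch of that two-command layout, with `serveTask` and `cleanupBundle` as hypothetical placeholders (the linked commit defines the actual interface):

```go
package main

import (
	"fmt"
	"os"
)

// Sketch of a shim binary with the two commands described above: the
// default invocation serves the rpc api, while "delete <bundle>" lets
// the runtime clean up mounts and state left behind by a shim that
// died. serveTask and cleanupBundle are illustrative placeholders.
func main() {
	if len(os.Args) > 2 && os.Args[1] == "delete" {
		// Invoked by containerd after the original shim crashed:
		// unmount the rootfs and remove any state kept in the bundle.
		if err := cleanupBundle(os.Args[2]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return
	}
	// Normal path: bind the shim socket and serve the task service.
	if err := serveTask(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

func serveTask() error                  { return nil } // start the ttrpc/grpc server here
func cleanupBundle(bundle string) error { return nil } // unmount and delete bundle state here
```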
@crosbymichael - Yes, this is exactly what I was thinking: shift the responsibility of partial cleanup one layer lower, to the shim. As for my earlier question about signals, it's not just debugging: see here. Today the port of containerd-shim to Windows that I have done only handles the debugging case via events, the way the Docker daemon does it, but I would like to have debugging/reaping, etc. done via API calls rather than signals in all of these cases. -Justin |
Hi @crosbymichael, I have a few questions to clarify how this API is going to be used.
I might be completely off topic here, let me know if that's the case :) |
@sboeuf The new
You need to implement your own containerd-shim, which is a per-container process. Whether that could be a daemon, we haven't thought about yet /cc @crosbymichael.
It doesn't matter. As long as the new shim api is implemented, containerd should be able to use it. In theory, it is possible to use libcontainer directly to implement the shim, but I guess for linux container, using
That would be ideal, but I'm not sure whether we can do that now. OCI runtime spec is still the standard runtime config, and there is no pod concept there yet. Unless we introduce another pod level spec in containerd.
You implement the new shim api, so you can do input/output however you want as long as you expose required pipes on the host.
It doesn't matter. As long as you keep it consistent across the shim api implementation: always internal or always host. Containerd and containerd/cri will never assume the pid exists on the host. |
@Random-Liu thanks for those answers, I think they clarify a lot. Here are more questions based on your replies:
Now I understand but I am wondering how this would work with Kata Containers for instance. This proposal tries to address the issue of VM based container runtimes, and this would work perfectly in a simple case like
Yes I understand, particularly because there would be no direct benefit for the
Where do they come from? How do you know which pipes the container should expose? |
Ok, if the consumer of this API will not assume the PIDs exist on the host, we should definitely provide the PIDs as they're seen by the container. I am saying this because I guess that

As a side note, @crosbymichael is presenting at the Kata Architecture Committee meeting on Monday July 9th at 8AM PST. Feel free to join if you want to talk with the Kata community about it :) |
Forgive my ignorance here, but couldn't this be handled by

In the API definition, |
@cpuguy83 And even if we were able to pass any kind of information through |
You haven't explained why containerd needs to be pod aware. In my mind, saying containerd needs to be pod aware is akin to saying Linux needs to be pod aware. A pod is a scheduling concept to ensure two workloads land on the same node and have certain APIs between the members of a pod (primarily 127.0.0.1 being the same in each member of a pod). cri(-containerd) is pod aware and can make use of everything containerd has to offer (and then some), which is really way more than what Docker offers even. Or is the issue more that this is a failure of the OCI spec to abstract more than a traditional linux container? |
@cpuguy83 Ok, let me try to be clearer (I hope I will be :)). Now, if we take a look at the proposal, which is to avoid having state being stored in the case of VM based container runtimes, we would need to provide a pod level shim API. Otherwise, we would end up (from |
@crosbymichael to follow up after the discussion during the Kata Containers architecture committee meeting, here are a few thoughts: What would it take to make containerd "pod aware"? How would things be handled differently? I am suggesting this because it would make a lot of sense for a VM based shim implementation to handle everything at the pod level (which is the VM level), and I truly believe this would not modify |
@sboeuf It's part of the shim api https://github.com/containerd/containerd/blob/master/runtime/shim/v1/shim.proto#L60-L62.
@sboeuf From what I can see, the shim api is pretty generic, every request has an id associated. It means that from the api perspective, it doesn't matter whether it is a daemon, a per-container shim or a per-pod shim behind. The only thing that matters is how the shim is started, stopped and cleaned up, which we need to resolve for shim v2 api anyway. A simple idea is that we can define a protocol to start/stop the shim process, e.g.
Then for the 3 scenarios above:
In this way, the logic doesn't need to reside in containerd, everything can be done in the shim binary. And it provides enough flexibility for people to innovate. :) @crosbymichael Do you see any problem with this proposal? |
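To make the idea above concrete, here is a rough sketch of what the "start" half of such a protocol could look like for a per-pod shim: containerd invokes the shim binary, the binary prints back the rpc address to connect to, and a per-pod implementation can return the address of an already-running sandbox shim instead of spawning a new process. Every helper and environment variable name here is illustrative, not part of the proposal:

```go
package main

import (
	"fmt"
	"os"
)

// start implements the "start" side of the sketched protocol: return
// the address of the shim's rpc endpoint, reusing an existing pod-level
// shim when one is already running for the sandbox.
func start(id, sandboxID string) (string, error) {
	if addr, ok := existingShimAddress(sandboxID); ok {
		// Another container in the same pod: reuse the pod-level shim.
		return addr, nil
	}
	// First container of the pod: launch a new shim (and VM) and return
	// the address it is listening on.
	return launchShim(id, sandboxID)
}

func main() {
	// The ID and SANDBOX_ID environment variables are illustrative; the
	// real protocol would define how these values are passed.
	addr, err := start(os.Getenv("ID"), os.Getenv("SANDBOX_ID"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// containerd reads the address to connect to from stdout.
	fmt.Print(addr)
}

// Placeholders for sandbox bookkeeping and shim/VM startup.
func existingShimAddress(sandboxID string) (string, bool) { return "", false }
func launchShim(id, sandboxID string) (string, error) {
	return "unix:///run/example-shim.sock", nil
}
```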
@Random-Liu the scenarios you're describing look perfectly good to me! But I think containerd still needs to be aware at some point, otherwise how can it decide to start (or not) a new shim? |
It is decided in the containerd-shim binary, in the
We are not getting rid of the pod notion from containerd. We just try to contain the pod notion in plugins, and actually other containerd features should also be contained in plugins. |
Sorry, by "getting rid of", I meant "pod agnostic". I agree it's better if we can keep

I think I am misunderstanding something here, and that's why I need some clarification about the sequence. With this new API, what happens when |
I think the previous cleanup logic is to call

I believe |
@Random-Liu ya, we already have a |
@Random-Liu ah yes good point, this would be an extra security. |
@crosbymichael what's the full picture of this v2?
Is that it, or am I missing some parts? |
@sboeuf yes, that is it |
@crosbymichael thanks for the clarification |
Yeah, that is good enough. I don't know we have a |
@Random-Liu @sboeuf please take a look at this commit: |
@crosbymichael will do ! |
@crosbymichael 2 comments, but LGTM |
/cc @resouer |
@Random-Liu @sboeuf @crosbymichael With this containerd-shim start/stop design, it seems that the shim V2 grpc requests should also be extended to indicate which container each request is targeting, so that we can use the same grpc server to handle different containers of the same sandbox in kata-shim. WDYT? Or do you expect the kata-shim to start a new grpc server with a new grpc address for each container created? |
There is an id associated with each request. The id is the container id.
But it seems that we are missing a container id in the exec request.
|
@Random-Liu hmm, it already has a container id and an exec id in the exec request. The design looks good to me. Thanks! BTW, one implication with the process related requests is that |
It's up to the runtime to error if there is a collision with an exec_id, especially with this model going forward. Containerd keeps very minimal state about running containers and it's up to the shims to hold and validate state on the system. |
@Random-Liu I just added any missing IDs and the container id on Exec in my last commit for my branch |
As long as we get the |
Then would it make sense to let the shim create and return an exec id instead? Because in the current API, by the time a shim finds out about the collision, it can do nothing but fail the operation, and containerd would have to retry with a different exec_id because the exec_id is created by containerd rather than passed in by the upper layers. It's just a minor issue though, mostly not worth the engineering effort since the chance of collision is negligible. @sboeuf Except for the |
exec_id's come from the user, not containerd, so you have to fail if they collide. The users usually have this figured out as they are for specific things, healthchecks, or random ids. It's not up to us. @bergwolf I'll have to update those methods that work on exec'd processes to take both. |
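Since containerd keeps minimal state, that collision check has to live in the shim. A small sketch of the bookkeeping that could back it, assuming the rpc layer maps the returned error to an "already exists" status (type and method names are illustrative):

```go
package shim

import (
	"fmt"
	"sync"
)

// service sketches the shim-side bookkeeping needed to reject a
// colliding exec_id for a given container.
type service struct {
	mu    sync.Mutex
	execs map[string]map[string]struct{} // container id -> set of exec ids
}

// registerExec records a new exec id and fails if it already exists,
// leaving the retry decision to the caller.
func (s *service) registerExec(containerID, execID string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.execs == nil {
		s.execs = make(map[string]map[string]struct{})
	}
	if _, ok := s.execs[containerID][execID]; ok {
		// exec_id collisions are the caller's problem: fail the request.
		return fmt.Errorf("exec %s in container %s: already exists", execID, containerID)
	}
	if s.execs[containerID] == nil {
		s.execs[containerID] = make(map[string]struct{})
	}
	s.execs[containerID][execID] = struct{}{}
	return nil
}
```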
@bergwolf ah yes good point, they're tied to a process, so it'd be cool to have the |
@crosbymichael Ah, you are right. I was looking at the cri plugin code actually and |
Ok, I updated my PR fixing the last rpcs with exec and container ids |
This looks great. Thanks for explaining the issue @sboeuf! |
Shim API for Runtimes
Authors:
More VM based runtimes have internal state and more abstract actions.
A CLI approach introduces issues with state management.
This proposal introduces a shim API for solving these state issues at the shim layer in containerd.
The goal is to provide an API that various runtimes can implement to add support in containerd while
still having control of state and abstract actions.
v2 Task(shim) API
Previous Shim API
New API
Shim Inputs
Bundle
The OCI bundle is still the main source of configuration for shims.
The shim should not write to any location on disk except the bundle.
The bundle can be used as a workspace for the shim with any additional state.
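As a small illustration of the bundle-as-workspace rule, a shim could keep its private state in a file under the bundle directory; the file name below is just an example, not part of the proposal:

```go
package shim

import (
	"os"
	"path/filepath"
)

// writeState keeps shim-private state inside the OCI bundle, the only
// location on disk the shim is expected to write to.
func writeState(bundle string, data []byte) error {
	return os.WriteFile(filepath.Join(bundle, "shim-state.json"), data, 0o600)
}
```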
Configuration
Configuration for shims can be passed via Opts or defaults defined within the containerd /etc/containerd/config.toml.

Shim Outputs
GRPC
The shim grpc service is the main source of interaction with the shim.
The shim is also expected to write a shim.pid file for containerd to read in case it is no longer able to access the shim via the GRPC api. This pid will be used to SIGKILL the shim in case of a forceful shutdown.
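A sketch of that fallback path on the containerd side, assuming a Unix platform and that shim.pid sits in the bundle directory:

```go
package shimutil

import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

// killShim sketches the forceful-shutdown path described above: when
// containerd can no longer reach the shim over its rpc connection, it
// falls back to the pid recorded in shim.pid inside the bundle.
func killShim(bundle string) error {
	b, err := os.ReadFile(filepath.Join(bundle, "shim.pid"))
	if err != nil {
		return err
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(b)))
	if err != nil {
		return err
	}
	return syscall.Kill(pid, syscall.SIGKILL)
}
```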
UX
The existing runtime will continue to work for upgrades where containers are running under v1 shims.