Proposal: docker daemon's SPoF and hot upgrade issue #13884
Conversation
Please sign your commits following these rules:

$ git clone -b "master" git@github.com:mudverma/docker.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.
Signed-off-by: Mudit Verma <mudit.f2004912@gmail.com>
I like the idea of proposal A; I've been thinking about it for a while. Decoupling containers from the daemon API layer would be super-awesome. Having an extra monitor process doesn't look like a big issue; it's going to be pretty lightweight anyway. Forward upgrades/rollbacks with preserved running containers should be considered a top priority.
Looked at this myself as well.
@cpuguy83 can you describe to me how an upgrade 1.X -> 1.Y would work with a single monitor process? I can't see how new containers could take advantage of new features this way. Will it be one monitor per docker version?
@cpuguy83 @bobrik A single monitor process for all containers will not work, especially across upgrades. We thought about this option but ruled it out. We are not sure even one monitor per docker version would work without it carrying all the daemon code required to spawn a container.

Let's assume that we have two processes at a given instant:

Init ----- Docker Daemon (v1)

Now if we want a new container E spawned, how can we make the existing monitor its parent? Transfer of parenthood is not allowed. In that case, the monitor itself would have to carry all the code and runtime needed to spawn a new container, which would make it sort of another daemon in itself. The idea of the monitor being just a lightweight process, which has nothing to do with what the daemon does, would be lost. Please correct me if I am wrong.
+1 for proposal A. |
Thanks for a very well-written proposal, and reviving this topic @mudverma |
@mudverma Really well-structured proposal, definitely going to add this as "the prototype" approach for newbies. Very interesting technically too; I'm looking forward to following it.
Isn't the architecture the relatively easy part of this? This is a nice proposal (I like A), I'm just wondering about some of the details. When I pondered on it, the part that always brought me up short was the compatibility layer that would need to exist for handling running containers started by an older daemon, and the oddities that would arise from starting a daemon with different arguments. Thoughts on an upgraded daemon:
Thoughts on a daemon started with different arguments:
Notes that fit into both of the above:
This is just scratching the surface! I'd start by using the proposed architecture to resolve the SPoF problem and revisit the hot upgrade problem later, enforcing this separation by killing all containers if a different daemon version is started or any daemon args are different. You're then in no worse position for upgrades, but you are resilient to daemon crashes/deadlocks.
+1 for proposal A.... and I agree with @aidanhs
@aidanhs I understand the situation. While it is not right to expect 20-version jumps to work, it is still worthwhile to have forward/backward compatibility for at least 3-4 versions; all software has it. Otherwise, a simple patch to resolve an issue even in the "docker search" command will also bring down the running containers. The idea is to provide more flexibility to users and sysadmins. Of course, they can decide what is in their best interest, whether or not to bring down the containers on upgrade, but at least this option should be given. Disruptive shutdown of containers should be the last resort, not the first.

On your second part, we might have to do more research. I will take a look and get back to you. Thanks for bringing it up.
+1 for proposal A |
Please don't +1 unless you have something meaningful to add here. |
Signed-off-by: Mudit Verma <mudit.f2004912@gmail.com>
Signed-off-by: Mudit Verma <mudit.f2004912@gmail.com>
/cc @prologic
Please keep in mind that there's no need to comment in order to subscribe to notifications for an issue. There's a button for that. |
Would be interesting to see if this can be solved in the new OCF specification with runc.
One possible way to do this would be to use runc and have containers register with machinectl under systemd. Then when the docker daemon starts up, it could query systemd/machinectl for any docker containers already running. This would plug the docker daemon much more into the normal system framework. Since systemd is becoming the default even for Ubuntu, I think this is the best way forward, rather than creating some other kind of process manager.
We can't tie this to systemd; not every system has systemd, regardless of the major distros including it... maybe it wouldn't be incredibly difficult to support systemd if it exists... Wrapping runC with something the docker daemon can attach to might go a long way.
Well, defining the protocol for what this connects to would be better, with perhaps an example; then we could do this the systemd way. Docker not working well with systemd is, in my opinion, a major weakness.
Collective PR review with maintainers

@mudverma Thanks for the highly detailed proposal! What we're seeing here is that all assumptions have to be reconsidered in light of runC. We hope Docker 1.9.0 will ship with a dependency on runC for the container runtime. That means that neither proposal A (containers as children of init) nor proposal B (containers as children of the daemon) seems to be the way forward. RunC being the parent of the container process will open new doors for "hot upgrades" of the daemon. We definitely want that feature, and we definitely appreciate your work here! Can you please reformulate this proposal in terms of runC? Thanks 👍
@icecrime I don't see how anything changes with runC. Just read "runC" as "container" and it is all the same. What did I miss?
Collective review @duglin @calavera @LK4D4 @tonistiigi @icecrime @jfrazelle I opened opencontainers/runc#185 to keep track of the issue. Here's what we suggest for the following steps:
We would greatly appreciate your help with those steps, and sorry again for our long review.
@tiborvass Hi, thanks for your review. I see that, after the introduction of runC, a lot of assumptions have to be reconsidered.
@icecrime Thanks. So it is going to be:

Daemon ----> runC ----> container A

All containers will have a separate parent process (runC), and the Daemon would be the parent of all runC processes?
Yes!
I'd like to say: not necessarily (especially if we want to support use cases such as restarting the daemon and "reattaching" to the existing runC processes). |
Background:
As per the current architecture of Docker, all containers are spawned as children of the docker daemon. This parent-child relationship between the docker daemon and the containers provides a straightforward way of signalling/communication between them. However, this tight coupling between the containers and the daemon results in some issues which are critical for containers' up-time, stability and high availability. A few of the issues with this approach are:
Both of these issues become even more important in production environments, such as container clouds, where different containers running on a server might belong to the same or different clients and might host highly available or stateless services. In these scenarios, container downtime caused by external factors such as the daemon's death or upgrade is highly undesirable.
The second issue was opened by @shykes in 2013, but it is still an open item: #2658
Goals:
Findings:
Based on our investigation and experimentation with docker, we found that once started, a container can function stand-alone and does not require the daemon's presence for the execution of its encapsulated service.
We changed the daemon's code such that upon its death, containers become orphaned and are adopted by init. We ran the official mysql image, and we were able to connect to and use the mysql service even when the daemon was not running and the container had become orphaned.
The namespaces (pid, net, ipc, mnt, uts) and cgroups, which are the building blocks for container creation and execution, continue to exist and function normally, as they are provided by the Linux kernel. Therefore, there does not seem to be any reason for a container to stop functioning when the daemon is not present.
Proposal:
In the context of our findings and goals, we propose two alternative design models through which containers will no longer be tightly coupled with the daemon, either always (proposal A) or after the daemon's death (proposal B). Both models require an external communication/signalling mechanism between the daemon and the containers.
Communication between containers and the daemon:
Both of these proposals require some sort of two-way communication mechanism between the daemon and the containers. For example, how would the daemon get notified when a container finishes its execution? Also, how would the daemon pass commands to the containers? In the current design, the daemon "wait"s on the child process (the container) in a goroutine. This can be handled by having a dedicated parent monitor process for each container, whose job is to wait on the container and communicate with the daemon. The communication is described below:
Implementation:
Proposal A:
Proposal B:
Proposal A vs Proposal B:
runtime: support for daemonize golang/go#227
Use Cases:
Commands' analysis:
Following table lists all the commands that will be (not) impacted and would require a code change.
Limitations:
We will have the overhead of extra monitor processes (as many as the number of running containers).