New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Patch set to support systemd as PID1 in container #13525
Conversation
i just can't condone a |
whats next |
--systemd is the primary init system on almost all linux distributions and ubunto has said they will adopt it. Alot of the changes are defaults for standard init systems including the proper signals to send to an init daemon. |
Ubuntu (sic) has already replaced Upstream with systemd: https://wiki.ubuntu.com/VividVervet/ReleaseNotes/#Boot_and_service_management It's just not in a long-term release yet. |
I have no problem with changing the option to something like --os or --init, but this patch set is basically following these standards defined by systemd maintainers. https://wiki.freedesktop.org/www/Software/systemd/ContainerInterface/ |
"Do not drop CAP_SYS_ADMIN from the container. " Seems secure.... On Friday, May 29, 2015, Daniel J Walsh notifications@github.com wrote:
|
@jfrazelle why the straw man? Or do you think that integrating one process manager (which the Docker daemon is) with system-wide service/process manager is bad? I don't get it. Especially in the context of this: So yeah, support natively both |
I think there is another way we can make everyone happy, and the Init flags On Friday, May 29, 2015, Causa Sui notifications@github.com wrote:
|
"Do not drop CAP_SYS_ADMIN from the container. " Not sure what you mean, I don't want CAP_SYS_ADMIN in the container. |
How about we go with --subsystem=systemd Then if some other init system comes along, they could implement it without haveing to add options. |
If you don't want it in the container then you should drop it On Friday, May 29, 2015, Daniel J Walsh <notifications@github.com
|
It is dropped by default. I don't think we are communicating well. Maybe we should go to IRC. I don't understand what you are trying to say. |
I think it's fine don't worry it's just interesting wording to warn users On Friday, May 29, 2015, Daniel J Walsh notifications@github.com wrote:
|
@jfrazelle Well, making everyone happy is utopic thought :) I think that a program that does OS level virtualization should be aware about the innards of the OS it is virtualizing to simply provide the most seamless experience possible. BTW: By default, when you create user namespace you get a full capability set, therefore it is suggested not to drop that capability. The fact that Docker drops it by default is nice, yet completely unrelated to the underlying kernel technologies, since Docker doesn't even use user namespace in the first place and therefore "relies" on the fact that CAP_SYS_ADMIN is inherited from the root account the Daemon is running as. |
Ok, you were reading lennarts article. I was wondering where these quotes were coming from. I want to run a non-privileged container, so we will not follow that part of his description as well as allowing the container to do mknod. I believe they currently disable Pr |
Honestly I don't see this as an issue (i.e. the idea) i.e. init subsystem specific flags . Init systems are critical piece for os/processes. I don't think anyone writes init system just like that. For example if we look at history of Linux(may be Unix too), how many times new init systems came to major Linux/GNU distributions. |
I don't think it makes sense for Docker to go down the path of supporting arbitrary init systems running inside containers, which I believe is the implication @jfrazelle was reacting to. That expectation of having to eventually adjust Docker to handle the different init systems is implied by the "--systemd" name for the flag, as well as by the alternative suggestions of making Docker aware of multiple init systems and configuring the container differently based on which init system is being selected. However, I do think it makes sense for Docker to better support the "multi-service container" use case, where an existing application that isn't particularly amenable to containerisation can be run inside a multi-service container, allow it to be managed and deployed using Docker based container management and orchestration tools, rather than having to have separate tooling to manage full VMs for such applications. With a flag name like "--multiservice", this feature would let Docker define for init system developers what they need to do to integrate with Docker as the init system running inside a multi-service container. The fact that systemd already supports working this way shows that the defined interface is sufficient to get a multi-service container working correctly. Other init systems are likely to work better in a container configured this way, and if they still don't work completely, then it will be up to the developers of those systems to either update their init system to support Docker properly (as the systemd developers have), or else to propose changes to the way Docker configures multi-service containers. |
Reworked this patch set to use --init=systemd rather then --systemd. Also added some more man page changes and completion code. This would allow additional init systems should they come along to add patches to run in their defined containerized world. I would like to at least get the ok on the CLI change, so we could announce this to the Fedora world. |
we're interrested in this too, thanks @rhatdan for this pull request. |
The main thing this patch does is sets the default graceful stop signal at create/run time. So why not name the parameter As far as I can see not just process managers, but many daemons use different signals for graceful shutdown, so it doesn't make sense to enforce SIGTERM (check out Handling Signals section). Also check out systemd and docker discussion of this bug:
Otherwise with the official
Systemd and others also use SIGPWR for graceful shutdown (if prepared correctly, see
To me it seems that it would make most sense to at least put the default graceful stop (now SIGTERM) and default graceful stop timeout before killing (now only in stop command) as settings into |
|
Sometimes I think people are clouding their dislike of systemd rather then doing what is best for the users. systemd is the default init system, which will be running on the majority of systems that docker will be running on going forward. Having a systemd running properly within the container, without having everyone need to learn some fancy recipe is the goal. Perhaps this should be moved to runc going forward, but we still would prefer docker to tell the container it will be running a traditional init system as pid1. |
4e63cde
to
09cb2fe
Compare
a87edd7
to
7031529
Compare
@@ -201,6 +202,14 @@ more advanced use case would be changing the host's hostname from a container. | |||
> **Note**: `--uts="host"` gives the container full access to change the | |||
> hostname of the host and is therefore considered insecure. | |||
|
|||
## INIT settings (--init) | |||
Default to docker standard method of running containers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaults to Docker's standard method of running contianers.
Though this is still confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest some other wording?
## INIT settings (--init)
Specify alternate method of running containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write:
INIT settings (--init)
Enable a pre-configured profile for running init systems within containers.
Default: No profile is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're working on that, please add a blank line after the heading (## INIT..
), otherwise it may not render properly when generating the docs
Changed the docs |
Signed-off-by: Sally O'Malley <somalley@redhat.com> Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
This patch will use the new PreMount and PostMount hooks to "tar" up the contents of the base image on top of tmpfs mount points. We want to add support for /run and /tmp on tmpfs and Michael Chrosby idea to add --tmpfs PATH would require a patch like this. Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
This tells systemd what to name the /etc/machine-id file which links the journald within the container to to external log Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
…ystem mode If you are running a container with an init system like systemd as pid 1 this i option will modify the default container environment to follow the standard defined for that init system. Specifying --init=systemd will run the container with the following changes: /run mounted as a tmpfs /sys/fs/cgroup volume mounted from the host into the container read-only /var/log/journal/UUID volume mounted from the host into the container to allow journald data from the container to be seen on the host. Changes the default signals send from docker stop to the init process (systemd) Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan) Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
We've discussed this change with several maintainers and we reached the agreement that we do NOT want to add a flag for a specific init system to docker. We're going to provider the proper primitives to make this work, like allowing to stop a container with a specific signal and allowing to mount tmpfs. Closing this PR. Thank you all for your feedback. |
@calavera It will take longer to add each of these capabilities in a general way, but I agree that it will be worth the effort. |
I totally disagree with this decision. I don't see a clean way to do this without at least getting the container ID information into the container. The complexity of doing this correctly is going to be a huge headache to users, and going to cause lots of bugs when systemd/docker does not work correctly. I responded to emails discussing this issue with my concerns and then never heard back on them. |
@thaJeztah would you mind linkng me to @calavera's approach? I am not in the in club I guess ;) |
@thaJeztah Or do you simply refer to his closing comments? |
@timthelion sorry, yes, just meant the approach in his comment (not "in the loop" either, if there's any "loop", lol) Just hoping we can find a solution that's satisfactory to all involved here ❤️ |
@timthelion There are two PRs open already to address some of the issues that this change tried to solve: Stop signaling: #15307 We already mount the cgroups directory inside the container as read-only, and other issues raised here can be work around with options that we already have. |
The unification of the UUID is the one that can not be fixed. Having separate UUID to refer to the docker container, is a user nightmare. This patch along with machinectl patches basically allow a user to refer to a container based on the UUID. The end goal is to be able to do journalctl -M UUID And then would see the journal from inside of the container. This requieres RegisterMachine to register the UUID, /var/log/journal/UUID from the host mounted into the container, /etc/machine-id generated by systemd to match the UUID, which can only happen if $container_uuid=UUID before systemd starts. |
@rhatdan Then can you make a pull request that would create a |
I have had a pull request to do this for over a year but it is not that simple. Since we need to know the name before the container starts to create the /var/log/docker/UUID directory to mount into the container. We also need the RegisterMachine patch to be added to to docker to register the UUID before the container starts. |
Could you explain why please? I'd like to see this added for more efficient system level testing using containers running systemd rather than full VM's running systemd. I've read this comment thread a few times and it's not clear why the patch has been rejected. |
This patch set combines a few patches to allow systemd to fully function in a non privileged container
I am closeing down some pull requests for /run and journald and putting them all behind a
docker run --systemd option.
This patch set includes a couple of patches that should get merged without using the systemd pull.