*: shared namespace execution modes #1433

jonboulle · 2015-09-17T22:48:47Z

There are various use cases where running a full pod (with all of the isolation and lifecycle that implies) isn't desirable and users simply want to perform a "simpler" execution of a container image. In the simplest case this is just using rkt as a package manager - discovering/downloading/extracting an image onto the filesystem, chrooting in, and execing the desired executable. The rkt fly prototype (#1072, #1416) implements a very basic example of this.
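As a rough illustration of the "chroot in and exec" step described above, here is a minimal Python sketch. It is not the rkt fly implementation; the function names `resolve_in_rootfs` and `fly` are hypothetical, and the image is assumed to be already extracted to a rootfs directory on the host.

```python
import os


def resolve_in_rootfs(rootfs, exec_path):
    """Map an absolute path inside the image to its location on the host."""
    if not os.path.isabs(exec_path):
        raise ValueError("exec path must be absolute inside the image")
    return os.path.join(rootfs, exec_path.lstrip("/"))


def fly(rootfs, argv):
    """chroot into rootfs and replace the current process with argv.

    Hypothetical helper: requires root (CAP_SYS_CHROOT) and gives no
    isolation beyond the filesystem view. Never returns on success.
    """
    host_path = resolve_in_rootfs(rootfs, argv[0])
    if not os.access(host_path, os.X_OK):
        raise FileNotFoundError(host_path)
    os.chroot(rootfs)        # change the filesystem root to the image
    os.chdir("/")            # avoid keeping a cwd outside the new root
    os.execv(argv[0], argv)  # argv[0] is now resolved inside the chroot
```

Calling something like `fly("/opt/aci/omsa/rootfs", ["/bin/sh"])` as root would then run the image's shell as an ordinary host process, sharing every namespace with the host.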

Obviously in this mode there is (aside from the filesystem) no isolation whatsoever, in terms of either resources or namespaces - it is just another process executing directly on the host. But different users may have more nuanced requirements, like sharing some namespaces with the host and not others. One example is #1046, about using the host's PID namespace. Another use case would be running the CNI networking plugins using rkt, rather than bundling them into it as is done today. systemd-nspawn's --share-system flag provides one other example of a possible execution mode that might be desirable.

This is a tracker ticket to start fleshing out some example use cases and design work.


jonboulle · 2015-09-17T22:49:25Z

/cc @eyakubovich

/cc @ppalucki re: #1072 (comment)

steveej · 2015-09-19T07:39:49Z

Namespace specific tickets

| Namespace | Related Issues/PRs | Status | Syntax shared | Syntax unshared |
|---|---|---|---|---|
| UTS | ? | ? | ? | ? |
| PID | #1046 | WIP | ? | ? |
| Network | #1418 | WIP | ? | ? |
| User | #986 | experimental | ? | ? |
| Mount | TODO | TODO | | |

chancez · 2015-10-23T18:55:36Z

Just chiming in on a use-case for this:

I want to use rkt to distribute and run things like monitoring agents on my hosts. Similarly, many types of debugging tools need minimal isolation in order to inspect system state and the state of other containers, but the usefulness of the "packaging/distribution" side of rkt would still shine here.

Another thing I was thinking of is that it simply makes the transition to containers easier when certain applications (like docker/kubelet) misbehave when running inside a container. For example, we could begin shipping Docker as an ACI and iteratively work on making it work inside the other namespaces. It means we could have something between "docker runs on the host" and "docker runs in a container" when we can't get the latter to work.

steveej · 2015-11-11T10:49:14Z

@ecnahc515 I've been having very similar thoughts lately.

Every namespace the container shares removes isolation. Sharing all namespaces would practically mean running an application that has been installed/downloaded as an ACI instead of using any other packaging/deployment manager.

Cherry-picking namespace isolation is not supported by systemd-nspawn; it's either all (the default, and what rkt currently uses) or nothing (--share-system). If we wanted to switch to the latter, we couldn't use --boot anymore, which we currently do to run systemd inside the container and eventually start the apps as services.

We need to investigate the following options for gaining fine-grained namespace control:

- Patch systemd-nspawn

OR

- A new program that would
  - set up the chroot, cgroups, and requested namespaces
  - exec the applications
  - Relates to: stage1: idea of using pure golang unprivileged containers (unc) as execution engine #1318

Investigations in a GDoc

alban · 2015-11-19T09:39:53Z

@n0rad: can you explain the use cases you would have for rkt fly here? Would systemd-nspawn --share-system --capability=all be enough for you? Is it fine without pre-start/post-stop eventHandlers in your use cases?

n0rad · 2015-11-19T10:20:51Z

Hi there,

We have some use cases that are not working with rkt, like sysdig and some Dell hardware tools. We also had much the same issue while running chef, and we think we will probably have it while running mesos.

For the moment we are getting around this by using a unit that does:

[Service]
ExecStartPre=/opt/bin/rkt --insecure-skip-verify fetch example.com/aci-omsa --no-store
ExecStartPre=/opt/bin/rkt image render --overwrite  example.com/aci-omsa  /opt/aci/omsa
ExecStart=/usr/bin/systemd-nspawn  --directory="/opt/aci/omsa/rootfs" --capability=all --bind=/dev --share-system --bind=/lib/modules --user=root bash -c "/cnt/bin/prestart.sh && /opt/dell/start.sh"
KillMode=mixed
Restart=always

We also have some scripts that do the same on demand for tools like sysdig.
Note that we have to bind some host directories, like /dev and /lib/modules.

We will probably need at least pre-start, since CNT, which builds the ACI, relies on it to do templating and prepare the run. With our current system we have to call it manually in the systemd-nspawn command.
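The on-demand variant of the workaround above could look roughly like the following Python sketch, which assembles the same systemd-nspawn command line as the unit file. The helper name `nspawn_argv` and its parameters are hypothetical; only the flags and paths are taken from the unit shown above.

```python
def nspawn_argv(rootfs, command, binds=("/dev", "/lib/modules"), prestart=None):
    """Build a systemd-nspawn argv that shares the host's namespaces.

    Hypothetical helper mirroring the ExecStart line of the unit above.
    """
    argv = [
        "/usr/bin/systemd-nspawn",
        f"--directory={rootfs}",
        "--capability=all",
        "--share-system",
        "--user=root",
    ]
    # Bind-mount the host directories the tool needs to see.
    argv += [f"--bind={path}" for path in binds]
    # Run the pre-start hook manually, since nothing else calls it here.
    script = command if prestart is None else f"{prestart} && {command}"
    argv += ["bash", "-c", script]
    return argv
```

The resulting list can be handed to `subprocess.run()` after the image has been fetched and rendered with the two rkt commands from the unit.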

jonboulle · 2015-11-25T17:23:24Z

The primary use case for the initial mode here is going to be running Docker, kubelet, and rkt within rkt.
/cc @aaronlevy @steveej

alban · 2015-12-18T21:52:15Z

Fixed by #1833.

jonboulle · 2015-12-18T21:55:46Z

I think #1833 solves one particular use case but there's still some more to be teased out here.

alban · 2016-04-07T13:29:54Z

I asked systemd-nspawn upstream: systemd/systemd#2982

jonboulle added area/usability kind/design labels Sep 17, 2015

jonboulle assigned steveej Sep 17, 2015

jonboulle added the priority/P1 label Sep 17, 2015

This was referenced Sep 21, 2015

Network plugins as ACIs #541

Open

rkt: add rkt fly command #1416

Closed

jonboulle added this to the v0.10.0 milestone Oct 8, 2015

iaguis modified the milestones: v0.12.0, v0.10.0 Oct 20, 2015

jonboulle modified the milestones: v0.10.0, v0.12.0, v0.11.0 Oct 21, 2015

alban modified the milestones: v0.12.0, v0.11.0 Oct 23, 2015

alban modified the milestones: v0.13.0, v0.12.0 Nov 26, 2015

alban closed this as completed Dec 18, 2015

jonboulle reopened this Dec 18, 2015

alban modified the milestones: v0.16.0, v0.14.0 Dec 18, 2015

iaguis modified the milestones: v1.0.0, v0.16.0 Jan 19, 2016

iaguis modified the milestones: v1+, v1.0.0 Jan 26, 2016

alban mentioned this issue Apr 7, 2016

[RFC] nspawn: add --share-ipc and --share-uts systemd/systemd#2982

Closed

lucab mentioned this issue Aug 23, 2016

nspawn: split down SYSTEMD_NSPAWN_SHARE_SYSTEM systemd/systemd#4023

Merged

lucab unassigned steveej Apr 5, 2017
