This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

*: shared namespace execution modes #1433

Open
jonboulle opened this issue Sep 17, 2015 · 10 comments

Comments

@jonboulle
Contributor

There are various use cases where running a full pod (with all of the isolation and lifecycle that implies) isn't desirable and users simply want to perform a "simpler" execution of a container image. In the simplest case this is just using rkt as a package manager - discovering/downloading/extracting an image onto the filesystem, chrooting in, and execing the desired executable. The rkt fly prototype (#1072, #1416) implements a very basic example of this.
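To make the "chroot in and exec" step concrete, here is a minimal sketch in Python. It is not the rkt fly implementation; the function name `fly` and the assumption that the image has already been extracted to `rootfs` are purely illustrative.

```python
import os


def fly(rootfs, argv):
    """Replace the current process with argv, chrooted into rootfs.

    Hypothetical helper for illustration only. It assumes the image was
    already fetched and extracted to `rootfs`, and that argv[0] is an
    absolute path *inside* that root filesystem.
    """
    os.chroot(rootfs)        # needs root (CAP_SYS_CHROOT); only the filesystem view changes
    os.chdir("/")            # avoid keeping a working directory outside the new root
    os.execv(argv[0], argv)  # no new namespaces, no cgroups: just another host process


if __name__ == "__main__":
    import sys

    # Usage (as root): python fly.py /path/to/rootfs /bin/sh
    if len(sys.argv) >= 3:
        fly(sys.argv[1], sys.argv[2:])
```

Nothing here sets up namespaces or resource limits, which is exactly the trade-off of this mode.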

Obviously in this mode there is (aside from the filesystem) no isolation whatsoever, in terms of either resources or namespaces - it is just another process executing directly on the host. But different users may have more nuanced requirements, like sharing some namespaces and not others with the host. One example is #1046 about using the host's PID namespace. Another use case would be running the CNI networking plugins using rkt, rather than bundling them into it as is done today. systemd-nspawn's --share-system flag provides one other example of a possible execution mode that might be desirable.
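One way to see which namespaces a process actually shares with the host is to compare the links under `/proc/<pid>/ns`. The sketch below is Linux-only; the helper names `namespace_ids` and `shared_namespaces` are made up for this example, and reading PID 1's entries usually requires root.

```python
import os

# Namespace kinds exposed under /proc/<pid>/ns on Linux.
NAMESPACE_KINDS = ("ipc", "mnt", "net", "pid", "user", "uts")


def namespace_ids(pid="self"):
    """Map namespace kind -> identifier string such as 'pid:[4026531836]'.

    Kinds that cannot be read (non-Linux host, missing permission) are skipped.
    """
    ids = {}
    for kind in NAMESPACE_KINDS:
        try:
            ids[kind] = os.readlink(f"/proc/{pid}/ns/{kind}")
        except OSError:
            continue
    return ids


def shared_namespaces(a, b):
    """Return the sorted kinds whose identifiers are equal in both mappings."""
    return sorted(kind for kind in a if kind in b and a[kind] == b[kind])


if __name__ == "__main__":
    mine = namespace_ids("self")
    host = namespace_ids(1)  # PID 1 stands in for "the host" here
    print("shared with PID 1:", shared_namespaces(mine, host))
```

Run inside a container, this would list only the namespaces that the chosen execution mode left shared with the host.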

This is a tracker ticket to start fleshing out some example use cases and design work.

@jonboulle
Contributor Author

/cc @eyakubovich

/cc @ppalucki re: #1072 (comment)

@steveej
Contributor

steveej commented Sep 19, 2015

Namespace specific tickets

| Namespace | Related Issues/PRs | Status | Syntax shared | Syntax unshared |
|---|---|---|---|---|
| UTS | ? | ? | ? | ? |
| PID | #1046 | WIP | ? | ? |
| Network | #1418 | WIP | ? | ? |
| User | #986 | experimental | ? | ? |
| Mount | TODO | TODO | | |

This was referenced Sep 21, 2015
@jonboulle jonboulle added this to the v0.10.0 milestone Oct 8, 2015
@iaguis iaguis modified the milestones: v0.12.0, v0.10.0 Oct 20, 2015
@jonboulle jonboulle modified the milestones: v0.10.0, v0.12.0, v0.11.0 Oct 21, 2015
@alban alban modified the milestones: v0.12.0, v0.11.0 Oct 23, 2015
@chancez
Contributor

chancez commented Oct 23, 2015

Just chiming in on a use-case for this:

I want to use rkt to distribute and run things like monitoring agents on my hosts. Similarly, many types of debugging tools need minimal isolation in order to inspect system state and the state of other containers, but the usefulness of the "packaging/distribution" side of rkt would still shine here.

Another thought: this simply makes the transition to containers easier when certain applications (like docker/kubelet) misbehave when running inside a container. For example, we could begin shipping Docker as an ACI and iteratively work on making it work inside the other namespaces. That gives us something between "docker runs on the host" and "docker runs in a container" when we can't get the latter to work.

@steveej
Contributor

steveej commented Nov 11, 2015

@ecnahc515 I've been having very similar thoughts lately.

Every namespace the container shares removes isolation. Sharing all namespaces would effectively mean running an application that has been installed/downloaded as an ACI instead of through some other packaging/deployment manager.

Cherry-picking namespace isolation is not supported by systemd-nspawn: it's either all (the default, and what rkt currently uses) or nothing (--share-system). If we wanted to switch to the latter, we couldn't use --boot anymore, which we currently use to run systemd inside the container and eventually start the apps as services.

We need to investigate the following options for gaining fine-grained namespace control:

Investigations in a GDoc
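To picture what "fine-grained namespace control" could look like, here is a small sketch that turns a set of namespaces to isolate into a command line for util-linux `unshare(1)`. This is only an illustration: `unshare` is not one of the options proposed in this thread, and the helper name `unshare_command` is made up.

```python
# Long options of util-linux unshare(1), keyed by /proc/<pid>/ns name.
UNSHARE_FLAGS = {
    "ipc": "--ipc",
    "mnt": "--mount",
    "net": "--net",
    "pid": "--pid",
    "user": "--user",
    "uts": "--uts",
}


def unshare_command(isolate, argv):
    """Build an unshare(1) argv that isolates only the namespaces in `isolate`.

    Every namespace not listed stays shared with the caller (e.g. the host).
    """
    unknown = set(isolate) - set(UNSHARE_FLAGS)
    if unknown:
        raise ValueError(f"unknown namespace(s): {sorted(unknown)}")
    cmd = ["unshare"]
    cmd += [UNSHARE_FLAGS[kind] for kind in sorted(isolate)]
    if "pid" in isolate:
        # A new PID namespace only applies to children, so fork before exec.
        cmd.append("--fork")
    cmd.append("--")
    cmd += list(argv)
    return cmd


if __name__ == "__main__":
    # Isolate network and UTS, keep the host's PID and mount namespaces.
    print(unshare_command({"net", "uts"}, ["ip", "addr"]))
```

The point is the shape of the interface: the caller names the namespaces to isolate, and everything else stays shared, which is the per-namespace choice that the all-or-nothing --share-system switch cannot express.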

@alban
Member

alban commented Nov 19, 2015

@n0rad: can you explain the use cases you would have for rkt fly here? Would systemd-nspawn --share-system --capability=all be enough for you? Is it fine without pre-start/post-stop eventHandlers in your use cases?

@n0rad

n0rad commented Nov 19, 2015

Hi there,

We have some use cases that do not work with rkt, like sysdig and some Dell hardware tools. We also had much the same issue while running chef, and we will probably have it while running mesos.

For the moment we are getting around this by using a unit that does:

[Service]
ExecStartPre=/opt/bin/rkt --insecure-skip-verify fetch example.com/aci-omsa --no-store
ExecStartPre=/opt/bin/rkt image render --overwrite  example.com/aci-omsa  /opt/aci/omsa
ExecStart=/usr/bin/systemd-nspawn  --directory="/opt/aci/omsa/rootfs" --capability=all --bind=/dev --share-system --bind=/lib/modules --user=root bash -c "/cnt/bin/prestart.sh && /opt/dell/start.sh"
KillMode=mixed
Restart=always

We also have some scripts that do the same on demand for tools like sysdig.
Note that we have to bind some host directories, such as /dev and /lib/modules.

We will probably need at least pre-start, since CNT, which builds the ACI, relies on it to do templating and prepare the run. With our current setup we have to call it manually in the systemd-nspawn command.
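For the on-demand scripts, the systemd-nspawn invocation from the unit above could be assembled roughly like this. It is a sketch only: the function name `nspawn_command` and its parameters are hypothetical, and it uses only the flags already shown in the unit.

```python
import shlex


def nspawn_command(rootfs, command, binds=(), user="root"):
    """Build a systemd-nspawn argv that shares the host namespaces.

    `rootfs` is a rendered ACI root filesystem, `binds` are host paths to
    bind-mount into it, and `command` is a shell snippet run via `bash -c`.
    """
    argv = [
        "/usr/bin/systemd-nspawn",
        f"--directory={rootfs}",
        "--capability=all",
        "--share-system",
    ]
    argv += [f"--bind={path}" for path in binds]
    argv.append(f"--user={user}")
    argv += ["bash", "-c", command]
    return argv


if __name__ == "__main__":
    argv = nspawn_command(
        "/opt/aci/omsa/rootfs",
        "/cnt/bin/prestart.sh && /opt/dell/start.sh",  # pre-start is called by hand
        binds=["/dev", "/lib/modules"],
    )
    print(shlex.join(argv))
```

Chaining the pre-start script inside the `bash -c` snippet mirrors the manual workaround described above.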

@jonboulle
Contributor Author

The primary use case for the initial mode here is going to be running Docker, kubelet, and rkt within rkt.
/cc @aaronlevy @steveej

@alban alban modified the milestones: v0.13.0, v0.12.0 Nov 26, 2015
@alban
Member

alban commented Dec 18, 2015

Fixed by #1833.

@alban alban closed this as completed Dec 18, 2015
@jonboulle
Contributor Author

I think #1833 solves one particular use case but there's still some more to be teased out here.

@jonboulle jonboulle reopened this Dec 18, 2015
@alban alban modified the milestones: v0.16.0, v0.14.0 Dec 18, 2015
@alban
Member

alban commented Apr 7, 2016

I asked systemd-nspawn upstream: systemd/systemd#2982
