Consider not mounting docker.sock on each container #11

Closed
ScottG489 opened this issue Nov 3, 2020 · 4 comments
Labels
bug (Something isn't working), question (Further information is requested)

Comments

@ScottG489
Owner

This was always a security issue but is now more urgent/required due to secrets being introduced with the completion of #10. Here is some relevant information:

However, one big issue arises from the fact that all images that are run are privileged and can control the host dockerd (since docker.sock is mounted on every image that is run). This means they have full access to run anything on the host dockerd and thus would have access to any volume and could view anything stored in those volumes, such as a secret. Since containers are allowed to run docker freely, there is little that can be done about this right now.

I think a longer-term solution would be to not run all images with docker.sock mounted. This would be safer in general, since mounting it in the first place is a bad idea for security reasons. However, this will also break current builds that expect this to be allowed and do things like building docker images within their builds.

One solution to this would be to require builds to be more "modular" and instead call specific docker images like the "docker-build-push" image to handle docker-related tasks.

For instance, in a build, an application could build its binary and put it on a docker named volume (unrelated to secrets), then quit. Then an approved/privileged image like docker-build-push could mount that volume and run the docker commands that it needs to. This would probably require some kind of whitelist on the server so that only specified images would be allowed to run as privileged.
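
A rough sketch of that flow with plain docker commands (the image and volume names here are only illustrative, not part of conjob):

```sh
# Hypothetical "modular" build flow; names are examples only.

# 1. The application's build container writes its artifact to a named volume.
docker volume create build-output
docker run --rm -v build-output:/output my-app-build

# 2. Only an approved/privileged image (e.g. docker-build-push) gets docker.sock.
#    It mounts the artifact volume and performs the docker build/push for the job.
docker run --rm \
  -v build-output:/input \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker-build-push
```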

The benefit of mounting docker.sock is ease of use. It allows users to run docker commands within their containers, which is a completely valid use case. I personally use it often to build and push application images from project build containers.

Not mounting the docker.sock would mean some substantial work would have to be done to not lose the ability to do docker related tasks. The work needed for this should be discussed in a separate issue. However, to summarize some ideas here:

  • Allow for native support to do building and pushing (running? or too unsafe?). What would this look like?
  • Allow for "whitelisted" images which are allowed to do docker operations (via having the docker.sock mounted). This would be fairly easy to implement but isn't ideal because it would not scale well.
  • Allow for special purpose images to be "whitelisted" which would handle docker related tasks specifically (e.g. the docker-build-push image). This could work by building your application as normal, putting the artifact on a shared volume, then mounting that volume onto the whitelisted image and it would do the necessary docker operations. Another, possibly less versatile, option would be to publish your built artifact and then download it during your docker build.

Whatever the implementation, great care needs to be taken so that the docker operations will at least not be able to access the secrets volumes.

This should be a configuration option rather than something hard-coded. If users completely trust all jobs running on the server then it's fine to allow it.

@ScottG489 ScottG489 added the bug and question labels Nov 3, 2020
@ScottG489
Owner Author

See this original post on dind, which has recently been updated with some information that helps solve this problem more cleanly:

https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/

In particular this project looks promising:

https://github.com/nestybox/sysbox

@ScottG489
Owner Author

I made some progress on this.

Sysbox is very promising. I have successfully run nested containers completely encapsulated from other containers. How I got it working for now: docker.sock is mounted on the conjob container (as it was before), and all containers run by conjob are run with the sysbox runtime (`--runtime=sysbox-runc`). You can use the runtime to run an image with docker installed, start dockerd at runtime (i.e. `dockerd > /var/log/dockerd.log 2>&1 &`), and then run anything you need with docker from there.
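
At the command line, a minimal version of that flow looks something like this (the image name is just a placeholder):

```sh
# Run the job container with the sysbox runtime instead of mounting the host docker.sock.
docker run --runtime=sysbox-runc --rm -it image-with-docker-installed sh

# Inside that container, start an isolated dockerd and use docker as normal:
dockerd > /var/log/dockerd.log 2>&1 &
docker run --rm hello-world   # runs against the inner dockerd, invisible to the host
```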

However, once inside a sysbox runtime container you won't be able to nest more sysbox runtime containers. If you need more nested containers, mounting docker.sock DOES work, but you'll be sharing the dockerd instance running on the sysbox container.

Here's an example of what the nested container layers would look like:

  1. Container with docker.sock mounted
  2. Container with the sysbox runtime and dockerd running
  3. Container with the docker.sock from #2 mounted.

After #2, you'll no longer be able to run any more containers with the sysbox runtime. So any container spawned from #2 will share the same dockerd (via mounting `docker.sock`). From the host you'll be able to see #1 and #2 but nothing nested further down. From #2 you can also run as many concurrent containers as you want.
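
So to nest further from inside #2, a container just mounts #2's own socket (a sketch; the image name is a placeholder):

```sh
# Run from inside container #2, whose dockerd owns /var/run/docker.sock there.
# Container #3 shares that dockerd rather than getting its own.
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock nested-image
```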

This should solve the problem of jobs being aware of each other. The only limitation is if a job for some reason also needed nested containers to be unaware of each other and that wouldn't be supported. However, I see that being very unlikely for now.


The next problem was that, for some reason, when I specify this runtime inside of conjob with the docker lib I'm using, no output is returned. Attaching to the container instead and specifying the right log parameters seemed to work. However, I decided to go with the approach of using `docker wait` instead. This may cause problems later down the line if I decide to try to bring back logging, but that's a low priority right now.
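
At the CLI level, the equivalent of that approach is roughly the following (the actual code goes through the docker client library, so this is only an illustration):

```sh
# Start the job container detached, then block on `docker wait` for its exit code.
CID=$(docker run -d --runtime=sysbox-runc job-image)
docker wait "$CID"    # prints the container's exit code once it finishes
docker logs "$CID"    # logs could still be fetched afterwards if output is ever needed again
```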

@ScottG489
Owner Author

ScottG489 commented Nov 16, 2020

The next issue was getting the sysbox runtime working in AWS. I have yet to find an AMI which has the shiftfs kernel module. However, after discussions in the nestybox/sysbox/issues/120 issue and finally the nestybox/sysbox/discussions/121 discussion, I was able to get it working essentially by specifying the --userns=host flag on the conjob container.

The server didn't support shiftfs, so the sysbox install had to enable userns-remap in the daemon. This in turn makes all containers have certain limitations, one of which meant I was no longer able to access the docker.sock even after mounting it. Using the --userns=host flag essentially reverts it back to normal behavior (read the discussion mentioned above for small differences).
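
In other words, something like this for the conjob container itself (a sketch; only the flags are the relevant part):

```sh
# Run conjob with the host's user namespace so the mounted docker.sock stays usable,
# even though the sysbox install enabled userns-remap on the daemon.
docker run -d \
  --userns=host \
  -v /var/run/docker.sock:/var/run/docker.sock \
  conjob
```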

Now everything seems to be working as expected. A slightly cleaner solution is outlined in this comment in the sysbox discussion: nestybox/sysbox/discussions/121#discussioncomment-130136.
This should install shiftfs on the AMI which would mean the --userns=host wouldn't be necessary.

@ScottG489
Owner Author

Closed with 04165c8 and ad9b2fb.

There is a lot of other follow-up work that needed to be done as well, such as starting dockerd in the build and updating other scripts like test.sh. All projects that use conjob will also need to point to the new server and update their builds accordingly.

See also this follow up issue for installing shiftfs on the AMI: #16.
