Docker integration: Exposing Docker socket to Traefik container is a serious security risk #4174

Open
codethief opened this Issue Nov 7, 2018 · 22 comments

codethief commented Nov 7, 2018

Do you want to request a feature or report a bug?

Bug

What did you do?

I followed the official instructions to get Traefik running with Docker (and Let's Encrypt).

What did you expect to see?

I would have expected that the part of Traefik which communicates with the Docker daemon and updates Traefik's configuration accordingly when containers are started/stopped would happen in a different container, separate from the Traefik container as the latter is typically exposed to the internet.

What did you see instead?

The official docs instructed me to expose the Docker socket to the Traefik container itself. This is widely recognized to be a serious security issue, see 1, 2, 3, 4 and 5, as it basically means that anyone who manages to compromise Traefik obtains root on the host machine. Note that this is even worse than running a regular (non-Docker-integrated) reverse proxy outside any container directly on the host machine as such a proxy usually doesn't run as root.

Unfortunately, I couldn't find any other way in the docs to get Traefik set up with Docker.

Suggested solution

Move the part of Traefik that regularly pulls the container list from the Docker daemon and updates the Traefik configuration into a separate binary so that it can be run in a separate container, similar to what the nginx-proxy project suggests (see section "separate containers"). This seems to be the only solution that fully isolates the docker.sock-accessing part of the application from network requests. Compared to this, other solutions like using an authz plugin could, in theory, still be compromised.

PS: I am aware that some might consider my report a feature request, not a bug, as the current behavior & instructions seem to be the officially accepted way to integrate Traefik with Docker. However, I still think that any security issue – whether widely accepted or not – should 1) be considered a bug and 2) be at least mentioned in the official documentation.

scriptninja commented Nov 9, 2018

I absolutely agree that this should be addressed more thoroughly. I just wanted to share here, for anyone who comes across it, what I'm currently using as something of a workaround. I'm running a minimally adapted version of https://github.com/Tecnativa/docker-socket-proxy as a standalone container on each swarm manager, connected to a docker network specific to the purpose, connecting the traefik service to that network as well, and pointing Traefik to the proxied port.
This way I don't change the behavior on the host, I don't expose the socket over TCP except inside that one Docker network, and I don't have to configure an authz plugin. The Traefik container can reach only the endpoints it needs, and only with GET requests. An attacker who compromises the Traefik container would additionally have to break out of the container to the host (without direct access to the socket) or across to the haproxy container.

Why a standalone container? ...because I have namespace mapping enabled and selinux enforcing, and I needed --userns=host for the proxy to access the socket. I'm not sure that's the only alternative, but it's the one I found. I'd like to revisit this and figure out a way I could do it with a manager-node-restricted service instead, but right now I don't have time to figure out if that's possible.

Why a slight modification instead of just using the approach as supplied by Tecnativa (or copying the config and using haproxy directly)? ...because I wanted to install socat in the image and modify the haproxy docker entrypoint to use socat to redirect the container's /dev/log socket to STDOUT. (Edit: I did not originally mention that this is because haproxy doesn't follow the general Docker practice of logging to STDOUT unless instructed otherwise.)

I'm sure this could be done with quite a few other approaches. In fact, at one point I was trying to figure out if I could accomplish this using nothing but socat and sed, but if that would have been possible at all, it was beyond my level of regex-fu.

Member

emilevauge commented Nov 14, 2018

@codethief Thanks for reporting this.
I fully agree with you that our docs are lacking on this topic.
Sometimes things seem obvious to a few of us, but this is because we deal with this every day, which is not the case for all our users. So you are right, we should do something.
This lack of granularity of the Docker API is a pain, really :'(

  • We will update the documentation to warn our users about this, and add possible workarounds
  • We are working on a far better solution for production deployments that should be out soon ;)

Contributor

bitsofinfo commented Nov 19, 2018

@scriptninja can you please share your "...I'm running a minimally adapted version..." that you are running? Thanks

Contributor

bitsofinfo commented Nov 19, 2018

@emilevauge can you shed any light on what the ".....far better solution for production deployments that should be out soon ;)" is, specifically for the swarm setup? I.e. does it incorporate some sort of sidecar like this, or involve this workaround? If not, what is the new solution going to be?

Member

emilevauge commented Nov 19, 2018

@bitsofinfo we will make an announcement soon. Stay tuned ;)

Contributor

bitsofinfo commented Nov 19, 2018

hmmm

For anyone else coming across this, this appears to work

# Create network for local docker host api access
docker network create -d overlay dockersocket4traefiknet

# Create network for app
docker network create -d overlay myappnetwork

# Start NGINX dummy for test, this has no access to the `dockersocket4traefiknet`
docker service create \
  --network myappnetwork \
  --label traefik.frontend.rule='Host:mynginx' \
  --label traefik.enable=true \
  --label traefik.port=80 \
  --name nginx-test \
  nginx

# Start the docker-socket-proxy container for traefik bound to `dockersocket4traefiknet`
# Grant read only GET request access to SERVICES/NETWORKS/TASKs apis
# there are no published ports so only other services granted access on the 
# dockersocket4traefiknet network can hit this named service @ 2375
docker service create \
  --mode global \
  --constraint=node.role==manager \
  --network dockersocket4traefiknet \
  -e SERVICES=1 \
  -e NETWORKS=1 \
  -e TASKS=1 \
  --name dockersocket4traefik \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  tecnativa/docker-socket-proxy

# Start Traefik.. note it has access to 2 networks the `dockersocket4traefiknet` for swarm info 
# and the `myappnetwork` for containers to proxy to
docker service create \
  --name traefik-test \
  --mode global \
  --constraint=node.role==manager \
  --network myappnetwork \
  --network dockersocket4traefiknet \
  --publish 9080:80 \
  --publish 9999:8080 \
  --publish 9443:443 \
  traefik:1.6.6  \
  --entryPoints='Name:http Address::80' \
  --defaultentrypoints="http" \
  --retry \
  --debug=true \
  --logLevel=DEBUG \
  --docker \
  --docker.endpoint="tcp://dockersocket4traefik:2375" \
  --docker.swarmmode \
  --docker.domain=traefik \
  --docker.watch \
  --docker.exposedbydefault=false \
  --web \
  --web.statistics

# Dashboard shows service OK
http://localhost:9999/dashboard/

# Logs legit
docker service logs -tf traefik-test

# Access service OK
http://mynginx:9080/

Contributor

bitsofinfo commented Nov 19, 2018

@emilevauge can you point to any PRs for the changes in progress or is this part of a pay for solution?

scriptninja commented Nov 20, 2018

> @scriptninja can you please share your "...I'm running a minimally adapted version..." that you are running? Thanks

My fork has swapped out Tecnativa's build hooks for Gitlab CI and I'm using a more recent haproxy image as the base, but the functional difference consists entirely of RUN apk --no-cache add socat in the dockerfile and providing a copy of the docker-entrypoint.sh from the haproxy project with socat UNIX-RECV:/dev/log,mode=666 STDOUT & added to it.

Contributor

dduportal commented Nov 23, 2018

Hello everyone, you might be interested in PR #4225, which discusses security concerns with the Docker and Swarm mode backends.

The content is based on a set of issues, including this one.

Thanks all for your feedback on this important topic, which we value a lot!

Contributor

bitsofinfo commented Nov 27, 2018

@dduportal I think it's pretty good, as it's up top and not just one sentence. A working example, however, might be warranted.

codethief commented Nov 29, 2018

@dduportal Thanks for adding those remarks to the docs! I think they are a very good start! Then again, I do think they are a bit vague. Consider e.g.:

> If the Traefik processes (handling requests from the outside world) is attacked, then the attacker can access the Docker (or Swarm Mode) backend.

IMO it should be stressed that if Traefik gets compromised, the attacker effectively gains root access. A user reading the docs shouldn't have to go to one of the linked articles to figure this out.

Regarding the "security compensation":

> The main security compensation is to expose the Docker socket over TCP, instead of the default Unix socket file. It allows different implementation levels of the AAA (Authentication, Authorization, Accounting) concepts, depending on your security assessment:

Two remarks here:

  1. The docs still don't provide any actionable advice or best practice, so the user is basically left alone with figuring this all out by going to one of the links.

  2. As mentioned in my original post, I don't think proxying / validating requests to the Docker API is anywhere near comparable in terms of security to completely isolating the Docker socket (see the proposal in my post). The former basically just adds another layer of obstacles that a potential attacker has to overcome. However, in my opinion, the Docker socket shouldn't be available to internet-facing containers at all – neither directly nor indirectly through some intermediate proxy – and the docs should therefore clearly mention that the present options to "secure" the Docker socket through proxying are merely temporary workarounds until a more secure option is available (i.e. until the bug we're discussing here is solved). In particular, I would also put the link to the present GitHub issue in a more prominent location in the docs, say at the beginning of the "Security Considerations" section.

ndeloof commented Nov 30, 2018

At CloudBees I've been investigating a comparable issue: granting build containers access to the Docker socket, which exposes the whole infrastructure.

A possible way to balance this risk is to use a Docker API proxy that only allows the legitimate API calls Traefik needs (typically, allow /events). Such a sidecar container would need to run with high privileges so it can access the Docker socket, but then the Traefik container itself can run with lower, "safe" privileges. Since the sidecar is only responsible for forwarding docker.sock to another container, it can be implemented with a very minimalist runtime, which limits the attack surface.

tomwerneruk commented Dec 6, 2018

I've been a bit concerned about potentially tying a world-facing container into my Docker socket too.

I have put together a glorified shell script that helps decouple the docker sock and my traefik router. I am using it successfully in my lab. It works by having a traefik container attached to docker, purely to generate config. The script then pulls the config down via /api, and publishes it to my 'real' Traefik instance (which is not bound to Docker), but has --rest enabled. The key here is that the config is pushed to the internet facing container, not pulled, avoiding having to have connectivity to other components.
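The pull-then-push flow described here can be sketched in a few lines of shell. This is only a minimal sketch, not the traefik-prism script itself: the function name `sync_provider_config` and both base URLs are hypothetical, and the endpoints (`GET /api/providers/<name>` on the Docker-attached instance, `PUT /api/providers/rest` on the internet-facing one) are assumed from the Traefik 1.x API with the REST provider enabled, so check them against the version you run.

```shell
#!/bin/sh
# Hypothetical helper: copy the configuration generated by a Docker-attached
# "generator" Traefik to an internet-facing "edge" Traefik that has no access
# to the Docker socket.
# Assumption: Traefik 1.x API paths; adjust for other versions.
sync_provider_config() {
  generator_url=$1          # e.g. http://traefik-generator:8080 (hypothetical)
  edge_url=$2               # e.g. http://traefik-edge:8080 (hypothetical)
  provider=${3:-docker}     # provider whose configuration is copied

  # Pull first and stop on failure, so an empty config is never pushed.
  config=$(curl -fsS "$generator_url/api/providers/$provider") || return 1

  # Push the configuration to the edge instance's REST provider.
  printf '%s' "$config" |
    curl -fsS -X PUT --data-binary @- "$edge_url/api/providers/rest"
}
```

Calling `sync_provider_config http://traefik-generator:8080 http://traefik-edge:8080` from a loop or scheduler keeps the direction of the connection pointing towards the edge instance, so the edge container never needs a route to the Docker API.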

The script is designed to be provider agnostic (PROVIDERS could be a list), not sure if docker is the only backend that has this issue?

Docker image:

https://hub.docker.com/r/tomwerneruk/traefik-prism/

Comments welcome. Tom.

liquidat commented Dec 6, 2018

In case anyone is interested, I created an Ansible role which first launches a privileged but secured tecnativa docker socket container and afterwards lets Traefik talk to it. The details can be found here:
https://github.com/liquidat/ansible-role-traefik

BretFisher commented Dec 12, 2018

I've got an example of this setup doing all the swarm things needed. A few thoughts:

  1. I add :ro to the tecnativa/docker-socket-proxy mount so even that can't write to the socket.
  2. The verb and api route restrictions are great
  3. I put the traefik and socket-proxy on a dedicated overlay to reduce risk
  4. I added IPSec to the overlay network just because
  5. Don't use privileged mode on the socket-proxy (doesn't work in swarm services anyway). If you must use SELinux, better to configure that than run an elevated container.
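Points 3 to 5 of this list can be sketched as follows. This is a hedged sketch rather than the setup from the linked example: the function name `create_socket_proxy` and the network and service names are hypothetical, and the `SERVICES`/`NETWORKS`/`TASKS` switches are the ones used earlier in this thread.

```shell
#!/bin/sh
# Sketch: dedicated, encrypted overlay network shared only by Traefik and the
# socket proxy. Function, network and service names are hypothetical.
create_socket_proxy() {
  net=${1:-socket-proxy-net}

  # --opt encrypted turns on IPsec encryption for overlay traffic between nodes.
  docker network create --driver overlay --opt encrypted "$net" || return 1

  # No elevated privileges and no published ports: only services attached to
  # "$net" can reach the proxy on port 2375.
  docker service create \
    --name socket-proxy \
    --mode global \
    --constraint node.role==manager \
    --network "$net" \
    --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
    -e SERVICES=1 -e NETWORKS=1 -e TASKS=1 \
    tecnativa/docker-socket-proxy
}
```

The Traefik service would then be attached to the same network and pointed at `tcp://socket-proxy:2375`.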

I feel like this setup is simple enough, and lowers the risk enough, that it's a pattern I'd like to try with all the other tools that need docker API access. In the real world, I've never seen a Swarm cluster without something mounting the socket for management and control over docker... so all the warnings about "don't mount the socket" tend to get thrown out the minute someone wants to run Traefik, Portainer, Swarmpit, etc. So I'm glad we're coming up with these alternative patterns to let people have options to control their environments and also keep them as secure as possible.

ndeloof commented Dec 12, 2018

It could be useful to document the Docker API operations Traefik relies on.
It seems to require only (read) operations: ServerVersion, ContainerList, ContainerInspect, ServiceList, NetworkList, TaskList & Events. This could help bake an API-filtering proxy, which would block all irrelevant API operations.
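As a rough illustration of such a filter, the operations listed here map onto Docker Engine API routes that a proxy could allow while rejecting everything else. The sketch below is only the decision logic, not a working proxy; the function name `is_allowed` is hypothetical, and a real client may need further endpoints (for example for API version negotiation).

```shell
#!/bin/sh
# Sketch of a read-only allowlist for the Docker Engine API.
# Returns 0 if the request should be forwarded, 1 if it should be blocked.
is_allowed() {
  method=$1
  path=${2%%\?*}                       # ignore the query string

  # Strip an optional API version prefix such as /v1.24
  case $path in
    /v[0-9]*/*) path=/${path#/v[0-9]*/} ;;
  esac

  [ "$method" = GET ] || return 1      # read-only: no POST, PUT or DELETE

  case $path in
    /version)         return 0 ;;      # ServerVersion
    /events)          return 0 ;;      # Events
    /containers/json) return 0 ;;      # ContainerList
    /services)        return 0 ;;      # ServiceList
    /networks)        return 0 ;;      # NetworkList
    /tasks)           return 0 ;;      # TaskList
    /containers/*/json)                # ContainerInspect
      id=${path#/containers/}
      id=${id%/json}
      case $id in
        */*) return 1 ;;               # reject nested paths
      esac
      return 0 ;;
    *)                return 1 ;;
  esac
}
```

For example, `is_allowed GET /v1.24/containers/json` succeeds, while `is_allowed POST /containers/create` fails.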

Contributor

bitsofinfo commented Dec 12, 2018

@BretFisher question: I see you also specify CONTAINERS=1 and SWARM=1 on your proxy settings... why is this? Does traefik need those? I've been testing with only

   -e SERVICES=1 \
   -e NETWORKS=1 \
   -e TASKS=1 \

and it appears to work fine.

BretFisher commented Dec 12, 2018

This is just the start, and I'm glad for your testing! Based on your and @ndeloof's feedback I'll check it out.

codethief commented Dec 12, 2018

@BretFisher:

> 1. I add :ro to the tecnativa/docker-socket-proxy mount so even that can't write to the socket.

Maybe I'm confusing something here but my understanding was that this doesn't prevent writing to a socket at all.

ndeloof commented Dec 12, 2018

Indeed, if you couldn't write to the socket you couldn't send requests to the Docker API.
AFAIK :ro on a Unix domain socket just prevents you from renaming/removing it.
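One way to check this on a test machine is to mount the socket read-only into a throwaway container and query the daemon through it. This is a sketch under assumptions: the function name `check_ro_socket` is hypothetical, and it assumes the `docker:cli` image can be pulled and that the daemon listens on the default socket path.

```shell
#!/bin/sh
# Sketch: query the Docker daemon through a read-only bind mount of its socket.
# If :ro really blocked API requests, this command would fail.
check_ro_socket() {
  docker run --rm \
    --volume /var/run/docker.sock:/var/run/docker.sock:ro \
    docker:cli \
    docker version --format '{{.Server.Version}}'
}
```

If the reasoning in this comment holds, `check_ro_socket` reports the daemon's version even though the mount is read-only.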

BretFisher commented Dec 12, 2018

That makes complete sense! Amazing that I went all these years without realizing that. Thanks.
