Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker_swarm_service fails because of default "user: root" #49199

Closed
fchiacchiaretta opened this issue Nov 27, 2018 · 18 comments · Fixed by #51110
Closed

docker_swarm_service fails because of default "user: root" #49199

fchiacchiaretta opened this issue Nov 27, 2018 · 18 comments · Fixed by #51110
Labels
affects_2.7 This issue/PR affects Ansible v2.7 bug This issue/PR relates to a bug. cloud docker module This issue/PR relates to a module. python3 support:community This issue/PR relates to code supported by the Ansible community.

Comments

@fchiacchiaretta
Copy link

SUMMARY

I'm trying to start a service (Traefik Load Balancer) via docker_swarm_service and it fails because of user: root being passed by default in the invocation.
I can properly start the service if I specify user: '' or user: (passes as user: null in the invocation), but service declaration is not idempotent, every time I run the deploy I get

[...]
    "changed": true,
    "changes": [
        "user"
    ],
[...]
ISSUE TYPE
  • Bug Report
COMPONENT NAME

docker_swarm_service

ANSIBLE VERSION
ansible 2.7.2
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/fchiacchiaretta/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.7.1 (default, Nov 23 2018, 10:01:49) [GCC 8.2.1 20181105 (Red Hat 8.2.1-5)]
CONFIGURATION
OS / ENVIRONMENT

Control station OS: Fedora 29
Target OS: Debian 8.11 Jessie
Target docker version:

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:29 2018
  OS/Arch:          linux/amd64
  Experimental:     false
STEPS TO REPRODUCE

Start a traefik service:

  - name: Create traefik LB on manager node
    docker_swarm_service:
      debug: yes
      name: traefik-lb
      image: 'traefik:1.7'
      user:
      constraints:
      - "node.role==manager"
      networks:
      - traefik-net
      publish:
      - published_port: 8080
        target_port: 8080
      - published_port: 8081
        target_port: 8081
      args:
      - "--defaultEntryPoints=http"
      - "--entryPoints=Name:http Address::8080"
      - "--entryPoints=Name:mgt Address::8081"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=traefik"
      - "--docker.watch"
      - "--api"
      - "--api.entrypoint=mgt"
      - "--api.statistics.recentErrors=100"
      mounts:
      - source: "/var/run/docker.sock"
        target: "/var/run/docker.sock"
        type: bind
EXPECTED RESULTS

I can start a service without specifying user parameter

ACTUAL RESULTS

Service start fails. If I specify user: '' or user: (passes as user: null in the invocation) it succeeds, but it is not idempotent anymore.

Best,
Federico Chiacchiaretta

@ansibot
Copy link
Contributor

ansibot commented Nov 27, 2018

Hi @fchiacchiaretta, thank you for submitting this issue!

click here for bot help

@ansibot
Copy link
Contributor

ansibot commented Nov 27, 2018

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

click here for bot help

@ansibot
Copy link
Contributor

ansibot commented Nov 27, 2018

@ansibot ansibot added affects_2.7 This issue/PR affects Ansible v2.7 bug This issue/PR relates to a bug. cloud docker module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. python3 support:community This issue/PR relates to code supported by the Ansible community. labels Nov 27, 2018
@felixfontein
Copy link
Contributor

From your analysis, it sounds like it could be fixed by changing if self.user != os.user: to if self.user and self.user != os.user:. @dariko @jwitko what do you think?

@ansibot ansibot removed the needs_triage Needs a first human triage before being processed. label Nov 27, 2018
@dariko
Copy link
Contributor

dariko commented Nov 27, 2018

@fchiacchiaretta Thanks for your report!
Can you detail the failure you get when setting user: root?
Right now I can't reproduce the issue with the same ansible and docker versions:

  • setting user: root or omitting user has the service correctly created and unchanged in a second execution with the same parameters
  • setting user: has the service correctly created but the second execution reports the service as changed

I put the tasks and logs I used here.
What docker python library version are you using?

@felixfontein Yes, that should solve the idempotency issue; I'm wondering if, considering how the default for a service created without a user parameter is an absent Spec.TaskTemplate.ContainerSpec.User, it may be more future-proof to change the user parameter default from root to None (and all the related stuff).

user@dumb:/tmp/test$ sudo docker service create --name test_service_user --user root busybox sleep 3600
[CUT]
user@dumb:/tmp/test$ sudo docker service create --name test_service_nouser busybox sleep 3600
[CUT]
user@dumb:/tmp/test$ diff <(sudo docker service inspect test_service_user) <(sudo docker service inspect test_service_nouser)
[CUT]
<                     "User": "root",

@felixfontein
Copy link
Contributor

@dariko the problem with changing the default now is that it breaks backwards compatibility. It probably would have been better if the default would have never been there, but now it's too late. (The root cause for #49078 is essentially the same problem: the init option for docker_container has a default value.)

Unfortunately I don't have a good idea how to solve this. @gundalow are there any rules/suggestions/... on how to change defaults, which will break backwards incompatibility? Or do you know whom to ask?

@dariko
Copy link
Contributor

dariko commented Nov 27, 2018

@felixfontein I'm not sure backward compatibility would not be broken, because the implied user default would still be root. The only possible difference that I imagine would be in that, right now, if I rerun a task which had a user != "root" parameter removing it the service would be changed as to be executed as root; I expect (pending testing) that replacing the default as suggested could have the second execution of the task (without the root parameter) leave the service unchanged.
Leaving root as default grants the module a more tight/controlled interface which I think, all things considered, should be a priority here.

For reference, here is the related code from docker-py: if a ContainerSpec is created with user==None the parameter is ignored.

@felixfontein
Copy link
Contributor

@dariko doesn't the implied user depend on the image's default user? I don't use docker_service or docker_swarm_service, but for regular docker_container controlled instances I don't specify a user, but already add/specify it while building the images.

@dariko
Copy link
Contributor

dariko commented Nov 28, 2018

@felixfontein Thank you, the image's own default user completely slipped my mind 😕 Now I see how backward-compatibility can be broken by changing the module parameter default, and how bad the current default is. Maybe it would be possible to replace it on a eventual future major release?

@felixfontein
Copy link
Contributor

That's a good question. I hope @gundalow can answer that or point us to someone who can :)

Maybe how about the following:

  1. introduce a "user name" (which would be an invalid name for a "real" user) as a placeholder which allows to use the user from the container (I'd like as-in-container, but I think that's also a valid username);
  2. remove the default, but add code which explicitly sets the user to root if it is None, and at the same time outputs a warning that the default will be changed in 2.12 (the usual deprecation cycle);
  3. at the same time allow empty string to do the same as as-in-container (null will translate to Python's None, so we cannot distinguish it from "user wants default");
  4. change the default to as-in-container when 2.12 comes up.

The main problem is that if we choose a special name which someone (for whatever reason) wants to use as a real username. That's a tough one. I guess as-in-container is probably safe, but you never know ;)

Anyway, such a change probably has to wait for Ansible 2.8, but for now we could change the module to treat an empty string as as-in-container, also for idempotency. That would help @fchiacchiaretta for now. (If 2.7.3 is released next week and not this week, we should be able to get that in.)

@dariko
Copy link
Contributor

dariko commented Nov 28, 2018

@felixfontein I like the idea of a as-in-container username, maybe we could choose a non POSIX-compliant one (I think @image, for example, should be invalid linux and maybe in windows, and indicative of its purpose).

If the @fchiacchiaretta is just about the idempotency I've prepared #49235 which solves it, but he also mentioned a module failure which for now I've been unable to reproduce.

@fchiacchiaretta
Copy link
Author

Hi @dariko
I think I've been inaccurate in my first description, I'm not getting a module failure, playbook runs properly and service is created on the swarm manager, but container fails to start with this error

ID                          NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR                                                                                                        PORTS
g2m8eqwdj4vne0xp8m4gy3xz7   traefik-lb.1        traefik:1.7         sw-mgr-01           Running             Preparing 1 second ago                                                                                                                
lunqmj1tw5sdha7g7ts1napy8    \_ traefik-lb.1    traefik:1.7         sw-mgr-01           Shutdown            Failed 2 seconds ago     "starting container failed: linux spec user: unable to find user root: no matching entries in passwd file"   
iph8fcucom85eg0mee7whazxf    \_ traefik-lb.1    traefik:1.7         sw-mgr-01           Shutdown            Failed 4 seconds ago     "starting container failed: linux spec user: unable to find user root: no matching entries in passwd file"   
vyb4ycx8xgdv8ehb0c8zvhqsz    \_ traefik-lb.1    traefik:1.7         sw-mgr-01           Shutdown            Failed 7 seconds ago     "starting container failed: linux spec user: unable to find user root: no matching entries in passwd file"   
5nlvb3g57mcgebm4oghir4jib    \_ traefik-lb.1    traefik:1.7         sw-mgr-01           Shutdown            Failed 10 seconds ago    "starting container failed: linux spec user: unable to find user root: no matching entries in passwd file"   
zy3ru0fz7ki4jxzla9byeem0l    \_ traefik-lb.1    traefik:1.7         sw-mgr-01           Shutdown            Failed 13 seconds ago    "starting container failed: linux spec user: unable to find user root: no matching entries in passwd file"

As you reported

  • setting user: root or omitting user has the service correctly created and unchanged in a second execution with the same parameters
  • setting user: has the service correctly created but the second execution reports the service as changed

I get the same behaviour, but in the first case I have

# docker service inspect traefik-lb
[...]
ContainerSpec:
 Image:		traefik:1.7
 Args:		[..omitted..]
 User: root
[...]

while in the second User: root is missing.

If I can have idempotency even specifying a null user, that would be great for now 😄

@dariko
Copy link
Contributor

dariko commented Nov 28, 2018

@fchiacchiaretta Ok, it all makes sense. The changes in #49235 should have the module correctly report its changed status when using user:.
Here is the module as it would be when backported to stable-2.7, could you please test if this version works for you?

@fchiacchiaretta
Copy link
Author

@dariko I made a few tests with both original and updated module, I try to summarize my findings.

With updated module, the second run with user: results in 'ok' status, so I have idempotency now.

There is another notable difference in behaviour I noticed:
with original module, when creating the service without user parameter, running a second time setting user: resulted in a changed state, so the service on the node got restarted and started working properly (that's how I fixed it the first time, tested again a while ago just to be sure).

When running the same test with the updated module, the second run resulted in 'ok' status, so the service was left in a broken state, and I had to remove it a deploy it again with user:.

Is this the intended behaviour or are we breaking other cases (which I can't see)?

@dariko
Copy link
Contributor

dariko commented Nov 28, 2018

@fchiacchiaretta The new behaviour is at least consistent, and it is the one I would have expected from the beginning: now having user unset or nullifying it have the same meaning.
It could be better, but working in that direction will have to take in consideration backward compatibility.

@ansibot
Copy link
Contributor

ansibot commented Dec 14, 2018

@hannseman
Copy link
Contributor

hannseman commented Jan 19, 2019

user should really not be defaulted to root. I think we should try and find a way to have it be None as soon as possible.

In my opinion defaulting to root can be a great security risk where users have built images with a custom user only to have the container run as root by default. I only caught this because the official memcached image refuses to run as root. If it would be considered insecure breaking backwards comparability would be allowed right?

@dariko @felixfontein

https://docs.docker.com/engine/security/security/
https://medium.com/@mccode/processes-in-containers-should-not-run-as-root-2feae3f0df3b
https://security.stackexchange.com/questions/176206/docker-runs-container-processes-as-root-should-i-be-worried

@felixfontein
Copy link
Contributor

I think we all fully agree that the current situation isn't good :)

I guess we cannot do anything for versions before 2.8 anymore (that's only 2.7.x, since the module wasn't there in 2.6). Having the possibility to write user: null is already there in 2.7.x, so users aware of the problem can at least work around it.

What we can do is break backwards compatibility for 2.8. In general we shouldn't do that, but I think this is one of the situation where it's worth it. We'd have to make sure that this change is mentioned both in the changelog and the porting guide, so increase the chance that people won't miss it.

@ansible ansible locked and limited conversation to collaborators Jul 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affects_2.7 This issue/PR affects Ansible v2.7 bug This issue/PR relates to a bug. cloud docker module This issue/PR relates to a module. python3 support:community This issue/PR relates to code supported by the Ansible community.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants