Still see dead process #52

Closed
tianlai opened this issue Jan 26, 2016 · 13 comments

@tianlai

tianlai commented Jan 26, 2016

Hi, I'm using dumb-init as the entry point in my Docker image, which is based on CentOS 6:
ENTRYPOINT ["/usr/local/bin/dumb-init", "-c"]
The container is spun up by the Jenkins Docker plugin as an executor slave with the command bash -c '/usr/sbin/sshd -D'.

After the job finishes, I still see the zombie process issue on my Docker machine:

root 6985 1 0 12:07 ? 00:00:01 /usr/bin/docker -d -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=amazonec2

root 9138 6985 0 12:08 ? 00:00:00 [dumb-init]

500 9653 9138 37 12:08 ? 00:06:14 [java]

The first one is the Docker service. I'm not able to kill either of the latter two. I'm kind of new to Docker; any ideas?

@bukzor
Contributor

bukzor commented Feb 4, 2016

I don't believe these are zombie processes.

Use ps -efly to see the process state. If it's really a zombie, you'll see Z in column S.
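For anyone who wants to automate that check, here is a minimal Python sketch that picks the zombie rows out of ps output. It assumes the 14-column layout that `ps -efly` prints on Linux (S, UID, PID, PPID, C, PRI, NI, RSS, SZ, WCHAN, STIME, TTY, TIME, CMD); other ps variants may differ. The names `PsRow`, `parse_ps_ely` and `find_zombies` are made up for illustration, and the sample rows are copied from output posted in this thread.

```python
from typing import List, NamedTuple


class PsRow(NamedTuple):
    state: str  # column S, e.g. "S" (sleeping) or "Z" (zombie)
    pid: int
    ppid: int
    cmd: str


def parse_ps_ely(output: str) -> List[PsRow]:
    """Parse ps output that uses the assumed 14-column -efly layout."""
    rows = []
    for line in output.splitlines():
        # Split into at most 14 fields so CMD keeps its embedded spaces.
        fields = line.split(None, 13)
        if len(fields) < 14 or not fields[2].isdigit():
            continue  # skip the header line, blank lines, and short lines
        rows.append(PsRow(fields[0], int(fields[2]), int(fields[3]), fields[13].strip()))
    return rows


def find_zombies(output: str) -> List[PsRow]:
    """Return only the rows whose state column starts with Z."""
    return [row for row in parse_ps_ely(output) if row.state.startswith("Z")]


SAMPLE = """\
S UID        PID  PPID  C PRI  NI   RSS    SZ WCHAN  STIME TTY          TIME CMD
S 102       27838   1803  0  80   0     0     0 -      10:39 ?        00:00:00 [dumb-init]
Z 102       28176  27838 93  80   0     0     0 -      10:39 ?        00:14:37 [mysqld] <defunct>
"""

if __name__ == "__main__":
    for row in find_zombies(SAMPLE):
        print(row.pid, row.ppid, row.cmd)
```

In practice the input could come from something like `subprocess.run(["ps", "-efly"], capture_output=True, text=True).stdout` instead of the hard-coded sample.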

@bukzor bukzor closed this as completed Feb 4, 2016
@tianlai
Author

tianlai commented Feb 4, 2016

@bukzor yes, it's Z.

@asottile asottile reopened this Feb 4, 2016
@bukzor
Contributor

bukzor commented Feb 4, 2016

@tianlai , how would I reproduce this situation myself?

See also: http://sscce.org/

@tianlai
Author

tianlai commented Feb 4, 2016

@bukzor I'm not sure. It doesn't happen every time. As I said, I use it in a Jenkins slave image. The slave just executes a Grails test job and is torn down when finished. In my containers, dumb-init is always running as PID 1, but the zombie java process is always the Grails process that ran previously. Sorry, I can't share the code base or expose more details because of my company's policy.

@bukzor
Contributor

bukzor commented Feb 5, 2016

I have to have something that I can reproduce in order to debug.

Sorry.

@bukzor bukzor closed this as completed Feb 5, 2016
@asottile asottile reopened this Feb 6, 2016
@asottile
Contributor

asottile commented Feb 6, 2016

We're seeing something similar to this on our mesos slaves:

$ ps -wwwelfy | grep 27838
S 102       27838   1803  0  80   0     0     0 -      10:39 ?        00:00:00 [dumb-init]
Z 102       28176  27838 93  80   0     0     0 -      10:39 ?        00:14:37 [mysqld] <defunct>
$ pstree -halp 27838
(dumb-init,27838)
  └─(mysqld,28176)
      └─{mysqld},28188

This is happening inconsistently: some succeed, others fail.

All the ones I've observed so far leave mysqld consuming an entire CPU.

Interestingly, these look different from a "healthy" dumb-init process in ps, where the full arguments are visible. On the brackets here, from man ps:

Sometimes the process args will be unavailable; when this happens, ps will instead print
the executable name in brackets.
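The bracket behaviour quoted from man ps can be approximated on Linux by reading /proc directly. This is a rough Python sketch, assuming a Linux /proc filesystem with `/proc/<pid>/cmdline` and `/proc/<pid>/comm`. `ps_style_command` is a hypothetical helper name, not part of ps or dumb-init, and real ps does more (for example, it appends `<defunct>` for zombies).

```python
import os


def ps_style_command(pid: int) -> str:
    """Return the full arguments if readable, else the executable name in brackets."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    if raw:
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        parts = raw.rstrip(b"\0").split(b"\0")
        return " ".join(part.decode(errors="replace") for part in parts)
    # Empty cmdline (kernel threads, zombies): fall back to the short name.
    with open(f"/proc/{pid}/comm") as f:
        return f"[{f.read().strip()}]"


if __name__ == "__main__":
    # A normal, live process shows its full command line.
    print(ps_style_command(os.getpid()))
```

A zombie has already released its memory, so its cmdline reads as empty and it falls into the bracket branch, which matches the `[mysqld] <defunct>` row above.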

@taladar

taladar commented Feb 27, 2016

I am seeing the same thing, also with Jenkins Docker slaves. Apparently the Jenkins slave.jar processes turn into zombies with two threads, each using 100% CPU. The thread matching the process PID has its state listed as Z, while the other thread is listed as running. dumb-init is the parent of this process and is listed in state Ss (interruptible sleep, session leader), with its name also in square brackets, like the zombies and kernel processes.

Before I added dumb-init to the mix, the same thing happened, but there were two levels of sshd processes between docker and the java process in this state. Apparently those now get cleaned up correctly by dumb-init (the docker command essentially starts dumb-init sshd -D; Jenkins then connects to this and starts slave.jar).

Is there any specific information I could provide to help you debug this?
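For context on what "cleaned up" means in the normal case: an exited child stays in state Z until its parent waits on it. The Python sketch below shows that lifecycle on Linux by reading the state field from `/proc/<pid>/stat`. It is only an illustration, not dumb-init's code, and it does not reproduce the stuck zombies described in this thread. `proc_state` and `demo_zombie_then_reap` are hypothetical names.

```python
import subprocess
import sys
import time
from typing import Optional, Tuple


def proc_state(pid: int) -> Optional[str]:
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ..."; comm may contain spaces or ')',
    # so split on the last ')' before taking the state field.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie_then_reap(timeout: float = 5.0) -> Tuple[Optional[str], Optional[str]]:
    """Start a short-lived child, observe it as a zombie, then reap it."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + timeout
    state = proc_state(child.pid)
    # Until the parent waits, the exited child remains visible in state Z.
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(child.pid)
    before = state
    child.wait()  # reaping: the parent collects the exit status
    after = proc_state(child.pid)
    return before, after


if __name__ == "__main__":
    print(demo_zombie_then_reap())
```

If the state stays Z even though the parent is waiting, as reported here, the parent's reaping is probably not the problem.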

@asottile
Contributor

@taladar uname -a would be helpful -- at Yelp we think it's tied to a specific kernel version (we've since downgraded and the problem has gone away)

@taladar

taladar commented Feb 28, 2016

The kernel version in use is

Linux <hostname> 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3~bpo70+1 (2016-01-19) x86_64 GNU/Linux

The system has been experiencing this problem only for a few days. As far as I can tell from the apt history.log and uprecords (which only lists the version in the format at the front of uname) the kernel used before the problem occurred was

3.16.7-ckt11-1+deb8u6~bpo70+1

Which versions were affected/unaffected for you?

@taladar

taladar commented Feb 28, 2016

This looks like moby/moby#18180

@asottile
Contributor

Linux <hostname> 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015 x86_64 GNU/Linux

is what we're running -- the version we were having trouble with was 3.13.0-79 iirc (not 100% sure here). But it definitely does look like moby/moby#18180

@asottile
Contributor

asottile commented Mar 2, 2016

@taladar++

The workarounds in moby/moby#18180 seem to have fixed our issues

@asottile asottile closed this as completed Mar 2, 2016
@tianlai
Author

tianlai commented Mar 2, 2016

I downgraded the Linux kernel and it seems to be working now.
