Docker 1.9.1 hanging at build step "Setting up ca-certificates-java" #18180

Closed
jredl-va opened this Issue Nov 24, 2015 · 258 comments

@jredl-va

A few of us within the office upgraded to the latest version of docker toolbox backed by Docker 1.9.1 and builds are hanging as per the below build output.

docker version:

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      darwin/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64

docker info:

Containers: 10
Images: 57
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 77
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-boot2docker
Operating System: Boot2Docker 1.9.1 (TCL 6.4.1); master : cef800b - Fri Nov 20 19:33:59 UTC 2015
CPUs: 1
Total Memory: 1.956 GiB
Name: vbootstrap-vm
ID: LLM6:CASZ:KOD3:646A:XPRK:PIVB:VGJ5:JSDB:ZKAN:OUC4:E2AK:FFTC
Debug mode (server): true
 File Descriptors: 13
 Goroutines: 18
 System Time: 2015-11-24T02:03:35.597772191Z
 EventsListeners: 0
 Init SHA1: 
 Init Path: /usr/local/bin/docker
 Docker Root Dir: /mnt/sda1/var/lib/docker
Labels:
 provider=virtualbox

uname -a:

Darwin JRedl-MB-Pro.local 15.0.0 Darwin Kernel Version 15.0.0: Sat Sep 19 15:53:46 PDT 2015; root:xnu-3247.10.11~1/RELEASE_X86_64 x86_64

Here is a snippet from the docker build output, which hangs on the "Setting up ca-certificates-java" line. Could this be something to do with the latest version of Docker and OpenJDK?

update-alternatives: using /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/tnameserv to provide /usr/bin/tnameserv (tnameserv) in auto mode
update-alternatives: using /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jexec to provide /usr/bin/jexec (jexec) in auto mode
Setting up ca-certificates-java (20140324) ...

Docker file example:

FROM gcr.io/google_appengine/base

# Prepare the image.
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y -qq --no-install-recommends build-essential wget curl unzip python python-dev php5-mysql php5-cli php5-cgi openjdk-7-jre-headless openssh-client python-openssl && apt-get clean

I can confirm that this is not an issue with Docker 1.9.0 or Docker Toolbox 1.9.0d. Let me know if I can provide any further information but this feels like a regression of some sort within the new version.

@bsao
bsao commented Nov 24, 2015

I am facing the same problem. I am investigating.

@Jberlinsky

We're facing the same problem as well.

@bsao
bsao commented Nov 24, 2015

Yep, it is a problem in Docker 1.9. I downgraded to 1.8.3 and all the problems were solved. Now I am investigating a workaround and will post it here. Thanks!

@alex-jestin-taylor

I'm having the same issue with docker 1.9.1a

@bean5
bean5 commented Nov 24, 2015

I have docker 1.8.3, so maybe the process of installing a different version of docker remedies the situation. @bsao.

@loebpaul

having this same issue with docker version 1.9.1, build a34a1d5

@crosbymichael
Member

Are you only seeing this on boot2docker?

@crosbymichael
Member

I cannot repro on a stock Ubuntu with aufs or on my machine. Let me try with boot2docker to see if I can repro there.

@marianmoldovan

+1 in Docker 1.9.1 for ubuntu:14.10 using OSX

@bean5
bean5 commented Nov 25, 2015

This is an issue that started appearing after I turned on VPN for work. Even after I turned off VPN and restarted the docker machine on OSX it continued to have this problem. I re-installed Docker 1.9.1 and then 1.8.3, still seeing the issue. Blocks me from using most if not all of my dockers on the Mac.

@maxwellpeterson-wf

+1 in Docker 1.9.1 for ubuntu 12.04 using OS X 10.11

@chico1198

Developers in my office came across this by accident too.

This version/build worked: Docker version 1.9.0, build 76d6bc9

This version/build hung: Docker version 1.9.1, build a34a1d5

@ximenesuk ximenesuk referenced this issue in ome/omero-install Nov 26, 2015
Merged

Added docker file for debian8 apache24 #60

@jredl-va

@crosbymichael I unfortunately have not tried it on any other environment than Boot2Docker.

@bean5
bean5 commented Nov 27, 2015

Someone with the know-how of git-bisecting and docker could use the build IDs provided by @chico1198!

@abuechler

I experienced the same problem with 1.9.1 on OSX El Capitan, downgrading to 1.9.0 didn't help.

@dannymcpherson

Same issue here on OSX 10.9.3 with:
Docker version 1.9.1, build a34a1d5
Docker version 1.9.0, build 76d6bc9

@sevein
sevein commented Nov 28, 2015

@crosbymichael I logged in boot2docker and ran ps auxf, this is what I saw:

root      1290  0.4  1.8 1346656 75692 ?       Sl   Nov27   4:53 /usr/local/bin/docker daemon -D -g /var/lib/docker -H unix:// -H tcp://0.0.0.0:2376 [...]
root      8556  0.0  0.0      0     0 ?        Ss   05:12   0:00  \_ [sh]
root     24221 99.8  0.0      0     0 ?        Zl   05:33  64:17  |   \_ [java] <defunct>
root     24657  0.0  0.0      0     0 ?        Ss   06:07   0:00  \_ [sh]
root      6174 79.6  0.0      0     0 ?        Zl   06:22  12:33      \_ [java] <defunct>
root      7295 49.3  0.0      0     0 ?        Zl   06:32   2:49      \_ [java] <defunct>
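
For anyone checking whether a VM is in the same state, here is a minimal Python sketch (an illustration, not something posted in this thread) that scans `ps aux`-style lines for zombie processes that are still accumulating CPU. The function name `busy_zombies` and the 10% threshold are assumptions chosen for the example; the sample lines are copied from the output above.

```python
def busy_zombies(ps_lines, min_cpu=10.0):
    """Return (pid, cpu) for rows whose STAT starts with 'Z' and %CPU >= min_cpu.

    Assumes the usual `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    found = []
    for line in ps_lines:
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and anything malformed
        pid, cpu, stat = int(fields[1]), float(fields[2]), fields[7]
        if stat.startswith("Z") and cpu >= min_cpu:
            found.append((pid, cpu))
    return found


# Rows copied from the `ps auxf` output in this comment.
SAMPLE = [
    r"root      8556  0.0  0.0      0     0 ?        Ss   05:12   0:00  \_ [sh]",
    r"root     24221 99.8  0.0      0     0 ?        Zl   05:33  64:17  |   \_ [java] <defunct>",
    r"root     24657  0.0  0.0      0     0 ?        Ss   06:07   0:00  \_ [sh]",
    r"root      6174 79.6  0.0      0     0 ?        Zl   06:22  12:33      \_ [java] <defunct>",
    r"root      7295 49.3  0.0      0     0 ?        Zl   06:32   2:49      \_ [java] <defunct>",
]

zombies = busy_zombies(SAMPLE)
```

A zombie normally uses no CPU at all, so any row this returns is worth reporting along with the kernel version.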
@osterman

+1 with docker 1.9.1 on OSX 10.11 when attempting to build an image from ubuntu 14.04

@junxy
junxy commented Nov 30, 2015

+1
Using DockerToolbox-1.9.1a.pkg

docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      darwin/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64
@osterman

Downgrading to Docker 1.8.3 is my temporary workaround. Here's the target I use in my Makefile.

downgrade-docker:
  docker-machine ssh $(DOCKER_MACHINE_NAME) sudo /etc/init.d/docker stop
  docker-machine ssh $(DOCKER_MACHINE_NAME) "while sudo /etc/init.d/docker status ; do sleep 1; done"
  docker-machine ssh $(DOCKER_MACHINE_NAME) "sudo curl 'https://get.docker.com/builds/Linux/x86_64/docker-1.8.3' -o /usr/local/bin/docker-1.8.3"
  docker-machine ssh $(DOCKER_MACHINE_NAME) "sudo ln -sf /usr/local/bin/docker-1.8.3 /usr/local/bin/docker"
  # FIXME: Starting machine is not enough; always fails with message like "Need TLS certs for 127.0.0.1,10.0.2.15,192.168.99.100"
  #docker-machine ssh $(DOCKER_MACHINE_NAME) sudo /etc/init.d/docker start
  docker-machine stop $(DOCKER_MACHINE_NAME) 
  docker-machine start $(DOCKER_MACHINE_NAME) 
@thaJeztah thaJeztah added this to the 1.9.2 milestone Nov 30, 2015
@tiborvass
Contributor

I couldn't reproduce this. Does it always hang at "setting up certificates"? Did you try sending a ^D to close some pipe? Can you also try sending a SIGUSR1 to the daemon and pasting the stack trace here when it's stuck?
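
A minimal Python sketch of the SIGUSR1 request. The helper name `request_stack_dump` is an assumption for illustration, and you have to look up the daemon's PID yourself (for example with `ps` inside the VM). The sketch only delivers the signal; where the daemon writes the resulting stack trace depends on the Docker version and how it was started.

```python
import os
import signal


def request_stack_dump(pid):
    """Send SIGUSR1 to the process with the given PID and return that PID.

    Raises ProcessLookupError if no such process exists and
    PermissionError if the caller may not signal it.
    """
    os.kill(pid, signal.SIGUSR1)
    return pid
```

This needs to run as a user allowed to signal the daemon, which on boot2docker probably means root.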

@leafjiang

+1 with docker 1.9.1 on OS X 10.10

I tried downgrading to 1.8.3 using @osterman's Makefile and also had trouble with SSH:

ip-10-100-0-211:docker-dev leaf$ docker-machine start default
(default) OUT | Starting VM...
Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded
@carsten-ulrich-saitow-ag

Tested it by doing different openjdk installs inside debian:jessie and ubuntu
OSX 10.11.1, boot2docker 1.9.1: hangs
OSX 10.11.1, boot2docker 1.9.0: works
Ubuntu 14.04 with docker 1.9.1: works

The boot2docker vms were created with:
docker-machine create -d virtualbox --virtualbox-boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v1.9.0/boot2docker.iso
and
docker-machine create -d virtualbox --virtualbox-boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v1.9.1/boot2docker.iso

On Ubuntu 14.04 docker was installed following the documentation on https://docs.docker.com/engine/installation/ubuntulinux/

@Lohhari
Lohhari commented Nov 30, 2015

+1, running docker 1.9.1 build a34a1d5 on OSX Yosemite 10.10.5.

@cpuguy83
Contributor

I can't reproduce this.

@patsa
patsa commented Nov 30, 2015

Same issue here.
Is there any way to downgrade to an earlier version on Windows?

@psanders
psanders commented Dec 1, 2015

+1, docker 1.9.1 @ El Capitan

@virtualzone

+1, Docker 1.9.1 on OS X 10.11.1

@patrikjensen

+1, Docker 1.9.1a, OS X 10.10.5

@jebbench
jebbench commented Dec 1, 2015

+1, Docker 1.9.1 build a34a1d5, Windows 10

@crazyball

+1, Docker 1.9.1 build a34a1d5, OS X 10.11.1, Docker-Machine 0.5.1 build 7e8e38e

@rogierslag

+1

Same on Docker-machine on OSX 10.11.1
Docker version 1.9.1, build a34a1d5
docker-machine version 0.5.1 (HEAD)

@thaJeztah
Member

I'm able to reproduce this on docker-machine, OS X 10.10.5, so this may be something related to boot2docker. docker top also gives me <defunct> for a java process:

docker top dreamy_sammet                                                                  Tue Dec  1 15:58:47 2015
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                2538                1023                0                   14:44               ?                   00:00:00            /bin/sh -c apt-get update && apt-get install -y -qq --no-install-recommends build-essential wget curl unzip python python-dev php5-mysql php5-cli php5-cgi openjdk-7-jre-headless openssh-client python-openssl && apt-get clean
root                2566                2538                1                   14:44               ?                   00:00:16            apt-get install -y -qq --no-install-recommends build-essential wget curl unzip python python-dev php5-mysql php5-cli php5-cgi openjdk-7-jre-headless openssh-client python-openssl
root                4830                2566                0                   14:46               pts/0               00:00:00            /usr/bin/dpkg --status-fd 14 --configure libgdbm3:amd64 libjson-c2:amd64 libbsd0:amd64 libedit2:amd64 libkeyutils1:amd64 libkrb5support0:amd64 libk5crypto3:amd64 libkrb5-3:amd64 libgssapi-krb5-2:amd64 libidn11:amd64 libsasl2-modules-db:amd64 libsasl2-2:amd64 libldap-2.4-2:amd64 libmagic1:amd64 libsqlite3-0:amd64 libwrap0:amd64 libxml2:amd64 perl-modules:all perl:amd64 mime-support:all libexpat1:amd64 libpython2.7-stdlib:amd64 python2.7:amd64 libpython-stdlib:amd64 python:amd64 libasan1:amd64 libasyncns0:amd64 libatomic1:amd64 libavahi-common-data:amd64 libavahi-common3:amd64 libdbus-1-3:amd64 libavahi-client3:amd64 libcilkrts5:amd64 libisl10:amd64 libcloog-isl4:amd64 libcups2:amd64 librtmp1:amd64 libssh2-1:amd64 libcurl3:amd64 libogg0:amd64 libflac8:amd64 libpng12-0:amd64 libfreetype6:amd64 ucf:all fonts-dejavu-core:all fontconfig-config:all libfontconfig1:amd64 libglib2.0-0:amd64 libgomp1:amd64 x11-common:all libice6:amd64 libicu52:amd64 libitm1:amd64 liblcms2-2:amd64 liblsan0:amd64 libmpfr4:amd64 mysql-common:all libmysqlclient18:amd64 libnspr4:amd64 libnss3:amd64 libonig2:amd64 libpcsclite1:amd64 libsm6:amd64 libvorbis0a:amd64 libvorbisenc2:amd64 libsndfile1:amd64 libxau6:amd64 libxdmcp6:amd64 libxcb1:amd64 libx11-data:all libx11-6:amd64 libx11-xcb1:amd64 libxext6:amd64 libxi6:amd64 libxtst6:amd64 libpulse0:amd64 libpython2.7:amd64 libc-dev-bin:amd64 linux-libc-dev:amd64 libc6-dev:amd64 libexpat1-dev:amd64 libpython2.7-dev:amd64 libquadmath0:amd64 libsctp1:amd64 libtsan0:amd64 libubsan0:amd64 tzdata-java:all java-common:all libjpeg62-turbo:amd64 ca-certificates-java:all openjdk-7-jre-headless:amd64 libmpc3:amd64 libpsl0:amd64 wget:amd64 bzip2:amd64 libperl4-corelibs-perl:all lsof:amd64 openssh-client:amd64 patch:amd64 xz-utils:amd64 binutils:amd64 cpp-4.9:amd64 cpp:amd64 libgcc-4.9-dev:amd64 gcc-4.9:amd64 gcc:amd64 
libstdc++-4.9-dev:amd64 g++-4.9:amd64 g++:amd64 make:amd64 libtimedate-perl:all libdpkg-perl:all dpkg-dev:all build-essential:amd64 curl:amd64 libpython-dev:amd64 libqdbm14:amd64 psmisc:amd64 php5-common:amd64 php5-json:amd64 php5-cli:amd64 php5-cgi:amd64 php5-mysql:amd64 python-ply:all python-pycparser:all python-cffi:amd64 python-pkg-resources:all python-six:all python-cryptography:amd64 python2.7-dev:amd64 python-dev:amd64 python-openssl:all unzip:amd64
root                6711                4830                0                   14:46               pts/0               00:00:00            /bin/bash /var/lib/dpkg/info/ca-certificates-java.postinst configure
root                6725                6711                97                  14:46               pts/0               00:12:25            [java] <defunct>
@thaJeztah
Member

/cc @tianon @nathanleclaire @jeffdm perhaps one of you has an idea where to look or what to debug; I couldn't really find anything.

@tianon
Member
tianon commented Dec 1, 2015
@thaJeztah
Member

Looks like memory is not the problem; however, the <defunct> process does consume 100% CPU:

CONTAINER           CPU %               MEM USAGE / LIMIT   MEM %               NET I/O               BLOCK I/O
d263da116bfd        99.51%              689.3 MB / 2.1 GB   32.82%              157.9 MB / 2.754 MB   25.15 MB / 130.4 MB

The container seems to be stuck as well, and I had to reboot the vm to get it killed

@mlapierre

+1 Docker version 1.9.1, build a34a1d5, Win 7.

I've run into similar problems that turned out to be OOM, even though the stats command shows memory available to the container. The problem happened soon after task manager showed 0 free physical memory, while stats continued to show <100%.

@thaJeztah
Member

The weird thing is that the process kept running, so it was not killed. I can retry with -m; however, it's strange that this happens on 1.9.x but (following this discussion) not on 1.8. Also, running the same build on a 1GB DigitalOcean droplet (also 1.9.1) succeeded. Perhaps that one uses swap; I should check that.

@Lohhari
Lohhari commented Dec 1, 2015

It actually kept happening to me after I uninstalled 1.9.1 and installed 1.8.3. Looked like the uninstall wasn't very thorough though on Mac because firing up the shell was without delay on 1.8.3, unlike a normal first run where it sets up ssh keys and stuff.

@GordonTheTurtle

USER POLL

The best way to get notified when there are changes in this discussion is by clicking the Subscribe button in the top right.

The people listed below have appreciated your meaningful discussion with a random +1:

@mattes

@bean5
bean5 commented Dec 1, 2015

31 participants on this issue and counting.

@thaJeztah
Member

@bean5 please keep your comments constructive

@bean5
bean5 commented Dec 1, 2015

@thaJeztah I didn't mean to offend or be unconstructive. I meant to draw attention to the fact that GitHub shows the number of people participating, and I gathered that @GordonTheTurtle wanted to construct a list of people who have done +1. Maybe I was confused by what he meant. In any case, I am watching this issue with great anticipation since it has affected me on more than one occasion in the past weeks. I am glad we have information from various users.

@nathanleclaire
Contributor

I am able to duplicate this issue on my setup (using Docker Machine on Mac).

Here are my findings so far.

As noted by other posters, the simplest way to duplicate this has been to use the boot2docker 1.9.1 ISO with AUFS. This Dockerfile should minimally reproduce the problem fairly quickly:

FROM debian:jessie

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-7-jre-headless

Looking at dmesg, I see some AUFS errors after attempting such a build, but I am not 100% sure they are related:

docker@default:~$ dmesg | tail
aufs au_opts_verify:1597:docker[14186]: dirperm1 breaks the protection by the permission bits on the lower branch
aufs au_opts_verify:1597:docker[14186]: dirperm1 breaks the protection by the permission bits on the lower branch
aufs au_opts_verify:1597:docker[14186]: dirperm1 breaks the protection by the permission bits on the lower branch
device veth955cc15 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): veth955cc15: link is not ready
eth0: renamed from vethc63e038
IPv6: ADDRCONF(NETDEV_CHANGE): veth955cc15: link becomes ready
docker0: port 2(veth955cc15) entered forwarding state
docker0: port 2(veth955cc15) entered forwarding state
docker0: port 2(veth955cc15) entered forwarding state

If I create a Docker 1.9.1 machine which uses overlay as the storage driver:

$ docker-machine create -d virtualbox --engine-storage-driver overlay overlay

The process does NOT hang and this line runs successfully! Looks like AUFS and/or kernel is the problem.

boot2docker/boot2docker did bump both the kernel version and the AUFS commit for the 1.9.1 release, so both are factors which need to be ruled out or investigated further.

Currently trying the 1.9.0 ISO with a 1.9.1 binary to see if the surface area of the potential bug can be reduced further.

@nathanleclaire
Contributor

The Dockerfile will build fine and not hang on a boot2docker 1.9.0 ISO with a Docker 1.9.1 binary. The issue seems not to lie with Docker 1.9.1, but rather the environment in which it is being run.
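
The elimination reasoning so far can be sketched in a few lines of Python. This is only an illustration: the factor names (`iso`, `docker`, `storage`) and the function `suspect_combinations` are made up for the example, and the three observations are transcribed from the two comments above. Note that `iso` stands for everything the boot2docker ISO bundles, including the kernel and the AUFS version.

```python
from itertools import combinations

# Observations transcribed from the comments above.
OBSERVATIONS = [
    {"iso": "1.9.1", "docker": "1.9.1", "storage": "aufs", "hangs": True},
    {"iso": "1.9.1", "docker": "1.9.1", "storage": "overlay", "hangs": False},
    {"iso": "1.9.0", "docker": "1.9.1", "storage": "aufs", "hangs": False},
]


def suspect_combinations(observations, size):
    """Return (factors, values) seen in a hanging setup but in no working setup."""
    failing = [o for o in observations if o["hangs"]]
    working = [o for o in observations if not o["hangs"]]
    factors = [name for name in observations[0] if name != "hangs"]
    suspects = []
    for combo in combinations(factors, size):
        for bad in failing:
            values = tuple(bad[name] for name in combo)
            seen_working = any(
                tuple(ok[name] for name in combo) == values for ok in working
            )
            if not seen_working and (combo, values) not in suspects:
                suspects.append((combo, values))
    return suspects


single = suspect_combinations(OBSERVATIONS, 1)
pairs = suspect_combinations(OBSERVATIONS, 2)
```

With these three data points no single factor is enough on its own (`single` is empty), and the only pair that never appears in a working setup is the 1.9.1 ISO together with aufs. That matches the reading that the Docker 1.9.1 binary by itself is not the trigger.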

@cpuguy83
Contributor
cpuguy83 commented Dec 1, 2015

I am using the 1.9.1 release with no issue on aufs, but have significantly more cpu/ram/storage than the default machine config.

@thaJeztah
Member

I just tried raising the memory to 4GB for my VM, but I am still able to reproduce it.

@nathanleclaire
Contributor

@cpuguy83 AUFS on boot2docker 1.9.1?

@nathanleclaire
Contributor

As noted above, b2d bundles a very specific version of AUFS.

@cpuguy83
Contributor
cpuguy83 commented Dec 1, 2015

Yep

@thaJeztah
Member

Containers: 13
Images: 191
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 221
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-boot2docker
Operating System: Boot2Docker 1.9.1 (TCL 6.4.1); master : cef800b - Fri Nov 20 19:33:59 UTC 2015
CPUs: 1
Total Memory: 3.859 GiB
Name: default
ID: XMQH:4YAW:ZDSA:OWC7:GAPC:US5P:YQ4M:SVMQ:VXNL:RRZC:YNHT:ZBHE
Debug mode (server): true
 File Descriptors: 12
 Goroutines: 19
 System Time: 2015-12-01T23:05:28.760107918Z
 EventsListeners: 0
 Init SHA1:
 Init Path: /usr/local/bin/docker
 Docker Root Dir: /mnt/sda1/var/lib/docker
Labels:
 provider=virtualbox
@andrewgdavis

I also see some java processes becoming defunct in a container. I am able to reproduce this issue with the following steps.

Run the container:

docker run --rm -it myJavaContainerFromCentos7 bash

create Foo.java with the following:

class Foo {
    public static void main (String[] a) {
        System.out.println("hello world");
    }
}

Compiling and running it results in a defunct java process, with 1 core at 100% CPU:
javac Foo.java && java Foo

however... if a System.exit(0); is added after the println everything is ok:

class Foo {
    public static void main (String[] a) {
        System.out.println("hello world");
        System.exit(0);  // clean exit, no hang
    }
}

version info:
osx 10.10.3
docker 1.9.1
boot2docker version 1.9.1 uname -a is "linux ci 4.1.13-boot2docker"
numproc = 1

strace output with System.exit(0);

open("/usr/java/jdk1.7.0_75/jre/lib/amd64/jvm.cfg", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=677, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f27b1dab000
read(3, "# Copyright (c) 2003, Oracle and"..., 4096) = 677
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x7f27b1dab000, 4096)            = 0
stat("/usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so", {st_mode=S_IFREG|0755, st_size=15224066, ...}) = 0
futex(0x7f27b17580d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\245\36\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=15224066, ...}) = 0
mmap(NULL, 15167976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f27b031c000
mprotect(0x7f27b0e8f000, 2097152, PROT_NONE) = 0
mmap(0x7f27b108f000, 802816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb73000) = 0x7f27b108f000
mmap(0x7f27b1153000, 262632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f27b1153000
close(3)                                = 0
open("/usr/java/jdk1.7.0_75/bin/../lib/amd64/jli/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=11922, ...}) = 0
mmap(NULL, 11922, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f27b1da9000
close(3)                                = 0
open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260T\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1141552, ...}) = 0
mmap(NULL, 3150168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f27b001a000
mprotect(0x7f27b011b000, 2093056, PROT_NONE) = 0
mmap(0x7f27b031a000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x100000) = 0x7f27b031a000
close(3)                                = 0
mprotect(0x7f27b031a000, 4096, PROT_READ) = 0
munmap(0x7f27b1da9000, 11922)           = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f27b1ca4000
mprotect(0x7f27b1ca4000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f27b1da3fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f27b1da49d0, tls=0x7f27b1da4700, child_tidptr=0x7f27b1da49d0) = 118
futex(0x7f27b1da49d0, FUTEX_WAIT, 118, NULLhellowerld
 <unfinished ...>
 +++ exited with 0 +++

strace output without System.exit(0);

open("/usr/java/jdk1.7.0_75/jre/lib/amd64/jvm.cfg", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=677, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fac9a490000
read(3, "# Copyright (c) 2003, Oracle and"..., 4096) = 677
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x7fac9a490000, 4096)            = 0
stat("/usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so", {st_mode=S_IFREG|0755, st_size=15224066, ...}) = 0
futex(0x7fac99e3d0d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\245\36\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=15224066, ...}) = 0
mmap(NULL, 15167976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac98a01000
mprotect(0x7fac99574000, 2097152, PROT_NONE) = 0
mmap(0x7fac99774000, 802816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb73000) = 0x7fac99774000
mmap(0x7fac99838000, 262632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac99838000
close(3)                                = 0
open("/usr/java/jdk1.7.0_75/bin/../lib/amd64/jli/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=11922, ...}) = 0
mmap(NULL, 11922, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fac9a48e000
close(3)                                = 0
open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260T\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1141552, ...}) = 0
mmap(NULL, 3150168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac986ff000
mprotect(0x7fac98800000, 2093056, PROT_NONE) = 0
mmap(0x7fac989ff000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x100000) = 0x7fac989ff000
close(3)                                = 0
mprotect(0x7fac989ff000, 4096, PROT_READ) = 0
munmap(0x7fac9a48e000, 11922)           = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fac9a389000
mprotect(0x7fac9a389000, 4096, PROT_NONE) = 0
clone(child_stack=0x7fac9a488fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fac9a4899d0, tls=0x7fac9a489700, child_tidptr=0x7fac9a4899d0) = 142
futex(0x7fac9a4899d0, FUTEX_WAIT, 142, NULLhellowerld
) = 0
exit_group(0)                           = ?

the process is now hung but you can enter the container:

docker exec -it myContainer bash

and see the following:

ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 23:47 ?        00:00:00 bash
root       138     1  0 23:51 ?        00:00:00 strace java Foo
root       141   138 24 23:51 ?        00:01:21 [java] <defunct>
root       151     0  1 23:57 ?        00:00:00 bash
root       167   151  0 23:57 ?        00:00:00 ps -ef

quick look at stats:

CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O               BLOCK I/O
myContainer                  24.72%              64.18 MB / 8.365 GB   0.77%               11.09 MB / 202.6 kB   8.192 kB / 14.99

Everything works fine in 1.8.3.
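
To make a repro like this scriptable, a small harness can run the command with a deadline and report whether it exited or had to be abandoned. This is a hedged Python sketch, not part of the original report: `run_with_deadline` is a hypothetical helper, and it uses the standard `subprocess.run` timeout, which kills the child and waits for it. On an affected kernel that cleanup may itself get stuck, so treat the sketch as a way to automate the check on a disposable VM.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd and return ("exited", returncode), or ("hung", None) on timeout."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return "hung", None
    return "exited", completed.returncode


# Stand-ins for `java Foo`: one command that finishes, one that does not.
quick = run_with_deadline([sys.executable, "-c", "print('hello world')"], 30)
slow = run_with_deadline([sys.executable, "-c", "import time; time.sleep(30)"], 1)
```

Inside the container the command would be `["java", "Foo"]` with a deadline of a few seconds.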

@lerox
lerox commented Dec 2, 2015

+1, Docker version 1.9.1, build a34a1d5, OS X

@moutend
moutend commented Dec 2, 2015

+1, Docker version 1.9.1, build a34a1d5, OS X 10.10.5, Docker Machine Version: 0.5.1 (HEAD)

@liuml07
liuml07 commented Dec 2, 2015

+1

Docker version 1.9.1, build a34a1d5, OS X 10.11.1 (15B42)

@jjmartres

+1

Docker version 1.9.1, build a34a1d5 on OS X 10.11.1

@WardF WardF added a commit to Unidata/cloudidv that referenced this issue Dec 3, 2015
@WardF WardF Added a work around for docker bug in 1.9.1. See docker/docker#18180 …
…for more information.
13b3030
@icecrime icecrime removed this from the 1.9.2 milestone Dec 3, 2015
@nathanleclaire
Contributor

This issue really is quite bizarre. If I strace the failing apt-get command, the end of the output is:

stat("/etc/apt/sources.list", {st_mode=S_IFREG|0644, st_size=161, ...}) = 0
open("/etc/apt/sources.list", O_RDONLY) = 5
read(5, "deb http://httpredir.debian.org/"..., 8191) = 161
pipe([6, 7])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc6fc88aa10) = 14
close(7)                                = 0
fcntl(6, F_GETFL)                       = 0 (flags O_RDONLY)
fstat(6, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc6fc892000
lseek(6, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(6, Process 14 attached
 <unfinished ...>
[pid    14] rt_sigaction(SIGPIPE, {SIG_DFL, [PIPE], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, 8) = 0
[pid    14] rt_sigaction(SIGQUIT, {SIG_DFL, [QUIT], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_DFL, [], 0}, 8) = 0
[pid    14] rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_DFL, [], 0}, 8) = 0
[pid    14] rt_sigaction(SIGWINCH, {SIG_DFL, [WINCH], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {0x7fc6fc0e5750, [WINCH], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, 8) = 0
[pid    14] rt_sigaction(SIGCONT, {SIG_DFL, [CONT], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_DFL, [], 0}, 8) = 0
[pid    14] rt_sigaction(SIGTSTP, {SIG_DFL, [TSTP], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_DFL, [], 0}, 8) = 0
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(6, F_SETFD, FD_CLOEXEC) = 0
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(7, F_SETFD, FD_CLOEXEC) = 0
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(8, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(9, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(10, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
[pid    14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid    14] fcntl(11, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)

Where those (Bad file descriptor) errors continue to loop indefinitely.
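
The shape of that trace, one `fcntl` per descriptor number with EBADF for every number that is not open, can be mimicked with the Python sketch below. It is an illustration only: `probe_descriptors` is a hypothetical name, and it probes with `F_GETFD` instead of setting `FD_CLOEXEC` so that it does not alter the calling process. With the limit of 1024*1024 shown in the trace, a full pass over the range makes on the order of a million `fcntl` calls, nearly all of them failing with EBADF.

```python
import errno
import fcntl
import resource


def probe_descriptors(first_fd=3, limit=None):
    """Probe every fd in [first_fd, limit) and return (open_fds, ebadf_count).

    When limit is None, the soft RLIMIT_NOFILE value is used, mirroring the
    getrlimit() calls visible in the trace.
    """
    if limit is None:
        limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = []
    ebadf_count = 0
    for fd in range(first_fd, limit):
        try:
            fcntl.fcntl(fd, fcntl.F_GETFD)
            open_fds.append(fd)
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise
            ebadf_count += 1
    return open_fds, ebadf_count
```

Calling `probe_descriptors(limit=64)` checks only descriptors 3 through 63, which keeps the example quick.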

@andrewgdavis

RLIMIT_NOFILE
              Specifies a value one greater than the maximum file descriptor
              number that can be opened by this process.  Attempts (open(2),
              pipe(2), dup(2), etc.)  to exceed this limit yield the error
              EMFILE.  (Historically, this limit was named RLIMIT_OFILE on
              BSD.)

SIGPIPE is failing? This might correspond to my previous post, where I saw java "hello world" causing zombie processes without an explicit "System.exit(0);". Or maybe that's a completely different issue; if so, sorry for the noise.

What happens to your CPU while it is looping indefinitely?

@nathanleclaire
Contributor

@andrewgdavis It's at 100%

[Screenshot, 2015-12-03 3:55:36 PM]

@nathanleclaire
Contributor

java "hello world" causing zombie processes without an explicit "System.exit(0);"

That certainly sounds similar to the problem encountered here.

@tianon
Member
tianon commented Dec 4, 2015

I can definitely confirm the b2d issue (even did the bisect to track it most positively to the 4.1.13 kernel bump). I can also reproduce on 4.2.6 with b2d.

As an additional kink, my Gentoo host is currently on 4.1.13 + AUFS patches also, and I'm seeing the same exact problem, so we've definitely ruled out anything b2d-specific.

I think it might be worth trawling through commits between 4.1.12 and 4.1.13 to see if anything that might be related jumps out.

(ie, https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.1.13)
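
The bisect itself is a binary search over an ordered list of candidates with one known-good end and one known-bad end. Below is a minimal Python sketch of the idea; `first_bad` and the commit labels are hypothetical, and in practice `is_bad` would mean "build a kernel at this commit, boot it, and see whether the Dockerfile hangs".

```python
def first_bad(candidates, is_bad):
    """Return the first candidate for which is_bad() is true.

    Assumes candidates are ordered oldest to newest, candidates[0] is known
    good, candidates[-1] is known bad, and every candidate after the first
    bad one is also bad.
    """
    lo, hi = 0, len(candidates) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return candidates[hi]


# Hypothetical commit labels between a good and a bad kernel release.
commits = ["c%d" % n for n in range(10)]
tested = []


def hangs(commit):
    tested.append(commit)
    return int(commit[1:]) >= 6  # pretend the regression landed in c6


culprit = first_bad(commits, hangs)
```

Here ten candidates need only three builds, which matters when each test means baking and booting an ISO.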

@nathanleclaire
Contributor

Yup, something breaks from kernel 4.1.12 => 4.1.13. I can confirm that baking a boot2docker ISO for the former doesn't trip this bug but the latter does.

@nathanleclaire
Contributor

So, it's not specifically related to boot2docker, but seems to be related to the kernel version interacting with AUFS.

@tianon
Member
tianon commented Dec 4, 2015
@andrewgdavis

I've got a harebrained theory...

http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.1.13&id=6c0da28df5dac10672efe955eb89051a850008eb

The commit above changes generic_perform_write(struct file *file, struct iov_iter *i, loff_t pos) in filemap.c.

Below is the chunk of code I personally want to test, because the comments describe both deadlock and livelock conditions and I see the CPU pegged at 100%. But that's just me and my jump-to-conclusions mat.

4.1.13 mm/filemap.c#l_2448

...
again:
        /*
         * Bring in the user page that we will copy from _first_.
         * Otherwise there's a nasty deadlock on copying from the
         * same page as we're writing to, without it being marked
         * up-to-date.
         *
         * Not only is this an optimisation, but it is also required
         * to check that the address is actually valid, when atomic
         * usercopies are used, below.
         */
        if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
            status = -EFAULT;
            break;
        }

        if (fatal_signal_pending(current)) {
            status = -EINTR;
            break;
        }

        status = a_ops->write_begin(file, mapping, pos, bytes, flags,
                        &page, &fsdata);
        if (unlikely(status < 0))
            break;

        if (mapping_writably_mapped(mapping))
            flush_dcache_page(page);

        copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
        flush_dcache_page(page);

        status = a_ops->write_end(file, mapping, pos, bytes, copied,
                        page, fsdata);
        if (unlikely(status < 0))
            break;
        copied = status;

        cond_resched();

        iov_iter_advance(i, copied);
        if (unlikely(copied == 0)) {
            /*
             * If we were unable to copy any data at all, we must
             * fall back to a single segment length write.
             *
             * If we didn't fallback here, we could livelock
             * because not all segments in the iov can be copied at
             * once without a pagefault.
             */
            bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
                        iov_iter_single_seg_count(i));
            goto again;
        }
        pos += copied;
        written += copied;

        balance_dirty_pages_ratelimited(mapping);
    } while (iov_iter_count(i));
@bean5
bean5 commented Dec 4, 2015

@andrewgdavis one could use that commit during git bisect as a specific testing point!

@jakirkham

Seeing a similar hang when shutting down mongodb. Definitely present in 1.9.x. Not present in 1.8.x.

@virtualzone

I've been able to solve this issue for myself by increasing the docker-machine VM's memory from 1024 to 2048 MB and assigning 2 CPUs instead of 1.

@marsinvasion

Works:

VM: Ubuntu 14.04 (2gb ram)
Docker Engine: 1.9.1
Docker base image: ubuntu:latest

Does not work:

VM: Ubuntu 15.10 (2 gb ram)
Docker Engine: 1.9.1,1.9.0,1.8.3
Docker base image: ubuntu:latest, ubuntu:14.04

@nathanleclaire
Contributor

@marsinvasion If possible, can you print the output of uname -a on both tested systems?

@marsinvasion

VM: Ubuntu 14.04
Linux ubuntu 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

VM: Ubuntu 15.10
Linux ubuntu 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

@egorpushkin

+1
Docker version 1.9.1, build a34a1d5 on OS X 10.11.1

@jchester
jchester commented Dec 5, 2015

Encountered on OS X 10.9.5 with docker 1.9.1.

Inspired by @marsinvasion, I got a successful workaround by giving my docker-machine 2 CPUs and 4096Mb RAM.

@jchester
jchester commented Dec 5, 2015

Oops, spoke too soon. It stopped working upon changing a Dockerfile I'm working on and re-running build.

@jamshid
Contributor
jamshid commented Dec 6, 2015

Also seeing this hellacious bug (docker-machine boot2docker 1.9.1 on OS X), from a previously building ubuntu:15.04 image. It seems to require restarting my docker server to get those zombie containers to go away.

I thought docker-library/openjdk#19 was related but maybe not, here we're getting a hang, there they got an error about not finding "java".

@escapenguin

Switched my server to overlay as a workaround. Before that it created a bunch of zombie containers as well.

Docker version 1.9.1, build a34a1d5 on OS X 10.11.1

@jamshid
Contributor
jamshid commented Dec 7, 2015

Anyone know what's involved in migrating an existing boot2docker.iso system to https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/ or is it easier to do a full rebuild? That page has ominous warnings about CentOS image builds -- what are the "yum" workarounds, is it related to #10180?

@jakirkham

Definitely not fixed by Docker Toolbox 1.9.1a. Suffering from this bug with that version. Looking back through the comments, it looks like I'm not the only one.

@Snapu
Snapu commented Dec 7, 2015

nope still not building

@dubvfan87

I had to delete the VM in virtualbox and start from scratch for it to work.

@jakirkham

Also, tried deleting and creating a new VM several times to no avail.

@andrew-svds

Installed 1.9.1a, did docker-machine rm default and used Docker Quickstart Terminal to regenerate default machine. Rebuilt images (that derive from java:7-jre) and ran, still does not work. Continues to work just fine with overlay machine built as suggested above:

$ docker-machine create -d virtualbox --engine-storage-driver overlay overlay
@Snapu
Snapu commented Dec 7, 2015

^thanks! I can confirm the overlay machine is working.

@jakirkham

Using overlay as the engine storage driver also worked for fixing the MongoDB shutdown hang.

@gmacario gmacario referenced this issue in gmacario/easy-jenkins Feb 5, 2016
Closed

Rebase on Docker 1.10.0 #29

2 of 2 tasks complete
@mcku
mcku commented Feb 7, 2016

I can confirm that this issue does not exist on Docker 1.10.0, which fixed my situation on OS X 10.11 as well. Otherwise I was going to downgrade to 1.9.0.

@jamshid
Contributor
jamshid commented Feb 7, 2016

I'm still getting the java hung container/process problem on docker 1.10:

root     30480  0.1  0.0      0     0 ?        Z    16:15   0:00 [update-hosts] <defunct>

@AkihiroSuda I'm trying your Quick Workarounds (thanks!) but I'm not able to install the older kernel on my Debian 8 (jessie) server, I get:

E: Version '3.16.7-ckt11-1+deb8u3' for 'linux-image-3.16.0-4-amd64' was not found

When I try @mikeatlas's suggestions (btw had to sudo apt-get install software-properties-common to get sudo add-apt-repository ppa:chiluk/1533043 to work) I get an update failure, which I guess is why the install doesn't work:

$ sudo add-apt-repository ppa:chiluk/1533043
You are about to add the following PPA to your system:
 This ppa contains the proposed fix for 1533043, and I would appreciate testing and results reported back to  LP#1533043.

Thank you,
 More info: https://launchpad.net/~chiluk/+archive/ubuntu/1533043
Press [ENTER] to continue or ctrl-c to cancel adding it

gpg: keyring `/tmp/tmp_j6e2_s5/secring.gpg' created
gpg: keyring `/tmp/tmp_j6e2_s5/pubring.gpg' created
gpg: requesting key E2B6D4A9 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmp_j6e2_s5/trustdb.gpg: trustdb created
gpg: key E2B6D4A9: public key "Launchpad PPA for Dave Chiluk" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK
$ sudo apt-get update
Ign http://ftp.us.debian.org jessie InRelease
Hit http://security.debian.org jessie/updates InRelease
...
Get:15 https://apt.dockerproject.org debian-jessie/main Translation-en [454 B]
Ign https://apt.dockerproject.org debian-jessie/main Translation-en
Err http://ppa.launchpad.net jessie/main amd64 Packages
  404  Not Found
Ign http://ppa.launchpad.net jessie/main Translation-en_US
Ign http://ppa.launchpad.net jessie/main Translation-en
Fetched 8,877 B in 3s (2,935 B/s)
W: Failed to fetch http://ppa.launchpad.net/chiluk/1533043/ubuntu/dists/jessie/main/binary-amd64/Packages  404  Not Found

E: Some index files failed to download. They have been ignored, or old ones used instead.

$ sudo apt-get install linux-image-3.13.0-77-generic \
>                        linux-image-extra-3.13.0-77-generic -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package linux-image-3.13.0-77-generic
E: Couldn't find any package by regex 'linux-image-3.13.0-77-generic'
E: Unable to locate package linux-image-extra-3.13.0-77-generic
E: Couldn't find any package by regex 'linux-image-extra-3.13.0-77-generic'

My docker info:

$ docker info
Containers: 98
 Running: 9
 Paused: 0
 Stopped: 89
Images: 1415
Server Version: 1.10.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1371
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.6 GiB
Name: r62
ID: VUJF:KPXB:UXL6:TP3G:75CE:WQND:PJGJ:GG45:MCMI:JTV5:Q3IR:6FHC
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Labels:
 provider=generic
@mikeatlas

@jamshid thanks for the tip about needing software-properties-common, I updated my post above.

@jamshid: After you add the PPA and do apt-get update, check and see what kernels are available to your machine... It looks like there's a newer build (3.13.0-78) but I don't see it available after running an update myself here. However, here's how you can figure out what kernels are available to install:

$ apt-cache search linux-image-3.13.0-7
[... snip older builds ...]
linux-image-3.13.0-77-generic - Linux kernel image for version 3.13.0 on 64 bit x86 SMP

If you don't see something along the lines of linux-image-3.13.0-77-generic or greater, something else must not be right.

@mikeatlas

Oh, @jamshid you're running Debian 8? Note above: Downgrade kernel to version 3.16.7-ckt11 of release 3.16.0 (apt-get install linux-image-3.16.0-4-amd64=3.16.7-ckt11-1+deb8u3) or older

@davojan
davojan commented Feb 8, 2016

apt-get install linux-image-3.16.0-4-amd64=3.16.7-ckt11-1+deb8u3 on Debian 8 gives

Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Version '3.16.7-ckt11-1+deb8u3' for 'linux-image-3.16.0-4-amd64' was not found

An hour of active googlening to find the ckt11 package didn't help.
Please, any suggestions on how to downgrade a recent Debian 8 kernel?

apt-cache policy linux-image-3.16.0-4-amd64
linux-image-3.16.0-4-amd64:
  Installed: 3.16.7-ckt20-1+deb8u3
  Candidate: 3.16.7-ckt20-1+deb8u3
  Version table:
 *** 3.16.7-ckt20-1+deb8u3 0
        500 http://security.debian.org/ jessie/updates/main amd64 Packages
        100 /var/lib/dpkg/status
     3.16.7-ckt20-1+deb8u2 0
        500 http://ftp.debian.org/debian/ jessie/main amd64 Packages
        500 http://httpredir.debian.org/debian/ jessie/main amd64 Packages
@jderrien
jderrien commented Feb 8, 2016

@davojan You can find the packages installed previously in /var/cache/apt/archives. You should be able to downgrade with dpkg -i <old_package>.deb.
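As a rough sketch of that suggestion, the helper below lists any kernel image packages still sitting in the apt cache. The function name `cached_kernel_debs` is hypothetical, and whether an older package is still cached depends on whether the cache has been cleaned since it was installed.

```shell
#!/bin/sh
# List cached kernel image packages in an apt archive directory.
# The directory defaults to /var/cache/apt/archives, as mentioned above.
cached_kernel_debs() {
    dir="${1:-/var/cache/apt/archives}"
    for f in "$dir"/linux-image-*.deb; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage: inspect the list, then install the chosen file by hand, e.g.
#   cached_kernel_debs
#   sudo dpkg -i <old_package>.deb
```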

@technolo-g

Confirmed that installing the new kernel from the PPA fixed the issue for me (Ubuntu 14.04.3 / Kernel 3.13.0-78-generic / Docker 1.9.1 )
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533043

@jamshid
Contributor
jamshid commented Feb 9, 2016

Has anyone gotten the downgrade instructions to work with Debian (not Ubuntu)? I'm wondering if the reason my apt-get update fails with the error below (after adding the ppa repo):

W: Failed to fetch http://ppa.launchpad.net/chiluk/1533043/ubuntu/dists/jessie/main/binary-amd64/Packages  404  Not Found

is that only ubuntu packages are available on https://launchpad.net/~chiluk/+archive/ubuntu/1533043? Hmm I'm confused, I thought Ubuntu was based on Debian.

@AkihiroSuda
Contributor

@jamshid Ubuntu PPAs do not support Debian.

If 3.16.7-ckt11-1+deb8u3 is unfortunately no longer available, you can patch the latest kernel: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812207#47

(I can upload the deb package, if someone needs it)

@wzrdtales

If someone needs the kernel, I have uploaded it here. It is a bit newer than deb8u3, but it does not seem to have the bug; at least I did not run into it while running this for quite a while. Patching the latest kernel is probably the better solution. However, if you need it:
https://wizardtales.com/linux-image-3.16.0-0.bpo.4-amd64_3.16.7-ckt11-1+deb8u6~bpo70+1_amd64.deb

@AkihiroSuda
Contributor

@wzrdtales 👍

@kirill-a kirill-a referenced this issue in jenkinsci/docker-plugin Feb 9, 2016
Closed

Docker containers are not deleted after build #375

@fflatorre

For those like me still struggling to get this sorted, I'd like to point you to tini as a nice workaround:
https://github.com/krallin/tini

With a few lines in the Dockerfile I got a decent init process capable of reaping zombies.
This avoids the need to switch to devicemapper.

Cheers,
Francesco

@jakirkham

So, I use tini. However, that didn't help me here as the problem I encountered was while the image was being built.

Also, when running a container, I use tini, but this still affected me.

@AkihiroSuda
Contributor

@fflatorre
Thank you for the information, but the zombie issue that tini can solve seems different from this one.
#18180 (comment)

Actually, even with tini, we can get a zombie:

FROM java:7
ENV TINI_VERSION v0.9.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "--"]
CMD ["taskset", "0x1", "java"]
$ docker build -t foobar .
$ docker run -it --rm foobar
Usage: java [-options] class [args...]
           (to execute a class)
...
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.
(hangs up here and becomes a zombie)
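To confirm from the host that the process really is a zombie, one option is to filter `ps` output on the state column. This is only a sketch: it assumes procps-style `ps -eo pid,ppid,stat,comm` output with a header line, and the function name `list_zombies` is made up for illustration.

```shell
#!/bin/sh
# Filter `ps -eo pid,ppid,stat,comm` style output on stdin and print
# "PID PPID COMMAND" for rows whose STAT column starts with Z (zombie).
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage on a Linux host:
#   ps -eo pid,ppid,stat,comm | list_zombies
```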
@fflatorre

@AkihiroSuda @jakirkham I forgot to mention that we are not experiencing this issue when building the image. We build a very basic image and then the provisioning logic is delegated to a bunch of ansible scripts. During the provisioning one of the processes (kafka) used to hang. TINI so far seems to have mitigated that issue.
I acknowledge it might not be a solution for you; indeed I'd suggest downgrading it from workaround to placebo :)
Hope we can get it sorted soon.

@gawbul
gawbul commented Feb 13, 2016

I had the same issue running Docker 1.9.1 on OSX 10.11.3:

$ docker -v
Docker version 1.9.1, build a34a1d5

Upgrading to the latest Docker Toolbox release fixed:

$ docker -v
Docker version 1.10.1, build 9e83765
@AkihiroSuda
Contributor

For information, I listed some issues and workarounds related to AUFS/Overlay/BtrFS/ZFS/devicemapper storage drivers: https://github.com/AkihiroSuda/docker-issues/

Hope this can help those who are interested in #18180 and others..

@schmunk42
Contributor

@AkihiroSuda I tried to follow the link https://launchpad.net/%7Echiluk/+archive/ubuntu/1533043/+packages but I am not allowed to view the page.

See also: docker/toolbox#318 (comment)

@AkihiroSuda
Contributor

@schmunk42
I also could not access the page.
I think chiluk is remaking packages. You can ask him at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533043

@vpetersson

For people who are having issues with this, here's how to resolve the issue on Ubuntu 14.04 using the -proposed kernel. This will of course not be relevant once the kernel graduates into the main branch.

Before we begin, we can confirm that we're running an affected kernel (i.e. <3.19.0-50 on Ubuntu 14.04) by running:

$ uname -r
3.19.0-49-generic

Since we know this, we first need to enable Proposed packages by running:

$ echo "deb http://archive.ubuntu.com/ubuntu/ trusty-proposed restricted main multiverse universe" | sudo tee -a /etc/apt/sources.list
$ echo -e "Package: *\nPin: release a=trusty-proposed\nPin-Priority: 400" | sudo tee -a  /etc/apt/preferences.d/proposed-updates

With that done, let's install the updated kernel:

$ sudo apt-get update
$ sudo apt-get install linux-image-3.19.0-50-generic/trusty-proposed linux-image-extra-3.19.0-50-generic/trusty-proposed

And then let's reboot:

$ sudo shutdown -r now

After the reboot, we can confirm that we are now running the latest kernel:

$ uname -r
3.19.0-50-generic
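The before/after `uname -r` check can also be scripted. The sketch below is one possible way, assuming GNU `sort -V` is available; the function name `version_at_least` is made up for this example, and version sort only approximates how dpkg orders kernel package versions.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU `sort -V` (version sort), which is an assumption about
# the host and only approximates dpkg's own version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage: compare the running kernel with the version installed above.
#   version_at_least "$(uname -r)" "3.19.0-50-generic" && echo "kernel is 3.19.0-50 or newer"
```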
@IainColledge

Thanks @vpetersson. I'm trying to find out what will happen when this version of the kernel is released: will it just overwrite the proposed install, or do you have to do something to go back to normal?

@vpetersson

@IainColledge Yes, I would imagine that would be the case, but I'm not entirely sure.

@AkihiroSuda
Contributor

Updated the table about Ubuntu and Debian.

LATEST QUICK WORKAROUNDS

| Distro | Workaround |
| --- | --- |
| General | Use devicemapper/overlay/btrfs (but it may cause another problem..). If you can upgrade AUFS and build the kernel manually, you can also use AUFS v20160111 or later. |
| Boot2Docker | ✅ Upgrade to v1.10.0 or later |
| Ubuntu 14.04LTS | ✅ Upgrade kernel to 3.13.0-79.123 or later |
| Ubuntu 15.04 | ✅ Upgrade kernel to 3.19.0-51.57 or later |
| Ubuntu 15.10 | ✅ Upgrade kernel to 4.2.0-30.35 or later |
| Debian 7/8 | ⬇️ Downgrade kernel to version 3.16.7-ckt11 of release 3.16.0 or older (dpkg archive by @wzrdtales) |
| Debian 9 | ✅ (does not support AUFS since kernel 3.18-1~exp1) |
| Gentoo | ✅ Upgrade to recent ones (⚠️ not tested) |
| RHEL/CentOS | ✅ (does not support AUFS) |
| openSUSE | ✅ (does not support AUFS) |

Distributors Issue Tickets

| Distro | Status | Issue URL |
| --- | --- | --- |
| Boot2Docker | ✅ Closed | boot2docker/boot2docker#1113 |
| Ubuntu | ✅ Closed | https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533043 |
| Debian | ◻️ In Progress | https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812207 |

One more thing: I listed some known bugs about storage drivers: https://github.com/AkihiroSuda/docker-issues

@AkihiroSuda AkihiroSuda added a commit to AkihiroSuda/issues-docker that referenced this issue Feb 23, 2016
@AkihiroSuda AkihiroSuda Update the workaround link (docker/docker#18180) bd28312
@fxposter

Just in case someone wants to have latest "linux_3.16.7-ckt20-1+deb8u3" debian kernel with patches, mentioned earlier - I've built it manually, and it's at https://fxposter.org/linux-image-3.16.0-4-amd64_3.16.7-ckt20-1+deb8u3a~test_amd64.deb.

@jiangty-addepar

amazing! I've been having this problem for a few weeks now, and I guess the fix for Ubuntu was just released yesterday :P

@pdsouza
pdsouza commented Feb 24, 2016

Confirming that the latest 14.04LTS kernel update to 3.19.0-51 puts an end to my java zombies. Thanks!

@AkihiroSuda
Contributor

Debian has now released fixed kernels for this issue.

LATEST QUICK WORKAROUNDS

| Distro | Workaround |
| --- | --- |
| General | Use devicemapper/overlay/btrfs (but it may cause another problem..). If you can upgrade AUFS and build the kernel manually, you can also use AUFS v20160111 or later. |
| Boot2Docker | ✅ Upgrade to v1.10.0 or later |
| Ubuntu 14.04LTS | ✅ Upgrade kernel to 3.13.0-79.123 or later |
| Ubuntu 15.04 | ✅ Upgrade kernel to 3.19.0-51.57 or later |
| Ubuntu 15.10 | ✅ Upgrade kernel to 4.2.0-30.35 or later |
| Debian 7 | ✅ Upgrade kernel to 3.2.73-2+deb7u3 (of linux-image-3.2.0-4-amd64 package) or later |
| Debian 8 | ✅ Upgrade kernel to 3.16.7-ckt20-1+deb8u4 (of linux-image-3.16.0-4-amd64 package) or later |
| Debian 9 | ✅ (does not support AUFS since kernel 3.18-1~exp1) |
| Gentoo | ✅ Upgrade to recent ones (⚠️ not tested) |
| RHEL/CentOS | ✅ (does not support AUFS) |
| openSUSE | ✅ (does not support AUFS) |

Distributors Issue Tickets

| Distro | Status | Issue URL |
| --- | --- | --- |
| Boot2Docker | ✅ Closed | boot2docker/boot2docker#1113 |
| Ubuntu | ✅ Closed | https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533043 |
| Debian | ✅ Closed | https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812207 |
@noisy
noisy commented Mar 8, 2016

upgrading kernel of 14.04LTS worked for me 👍

@Eyjafjallajokull

I'm on OSX on Boot2Docker version 1.10.2, build master : 611be10, Docker version 1.10.2, build c3959b1 and first got this from docker-compose:

Recreating docker_preview_1
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Then tried docker kill 38e1e2590dfa but the process hangs forever. docker.log:

time="2016-03-09T14:49:13.053004077Z" level=debug msg="Calling POST /v1.21/containers/38e1e2590dfa5d77482b8fbf6b14f01e8d5278622b8e5d7262cd2cdeb777690b/stop"
time="2016-03-09T14:49:13.053058084Z" level=debug msg="POST /v1.21/containers/38e1e2590dfa5d77482b8fbf6b14f01e8d5278622b8e5d7262cd2cdeb777690b/stop?t=10"
time="2016-03-09T14:49:13.053097711Z" level=debug msg="Sending 15 to 38e1e2590dfa5d77482b8fbf6b14f01e8d5278622b8e5d7262cd2cdeb777690b"
time="2016-03-09T14:49:23.053530062Z" level=info msg="Container 38e1e2590dfa5d77482b8fbf6b14f01e8d5278622b8e5d7262cd2cdeb777690b failed to exit within 10 seconds of SIGTERM - using the force"
time="2016-03-09T14:49:23.053720529Z" level=debug msg="Sending 9 to 38e1e2590dfa5d77482b8fbf6b14f01e8d5278622b8e5d7262cd2cdeb777690b"
time="2016-03-09T14:49:33.054082100Z" level=info msg="Container 38e1e2590dfa failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2016-03-09T14:49:34.254353402Z" level=debug msg="Calling GET /v1.22/containers/json"
time="2016-03-09T14:49:34.254413283Z" level=debug msg="GET /v1.22/containers/json"
time="2016-03-09T14:49:54.293708866Z" level=debug msg="Calling POST /v1.22/containers/38e1e2590dfa/kill"
time="2016-03-09T14:49:54.293752784Z" level=debug msg="POST /v1.22/containers/38e1e2590dfa/kill?signal=KILL"
time="2016-03-09T14:49:54.293802705Z" level=debug msg="Sending 9 to 38e1e2590dfa5d77482b8fbf6b14f01e8d5278622b8e5d7262cd2cdeb777690b"
time="2016-03-09T14:50:04.294276946Z" level=info msg="Container 38e1e2590dfa failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2016-03-09T14:50:26.678957119Z" level=debug msg="clean 3 unused exec commands"
@nigelgbanks nigelgbanks referenced this issue in just-containers/s6-overlay Mar 16, 2016
Closed

zombie process is not reaped #135

@sielaq sielaq referenced this issue in eBayClassifiedsGroup/PanteraS Apr 7, 2016
Closed

HAProxy keeping ports open #199

@rawlink rawlink referenced this issue in boxcutter/ubuntu Apr 25, 2016
Closed

Ubuntu docker images java process hang #57

@MichaelSimons MichaelSimons referenced this issue in dotnet/dotnet-docker Apr 26, 2016
Closed

Image appears to pin CPU #33

@azbarcea azbarcea added a commit to apifocal/docker-apacheds that referenced this issue Apr 27, 2016
@azbarcea azbarcea fix build for apacheds; overcomes docker:#18180 b663c11
@einhverfr

Just as a note (I know this is closed but not sure if it makes sense to open as a new issue). I was having the same issue on a later version until I switched to devmapper.

$ docker info
Containers: 4
 Running: 3
 Paused: 0
 Stopped: 1
Images: 81
Server Version: 1.12.1
Storage Driver: devicemapper
 Pool Name: docker-8:1-9044034-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 2.726 GB
 Data Space Total: 107.4 GB
 Data Space Available: 96.43 GB
 Metadata Space Used: 4.387 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.143 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.77 (2012-10-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.13.0-77-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.56 GiB
Name: ravn
ID: L2WX:3RQ7:W6IC:7MY3:M3ZC:7MP2:3ZMP:VHW4:TLXM:VLYO:NNZ5:2FVW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8
@AkihiroSuda
Contributor

@einhverfr The issue is fixed in kernel 3.13.0-79.123 (yours seems to be 3.13.0-77)

@martinm82

Can this issue really be solved with a Kernel upgrade? We are encountering the same problem with Docker 1.9.1 on Ubuntu 14.04 with Kernel 3.13.0-83-generic.

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:12:04 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:12:04 UTC 2015
 OS/Arch:      linux/amd64
@thaJeztah
Member

@martinm82 yes, this issue was a kernel issue. It's possible that something else is causing similar behavior, or that there's a regression in the kernel. However, please open a new issue if you're having issues on the current release; keep in mind that docker 1.9.1 is EOL, so it won't be receiving updates anymore.

@thaJeztah
Member

I am locking the discussion on this issue, because the original issue here was resolved, and I want to prevent this issue from collecting possibly unrelated issues. See this comment, #18180 (comment), for the kernel versions needed to fix this issue.

@thaJeztah thaJeztah locked and limited conversation to collaborators Feb 14, 2017