
[1.10] "runtime/cgo: pthread_create failed: Resource temporarily unavailable" on CentOS 7 #20096

Closed
mcmohd opened this issue Feb 8, 2016 · 27 comments
Labels
priority/P3 Best effort: those are nice to have / minor issues.
Milestone

Comments

@mcmohd

mcmohd commented Feb 8, 2016

Hi,

I just upgraded to Docker 1.10 and got stuck with an issue where I'm not able to create a large number of containers. I believe Docker is either hanging or crashing as soon as the number of containers goes above 500. When I checked /var/log/messages I found it reporting resource unavailability, on the same machine where I used to create around 1200 containers successfully.

Digging into this, I found that a TasksMax flag has been introduced, which limits the number of threads to 512 by default, but this flag is not supported by CentOS 7 (or any OS running a 3.10.x kernel) and gives the following error:

[/etc/systemd/system.conf:58] Unknown lvalue 'TasksMax' in section 'Manager'

Kindly suggest a way forward, because this has completely stopped our operation and we are not able to proceed with a high number of containers. I tried to remove TasksMax from the docker.service file, but still no success. Here is the detail of docker:

[root@p4029667 log]# docker info
Containers: 442
 Running: 401
 Paused: 0
 Stopped: 41
Images: 30
Server Version: 1.10.0-rc3
Storage Driver: devicemapper
 Pool Name: docker-253:1-538163109-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/vg-docker/data
 Metadata file: /dev/vg-docker/metadata
 Data Space Used: 34.29 GB
 Data Space Total: 536.9 GB
 Data Space Available: 502.6 GB
 Metadata Space Used: 299.7 MB
 Metadata Space Total: 4.295 GB
 Metadata Space Available: 3.995 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2015-10-14)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.10.0-123.20.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 188.7 GiB
Name: p4029667.pubip.serverbeach.com
ID: GYAC:IFA4:2ZBZ:FYMM:GT5G:CIIF:WSMY:3FVS:FZBU:B7LN:4WSQ:ZB6I
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Following is the version info:

[root@p4029667 log]# docker version
Client:
 Version:      1.10.0-rc3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   08c24cc
 Built:        Tue Feb  2 22:54:00 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.0-rc3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   08c24cc
 Built:        Tue Feb  2 22:54:00 2016
 OS/Arch:      linux/amd64
@GordonTheTurtle

If you are reporting a new issue, make sure that we do not have any duplicates already open. You can ensure this by searching the issue list for this repository. If there is a duplicate, please close your issue and add a comment to the existing issue instead.

If you suspect your issue is a bug, please edit your issue description to include the BUG REPORT INFORMATION shown below. If you fail to provide this information within 7 days, we cannot debug your issue and will close it. We will, however, reopen it if you later provide the information.

For more information about reporting issues, see CONTRIBUTING.md.

You don't have to include this information if this is a feature request

(This is an automated, informational response)


BUG REPORT INFORMATION

Use the commands below to provide key information from your environment:

docker version:
docker info:

Provide additional environment details (AWS, VirtualBox, physical, etc.):

List the steps to reproduce the issue:
1.
2.
3.

Describe the results you received:

Describe the results you expected:

Provide additional info you think is important:

----------END REPORT ---------

#ENEEDMOREINFO

@thaJeztah
Member

The TasksMax warning looks like a duplicate of #20036. AFAICT, the warning is only a warning and doesn't affect the way Docker runs; for older versions of systemd (and kernel versions below 4.3) it should not make a difference.

BTW, I see you're still running a release candidate (1.10.0 has been released, and a 1.10.1 patch release will be issued that resolves an issue with firewalld).

(edit: linked wrong issue)

@thaJeztah
Member

Would you be able to provide the logs you found in /var/log/messages? Also, could you see if running the daemon with -D (debug) gives anything useful?

@mcmohd
Author

mcmohd commented Feb 9, 2016

Yes @thaJeztah, I will provide the log next time. But I thought docker ps -a was fixed in release 1.10; any idea why it gets stuck when the number of instances goes beyond 500 or 550?

@mcmohd
Author

mcmohd commented Feb 9, 2016

Here is the log from messages file:

Feb  7 23:47:21 p4029667 node: docker create --memory=100m --env-file=/var/www/html/docker.env -u 37842:37842 --ulimit nproc=300 -p 37842:37842 newbase jx /home/cg/src/index.jx 37842 cpp11 1454734359-3964
Feb  7 23:47:21 p4029667 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth54cff0d: link becomes ready
Feb  7 23:47:21 p4029667 kernel: docker0: port 145(veth54cff0d) entered forwarding state
Feb  7 23:47:21 p4029667 kernel: docker0: port 145(veth54cff0d) entered forwarding state
Feb  7 23:47:21 p4029667 node: Retunring with error : Error: Command failed: runtime/cgo: pthread_create failed: Resource temporarily unavailable
Feb  7 23:47:21 p4029667 node: SIGABRT: abort
Feb  7 23:47:21 p4029667 node: PC=0x7f4f2fab85f7 m=5
Feb  7 23:47:21 p4029667 node: goroutine 0 [idle]:
Feb  7 23:47:21 p4029667 node: goroutine 1 [runnable, locked to thread]:
Feb  7 23:47:21 p4029667 node: runtime.Gosched()
Feb  7 23:47:21 p4029667 node: /usr/local/go/src/runtime/proc.go:166 +0x14
Feb  7 23:47:21 p4029667 node: github.com/docker/libnetwork/ipamutils.initGranularPredefinedNetworks(0x0, 0x0, 0x0)
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/docker/libnetwork/ipamutils/utils.go:38 +0x111
Feb  7 23:47:21 p4029667 node: github.com/docker/libnetwork/ipamutils.init.1()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/docker/libnetwork/ipamutils/utils.go:17 +0x4d
Feb  7 23:47:21 p4029667 node: github.com/docker/libnetwork/ipamutils.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/docker/libnetwork/ipamutils/utils_linux.go:74 +0x59
Feb  7 23:47:21 p4029667 node: github.com/docker/libnetwork/ipam.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/docker/libnetwork/ipam/utils.go:81 +0x5e
Feb  7 23:47:21 p4029667 node: github.com/docker/libnetwork/ipams/builtin.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/docker/libnetwork/ipams/builtin/builtin.go:35 +0x45
Feb  7 23:47:21 p4029667 node: github.com/docker/libnetwork.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/docker/libnetwork/store.go:422 +0xa6
Feb  7 23:47:21 p4029667 node: github.com/docker/docker/container.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/.gopath/src/github.com/docker/docker/container/store.go:28 +0xe9
Feb  7 23:47:21 p4029667 node: github.com/docker/docker/daemon.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/.gopath/src/github.com/docker/docker/daemon/wait.go:17 +0x5b
Feb  7 23:47:21 p4029667 node: github.com/docker/docker/api/server/router/local.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/.gopath/src/github.com/docker/docker/api/server/router/local/local.go:107 +0xa3
Feb  7 23:47:21 p4029667 node: github.com/docker/docker/api/server/router/build.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/.gopath/src/github.com/docker/docker/api/server/router/build/build_routes.go:274 +0x44
Feb  7 23:47:21 p4029667 node: github.com/docker/docker/api/server.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/.gopath/src/github.com/docker/docker/api/server/server_unix.go:132 +0xad
Feb  7 23:47:21 p4029667 node: main.init()
Feb  7 23:47:21 p4029667 node: /root/rpmbuild/BUILD/docker-engine/docker/flags.go:30 +0x92
Feb  7 23:47:21 p4029667 node: goroutine 17 [syscall, locked to thread]:
Feb  7 23:47:21 p4029667 node: runtime.goexit()
Feb  7 23:47:21 p4029667 node: /usr/local/go/src/runtime/asm_amd64.s:1721 +0x1
Feb  7 23:47:21 p4029667 node: goroutine 35 [syscall]:
Feb  7 23:47:21 p4029667 node: os/signal.loop()
Feb  7 23:47:21 p4029667 node: /usr/local/go/src/os/signal/signal_unix.go:22 +0x18
Feb  7 23:47:21 p4029667 node: created by os/signal.init.1
Feb  7 23:47:21 p4029667 node: /usr/local/go/src/os/signal/signal_unix.go:28 +0x37
Feb  7 23:47:21 p4029667 node: rax    0x0
Feb  7 23:47:21 p4029667 node: rbx    0x7f4f2fe3e868
Feb  7 23:47:21 p4029667 node: rcx    0xffffffffffffffff
Feb  7 23:47:21 p4029667 node: rdx    0x6
Feb  7 23:47:21 p4029667 node: rdi    0x2838
Feb  7 23:47:21 p4029667 node: rsi    0x384d
Feb  7 23:47:21 p4029667 node: rbp    0x1990bcf
Feb  7 23:47:21 p4029667 node: rsp    0x7f4f2bde1838
Feb  7 23:47:21 p4029667 node: r8     0xa
Feb  7 23:47:21 p4029667 node: r9     0x7f4f2bde2700
Feb  7 23:47:21 p4029667 node: r10    0x8
Feb  7 23:47:21 p4029667 node: r11    0x202
Feb  7 23:47:21 p4029667 node: r12    0x7f4f1c0008c0
Feb  7 23:47:21 p4029667 node: r13    0x193f074
Feb  7 23:47:21 p4029667 node: r14    0x0
Feb  7 23:47:21 p4029667 node: r15    0x8
Feb  7 23:47:21 p4029667 node: rip    0x7f4f2fab85f7
Feb  7 23:47:21 p4029667 node: rflags 0x202
Feb  7 23:47:21 p4029667 node: cs     0x33
Feb  7 23:47:21 p4029667 node: fs     0x0
Feb  7 23:47:21 p4029667 node: gs     0x0
Feb  7 23:47:21 p4029667 node: Retunring with error : Error: Command failed: runtime/cgo: pthread_create failed: Resource temporarily unavailable
Feb  7 23:47:21 p4029667 node: SIGABRT: abort
Feb  7 23:47:21 p4029667 node: PC=0x7f84d21075f7 m=6
Feb  7 23:47:21 p4029667 node: goroutine 0 [idle]:
Feb  7 23:47:21 p4029667 node: goroutine 20 [running]:
Feb  7 23:47:21 p4029667 node: runtime.systemstack_switch()
Feb  7 23:47:21 p4029667 node: /usr/local/go/src/runtime/asm_amd64.s:216 fp=0xc82003dc98 sp=0xc82003dc90
Feb  7 23:47:21 p4029667 node: runtime.gc(0x0)
Feb  7 23:47:21 p4029667 node: /usr/local/go/src/runtime/mgc.go:1006 +0x1db fp=0xc82003df90 sp=0xc82003dc98
Feb  7 23:47:21 p4029667 node: runtime.backgroundgc()
Feb  7 23:47:21 p4029667 node: /usr/local/go/src/runtime/mgc.go:897 +0x3d fp=0xc82003dfc0 sp=0xc82003df90

@mcmohd
Author

mcmohd commented Feb 9, 2016

Can you please check my syntax for launching a container? Here I'm trying to run every container with a different user ID and setting an nproc limit of 300 via --ulimit, which I believe limits the number of processes for the given user, not system-wide.

@thaJeztah thaJeztah added this to the 1.10.1 milestone Feb 9, 2016
@thaJeztah thaJeztah changed the title TasksMax flag is not supported on CentOS 7 [1.10] "runtime/cgo: pthread_create failed: Resource temporarily unavailable" on CentOS 7 Feb 9, 2016
@thaJeztah
Member

Thanks for that output, @mcmohd. The syntax looks OK to me at a glance, so I'm wondering if something else is causing this.

I renamed the issue because (as discussed above) I don't think this is related to the TasksMax option.

@tiborvass
Contributor

@mcmohd can you try setting TasksMax=infinity, just out of curiosity?

EDIT: @thaJeztah look what i've found: #9868

@thaJeztah
Member

@tiborvass wondering what changed in 1.10, though; does it use that many more processes?

@tiborvass tiborvass added the priority/P2 Normal priority: default priority applied. label Feb 10, 2016
@cpuguy83
Member

@mcmohd Can you provide details on how your containers are setup? What logging driver are you using?

@cpuguy83
Member

Also, is this the full trace?

@stelund

stelund commented Feb 10, 2016

I also encountered the crash, on a different distro (Manjaro). Here is my full crash trace. It happens after I start several containers with plenty of processes in them.

docker-crash.txt

I have a custom systemd unit for docker, without TasksMax. I will change it now to the default with TasksMax set.

@mcmohd
Author

mcmohd commented Feb 10, 2016

@tiborvass, as I mentioned, I'm running CentOS 7, where the TasksMax flag is not supported.
@thaJeztah, I'm still running 1.10.0-rc3, build 08c24cc
@cpuguy83, sir, here is the command I'm using to create containers.

docker create --memory=100m --env-file=/var/www/html/docker.env -u 37842:37842 --ulimit nproc=300 -p 37842:37842 newbase jx /home/cg/src/index.jx 37842 cpp11 1454734359-3964

Now let me give you the complete story: I'm using an image which is almost 6 GB, running on a machine with 192 GB RAM, 24 CPUs, and CentOS 7.

The great news is that for the last 2 days I have not had a single crash, even though the number of concurrent containers crossed 1000. Glad to share it. Let me tell you what changes I made.

(1) First of all, I reduced the open-file limit per container from infinity to 1024, along with a small reduction in the number of processes per container.

docker create --memory=100m --env-file=/var/www/html/docker.env -u 37842:37842 --ulimit nproc=250 --ulimit nofile=1024 -p 37842:37842 newbase jx /home/cg/src/index.jx 37842 cpp11 1454734359-3964

(2) Increased the open-file limit at the OS level in /etc/security/limits.conf, which was earlier set very low (I think 65K):

* - nofile 1048576

(3) Increased the number of threads at the OS level in /proc/sys/kernel/threads-max; earlier it was set to 1545841, and now I set it to 3091639.

(4) Increased the maximum number of processes at the kernel level in /proc/sys/kernel/pid_max; earlier it was set to 32768, and now I increased it to 4194304.

You can check this thread for further help tweaking virtual memory and stack size, though I did not touch them:

http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux
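The tunables from steps (2) through (4) can be checked like this (a read-only sketch; the procfs paths are standard Linux, but values will differ per host):

```shell
# System-wide thread limit, adjusted in step (3)
cat /proc/sys/kernel/threads-max
# Maximum PID value, which caps total processes; adjusted in step (4)
cat /proc/sys/kernel/pid_max
# Per-process open-file limit for the current shell, related to step (2)
ulimit -n
```

Setting them persistently goes through /etc/sysctl.conf (kernel.threads-max, kernel.pid_max) rather than writing to /proc directly.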

But I'm happy that so far it's going very smoothly; fingers crossed for the next few days.

Thank you very much for stepping up and providing the required support, as usual.

Kind regards
mohtashim
tutorialspoint.com

@mcmohd
Author

mcmohd commented Feb 10, 2016

I missed mentioning that I also increased the stack size at the Docker level inside /usr/lib/systemd/system/docker.service. Here is the complete file:

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/docker daemon --storage-opt dm.basesize=10G --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata --storage-opt dm.fs=xfs --storage-opt dm.blkdiscard=false --storage-opt dm.use_deferred_deletion=true --storage-opt dm.use_deferred_removal=true
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
LimitSTACK=33554432

[Install]
WantedBy=multi-user.target

@stelund

stelund commented Feb 10, 2016

No more crashes after setting TasksMax for me.
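For anyone else on a distro where systemd does support it, a drop-in along these lines is one way to apply the TasksMax=infinity suggestion from earlier in the thread (a sketch; the drop-in path is an assumption, and TasksMax requires a recent systemd, so it won't work on the CentOS 7 systemd discussed above):

```ini
# /etc/systemd/system/docker.service.d/tasksmax.conf (assumed path)
[Service]
TasksMax=infinity
```

Followed by systemctl daemon-reload and systemctl restart docker.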

@mcmohd
Author

mcmohd commented Feb 10, 2016

@stelund, what OS are you running?

@stelund

stelund commented Feb 10, 2016

@mcmohd Manjaro (https://manjaro.github.io/), a derivative of Arch

@tiborvass tiborvass added priority/P3 Best effort: those are nice to have / minor issues. and removed priority/P2 Normal priority: default priority applied. labels Feb 10, 2016
@tiborvass
Contributor

@mcmohd I don't understand. You're saying:

But I'm happy that so far it's going very smoothly; fingers crossed for the next few days.

Does that mean you're not seeing this issue anymore?

@mcmohd
Author

mcmohd commented Feb 11, 2016

Yes @tiborvass, up to 1000 containers it's running fine; I'm waiting to see when it goes beyond 1200.

@tiborvass
Contributor

@mcmohd I hope you don't mind, I'm closing this issue. If you see it happen again, let us know with as much information and reproducibility as possible. Thanks!

@mcmohd
Author

mcmohd commented Feb 12, 2016

Sure sir, for now you can close it.

@tsrivishnu

I had a similar issue while running multiple containers in a VirtualBox Ubuntu 64-bit guest. It pops up when the containers are run automatically by a script, one after the other. Once you retry after a failure, it succeeds, leaving no chance to reproduce it.

However, for some reason, when I tried to remove an image with docker rmi ..., it kept exiting with this error:

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f34b919ecc9 m=3

goroutine 0 [idle]:

goroutine 6 [syscall]:
runtime.notetsleepg(0x20add20, 0xffffffffffffffff, 0x1)
        /usr/local/go/src/runtime/lock_futex.go:202 +0x4e fp=0xc820023f40 sp=0xc820023f18
runtime.signal_recv(0x6)
        /usr/local/go/src/runtime/sigqueue.go:111 +0x132 fp=0xc820023f78 sp=0xc820023f40
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x18 fp=0xc820023fc0 sp=0xc820023f78
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1721 +0x1 fp=0xc820023fc8 sp=0xc820023fc0
created by os/signal.init.1
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x37

goroutine 1 [runnable, locked to thread]:
github.com/docker/docker/pkg/tarsum.NewTHash(0x164bdd0, 0x6, 0x1948820, 0x0, 0x0)
        /usr/src/docker/.gopath/src/github.com/docker/docker/pkg/tarsum/tarsum.go:133 +0x8d
github.com/docker/docker/pkg/tarsum.init()
        /usr/src/docker/.gopath/src/github.com/docker/docker/pkg/tarsum/tarsum.go:150 +0x1ca
github.com/docker/docker/builder.init()
        /usr/src/docker/.gopath/src/github.com/docker/docker/builder/tarsum.go:158 +0xa1
github.com/docker/docker/builder/dockerfile.init()
        /usr/src/docker/.gopath/src/github.com/docker/docker/builder/dockerfile/support.go:16 +0x6f
github.com/docker/docker/api/server/router/local.init()
        /usr/src/docker/.gopath/src/github.com/docker/docker/api/server/router/local/local.go:107 +0x71
github.com/docker/docker/api/server/router/build.init()
        /usr/src/docker/.gopath/src/github.com/docker/docker/api/server/router/build/build_routes.go:274 +0x44
github.com/docker/docker/api/server.init()
        /usr/src/docker/.gopath/src/github.com/docker/docker/api/server/server_unix.go:132 +0xad
main.init()
        /usr/src/docker/docker/flags.go:30 +0x92

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1721 +0x1

rax    0x0
rbx    0x7f34b9527868
rcx    0xffffffffffffffff
rdx    0x6
rdi    0x48e9
rsi    0x48eb
rbp    0x1996ddf
rsp    0x7f34b6f7e8a8
r8     0xa
r9     0x7f34b6f7f700
r10    0x8
r11    0x202
r12    0x7f34b00008c0
r13    0x1944f70
r14    0x0
r15    0x8
rip    0x7f34b919ecc9
rflags 0x202
cs     0x33
fs     0x0
gs     0x0

It turned out that there wasn't enough memory available:

$ free -m
             total       used       free     shared    buffers     cached
Mem:           489        484          5         18          0         22
-/+ buffers/cache:        462         27
Swap:            0          0          0

I used top to find which process was taking up my memory, found Java using about 60%, and pkilled it; free -m then showed about 280 MB free.

Once there was enough memory, the command ran normally.
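The same diagnosis can be sketched with standard procps commands (the Java figure above came from top on my host; your output will differ):

```shell
# List the top memory consumers, highest resident memory first
ps aux --sort=-%mem | head -n 5
# Then check overall free memory, in MiB
free -m
```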

@ganesanramkumar

@mcmohd, you mentioned that you specified an nproc of 250, which is the overall process limit for the user. That means only 250 containers can be created. How does that work in your case?

@PepijnK

PepijnK commented Nov 15, 2016

Hi, I'm also seeing this issue when running Docker images with unlimited nproc. It appears to only happen with my Go web application. Somehow it uses all resources and the Docker host crashes. Limiting nproc fixed it, but I think it's pretty bad that code running inside containers can crash the host.

@thaJeztah
Member

@PepijnK it's important to always set constraints on a container (e.g., limit its memory and CPU). Even though processes in a container don't have file access to the host and cannot access processes outside the container, that doesn't mean they cannot consume resources. By default, no limits are set on the amount of memory and CPU a container is allowed to use, so if your host runs out of memory, the kernel starts to randomly kill processes.
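For example, constraints along these lines (a sketch using standard docker run flags; the image name and all values are illustrative, and this of course needs a running Docker host):

```shell
# Cap memory, memory+swap, relative CPU weight, and the per-user
# process limit discussed in this thread
docker run -d --name web \
  --memory=512m \
  --memory-swap=1g \
  --cpu-shares=512 \
  --ulimit nproc=256 \
  nginx
```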

@PepijnK

PepijnK commented Nov 15, 2016

@thaJeztah I assumed the daemon would protect itself against that, but containers are not fully isolated (as virtual machines are), which is why they are lightweight. A security/performance tradeoff, I guess. So, OK, I will put constraints on my containers.

@thaJeztah
Member

@PepijnK containers and virtual machines serve different goals and generally complement each other. The daemon is configured with a negative OOM score (--oom-score-adjust=-500), which means it's very unlikely to be killed before containers are killed (but not "unkillable"). The daemon is not in control there; that's a task for the kernel. The docker client tells the daemon how to "provision" a container and what constraints to put on it; after that, the daemon only monitors (since Docker 1.12, you can even stop the daemon and the containers keep running).

So, OK, I will put constraints on my containers.

That's no different from VMs; when deploying VMs, you also specify the amount of memory, CPU (and disk) a VM uses.
