[1.10] "runtime/cgo: pthread_create failed: Resource temporarily unavailable" on CentOS 7 #20096
Comments
The TasksMax warning looks like a duplicate of #20036. AFAICT, the warning is only a warning and doesn't affect the way docker runs; for older versions of systemd (and kernel versions below 4.3) this should not make a difference. BTW, I see you're still running a release candidate (1.10.0 has been released, but a 1.10.1 patch release will be issued which resolves an issue with firewalld). (edit: linked wrong issue)
Would you be able to provide the logs you found in /var/log/messages?
Yes @thaJeztah, I will provide the log next time. But I think docker ps -a was fixed in release 1.10, so any idea why it's getting stuck when the number of instances goes beyond 500 or 550?
Here is the log from the messages file:
Can you please check my syntax for launching a container? Here I'm trying to run every container with a different user ID and set the --nproc limit to 300, which I believe will limit the number of processes for the given user, not system-wide.
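As a sketch of that kind of invocation (the image name, UID, and exact values are placeholders, not the command from this thread; docker's --ulimit flag is presumably what's meant by the "--nproc limit"):

```console
# Each container gets its own UID; --ulimit nproc=soft:hard maps to the
# kernel's RLIMIT_NPROC, which counts processes per UID — so giving every
# container a distinct UID keeps them from sharing one process budget.
$ docker run -d \
    --user 1001 \
    --ulimit nproc=300:300 \
    myimage:latest
```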
Thanks for that output, @mcmohd. The syntax looks OK to me at a glance, so I'm wondering if there's something else causing this. I renamed the issue, because (as discussed above) I don't think this is related to the TasksMax option.
@mcmohd can you try setting
EDIT: @thaJeztah look what I've found: #9868
@tiborvass I'm wondering what changed in 1.10, though; does it use that many more processes?
@mcmohd Can you provide details on how your containers are set up? What logging driver are you using?
Also, is this the full trace?
I also encountered the crash, on a different distro (Manjaro). Here is my full crash trace. It happens after I start several containers with plenty of processes in them. I have a custom systemd unit for docker, without TasksMax. I will change it now to the default with TasksMax set.
@tiborvass, as I mentioned, I'm running CentOS 7, where the TasksMax flag is not supported.
Now let me give you the complete story: I'm using an image which is almost 6GB, on a machine with 192GB RAM, 24 CPUs, and CentOS 7. The great news is that in the last 2 days I did not get even a single crash, even though my number of concurrent containers crossed 1000. Glad to share it. Let me tell you what changes I made:

(1) First of all, I reduced the number of files per container from infinity to 1024, and slightly reduced the number of processes per container.
(2) Increased the open-files limit at the OS level inside
(3) Increased the number of threads at the OS level in
(4) Increased the maximum number of processes at the kernel level in

You can check this thread for further help with tweaking virtual memory and stack size, though I did not touch them.
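The file paths above are truncated in the extract; on CentOS 7 such limits commonly live in /etc/security/limits.conf and /etc/sysctl.conf — an assumption about this setup, not a quote from it. A sketch of the kind of knobs involved, with illustrative values:

```console
# Per-user limits (what ulimit -n reads) — append to /etc/security/limits.conf:
$ echo '* soft nofile 1048576' | sudo tee -a /etc/security/limits.conf
$ echo '* hard nofile 1048576' | sudo tee -a /etc/security/limits.conf

# Kernel-wide limits; these sysctls all exist on 3.10-era kernels:
$ sudo sysctl -w fs.file-max=2097152         # system-wide open-file ceiling
$ sudo sysctl -w kernel.threads-max=1000000  # system-wide thread ceiling
$ sudo sysctl -w kernel.pid_max=4194303      # ceiling on PIDs, hence processes
```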
But I'm happy that so far it's going very smoothly; fingers crossed, let's see over the next few days. Thank you very much for stepping in and providing the required support as usual. Kind regards
I missed mentioning that I also increased the stack size at the docker level inside /usr/lib/systemd/system/docker.service. Here is the complete file:
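The unit file itself did not survive extraction. As a minimal sketch, assuming a drop-in rather than an edit of the shipped unit (LimitNOFILE, LimitNPROC, and LimitSTACK are real systemd directives; the path and values are illustrative):

```console
$ sudo mkdir -p /etc/systemd/system/docker.service.d
$ sudo tee /etc/systemd/system/docker.service.d/limits.conf <<'EOF'
[Service]
# Raise the daemon's open-file and process limits.
LimitNOFILE=1048576
LimitNPROC=infinity
# Per-thread stack size in bytes (64 MB here).
LimitSTACK=67108864
EOF
$ sudo systemctl daemon-reload && sudo systemctl restart docker
```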
No more crashes after setting TasksMax for me.
@stelund, what OS are you running?
@mcmohd Manjaro (https://manjaro.github.io/), a derivative of Arch.
@mcmohd I don't understand. You're saying:
Does that mean you're not seeing this issue anymore?
Yes @tiborvass, up to 1000 containers it's running fine; I'm waiting to see when it goes past 1200.
@mcmohd I hope you don't mind, I'm closing this issue. If you see it happen again, let us know with as much information and reproducibility as possible. Thanks!
Sure sir, for now you can close it.
I had a similar issue while running multiple containers on a VirtualBox Ubuntu 64-bit guest. It pops up when the containers are run automatically by a script, one after the other. Once you retry after it fails, it succeeds, leaving no chance to reproduce it. However, for some reason, I tried to remove an image with
It turned out that there wasn't enough memory available.
I used
Once there was enough memory, the command ran normally.
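The commands above are truncated in the extract; a common way to check whether free memory is the culprit (an assumption about the kind of check, not a quote of it):

```console
# Low "free"/"available" figures here line up with pthread_create failing
# with EAGAIN, since each new thread needs memory for its stack.
$ free -m
```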
@mcmohd, you have mentioned that you specified an nproc of 250, which is the overall process limit for the user. That means only 250 containers can be created. How does that work in your case?
Hi, I'm also seeing this issue when running docker images with unlimited nproc. It appears to only happen with my Go web application. Somehow it uses all resources and the docker host crashes. Limiting nproc fixed it, but I think it is pretty bad that code running inside containers can crash the host.
@PepijnK it's important to always set constraints on a container (e.g., limit its memory and CPU). Even though processes in a container don't have file access to the host, and cannot access processes outside the container, that doesn't mean they cannot consume resources. By default, no limits are set on the amount of memory and CPU a container is allowed to use, so if your host runs out of memory, the kernel starts to kill processes more or less at random.
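A sketch of the kind of constraints meant here, using docker run flags that existed in the 1.10 era (the values and image name are illustrative):

```console
# Cap RAM at 512 MB (1 GB including swap), CPU at roughly half a core via
# the CFS quota, and processes per UID at 256.
$ docker run -d \
    --memory=512m \
    --memory-swap=1g \
    --cpu-period=100000 \
    --cpu-quota=50000 \
    --ulimit nproc=256:256 \
    myimage:latest
```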
@thaJeztah I assumed the daemon would protect itself against that, but containers are not fully isolated (as virtual machines are), which is why they are lightweight. A security/performance trade-off, I guess. So, OK, I will put constraints on my containers.
@PepijnK containers and virtual machines serve different goals, and generally complement each other. The daemon is configured with a negative OOM score;
That's no different from VMs; when deploying VMs, you'll also specify the amount of memory, CPU (and disk) a VM uses.
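For context, the kernel exposes each process's OOM-killer bias through /proc, and a negative value makes a process a less likely victim. A sketch (the dockerd process name is an assumption; the 1.10-era binary was simply docker):

```console
# oom_score_adj ranges from -1000 (never kill) to 1000 (kill first);
# a daemon started with a negative value survives memory pressure longer.
$ cat /proc/$(pidof dockerd)/oom_score_adj
```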
Hi,
I just upgraded to docker 1.10 and got stuck with an issue where I'm not able to create a large number of containers. I believe docker is either hanging or crashing as soon as the number of containers reaches more than 500. When I dug into /var/log/messages I found that it's reporting resource unavailability on the same machine where I used to create around 1200 containers successfully.
While investigating I found that a TasksMax flag has been introduced, which sets the number of threads to 512 by default, but this flag is not supported by CentOS 7 or any OS version running a 3.10.xxx kernel, and it gives the following error:
```
[/etc/systemd/system.conf:58] Unknown lvalue 'TasksMax' in section 'Manager'
```
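For readers hitting this on other distros: TasksMax= needs a newer systemd (roughly 227+; CentOS 7 ships 219, hence the error above). A sketch of checking for support and, where present, raising the limit on the stock docker.service unit:

```console
# TasksMax= is unknown to systemd 219, which CentOS 7 ships.
$ systemctl --version
# On a supporting systemd, raise the per-unit task limit directly:
$ sudo systemctl set-property docker.service TasksMax=infinity
```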
Kindly suggest a way forward, because it has completely stopped our operation and we are not able to proceed with a high number of containers. I tried to remove TasksMax from the docker.service file, but still no success. Here are the details of docker:
Following is the info related to the version: