Issue allocating threads on 3.10.11-slim vs 3.10.12-slim #835

Closed
sf-chris opened this issue Jun 15, 2023 · 16 comments

@sf-chris

sf-chris commented Jun 15, 2023

Hey everyone, thanks for taking the time to review this issue. I'm really stuck on how to debug a problem I have with the Docker images 3.10.11-slim and 3.10.12-slim.

Inside the python3.10.11-slim image, I can easily start a thread, for example:

import threading
x = threading.Thread(target=lambda x: print(x), args=(1,))
x.start()

However, when running the same code inside the python 3.10.12-slim image, I get the following error:

import threading
x = threading.Thread(target=lambda x: print(x), args=(1,))
x.start()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/threading.py", line 935, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
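
For anyone who wants to check this outside the REPL, here is a minimal sketch of the same probe as a script. The function name `can_start_thread` is only illustrative; it starts one trivial thread and reports whether CPython raised the `RuntimeError` shown above.

```python
import threading


def can_start_thread() -> bool:
    """Return True if a trivial thread can be started and joined."""
    ran = threading.Event()
    worker = threading.Thread(target=ran.set)
    try:
        worker.start()
    except RuntimeError as exc:
        # CPython raises RuntimeError("can't start new thread") when the
        # underlying thread creation call fails.
        print(f"thread start failed: {exc}")
        return False
    worker.join()
    return ran.is_set()


if __name__ == "__main__":
    print("threads OK" if can_start_thread() else "threads blocked")
```

Running the script in both images should make the difference visible without an interactive session.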

Thread availability on the server does not seem to be the problem, at least as far as I can tell:

cat /proc/sys/kernel/threads-max
31010

According to Stack Overflow, this shows how many threads are currently running:

ps -eo nlwp | tail -n +2 | awk '{ num_threads += $1 } END { print num_threads }'
518
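
The same two numbers can be read from Python, which is handy if `ps` is not installed. This is only a sketch that assumes a Linux `/proc` filesystem; the helper names are made up for illustration. It sums the `Threads:` field of each `/proc/<pid>/status` file, which should roughly match the `ps -eo nlwp` pipeline above.

```python
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    return 0


def total_threads(proc_root: str = "/proc") -> int:
    """Sum thread counts over every numeric (per-process) directory."""
    total = 0
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            total += parse_thread_count((entry / "status").read_text())
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return total


if __name__ == "__main__":
    limit_file = Path("/proc/sys/kernel/threads-max")
    if limit_file.exists():
        limit = limit_file.read_text().strip()
        print(f"threads in use: {total_threads()} / threads-max: {limit}")
```

Note that inside a container `/proc` normally lists only the container's own processes, so the total is comparable to the host figure only when run on the host.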

It may also be useful to note that this only happens on one of my servers (EC2 in AWS). It does not happen on my other servers (same versions of Ubuntu and Docker) or locally on Docker for Mac.

Lastly, all the servers run Docker 20.10.7 and Ubuntu 16.04.6.

Any ideas would be greatly appreciated!

@ProfRoxas

I also noticed this issue. So far I've learned that it happens with the Debian 12 (bookworm) based image, which the slim tag now defaults to.
3.11-slim-bookworm fails, but 3.11-slim-bullseye and 3.11-slim-buster work fine.

@Cookiehook

I'm seeing the same problem on 3.10. Also looks like it's caused by the Bullseye -> Bookworm swap on the default. Also fixed by explicitly pinning to Bullseye.

@sf-chris
Author

Do you have any idea why this only happens in some environments and not others? It seems to occur on only one of my AWS VMs, and not on the others or on Docker for Mac.

@Marx314

Marx314 commented Jun 15, 2023

I can confirm a similar issue on 3.11.4-slim (bookworm): running pip install tox gave me the following stack trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/cli/base_command.py", line 169, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/cli/req_command.py", line 248, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
                ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
                                       ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
                ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
    dist = self._prepare_distribution()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/operations/prepare.py", line 516, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/operations/prepare.py", line 587, in _prepare_linked_requirement
    local_file = unpack_url(
                 ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/operations/prepare.py", line 166, in unpack_url
    file = get_http_url(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/operations/prepare.py", line 107, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/network/download.py", line 147, in __call__
    for chunk in chunks:
  File "/usr/local/lib/python3.11/site-packages/pip/_internal/cli/progress_bars.py", line 52, in _rich_progress_bar
    with progress:
  File "/usr/local/lib/python3.11/site-packages/pip/_vendor/rich/progress.py", line 1169, in __enter__
    self.start()
  File "/usr/local/lib/python3.11/site-packages/pip/_vendor/rich/progress.py", line 1160, in start
    self.live.start(refresh=True)
  File "/usr/local/lib/python3.11/site-packages/pip/_vendor/rich/live.py", line 132, in start
    self._refresh_thread.start()
  File "/usr/local/lib/python3.11/threading.py", line 957, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

@yosifkit
Member

From #837 (comment):

I'd suggest updating docker and libseccomp on the host. Newer base OS's use newer system calls and an older libseccomp can block them since they are unknown to it. You can verify that it is libseccomp by running the bookworm image with --security-opt seccomp=unconfined.

This is similar to the update to Ubuntu focal: docker-library/mongo#606 (comment)

If you are having issues with the new bookworm based images, then switching to the *-bullseye images is one possible workaround.

Similar to #836 and #837
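
To make the suggested check repeatable, the comparison can be scripted. The sketch below is not an official tool: `docker_probe_cmd` and `run_probe` are hypothetical helper names, and it assumes a working `docker` CLI on the host. It runs the same one-thread probe from the original report with and without `--security-opt seccomp=unconfined`.

```python
import subprocess

# Same one-thread probe as in the original report, passed to `python -c`.
PROBE = (
    "import threading; "
    "t = threading.Thread(target=print, args=('ok',)); "
    "t.start(); t.join()"
)


def docker_probe_cmd(image: str, unconfined: bool) -> list[str]:
    """Build a `docker run` command that runs PROBE inside `image`."""
    cmd = ["docker", "run", "--rm"]
    if unconfined:
        # Disables the default seccomp profile for this one container only.
        cmd += ["--security-opt", "seccomp=unconfined"]
    return cmd + [image, "python", "-c", PROBE]


def run_probe(image: str, unconfined: bool) -> bool:
    """Return True if the probe exits cleanly (needs a working docker CLI)."""
    result = subprocess.run(
        docker_probe_cmd(image, unconfined), capture_output=True, text=True
    )
    return result.returncode == 0
```

If `run_probe("python:3.11-slim-bookworm", False)` fails while the same call with `True` succeeds, that points at the seccomp filtering on the host rather than at the image. Running unconfined is meant here as a diagnostic step, not as a permanent setting.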

@sf-chris
Author

Okay so @yosifkit is correct in that a newer version of libseccomp seems to be causing the issue (I have literally no idea what that library does.)

In the example above, the threading issue was only present on one of the EC2 instances (A) and not on the second (B).

I checked the versions on each of the servers:

A: libseccomp2:amd64 2.5.1-1ubuntu1~16.04.1
B: libseccomp2:amd64 2.4.3-1ubuntu3.16.04.3

I updated server A (the one with the issues) to use the same version as B, and the threading issue inside the bookworm container was gone.

I'm curious to know why libseccomp is the issue and how reverting a version made a difference? Does anyone have the time to explain that to me?

@yosifkit
Member

libseccomp lets you configure allowed syscalls for a process. Docker sets a default seccomp profile for all containers such that only certain syscalls are allowed. I am very confused that downgrading libseccomp2 would make it work; usually an updated libseccomp is needed. 😕

https://docs.docker.com/engine/security/seccomp/
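
One quick way to see whether a seccomp filter is applied to a process is the `Seccomp:` field in `/proc/<pid>/status` on Linux (0 = disabled, 1 = strict, 2 = filter). The sketch below assumes that field is present; the function name `seccomp_mode` is only illustrative.

```python
from pathlib import Path

SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Map the 'Seccomp:' field of /proc/<pid>/status to a readable name."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so "Seccomp_filters:" is not picked up.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1])
            return SECCOMP_MODES.get(value, f"unknown ({value})")
    return "not reported"


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print("seccomp mode:", seccomp_mode(status.read_text()))
```

Inside a container started with Docker's default profile this would be expected to report "filter", and "disabled" when the container is started with `--security-opt seccomp=unconfined`.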

@tianon
Member

tianon commented Jun 15, 2023

Please tell me that's not Ubuntu 16.04 I see in those package versions? 😬 🙈

@sf-chris
Author

Please tell me that's not Ubuntu 16.04

why

@tianon
Member

tianon commented Jun 20, 2023

Ubuntu 16.04 is quite old at this point, and unless you're paying Canonical for their ESM, has been EOL for ~2 years. 😬

(In general, expecting newer distributions to run successfully on top of older distributions is going to continue to have this problem over time as new kernel system calls are created and start to be used, which is the root of this particular failure to run.)
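
The thread does not spell out the exact mechanism, so treat the following as an assumption rather than a diagnosis. My understanding is that a newer C library tries a newer thread-creation syscall first (reported elsewhere as clone3) and falls back to the older one only when the kernel answers ENOSYS; a seccomp filter that answers EPERM for a syscall it does not know would prevent that fallback. The pure-Python model below makes no real syscalls; `start_thread` and its parameter are hypothetical names used only to show the two outcomes.

```python
import errno
from typing import Optional


def start_thread(new_call_errno: Optional[int]) -> str:
    """Model a runtime that tries a newer syscall, then an older one.

    `new_call_errno` is what the (hypothetical) sandbox returns for the
    newer call; None means the call succeeded.
    """
    if new_call_errno is None:
        return "started via newer syscall"
    if new_call_errno == errno.ENOSYS:
        # "Not implemented": the runtime assumes an old kernel and retries.
        return "started via older syscall (fallback)"
    # Any other error, such as EPERM, is treated as a real failure.
    name = errno.errorcode[new_call_errno]
    raise RuntimeError(f"can't start new thread ({name})")
```

Under this model, `start_thread(errno.ENOSYS)` succeeds through the fallback, while `start_thread(errno.EPERM)` raises, which would match a host whose seccomp setup rejects unknown syscalls outright.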

@brycedrennan

This comment was marked as duplicate.

@shamoon

shamoon commented Jun 21, 2023

We have users experiencing this on Ubuntu 22.04, so I'm not sure it's limited to older distros. (I personally cannot reproduce this, but we have multiple reports of it.)

@tianon
Member

tianon commented Jun 21, 2023

The output of docker version, runc --version, and dpkg-query --show --showformat '${Package}\t${Version}\n' libseccomp2 (and even docker info, but to a much lesser extent) would be useful for narrowing that down.
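
To gather that output in one go, something like the sketch below could be run on the host. The command lines are the ones requested above; the labels and the `collect` helper are made up for illustration, and the `dpkg-query` entry only applies to Debian-based hosts.

```python
import subprocess

# Commands taken from the request above; the keys are just labels.
COMMANDS = {
    "docker": ["docker", "version"],
    "runc": ["runc", "--version"],
    "libseccomp2": [
        "dpkg-query", "--show", "--showformat",
        r"${Package}\t${Version}\n", "libseccomp2",
    ],
}


def collect(commands: dict[str, list[str]]) -> dict[str, str]:
    """Run each command and return its output, or a short note on failure."""
    report = {}
    for label, argv in commands.items():
        try:
            done = subprocess.run(
                argv, capture_output=True, text=True, timeout=30
            )
        except OSError as exc:
            # Typically the tool is not installed on this host.
            report[label] = f"(could not run: {exc})"
            continue
        except subprocess.TimeoutExpired:
            report[label] = "(timed out)"
            continue
        # Some tools print version details on stderr, so keep both streams.
        report[label] = (done.stdout + done.stderr).strip()
    return report


if __name__ == "__main__":
    for label, output in collect(COMMANDS).items():
        print(f"== {label} ==\n{output}\n")
```

The printed report can then be pasted into a comment as-is.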

@tom666-debug

docker version: 19.03.8
runc --version: 1.0.1-dev
libseccomp2: 2.5.1-1bpo10+1A5.0.0.202303281914
OS: Linux version 4.19.0-24-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.282-1 (2023-04-29)

@mephinet

I can reproduce this issue with podman, however not using the slim images, but the normal ones:
While
podman run --rm -ti python:3.10.11 pip install numpy
runs successfully,
podman run --rm -ti python:3.10.12 pip install numpy
fails with

RuntimeError: can't start new thread

max user processes is set to 30842
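
"max user processes" is the label `ulimit -a` uses for the per-user process limit, which on Linux also counts threads. For completeness, the same limit can be read from inside Python with the standard `resource` module (Unix only); `describe_limit` is just an illustrative helper.

```python
import resource


def describe_limit(value: int) -> str:
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


# RLIMIT_NPROC is the per-user limit on processes (and threads on Linux).
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"max user processes: soft={describe_limit(soft)} hard={describe_limit(hard)}")
```

If the soft limit is far above the number of running threads, as it appears to be here, the limit itself is unlikely to explain the error.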

@tianon
Member

tianon commented Jul 19, 2023

From #837 (comment):

I'd suggest updating docker and libseccomp on the host. Newer base OS's use newer system calls and an older libseccomp can block them since they are unknown to it. You can verify that it is libseccomp by running the bookworm image with --security-opt seccomp=unconfined.
This is similar to the update to Ubuntu focal: docker-library/mongo#606 (comment)

If you are having issues with the new bookworm based images, then switching to the *-bullseye images is one possible workaround.

Similar to #836 and #837

gkapkowski added a commit to CryptoverseCC/cryptoverse-login that referenced this issue Jul 29, 2023