Add python bindings to the global thread pool functionality #24238
Conversation
We don't have a good way to shut down the thread pool. It should be done before the Python process starts to exit and unload DLLs. See this doc: https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-best-practices#deadlocks-caused-by-lock-order-inversion
Like this: apache/mxnet#11163
Would adding … EDIT: I see, the native implementation would just let the Environment destruct at the end of the program, and that handles shutting down. But the Python bindings have a static …
On Windows, when a thread shuts down, the OS also needs to notify all loaded DLLs in case they need to do any cleanup work, and this cannot happen too late. So, in general, all user-created threads should be shut down before unloading DLLs. Theoretically speaking, there is another way of doing this: the cleanup functions could test whether the process is currently shutting down and, if so, skip the cleanup. We do not have such logic in ORT yet. We may consider adding it later.
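For illustration, here is a minimal Python analog of that ordering concern (the worker itself is hypothetical): user-created threads are stopped and joined from an atexit hook, which runs while native modules are still loaded. The thread is marked as a daemon so the interpreter does not try to join it on its own before the hook fires.

```python
import atexit
import threading

_stop_event = threading.Event()

def _worker() -> None:
    # Pretend this thread periodically calls into a native DLL.
    while not _stop_event.wait(timeout=0.1):
        pass

# daemon=True so the interpreter's own thread-join at shutdown does not
# block on this thread before our atexit hook has a chance to stop it.
_thread = threading.Thread(target=_worker, daemon=True)
_thread.start()

@atexit.register
def _shutdown_worker() -> None:
    # Stop and join the thread while native modules are still loaded,
    # i.e. before the interpreter begins tearing everything down.
    _stop_event.set()
    _thread.join()
```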
Avoids compilation errors involving the affinity string.
So, one proposal here: similar to the C++ API, where you create an Env at the top of main() and use it throughout the program so that it destructs right before main returns, have a … (We could even hide it from the Python users by moving the global methods to Python implementations that use a Python-module global.) @yuslepukhin, is there any reason this wouldn't work over the current approach?
For the environment-level thread pool, it would be helpful to review how this is handled in the C# API, as it highlights several important issues. The key concerns are: …
What is also needed here is unit tests. The issue of shutting down the global threads on Python module unload mentioned by @snnn is probably still valid, and it is likely to be uncovered (or not) by the unit tests. The idea of shutting down the TP does not sit well in GC-based environments like Python: we can get objects that are still alive and reachable from Python's point of view, but whose internals have been shuttered by a shutdown API. It requires discipline, and Python code is not typically written in this manner. That said, there is a chance we can get it to work, but not as presented in this PR.
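To make that hazard concrete, here is a hedged illustration (all names hypothetical, not ORT code): a Python object that is still reachable after an explicit shutdown of the native resource it wraps.

```python
class ThreadPoolHandle:
    """Stand-in for a native object with an explicit shutdown API."""
    def __init__(self) -> None:
        self._alive = True

    def shutdown(self) -> None:
        self._alive = False  # native internals are torn down here

    def run(self, task) -> None:
        if not self._alive:
            raise RuntimeError("handle is still reachable, but internals are gone")
        task()

pool = ThreadPoolHandle()
held_by_session = pool      # some other object still references the pool
pool.shutdown()             # explicit shutdown, independent of the GC

try:
    held_by_session.run(lambda: print("work"))
except RuntimeError as exc:
    print(exc)              # alive from Python's point of view, dead inside
```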
Looking at the C# a bit, it seems like they go with roughly what I'm suggesting of having a static singleton instance on the C#-defined …

```python
import atexit
import gc
import threading

# EnvHandle and native_module stand in for the pybind-exported type/functions.
_ENV: EnvHandle | None = None
_ENV_LOCK = threading.Lock()

def get_env() -> EnvHandle:
    global _ENV
    if (env := _ENV) is not None:
        return env
    with _ENV_LOCK:
        if _ENV is not None:
            return _ENV
        _ENV = native_module.GetEnv()
        return _ENV

@atexit.register
def _destroy_env() -> None:
    global _ENV
    with _ENV_LOCK:
        _ENV = None
        # Force the handle to be collected, if it was the last reference.
        gc.collect()
```

and then global python-exported functions like …
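Presumably those module-level functions would look something like the sketch below, built on the get_env() above; the native_module entry-point names are assumptions for illustration, not real ORT bindings.

```python
# Hypothetical module-level wrappers around the lazily created environment.
def set_global_intra_op_num_threads(n: int) -> None:
    env = get_env()  # creates the environment on first use
    native_module.SetGlobalIntraOpNumThreads(env, n)

def get_global_thread_pool_options() -> dict[str, int]:
    env = get_env()
    return native_module.GetGlobalThreadPoolOptions(env)
```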
We will need a couple of things: …
Sorry, I think I was overconcerned. I couldn't reproduce the deadlock problem anymore. Here is what I just learnt: …
Continuing to look at the C# implementation, two more things: …
With that in mind, maybe we don't have to change anything? The environment will get destroyed when all the inference sessions and the … I do need a bit of help figuring out how to test this, though; I've found the test/python directory, …
The two TensorRT pipelines are failing.
The environment will be destroyed automatically when all InferenceSessions and the `_pybind_state` module are destroyed. The current state of the world almost mimics the C# behaviour, except that the environment lives until all sessions are destroyed (whereas C# allows the environment to be destroyed before all sessions are).
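A minimal sketch of that lifetime rule, relying on CPython's reference counting (all names here are stand-ins, not the real bindings): the environment is destroyed only once the module-level reference and every session's reference are gone.

```python
class Env:
    """Stand-in for the native environment handle."""
    def __del__(self) -> None:
        print("environment destroyed")

class Session:
    def __init__(self, env: Env) -> None:
        self._env = env  # every session keeps the environment alive

_module_env = Env()                      # reference held by the module
a, b = Session(_module_env), Session(_module_env)

del _module_env   # module "unloaded": env survives via the sessions
del a             # still one session left
del b             # last reference dropped: "environment destroyed" prints
```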
But the TensorRT tests could not pass.
Do it early because some of the EP specifics need an environment.
@yuslepukhin the issue should be resolved now. Looking at the Windows CUDA check that failed, it seems the error occurred while downloading https://github.com/onnx/onnx/archive/refs/heads/rel-1.15.0.zip: …
I'd like to think this is ready; the WASM failure is almost certainly unrelated.
@snnn I've tried to incorporate the changes from #24891 here, but it mostly amounts to removing the at-import env creation and adding a bunch of … An alternative would be to allow late-binding global thread pools if the env has been initialized without them. Then the Python bindings can continue creating the env at import time, and the global thread pools can be attached after the fact. I don't believe there's any code that would break if we did this.
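From user code, the late-binding alternative could look something like this hedged sketch; `attach_global_thread_pools` and the config key are assumed names for illustration, not the actual API (only `SessionOptions.add_session_config_entry` is an existing binding).

```python
import onnxruntime as ort  # env is created at import time, without global pools

# Attach the global thread pools after the fact (hypothetical function name).
ort.attach_global_thread_pools(intra_op_num_threads=4, inter_op_num_threads=1)

opts = ort.SessionOptions()
# Hypothetical config key opting this session into the shared pools.
opts.add_session_config_entry("session.use_global_thread_pool", "1")
sess = ort.InferenceSession("model.onnx", sess_options=opts)
```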
I don't see where you tried to uninit an OrtEnv. Did I miss something?
Yeah, if you look at the merge, it also includes all those changes. The two PRs after it revert them and go with the post-init global thread pools approach.
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
Thanks!
Description
Allows users to configure and enable the global thread pool via Python, and have inference sessions use it instead of session-local thread pools.
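As a hedged sketch of the intended flow, usage might look like the following; the exact binding names (`set_global_thread_pool_options`, `use_global_thread_pool`) are assumptions based on the discussion above, not confirmed API.

```python
import onnxruntime as ort

# Configure the process-wide pools once, before creating any sessions
# (hypothetical binding name and signature).
ort.set_global_thread_pool_options(intra_op_num_threads=4, inter_op_num_threads=1)

opts = ort.SessionOptions()
opts.use_global_thread_pool = True  # hypothetical flag on SessionOptions

# Many small sessions now share the same worker threads instead of each
# spinning up its own pool.
model_paths = ["model_a.onnx", "model_b.onnx"]
sessions = [ort.InferenceSession(p, sess_options=opts) for p in model_paths]
```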
Motivation and Context
Forked off of #23495 to take over implementation, see issue #23523.
Our particular use case involves a single service instance serving thousands of individual models, each relatively small (e.g. small decision trees). Creating an individual service for each model is too much overhead, and attempting to start several thousand thread pools is a non-starter. We could have each session be single-threaded, but we would like to be able to separate the request-handler thread count from the compute thread count (e.g. 2 handler threads but 4 intra-op ones).