Skip to content

Add python bindings to the global thread pool functionality #24238

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Jun 16, 2025

Conversation

khoover
Copy link
Contributor

@khoover khoover commented Mar 28, 2025

Description

Allows users to configure and enable the global thread pool via Python, and have inference sessions use it instead of session-local thread pools.

Motivation and Context

Forked off of #23495 to take over implementation, see issue #23523.

Our particular use case involves a single service instance serving thousands of individual models, each relatively small (e.g. small decision trees). Creating individual services for each model is too much overhead, and attempting to start several thousand thread-pools is a non-starter. We could possibly have each session be single-threaded, but we would like to be able to separate the request handler thread count from the compute thread count (e.g. 2 handler threads but 4 intra-op ones).

@snnn
Copy link
Member

snnn commented Mar 28, 2025

We don't have a good way to shutdown the thread pool. It should be done before the python process started to exit and unload DLLs. So this doc: https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-best-practices#deadlocks-caused-by-lock-order-inversion
So, generally speaking, if you have a thread pool and you declared it as a global var, then you must manually shutdown it, otherwise it will cause a deadlock. The issue was highly reproducible. However, recently probably the restriction is gone? I am not sure. But the Windows doc is not updated. So I am not very sure if this will work.

@snnn
Copy link
Member

snnn commented Mar 28, 2025

Like this: apache/mxnet#11163

@khoover
Copy link
Contributor Author

khoover commented Apr 17, 2025

@microsoft-github-policy-service agree company="Instacart"

@khoover khoover force-pushed the add-python-global-thread-pool branch from 6a2b524 to 44d5cf6 Compare April 17, 2025 23:27
@khoover
Copy link
Contributor Author

khoover commented Apr 18, 2025

We don't have a good way to shutdown the thread pool. It should be done before the python process started to exit and unload DLLs. So this doc: https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-best-practices#deadlocks-caused-by-lock-order-inversion So, generally speaking, if you have a thread pool and you declared it as a global var, then you must manually shutdown it, otherwise it will cause a deadlock. The issue was highly reproducible. However, recently probably the restriction is gone? I am not sure. But the Windows doc is not updated. So I am not very sure if this will work.

Would adding onnxruntime::Environment::ShutdownGlobalThreadPools and then hooking it into atexit in Python work? I imagine the existing C++ API has examples of needing to do this as well, no?

EDIT: I see, the native implementation would just let Environment destruct at the end of the program, and that handles shutting down. But the Python bindings have a static shared_ptr<Environment>, and it's really just the thread pools that need to go.

@khoover khoover force-pushed the add-python-global-thread-pool branch from 44d5cf6 to 510733f Compare April 18, 2025 16:47
@khoover khoover marked this pull request as ready for review April 18, 2025 22:36
@snnn
Copy link
Member

snnn commented Apr 21, 2025

On Windows when a thread shuts down, the OS also needs to notify all loaded DLLs in case if they need to do any cleanup work. It cannot be too late. So, in general all user created threads should be shutdown before unloading DLLs. Theoretically speaking, there is another way of doing this: the cleanup functions should test if the currently the process is shutting down, if yes, giving up doing any clean up. We do not have such logics in ORT yet. We may consider doing it later.

@snnn snnn requested a review from skottmckay May 17, 2025 03:27
@snnn snnn added the python Pull requests that update Python code label May 17, 2025
Avoids compilation errors involving the affinity string.
@khoover
Copy link
Contributor Author

khoover commented May 19, 2025

So, one proposal here; similar to the C++ API where you create an Env at the top of main() and use that throughout the program, and it eventually destructs right before main returns, have a build_env that returns a handle to a shared_ptr<Env> on the Python side, and require passing that in when creating inference sessions (or make inference sessions creation a method on that handle). Then EnvInitializer holds a weak_ptr<Env> that it can use to provide the handle to internal calls as needed.

(Could even hide it from the Python users by moving the global methods to Python implementations that use a Python-module global.)

@yuslepukhin is there any reason this wouldn't work over the current approach?

@yuslepukhin
Copy link
Member

yuslepukhin commented May 19, 2025

For the environment-level thread pool, it would be helpful to review how this is handled in the C# API, as it highlights several important issues.

The key concerns are:

  • The environment must be instantiated in all cases. In Python, this typically happens automatically and often implicitly. However, with this change, we need more control over that process.
  • To address this, we should expose Env as an object that can be explicitly managed via the Python API, while maintaining compatibility with existing code.
  • There are multiple ways to create an environment depending on the required options. Once Env is exposed, we should either support these variations or provide a clear path to add them in the future.
  • Once the environment is created, it's too late to set any options.
    Please refer to https://github.com/microsoft/onnxruntime/blob/main/csharp/src/Microsoft.ML.OnnxRuntime/OrtEnv.shared.cs#L149 for context.

What is also needed here is unit tests.

The issue of shutting down the global threads on Python model unload mentioned by @snnn is probably still valid, but it is likely to be uncovered (or not) by the unit tests.

The idea of shutting down the TP does not live well in GC like environments like Python, as we can get objects that are not GCed and alive from Python point of view, but the internals are shuttered by a shutdown api. It requires discipline, and the Python code is not typically written in this manner.

Said that, there is a chance we can get it work, but not as presented in this PR.

@khoover
Copy link
Contributor Author

khoover commented May 19, 2025

Looking at the C# a bit, it seems like they go with roughly what I'm suggesting of having a static singleton instance on the C#-defined OrtEnv class that wraps a handle to the C++ object. They use this SafeHandle class in C# which ensures the singleton can't be finalized while any native calls using it are in progress; I think the Python equivalent would be something like

_ENV: EnvHandle | None = None
_ENV_LOCK = threading.Lock()

def get_env() -> EnvHandle:
    if (env := _ENV) is not None:
        return env

    with _ENV_LOCK:
        if _ENV is not None:
            return _ENV
        _ENV = native_module.GetEnv()
        return _ENV

@atexit.register
def _destroy_env() -> None:
    with _ENV_LOCK:
        _ENV = None
        # Force the handle to be collected, if it was the last reference.
        gc.collect()

and then global python-exported functions like set_default_logger_severity get re-implemented as methods on the EnvHandle (with the global exports re-written using the Python get_env() and calling the methods).

@yuslepukhin
Copy link
Member

We will need a couple of things:

  • make sure we can set the options before the environment is instantiated.
  • atexit gc is not enough, we will need to destroy the threadpool if exists.

@snnn
Copy link
Member

snnn commented May 21, 2025

We don't have a good way to shutdown the thread pool. It should be done before the python process started to exit and unload DLLs. So this doc: https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-best-practices#deadlocks-caused-by-lock-order-inversion So, generally speaking, if you have a thread pool and you declared it as a global var, then you must manually shutdown it, otherwise it will cause a deadlock. The issue was highly reproducible. However, recently probably the restriction is gone? I am not sure. But the Windows doc is not updated. So I am not very sure if this will work.

Sorry. I think I was overconcerned. I couldn't reproduce the deadlock problem anymore. Here is what I just learnt:

  1. If there is a thread that is using the DLL, the DLL cannot be unloaded. FreeLibrary is a no-op.
  2. If the worker threads have been shutdown before the DLL's dll_process_detach notification was called, there is no concern. Because you don't need to shutdown the threads again.

@khoover
Copy link
Contributor Author

khoover commented May 21, 2025

Continuing to look at the C# implementation, two more things:

  1. The OrtEnv is not guaranteed to be destroyed before the process ends; same as any other object, it relies on the finalizer being called, which can be skipped for various reasons.
  2. The OrtEnv (and the underlying C++ environment) can be destroyed before all InferenceSessions referencing the underlying C++ environment are destroyed. The InferenceSession constructors just grab the raw pointer from the OrtEnv handle, there's no shared_ptr tracking or holding a reference to the C# OrtEnv.

With that in mind, maybe we don't have to change anything? The environment will get destroyed when all the inference sessions and the onnxruntime.capi._pybind_state module are destroyed, with the added benefit over C# of guaranteeing the environment is alive until the sessions are gone.

I do need a bit of help figuring out how to test this, though; I've found the test/python directory, but I'm not sure how I'd have it create a new process just for this [EDIT: found the build.py, a subprocess per test file I see] (or how to test that the pool's shut-down before exiting in a normal exit).

@snnn
Copy link
Member

snnn commented May 21, 2025

The two tensor RT pipelines are failing

The environment will be destroyed automatically when all
InferenceSessions and the `_pybind_state` module are destroyed. The
current state of the world almost mimics the C# behaviour, except the
environment lives until all sessions are destroyed (rather than C#
allowing the environment to be destroyed before all sessions are).
@snnn
Copy link
Member

snnn commented May 27, 2025

But the TensorRT tests could not pass

@khoover
Copy link
Contributor Author

khoover commented May 28, 2025

@yuslepukhin the issue should be resolved now. Looking at that windows cuda check that failed, seems like the error was while downloading https://github.com/onnx/onnx/archive/refs/heads/rel-1.15.0.zip:

2025-05-28T04:05:23.7868644Z info PrepareTestData Downloading onnx opset20(v1.15.0): https://github.com/onnx/onnx/archive/refs/heads/rel-1.15.0.zip^M
2025-05-28T04:05:25.1521100Z info PrepareTestData Preparing folders under D:\a\_work\onnxruntime\onnxruntime\js\test\data\node\opset20^M
2025-05-28T04:05:25.1545670Z node:events:502^M
2025-05-28T04:05:25.1546200Z       throw er; // Unhandled 'error' event^M
2025-05-28T04:05:25.1546550Z       ^^M
2025-05-28T04:05:25.1546930Z ^M
2025-05-28T04:05:25.1547111Z Error: read ECONNRESET^M
2025-05-28T04:05:25.1547763Z     at TLSWrap.onStreamRead (node:internal/stream_base_commons:218:20)^M
2025-05-28T04:05:25.1548493Z Emitted 'error' event on ClientRequest instance at:^M
2025-05-28T04:05:25.1548992Z     at emitErrorEvent (node:_http_client:101:11)^M
2025-05-28T04:05:25.1549459Z     at TLSSocket.socketErrorListener (node:_http_client:504:5)^M
2025-05-28T04:05:25.1549988Z     at TLSSocket.emit (node:events:524:28)^M
2025-05-28T04:05:25.1550769Z     at TLSSocket.emit (node:domain:489:12)^M
2025-05-28T04:05:25.1551221Z     at emitErrorNT (node:internal/streams/destroy:169:8)^M
2025-05-28T04:05:25.1551686Z     at emitErrorCloseNT (node:internal/streams/destroy:128:3)^M
2025-05-28T04:05:25.1552272Z     at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {^M
2025-05-28T04:05:25.1552740Z   errno: -4077,^M
2025-05-28T04:05:25.1553079Z   code: 'ECONNRESET',^M
2025-05-28T04:05:25.1553295Z   syscall: 'read'^M
2025-05-28T04:05:25.1553486Z }^M
2025-05-28T04:05:25.1553587Z ^M
2025-05-28T04:05:25.1553670Z Node.js v20.19.2^M

@khoover
Copy link
Contributor Author

khoover commented May 30, 2025

I'd like to think this is ready; WASM is almost certainly unrelated.

@snnn snnn closed this May 31, 2025
@snnn snnn reopened this May 31, 2025
@khoover khoover marked this pull request as draft June 5, 2025 16:21
@khoover khoover marked this pull request as ready for review June 5, 2025 16:21
@khoover
Copy link
Contributor Author

khoover commented Jun 9, 2025

@snnn I've tried to incorporate the changes from #24891 here, but it mostly amounts to removing the at-import env creation and adding a bunch of if (!isEnvInit()) { CreateOrtEnv(); } blocks whenever the env is referenced in a context where it might be uninit.

An alternative would be to allow late-binding global thread pools if the env has been initialized without them. Then the Python bindings can continue creating the env at import time, and the global thread pools can be attached after-the-fact. I don't believe there's any code that would break if we did this.

@khoover khoover force-pushed the add-python-global-thread-pool branch from fa82356 to e93897c Compare June 10, 2025 00:22
@snnn
Copy link
Member

snnn commented Jun 10, 2025

it mostly amounts to removing the at-import env creation and adding a bunch of if (!isEnvInit()) { CreateOrtEnv(); } blocks whenever the env is referenced in a context where it might be uninit.

I don't see there you tried to uninit an OrtEnv. Did I miss something?

@khoover
Copy link
Contributor Author

khoover commented Jun 10, 2025

it mostly amounts to removing the at-import env creation and adding a bunch of if (!isEnvInit()) { CreateOrtEnv(); } blocks whenever the env is referenced in a context where it might be uninit.

I don't see there you tried to uninit an OrtEnv. Did I miss something?

Yeah, if you look at the merge, it also includes all these changes. The two PRs after revert them and go with the post-init global thread pools approach.

@khoover khoover requested a review from snnn June 10, 2025 15:34
@snnn
Copy link
Member

snnn commented Jun 14, 2025

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@snnn snnn merged commit 089c52e into microsoft:main Jun 16, 2025
83 checks passed
@snnn
Copy link
Member

snnn commented Jun 16, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants