Describe the issue
My program registers shared allocators in the env.:
// Create shared GPU allocator (registered in env.)
try
{
    ASSERT_CUDA_STATUS(cudaSetDevice(gpuNum));
    static Ort::MemoryInfo memoryInfo("Cuda", OrtArenaAllocator, gpuNum, OrtMemTypeDefault);
    static Ort::ArenaCfg arenaConfig(0, ArenaExtendStrategy::kSameAsRequested, -1, -1);
    // This creates an ORT-internal allocator instance and registers it in the environment for sharing
    vector<const char*> keys, values;
    ASSERT_ORT_STATUS(api.CreateAndRegisterAllocatorV2(*_env, "CUDAExecutionProvider", memoryInfo, arenaConfig, keys.data(), values.data(), 0));
    NN_LOG(NN_INFO) << "Created ORT-internal shared GPU memory allocator for GPU device " << to_string(gpuNum + 1);
}
catch (const exception& e)
{
    THROW_EXCEPTION_WITH_EX("Could not create an ORT-internal shared GPU memory allocator", e);
}
The session is defined like this:
Ort::SessionOptions sessionOptions;
sessionOptions.AddConfigEntry(kOrtSessionOptionsUseDeviceAllocatorForInitializers, "0"); // Allocate initializers via Arena allocator
sessionOptions.AddConfigEntry(kOrtSessionOptionsConfigUseEnvAllocators, "1"); // Use allocators registered in the env instead of per session
As it turned out, initializers are loaded using session-local allocators, which are only afterwards replaced with the env. allocators:
// now that we have all the execution providers, create the session state
session_state_ = std::make_unique<SessionState>(
    model_->MainGraph(),
    execution_providers_,
    GetIntraOpThreadPoolToUse(),
    GetInterOpThreadPoolToUse(),
    data_transfer_mgr_,
    external_data_loader_mgr_,
    *session_logger_,
    session_profiler_,
    session_options_,
    prepacked_weights_container_);
bool use_env_allocators =
    session_options_.config_options.GetConfigOrDefault(kOrtSessionOptionsConfigUseEnvAllocators, "0") == "1";
if (use_env_allocators) {
    LOGS(*session_logger_, INFO) << "This session will use the allocator registered with the environment.";
    session_state_->UpdateAllocatorsWithEnvAllocators(environment_.GetRegisteredSharedAllocators());
}
Session state constructor:
In my case, the session-local allocator is not configured, because I expect to use the same global allocator for all sessions.
In addition, creating a session-local allocator, replacing it with the global one, and then releasing the local one is unnecessary and error-prone, since the global allocator is meant to be used across all sessions anyway.
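To make the ordering problem concrete, here is a minimal toy model of the two possible orders. It is only a sketch under stated assumptions: `ToyAllocator`, `ToySessionState`, `LoadThenReplace` and `ReplaceThenLoad` are hypothetical names invented for illustration, not ONNX Runtime classes or functions, and the model ignores everything except which allocator ends up owning the initializer memory.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an allocator; it only counts requested bytes.
struct ToyAllocator {
    explicit ToyAllocator(std::string n) : name(std::move(n)) {}
    void Alloc(std::size_t n) { bytes_allocated += n; }
    std::string name;
    std::size_t bytes_allocated = 0;
};

// Allocators keyed by device name, e.g. "Cuda".
using AllocatorMap = std::map<std::string, std::shared_ptr<ToyAllocator>>;

// Hypothetical stand-in for a session state (not ORT's SessionState).
class ToySessionState {
public:
    // Order observed in this issue: a session-local allocator is created,
    // initializers are loaded through it, and only then is it replaced.
    static ToySessionState LoadThenReplace(const AllocatorMap& env, std::size_t initializer_bytes) {
        ToySessionState s;
        s.allocators_["Cuda"] = std::make_shared<ToyAllocator>("session-local");
        s.LoadInitializers(initializer_bytes);
        s.UpdateWithEnvAllocators(env);
        return s;
    }

    // Order expected in this issue: env allocators are installed first,
    // so no session-local allocator is ever created or used.
    static ToySessionState ReplaceThenLoad(const AllocatorMap& env, std::size_t initializer_bytes) {
        ToySessionState s;
        s.UpdateWithEnvAllocators(env);
        s.LoadInitializers(initializer_bytes);
        return s;
    }

    // Name of the allocator that served the initializer allocation.
    const std::string& InitializerOwner() const { return initializer_owner_; }

private:
    void LoadInitializers(std::size_t bytes) {
        auto it = allocators_.find("Cuda");
        assert(it != allocators_.end() && "no allocator available for initializers");
        it->second->Alloc(bytes);
        initializer_owner_ = it->second->name;
    }

    void UpdateWithEnvAllocators(const AllocatorMap& env) {
        for (const auto& entry : env) {
            allocators_[entry.first] = entry.second;
        }
    }

    AllocatorMap allocators_;
    std::string initializer_owner_;
};
```

In this toy model, `LoadThenReplace` leaves the shared allocator untouched by the initializers (they were served by the session-local one), while `ReplaceThenLoad` routes them through the shared allocator, which is the behavior I expected from the real session.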
To reproduce
Configure the env. and session as described above.
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.21
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response