Description
Describe the issue
The correct CudaPinned allocator created by the CUDA EP (backed by CUDAPinnedAllocator) is replaced with a "buggy" CudaPinned allocator (backed by CPUAllocator)!
SessionState before applying env. shared allocators:
SessionState right after applying env. shared allocators:
This happens because the environment holds a CudaPinned allocator whose backing device_allocator is a CPUAllocator.
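To make the suspected mechanism concrete, here is a toy model of the replacement. It is not ONNX Runtime code: `Allocator`, `AllocatorMap`, `ApplyEnvAllocators`, and `DemoReplacement` are hypothetical names, and matching allocators purely by name is an assumption based on the behavior described above, not something confirmed from the ORT sources.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these are ONNX Runtime types.
struct Allocator {
    std::string name;     // memory-info name, e.g. "CudaPinned"
    std::string backend;  // what actually performs the allocation
};

using AllocatorMap = std::map<std::string, std::shared_ptr<Allocator>>;

// Toy model of "apply shared env allocators" (assumed behavior): a session
// allocator whose name matches an environment allocator is replaced, and the
// backend of the replacement is never checked.
void ApplyEnvAllocators(AllocatorMap& session, const AllocatorMap& env) {
    for (const auto& [name, alloc] : env) {
        auto it = session.find(name);
        if (it != session.end()) {
            it->second = alloc;
        }
    }
}

// Returns the CudaPinned backend before and after the env allocators are applied.
std::pair<std::string, std::string> DemoReplacement() {
    AllocatorMap session;
    session["CudaPinned"] =
        std::make_shared<Allocator>(Allocator{"CudaPinned", "CUDAPinnedAllocator"});

    AllocatorMap env;
    env["CudaPinned"] =
        std::make_shared<Allocator>(Allocator{"CudaPinned", "CPUAllocator"});

    std::string before = session["CudaPinned"]->backend;
    ApplyEnvAllocators(session, env);
    std::string after = session["CudaPinned"]->backend;
    return {before, after};
}
```

In this model, `DemoReplacement()` yields "CUDAPinnedAllocator" before and "CPUAllocator" after, which mirrors the two SessionState snapshots described above: the name stays CudaPinned while the backend silently changes.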
Here's my code used to register CudaPinned allocator:
void RegisterCudaPinnedEnvAllocator(OrtApi& api, OrtEnv* env)
{
    nvtxRangePush("RegisterCudaPinnedEnvAllocator");

    OrtMemoryInfo* cudaPinnedMemoryInfo;
    ASSERT_ORT_STATUS(api.CreateMemoryInfo("CudaPinned", OrtArenaAllocator, 0, OrtMemTypeCPUOutput, &cudaPinnedMemoryInfo));

    OrtArenaCfg* cudaPinnedArenaConfig;
    ASSERT_ORT_STATUS(api.CreateArenaCfg(0, ArenaExtendStrategy::kNextPowerOfTwo, -1, -1, &cudaPinnedArenaConfig));

    // This creates an ORT-internal allocator instance and registers it in the environment for sharing
    vector<const char*> keys, values;
    ASSERT_ORT_STATUS(api.CreateAndRegisterAllocatorV2(env, "CPUExecutionProvider", cudaPinnedMemoryInfo, cudaPinnedArenaConfig, keys.data(), values.data(), 0));

    api.ReleaseArenaCfg(cudaPinnedArenaConfig);
    api.ReleaseMemoryInfo(cudaPinnedMemoryInfo);

    nvtxRangePop();
}
This is how I discovered the issue:
This is without shared env. allocators:
To reproduce
void RegisterCudaPinnedEnvAllocator(OrtApi& api, OrtEnv* env)
{
    nvtxRangePush("RegisterCudaPinnedEnvAllocator");

    OrtMemoryInfo* cudaPinnedMemoryInfo;
    ASSERT_ORT_STATUS(api.CreateMemoryInfo("CudaPinned", OrtArenaAllocator, 0, OrtMemTypeCPUOutput, &cudaPinnedMemoryInfo));

    OrtArenaCfg* cudaPinnedArenaConfig;
    ASSERT_ORT_STATUS(api.CreateArenaCfg(0, ArenaExtendStrategy::kNextPowerOfTwo, -1, -1, &cudaPinnedArenaConfig));

    // This creates an ORT-internal allocator instance and registers it in the environment for sharing
    vector<const char*> keys, values;
    ASSERT_ORT_STATUS(api.CreateAndRegisterAllocatorV2(env, "CPUExecutionProvider", cudaPinnedMemoryInfo, cudaPinnedArenaConfig, keys.data(), values.data(), 0));

    api.ReleaseArenaCfg(cudaPinnedArenaConfig);
    api.ReleaseMemoryInfo(cudaPinnedMemoryInfo);

    nvtxRangePop();
}
I searched the ORT repo for clues or similar issues but found nothing.
Urgency
Urgent! This directly impacts performance when shared environment allocators are enabled.
Platform
Windows
OS Version
11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8
Model File
No response
Is this a quantized model?
Unknown