[Performance] CudaPinned allocator uses the wrong backend device_allocator when shared environment allocators are activated #25211

Closed
@AndreyOrb

Description

Describe the issue

The correct CudaPinned allocator from the CUDA EP (backed by the correct CUDAPinnedAllocator) is replaced with a "buggy" CudaPinned allocator backed by a CPUAllocator!

The replacement happens at this line in the ORT source:

session_state_ = std::make_unique<SessionState>(

SessionState before applying the environment shared allocators:

[screenshot]

SessionState right after applying the environment shared allocators:

[screenshot]

This happens because the environment holds a CudaPinned allocator whose backend device_allocator is a CPUAllocator.

Here's the code I use to register the CudaPinned allocator:

void RegisterCudaPinnedEnvAllocator(OrtApi& api, OrtEnv* env)
{
	nvtxRangePush("RegisterCudaPinnedEnvAllocator");

	// Describe the CudaPinned memory: arena allocator, device 0, CPU-output memory type.
	OrtMemoryInfo* cudaPinnedMemoryInfo;
	ASSERT_ORT_STATUS(api.CreateMemoryInfo("CudaPinned", OrtArenaAllocator, 0, OrtMemTypeCPUOutput, &cudaPinnedMemoryInfo));

	// Arena config: default max_mem, kNextPowerOfTwo extend strategy, default chunk settings.
	OrtArenaCfg* cudaPinnedArenaConfig;
	ASSERT_ORT_STATUS(api.CreateArenaCfg(0, ArenaExtendStrategy::kNextPowerOfTwo, -1, -1, &cudaPinnedArenaConfig));

	// This creates an ORT-internal allocator instance and registers it in the environment for sharing.
	std::vector<const char*> keys, values;
	ASSERT_ORT_STATUS(api.CreateAndRegisterAllocatorV2(env, "CPUExecutionProvider", cudaPinnedMemoryInfo, cudaPinnedArenaConfig, keys.data(), values.data(), 0));

	api.ReleaseArenaCfg(cudaPinnedArenaConfig);
	api.ReleaseMemoryInfo(cudaPinnedMemoryInfo);
	nvtxRangePop();
}
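
For context, this is roughly how the helper might be driven against the C API (a hypothetical driver, not part of the report; status checks omitted, and the const_cast is only needed because the helper takes a non-const OrtApi& while the API object itself is never modified):

#include <onnxruntime_c_api.h>

int main()
{
	const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

	OrtEnv* env = nullptr;
	api->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "cuda_pinned_demo", &env);

	// Register the shared CudaPinned allocator on the environment.
	RegisterCudaPinnedEnvAllocator(const_cast<OrtApi&>(*api), env);

	// ... create sessions that opt into the shared environment allocators ...

	api->ReleaseEnv(env);
	return 0;
}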

This is how I revealed the issue:

[screenshot]

This is without shared environment allocators:

[screenshot]
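
Since the screenshots are not reproduced here, a runtime check can serve the same purpose. Below is a minimal sketch (a hypothetical check, not from the report; status checks omitted) that asks the session for its CudaPinned allocator and uses cudaPointerGetAttributes to tell whether the returned buffer is really CUDA-pinned host memory or plain CPU memory:

#include <cuda_runtime.h>
#include <onnxruntime_c_api.h>
#include <cstdio>

void CheckCudaPinnedAllocation(const OrtApi& api, OrtSession* session)
{
	OrtMemoryInfo* memInfo = nullptr;
	api.CreateMemoryInfo("CudaPinned", OrtArenaAllocator, 0, OrtMemTypeCPUOutput, &memInfo);

	// Resolve the allocator the session actually uses for this memory info.
	OrtAllocator* allocator = nullptr;
	api.CreateAllocator(session, memInfo, &allocator);

	void* p = nullptr;
	api.AllocatorAlloc(allocator, 1024, &p);

	// A real CUDAPinnedAllocator hands out cudaHostAlloc'd memory (cudaMemoryTypeHost);
	// a plain CPUAllocator hands out ordinary memory (cudaMemoryTypeUnregistered).
	cudaPointerAttributes attrs{};
	cudaPointerGetAttributes(&attrs, p);
	std::printf("CudaPinned allocation memory type: %d (host = %d, unregistered = %d)\n",
	            static_cast<int>(attrs.type),
	            static_cast<int>(cudaMemoryTypeHost),
	            static_cast<int>(cudaMemoryTypeUnregistered));

	api.AllocatorFree(allocator, p);
	api.ReleaseAllocator(allocator);
	api.ReleaseMemoryInfo(memInfo);
}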

To reproduce

Register the CudaPinned allocator in the environment with the RegisterCudaPinnedEnvAllocator function shown above, then create a session with shared environment allocators activated. The session's CudaPinned allocator then points to a CPUAllocator backend device_allocator instead of the CUDAPinnedAllocator.
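
For completeness, here is the step that activates the shared environment allocators on a session, as a minimal sketch (assuming the standard session.use_env_allocators config key, the same api and env as above, and a placeholder model path; status checks omitted):

OrtSessionOptions* sessionOptions = nullptr;
api.CreateSessionOptions(&sessionOptions);

// Opt the session into the allocators registered on the environment.
api.AddSessionConfigEntry(sessionOptions, "session.use_env_allocators", "1");

// Enable the CUDA EP so the session also gets the CUDA/CudaPinned device allocators.
OrtCUDAProviderOptionsV2* cudaOptions = nullptr;
api.CreateCUDAProviderOptions(&cudaOptions);
api.SessionOptionsAppendExecutionProvider_CUDA_V2(sessionOptions, cudaOptions);

OrtSession* session = nullptr;
api.CreateSession(env, L"model.onnx", sessionOptions, &session);

api.ReleaseCUDAProviderOptions(cudaOptions);
api.ReleaseSessionOptions(sessionOptions);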

I searched the ORT repo for clues or similar issues but found nothing.

Urgency

Urgent! This directly impacts performance when shared environment allocators are activated.

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

849eee8

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

Model File

No response

Is this a quantized model?

Unknown

Metadata

    Labels

    performance (issues related to performance regressions)
