Initializers use wrong allocator #25108

Open
@AndreyOrb

Describe the issue

I have shared env. allocators in my program:

// Create shared GPU allocator (registered in env.)
try
{
	ASSERT_CUDA_STATUS(cudaSetDevice(gpuNum));

	static Ort::MemoryInfo memoryInfo("Cuda", OrtArenaAllocator, gpuNum, OrtMemTypeDefault);
	static Ort::ArenaCfg arenaConfig(0, ArenaExtendStrategy::kSameAsRequested, -1, -1);

	// This creates an ORT-internal allocator instance and registers it in the environment for sharing
	vector<const char*> keys, values;
	ASSERT_ORT_STATUS(api.CreateAndRegisterAllocatorV2(*_env, "CUDAExecutionProvider", memoryInfo, arenaConfig, keys.data(), values.data(), 0));

	NN_LOG(NN_INFO) << "Created ORT-internal shared GPU memory allocator for GPU device " << to_string(gpuNum + 1);
}
catch (const exception& e)
{
	THROW_EXCEPTION_WITH_EX("Could not create an ORT-internal shared GPU memory allocator", e);
}

The session is defined like this:

Ort::SessionOptions sessionOptions;
sessionOptions.AddConfigEntry(kOrtSessionOptionsUseDeviceAllocatorForInitializers, "0"); // Allocate initializers via Arena allocator
sessionOptions.AddConfigEntry(kOrtSessionOptionsConfigUseEnvAllocators, "1"); // Use allocators registered in the env instead of per session

As it turned out, initializers are loaded using the session-local allocators, and only afterwards are those replaced with the env. allocators (see the session_state_ = std::make_unique<SessionState>( call during session initialization):

    // now that we have all the execution providers, create the session state
    session_state_ = std::make_unique<SessionState>(
        model_->MainGraph(),
        execution_providers_,
        GetIntraOpThreadPoolToUse(),
        GetInterOpThreadPoolToUse(),
        data_transfer_mgr_,
        external_data_loader_mgr_,
        *session_logger_,
        session_profiler_,
        session_options_,
        prepacked_weights_container_);

    bool use_env_allocators =
        session_options_.config_options.GetConfigOrDefault(kOrtSessionOptionsConfigUseEnvAllocators, "0") == "1";
    if (use_env_allocators) {
      LOGS(*session_logger_, INFO) << "This session will use the allocator registered with the environment.";
      session_state_->UpdateAllocatorsWithEnvAllocators(environment_.GetRegisteredSharedAllocators());
    }

Session state constructor:

allocators_unique_ptr_ = std::make_unique<AllocatorMap>();

In my case, no session-local allocator is configured, because I expect the same global allocator to be used for all sessions.
In addition, creating a session-local allocator, then replacing it with the global one and releasing the local one, is unnecessary and error-prone, since the global allocator is meant to be used across all sessions anyway.
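
To illustrate the ordering problem, here is a minimal standalone sketch. It does not use ONNX Runtime: ToyAllocator, ToySessionState, ReplaceAfterLoad and ResolveBeforeLoad are hypothetical names made up for this example, and the model assumes the initialization order is as described in this issue. It only counts which allocator serves the initializer allocations in each ordering.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for an allocator; NOT an ONNX Runtime type.
struct ToyAllocator {
  int allocations = 0;
  void Alloc() { ++allocations; }
};

using ToyAllocatorMap = std::map<std::string, std::shared_ptr<ToyAllocator>>;

// Toy stand-in for the session state (hypothetical, for illustration only).
struct ToySessionState {
  ToyAllocatorMap allocators;

  // Allocates `count` initializers with whatever allocator is currently mapped.
  void LoadInitializers(const std::string& device, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) allocators.at(device)->Alloc();
  }

  // Overwrites session-local entries with the shared (env.) ones.
  void UseEnvAllocators(const ToyAllocatorMap& env) {
    for (const auto& entry : env) allocators[entry.first] = entry.second;
  }
};

struct AllocCounts {
  int local;  // allocations served by the session-local allocator
  int env;    // allocations served by the shared env. allocator
};

// Order described in the issue: load first, swap allocators afterwards.
AllocCounts ReplaceAfterLoad(int initializer_count) {
  auto local = std::make_shared<ToyAllocator>();
  auto env = std::make_shared<ToyAllocator>();
  ToyAllocatorMap env_map;
  env_map["Cuda"] = env;

  ToySessionState state;
  state.allocators["Cuda"] = local;
  state.LoadInitializers("Cuda", initializer_count);
  state.UseEnvAllocators(env_map);
  return {local->allocations, env->allocations};
}

// Expected order: resolve the env. allocator before anything is allocated.
AllocCounts ResolveBeforeLoad(int initializer_count) {
  auto local = std::make_shared<ToyAllocator>();
  auto env = std::make_shared<ToyAllocator>();
  ToyAllocatorMap env_map;
  env_map["Cuda"] = env;

  ToySessionState state;
  state.allocators["Cuda"] = local;
  state.UseEnvAllocators(env_map);
  state.LoadInitializers("Cuda", initializer_count);
  return {local->allocations, env->allocations};
}
```

In this toy model, ReplaceAfterLoad(3) leaves all three allocations on the session-local allocator and none on the env. one, while ResolveBeforeLoad(3) does the opposite. The second ordering is the behavior I expect when session.use_env_allocators is enabled.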

To reproduce

Configure the env. and session as described above.

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.21

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Metadata

Assignees

No one assigned

    Labels

    api (issues related to all other APIs: C, C++, Python, etc.), ep:CUDA (issues related to the CUDA execution provider)
