
ONNX I/O Binding #20743

Closed
suhailes1 opened this issue May 21, 2024 · 5 comments
Labels
ep:CUDA (issues related to the CUDA execution provider)

Comments

@suhailes1

Describe the issue

I need to bind the input and output tensors using I/O Binding for an ONNX Runtime model, but I don't get any output: the output tensor returns a NULL pointer. I have attached the code below.

I checked the data and shape of the input tensor as well as the output tensor, but I still get a NULL pointer. How can I solve this issue? If anyone has experience with I/O Binding, please give me a tip.

To reproduce

// this is a function; the session is passed as an argument
std::vector<Ort::Value> input_tensors;
std::vector<Ort::Value> output_tensors;
std::vector<const char*> input_node_names_c_str;
std::vector<const char*> output_node_names_c_str;
int64_t input_height = input_node_dims[0].at(2);
int64_t input_width = input_node_dims[0].at(3);

// Pass gpu_graph_id to RunOptions through RunConfigs
Ort::RunOptions run_option;
// gpu_graph_id is optional if the session uses only one cuda graph
run_option.AddConfigEntry("gpu_graph_id", "1");

// Dimension expansion [CHW -> NCHW]
std::vector<int64_t> input_tensor_shape = {1, 3, input_height, input_width};
std::vector<int64_t> output_tensor_shape = {1, 300, 84};
size_t input_tensor_size = vector_product(input_tensor_shape);
size_t output_tensor_size = vector_product(output_tensor_shape);
std::vector<float> input_tensor_values(p_blob, p_blob + input_tensor_size);

Ort::IoBinding io_binding{session};
Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);

input_tensors.push_back(Ort::Value::CreateTensor(
memory_info, input_tensor_values.data(), input_tensor_size,
input_tensor_shape.data(), input_tensor_shape.size()
));

// Check if input and output node names are empty
for (const auto& inputNodeName : input_node_names) {
if (std::string(inputNodeName).empty()) {
std::cerr << "Empty input node name found." << std::endl;
}
}

// format conversion
for (const auto& inputName : input_node_names) {
input_node_names_c_str.push_back(inputName.c_str());
}

for (const auto& outputName : output_node_names) {
output_node_names_c_str.push_back(outputName.c_str());
}

io_binding.BindInput(input_node_names_c_str[0], input_tensors[0]);

Ort::MemoryInfo output_mem_info{"Cuda", OrtDeviceAllocator, 0, OrtMemTypeDefault};

cudaMalloc(&output_data_ptr, output_tensor_size * sizeof(float));
output_tensors.push_back(Ort::Value::CreateTensor(
output_mem_info, static_cast<float*>(output_data_ptr), output_tensor_size,
output_tensor_shape.data(), output_tensor_shape.size()));

io_binding.BindOutput(output_node_names_c_str[0], output_tensors[0]);
session.Run(run_option, io_binding);

// Get output results
auto* rawOutput = output_tensors[0].GetTensorData<float>();
cout << rawOutput << endl; //suhail
cudaFree(output_data_ptr); //suhail
std::vector<int64_t> outputShape = output_tensors[0].GetTensorTypeAndShapeInfo().GetShape();
for (auto i : outputShape) { cout << i << " "; } cout << endl; //suhail
size_t count = output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount();
cout << count << endl; //suhail
std::vector<float> output(rawOutput, rawOutput + count);

Urgency

I need to confirm that the session.Run() overhead issue is solved by using I/O Binding in my project.

Platform

Linux

OS Version


ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime-linux-x64-gpu-1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.4

Model File

No response

Is this a quantized model?

No

@github-actions bot added the ep:CUDA (issues related to the CUDA execution provider) label on May 21, 2024
@yuslepukhin
Member

Where do you populate input/output names?

@suhailes1
Author

Hi yuslepukhin,
Please check the snippet below; I get the input and output names from there.

Ort::Session session(env, model_path.c_str(), session_options);

// Create an allocator object based on default options to provide memory allocation functions for subsequent operations
Ort::AllocatorWithDefaultOptions allocator;

// Get the number of input nodes
size_t num_input_nodes = session.GetInputCount();

// Get the number of output nodes
size_t num_output_nodes = session.GetOutputCount();

// Get input node name and dimensions
for (size_t i = 0; i < num_input_nodes; i++) {
    auto input_name = session.GetInputNameAllocated(i, allocator);
    input_node_names.push_back(input_name.get());
    Ort::TypeInfo input_typeinfo = session.GetInputTypeInfo(i);
    auto input_tensorinfo = input_typeinfo.GetTensorTypeAndShapeInfo();
    auto input_dims = input_tensorinfo.GetShape();

    ONNXTensorElementDataType inputType = input_tensorinfo.GetElementType();

    if (input_dims.at(0) == IMR_ERROR)
    {
        std::cout << "[Warning] Got dynamic batch size. Setting output batch size to "
                << BATCH_SIZE << "." << std::endl;
        input_dims.at(0) = BATCH_SIZE;
    }

    input_node_dims.push_back(input_dims);

    std::cout << "[INFO] Input name and shape is: " << input_name.get() << " [";
    for (size_t j = 0; j < input_dims.size(); j++) {
        std::cout << input_dims[j];
        if (j != input_dims.size()-1) {
            std::cout << ",";
        }
    }
    std::cout << ']' << std::endl;
}

// Get output node name
std::vector<std::vector<int64_t>> output_node_dims;
for (size_t i = 0; i < num_output_nodes; i++) {
    auto output_name = session.GetOutputNameAllocated(i, allocator);
    output_node_names.push_back(output_name.get());
    Ort::TypeInfo output_typeinfo = session.GetOutputTypeInfo(i);
    auto output_tensorinfo = output_typeinfo.GetTensorTypeAndShapeInfo();
    auto output_dims = output_tensorinfo.GetShape();

    if (output_dims.at(0) == IMR_ERROR)
    {
        std::cout << "[Warning] Got dynamic batch size. Setting output batch size to "
                << BATCH_SIZE << "." << std::endl;
        output_dims.at(0) = BATCH_SIZE;
    }

    output_node_dims.push_back(output_dims);

    std::cout << "[INFO] Output name and shape is: " << output_name.get() << " [";
    for (size_t j = 0; j < output_dims.size(); j++) {
        std::cout << output_dims[j];
        if (j != output_dims.size()-1) {
            std::cout << ",";
        }
    }
    std::cout << ']' << std::endl;
}

@suhailes1
Author

I referred to code from GitHub. Since my project needs CUDA performance, I need to add I/O Binding, so I went through the reference on the ONNX Runtime site: https://onnxruntime.ai/docs/performance/tune-performance/iobinding.html
Is there anything I missed in the I/O Binding logic?

@tianleiwu
Contributor

tianleiwu commented May 22, 2024

(1) CUDA Graph requires inputs bound to a fixed buffer on the GPU (so you will need to copy the input to the same address in GPU memory for every run). In your case, the input is bound to CPU memory:

Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
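
A minimal sketch of what binding the input to a fixed device buffer could look like, assuming a hypothetical input_data_gpu pointer that is allocated once and reused at the same address for every run (the shape and name variables are from the code above):

// Allocate a fixed device buffer once, outside the per-run path.
float* input_data_gpu = nullptr;
cudaMalloc(&input_data_gpu, input_tensor_size * sizeof(float));

// Before each Run(), copy the new input into the same device address.
cudaMemcpy(input_data_gpu, input_tensor_values.data(),
           input_tensor_size * sizeof(float), cudaMemcpyHostToDevice);

// Describe the buffer as CUDA memory on device 0 and bind it.
Ort::MemoryInfo input_mem_info{"Cuda", OrtDeviceAllocator, 0, OrtMemTypeDefault};
Ort::Value gpu_input = Ort::Value::CreateTensor<float>(
    input_mem_info, input_data_gpu, input_tensor_size,
    input_tensor_shape.data(), input_tensor_shape.size());
io_binding.BindInput(input_node_names_c_str[0], gpu_input);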

(2) The memory is freed before the data is read into a vector:

cudaFree(output_data_ptr); //suhail
...
std::vector<float> output(rawOutput, rawOutput + count);
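
Since the output tensor is bound to CUDA memory, rawOutput is a device pointer and cannot be read on the host directly. A sketch of the corrected order (under the same assumptions as above) copies the result back first and only then frees the buffer:

// Copy the result from device to host before releasing the device buffer.
std::vector<float> output(output_tensor_size);
cudaMemcpy(output.data(), output_data_ptr,
           output_tensor_size * sizeof(float), cudaMemcpyDeviceToHost);
cudaFree(output_data_ptr);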

Please follow these examples to use I/O Binding and CUDA Graph:

TEST(CApiTest, io_binding_cuda) {

TEST(CApiTest, basic_cuda_graph) {
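
For context, the capture/replay flow in those tests is roughly the following (a sketch only, not the exact test code; it assumes the session was created with the CUDA EP's enable_cuda_graph option, and reuses the hypothetical input_data_gpu buffer from the sketch above, with next_input_values standing in for the next batch of host data):

// The first Run() captures the CUDA graph for the bound device addresses.
session.Run(run_option, io_binding);

// Later runs: refresh the contents of the same device buffer, then replay.
cudaMemcpy(input_data_gpu, next_input_values.data(),
           input_tensor_size * sizeof(float), cudaMemcpyHostToDevice);
session.Run(run_option, io_binding);  // replays the captured graph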

@suhailes1
Author

Hi Tianlei Wu,

Thank you so much. I resolved my issue with the references you shared.
