
ONNX I/O Binding #20743

Closed
suhailes1 opened this issue May 21, 2024 · 5 comments
Labels
ep:CUDA (issues related to the CUDA execution provider)

Comments

@suhailes1

Describe the issue

I need to bind the input and output tensors using I/O Binding for an ONNX Runtime model, but I don't get any output: the output tensor returns a NULL pointer. I have attached the code below.

I checked the data and shape of the input tensor as well as the output tensor, but I still get a NULL pointer. How can I solve this issue? If anyone has experience with I/O Binding, please give me a tip.

To reproduce

// this is a function; the session is passed as an argument
std::vector<Ort::Value> input_tensors;
std::vector<Ort::Value> output_tensors;
std::vector<const char*> input_node_names_c_str;
std::vector<const char*> output_node_names_c_str;
int64_t input_height = input_node_dims[0].at(2);
int64_t input_width = input_node_dims[0].at(3);

// Pass gpu_graph_id to RunOptions through RunConfigs
Ort::RunOptions run_option;
// gpu_graph_id is optional if the session uses only one cuda graph
run_option.AddConfigEntry("gpu_graph_id", "1");

// Dimension expansion [CHW -> NCHW]
std::vector<int64_t> input_tensor_shape = {1, 3, input_height, input_width};
std::vector<int64_t> output_tensor_shape = {1, 300, 84};
size_t input_tensor_size = vector_product(input_tensor_shape);
size_t output_tensor_size = vector_product(output_tensor_shape);
std::vector<float> input_tensor_values(p_blob, p_blob + input_tensor_size);

Ort::IoBinding io_binding{session};
Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);

input_tensors.push_back(Ort::Value::CreateTensor(
memory_info, input_tensor_values.data(), input_tensor_size,
input_tensor_shape.data(), input_tensor_shape.size()
));

// Check if input and output node names are empty
for (const auto& inputNodeName : input_node_names) {
if (std::string(inputNodeName).empty()) {
std::cerr << "Empty input node name found." << std::endl;
}
}

// format conversion
for (const auto& inputName : input_node_names) {
input_node_names_c_str.push_back(inputName.c_str());
}

for (const auto& outputName : output_node_names) {
output_node_names_c_str.push_back(outputName.c_str());
}

io_binding.BindInput(input_node_names_c_str[0], input_tensors[0]);

Ort::MemoryInfo output_mem_info{"Cuda", OrtDeviceAllocator, 0, OrtMemTypeDefault};

cudaMalloc(&output_data_ptr, output_tensor_size * sizeof(float));
output_tensors.push_back(Ort::Value::CreateTensor(
output_mem_info, static_cast<float*>(output_data_ptr), output_tensor_size,
output_tensor_shape.data(), output_tensor_shape.size()));

io_binding.BindOutput(output_node_names_c_str[0], output_tensors[0]);
session.Run(run_option, io_binding);

// Get output results
auto* rawOutput = output_tensors[0].GetTensorData<float>();
cout << rawOutput << endl; //suhail
cudaFree(output_data_ptr); //suhail
std::vector<int64_t> outputShape = output_tensors[0].GetTensorTypeAndShapeInfo().GetShape();
for (auto i : outputShape) { cout << i << " "; } cout << endl; //suhail
size_t count = output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount();
cout << count << endl; //suhail
std::vector<float> output(rawOutput, rawOutput + count);

Urgency

I need to confirm that the session.Run() overhead issue is solved by using I/O Binding in my project.

Platform

Linux

OS Version


ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime-linux-x64-gpu-1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.4

Model File

No response

Is this a quantized model?

No

@github-actions bot added the ep:CUDA (issues related to the CUDA execution provider) label on May 21, 2024
@yuslepukhin
Member

Where do you populate input/output names?

@suhailes1
Author

Hi yuslepukhin,
Please check the snippet below; I get the input and output names from there.

Ort::Session session(env, model_path.c_str(), session_options);

// Create an allocator object based on default options to provide memory allocation functions for subsequent operations
Ort::AllocatorWithDefaultOptions allocator;

// Get the number of input nodes
size_t num_input_nodes = session.GetInputCount();

// Get the number of output nodes
size_t num_output_nodes = session.GetOutputCount();

// Get input node name and dimensions
for (size_t i = 0; i < num_input_nodes; i++) {
    auto input_name = session.GetInputNameAllocated(i, allocator);
    input_node_names.push_back(input_name.get());
    Ort::TypeInfo input_typeinfo = session.GetInputTypeInfo(i);
    auto input_tensorinfo = input_typeinfo.GetTensorTypeAndShapeInfo();
    auto input_dims = input_tensorinfo.GetShape();

    ONNXTensorElementDataType inputType = input_tensorinfo.GetElementType();

    if (input_dims.at(0) == IMR_ERROR)
    {
        std::cout << "[Warning] Got dynamic batch size. Setting output batch size to "
                << BATCH_SIZE << "." << std::endl;
        input_dims.at(0) = BATCH_SIZE;
    }

    input_node_dims.push_back(input_dims);

    std::cout << "[INFO] Input name and shape is: " << input_name.get() << " [";
    for (size_t j = 0; j < input_dims.size(); j++) {
        std::cout << input_dims[j];
        if (j != input_dims.size()-1) {
            std::cout << ",";
        }
    }
    std::cout << ']' << std::endl;
}

// Get output node name
std::vector<std::vector<int64_t>> output_node_dims;
for (size_t i = 0; i < num_output_nodes; i++) {
    auto output_name = session.GetOutputNameAllocated(i, allocator);
    output_node_names.push_back(output_name.get());
    Ort::TypeInfo output_typeinfo = session.GetOutputTypeInfo(i);
    auto output_tensorinfo = output_typeinfo.GetTensorTypeAndShapeInfo();
    auto output_dims = output_tensorinfo.GetShape();

    if (output_dims.at(0) == IMR_ERROR)
    {
        std::cout << "[Warning] Got dynamic batch size. Setting output batch size to "
                << BATCH_SIZE << "." << std::endl;
        output_dims.at(0) = BATCH_SIZE;
    }

    output_node_dims.push_back(output_dims);

    std::cout << "[INFO] Output name and shape is: " << output_name.get() << " [";
    for (size_t j = 0; j < output_dims.size(); j++) {
        std::cout << output_dims[j];
        if (j != output_dims.size()-1) {
            std::cout << ",";
        }
    }
    std::cout << ']' << std::endl;
}

@suhailes1
Author

I referred to code from GitHub. Since my project needs CUDA performance, I need to add I/O Binding, so I went through the reference on the ONNX Runtime site: https://onnxruntime.ai/docs/performance/tune-performance/iobinding.html
Is there anything I missed in the I/O Binding logic?

@tianleiwu
Contributor

tianleiwu commented May 22, 2024

(1) CUDA Graph requires inputs bound to a fixed buffer on the GPU (so you will need to copy the input to the same address in GPU memory for every run). In your case, the input is bound to CPU memory:

Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
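
A minimal sketch of what binding the input to a fixed device buffer could look like, assuming a hypothetical input_data_gpu pointer that is allocated once and reused at the same address for every run (the shape and name variables are from the code above):

// Allocate a fixed device buffer once, outside the per-run path.
float* input_data_gpu = nullptr;
cudaMalloc(&input_data_gpu, input_tensor_size * sizeof(float));

// Before each Run(), copy the new input into the same device address.
cudaMemcpy(input_data_gpu, input_tensor_values.data(),
           input_tensor_size * sizeof(float), cudaMemcpyHostToDevice);

// Describe the buffer as CUDA memory on device 0 and bind it.
Ort::MemoryInfo input_mem_info{"Cuda", OrtDeviceAllocator, 0, OrtMemTypeDefault};
Ort::Value gpu_input = Ort::Value::CreateTensor<float>(
    input_mem_info, input_data_gpu, input_tensor_size,
    input_tensor_shape.data(), input_tensor_shape.size());
io_binding.BindInput(input_node_names_c_str[0], gpu_input);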

(2) The memory is freed before the data is read into a vector:

cudaFree(output_data_ptr); //suhail
...
std::vector<float> output(rawOutput, rawOutput + count);
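
Since the output tensor is bound to CUDA memory, rawOutput is a device pointer and cannot be read on the host directly. A sketch of the corrected order (under the same assumptions as above) copies the result back first and only then frees the buffer:

// Copy the result from device to host before releasing the device buffer.
std::vector<float> output(output_tensor_size);
cudaMemcpy(output.data(), output_data_ptr,
           output_tensor_size * sizeof(float), cudaMemcpyDeviceToHost);
cudaFree(output_data_ptr);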

Please follow these examples to use I/O Binding and CUDA Graph:

TEST(CApiTest, io_binding_cuda) {

TEST(CApiTest, basic_cuda_graph) {
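
For context, the capture/replay flow in those tests is roughly the following (a sketch only, not the exact test code; it assumes the session was created with the CUDA EP's enable_cuda_graph option, and reuses the hypothetical input_data_gpu buffer from the sketch above, with next_input_values standing in for the next batch of host data):

// The first Run() captures the CUDA graph for the bound device addresses.
session.Run(run_option, io_binding);

// Later runs: refresh the contents of the same device buffer, then replay.
cudaMemcpy(input_data_gpu, next_input_values.data(),
           input_tensor_size * sizeof(float), cudaMemcpyHostToDevice);
session.Run(run_option, io_binding);  // replays the captured graph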

@suhailes1
Author

Hi Tianlei Wu,

Thank you so much. I resolved my issue with the references you shared.
