ONNX I/O Binding #20743
Comments
Where do you populate the input/output names?
Hi yuslepukhin, I create the session with: Ort::Session session(env, model_path.c_str(), session_options);
I took the code from the repository. Since my project needs CUDA performance, I need to add I/O Binding, so I went through the reference on the ONNX Runtime site: https://onnxruntime.ai/docs/performance/tune-performance/iobinding.html
(1) CUDA Graph requires inputs bound to a fixed buffer on the GPU (so you will need to copy the input to the same address in GPU memory for every run). In your case, the input is bound to CPU.
(2) Please follow the following examples to use IO Binding and CUDA Graph:
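A minimal sketch of what (1) describes, assuming the CUDA execution provider is registered. The node names "input"/"output" and the input shape are assumptions for illustration, not taken from the issue: allocate fixed device buffers once, bind them, and copy fresh input data to the same device address before every run.

```cpp
#include <onnxruntime_cxx_api.h>
#include <cuda_runtime.h>
#include <vector>

// Sketch only: "input"/"output" names and the input shape are assumed.
void run_with_fixed_gpu_buffers(Ort::Session& session, const float* host_input,
                                size_t input_count, size_t output_count) {
    Ort::MemoryInfo cuda_mem_info{"Cuda", OrtDeviceAllocator, /*device_id=*/0,
                                  OrtMemTypeDefault};

    // Allocate fixed device buffers once; with CUDA Graph the same
    // addresses must be reused for every run.
    float* d_input = nullptr;
    float* d_output = nullptr;
    cudaMalloc(&d_input, input_count * sizeof(float));
    cudaMalloc(&d_output, output_count * sizeof(float));

    std::vector<int64_t> in_shape{1, 3, 640, 640};  // assumed input shape
    std::vector<int64_t> out_shape{1, 300, 84};

    Ort::Value in_tensor = Ort::Value::CreateTensor<float>(
        cuda_mem_info, d_input, input_count, in_shape.data(), in_shape.size());
    Ort::Value out_tensor = Ort::Value::CreateTensor<float>(
        cuda_mem_info, d_output, output_count, out_shape.data(), out_shape.size());

    Ort::IoBinding io_binding{session};
    io_binding.BindInput("input", in_tensor);     // assumed node name
    io_binding.BindOutput("output", out_tensor);  // assumed node name

    Ort::RunOptions run_options;
    run_options.AddConfigEntry("gpu_graph_id", "1");

    // Copy this run's input to the *same* device address, then run.
    cudaMemcpy(d_input, host_input, input_count * sizeof(float),
               cudaMemcpyHostToDevice);
    session.Run(run_options, io_binding);

    // Copy the result back to the host before freeing device memory.
    std::vector<float> result(output_count);
    cudaMemcpy(result.data(), d_output, output_count * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_input);
    cudaFree(d_output);
}
```

The key difference from the code in the issue is that the input tensor is created over GPU memory ("Cuda" memory info) rather than CPU memory, and the device buffers outlive every Run call.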
Hi Tianlei Wu, thank you so much. I resolved my issue with the reference you shared.
Describe the issue
I need to bind the tensor input and output using I/O Binding for an ONNX Runtime model, but I don't get any output; the output tensor returns a NULL pointer. I have attached the code below.
I checked the input tensor data and shape as well as the output tensor, but I am still getting a NULL pointer. How can I solve this issue? If anyone has experience with I/O Binding, please give me a tip for solving this.
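The element counts in the reproduction code come from multiplying the tensor shape dimensions. The code calls a vector_product helper whose definition is not shown in the issue; an implementation consistent with that usage (an assumption on my part) would be:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Assumed definition of the vector_product helper used in the snippet:
// the element count of a tensor is the product of its shape dimensions.
size_t vector_product(const std::vector<int64_t>& shape) {
    return static_cast<size_t>(
        std::accumulate(shape.begin(), shape.end(), int64_t{1},
                        std::multiplies<int64_t>()));
}
```

For the output shape {1, 300, 84} used below, this yields 1 * 300 * 84 = 25200 elements.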
To reproduce
// this is a function. session passed as an argument to the function
std::vector<Ort::Value> input_tensors;
std::vector<Ort::Value> output_tensors;
std::vector<const char*> input_node_names_c_str;
std::vector<const char*> output_node_names_c_str;
int64_t input_height = input_node_dims[0].at(2);
int64_t input_width = input_node_dims[0].at(3);
// Pass gpu_graph_id to RunOptions through RunConfigs
Ort::RunOptions run_option;
// gpu_graph_id is optional if the session uses only one cuda graph
run_option.AddConfigEntry("gpu_graph_id", "1");
// Dimension expansion [CHW -> NCHW]
std::vector<int64_t> input_tensor_shape = {1, 3, input_height, input_width};
std::vector<int64_t> output_tensor_shape = {1, 300, 84};
size_t input_tensor_size = vector_product(input_tensor_shape);
size_t output_tensor_size = vector_product(output_tensor_shape);
std::vector<float> input_tensor_values(p_blob, p_blob + input_tensor_size);
Ort::IoBinding io_binding{session};
Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
input_tensors.push_back(Ort::Value::CreateTensor(
memory_info, input_tensor_values.data(), input_tensor_size,
input_tensor_shape.data(), input_tensor_shape.size()
));
// Check if input and output node names are empty
for (const auto& inputNodeName : input_node_names) {
if (std::string(inputNodeName).empty()) {
std::cerr << "Empty input node name found." << std::endl;
}
}
// format conversion
for (const auto& inputName : input_node_names) {
input_node_names_c_str.push_back(inputName.c_str());
}
for (const auto& outputName : output_node_names) {
output_node_names_c_str.push_back(outputName.c_str());
}
io_binding.BindInput(input_node_names_c_str[0], input_tensors[0]);
Ort::MemoryInfo output_mem_info{"Cuda", OrtDeviceAllocator, 0,
OrtMemTypeDefault};
cudaMalloc(&output_data_ptr, output_tensor_size * sizeof(float));
output_tensors.push_back(Ort::Value::CreateTensor(
output_mem_info, static_cast<float*>(output_data_ptr),output_tensor_size,
output_tensor_shape.data(),output_tensor_shape.size()));
io_binding.BindOutput(output_node_names_c_str[0], output_tensors[0]);
session.Run(run_option, io_binding);
//Get output results
auto* rawOutput = output_tensors[0].GetTensorData<float>();
cout<<rawOutput<<endl; //suhail
cudaFree(output_data_ptr); //suhail
std::vector<int64_t> outputShape = output_tensors[0].GetTensorTypeAndShapeInfo().GetShape();
for(auto i:outputShape){cout<<i<<" ";} cout<<endl; //suhail
size_t count = output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount();
cout<<count<<endl; //suhail
std::vector<float> output(rawOutput, rawOutput + count);
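One thing to check in the retrieval step above: because the output tensor is bound to "Cuda" memory, GetTensorData returns a device pointer, and the code calls cudaFree on that buffer before constructing the host vector from it. Reading a device pointer from host code is undefined regardless, so the usual pattern is to copy device to host first and free afterwards. A hedged sketch of that ordering (fetch_output is a hypothetical helper name):

```cpp
#include <cstddef>
#include <cuda_runtime.h>
#include <vector>

// Sketch: copy a device-resident output buffer to the host *before* freeing it.
// d_output and count correspond to output_data_ptr and the element count above.
std::vector<float> fetch_output(const float* d_output, size_t count) {
    std::vector<float> host(count);
    cudaMemcpy(host.data(), d_output, count * sizeof(float),
               cudaMemcpyDeviceToHost);
    return host;
}
```

In the reproduction code, cudaFree(output_data_ptr) would then move to after this copy, and the host vector is built from the copied data rather than from rawOutput directly.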
Urgency
I need to confirm that the session.Run() overhead issue is solved by using I/O Binding in my project.
Platform
Linux
OS Version
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
onnxruntime-linux-x64-gpu-1.15.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.04
Model File
No response
Is this a quantized model?
No