multi-stream parallel execution with one GPU ERROR #846
Comments
Hello @Jacoobr, thanks for reporting.
Hi @ttyio, thanks for your reply. I tried creating one `IExecutionContext` for each CUDA stream, and that error no longer appears. But I got another error about the dynamic profile setting on the `IExecutionContext`, like this:
Because I need the engine to run inference on images of different input sizes, I created an optimization profile so the engine can handle different input shapes, with the code below:
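(The code referenced above is not shown in the thread. For illustration, here is a minimal sketch of creating such a profile with the TensorRT 7 C++ API; the binding name `"input"` and the shape ranges are assumptions.)

```cpp
#include <NvInfer.h>

// Sketch only: one optimization profile covering a range of input sizes.
// The binding name "input" and the shape ranges are assumptions.
void addDynamicShapeProfile(nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config)
{
    nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 3, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{1, 3, 512, 512});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{1, 3, 1024, 1024});
    config->addOptimizationProfile(profile);
}
```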
When I run with one ...
Hello @Jacoobr, thanks for the reply.
Hi @ttyio, at the engine build stage I created only one dynamic-shape profile.
At the inference stage I set up two contexts, each with its own buffers and stream, and I use asynchronous `enqueueV2()` calls.
I don't understand why the profile I set at build time cannot be used by the different contexts in parallel with asynchronous execution.
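(The code itself is not reproduced here. A condensed sketch of the setup being described, with placeholder names for the engine and the device bindings:)

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: two execution contexts on one engine, each with its own CUDA stream
// and its own device bindings (bindings0/bindings1 are placeholder arrays of
// device pointers prepared elsewhere).
void runTwoContexts(nvinfer1::ICudaEngine* engine, void** bindings0, void** bindings1)
{
    nvinfer1::IExecutionContext* ctx0 = engine->createExecutionContext();
    nvinfer1::IExecutionContext* ctx1 = engine->createExecutionContext();

    cudaStream_t stream0, stream1;
    cudaStreamCreate(&stream0);
    cudaStreamCreate(&stream1);

    // With only one optimization profile in the engine, TensorRT reports an error
    // here, because a profile can be used by only one execution context at a time.
    ctx0->setBindingDimensions(0, nvinfer1::Dims4{1, 3, 512, 512});
    ctx1->setBindingDimensions(0, nvinfer1::Dims4{1, 3, 256, 256});

    ctx0->enqueueV2(bindings0, stream0, nullptr);  // asynchronous launch on stream0
    ctx1->enqueueV2(bindings1, stream1, nullptr);  // asynchronous launch on stream1

    cudaStreamSynchronize(stream0);
    cudaStreamSynchronize(stream1);
}
```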
Ah, thanks for the code @Jacoobr.
Hello @ttyio, I tried creating two profiles and assigning one to each context, and the profile error at the parallel inference stage is gone. Thank you very much. But of the two parallel inference results, only the first one is correct; the second context's result is wrong. Also, I compared the time of the two parallel executions with a single execution, and the two parallel executions take 2x as long as one execution, so I think the parallel execution with the two contexts has failed. Would you mind giving some advice on parallel execution, or on what is wrong with my process?
terminal console:
I'd appreciate any reply! Thanks.
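(For reference, a sketch of selecting a separate optimization profile per context with the TensorRT 7 API; the profile indices, binding offsets, and shapes below are assumptions.)

```cpp
#include <NvInfer.h>

// Sketch: with two optimization profiles built into the engine, give each
// context its own profile before setting input shapes. Binding indices are
// replicated per profile, so profile 1's bindings start at an offset.
void assignProfiles(nvinfer1::ICudaEngine* engine,
                    nvinfer1::IExecutionContext* ctx0,
                    nvinfer1::IExecutionContext* ctx1)
{
    const int bindingsPerProfile = engine->getNbBindings() / engine->getNbOptimizationProfiles();

    ctx0->setOptimizationProfile(0);
    ctx0->setBindingDimensions(0, nvinfer1::Dims4{1, 3, 512, 512});

    ctx1->setOptimizationProfile(1);
    ctx1->setBindingDimensions(bindingsPerProfile, nvinfer1::Dims4{1, 3, 512, 512});
}
```

The same per-profile offset applies to the pointer array passed to `enqueueV2`, so each context's bindings must be placed at the indices of its selected profile.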
Hello @Jacoobr, for the accuracy issue, I have not run it through, but there is a typo in your code. For the throughput issue, it is case by case for different networks; for example, when the GPU is already fully occupied by one network, kicking off another network simultaneously won't get higher occupancy. You can use Nsight Systems for overall performance and Nsight Compute for CUDA kernel performance. Hope this helps, thanks!
Hi @ttyio. Sorry for the mistake in the code. I fixed the typo and now both inference results are correct. Thank you very much. But the inference time of the two parallel executions is still 2x that of a single execution, so I'm not sure the two executions really run in parallel asynchronously. I will check it later with the Nsight tools. Anyway, thank you again.
I created a network with multiple profiles so that more than one context can be created. This is the code:
My goal is to have multiple contexts that do parallel inference on one GPU.
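(The code is not shown in the thread. A condensed sketch of that approach, with an assumed input name, shape ranges, and context count:)

```cpp
#include <NvInfer.h>
#include <vector>

// Sketch: add one optimization profile per intended execution context, build
// the engine, then create one context per profile for parallel inference.
void buildAndCreateContexts(nvinfer1::IBuilder* builder,
                            nvinfer1::INetworkDefinition* network,
                            nvinfer1::IBuilderConfig* config,
                            std::vector<nvinfer1::IExecutionContext*>& contexts)
{
    const int numContexts = 2;  // assumed
    for (int i = 0; i < numContexts; ++i)
    {
        nvinfer1::IOptimizationProfile* p = builder->createOptimizationProfile();
        p->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 3, 224, 224});
        p->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{1, 3, 512, 512});
        p->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{1, 3, 1024, 1024});
        config->addOptimizationProfile(p);
    }
    nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    for (int i = 0; i < numContexts; ++i)
        contexts.push_back(engine->createExecutionContext());
}
```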
He is using cudaMemcpy, which is blocking; use cudaMemcpyAsync instead.
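(A short sketch of that asynchronous pattern, with placeholder buffer names and sizes; the host buffers would need to be pinned, e.g. via cudaMallocHost, for the copies to be truly asynchronous.)

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: keep the H2D copy, the inference, and the D2H copy on the same stream
// so nothing forces a device-wide synchronization (buffer names are placeholders).
void inferAsync(nvinfer1::IExecutionContext* ctx, void** bindings,
                const void* hostInput, void* devInput, size_t inBytes,
                void* hostOutput, const void* devOutput, size_t outBytes,
                cudaStream_t stream)
{
    cudaMemcpyAsync(devInput, hostInput, inBytes, cudaMemcpyHostToDevice, stream);
    ctx->enqueueV2(bindings, stream, nullptr);
    cudaMemcpyAsync(hostOutput, devOutput, outBytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);  // wait only on this stream's work
}
```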
I tried to execute two parallel inferences with one context and two streams, but I got the error below:
The code for executing the two parallel inferences (the enqueueV2() calls are asynchronous) with one context and two streams follows:
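(The code is not included here. A condensed sketch of the pattern described, a single context enqueued on two streams; as discussed earlier in the thread, this setup produced the error, and the fix was one context and one profile per stream.)

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch of the problematic pattern: one execution context launched on two
// different streams (bindings0/bindings1 are placeholder device pointer arrays).
void twoStreamsOneContext(nvinfer1::IExecutionContext* ctx, void** bindings0, void** bindings1)
{
    cudaStream_t stream0, stream1;
    cudaStreamCreate(&stream0);
    cudaStreamCreate(&stream1);

    ctx->enqueueV2(bindings0, stream0, nullptr);
    // Reusing the same context on a second stream concurrently is what triggered
    // the error reported in this issue.
    ctx->enqueueV2(bindings1, stream1, nullptr);

    cudaStreamSynchronize(stream0);
    cudaStreamSynchronize(stream1);
}
```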
Can someone give me some advice on parallelizing inference with this code?
Environment
Windows + TensorRT 7.1.3 + one RTX 2060 Super GPU