Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[To Disable LayerFusion] add copyOutputToHost function for only selected outputs #495

Open
wants to merge 1 commit into
base: release/8.6
Choose a base branch
from

Conversation

cathy-kim
Copy link

@cathy-kim cathy-kim commented Apr 15, 2020

When there is an bug(#380) in layer fusion or you just want to debug layer fusions,
using mark_output is the only solution to disable layer fusion now(#252).
https://github.com/NVIDIA/TensorRT/issues/252#issuecomment-577468499

But the latency increases because to use copyOutputToHost brings unintended outputs from device to host.
https://github.com/NVIDIA/TensorRT/issues/252#issuecomment-593716265

Therefore, added two functions in buffers.h to bring the selected outputs only(by index) to host.

@rajeevsrao rajeevsrao changed the base branch from master to release/8.6 March 16, 2023 05:41
//!
//! \brief Copy the selected contents of output device buffers by bidingIdx to output host buffers synchronously.
//!
void copyOutputToHost(const std::vector<int>& bindingIndices)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

To follow our coding convention, it would be great if you can make the following modifications:

  • Use east-const style (i.e. const std::vector<int>& -> std::vector<int> const&) for all occurrences of const.
  • Use int32_t instead of int.
  • Use 4-whitespace indentation instead of 2-whitespace indentation.

We appreciate it!

const cudaMemcpyKind memcpyType = cudaMemcpyDeviceToHost;
if (!mEngine->bindingIsInput(i))
{
CHECK(cudaMemcpy(dstPtr, srcPtr, byteSize, memcpyType, stream));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cudaMemcpy should be cudaMemcpyAsync

{
for (int i : bindingIndices)
{
void* dstPtr = mManagedBuffers[i]->hostBuffer.data();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move these inside the if(){....} clause?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants