
[SYSTEMDS-3024] Improve performance by batching data descriptor trans…#1567

Closed
corepointer wants to merge 1 commit into apache:main from corepointer:3024_cuda-codegen-batched-transfers

@corepointer
Contributor

…fers

The spoof CUDA operators issue several small cudaMemcpy() calls per operator execution. Transferring all of the data in a single copy reduces this overhead. Using asynchronous copies can improve performance further and is a first step towards more asynchronicity in the GPU operations.
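
To illustrate the idea, here is a minimal sketch of batching: several small host-side pieces are packed into one contiguous staging buffer and then sent with a single copy. It is not the code from this PR. `MatrixDesc`, `StagingBuffer` and `upload` are hypothetical names made up for the example, and the descriptor layout is an assumption. When compiled with nvcc the transfer is one `cudaMemcpyAsync` on a stream; without CUDA a plain `memcpy` stands in so the packing logic can be tried anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical descriptor for one operand; NOT the actual SystemDS layout.
struct MatrixDesc {
    std::uint64_t rows;
    std::uint64_t cols;
    std::uint64_t nnz;
    const double* data;  // would be a device pointer in a real operator
};

// Host-side staging area (hypothetical helper): every piece is appended to one
// contiguous byte buffer, and the returned offset is its position inside the
// single device allocation that receives the buffer.
class StagingBuffer {
public:
    std::size_t append(const void* src, std::size_t n, std::size_t align = 8) {
        assert(align > 0);
        // Round the current end up to the requested alignment.
        const std::size_t off = (bytes_.size() + align - 1) / align * align;
        bytes_.resize(off + n);  // padding bytes are zero-initialised
        if (n > 0) std::memcpy(bytes_.data() + off, src, n);
        return off;
    }
    const unsigned char* data() const { return bytes_.data(); }
    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};

#ifdef __CUDACC__
// One asynchronous host-to-device copy for the whole batch. For the copy to
// really overlap with host work, the staging memory would have to be
// page-locked (e.g. allocated with cudaMallocHost) and kept alive until the
// stream has finished; the std::vector used here is pageable.
inline cudaError_t upload(void* dst, const StagingBuffer& buf, cudaStream_t stream) {
    return cudaMemcpyAsync(dst, buf.data(), buf.size(),
                           cudaMemcpyHostToDevice, stream);
}
#else
// Stand-in without CUDA: a plain memcpy; the stream argument is ignored.
inline int upload(void* dst, const StagingBuffer& buf, int /*stream*/) {
    std::memcpy(dst, buf.data(), buf.size());
    return 0;
}
#endif
```

A caller would append the input descriptors, scalars and output descriptor, remember the returned offsets, call `upload` once, and pass `device_base + offset` to the kernel for each piece. The trade-off is one extra host-side copy into the staging buffer in exchange for a single transfer instead of one per piece.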

@corepointer
Contributor Author

LGTMS - CUDA codegen tests ran through just fine.

ywcb00 pushed a commit to ywcb00/systemds that referenced this pull request Apr 5, 2022
Closes apache#1567