
GPU and sparse: I/O and performance optimizations #946

Open
sk1p opened this issue Feb 1, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@sk1p
Member

sk1p commented Feb 1, 2021

GPU support is still experimental and needs to be looked at from a performance perspective. Transfers can generally be overlapped with computation, and unified memory could be used. The tiling scheme should also be negotiated appropriately for the GPU case - this needs to be benchmarked.
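For reference, this is roughly what overlapping host-to-device transfers with compute could look like using cupy streams and pinned staging buffers. This is just an illustrative sketch, not LiberTEM code; `tiles_from_disk` and `sum_overlapped` are made-up stand-ins for a partition's decoded tiles and a UDF-like reduction:

```python
import numpy as np
import cupy as cp


def tiles_from_disk(num_tiles, shape, dtype=np.float32):
    # Stand-in for the real tile source (decoded from the file on the CPU).
    for _ in range(num_tiles):
        yield np.ones(shape, dtype=dtype)


def sum_overlapped(tiles, shape, dtype=np.float32):
    # Double buffering: while the GPU works on stream 0, the CPU stages and
    # enqueues the next tile on stream 1, and vice versa.
    streams = [cp.cuda.Stream(non_blocking=True) for _ in range(2)]
    n = int(np.prod(shape))
    itemsize = np.dtype(dtype).itemsize
    # Pinned (page-locked) host buffers are needed for truly async H2D copies:
    pinned = [
        np.frombuffer(cp.cuda.alloc_pinned_memory(n * itemsize), dtype, n).reshape(shape)
        for _ in range(2)
    ]
    gpu_bufs = [cp.empty(shape, dtype=dtype) for _ in range(2)]
    partial = [cp.zeros((), dtype=cp.float64) for _ in range(2)]
    for i, tile in enumerate(tiles):
        j = i % 2
        stream = streams[j]
        stream.synchronize()                        # buffer j is free again
        pinned[j][...] = tile                       # CPU-side staging copy
        gpu_bufs[j].set(pinned[j], stream=stream)   # async host-to-device copy
        with stream:
            partial[j] += gpu_bufs[j].sum()         # compute on the same stream
    for stream in streams:
        stream.synchronize()
    return float(partial[0] + partial[1])


print(sum_overlapped(tiles_from_disk(8, (64, 64)), (64, 64)))
```

Whether this actually wins anything over plain synchronous copies is exactly what needs benchmarking.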

Edit by @uellue:

Sparse input data support #1207 also requires re-organizing and optimizing tile shape negotiation and the execution plan based on the UDFs that will run, the array backend that will be used, and the resources that are available on different kinds of workers. This issue can be extended to the generic case of optimizing settings for different backends.

@sk1p sk1p added the enhancement New feature or request label Feb 1, 2021
@sk1p sk1p added this to the 0.7 milestone Feb 1, 2021
@sk1p
Member Author

sk1p commented Feb 11, 2021

While waiting for cupy to compile (sigh), I had a look at how we could use unified memory for our GPU support. cuSignal has support for this, but it is just a small wrapper around the numba mapped_array function.
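For context, a minimal sketch of what that looks like, assuming a recent numba; that `cp.asarray` picks up the mapped buffer zero-copy via the CUDA array interface is my understanding, not something I have verified yet:

```python
import numpy as np
import cupy as cp
from numba import cuda

# Allocate a mapped ("zero-copy") buffer: pinned host memory that is also
# mapped into the device address space. This is essentially what the
# cuSignal wrapper does under the hood.
shape = (16, 256, 256)
buf = cuda.mapped_array(shape, dtype=np.float32)

# CPU side: decode / fill it like any numpy array.
buf[...] = np.arange(np.prod(shape), dtype=np.float32).reshape(shape)

# GPU side: cupy should wrap the same memory via the CUDA array interface,
# so we never issue an explicit cudaMemcpy ourselves; the device reads the
# pinned host memory directly. (If cupy instead treats it as a plain host
# array, it would fall back to copying - to be checked.)
gpu_view = cp.asarray(buf)
print(gpu_view.sum())
```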

I still want to benchmark this, but on that basis we could implement GPU awareness in the I/O backend (a rough sketch follows the list below):

  • on initializing the backend impl class, set destination to CPU or GPU
  • enabling GPU support also forces the copy mode of the mmap backend
  • then we directly decode the data into the shared CPU/GPU buffer
  • hopefully the driver is clever enough to transfer the data in an efficient way
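Very roughly, and with made-up names (`DecodeBackend`, `allocate`, `decode_tile` are illustrative, not our actual backend API), the idea would be something like:

```python
import numpy as np
from numba import cuda


class DecodeBackend:
    # Illustrative sketch only: decode directly into a buffer that is usable
    # from the CPU or from the GPU, depending on the configured destination.
    def __init__(self, destination="cpu"):
        assert destination in ("cpu", "gpu")
        self._destination = destination

    def allocate(self, shape, dtype):
        if self._destination == "gpu":
            # mapped (zero-copy) buffer: CPU-side decoding writes into it,
            # GPU kernels can later read it without an explicit device copy
            return cuda.mapped_array(shape, dtype=dtype)
        return np.empty(shape, dtype=dtype)

    def decode_tile(self, raw_bytes, out):
        # stand-in for the real decode step; with destination == "gpu" this is
        # the forced "copy mode" of the mmap backend mentioned above, since we
        # always copy out of the mmap'd file into the shared buffer
        out[...] = np.frombuffer(raw_bytes, dtype=out.dtype).reshape(out.shape)
        return out
```

A worker configured with the GPU destination would then hand the same buffer to the cupy-based UDF without a separate transfer step.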

If this doesn't pan out, we can think about how to integrate multiple GPU streams and manually manage overlapping of transfers and compute.

@uellue uellue modified the milestones: 0.7, 0.8 Apr 12, 2021
@sk1p sk1p modified the milestones: 0.8, backlog Aug 24, 2021
@uellue uellue changed the title GPU: I/O and performance optimizations GPU and sparse: I/O and performance optimizations Feb 9, 2023
@uellue uellue modified the milestones: backlog, 0.11 Feb 9, 2023
@uellue
Member

uellue commented Feb 9, 2023

Bumping the milestone since the performance impact can be large for both sparse and GPU data, and this is growing in importance.

@sk1p sk1p modified the milestones: 0.11, 0.12 Apr 17, 2023
@sk1p sk1p removed this from the 0.12 milestone Jul 26, 2023
@sk1p
Member Author

sk1p commented Aug 30, 2023

Part of the equation, including initial benchmarks for transfers: LiberTEM/sparseconverter#20
