GPU and sparse: I/O and performance optimizations #946
While waiting for cupy to compile (sigh), I had a look at how we could use unified memory for our GPU support. cuSignal has support for this, but it is just a small wrapper around numba's mapped_array function. I still want to benchmark this, but based on that, we could implement GPU-awareness in the I/O backend.
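A minimal sketch of what such a GPU-aware allocation path could look like, using numba's mapped (page-locked, device-mapped) memory. The helper name and fallback behavior are illustrative assumptions, not LiberTEM's actual I/O backend API:

```python
# Sketch: allocate tile buffers in mapped (pinned) memory so CUDA kernels
# can read them without an explicit host-to-device copy. Hypothetical
# helper; falls back to a plain numpy array when CUDA is unavailable.
import numpy as np

try:
    from numba import cuda
    _HAVE_CUDA = cuda.is_available()
except ImportError:
    _HAVE_CUDA = False

def alloc_tile_buffer(shape, dtype=np.float32):
    """Allocate a tile buffer that the GPU can access directly.

    cuda.mapped_array() returns page-locked host memory mapped into the
    device address space (unified addressing), so the same buffer serves
    both the I/O layer and GPU kernels. Without CUDA we fall back to a
    regular ndarray so the I/O path stays uniform.
    """
    if _HAVE_CUDA:
        return cuda.mapped_array(shape, dtype=dtype)
    return np.empty(shape, dtype=dtype)

buf = alloc_tile_buffer((16, 256, 256))
buf[:] = 1.0  # the I/O layer can fill it like any ndarray
```

Whether mapped memory actually beats explicit async copies here would need the benchmarking mentioned above.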
If this doesn't pan out, we can think about how to integrate multiple GPU streams and manually manage overlapping of transfers and compute.
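The overlap idea can be sketched without GPU code at all: a double-buffered pipeline where one thread stages the next tile while the main thread reduces the current one. With real GPU streams, the queue hand-off would become per-stream events; all names here are illustrative:

```python
# Minimal double-buffering sketch: overlap "transfers" (tile staging)
# with compute. Pure-CPU stand-in for a streams-based GPU pipeline.
import queue
import threading
import numpy as np

def load_tiles(n_tiles, tile_shape, out_q):
    # Stands in for reading from storage / staging into pinned memory.
    for i in range(n_tiles):
        out_q.put(np.full(tile_shape, i, dtype=np.float32))
    out_q.put(None)  # sentinel: no more tiles

def pipelined_sum(n_tiles=8, tile_shape=(64, 64)):
    q = queue.Queue(maxsize=2)  # 2 slots: loading overlaps with compute
    t = threading.Thread(target=load_tiles, args=(n_tiles, tile_shape, q))
    t.start()
    total = 0.0
    while (tile := q.get()) is not None:
        total += float(tile.sum())  # stands in for the GPU kernel
    t.join()
    return total

result = pipelined_sum()  # sum of i * 64*64 for i in 0..7 -> 114688.0
```

The bounded queue is the key design choice: it keeps at most one tile in flight ahead of the consumer, which is exactly the back-pressure a stream-based GPU pipeline needs.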
Bumping milestone since the performance impact can be large for both sparse data and GPU data, and it is growing in importance.
Part of the equation, including initial benchmarks for transfers: LiberTEM/sparseconverter#20
GPU support is still experimental and needs to be looked at from a performance perspective. Transfers can generally overlap with computation, and unified memory could be used. The tiling scheme should also be negotiated appropriately - this needs to be benchmarked.
Edit by @uellue:
Sparse input data support #1207 also requires re-organizing and optimizing the tile shape negotiation and execution plan based on the UDFs that will run, the array backend that will be used, and the resources available on different kinds of workers. This issue can be extended to the generic case of optimizing settings for different backends.
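One way to picture backend-aware negotiation: pick deeper tiles for GPU backends (to amortize transfer latency) while respecting a per-worker memory budget. The function, parameter names, and the concrete numbers are hypothetical, not LiberTEM's actual negotiation API:

```python
# Sketch: choose a tile depth (frames per tile) based on the array
# backend and a per-worker memory budget. All numbers are placeholders
# for values that would come out of benchmarking.
def negotiate_tile_depth(sig_size_bytes, backend, mem_budget_bytes=256 * 2**20):
    # GPU transfers favor deep tiles; CPU caches favor shallower ones.
    preferred = 512 if backend in ("cupy", "cupy_scipy_sparse") else 32
    max_depth = max(1, mem_budget_bytes // sig_size_bytes)
    return min(preferred, max_depth)

# 256x256 float32 frames -> 262144 bytes per frame
depth_gpu = negotiate_tile_depth(256 * 256 * 4, "cupy")   # deep tiles
depth_cpu = negotiate_tile_depth(256 * 256 * 4, "numpy")  # shallow tiles
```

The real negotiation would additionally have to reconcile the preferences of all UDFs running in the same pass, which is the re-organization this issue asks for.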