Support for ptx modules with external functions #2
Comments
The incremental linking API is indeed not yet wrapped, but that would be a pretty trivial effort. I'll see whether I have some time this week to do so. I'm not sure what your use case is, though: would you be linking against libraries which provide device "object files" (e.g. PTX dumps; I haven't looked into how such libraries ship device code)? At least some of those libraries are header-only, which means that after compilation with

And FYI, @vchuravy is looking into exactly this, but from the CUDAnative side: enabling Julia wrappers for device-side CUDA libraries, using Cxx to parse and CUDAnative (plus our modified Julia compiler with PTX support over at JuliaGPU/julia) to compile to PTX modules.
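For reference, the driver-side API such a wrapper would cover is fairly small. A minimal C sketch (assuming an already-initialized CUDA context; the file names are hypothetical and error checking is elided — this is an illustration of the `cuLink*` calls, not a tested implementation):

```c
#include <cuda.h>
#include <stddef.h>

/* Incrementally link a PTX module that references external device
 * functions against a device library, then load the resulting cubin. */
CUmodule link_ptx_with_library(void)
{
    CUlinkState state;
    void *cubin;
    size_t cubin_size;
    CUmodule module;

    cuLinkCreate(0, NULL, NULL, &state);
    /* PTX containing calls to external (.extern) device functions */
    cuLinkAddFile(state, CU_JIT_INPUT_PTX, "kernels.ptx", 0, NULL, NULL);
    /* device library that provides those functions */
    cuLinkAddFile(state, CU_JIT_INPUT_LIBRARY, "libcublas_device.a",
                  0, NULL, NULL);
    /* resolve symbols and produce a cubin image in memory */
    cuLinkComplete(state, &cubin, &cubin_size);
    /* the linked image loads like any other module */
    cuModuleLoadData(&module, cubin);
    cuLinkDestroy(state);
    return module;
}
```

Kernels in the returned module can then be fetched with `cuModuleGetFunction` and launched as usual.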
@MichaelOhlrogge For a preview of how this might eventually look, see JuliaGPU/CUDAnative.jl#2
@maleadt Great, that would be awesome if you got that running, thank you! I'm also excited to hear about @vchuravy's work on this as a coming development in CUDAnative.

Regarding the use case: the immediate impetus was that I was trying to get a script to run CUSPARSE and CUBLAS calls on multiple GPUs in parallel. Neither of the Julia packages for those CUDA libraries supports multiple GPUs at this time, so I was going to create my own .ptx module that contained a function that launched the CUBLAS/CUSPARSE routines directly from CUDA, and then use the

I ultimately ended up working out a hack which functions relatively well to allow the Julia CUSPARSE and CUBLAS packages to work over multiple GPUs. I've been in touch with the people maintaining the CUSPARSE GitHub repo and will be working on incorporating the hack into the package. Nevertheless, the ability to more directly control and call those CUSPARSE and CUBLAS functions from .ptx modules, where I can more precisely define things like how the streams work, would be nice. (I also think there'd be a bit of a performance boost over my slightly hacky implementation in those existing Julia packages.)

Furthermore, I believe that in the future I will have a use case where I'd want each CUDA core to call a BLAS function, and right now, as I understand things, there'd be no way to do that without being able to compile a .ptx module that includes a call to a CUBLAS function.
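The "each CUDA core calls a BLAS function" case is roughly what NVIDIA's CUBLAS device API targeted at the time: on sm_35+ hardware, kernels compiled with separate compilation (`-rdc=true`) and linked against `libcublas_device.a` can invoke BLAS routines from device code. A hedged, untested sketch of what such a kernel might look like (the kernel name and launch parameters are hypothetical):

```cuda
#include <cublas_v2.h>

// Device-side GEMM: the kernel itself calls into CUBLAS. Compiling this
// to PTX leaves the cublas entry points as external symbols, which is
// exactly the linking problem discussed in this issue.
__global__ void device_gemm(int n, const double *A, const double *B,
                            double *C)
{
    cublasHandle_t handle;
    const double one = 1.0, zero = 0.0;

    cublasCreate(&handle);  // device-side handle creation
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &one, A, n, B, n, &zero, C, n);
    cublasDestroy(handle);
}
```

The PTX produced from this cannot be loaded with a plain `cuModuleLoad`; its external references first have to be resolved against the device library, either at build time via `nvcc`'s device linker or at run time via the driver's incremental linking API.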
I'm not certain whether this issue is most appropriately situated in the CUDArt or CUDAdrv repository, or both. I'm posting in both, but will remove it from one or the other if so advised.
I am interested in having the ability to compile .ptx modules that include external functions and then import those as functions to use/launch from within Julia. The particular example I was recently working with was for CUBLAS functions, but the principle is far wider. I inquired about the issue on Stack Overflow here. I had thought it would be relatively manageable, but from the answer I got, it actually sounds like it is fairly complex and involved. On the plus side, it does appear that there are precedents for establishing this kind of capability, e.g. the JCUDA framework for Java.
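As a rough illustration of why this is more involved than a plain module load (file names are hypothetical; this assumes the CUDA toolkit's `nvcc` and is a sketch, not a verified recipe):

```shell
# Compile device code that calls external (e.g. CUBLAS) functions to
# relocatable PTX. -rdc=true keeps the external calls unresolved;
# sm_35+ is required for the device-side CUBLAS library.
nvcc -ptx -rdc=true -arch=sm_35 mykernels.cu -o mykernels.ptx

# The resulting PTX contains .extern declarations, so it cannot simply
# be passed to cuModuleLoad / CuModule as self-contained PTX can.
# Resolving those symbols requires an extra device-link step against
# the providing library (here at build time; the cuLink* driver API
# can do the same at run time).
nvcc -arch=sm_35 -dlink mykernels.ptx -lcublas_device -o mykernels.cubin
```

So supporting this from Julia means wrapping that linking step, not just module loading.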
I could potentially assist with such an implementation, but I doubt I'd be well positioned to take it on all myself.
Thoughts?