This repository has been archived by the owner on May 27, 2021. It is now read-only.

Support for ptx modules with external functions #2

Closed
MichaelOhlrogge opened this issue Jul 18, 2016 · 3 comments

Comments

@MichaelOhlrogge

I'm not certain whether this issue belongs in the CUDArt repository, the CUDAdrv repository, or both. I'm posting it in both, but will remove it from one if so advised.

I am interested in the ability to compile PTX modules that include external functions, and then import those functions to use/launch from within Julia. The particular example I was recently working with involved CUBLAS functions, but the principle applies far more broadly. I asked about this on Stack Overflow here. I had thought it would be relatively manageable, but from the answer I got, it actually sounds fairly complex and involved. On the plus side, there do appear to be precedents for this kind of capability, e.g. the JCUDA framework for Java.

I could potentially assist with such an implementation, but I doubt I'd be well positioned to take it on all myself.

Thoughts?

@maleadt
Member

maleadt commented Jul 18, 2016

Indeed, the incremental linking API is not yet wrapped, but that would be a pretty trivial effort. I'll see whether I have some time this week to do so.

I'm not sure what your use case is, though: would you be linking against libraries which provide device "object files" (e.g. PTX dumps; I haven't looked into how such libraries ship device code)? At least some of those libraries are header-only, which means that after compilation with nvcc your compiled code shouldn't contain any undefined references.
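For readers unfamiliar with the API being discussed: a minimal C sketch of the CUDA driver's incremental (JIT) linking flow, which is what wrapping it in CUDAdrv would expose from Julia. The file names (`kernel.ptx`, `libdevice_code.a`) and the kernel name `my_kernel` are placeholders; a real wrapper would also pass JIT options and check every `CUresult`. This requires a CUDA-capable GPU to run.

```c
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

static void check(CUresult r, const char *what) {
    if (r != CUDA_SUCCESS) {
        fprintf(stderr, "%s failed: %d\n", what, (int)r);
        exit(1);
    }
}

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    check(cuInit(0), "cuInit");
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");
    check(cuCtxCreate(&ctx, 0, dev), "cuCtxCreate");

    /* Create a link state, add a PTX module containing unresolved
     * external references, then add the library that provides them. */
    CUlinkState link;
    check(cuLinkCreate(0, NULL, NULL, &link), "cuLinkCreate");
    check(cuLinkAddFile(link, CU_JIT_INPUT_PTX, "kernel.ptx",
                        0, NULL, NULL), "cuLinkAddFile(ptx)");
    check(cuLinkAddFile(link, CU_JIT_INPUT_LIBRARY, "libdevice_code.a",
                        0, NULL, NULL), "cuLinkAddFile(lib)");

    /* Finalize the link, producing a loadable cubin image. */
    void *cubin;
    size_t cubin_size;
    check(cuLinkComplete(link, &cubin, &cubin_size), "cuLinkComplete");

    /* Load the linked image and look up the now-resolved kernel. */
    CUmodule mod;
    CUfunction fun;
    check(cuModuleLoadData(&mod, cubin), "cuModuleLoadData");
    check(cuModuleGetFunction(&fun, mod, "my_kernel"), "cuModuleGetFunction");

    cuLinkDestroy(link);
    cuCtxDestroy(ctx);
    return 0;
}
```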

And FYI, @vchuravy is looking into exactly this, but from the CUDAnative side: enabling Julia wrappers for device-side CUDA libraries, using Cxx to parse the headers and CUDAnative (plus our modified Julia compiler with PTX support over at JuliaGPU/julia) to compile to PTX modules.

@vchuravy
Member

@MichaelOhlrogge For a preview of how this might eventually look, see JuliaGPU/CUDAnative.jl#2

@MichaelOhlrogge
Author

@maleadt Great, it would be awesome if you got that running, thank you! I'm also excited to hear that this is coming to CUDAnative through @vchuravy's work.

Regarding the use case: the immediate impetus was a script that runs CUSPARSE and CUBLAS calls on multiple GPUs in parallel. Neither of the Julia packages for those CUDA libraries supports multiple GPUs at this time, so I was going to create my own .ptx module containing a function that launched the CUBLAS/CUSPARSE routines directly from CUDA, and then use the launch() feature from CUDArt to launch that function separately on different streams over different GPUs.
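At the driver-API level, the multi-GPU pattern described above might look roughly like the following sketch: one context and one stream per device, with the same kernel launched on each. `load_kernel` is a hypothetical helper (per-context module loading and kernel arguments are elided for brevity), and error checking is omitted; this requires CUDA hardware to run.

```c
#include <cuda.h>

/* Hypothetical helper: loads the linked module into the current
 * context and returns a handle to the kernel of interest. */
extern CUfunction load_kernel(void);

void launch_on_all_gpus(void) {
    int ndev;
    cuInit(0);
    cuDeviceGetCount(&ndev);
    for (int i = 0; i < ndev; i++) {
        CUdevice dev;
        CUcontext ctx;
        CUstream stream;
        cuDeviceGet(&dev, i);
        cuCtxCreate(&ctx, 0, dev);  /* becomes the current context */
        cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING);

        /* Modules are per-context, so each device loads its own copy. */
        CUfunction fun = load_kernel();

        /* Launch asynchronously on this device's stream; a kernel with
         * no arguments is assumed here, hence the NULL argument array. */
        cuLaunchKernel(fun, 1, 1, 1, 1, 1, 1, 0, stream, NULL, NULL);
    }
    /* Per-context synchronization and cleanup omitted. */
}
```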

I ultimately worked out a hack that functions reasonably well, allowing the Julia CUSPARSE and CUBLAS packages to operate over multiple GPUs. I've been in touch with the maintainers of the CUSPARSE GitHub repo and will work on incorporating the hack into the package. Nevertheless, it would be nice to have more direct control over those CUSPARSE and CUBLAS functions via .ptx modules, where I can define things like stream behavior more precisely. (I also suspect there'd be a bit of a performance boost over my slightly hacky implementation in those existing Julia packages.)

Furthermore, I believe that in the future I will have a use case where I'd want each CUDA thread to call a BLAS function, and as I understand things, there is currently no way to do that without being able to compile a .ptx module that includes a call to a CUBLAS function.
