
[RFC/WIP] NVPTX/Cuda backend support in Julia #11516

Closed
wants to merge 9 commits into master

vchuravy (Member) commented Jun 1, 2015

Goal

The long-term goal is to enable compiling Julia for different accelerator backends such as NVPTX and SPIR, and in the future SPIR-V and HSAIL. This PR is a rebase of the work done by @maleadt at maleadt/master onto the current Julia master.
It enables the CUDA backend and, in combination with CUDA.jl, can compile simple CUDA kernels written in Julia, as described on maleadt's blog.

This is still at an early stage, and I would like to invite comments and recommendations on how to proceed.

Components

Target

edf0693 enables the CUDA target, target selection with @target, and code compilation. This is the centerpiece of this PR. I would like to discuss the approach taken and how we could scale it to several different backends.
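
To make the intended workflow concrete, here is a minimal sketch of what a device kernel could look like with this PR. The @target macro is the one introduced here; the index helpers (threadidx_x and friends) and the kernel conventions are assumptions loosely based on maleadt's blog post, so the exact names and syntax may differ:

# Minimal kernel sketch (0.4-era syntax). threadidx_x/blockidx_x/blockdim_x are
# placeholder names for the intrinsic wrappers a package like CUDA.jl would provide.
@target ptx function vadd(a, b, c)
    i = (blockidx_x() - 1) * blockdim_x() + threadidx_x()
    c[i] = a[i] + b[i]
    return nothing          # kernels must not return a value
end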

llvmcall

10434e6 enhances the capabilities of llvmcall as described in #8740. This is necessary to be able to call the NVPTX intrinsics.
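
To illustrate why this matters, the following sketch reads the CUDA thread index through the NVPTX special-register intrinsic using the (declarations, body) tuple form of Base.llvmcall; treat the exact spelling as an assumption about the enhanced llvmcall rather than its final API:

# Sketch: wrap llvm.nvvm.read.ptx.sreg.tid.x as a Julia function (1-based index).
threadidx_x() = Base.llvmcall(
    ("declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()",
     """
     %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
     ret i32 %tid
     """),
    Int32, Tuple{}) + Int32(1)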

bitcasts

Julia currently ignores address spaces; to support backends that use distinct address spaces, our bitcasts need to respect them. See #9423 for more information on the matter.

TODO:

  • Tests
    • Should the tests compare IR output, or can we set up a test architecture with gpuocelot?
  • https://github.com/maleadt/julia/blob/master/TODO.md
    • Refuse GC allocations
    • Refuse calling unknown functions
    • Allow codegen of ordinary arrays (auto_unbox?)
    • Allow multiple active codegen contexts. We could possibly generalize this,
      i.e. have 1 single context for host codegen, multiple ones (stack?) for PTX
      code, and select one in emit_function based on the active/parent target.
    • use @target to refuse cross-target calls
    • Implicit return nothing, and check for non-void kernels
  • Remove the old codegen exceptions for Array values (in the past we lowered
      normal Arrays to raw pointers; now we use an immutable CuDeviceArray for
      that purpose). See the sketch after this list for what such a device-side
      array type might look like.
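
For illustration, a device-side array wrapper along the lines of CuDeviceArray could be as simple as the following; the type name, fields, and methods here are assumptions, not the actual definition used by CUDA.jl:

# Hypothetical sketch of a device-side array type (0.4-era syntax).
immutable DeviceVector{T} <: AbstractArray{T,1}
    ptr::Ptr{T}
    len::Int
end
Base.size(a::DeviceVector) = (a.len,)
Base.getindex{T}(a::DeviceVector{T}, i::Int) = unsafe_load(a.ptr, i)
Base.setindex!{T}(a::DeviceVector{T}, v::T, i::Int) = unsafe_store!(a.ptr, v, i)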

Credits

Most of the work has been done by @maleadt and Pieter Verstraete; see https://github.com/maleadt/julia/compare/master for the full history. My part was mostly squashing and rebasing the code onto upstream Julia, since @maleadt announced that he won't have time to work on this (see his post on julia-dev). I plan to continue working on this to get SPIR/SPIR-V support. Thanks to @SimonDanisch for helping with testing and for motivating me.

How to try this

If you don't have a CUDA-capable card, you will need to use gpuocelot as described in http://blog.maleadt.net/2015/01/15/julia-cuda. Substitute CUDA.jl with my fork at https://github.com/vchuravy/CUDA.jl/tree/vc/fixes.
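
For reference, a host-side launch might then look roughly like this; the @cuda launch macro, the CuIn/CuOut argument wrappers, and destroy follow the usage shown in the blog post, and the exact API of the CUDA.jl fork may differ:

# Host-side sketch: set up a context, launch the vadd kernel sketched above.
using CUDA

dev = CuDevice(0)
ctx = CuContext(dev)

len = 1024
a = rand(Float32, len)
b = rand(Float32, len)
c = Array(Float32, len)

@cuda (1, len) vadd(CuIn(a), CuIn(b), CuOut(c))   # launch with a (grid, block) configuration

destroy(ctx)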

vchuravy (Member) commented Jun 1, 2015

The Travis errors stem from me using LLVM 3.5.0; I will test with LLVM 3.3.0.

grollinger commented Jun 3, 2015

Over the past several months, I have been working on integrating the HSA runtime with Julia for my master's thesis. The goal is to provide a wrapper library similar to OpenCL.jl and CUDA.jl to simplify the usage of HSA from Julia, as well as to integrate compilation of Julia kernel functions to HSAIL/BRIG targets.

So far, I have mostly concentrated on implementing the HSA.jl runtime wrapper and on integrating the Julia build system with the HSA development version of LLVM.

I am now starting to implement the compilation of Julia functions to HSAIL.

It looks like the compilation to HSAIL is fairly straightforward if the input is SPIR 2.0.
SPIR could also be reused for targeting OpenCL. For these reasons, the plan for my prototype is to implement the most important parts of the SPIR target and, from there, the compilation to HSAIL.

Since you are already looking to build a scalable infrastructure to compile to multiple device targets, I wanted to let you know of my work now.

Maybe we could come up with something that I can already use in my implementation.

vchuravy (Member) commented Jun 4, 2015

@Rollingthunder That is great to hear. This PR should give you a hint of how to integrate an additional target into the Julia backend. You might also find the modified `llvmcall` useful.

Why do you need to use the SPIR target? I would think you could enable HSAIL as a target, similar to CUDA in this PR, and not need the step in between.

tkelman (Contributor) commented Jun 4, 2015

Is there a feature-detection macro from LLVM to determine whether this target is enabled? I imagine some system copies of llvm might not enable it?

vchuravy (Member) commented Jun 5, 2015

@tkelman There is llvm/Config/Targets.def, but I haven't found out how to turn that into a simple #ifdef. You can do things like the following:

// enumerate the targets LLVM was built with
enum llvm_targets {
#define LLVM_TARGET(TargetName) TargetName,
#include "llvm/Config/Targets.def"
#undef LLVM_TARGET
};
grollinger commented Jun 5, 2015

The reason I plan to go through SPIR on the way to HSAIL is that the SPIR intrinsics are well documented and there is a mapping from SPIR intrinsics to HSAIL intrinsics (builtins-hsail.ll).

HSAIL LLVM intrinsics are not really documented anywhere, as far as I can tell.

There may well be HSAIL intrinsics that don't have a mapping from SPIR, but I would rather add that subset using llvmcall later than use only HSAIL intrinsics from the beginning.

Also, this way you could conceivably write one kernel function in julia (@target spir maybe) and use that both on OpenCL and on HSA.

tkelman (Contributor) commented Jun 5, 2015

This'll need to do something to allow Julia to still build correctly with these features disabled if it's linking against a copy of LLVM that doesn't support this target. I don't see how this could realistically be merged to master otherwise.

vchuravy (Member) commented Jun 5, 2015

I agree, I will see what I can come up with.


initialize all available llvm targets.

Clients that require a particular target like `NVPTX` have to ensure that the target is available in the linked llvm and set up the appropriate context. See `jl_init_codegem_ptx` as an example.

// TODO: a lot of duplication with to_function and init_julia_llvm_env
// TODO: invoke by call()ing a @target("ptx") function
extern "C" DLLEXPORT
const jl_value_t * jl_to_ptx() {

vchuravy (Member) commented Jun 10, 2015

So after taking a look at https://github.com/Keno/DIDebug.jl, it should be possible to write jl_to_ptx in terms of CXX.jl and keep it out of Julia base.

This would slim this PR down to @target (and maybe @kernel), some special cases for NVPTX, and more or less independent PRs for preserving_bitcast and llvmcall.

vchuravy referenced this pull request Jun 20, 2015

malmaud (Contributor) commented Sep 2, 2015

@vchuravy I'm excited to see this line of work pushed forward - are you still interested in working on this PR?

vchuravy (Member) commented Sep 3, 2015

@malmaud Yes, I am still interested. I am just waiting for v0.4 to be released and the codegen changes to be merged.

ViralBShah (Member) commented Sep 3, 2015

A lot of work is pending on the new codegen changes of @vtjnash, and that is probably one of the first things that will happen after 0.4 (or perhaps even on branching 0.4).

ViralBShah (Member) commented Oct 28, 2015

A few folks have mentioned this PR to me recently. What keeps us from getting it merged now?

vchuravy (Member) commented Oct 28, 2015

@ViralBShah First of all, #11604 would need to be rebased and merged (if the implementation is uncontroversial), and after that #9423 would need to be rebased and extended (a lot has changed since then).

Once those changes have landed, this can move forward. The general question that still needs answering is whether this should live inside Julia base or whether backends should be implemented in associated packages. In the future we would like to add more and more backends, and target-specific logic should, imho, not be part of Julia base. The only exception I would make is for a very general backend like SPIR-V, which could potentially transcend target-specific backends.

I hope that explains the current situation. I could rebase #11604 over the weekend.

PS: This also depends on switching to LLVM 3.5+.

ViralBShah (Member) commented Oct 28, 2015

Thank you for outlining what needs to be done. I do like the idea of adding additional backends as packages. I guess we are not too far from llvm 3.7 now, but @Keno and others know best.

vchuravy (Member) commented Oct 28, 2015

We would need some infrastructure in Base, though. I remember @jakebolewski talking about the need for a Julia LLVM interface in JuliaGPU/OpenCL.jl#29, but maybe that is more feasible now with CXX.jl.

maleadt (Member) commented Oct 28, 2015

I've very recently also been spending some time on the back-end again (see maleadt/julia and maleadt/CUDA.jl), but some code generation changes are holding me back. I'm currently trying to get the commit before the mega codegen restructure working, on LLVM 3.7.

vchuravy (Member) commented Oct 28, 2015

Tim, nice to hear from you. What do you think would be necessary for having such backends outside of Julia proper?


maleadt (Member) commented Oct 28, 2015

Despite not being happy with the current if (ctx.target == PTX) proliferation, I haven't given it too much thought. We would obviously need a decent Julia interface to LLVM (would the C interface suffice here?). Maybe emit_function could conditionally (say, based on @target) try to load registered emit_function_for_$TARGET functions? I guess this would depend on the codegen utility functions being more modular than they are right now, and on emit_function being somewhat less monolithic, both of which are quite an undertaking in themselves.
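
A rough Julia-side sketch of that registration idea, purely illustrative (the real emit_function lives in the C++ codegen and no such registry exists today; the names below are hypothetical):

# Illustrative only: a per-target code-generator registry.
const target_emitters = Dict{Symbol,Function}()

register_emitter!(target::Symbol, f::Function) = (target_emitters[target] = f)

function emit_for_target(linfo, target::Symbol=:host)
    haskey(target_emitters, target) ||
        error("no code generator registered for target $target")
    return target_emitters[target](linfo)
end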

maleadt (Member) commented Oct 29, 2015

Specific to the CUDA back-end, it would also be useful to have Julia support for pointer address spaces. How feasible would it be to parametrize Ptr in terms of the address space, with the default parameter value being 0 to preserve current behaviour? I had a go at it ages ago, but messing with Ptr seemed to break... a lot.
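
As a thought experiment (separate from actually changing Base.Ptr), an address-space-tagged pointer wrapper could look like this in 0.4-era syntax; the numeric address spaces follow the NVPTX convention, and everything else is hypothetical:

# Illustrative only: a pointer wrapper carrying its address space as a type parameter.
immutable DevicePtr{T,AS}
    ptr::Ptr{T}
end

generic_ptr{T}(p::Ptr{T}) = DevicePtr{T,0}(p)   # AS 0: generic, matches current behaviour
global_ptr{T}(p::Ptr{T})  = DevicePtr{T,1}(p)   # AS 1: CUDA global memory (NVPTX numbering)
shared_ptr{T}(p::Ptr{T})  = DevicePtr{T,3}(p)   # AS 3: CUDA shared memory

addrspace{T,AS}(::DevicePtr{T,AS}) = AS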

vchuravy (Member) commented Nov 2, 2015

@Rollingthunder You have been working on an HSA/SPIR backend. Would you mind sharing your experiences and what you would need from base?

ViralBShah added the codegen label Nov 2, 2015

vchuravy (Member) commented Nov 17, 2015

SPIR-V 1.0 was just released [1], and Khronos provides LLVM extensions that translate OpenCL 2.0 and OpenCL 2.1 to SPIR-V [2]. When I find time, I will focus on address-space-aware bitcasts and a Julia-to-SPIR-V translator.

[1] https://www.khronos.org/spir
[2] https://github.com/KhronosGroup/SPIRV-LLVM

grollinger commented Dec 5, 2015

@vchuravy
Some of the major issues with HSAIL were:

  • The lack of support for address spaces in Julia.
    For example, when emitting an HSAIL kernel, we would have liked to force its signature to use pointer arguments in the global address space and then have codegen use those types when generating the function body.
    That didn't work, because there are places in the codegen where (even with the preserving_bitcast PR) values in non-generic address spaces do bad things.
  • A way to prevent the boxing of values and the associated GC frame emission.
    Since we cannot translate that to device code, any kernel that triggers boxing cannot be compiled.
    Just suppressing the GC frame code might already help, though obviously the caller of the GPU kernel would then need to take care of keeping a reference to any memory that is in use.

Other things like how to handle the device code generators need solving too, but it seems that is already being talked about: JuliaGPU/meta#2

maleadt (Member) commented Dec 5, 2015

From what I've seen of gpucc, they've developed an address-space inference pass to propagate AS information from, e.g., the function's entry point, so that might be solvable without many changes to Julia itself. Still, decent language support might be interesting.

vchuravy (Member) commented Dec 7, 2015

Maybe instead of changing the pointer interface we could adapt the Ref interface?

maleadt closed this Jul 1, 2016

maleadt deleted the JuliaGPU:vc/cuda branch Jul 1, 2016

ViralBShah added the gpu label Sep 7, 2017
