make NNlibCUDA an extension #492
Project.toml
@@ -1,11 +1,12 @@
name = "NNlib"
uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
-version = "0.8.20"
+version = "0.9.0"

[deps]
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
Atomix = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
+cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"
I had to make

Is there some other option?

Making it a strong dep is a non-starter because

I also think we should go with

using Flux
gpu_backend = ... # Some hardware detection utility or Preferences.jl magic?
if gpu_backend == "cuda"
    using CUDA, cuDNN
elseif gpu_backend == "amdgpu"
    using AMDGPU
elseif gpu_backend == "metal"
    using Metal
end

Even better, the whole loading should be done conditionally by Flux itself, but I don't think this can be done without hard dependencies. Anyway, since this is a crucial decision, let's also ask @mcabbott and @darsnack if they are OK with having the CUDA extension here be triggered by

I really don't like
It's a huge shame to give up on

Some packages might want to expose GPU-accelerated functionality without users having to depend on either CUDA or CUDNN. With Preferences, the user environment would then need to include CUDA (i.e. in the Project.toml) in order to set the CUDNN preference.

Does JuliaPackaging/Preferences.jl#24 work for package extension dependencies? The example seems to imply that packages can set preferences for their dependencies.

I'm not sure. In any case, that example also shows how the active project needs to have
That said, although I'm happy to consider alternative ways of loading CUDNN-like functionality, I don't see it happening soon. Without a first-class language feature and using Preferences.jl, it would require users to import CUDA.jl to enable the CUDNN features, which IMO is mostly the same as having them do

By sheer coincidence, I noticed JuliaPackaging/Preferences.jl#53 was reported today.

FWIW, I'm currently too busy working on other things to consider reworking the CUDA.jl/cuDNN.jl situation again (especially now that it just stabilized a bit after introducing JLLs), but I'm not opposed to changes. So if anybody would want to explore a different mechanism for shipping CUDA libraries, feel free to open an issue or PR.
GPUArraysCore = "46192b85-c4d5-4398-a991-12ede77f4527"
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

@@ -16,19 +17,25 @@ Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[weakdeps]
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"

[extensions]
NNlibAMDGPUExt = "AMDGPU"
NNlibCUDAExt = "CUDA"

[compat]
AMDGPU = "0.4.8"
Adapt = "2, 3.2"
Atomix = "0.1"
ChainRulesCore = "1.13"
CUDA = "4"
cuDNN = "1"
GPUArraysCore = "0.1"
KernelAbstractions = "0.9.2"
Requires = "0.5, 1.0"
-julia = "1.6"
+julia = "1.9"

[extras]
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
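As a side note on the review thread attached to the cuDNN line above: the Preferences.jl-based selection being discussed could look roughly like the sketch below inside a downstream package. This is a hypothetical illustration only; the package name MyDownstreamPkg, the "gpu_backend" key, and the "none" default are assumptions rather than anything this PR defines, and Preferences.jl requires the enclosing module to belong to a package with a UUID.

module MyDownstreamPkg  # hypothetical downstream package, not part of NNlib or Flux

using Preferences

# Read the preference once at load time; "none" is an assumed default.
const gpu_backend = @load_preference("gpu_backend", "none")

# Top-level conditional `using`: exactly one GPU stack is loaded, which in turn
# triggers the matching NNlib extension (where one exists) on Julia >= 1.9.
if gpu_backend == "cuda"
    using CUDA, cuDNN
elseif gpu_backend == "amdgpu"
    using AMDGPU
elseif gpu_backend == "metal"
    using Metal
end

end # module

A user would then opt in with set_preferences!(MyDownstreamPkg, "gpu_backend" => "cuda") and restart Julia; whether that is nicer than an explicit "using CUDA, cuDNN" is exactly the trade-off debated in the thread.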
8 files deleted (contents not shown in this view).
@@ -1,4 +1,4 @@
-module NNlibCUDA
+module NNlibCUDAExt

using NNlib
using CUDA, cuDNN
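One detail behind the rename above: a package extension's top-level module name has to match the name declared under [extensions] in Project.toml, hence NNlibCUDA becoming NNlibCUDAExt. A quick sanity-check sketch of how it gets picked up, assuming Julia 1.9 and a CUDA-capable machine (NNlib.softmax is just a convenient stand-in for any NNlib call on a GPU array):

using NNlib
using CUDA, cuDNN   # loading CUDA alongside NNlib triggers NNlibCUDAExt

# Base.get_extension (Julia >= 1.9) returns the extension module once it has
# been loaded, or nothing otherwise.
@assert Base.get_extension(NNlib, :NNlibCUDAExt) !== nothing

# NNlib calls on CuArrays now hit whatever specialized methods the extension provides.
x = CUDA.rand(Float32, 8, 8)
y = NNlib.softmax(x)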
@@ -1,4 +1,4 @@
-# CTC loss moved from Flux.jl to NNlib + NNlibCUDA
+# CTC loss moved from Flux.jl to NNlib

## CPU implementation
@@ -0,0 +1,27 @@
@testset "Batchnorm" begin | ||
v = CUDA.rand(Float32, 2) | ||
m = CUDA.rand(Float32, 2, 5) | ||
|
||
@testset for training in (true, false), track_stats in (true, false) | ||
kws = (training=training, track_stats=track_stats) | ||
|
||
# Normal | ||
batchnorm(v, v, m, v, v, 1.0; kws...) | ||
∇batchnorm(v, v, m, m, v, v, 1.0; kws...) | ||
|
||
# No affine | ||
batchnorm(nothing, nothing, m, v, v, 1.0; kws...) | ||
∇batchnorm(nothing, nothing, m, m, v, v, 1.0; kws...) | ||
|
||
# No tracking | ||
batchnorm(v, v, m, nothing, nothing, 1.0; kws...) | ||
∇batchnorm(v, v, m, m, nothing, nothing, 1.0; kws...) | ||
|
||
# Both or neither tracked or affine params must be set | ||
for (α, β) in ((v, nothing), (nothing, v)) | ||
@test_throws MethodError batchnorm(α, β, m, v, v, 1.0; kws...) | ||
@test_throws MethodError ∇batchnorm(α, β, m, m, v, v, 1.0; kws...) | ||
@test_throws ArgumentError batchnorm(v, v, m, α, β, 1.0; kws...) | ||
end | ||
end | ||
end |
Without this, we have no way to tell if changes to NNlib break NNlibCUDA on Julia <1.9. So either we add something like I mentioned in #495 (comment) or we create a separate Reverse CI step/pipeline for NNlibCUDA.
I don't understand the issue here. Changes to NNlib won't affect Julia < 1.9 users, since NNlib requires Julia >= 1.9 from now on.
For backports, if there is ever going to be one, we can test locally.
I missed that Project.toml also bumped the Julia compat. To be clear then, merging this would mean we're stopping feature development for Julia <1.9 and now maintaining two backport branches (one for NNlib and one for NNlibCUDA)? I recall there being mixed success with backport branches in FluxML before; are there any lessons learned from that, so that this doesn't run into the same issues? cc @FluxML/committers
Edit: I misspoke about the two backport branches. Depending on how we want to handle extensions in Flux, this may require three backport branches, right?
We are stopping development for Julia < 1.9, and we will maintain backport branches if we feel the need and care enough about backporting something, which I consider an unlikely event. In expectation, the benefits far outweigh the drawbacks.
1 vote for moving to 1.9 soon!
Ok, in that case I'll cast my vote for this too. The rest of the PR LGTM.