
make NNlibCUDA an extension #492

Merged
merged 11 commits on Jun 14, 2023

Conversation

CarloLucibello
Member

Supersedes #481

Project.toml Outdated

[deps]
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
Atomix = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"
Member Author

I had to make cuDNN a strong dependency because it seems there is no way to have using CUDA alone also trigger loading cuDNN from the extension. This is not ideal, but the alternatives seem worse:

  • have the extension triggered by using CUDA, cuDNN (a sketch of this layout follows the list)
  • keep NNlibCUDA as a separate package
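
For reference, a minimal sketch of the first option: CUDA and cuDNN are both declared as weak dependencies, and the extension module loads only once both have been imported. The module and file names below are illustrative.

# ext/NNlibCUDAExt.jl -- loaded automatically after using CUDA, cuDNN
module NNlibCUDAExt

using NNlib
using CUDA, cuDNN

# CUDA-specific methods for NNlib functions would be defined here,
# e.g. methods of NNlib.conv! specialized on CuArray arguments.

end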

Member Author

Is there some other option?

Member

Making it a strong dep is a non-starter, because cuDNN directly depends on CUDA.jl and would therefore pull the whole CUDA stack in as a hard dependency of NNlib. I would say we go with option 1, but there are a couple of variations on it we could also consider:

  1. Make only cuDNN a weak dep and access CUDA.jl through cuDNN.CUDA (sketched below).
  2. Create separate extensions for CUDA.jl and cuDNN. Then someone can choose to load only the former if they don't need e.g. conv functionality. This doesn't make anything easier, so we should probably consider it after an extension is in place.
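
A rough sketch of variation 1, assuming cuDNN exposes its CUDA.jl dependency as cuDNN.CUDA so the extension can be keyed on cuDNN alone (the module name is illustrative):

# Extension triggered by cuDNN only; CUDA.jl is reached through cuDNN
# rather than being declared as a second weak dependency of NNlib.
module NNlibcuDNNExt

using NNlib, cuDNN
const CUDA = cuDNN.CUDA   # reuse cuDNN's CUDA.jl module

end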

Member Author

I also think we should go with using CUDA, cuDNN. This will also carry over to Flux's usage.
I'm annoyed by portability issues for scripts. Ideally, it should be possible to run the same script on any machine, without edits, using the appropriate device, so something like:

using Flux

gpu_backend = ... # Some hardware detection utility or Preferences.jl magic?

if gpu_backend == "cuda"
    using CUDA, cuDNN
elseif gpu_backend == "amdgpu"
    using AMDGPU
elseif gpu_backend == "metal"
    using Metal
end
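
As one concrete reading of the "Preferences.jl magic" placeholder above, a minimal sketch assuming Flux stored a gpu_backend preference (the key name is hypothetical, not an existing Flux preference):

using Flux, Preferences

# Read a per-project preference, falling back to "cuda" when none is set.
gpu_backend = load_preference(Flux, "gpu_backend", "cuda")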

Even better, the whole loading could be done conditionally by Flux itself, but I don't think that can be done without hard dependencies.

Anyway, since this is a crucial decision, let's also ask @mcabbott and @darsnack whether they are OK with having the CUDA extension here be triggered by using CUDA, cuDNN.

Member

I really don't like using Flux, CUDA, cuDNN; it seems a huge shame to have to load two packages, one of which is a super-obscure internal thing. I don't even know where it lives: https://www.google.com/search?q=cuDNN.jl leads only to a package abandoned 5 years ago.

It's a huge shame to give up on using Flux, CUDA as the interface. I understand that the default use of new package extensions does not allow us to then load cuDNN. I wonder if we should seriously consider either hacks for now (can Requires or something like it load cuDNN?) or finding out if upstream can be changed e.g. to unify cuDNN into CUDA.

Contributor

Some packages might want to expose GPU-accelerated functionality without users having to depend on either CUDA or CUDNN. With Preferences, the user environment would then need to include CUDA (i.e. in the Project.toml) in order to set the CUDNN preference.

Member

Does JuliaPackaging/Preferences.jl#24 work for package extension dependencies? The example seems to imply that packages can set preferences for their dependencies.

Contributor

I'm not sure. In any case, that example also shows how the active project needs to have A as a hard dependency, either in [deps] or in [extras], which is the point I was making above.

That said, although I'm happy to consider alternative ways of loading CUDNN-like functionality, I don't see it happening soon. Without a first-class language feature and using Preferences.jl, it would require users to import CUDA.jl to enable the CUDNN features, which IMO is mostly the same as having them do using cuDNN. And even with a first-class feature where, say, packages could express in their Project.toml which features they request of a package, it doesn't seem clear how that would interact with package extensions (what if a user has CUDA.jl but not CUDA.jl+CUDNN, etc).

Member

By sheer coincidence, I noticed JuliaPackaging/Preferences.jl#53 was reported today.

Contributor

FWIW, I'm currently too busy working on other things to consider reworking the CUDA.jl/cuDNN.jl situation again (especially now that it just stabilized a bit after introducing JLLs), but I'm not opposed to changes. So if anybody would want to explore a different mechanism for shipping CUDA libraries, feel free to open an issue or PR.

@CarloLucibello
Member Author

I propose we move on here with using NNlib, CUDA, cuDNN. NNlib is geared more toward package developers than end users, so having to specify both CUDA and cuDNN doesn't seem like a big deal.

What should happen with Flux can be discussed elsewhere; the alternatives I see are either using Flux, CUDA, cuDNN or using FluxCUDA.
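
For reference, a minimal usage sketch of what this proposal means for downstream code on Julia >= 1.9, where package extensions are available:

using NNlib
using CUDA, cuDNN   # loading both packages triggers NNlib's CUDA extension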

@CarloLucibello
Member Author

It should be ready for merge.

@CarloLucibello
Member Author

@ToucheSir merge?

plugins:
- JuliaCI/julia#v1:
version: "1.6"
Member

Without this, we have no way to tell if changes to NNlib break NNlibCUDA on Julia <1.9. So either we add something like what I mentioned in #495 (comment), or we create a separate reverse-CI step/pipeline for NNlibCUDA.

Member Author

I don't understand the issue here. Changes to NNlib won't affect Julia < 1.9 users, since NNlib requires Julia >= 1.9 from now on.

Member Author

For backports, if there ever is one, we can test locally.

Member

@ToucheSir Jun 14, 2023

I missed that Project.toml also bumped the Julia compat. To be clear then, merging this would mean we're stopping feature development for Julia <1.9 and now maintaining two backport branches (one for NNlib and one for NNlibCUDA)? I recall there being mixed success with backport branches in FluxML before; are there any lessons learned from that so this doesn't run into the same issues? cc @FluxML/committers

Edit: I misspoke about the two backport branches. Depending on how we want to handle extensions in Flux, this may require three backport branches, right?

Member Author

To be clear then, merging this would mean we're stopping feature development for Julia <1.9 and now maintaining two backport branches (one for NNlib and one for NNlibCUDA)?

We are stopping development for Julia < 1.9, and we will maintain backport branches only if we feel the need and care enough about backporting something, which I consider unlikely. In expectation, the benefits far outweigh the drawbacks.

Member

1 vote for moving to 1.9 soon!

Member

Ok, in that case I'll cast my vote for this too. The rest of the PR LGTM.
