
make NNlibCUDA an extension #492

Merged
merged 11 commits on Jun 14, 2023

Conversation

CarloLucibello
Member

Supersedes #481

Project.toml Outdated

[deps]
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
Atomix = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"
Member Author

I had to make cuDNN a strong dependency because it seems there is no way to have using CUDA alone also trigger loading cuDNN from the extension. This is not ideal, but the alternatives seem worse:

  • have the extension triggered by using CUDA, cuDNN (a sketch of this layout follows the list)
  • keep NNlibCUDA as a separate package
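
For reference, a minimal sketch of the first option: CUDA and cuDNN are both declared as weak dependencies, and the extension module loads only once both have been imported. The module and file names below are illustrative.

# ext/NNlibCUDAExt.jl -- loaded automatically after using CUDA, cuDNN
module NNlibCUDAExt

using NNlib
using CUDA, cuDNN

# CUDA-specific methods for NNlib functions would be defined here,
# e.g. methods of NNlib.conv! specialized on CuArray arguments.

end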

Member Author

Is there some other option?

Member

Making it a strong dep is a non-starter, because cuDNN directly depends on CUDA.jl and would therefore pull the whole CUDA stack in as a hard dependency of NNlib. I would say we go with option 1, but there are a couple of variations on it we could also consider:

  1. Make only cuDNN a weak dep and access CUDA.jl through cuDNN.CUDA (sketched below).
  2. Create separate extensions for CUDA.jl and cuDNN. Then someone can choose to load only the former if they don't need e.g. conv functionality. This doesn't make anything easier, so we should probably consider it after an extension is in place.
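
A rough sketch of variation 1, assuming cuDNN exposes its CUDA.jl dependency as cuDNN.CUDA so the extension can be keyed on cuDNN alone (the module name is illustrative):

# Extension triggered by cuDNN only; CUDA.jl is reached through cuDNN
# rather than being declared as a second weak dependency of NNlib.
module NNlibcuDNNExt

using NNlib, cuDNN
const CUDA = cuDNN.CUDA   # reuse cuDNN's CUDA.jl module

end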

Member Author

I also think we should go with using CUDA, cuDNN. This will also carry over to Flux's usage.
I'm annoyed by portability issues for scripts. Ideally, it should be possible to run the same script on any machine, without edits, using the appropriate device, so something like:

using Flux

gpu_backend = ... # Some hardware detection utility or Preferences.jl magic?

if gpu_backend == "cuda"
    using CUDA, cuDNN
elseif gpu_backend == "amdgpu"
    using AMDGPU
elseif gpu_backend == "metal"
    using Metal
end
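
As one concrete reading of the "Preferences.jl magic" placeholder above, a minimal sketch assuming Flux stored a gpu_backend preference (the key name is hypothetical, not an existing Flux preference):

using Flux, Preferences

# Read a per-project preference, falling back to "cuda" when none is set.
gpu_backend = load_preference(Flux, "gpu_backend", "cuda")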

Even better, the whole loading could be done conditionally by Flux itself, but I don't think that can be done without hard dependencies.

Anyway, since this is a crucial decision, let's also ask @mcabbott and @darsnack whether they are OK with having the CUDA extension here be triggered by using CUDA, cuDNN.

Member

I really don't like using Flux, CUDA, cuDNN; it seems a huge shame to have to load two packages, one of which is a super-obscure internal thing. I don't even know where it lives: https://www.google.com/search?q=cuDNN.jl leads only to a package abandoned 5 years ago.

It's a huge shame to give up on using Flux, CUDA as the interface. I understand that the default use of new package extensions does not allow us to then load cuDNN. I wonder if we should seriously consider either hacks for now (can Requires or something like it load cuDNN?) or finding out if upstream can be changed e.g. to unify cuDNN into CUDA.

Contributor

Some packages might want to expose GPU-accelerated functionality without users having to depend on either CUDA or CUDNN. With Preferences, the user environment would then need to include CUDA (i.e. in the Project.toml) in order to set the CUDNN preference.

Member

Does JuliaPackaging/Preferences.jl#24 work for package extension dependencies? The example seems to imply that packages can set preferences for their dependencies.

Contributor

I'm not sure. In any case, that example also shows how the active project needs to have A as a hard dependency, either in [deps] or in [extras], which is the point I was making above.

That said, although I'm happy to consider alternative ways of loading CUDNN-like functionality, I don't see it happening soon. Without a first-class language feature and using Preferences.jl, it would require users to import CUDA.jl to enable the CUDNN features, which IMO is mostly the same as having them do using cuDNN. And even with a first-class feature where, say, packages could express in their Project.toml which features they request of a package, it doesn't seem clear how that would interact with package extensions (what if a user has CUDA.jl but not CUDA.jl+CUDNN, etc).

Member

By sheer coincidence, I noticed JuliaPackaging/Preferences.jl#53 was reported today.

Contributor

FWIW, I'm currently too busy working on other things to consider reworking the CUDA.jl/cuDNN.jl situation again (especially now that it just stabilized a bit after introducing JLLs), but I'm not opposed to changes. So if anybody would want to explore a different mechanism for shipping CUDA libraries, feel free to open an issue or PR.

@CarloLucibello
Member Author

I propose we move on here with using NNlib, CUDA, cuDNN. NNlib is geared more toward package developers than end users, so having to specify both CUDA and cuDNN doesn't seem like a big deal.

What should happen with Flux can be discussed elsewhere; the alternatives I see are either using Flux, CUDA, cuDNN or using FluxCUDA.
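
For reference, a minimal usage sketch of what this proposal means for downstream code on Julia >= 1.9, where package extensions are available:

using NNlib
using CUDA, cuDNN   # loading both packages triggers NNlib's CUDA extension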

@CarloLucibello
Member Author

It should be ready for merge.

@CarloLucibello
Member Author

@ToucheSir merge?

plugins:
- JuliaCI/julia#v1:
version: "1.6"
Member

Without this, we have no way to tell if changes to NNlib break NNlibCUDA on Julia <1.9. So either we add something like what I mentioned in #495 (comment), or we create a separate reverse-CI step/pipeline for NNlibCUDA.

Member Author

I don't understand the issue here. Changes to NNlib won't affect Julia < 1.9 users, since NNlib requires Julia >= 1.9 from now on.

Member Author

For backports, if there ever is one, we can test locally.

Member

@ToucheSir Jun 14, 2023

I missed that Project.toml also bumped the Julia compat. To be clear then, merging this would mean we're stopping feature development for Julia <1.9 and now maintaining two backport branches (one for NNlib and one for NNlibCUDA)? I recall there being mixed success with backport branches in FluxML before; are there any lessons learned from that so this doesn't run into the same issues? cc @FluxML/committers

Edit: I misspoke about the two backport branches. Depending on how we want to handle extensions in Flux, this may require three backport branches, right?

Member Author

To be clear then, merging this would mean we're stopping feature development for Julia <1.9 and now maintaining two backport branches (one for NNlib and one for NNlibCUDA)?

We are stopping development for Julia < 1.9, and we will maintain backport branches only if we feel the need and care enough about backporting something, which I consider unlikely. In expectation, the benefits far outweigh the drawbacks.

Member

1 vote for moving to 1.9 soon!

Member

Ok, in that case I'll cast my vote for this too. The rest of the PR LGTM.
