Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CUDA an optional dependency #253

Merged
merged 6 commits into from
Oct 14, 2023

Conversation

mohamed82008
Copy link
Contributor

@mohamed82008 mohamed82008 commented Oct 13, 2023

This PR makes CUDA an optional dependency using Julia 1.9 extensions. It is a breaking a change for Julia < 1.9 since I remove CUDA as a dependency completely. So users of Julia <1.9 will have to keep using the old Metalhead if they need CUDA support. If you would rather go the Requires-path, let me know and I can do that. For simplicity, I did not do that initially. I also went ahead and did a version bump in the Project.toml.

Closes #250

@ToucheSir
Copy link
Member

Keeping CUDA as a hard dep for Julia <1.9 would eliminate most of the breakage.

@mohamed82008
Copy link
Contributor Author

Keeping CUDA as a hard dep for Julia <1.9 would eliminate most of the breakage.

You can already get that by installing Metalhead <= 0.8. I can make the compat bound for the Julia version in Metalhead 0.9 to be Julia 1.9. That way users will never install Metalhead 0.9 (this PR) with Julia < 1.9.

@mohamed82008
Copy link
Contributor Author

Even if I used Requires for Julia < 1.9 and extensions for Julia >= 1.9, this PR would still be a breaking change in Julia < 1.9 because users now have to add and load CUDA themselves. So I think it is best to set a lower bound as Julia 1.9 and only use the new extensions instead of using both extensions and Requires.

@ToucheSir
Copy link
Member

My point was that Pkg already supprts this hard to weak fallback path: https://pkgdocs.julialang.org/v1/creating-packages/#Transition-from-normal-dependency-to-extension. Segmenting Julia compat by major versions means yet another repo we have to commit to making backports for, and I think we have enough of those already on the FluxML side...

@mohamed82008
Copy link
Contributor Author

Done. I didn't know that we could have optional optional dependencies without Requires.

@mohamed82008
Copy link
Contributor Author

This PR is ready. The same docs failure also happens on master so it should be fixed in a separate PR imo.

@mohamed82008
Copy link
Contributor Author

The buildkite test failure is odd. It might be related to this PR since it is different from master.

@mohamed82008
Copy link
Contributor Author

mohamed82008 commented Oct 14, 2023

The buildkite failure is only for Julia 1.6 where it is hitting a timeout.

@mohamed82008
Copy link
Contributor Author

It is unclear to me why there is a regression in that test. The same test in master fails after 2 h 45 min. The limit is 3 hours so it could be resource load fluctuations on the CI machine that caused it to take > 15 min extra and timeout. Can someone restart the buildkite test?

Copy link
Member

@ToucheSir ToucheSir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs failure is because neither Documenter nor any of the other deps in https://github.com/FluxML/Metalhead.jl/blob/master/docs/Project.toml have compat bounds. Like you said, that's a problem for another PR. The Buildkite timeouts are also mostly flakiness. I think the main factor is how long it takes to compile everything. Since stable and nightly pass, this LGTM and we can dig further if the failures persist post-merge. Thanks again!

@ToucheSir ToucheSir merged commit eb3f9a4 into FluxML:master Oct 14, 2023
25 of 27 checks passed
@mohamed82008
Copy link
Contributor Author

Thanks for the merge. Can I get a new release now?

@ToucheSir
Copy link
Member

I think changing a hard dep to a weak one is technically a breaking change, but that doesn't preclude us putting out a new version. @FluxML/committers WDYT?

@mohamed82008
Copy link
Contributor Author

In this case, I argue that it is not breaking. No one can use the CuArray methods without loading CUDA first. If you load CUDA, then the extension will be loaded in Julia >= 1.9. Julia < 1.9 should be unaffected.

@darsnack
Copy link
Member

Isn't it breaking since code that previously worked now does not without adding using CUDA? I'm fine making a breaking release.

@mohamed82008
Copy link
Contributor Author

Which code that previously worked? Please write an example. Such code would have to use CUDA to define CuArray?

@mohamed82008
Copy link
Contributor Author

Either way I would appreciate a release asap.

@darsnack
Copy link
Member

Pseudocode because I'm on mobile:

using Metalhead
using Flux

model = Metalhead.resnet(...; dropblock_prob = 0.1) |> Flux.gpu
x = Flux.ones32(224, 224, 3, 10) |> Flux.gpu
model(x)

That would need using CUDA somewhere to behave the same after someone tries to upgrade Metalhead/Julia. With old versions, CUDA would get implicitly loaded by Flux.

I'm not against a release immediately FYI. Just suggesting that the change is technically breaking, but I'm fine releasing a patch as well (if that's what everyone decides is correct).

@mohamed82008
Copy link
Contributor Author

mohamed82008 commented Oct 15, 2023

With the latest released version of Metalhead, did Flux.gpu just rely on Metalhead loading CUDA? What if someone calls Flux.gpu without loading Metalhead or any other package that loads CUDA internally? Is it no-op? Does running computation on the CPU instead of the GPU count as breakage? If so, I think this is arguably Flux's fault for not documenting that you need to load CUDA explicitly before using Flux.gpu to get CuArrays since the workspace is not guaranteed to have CUDA loaded indirectly. If indeed the above code pattern is common now and people just rely on Metalhead loading CUDA internally, then a breaking release indeed makes more sense not to break people's codes.

@ToucheSir
Copy link
Member

What if someone calls Flux.gpu without loading Metalhead or any other package that loads CUDA internally? Is it no-op?

It's a no-op with a warning. Thankfully this was documented even before CUDA.jl was made into a weak dep, so the big change we made there was to add a more in-depth note about GPU support.

Does running computation on the CPU instead of the GPU count as breakage? ...
If indeed the above code pattern is common now and people just rely on Metalhead loading CUDA internally, then a breaking release indeed makes more sense not to break people's codes.

This is the grey area we're trying to figure out live. I would lean towards the warning Flux gives users as sufficient to compensate for the change, but I'm also aware that people often don't read our warnings because they keep posting bug reports about them...

Does changing the deps loaded by a package count as a SemVer breaking change in the age of extensions? I'm not sure, I find there is a dearth of resources on this and it would be nice to have advice from members of the community who've thought about it in depth.

@mohamed82008
Copy link
Contributor Author

mohamed82008 commented Oct 15, 2023

In general there are 2 definitions of breaking change floating around:

  • A change that breaks a documented behaviour.
  • A change that breaks an expected behaviour, documented or not.

When documentation is limited and packages interact in complicated ways, like they do in Julia, the latter is probably the safer bet. But it requires getting to know your users' usage patterns. On the other hand, I have seen the first definition used to justify breakages even in Julia the language itself, e.g. breaking the assumption that tasks are sticky in multi-threading.

I say to be safe, let's just go ahead and make a breaking release. A breaking release that doesn't break anyone's code is better than a patch release that breaks even 1 person's code.

@ToucheSir
Copy link
Member

Is there anything quick we want to get in before cutting a major release? I don't see or recall any changes, but pinging @theabhirath just to be sure.

@theabhirath
Copy link
Member

I just opened #254, which would be nice to sneak in before cutting a major release, but that should be about it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make CUDA an optional dependency in Julia 1.9
4 participants