-
-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make CUDA an optional dependency #253
Conversation
Keeping CUDA as a hard dep for Julia <1.9 would eliminate most of the breakage. |
You can already get that by installing Metalhead <= 0.8. I can make the compat bound for the Julia version in Metalhead 0.9 to be Julia 1.9. That way users will never install Metalhead 0.9 (this PR) with Julia < 1.9. |
Even if I used Requires for Julia < 1.9 and extensions for Julia >= 1.9, this PR would still be a breaking change in Julia < 1.9 because users now have to add and load CUDA themselves. So I think it is best to set a lower bound as Julia 1.9 and only use the new extensions instead of using both extensions and Requires. |
My point was that Pkg already supprts this hard to weak fallback path: https://pkgdocs.julialang.org/v1/creating-packages/#Transition-from-normal-dependency-to-extension. Segmenting Julia compat by major versions means yet another repo we have to commit to making backports for, and I think we have enough of those already on the FluxML side... |
Done. I didn't know that we could have optional optional dependencies without Requires. |
This PR is ready. The same docs failure also happens on master so it should be fixed in a separate PR imo. |
The buildkite test failure is odd. It might be related to this PR since it is different from master. |
The buildkite failure is only for Julia 1.6 where it is hitting a timeout. |
It is unclear to me why there is a regression in that test. The same test in master fails after 2 h 45 min. The limit is 3 hours so it could be resource load fluctuations on the CI machine that caused it to take > 15 min extra and timeout. Can someone restart the buildkite test? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs failure is because neither Documenter nor any of the other deps in https://github.com/FluxML/Metalhead.jl/blob/master/docs/Project.toml have compat bounds. Like you said, that's a problem for another PR. The Buildkite timeouts are also mostly flakiness. I think the main factor is how long it takes to compile everything. Since stable and nightly pass, this LGTM and we can dig further if the failures persist post-merge. Thanks again!
Thanks for the merge. Can I get a new release now? |
I think changing a hard dep to a weak one is technically a breaking change, but that doesn't preclude us putting out a new version. @FluxML/committers WDYT? |
In this case, I argue that it is not breaking. No one can use the |
Isn't it breaking since code that previously worked now does not without adding |
Which code that previously worked? Please write an example. Such code would have to use CUDA to define CuArray? |
Either way I would appreciate a release asap. |
Pseudocode because I'm on mobile:
That would need I'm not against a release immediately FYI. Just suggesting that the change is technically breaking, but I'm fine releasing a patch as well (if that's what everyone decides is correct). |
With the latest released version of Metalhead, did |
It's a no-op with a warning. Thankfully this was documented even before CUDA.jl was made into a weak dep, so the big change we made there was to add a more in-depth note about GPU support.
This is the grey area we're trying to figure out live. I would lean towards the warning Flux gives users as sufficient to compensate for the change, but I'm also aware that people often don't read our warnings because they keep posting bug reports about them... Does changing the deps loaded by a package count as a SemVer breaking change in the age of extensions? I'm not sure, I find there is a dearth of resources on this and it would be nice to have advice from members of the community who've thought about it in depth. |
In general there are 2 definitions of breaking change floating around:
When documentation is limited and packages interact in complicated ways, like they do in Julia, the latter is probably the safer bet. But it requires getting to know your users' usage patterns. On the other hand, I have seen the first definition used to justify breakages even in Julia the language itself, e.g. breaking the assumption that tasks are sticky in multi-threading. I say to be safe, let's just go ahead and make a breaking release. A breaking release that doesn't break anyone's code is better than a patch release that breaks even 1 person's code. |
Is there anything quick we want to get in before cutting a major release? I don't see or recall any changes, but pinging @theabhirath just to be sure. |
I just opened #254, which would be nice to sneak in before cutting a major release, but that should be about it! |
This PR makes CUDA an optional dependency using Julia 1.9 extensions.
It is a breaking a change for Julia < 1.9 since I remove CUDA as a dependency completely. So users of Julia <1.9 will have to keep using the old Metalhead if they need CUDA support. If you would rather go theRequires
-path, let me know and I can do that. For simplicity, I did not do that initially. I also went ahead and did a version bump in the Project.toml.Closes #250