-
Notifications
You must be signed in to change notification settings - Fork 214
chore(dev-deps): add pixi setup for cuda_pathfinder
#1090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
cuda_pathfinder/pyproject.toml
Outdated
"nvidia-cufftmp-cu12; sys_platform != 'win32'", | ||
"nvidia-libmathdx-cu12", | ||
] | ||
cu12-ext = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is splitting out -ext
important technically?
In the long run, it's possible that some of the libs will add support for win32
. What would we do then? Move those to the non-ext
group?
I'd have a preference for keeping nvidia
in the names, e.g. nvidia_cu12
or nvidia-cu12
. In the back of my mind: 1. It's more explicit/readable. 2. Maybe/maybe-not we'll have non-nvidia wheels in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is splitting out -ext important technically?
Yes, because pixi cannot solve the environment's dependencies without it. Pixi doesn't know anything about markers, for example.
In the long run, it's possible that some of the libs will add support for win32. What would we do then? Move those to the non-ext group?
Yep.
I'd have a preference for keeping nvidia in the names, e.g. nvidia_cu12 or nvidia-cu12. In the back of my mind: 1. It's more explicit/readable. 2. Maybe/maybe-not we'll have non-nvidia wheels in the future.
I find the nvidia
unnecessarily verbose. I don't see what help this additional information provides that makes development easier or anything more clear. On the other hand, more typing is concretely more annoying :)
Can we wait for those wheels to show up before solving a problem we don't yet have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming is very important to me. I realize I have a reputation to err on the more explicit side, but I'm holding on to the believe it's actually helpful for people not immersed in the context. Imagine someone not on our team looking at cu12
here, there isn't much they can mentally attach to. Such super stingy naming forces people to look around more than they have to otherwise. Similarly, it's much more likely to send AIs down a rabbit hole. It's also much more difficult to pin-point references to super short names, especially cu12
or similar will have many unrelated matches.
How about (not sure about underscore vs minus, any works for me): nvidia_wheels_cu12
Also, ext
doesn't tell anything about the reason for the splitting. At the moment I understand only: we need this for pixi. What could be terse hints that make sense here? pixi
is probably too specific? Does win32
make sense? Other things to come to mind: restricted, partially, platform? But I don't know enough about the technicalities enforcing the splitting to know what make sense; just "ext" leaves people with "for unspecified reasons split out", which I'd strongly prefer to avoid.
I'm hoping we can find a healthy compromise, that gives readers/AIs a reasonable chance to get on the right track at a glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a preference for underscores versus dashes.
Such super stingy naming forces people to look around more than they have to otherwise.
Perhaps, but let's try and be more concrete. Here's my thinking.
My goal with this PR, and the pixi work in general, is to get devs the ability to run the test suite as soon as humanly possible.
To that end, naming a collection of dependencies nvidia_wheels_cu12
provides no information that furthers that goal that I can see. There are multiple pieces of information in that name that are irrelevant for running the test suite and getting started contributing to the codebase IMO. Since these environments are designed to be run with pixi run -e <environment>
giving them long names makes things more annoying to use. I really really want this stuff to be easy to use regularly. Long names are directly opposed to that goal.
Here's some thoughts on the proposed naming components and some questions.
nvidia
refers to something, I'm not sure what exactly, perhaps the source of the dependencies? How does that help clarify anything except what company built the wheel? Is that information useful for the purposes of development?wheels
refers to a packaging implementation detail that seems like it actively harms initial understanding by providing an unnecessary level of detail. At the level of "run the test suite", I don't really care about whether a thing is a wheel or not. In fact, this is even worse in that you can easily mislead developers if some part of that list is built from source for some reason. Should we then name thingsnvidia_maybe_wheels_if_not_built_from_source
? I really hope not.
Regarding ext
, we can call that extended
, which is what I am trying to convey there. Definitely open to different naming, but I am trying to keep things short for a reason.
I'm being intentionally vague with that name, to allow for eventual elimination of that set of environments. A comment would suffice here IMO.
In general I find the practice of "name everything with very specific and forget about the length of names" actively harms development by making the amount of stuff I have to sift through that much greater. I still have to ask someone what anything means if I'm new to the codebase.
I also don't care about optimizing for AI digestibility. As of now, it's primarily humans that still have to read all this code and there's plenty to do there without having to consider AI. Again, this feels like solving a problem we (as developers of this library) do not have.
How should we proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal with this PR, and the pixi work in general, is to get devs the ability to run the test suite as soon as humanly possible.
That's a good goal.
How should we proceed?
Could you please fix up the .github/workflows/test-wheel-*.yml
files? ― I'm actually not sure what the new commands will look like. Seeing that would help me understand better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding ext, ...
A comment would suffice here IMO.
Yes, I agree. ext
is fine. I see from the changes in the .yml
files and the toolshed/
directory that it doesn't appear there.
I think we still need to update these:
|
I forgot to add: Please don't worry about testing the |
set -euo pipefail | ||
pushd cuda_pathfinder | ||
pip install --only-binary=:all: -v . --group "test_nvidia_wheels_cu${TEST_CUDA_MAJOR}" --group test_nvidia_wheels_host | ||
pip install --only-binary=:all: -v . --group "cu${TEST_CUDA_MAJOR}" --group host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I see that -ext
doesn't appear here, could this work?
pip install --only-binary=:all: -v . --group "test-cu${TEST_CUDA_MAJOR}"
And then in pyproject.toml
we pull in cu12-host
just like we pull in cu12-ext
?
That would be strictly better IMO than the status quo with two --group
arguments.
But TBH I'm scratching my head a bit ... how does the --group
feature work?
What I mean:
cu12 = {...}
cu12-ext = {{ include-group = "cu12" }, ...}
What surprises me: you have cu12
here, not cu12-ext
.
But I'm getting the sense the include-group
feature allows us to have just one --group
here while still having a nice structure in the pyproject.toml
file for clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I will use the *-ext
groups wherever we are pip install
-ing things.
cuda_pathfinder/pyproject.toml
Outdated
"nvidia-cufftmp-cu12; sys_platform != 'win32'", | ||
"nvidia-libmathdx-cu12", | ||
] | ||
cu12-ext = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding ext, ...
A comment would suffice here IMO.
Yes, I agree. ext
is fine. I see from the changes in the .yml
files and the toolshed/
directory that it doesn't appear there.
…u12` or `--group test-cu13` to install all test dependencies for the given CTK major version.
I just added commit 0e132de to see what you think. It works for me. Does it also work for pixi?
It's really easy to remember now, e.g. |
It doesn't work for pixi, because the The way I originally had it was the best compromise I could come up with. Ideally pixi would support markers, so it could figure out which platforms to solve for given the platform I'm running on, but that's not the case today. |
Hm, I need to learn how pixi works, and how I can test locally myself; also so I don't break pixi in the future when making changes in pathfinder. At the moment I don't understand why It'd be great if we could meet and you show me how it works. |
…hat the pixi commands still work exactly as before.
…-win-64, and similar for cu13
…, although only superficially
Local testing at commit 9ac1364 (Make
|
/ok to test |
|
Local testing at commit 9ac1364 (Make Testing: NOTE: The linux-aarch64-specific aspect is that
|
I'm starting to test under Windows (still at commit 9ac1364). I installed
Success:
First trial:
This is the dreaded |
That usually means there's some kind of specification that is too broad that is including |
Taking a look. |
Oh, I see. You're on windows, nevermind what I said then! |
Without changing anything, this works:
See below. — Hypothesis(!):
|
…o commit 9ac1364 (with this pixi run -e cu12-win-64 --verbose -- pytest -v tests succeeds).
Commit fa384e3 (Avoid use of I.e.
|
/ok to test |
…13-linux, to resolve a regression observed under linux-64
Unfortunately, commit fa384e3 causes a regression under Fortunately it's an easy fix: commit 2715310 (avoid use of
|
… libs and headers are installed from wheels into site-packages.
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud Ready to merge from my viewpoint!
"nvidia-cublas-cu12", | ||
"nvidia-cuda-cccl-cu12", | ||
"nvidia-cuda-nvcc-cu12", | ||
"nvidia-cuda-nvrtc-cu12", | ||
"nvidia-cuda-runtime-cu12", | ||
"nvidia-cufft-cu12", | ||
"nvidia-curand-cu12", | ||
"nvidia-cusolver-cu12", | ||
"nvidia-cusparse-cu12", | ||
"nvidia-npp-cu12", | ||
"nvidia-nvfatbin-cu12", | ||
"nvidia-nvjitlink-cu12", | ||
"nvidia-nvjpeg-cu12", | ||
"nvidia-cudss-cu12", | ||
"nvidia-cufftmp-cu12; sys_platform != 'win32'", | ||
"nvidia-libmathdx-cu12", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the potential to now end up with different minor versions of these libraries whereas using cuda-toolkit[...]=12.*
would guaranteed that the minor version of cuda-toolkit
that was resolved would control the minor versions of all of the libraries.
cuda_pathfinder/pyproject.toml
Outdated
[tool.pixi.feature.test.tasks] | ||
test = { cmd = ["$CONDA_PREFIX/bin/pytest", "$PIXI_PROJECT_ROOT/tests"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this path correct on windows as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great you asked. I tested locally before, but now double-checking, I was using
pixi run -e cu13-win-64 --verbose -- pytest -v tests
which bypasses [tool.pixi.feature.test.tasks]
.
This path does indeed not work:
PS C:\Users\rgrossekunst\forked\cuda-python\cuda_pathfinder> pixi run -e cu13-win-64 test
✨ Pixi task (test in cu13-win-64): $CONDA_PREFIX/bin/pytest $PIXI_PROJECT_ROOT/tests
Error launching 'C:\Users\rgrossekunst\forked\cuda-python\cuda_pathfinder\.pixi\envs\cu13-win-64/bin/pytest': The system cannot find the path specified. (os error 3)
I changed it back now: commit 052d20c
…ONDA_PREFIX/bin` does not work under Windows).
Add a pixi setup to
cuda_pathfinder
.