test: fix overly restrictive cufile skipping logic #1271
base: main
/ok to test
Turns out these tests are never run on
The logic I have here is I believe correct, but I'm not sure if these tests are just broken, or this is a latent bug that we've been living with for a while.
Tangential: Yeah, I ran into that twice with the ctk-next testing. SWQA was the first to run in environments where they are not skipped, but I didn't think of checking first if they actually ran in the cuda-python CI and started debugging painstakingly assuming they did. It'd be awesome to have some sort of automatic flagging, after all jobs finish, to report which tests
It's really an issue of incorrect skipping logic, not of whether a test runs in CI. These tests should have been running, because the runners have a compatible file system. They were not running, but that was unintentional.
Given all the failures happen on 13.0.2, and the 12.9.x tests all pass, that seems like a good place to start digging.
Ah, well those are not running the cuda.bindings tests, so that wouldn't explain the failure.
Let's give Sourab some time. I pinged him offline and he's looking into it.
Pinged you both in an internal thread.
What's blocking this PR? This PR is only about fixing the testing problem, and it shouldn't be blocked on any fixes/problems that were discovered here. This PR isn't changing any existing behavior, except that now the tests that were skipped when they should've been running (because the skipping criteria checking function was broken) are now xfailing. We can merge this PR without waiting for cufile fixes, and we can continue the discussion on that elsewhere.
15cbb8a to 40450d2
leofang left a comment
The blocking part is that the new xfail_handle_register decorator was not needed before, which is puzzling. It is unclear to me what triggered this need. Can we please engage in the internal thread and understand it better before merging?
…em root mount can be different from subdirectories of the root
40450d2 to 0cd211f
d27a08f to 0cd211f
0cd211f to 27b3825
27b3825 to 778239c

This PR enables testing cufile on a larger variety of filesystems where the root filesystem (/) might have a different mount point than the mount point of the checkout directory, by looking at all of /proc/mounts instead of only the first matching mount point.
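
To illustrate the idea of scanning all of /proc/mounts, here is a minimal sketch. It is not the code from this PR: the function names (parse_mounts, fs_type_for_path) and the sample mount table are hypothetical, and it assumes that the most specific (longest) mount point containing a path determines that path's filesystem type, with later entries shadowing earlier ones at the same mount point.

```python
import os

# Made-up sample in /proc/mounts format:
# device, mount point, filesystem type, options, dump, pass.
SAMPLE_MOUNTS = """\
overlay / overlay rw,relatime 0 0
proc /proc proc rw,nosuid 0 0
/dev/nvme0n1p1 /home/runner/work ext4 rw,relatime 0 0
"""


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Spaces inside mount points are escaped as the octal sequence \040.
        mount_point = fields[1].replace("\\040", " ")
        entries.append((mount_point, fields[2]))
    return entries


def fs_type_for_path(path, mounts):
    """Return the fs type of the longest mount point that contains `path`.

    Every entry is inspected, so a checkout directory on its own mount is
    not misattributed to the root filesystem just because "/" matches first.
    """
    path = os.path.abspath(path)
    best = None
    for mount_point, fs_type in mounts:
        # Compare on whole path components so "/home" does not match "/homework".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # ">=" lets a later entry shadow an earlier one at the same mount point.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(fs_type_for_path(os.getcwd(), parse_mounts(f.read())))
```

A test's skip condition could then compare the returned type for the checkout directory against whichever filesystems the cuFile documentation lists as supported; that list is deliberately left out here rather than guessed.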