[CI] Move tests on aarch64 linux to GitHub Actions #543
Conversation
The queue is quite long at the moment, we aren't going to gain much for the time being 🥲
A lot of people might be trying to use it too. Let's give it a couple of days and trigger another run then.
It doesn't have the 64 suffix. So, the long wait time is because it is trying to run a job on a runner that doesn't exist.
🤦
https://github.com/EnzymeAD/Reactant.jl/actions/runs/12820824033/job/35751035540#step:9:778 wut
Force-pushed from 02e0207 to 628d420
The current failing tests are due to some bug in the CUDA integration. The Buildkite job was failing before this PR, but the "CI / Julia 1.11 - integration - ubuntu-24.04-arm - aarch64 - packaged libReactant - assertions=false" one could be a spurious error?
I don't think that's spurious, I got similar errors locally on my laptop the other day but didn't have the time to investigate.
cc @avik-pal
Same error happens also on
My understanding is that it's happening at line 29 of Reactant.jl/test/integration/cuda.jl (lines 20 to 31 at 32762fb), in the !CUDA.functional() branch, which makes sense since we don't have a GPU in this setup.
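For readers without the diff handy, here is a rough, self-contained sketch of the kind of branch being referenced; the function and helper names below are made up, and this is not the actual test code:

```julia
# Minimal sketch (not the actual test) of the CUDA.functional() branching pattern:
# on a machine without a GPU, CUDA.functional() is false, so only the
# compilation-oriented path runs.
using CUDA, Test

square(x) = x .* x

# stand-in for "just compile, don't execute": lowering succeeds without a device
compiles(f, args...) = (code_lowered(f, map(typeof, args)); true)

@testset "square kernel (sketch)" begin
    if CUDA.functional()
        x = CuArray(Float32[1, 2, 3])
        @test Array(square(x)) == Float32[1, 4, 9]   # run on an actual GPU
    else
        # the no-GPU branch exercised in this CI setup
        @test compiles(square, Float32[1, 2, 3])
    end
end
```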
Yeah, but we should still be able to compile the code successfully (and this works in a similar no-GPU case on x86).
No, not really. Are you doing any world-age shenanigans? You might be executing in a world before CUDA.jl got loaded?
Well, for one thing CUDA.functional() is false [as there is no GPU on the machine]. But no, CUDA should've already been loaded (and, as an example, we've already GPUCompiler.compile'd to generate LLVM from CUDA.jl).
You may be in a scenario where GPUCompiler is loaded and thus most of the compiler functionality is there, but you are then calling something that CUDA.jl is supposed to implement, and you are executing in a world before CUDA.jl was loaded? That's really the only way you could get the "not implemented" error.
There's a bug in 1.10/1.11 that can result in this happening (as observed in SciML code). JuliaLang/julia#57077 should fix it.
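As a tiny, self-contained illustration of the world-age effect being described (nothing Reactant- or CUDA-specific; all names below are made up): a method defined while a call is already running is not visible from the older world, so the old "not implemented" stub keeps getting hit unless dispatch goes through Base.invokelatest.

```julia
# Toy illustration of the world-age effect (all names are made up).
stub() = error("not implemented")      # stand-in for a method CUDA.jl would later replace

function simulate_late_load()
    # Pretend CUDA.jl gets loaded here and replaces the stub with a real method.
    @eval stub() = 42
    try
        return stub()                  # still dispatches in the old world -> "not implemented"
    catch
        return Base.invokelatest(stub) # resolves in the latest world -> returns 42
    end
end

simulate_late_load()  # returns 42
```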
Oh, cool, I can see if I can find an aarch64 machine to reproduce this on and check whether that PR fixes it. Thanks for the heads-up!
I was able to successfully run the CUDA integration tests with JuliaLang/julia#57077 12 times in a row on a Grace Grace system, while they failed on the first try with Julia v1.11.2, so it looks like that was indeed the culprit. I guess this PR is ready to go then; the failure is unrelated (also because it was already happening with Buildkite).
@giordano can you add a guard around the CUDA tests: if we're on aarch64 and the Julia version doesn't have the fix, don't run them?
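A minimal sketch of what such a guard could look like, assuming it sits in the test runner before the CUDA tests are included; the version cutoff and file path below are placeholders, not the actual values:

```julia
# Hypothetical guard: skip the CUDA integration tests on aarch64 when the
# running Julia does not carry the fix from JuliaLang/julia#57077.
const HAS_WORLD_AGE_FIX = VERSION >= v"1.12.0-DEV"   # placeholder cutoff, not the real one

if Sys.ARCH === :aarch64 && !HAS_WORLD_AGE_FIX
    @info "Skipping CUDA integration tests: known world-age bug on this Julia version"
else
    include("integration/cuda.jl")    # assumed path to the CUDA integration tests
end
```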
Co-authored-by: Ian McInerney <mcianster@gmail.com>
Force-pushed from 628d420 to 62e2062
Force-pushed from c57bf24 to 4271c29
lgtm
CI has some seemingly new red?
Ah hm, maybe we should lower the tolerance for the tan test or something?
Yeah, I agree.
Let's see how this fares. The idea is to reduce a bit the pressure on the juliaecosystem runners.