Multiple CI jobs broken on master #749

Closed
4 of 5 tasks
sloede opened this issue Jul 19, 2023 · 11 comments

@sloede
Member

sloede commented Jul 19, 2023

It seems like multiple CI jobs are broken, e.g.,

Any idea what's going on here?

@sloede
Member Author

sloede commented Jul 19, 2023

Ah, sorry, I should've checked more carefully - it seems like these tests have been broken for a while now, so nothing to get alarmed about. Still, at least some of those errors seem legit (e.g., the first one complaining about a missing symbol).

@luraess
Contributor

luraess commented Jul 19, 2023

Yeah, it seems most of the AMDGPU CI jobs fail, but the test status still shows success.

@giordano
Member

> test-openmpi-jll (ubuntu-latest, 1.6, x86) fails with ERROR: LoadError: UndefVarError: libhsa_runtime64 not defined

That's a crash in building AMDGPU.jl

> test-mpitrampoline-intel-linux (1.6, 2.10.3) fails with a segfault in some multithreaded part

That's #725.

> test-spack-mvapich (1) fails with yet another segfault

I think that's a bug in MVAPICH which may have been fixed in a later version.

> Testing on Windows is failing in multiple places

#555

@giordano giordano added the CI label Jul 19, 2023
@giordano
Member

> That's a crash in building AMDGPU.jl

More specifically, the problem is that an old version of AMDGPU.jl (0.2.17) is picked up, despite

AMDGPU = "0.3, 0.4, 0.5"

This should have been fixed in newer versions.

@sloede
Member Author

sloede commented Jul 19, 2023

> > Testing on Windows is failing in multiple places
>
> #555

Are you sure the referenced issue is related to all these failures? Here, I also see premature aborts and errors in MPI_Waitall.

@luraess
Contributor

luraess commented Jul 19, 2023

I'll check more carefully, but to me it seems that none of the AMDGPU CI jobs succeed.

@giordano
Member

> > That's a crash in building AMDGPU.jl
>
> More specifically, the problem is that an old version of AMDGPU.jl (0.2.17) is picked up, despite
>
> AMDGPU = "0.3, 0.4, 0.5"
>
> This should have been fixed in newer versions.

AMDGPU.jl 0.3 isn't compatible with Julia v1.6. I don't see how we can possibly make it work. We should either drop support for Julia v1.6 or not use AMDGPU at all.
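
For reference, here is a minimal sketch of how the compat entry quoted above is interpreted, which confirms that 0.2.17 lies outside the allowed range. It assumes `Pkg.Types.semver_spec`, an internal Pkg helper rather than public API, so it may move or change between Julia releases. The version numbers other than 0.2.17 are illustrative only.

```julia
using Pkg

# Parse the compat string from the thread into a version specification.
# NOTE: `Pkg.Types.semver_spec` is an internal helper (an assumption here),
# not a documented, stable API.
spec = Pkg.Types.semver_spec("0.3, 0.4, 0.5")

# Check a few candidate versions against the compat range.
# Only 0.2.17 comes from the thread; the others are illustrative.
for v in (v"0.2.17", v"0.3.0", v"0.5.7", v"0.6.0")
    println(v, " allowed by compat: ", v in spec)
end
```

For 0.x releases each entry covers a single minor series, so the entry as a whole spans 0.3.0 up to (but not including) 0.6.0.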

@sloede
Member Author

sloede commented Jul 22, 2023

> AMDGPU.jl 0.3 isn't compatible with Julia v1.6. I don't see how we can possibly make it work. We should either drop support for Julia v1.6 or not use AMDGPU at all.

Call me naive, but I don't see how anyone still uses MPI.jl + Julia v1.6 for production workloads. To reduce the maintenance effort, I'd thus vote for dropping v1.6 support.

If this is not possible/desirable, then disable AMDGPU support for Julia v1.6 by adding a corresponding statement to the docs and removing the corresponding test matrix entries.

@luraess
Contributor

luraess commented Jul 22, 2023

> If this is not possible/desirable, then disable AMDGPU support for Julia v1.6 by adding a corresponding statement to the docs and removing the corresponding test matrix entries.

I'd say that AMDGPU Ext + MPI should be bound to Julia 1.9 & AMDGPU 0.5, as explained in #753.
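
One way such a binding could be expressed in a test script is a simple version guard. This is only a sketch: the helper name `amdgpu_tests_supported` is hypothetical, the thresholds are taken from the suggestion above, and the AMDGPU version is passed in by hand instead of being queried from the installed package.

```julia
# Hypothetical helper: decide whether AMDGPU-backed tests should run.
# Thresholds follow the suggestion in the thread (Julia 1.9, AMDGPU 0.5).
function amdgpu_tests_supported(julia_version::VersionNumber,
                                amdgpu_version::VersionNumber)
    return julia_version >= v"1.9" && amdgpu_version >= v"0.5"
end

# The AMDGPU version is hard-coded for illustration; a real test script
# would have to look it up from the resolved environment.
if amdgpu_tests_supported(VERSION, v"0.5.0")
    @info "AMDGPU tests would run"
else
    @info "Skipping AMDGPU tests" VERSION
end
```

With a guard like this, a Julia v1.6 job would skip the AMDGPU tests rather than fail while loading an old AMDGPU.jl release.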

@giordano
Member

AMDGPU is always a test dependency:

AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"

@giordano
Member

giordano commented Nov 3, 2023

I think we have now cleared most, if not all, of the persistent failures, either by addressing them or by skipping consistently failing tests, which are already tracked by specific issues. If there are other relevant problems, it'd be good to open dedicated issues, but I'm going to close this ticket, as CI should now be in much better shape.

@giordano giordano closed this as completed Nov 3, 2023