enhance archdetect to support detection of NVIDIA GPUs + using that in EESSI init script #767
…n EESSI init script
ocaisa left a comment:
I think we can give people override capabilities to allow mixing and matching CPU/GPU stacks (in terms of the CPU architectures).
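To make the mix-and-match idea concrete, here is a minimal sketch of how the final software path could combine a CPU subdirectory and an accelerator subdirectory, each of which can be overridden independently. It assumes a layout of `<prefix>/software/linux/<cpu_subdir>/<accel_subdir>` and uses `EESSI_SOFTWARE_SUBDIR_OVERRIDE` for the CPU part and `EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE` (from the commits below) for the GPU part; the function name `accel_software_path` is hypothetical and this is not the actual init script logic.

```python
import os
from typing import Mapping, Optional


def accel_software_path(
    eessi_prefix: str,
    detected_cpu_subdir: str,
    detected_accel_subdir: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: combine CPU and accelerator subdirectories into one path."""
    # CPU part: an override (if set) replaces the detected CPU subdirectory,
    # e.g. to use a zen2 CPU build on a zen3 host that has a GPU.
    cpu_subdir = env.get("EESSI_SOFTWARE_SUBDIR_OVERRIDE") or detected_cpu_subdir
    # GPU part: an override (if set) replaces the detected accelerator subdirectory.
    accel_subdir = env.get("EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE") or detected_accel_subdir
    # Assumed layout: <prefix>/software/linux/<cpu_subdir>[/<accel_subdir>]
    base = f"{eessi_prefix}/software/linux/{cpu_subdir}"
    if not accel_subdir:
        # Nothing detected and nothing requested: CPU-only stack.
        return base
    return f"{base}/{accel_subdir}"
```

Keeping the two overrides independent is what allows e.g. a GPU stack built for one CPU target to be used on a host with a different (compatible) CPU.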
…GPU_SOFTWARE_SUBDIR_OVERRIDE
Co-authored-by: ocaisa <alan.ocais@cecam.org>
…ailed to run + take that into account in EESSI init script + allow overriding software subdirectory for accel/* via $EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE
…t, must be 'accel/nvidia/cc[0-9][0-9]'
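The commit above restricts `$EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE` to values of the form `accel/nvidia/cc[0-9][0-9]`. A minimal sketch of that check follows, using the pattern quoted in the commit message; the function name `validate_accel_override` is hypothetical, and the real check lives in the (Bash) init script, not in Python.

```python
import re

# Pattern quoted in the commit message: 'accel/nvidia/cc[0-9][0-9]'
ACCEL_OVERRIDE_RE = re.compile(r"accel/nvidia/cc[0-9][0-9]")


def validate_accel_override(value: str) -> str:
    """Hypothetical check: accept only accel/nvidia/cc<two digits>."""
    # fullmatch rejects partial matches such as 'accel/nvidia/cc800' or 'xaccel/nvidia/cc80'
    if not ACCEL_OVERRIDE_RE.fullmatch(value):
        raise ValueError(
            f"Invalid accelerator override {value!r}, must be 'accel/nvidia/cc[0-9][0-9]'"
        )
    return value
```

Anchoring the whole string (rather than searching for the pattern) is what makes typos like a missing digit or an extra path component fail loudly instead of silently selecting a non-existent directory.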
…with 'No devices were found' if no GPUs are available in Slurm job
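The commit above handles the case where GPU detection fails with `No devices were found` (e.g. in a Slurm job without GPUs). A minimal sketch of turning the detection output into an accelerator subdirectory, assuming detection runs something like `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, which prints one `major.minor` line per GPU. The function name `accel_subdir_from_nvidia_smi` and the "use the first GPU" rule are assumptions for illustration, not what archdetect necessarily does.

```python
from typing import Optional


def accel_subdir_from_nvidia_smi(returncode: int, output: str) -> Optional[str]:
    """Hypothetical: map compute capability output (e.g. '8.0') to 'accel/nvidia/cc80'."""
    text = output.strip()
    # A failing or empty query (e.g. 'No devices were found' in a Slurm job
    # without GPUs) means no usable NVIDIA GPU: fall back to the CPU-only stack.
    if returncode != 0 or not text or "No devices were found" in text:
        return None
    # Assumption: all GPUs in the node share one compute capability; use the first line.
    first = text.splitlines()[0].strip()
    major, _, minor = first.partition(".")
    if not (major.isdigit() and minor.isdigit()):
        # Unexpected output format: do not guess.
        return None
    return f"accel/nvidia/cc{major}{minor}"
```

Returning `None` rather than raising keeps the "no GPU" case indistinguishable from a plain CPU-only system, which is what a caller selecting a software subdirectory usually wants.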
@ocaisa Don't merge this just yet (although it's ready for re-review + testing). We should go all the way here, and also set up some CI for this, by using fake
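The comment is cut off after "by using fake", so the exact approach isn't visible here. Assuming the idea is to put a fake `nvidia-smi` on `$PATH` in CI so detection can be exercised without real GPUs, a minimal sketch of such a stand-in follows. The helper names `make_fake_nvidia_smi` and `run_with_fake` are hypothetical; the actual workflow in the PR may do this differently (e.g. directly in the GitHub Actions YAML).

```python
import os
import shlex
import stat
import subprocess
import tempfile


def make_fake_nvidia_smi(directory: str, output: str, exit_code: int = 0) -> str:
    """Write a hypothetical stand-in for nvidia-smi that prints canned output."""
    path = os.path.join(directory, "nvidia-smi")
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
        # Ignore all arguments; print the canned output verbatim and exit with the given code.
        f.write(f"printf '%s\\n' {shlex.quote(output)}\n")
        f.write(f"exit {exit_code}\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def run_with_fake(output: str, exit_code: int = 0) -> subprocess.CompletedProcess:
    """Run 'nvidia-smi' with the fake shadowing any real one via PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        make_fake_nvidia_smi(tmp, output, exit_code)
        env = dict(os.environ, PATH=tmp + os.pathsep + os.environ.get("PATH", ""))
        # On POSIX, subprocess resolves the command using the PATH from `env`.
        return subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            env=env, capture_output=True, text=True,
        )
```

Scenarios such as "one A100-like GPU" (`8.0`) or "no GPUs in this job" (`No devices were found` with a non-zero exit code) then become one-line test cases.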
Tested extensively: CPU-only system (zen2), no
@ocaisa I've added an extensive GitHub Actions workflow for verifying the NVIDIA GPU accelerator detection implemented in this PR, see 24f0620. There's one issue though: implementing these tests revealed that the EESSI init script now "chokes" when both:
For me, this is reason enough to reconsider your (currently implemented) suggestion to let the … Thoughts?
I would temporarily disable
Or give archdetect an environment variable or option that suppresses returning an error code.
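The second suggestion, an environment variable that stops a failed accelerator detection from being fatal, could look roughly like this sketch. The variable name `EESSI_ARCHDETECT_NONFATAL_ACCEL` and the function `detect_accel_or_fallback` are hypothetical, chosen only for illustration; the PR itself may have settled on a different mechanism.

```python
import os
import sys
from typing import Callable, Mapping, Optional


def detect_accel_or_fallback(
    detect: Callable[[], Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Run accelerator detection; optionally downgrade failures to a warning."""
    try:
        return detect()
    except RuntimeError as err:
        # Hypothetical opt-out variable: when set to 1, a failed detection is
        # reported on stderr and the caller continues with the CPU-only stack.
        if env.get("EESSI_ARCHDETECT_NONFATAL_ACCEL") == "1":
            print(f"WARNING: accelerator detection failed ({err}); "
                  "continuing without GPU support", file=sys.stderr)
            return None
        # Default: keep the error so the init script can act on it.
        raise
```

The default stays strict (the error propagates), so the opt-out only changes behaviour for users who explicitly ask for it.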
…ct to detect accelerator
@ocaisa Anything blocking this now?
bot: build repo:eessi.io-2023.06-software arch:zen2
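The bot command above uses a `bot: build key:value ...` form. As an illustration of that syntax only (not the EESSI bot's actual parser), a minimal sketch that splits such a comment into its filters; the function name `parse_bot_build_command` is hypothetical.

```python
from typing import Dict


def parse_bot_build_command(comment: str) -> Dict[str, str]:
    """Hypothetical: turn 'bot: build k1:v1 k2:v2' into {'k1': 'v1', 'k2': 'v2'}."""
    prefix = "bot:"
    text = comment.strip()
    if not text.startswith(prefix):
        raise ValueError("not a bot command")
    words = text[len(prefix):].split()
    if not words or words[0] != "build":
        raise ValueError("not a build command")
    filters: Dict[str, str] = {}
    for word in words[1:]:
        # Split on the first ':' only, so values may themselves contain ':'.
        key, sep, value = word.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"malformed filter {word!r}")
        filters[key] = value
    return filters
```

For the comment above this yields `{"repo": "eessi.io-2023.06-software", "arch": "zen2"}`.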
Staging PR merged, good to go!
PR merged! Moved
Example output: