@TopRichard TopRichard commented Jul 9, 2025

This PR uses a CUDA-ARM patch to work around the previously seen error:

"__Int8x8_t" is undefined
  typedef __Int8x8_t int8x8_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(41): error: identifier "__Int16x4_t" is undefined
  typedef __Int16x4_t int16x4_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(42): error: identifier "__Int32x2_t" is undefined
  typedef __Int32x2_t int32x2_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(43): error: identifier "__Int64x1_t" is undefined
  typedef __Int64x1_t int64x1_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(44): error: identifier "__Float16x4_t" is undefined
  typedef __Float16x4_t float16x4_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(45): error: identifier "__Float32x2_t" is undefined
  typedef __Float32x2_t float32x2_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(46): error: identifier "__Poly8x8_t" is undefined
  typedef __Poly8x8_t poly8x8_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(47): error: identifier "__Poly16x4_t" is undefined
  typedef __Poly16x4_t poly16x4_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(48): error: identifier "__Uint8x8_t" is undefined
  typedef __Uint8x8_t uint8x8_t;

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(793): error: identifier "__builtin_aarch64_raddhnv2di_uuu" is undefined
    return __builtin_aarch64_raddhnv2di_uuu (__a, __b);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(800): error: identifier "__builtin_aarch64_addhn2v8hi" is undefined
    return __builtin_aarch64_addhn2v8hi (__a, __b, __c);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(807): error: identifier "__builtin_aarch64_addhn2v4si" is undefined
    return __builtin_aarch64_addhn2v4si (__a, __b, __c);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(814): error: identifier "__builtin_aarch64_addhn2v2di" is undefined
    return __builtin_aarch64_addhn2v2di (__a, __b, __c);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(821): error: identifier "__builtin_aarch64_addhn2v8hi_uuuu" is undefined
    return __builtin_aarch64_addhn2v8hi_uuuu (__a, __b, __c);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(828): error: identifier "__builtin_aarch64_addhn2v4si_uuuu" is undefined
    return __builtin_aarch64_addhn2v4si_uuuu (__a, __b, __c);
           ^

Error limit reached.
100 errors detected in the compilation of "tensorflow/core/kernels/reshape_util_gpu.cu.cc".
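
For triaging a log like the one above, a small helper can group the undefined identifiers. This is a sketch that is not part of the PR; the function name and the two category labels are made up for illustration. It only assumes the diagnostic wording shown in the log. The grouping makes it visible that every failure is either a NEON vector typedef or a `__builtin_aarch64_*` intrinsic pulled in through GCC's `arm_neon.h`.

```python
import re
from collections import Counter

# Matches the diagnostic wording that appears in the log above.
UNDEFINED_RE = re.compile(r'error: identifier "([^"]+)" is undefined')


def summarize_undefined(log_text):
    """Count undefined identifiers, split into GCC aarch64 builtins and the rest.

    The category names are hypothetical labels chosen for this sketch.
    """
    groups = Counter()
    for name in UNDEFINED_RE.findall(log_text):
        if name.startswith("__builtin_aarch64_"):
            groups["gcc_aarch64_builtin"] += 1
        else:
            # In this log, everything else is a vector typedef such as __Int16x4_t.
            groups["neon_vector_type"] += 1
    return dict(groups)


# Shortened sample lines; the paths are abbreviated for readability.
sample = '''
arm_neon.h(41): error: identifier "__Int16x4_t" is undefined
arm_neon.h(42): error: identifier "__Int32x2_t" is undefined
arm_neon.h(800): error: identifier "__builtin_aarch64_addhn2v8hi" is undefined
'''
summary = summarize_undefined(sample)
print(summary)
```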

On x86_64 with cc80:

CPU tests:
Executed 847 out of 847 tests: 847 tests pass.

GPU tests:
Executed 189 out of 189 tests: 189 tests pass.

On aarch64 with cc90:

CPU tests:
Executed 847 out of 847 tests: 847 tests pass.

GPU tests:
Executed 189 out of 189 tests: 188 tests pass and 1 fails locally, with a segmentation fault in SparseToDenseTest.testFloatTypes1 (tf.float16):

Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
2025-07-14 22:05:24.196765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 94876 MB memory:  -> device: 0, name: NVIDIA GH200 480GB, pci bus id: 0009:01:00.0, compute capability: 9.0
2025-07-14 22:05:24.216862: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 99485220864 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216882: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 89536700416 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216887: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 80583024640 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216890: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 72524718080 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216892: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 65272246272 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216903: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 58745020416 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216914: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 52870516736 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216916: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 47583465472 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216918: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 42825117696 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216921: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 38542606336 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216923: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 34688344064 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216925: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 31219509248 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216927: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 28097558528 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216929: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 25287802880 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216931: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 22759022592 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216934: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 20483119104 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216981: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 2241240576 on device 0 within provided limit.  limit=2147483648]
INFO:tensorflow:time(__main__.SparseToDenseTest.test2d): 0.62s
I0714 22:05:24.369109 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.test2d): 0.62s
[       OK ] SparseToDenseTest.test2d
[ RUN      ] SparseToDenseTest.test3d
INFO:tensorflow:time(__main__.SparseToDenseTest.test3d): 0.0s
I0714 22:05:24.372016 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.test3d): 0.0s
[       OK ] SparseToDenseTest.test3d
[ RUN      ] SparseToDenseTest.testBadDefault
2025-07-14 22:05:24.374394: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at sparse_to_dense_op.cc:218 : INVALID_ARGUMENT: default_value should be a scalar.
INFO:tensorflow:time(__main__.SparseToDenseTest.testBadDefault): 0.0s
I0714 22:05:24.374547 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testBadDefault): 0.0s
[       OK ] SparseToDenseTest.testBadDefault
[ RUN      ] SparseToDenseTest.testBadNumValues
2025-07-14 22:05:24.376781: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at sparse_to_dense_op.cc:218 : INVALID_ARGUMENT: sparse_values has incorrect shape [3], should be [] or [2]
INFO:tensorflow:time(__main__.SparseToDenseTest.testBadNumValues): 0.0s
I0714 22:05:24.376892 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testBadNumValues): 0.0s
[       OK ] SparseToDenseTest.testBadNumValues
[ RUN      ] SparseToDenseTest.testBadShape
2025-07-14 22:05:24.378949: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at sparse_to_dense_op.cc:218 : INVALID_ARGUMENT: output_shape must be rank 1, got shape [2,1]
INFO:tensorflow:time(__main__.SparseToDenseTest.testBadShape): 0.0s
I0714 22:05:24.379058 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testBadShape): 0.0s
[       OK ] SparseToDenseTest.testBadShape
[ RUN      ] SparseToDenseTest.testBadValue
2025-07-14 22:05:24.381188: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at sparse_to_dense_op.cc:218 : INVALID_ARGUMENT: sparse_values has incorrect shape [2,1], should be [] or [2]
INFO:tensorflow:time(__main__.SparseToDenseTest.testBadValue): 0.0s
I0714 22:05:24.381294 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testBadValue): 0.0s
[       OK ] SparseToDenseTest.testBadValue
[ RUN      ] SparseToDenseTest.testComplex
INFO:tensorflow:time(__main__.SparseToDenseTest.testComplex): 0.1s
I0714 22:05:24.478588 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testComplex): 0.1s
[       OK ] SparseToDenseTest.testComplex
[ RUN      ] SparseToDenseTest.testEmptyNonZeros
INFO:tensorflow:time(__main__.SparseToDenseTest.testEmptyNonZeros): 0.0s
I0714 22:05:24.481788 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testEmptyNonZeros): 0.0s
[       OK ] SparseToDenseTest.testEmptyNonZeros
[ RUN      ] SparseToDenseTest.testFloatTypes0 (tf.bfloat16)
INFO:tensorflow:time(__main__.SparseToDenseTest.testFloatTypes0 (tf.bfloat16)): 0.0s
I0714 22:05:24.485352 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testFloatTypes0 (tf.bfloat16)): 0.0s
[       OK ] SparseToDenseTest.testFloatTypes0 (tf.bfloat16)
[ RUN      ] SparseToDenseTest.testFloatTypes1 (tf.float16)
Fatal Python error: Segmentation fault

@TopRichard TopRichard marked this pull request as draft July 9, 2025 18:52
@TopRichard
Collaborator Author

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jul 9, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.07/pr_35/13085596

date job status comment
Jul 09 19:03:52 UTC 2025 submitted job id 13085596 will be eligible to start in about 20 seconds
Jul 09 19:03:59 UTC 2025 received job awaits launch by Slurm scheduler
Jul 09 19:04:22 UTC 2025 running job 13085596 is running
Jul 09 19:06:05 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13085596.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17520879110.tar.gz
size: 0 MiB (18163 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 19:06:05 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13085596.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
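
The pass/fail list in the report above reads like a set of substring checks against the Slurm output file. A minimal sketch of that idea follows, assuming plain substring matching; how the bot actually implements these checks is not shown in this thread, and `BUILD_CHECKS`, `check_build_log` and `build_succeeded` are hypothetical names.

```python
# (needle, must_be_present) pairs copied from the report above.
# Plain substring matching is an assumption made for this sketch.
BUILD_CHECKS = [
    ("FATAL:", False),
    ("ERROR:", False),
    ("FAILED:", False),
    ("required modules missing:", False),
    ("No missing installations", True),
    (".tar.gz created!", True),
]


def check_build_log(log_text):
    """Return (needle, found, ok) for every check."""
    results = []
    for needle, must_be_present in BUILD_CHECKS:
        found = needle in log_text
        results.append((needle, found, found == must_be_present))
    return results


def build_succeeded(log_text):
    """A build counts as successful only if every check is satisfied."""
    return all(ok for _, _, ok in check_build_log(log_text))


# Two made-up log excerpts for illustration.
failing_log = "ERROR: Installation failed\n/tmp/example.tar.gz created!\n"
passing_log = "No missing installations\n/tmp/example.tar.gz created!\n"
print(build_succeeded(failing_log), build_succeeded(passing_log))
```

Under this reading, a single `ERROR:` line is enough to mark the build as a failure even when a tarball was created, which matches the reports in this thread.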

@TopRichard
Collaborator Author

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jul 9, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.07/pr_35/13085903

date job status comment
Jul 09 19:24:15 UTC 2025 submitted job id 13085903 will be eligible to start in about 20 seconds
Jul 09 19:24:21 UTC 2025 received job awaits launch by Slurm scheduler
Jul 09 19:24:43 UTC 2025 running job 13085903 is running
Jul 09 19:26:36 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13085903.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17520891370.tar.gz
size: 0 MiB (18104 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 19:26:36 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13085903.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Collaborator Author

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jul 9, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.07/pr_35/13086265

date job status comment
Jul 09 19:40:10 UTC 2025 submitted job id 13086265 will be eligible to start in about 20 seconds
Jul 09 19:40:22 UTC 2025 received job awaits launch by Slurm scheduler
Jul 09 19:40:35 UTC 2025 running job 13086265 is running
Jul 09 19:42:40 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13086265.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17520901000.tar.gz
size: 0 MiB (18099 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 19:42:40 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13086265.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Collaborator Author

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jul 9, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.07/pr_35/13086405

date job status comment
Jul 09 20:24:47 UTC 2025 submitted job id 13086405 will be eligible to start in about 20 seconds
Jul 09 20:24:58 UTC 2025 received job awaits launch by Slurm scheduler
Jul 09 20:25:22 UTC 2025 running job 13086405 is running
Jul 09 20:27:16 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13086405.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17520927760.tar.gz
size: 0 MiB (18104 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 20:27:16 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13086405.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Collaborator Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic

@eessi-bot-aws

eessi-bot-aws bot commented Jul 9, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75015

date job status comment
Jul 09 20:40:10 UTC 2025 submitted job id 75015 awaits release by job manager
Jul 09 20:40:56 UTC 2025 released job awaits launch by Slurm scheduler
Jul 09 20:45:58 UTC 2025 running job 75015 is running
Jul 09 20:51:03 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75015.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-17520940310.tar.gz
size: 0 MiB (18096 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic
2023.06/init/easybuild/eb_hooks.py
Jul 09 20:51:03 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86_64_generic+default
P: perf: 373.396 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86_64_generic+default
P: perf: 387.924 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86_64_generic+default
P: latency: 2.71 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86_64_generic+default
P: latency: 2.72 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86_64_generic+default
P: latency: 4.53 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86_64_generic+default
P: latency: 4.55 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86_64_generic+default
P: latency: 0.68 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86_64_generic+default
P: latency: 0.72 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86_64_generic+default
P: bandwidth: 12428.62 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86_64_generic+default
P: bandwidth: 11405.27 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-75015.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
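
The ReFrame summary line is worth reading closely: the earlier eessi-bot-surf reports show `PASSED` with 0/8 test cases run (all skipped), while this one actually ran 10/10. A small parser, sketched here with a hypothetical `exercised` field, makes that distinction explicit. The regular expression is written against the summary lines quoted in this thread and may not cover other ReFrame output variants.

```python
import re

# Written against the summary lines quoted in this thread.
SUMMARY_RE = re.compile(
    r"\[\s*(PASSED|FAILED)\s*\]\s*Ran (\d+)/(\d+) test case\(s\) "
    r"from (\d+) check\(s\) "
    r"\((\d+) failure\(s\), (\d+) skipped, (\d+) aborted\)"
)


def parse_reframe_summary(line):
    """Parse a ReFrame summary line into a dict, or return None if it does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    status = match.group(1)
    ran, total, checks, failures, skipped, aborted = map(int, match.groups()[1:])
    return {
        "status": status,
        "ran": ran,
        "total": total,
        "checks": checks,
        "failures": failures,
        "skipped": skipped,
        "aborted": aborted,
        # Hypothetical convenience flag: did any test case really execute?
        "exercised": ran > 0,
    }


skipped_run = parse_reframe_summary(
    "[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)"
)
full_run = parse_reframe_summary(
    "[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)"
)
print(skipped_run["exercised"], full_run["exercised"])
```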

@TopRichard
Collaborator Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic accel:nvidia/cc90

@eessi-bot-aws

eessi-bot-aws bot commented Jul 9, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75016

date job status comment
Jul 09 20:52:19 UTC 2025 submitted job id 75016 awaits release by job manager
Jul 09 20:53:06 UTC 2025 released job awaits launch by Slurm scheduler
Jul 09 20:54:08 UTC 2025 running job 75016 is running
Jul 09 20:55:09 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75016.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-17520944580.tar.gz
size: 0 MiB (18095 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 20:55:09 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75016.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Collaborator Author

The failure is:

== Summary:
   * [FAILED]  cuDNN/8.9.2.26-CUDA-12.1.1
   * [SKIPPED] TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1
0:00:00  0 out of 2 easyconfigs done
ERROR: Installation of cuDNN-8.9.2.26-CUDA-12.1.1.eb failed: "The End User License Agreement (EULA) for cuDNN is currently not accepted!\n(see https://docs.nvidia.com/deeplearning/cudnn/latest/reference/eula.html for more information)\nYou should either:\n- add --accept-eula-for=cuDNN to the 'eb' command;\n- update your EasyBuild configuration to always accept the EULA for cuDNN;\n- add 'accept_eula = True' to the easyconfig file you are using;\n"
Last EasyBuild log file copied from /tmp/eb-0sv_9why/easybuild-_r7fptuy.log to /eessi_bot_job
EasyBuild log file /tmp/eb-0sv_9why/easybuild-_r7fptuy.log copied to /project/def-users/SHARED/build-logs/jobs/75016/easybuild-_r7fptuy.log (with context appended)
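
For reference, two of the options named in the error message can be sketched as follows. The helper names are hypothetical, and nothing here reflects how the EESSI build scripts or bot configuration handle EULA acceptance. The first helper builds the `eb` command line with the `--accept-eula-for` flag quoted in the error; the second sets the equivalent `EASYBUILD_ACCEPT_EULA_FOR` environment variable, relying on EasyBuild's convention of reading configuration options from `EASYBUILD_*` variables.

```python
import os


def eb_command_accepting_eula(easyconfig, software="cuDNN"):
    """Build an 'eb' invocation using the flag quoted in the error message."""
    return ["eb", f"--accept-eula-for={software}", easyconfig]


def eb_env_accepting_eula(software="cuDNN", base_env=None):
    """Return a copy of the environment with the EULA accepted via EASYBUILD_*."""
    env = dict(os.environ if base_env is None else base_env)
    env["EASYBUILD_ACCEPT_EULA_FOR"] = software
    return env


# The command is only built and printed here, not executed.
cmd = eb_command_accepting_eula("cuDNN-8.9.2.26-CUDA-12.1.1.eb")
print(" ".join(cmd))
```

The third option from the message, `accept_eula = True`, goes directly into the easyconfig file and needs no wrapper.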

@TopRichard
Collaborator Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic accel:nvidia/cc90

@eessi-bot-aws

eessi-bot-aws bot commented Jul 10, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75225

date job status comment
Jul 10 11:05:14 UTC 2025 submitted job id 75225 awaits release by job manager
Jul 10 11:05:19 UTC 2025 released job awaits launch by Slurm scheduler
Jul 10 11:11:21 UTC 2025 running job 75225 is running
Jul 10 11:13:23 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75225.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-17521459190.tar.gz
size: 0 MiB (18096 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 10 11:13:23 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75225.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Collaborator Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80

@eessi-bot-aws

eessi-bot-aws bot commented Jul 10, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75226

date job status comment
Jul 10 11:29:38 UTC 2025 submitted job id 75226 awaits release by job manager
Jul 10 11:30:27 UTC 2025 released job awaits launch by Slurm scheduler
Jul 10 11:35:29 UTC 2025 running job 75226 is running
Jul 10 15:55:18 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75226.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17521618690.tar.gz
size: 0 MiB (18096 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Jul 10 15:55:18 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75226.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
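The build report above lists which messages were or were not found in the job output. A minimal sketch of such a check follows, assuming plain substring matching and assuming that a build only passes when every check passes. The function name `check_job_output` and the grouping into two lists are illustrative, not the bot's actual implementation.

```python
# Message strings are taken from the bot report above.
# How the bot really matches them and decides pass/fail is an assumption here.
MUST_BE_ABSENT = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_job_output(text: str) -> dict[str, bool]:
    """Map each message to True if its check passed for the given output."""
    results = {}
    for message in MUST_BE_ABSENT:
        results[message] = message not in text
    for message in MUST_BE_PRESENT:
        results[message] = message in text
    return results


# Hypothetical job output resembling the failed build reported above.
sample = (
    "ERROR: build failed\n"
    "FAILED: some step\n"
    "required modules missing:\n"
    "/tmp/example.tar.gz created!\n"
)
report = check_job_output(sample)
build_ok = all(report.values())
```

With this sample, only the `FATAL:` and `.tar.gz created!` checks pass, which mirrors the pattern of ticks and crosses in the report.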

@TopRichard
Collaborator Author

bot: help

@eessi-bot-aws

eessi-bot-aws bot commented Jul 10, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command help from TopRichard

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot
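The syntax rules in the help text can be sketched as a small parser. This is only an illustration of the stated rules (one command per line, each starting at the beginning of a line, in the form `bot: COMMAND [ARGUMENTS]*`). The function `parse_bot_commands` is hypothetical and is not taken from the bot's code.

```python
import re

# Hypothetical helper illustrating the help text; not the bot's real parser.
BOT_COMMAND = re.compile(r"^bot:\s*(\S+)\s*(.*)$")
SUPPORTED = {"help", "build", "show_config", "status"}


def parse_bot_commands(comment: str) -> list[tuple[str, list[str]]]:
    """Return (command, arguments) pairs found in a comment body."""
    commands = []
    for line in comment.splitlines():
        # re.match anchors at the start of the line, so indented
        # or mid-sentence "bot:" text is ignored.
        match = BOT_COMMAND.match(line)
        if match is None:
            continue
        name, rest = match.group(1), match.group(2)
        if name in SUPPORTED:
            commands.append((name, rest.split()))
    return commands


comment = (
    "Let's see about the build step.\n"
    "bot: build repo:eessi.io-2023.06-software instance:eessi-bot-vsc-ugent\n"
    "  bot: help\n"
    "bot: status\n"
)
parsed = parse_bot_commands(comment)
```

In this example the indented `bot: help` line is skipped because it does not begin at the start of a line.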

@eessi-bot-surf

eessi-bot-surf bot commented Jul 10, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command help from TopRichard

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Jul 10, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command help from TopRichard

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-jsc

eessi-bot-jsc bot commented Jul 10, 2025

Updates by the bot instance eessi-bot-jsc (click for details)
  • received bot command help from TopRichard

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-compat

@eessi-bot-jsc

eessi-bot-jsc bot commented Aug 22, 2025

Instance eessi-bot-jsc is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2025.06-software, eessi.io-2025.06-compat, eessi.io-2023.06-compat, eessi.io-2023.06-software

@gpu-bot-ugent

gpu-bot-ugent bot commented Aug 22, 2025

Instance eessi-bot-vsc-ugent is configured to build on:

  • Node type gpu_a100:

    • OS: linux
    • CPU architecture: x86_64/amd/zen3
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software']
    • Accelerators: nvidia/cc80
  • Node type gpu_v100:

    • OS: linux
    • CPU architecture: x86_64/intel/cascaselake
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software']
    • Accelerators: nvidia/cc70

@laraPPr
Contributor

laraPPr commented Aug 22, 2025

bot: build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software arch:cascaselake accel:nvidia/cc70

Curious, shouldn't this PR go to EESSI/software-layer?

The eb_hooks.py is not in the EESSI/software-layer anymore

Indeed, but the idea is to still manage which software is installed in EESSI via PRs to easystack files in EESSI/software-layer.

For testing purposes, to show that the updated eb_hooks.py works, a temporary easystack file can be included in a PR to this repo, but it should be removed before the final build /deploy is done (if there's something in the PR that needs deploying, like eb_hooks.py).

I think it might be better to always open a secondary PR where you do the actual testing, to make sure that none of the builds get deployed from the PR, only the changed scripts, like I did with #49 and #22. I know I said I was going to write out the policy, but I have not gotten to it.

@boegel boegel marked this pull request as draft August 22, 2025 08:32
@laraPPr laraPPr marked this pull request as ready for review August 22, 2025 08:32
@laraPPr laraPPr marked this pull request as draft August 22, 2025 08:33
@boegel
Contributor

boegel commented Aug 22, 2025

@laraPPr marking the PR as draft as long as the easystack file is in there probably helps, but indeed, we may need to come up with a better approach

@laraPPr
Contributor

laraPPr commented Aug 22, 2025

It is going to fail the test step, but let's see about the build step.
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-vsc-ugent for:arch:x86_64/intel/cascadelake,accel:nvidia/cc80

@laraPPr
Contributor

laraPPr commented Aug 22, 2025

The gent bot crashed because of a local problem. I'll update the reframe_config and try again later.

@laraPPr
Contributor

laraPPr commented Aug 22, 2025

Ah no, it does still seem to be alive, but I made a mistake in the comment, so let's see if this works:
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-vsc-ugent for:arch:x86_64/intel/cascadelake,accel:nvidia/cc70

@laraPPr
Contributor

laraPPr commented Aug 22, 2025

Third time's the charm, I hope
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70
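The three attempts differ in how the `for:` argument is written (`arch:...` versus `arch=...`). The sketch below assumes that the `key=value` form from the third attempt is the intended one and shows how such an argument could be split into its parts. The function `parse_target` is hypothetical, and how the bot actually treats the colon form is not known from this thread.

```python
def parse_target(arg: str) -> dict[str, str]:
    """Split 'for:arch=...,accel=...' into {'arch': ..., 'accel': ...}.

    Hypothetical helper; assumes comma-separated key=value pairs.
    """
    prefix, _, body = arg.partition(":")
    if prefix not in ("for", "on"):
        raise ValueError(f"unexpected prefix: {prefix!r}")
    target = {}
    for item in body.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {item!r}")
        target[key] = value
    return target


target = parse_target("for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70")
```

Under this assumption, the earlier form `for:arch:x86_64/intel/cascadelake,accel:nvidia/cc70` would be rejected with a `ValueError` rather than silently producing no job.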

@laraPPr
Contributor

laraPPr commented Aug 22, 2025

No job is being created for some reason and I can't tell why

    [20250822-T10:50:18] [handle_issue_comment_event]: handling command 'build architecture:x86_64/intel/cascadelake accelerator:nvidia/cc70 repository:eessi.io-2023.06-software instance:eessi-bot-vsc-ugent' resulted in '

      - no jobs were submitted'

@trz42
Contributor

trz42 commented Aug 22, 2025

Debug building with new bot release...
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90

@trz42
Contributor

trz42 commented Aug 22, 2025

No job was submitted, possibly because the node_type_map did not include the accel key. Retrying after adding

    "aarch64-nvidia-gh200": {
        "os": "linux",
        "cpu_subdir": "aarch64/nvidia/grace",
        "accel": "nvidia/cc90",
        "slurm_params": "--ntasks-per-node 18 --partition dc-gh --account foo",
        "repo_targets": ["eessi.io-2023.06-compat","eessi.io-2023.06-software","eessi.io-2025.06-compat","eessi.io-2025.06-software"]
    }

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90
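To illustrate why a missing `accel` key could lead to no job being submitted, here is a minimal sketch of a matching step. It assumes the bot compares the requested architecture, accelerator and repository against each `node_type_map` entry exactly. The function `matching_node_types` is hypothetical and the real selection logic may differ.

```python
# Entry mirroring the configuration quoted above (slurm_params omitted,
# repo_targets shortened for the example).
node_type_map = {
    "aarch64-nvidia-gh200": {
        "os": "linux",
        "cpu_subdir": "aarch64/nvidia/grace",
        "accel": "nvidia/cc90",
        "repo_targets": ["eessi.io-2023.06-compat", "eessi.io-2023.06-software"],
    },
}


def matching_node_types(node_map, arch, accel, repo):
    """Return names of node types whose settings match the request exactly."""
    return [
        name
        for name, cfg in node_map.items()
        if cfg.get("cpu_subdir") == arch
        and cfg.get("accel") == accel  # a missing key yields None, so no match
        and repo in cfg.get("repo_targets", [])
    ]


request = ("aarch64/nvidia/grace", "nvidia/cc90", "eessi.io-2023.06-software")
with_accel = matching_node_types(node_type_map, *request)

# Same map with the accel key dropped, as in the earlier configuration.
without_key = {
    name: {k: v for k, v in cfg.items() if k != "accel"}
    for name, cfg in node_type_map.items()
}
without_accel = matching_node_types(without_key, *request)
```

With the `accel` key present the request matches one node type. Without it the list is empty, which under this assumption would correspond to "no jobs were submitted".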

@eessi-bot-jsc

eessi-bot-jsc bot commented Aug 22, 2025

New job on instance eessi-bot-jsc for repository eessi.io-2023.06-software
Building on: nvidia-grace and accelerator nvidia/cc90
Building for: aarch64/nvidia/grace and accelerator nvidia/cc90
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2025.08/pr_35/14013325

date job status comment
Aug 22 15:40:19 UTC 2025 submitted job id 14013325 awaits release by job manager
Aug 22 15:41:01 UTC 2025 released job awaits launch by Slurm scheduler
Aug 22 15:42:06 UTC 2025 running job 14013325 is running
Aug 22 15:45:15 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14013325.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc90-17558774160.tar.gz
size: 0 MiB (23902 bytes)
entries: 2
modules under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/software/linux/aarch64/nvidia/grace/.lmod/SitePackage.lua
Aug 22 15:45:15 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-14013325.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
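The artefact summary above groups tarball entries into modules, software, reprod directories and "other". A minimal sketch of such a grouping with Python's standard `tarfile` module follows. The function `summarize_tarball` and the in-memory demo tarball are illustrative assumptions, not the bot's actual code.

```python
import io
import tarfile

PREFIX = "2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90"


def summarize_tarball(fileobj, prefix: str) -> dict[str, list[str]]:
    """Group regular files in a gzipped tarball by subdirectory under prefix."""
    groups = {"modules": [], "software": [], "reprod": [], "other": []}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            for kind in ("modules", "software", "reprod"):
                if member.name.startswith(f"{prefix}/{kind}/"):
                    groups[kind].append(member.name)
                    break
            else:
                # Anything outside the three known subdirectories.
                groups["other"].append(member.name)
    return groups


# Build a small in-memory tarball with the two entries reported above.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name in (
        "2023.06/init/easybuild/eb_hooks.py",
        "2023.06/software/linux/aarch64/nvidia/grace/.lmod/SitePackage.lua",
    ):
        info = tarfile.TarInfo(name)
        info.size = 0  # empty placeholder files
        tar.addfile(info)
buf.seek(0)
groups = summarize_tarball(buf, PREFIX)
```

For this demo tarball both files land in the "other" group and the modules, software and reprod groups stay empty, matching the shape of the report.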

@trz42
Contributor

trz42 commented Aug 22, 2025

Try cross-compiling for cc80...
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc80

@trz42
Contributor

trz42 commented Aug 22, 2025

Supplying several values for the accel key in the app.cfg didn't work as intended. Trying a different cross-compiling approach...
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-jsc on:arch=aarch64/nvidia/grace for:arch=aarch64/nvidia/grace,accel=nvidia/cc80

@eessi-bot-jsc

eessi-bot-jsc bot commented Aug 22, 2025

New job on instance eessi-bot-jsc for repository eessi.io-2023.06-software
Building on: nvidia-grace
Building for: aarch64/nvidia/grace and accelerator nvidia/cc80
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2025.08/pr_35/14013362

date job status comment
Aug 22 16:14:03 UTC 2025 submitted job id 14013362 awaits release by job manager
Aug 22 16:14:28 UTC 2025 released job awaits launch by Slurm scheduler
Aug 22 16:15:33 UTC 2025 running job 14013362 is running
Aug 22 16:17:39 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14013362.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc80-17558793740.tar.gz
size: 0 MiB (23902 bytes)
entries: 2
modules under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc80/software
no software packages in tarball
reprod directories under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/software/linux/aarch64/nvidia/grace/.lmod/SitePackage.lua
Aug 22 16:17:39 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-14013362.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Contributor

laraPPr commented Aug 22, 2025

bot: show_config

@eessi-bot-aws

eessi-bot-aws bot commented Aug 22, 2025

Instance eessi-bot-mc-aws is configured to build on:

  • Node type x86-64-generic:

    • OS: linux
    • CPU architecture: x86_64/generic
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type x86-64-haswell:

    • OS: linux
    • CPU architecture: x86_64/intel/haswell
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type x86-64-sapphirerapids:

    • OS: linux
    • CPU architecture: x86_64/intel/sapphirerapids
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type x86-64-skylake:

    • OS: linux
    • CPU architecture: x86_64/intel/skylake_avx512
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type x86-64-cascadelake:

    • OS: linux
    • CPU architecture: x86_64/intel/cascadelake
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type x86-64-icelake:

    • OS: linux
    • CPU architecture: x86_64/intel/icelake
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type x86-64-zen2:

    • OS: linux
    • CPU architecture: x86_64/amd/zen2
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type x86-64-zen3:

    • OS: linux
    • CPU architecture: x86_64/amd/zen3
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type x86-64-zen4:

    • OS: linux
    • CPU architecture: x86_64/amd/zen4
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type aarch64-generic:

    • OS: linux
    • CPU architecture: aarch64/generic
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type aarch64-neoverse_n1:

    • OS: linux
    • CPU architecture: aarch64/neoverse_n1
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type aarch64-neoverse_v1:

    • OS: linux
    • CPU architecture: aarch64/neoverse_v1
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type aarch64-graviton4:

    • OS: linux
    • CPU architecture: aarch64/aws/graviton4
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']

@eessi-bot-deucalion

Instance eessi-bot-deucalion is configured to build on:

  • Node type a64fx:
    • OS: linux
    • CPU architecture: aarch64/a64fx
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-compat

@gpu-bot-ugent

gpu-bot-ugent bot commented Aug 22, 2025

Instance eessi-bot-vsc-ugent is configured to build on:

  • Node type gpu_a100:

    • OS: linux
    • CPU architecture: x86_64/amd/zen3
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software']
    • Accelerators: nvidia/cc80
  • Node type gpu_v100:

    • OS: linux
    • CPU architecture: x86_64/intel/cascaselake
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software']
    • Accelerators: nvidia/cc70

@eessi-bot-jsc

eessi-bot-jsc bot commented Aug 22, 2025

Instance eessi-bot-jsc is configured to build on:

  • Node type aarch64-nvidia-grace:

    • OS: linux
    • CPU architecture: aarch64/nvidia/grace
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
  • Node type aarch64-nvidia-gh200:

    • OS: linux
    • CPU architecture: aarch64/nvidia/grace
    • Repositories: ['eessi.io-2023.06-compat', 'eessi.io-2023.06-software', 'eessi.io-2025.06-compat', 'eessi.io-2025.06-software']
    • Accelerators: nvidia/cc90

@laraPPr
Contributor

laraPPr commented Aug 22, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70

@gpu-bot-ugent

gpu-bot-ugent bot commented Aug 22, 2025

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2023.06-software
Building on: intel-cascadelake and accelerator nvidia/cc70
Building for: x86_64/intel/cascadelake and accelerator nvidia/cc70
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2025.08/pr_35/40715916

date job status comment
Aug 22 18:39:37 UTC 2025 submitted job id 40715916 awaits release by job manager
Aug 22 18:41:03 UTC 2025 released job awaits launch by Slurm scheduler
Aug 22 21:29:15 UTC 2025 running job 40715916 is running
Aug 22 21:31:17 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-40715916.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-cascadelake-accel-nvidia-cc70-17558982450.tar.gz
size: 0 MiB (23911 bytes)
entries: 2
modules under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/software/linux/x86_64/intel/cascadelake/.lmod/SitePackage.lua
Aug 22 21:31:17 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-40715916.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Contributor

laraPPr commented Aug 25, 2025

@TopRichard can you sync this PR with the main branch? I think it is not picking up the changes from #59.

@laraPPr
Contributor

laraPPr commented Aug 25, 2025

@TopRichard apparently it's a bigger issue, so I'm moving my experimenting to EESSI/software-layer#1147
