Contributor

@casparvl casparvl commented Aug 19, 2025

When I introduced the new tarball naming scheme to make tarball names unique per GPU architecture, I did not realize that ${EESSI_ACCELERATOR_TARGET_OVERRIDE//\//-} expands to an empty string for a CPU build, which leaves the string formatting one value short. As a result, tarball names were incorrect: they ended in -0.tar.gz, see e.g. https://github.com/EESSI/staging_bundles/pull/9 . That's problematic, since the bot assumes that the item after the last - is the timestamp. So, with the policy of only uploading the latest tarball, every tarball would be uploaded, since according to the bot they all had an identical timestamp (0).
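The timestamp problem described above can be sketched in a few lines of Python. This is only an illustration, not the bot's actual code: the function names, the example tarball names, and the "keep everything that shares the newest timestamp" selection are assumptions made for the sketch.

```python
SUFFIX = ".tar.gz"


def timestamp_from_name(name):
    """Return the integer found after the last '-' of a '<...>-<timestamp>.tar.gz' name."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a tarball name: {name}")
    stem = name[: -len(SUFFIX)]
    return int(stem.rsplit("-", 1)[-1])


def latest_tarballs(names):
    """Return every name that carries the newest timestamp (hypothetical upload policy)."""
    stamps = {name: timestamp_from_name(name) for name in names}
    newest = max(stamps.values())
    return [name for name in names if stamps[name] == newest]


# Hypothetical names: well-formed ones end in a real timestamp ...
good = ["example-zen2-1755609000.tar.gz", "example-zen2-1755609445.tar.gz"]
# ... while the malformed ones all end in "-0.tar.gz".
broken = ["example-zen2-1755609000-0.tar.gz", "example-zen2-1755609445-0.tar.gz"]

print(latest_tarballs(good))    # only the newest tarball is selected
print(latest_tarballs(broken))  # both are selected: each parses to timestamp 0
```

With the malformed names, every tarball ties for "newest", so a latest-only policy no longer filters anything.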

Fixed in this PR by making the number of formatting specs conditional on the accelerator override being non-empty.
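The idea behind the fix can be sketched as follows. The real change is to a shell script, which this Python sketch does not reproduce; the function name, the parameter names, and the order of the name components are assumptions chosen for illustration.

```python
def tarball_name(prefix, cpu_target, timestamp, accel_override=""):
    """Build a tarball name, adding the accelerator part only when it is non-empty."""
    # Mirror the shell expansion ${VAR//\//-}: replace every '/' with '-'.
    parts = [prefix, cpu_target.replace("/", "-")]
    if accel_override:
        # GPU build: one extra component, so one extra value is supplied.
        parts.append(accel_override.replace("/", "-"))
    # The timestamp always comes last, after the final '-'.
    parts.append(str(timestamp))
    return "-".join(parts) + ".tar.gz"


# CPU build: no accelerator component, timestamp still ends the name.
print(tarball_name("example", "x86_64/amd/zen2", 1755609445))
# GPU build: accelerator component is inserted before the timestamp.
print(tarball_name("example", "x86_64/amd/zen4", 1755609445, "nvidia/cc90"))
```

Because the number of components follows the number of values, the timestamp remains the last item in both cases, which is what the bot relies on.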

@casparvl
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:zen2
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-aws

eessi-bot-aws bot commented Aug 19, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_65/83972

date | job status | comment
Aug 19 13:17:15 UTC 2025 | submitted | job id 83972 awaits release by job manager
Aug 19 13:17:23 UTC 2025 | released | job awaits launch by Slurm scheduler
Aug 19 13:22:25 UTC 2025 | running | job 83972 is running
Aug 19 13:25:28 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-83972.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 19 13:25:28 UTC 2025 | test result | 😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86_64_amd_zen2+default
P: perf: 441.45 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86_64_amd_zen2+default
P: perf: 445.412 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86_64_amd_zen2+default
P: latency: 1.79 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 1.77 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 3.82 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 4.16 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 2.02 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86_64_amd_zen2+default
P: latency: 0.54 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86_64_amd_zen2+default
P: bandwidth: 7427.29 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86_64_amd_zen2+default
P: bandwidth: 7443.85 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-83972.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
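The build checks listed in the job status above (some messages must be absent from the job output, others must be present) can be sketched like this. It is a simplified illustration rather than the bot's implementation: plain substring matching, the function names, and the sample output text are assumptions.

```python
# Patterns taken from the status report above.
MUST_BE_ABSENT = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_build_output(text):
    """Return a list of (ok, message) pairs, one per checked pattern."""
    results = []
    for pattern in MUST_BE_ABSENT:
        found = pattern in text
        state = "found" if found else "no"
        results.append((not found, f"{state} message matching {pattern}"))
    for pattern in MUST_BE_PRESENT:
        found = pattern in text
        state = "found" if found else "no"
        results.append((found, f"{state} message matching {pattern}"))
    return results


def build_succeeded(text):
    """The build counts as a success only if every single check passes."""
    return all(ok for ok, _ in check_build_output(text))


# Hypothetical job output: the tarball was created, but an ERROR: line is present.
sample = "No missing installations\nERROR: example problem\nexample.tar.gz created!\n"
for ok, message in check_build_output(sample):
    print("PASS" if ok else "FAIL", message)
print(build_succeeded(sample))
```

This matches what the report shows: a single matching ERROR: message is enough to mark the job as FAILURE, even though the tarball message was found.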

@eessi-bot-surf

eessi-bot-surf bot commented Aug 19, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.08/pr_65/14161256

date | job status | comment
Aug 19 13:17:15 UTC 2025 | submitted | job id 14161256 will be eligible to start in about 20 seconds
Aug 19 13:17:25 UTC 2025 | received | job awaits launch by Slurm scheduler
Aug 19 13:23:29 UTC 2025 | running | job 14161256 is running
Aug 19 13:24:51 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-14161256.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 19 13:24:51 UTC 2025 | test result | 😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-14161256.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

raise ValueError(f"Invalid environment variable name: {var_name}")
list_string = os.getenv(var_name, '[]')

Contributor


I guess this is just so there's something to include in the generated tarball?

Contributor Author


Yes, I'll remove any changes to eb_hooks.py after testing...

Also, this removal of whitespace is some vim feature that @smoors recommended to me, I think XD

Contributor Author


(there's a reason I opened the PR in draft mode ;-) I'll mark it as ready for review after removing this)

@boegel
Contributor

boegel commented Aug 19, 2025

Note: tackling this via #67, in combination with #66...

@casparvl
Contributor Author

Closing in favor of #67

@casparvl casparvl closed this Aug 20, 2025
@boegel boegel added the labels bug (Something isn't working), 2025.06-software.eessi.io (2025.06 version of software.eessi.io), and 2023.06-software.eessi.io (2023.06 version of software.eessi.io) on Aug 20, 2025