Add handling of the same-name libraries on different locations for link_nvidia_host_libraries.sh #972

Merged
bedroge merged 3 commits into EESSI:2023.06-software.eessi.io from Darkless012:2023.06-software.eessi.io
Mar 27, 2025

@Darkless012
Contributor

Check for duplicates when reading ldconfig directories.
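
To illustrate what a duplicate looks like in this context, here is a minimal sketch that scans `ldconfig -p`-style output for library names listed under more than one path. The helper names `find_duplicate_libs` and `sample_cache`, the sample paths, and the message format are assumptions made for illustration; they are not taken from link_nvidia_host_libraries.sh. The sketch also ignores the ABI flags in parentheses, so a 32-bit and a 64-bit entry with the same name would be reported as duplicates too.

```shell
# Hypothetical helper, not taken from the PR: report library names that appear
# more than once in `ldconfig -p`-style input on stdin, keeping the first path.
find_duplicate_libs() {
    awk -F' => ' '
        NF == 2 {
            split($1, parts, " ")   # parts[1] is the library name, e.g. libcuda.so.1
            name = parts[1]
            if (name in first_path)
                printf "duplicate: %s (kept: %s, skipped: %s)\n", name, first_path[name], $2
            else
                first_path[name] = $2
        }'
}

# Sample input imitating cache entries for the same library (assumed paths).
sample_cache() {
    printf '\tlibcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1\n'
    printf '\tlibcuda.so.1 (libc6,x86-64) => /lib64/libcuda.so.1\n'
    printf '\tlibnvidia-ml.so.1 (libc6,x86-64) => /lib64/libnvidia-ml.so.1\n'
}

sample_cache | find_duplicate_libs
```

On a real host the input would come from `ldconfig -p` instead of `sample_cache`.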

@eessi-bot

eessi-bot Bot commented Mar 20, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot

eessi-bot Bot commented Mar 20, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@eessi-bot-trz42

Instance trz42-GH200-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

ocaisa previously requested changes Mar 20, 2025
if [[ "$target" != "/tmp/nvidia_libs/$lib"* && "$target" != *"/tmp/nvidia_libs/"* ]]; then
echo "Error: Symlink $lib_path points to $target, which is not in our mock directory"
# Verify it points to our mock library in /tmp/nvidia_libs or /tmp/nvidia_libs_duplicate
if [[ "$target" != "/tmp/nvidia_libs/$lib"* && "$target" != "/tmp/nvidia_libs_duplicate/$lib"* ]]; then
Member

Shouldn't we know exactly which one it should point to? There should also be something in the output that says the other one was filtered.

Contributor Author

No, since it is a duplicate.
The real-world example was Azure, where /lib was the same as /lib64.
In that case you don't care which of the two locations the library is linked from.
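
A minimal sketch of how such a case could be detected, assuming the two directories are the same because one is a symlink to the other (the thread does not say how /lib and /lib64 were related on Azure). The helper name `same_library_file` and the throwaway directory layout are hypothetical; `readlink -f` is the GNU coreutils option that resolves a path to its canonical form.

```shell
# Hypothetical helper: succeed when two paths resolve to the same file.
same_library_file() {
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# Demo in a throwaway directory: lib64 is a symlink to lib.
demo_root=$(mktemp -d)
mkdir "$demo_root/lib"
ln -s lib "$demo_root/lib64"
touch "$demo_root/lib/libcuda.so.1"

if same_library_file "$demo_root/lib/libcuda.so.1" "$demo_root/lib64/libcuda.so.1"; then
    echo "same file reached through two directories"
fi

rm -rf "$demo_root"
```

When both paths resolve to one file, either can serve as the symlink target, which matches the point made above.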

if [ "$existing_name" = "$lib_name" ]; then
log_verbose "Duplicate library found: $lib_name (existing: $existing_lib, new: $lib_path)"
# Prioritize libraries in standard locations if possible
if [[ "$lib_path" == "/usr/lib"* || "$lib_path" == "/lib"* ]]; then
Member

I'm not sure if this will catch all cases. Why not just take the first option? That is what the linker itself would do.

Contributor Author

Point taken, I've removed this part.
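
The "take the first option" rule suggested in this thread can be sketched as a filter that keeps only the first path seen for each library file name and preserves input order. This is an illustration under assumptions, not the code that was merged: the helper name `first_match_per_name` and the sample paths are hypothetical.

```shell
# Hypothetical helper: read library paths from stdin, one per line, and print
# only the first path seen for each file name (the last "/"-separated field).
first_match_per_name() {
    awk -F/ '!seen[$NF]++'
}

printf '%s\n' \
    /lib/x86_64-linux-gnu/libcuda.so.1 \
    /usr/local/nvidia/lib64/libcuda.so.1 \
    /lib/x86_64-linux-gnu/libnvidia-ml.so.1 |
    first_match_per_name
```

With this input the second libcuda.so.1 path is dropped. No path prefix is treated as preferred, which is the simplification the reviewer asked for.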

Comment on lines +542 to +543
# Continue instead of fatal_error to make the script more robust
continue
Member

And what happens if the creation of all symlinks fails? I don't think this helps with robustness; to me this is an error.

Contributor Author

Reverted it back to fatal_error.
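
The difference between `continue` and `fatal_error` discussed here can be sketched as a linking loop that stops at the first failed symlink. The `fatal_error` below is a stand-in, since the script's real definition is not shown in this thread, and `link_libraries` with its arguments is a hypothetical helper.

```shell
# Stand-in for the script's fatal_error helper (assumed behavior: print and exit).
fatal_error() {
    echo "ERROR: $1" >&2
    exit 1
}

# Hypothetical helper: link every given library into target_dir and abort on
# the first failure instead of silently moving on to the next library.
link_libraries() {
    target_dir=$1
    shift
    for lib_path in "$@"; do
        ln -s "$lib_path" "$target_dir/$(basename "$lib_path")" ||
            fatal_error "Failed to create symlink for $lib_path in $target_dir"
    done
}

# Demo: the target directory does not exist, so ln fails and the subshell
# exits with a non-zero status.
if ( link_libraries /nonexistent/target /lib/libcuda.so.1 ) 2>/dev/null; then
    echo "linked"
else
    echo "aborted on first failure"
fi
```

With `continue` in place of `fatal_error`, the same loop would finish with exit status 0 even if no symlink had been created, which is the situation the review comment warns about.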

@boegel
Contributor

boegel commented Mar 26, 2025

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2

@eessi-bot

eessi-bot Bot commented Mar 26, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)

@eessi-bot

eessi-bot Bot commented Mar 26, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 resulted in:

    • no jobs were submitted

@eessi-bot-trz42

Updates by the bot instance trz42-GH200-jr (click for details)
  • account boegel has NO permission to send commands to the bot

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account boegel has NO permission to send commands to the bot

@eessi-bot

eessi-bot Bot commented Mar 26, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.03/pr_972/52689

date | job status | comment
Mar 26 22:10:16 UTC 2025 | submitted | job id 52689 awaits release by job manager
Mar 26 22:10:55 UTC 2025 | released | job awaits launch by Slurm scheduler
Mar 26 22:17:58 UTC 2025 | running | job 52689 is running
Mar 26 22:25:06 UTC 2025 | finished |
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-52689.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1743027510.tar.gz
size: 0 MiB (8006 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen2/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Mar 26 22:25:06 UTC 2025 | test result |
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86_64_amd_zen2+default
P: perf: 441.827 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86_64_amd_zen2+default
P: perf: 445.013 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86_64_amd_zen2+default
P: latency: 1.81 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 1.84 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 4.02 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 4.17 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 0.6 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86_64_amd_zen2+default
P: latency: 0.58 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86_64_amd_zen2+default
P: bandwidth: 7367.41 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86_64_amd_zen2+default
P: bandwidth: 7287.66 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-52689.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Mar 26 22:33:11 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1743027510.tar.gz to S3 bucket succeeded

@boegel added the bot:deploy label (Ask bot to deploy missing software installations to EESSI) Mar 26, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user boegel, but this person does not have permission to trigger deployments

@eessi-bot-trz42

Label bot:deploy has been set by user boegel, but this person does not have permission to trigger deployments

@bedroge
Collaborator

bedroge commented Mar 27, 2025

I'm going to merge this. The CI is still failing due to issues with lit, which is not related to this PR. Not merging would make all current builds pick up the old version of this file.

@bedroge dismissed ocaisa’s stale review March 27, 2025 09:14

This is already ingested, we need to merge it.

@bedroge merged commit 14a9218 into EESSI:2023.06-software.eessi.io Mar 27, 2025
50 of 70 checks passed
@eessi-bot

eessi-bot Bot commented Mar 27, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.03/pr_972/52689'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.03.27

@eessi-bot

eessi-bot Bot commented Mar 27, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.03.27

4 participants