netCDF 4.9.0 tests fail when RPATH support is enabled #17983

Closed
bedroge opened this issue May 27, 2023 · 14 comments
Labels
bug report EESSI Related to EESSI project

@bedroge
Contributor

bedroge commented May 27, 2023

When using --rpath, all tests for netCDF-4.9.0-gompi-2022a.eb fail due to:

error while loading shared libraries: libnetcdf.so.19: cannot open shared object file: No such file or directory

The binaries do actually have an RPATH section, but it points to the installation directory, while the tests run before the installation step. It also includes $ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64, but in the build directory the library is located in ../liblib/.
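
To make the failure concrete, here is a minimal Python sketch of how the RPATH entries above are searched. It assumes a hypothetical build tree with a test binary in <builddir>/nc_test (binary name tst_example is made up) and the library in <builddir>/liblib; the install path /install/netCDF/lib is also made up. It only models RPATH lookup with $ORIGIN expanded to the binary's directory, not the dynamic loader's full search order (LD_LIBRARY_PATH, RUNPATH, ld.so.cache, system directories).

```python
import os
import tempfile


def expand_rpath(rpath, binary_path):
    """Split a colon-separated RPATH and expand $ORIGIN to the binary's directory."""
    origin = os.path.dirname(os.path.abspath(binary_path))
    entries = []
    for entry in rpath.split(":"):
        if not entry:
            continue  # skip empty entries such as in "a::b"
        entry = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
        entries.append(os.path.normpath(entry))
    return entries


def find_in_rpath(libname, rpath, binary_path):
    """Return the first expanded RPATH directory that contains libname, or None."""
    for directory in expand_rpath(rpath, binary_path):
        if os.path.isfile(os.path.join(directory, libname)):
            return directory
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as builddir:
        # Hypothetical layout: test binary in nc_test/, library in liblib/.
        os.makedirs(os.path.join(builddir, "nc_test"))
        os.makedirs(os.path.join(builddir, "liblib"))
        open(os.path.join(builddir, "liblib", "libnetcdf.so.19"), "w").close()
        binary = os.path.join(builddir, "nc_test", "tst_example")

        rpath = "/install/netCDF/lib:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64"
        # None: no entry points at liblib/, matching the reported error.
        print(find_in_rpath("libnetcdf.so.19", rpath, binary))
        # Found once a build-tree entry such as $ORIGIN/../liblib is added.
        print(find_in_rpath("libnetcdf.so.19", rpath + ":$ORIGIN/../liblib", binary))
```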

@boegel boegel added this to the next release (4.7.3?) milestone Jun 7, 2023
@casparvl
Contributor

casparvl commented Jun 7, 2023

I think this is not so much because you enabled RPATH support, but because you filter LD_LIBRARY_PATH. Could you check that, @bedroge? If so, maybe adapt the title of the issue to make clear it's related to filtering LD_LIBRARY_PATH, not to setting RPATH.

@boegel
Member

boegel commented Jun 7, 2023

This rings a bell: I could've sworn I ran into this before (and maybe even fixed it somehow), but I'm not sure it was with netCDF...
I'll take another look and come back with more info (or ping me if I don't).

@bedroge
Contributor Author

bedroge commented Jun 8, 2023

Unless it somehow does this automatically (but I don't think that's the case?), I'm not filtering LD_LIBRARY_PATH. I also reproduced this on another machine by just passing --rpath, with more or less default settings for everything else.

@boegel boegel modified the milestones: 4.8.0, release after 4.8.0 Jul 6, 2023
@bedroge
Contributor Author

bedroge commented Jul 24, 2023

I'm now seeing similar issues for other packages that use the CMakeMake easyblock and have runtest = 'test', e.g. Xerces and json-c. I think it's due to:

CMake by default sets the RPATH property on executables that link to shared libraries in the same project. This makes it possible to run executables (like tests) from the build directory without having to set LD_LIBRARY_PATH.
https://gitlab.kitware.com/cmake/cmake/-/issues/18413

I checked the build dir of a Xerces build (with EB RPATH support disabled), and can indeed confirm that the executables that were called for the tests had an RPATH pointing to, in this case, $builddir/easybuild_obj/src. This gets removed during the install step.

I guess that, with EB RPATH support enabled, this RPATH entry gets overwritten or ignored by EB's RPATH wrappers, resulting in executables that can no longer find the shared libraries in the build directory. I'm not sure if it's easy to detect this, but it would be nice if the directories that CMake wants to add to the RPATH could somehow be added to (and, later on, removed again from) the ones from EB. Alternatively, we would have to set $LD_LIBRARY_PATH for the tests, but I'm not sure if there's an easy way to figure out which directories would have to be added in that case: sometimes these shared libraries are in the root of the build directory, sometimes they're in a subdirectory (for Xerces they're in src, for netCDF in liblib).
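
For the LD_LIBRARY_PATH alternative, one way to find those directories is to walk the build tree and collect every directory that contains a shared library. A minimal sketch, assuming the libraries follow the usual lib*.so[.N...] naming; the helper names and the library name libexample.so are hypothetical, and walking the whole tree may also pick up unrelated libraries (for example from bundled dependencies).

```python
import os
import re
import tempfile

# Matches libfoo.so, libfoo.so.19, libfoo.so.19.1.0 (but not libfoo.a).
SHARED_LIB_RE = re.compile(r"^lib.+\.so(\.\d+)*$")


def shared_lib_dirs(build_dir):
    """Return the sorted directories under build_dir that contain shared libraries."""
    found = set()
    for root, _dirs, files in os.walk(build_dir):
        if any(SHARED_LIB_RE.match(name) for name in files):
            found.add(root)
    return sorted(found)


def build_tree_ld_library_path(build_dir, existing=""):
    """Prepend the build-tree library directories to an existing LD_LIBRARY_PATH value."""
    parts = shared_lib_dirs(build_dir)
    if existing:
        parts.append(existing)
    return ":".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as builddir:
        # Hypothetical tree with libraries in easybuild_obj/src (as for Xerces)
        # and in liblib (as for netCDF).
        for subdir, lib in [("easybuild_obj/src", "libexample.so"), ("liblib", "libnetcdf.so.19")]:
            os.makedirs(os.path.join(builddir, subdir))
            open(os.path.join(builddir, subdir, lib), "w").close()
        print(build_tree_ld_library_path(builddir, os.environ.get("LD_LIBRARY_PATH", "")))
```

The directories found this way are only meant for the test step; they would not be part of the installed RPATH.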

@bedroge
Contributor Author

bedroge commented Sep 26, 2023

@boegel
It seems related to this: easybuilders/easybuild-easyblocks#1031.

This setting (-DCMAKE_SKIP_RPATH=ON) instructs CMake not to do anything with RPATH, so the tests (which run in the build directory) fail because the binaries cannot locate the shared libraries in the build directory.

I just did some tests with -DCMAKE_SKIP_RPATH=OFF instead of ON, and for me CMake seems to only append paths. So maybe something has changed in CMake, and it no longer strips all RPATHs?

@bedroge
Contributor Author

bedroge commented Sep 26, 2023

Some more details: I did a json-c installation with --stop=build --rpath to make EB stop after the build, right before the test step. With the original settings/easyconfig, I see the following RPATH in the test binaries in the build directory:

  RPATH                /home/bob/easybuildinstall/software/json-c/0.16-GCCcore-12.2.0/lib:/home/bob/easybuildinstall/software/json-c/0.16-GCCcore-12.2.0/lib64:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64:/data/apps/software/binutils/2.39-GCCcore-12.2.0/lib64:/data/apps/software/GCCcore/12.2.0/lib64:/data/apps/software/GCCcore/12.2.0/lib:/data/apps/software/libarchive/3.6.1-GCCcore-12.2.0/lib/../lib64:/data/apps/software/XZ/5.2.7-GCCcore-12.2.0/lib/../lib64:/data/apps/software/cURL/7.86.0-GCCcore-12.2.0/lib/../lib64:/home/bob/.local/easybuild/software/OpenSSL/1.1/lib/../lib64:/data/apps/software/bzip2/1.0.8-GCCcore-12.2.0/lib/../lib64:/data/apps/software/ncurses/6.3-GCCcore-12.2.0/lib/../lib64:/data/apps/software/zlib/1.2.12-GCCcore-12.2.0/lib/../lib64:/data/apps/software/GCCcore/12.2.0/lib/gcc/x86_64-pc-linux-gnu/12.2.0:/data/apps/software/libarchive/3.6.1-GCCcore-12.2.0/lib:/data/apps/software/XZ/5.2.7-GCCcore-12.2.0/lib:/data/apps/software/cURL/7.86.0-GCCcore-12.2.0/lib:/home/bob/.local/easybuild/software/OpenSSL/1.1/lib:/data/apps/software/bzip2/1.0.8-GCCcore-12.2.0/lib:/data/apps/software/ncurses/6.3-GCCcore-12.2.0/lib:/data/apps/software/zlib/1.2.12-GCCcore-12.2.0/lib

Then I added configopts = '-DCMAKE_SKIP_RPATH=OFF ' to the easyconfig, and got the following RPATH:

  RPATH                /home/bob/easybuildinstall/software/json-c/0.16-GCCcore-12.2.0/lib:/home/bob/easybuildinstall/software/json-c/0.16-GCCcore-12.2.0/lib64:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64:/data/apps/software/binutils/2.39-GCCcore-12.2.0/lib64:/data/apps/software/GCCcore/12.2.0/lib64:/data/apps/software/GCCcore/12.2.0/lib:/data/apps/software/libarchive/3.6.1-GCCcore-12.2.0/lib/../lib64:/data/apps/software/XZ/5.2.7-GCCcore-12.2.0/lib/../lib64:/data/apps/software/cURL/7.86.0-GCCcore-12.2.0/lib/../lib64:/home/bob/.local/easybuild/software/OpenSSL/1.1/lib/../lib64:/data/apps/software/bzip2/1.0.8-GCCcore-12.2.0/lib/../lib64:/data/apps/software/ncurses/6.3-GCCcore-12.2.0/lib/../lib64:/data/apps/software/zlib/1.2.12-GCCcore-12.2.0/lib/../lib64:/data/apps/software/GCCcore/12.2.0/lib/gcc/x86_64-pc-linux-gnu/12.2.0:/data/apps/software/libarchive/3.6.1-GCCcore-12.2.0/lib:/data/apps/software/XZ/5.2.7-GCCcore-12.2.0/lib:/data/apps/software/cURL/7.86.0-GCCcore-12.2.0/lib:/home/bob/.local/easybuild/software/OpenSSL/1.1/lib:/data/apps/software/bzip2/1.0.8-GCCcore-12.2.0/lib:/data/apps/software/ncurses/6.3-GCCcore-12.2.0/lib:/data/apps/software/zlib/1.2.12-GCCcore-12.2.0/lib:/data/eb/build/jsonc/0.16/GCCcore-12.2.0/easybuild_obj

So, that seems to contain both the paths added by EB (just like with the original easyconfig) and the one (the last one) added by CMake.

Finally, when doing a full installation and checking a shared library in the installation directory:

  RPATH                /home/bob/easybuildinstall/software/json-c/0.16-GCCcore-12.2.0/lib:/home/bob/easybuildinstall/software/json-c/0.16-GCCcore-12.2.0/lib64:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64:/data/apps/software/binutils/2.39-GCCcore-12.2.0/lib64:/data/apps/software/GCCcore/12.2.0/lib64:/data/apps/software/GCCcore/12.2.0/lib:/data/apps/software/libarchive/3.6.1-GCCcore-12.2.0/lib/../lib64:/data/apps/software/XZ/5.2.7-GCCcore-12.2.0/lib/../lib64:/data/apps/software/cURL/7.86.0-GCCcore-12.2.0/lib/../lib64:/home/bob/.local/easybuild/software/OpenSSL/1.1/lib/../lib64:/data/apps/software/bzip2/1.0.8-GCCcore-12.2.0/lib/../lib64:/data/apps/software/ncurses/6.3-GCCcore-12.2.0/lib/../lib64:/data/apps/software/zlib/1.2.12-GCCcore-12.2.0/lib/../lib64:/data/apps/software/GCCcore/12.2.0/lib/gcc/x86_64-pc-linux-gnu/12.2.0:/data/apps/software/libarchive/3.6.1-GCCcore-12.2.0/lib:/data/apps/software/XZ/5.2.7-GCCcore-12.2.0/lib:/data/apps/software/cURL/7.86.0-GCCcore-12.2.0/lib:/home/bob/.local/easybuild/software/OpenSSL/1.1/lib:/data/apps/software/bzip2/1.0.8-GCCcore-12.2.0/lib:/data/apps/software/ncurses/6.3-GCCcore-12.2.0/lib:/data/apps/software/zlib/1.2.12-GCCcore-12.2.0/lib

So I don't see any signs of CMake stripping the EB-added paths in RPATH here.
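
Comparing these RPATH values by eye is tedious. A small sketch like the following (hypothetical helper names, with the values shortened from the ones above) splits them on ':' and reports which entries were added or removed, and whether the new value only appends to the old one.

```python
def split_rpath(rpath):
    """Split a colon-separated RPATH value into its non-empty entries."""
    return [entry for entry in rpath.split(":") if entry]


def rpath_diff(before, after):
    """Return (added, removed) entries between two RPATH values, keeping their order."""
    old, new = split_rpath(before), split_rpath(after)
    added = [entry for entry in new if entry not in old]
    removed = [entry for entry in old if entry not in new]
    return added, removed


def only_appends(before, after):
    """True if `after` keeps every entry of `before`, in order, as its prefix."""
    old, new = split_rpath(before), split_rpath(after)
    return new[: len(old)] == old


if __name__ == "__main__":
    # Shortened versions of the json-c RPATH values shown above.
    eb_only = (
        "/home/bob/easybuildinstall/software/json-c/0.16-GCCcore-12.2.0/lib:$ORIGIN"
        ":/data/apps/software/zlib/1.2.12-GCCcore-12.2.0/lib"
    )
    with_cmake = eb_only + ":/data/eb/build/jsonc/0.16/GCCcore-12.2.0/easybuild_obj"
    print(rpath_diff(eb_only, with_cmake))    # (['/data/eb/build/jsonc/0.16/GCCcore-12.2.0/easybuild_obj'], [])
    print(only_appends(eb_only, with_cmake))  # True
```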

@boegel
Member

boegel commented Sep 26, 2023

We saw a similar problem with GMP, see #11188, but there CMake is not involved.
The workaround used there may still be useful here, but it does seem like there's a broader issue we're not dealing with properly...

@boegel boegel added the EESSI Related to EESSI project label Sep 26, 2023
@boegel
Member

boegel commented Oct 3, 2023

@pescobar Does this ring any bells for you?

Have you seen the test step fail for software like netCDF or json-c when installing with RPATH linking?

I'm trying to figure out why not everyone is seeing problems due to the -DCMAKE_SKIP_RPATH=ON that was added in easybuilders/easybuild-easyblocks#1031 (Nov'16, EasyBuild v3.0)...

@pescobar
Member

pescobar commented Oct 3, 2023

@boegel our installation with RPATH enabled passes the check_step fine

== 2022-01-06 14:46:18,066 build_log.py:265 INFO ... (took 17 secs)
== 2022-01-06 14:46:18,067 build_log.py:265 INFO testing...
== 2022-01-06 14:46:18,067 easyblock.py:3701 INFO Starting test step
== 2022-01-06 14:46:18,067 easyconfig.py:1686 INFO Generating template values...
== 2022-01-06 14:46:18,067 mpi.py:119 INFO Using template MPI command 'mpirun -n %(nr_ranks)s %(cmd)s' for MPI family 'OpenMPI'
== 2022-01-06 14:46:18,068 mpi.py:304 INFO Using MPI command template 'mpirun -n %(nr_ranks)s %(cmd)s' (params: {'cmd': 'xxx_command_xxx', 'nr_ranks': 1})
== 2022-01-06 14:46:18,068 easyconfig.py:1705 INFO Template values: arch='x86_64', bitbucket_account='netcdf', builddir='/scratch/soft/netCDF/4.8.0/gompi-2021a', github_account='netcdf', installdir='/scicore/soft/apps/netCDF/4.8.0-gompi-2021a', module_name='netCDF/4.8.0-gompi-2021a', mpi_cmd_prefix='mpirun -n 1', name='netCDF', nameletter='n', nameletterlower='n', namelower='netcdf', parallel='20', toolchain_name='gompi', toolchain_version='2021a', version='4.8.0', version_major='4', version_major_minor='4.8', version_minor='8', versionprefix='', versionsuffix=''

My colleagues built this module long ago using EasyBuild 4.5.1 (framework: 4.5.1, easyblocks: 4.5.1) on CentOS 7.

I ran objdump -x bin/* | grep RPATH in the netCDF bin directory, and RPATH seems to be enabled.

edit: update log
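
When checking many binaries at once, the objdump -x output can be turned into a per-file overview instead of a flat grep. A minimal sketch, assuming GNU objdump's usual layout (a '<file>:     file format ...' header per file and '  RPATH   <value>' or '  RUNPATH   <value>' lines in the dynamic section); the function name and script name check_rpath.py are hypothetical.

```python
import re
import subprocess
import sys

FILE_HEADER_RE = re.compile(r"^(?P<path>\S.*):\s+file format\s+\S+")
DYN_ENTRY_RE = re.compile(r"^\s+(?P<tag>RPATH|RUNPATH)\s+(?P<value>\S+)")


def rpaths_from_objdump(output):
    """Map each file in `objdump -x` output to its RPATH/RUNPATH entries ([] if none)."""
    result = {}
    current = None
    for line in output.splitlines():
        header = FILE_HEADER_RE.match(line)
        if header:
            current = header.group("path")
            result[current] = []
            continue
        entry = DYN_ENTRY_RE.match(line)
        if entry and current is not None:
            result[current].extend(p for p in entry.group("value").split(":") if p)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_rpath.py bin/*
    out = subprocess.run(
        ["objdump", "-x", *sys.argv[1:]], capture_output=True, text=True, check=True
    ).stdout
    for path, entries in rpaths_from_objdump(out).items():
        print(f"{path}: {'OK' if entries else 'no RPATH/RUNPATH'}")
```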

@bedroge
Contributor Author

bedroge commented Oct 3, 2023

I think netCDF 4.8.x also worked fine for me, though I don't know why. I started encountering this issue with netCDF 4.9.0; do you have this version as well, by any chance?

@boegel
Member

boegel commented Oct 3, 2023

We've only been running the netCDF tests since v4.9.0, so older versions would not have run into this during the test step.

@boegel
Member

boegel commented Oct 4, 2023

Based on some testing by @bedroge and some digging in the CMake commits on GitHub, it seems like -DCMAKE_SKIP_RPATH=ON is no longer needed since CMake v3.5.0 (due to Kitware/CMake@3ec9226), so guarding the use of -DCMAKE_SKIP_RPATH=ON with a CMake version check seems like a good way forward...
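
A minimal sketch of such a version guard, assuming the option being guarded is -DCMAKE_SKIP_RPATH=ON (the setting added in easybuilders/easybuild-easyblocks#1031), with 3.5.0 as the cutoff, and that the CMake version is already known as a string (for example taken from cmake --version). The helper names are hypothetical; this is not the code that was eventually merged.

```python
import re


def parse_version(version):
    """Turn '3.18.4' into (3, 18, 4), padded to three fields; a suffix like '-rc1' is dropped."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if not digits:
            break
        parts.append(int(digits.group()))
    while len(parts) < 3:
        parts.append(0)  # so "3.5" compares equal to (3, 5, 0)
    return tuple(parts)


def cmake_rpath_options(cmake_version, rpath_enabled):
    """Only skip CMake's RPATH handling for CMake < 3.5.0 when RPATH linking is enabled."""
    if rpath_enabled and parse_version(cmake_version) < (3, 5, 0):
        return ["-DCMAKE_SKIP_RPATH=ON"]
    return []


if __name__ == "__main__":
    for version in ["2.8.12", "3.4.3", "3.5.0", "3.18.4"]:
        print(version, cmake_rpath_options(version, rpath_enabled=True))
```

Comparing integer tuples avoids string comparison pitfalls such as "3.18.4" sorting before "3.5.0".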

@bedroge
Contributor Author

bedroge commented Oct 4, 2023

I've run several tests, also for METIS-5.0.2: by removing the -DCMAKE_SKIP_RPATH=ON part from the metis easyblock and using older CMake versions, I was able to reproduce the issue that easybuilders/easybuild-easyblocks#1031 tried to fix.

For instance, with CMake 2.8.x I found that CMake does add RPATHs to the binaries in the build directory, but then removes them when installing the binaries to the installation directory: readelf -d <binary> shows that there is no RPATH section anymore, and the installation log also shows Removed runtime path from ..... This leads to an error in the sanity check, since with --rpath EasyBuild also checks for an RPATH, which obviously fails then.
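
The RPATH check mentioned here can be sketched in Python by parsing readelf -d output, assuming binutils' usual format for dynamic entries (for example 0x000000000000000f (RPATH)  Library rpath: [...]). The helper names are hypothetical, and this only illustrates the idea; it is not EasyBuild's actual sanity check.

```python
import re
import subprocess

# Matches e.g. " 0x000000000000000f (RPATH)   Library rpath: [/a/lib:$ORIGIN/../lib]"
READELF_RPATH_RE = re.compile(r"\((?:RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def rpath_from_readelf(output):
    """Return the RPATH/RUNPATH entries found in `readelf -d` output ([] if none)."""
    entries = []
    for line in output.splitlines():
        match = READELF_RPATH_RE.search(line)
        if match:
            entries.extend(p for p in match.group(1).split(":") if p)
    return entries


def has_rpath(binary):
    """Run `readelf -d` on a binary and report whether it still has an RPATH/RUNPATH entry."""
    out = subprocess.run(
        ["readelf", "-d", binary], capture_output=True, text=True, check=True
    ).stdout
    return bool(rpath_from_readelf(out))
```

With CMake 2.8.x and -DCMAKE_SKIP_RPATH removed, has_rpath would return False for the installed binaries, which is the failure described above.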

With more recent CMake versions (e.g. 3.18.4), this problem is gone though: CMake adds some additional paths to the RPATH during the build stage, as that's required to pass the tests that run after the build step but before the install step. These additional paths are removed again when the binaries are installed, and you end up with only the paths added by EasyBuild.
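
To confirm that the extra build-directory paths really are gone after installation, one could compare the RPATH entries of the installed binaries against the build directory. A minimal sketch with a hypothetical helper; $ORIGIN-relative entries are left alone since they are resolved at run time, and the example values are taken from the json-c RPATH shown earlier in this thread.

```python
import posixpath


def build_dir_entries(rpath_entries, build_dir):
    """Return the RPATH entries that still point into build_dir."""
    build_dir = posixpath.normpath(build_dir)
    leaked = []
    for entry in rpath_entries:
        # Only normalize absolute paths; "$ORIGIN/.." must not be collapsed.
        path = posixpath.normpath(entry) if entry.startswith("/") else entry
        if path == build_dir or path.startswith(build_dir + "/"):
            leaked.append(entry)
    return leaked


if __name__ == "__main__":
    builddir = "/data/eb/build/jsonc/0.16/GCCcore-12.2.0"
    build_rpath = [
        "/home/bob/easybuildinstall/software/json-c/0.16-GCCcore-12.2.0/lib",
        "$ORIGIN/../lib",
        builddir + "/easybuild_obj",
    ]
    installed_rpath = build_rpath[:2]
    print(build_dir_entries(build_rpath, builddir))      # ['/data/eb/build/jsonc/0.16/GCCcore-12.2.0/easybuild_obj']
    print(build_dir_entries(installed_rpath, builddir))  # []
```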

After @boegel found the CMake commit that should fix this, I tried the same thing with CMake 3.4.3 and 3.5.0. Indeed, the issue was still there with 3.4.3, but it was gone with 3.5.0, where the behaviour is similar to version 3.18.4.

Conclusion: it looks like -DCMAKE_SKIP_RPATH=ON can be safely removed for CMake >= 3.5.0, and that should solve all the failing test issues for netCDF, json-c, Xerces, and possibly more, while not reintroducing any issues for METIS. Note that there could still be issues for users building software with old CMake versions (< 3.5.0) and using --rpath, but these CMake versions depend on toolchains from 2016, so I guess they will be deprecated soon anyway.

@bedroge
Contributor Author

bedroge commented Oct 5, 2023

Solved in easybuilders/easybuild-easyblocks#3012.

@bedroge bedroge closed this as completed Oct 5, 2023

4 participants