Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capability to enable/disable compression libraries #2712

Merged
merged 2 commits into from
Jul 26, 2023

Conversation

gsjaardema
Copy link
Contributor

Even though a library may exist on the system, the user may not want to have their netCDF library depend on it. This adds the capability to disable the various compression libraries. To maintain the same behavior as before, they are all enabled by default so they will be added as a dependency if found on the system. The user can do -DENABLE_BLOSC=NO on the configure to disable the filter.

I used the uppercase version of the library/filter in the ENABLE_ line; not sure if that is the best, but it makes it easier instead of trying to guess the upper/lower variations of the libraries/filters.

@DennisHeimbigner
Copy link
Collaborator

I don't understand what you mean by this sentence:

Even though a library may exist on the system, the user may not want to have their netCDF library depend on it.

Netcdf files can have dependencies on the filters it uses, but I cannot see how the netcdf-c library has any
dependency on any given filter. It just enables what it finds in HDF5_PLUGIN_PATH.
Can you elaborate and perhaps give an example of such a dependency?

@gsjaardema
Copy link
Contributor Author

Can you elaborate and perhaps give an example of such a dependency?

All I mean is that if I know my netCDF library is not going to use the blosc filters for example, I don't want to have libblosc.so as a dependency.

If I do an install with the current library, and then do a ldd libnetcdf.so, I get a list of about 30 or more libraries in the dependency list. If I disable blosc, sz, bz2, zstd, and use tinyxml instead of libxml2, then I can drop the number of dependent libraries in the ldd list down to a much smaller number. In other words, with the "appropriate" disables, I can have a libnetcdf.so that only depends on libhdf5, libhdf5_hl, libz, libc, libm, libdl.

Another driver for this is that if I happen to have certain modules enabled at build/configure time, then CMake may find libraries in non-standard locations. Unless I then have the rpath embedded in the library, it is possible for another user of my library to not have those modules enabled and then he will have unresolved externals even though he may not even use any of the functions in that library. We had that happen with libblosc.so which was found at configure time, but was not found at runtime by another application which linked to the libnetcdf.so

@DennisHeimbigner
Copy link
Collaborator

Interesting. The problem is to detect various librarys (e.g. libblosc) without causing them
to be linked in the netcdf-c library.At the same time, we want the library to be linked by the filters
in the plugin directory. This will take some research.

@DennisHeimbigner
Copy link
Collaborator

I just checked using Ubuntu 22 and the filter libraries are not included in the ldd output.
Are you using automake? On what platform?

@edwardhartnett
Copy link
Contributor

Is the BLOSC filter supported in any way by the netCDF configure?

@gsjaardema
Copy link
Contributor Author

I am using a CMake configure. Here is the ldd output I get from the standard build:

	linux-vdso.so.1
	libdl.so.2
	libhdf5_hl.so.310
	libhdf5.so.310
	libm.so.6
	libz.so.1
	libblosc.so.1
	libzstd.so.1
	libbz2.so.1.0
	libcurl.so.4
	libxml2.so.2
	libc.so.6
	/lib64/ld-linux-x86-64.so.2 (0x00007f7e45080000)
	libpthread.so.0
	liblz4.so.1
	libssh2.so.1
	libssl.so.1.1
	libcrypto.so.1.1
	libgssapi_krb5.so.2
	libkrb5.so.3
	libk5crypto.so.3
	libcom_err.so.3
	librt.so.1
	libicuuc.so.58
	liblzma.so.5
	libkrb5support.so.0
	libresolv.so.2
	libicudata.so.58
	libstdc++.so.6
	libgcc_s.so.1

Here is the ldd output when I disable blosc, zstd, bz2, szip, curl, and libxml:

	linux-vdso.so.1
	libdl.so.2
	libhdf5_hl.so.200
	libhdf5.so.200
	libm.so.6
	libz.so.1
	libc.so.6
	/lib64/ld-linux-x86-64.so.2 (0x00007f6ff45df000)

@gsjaardema
Copy link
Contributor Author

Interesting. The problem is to detect various librarys (e.g. libblosc) without causing them to be linked in the netcdf-c library.At the same time, we want the library to be linked by the filters in the plugin directory. This will take some research.

Not sure I understand this as once the filters are detected, they are enabled in hdf5filter.c which makes them an explicit dependency of the netCDF library.

The `FIND_PACKAGE` should not be called if the filter/compression library is not enabled.  It was causing some inconsistencied in link libraries and CMake configure output...
@gsjaardema
Copy link
Contributor Author

Is the BLOSC filter supported in any way by the netCDF configure?

I'm not sure. configure would probably also need some modifications similar to these, but I haven't looked at it at all.

@DennisHeimbigner
Copy link
Collaborator

Ok, I see the problem exists in both automake and cmake.
The goal of looking for certain filter libraries was to be
able to build the corresponding plugins in the plugins directory.
There was no need to make libnetcdf include them. In fact, I am surprised
that they are there (via ldd) because I would have thought the linker
was smart enough to see that libnetcdf was not referencing any symbol
in the filter library.

I suspect that it would require a lot of automake/cmake-foo to
fix this, so some variant of Greg's solution is probably the only way out.

@dopplershift
Copy link
Member

In fact, I am surprised that they are there (via ldd) because I would have thought the linker was smart enough to see that libnetcdf was not referencing any symbol in the filter library.

IIRC, you have to opt into that behavior with the linker. Otherwise, if you ask it to link a library, it's linking that library.

@DennisHeimbigner
Copy link
Collaborator

Anyone know how to enable that linker capability in automake/cmake?

Copy link
Member

@WardF WardF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Greg, this makes sense! It looks like we'll need to do the same with autotools based builds, but we should be able to sort that out.

@WardF
Copy link
Member

WardF commented Jun 14, 2023

I'll take a look at the autotools (and it looks like Dennis is as well); if it's easy enough, we'll add that to this PR; I'd prefer to keep the build systems in sync unless it's problematic. But we'll get this in either way, in short order :)

@DennisHeimbigner
Copy link
Collaborator

For GNU, the linker flag of interest seems to be "-Wl,--as-needed"

@WardF
Copy link
Member

WardF commented Jun 14, 2023

@DennisHeimbigner that's good to know and will be good to include in general, but if we are adding user options to cmake, I'll wire in the equivalent --disable options to configure.ac.

@DennisHeimbigner
Copy link
Collaborator

I am reluctant to add a whole additional set of --disable options to our build.
I would be happier if we used the --without mechanism so we can just provide a list
of filters to disable.

@WardF
Copy link
Member

WardF commented Jun 26, 2023

After talking with Dennis about this this morning, we're going to take a look to see if there's a mechanism we can use in cmake and autoconf that doesn't require adding to the ever-increasing list of options. I'm not positive we'll be able to avoid it, but certainly we can't be the first project that will want to specify different linkages in a project with multiple libraries. We're also going to avoid an approach that is specific to a linker/compiler, since then we'd just be reduced to playing whack-a-mole for an arbitrary number of compilers.

If it comes down to it, we will add these options; we might also add the configure-time option as a comma-delimited list instead of a series of individual options.

Regardless, we'll resolve this with a mechanism for users to control which functionality is enabled/disabled, beyond 'does the dependent library exist on the system.'

@WardF
Copy link
Member

WardF commented Jul 18, 2023

Bumping for my own purposes; at this point I'm at a loss as to what we can do besides add these options. I'll ensure they are in parity between cmake and autoconf-based builds. We will have discussions around how to prevent going further down this route in the future.

@DennisHeimbigner
Copy link
Collaborator

There might be another possibility: create a completely separate build for the plugins directory so that
the current netcdf-c plugin does not even check for the compression libraries.

@gsjaardema
Copy link
Contributor Author

I don't know if this belongs here or is a separate PR, but being able to disable CURL might also be a good option. I can disable DAP, but CURL seems to just look if it can find the libraries and is then enabled.

@gsjaardema
Copy link
Contributor Author

gsjaardema commented Jul 18, 2023

Note that in general I like what CMake/netCDF do for configure. But we have a couple systems where it would be nice to eliminate as many dependencies as possible basically get down to libnetcdf.a -lz if needed... Or more likely libhdf5_hl.a libhdf5.a also.

@WardF
Copy link
Member

WardF commented Jul 18, 2023

Options should let users specify desired functionality at configure time, and desired functionality should dictate which libraries are checked for (and subsequently linked against). This is standard practice, and working towards that is perfectly reasonable, I agree, @gsjaardema. We just need to make sure we keep the build systems in sync, and to slow the rate of new options as much as is possible, although it's likely we will continue to need to add new options as we add new functionality.

@DennisHeimbigner
Copy link
Collaborator

Sorry, I do not like this. This is going to get out of hand very quickly.
We are going to end up with a major proliferation of new options.
If you do a ./configure -help, you will see that we already have an
unmanageable list of options now.

@DennisHeimbigner
Copy link
Collaborator

WRT to curl, we already have a --enable-remote-functionality that we can use to cut off looking for libcurl.

@WardF
Copy link
Member

WardF commented Jul 18, 2023

@DennisHeimbigner I agree that it's expanding and growing, but this is not just a netCDF issue. I think we can help mitigate this by putting in some work to reorganize and group the options presented in configure --help or as presented in cmake. But I think we're looking at O(n) in terms of features. And upon reflection, I'd argue there is nothing inherently bad about having lots of options; it's a reflection of a complex package. The issue is trying to parse the many options presented, and we can do some work to make it easier for developers to parse the output of 'configure --help' to see what they're able to do.

Also, there are a number of options that can be removed; it won't completely mitigate the current issue of readability, but I see an awful lot of options that I'll take responsibility for adding that are never used, or are flagged experimental. '--enable-doxygen-pdf-output', for example.

Looking at this, we can accomodate this issue on a one-in/one-out basis. When I go in to ensure parity tomorrow, I'll remove enough old experimental/useless options I've added so that we still see a net reduction in lines printed by configure -h. We can also take a look at what we can do with the overall readability.

@DennisHeimbigner
Copy link
Collaborator

"Get off my lawn!" :-)

@gsjaardema
Copy link
Contributor Author

image

@edwardhartnett
Copy link
Contributor

Wow @gsjaardema I didn't know emacs could do that! But I will certainly start using this feature!

@gsjaardema
Copy link
Contributor Author

Wow @gsjaardema I didn't know emacs could do that! But I will certainly start using this feature!

It does rely on the -enable-doxygen-pdf-output option being enabled on the buffer....

@edwardhartnett
Copy link
Contributor

OK, all fun and games aside, I do agree with @gsjaardema that it's important to be able to limit the dependencies. THis is really important on HPC systems.

Using netCDF on HPC systems is almost universal - I doubt there's a HPC system in operation which doesn't have netCDF installed. And this is where many of our really important data sets are written.

On HPC systems, they are must less casual about dependencies then we are in the linux workstation community, where we just don't even care if the library links to 10 other libraries that it never uses.

However, I agree with @WardF that having more and more configure options is not feasible. But we have some options which turn off major areas of functionality, like --disable-dap and --disable-netcdf-4, and in those cases, the dependencies associated with those features should not appear in nc-config or anywhere else.

@DennisHeimbigner
Copy link
Collaborator

Do we need this capability for cmake only? Or do we need it for automake as well?

@gsjaardema
Copy link
Contributor Author

Do we need this capability for cmake only? Or do we need it for automake as well?

I would be using it only with CMake.

@WardF
Copy link
Member

WardF commented Jul 19, 2023

Part of the work I'm doing (I lost a lot more of the day than expected to the auto mechanic) is to add parity between cmake and autoconf based builds, for this PR.

@DennisHeimbigner
Copy link
Collaborator

I did an experiment. It appears that if the code in netcdf-c/CMakeLists.txt that looks for
the various plugin libraries is moved to plugins/CMakeLists.txt, then libnetcdf.so does
not include those libraries. At least according to ldd.
I did this under Ubuntu22. Can anyone confirm?

@WardF
Copy link
Member

WardF commented Jul 20, 2023

It looks like the branch this is being pulled from, patch-57, is based on netCDF 4.8.1. Not a big deal, it just means I can't bring autoconf into parity in-place. Standing up a branch on my local dev machine that I can fold this in to, and also make a few additional changes.

@WardF WardF merged commit 4a092c7 into Unidata:main Jul 26, 2023
97 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants