Allow users to turn on szip compression for variables in netCDF-4 files #1546

Closed
edwardhartnett opened this issue Nov 21, 2019 · 49 comments · Fixed by #1589

@edwardhartnett
Contributor

HDF5 offers native access to zlib and szip compression. However, at the time I wrote netCDF-4, szip had an unclear license. So Russ and I decided to make it a read-only capability in netCDF-4. That is, you can read HDF5 files written with szip, but you cannot create them.

(Probably we were being too cautious anyway. HDF5 uses it and no one cares.)

The szip compression library is beloved by NASA Goddard. It does better than zlib with arrays of floating point data.

Anyway, Elena has explained that they have changed their license and are now fully free software. So we can allow users to turn this form of compression on, pretty easily. (I actually used to have a function to do this, but I took it out when we got concerned over licensing.)

This is part of #1545

@ArchangeGabriel
Contributor

Interesting, I didn’t know that they had moved to free software licensing (can you give a link for this, though? I still read “free for non-commercial use”). We were relying on libaec to provide szip support on Arch; maybe we should re-evaluate that then.

@epourmal

epourmal commented Dec 2, 2019

There is an SZIP implementation from the German Climate Computing Center that has a BSD-type license. It is fully compatible with the SZIP supported by The HDF Group. You can get the source from here.

@WardF WardF self-assigned this Dec 2, 2019
@WardF WardF added this to the 4.7.4 milestone Dec 2, 2019
@ArchangeGabriel
Contributor

@epourmal Yeah, just as I implied above, this was the de-facto replacement when SZIP was non-free. But if they are now free, re-evaluating SZIP vs libaec might be interesting.

@edwardhartnett
Contributor Author

So here's the question: how do we handle it when people do not have libaec installed? Some alternatives include:
1 - Fail configure and demand they install it or build with --disable-netcdf-4.
2 - Build, but only include nc_def_var_szip() if libaec is installed?
3 - Include libaec in the builds (both autoconf and cmake), and build and install it if not already present.

3 seems safest - all users are guaranteed to be able to read files compressed with libaec.

@epourmal

I agree, 3 is the safest. You can also include the source of szip with netCDF-4 source code.

@epourmal

Forgot to mention that in HDF5 szip is a required filter, i.e., H5Dwrite will fail if the filter is not found for some reason. Do you build the HDF5 libraries when you build netCDF-4, or are preinstalled binaries used (probably both)? You will need to check that HDF5 was built with szip.

@edwardhartnett
Contributor Author

edwardhartnett commented Dec 17, 2019

OK, so szip cannot be installed after HDF5 is built, but other filters can?

To answer your question, we use an installed HDF5 library. We don't know if the user just built it, or installed it with apt-get, or what. We know nothing except what we can check. ;-)
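
One way to act on "what we can check" is to ask the installed HDF5 library itself. The sketch below is an illustration under stated assumptions, not netcdf-c code: it classifies the configuration flags that H5Zget_filter_info() reports for H5Z_FILTER_SZIP. The names szip_support, classify_szip, probe_szip, and the SKETCH_* macros are hypothetical. The two flag values are copied from HDF5's H5Z_FILTER_CONFIG_ENCODE_ENABLED and H5Z_FILTER_CONFIG_DECODE_ENABLED so that the pure part compiles without HDF5 installed.

```c
/* Bit values mirror HDF5's H5Z_FILTER_CONFIG_ENCODE_ENABLED (0x0001) and
 * H5Z_FILTER_CONFIG_DECODE_ENABLED (0x0002); they are redefined here so the
 * sketch compiles without the HDF5 headers. */
#define SKETCH_ENCODE_ENABLED 0x0001u
#define SKETCH_DECODE_ENABLED 0x0002u

typedef enum { SZIP_NONE, SZIP_READ_ONLY, SZIP_READ_WRITE } szip_support;

/* Classify the flags word that H5Zget_filter_info() fills in.
 * A filter that cannot decode is treated as unusable. */
szip_support classify_szip(unsigned int flags)
{
    if (!(flags & SKETCH_DECODE_ENABLED))
        return SZIP_NONE;
    return (flags & SKETCH_ENCODE_ENABLED) ? SZIP_READ_WRITE : SZIP_READ_ONLY;
}

#ifdef SKETCH_USE_HDF5 /* hypothetical macro: define it to link against HDF5 */
#include <hdf5.h>

/* Ask the installed HDF5 library what it can do with szip. */
szip_support probe_szip(void)
{
    unsigned int flags = 0;

    if (H5Zfilter_avail(H5Z_FILTER_SZIP) <= 0)
        return SZIP_NONE;
    if (H5Zget_filter_info(H5Z_FILTER_SZIP, &flags) < 0)
        return SZIP_NONE;
    return classify_szip(flags);
}
#endif
```

A SZIP_READ_ONLY result would correspond to the situation described at the top of this issue, where szip-compressed files can be read but not created.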

@dopplershift
Member

Does libaec build on Windows at all? Conda-forge has Windows builds for libaec disabled. I have severe reservations about making this a major feature in netCDF if a major client platform won't be able to use it.

@dopplershift
Member

Actually, I'm going to take that a step further: I have HUGE problems with making it possible to use the netcdf-c library to write files that you can't open on Windows.

@edwardhartnett
Contributor Author

edwardhartnett commented Dec 18, 2019

@dopplershift this horse has already left the barn. ;-)

With the recent addition of filters, netcdf-c can use any HDF5 filter, which means users can already write netcdf files with any filter. So it is quite possible to write netcdf-4 files which cannot be used on Windows, if there are filters that can't build on Windows (and there may be).

I agree that full windows compatibility is a requirement for the compression features I am adding to netcdf-c. I will add that all compression features must be available in the F77 and F90 APIs as well. The Fortran APIs are how all modeling and climate groups use netcdf to write the giant data sets that are most concerned with compression.

My goal in the current effort is to add compression that can be assured on all platforms. That is, I want to not just use some filters, but actually ensure they are packaged with the netCDF distributions (automake and cmake) so that all users are known to have them. (If we find them already installed, we do nothing. If missing, we install them when netCDF is installed.)

This includes Windows. So we should select compression filters that work well, that can build under autotools and cmake, and that work on Windows. Any compression filters that don't meet those criteria will probably not make the cut for what I am trying to accomplish with this compression work.

But not netcdf-java. I rather doubt that all the filters exist in Java. So netcdf-java may have to fall back on using the C library for reading, as well as writing, some HDF5 files. This is inevitable in any case. Eventually we will start using some new HDF5 features which John Caron did not know about, and so did not code into netcdf-java.

However, we must also not let the widespread penetration of netcdf hold us back from improving it.

As with the releases of netcdf-4.0 and netcdf-3.0, we must honor our commitments for full and complete backward compatibility in API and data formats. All existing netCDF code should continue to work, and all existing netCDF files continue to be fully readable by all future versions of the library.

@dopplershift
Member

With the recent addition of filters, netcdf-c can use any HDF5 filter, which means users can already write netcdf files with any filter.

I'm aware of this. I'm pushing back against adding more APIs to the C library that simplify the process of creating such files and exacerbate this problem. Weren't you just arguing that the filter API is so hard to use that few people do it, and that's why we need the simple API?

However, we must also not let the widespread penetration of netcdf hold us back from improving it.

That's an easy statement to make when you're not the one fielding support requests over on the netcdf-java repo. I'm all for making improvements to better serve our community. But such decisions need to be made with regard to what's good for the netCDF data model, file format(s), and above all our community. netcdf-c is but one implementation that Unidata maintains, and additions and changes to the on-disk format need to be made considering the entire portfolio of netCDF implementations. Confusion when one version of the library can read a netCDF file and one can't is NOT good for the community.

So netcdf-java may have to fall back on using the C library for reading

Again, easy to say since you're not responsible for handling the support requests for clients of netcdf-java and having to ship Java packages that now would REQUIRE shipping multiple platforms' worth of compiled libraries. It's one thing to put some of that burden on people running the TDS (and even then just to write a certain format)--it's a whole other when that means we need to worry about bundling this with every copy of IDV, Panoply, or any other user of the netCDF-java library.

I love aspirational ideas and we should be considering every transformative change we can make to better serve the needs of the community. I think you have this covered really well right now @edwardhartnett; so I'm going to be here to represent reality. And the reality is, we support both netCDF-java and netCDF-c and we need to consider both, in terms of support risk and technical effort, when we're talking about new features that impact how data files are written and what clients will be able to read them. We're already doing that with the zarr work, and work on compression should be no different.

@DennisHeimbigner
Collaborator

At least twice now, we (Ward and I) have rejected the idea
of netcdf-c including a bunch of additional filter implementations
because of security and maintenance issues. I for one
still stand by that position.

One partial approach would be to have a new include file,
netcdf_known_filters.h say. This file would contain wrappers
for nc_def_var_filter for specific filters -- bzip, etc.
These wrappers would accept filter specific arguments and translate them
to proper form for nc_def_var_filter.
This would allow users to use a given filter with more semantically
specific parameters.
Notes:

  1. this file would not be part of netcdf.h, so we do not pollute
    the core API.
  2. Users could propose additions to this set of wrappers.
  3. Presence of a wrapper in netcdf_known_filters.h would not
    mean that the implementation of that filter is available.
    The user is still responsible for finding and installing an
    implementation.

None of this addresses the problem of having files
that cannot be completely read because they use missing filters.
We could build a program that takes a .nc file and prints
out the filter ids of any filters used by that file (and which variables use them).
Remember that until libaec, we had this problem already
with the szip filter because it was supported, but no one had
a non-proprietary implementation.
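
A minimal sketch of what one such wrapper might look like, assuming the bzip2 filter takes a single parameter (a level from 1 to 9) and uses the HDF5 registered filter ID 307. The names sketch_filter_spec, sketch_bzip2_spec, sketch_def_var_bzip2, and the SKETCH_* macros are hypothetical and are not part of netcdf-c; only nc_def_var_filter() is an existing call. The translation step is kept separate from the netCDF call so it can be compiled and checked without the library.

```c
#include <stddef.h>

/* 307 is the ID registered with The HDF Group for the bzip2 filter. */
#define SKETCH_FILTER_BZIP2 307u

/* The (id, nparams, params) triple that nc_def_var_filter() expects. */
typedef struct {
    unsigned int id;
    size_t nparams;
    unsigned int params[4];
} sketch_filter_spec;

/* Translate a filter-specific argument (bzip2 level, 1..9) into the
 * generic form. Returns 0 on success, -1 on a bad argument. */
int sketch_bzip2_spec(int level, sketch_filter_spec *spec)
{
    if (spec == NULL || level < 1 || level > 9)
        return -1;
    spec->id = SKETCH_FILTER_BZIP2;
    spec->nparams = 1;
    spec->params[0] = (unsigned int)level;
    return 0;
}

#ifdef SKETCH_USE_NETCDF /* hypothetical macro: define it to link netcdf-c */
#include <netcdf.h>
#include <netcdf_filter.h>

/* Hypothetical wrapper: semantically specific arguments in,
 * generic nc_def_var_filter() call out. */
int sketch_def_var_bzip2(int ncid, int varid, int level)
{
    sketch_filter_spec spec;

    if (sketch_bzip2_spec(level, &spec) != 0)
        return NC_EINVAL;
    return nc_def_var_filter(ncid, varid, spec.id, spec.nparams, spec.params);
}
#endif
```

As note 3 says, a wrapper like this only shapes the arguments; the call still fails at run time if no bzip2 filter implementation is installed.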

@edwardhartnett
Contributor Author

Well this has been a very interesting discussion!

@dopplershift do you object to Dennis' proposal?

@DennisHeimbigner would you be willing to allow me to extend this idea to the Fortran APIs?

I just participated in a NOAA telecon. Zlib is not meeting our operational constraints and we are now looking at other filters, starting with libaec/szip. Speed of reading is a concern, because zlib is slow to uncompress data. Parallel I/O helps with that on HPC, but many programs run sequentially, or on non-HPC systems.

@DennisHeimbigner
Collaborator

Having Fortran follow C is fine I am sure.

@DennisHeimbigner
Collaborator

One other thing. Is there any agreement in the community about
picking a replacement for zlib?

@dopplershift
Member

@edwardhartnett I think Dennis' proposal is incomplete because:

None of this addresses the problem of having files
that cannot be completely read because they use missing filters.

I'm not supporting any concept that leads to a proliferation of files that only netcdf-c can read. You're talking about adding features for the explicit purpose of allowing one of the major data creators in our field to create a new variant of files (which I fully support). What I'm trying to say is, any path we decide upon MUST, in no uncertain terms, include a plan that the netcdf-java team is comfortable with (cc @lesserwhirls). Full Stop. I'm not sure I can make that requirement any clearer.

@dopplershift
Member

To put another way, at this point I think it's best that the netCDF team, which includes both netcdf-c and netcdf-java developers, figure out the best way to proceed here.

@epourmal

One other thing. Is there any agreement in the community about
picking a replacement for zlib?

We experimented with zlib and some VIIRS NPP files. HDF5 shuffling filter and zlib gave us a pretty good compression ratio and encoding/decoding speed on integer data.

It is probably not a good idea to replace zlib. Users should have access to a variety of compression methods since there is no "universal" compression solution and goals differ; we cannot know if someone wants to minimize storage space, or minimize time for encoding or decoding, or encoding only, or decoding only, or ...?

In our experiments we were trying to optimize for file sizes, and we got the best result when we applied different compression methods to different datasets (szip, shuffle+gzip, and no compression for some datasets because of HDF5 overhead).
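
For readers unfamiliar with the shuffle step mentioned above, here is a small sketch of the byte-shuffle idea: the bytes of each element are regrouped so that all first bytes come first, then all second bytes, and so on, which tends to give a general-purpose compressor such as zlib longer runs of similar values. This is an illustration of the technique, not HDF5's implementation; the function names are hypothetical.

```c
#include <stddef.h>

/* Byte-shuffle n elements of elem_size bytes each from src into dst:
 * all byte 0s first, then all byte 1s, and so on.
 * src and dst must not overlap and must each hold n * elem_size bytes. */
void sketch_shuffle(const unsigned char *src, unsigned char *dst,
                    size_t n, size_t elem_size)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < elem_size; j++)
            dst[j * n + i] = src[i * elem_size + j];
}

/* Inverse transform, applied after decompression. */
void sketch_unshuffle(const unsigned char *src, unsigned char *dst,
                      size_t n, size_t elem_size)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < elem_size; j++)
            dst[i * elem_size + j] = src[j * n + i];
}
```

The transform changes only the byte order, never the byte values, so it is lossless and adds no size of its own; any gain comes entirely from the compressor that runs after it.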

@DennisHeimbigner
Collaborator

I did not mean to imply that we obsolete zlib. Rather, if the community
settles on a single compressor that has widespread use, we might
think about supporting that compressor in netcdf-c.

@DennisHeimbigner
Collaborator

WRT Java. My summary of the issue is that for new filters, we either

  1. force use of JNA or
  2. make an attempt to modify the Java HDF5 implementation
    to support additional filters.

Are there other approaches?

@WardF
Member

WardF commented Dec 19, 2019

To put another way, at this point I think it's best that the netCDF team, which includes both netcdf-c and netcdf-java developers, figure out the best way to proceed here.

We can discuss this at the next netCDF team meeting; we'll schedule it when we can include @lesserwhirls as well. One of the issues raised with the plugin functionality was that end users would now have the ability to create data that is not broadly readable. The plugin/filter functionality was a trade-off between flexibility for end users and broad interoperability. I agree with the sentiment expressed in this thread, that the core functionality provided by libnetcdf should emphasize cross-platform compatibility.

@lesserwhirls
Contributor

WRT Java. My summary of the issue is that for new filters, we either

  1. force use of JNA or
  2. make an attempt to modify the Java HDF5 implementation
    to support additional filters.

Are there other approaches?

I think this has it in terms of options for reading. In my ideal world, we would not need to force the use of JNA for reading any blob of bits called netcdf, ever. Even better would be that any blob of bits called netcdf would be fully reproducible, read and write, with more than one implementation and no single point of failure. I say that knowing that it's, perhaps, controversial, and only my opinion. For writing, we're kind of stuck with JNA, at least for using HDF5 as the persistence format for the netCDF enhanced data model. I don't anticipate this will be the case with, say, using Zarr for the same purpose (reading or writing).

The netCDF-Java HDF5 code already has support for a few filters, and adding to, or making those pluggable as a service provider, would not be complicated. The big question would be if a suitable compression library exists on the java side. Ideally, any compression schemes that would go into the core C library would produce data that could be readable by netCDF-java and other libraries, and vice versa...also (and stronger than ideally), available freely at no cost, and with minimal restrictions (can't get around ITAR, for example). Certainly we could write out compressed data using jpeg 2000 on the java side and call it netcdf, but we don't.

If we take the route of having a canonical list of filters (which I think should be very clearly defined for netCDF-4, and not simply a reference to whatever the version of HDF you are using supports), I'd strongly recommend that be done in a more visible way than a header file in this repository. Certainly some sort of header file describing the filters and parameters would be needed at the C level, but I think that's a C implementation detail. I would say proposals for the addition of a new filter would not come in the form of a PR against netcdf_known_filters.h.

@edwardhartnett
Contributor Author

Please discuss among yourselves and let me know how you would like to proceed. ;-)

@czender
Contributor

czender commented Dec 19, 2019

As Ward and Dennis know (because they supported it), I submitted a proposal to the NSF CSSI program aimed at addressing some of these issues. A similar proposal that I submitted last year was rejected, and we'll have to wait until June-September to see if this new one is funded. The basic idea is a Community Codec Repository (CCR) where netCDF filter code lives, and that can be enabled to be built/installed on the user's machine during netCDF build-time (with --enable-ccr) or installed as its own package independently. This minimizes risk/maintenance for Unidata, and is intended that users can count on interoperability with any CCR codec with minimal-to-no effort. The proposal in full is here.

I agree with Ed and others that netCDF needs more modern (faster/better) lossless compression, and lossy compression as well. Lack of this hinders climate/weather modeling/research (and causes needless GHG emissions to store noise). Today I obtained the GitHub site http://github.com/ccr to host the CCR. If you are interested, please join the nascent efforts to architect the CCR so that it meets your project's needs. Feedback and code contributions are welcome there.

@epourmal

It would be good to coordinate this effort with The HDF Group.

We do have our Codec Repository for regression testing with HDF5 and we do provide binaries for the codecs with HDF5 binaries (see our download pages). The repository is open and can be moved to a better place.

Let's coordinate and cooperate :-)

@czender
Contributor

czender commented Dec 20, 2019

@epourmal I get "Invalid username or password." when I use my Atlassian login on that codec repository. How do I gain access?

@epourmal

Please try again; we fixed permissions

@czender
Contributor

czender commented Dec 20, 2019

Thanks, I have access now. I agree that we should coordinate and co-operate, because the interoperability of compressed datasets is of paramount importance to users and therefore adoption of non-DEFLATE algorithms. That is why Aleksandar Jelenak (@ajelenak-thg) of the HDF Group agreed to be a collaborator (like Ward and Dennis) on my submitted proposal to NSF. As you know, A. currently co-chairs NASA's Dataset Interoperability Working Group.

@epourmal

Thank you, Charlie! I guess THG needs a better collaboration internally too :-)
If you are at Winter ESIP, let's meet and discuss.

@edwardhartnett
Contributor Author

OK, this has been a great discussion, about the general questions of newer filters for netCDF.

But let's set aside talk of all the other compression methods, and return to the specific topic of this issue: do we reinstate the nc_def_var_szip() function, to match nc_inq_var_szip(), which is already there?

The question of support for other, newer, compression filters is separate and may be continued elsewhere (for example in #1545), but the current issue is just about szip, which is currently partially supported by netCDF.

From netcdf.h:

/* Find out szip settings of a var. */
EXTERNL int
nc_inq_var_szip(int ncid, int varid, int *options_maskp, int *pixels_per_blockp);

Shall I submit the PR re-instating the nc_def_var_szip() function?
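
A sketch of how the reinstated function might be used, under a loudly stated assumption: that nc_def_var_szip() takes the by-value counterparts of the nc_inq_var_szip() arguments quoted above (ncid, varid, options_mask, pixels_per_block). The parameter check follows the documented H5Pset_szip() rules (exactly one of the EC or NN coding options, and an even pixels_per_block no larger than 32). The sketch_* names and SKETCH_* macros are hypothetical; the numeric values mirror HDF5's H5_SZIP_EC_OPTION_MASK, H5_SZIP_NN_OPTION_MASK, and H5_SZIP_MAX_PIXELS_PER_BLOCK so the check compiles on its own.

```c
/* Values mirror HDF5's H5_SZIP_EC_OPTION_MASK (4), H5_SZIP_NN_OPTION_MASK (32)
 * and H5_SZIP_MAX_PIXELS_PER_BLOCK (32). */
#define SKETCH_SZIP_EC 4
#define SKETCH_SZIP_NN 32
#define SKETCH_SZIP_MAX_PIXELS_PER_BLOCK 32

/* Return 1 if the settings satisfy the H5Pset_szip() rules:
 * exactly one coding method, and an even block size in 2..32. */
int sketch_szip_settings_ok(int options_mask, int pixels_per_block)
{
    if (options_mask != SKETCH_SZIP_EC && options_mask != SKETCH_SZIP_NN)
        return 0;
    if (pixels_per_block < 2 ||
        pixels_per_block > SKETCH_SZIP_MAX_PIXELS_PER_BLOCK)
        return 0;
    return pixels_per_block % 2 == 0;
}

#ifdef SKETCH_USE_NETCDF /* hypothetical macro: define it to link netcdf-c */
#include <netcdf.h>

/* ASSUMPTION: nc_def_var_szip(ncid, varid, options_mask, pixels_per_block)
 * exists with this argument order. Call it in define mode, then read the
 * settings back with the existing nc_inq_var_szip(). */
int sketch_enable_szip(int ncid, int varid)
{
    int mask = SKETCH_SZIP_NN, ppb = 32;
    int got_mask = 0, got_ppb = 0;
    int ret;

    if (!sketch_szip_settings_ok(mask, ppb))
        return NC_EINVAL;
    if ((ret = nc_def_var_szip(ncid, varid, mask, ppb)) != NC_NOERR)
        return ret;
    return nc_inq_var_szip(ncid, varid, &got_mask, &got_ppb);
}
#endif
```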

@DennisHeimbigner
Collaborator

I would say yes now that libaec is available.

@edwardhartnett
Contributor Author

OK, it turns out that HDF5 (as with zlib) offers special built-in support for szip.

What this means is we can either use szip with the filter API, or use the built in HDF5 szip functions. Both will work.

BUT! The filter method will only work when shared libraries are being built. NOAA HPC sysadmins have a (perhaps unreasonable, perhaps not) aversion to shared library builds. The NOAA GFS is built all static. Other HPC users of netCDF have told me the same.

So in this case, we want to use the built-in HDF5 functions. By doing so, we get static build functionality for free, whereas with the filter API we have to take extra steps to get static builds.

@epourmal

epourmal commented Jan 3, 2020 via email

@edwardhartnett
Contributor Author

OK, turns out this is already in place, and uses H5Pset_szip(), just as we want. The following code is in nc4hdf5.c:

    /* If the user wants to deflate the data, set that up now. */
    if (var->deflate) {
        if (H5Pset_deflate(plistid, var->deflate_level) < 0)
            BAIL(NC_EHDFERR);
    } else if(var->filterid) {
        /* Handle szip case here */
        if(var->filterid == H5Z_FILTER_SZIP) {
            int options_mask;
            int bits_per_pixel;
            if(var->nparams != 2)
                BAIL(NC_EFILTER);
            options_mask = (int)var->params[0];
            bits_per_pixel = (int)var->params[1];
            if(H5Pset_szip(plistid, options_mask, bits_per_pixel) < 0)
                BAIL(NC_EFILTER);
        } else {
            herr_t code = H5Pset_filter(plistid, var->filterid, H5Z_FLAG_MANDATORY, var->nparams, var->params);
            if(code < 0) {
                BAIL(NC_EFILTER);
            }
        }
    }

Looks like @DennisHeimbigner put this in as part of the filter work. So that makes this issue easier, I will just add the nc_def_var_szip() function back in...
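
The branching in the excerpt above can be restated as a small pure function, which may make the static-build point easier to see: according to this discussion, only the generic H5Pset_filter() path depends on a dynamically loaded plugin. This is a sketch for illustration, not code from nc4hdf5.c; sketch_path, sketch_choose_path, and SKETCH_FILTER_SZIP are hypothetical names, and 4 is the value of HDF5's H5Z_FILTER_SZIP.

```c
#include <stddef.h>

#define SKETCH_FILTER_SZIP 4u /* value of HDF5's H5Z_FILTER_SZIP */

typedef enum {
    PATH_NONE,       /* no compression requested */
    PATH_DEFLATE,    /* H5Pset_deflate() */
    PATH_SZIP,       /* H5Pset_szip(), HDF5's built-in szip support */
    PATH_GENERIC,    /* H5Pset_filter(), a separately installed filter */
    PATH_BAD_PARAMS  /* szip requested without exactly two parameters */
} sketch_path;

/* Mirror the decision order of the excerpt: deflate first, then szip as a
 * special case of a filter id, then any other filter id. */
sketch_path sketch_choose_path(int deflate, unsigned int filterid,
                               size_t nparams)
{
    if (deflate)
        return PATH_DEFLATE;
    if (filterid == 0)
        return PATH_NONE;
    if (filterid == SKETCH_FILTER_SZIP)
        return nparams == 2 ? PATH_SZIP : PATH_BAD_PARAMS;
    return PATH_GENERIC;
}
```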

@dopplershift
Member

@DennisHeimbigner can you look to see what options we have to implement this in netcdf-java?

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Jan 3, 2020

If you are willing to include the filter code as part of your client
application, then there are 3 functions in include/netcdf_filter.h
that allow you to specify that filter to be used for nc_def_var_filter.
See Appendix B of the file docs/filters.md.
This should work with static builds.

These functions are:
EXTERNL int nc_filter_register(NC_FILTER_INFO* filter_info);
EXTERNL int nc_filter_unregister(int format, int id);
EXTERNL int nc_filter_inq(int format, int id, NC_FILTER_INFO* filter_info);

@DennisHeimbigner
Collaborator

It occurs to me that alternatively, one could compile the filter
and add it to the static library archive file (libnetcdf.a)

@Dave-Allured
Contributor

Rather than bundling, I suggest deferring all management of filter libraries to HDF5 and user responsibility. Currently netcdf-C contains no direct dependencies on any filter libraries, not even zlib. I think this would be the easiest and simplest way forward for szip as well as arbitrary filters.

If a user has the necessary libraries installed, then their installed netcdf can read the related compressed files. If a certain library is missing, then they get a meaningful error message at the appropriate time. Then they have the usual remedy, install and/or update their libraries. This should work the same way in principle for both static and shared builds.

Certainly Unidata should continue to publish a recommended set of libraries to support the preferred set of format variations and filter options.

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Jan 3, 2020

I agree that it is preferable to leave it to HDF5.
But as it stands, this statement is false

This should work the same way in principle for both static and shared builds.

because there is no way for HDF5 to find and load the filter code
statically that I am aware of.

@edwardhartnett
Contributor Author

@DennisHeimbigner your points are valid for other filters, but none of this is necessary for the szip filter (the topic of this issue) - it already ships with HDF5, so is always present. (Also there is a way to build the filter code statically, but I have not yet figured it out or tried it.)

What you suggest (that netcdf-c contain the filter code) has merit and I did originally support that idea. But then, working with filters for a bit, and reading concerns from you, @WardF and @dopplershift, I realized that HDF5 should actually fill this role. The filter code is much closer to HDF5 code than it is to netCDF code.

To take the LZ4 filter as an example, it is only one C file. It would be trivial to include it in every HDF5 build, and (just as with zlib and szip) check for it during configure, and build the filter if liblz4 is found.

I am going to suggest this to the HDF5 team (howdy @epourmal!) Of course this means I will volunteer to submit changes to their build systems to support this - which will be easy. If that can be achieved, then netcdf-c faces a situation where all HDF5 installs can be required to have LZ4 (like zlib) or just support it if it is present (like szip).

@Dave-Allured, what you propose is essentially the current situation. Continuing with the LZ4 example: if the user does not install the LZ4 library and filter, they cannot read LZ4 compressed data, and they get a filter error. Using ncdump -h -s they can find out the number of the missing filter. Then they can figure out how to install the library and filter and get the data. They will not need to recompile HDF5 or netcdf-c to do so, if they are using shared libraries. (@DennisHeimbigner please correct me if I am wrong about the filter API.)
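
To illustrate the "find out the number of the missing filter" step: a sketch that extracts the filter ID from the value of the _Filter special attribute, assuming ncdump -h -s prints it as a comma-separated string whose first field is the ID (for example "32004,3", where 32004 is the ID registered with The HDF Group for LZ4). That output format is an assumption here, and sketch_filter_id is a hypothetical helper, not part of netcdf-c.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse the leading filter ID from a _Filter value such as "32004,3".
 * Returns 0 and stores the ID in *idp on success, -1 otherwise. */
int sketch_filter_id(const char *value, unsigned long *idp)
{
    char *end = NULL;
    unsigned long id;

    if (value == NULL || idp == NULL)
        return -1;
    /* Require a leading digit: strtoul() alone would accept blanks or a sign. */
    if (value[0] < '0' || value[0] > '9')
        return -1;
    errno = 0;
    id = strtoul(value, &end, 10);
    if (errno != 0 || end == value)
        return -1;
    /* The ID must be the whole string or be followed by its parameters. */
    if (*end != '\0' && *end != ',')
        return -1;
    *idp = id;
    return 0;
}
```

With the ID in hand, the user can look it up in the list of registered HDF5 filters to learn which library and filter plugin to install.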

What I would prefer is if HDF5 always included the LZ4 filter code, and then netcdf-c can require (as it does with zlib) or optionally support (as it does with szip) LZ4. In other words, the user must install zlib, and probably should install szip and lz4 libraries, before building HDF5 (and we can check this at netcdf-c configure time).

This would allow everyone to have LZ4 reliably on Linux, Mac, and Windows (since all these packages support cmake builds on Windows).

netcdf-java would then have to write or find an LZ4 decompressor (or switch to using HDF5 C library, which is probably inevitable anyway).

Any old versions of netCDF and HDF5 would have to upgrade to read the LZ4 compressed data (but that does not violate the backward compatibility guarantee).

So getting HDF5 to always ship the LZ4 filter code (and build it when liblz4 is found) would present a lot of advantages.

However, this is not an issue for szip. ;-)

@Dave-Allured
Contributor

Dave-Allured commented Jan 3, 2020

@DennisHeimbigner wrote:

because there is no way for HDF5 to find and load the filter code
statically that I am aware of.

Agreed. My remark "in principle" was oversimplified, sorry. I meant that regardless of static or shared, if the user's set of installed libraries is missing something, they will still get a controlled runtime message along the lines of "unknown filter", either way. Then their remedy is also about the same, either way.

Ed's detailed reply of just now reinforces what I was trying to say, including the part about optional libraries in static builds.

@czender
Contributor

czender commented Jan 3, 2020

Expecting users to track down and install libraries for missing filters will doom filters to the limbo of the unreliable. Most filters of interest are freely redistributable and we should aim to provide a user-friendlier mechanism to guarantee their presence.

@Dave-Allured
Contributor

Expecting users to track down and install libraries for missing filters will doom filters to the limbo of the unreliable.

And yet, that is just what we expect these days for almost all other library dependencies in open source software, including the netcdf ecosystem. I just see filters as an extension of the same thing. I will now refrain from a litany of the pros and cons, trying to respect the current main topic.

My most important point is, defer filter management to HDF5. They have been doing it well for decades!

Most filters of interest are freely redistributable and we should aim to provide a user-friendlier mechanism

Several are already in place. Simple improvements could be made in some cases.

  • List of sanctioned filters in build documentation (exists)
  • Pointers to filter source repos in build documentation (missing)
  • Improved install documentation
  • Binary filter bundles (exists)
  • Package managers (debian, anaconda, macports, etc.)

to guarantee their presence.

I substitute the resources above, plus user responsibility, for "guarantee". This becomes a matter of opinion about user behavior. We can just disagree for now.

@Dave-Allured
Contributor

... your points are valid for other filters, but none of this is necessary for the szip filter (the topic of this issue) - it already ships with HDF5, so is always present.

Ed, I searched recent HDF5 source distributions -- the C sources only. I was not able to find the szip library, only the HDF5 interface functions to that library; plus some precompiled binaries. It would seem that you really do need to separately install the szip library, at least for a full build from source. Am I missing something?

@DennisHeimbigner
Collaborator

Dave is correct. The HDF code base does not contain any compressor
code, including zlib and szip. It relies on externally supplied libraries.
All it provides is the wrapper code.

@dopplershift
Member

Expecting users to track down and install libraries for missing filters will doom filters to the limbo of the unreliable. Most filters of interest are freely redistributable and we should aim to provide a user-friendlier mechanism to guarantee their presence.

I agree, we need to make sure we provide an easy way for users to get their hands on the C and Java code to enable the filters they need.

@edwardhartnett
Contributor Author

edwardhartnett commented Jan 4, 2020

To clarify, there are two bits of code involved in a compression filter like zlib, szip, and LZ4.

There is an external library (libz, libszip, liblz4). Then there is the HDF5 code that is needed to use the library and act as a filter (a.k.a. the "filter code"). So the filter code is different from the external library. (And the filter code is small: the LZ4 filter code is only one file, ~250 lines of code, including comments.)

No one is suggesting that the netcdf or HDF5 teams take any responsibility for the external libraries. It will always be true that users must locate and install zlib before building HDF5. The same for szip, lz4, and any other compression method that uses an external library.

The filter code that uses zlib is already part of HDF5, so it builds with HDF5. The same for the szip filter code. If szip is found when HDF5 is built, then the filter for szip is built. But for other filters, like LZ4, the HDF5 filter code does not come with HDF5, it must be found, built, and installed separately by the user. Dealing with the HDF5 filter code is the extra step we want to eliminate.

What I am proposing is that the filter code for LZ4 be treated the same way as the szip filter - shipped and installed with HDF5 (when liblz4 is available at HDF5 build time). Then, at netcdf-c configure time, we can check whether HDF5 supports LZ4, and include support in netcdf-c, or else return the NC_ENOTBUILT error when the user tries to use LZ4.
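
A minimal sketch of the configure-time pattern described here, assuming a hypothetical HAVE_LZ4_FILTER macro that configure would define only when the installed HDF5 provides the LZ4 filter. The function name sketch_def_var_lz4 and the SKETCH_* error codes are illustrative stand-ins; a real build would return NC_NOERR and NC_ENOTBUILT from netcdf.h.

```c
/* Illustrative error codes standing in for NC_NOERR and NC_ENOTBUILT. */
#define SKETCH_NOERR 0
#define SKETCH_ENOTBUILT (-1)

/* HAVE_LZ4_FILTER is hypothetical: configure would define it when the
 * HDF5 installation was built with the LZ4 filter. */
#ifdef HAVE_LZ4_FILTER
#define SKETCH_LZ4_AVAILABLE 1
#else
#define SKETCH_LZ4_AVAILABLE 0
#endif

/* Hypothetical entry point, not part of netcdf-c. The function always
 * exists, so user code links either way; it reports "not built" when
 * support was not detected at configure time. */
int sketch_def_var_lz4(int ncid, int varid, int level)
{
    (void)ncid;
    (void)varid;
    (void)level;
    if (!SKETCH_LZ4_AVAILABLE)
        return SKETCH_ENOTBUILT;
    /* A real implementation would hand the settings to HDF5 here. */
    return SKETCH_NOERR;
}
```

Keeping the symbol present in every build, rather than omitting it when the filter is missing, means callers get a run-time error they can handle instead of a link failure.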

No changes are needed in the HDF5 library to support this. We just need to change the HDF5 install systems a little, to install the LZ4 filter, when liblz4 is found on the machine.

This way, as with zlib and szip, the user will only have to be responsible for installing the compression libraries and then building HDF5. There would be no additional step (as there is now) where the user has to get the LZ4 filter code, build it, and install it in a non-standard place to get it working.

I agree with @Dave-Allured that system admins and users are quite used to and capable of assembling the libraries needed. And package management systems like yum and apt can and will automate all this for most users.

But the filter code is something else - something non-standard. Let's try and solve that part for the user, for LZ4 (and perhaps other additional filters, if that seems desirable).

The filter code is really HDF5 code, and should ship and install with HDF5. But we should not be afraid of taking action ourselves, even if the HDF5 team does not ship the filter. These are small bits of code, and, where available, I certainly would not be terrified to take on support for them. They are mostly wrapper codes, and would require little or no additional work.

The idea of the Community Codec Repo (CCR) is that it will provide all the filter code, and the integration glue needed to get it working easily in netcdf-c. If the HDF5 team is reluctant to ship the LZ4 filter code, we can put it in CCR. The user will then have an extra step (to download and install CCR), but it will be a standard install, and will take care of all the HDF5 non-standard filter stuff automatically.

To the extent that we can package these solutions with HDF5 and netcdf-c, we will not have to put them in the CCR. Ideally, with LZ4 support in both HDF5 and netcdf-c, there would be no role for the CCR with LZ4.

@Dave-Allured
Contributor

Ed, thanks for clarifying. I misinterpreted the generic term "filter code", thus I confused the discussion of APIs and filter interlude code vs. external libraries.

I would prefer a user-facing filter API that is generic and stable for all filters. The current nc_def_var_filter fits that, but there is room for improving ease of use. Unidata's objections to expanding the number of filter-specific APIs are understandable. So for the time being, I support the idea of new specific filter APIs as external code, as you are now doing with the CCR.

@Dave-Allured
Contributor

On the other hand, adding a handful of preferred filters with specific APIs would be pretty reasonable, as you have previously said.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 3, 2023
Release Notes       {#RELEASE_NOTES}
=============

\brief Release notes file for the netcdf-c package.

This file contains a high-level description of this package's
evolution. Releases are in reverse chronological order (most recent
first). Note that, as of netcdf 4.2, the `netcdf-c++` and
`netcdf-fortran` libraries have been separated into their own
libraries.

## 4.9.3 - TBD

## 4.9.2 - March 14, 2023

This is a maintenance release that adds support for HDF5 version
1.14.0, in addition to a handful of other changes and bugfixes.

* Update H5FDhttp.[ch] to work with HDF5 version 1.13.2 and later. See
  [Github #2635](Unidata/netcdf-c#2635).

* [Bug Fix] Update DAP code to enable CURLOPT_ACCEPT_ENCODING by
  default. See [Github
  #2630](Unidata/netcdf-c#2630).

* [Bug Fix] Fix byterange failures for certain URLs. See [Github
  #2649](Unidata/netcdf-c#2649).

* [Bug Fix] Fix 'make distcheck' error in run_interop.sh. See [Github
  #2631](Unidata/netcdf-c#2631).

* [Enhancement] Update `nc-config` to remove inclusion from
  automatically-detected `nf-config` and `ncxx-config` files, as the
  wrong files could be included in the output.  This is in support of
  [GitHub #2274](Unidata/netcdf-c#2274).

* [Enhancement] Update H5FDhttp.[ch] to work with HDF5 version
  1.14.0. See [Github
  #2615](Unidata/netcdf-c#2615).

## 4.9.1 - February 2, 2023

## Known Issues

* A test in the `main` branch of `netcdf-cxx4` is broken by this rc; this will
  bear further investigation, but is not being treated as a roadblock for the
  release candidate.

* The new document, `netcdf-c/docs/filter_quickstart.md` is in
  rough-draft form.

* Race conditions exist in some of the tests when run concurrently with large
  numbers of processors.

## What's Changed from v4.9.0 (automatically generated)

* Fix nc_def_var_fletcher32 operation by \@DennisHeimbigner in
  Unidata/netcdf-c#2403

* Merge relevant info updates back into `main` by \@WardF in
  Unidata/netcdf-c#2387

* Add manual GitHub actions triggers for the tests. by \@WardF in
  Unidata/netcdf-c#2404

* Use env variable USERPROFILE instead of HOME for windows and mingw. by
  \@DennisHeimbigner in Unidata/netcdf-c#2405

* Make public a limited API for programmatic access to internal .rc tables by
  \@DennisHeimbigner in Unidata/netcdf-c#2408

* Fix typo in CMakeLists.txt by \@georgthegreat in
  Unidata/netcdf-c#2412

* Fix choice of HOME dir by \@DennisHeimbigner in
  Unidata/netcdf-c#2416

* Check for libxml2 development files by \@WardF in
  Unidata/netcdf-c#2417

* Updating Doxyfile.in with doxygen-1.8.17, turned on WARN_AS_ERROR, added
  doxygen build to CI run by \@edwardhartnett in
  Unidata/netcdf-c#2377

* updated release notes by \@edwardhartnett in
  Unidata/netcdf-c#2392

* increase read block size from 1 KB to 4 MB by \@wkliao in
  Unidata/netcdf-c#2319

* fixed RELEASE_NOTES.md by \@edwardhartnett in
  Unidata/netcdf-c#2423

* Fix pnetcdf tests in cmake by \@WardF in
  Unidata/netcdf-c#2437

* Updated CMakeLists to avoid corner case cmake error by \@WardF in
  Unidata/netcdf-c#2438

* Add `--disable-quantize` to configure by \@WardF in
  Unidata/netcdf-c#2439

* Fix the way CMake handles -DPLUGIN_INSTALL_DIR by \@DennisHeimbigner in
  Unidata/netcdf-c#2430

* fix and test quantize mode for NC_CLASSIC_MODEL by \@edwardhartnett in
  Unidata/netcdf-c#2445

* Guard _declspec(dllexport) in support of #2446 by \@WardF in
  Unidata/netcdf-c#2460

* Ensure that netcdf_json.h does not interfere with ncjson. by
  \@DennisHeimbigner in Unidata/netcdf-c#2448

* Prevent cmake writing to source dir by \@magnusuMET in
  Unidata/netcdf-c#2463

* more quantize testing and adding pre-processor constant NC_MAX_FILENAME to
  nc_tests.h by \@edwardhartnett in
  Unidata/netcdf-c#2457

* Provide a default enum const when fill value does not match any enum
  constant by \@DennisHeimbigner in
  Unidata/netcdf-c#2462

* Fix support for reading arrays of HDF5 fixed size strings by
  \@DennisHeimbigner in Unidata/netcdf-c#2466

* fix musl build by \@magnusuMET in
  Unidata/netcdf-c#1701

* Fix AWS SDK linking errors by \@dzenanz in
  Unidata/netcdf-c#2470

* Address jump-misses-init issue. by \@WardF in
  Unidata/netcdf-c#2488

* Remove stray merge conflict markers by \@WardF in
  Unidata/netcdf-c#2493

* Add support for Zarr string type to NCZarr by \@DennisHeimbigner in
  Unidata/netcdf-c#2492

* Fix some problems with PR 2492 by \@DennisHeimbigner in
  Unidata/netcdf-c#2497

* Fix some bugs in the blosc filter wrapper by \@DennisHeimbigner in
  Unidata/netcdf-c#2461

* Add option to control accessing external servers by \@DennisHeimbigner in
  Unidata/netcdf-c#2491

* Changed attribute case in documentation by \@WardF in
  Unidata/netcdf-c#2482

* Adding all-error-codes.md back in to distribution documentation. by \@WardF in
  Unidata/netcdf-c#2501

* Update hdf5 version in github actions. by \@WardF in
  Unidata/netcdf-c#2504

* Minor update to doxygen function documentation by \@gsjaardema in
  Unidata/netcdf-c#2451

* Fix some addtional errors in NCZarr by \@DennisHeimbigner in
  Unidata/netcdf-c#2503

* Cleanup szip handling some more by \@DennisHeimbigner in
  Unidata/netcdf-c#2421

* Check for zstd development headers in autotools by \@WardF in
  Unidata/netcdf-c#2507

* Add new options to nc-config by \@WardF in
  Unidata/netcdf-c#2509

* Cleanup built test sources in nczarr_test by \@DennisHeimbigner in
  Unidata/netcdf-c#2508

* Fix inconsistency in netcdf_meta.h by \@WardF in
  Unidata/netcdf-c#2512

* Small fix in nc-config.in by \@WardF in
  Unidata/netcdf-c#2513

* For loop initial declarations are only allowed in C99 mode by \@gsjaardema in
  Unidata/netcdf-c#2517

* Fix some dependencies in tst_nccopy3 by \@WardF in
  Unidata/netcdf-c#2518

* Update plugins/Makefile.am by \@WardF in
  Unidata/netcdf-c#2519

* Fix prereqs in ncdump/tst_nccopy4 in order to avoid race conditions. by
  \@WardF in Unidata/netcdf-c#2520

* Move construction of VERSION file to end of the build by \@DennisHeimbigner in
  Unidata/netcdf-c#2527

* Add draft filter quickstart guide by \@WardF in
  Unidata/netcdf-c#2531

* Turn off extraneous debug output by \@DennisHeimbigner in
  Unidata/netcdf-c#2537

* typo fix by \@wkliao in Unidata/netcdf-c#2538

* replace 4194304 with READ_BLOCK_SIZE by \@wkliao in
  Unidata/netcdf-c#2539

* Rename variable to avoid function name conflict by \@ibaned in
  Unidata/netcdf-c#2550

* Add Cygwin CI and stop installing unwanted plugins by \@DWesl in
  Unidata/netcdf-c#2529

* Merge subset of v4.9.1 files back into main development branch by \@WardF in
  Unidata/netcdf-c#2530

* Add a Filter quickstart guide document by \@WardF in
  Unidata/netcdf-c#2524

* Fix race condition in ncdump (and other) tests. by \@DennisHeimbigner in
  Unidata/netcdf-c#2552

* Make dap4 reference dap instead of hard-wired to be disabled. by \@WardF in
  Unidata/netcdf-c#2553

* Suppress nczarr_test/tst_unknown filter test by \@DennisHeimbigner in
  Unidata/netcdf-c#2557

* Add fenceposting for HAVE_DECL_ISINF and HAVE_DECL_ISNAN by \@WardF in
  Unidata/netcdf-c#2559

* Add an old static file. by \@WardF in
  Unidata/netcdf-c#2575

* Fix infinite loop in file inferencing by \@DennisHeimbigner in
  Unidata/netcdf-c#2574

* Merge Wellspring back into development branch by \@WardF in
  Unidata/netcdf-c#2560

* Allow ncdump -t to handle variable length string attributes by \@srherbener in
  Unidata/netcdf-c#2584

* Fix an issue I introduced with make distcheck by \@WardF in
  Unidata/netcdf-c#2590

* make UDF0 not require NC_NETCDF4 by \@jedwards4b in
  Unidata/netcdf-c#2586

* Expose user-facing documentation related to byterange DAP functionality.  by
  \@WardF in Unidata/netcdf-c#2596

* Fix Memory Leak by \@DennisHeimbigner in
  Unidata/netcdf-c#2598

* CI: Change autotools CI build to out-of-tree build. by \@DWesl in
  Unidata/netcdf-c#2577

* Update github action configuration scripts. by \@WardF in
  Unidata/netcdf-c#2600

* Update the filter quickstart guide.  by \@WardF in
  Unidata/netcdf-c#2602

* Fix symbol export on Windows by \@WardF in
  Unidata/netcdf-c#2604

## New Contributors
* \@georgthegreat made their first contribution in Unidata/netcdf-c#2412
* \@dzenanz made their first contribution in Unidata/netcdf-c#2470
* \@DWesl made their first contribution in Unidata/netcdf-c#2529
* \@srherbener made their first contribution in Unidata/netcdf-c#2584
* \@jedwards4b made their first contribution in Unidata/netcdf-c#2586

**Full Changelog**: Unidata/netcdf-c@v4.9.0...v4.9.1

### 4.9.1 - Release Candidate 2 - November 21, 2022

#### Known Issues

* A test in the `main` branch of `netcdf-cxx4` is broken by this rc;
  this will bear further investigation, but is not being treated as a
  roadblock for the release candidate.

* The new document, `netcdf-c/docs/filter_quickstart.md` is in rough-draft form.

#### Changes

* [Bug Fix] Fix a race condition when testing missing filters. See
  [Github #2557](Unidata/netcdf-c#2557).

* [Bug Fix] Fix some race conditions due to use of a common file in multiple
  shell scripts. See [Github
  #2552](Unidata/netcdf-c#2552).


### 4.9.1 - Release Candidate 1 - October 24, 2022

* [Enhancement][Documentation] Add Plugins Quick Start Guide.  See
  [GitHub #2524](Unidata/netcdf-c#2524) for
  more information.

* [Enhancement] Add new entries in `netcdf_meta.h`, `NC_HAS_BLOSC` and
  `NC_HAS_BZ2`. See [Github
  #2511](Unidata/netcdf-c#2511) and [Github
  #2512](Unidata/netcdf-c#2512) for more
  information.

* [Enhancement] Add new options to `nc-config`: `--has-multifilters`,
  `--has-stdfilters`, `--has-quantize`, `--plugindir`.  See [Github
  #2509](Unidata/netcdf-c#2509) for more
  information.

* [Bug Fix] Fix some errors detected in [PR
  #2497](Unidata/netcdf-c#2497). See [Github
  #2503](Unidata/netcdf-c#2503).

* [Bug Fix] Split the remote tests into two parts: one for the
  remotetest server and one for all other external servers. Also add a
  configure option to enable the latter set. See [Github
  #2491](Unidata/netcdf-c#2491).

* [Bug Fix] Fix blosc plugin errors. See [Github
  #2461](Unidata/netcdf-c#2461).

* [Bug Fix] Fix support for reading arrays of HDF5 fixed size
  strings. See [Github
  #2466](Unidata/netcdf-c#2466).

* [Bug Fix] Fix some errors detected in [PR
  #2492](Unidata/netcdf-c#2492). See [Github
  #2497](Unidata/netcdf-c#2497).

* [Enhancement] Add support for Zarr (fixed length) string type in
  nczarr. See [Github
  #2492](Unidata/netcdf-c#2492).

* [Bug Fix] Provide a default enum const when fill value does not
  match any enum constant for the value zero. See [Github
  #2462](Unidata/netcdf-c#2462).

* [Bug Fix] Fix the json submodule symbol conflicts between libnetcdf
  and the plugin specific netcdf_json.h. See [Github
  #2448](Unidata/netcdf-c#2448).

* [Bug Fix] Fix quantize with CLASSIC_MODEL files. See [Github
  #2445](Unidata/netcdf-c#2445).

* [Enhancement] Add `--disable-quantize` option to `configure`.

* [Bug Fix] Fix CMakeLists.txt to handle all acceptable boolean values
  for -DPLUGIN_INSTALL_DIR. See [Github
  #2430](Unidata/netcdf-c#2430).

* [Bug Fix] Fix tst_vars3.c to use the proper szip flag. See [Github
  #2421](Unidata/netcdf-c#2421).

* [Enhancement] Provide a simple API to allow user access to the
  internal .rc file table: supports get/set/overwrite of entries of
  the form "key=value". See [Github
  #2408](Unidata/netcdf-c#2408).

* [Bug Fix] Use env variable USERPROFILE instead of HOME for windows
  and mingw. See [Github
  #2405](Unidata/netcdf-c#2405).

* [Bug Fix] Fix the nc_def_var_fletcher32 code in hdf5 to properly
  test value of the fletcher32 argument. See [Github
  #2403](Unidata/netcdf-c#2403).

## 4.9.0 - June 10, 2022

* [Enhancement] Add quantize functions nc_def_var_quantize() and
  nc_inq_var_quantize() to enable lossy compression. See [Github
  #1548](Unidata/netcdf-c#1548).

* [Enhancement] Add zstandard compression functions nc_def_var_zstandard() and
  nc_inq_var_zstandard(). See [Github
  #2173](Unidata/netcdf-c#2173).

* [Enhancement] Have netCDF-4 logging output one file per processor when used
  with parallel I/O. See [Github
  #1762](Unidata/netcdf-c#1762).

* [Enhancement] Improve filter installation process to avoid use of an extra
  shell script. See [Github
  #2348](Unidata/netcdf-c#2348).

* [Bug Fix] Get "make distcheck" to work. See [Github
  #2343](Unidata/netcdf-c#2343).

* [Enhancement] Allow the read/write of JSON-valued Zarr attributes to allow for
  domain specific info such as used by GDAL/Zarr. See [Github
  #2278](Unidata/netcdf-c#2278).

* [Enhancement] Turn on the XArray convention for NCZarr files by
  default. WARNING, this means that the mode should explicitly specify "nczarr"
  or "zarr" even if "xarray" or "noxarray" is specified. See [Github
  #2257](Unidata/netcdf-c#2257).

* [Enhancement] Update the documentation to match the current filter
  capabilities. See [Github
  #2249](Unidata/netcdf-c#2249).

* [Enhancement] Support installation of pre-built standard filters into
  user-specified location. See [Github
  #2318](Unidata/netcdf-c#2318).

* [Enhancement] Improve filter support. More specifically (1) add
  nc_inq_filter_avail to check if a filter is available, (2) add the notion of
  standard filters, (3) cleanup szip support to fix interaction with NCZarr. See
  [Github #2245](Unidata/netcdf-c#2245).

* [Enhancement] Switch to tinyxml2 as the default xml parser implementation. See
  [Github #2170](Unidata/netcdf-c#2170).

* [Bug Fix] Require that the type of the variable in nc_def_var_filter is not
  variable length. See [Github
  #2231](Unidata/netcdf-c#2231).

* [File Change] Apply HDF5 v1.8 format compatibility when writing to previous
  files, as well as when creating new files.  The superblock version remains at
  2 for newly created files.  Full backward read/write compatibility for
  netCDF-4 is maintained in all cases.  See [Github
  #2176](Unidata/netcdf-c#2176).

* [Enhancement] Add ability to set dataset alignment for netcdf-4/HDF5
  files. See [Github #2206](Unidata/netcdf-c#2206).

* [Bug Fix] Improve UTF8 support on windows so that it can use utf8
  natively. See [Github #2222](Unidata/netcdf-c#2222).

* [Enhancement] Add complete bitgroom support to NCZarr. See [Github
  #2197](Unidata/netcdf-c#2197).

* [Bug Fix] Clean up the handling of deeply nested VLEN types. Marks
  nc_free_vlen() and nc_free_string as deprecated in favor of
  ncaux_reclaim_data(). See [Github
  #2179](Unidata/netcdf-c#2179).

* [Bug Fix] Make sure that netcdf.h accurately defines the flags in the
  open/create mode flags. See [Github
  #2183](Unidata/netcdf-c#2183).

* [Enhancement] Improve support for msys2+mingw platform. See [Github
  #2171](Unidata/netcdf-c#2171).

* [Bug Fix] Clean up the various inter-test dependencies in ncdump for
  CMake. See [Github #2168](Unidata/netcdf-c#2168).

* [Bug Fix] Fix use of non-aws appliances. See [Github
  #2152](Unidata/netcdf-c#2152).

* [Enhancement] Added options to suppress the new behavior from [Github
  #2135](Unidata/netcdf-c#2135).  The options for
  `cmake` and `configure` are, respectively, `-DENABLE_LIBXML2` and
  `--(enable/disable)-libxml2`. Both of these options default to 'on/enabled'.
  When disabled, the bundled `ezxml` XML interpreter is used regardless of
  whether `libxml2` is present on the system.

* [Enhancement] Support optional use of libxml2, otherwise default to ezxml. See
  [Github #2135](Unidata/netcdf-c#2135) -- H/T to
  [Egbert Eich](https://github.com/e4t).

* [Bug Fix] Fix several os related errors. See [Github
  #2138](Unidata/netcdf-c#2138).

* [Enhancement] Support byte-range reading of netcdf-3 files stored in private
  buckets in S3. See [Github
  #2134](Unidata/netcdf-c#2134)

* [Enhancement] Support Amazon S3 access for NCZarr. Also support use of the
  existing Amazon SDK credentials system. See [Github
  #2114](Unidata/netcdf-c#2114)

* [Bug Fix] Fix string allocation error in H5FDhttp.c. See [Github
  #2127](Unidata/netcdf-c#2127).

* [Bug Fix] Apply patches for ezxml and for selected oss-fuzz detected
  errors. See [Github #2125](Unidata/netcdf-c#2125).

* [Bug Fix] Ensure that internal Fortran APIs are always defined. See [Github
  #2098](Unidata/netcdf-c#2098).

* [Enhancement] Support filters for NCZarr. See [Github
  #2101](Unidata/netcdf-c#2101)

* [Bug Fix] Make PR 2075 long file name be idempotent. See [Github
  #2094](Unidata/netcdf-c#2094).
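
The quantize entry in the 4.9.0 list above describes lossy compression. The general idea behind such schemes is to discard low-order mantissa bits so that a later lossless filter finds more repetition. The sketch below shows the simplest variant (plain bit shaving). It is an illustration only and is not claimed to be the algorithm used by `nc_def_var_quantize()`. The function name `shave_mantissa` is hypothetical, and the code assumes `float` is a 32-bit IEEE-754 value.

```c
#include <stdint.h>
#include <string.h>

/* Keep only the top keep_bits of the 23 explicit mantissa bits of a
 * single-precision value and zero the rest. Sign and exponent are
 * untouched. Assumes float is 32-bit IEEE-754. */
float shave_mantissa(float value, int keep_bits)
{
    uint32_t bits;
    uint32_t mask;

    /* Leave NaN alone: clearing its mantissa could turn it into infinity. */
    if (value != value)
        return value;

    if (keep_bits < 0)
        keep_bits = 0;
    if (keep_bits > 23)
        keep_bits = 23;

    /* Reinterpret the float as raw bits without violating aliasing rules. */
    memcpy(&bits, &value, sizeof bits);

    /* Ones over sign, exponent, and the kept mantissa bits; zeros below. */
    mask = ~((UINT32_C(1) << (23 - keep_bits)) - UINT32_C(1));
    bits &= mask;

    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, keeping one mantissa bit turns 1.75 into 1.5, while keeping all 23 bits returns the input unchanged. Shaving always truncates toward zero, which biases the data. Rounding-based schemes avoid that bias at the cost of slightly more logic.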


## 4.8.1 - August 18, 2021

* [Bug Fix] Fix multiple bugs in libnczarr. See [Github
  #2066](Unidata/netcdf-c#2066).

* [Enhancement] Support windows network paths (e.g. \\svc\...). See [Github
  #2065](Unidata/netcdf-c#2065).

* [Enhancement] Convert to a new representation of the NCZarr meta-data
  extensions: version 2. Read-only backward compatibility is provided. See
  [Github #2032](Unidata/netcdf-c#2032).

* [Bug Fix] Fix dimension_separator bug in libnczarr. See [Github
  #2035](Unidata/netcdf-c#2035).

* [Bug Fix] Fix bugs in libdap4. See [Github
  #2005](Unidata/netcdf-c#2005).

* [Bug Fix] Store NCZarr fillvalue as a singleton instead of a 1-element
  array. See [Github #2017](Unidata/netcdf-c#2017).

* [Bug Fixes] The netcdf-c library was incorrectly determining the scope of
  dimension; similar to the type scope problem. See [Github
  #2012](Unidata/netcdf-c#2012) for more information.

* [Bug Fix] Re-enable DAP2 authorization testing. See [Github
  #2011](Unidata/netcdf-c#2011).

* [Bug Fix] Fix bug with windows version of mkstemp that causes failure to
  create more than 26 temp files. See [Github
  #1998](Unidata/netcdf-c#1998).

* [Bug Fix] Fix ncdump bug when printing VLENs with basetype char. See [Github
  #1986](Unidata/netcdf-c#1986).

* [Bug Fixes] The netcdf-c library was incorrectly determining the scope of
  types referred to by nc_inq_type_equal. See [Github
  #1959](Unidata/netcdf-c#1959) for more information.

* [Bug Fix] Fix bug in use of XGetopt when building under Mingw. See [Github
  #2009](Unidata/netcdf-c#2009).

* [Enhancement] Improve the error reporting when attempting to use a filter for
  which no implementation can be found in HDF5_PLUGIN_PATH. See [Github
  #2000](Unidata/netcdf-c#2000) for more information.

* [Bug Fix] Fix `make distcheck` issue in `nczarr_test/` directory. See [Github
  #2007](Unidata/netcdf-c#2007).

* [Bug Fix] Fix bug in NCclosedir in dpathmgr.c. See [Github
  #2003](Unidata/netcdf-c#2003).

* [Bug Fix] Fix bug in ncdump that assumes that there is a relationship between
  the total number of dimensions and the max dimension id. See [Github
  #2004](Unidata/netcdf-c#2004).

* [Bug Fix] Fix bug in JSON processing of strings with embedded quotes. See
  [Github #1993](Unidata/netcdf-c#1993).

* [Enhancement] Add support for the new "dimension_separator" enhancement to
  Zarr v2. See [Github #1990](Unidata/netcdf-c#1990) for
  more information.

* [Bug Fix] Fix hack for handling failure of shell programs to properly handle
  escape characters. See [Github
  #1989](Unidata/netcdf-c#1989).

* [Bug Fix] Allow some primitive type names to be used as identifiers depending
  on the file format. See [Github
  #1984](Unidata/netcdf-c#1984).

* [Enhancement] Add support for reading/writing pure Zarr storage format that
  supports the XArray _ARRAY_DIMENSIONS attribute. See [Github
  #1952](Unidata/netcdf-c#1952) for more information.

* [Update] Updated version of bzip2 used in filter testing/functionality, in
  support of [Github #1969](Unidata/netcdf-c#1969).

* [Bug Fix] Corrected HDF5 version detection logic as described in [Github
  #1962](Unidata/netcdf-c#1962).

## 4.8.0 - March 30, 2021

* [Enhancement] Bump the NC_DISPATCH_VERSION from 2 to 3, and as a side effect,
  unify the definition of NC_DISPATCH_VERSION so it only needs to be defined in
  CMakeLists.txt and configure.ac. See [Github
  #1945](Unidata/netcdf-c#1945) for more information.

* [Enhancement] Provide better cross platform path name management. This
  converts paths for various platforms (e.g. Windows, MSYS, etc.) so that they
  are in the proper format for the executing platform. See [Github
  #1958](Unidata/netcdf-c#1958) for more information.

* [Bug Fixes] The nccopy program was treating -d0 as turning deflation on rather
  than interpreting it as "turn off deflation". See [Github
  #1944](Unidata/netcdf-c#1944) for more information.

* [Enhancement] Add support for storing NCZarr data in zip files. See [Github
  #1942](Unidata/netcdf-c#1942) for more information.

* [Bug Fixes] Make fillmismatch the default for DAP2 and DAP4; too many servers
  ignore this requirement.

* [Bug Fixes] Fix some memory leaks in NCZarr, fix a bug with long strides in
  NCZarr. See [Github #1913](Unidata/netcdf-c#1913) for
  more information.

* [Enhancement] Add some optimizations to NCZarr, do some cleanup of code cruft,
  add some NCZarr test cases, add a performance test to NCZarr. See [Github
  #1908](Unidata/netcdf-c#1908) for more information.

* [Bug Fix] Implement a better chunk cache system for NCZarr. The cache now uses
  extendible hashing plus a linked list to provide a combination of
  expandability, fast access, and LRU behavior. See [Github
  #1887](Unidata/netcdf-c#1887) for more information.

* [Enhancement] Provide .rc fields for S3 authentication: HTTP.S3.ACCESSID and
  HTTP.S3.SECRETKEY.

* [Enhancement] Give the client control over what parts of a DAP2 URL are URL
  encoded (i.e. %xx). This is to support the different decoding rules that
  servers apply to incoming URLS. See [Github
  #1884](Unidata/netcdf-c#1884) for more information.

* [Bug Fix] Fix incorrect time offsets from `ncdump -t`, in some cases when the
  time `units` attribute contains both a **non-zero** time-of-day, and a time
  zone suffix containing the letter "T", such as "UTC".  See [Github
  #1866](Unidata/netcdf-c#1866) for more information.

* [Bug Fix] Cleanup the NCZarr S3 build options. See [Github
  #1869](Unidata/netcdf-c#1869) for more information.

* [Bug Fix] Support aligned access for selected ARM processors.  See [Github
  #1871](Unidata/netcdf-c#1871) for more information.

* [Documentation] Migrated the documents in the NUG/ directory to the dedicated
  NUG repository found at https://github.com/Unidata/netcdf

* [Bug Fix] Revert the internal filter code to simplify it. From the user's
  point of view, the only visible change should be that (1) the functions that
  convert text to filter specs have had their signature reverted and renamed and
  have been moved to netcdf_aux.h, and (2) Some filter API functions now return
  NC_ENOFILTER when inquiry is made about some filter. Internally, the dispatch
  table has been modified to get rid of the complex structures.

* [Bug Fix] If the HDF5 byte-range Virtual File Driver is available (HDF5 1.10.6
  or later) then use it because it has better performance than the one currently
  built into the netcdf library.

* [Bug Fix] Fixed byte-range support with cURL > 7.69. See
  [Unidata/netcdf-c#1798].

* [Enhancement] Added new test for using compression with parallel I/O:
  nc_test4/tst_h_par_compress.c. See
  [Unidata/netcdf-c#1784].

* [Bug Fix] Don't return error for extra calls to nc_redef() for netCDF/HDF5
  files, unless classic model is in use. See
  [Unidata/netcdf-c#1779].

* [Enhancement] Added new parallel I/O benchmark program to mimic NOAA UFS data
  writes, built when --enable-benchmarks is in configure. See
  [Unidata/netcdf-c#1777].

* [Bug Fix] Now allow szip to be used on variables with unlimited dimension
  [Unidata/netcdf-c#1774].

* [Enhancement] Add support for cloud storage using a variant of the Zarr
  storage format. Warning: this feature is highly experimental and is subject to
  rapid evolution
  [https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in].

* [Bug Fix] Fix nccopy to properly set default chunking parameters when not
  otherwise specified. This can significantly improve performance in selected
  cases. Note that if seeing slow performance with nccopy, then, as a
  work-around, specifically set the chunking
  parameters. [Unidata/netcdf-c#1763].

* [Bug Fix] Fix some protocol bugs/differences between the netcdf-c library and
  the OPeNDAP Hyrax server. Also cleanup checksum handling
  [Unidata/netcdf-c#1712].

* [Bug Fix] IMPORTANT: Ncgen
  was not properly handling large data sections. The problem manifests as
  incorrect ordering of data in the created file. Aside from examining the file
  with ncdump, the error can be detected by running ncgen with the -lc flag (to
  produce a C file). Examine the file to see if any variable is written in pieces
  as opposed to a single call to nc_put_vara. If multiple calls to nc_put_vara are
  used to write a variable, then it is probable that the data order is
  incorrect. Such multiple writes can occur for large variables and especially
  when one of the dimensions is unlimited.

* [Bug Fix] Add necessary __declspec declarations to allow compilation of netcdf
  library without causing errors or (_declspec related) warnings
  [Unidata/netcdf-c#1725].

* [Enhancement] When a filter is applied twice with different
  parameters, then the second set is used for writing the dataset
  [Unidata/netcdf-c#1713].

* [Bug Fix] Now larger cache settings are used for sequential HDF5 file creates/opens on parallel I/O capable builds; see [Github #1716](Unidata/netcdf-c#1716) for more information.

* [Bug Fix] Add functions to libdispatch/dnotnc4.c to support
   dispatch table operations that should work for any dispatch
   table, even if they do not do anything; functions such as
   nc_inq_var_filter [Unidata/netcdf-c#1693].

* [Bug Fix] Fixed a scalar annotation error when scalar == 0; see [Github
  #1707](Unidata/netcdf-c#1707) for more information.

* [Bug Fix] Use proper CURLOPT values for VERIFYHOST and VERIFYPEER; the
  semantics for VERIFYHOST in particular changed. Documented in NUG/DAP2.md. See
  [Unidata/netcdf-c#1684].

* [Bug Fix][cmake] Correct an issue with parallel filter test logic in
  CMake-based builds.

* [Bug Fix] Now allow nc_inq_var_deflate()/nc_inq_var_szip() to be called for
  all formats, not just HDF5. Non-HDF5 files return NC_NOERR and report no
  compression in use. This reverts behavior that was changed in the 4.7.4
  release. See [Unidata/netcdf-c#1691].

* [Bug Fix] Compiling on a big-endian machine exposes some missing forward
  declarations in dfilter.c.

* [File Change] Change from HDF5 v1.6 format compatibility, back to v1.8
  compatibility, for newly created files.  The superblock changes from version 0
  back to version 2.  An exception is when using libhdf5 deprecated versions
  1.10.0 and 1.10.1, which can only create v1.6 compatible format.  Full
  backward read/write compatibility for netCDF-4 is maintained in all cases.
  See [Github #951](Unidata/netcdf-c#951).
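
The DAP2 URL-encoding entry in the 4.8.0 list above refers to `%xx` encoding and to giving the client control over which parts of a URL are encoded. The sketch below illustrates only the underlying technique: percent-encoding a string while leaving a caller-chosen set of characters untouched. The function `percent_encode` and its `allowed` parameter are hypothetical and are not part of the netcdf-c API. The options the library actually exposes are described in the linked issue.

```c
#include <stddef.h>
#include <string.h>

/* Percent-encode src into dst. ASCII letters, digits, and any character
 * listed in `allowed` are copied as-is; every other byte becomes %XX.
 * Returns the encoded length, or -1 if dst (of size dstlen, including
 * room for the terminating NUL) is too small. */
int percent_encode(const char *src, const char *allowed,
                   char *dst, size_t dstlen)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    if (dstlen == 0)
        return -1;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') || strchr(allowed, c) != NULL;
        size_t need = keep ? 1 : 3;

        /* Always leave one byte for the terminating NUL. */
        if (out + need + 1 > dstlen)
            return -1;

        if (keep) {
            dst[out++] = (char)c;
        } else {
            dst[out++] = '%';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }
    dst[out] = '\0';
    return (int)out;
}
```

Passing an empty `allowed` string encodes the brackets in `a b[0]` as `%5B` and `%5D`, while passing `"[]"` leaves them intact. That is the kind of per-server difference the entry describes.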

## 4.7.4 - March 27, 2020

* [Windows] Bumped packaged HDF5 to 1.10.6, HDF4 to 4.2.14, and libcurl to
  7.60.0.

* [Enhancement] Support has been added for HDF5-1.12.0.  See
  [Unidata/netcdf-c#1528].

* [Bug Fix] Correct behavior for the command line utilities when directly
  accessing a directory using utf8 characters. See
  [Github #1669](Unidata/netcdf-c#1669),
  [Github #1668](Unidata/netcdf-c#1668) and
  [Github #1666](Unidata/netcdf-c#1666) for more information.

* [Bug Fix] Attempts to set filters or chunked storage on scalar vars will now
  return NC_EINVAL. Scalar vars cannot be chunked, and only chunked vars can
  have filters. Previously the library ignored these attempts, and always
  stored scalars as contiguous storage. See
  [Unidata/netcdf-c#1644].

* [Enhancement] Support has been added for multiple filters per variable.  See
  [Unidata/netcdf-c#1584].

* [Enhancement] Now nc_inq_var_szip returns 0 for parameter values if szip is not
  in use for var. See [Unidata/netcdf-c#1618].

* [Enhancement] Now allow parallel I/O with filters, for HDF5-1.10.3 and
  later. See [Unidata/netcdf-c#1473].

* [Enhancement] Increased default size of cache buffer to 16 MB, from 4
  MB. Increased number of slots to 4133. See
  [Unidata/netcdf-c#1541].

* [Enhancement] Allow zlib compression to be used with parallel I/O writes, if
  HDF5 version is 1.10.3 or greater. See
  [Unidata/netcdf-c#1580].

* [Enhancement] Restore use of szip compression when writing data (including
  writing in parallel if HDF5 version is 1.10.3 or greater). See
  [Unidata/netcdf-c#1546].

* [Enhancement] Enable use of compact storage option for small vars in
  netCDF/HDF5 files. See [Unidata/netcdf-c#1570].

* [Enhancement] Updated benchmarking program bm_file.c to better handle very
  large files. See [Unidata/netcdf-c#1555].

* [Enhancement] Added version number to dispatch table, and now check version
  with nc_def_user_format(). See
  [Unidata/netcdf-c#1599].

* [Bug Fix] Fixed user setting of MPI launcher for parallel I/O HDF5 test in
  h5_test. See [Unidata/netcdf-c#1626].

* [Bug Fix] Fixed problem of growing memory when netCDF-4 files were opened and
  closed. See [Unidata/netcdf-c#1575 and
  Unidata/netcdf-c#1571].

* [Enhancement] Increased size of maximum allowed name in HDF4 files to
  NC_MAX_NAME. See [Unidata/netcdf-c#1631].

## 4.7.3 - November 20, 2019

* [Bug Fix] Fixed an issue where installs from tarballs would not compile
  properly in parallel environments.

* [Bug Fix] Library was modified so that rewriting the same attribute happens
  without deleting the attribute, to avoid a limit on how many times this may be
  done in HDF5. This fix was thought to be in 4.6.2 but was not. See
  [Unidata/netcdf-c#350].

* [Enhancement] Add a dispatch version number to netcdf_meta.h and
  libnetcdf.settings, in case we decide to change dispatch table in future. See
  [Unidata/netcdf-c#1469].

* [Bug Fix] Now testing that endianness can only be set on atomic ints and
  floats. See [Unidata/netcdf-c#1479].

* [Bug Fix] Fix for subtle error involving var and unlimited dim of the same
  name, but unrelated, in netCDF-4. See
  [Unidata/netcdf-c#1496].

* [Enhancement] Update for attribute documentation. See
  [Unidata/netcdf-c#1512].

* [Bug Fix][Enhancement] Corrected assignment of anonymous (a.k.a. phony)
  dimensions in an HDF5 file. Now when a dataset uses multiple dimensions of the
  same size, netcdf assumes they are different dimensions. See [GitHub
  #1484](Unidata/netcdf-c#1484) for more information.
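
The anonymous-dimension rule in the last entry can be illustrated with a small sketch. This is one possible reading of the rule, not the library's actual code: the `phony_table` type, the `assign_phony_dims` function, and the choice to reuse same-sized dimensions across datasets are all assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_PHONY 64

/* Hypothetical table of anonymous dimensions: sizes[i] is the length of
   the i-th phony dimension created so far. */
typedef struct {
    size_t sizes[MAX_PHONY];
    int count;
} phony_table;

/* Assign a phony dimension id to each axis of one dataset. Two axes of the
   same dataset never share an id, even when their sizes match. Returns 0 on
   success, -1 if the table is full. */
int assign_phony_dims(phony_table *t, const size_t *shape, int rank,
                      int *dimids)
{
    for (int i = 0; i < rank; i++) {
        int found = -1;
        for (int d = 0; d < t->count && found < 0; d++) {
            if (t->sizes[d] != shape[i])
                continue;
            /* Skip a same-sized dimension already used by an earlier axis. */
            int used = 0;
            for (int j = 0; j < i; j++)
                if (dimids[j] == d)
                    used = 1;
            if (!used)
                found = d;
        }
        if (found < 0) {
            if (t->count == MAX_PHONY)
                return -1;
            found = t->count;
            t->sizes[t->count++] = shape[i];
        }
        dimids[i] = found;
    }
    return 0;
}
```

Under these assumptions, a 10 x 10 dataset receives two distinct dimensions of size 10 rather than the same dimension twice.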

## 4.7.2 - October 22, 2019

* [Bug Fix][Enhancement] Various bug fixes and enhancements.

* [Bug Fix][Enhancement] Corrected an issue where protected memory was being
  written to with some pointer sleight-of-hand.  This has been in the code for a
  while, but appears to be caught by the compiler on OSX, under circumstances
  yet to be completely nailed down.  See
  [GitHub #1486] (Unidata/netcdf-c#1486) for more information.

* [Enhancement] [Parallel IO] Added support for parallel functions in MSVC. See
  [Github #1492](Unidata/netcdf-c#1492) for more
  information.

* [Enhancement] Added a function for changing the ncid of an open file.  This
  function should only be used if you know what you are doing, and is meant to
  be used primarily with PIO integration. See
  [GitHub #1483] (Unidata/netcdf-c#1483) and
  [GitHub #1487] (Unidata/netcdf-c#1487) for more information.

## 4.7.1 - August 27, 2019

* [Enhancement] Added unit_test directory, which contains unit tests for the
  libdispatch and libsrc4 code (and any other directories that want to put unit
  tests there). Use --disable-unit-tests to run without unit tests (e.g. for code
  coverage analysis). See
  [GitHub #1458] (Unidata/netcdf-c#1458).

* [Bug Fix] Remove obsolete _CRAYMPP and LOCKNUMREC macros from code. Also
  brought documentation up to date in man page. These macros were used in
  ancient times, before modern parallel I/O systems were developed. Programmers
  interested in parallel I/O should see nc_open_par() and nc_create_par(). See
  [GitHub #1459](Unidata/netcdf-c#1459).

* [Enhancement] Remove obsolete and deprecated functions nc_set_base_pe() and
  nc_inq_base_pe() from the dispatch table. (Both functions are still supported
  in the library, this is an internal change only.) See [GitHub
  #1468](Unidata/netcdf-c#1468).

* [Bug Fix] Reverted nccopy behavior so that if no -c parameters are given, then
  any default chunking is left to the netcdf-c library to decide. See [GitHub
  #1436](Unidata/netcdf-c#1436).

## 4.7.0 - April 29, 2019

* [Enhancement] Updated behavior of `pkgconfig` and `nc-config` to allow the use
  of the `--static` flags, e.g. `nc-config --libs --static`, which will show
  information for linking against `libnetcdf` statically. See
  [Github #1360] (Unidata/netcdf-c#1360) and
  [Github #1257] (Unidata/netcdf-c#1257) for more information.

* [Enhancement] Provide byte-range reading of remote datasets. This allows
  read-only access to, for example, Amazon S3 objects and also Thredds Server
  datasets via the HTTPService access method.  See
  [GitHub #1251] (Unidata/netcdf-c#1251).

* Update the license from the home-brewed NetCDF license to the standard
  3-Clause BSD License.  This change does not result in any new restrictions; it
  is merely the adoption of a standard, well-known and well-understood license
  in place of the historic NetCDF license written at Unidata.  This is part of a
  broader push by Unidata to adopt modern, standardized licensing.

## 4.6.3 - February 28, 2019

* [Bug Fix] `netcdf.pc` is now generated correctly by both `configure` and
  `cmake`.  If linking against a static netcdf, you would need to pass the
  `--static` argument to `pkg-config` in order to list all of the downstream
  dependencies.  See
  [Github #1324](Unidata/netcdf-c#1324) for more information.

* Now always write hidden coordinates attribute, which allows faster file opens
  when present. See
  [Github #1262](Unidata/netcdf-c#1262) for more information.

* Some fixes for rename, including fix for renumbering of varids after a rename
  (#1307), renaming var to dim without coordinate var. See
  [Github #1297] (Unidata/netcdf-c#1297).

* Fix of NULL parameter causing segfaults in put_vars functions. See
   [Github #1265] (Unidata/netcdf-c#1265) for more information.

* Fix of --enable-benchmark benchmark tests. See
   [Github #1211] (Unidata/netcdf-c#1211).

* Update the license from the home-brewed NetCDF license to the standard
  3-Clause BSD License.  This change does not result in any new restrictions; it
  is merely the adoption of a standard, well-known and well-understood license
  in place of the historic NetCDF license written at Unidata.  This is part of a
  broader push by Unidata to adopt modern, standardized licensing.

* [BugFix] Corrected DAP-related issues on big-endian machines. See
  [Github #1321] (Unidata/netcdf-c#1321),
  [Github #1302] (Unidata/netcdf-c#1302) for more information.

* [BugFix][Enhancement] Various and sundry bugfixes and performance
  enhancements, thanks to \@edhartnett, \@gsjarrdema, \@t-b, \@wkliao, and all
  of our other contributors.

* [Enhancement] Extended `nccopy -F` syntax to support multiple variables with a
  single invocation. See
  [Github #1311](Unidata/netcdf-c#1311) for more information.

* [BugFix] Corrected an issue where DAP2 was incorrectly converting signed
  bytes, resulting in an erroneous error message under some circumstances. See
  [GitHub #1317] (Unidata/netcdf-c#1317) for more
  information.  See
  [Github #1319] (Unidata/netcdf-c#1319) for related information.

* [BugFix][Enhancement] Modified `nccopy` so that `_NCProperties` is not copied
  over verbatim but is instead generated based on the version of `libnetcdf`
  used when copying the file.  Additionally, `_NCProperties` is now displayed
  when associated with a netcdf3 file. See
  [GitHub#803] (Unidata/netcdf-c#803) for more information.