request increase in NC_MAX_DIMS, NC_MAX_VARS #148
Sorry, forgot to specify the values that are currently recommended for use in exodus:
Option 1, increasing the values, is not going to be an option; as you point out, this could break backwards compatibility, and we have a firm commitment to our community to never do that. The second option would (at first glance) work, but we don't have the resources to pursue a change like this in support of a single project. We would be happy to review any pull requests adding this functionality and discuss merging it; if I sound cautious, it's that I'm almost certain that I'm overlooking one potential impact or another.

The most straightforward solution would be to use the netcdf-4 interface, which was engineered to support these larger models. Additionally, there are other advantages to the netcdf-4 interface, such as chunking and compression. What are your use cases where you still want to use the classic format, out of curiosity?
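(A minimal sketch, not taken from this thread, of what the suggested netcdf-4 route looks like from client code; the file name, dimension name, sizes, and deflate level are placeholder assumptions.)

```c
/* Minimal sketch: create a netCDF-4 file instead of a classic-format file.
 * The NC_NETCDF4 create mode selects the HDF5-based enhanced format, where
 * the NC_MAX_DIMS / NC_MAX_VARS limits are not enforced and chunking and
 * compression become available.  Error handling is abbreviated. */
#include <stdio.h>
#include <netcdf.h>

int main(void) {
    int ncid, dimid, varid, status;

    /* A classic file would be created with nc_create(path, NC_CLOBBER, &ncid)
     * and is the case where clients sized to NC_MAX_DIMS / NC_MAX_VARS matter. */
    status = nc_create("example.nc", NC_CLOBBER | NC_NETCDF4, &ncid);
    if (status != NC_NOERR) {
        fprintf(stderr, "%s\n", nc_strerror(status));
        return 1;
    }

    nc_def_dim(ncid, "num_nodes", 100000, &dimid);
    nc_def_var(ncid, "coordx", NC_DOUBLE, 1, &dimid, &varid);

    /* Per-variable compression is only available in the netCDF-4 format. */
    nc_def_var_deflate(ncid, varid, 0 /* shuffle */, 1 /* deflate */, 4 /* level */);

    nc_close(ncid);
    return 0;
}
```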
Will work on a pull request. The main use case for needing to continue supporting the classic format is basically laziness on the part of other consumers and producers of exodus files -- some of them do not like the added build times and complexity associated with the use of hdf5 with netcdf -- additional libraries, much longer compile time, more complicated link lines, ... Also, for smaller files, there can be a size overhead for hdf5. These are not insurmountable, and with the passage of time, the availability of netcdf-4 with hdf5 is definitely much more pervasive than it was a few years ago.
A couple possibilities, let me know which (if any) is preferred or worth pursuing:
I think I like the first option best. I don't see any currently existing function that can be used for setting options---it looks like other "options" are either set through an API function or passed as cmode to ex_create and ex_open. If the number of dims or vars in the file exceeds NC_MAX_DIMS or NC_MAX_VARS, then on ex_open the library would output a warning stating that the file is non-standard, unless NC_IGNORE_MAX_VARS_DIMS is passed as cmode to ex_open.
I am missing something here. I can understand the desire to increase NC_MAX_DIMS and NC_MAX_VARS, but I do not understand why an ignore flag is needed. What should the library do in this case?
I assume you all are aware of this comment in ddim.c:
I've been thinking about this a bit today myself, and here is my current line of thinking. While this functionality would benefit the exodus project, this is a flag we would never suggest a person use, as there are really no guarantees or safeguards in place that the resultant file will be readable. At the very least, there would also need to be some mechanism in place to add metadata to a file which captures the information re: was max_var_dim ignored, is the file 'non-standard', etc. This leads me to my next thought:
There are multiple types of "standard" netCDF files, but each has been standardized and codified. I feel an obligation to do the same for any additional file types we want to support. This leads me to two questions:
My conclusion to both is no, we don't. As I stated above, this is not a flag I can imagine recommending anybody ever use; if they need to bypass these limits, we already have a solution in place via the netCDF4 extended model/file format. Furthermore, the file they created with this flag would lose several of the cornerstone properties inherent to netCDF; it is neither standard nor is it machine portable. I've been giving this a lot of thought throughout the day, and I cannot convince myself why we would add functionality that we would actively recommend people not use. However, that doesn't mean any work you've done is wasted, or that you're on the wrong track. I think instead of patching this support into the netcdf library via a pull request, you should change the target library name to something akin to
I hope my reasoning in this makes sense.
I noticed that in dvar.c, the code was allocating arrays dimensioned with [NC_MAX_DIMS]. P.S. I think Ward's immediately preceding comment is correct.
Your reasoning on forking netcdf works for a few cases, but not for several of the projects that read exodus files and may also read other netcdf files. In that case, they would have to decide to either link with the netcdf-exodus, or link with the "standard" netcdf. Note that although it is "one project" that this is for, that one project is used as a library in many other projects. I definitely understand the backwards compatibility desire, but that is also the reason I can't just require netcdf-4 for all my users. We have a couple of decades of use that doesn't require any library other than netcdf, and suddenly forcing everyone to start building and linking with hdf5 (with/without libdl and libz and ...) messes up my backwards compatibility. Let me think about this a little more...
I'd like to provide a little context for my reasoning, and address the notion that my earlier comment expressed a "desire" for backwards compatibility. NetCDF has an iron-clad commitment to our community to always maintain 100% backwards compatibility. I believe it may also be an ongoing obligation we have to our funding source(s). The success of netCDF as a standard for use in data modeling, storage and archiving is in no small part due to this commitment. There would be many potential benefits to breaking with backwards compatibility (simplifying the code, optimizing the architecture, dropping little-used portions of the code), but to do any of this would require breaking a decades-old promise to our community. As a result, any changes that break backwards compatibility in any way are simply not on the table.

To achieve these ends, we would have to define a new data model/file format in such a way that it was mostly a netcdf3 file but had a well-defined variance, i.e. it may not adhere to the
I sympathize with what you're trying to do, and am genuinely interested in helping you find a solution; the more people using netCDF the better! However, taking on the technical debt of a non-standard version of the netCDF library by making that version 'standard', at the expense of the rest of our community, isn't a solution to this issue. I'll chime in if other solutions present themselves after some more thought, but I wanted to make sure the factors I have to consider for any solution are clearly communicated.
I understand the issues and the strong backward compatibility guarantee, and very much like netcdf's adherence to this. However, I don't think the maximum dimensions and variables are part of the data model or file format, are they? Also, why would the file not be machine independent? I'm not trying to be difficult, I'm just trying to see if there is a solution to this. From the documentation, there is the following discussion on compatibility (this is in reference to the netcdf-4 vs netcdf-3 format; I've tried to apply the same reasoning to the dims/vars change):
I don't think a new data model/file format is the answer (maybe the CDF-2 format could have no limits?). I will try to come up with a better explanation of what I am trying to accomplish and see if we can work from that. Note that we have been writing (and reading) "non-standard" netcdf files for about 25 years with minimal issues...
Just thought I would summarize what we typically have done for netcdf and exodus and why I am finding this inadequate in today's environment.
Recently, things have changed and our old workflow is not working as well as we would like...
For these and other reasons, the old method of locally modifying the netCDF library does not work as well as it has in the past. I was hoping that there was some way that we could make some small backward-compatible modifications to netcdf which would allow us to use the default netcdf library installation.

For the most part, exodus files are not examined with generic netcdf tools (however, we have used NCO operators in the past...); they are usually only read/written through the exodus library API. For this reason, I had suggested that there be an API function which disables the MAX_DIMS check for the netcdf-3 classic format and the MAX_VARS check for the netcdf-3 and netcdf-4 classic formats. I understand that this could cause issues if someone explicitly called this API function, created a netcdf file with more dimensions than the default, and then tried to operate on that file with a generic netcdf application. However, this seemed to be the lesser of the evils of other potential solutions:
None of these give me a warm feeling. The best option as I see it is to continue with the status quo and embed a modified copy of netcdf in the seacas source distribution. The benefit of this is that if our users find a generic application that does not read one of our files correctly due to the large dims and vars count, we can compile it against our library and it should then work. Hopefully this explains the issue more clearly. I am still hopeful that there is a non-local solution.
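(To make the API-function idea above concrete, here is a purely hypothetical sketch; nc_set_relax_classic_limits() and everything around it are assumptions for discussion and do not exist in netcdf-c or exodus.)

```c
/* Hypothetical sketch only: nc_set_relax_classic_limits() does NOT exist in
 * netcdf-c; it stands in for the kind of API function suggested above.  The
 * stub below just records the setting so the sketch compiles; a real
 * implementation would skip the library's NC_EMAXDIMS / NC_EMAXVARS checks
 * when the flag is set. */
#include <stdio.h>
#include <netcdf.h>

static int relax_classic_limits = 0;               /* stand-in for library state */

static int nc_set_relax_classic_limits(int relax)  /* hypothetical API */
{
    relax_classic_limits = relax;
    return NC_NOERR;
}

/* How exodus-style code might use it: create a classic (CDF-2) file that is
 * allowed to exceed the stock NC_MAX_DIMS limit.  Generic tools compiled
 * against the stock limits may still mishandle such a file, which is the
 * interoperability concern raised in this thread. */
static int create_large_classic_file(const char *path, int ndims_needed)
{
    int ncid, dimid, status, i;
    char name[32];

    nc_set_relax_classic_limits(1);                /* hypothetical call */

    if ((status = nc_create(path, NC_CLOBBER | NC_64BIT_OFFSET, &ncid)) != NC_NOERR)
        return status;

    for (i = 0; i < ndims_needed && status == NC_NOERR; i++) {
        snprintf(name, sizeof(name), "dim_%d", i);
        status = nc_def_dim(ncid, name, 10, &dimid);
    }

    nc_close(ncid);
    return status;
}
```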
Hi, I'm the maintainer of Exodus in Debian/Ubuntu (and a co-maintainer of NetCDF). Exodus was added for complete support of VisIT, which is being included in Debian. As such, I'm the proximate cause of this bug report :-) From our perspective, NetCDF will always be shipped with netcdf4 / HDF5 support, so the increased complexity is not a user issue. Perhaps adding code to exodus to insist that models exceeding NC_MAX_VARS / NC_MAX_DIMS write to netcdf4/HDF5, and fail otherwise, would solve the problem for us. There is still the issue AIUI of older exodus / classic (CDF2?) datasets that exceed the limits and could break if viewed with generic tools like NCO / ncdump. If these files exist, even if we no longer generate them (and despite them breaking netcdf compatibility promises), we should add test cases to ensure the netcdf tools we ship (nco, etc.) can handle these files.
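(A rough, hypothetical sketch of the "require netcdf4/HDF5 for over-limit models" suggestion; the wrapper name, the estimated counts, and the error handling are assumptions, not existing exodus behavior.)

```c
/* Hypothetical sketch: refuse to write an over-limit model to a classic file
 * and require the netCDF-4 storage mode (EX_NETCDF4) instead.  The wrapper
 * and its parameters are illustrative only. */
#include <stdio.h>
#include <netcdf.h>
#include <exodusII.h>

int open_for_model(const char *path, int cmode,
                   int est_num_dims, int est_num_vars,
                   int *comp_ws, int *io_ws)
{
    /* If the model would exceed the classic-format limits, insist on netCDF-4. */
    if (est_num_dims > NC_MAX_DIMS || est_num_vars > NC_MAX_VARS) {
        if (!(cmode & EX_NETCDF4)) {
            fprintf(stderr,
                    "Model needs %d dims / %d vars, which exceeds the classic "
                    "netCDF limits; pass EX_NETCDF4.\n",
                    est_num_dims, est_num_vars);
            return -1;
        }
    }
    return ex_create(path, cmode, comp_ws, io_ws);
}
```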
A couple comments
I agree that it would be good to identify which tools break with the large dims and vars sizes (and verify that they work correctly in netcdf-4-non-classic mode) so that users can be warned about using them with exodus files. ..Greg
The test to try is to modify netcdf.h so that those limits are as big as possible.
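(For concreteness, the edit being described is just raising the compile-time limits in netcdf.h; the stock values below are the long-standing defaults -- verify against your release -- and the raised values are placeholders, not the specific numbers exodus recommends.)

```c
/* Stock values in netcdf.h (long-standing classic-format defaults): */
#define NC_MAX_DIMS 1024
#define NC_MAX_VARS 8192

/* A locally modified build simply raises them, e.g. (placeholder values):
 * #define NC_MAX_DIMS 65536
 * #define NC_MAX_VARS 524288
 */
```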
BTW, has anyone contacted Charlie Zender re the effect of this on NCO?
All of the netcdf unit tests will run with the NC_MAX_DIMS and NC_MAX_VARS values set to large values; we (exodus) have been doing that for 20+ years. At one time, we had to do "unlimit" prior to running the tests, but other than that all netcdf unit tests pass with modified defines. ..Greg
I haven't yet. I have run nco that was built against a netcdf with increased limits and its tests passed and the files that I fed it worked OK. I haven't tried running nco with the default values for the limits and giving it a file that exceeds those limits. Based on a quick grep of the source code, I would expect those to fail. However I also noticed at least one array that was dimensioned as NC_MAX_DIMS that should have been dimensioned to NC_MAX_VAR_DIMS. Since those values are equal in the default netcdf, everything worked ok in the past. ..Greg
The question for NCO is: are they allocating space using some form of static allocation based on NC_MAX_DIMS?
Yes they are. If you grep the source code, there are many arrays statically allocated as "my_array[NC_MAX_DIMS]".
Below is the list. Some of these look like they should be NC_MAX_VAR_DIMS instead of NC_MAX_DIMS. For example:

s974786:nco((HEAD detached at 4.5.2))> grep -R NC_MAX_DIMS .
./src/nco_c++/nco_utl.cc: // int dmn_id[NC_MAX_DIMS]; // [id] Dimension IDs

..Greg
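(An illustrative sketch, not taken from NCO, contrasting the static-allocation style found by the grep above with a dynamic alternative that avoids depending on the compile-time limits.)

```c
/* The static form bakes the library's compile-time limit into the client and
 * also conflates NC_MAX_DIMS (per file) with NC_MAX_VAR_DIMS (per variable);
 * the dynamic form asks the library how many dimensions a variable actually
 * has, so it keeps working regardless of the limits. */
#include <stdlib.h>
#include <netcdf.h>

/* Static style, as found by the grep above: breaks if a file (or a build of
 * the library) uses more dimensions than the header the client was compiled
 * against. */
void static_style(int ncid, int varid) {
    int dmn_id[NC_MAX_DIMS];   /* should at most be NC_MAX_VAR_DIMS */
    nc_inq_vardimid(ncid, varid, dmn_id);
    /* ... */
}

/* Dynamic style: query the actual number of dimensions, then allocate. */
int dynamic_style(int ncid, int varid) {
    int ndims, status;
    int *dmn_id;

    if ((status = nc_inq_varndims(ncid, varid, &ndims)) != NC_NOERR)
        return status;
    dmn_id = malloc(ndims * sizeof *dmn_id);
    if (dmn_id == NULL)
        return NC_ENOMEM;
    status = nc_inq_vardimid(ncid, varid, dmn_id);
    /* ... use dmn_id ... */
    free(dmn_id);
    return status;
}
```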
@gdsjaar Thanks for the additional context, and I saw the pull requests you issued as well :). I'll review those shortly, but I did see @dmh's response that it looks like a good change, so that's great :). I didn't want to make it appear that I was ignoring the additional discussion here or there. My last couple of days have been consumed with the recent 4.4.0-rc4/rc5 release. I have two main issues I need to reflect on at this point; I'll avoid another long-winded message for now but they are:
So, I did a quick review of the netcdf client software in Debian, looking at this.

cmor, magics, metview, pydap, r-cran-rnetcdf, r-cran-ncdf4, scipy, ncview, ncl, grads, gmt, gdal, cdftools, etsf-io, dx: ok

The problematic codes are:
As you are probably aware, Sandia's exodus database uses netcdf, but for exodus, we need to increase the values of NC_MAX_DIMS and NC_MAX_VARS to store reasonably-sized models. In the past, this "locally-modified" netcdf library was OK; however, recently, the use of exodus has spread to more projects and people have started to provide binary packages for Ubuntu and other systems.
It is difficult (or impossible) in these cases to separate the "local netcdf" from a system-wide netcdf library, so we end up either modifying the default system-wide netcdf library, or using the default system-wide netcdf library with the default limits, which restricts the complexity of the models that can be stored in exodus.
I have a couple of proposals for changes: (1) increase the default values of NC_MAX_DIMS and NC_MAX_VARS, or (2) add an API function that disables the checks against these limits.
Neither of these solutions is optimal. The API function could cause memory issues in an application which attempted to read a file with large dim and var counts. The first option could break backward compatibility, in that an application that statically allocates data based on these values could run out of memory when linked against a new library, even though it ran with no problem against the older library.
There may be another solution to this issue that someone smarter than me can come up with; if so, please let me know.
I realize that if I use the non-classic netcdf-4 interface, the limits are not checked; however, we still have use cases where we want to use classic.
Thanks,
..Greg