CDF5 documentation lacking and a question #463

Closed
czender opened this Issue Aug 18, 2017 · 67 comments

6 participants
@czender

czender commented Aug 18, 2017

A user applying NCO arithmetic to large data files in the "new" CDF5 format is encountering issues that result (silently) in bad answers. I'm trying to track down the problem. I have not found the netCDF documentation on CDF5 format limitations on this page, so one purpose of this issue is to request the addition of CDF5 limits there (assuming that's the right place for it). The table also needs reformatting.

This PnetCDF page says

Amount of Individual Requests is Limited to 2GB: In PnetCDF, a single get/put request is limited to the amount of 2GiB....A solution for PnetCDF users is to break the request to multiple get/put calls so that each is less than amount of 2 GiB.

Forgetting about MPI, and only considering serial netCDF environments, does the 2GiB put/get request limit above apply to any netCDF file formats, including the CDF5 format? NCO does not limit its request sizes, so I wonder if that could be the problem...
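
The workaround quoted from the PnetCDF page can be sketched in plain arithmetic. The Python snippet below is only an illustration: the helper name `split_request` is hypothetical, the 2 GiB constant is taken from the quote, and it computes (start, count) pairs for a 1-D variable rather than calling any netCDF or PnetCDF API. Whether such a limit applies to serial netCDF at all is the open question of this issue.

```python
# Hypothetical helper: split one 1-D get/put request into pieces that each
# move strictly less than `limit_bytes` bytes. Not part of netCDF or PnetCDF.
TWO_GIB = 2**31  # per-request limit quoted from the PnetCDF page


def split_request(n_elems, elem_size, limit_bytes=TWO_GIB):
    """Return a list of (start, count) pairs covering n_elems elements."""
    # Largest element count whose byte size stays strictly below the limit.
    max_count = (limit_bytes - 1) // elem_size
    if max_count < 1:
        raise ValueError("element size exceeds the request limit")
    pieces = []
    start = 0
    while start < n_elems:
        count = min(max_count, n_elems - start)
        pieces.append((start, count))
        start += count
    return pieces


# One billion doubles (8 bytes each), a size chosen for illustration.
pieces = split_request(1_000_000_000, 8)
print(len(pieces), pieces[0], pieces[-1])
```

Each pair could then be passed as the start and count of a separate get/put call, so that no single call exceeds the limit.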

@DennisHeimbigner

DennisHeimbigner Aug 18, 2017

Member

Need to clarify a bit. Is the CDF5 file being accessed using NC_PNETCDF or by using the NC_CDF5 mode flag?

@czender

czender Aug 18, 2017

@wkliao

wkliao Aug 19, 2017

Contributor

Do you mean by "silently" that unexpected values appear in the file on write, and vice versa on read, and that no error codes were returned?

My understanding of NetCDF is that it first finds the number of contiguous requests (in file layout) from the arguments start and count, and then runs a loop to write one request at a time. I assume your case is to write a contiguous chunk of size > 2GB. I have never tried this before. It is likely that some internal code needs to be adjusted to handle this kind of request.
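
The decomposition described above can be illustrated as follows. This is a sketch of the general idea for a fixed-size variable stored in row-major (C) order, not the library's actual code; the function name `contiguous_runs` is hypothetical and offsets are counted in elements rather than bytes.

```python
from itertools import product
from math import prod


def contiguous_runs(shape, start, count):
    """Yield (element_offset, run_length) for each contiguous piece of the
    hyperslab [start, start+count) of a row-major array with this shape."""
    ndims = len(shape)
    if ndims == 0:
        yield (0, 1)  # a scalar is a single one-element run
        return
    # Move outward from the fastest-varying dimension: dimension k can be
    # merged into one run only if every dimension after it is selected in full.
    k = ndims - 1
    while k > 0 and start[k] == 0 and count[k] == shape[k]:
        k -= 1
    run_length = count[k] * prod(shape[k + 1:])
    # Stride (in elements) of each dimension in row-major order.
    strides = [prod(shape[d + 1:]) for d in range(ndims)]
    outer_ranges = [range(start[d], start[d] + count[d]) for d in range(k)]
    for idx in product(*outer_ranges):
        offset = sum(i * strides[d] for d, i in enumerate(idx))
        offset += start[k] * strides[k]
        yield (offset, run_length)


# A 2-D subset splits into one run per selected row ...
print(list(contiguous_runs((4, 5), (1, 2), (2, 3))))
# ... while a whole 1-D variable is a single contiguous request.
print(list(contiguous_runs((1_000_000_000,), (0,), (1_000_000_000,))))
```

Under this model, writing an entire 1-D variable of a billion doubles is one contiguous request of roughly 8 GB, which is the case assumed above.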

@czender

czender Aug 19, 2017

Yes, it would explain the problem my users are experiencing if requests for more than 2 GB from a CDF5 file do not return the requested amount. There is no problem doing this from a netCDF4 file. Again, this is just a hypothesis, I'm trying to track down a mysterious bug, and this could be it. Need confirmation.

@czender

czender Aug 21, 2017

I have now replicated workflows that strongly suggest there are undocumented differences between put/get results for large (4-8 GB) variables between CDF5 and NETCDF4 files. I don't have proof that this is a netCDF issue rather than an NCO issue, though @wkliao suggests that CDF5 get/put in the netCDF library was never tested for requests > 2 GB (and PnetCDF explicitly does not support single requests > 2 GB). This issue prevents the DOE ACME MPAS model runs at high resolution (that are archived in CDF5 format) from being properly analyzed, so it affects users now. Is anyone (@wkliao ?) interested in verifying whether CDF5 has put/get limits?

My circumstantial methods and evidence are that a billion doubles all equal to one do average to one for NETCDF4 data, and don't for CDF5 data...

ncap2 -5 -v -O -s 'one=1;defdim("dmn_1e9",1000000000);dta_1e9[dmn_1e9]=1.0;two=2;' ~/nco/data/in.nc ~/foo_big.nc5
ncap2 -4 -v -O --cnk_dmn=dmn_1e9,100000000 -s 'one=1;defdim("dmn_1e9",1000000000);dta_1e9[dmn_1e9]=1.0;two=2' ~/nco/data/in.nc ~/foo_big.nc4
ncwa -O ~/foo_big.nc5 ~/foo_avg.nc5
ncwa -O ~/foo_big.nc4 ~/foo_avg.nc4
ncks -H ~/foo_avg.nc5
ncks -H ~/foo_avg.nc4

which results in

zender@skyglow:~$ ncks -H ~/foo_avg.nc5
netcdf foo_avg {

  variables:
    double dta_1e9 ;

    int one ;

    int two ;

  data:
    dta_1e9 = 0.536870912 ; 

    one = 1 ; 

    two = 2 ; 

} // group /
zender@skyglow:~$ ncks -H ~/foo_avg.nc4
netcdf foo_avg {

  variables:
    double dta_1e9 ;

    int one ;

    int two ;

  data:
    dta_1e9 = 1 ; 

    one = 1 ; 

    two = 2 ; 

} // group /

The first ncap2 command above requires the latest NCO snapshot to support CDF5. Note that these commands both place a few other variables, named "one" and "two", around the large variable "dta_1e9" in the output file, so dta_1e9 is not the only variable. When dta_1e9 is the only variable, the CDF5-based workflow yields the correct answer! So, if my hypothesis is correct, CDF5 variables larger than some threshold size (possibly 2 GB?) are not written and/or read correctly when the nc_put/get_var() is one request for the entire variable, and there are other variables in the dataset.

The behavior is identical on netCDF 4.4.1.1 and today's daily snapshot of 4.5.1-development.
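
As a side note on the numbers above, the incorrect average has a suggestive form. The short check below only does arithmetic on values shown in this comment. Reading the result as "only the first 4 GiB of the variable came back as ones" is an interpretation that assumes every value read back was either 1.0 or 0.0; it is not something established at this point in the thread.

```python
n_elems = 1_000_000_000      # length of dmn_1e9
bad_avg = 0.536870912        # average reported for the CDF5 file
elem_size = 8                # bytes per NC_DOUBLE

# Assumption: every value read back is either 1.0 or 0.0, so the sum of the
# values (average times length) equals the number of elements that were 1.0.
ones = round(bad_avg * n_elems)

print(ones, ones == 2**29)            # number of ones, and whether it is 2^29
print(ones * elem_size == 2**32)      # whether that many doubles span 4 GiB
```

If both comparisons hold, the threshold to look for would be a 32-bit byte count rather than the 2 GiB request limit mentioned earlier.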

@DennisHeimbigner

DennisHeimbigner Aug 21, 2017

Member

Since the CDF5 code in libsrc came from pnetcdf originally, it would not be surprising if the 2 GB limit snuck in. We are going to need a simple C program to test this. I tried but ran out of memory.

@wkliao

wkliao Aug 21, 2017

Contributor

I tested a short netcdf program to mimic the I/O @czender described (the code is shown below). It creates 3 variables, namely var, one, and two. var is a 1D array of type double with 2^30 elements (DIM = 1073741824). Variables one and two are scalars. The 3 variables are first written to a new file and then read back to calculate the average. However, I could not reproduce the problem with this program.

@czender could you use this program and modify it close to what you are doing with the NCO operations?

#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define DIM 1073741824

#define ERR {if(err!=NC_NOERR){printf("Error at line %d in %s: %s\n", __LINE__,__FILE__, nc_strerror(err));nerrs++;}}

int main(int argc, char *argv[])
{
    int err, nerrs=0, ncid, dimid, varid[3], int_buf1, int_buf2;
    size_t i;
    double *buf, avg=0.0;

    err = nc_create("test.nc", NC_CLOBBER|NC_CDF5, &ncid); ERR
    err = nc_def_dim(ncid, "dim", DIM, &dimid); ERR
    err = nc_def_var(ncid, "var", NC_DOUBLE, 1, &dimid, &varid[0]); ERR
    err = nc_def_var(ncid, "one", NC_INT, 0, NULL, &varid[1]); ERR
    err = nc_def_var(ncid, "two", NC_INT, 0, NULL, &varid[2]); ERR
    err = nc_set_fill(ncid, NC_NOFILL, NULL); ERR
    err = nc_enddef(ncid); ERR

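    /* 8 GiB buffer; stop early if the allocation fails */
    buf = (double*) malloc(DIM * sizeof(double));
    if (buf == NULL) {printf("Error: malloc of %zu bytes failed\n", DIM * sizeof(double)); return 1;}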
    for (i=0; i<DIM; i++) buf[i] = 1.0;

    err = nc_put_var_double(ncid, varid[0], buf); ERR
    int_buf1 = 1;
    err = nc_put_var_int(ncid, varid[1], &int_buf1); ERR
    int_buf2 = 2;
    err = nc_put_var_int(ncid, varid[2], &int_buf2); ERR
    err = nc_close(ncid); ERR

    err = nc_open("test.nc", NC_NOWRITE, &ncid); ERR
    err = nc_inq_varid(ncid, "var", &varid[0]); ERR
    err = nc_inq_varid(ncid, "one", &varid[1]); ERR
    err = nc_inq_varid(ncid, "two", &varid[2]); ERR
    for (i=0; i<DIM; i++) buf[i] = 0.0;
    err = nc_get_var_double(ncid, varid[0], buf); ERR
    int_buf1 = int_buf2 = 0;
    err = nc_get_var_int(ncid, varid[1], &int_buf1); ERR
    err = nc_get_var_int(ncid, varid[2], &int_buf2); ERR
    err = nc_close(ncid); ERR

    printf("get var one = %d\n",int_buf1);
    printf("get var two = %d\n",int_buf2);
    for (i=0; i<DIM; i++) avg += buf[i];
    avg /= DIM;
    printf("avg = %f\n",avg);
    free(buf);

    return (nerrs > 0);
}
% ./tst_cdf5 
get var one = 1
get var two = 2
avg = 1.000000

% ls -lh
total 8.1G
-rw------- 1 wkliao users 2.2K Aug 21 16:16 Makefile
-rw-r--r-- 1 wkliao users 8.1G Aug 21 16:29 test.nc
-rwxr-xr-x 1 wkliao users 1.1M Aug 21 16:27 tst_cdf5
-rw------- 1 wkliao users 1.8K Aug 21 16:27 tst_cdf5.c

% ncdump -h test.nc
netcdf test {
dimensions:
	dim = 1073741824 ;
variables:
	double var(dim) ;
	int one ;
	int two ;
}
@czender

czender Aug 22, 2017

Thanks @wkliao. This is a good starting point. I made a few changes to more closely follow the NCO code path. However, I still get the same results you do. Will keep trying...

@WardF WardF self-assigned this Aug 30, 2017

@WardF WardF added this to the 4.5.0 milestone Aug 30, 2017

@WardF WardF added the type/bug label Aug 30, 2017

@czender

czender Aug 31, 2017

@wkliao I notice something I do not understand about CDF5: I have a netCDF4 file with two variables whose on-disk sizes compute as 9 GB and 3 GB, respectively. By "computes as" I mean multiplying the dimension sizes times the size of NC_DOUBLE. And the netCDF4 (uncompressed) filesize is, indeed, 12 GB, as I expect. Yet when I convert that netCDF4 file to CDF5, the total size changes from 12 GB to 9 GB. In other words, it looks like CDF5 makes use of compression, or does not allocate wasted space for _FillValues, or something like that. If you understand what I'm talking about, please explain why CDF5 consumes less filespace than expected...

@wkliao

wkliao Aug 31, 2017

Contributor

The file size should be 12GB. Do you have a test program that can reproduce this?

@czender

czender Aug 31, 2017

Uh oh. It's hard to wrap my head around all this. The mysterious issues only appear with huge files that are hard to manipulate. The issues when processing with NCO may be due to NCO, but in order to verify I have to use another toolkit. CDO does not yet support CDF5. And nccopy either fails (with -V) to extract the variables I want, or (with -v) extracts all the variables so I can't mimic the NCO workflow...

zender@skyglow:~$ nccopy -V one_dmn_rec_var,two_dmn_rec_var ~/nco/data/in.nc ~/foo.nc
NetCDF: Variable not found
Location: file nccopy.c; line 979
zender@skyglow:~$ nccopy -v one_dmn_rec_var,two_dmn_rec_var ~/nco/data/in.nc ~/foo.nc
zender@skyglow:~$ ncdump ~/foo.nc | m
netcdf foo {
dimensions:
        dgn = 1 ;
        bnd = 2 ;
        lat = 2 ;
        lat_grd = 3 ;
        lev = 3 ;
        rlev = 3 ;
        ilev = 4 ;
        lon = 4 ;
        lon_grd = 5 ;
        char_dmn_lng80 = 80 ;
        char_dmn_lng26 = 26 ;
        char_dmn_lng04 = 4 ;
        date_dmn = 5 ;
        fl_dmn = 3 ;
        lsmlev = 6 ;
        wvl = 2 ;
        time_udunits = 3 ;
        lon_T42 = 128 ;
        lat_T42 = 64 ;
        lat_times_lon = 8 ;
        gds_crd = 8 ;
        gds_ncd = 8 ;
        vrt_nbr = 2 ;
        lon_cal = 10 ;
        lat_cal = 10 ;
        Lon = 4 ;
        Lat = 2 ;
        time = UNLIMITED ; // (10 currently)
variables:
        int date_int(date_dmn) ;
                date_int:long_name = "Date (as array of ints: YYYY,MM,DD,HH,MM)" ;
        float dgn(dgn) ;
                dgn:long_name = "degenerate coordinate (dgn means degenerate, i.e., of size 1)" ;
...

Until I find another way to subset variables from a CDF5 file, I'm stuck. @WardF would you please instruct me how to use nccopy to subset certain variables from a CDF5 file? I think the above output demonstrates that nccopy (yes, the latest 4.5.x) has some breakage with the -v and -V options.

@WardF

WardF Aug 31, 2017

Member

I will take a look at -v/-V and see what's going on. The original files are obviously quite large, so I'll see if I can recreate this locally with a file on hand.

@czender

czender Aug 31, 2017

The in.nc file on which the above commands were performed to demonstrate the nccopy -v/-V weirdness is tiny. The same behavior should occur with any file you like.

@WardF

WardF Aug 31, 2017

Member

@DennisHeimbigner This feels like an issue we've seen and (I thought) addressed, recently. Does this ring any bells for you? Maybe the fix is on a branch that I neglected to merge. Going to look now.

@WardF

WardF Aug 31, 2017

Member

Ok. Similar issue, although the issue claims it is 64-bit offset only and this is not the case. I'll update the original issue.

I can copy in.nc (via nccopy) from classic to netcdf-4 classic; the commands Charlie outlined above work on this new file, but fail on the old one.

@WardF

WardF Aug 31, 2017

Member

Found a code stanza in nccopy.c starting at line 1451. The comment seems of interest here.

/* For performance, special case netCDF-3 input or output file with record
     * variables, to copy a record-at-a-time instead of a
     * variable-at-a-time. */
    /* TODO: check that these special cases work with -v option */
    if(nc3_special_case(igrp, inkind)) {
	size_t nfixed_vars, nrec_vars;
	int *fixed_varids;
	int *rec_varids;
	NC_CHECK(classify_vars(igrp, &nfixed_vars, &fixed_varids, &nrec_vars, &rec_varids));
	NC_CHECK(copy_fixed_size_data(igrp, ogrp, nfixed_vars, fixed_varids)); //FAILURE IS DOWNSTREAM FROM HERE
	NC_CHECK(copy_record_data(igrp, ogrp, nrec_vars, rec_varids));
    } else if (nc3_special_case(ogrp, outkind)) {

@DennisHeimbigner

DennisHeimbigner Aug 31, 2017

Member

Have you tried to disable this optimization to see if it then starts working ok?

@czender

czender Aug 31, 2017

@wkliao This 9 GB CDF5 file contains two variables whose uncompressed sizes are 9 GB and 3 GB and so should require 12 GB of disk space to store. Inspection with ncdump/ncks shows there are data in both variables. When I convert it to netCDF4, the resulting file is, indeed, 12 GB. Can you tell me anything about whether the CDF5 file is legal, or corrupt, or when/where/how in the writing process it may have been truncated?

@czender czender closed this Aug 31, 2017

@czender czender reopened this Aug 31, 2017

@WardF

WardF Aug 31, 2017

Member

@DennisHeimbigner Yes I found the issue, it is unrelated to optimization (in terms of the nccopy issue, not what @czender has observed with file sizes). I'm working on a fix right now.

@WardF

WardF Aug 31, 2017

Member

Ok, I think I have a fix for the nccopy -V/-v issue @czender references above. It is in the branch gh425. I have not run any regression tests yet so I can't say it's the fix, but I will pick it up tomorrow and continue. I can say that nccopy -V/-v is working as expected for this specific test case, however. Pushing out to github, will see what travis says.

WardF added a commit that referenced this issue Aug 31, 2017

Corrected an issue with netcdf3 files that would prevent the -v and -V flags from working properly when using nccopy. See #425 and #463 for more information.
@wkliao

wkliao Aug 31, 2017

Contributor

Regarding the file size issue: first, none of the classical formats, including CDF-5, does any compression.
One possibility for what you encountered is when this CDF-5 file was created, not all elements of the second variable were written and the fill mode was turned off (If the file was created by a PnetCDF program, please note the fill mode is off by default in PnetCDF). In this case, because the second variable was not fully written, the file size can be less than 12GB.

There is a PnetCDF utility program called ncoffsets. Command "ncoffsets file.nc" prints the starting and ending file offsets of individual variables defined in a classical file. When used for the above CDF-5 file, it should print the ending offset 12GB for the second variable, but command "ls -l" can still show a number less than that. Please give it a try and let me know.
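
The effect described above, where the header implies a larger extent than "ls -l" reports, can be reproduced with an ordinary file. The snippet below is an analogy under stated assumptions, not netCDF code: the layout is a hypothetical pair of variables with made-up sizes, and the unwritten region is never pre-filled, which mimics no-fill mode.

```python
import os
import tempfile

# Hypothetical layout: (name, begin offset, size in bytes), packed back to back.
layout = [("var1", 0, 9_000), ("var2", 9_000, 3_000)]
expected_end = layout[-1][1] + layout[-1][2]  # ending offset of the last variable

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nofill.bin")
    with open(path, "wb") as f:
        # Write all of var1 ...
        f.seek(layout[0][1])
        f.write(b"\x01" * layout[0][2])
        # ... but only the first 100 bytes of var2, with no fill afterwards.
        f.seek(layout[1][1])
        f.write(b"\x01" * 100)
    actual_size = os.path.getsize(path)

print(expected_end, actual_size)
```

The file ends where the last written byte ends, so its size on disk can be smaller than the ending offset computed from the variable definitions.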

@czender

czender Sep 4, 2017

@WardF please let me know when the nccopy fixes are in master so I can check whether nccopy and NCO give the same answers when subsetting huge CDF5 files.

@czender

czender Sep 11, 2017

@wkliao need to know if you think intercepting nc_put_var?_*() to split single CDF5 write requests for data buffers larger than N into multiple write requests of buffers smaller than N will avoid this bug. And, if so, what is N? And do you still think that CDF5 reads are not affected?

@WardF

WardF Sep 11, 2017

Member

When running the python tests against libnetcdf built with the patch from @wkliao, I see the following (on 64-bit systems only).

This will need to be sorted out before merging this fix in or saying that it 'fixes' the problem.

netcdf4-python version: 1.3.0
HDF5 lib version:       1.8.19
netcdf lib version:     4.5.1-development
numpy version           1.11.0
...............................foo_bar
.http://remotetest.unidata.ucar.edu/thredds/dodsC/testdods/testData.nc => /tmp/occookieKmOOvt
..............................................python: /home/tester/netcdf-c/libsrc/nc3internal.c:794: NC_endef: Assertion `ncp->begin_rec >= ncp->old->begin_rec' failed.

It's possible it is a problem with the python test; I'll ask our in-house python guys and see what they say :)

Member

WardF commented Sep 11, 2017

I've determined the python test which is failing is tst_cdf5.py.

Member

WardF commented Sep 11, 2017

The test is as follows; does anything leap out?

def setUp(self):
    self.netcdf_file = FILE_NAME
    nc = Dataset(self.netcdf_file, 'w', format='NETCDF3_64BIT_DATA')
    # create a 64-bit dimension
    d = nc.createDimension('dim', dimsize)
    # create an 8-bit unsigned integer variable
    v = nc.createVariable('var', np.uint8, 'dim')
    v[:ndim] = arrdata
    nc.close()
Contributor

wkliao commented Sep 12, 2017

@czender

The bug appears when more than one large variable is defined in a new file, so splitting a large put request into smaller ones will not fix it.

If you are developing a workaround in NCO, I suggest checking the number of large variables, creating a new file that contains only one large variable, and making sure that variable is defined last.

I still believe the bug affects writes only, as the fixes I developed are in subroutines called only by the file header writer. However, it would be better to have a test program to check.

czender commented Sep 12, 2017

@wkliao does "large" in your message above mean 2 GiB or 4 GiB or ...?

Contributor

wkliao commented Sep 12, 2017

It is mentioned in one of my previous posts. Here it is, copied and pasted:

Large variables here means their size each is > 2^31-3 bytes for CDF-1 and 2^32-3 bytes for CDF-2. See NetCDF Format Limitations

czender commented Sep 12, 2017

I don't understand. I'm talking about writing a CDF5 file with netCDF 4.4.x. Not CDF1 or CDF2. What is the largest variable I can safely write as the last variable in a CDF5 file?

Contributor

wkliao commented Sep 12, 2017

Sorry, let me rephrase. When using netCDF 4.4.x to create a new CDF-5 file, the file can contain at most one large variable, and it must be defined last. Here a large variable is one whose size is > 2^32-3 bytes.

To be honest, I do not really recommend a workaround for netCDF 4.4.x, because the above suggestion has never been fully tested. It is based on my understanding of the root cause of the bug.

czender commented Sep 12, 2017

That's OK. I'm not writing a workaround. I'm writing a diagnostic WARNING message for those who, in the future, with NCO 4.6.9+, attempt to write a CDF5 file with netCDF 4.4.x that may trigger the bug.
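
A check of that kind could be sketched roughly as below. It only encodes the rule of thumb given earlier in this thread (at most one variable larger than 2^32-3 bytes, and that variable defined last), which was described as not fully tested. The function name and the list-of-sizes input are hypothetical, and record variables are not treated specially; this is not NCO's actual code.

```python
# Threshold quoted in this thread for CDF-5 files written with netCDF 4.4.x.
LARGE_VAR_BYTES = 2**32 - 3


def cdf5_write_at_risk(var_sizes_bytes):
    """Return True if a planned CDF-5 file breaks the rule of thumb.

    var_sizes_bytes: sizes of the variables in bytes, in definition order
    (hypothetical input; a real tool would compute these from the metadata).
    """
    # Positions of every variable that counts as "large".
    large = [i for i, size in enumerate(var_sizes_bytes) if size > LARGE_VAR_BYTES]
    if not large:
        return False  # no large variables, nothing to warn about
    if len(large) > 1:
        return True   # more than one large variable
    # Exactly one large variable: only considered safe if defined last.
    return large[0] != len(var_sizes_bytes) - 1


if __name__ == "__main__":
    if cdf5_write_at_risk([8 * 2**30, 1024]):
        print("WARNING: this CDF5 layout may trigger the netCDF 4.4.x bug")
```

A caller would emit the warning only when the library version is also in the affected range.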

Member

DennisHeimbigner commented Sep 12, 2017

Ward, you should be able to create a C program equivalent to that Python program
to see if it also fails.

Member

WardF commented Sep 21, 2017

I've added a couple of configure-time options for disabling cdf5 support. Eventually this will be set automatically for 32-bit platforms. I may turn it off by default for the next release candidate just for the sake of expediency, but I don't want to cause any problems for dependent packages. @czender does NCO assume cdf5 support? Or does it query the netcdf library for it?

czender commented Sep 21, 2017

Thanks for asking. NCO assumes CDF5 when linked to 4.4.x or greater, i.e.,

if(NC_LIB_VERSION >= 440){ CDF5 stuff }else{ WARN no CDF5 support}

We could shift to, e.g., an #ifdef HAVE_CDF5 method if given a heads-up.

Member

WardF commented Sep 21, 2017

@czender nc-config has a new flag --has-cdf5, would querying that work?

Member

WardF commented Sep 21, 2017

By the time we're done, having CDF5 support would be the case in 99% of installations. However, being able to toggle it lets us craft a release while still working to sort out the issues outlined above. I'm going to refresh my memory on the above and also go review the @wkliao pull requests so that we can move forward.

czender commented Sep 21, 2017

Umm, sort of. We can cue from nc-config --has-cdf5 with v. 4.5.x. Earlier versions will not have this and nc-config will crash if asked about CDF5. So NCO would need to implement a multi-staged rule where autoconf/cmake first finds the version, makes that machine-parseable, then sets HAVE_CDF5 to No for < 4.4.x, to Yes for 4.4.x, and queries nc-config for >= 4.5.x. Or something like that. Or NCO could just do nothing and fail with UNKNOWN_FORMAT on CDF5 files.
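
The multi-staged rule could look something like the sketch below, written in Python only for illustration (a real build would express it in autoconf or cmake). The helper names are hypothetical, and two things are assumptions rather than facts from this thread: that the version string contains a "major.minor" pair such as "netCDF 4.5.0", and that nc-config --has-cdf5 answers "yes" or "no".

```python
import re


def parse_version(text):
    """Extract (major, minor) from a version string, e.g. 'netCDF 4.5.0'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return int(match.group(1)), int(match.group(2))


def have_cdf5(version, nc_config_has_cdf5=None):
    """Decide HAVE_CDF5 following the staged rule described above.

    version: (major, minor) tuple of the netCDF library.
    nc_config_has_cdf5: text printed by `nc-config --has-cdf5`, or None if
    it was not queried (assumed to be 'yes' or 'no').
    """
    if version < (4, 4):
        return False  # older than 4.4.x: no CDF5 support
    if version < (4, 5):
        return True   # 4.4.x: CDF5 assumed present, nc-config cannot be asked
    # 4.5.x and later: CDF5 is a configure-time option, so trust nc-config.
    if nc_config_has_cdf5 is None:
        return False
    return nc_config_has_cdf5.strip().lower() == "yes"
```

For example, `have_cdf5(parse_version("netCDF 4.4.1"))` takes the 4.4.x branch without consulting nc-config, while a 4.5.x library is only treated as CDF5-capable when the query result is supplied.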

czender commented Oct 21, 2017

It appears that netCDF 4.5.0 has been released without a fix for this CDF5 issue. My collaborators want netCDF with dependable CDF5 on 64-bit machines and do not care about 32-bit environments. Is it likely their issues will be addressed in 4.5.1? Or is future netCDF support for CDF5 uncertain? Or...?

Member

WardF commented Oct 21, 2017

It will be addressed in 4.5.1 insofar as we will enforce no cdf5 writing on 32-bit. You can enable cdf5 in 4.5.0 at configure time with --enable-cdf5. The release had languished enough that disabling it (by default) and getting 4.5.0 out the door was necessary.

czender commented Oct 21, 2017

To be clear, 4.5.0 does not seem to address the problem I reported in this issue. Will some version of #478, that appears to solve the problem I reported, and that fixes the test I wrote, be in 4.5.1?

czender commented Jan 9, 2018

There have been no recent updates on this. Is a fix still planned? for 4.6.0? In my other hat as a GCM developer this is the most critical netCDF bug I am aware of because it prevents analysis of high-resolution simulations conducted with CDF5 format, and there is no easy workaround. There is some resistance/inertia to netCDF4 in the GCM community because PnetCDF and CDF5 have advantages in parallel speed and familiarity. However, this persistent bug is causing at least one GCM group to seriously consider alternatives to CDF5.

Contributor

wkliao commented Jan 9, 2018

The fix is in #478. At least in my opinion it is ready.
It has been marked as a 4.6.0 milestone.

ckhroulev commented Jan 25, 2018

We (@pism developers) also think that this is the most critical netCDF bug we know of.

We would like to use CDF5 but cannot for exactly the reasons listed by @czender : this bug "prevents analysis of high-resolution simulations conducted with CDF5 format, and there is no easy workaround."

@WardF WardF modified the milestones: 4.5.0, 4.6.1 Jan 25, 2018

Member

WardF commented Jan 25, 2018

Getting #478 merged now.

Member

WardF commented Jan 29, 2018

Had a couple bumps with #478 on ARM, re-evaluating now.

Contributor

edhartnett commented Jan 29, 2018

I have merged this into HPC netCDF and it passes all tests. (But I am not testing on my ARM. I will turn that on...)

Member

WardF commented Mar 13, 2018

Fixed with #478 being merged.

@WardF WardF closed this Mar 13, 2018

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue May 16, 2018

wen
Update to 4.6.1
Upstream changes:
## 4.6.1 - March 15, 2018

* [Bug Fix] Corrected an issue which could result in a dap4 failure. See [Github #888](Unidata/netcdf-c#888) for more information.
* [Bug Fix][Enhancement] Allow `nccopy` to control output filter suppression.  See [Github #894](Unidata/netcdf-c#894) for more information.
* [Enhancement] Reverted some new behaviors that, while in line with the netCDF specification, broke existing workflows.  See [Github #843](Unidata/netcdf-c#843) for more information.
* [Bug Fix] Improved support for CRT builds with Visual Studio, improves zlib detection in hdf5 library. See [Github #853](Unidata/netcdf-c#853) for more information.
* [Enhancement][Internal] Moved HDF4 into a distinct dispatch layer. See [Github #849](Unidata/netcdf-c#849) for more information.

## 4.6.0 - January 24, 2018
* [Enhancement] Full support for using HDF5 dynamic filters, both for reading and writing. See the file docs/filters.md.
* [Enhancement] Added an option to enable strict null-byte padding for headers; this padding was specified in the spec but was not enforced.  Enabling this option will allow you to check your files, as it will return an E_NULLPAD error.  It is possible for these files to have been written by older versions of libnetcdf.  There is no effective problem caused by this lack of null padding, so enabling these options is informational only.  The options for `configure` and `cmake` are `--enable-strict-null-byte-header-padding` and `-DENABLE_STRICT_NULL_BYTE_HEADER_PADDING`, respectively.  See [Github #657](Unidata/netcdf-c#657) for more information.
* [Enhancement] Reverted behavior/handling of out-of-range attribute values to pre-4.5.0 default. See [Github #512](Unidata/netcdf-c#512) for more information.
* [Bug] Fixed error in tst_parallel2.c. See [Github #545](Unidata/netcdf-c#545) for more information.
* [Bug] Fixed handling of corrupt files + proper offset handling for hdf5 files. See [Github #552](Unidata/netcdf-c#552) for more information.
* [Bug] Corrected a memory overflow in `tst_h_dimscales`, see [Github #511](Unidata/netcdf-c#511), [Github #505](Unidata/netcdf-c#505), [Github #363](Unidata/netcdf-c#363) and [Github #244](Unidata/netcdf-c#244) for more information.

## 4.5.0 - October 20, 2017

* Corrected an issue which could potentially result in a hang while using parallel file I/O. See [Github #449](Unidata/netcdf-c#449) for more information.
* Addressed an issue with `ncdump` not properly handling dates on a 366 day calendar. See [GitHub #359](Unidata/netcdf-c#359) for more information.

### 4.5.0-rc3 - September 29, 2017

* [Update] Due to ongoing issues, native CDF5 support has been disabled by **default**.  You can enable it with the options mentioned below (`--enable-cdf5` or `-DENABLE_CDF5=TRUE` for `configure` or `cmake`, respectively).  Just be aware that, for the time being, reading/writing CDF5 files on 32-bit platforms may result in unexpected behavior when using extremely large variables.  For 32-bit platforms it is best to continue using `NC_FORMAT_64BIT_OFFSET`.
* [Bug] Corrected an issue where older versions of curl might fail. See [GitHub #487](Unidata/netcdf-c#487) for more information.
* [Enhancement] Added options to enable/disable `CDF5` support at configure time for autotools and cmake-based builds.  The options are `--enable/disable-cdf5` and `ENABLE_CDF5`, respectively.  See [Github #484](Unidata/netcdf-c#484) for more information.
* [Bug Fix] Corrected an issue when subsetting a netcdf3 file via `nccopy -v/-V`. See [Github #425](Unidata/netcdf-c#425) and [Github #463](Unidata/netcdf-c#463) for more information.
* [Bug Fix] Corrected `--has-dap` and `--has-dap4` output for cmake-based builds. See [GitHub #473](Unidata/netcdf-c#473) for more information.
* [Bug Fix] Corrected an issue where `NC_64BIT_DATA` files were being read incorrectly by ncdump, despite the data having been written correctly.  See [GitHub #457](Unidata/netcdf-c#457) for more information.
* [Bug Fix] Corrected a potential stack buffer overflow.  See [GitHub #450](Unidata/netcdf-c#450) for more information.

### 4.5.0-rc2 - August 7, 2017

* [Bug Fix] Addressed an issue with how cmake was implementing large file support on 32-bit systems. See [GitHub #385](Unidata/netcdf-c#385) for more information.
* [Bug Fix] Addressed an issue where ncgen would not respect keyword case. See [GitHub #310](Unidata/netcdf-c#310) for more information.

### 4.5.0-rc1 - June 5, 2017

* [Enhancement] DAP4 is now included. Since dap2 is the default for urls, dap4 must be specified by
(1) using "dap4:" as the url protocol, or
(2) appending "#protocol=dap4" to the end of the url, or
(3) appending "#dap4" to the end of the url
Note that dap4 is enabled by default but remote-testing is
disabled until the testserver situation is resolved.
* [Enhancement] The remote testing server can now be specified with the `--with-testserver` option to ./configure.
* [Enhancement] Modified netCDF4 to use ASCII for NC_CHAR.  See [Github Pull request #316](Unidata/netcdf-c#316) for more information.
* [Bug Fix] Corrected an error with how dimsizes might be read. See [Github #410](Unidata/netcdf-c#410) for more information.
* [Bug Fix] Corrected an issue where 'make check' would fail if 'make' or 'make all' had not run first.  See [Github #339](Unidata/netcdf-c#339) for more information.
* [Bug Fix] Corrected an issue on Windows with Large file tests. See [Github #385](Unidata/netcdf-c#385) for more information.
* [Bug Fix] Corrected an issue with diskless file access, see [Pull Request #400](Unidata/netcdf-c#400) and [Pull Request #403](Unidata/netcdf-c#403) for more information.
* [Upgrade] The bash based test scripts have been upgraded to use a common test_common.sh include file that isolates build specific information.
* [Refactor] the oc2 library is no longer independent of the main netcdf-c library. For example, it now uses ncuri, nclist, and ncbytes instead of its homegrown equivalents.
* [Bug Fix] `NC_EGLOBAL` is now properly returned when attempting to set a global `_FillValue` attribute. See [GitHub #388](Unidata/netcdf-c#388) and [GitHub #389](Unidata/netcdf-c#389) for more information.
* [Bug Fix] Corrected an issue where data loss would occur when `_FillValue` was mistakenly allowed to be redefined.  See [Github #390](Unidata/netcdf-c#390), [GitHub #387](Unidata/netcdf-c#387) for more information.
* [Upgrade][Bug] Corrected an issue regarding how "orphaned" DAS attributes were handled. See [GitHub #376](Unidata/netcdf-c#376) for more information.
* [Upgrade] Update utf8proc.[ch] to use the version now maintained by the Julia Language project (https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).
* [Bug] Addressed conversion problem with Windows sscanf.  This primarily affected some OPeNDAP URLs on Windows.  See [GitHub #365](Unidata/netcdf-c#365) and [GitHub #366](Unidata/netcdf-c#366) for more information.
* [Enhancement] Added support for HDF5 collective metadata operations when available. Patch submitted by Greg Sjaardema, see [Pull request #335](Unidata/netcdf-c#335) for more information.
* [Bug] Addressed a potential type punning issue. See [GitHub #351](Unidata/netcdf-c#351) for more information.
* [Bug] Addressed an issue where netCDF wouldn't build on Windows systems using MSVC 2012. See [GitHub #304](Unidata/netcdf-c#304) for more information.
* [Bug] Fixed an issue related to potential type punning, see [GitHub #344](Unidata/netcdf-c#344) for more information.
* [Enhancement] Incorporated an enhancement provided by Greg Sjaardema, which may improve read/write times for some complex files.  Basically, linked lists were replaced in some locations where it was safe to use an array/table.  See [Pull request #328](Unidata/netcdf-c#328) for more information.