CDF-5 fix: let NC_var.len be the true size of variable #478

Merged
17 commits merged into Unidata:master from wkliao:cdf5_var_len on Feb 4, 2018

Conversation

4 participants

wkliao (Contributor) commented Sep 7, 2017

This PR should fix the problem discussed in #463.

DennisHeimbigner (Member) commented Sep 7, 2017

OK, after checking, this change is incorrect. UINT_MAX is used as a flag to indicate 64-bit offset, so we need to keep it. A better long-term solution is to change all of the current size_t fields, such as ->len and ->recsize, to unsigned long long. Then we can properly compute lengths without having to worry about that special flag. However, we do need to be careful when writing out the file length to keep the flag in the netCDF file that indicates a large file size.

DennisHeimbigner (Member) commented Sep 8, 2017

Whatever fix we get, I think it is important to test the following six configurations: (64-bit, 32-bit) x (netCDF-3, CDF-5, 64-bit offset). We also need to make sure to enable the large-file tests and to test with Charlie's ncap2 tests.

WardF (Member) commented Sep 11, 2017

When running the Python tests against libnetcdf built with this patch, I see the following (on 64-bit systems only). This will need to be sorted out before merging this fix or claiming that it 'fixes' the problem.

netcdf4-python version: 1.3.0
HDF5 lib version:       1.8.19
netcdf lib version:     4.5.1-development
numpy version           1.11.0
...............................foo_bar
.http://remotetest.unidata.ucar.edu/thredds/dodsC/testdods/testData.nc => /tmp/occookieKmOOvt
..............................................python: /home/tester/netcdf-c/libsrc/nc3internal.c:794: NC_endef: Assertion `ncp->begin_rec >= ncp->old->begin_rec' failed.
wkliao (Contributor) commented Sep 12, 2017

Where can I obtain netcdf4-python version 1.3.0? I only see 1.2.9.

wkliao (Contributor) commented Sep 12, 2017

@WardF I do not have a netcdf4-python installation nearby. Could you do a quick check of this new additional patch against the Python test program?

wkliao (Contributor) commented Sep 12, 2017

@WardF I think you were using branch gh478 to run this test, because the assert statement at line 794 of libsrc/nc3internal.c in gh478 has moved to line 797 in my patch.

It appears that gh478 does not include my patch (this PR, #478).

WardF (Member) commented Sep 12, 2017

You are correct; gh478 does not contain your patch. I merged it locally into a temporary branch and observed the failures. I will test the updated patch against netCDF Python tomorrow and follow up here.

wkliao (Contributor) commented Sep 12, 2017

Regarding developing test programs for 32-bit machines: I think netCDF may not be able to support CDF-5 on 32-bit machines. The main obstacle is that size_t is a 4-byte integer on 32-bit platforms, and most of the netCDF APIs have arguments of type size_t. For instance, one cannot define a dimension of size > 2^32, because the "len" argument of nc_def_dim is of type size_t:

     int nc_def_dim (int ncid, const char *name, size_t len, int *dimidp);

To support CDF-5 on 32-bit machines, netCDF would need to change its APIs, which would not be backward compatible. Maybe the only option is to disable CDF-5 on 32-bit machines?

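To make the size_t limitation concrete, here is a small standalone illustration (not part of this PR; everything in it is illustrative): on a build where sizeof(size_t) is 4, a requested length of 2^32 elements cannot survive conversion to the size_t parameter of nc_def_dim.

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    /* A CDF-5 dimension length that exceeds what a 4-byte size_t can hold. */
    unsigned long long requested = 4294967296ULL;   /* 2^32 elements */
    size_t len = (size_t) requested;                /* truncates to 0 when sizeof(size_t) == 4 */

    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    printf("requested = %llu, as size_t = %llu\n",
           requested, (unsigned long long) len);
    return 0;
}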
WardF (Member) commented Sep 12, 2017

@wkliao 1.3.0 is coming through from the master branch being used for the tests. I'll check 1.2.9 as well.

WardF (Member) commented Sep 12, 2017

1.2.9 also fails in the same place. CDF-5 on 32-bit machines is problematic and I will have to think about it. In the short term we definitely need to add a way to make CDF-5 support optional at configure time. If we are on a 32-bit platform we could disable it by default, if that is the approach we decide to take. In any event it would give us an avenue for providing a release while working around CDF-5-specific issues.

For what it is worth, this netcdf4-python failure only manifests on 64-bit platforms, not 32-bit platforms.

wkliao (Contributor) commented Sep 13, 2017

That Python program, tst_cdf5.py, is itself also problematic. Line 7 sets "dimsize" to the maximum value of a signed 64-bit integer, 9223372036854775807, i.e. NC_MAX_INT64. (By the way, the comment on that line should be fixed.) Then line 19 defines a 1-D variable of type unsigned int with NC_MAX_INT64 elements, which makes the variable size (= NC_MAX_INT64 x 4 bytes) bigger than an unsigned 64-bit integer can represent (overflow). In this case, netCDF should throw an error code such as NC_EINTOVERFLOW. By the way, in PnetCDF the limit is the maximum signed 64-bit integer (X_INT64_MAX - 3), not unsigned, because we use MPI_Offset, which is a signed long long.

...
  7 dimsize = np.iinfo(np.int64).max # max unsigned 64 bit integer
...
 17         d = nc.createDimension('dim',dimsize) # 64-bit dimension
 18         # create an 8-bit unsigned integer variable
 19         v = nc.createVariable('var',np.uint8,'dim')

I wrote a C program to mimic the Python test program and it ran without error. However, the correct behavior should be throwing NC_EVARSIZE.

% cat big_dim.c
#include <stdio.h>
#include <netcdf.h>

#define ERR {if(err!=NC_NOERR){printf("Error at line %d in %s: %s\n", __LINE__,__FILE__, nc_strerror(err));nerrs++;}}

int main(int argc, char *argv[])
{
    int err, nerrs=0, ncid, dimid, varid;
    err = nc_create("cdf5.nc", NC_CLOBBER|NC_64BIT_DATA, &ncid); ERR;
    err = nc_def_dim(ncid, "dim", NC_MAX_INT64, &dimid); ERR;
    err = nc_def_var(ncid, "var1", NC_UINT, 1, &dimid, &varid); ERR;
    err = nc_set_fill(ncid, NC_NOFILL, NULL); ERR;
    err = nc_close(ncid); ERR;
    return (nerrs > 0);
}

wkliao (Contributor) commented Sep 13, 2017

This patch, 8b3d32c, checks the variable size against (X_INT64_MAX - 3) for CDF-5 files and throws NC_EVARSIZE. Below is a revised test program to check the expected error code.

#include <stdio.h>
#include <netcdf.h>

#define ERR {if(err!=NC_NOERR){printf("Error at line %d in %s: %s\n", __LINE__,__FILE__, nc_strerror(err));nerrs++;}}

#define EXP_ERR(exp,err) { \
    if (err != exp) { \
        nerrs++; \
        printf("Error at line %d in %s: expecting %s but got %d\n", \
        __LINE__,__FILE__,#exp, err); \
    } \
}

int main(int argc, char *argv[])
{
    int err, nerrs=0, ncid, dimid, varid;
    err = nc_create("cdf5.nc", NC_CLOBBER|NC_64BIT_DATA, &ncid); ERR;
    err = nc_def_dim(ncid, "dim", NC_MAX_INT64, &dimid); ERR;
    err = nc_def_var(ncid, "var1", NC_UINT, 1, &dimid, &varid); ERR;
    err = nc_set_fill(ncid, NC_NOFILL, NULL); ERR;
    err = nc_close(ncid); EXP_ERR(NC_EVARSIZE,err)
    return (nerrs > 0);
}

wkliao (Contributor) commented Sep 13, 2017

I think throwing NC_EVARSIZE in nc_def_var is better than in nc_enddef.
The test program is revised accordingly.

#include <stdio.h>
#include <netcdf.h>

#define ERR {if(err!=NC_NOERR){printf("Error at line %d in %s: %s\n", __LINE__,__FILE__, nc_strerror(err));nerrs++;}}

#define EXP_ERR(exp,err) { \
    if (err != exp) { \
        nerrs++; \
        printf("Error at line %d in %s: expecting %s but got %d\n", \
        __LINE__,__FILE__,#exp, err); \
    } \
}

int main(int argc, char *argv[])
{
    int err, nerrs=0, ncid, dimid[2], varid[2];
    err = nc_create("cdf5.nc", NC_CLOBBER|NC_64BIT_DATA, &ncid); ERR;
    err = nc_def_dim(ncid, "dim0", NC_UNLIMITED, &dimid[0]); ERR;
    err = nc_def_dim(ncid, "dim1", NC_MAX_INT64, &dimid[1]); ERR;

    err = nc_def_var(ncid, "var0", NC_UINT, 1, &dimid[1], &varid[0]);
    EXP_ERR(NC_EVARSIZE,err)

    err = nc_def_var(ncid, "var1", NC_UINT, 2, &dimid[0], &varid[1]);
    EXP_ERR(NC_EVARSIZE,err)

    err = nc_set_fill(ncid, NC_NOFILL, NULL); ERR;
    err = nc_close(ncid); ERR;
    return (nerrs > 0);
}

Commit: add NC_check_voffs() to check whether the file starting offsets (begin) of all variables follow the same increasing order in which the variables were defined.
wkliao (Contributor) commented Sep 15, 2017

This patch, 294c734, checks the file offsets of all variables and reports NC_ENOTNC when opening the corrupted CDF-5 file generated by the nccopy case in #463.

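For context, here is a minimal sketch of the ordering check that NC_check_voffs() is described as performing; the struct and function below are illustrative stand-ins, not the library's code.

#include <stddef.h>
#include <stdio.h>

struct var_begin {
    long long begin;   /* starting file offset of a variable's data */
};

/* Return 0 if the begin offsets increase in definition order, -1 otherwise
 * (out-of-order offsets indicate a corrupted header). */
static int check_var_offsets(const struct var_begin *vars, size_t nvars)
{
    size_t i;
    for (i = 1; i < nvars; i++)
        if (vars[i].begin <= vars[i-1].begin)
            return -1;
    return 0;
}

int main(void)
{
    struct var_begin good[] = {{1024}, {2048}, {4096}};
    struct var_begin bad[]  = {{1024}, {4096}, {2048}};   /* out of order, like the corrupted file */
    printf("good: %d, bad: %d\n",
           check_var_offsets(good, 3), check_var_offsets(bad, 3));
    return 0;
}
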
DennisHeimbigner (Member) commented Sep 18, 2017

Can we get a summary comment about everything this PR is doing?

wkliao (Contributor) commented Sep 18, 2017

The last 5 commits added two test programs and a corrupted CDF-5 file for testing.
tst_open_cdf5.c opens a corrupted CDF-5 file, bad_cdf5_begin.nc, and reports NC_ENOTNC.
tst_cdf5_begin.c defines one big variable followed by a small variable, writes partial data to the (buggy) overlapped area, and reads it back for validation.

Without the patches from this pull request, netCDF should fail both test programs. The commit messages provide more details. Let me know if you have questions about a particular commit.

WardF added this to the 4.5.0 milestone on Sep 25, 2017

wkliao (Contributor) commented Sep 29, 2017

A note about the fix in this PR.

In 4.4.1 and prior, the member len of the NC_var struct is used to store the variable size in bytes. Because CDF-1 and CDF-2 allow one large, last-defined variable (> 4 GB), NC_var.len is checked and reset to X_UINT_MAX to mark the large variable. This approach is most likely based on the CDF specification: "2^32 - 1 is used in the vsize field for such variables."

In this PR, I keep NC_var.len unchanged. When writing the vsize field in the file header, NC_var.len is checked against (2^32 - 4), and (2^32 - 1) is written to vsize in the file when that limit is exceeded, but the in-memory value is not changed. This fix allows NC_var.len to be used later when calculating NC_var.begin (of all other variables, not just itself), especially for variables in CDF-5 files.

Hope this makes sense.

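A minimal sketch of the clamp-on-write behavior described above, using the limits quoted in this thread; the function and macro names are illustrative, not the library's.

#include <stdint.h>

#define CDF12_VSIZE_MARKER 4294967295ULL   /* 2^32 - 1, written for "large" variables */
#define CDF12_VSIZE_LIMIT  4294967292ULL   /* 2^32 - 4, largest exactly representable vsize */

/* Value to store in the 32-bit vsize header field of a CDF-1/CDF-2 file for a
 * variable whose true size is true_len bytes; the in-memory length is untouched. */
static uint32_t vsize_for_header(unsigned long long true_len)
{
    if (true_len > CDF12_VSIZE_LIMIT)
        return (uint32_t) CDF12_VSIZE_MARKER;
    return (uint32_t) true_len;
}

int main(void)
{
    /* A > 4 GB variable gets the marker; a small one keeps its true size. */
    return !(vsize_for_header(6000000000ULL) == (uint32_t) CDF12_VSIZE_MARKER &&
             vsize_for_header(1000ULL) == 1000U);
}
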
DennisHeimbigner (Member) commented Sep 29, 2017

I am reluctant to do this until we can make a detailed study of how that NC_MAX_UINT flag for vlen is used throughout the libsrc code. The consequences of not checking every case could be catastrophic. In any case, this is not going to occur until, at best, the next release + 1.

wkliao (Contributor) commented Sep 29, 2017

I have checked vlen and its comparison against NC_MAX_UINT throughout the code, but it would be great if a different reviewer also checked it.

DennisHeimbigner (Member) commented Sep 29, 2017

Good! Thanks for doing that.

WardF modified the milestones: 4.5.0, 4.5.1 on Oct 17, 2017

wkliao changed the title from "fix the setting of the member len of NC_var object" to "CDF-5 fix: let NC_var.len be the true size of variable" on Nov 11, 2017

wkliao (Contributor) commented Nov 11, 2017

I renamed the title of this PR to better describe its purpose.

I believe the solution to the CDF-5 bug is to

  1. make NC_var.len be the true size of the variable at all times, and
  2. check NC_var.len against the size limits set by the different classic file formats at the time the variable size is written to the file header (see the sketch after this list).

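A minimal sketch of item 2, using the limits quoted earlier in the thread ((2^32 - 4) for CDF-1/2 and (X_INT64_MAX - 3) for CDF-5); the function names and the way the format is selected here are illustrative, not the library's.

#include <stdio.h>
#include <netcdf.h>

/* Largest variable size (in bytes) exactly representable in the vsize header
 * field, per the limits discussed in this thread. Illustrative only. */
static unsigned long long max_vsize_for_format(int cmode)
{
    if (cmode & NC_64BIT_DATA)                      /* CDF-5 */
        return 9223372036854775807ULL - 3;          /* X_INT64_MAX - 3 */
    return 4294967295ULL - 3;                       /* 2^32 - 4 for CDF-1/2 */
}

/* Example use at header-write time: reject oversized variables. */
static int check_vsize(unsigned long long var_len, int cmode)
{
    return (var_len > max_vsize_for_format(cmode)) ? NC_EVARSIZE : NC_NOERR;
}

int main(void)
{
    printf("CDF-2 check: %d, CDF-5 check: %d\n",
           check_vsize(5000000000ULL, 0),               /* exceeds 2^32 - 4: NC_EVARSIZE */
           check_vsize(5000000000ULL, NC_64BIT_DATA));  /* within the CDF-5 limit: NC_NOERR */
    return 0;
}
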
WardF modified the milestones: 4.6.0, 4.6.1 on Jan 25, 2018

WardF (Member) commented Jan 25, 2018

Working on this pull request right now.

edhartnett (Contributor) commented Jan 29, 2018

Is this PR ready to merge?

WardF (Member) commented Jan 29, 2018

I'm seeing several failures on ARM that may or may not be related. Working to diagnose them.

WardF (Member) commented Jan 29, 2018

@DennisHeimbigner Dennis, I've reviewed this and the issues I'm seeing are outside this PR. If I merge it into a commit without the issues, I don't see any problems. Can you give this a quick look and see if anything pops out at you? If not, I will merge it shortly.

DennisHeimbigner:

I notice that in a couple of places, size_t is being used to hold the file size. But sizeof(size_t) can vary, so I think it is safer to always use unsigned long long to hold the file size.

WardF (Member) commented Jan 30, 2018

Sounds good; I will make the changes on my end, as there doesn't appear to be any disagreement on this.

WardF added some commits Jan 30, 2018

WardF approved these changes Feb 2, 2018

WardF requested review from DennisHeimbigner and removed the request for DennisHeimbigner on Feb 2, 2018

WardF (Member) commented Feb 2, 2018

Failure building on Windows; should be a simple fix. Once that's sorted I will get it merged.

DennisHeimbigner:

See my comment in the PR conversation.

DennisHeimbigner (Member) commented Feb 2, 2018

I do not understand the
+AM_CONDITIONAL(ENABLE_CDF5, [test "$ac_cv_sizeof_size_t" -ge "8"])
in configure.ac. We do not need to use size_t internally in netcdf-c; we can directly use unsigned long long, which is guaranteed to be 8 bytes. The only time we need to downsize from unsigned long long is at the point where we write the length to the file (in v1hpg.c; that code change is already in the PR).

Also, much of this change should not be CDF-5 specific. There is no reason not to properly handle the file size for netCDF-3 files in general, even if the various maximum file sizes are different.

Also, if this is correct
https://stackoverflow.com/questions/9073667/where-to-find-the-complete-definition-of-off-t-type
then off_t suffers from the same varying-size problem as size_t, so we should not use it anywhere. If we want an 8-byte value, then we need to guarantee an 8-byte value independent of the machine word size.

wkliao (Contributor) commented Feb 2, 2018

The reason NetCDF-4 is unable to support CDF-5 is explained in my earlier post in this PR, on Sep 12, 2017.

The off_t issue was explained in my post in #632; that PR also applies to CDF-1 and CDF-2. I also discussed AC_SYS_LARGEFILE in #375.

wkliao (Contributor) commented Feb 2, 2018

I meant to say NetCDF-4 is unable to support CDF-5 when size_t is 4-byte.

DennisHeimbigner (Member) commented Feb 2, 2018

Sorry, but I am missing something. I need to see an explicit statement about why using off_t is necessary as opposed to using long long or unsigned long long. PR #632 does not seem to address that.

DennisHeimbigner (Member) commented Feb 2, 2018

I meant to say NetCDF-4 is unable to support CDF-5 when size_t is 4-byte.

(You said netCDF-4, but you meant, I think, netCDF-3.) In any case, I agree with this for the reasons stated. Perhaps I misread your changes. My point is that internally the netcdf-c library should avoid using size_t or off_t to represent file sizes (or offsets), because their size varies with the machine word size. We need to internally use an explicit 8-byte type to represent the file size and convert it only when necessary.

DennisHeimbigner (Member) commented Feb 2, 2018

There is probably a larger point here: we should also never be using the type "long" anywhere. It is a useless type.

wkliao (Contributor) commented Feb 2, 2018

I did mean NetCDF-4, specifically the NetCDF-4 APIs. Most of the arguments in the NetCDF-4 APIs are of type size_t, and size_t is always 4 bytes on 32-bit machines. This makes it impossible for NetCDF-4 to support CDF-5 properly on a 32-bit machine.

Can you point out the internal variables that should be defined as long long? I can take a look.

DennisHeimbigner (Member) commented Feb 2, 2018

At least these:

In nc3internal.h:
  struct NC_var
    fields: dsizes, begin, len
  struct NC3_INFO
    fields: xsz, begin_var, begin_rec, recsize

In v1hpg.c:
  struct v1hs
    fields: offset, extent (note: will require casting when passing to nciop)
  function v1h_put_NC_var
    variable: vsize (added by your code)

In nc3internal.c:
  function NC_check_vlens
    variable: vlen_max

Note: ncio.h has a number of these as well, but that API probably needs fixing as a separate effort.

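Purely as an illustration of the kind of declaration change this list suggests (the comments are informal glosses, and the real structs in nc3internal.h contain many more members than shown here):

struct NC_var_sketch {
    unsigned long long *dsizes;   /* per-dimension size products used in offset arithmetic */
    unsigned long long begin;     /* file offset where the variable's data starts */
    unsigned long long len;       /* total size of the variable's data in bytes */
    /* ... other members unchanged ... */
};

struct NC3_INFO_sketch {
    unsigned long long xsz;        /* external size of the file header */
    unsigned long long begin_var;  /* offset of the first non-record variable */
    unsigned long long begin_rec;  /* offset of the first record */
    unsigned long long recsize;    /* length of one record */
    /* ... other members unchanged ... */
};
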
DennisHeimbigner (Member) commented Feb 2, 2018

Also, if you compile with the proper conversion warnings enabled, you should be able to catch some other required changes/casts.

wkliao (Contributor) commented Feb 3, 2018

With AC_SYS_LARGEFILE in configure.ac, off_t will automatically be set to an 8-byte integer if the 32-bit machine supports large file access. So the only question left is about the variables of type size_t. If we agree to disable CDF-5 wherever size_t is not 8 bytes, then the remaining compile-time type-casting warnings become harmless.

Adding -Wconversion to CFLAGS produces a long list of compile-time warnings about type casting; the majority of them are not CDF-5 related, and I believe they are probably harmless. Of course, removing all such warnings would be best, but that requires a big effort.

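One way to make that size assumption explicit at compile time is a C89-style static assertion (illustrative only, not code from this PR): with AC_SYS_LARGEFILE defining _FILE_OFFSET_BITS=64 where the platform supports large files, the typedef below compiles only when off_t is at least 8 bytes.

#include <sys/types.h>

/* Compilation fails (negative array size) if off_t is narrower than 8 bytes. */
typedef char off_t_is_at_least_8_bytes[(sizeof(off_t) >= 8) ? 1 : -1];

int main(void) { return 0; }
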
DennisHeimbigner:

You should never rely on a flag to determine the type size; this introduces a "hidden" dependency between the code and the type. It is always safer to use a known size all the time.

DennisHeimbigner (Member) commented Feb 3, 2018

More to the point: if we are going to do this, then let's do it right and in the cleanest possible fashion, which, IMO, means not relying on types whose size can change.

wkliao (Contributor) commented Feb 3, 2018

What you ask for is out of scope for this PR. Based on your argument, I wonder why you don't apply the same standard to all other PRs.

This PR passes all the tests on both 64-bit and 32-bit machines. I do not see the point of blocking this PR over a problem that has been in netCDF for a long time.

I also disagree with your argument that one should never rely on a flag to determine the type size. That has been a common practice with autotools; it really depends on whether you use it properly. We can debate this for a long time, but the most important question is whether or not you would like this PR in netCDF to fix the CDF-5 problem observed and reported by @czender.

DennisHeimbigner (Member) commented Feb 3, 2018

I wonder why you don't apply the same standard to all other PRs?
I do, sometimes.

IMO, the whole varying-type-size thing was a hack from a time when the long long type was not well supported. Now it is supported and efficient, so it is time to move on from the old way of doing things.

I will leave the call to Ward. If he takes your PR as is, then I am OK with that. I will follow up later with another PR to complete the fix and get rid of off_t. I am good either way.

edhartnett (Contributor) commented Feb 3, 2018

@wkliao Unfortunately I can't help get this PR merged into Unidata netCDF. If you find it useful, I am maintaining HPC NetCDF (https://github.com/HPC-NetCDF/netcdf-c), which has this PR merged, as well as most of the other pending PRs.

HPC NetCDF is a drop-in replacement for Unidata netCDF, with advanced HPC features. (It is intended that all new features from HPC NetCDF will also be back-ported to Unidata netCDF, but that may take a while due to limited Unidata resources.)

WardF (Member) commented Feb 3, 2018

Happy Saturday, all; I've just checked my work email and am getting caught up on this. Lacking a demonstrated issue, I think the best course of action will be to accept this PR; if there is a potential issue that needs to be addressed, we can do so via a follow-up issue or pull request discussion.

Let me merge a PR that fixes a Windows test and then I'll follow on with this one. I'll get that done now.

DennisHeimbigner (Member) commented Feb 4, 2018

@wkliao I thought about it; you are correct and I apologize. You provided a perfectly reasonable PR to fix a specific problem. I should not push you to solve a different problem just because I perceive an overlap. It was ill-considered.

wkliao (Contributor) commented Feb 4, 2018

We all want the best for netCDF; we are just approaching it from different angles.

@edhartnett, HPC NetCDF is interesting work. You have contributed a lot of patches to netCDF recently. I wish I could be as energetic as you.

WardF merged commit b4a2947 into Unidata:master on Feb 4, 2018

4 checks passed:

continuous-integration/travis-ci/pr - The Travis CI build passed
lgtm analysis: C/C++ - No alert changes
lgtm analysis: JavaScript - No alert changes
license/cla - Contributor License Agreement is signed

edhartnett (Contributor) commented Feb 5, 2018

@wkliao I don't believe in letting the grass grow under my feet.

HPC NetCDF is about to get more interesting. ;-)

wkliao deleted the wkliao:cdf5_var_len branch on Sep 15, 2018
