Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NC_DISKLESS returns garbage data for certain files #401

Closed
thehesiod opened this issue May 1, 2017 · 33 comments
Closed

NC_DISKLESS returns garbage data for certain files #401

thehesiod opened this issue May 1, 2017 · 33 comments

Comments

@thehesiod
Copy link
Contributor

thehesiod commented May 1, 2017

using the latest netcdf-c from master, NC_DISKLESS mode can return garbage data for certain files.

With the following file:
foo.zip

and the following code:

#include <iostream>
#include <netcdf.h>
#include <netcdf_mem.h>
#include <iostream>
#include <fstream>
#include <assert.h>
#import <vector>


using namespace std;

#define    HRAPY  200
#define    HRAPX  333


static size_t value_count[] = {HRAPY, HRAPX};
static size_t start[] = {0, 0};

int main(int argc, const char * argv[]) {
    ifstream file("/tmp/foo.nc", ios::in | ios::binary | ios::ate);
    streamsize size = file.tellg();
    file.seekg(0, ios::beg);
    
    std::vector<char> buffer(size);
    int status;
    int ncid;
    
    if (file.read(buffer.data(), size))
    {
//        status = nc_open_mem(".nc", 0, size, buffer.data(), &ncid);
        status = nc_open("/tmp/foo.nc", NC_DISKLESS|NC_CDF5, &ncid);
        if(status != NC_NOERR) {
            cout << nc_strerror(status);
            assert(false);
        }
        
        int rh_id;
        status = nc_inq_varid(ncid, "amountofprecip", &rh_id);
        if(status != NC_NOERR) {
            cout << nc_strerror(status);
            assert(false);
        }
        
        int rh_vals[HRAPY][HRAPX];
        status = nc_get_vara_int(ncid, rh_id, start, value_count, (int*)rh_vals);
        if(status != NC_NOERR) {
            cout << nc_strerror(status);
            assert(false);
        }
    }

    std::cout << "Hello, World!\n";
    return 0;
}

note there's no errors, however rows > 100 return garbage, row 100 looks like:

[ -71   76   96   37   20    8  -67  -47  -89   62   52  -52   78   86  -62
   60   46   44  -68  -44  -34   28   30  -30   -2   -1   14  -64  -68  -63
  -88  -86  -95   40  -70  -55    5  -56  -50  -61   88   57   45  -73   -3
  -45   19    5  -73   -6  -

whereas row 101 looks like:

[ 4325376  2818048   393215 -3997697 -4063233 -1114113 -1310720  4915199
 -6488064  5832703 -1114113 -2228225 -2686977 -3932160  6160384  3276799
  -720896  3670015 -4063232  6225919 -5636096  1966080  3014655 -4063233
 -5308416  1114112  3407871 -4653057 

opening the same file with nc_open does not return corrupt data.

@thehesiod thehesiod changed the title NC_DISKLESS broken for certain files NC_DISKLESS returns garbage data for certain files May 1, 2017
@thehesiod
Copy link
Contributor Author

@WardF if you wouldn't mind looking at this it would really help, I'm kinda heavily invested in getting in-mem working and ran into this with regular NC_DISKLESS mode. This is really bad as existing code can get garbage data from netcdf-c. I'll continue investigating for as long I can. But definitely some kind of issue between reading from disk and from memory. I'm guessing so far it's some kind of block offset issue?

@thehesiod
Copy link
Contributor Author

btw, have a fix in #400, however not sure why it works with that fix.

@WardF
Copy link
Member

WardF commented May 1, 2017

I will take a look; I was out of town for the past week but will start digging into this shortly.

@DennisHeimbigner
Copy link
Collaborator

It may be relevant that your test file foo.nc is cdf5 format.

@WardF
Copy link
Member

WardF commented May 2, 2017

Just a note; your example program is incomplete, insofar as size and buffer (and possibly other variables) are undefined. Dennis' point about cdf5 format is a good one; what happens if you open the file passing NC_DISKLESS|NC_CDF5 to nc_open? Is the behavior the same?

I'll try to recreate this in straight C in the meantime, and will also take a look at pull request #400. I'm pleased to have a fix although I'd be loathe to include it until we're sure why it works :).

@thehesiod
Copy link
Contributor Author

whoops, sorry, fixed

@WardF
Copy link
Member

WardF commented May 2, 2017

Thanks; replicated on my end, and using NC_DISKLESS|NC_CDF5 does not correct the issue.

@thehesiod
Copy link
Contributor Author

ok updated code again with what I just ran. Just re-tried with NC_CDF5 and didn't make a difference, I'm 90% sure the issue is related the the code called by ncx_getn_int_int.

@WardF
Copy link
Member

WardF commented May 2, 2017

Thanks; I suspect you're right. I've asked @DennisHeimbigner to take a look at #400 since he wrote the code, but I've confirmed that it corrects this issue. I'm going to write a plain C test that we can incorporate into our unit testing and, once that is done and @DennisHeimbigner has given his input/ok, I will merge in the fix. Thanks very much for your help diagnosing and correcting this!

@thehesiod
Copy link
Contributor Author

no problem! I'm really curious why this "fixes" it. Anxious to hear @DennisHeimbigner's investigation :) I'm really excited now that I can drop scipy. I gave up on it after finding yet another bug and realized they had >800 open issues.

@DennisHeimbigner
Copy link
Collaborator

I did not investigations and confirmed the problem. I tried three different
methods for reading the file:

  1. standard nc_open on foo.nc
  2. standard nc_open on foo.nc, but with NC_DISKLESS set
  3. nc_open_mem after reading the contents of the file into memory.
    Netcdf cmake #1 passed, but Use GNUInstallDirs to install into /usr/lib64 as needed #2 and cmake build does not install netcdf.3 man page #3 both failed. So, the problem is with NC_DISKLESS.
    I also switched the type to float and that failed the same way as int.

@DennisHeimbigner
Copy link
Collaborator

One more thing. Can you look at your config.h file and see if
HAVE_MMAP and/or HAVE_MREMAP is turned on?

@thehesiod
Copy link
Contributor Author

doesn't look like it

brew install homebrew/science/netcdf
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (caskroom/cask).
No changes to formulae.

==> Installing netcdf from homebrew/science
==> Downloading https://homebrew.bintray.com/bottles-science/netcdf-4.4.1.1_4.sierra.bottle.tar.gz
Already downloaded: /Users/amohr/Library/Caches/Homebrew/netcdf-4.4.1.1_4.sierra.bottle.tar.gz
==> Pouring netcdf-4.4.1.1_4.sierra.bottle.tar.gz
🍺  /usr/local/Cellar/netcdf/4.4.1.1_4: 66 files, 2.3MB

amohr@amohr-macbook:~/dev/thehesiod/netcdf4-python/netCDF4$ rgrep /usr/local/Cellar/netcdf/4.4.1.1_4/include "HAVE_MREMAP|HAVE_MMAP"
amohr@amohr-macbook:~/dev/thehesiod/netcdf4-python/netCDF4$ 

@WardF
Copy link
Member

WardF commented May 2, 2017 via email

@WardF
Copy link
Member

WardF commented May 2, 2017

The test file can be seen here: https://github.com/Unidata/netcdf-c/blob/ghpull-400/nc_test/tst_diskless_mem.c

If you build w/ diskless and then run the test manually, you should see the contents of the array printed to standard out. The plan was to do an explicit test for value range, but when I run this code I don't see any error as expected. Note that your pull request has not been integrated into this branch yet; I expect to see a failure, so that I can confirm the failure goes away when the pull request is merged.

@thehesiod
Copy link
Contributor Author

thehesiod commented May 2, 2017

perhaps you don't have enough test data? I'm guessing it happens when it crosses some boundary.

@WardF
Copy link
Member

WardF commented May 2, 2017

I'm using the data you provided. I generated the file from your foo.nc file, using ncdump to create a cdl file, and then ncgen to create a C program that would recreate the nc file. I then shuffled the functions around and added the check function.

@thehesiod
Copy link
Contributor Author

probably will need to debug the code where it does repro I guess

@WardF
Copy link
Member

WardF commented May 2, 2017

Yes; I'm just making notes here to myself as much as anything. Since I can recreate the issue (and it sounds like Dennis can as well), just need to figure out why it goes away in this circumstance. So perhaps not as straight forward as one would hope.

@WardF
Copy link
Member

WardF commented May 2, 2017 via email

@thehesiod
Copy link
Contributor Author

using

clang tst_diskless_mem.c -lnetcdf -o tst_diskless_mem

and running it, doesn't seem to reproduce any garbage data.

@WardF
Copy link
Member

WardF commented May 2, 2017

Ok, additional information. Running the test program provided against the foo.nc provided, I see garbage data. Running the test program provided against the binary netcdf file generated with the ncdump foo.nc > foo.cdl && ncgen -l c foo.cdl > foo.c && compile foo.c && run foo does not produce any garbage data. It appears that the issue might be in the binary file, foo.nc, but I'm not entirely certain why that would be. Comparing foo.cdl and tst_diskless_mem.cdl does not reveal any difference in the data. This will bear more investigation, but I'm curious how foo.nc was generated.

@thehesiod
Copy link
Contributor Author

foo.nc was generated with netcdf4-python.

@WardF
Copy link
Member

WardF commented May 2, 2017

This is tickling something at the back of my mind, but I'll need to think a little bit and dig through closed issues. I assume that netcdf4-python was using the latest release, 4.4.1.1. I'd be curious to see if this issue occurred with a file generated by netcdf4-python against the current master branch. 100% speculation at this point, however; I need to remember what the previous bug was, other than it was something to do with contiguous memory. It's entirely likely it was unrelated to what we're seeing here.

@WardF
Copy link
Member

WardF commented May 2, 2017

It's interesting that the same file, generated with the C library (sameness confirmed by comparing ncdump output) does not exhibit this issue. It's not necessarily informative, but interesting.

@WardF
Copy link
Member

WardF commented May 2, 2017

Attaching the version of foo.nc, generated via the pure C api using the master branch, that does not exhibit the issue described when fed into the test program provided above.

tst_diskless_mem.nc.zip

@DennisHeimbigner
Copy link
Collaborator

DennisHeimbigner commented May 2, 2017 via email

@thehesiod
Copy link
Contributor Author

I'll try to generate a CDF3 version to see if we get the same issue

@WardF
Copy link
Member

WardF commented May 2, 2017

Is your generated file also CDF5?

Good catch Dennis; no, it is not, it is classic. If I copy it to cdf5 using nccopy the issue reappears. So, it is definitely cdf5 related.

Given that #400 seems to fix the issue, perhaps looking at that (3 changes, I believe?) it will provide some insight into what's going wrong? It doesn't for me since I'm less familiar with the code but it would be where I'd start.

WardF added a commit that referenced this issue May 2, 2017
@thehesiod
Copy link
Contributor Author

ya, verified doesn't happen with classic I generated

DennisHeimbigner added a commit that referenced this issue May 3, 2017
It turns out that the chunksize used in the
ncio-related code must be a multiple of eight in
size.  Both memio.c and mmapio.c were
potentially violating this constraint.

See also pr #400
@DennisHeimbigner
Copy link
Collaborator

Problem identified. In all of the ncio code in libsrc,
it is assumed that the chunk read size is a multiple of
eight bytes. For memio.c (and mmapio) this chunk size
is determined by the total actual file size. For some files,
this can result in a chunk size not a multiple of eight.
This causes data access to get off by some number of bytes
(2 in the case of foo.nc) and causes the garbage.
PR to fix is being tested now.

@DennisHeimbigner
Copy link
Collaborator

Fixed by pr #403

@WardF
Copy link
Member

WardF commented May 10, 2017

Closing issue as it was fixed as outlined by Dennis.

@WardF WardF closed this as completed May 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants