Crash on reading NC_VLEN variable with unlimited dimension #2181

Open
krisfed opened this issue Jan 10, 2022 · 12 comments

Comments

@krisfed

krisfed commented Jan 10, 2022

(This again seems to run into issues with reclaiming NC_VLENs and might be addressed by the proposed fix for #2143.)

We are using netcdf-c 4.8.1 and ran into this interesting crash. It seems to happen when all of the following hold:

  • there are 2 NC_VLEN variables (the base type doesn't seem to matter), both using the same unlimited dimension
  • at least one of them is written with an offset in the unlimited dimension
  • the crash happens on reading one of the variables, depending on their offsets in the unlimited dimension (it seems that reading the one with the smaller offset crashes)
  • the offsets of the two variables in the unlimited dimension are different

Here is a simplistic reproduction that both creates the file and then tries to read one of the variables back:

#include <iostream>
#include "netcdf.h"


void checkErrorCode(int status, const char* message){
    if (status != NC_NOERR){
        std::cout << "Error code: " << status << " from " << message << std::endl;
        std::cout << nc_strerror(status) << std::endl << std::endl;
    }
}

int main(int argc, const char * argv[]) {
    
    // ================ WRITE ==================
    
    // Setup data
    constexpr size_t DATA_LENGTHS[2] = {2, 3};  // constexpr avoids a non-standard variable-length array below
    nc_vlen_t data[DATA_LENGTHS[0] * DATA_LENGTHS[1]];
    
    const int first_size = 6;
    double first[first_size] = {65, 66, 67, 68, 69, 70};
    data[0].p = first;
    data[0].len = first_size;
    
    const int second_size = 6;
    double second[second_size] = {65, 66, 67, 68, 69, 70};
    data[1].p = second;
    data[1].len = second_size;
    
    const int third_size = 5;
    double third[third_size] = {65, 66, 67, 68, 69};
    data[2].p = third;
    data[2].len = third_size;
    
    const int fourth_size = 5;
    double fourth[fourth_size] = {65, 66, 67, 68, 69};
    data[3].p = fourth;
    data[3].len = fourth_size;
    
    const int fifth_size = 8;
    double fifth[fifth_size] = {65, 66, 67, 68, 69, 70, 71, 72};
    data[4].p = fifth;
    data[4].len = fifth_size;
    
    const int sixth_size = 4;
    double sixth[sixth_size] = {65, 66, 67, 68};
    data[5].p = sixth;
    data[5].len = sixth_size;

    // Open file
    int ncid;
    int retval;
    
    retval = nc_create("myfile.nc", NC_NETCDF4, &ncid);
    checkErrorCode(retval, "nc_create");
    
    // Define vlen type named RAGGED_DOUBLE
    nc_type vlen_typeID;
    retval = nc_def_vlen(ncid, "RAGGED_DOUBLE", NC_DOUBLE, &vlen_typeID);
    checkErrorCode(retval, "nc_def_vlen");
    
    // Define dimensions
    int dimid_x;
    retval = nc_def_dim(ncid, "xdim", NC_UNLIMITED, &dimid_x);
    checkErrorCode(retval, "nc_def_dim (1)");
    
    int dimid_y;
    retval = nc_def_dim(ncid, "ydim", 10, &dimid_y);
    checkErrorCode(retval, "nc_def_dim (2)");
    
    int dims[2] = {dimid_y, dimid_x};
    
    // Define vlen variable 1
    int varid1;
    retval = nc_def_var(ncid, "Var1", vlen_typeID, 2, dims, &varid1);
    checkErrorCode(retval, "nc_def_var (1)");
    
    // Write vlen variable 1
    size_t start1[2] = {0, 1};
    ptrdiff_t stride[2] = {1,1};
    retval = nc_put_vars(ncid, varid1, start1, DATA_LENGTHS, stride, data);
    checkErrorCode(retval, "nc_put_vars (1)");
    
    // Define vlen variable 2
    int varid2;
    retval = nc_def_var(ncid, "Var2", vlen_typeID, 2, dims, &varid2);
    checkErrorCode(retval, "nc_def_var (2)");
    
    // Write vlen variable 2
    size_t start2[2] = {0, 2};
    retval = nc_put_vars(ncid, varid2, start2, DATA_LENGTHS, stride, data);
    checkErrorCode(retval, "nc_put_vars (2)");
    
    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close (1)");
    
    
    // ================ READ ==================
    
    // open file
    retval = nc_open("myfile.nc", NC_NOWRITE, &ncid);
    checkErrorCode(retval, "nc_open");
    
    // read vlen variable with the smaller of the offsets
    
    // the length of fixed dimension is 10, and
    // the length of unlimited dimension is 5
    // (we wrote 3 datapoints with offset of 2 for 2nd var)
    const int num_items = 50;
    
    nc_vlen_t* data_read = new nc_vlen_t[num_items];
    retval = nc_get_var(ncid, varid1, data_read);
    checkErrorCode(retval, "nc_get_var");
    
    retval = nc_free_vlens(num_items, data_read);
    checkErrorCode(retval, "nc_free_vlens");
    delete[] data_read;  // nc_free_vlens frees each element's p buffer, not the array itself
    
    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close (2)");
    
    return retval;
}

To reproduce the crash outside of our code base, I had to set environment variables that add heap checking and fill allocated and freed memory with junk bytes, e.g. on Debian 10:

% setenv MALLOC_CHECK_ 3
% setenv MALLOC_PERTURB_ 204
% ./a.out
free(): invalid pointer
Abort

And on macOS 11.2.3:

$ export MallocScribble=1
$ ./a.out 
a.out(23741,0x10b7fae00) malloc: enabling scribbling to detect mods to free blocks
a.out(23741,0x10b7fae00) malloc: *** error for object 0xaaaaaaaaaaaaaaaa: pointer being freed was not allocated
a.out(23741,0x10b7fae00) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6
@DennisHeimbigner
Collaborator

The primary problem was that you were not clearing the data_read memory; this left junk in it that was being incorrectly interpreted by nc_free_vlens.

@krisfed
Author

krisfed commented Jan 11, 2022

Thanks Dennis! I am not quite clear on what is expected of data_read before it goes to nc_free_vlens.
The doc just says "pass the pointer back to this function, when you're done with the data, and it will free the vlen memory"
Doesn't nc_free_vlens need the nc_vlen_t.p pointers to release the memory properly?

@DennisHeimbigner
Collaborator

The problem is that your read does not appear to actually use all the space allocated for data_read.
This means that part of data_read is not overwritten with legal nc_vlen_t instances. When nc_free_vlens
is called with the data_read argument and a length that is the whole size of data_read, nc_free_vlens
will assume every nc_vlen_t in data_read is legitimate, and it will crash when it attempts to free an nc_vlen_t
instance that was not overwritten by the read call.
You have two options:

  1. zero out all of data_read
  2. change the count passed to nc_free_vlens to exactly cover what was actually read.
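A minimal C++ sketch of the two options, assuming nothing beyond the standard library: `VlenMirror` is a hypothetical stand-in for `nc_vlen_t` (which netcdf.h declares with a `size_t len` and a `void *p` field), and `free_vlens_sketch` is a hypothetical function that mimics what `nc_free_vlens` does to each element, namely calling `free` on its `p`. Option 1 corresponds to zero-initializing the buffer (`= {}` below); option 2 corresponds to passing the number of elements the requested hyperslab covers, i.e. the product of its count vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for nc_vlen_t, which netcdf.h declares as a struct
// holding `size_t len` and `void *p`; used so this compiles without netCDF.
struct VlenMirror {
    std::size_t len;  // number of base-type values
    void *p;          // heap buffer holding those values
};

// Hypothetical stand-in for nc_free_vlens(n, vlens): it calls free() on
// every element's p, so each of the n elements must hold either NULL or a
// pointer that really came from malloc. This sketch also resets each element.
void free_vlens_sketch(std::size_t n, VlenMirror *vlens) {
    for (std::size_t i = 0; i < n; ++i) {
        std::free(vlens[i].p);  // free(NULL) is a harmless no-op
        vlens[i].p = nullptr;
        vlens[i].len = 0;
    }
}

// Option 2: a hyperslab of shape `count` covers product(count) elements,
// e.g. count = {2, 3} covers 6, so that is the n to pass to the free routine.
std::size_t hyperslab_elements(const std::size_t *count, int ndims) {
    std::size_t n = 1;
    for (int d = 0; d < ndims; ++d) {
        n *= count[d];
    }
    return n;
}
```

With the real library, option 2 would mean calling `nc_free_vlens(hyperslab_elements(count, ndims), data_read)` after a matching `nc_get_vara` call; for `nc_get_var` the count is the full current shape of the variable.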

@krisfed
Author

krisfed commented Jan 13, 2022

Oh ok, I see what you mean: some elements of data_read are unused (never overwritten) because of the offsets in both dimensions and because data is only partially written along the dimid_y dimension (which has a fixed length of 10).

So to "zero out" data_read, do I have to do something like:

    nc_vlen_t* data_read = new nc_vlen_t[num_items];
    for (int i=0; i< num_items; i++)
    {
        data_read[i].p = NULL;
        data_read[i].len = 0;
    }

such that all nc_vlen_t elements are legitimate ahead of time? Or are there better ways?
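In C++, one alternative to the explicit loop is value-initialization: `new T[n]()` and `std::vector<T>(n)` both set every `len` to 0 and every `p` to a null pointer for a plain struct like this one. The sketch below uses a hypothetical `VlenMirror` stand-in for `nc_vlen_t` so it compiles without netCDF; the helper names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for nc_vlen_t ({ size_t len; void *p; } in netcdf.h).
struct VlenMirror {
    std::size_t len;
    void *p;
};

// new T[n]() value-initializes every element; for a struct with no
// user-provided constructor that zero-initializes it (len = 0, p = null).
std::unique_ptr<VlenMirror[]> make_zeroed_array(std::size_t n) {
    return std::unique_ptr<VlenMirror[]>(new VlenMirror[n]());
}

// std::vector<T>(n) value-initializes its n elements the same way;
// data() yields a VlenMirror* for C-style APIs.
std::vector<VlenMirror> make_zeroed_vector(std::size_t n) {
    return std::vector<VlenMirror>(n);
}

// True if every element is an empty vlen (len 0, p null).
bool all_empty(const VlenMirror *v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i].len != 0 || v[i].p != nullptr) {
            return false;
        }
    }
    return true;
}
```

With netCDF, a `std::vector<nc_vlen_t> buf(50)` would be passed as `buf.data()` to `nc_get_var` and later to `nc_free_vlens(buf.size(), buf.data())`; `nc_free_vlens` releases the per-element `p` buffers, while the vector releases the array itself.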

For option 2, I think that if you don't have prior knowledge of the file, you wouldn't know which portion of the data has actually been written. So if you want to read all the data, assuming 5x10=50 elements (based on the current dimension lengths) seems like the best guess.

I am also unsure why this crash doesn't happen in other circumstances when reading back more data than has been written. While investigating this crash and trying to come up with the simplest repro steps possible, I tried a single variable with one dimension (either unlimited, or fixed-length with fewer elements written than the dimension length), written with an offset. I then read back a number of elements equal to the dimension length, so the first elements would be unused because of the offset, and, for the fixed dimension, the last elements would also be unused since fewer elements were written than the dimension length. The crash does not seem to happen in those cases. Even with the code above, reading and freeing data from varid2 ("Var2") works fine. I was only able to reproduce the crash under the specific circumstances listed in the original post.

And I guess the reason that unused elements of data_read contain invalid nc_vlen_t values (with bogus values for len and p fields) goes back to NC_VLENs not having a Fill Value turned on by default (#2068)?

@DennisHeimbigner
Collaborator

One way to zero a block of memory:

#include <string.h> /* need the C header to get memset */
...
memset((void*)data_read,0,sizeof(data_read));
...

As for this:

...reading back more data than has been written...

It depends on what it is overwriting; if what it overwrote does not contain data,
then it is possible that no error would be detected.

@krisfed
Copy link
Author

krisfed commented Jan 14, 2022

Sorry to keep bugging you Dennis, just trying to understand this... I don't think memset is enough, as data_read is an array of nc_vlen_t structs, and memset wouldn't create "legal nc_vlen_t instances" for nc_free_vlens to release. I still get the crash when I use memset, but I was able to avoid it with the above for-loop setting all nc_vlen_t.len to 0s and all nc_vlen_t.p to NULL.

Trying to understand this:

It depends on what it is overwriting; if what it overwrote does not contain data,
then it is possible that no error would be detected.

didn't you say that the issue happens when data is NOT overwritten, so there are "illegal"/bogus nc_vlen_t elements that nc_free_vlens will try to free?

I actually went as far as printing out all the nc_vlen_t elements in data_read before and after the nc_get_var call... It looks like nc_get_var DOES overwrite some values that I would think would be unused. For example, it DOES overwrite data_read[0], which should be unused because Var1 was written with an offset of 1. It also overwrites data_read[9] through data_read[39] (although we are only writing 6 values with some gaps in between, which I think should only go up to data_read[8]). But it DOES NOT overwrite data_read[40] through data_read[49], and that's what is causing the crash.
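To check which elements of data_read a read of Var1 would be expected to fill, here is a small sketch that computes flat offsets in a C-order (row-major) buffer, where the last dimension varies fastest, as in the netCDF C API. It assumes the variable is read whole with shape {ydim = 10, xdim = 5}; `flat_index` and `hyperslab_offsets` are illustrative helpers, not netCDF functions. For Var1's write (start {0, 1}, count {2, 3}) it yields offsets 1, 2, 3, 6, 7 and 8; for comparison, the printouts below show non-empty values at data_read[1] to data_read[3] and data_read[5] to data_read[7].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element (y, x) in a C-order (row-major) buffer whose last
// dimension has length nx: the last dimension varies fastest.
std::size_t flat_index(std::size_t y, std::size_t x, std::size_t nx) {
    return y * nx + x;
}

// Flat offsets covered by a 2-D hyperslab (start, count) when the whole
// variable, with last-dimension length nx, is read into one buffer.
std::vector<std::size_t> hyperslab_offsets(const std::size_t start[2],
                                           const std::size_t count[2],
                                           std::size_t nx) {
    std::vector<std::size_t> offsets;
    for (std::size_t y = start[0]; y < start[0] + count[0]; ++y) {
        for (std::size_t x = start[1]; x < start[1] + count[1]; ++x) {
            offsets.push_back(flat_index(y, x, nx));
        }
    }
    return offsets;
}
```

For Var2's write (start {0, 2}, count {2, 3}) the same helper gives offsets 2, 3, 4, 7, 8 and 9.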

Here is the ncdump of Var1:

 Var1 =
  {{}, {65, 66, 67, 68, 69, 70}, {65, 66, 67, 68, 69, 70}, 
    {65, 66, 67, 68, 69}, {}},
  {{}, {65, 66, 67, 68, 69}, {65, 66, 67, 68, 69, 70, 71, 72}, 
    {65, 66, 67, 68}, {}},
  {{}, {}, {}, {}, {}},
  {{}, {}, {}, {}, {}},
  {{}, {}, {}, {}, {}},
  {{}, {}, {}, {}, {}},
  {{}, {}, {}, {}, {}},
  {{}, {}, {}, {}, {}},
  {{}, {}, {}, {}, {}},
  {{}, {}, {}, {}, {}} ;

And here is all the print out:

Before reading the variable (but after memset)
data_read[0]: len 0, p 0xaaaaaaaaaaaaaaaa (NOT NULL)
data_read[1] through data_read[49]: len 12297829382473034410, p 0xaaaaaaaaaaaaaaaa (NOT NULL)

After reading the variable
data_read[0]: len 0, p 0x0 (NULL)
data_read[1]: len 6, p 0x7fc06ac494e0
data_read[2]: len 6, p 0x7fc06ac49540
data_read[3]: len 5, p 0x7fc06ac495b0
data_read[4]: len 0, p 0x0 (NULL)
data_read[5]: len 5, p 0x7fc06ac49510
data_read[6]: len 8, p 0x7fc06ac49570
data_read[7]: len 4, p 0x7fc06ac495e0
data_read[8] through data_read[39]: len 0, p 0x0 (NULL)
data_read[40] through data_read[49]: len 0, p 0xaaaaaaaaaaaaaaaa (NOT NULL)
a.out(21568,0x10f56ee00) malloc: *** error for object 0xaaaaaaaaaaaaaaaa: pointer being freed was not allocated
a.out(21568,0x10f56ee00) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

@DennisHeimbigner
Collaborator

Memset works because it sets the whole block of memory to zero. Since the nc_vlen_t instances
in an array are contiguous, this ends up filling each nc_vlen_t.len with zeros and each nc_vlen_t.p with zeros. This is equivalent to explicitly setting the len and p fields to 0 and NULL respectively.
My speculation is that the length argument (the last one) to memset is not covering the full
block of memory allocated to data_read. It should look like this:

memset((void*)data_read,0,sizeof(nc_vlen_t)*n);

where n is the number of nc_vlen_t instances in data_read.
This should be equivalent to

nc_vlen_t data_read[n];
for(int i=0;i<n;i++) { data_read[i].len = 0; data_read[i].p = NULL; }

although memset is probably a bit faster.
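On the byte count: when data_read is a pointer (as with `new nc_vlen_t[num_items]`), `sizeof(data_read)` is the size of the pointer itself, typically 8 bytes on a 64-bit build, not the size of the 50-element array. Zeroing only the first 8 bytes clears just data_read[0].len on such a build, which matches the "Before reading the variable (but after memset)" printout above, where data_read[0].len is 0 while data_read[0].p and every later element still hold 0xaaaaaaaaaaaaaaaa. A minimal sketch of the correctly sized memset, using a hypothetical `VlenMirror` stand-in for `nc_vlen_t`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for nc_vlen_t ({ size_t len; void *p; } in netcdf.h).
struct VlenMirror {
    std::size_t len;
    void *p;
};

// Zero n contiguous elements. The byte count is sizeof(element) * n;
// sizeof applied to the pointer `buf` would give only the pointer's size.
// All-bits-zero p reads back as a null pointer on mainstream platforms,
// although the C and C++ standards do not strictly guarantee that.
void zero_vlens(VlenMirror *buf, std::size_t n) {
    std::memset(buf, 0, sizeof(*buf) * n);
}
```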

@DennisHeimbigner
Collaborator

Looks like nc_get_var DOES overwrite some values that I would think would be unused...

What you need to know is that the portion of the variable written on disk is different
from the portion of memory that is read.
Suppose we have a 1-dimensional variable v on disk, with a dimension of size 5.
Suppose we have a 1-dimensional variable mwrite in memory with a dimension size of 10.
If we write 5 elements of mwrite starting at mwrite[5] to v, and specify that the write begins at v[0],
then we will have:

mwrite[5] -> v[0]
...
mwrite[9] -> v[4]

Suppose now we have another 1-dimensional variable mread in memory with a dimension size of 10.
If we later read v back into mread but specify that it start reading into mread[0],
then we will have:

v[0] -> mread[0]
...
v[4] -> mread[4]

At this point the contents of mread[5-9] are garbage, and if we then make the mistake
of calling nc_free_vlens(10, mread), it may fail.
Note that although the data written into v came from offset 5 of mwrite, reading it back will by default
go into mread[0].
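A plain-array simulation of the example above, with no netCDF calls: `v` stands in for the on-disk variable, `mwrite` and `mread` for the memory buffers, and -1 is a sentinel meaning "never written by the read". With the real library the write would presumably be an `nc_put_vara` call with start {0}, count {5} and `&mwrite[5]` as the data pointer; the point is that the memory offset lives only in the pointer passed, so it is not recorded in the file.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Plain-array simulation (no netCDF calls): v plays the on-disk variable of
// length 5, mwrite and mread play in-memory buffers of length 10.
struct Simulation {
    std::array<int, 10> mwrite{};
    std::array<int, 5> v{};
    std::array<int, 10> mread{};
};

Simulation run_example() {
    Simulation s;
    for (int i = 0; i < 10; ++i) {
        s.mwrite[i] = 100 + i;  // arbitrary payload values
    }
    s.mread.fill(-1);  // sentinel: "never written by the read"

    // Write: mwrite[5..9] -> v[0..4]. The memory offset (5) lives only in the
    // source pointer; the file offset is the start index (0).
    std::copy(s.mwrite.begin() + 5, s.mwrite.end(), s.v.begin());

    // Read: v[0..4] -> mread[0..4]. v carries no record of the memory offset,
    // so mread[5..9] keep whatever they held before (here the sentinel).
    std::copy(s.v.begin(), s.v.end(), s.mread.begin());
    return s;
}
```

The sentinel slots mread[5] to mread[9] play the role of the untouched nc_vlen_t elements that nc_free_vlens would later try to free.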

@krisfed
Author

krisfed commented Jan 17, 2022

Memset works because it sets the whole block of memory to zero. Since the nc_vlen_t instances
in a vector are contiguous, this ends up filling each nc_vlen_t.len with zeros and nc_vlen_t.p with zeros. This is equivalent to explicitly setting the len and p fields to 0 and NULL respectively.

Ah, right! Thanks for explaining!

What you need to know is that the portion of the variable written on disk is different
than the portion of memory that is read.

Thank you, I think I understand. Although I would have thought mwrite and v should be switched around, i.e. the offset should be in terms of the variable we are writing into (v), not the variable we are copying from (mwrite). If we started writing to v at v[1] and only wrote 4 elements, shouldn't mread[0] then also contain garbage? (Sorry if I am just confusing myself.)

More importantly, while qualifying the workaround of setting all elements to nc_vlen_t.len=0 and nc_vlen_t.p=NULL before reading the variable, I still observe the crash if I turn on the fill value for both NC_VLEN variables and set it to the default (which I believe is also nc_vlen_t.len=0 and nc_vlen_t.p=NULL).

The code is exactly the same as the original except it has

    // set fill value
    nc_vlen_t fillValue;
    fillValue.p = NULL;
    fillValue.len = 0;
    retval = nc_def_var_fill(ncid,varid1,NC_FILL,&fillValue);
    checkErrorCode(retval, "nc_def_var_fill (1)");

and

    // set fill value
    retval = nc_def_var_fill(ncid,varid2,NC_FILL,&fillValue);
    checkErrorCode(retval, "nc_def_var_fill (2)");

right after defining each variable, plus the extra print statements.

It looks like data_read[40] through data_read[49] are overwritten with bogus pointers on read (all ten get the same non-NULL address):

Before reading the variable (but after memset)
data_read[0] through data_read[49]: len 0, p 0x0 (NULL)

After reading the variable
data_read[0]: len 0, p 0x0 (NULL)
data_read[1]: len 6, p 0x7fa254509670
data_read[2]: len 6, p 0x7fa2545096d0
data_read[3]: len 5, p 0x7fa254509740
data_read[4]: len 0, p 0x0 (NULL)
data_read[5]: len 5, p 0x7fa2545096a0
data_read[6]: len 8, p 0x7fa254509700
data_read[7]: len 4, p 0x7fa254509770
data_read[8] through data_read[39]: len 0, p 0x0 (NULL)
data_read[40] through data_read[49]: len 0, p 0x7fa254506990 (NOT NULL)
a.out(68435,0x10b566e00) malloc: *** error for object 0x7fa254506990: pointer being freed was not allocated
a.out(68435,0x10b566e00) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6
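In this "After reading" printout, data_read[40] through data_read[49] all carry the same non-NULL p (0x7fa254506990) with len 0, so freeing the elements one by one would hand that single address to free() repeatedly. A hypothetical debugging helper that flags repeated non-NULL p values before calling nc_free_vlens might help catch this pattern in a test harness; it is not part of netCDF, and `VlenMirror` again stands in for `nc_vlen_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for nc_vlen_t ({ size_t len; void *p; } in netcdf.h).
struct VlenMirror {
    std::size_t len;
    void *p;
};

// Hypothetical debugging aid: count elements whose non-NULL p was already
// seen earlier in the buffer. Freeing such a buffer element by element
// would pass the same address to free() more than once.
std::size_t count_repeated_pointers(const VlenMirror *buf, std::size_t n) {
    std::unordered_set<const void *> seen;
    std::size_t repeats = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (buf[i].p == nullptr) {
            continue;  // NULL entries are always safe to free
        }
        if (!seen.insert(buf[i].p).second) {
            ++repeats;  // insert() failed: this address already appeared
        }
    }
    return repeats;
}
```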

Thank you again for all your help!

@krisfed
Author

krisfed commented Aug 25, 2022

Hi, just wanted to check for any updates! I know this is very much an edge-case scenario, but we still see the crash even with the suggested workaround:

More importantly, while qualifying the workaround of setting all elements to nc_vlen_t.len=0 and nc_vlen_t.p=NULL before reading the variable, I still observe the crash if I try to turn on the fill value for both NC_VLEN variables and try to set it to default (which I believe is also nc_vlen_t.len=0 and nc_vlen_t.p=NULL).

@krisfed
Author

krisfed commented Feb 7, 2023

Hi, just wanted to check - has there been any updates on this issue?

@DennisHeimbigner
Collaborator

My belief is that your example falls into one of the known bugs described in PR #2179, namely bug #1 on vlen fill values. I still do not have a fix for this case, since my suspicion is that it is an HDF5 failure. You might try again with the very latest version of HDF5 on the off chance that they have fixed it.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants