-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash on reading NC_VLEN variable with unlimited dimension #2181
Comments
The primary problem was that you were not clearing the data_read memory; this left junk in it that was being incorrectly interpreted by nc_vlens_free. |
Thanks Dennis! I am not quite clear, what is the expectation for data_read before it goes to nc_free_vlens? |
The problem is that your read does not appear to actually use all the space allocated for data_read.
|
Oh ok, I see what you mean - there are unused/not-overwritten elements of data_read because of the offsets in both dimensions and only partially writing data to the dimid_y dimension (that has fixed length of 10). So to "zero out" data_read, do I have to do something like: nc_vlen_t* data_read = new nc_vlen_t[num_items];
for (int i=0; i< num_items; i++)
{
data_read[i].p = NULL;
data_read[i].len = 0;
} such that all nc_vlen_t elements are legitimate ahead of time? Or are there better ways? I think for option 2, if you don't have prior knowledge of the file, you wouldn't know which portion of the data has actually been written to. So if you want to read all the data, assuming 5x10=50 (going off of the current lengths of dimensions) seems like the best guess. I am also unsure why this crash doesn't happen in other circumstances when reading back more data than has been written. When I was originally investigating this crash and trying to come up with simplest repro steps possible, I have tried just having one variable with one dimension (either unlimited or fixed-length, in which case writing less elements than the dimension length), and writing it with an offset. Then trying to read back from the variable the number of elements equal to dimension length (so that would mean the first elements would be unused due to offset, and, with the fixed dimension, the last elements would be also unused since I wrote fewer elements than the dimension length). The crash does not seem to happen in those cases. Even with the above code, reading and freeing data from varid2 ("Var2") works fine. I was only able to reproduce the crash with those specific circumstances I listed in the original post. And I guess the reason that unused elements of data_read contain invalid nc_vlen_t values (with bogus values for len and p fields) goes back to NC_VLENs not having a Fill Value turned on by default (#2068)? |
One way to zero a block of memory,
As for this:
It depends on what it is overwriting; it what it overwrote does not contain data |
Sorry to keep bugging you Dennis, just trying to understand this... I don't think memset is enough, as data_read is an array of nc_vlen_t structs, and memset wouldn't create "legal nc_vlen_t instances" for nc_free_vlens to release. I still get the crash when I use memset, but I was able to avoid it with the above for-loop setting all nc_vlen_t.len to 0s and all nc_vlen_t.p to NULL. Trying to understand this:
didn't you say that the issue happens when data is NOT overwritten, so there are "illegal"/bogus nc_vlen_t elements that nc_free_vlens will try to free? I actually went as far as printing out all the nc_vlen_t elements in data_read before and after nc_get_var call... Looks like nc_get_var DOES overwrite some values that I would think would be unused... For example, it DOES overwrite data_read[0] which should be unused because Var1 was written with offset of 1. It also overwrites data_read[9] through data_read[39] (although we are only writing 6 values with some gaps in between, which should only go up to data_read[8] I think. But it DOES NOT overwrite data_read[40] through data_read[49], and that's what causing the crash. Here is the ncdump of Var1:
And here is all the print out:
|
Memset works because it sets the whole block of memory to zero. Since the nc_vlen_t instances
where n is the number of nc_vlen_t instances in data read.
although memset is probably a bit faster. |
What you need to know is that the portion of the variable written on disk is different
v[0] -> mread[0]
|
Ah, right! Thanks for explaining!
Thank you, I think I understand. Although I would have thought mwrite and v should be switched around (i.e. offset should be in terms of the variable we are writing into (v) , not the variable we are copying from (mwrite)? If we started to write to v from v[1] and only wrote 4 elements, shouldn't then mread[0] also contain gabage? Sorry if I am just confusing myself) More importantly, while qualifying the workaround of setting all elements to nc_vlen_t.len=0 and nc_vlen_t.p=NULL before reading the variable, I still observe the crash if I try to turn on the fill value for both NC_VLEN variables and try to set it to default (which I believe is also nc_vlen_t.len=0 and nc_vlen_t.p=NULL). The code is exactly the same as the original except it has // set fill value
nc_vlen_t fillValue;
fillValue.p = NULL;
fillValue.len = 0;
retval = nc_def_var_fill(ncid,varid1,NC_FILL,&fillValue);
checkErrorCode(retval, "nc_def_var_fill (1)"); and
right after defining each variable. And the extra print statements. Looks like the values data_read[40] through data_read[49] are re-filled with bogus pointers upon read:
Thank you again for all your help! |
Hi, just wanted to check for any updates! I know this is a very-very edge-case scenario, but we do still see the crash even with the suggested workaround:
|
Hi, just wanted to check - has there been any updates on this issue? |
My belief is that your example falls into one of the known bugs described in PR #2179. Namely, bug #1 on vlen fillvalue. I still do not have a fix for this case since my suspicion is that it is an HDF5 failure. You might try again with the very latest version of HDF5 on the off chance that they fixed it. |
(This seems to be again running into issues with reclaiming NC_VLENs and might be possibly addressed by the proposed fix for #2143 )
We are using netcdf-c 4.8.1, and ran into this interesting crash. It seems to happen when
Here is some simplistic reproduction code that both creates the file and then tries to read the variable:
To reproduce the crash outside of our code base I had to use some environment variables that mess with memory allocation and release. I.e. on Debian 10:
And on macOS 11.2.3:
The text was updated successfully, but these errors were encountered: