cmor crashing #6
Comments
@pfuhe1 Could you please attach or send me a sample script that reproduces this? |
@pfuhe1 can you compile with debug so that the trace tells us where the core dump happens? Or run it via gdb. |
@doutriaux1 I have been having trouble reproducing this crash reliably, so I will do a bit more testing myself before sending you a script. I am also unsure whether I have compiled in debug mode correctly. I set the environment variable CFLAGS = '-g' when I compiled, but this doesn't seem to change the trace that is output when it crashes. Do I have to specify some other debug options or set them another way? |
I have come back to this issue again, and have produced a simple script that uses cmor to write random data to a file. It then loops, writing over the file many times. It seems to have a memory leak and crashes after a while from running out of memory. I'm wondering if this is due to the same issue as above. I don't think I can attach the files here, so I'm sending you the script and some example output by email. |
Thank you so much for doing this. Can you please post the script? It will help with debugging. |
|
That's it. Sorry about the length of the script. |
perfect! Thx! |
You need to change the lines setting opts['outpath'] and opts['tablepath'] for your machine. Note I am running cmor 2.9.1 with python 2.7.6 and numpy 1.8.0. I have also just run the script on an old machine I still have access to, which has cmor 2.8.3 installed along with python 2.6 and numpy 1.3.0, and there the problem with increasing memory usage doesn't occur. |
I use the same system for which Peter reported the memory leak. More or less by accident I found that it's due to building with a particular copy of the uuid library that was on the machine. Using a new version built from source fixes the leak. Now testing whether this fixes the intermittent crashes from the full processing. |
@MartinDix this is great news! Please let us know. I will tweak configure to make sure we use the correct uuid version. |
This wasn't the real problem. A lucky observation showed that the crashes occurred when writing a 4D file after a 3D file. This allowed creating an example small enough to run in TotalView, which then pointed to the line free(cmor_axes[cmor_naxes].wrapping); at the end of cmor_axis in cmor_axes.c ( https://github.com/PCMDI/cmor/blob/master/Src/cmor_axes.c#L1343 ). The wrapping pointer is only allocated for longitude axes, so for other axes the free can act on a stale pointer left in that slot by an earlier axis. Sometimes this gives the double free error that Peter originally reported. Other times it gives other more or less obscure crashes. I've created an example script: https://gist.github.com/MartinDix/6b2624d620da79c4e9f9 Adding a print printf("Wrapping %d %p\n", cmor_naxes, cmor_axes[cmor_naxes].wrapping); before the free then gives
In this case it's crashed at some point after the actual free, but just where it crashes seems to depend on array sizes, netcdf library versions etc. I think the fix is to add cmor_axes[cmor_naxes].wrapping = NULL; after the free. This seems to have fixed things here. |
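For anyone reading later, here is a minimal sketch of the pattern described above. It is not the cmor source: demo_axis_t, demo_axes, demo_axis and the wrap_state_t bookkeeping are hypothetical names made up for illustration. It assumes an axis slot is reused between files and that only a longitude axis allocates wrapping. To stay well defined it never performs the second free; it tracks the slot state and reports when the real code would have freed a stale pointer.

```c
#include <stdlib.h>

#define DEMO_MAX_AXES 4

/* Demo-only bookkeeping so the sketch never reads a dangling pointer. */
typedef enum { WRAP_NULL, WRAP_OWNED, WRAP_STALE } wrap_state_t;

/* Hypothetical stand-in for one entry of an axis table. */
typedef struct {
    double *wrapping;   /* allocated only for longitude axes */
    wrap_state_t state; /* what the pointer currently refers to */
} demo_axis_t;

/* Static storage is zero-initialised: wrapping == NULL, state == WRAP_NULL. */
static demo_axis_t demo_axes[DEMO_MAX_AXES];

/*
 * Define an axis in `slot`, then release its temporary wrapping buffer.
 * Returns  1 if the release would have freed a stale pointer (double free),
 *          0 if the release was safe,
 *         -1 if the allocation failed.
 */
int demo_axis(int slot, int is_longitude, int reset_after_free)
{
    demo_axis_t *ax = &demo_axes[slot];

    if (is_longitude) {
        ax->wrapping = malloc(2 * sizeof *ax->wrapping);
        if (ax->wrapping == NULL)
            return -1;
        ax->state = WRAP_OWNED;
    }

    if (ax->state == WRAP_STALE)
        return 1; /* an unconditional free() here would be a double free */

    if (ax->state == WRAP_OWNED) {
        free(ax->wrapping);
        if (reset_after_free) {
            ax->wrapping = NULL; /* the proposed one-line fix */
            ax->state = WRAP_NULL;
        } else {
            ax->state = WRAP_STALE; /* pointer value left behind in the slot */
        }
    }
    /* WRAP_NULL: nothing to do, equivalent to free(NULL), which is a no-op. */
    return 0;
}
```

Calling demo_axis(0, 1, 0) and then demo_axis(0, 0, 0) models a longitude axis followed by a non-longitude axis in the same slot without the reset, and the second call reports the stale pointer. With reset_after_free set to 1 the same sequence is safe, because the slot holds NULL by the time it is reused.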
wow! Nice catch! Will fix and add your script to the test suite! Thanks! |
I'm having an issue with cmor crashing intermittently. I am processing a number of files in a row, and cmor will produce a few files (even up to 100), then throw this error in the cmor_axis routine:
I've also seen this similar error:
*** glibc detected *** /apps/python/2.7.3/bin/python: free(): invalid pointer: 0x00000000246ee160 ***
and this:
*** glibc detected *** /apps/python/2.7.3/bin/python: corrupted double-linked list: 0x0000000003286ec0 ***
If I run the script to produce the same file again, generally it will succeed but then crash again when trying to produce a different file.
I am using the most recent version of cmor (2.9.1), with python2.7.3, netCDF4.3.0. I've tried building cmor with both intel and gnu compilers.