Failures on 32-bit platforms #83

Closed
WardF opened this issue Sep 22, 2014 · 8 comments

@WardF
Member

WardF commented Sep 22, 2014

Starting with the nightly tests last Friday, the 32-bit platforms have been failing two tests, tst_ncgen4 and tst_ncgen4_classic. At a quick glance the failures appear to be malloc()-related, but I haven't investigated yet.

@WardF WardF added the type/bug label Sep 22, 2014
@WardF
Member Author

WardF commented Sep 22, 2014

A quick link to the latest failures on the CDash dashboard.

@WardF
Member Author

WardF commented Sep 22, 2014

It took a bit of doing, but I've traced the failure in tst_ncgen4.sh as follows. The actual error occurs in tst_ncgen4_cycle.sh.

/home/vagrant/netcdf-c/build/ncdump/../ncgen/ncgen -b -k1 -o ref_tst_long_charconst.nc ref_tst_long_charconst.dmp
ncgen: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
Aborted (core dumped)

So, the actual command that crashes seems to be:

$ ncgen -b -k1 -o ref_tst_long_char_const.nc ref_tst_long_charconst.dmp

@DennisHeimbigner
Collaborator

Ward-
Can you valgrind that command?
=Dennis


@WardF
Member Author

WardF commented Sep 22, 2014

I'll give that a try. For now I'm in gdb and found the following:

(gdb) bt
#0 0xb7fdd424 in __kernel_vsyscall ()
#1 0xb7a29577 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#2 0xb7a2c9a3 in __GI_abort () at abort.c:89
#3 0xb7a6e26d in __malloc_assert (
assertion=assertion@entry=0xb7b61ca0 "(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offs"..., file=file@entry=0xb7b5d234 "malloc.c", line=line@entry=2372,
function=function@entry=0xb7b5d5c5 <__func__.10915> "sysmalloc")
at malloc.c:293
#4 0xb7a70e7b in sysmalloc (av=0xb7ba7420 <main_arena>, nb=119952)
at malloc.c:2369
#5 _int_malloc (av=av@entry=0xb7ba7420 <main_arena>, bytes=bytes@entry=119944)
at malloc.c:3800
#6 0xb7a72906 in __libc_calloc (n=119944, elem_size=1) at malloc.c:3219
#7 0x0805390a in chkmalloc (size=119944)
at /home/vagrant/netcdf-c/ncgen/debug.c:39
#8 0x080538e3 in chkcalloc (size=119944, nelems=1)
at /home/vagrant/netcdf-c/ncgen/debug.c:33
#9 0x08068bbd in bbSetalloc (bb=0x808fea8, sz0=0)
at /home/vagrant/netcdf-c/ncgen/bytebuffer.c:56
#10 0x08068daf in bbAppend (bb=0x808fea8, elem=0 '\000')
at /home/vagrant/netcdf-c/ncgen/bytebuffer.c:116
#11 0x08069315 in bbNull (bb=0x808fea8)
at /home/vagrant/netcdf-c/ncgen/bytebuffer.c:266
#12 0x08062b2e in ncglex () at ncgen.l:197
#13 0x0805f9b2 in ncgparse () at ncgentab.c:1506
#14 0x0804ea82 in main (argc=1, argv=0xbffff718)
at /home/vagrant/netcdf-c/ncgen/main.c:428
(gdb)

@WardF
Member Author

WardF commented Sep 22, 2014

You can also reproduce this on your local machine with Vagrant if you like. If you clone http://github.com/WardF/tiny-ci and then run vagrant up t32, it will create a 32-bit Ubuntu machine that you can then access with vagrant ssh t32. It will download the official Ubuntu image if need be.

Note there's also a variant machine, t32_big, which uses 2 CPUs and 4 GB of RAM instead of the 1 CPU and 512 MB of RAM that t32 uses.

@WardF
Member Author

WardF commented Sep 22, 2014

Here's the Valgrind output I got.

vagrant@bigt32:/netcdf-c/build/ncgen$ valgrind ./ncgen -b -k1 -o ref_tst_long_char_const.nc ../ncdump/ref_tst_long_charconst.dmp
==4409== Memcheck, a memory error detector
==4409== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==4409== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==4409== Command: ./ncgen -b -k1 -o ref_tst_long_char_const.nc ../ncdump/ref_tst_long_charconst.dmp
==4409==
==4409== Invalid write of size 1
==4409== at 0x8054F2E: unescape (escapes.c:737)
==4409== by 0x8062ACB: ncglex (ncgen.l:190)
==4409== by 0x805F9B1: ncgparse (ncgentab.c:1506)
==4409== by 0x804EA81: main (main.c:428)
==4409== Address 0x507b3d4 is 0 bytes after a block of size 59,972 alloc'd
==4409== at 0x402C109: calloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==4409== by 0x8053909: chkmalloc (debug.c:39)
==4409== by 0x80538E2: chkcalloc (debug.c:33)
==4409== by 0x8068BBC: bbSetalloc (bytebuffer.c:56)
==4409== by 0x8068CA4: bbSetlength (bytebuffer.c:79)
==4409== by 0x8062A6F: ncglex (ncgen.l:189)
==4409== by 0x805F9B1: ncgparse (ncgentab.c:1506)
==4409== by 0x804EA81: main (main.c:428)
==4409==
==4409==
==4409== HEAP SUMMARY:
==4409== in use at exit: 600,201 bytes in 98 blocks
==4409== total heap usage: 136 allocs, 38 frees, 1,135,991 bytes allocated
==4409==
==4409== LEAK SUMMARY:
==4409== definitely lost: 16,682 bytes in 5 blocks
==4409== indirectly lost: 262,144 bytes in 1 blocks
==4409== possibly lost: 0 bytes in 0 blocks
==4409== still reachable: 321,375 bytes in 92 blocks
==4409== suppressed: 0 bytes in 0 blocks
==4409== Rerun with --leak-check=full to see details of leaked memory
==4409==
==4409== For counts of detected and suppressed errors, rerun with: -v
==4409== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
vagrant@bigt32:/netcdf-c/build/ncgen$
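
For anyone reading along: "Invalid write of size 1 ... 0 bytes after a block" is what Memcheck prints when code writes to the byte immediately past the end of a heap block, which is the classic signature of a terminating NUL written into a buffer sized for the characters alone. The sketch below illustrates that pattern and one way to avoid it. It is only an illustration under stated assumptions: strbuf, strbuf_reserve and strbuf_set are hypothetical names, not the ncgen bytebuffer API, and this is not the change made in the actual fix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; NOT the ncgen Bytebuffer type. */
typedef struct {
    char  *data;   /* heap storage, or NULL when empty */
    size_t alloc;  /* bytes allocated */
    size_t length; /* bytes in use, excluding the terminator */
} strbuf;

/* Grow the buffer so that at least `need` bytes are allocated.
   Returns 0 on success, -1 on allocation failure. */
int strbuf_reserve(strbuf *b, size_t need)
{
    if (need <= b->alloc)
        return 0;
    char *p = realloc(b->data, need);
    if (p == NULL)
        return -1;
    b->data = p;
    b->alloc = need;
    return 0;
}

/* Copy `len` bytes of `src` and append a terminating NUL.
   Reserving only `len` bytes here would put the NUL at data[len],
   one byte past the block, which Memcheck reports as
   "0 bytes after a block". Reserving len + 1 leaves room for it. */
int strbuf_set(strbuf *b, const char *src, size_t len)
{
    if (strbuf_reserve(b, len + 1) != 0)
        return -1;
    assert(b->alloc > len);   /* the terminator must fit in the block */
    memcpy(b->data, src, len);
    b->data[len] = '\0';
    b->length = len;
    return 0;
}
```

Starting from a zero-initialized strbuf, calling strbuf_set(&b, "abc", 3) leaves at least four bytes allocated, so the write to data[3] stays inside the block.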

@WardF
Member Author

WardF commented Sep 22, 2014

@DennisHeimbigner fixed in 8074e0f. Thanks! Upon inspection, the issue would have been present on all platforms but was only causing a fault on the 32-bit Ubuntu systems.
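
One plausible reason the overrun only faulted on 32-bit, offered as an assumption rather than something confirmed in this thread: allocators round each request up to an aligned chunk, so a one-byte overrun can land in unused padding on one platform and in allocator bookkeeping on another. The sketch below works through that arithmetic, assuming the usual glibc-style layout (a size field of size_sz bytes and chunks aligned to 2 * size_sz). chunk_size and slack_bytes are hypothetical helpers written for this note, not glibc functions.

```c
#include <assert.h>
#include <stddef.h>

/* Chunk size a glibc-style malloc would carve out for `request`
   bytes, assuming a size field of `size_sz` bytes and alignment of
   2 * size_sz. Ignores the minimum chunk size, which does not
   matter for requests this large. */
size_t chunk_size(size_t request, size_t size_sz)
{
    size_t mask = 2 * size_sz - 1;
    return (request + size_sz + mask) & ~mask;
}

/* Bytes past the end of the request that still belong to the same
   chunk. A one-byte overrun is harmless to the allocator only if
   this is at least 1. */
size_t slack_bytes(size_t request, size_t size_sz)
{
    size_t chunk = chunk_size(request, size_sz);
    assert(chunk >= request + size_sz);
    return chunk - size_sz - request;
}
```

Under these assumptions, chunk_size(119944, 4) gives 119952, which matches the nb value in frame #4 of the backtrace. For the 59,972-byte block in the Valgrind report, slack_bytes gives 0 with a 4-byte size field and 4 with an 8-byte one, so the stray byte would hit the neighbouring chunk's header on 32-bit and fall into padding on 64-bit.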

@WardF
Member Author

WardF commented Sep 22, 2014

More information on the Unidata JIRA system: https://bugtracking.unidata.ucar.edu/browse/NCF-315
