segfault in SpaceGroupTest.reduceToPrimitive via libsymspg #752

Closed
drew-parsons opened this issue Aug 29, 2021 · 15 comments · Fixed by #806
Comments

@drew-parsons
Contributor

drew-parsons commented Aug 29, 2021

Avogadro version: 1.95

  • Avogadrolibs: 1.95.1
  • Qt: 5.15.2

Desktop version:

  • OS: Debian GNU/Linux
  • Version debian unstable (Linux 5.10.46)
  • Compiler g++ 10.2.1

Describe the bug

Running tests for the Debian build of Avogadrolibs: 1.95.1, I get a segfault at SpaceGroupTest.reduceToPrimitive

To Reproduce
Steps to reproduce the behavior:

  1. From the avogadrolibs source dir, build tests in a run_test subdir:

       mkdir tests/run_test
       cd tests/run_test
       cmake -DAvogadroLibs_BINARY_DIR=/usr/include/ -Wno-dev ..
       make AvogadroTests VERBOSE=1

  2. Then run the tests (from the tests/run_test dir):

       ./core/AvogadroTests

  3. See error

Expected behavior

All tests are expected to pass without error.

Screenshots
A gdb backtrace reports:

...
[----------] 5 tests from SpaceGroupTest
[ RUN      ] SpaceGroupTest.getSpaceGroup
[       OK ] SpaceGroupTest.getSpaceGroup (19 ms)
[ RUN      ] SpaceGroupTest.reduceToPrimitive

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff798e83d in _int_malloc (av=av@entry=0x7ffff7ac4b80 <main_arena>, bytes=bytes@entry=40) at malloc.c:3609
3609	malloc.c: No such file or directory.
(gdb) bt
#0  0x00007ffff798e83d in _int_malloc (av=av@entry=0x7ffff7ac4b80 <main_arena>, bytes=bytes@entry=40) at malloc.c:3609
#1  0x00007ffff7990164 in __GI___libc_malloc (bytes=40) at malloc.c:3058
#2  0x00007ffff78cac78 in ?? () from /lib/x86_64-linux-gnu/libsymspg.so.1
#3  0x00007ffff78cb2b3 in ?? () from /lib/x86_64-linux-gnu/libsymspg.so.1
#4  0x00007ffff78c6b80 in spa_search_spacegroup () from /lib/x86_64-linux-gnu/libsymspg.so.1
#5  0x00007ffff78ba082 in det_determine_all () from /lib/x86_64-linux-gnu/libsymspg.so.1
#6  0x00007ffff78c77ae in ?? () from /lib/x86_64-linux-gnu/libsymspg.so.1
#7  0x00007ffff7f20749 in Avogadro::Core::AvoSpglib::getHallNumber (mol=..., cartTol=1.0000000000000001e-05) at ./avogadro/core/avospglib.cpp:66
#8  0x00007ffff7f20f95 in Avogadro::Core::AvoSpglib::standardizeCell (mol=..., cartTol=1.0000000000000001e-05, toPrimitive=<optimized out>, idealize=<optimized out>) at ./avogadro/core/avospglib.cpp:181
#9  0x000055555566aaac in SpaceGroupTest_reduceToPrimitive_Test::TestBody() ()
#10 0x00005555556bd297 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#11 0x00005555556b267e in testing::Test::Run() ()
#12 0x00005555556b27d5 in testing::TestInfo::Run() ()
#13 0x00005555556b2c69 in testing::TestSuite::Run() ()
#14 0x00005555556b32b2 in testing::internal::UnitTestImpl::RunAllTests() ()
#15 0x00005555556bd807 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#16 0x00005555556b2898 in testing::UnitTest::Run() ()
#17 0x0000555555621b60 in main ()
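
A backtrace like the one above can be captured non-interactively by running only the failing test under gdb. The following is a minimal sketch, assuming the test binary path from the reproduction steps; the TEST_BIN and TEST_FILTER variable names are illustrative and not part of Avogadro.

```shell
#!/bin/sh
# Run only the failing test under gdb and print a backtrace if it crashes.
# TEST_BIN and TEST_FILTER are hypothetical names; the defaults are taken
# from this report.
TEST_BIN="${TEST_BIN:-./core/AvogadroTests}"
TEST_FILTER="${TEST_FILTER:-SpaceGroupTest.reduceToPrimitive}"

if [ -x "$TEST_BIN" ] && command -v gdb >/dev/null 2>&1; then
    # -batch exits after the listed commands; --args passes the rest to the test binary.
    gdb -batch -ex run -ex bt --args "$TEST_BIN" "--gtest_filter=$TEST_FILTER"
else
    echo "skipping: need gdb and an executable $TEST_BIN" >&2
fi
```

Restricting the run with --gtest_filter keeps the output short and avoids waiting for the unrelated test suites.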

Additional context

This is with external spglib (libsymspg1) 1.16.2

@ghutchis
Member

Weird, it seems to be working with 1.16.1. Thanks, I'll look into it.

@drew-parsons
Contributor Author

This segfault is still current. Are you able to reproduce it?

Installing libsymspg1-dbgsym gives more detail:

[ RUN      ] SpaceGroupTest.reduceToPrimitive
malloc(): unaligned fastbin chunk detected 3

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007ffff794e536 in __GI_abort () at abort.c:79
#2  0x00007ffff79a62b8 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7ab43a4 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff79add0a in malloc_printerr (str=str@entry=0x7ffff7ab6c88 "malloc(): unaligned fastbin chunk detected 3") at malloc.c:5389
#4  0x00007ffff79b0d6d in _int_malloc (av=av@entry=0x7ffff7ae6ba0 <main_arena>, bytes=bytes@entry=40) at malloc.c:3635
#5  0x00007ffff79b2734 in __GI___libc_malloc (bytes=bytes@entry=40) at malloc.c:3078
#6  0x00007ffff78ecc78 in get_translation (rot=rot@entry=0x7fffffffb910, cell=cell@entry=0x5555557ca200, symprec=symprec@entry=1.0000000000000001e-05, is_identity=is_identity@entry=0) at ./src/symmetry.c:419
#7  0x00007ffff78ed2b3 in get_space_group_operations (symprec=1.0000000000000001e-05, primitive=0x5555557ca200, lattice_sym=0x7fffffffb910) at ./src/symmetry.c:688
#8  get_operations (primitive=0x5555557ca200, symprec=symprec@entry=1.0000000000000001e-05, angle_symprec=angle_symprec@entry=-1) at ./src/symmetry.c:317
#9  0x00007ffff78ed745 in sym_get_operation (primitive=<optimized out>, symprec=symprec@entry=1.0000000000000001e-05, angle_tolerance=angle_tolerance@entry=-1) at ./src/symmetry.c:196
#10 0x00007ffff78e8b80 in spa_search_spacegroup (primitive=0x5555557ca2a0, hall_number=hall_number@entry=0, symprec=1.0000000000000001e-05, angle_tolerance=-1) at ./src/spacegroup.c:584
#11 0x00007ffff78dc082 in get_spacegroup_and_primitive (angle_symprec=-1, symprec=1.0000000000000001e-05, hall_number=0, cell=0x5555557d6fc0) at ./src/determination.c:150
#12 det_determine_all (cell=cell@entry=0x5555557d6fc0, hall_number=hall_number@entry=0, symprec=symprec@entry=1.0000000000000001e-05, angle_symprec=angle_symprec@entry=-1) at ./src/determination.c:72
#13 0x00007ffff78e97ae in get_dataset (lattice=lattice@entry=0x7fffffffc1f0, position=position@entry=0x5555557caf00, types=types@entry=0x5555557d7030, num_atom=num_atom@entry=10, 
    hall_number=hall_number@entry=0, symprec=symprec@entry=1.0000000000000001e-05, angle_tolerance=angle_tolerance@entry=-1) at ./src/spglib.c:1240
#14 0x00007ffff78e9e00 in spg_get_dataset (lattice=lattice@entry=0x7fffffffc1f0, position=position@entry=0x5555557caf00, types=types@entry=0x5555557d7030, num_atom=num_atom@entry=10, 
    symprec=symprec@entry=1.0000000000000001e-05) at ./src/spglib.c:273
#15 0x00007ffff7f32fb1 in Avogadro::Core::AvoSpglib::getHallNumber (mol=..., cartTol=1.0000000000000001e-05) at ./avogadro/core/avospglib.cpp:66
#16 0x00007ffff7f3353f in Avogadro::Core::AvoSpglib::standardizeCell (mol=..., cartTol=1.0000000000000001e-05, toPrimitive=<optimized out>, idealize=<optimized out>) at ./avogadro/core/avospglib.cpp:181
#17 0x000055555566add4 in SpaceGroupTest_reduceToPrimitive_Test::TestBody() ()
#18 0x00005555556bf2e7 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
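
The glibc message "malloc(): unaligned fastbin chunk detected" comes from an allocator integrity check. This suggests the heap was already corrupted before the malloc call in get_translation, so the trace may show where the damage was noticed rather than where it was done. A memory checker such as valgrind can report the first invalid access instead. The following is a minimal sketch, assuming valgrind is installed; the variable names are illustrative.

```shell
#!/bin/sh
# Run the failing test under valgrind to look for the first invalid
# read or write, rather than the later abort inside malloc.
# TEST_BIN and TEST_FILTER are hypothetical names; defaults come from this report.
TEST_BIN="${TEST_BIN:-./core/AvogadroTests}"
TEST_FILTER="${TEST_FILTER:-SpaceGroupTest.reduceToPrimitive}"

if [ -x "$TEST_BIN" ] && command -v valgrind >/dev/null 2>&1; then
    # --num-callers deepens the reported stacks; --error-exitcode makes
    # valgrind exit non-zero when it reports a memory error.
    valgrind --num-callers=30 --error-exitcode=1 \
        "$TEST_BIN" "--gtest_filter=$TEST_FILTER"
else
    echo "skipping: need valgrind and an executable $TEST_BIN" >&2
fi
```

Because this kind of corruption can go unnoticed on some platforms, a checker run may also help when the crash itself cannot be reproduced.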

@ghutchis
Member

ghutchis commented Oct 4, 2021

I am unable to reproduce it, even on the GH build nodes, but I should be able to do something from this...

@drew-parsons
Contributor Author

Just to confirm, were your tests using spglib 1.16.2?

@ghutchis
Member

ghutchis commented Oct 4, 2021

Yes. I've bumped openchemistry to 1.16.2.

@drew-parsons
Contributor Author

drew-parsons commented Oct 4, 2021

OK, let me know if there's anything you need me to check or patch. I'll upload avogadrolibs 1.95.1 now to check if clean builds on the Debian servers reproduce it. I guess the rest of the code is safe, so long as getHallNumber is not used.
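
As an interim measure while the crash is investigated, the rest of the suite can be run with the failing test excluded, using GoogleTest's negative filter syntax. This is a sketch only, not something proposed in this thread; the SKIP_TEST and GTEST_ARG names are illustrative, and whether other tests also reach getHallNumber is not established here.

```shell
#!/bin/sh
# Run the test suite while excluding the one test known to crash.
# TEST_BIN, SKIP_TEST and GTEST_ARG are hypothetical names.
TEST_BIN="${TEST_BIN:-./core/AvogadroTests}"
SKIP_TEST="${SKIP_TEST:-SpaceGroupTest.reduceToPrimitive}"

# A leading '-' in the filter pattern tells GoogleTest to exclude matches.
GTEST_ARG="--gtest_filter=-$SKIP_TEST"

if [ -x "$TEST_BIN" ]; then
    "$TEST_BIN" "$GTEST_ARG"
else
    echo "skipping: $TEST_BIN is not an executable file" >&2
fi
```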

@drew-parsons
Contributor Author

drew-parsons commented Oct 4, 2021

Unrelated to the spglib segfault, you might like to be aware of the Qt5/GL bug reported at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=798408 . Something to do with ARM hardware GLES support. It's preventing a successful build on armhf and armel architectures (https://buildd.debian.org/status/package.php?p=avogadrolibs).

@drew-parsons
Contributor Author

As for the spglib segfault, it's reproduced on the Debian CI machines (amd64, arm64, i386) at https://ci.debian.net/packages/a/avogadrolibs/
https://ci.debian.net/data/autopkgtest/testing/amd64/a/avogadrolibs/15777229/log.gz
https://ci.debian.net/data/autopkgtest/testing/arm64/a/avogadrolibs/15777518/log.gz

[ RUN      ] SpaceGroupTest.reduceToPrimitive
malloc(): unaligned fastbin chunk detected 3
/tmp/autopkgtest-lxc.nu2ju37t/downtmp/build.KOt/src/debian/tests/test-avogadrolibs-cpp: line 9: 18678 Aborted                 ./core/AvogadroTests

@ghutchis
Member

ghutchis commented Oct 5, 2021

Thanks for the heads-up on ARM. Do you know if it's resolved with Qt6?

@ghutchis
Member

ghutchis commented Oct 5, 2021

Aha, I can finally reproduce on my Mac. 🎉

ghutchis added a commit to ghutchis/avogadrolibs that referenced this issue Oct 5, 2021
Not sure if this is the perfect solution, but it fixes OpenChemistry#800
Might also fix OpenChemistry#752

Signed-off-by: Geoff Hutchison <geoff.hutchison@gmail.com>
@ghutchis
Member

ghutchis commented Oct 5, 2021

It's a symptom of a deeper problem with layers. I'm hopeful #806 will fix this for you too. Let me know - it's been hard to confirm your exact bug.

@github-actions
Contributor

github-actions bot commented Oct 5, 2021

Here are the build results
Avogadro2.AppImage
macOS.dmg
Ubuntu-2004.tar.gz
Win64.exe
Artifacts will only be retained for 90 days.

@drew-parsons
Contributor Author

drew-parsons commented Oct 5, 2021

> Thanks for the heads-up on ARM. Do you know if it's resolved with Qt6?

Not clear to me yet, but they did make a big overhaul of their video system handling.

I'll give patch #806 a try.

@drew-parsons
Contributor Author

Patch #806 seems to be working well enough. It no longer crashes on SpaceGroupTest.reduceToPrimitive: https://ci.debian.net/data/autopkgtest/testing/amd64/a/avogadrolibs/15790439/log.gz

There's a separate runtime crash related to changing the crystal cell, but I'm not certain how to reproduce it reliably. I guess it's not related to this spglib crash anyway, certainly not in the unit tests.

I think we can consider this bug resolved by #806.

@ghutchis
Member

ghutchis commented Oct 6, 2021

If you have other crashes, please create an issue with as much info as you have. I have ideas on a few of them and we can track them down.

#806 isn't perfect - figuring out the correct solution will take longer - but if it fixes crashes, it's a useful start.
