EMXOMF crash with complex C++ #49

Closed
dmik opened this issue Dec 16, 2019 · 14 comments

@dmik
Contributor

dmik commented Dec 16, 2019

EMXOMF.EXE has crashed a couple of times when running make -j4 here (the first occurrence is in bitwiseworks/qt5-os2#11 (comment)). Given that the crashes happen on completely different source files, I doubt it's related to the compile options, but here they are anyway:

g++ -c -Zomf -march=i686 -mtune=i686 -g -std=gnu++1z -Wall -W -DQT_NO_IPV6 -DQT_TESTLIB_LIB -DQT_CORE_LIB -DQT_TESTCASE_BUILDDIR='"D:/Coding/qt5/qt5-dev-build/qtbase/tests/auto/corelib/thread/qthread"' -ID:/Coding/qt5/qt5/qtbase/tests/auto/corelib/thread/qthread -I. -ID:/Coding/qt5/qt5/qtbase/tests/shared -I../../../../../include -I../../../../../include/QtTest -I../../../../../include/QtCore -I.moc/debug -ID:/Coding/qt5/qt5/qtbase/mkspecs/os2-g++ -o .obj/debug/tst_qthread.obj D:/Coding/qt5/qt5/qtbase/tests/auto/corelib/thread/qthread/tst_qthread.cpp

And here's the popuplog.os2 entry:

12-16-2019  14:40:08  SYS3171  PID d75f  TID 0001  Slot 00ea
C:\USR\BIN\EMXOMF.EXE
c0000005
00033ac3
P1=00000002  P2=0004fffc  P3=XXXXXXXX  P4=XXXXXXXX
EAX=0014e4cc  EBX=0014e4cc  ECX=0014e8f4  EDX=00000103
ESI=00000001  EDI=0014e8d0
DS=0053  DSACC=d0f3  DSLIM=5fffffff
ES=0053  ESACC=d0f3  ESLIM=5fffffff
FS=150b  FSACC=00f3  FSLIM=00000030
GS=0000  GSACC=****  GSLIM=********
CS:EIP=005b:00033ac3  CSACC=d0df  CSLIM=5fffffff
SS:ESP=0053:00050000  SSACC=d0f3  SSLIM=5fffffff
EBP=00050008  FLG=00010202

EMXOMF.EXE 0001:00023ac3

It's hard to tell where the crash happens as there is no TRP file (we don't link LIBC tools to LIBCx — perhaps we should) and it's not easy to extract debug info from binary form contained in the .DBG file. I will replace EMXOMF with my own build (with TRP enabled) and we will see where it leads us.

@dmik dmik self-assigned this Dec 16, 2019
@dmik
Contributor Author

dmik commented Dec 16, 2019

I've tried running the above command directly and get the same crash 100% of the time. So running in parallel is irrelevant here. Note that in its current form this source cannot be built with gcc 4.9.2 because std::thread support is missing there (it was detected by configure when using a test gcc 9.2.0 build).

@dmik
Contributor Author

dmik commented Dec 16, 2019

Some more testing. The problem only shows up with GCC 9.2.0 if -std=gnu++1z is used. And sometimes it doesn't just crash, it brings the whole machine down (hard reset). If I downgrade the standard to e.g. -std=gnu++14, everything works.

It really feels like EMXOMF enters some endless loop and eventually drains all system resources (and SYS3171 usually means that there is not enough room on stack to process exception handlers so it may be some endless recursion or such). This may be caused by some new debug info produced in the newest C++ standard mode or something from that area.
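The register values in the popuplog.os2 entry above can be checked against the stack-exhaustion theory with a little arithmetic. The sketch below is only a heuristic, and it rests on an assumption that is not stated in the log itself: that for a c0000005 trap P1 carries the access type (2 meaning a write) and P2 the faulting address.

```python
# Values copied from the popuplog.os2 entry above.
P1 = 0x00000002   # assumed: access type reported with the c0000005 trap
P2 = 0x0004FFFC   # assumed: address whose access faulted
ESP = 0x00050000  # stack pointer at the time of the trap

# Assumption: P1 == 2 denotes a write access.
WRITE_ACCESS = 0x2


def looks_like_stack_overflow(p1, p2, esp, slack=16):
    """Heuristic: a failed write landing within `slack` bytes below ESP
    is what a push or call produces once the stack has run out."""
    return p1 == WRITE_ACCESS and 0 < esp - p2 <= slack


print(hex(ESP - P2))                           # distance below ESP
print(looks_like_stack_overflow(P1, P2, ESP))  # heuristic verdict
```

With the logged values the faulting write lands 4 bytes below ESP, which is what one would expect from a push or call with no stack left.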

Note that due to SYS3171, I don't even have a .TRP file as it needs stack to get created. But I have a map file for it at least.

@dmik
Contributor Author

dmik commented Dec 16, 2019

I've checked back as far as the EMXOMF.EXE dated 2015-02-07 (i.e. the one used in all pre-LIBCn RPM builds of libc) and it shows exactly the same crash. The latest EMXOMF.EXE from Knut's LIBC 0.6.6 CSD6 (2014-10-26) does not crash, however (although it produces a really big OBJ file, much bigger than in -std=gnu++14 mode: 2.2 MB vs 1.1 MB). So it must be one of our patches that is missing from Knut's tree. I need to compare.

Note that EMXOMF.EXE dated 2015-02-07 crashes similarly even in -std=gnu++14 mode. So our changes certainly break things.

@StevenLevine

Moving this from bitwiseworks/qt5-os2#11 since I neglected to review all ticket comments first...

FWIW, this trap is because the stack overflowed. Your thought that there's a recursive loop is probably true.

If you can arrange to capture a process dump with instance data, we might be able to get a better idea of where the recursion is occurring. Using my pdumpctl.cmd, request a Full dump. This will provide both instance data and private code. We should not need shared data.

A copy of your emxomf.exe and emxomf.map might be sufficient too. It depends on how easy the generated code is to read.
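As a rough illustration of how a map file can narrow down a trap address, the sketch below does a nearest-preceding-symbol lookup. The symbol table is entirely hypothetical (the names and offsets are invented, not taken from emxomf.map), and the 0x10000 load base is an assumption inferred only from the fact that the logged EIP (00033ac3) and the logged object offset (0001:00023ac3) differ by that amount.

```python
import bisect

# Hypothetical public symbols of object 0001 as (offset, name) pairs.
# These are made-up placeholders, NOT the contents of emxomf.map.
SYMBOLS = sorted([
    (0x00000010, "_func_a"),
    (0x00012340, "_func_b"),
    (0x00023000, "_func_c"),
    (0x00025000, "_func_d"),
])

# Assumption: the object is loaded at this base, so EIP - base = offset.
LOAD_BASE = 0x10000


def nearest_symbol(symbols, offset):
    """Return 'name+0xdelta' for the last symbol at or before `offset`,
    or None if the offset precedes every known symbol."""
    offsets = [off for off, _ in symbols]
    i = bisect.bisect_right(offsets, offset) - 1
    if i < 0:
        return None
    off, name = symbols[i]
    return f"{name}+0x{offset - off:x}"


eip = 0x00033AC3           # CS:EIP from the popuplog.os2 entry
offset = eip - LOAD_BASE   # matches the logged 0001:00023ac3
print(hex(offset), nearest_symbol(SYMBOLS, offset))
```

A lookup like this only finds symbols that are actually listed in the map. Static functions are typically not listed, in which case the result names the nearest public symbol rather than the function that really contains the address.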

Since you have a stack overflow, you will not be able to use exceptq.

@dmik
Contributor Author

dmik commented Dec 16, 2019

@StevenLevine thank you for your comments! I guess it's simpler to inspect the .map file and the source changes to begin with. If I get nowhere, I will prepare a dump for you.

@dmik
Contributor Author

dmik commented Dec 16, 2019

Simply reverting the source code to Knut's versions doesn't help here. So the culprit here must be the fact that we build EMXOMF (and LIBC itself) with GCC4 while Knut still uses GCC3. Debugging further.

@StevenLevine

The trap is in d_print_comp(), which can be called recursively.

@dmik
Contributor Author

dmik commented Dec 16, 2019

I'm not sure what you mean by that. There is no such function in the EMX sources. From what I see, it's some static function, which unfortunately doesn't show up in the .map file. It is presumably located in safe-ctype.c, which is part of libiberty (which we now ship as part of binutils). BTW, this might also be the reason for the crash: Knut's builds use his own binutils and libiberty, which are kept inside the LIBC source tree.

I will continue my debugging tomorrow.

@StevenLevine

emxomf.exe links to libiberty.a, which is where all the demangle code lives. d_print_comp() is in cp-demangle.c.
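To illustrate why a recursive printer of nested name components can exhaust the stack, and how a depth limit turns that into a recoverable error, here is a minimal sketch. It is not libiberty's code: print_comp, nest, TooDeep and the tuple-based tree are all hypothetical stand-ins chosen for the example.

```python
class TooDeep(Exception):
    """Raised when the component tree is nested beyond the allowed depth."""


def print_comp(node, depth=0, max_depth=200):
    """Render a nested component tree as text.

    `node` is either a str (a leaf) or a (label, child) tuple.
    Each nesting level costs one stack frame, so the depth is checked
    explicitly instead of letting the stack run out.
    """
    if depth > max_depth:
        raise TooDeep(f"nesting exceeds {max_depth} levels")
    if isinstance(node, str):
        return node
    label, child = node
    return f"{label}<{print_comp(child, depth + 1, max_depth)}>"


def nest(levels):
    """Build a tree nested `levels` deep, iteratively."""
    node = "int"
    for _ in range(levels):
        node = ("T", node)
    return node


print(print_comp(nest(2)))
try:
    print_comp(nest(500))
except TooDeep as exc:
    print("gave up:", exc)
```

The design choice here is to fail early with an explicit error, so the caller can fall back to something like emitting the name unmodified instead of crashing.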

@dmik
Contributor Author

dmik commented Dec 16, 2019

BTW, what makes you think it's d_print_comp? Do you know an easy way to get the closest function name from a .dbg file/NB04 block?

@dmik dmik changed the title EMXOMF crash when run in parallel EMXOMF crash with complex C++ Dec 17, 2019
@dmik
Contributor Author

dmik commented Dec 18, 2019

The crash in libiberty is gone; we need to rebuild binutils and then rebuild libc. I will leave this open until then.

Note that the new libiberty doesn't crash, but it doesn't recognize the mangled name either. This seems to be an upstream problem (see bitwiseworks/binutils-os2#1 for more details). It's not very relevant for us because EMXOMF only uses this name to put it into the debug info block. So such non-demangleable functions will simply appear as is (mangled) in the generated .TRP reports (and in other places that use HLL debug info to name functions). We can live with that (and wait until it's resolved upstream).

@dryeo
Copy link

dryeo commented Dec 18, 2019 via email

@dmik
Contributor Author

dmik commented Dec 19, 2019

@dryeo should be fixed now.

@dmik
Contributor Author

dmik commented Dec 19, 2019

OTOH, it's more correct to close this ticket as it's resolved. I've been using a custom build of EMXOMF for now; it will be available to everyone with a new RPM.

@dmik dmik closed this as completed Dec 19, 2019