Raise SIGTRAP in TR::trap() instead of SIGABRT #5774

Merged
merged 1 commit into from
Feb 10, 2021

fjeremic
Contributor

@fjeremic fjeremic commented Jan 29, 2021

SIGABRT has only one global signal handler, so we cannot guard
function calls against SIGABRT using the port library APIs. Raising
SIGTRAP instead is useful for downstream projects that may want to
catch such signals when a compilation thread crashes and requeue the
compilation (for another attempt, or perhaps to generate additional
diagnostic data).

Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
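To make the motivation above concrete, here is a minimal POSIX sketch of guarding a call against SIGTRAP and requeueing it after a trap. It is only an illustration under stated assumptions: it uses plain `sigaction` and `sigsetjmp` rather than the port library APIs, and every name in it (`runGuarded`, `runWithRequeue`, `flakyCompile`, `healthyCompile`) is hypothetical and does not appear in OMR or OpenJ9.

```cpp
#include <setjmp.h>   // sigsetjmp, siglongjmp (POSIX)
#include <signal.h>   // sigaction, raise, SIGTRAP

// All names below are hypothetical; this is not the OMR port library API.
static sigjmp_buf g_recoveryPoint;               // where the handler jumps back to
static volatile sig_atomic_t g_guardActive = 0;  // set only while a guarded call runs

extern "C" void onTrap(int)
{
    if (g_guardActive)
        siglongjmp(g_recoveryPoint, 1);          // abandon the guarded call
}

// Runs fn() and returns true if it finished, false if it raised SIGTRAP.
// Note: jumping out of fn() skips C++ destructors, so fn() must not rely on them.
bool runGuarded(void (*fn)())
{
    struct sigaction action = {}, previous = {};
    action.sa_handler = onTrap;
    sigemptyset(&action.sa_mask);
    sigaction(SIGTRAP, &action, &previous);

    bool completed;
    if (sigsetjmp(g_recoveryPoint, 1) == 0) {    // 1: save the signal mask for restoration
        g_guardActive = 1;
        fn();
        completed = true;
    } else {
        completed = false;                       // reached via siglongjmp from onTrap
    }
    g_guardActive = 0;
    sigaction(SIGTRAP, &previous, nullptr);      // restore the earlier disposition
    return completed;
}

// Re-runs fn until it completes; returns the attempt that succeeded, or 0 if none did.
int runWithRequeue(void (*fn)(), int maxAttempts)
{
    for (int attempt = 1; attempt <= maxAttempts; ++attempt)
        if (runGuarded(fn))
            return attempt;
    return 0;
}

// Stand-ins for a compilation: one traps on its first attempt only, one never traps.
static int g_attempts = 0;
void flakyCompile()   { if (++g_attempts == 1) raise(SIGTRAP); }
void healthyCompile() {}
```

With these stand-ins, `runWithRequeue(flakyCompile, 3)` traps once, is guarded, and completes on the second attempt. The same per-call guard is not practical for SIGABRT if, as the description says, only a single global handler exists for it.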

@fjeremic fjeremic changed the title from "Add TR_EnableSegfaultOnTrap option" to "Add TR_EnableSIGSEGVOnTrap option" Jan 29, 2021
@fjeremic fjeremic marked this pull request as draft January 29, 2021 19:39
@fjeremic
Contributor Author

Converting this to a draft PR, as I'm not seeing the expected results. This PR is driven by some work in OpenJ9. While I am able to set the new option for JitDump compilations, we seem unable to use the omrsig_protect call to catch the SIGSEGV generated by the store to memory location 0. If I change abort() to a store to 0, everything works as expected.

It seems as if whatever the global abort signal handler is doing is affecting our ability to catch any other signal.

@fjeremic
Contributor Author

I can get everything to work if we use comp->failCompilation<TR::CompilationException> instead of storing to memory location 0. However, I'd like to understand why we are unable to catch any signal following the abort() triggered by a TR_ASSERT_FATAL.

This is because we could in theory still encounter a crash at a different location when generating the JitDump, in which case we're back in the same scenario.
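As a rough illustration of why the exception route behaves differently, the sketch below fails a "compilation" by throwing and recovers at an ordinary catch site, with no signal handler involved. The types and functions (`HypotheticalCompilationFailure`, `compileOrThrow`, `tryCompile`) are invented for this example and are not the OMR `failCompilation` implementation.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a compilation-failure exception type.
struct HypotheticalCompilationFailure : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in for a compile step that reports failure by throwing instead of trapping.
void compileOrThrow(bool shouldFail)
{
    if (shouldFail)
        throw HypotheticalCompilationFailure("crashDuringCompile option is set");
}

// The caller recovers through normal stack unwinding and could requeue from here.
std::string tryCompile(bool shouldFail)
{
    try {
        compileOrThrow(shouldFail);
        return "compiled";
    } catch (const HypotheticalCompilationFailure &e) {
        return std::string("failed: ") + e.what();
    }
}
```

This only covers failures the code chooses to throw. A genuine crash elsewhere (for example a bad memory access while generating the JitDump) would still arrive as a signal, which is the scenario the comment above is concerned about.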

@fjeremic
Contributor Author

fjeremic commented Jan 29, 2021

Just spoke to @babsingh, who kindly explained that synchronous and asynchronous signals are caught differently. Calling abort() within TR_ASSERT_FATAL will not work well because we may reinstall the default signal handlers, among other things. We tested raising SIGTRAP inside of TR::trap() instead (makes sense, right? haha) and everything worked as expected.
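The difference between the two signals can be seen outside the JVM with a small POSIX experiment. This is a sketch of standard C library behaviour, not of the OpenJ9 signal handling itself, and the helper names (`terminatingSignalOf`, `callAbort`, `raiseTrap`) are made up for the example. A handler that simply returns is installed in a child process: with abort() the child is still killed by SIGABRT, while after raise(SIGTRAP) execution continues.

```cpp
#include <signal.h>     // sigaction, raise, SIGABRT, SIGTRAP
#include <stdlib.h>     // abort
#include <sys/wait.h>   // waitpid, WIFSIGNALED, WTERMSIG
#include <unistd.h>     // fork, _exit

// A handler that deliberately does nothing and returns to the interrupted code.
extern "C" void returningHandler(int) {}

// Runs action() in a child process with returningHandler installed for signum.
// Returns the signal that terminated the child, 0 if it exited normally, -1 on error.
int terminatingSignalOf(int signum, void (*action)())
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        struct sigaction sa = {};
        sa.sa_handler = returningHandler;
        sigemptyset(&sa.sa_mask);
        sigaction(signum, &sa, nullptr);
        action();
        _exit(0);   // reached only if action() returned
    }
    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

void callAbort() { abort(); }          // terminates even though the handler returns
void raiseTrap() { raise(SIGTRAP); }   // handler returns, execution continues
```

On a POSIX system, `terminatingSignalOf(SIGABRT, callAbort)` reports SIGABRT and `terminatingSignalOf(SIGTRAP, raiseTrap)` reports 0. POSIX specifies that abort() terminates the process unless the SIGABRT handler does not return, which is consistent with the observation that nothing useful can be caught after the abort().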

The unfortunate side effect of such a change, for OpenJ9 at least, is that using SIGTRAP will make us print the register context when we hit a JIT assertion. I'll have to ask around to see if this is a showstopper for people. Here is what it would look like:

fjeremic@Filips-MacBook-Pro openj9-openjdk-jdk11 % ./build/macosx-x86_64-normal-server-release/jdk/bin/java '-Xjit:{sun/nio/fs/UnixPath.checkNotNul(Ljava/lang/String;C)V}(crashduringcompile),verbose={*},vlog=vlog' -Xnoaot -version
Assertion failed at /Users/fjeremic/Projects/openj9-openjdk-jdk11/omr/compiler/compile/OMRCompilation.cpp:1035: !self()->getOption(TR_CrashDuringCompilation)
VMState: 0x00050080
	crashDuringCompile option is set
compiling sun/nio/fs/UnixPath.checkNotNul(Ljava/lang/String;C)V at level: warm
Unhandled exception
Type=Unhandled trap vmState=0x00050080
J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=00000001
Handler1=000000000DF92750 Handler2=000000000E227AC0
RDI=0000000000003F03 RSI=0000000000000005 RAX=0000000000000000 RBX=000070000040B000
RCX=0000700000406E98 RDX=0000000000000000 R8=00000000001ED5FC R9=FFFFFFFF00000000
R10=000070000040B000 R11=0000000000000246 R12=0000000000003F03 R13=0000000000000001
R14=0000000000000005 R15=0000000000000016
RIP=00007FFF20340462 GS=0000 FS=0000 RSP=0000700000406E98
RFlags=0000000000000246 CS=0007 RBP=0000700000406EC0 ERR=192B101002000148
TRAPNO=0200014800000085 CPU=1010020001480000 FAULTVADDR=00007FE7192B1010
XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM2 000000003f7f96a3 (f: 1065326272.000000, d: 5.263411e-315)
XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM8 0f0f0f0f0f0f0f0f (f: 252645136.000000, d: 3.815737e-236)
XMM9 0302020102010100 (f: 33620224.000000, d: 3.524484e-294)
XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM11 0000000100000000 (f: 0.000000, d: 2.121996e-314)
XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/usr/lib/system/libsystem_kernel.dylib
Module_base_address=00007FFF20339000 Symbol=__pthread_kill
Symbol_address=00007FFF20340458
Method_being_compiled=sun/nio/fs/UnixPath.checkNotNul(Ljava/lang/String;C)V
Target=2_90_20210128_000000 (Mac OS X 10.16)
CPU=amd64 (16 logical CPUs) (0x800000000 RAM)
----------- Stack Backtrace -----------
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2021/01/29 16:08:58 - please wait.
JVMDUMP032I JVM requested System dump using '/Users/fjeremic/Projects/openj9-openjdk-jdk11/core.20210129.160858.49986.0001.dmp' in response to an event
openjdk version "11.0.10-internal" 2021-01-19
OpenJDK Runtime Environment (build 11.0.10-internal+0-adhoc.fjeremic.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build 9522-interruption-2172b88589, JRE 11 Mac OS X amd64-64-Bit Compressed References 20210128_000000 (JIT enabled, AOT disabled)
OpenJ9   - 0390c2a95f
OMR      - <Unknown>
JCL      - 5e81062d6d based on jdk-11.0.10+5)
JVMDUMP012E Error in System dump: The core file created by child process with pid = 49987 was not found. Expected to find core file with name "/cores/core.49987"
JVMDUMP032I JVM requested Java dump using '/Users/fjeremic/Projects/openj9-openjdk-jdk11/javacore.20210129.160858.49986.0002.txt' in response to an event
JVMDUMP010I Java dump written to /Users/fjeremic/Projects/openj9-openjdk-jdk11/javacore.20210129.160858.49986.0002.txt
JVMDUMP032I JVM requested Snap dump using '/Users/fjeremic/Projects/openj9-openjdk-jdk11/Snap.20210129.160858.49986.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /Users/fjeremic/Projects/openj9-openjdk-jdk11/Snap.20210129.160858.49986.0003.trc
JVMDUMP032I JVM requested JIT dump using '/Users/fjeremic/Projects/openj9-openjdk-jdk11/jitdump.20210129.160858.49986.0004.dmp' in response to an event
Assertion failed at /Users/fjeremic/Projects/openj9-openjdk-jdk11/omr/compiler/compile/OMRCompilation.cpp:1035: !self()->getOption(TR_CrashDuringCompilation)
VMState: 0x00050080
	crashDuringCompile option is set
compiling sun/nio/fs/UnixPath.checkNotNul(Ljava/lang/String;C)V at level: warm
JVMDUMP010I JIT dump written to /Users/fjeremic/Projects/openj9-openjdk-jdk11/jitdump.20210129.160858.49986.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".
fjeremic@Filips-MacBook-Pro openj9-openjdk-jdk11 %

For reference, this is how things looked before:

fjeremic@Filips-MacBook-Pro openj9-openjdk-jdk11 % ./build/macosx-x86_64-normal-server-release/jdk/bin/java '-Xjit:{sun/nio/fs/UnixPath.checkNotNul(Ljava/lang/String;C)V}(crashduringcompile),verbose={*},vlog=vlog' -Xnoaot -version
Assertion failed at /Users/fjeremic/Projects/openj9-openjdk-jdk11/omr/compiler/compile/OMRCompilation.cpp:1035: !self()->getOption(TR_CrashDuringCompilation)
VMState: 0x00050080
	crashDuringCompile option is set
compiling sun/nio/fs/UnixPath.checkNotNul(Ljava/lang/String;C)V at level: warm

JVMDUMP039I Processing dump event "abort", detail "" at 2021/01/29 16:19:36 - please wait.
JVMDUMP032I JVM requested System dump using '/Users/fjeremic/Projects/openj9-openjdk-jdk11/core.20210129.161936.52917.0001.dmp' in response to an event
openjdk version "11.0.10-internal" 2021-01-19
OpenJDK Runtime Environment (build 11.0.10-internal+0-adhoc.fjeremic.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build 9522-interruption-2172b88589, JRE 11 Mac OS X amd64-64-Bit Compressed References 20210128_000000 (JIT enabled, AOT disabled)
OpenJ9   - 0390c2a95f
OMR      - <Unknown>
JCL      - 5e81062d6d based on jdk-11.0.10+5)
JVMDUMP012E Error in System dump: The core file created by child process with pid = 52918 was not found. Expected to find core file with name "/cores/core.52918"
JVMDUMP032I JVM requested Java dump using '/Users/fjeremic/Projects/openj9-openjdk-jdk11/javacore.20210129.161936.52917.0002.txt' in response to an event
JVMDUMP010I Java dump written to /Users/fjeremic/Projects/openj9-openjdk-jdk11/javacore.20210129.161936.52917.0002.txt
JVMDUMP032I JVM requested Snap dump using '/Users/fjeremic/Projects/openj9-openjdk-jdk11/Snap.20210129.161936.52917.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /Users/fjeremic/Projects/openj9-openjdk-jdk11/Snap.20210129.161936.52917.0003.trc
JVMDUMP032I JVM requested JIT dump using '/Users/fjeremic/Projects/openj9-openjdk-jdk11/jitdump.20210129.161936.52917.0004.dmp' in response to an event
Assertion failed at /Users/fjeremic/Projects/openj9-openjdk-jdk11/omr/compiler/compile/OMRCompilation.cpp:1035: !self()->getOption(TR_CrashDuringCompilation)
VMState: 0x00050080
	crashDuringCompile option is set
compiling sun/nio/fs/UnixPath.checkNotNul(Ljava/lang/String;C)V at level: warm

Notice that the "JVMDUMP010I JIT dump written to" and "JVMDUMP013I Processed dump event" messages are not printed. This is because we executed abort() twice and the JVM terminated before the dump process had finished. This is what we're trying to fix; however, it cannot be done if we continue to use abort() within TR::trap().

compiler/infra/Assert.cpp (outdated)
@dsouzai
Member

dsouzai commented Jan 29, 2021

The unfortunate side effect of such a change, for OpenJ9 at least, is that using SIGTRAP will make us print the register context when we hit a JIT assertion. I'll have to ask around to see if this is a showstopper for people.

This also happens when we hit a VM assert in OpenJ9, doesn't it? In which case we'd be consistent with the VM.

@fjeremic
Contributor Author

This also happens when we hit a VM assert in OpenJ9, doesn't it? In which case we'd be consistent with the VM.

The VM does different things. Some of their asserts call assert(0) (which ends up calling abort()), others call exit(-1).

@fjeremic fjeremic changed the title from "Add TR_EnableSIGSEGVOnTrap option" to "Raise SIGTRAP in TR::trap() instead of SIGABRT" Feb 1, 2021
@fjeremic fjeremic force-pushed the 9522-interruption branch 3 times, most recently from 4df654b to 5d19aba Compare February 1, 2021 17:45
@fjeremic fjeremic marked this pull request as ready for review February 3, 2021 15:31
@fjeremic
Contributor Author

fjeremic commented Feb 3, 2021

I asked the community and no one seems to have concerns:
https://openj9.slack.com/archives/C8312LCV9/p1611955593018900

I'd like to proceed with the SIGTRAP change. This is ready for review.

@fjeremic
Contributor Author

fjeremic commented Feb 4, 2021

@genie-omr build all

@0xdaryl
Contributor

0xdaryl commented Feb 9, 2021

Tagging a few people who may wish to weigh in before merge: @ymanton @dsouzai @babsingh

@0xdaryl 0xdaryl self-assigned this Feb 9, 2021
@dsouzai
Member

dsouzai commented Feb 9, 2021

No issues on my end.

Contributor

@babsingh babsingh left a comment

LGTM.

@babsingh
Contributor

babsingh commented Feb 9, 2021

SIGTRAP isn't available on Windows: https://docs.microsoft.com/en-us/previous-versions/dwwzkt4c(v=vs.140). Will this code also run on Windows?

//FIXME: this doesn't work on z/OS
*(volatile int*)(0) = 0; // let crashlog do its thing
}

#ifdef _MSC_VER
Contributor Author

@babsingh yes it will run on Windows. There is an ifdef to handle Windows specifically.

Contributor

@babsingh babsingh Feb 10, 2021

LGTM: DebugBreak/SIGSEGV on Windows and raise(SIGTRAP) on Unix platforms.
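The platform split summarized above can be sketched roughly as follows. This is an assumption-laden illustration, not the code merged into compiler/infra/Assert.cpp: `hypotheticalTrap`, `countTrap`, and `trapsObserved` are invented names, the Windows branch shows only `DebugBreak()` and omits the store to address 0, and the counting helper exists purely to show that a handler sees each raised SIGTRAP on Unix-like platforms.

```cpp
#include <signal.h>    // raise, SIGTRAP, sigaction (POSIX)
#ifdef _MSC_VER
#include <windows.h>   // DebugBreak
#endif

// Hypothetical helper; the real TR::trap() may differ in detail.
void hypotheticalTrap()
{
#ifdef _MSC_VER
    DebugBreak();      // SIGTRAP is not available on Windows
#else
    raise(SIGTRAP);    // catchable per call site, unlike abort()
#endif
}

#ifndef _MSC_VER
static volatile sig_atomic_t g_trapCount = 0;

// Counts deliveries and returns, so execution resumes after raise().
extern "C" void countTrap(int) { g_trapCount = g_trapCount + 1; }

// Installs the counting handler, traps `calls` times, and reports how many were seen.
int trapsObserved(int calls)
{
    struct sigaction action = {}, previous = {};
    action.sa_handler = countTrap;
    sigemptyset(&action.sa_mask);
    sigaction(SIGTRAP, &action, &previous);

    g_trapCount = 0;
    for (int i = 0; i < calls; ++i)
        hypotheticalTrap();

    sigaction(SIGTRAP, &previous, nullptr);   // restore the earlier disposition
    return g_trapCount;
}
#endif
```

On a Unix-like system, `trapsObserved(3)` returns 3 because each `raise(SIGTRAP)` is delivered synchronously to the installed handler before `raise` returns.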

@0xdaryl 0xdaryl merged commit fce4cbc into eclipse:master Feb 10, 2021
@fjeremic fjeremic deleted the 9522-interruption branch February 26, 2021 19:17