Improve jitdump functionality #9120

fjeremic · 2020-04-03T20:06:06Z

Background

The jitdump is a dump agent [1] that collects JIT trace logs to help investigate OpenJ9 issues. This dump agent is enabled by default for general protection faults and aborts [2].

A jitdump typically helps in two scenarios:

1. A crash during a JIT compilation
2. A crash in a JIT-compiled method

For both scenarios we typically require a JIT trace log of the method in question for further investigation. This is sometimes an iterative process, especially for case 2, as we may not know which area of the JIT compiler generated the faulty logic in the JIT-compiled method's assembly. Each iteration may require us to learn more about the problem from the latest log and suggest additional tracing options, until we can pinpoint the problem.

For case 1, we often need additional tracing enabled for the area of the JIT in which the crash occurred, in addition to having the JIT IL trees at hand.

Due to the dynamic nature of the JVM runtime environment, and the fact that the JIT compiler is guided by profiling information, a JIT compilation of a method in one JVM invocation may behave differently from a JIT compilation of the same method in a subsequent invocation, even when the same environment and application are being run. This makes such issues difficult to service if the first-incident data collection did not capture enough information to diagnose the issue and provide a resolution.

When useful logging is not obtained on first incident, developers/service engineers typically must work with the stakeholder to reproduce the issue with additional tracing. This costs time and resources for both parties. A properly generated jitdump is very likely to reproduce the exact same compilation as the original, but with tracing enabled, because it runs in the same JVM process that produced the original faulty compilation. It is therefore highly desirable to generate a useful jitdump on first incident to speed up the investigation of JIT issues.

[1] https://www.eclipse.org/openj9/docs/xdump/#dump-agents
[2] https://www.eclipse.org/openj9/docs/xdump/#default-dump-agents

Problems

There are several limitations in how jitdump trace files are currently created:

- The jitdump file is empty
- The jitdump only contains a partial trace due to a recursive crash unrelated to the original problem
- The jitdump fails to trace the right area of the JIT for finer-grained information
- The jitdump does not trace the full backtrace of interesting methods
- The jitdump trace does not reproduce the original trace file
- The options used for the jitdump generation differ from the options used for the original compilation
- The jitdump compilation fails to complete due to JVM shutdown

Goal

The goal of this effort is to resolve the problems outlined in the previous section and to always generate a useful jitdump, so that developers/service engineers can make use of the trace information obtained during first-incident data collection. Success will be measured by the reduction in the time it takes developers/service engineers to obtain a JIT trace log containing enough information to make progress on fixing a defect. A secondary goal is to improve the documentation and code quality of the jitdump process in the JIT compiler.

Issues / PRs

fjeremic · 2020-04-28T16:35:51Z

Adding a new PR (eclipse/omr#5135) to the list, which addresses some newline issues seen when jitdumps are generated.

fjeremic · 2020-04-28T16:38:09Z

I've reopened #9227, as we'll need to avoid printing snippets after a crash since we cannot reliably print them before binary encoding. This is further explained in eclipse/omr#5111, which will be addressed at some point in the future. For now, we still want to avoid recursive crashes so that a proper jitdump is produced, so I'll address that issue in the next few days.

fjeremic · 2020-04-28T16:51:26Z

Adding a new issue (#9386) proposing to enable the paranoid opt check for jitdump recompilations.

fjeremic · 2020-04-28T16:58:03Z

Adding a new PR (#9387) to address inconsistencies between jitdump generation and javacore and other dump triggers. That is, the messages reported, and how they are reported, are now consistent with javacore, Snap dump, heapdump, etc., and there are no redundant prefixes in the messages.

In addition, we now use the same function naming convention as javacore and snap dumps, for consistency with other parts of the JVM.

fjeremic · 2020-05-01T15:25:59Z

Adding a new issue (#9428) to improve how tracing options are set programmatically for jitdump compiles.

fjeremic · 2020-05-06T19:20:04Z

Adding a new issue (#9479) to support passing sub-options to jitdump through the -Xdump framework, so as to enable custom tracing for arbitrary failures.

fjeremic · 2020-05-11T20:10:45Z

Adding a new issue (#9522) to avoid compilation interruptions, such as the JVM wanting to shut down, while generating jitdumps. This is often seen in JUnit-style tests where, for example, the JIT crashes or an exception thrown in a test reaches main. In such scenarios JUnit reports the error and may terminate the remaining tests at that point. The JVM then wants to shut down while jitdumps are still being generated, resulting in truncated jitdumps that are not useful for diagnosing the problem.

fjeremic · 2020-06-05T13:29:46Z

Just a quick update on where things stand. I currently have several PRs up that I'm waiting to get merged before forging on. Once this batch of PRs is merged, I think the most important issue to work on is #9136.

fjeremic · 2020-06-29T16:25:12Z

Another update on #9136. I've found the root cause of one of the deadlocks. I still need to investigate the other, much less common and more artificial, deadlock described in the latest comment in #9136. I'd like to fix both to close off that item, which is a major milestone in this work.

fjeremic · 2020-08-24T21:19:34Z

Back to trying to finish this off in the next month or so. Trying to knock off the easier items first, so I'm resuming #9428.

fjeremic · 2020-09-10T14:21:50Z

Another update from me. I still have this on my radar but have been distracted by a machine migration that must be completed by the end of September. I hope to get back to this in the next few weeks and will post an update once I'm doing something meaningful in this area again.

fjeremic · 2020-09-23T16:07:02Z

The changes delivered here are already starting to show their benefit. For example, a 0/420 defect produced a useful jitdump on first failure data capture over in #10630, which will aid in debugging the assert there.

liqunl · 2020-10-06T20:16:51Z

I found one problem in a crash. The original crash is in an AOT compilation, but the replay is a JIT compilation, which finishes without error.

liqunl · 2020-10-07T18:58:44Z

It would be good if the trace log and jitdump told us whether it is an AOT compile. Another problem is that if the crash is in ilgen, no trees are printed out before the replay. I guess that is not a problem if the replay happens in the right context, but it would still be good to print some information.

dsouzai · 2020-10-08T21:52:18Z

> I found one problem in a crash. The original crash is in an AOT compilation, but the replay is a JIT compilation, which finishes without error.

Opened #10852

fjeremic · 2021-01-25T21:37:27Z

I've gotten back to this work over the last few days, as I'm trying to polish it off given how close we are to completing everything. I started by revisiting #9522; that problem is mostly fixed, but during stress testing in that area I discovered several issues, which I've documented in #11765, #11770, and #11772. I now have a firm understanding of the various problems and a solution for each of them, which I will try to deliver in the next few days. We are much closer to having robust JitDump generation.

fjeremic · 2021-01-29T18:41:28Z

I've dug myself out of the hole and have emerged with a ton of goodies. I've opened #11825, which addresses what I believe to be all the issues around generating JitDumps from crashed compilations. It will also help with application thread crashes. That is the area I am going to stress test next, to ensure that every JIT-compiled body on the stack of an application crash gets a JitDump recompilation. This will be the final step in this saga, after which I expect every single JIT defect to have a useful JitDump accompanying it.

fjeremic · 2021-03-04T20:39:57Z

All the compilation thread crash issues I could find have been resolved. Next I'm going to look at application thread crashes to see if there is anything to fix on that front. If not, I'll do another refactoring pass to clean everything up and add documentation and proper tracing, then close off the Epic.

fjeremic · 2021-03-11T20:46:40Z

We are almost done here. I've opened #12203 as a final refactoring PR. Once that PR is merged my contribution to this Epic is complete. Thanks to all who followed along!

liqunl · 2021-03-12T15:17:10Z

I wonder if you can add a change to turn on TR_DebugInliner when the crash is in the inliner? Or could you point me to where to set this option? A lot of inlining traces are guarded by TR_DebugInliner and they're not printed in a jitdump.

fjeremic · 2021-03-12T15:38:38Z

> I wonder if you can add a change to turn on TR_DebugInliner when the crash is in the inliner? Or could you point me to where to set this option? A lot of inlining traces are guarded by TR_DebugInliner and they're not printed in a jitdump.

Implemented in #12208.

JamesKingdon · 2021-04-08T13:07:31Z

Hoping that late is still better than never - Thank you for all this work!!

fjeremic added the comp:jit label Apr 3, 2020

fjeremic mentioned this issue Apr 27, 2020

JTReg Test Fail: sun/nio/cs/FindDecoderBugs.java #8992 (Open)

fjeremic self-assigned this Feb 16, 2021

fjeremic mentioned this issue Mar 11, 2021

Refactor and document the JitDump process #12203 (Merged)

dsouzai closed this as completed in #12203 Mar 15, 2021
