AsyncGetCallTrace replacement #795

Closed
apangin opened this issue Aug 11, 2023 · 5 comments

apangin commented Aug 11, 2023

Background

AsyncGetCallTrace (AGCT) is a non-standard extension of the HotSpot JVM for obtaining Java stack traces outside safepoints. async-profiler has relied heavily on AGCT; it was even named after this function.

However, because it is not a public API, AsyncGetCallTrace has never been well supported in OpenJDK: it has not received enough testing and has been broken several times, even in minor JDK updates (e.g. JDK-8307549).

AsyncGetCallTrace is notorious for its inability to walk the Java stack in various corner cases. There is a long-standing bug, JDK-8178287, with several examples. Worst of all, AsyncGetCallTrace can crash the JVM, and there is no reliable way to work around this from outside the JVM.

Furthermore, AsyncGetCallTrace has limited capabilities. It does not provide information about the Java frame type (interpreted/compiled/inlined). It skips all non-Java frames, so it is impossible to see JNI or JVM internal frames in the middle of the call stack.

It was previously proposed in #66 to re-implement stack walking in async-profiler without the problematic API. However, as async-profiler gained workarounds for various shortcomings of AsyncGetCallTrace, the re-implementation was postponed. Years later, we still see issues with AGCT from time to time, including random crashes and missing stack traces.

Proposal

Get rid of the dependency on AsyncGetCallTrace entirely, at least for the HotSpot JVM. Leverage VMStructs to obtain the offsets of the HotSpot structures essential for stack walking. Implement stack walking externally, as the Serviceability Agent does.
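To illustrate the VMStructs part of the proposal, here is a minimal sketch of looking up a field offset in a VMStructs-style table. It is not the async-profiler implementation. The `VMStructEntry` mirror, the `TableLayout` helper and `findFieldOffset` are hypothetical names introduced for this example. In a live process the table address, the stride and the member offsets would be read from symbols exported by libjvm (`gHotSpotVMStructs`, `gHotSpotVMStructEntryArrayStride`, `gHotSpotVMStructEntryTypeNameOffset` and so on), for example via `dlsym`, rather than hard-coded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of one table row, used here only to build test data.
// A real profiler should not rely on this layout: the JVM publishes the
// stride and member offsets next to the table itself.
struct VMStructEntry {
    const char* typeName;
    const char* fieldName;
    const char* typeString;
    int32_t isStatic;
    uint64_t offset;
    void* address;
};

// Layout parameters (assumed to be read from the JVM's exported symbols).
struct TableLayout {
    size_t stride;           // size of one entry in bytes
    size_t typeNameOffset;   // where the type name pointer lives in an entry
    size_t fieldNameOffset;  // where the field name pointer lives
    size_t offsetOffset;     // where the field offset value lives
};

// Walks the table until the terminating entry (null type or field name).
// Returns the field offset, or -1 if the type/field pair is not present.
int64_t findFieldOffset(const void* table, const TableLayout& layout,
                        const char* type, const char* field) {
    assert(layout.stride > 0);
    const char* entry = static_cast<const char*>(table);
    for (;; entry += layout.stride) {
        const char* typeName;
        const char* fieldName;
        std::memcpy(&typeName, entry + layout.typeNameOffset, sizeof(typeName));
        std::memcpy(&fieldName, entry + layout.fieldNameOffset, sizeof(fieldName));
        if (typeName == nullptr || fieldName == nullptr) {
            return -1;  // reached the end-of-table marker
        }
        if (std::strcmp(typeName, type) == 0 && std::strcmp(fieldName, field) == 0) {
            uint64_t offset;
            std::memcpy(&offset, entry + layout.offsetOffset, sizeof(offset));
            return static_cast<int64_t>(offset);
        }
    }
}
```

Reading the layout parameters at run time instead of compiling them in is what lets one profiler binary work across JDK versions whose internal structures differ.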

Requirements

  1. Reliability. The external stack walker must be protected against JVM crashes.
  2. Quality. Collected stack traces should be at least as accurate as with AGCT; the number of "unknown" states should be minimal.
  3. Completeness. The new implementation should display all frames: Java, native and JVM stubs throughout the whole stack.
  4. Details. The stack walker should provide additional information on each frame, such as the JIT compilation type.
  5. Performance. While the performance of the external implementation may be inferior to the JVM built-in one, it should be fast enough for production profiling.

Target platforms

The implementation targets the HotSpot JVM on Linux and macOS, on both x64 and ARM64.
The new stack walker is not required for other platforms or other JVM implementations.

Example

This is how mixed stack traces may look on a flame graph:

[Image: fullstack]

apangin commented Aug 17, 2023

Added a new stack walker that does not depend on AGCT as an optional feature.
It can be enabled with the --cstack vm argument.

With this option, async-profiler collects mixed stack traces in which Java and native frames are interleaved. The total stack depth is controlled with the -j jstackdepth option.


apangin commented Aug 17, 2023

As a part of this work, DWARF unwinding has been implemented for ARM64 (it was previously available only on x86_64).


apangin commented Aug 17, 2023

It's worth mentioning that the new stack walker is fully enclosed in crash protection based on setjmp/longjmp. Since the stack walker does not modify any VM structures and is under the full control of async-profiler, it is safe to interrupt it anywhere in the middle of execution.
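A minimal sketch of the general setjmp/longjmp crash-protection technique, assuming a POSIX system and a single profiled thread. This is not the async-profiler code: `installFaultHandler`, `safeLoad` and `g_recovery` are hypothetical names for this example, and a real profiler would need per-thread recovery state and would have to chain to the JVM's own signal handlers.

```cpp
#include <csetjmp>
#include <csignal>
#include <cstdint>

// Recovery point of the read currently in progress, or null if none.
// Assumption: one thread only; real code would keep this per thread.
static sigjmp_buf* volatile g_recovery = nullptr;

static void onFault(int signo) {
    if (g_recovery != nullptr) {
        // The fault happened inside a protected read: unwind to safeLoad().
        siglongjmp(*g_recovery, 1);
    }
    // Not our fault: restore the default action and return, so the faulting
    // instruction re-executes and the process terminates as usual.
    signal(signo, SIG_DFL);
}

void installFaultHandler() {
    struct sigaction sa {};
    sa.sa_handler = onFault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGBUS, &sa, nullptr);
}

// Reads one word from a possibly invalid address.
// Returns the value read, or `fallback` if the read faults.
uintptr_t safeLoad(const uintptr_t* addr, uintptr_t fallback) {
    sigjmp_buf env;
    // savemask = 1 so that the signal mask is restored after siglongjmp
    // and later faults are still delivered.
    if (sigsetjmp(env, 1) != 0) {
        g_recovery = nullptr;
        return fallback;
    }
    g_recovery = &env;
    uintptr_t value = *static_cast<const volatile uintptr_t*>(addr);
    g_recovery = nullptr;
    return value;
}
```

Jumping out of the handler is acceptable here only because the protected region reads memory and changes no shared state, which is the same property the comment above relies on for the stack walker.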


apangin commented Aug 22, 2023

Closing the issue as solved. Further improvements will be done in the context of separate issues.

apangin closed this as completed Aug 22, 2023
@krzysztofslusarski

Thank you @apangin.
