Some unwinding problems with the new vm cstack mode on AArch 64 #819

Closed
yanglong1010 opened this issue Sep 19, 2023 · 3 comments


@yanglong1010
Contributor

Hi Andrei

I recently tried out the new stack unwinding implementation (--cstack vm) and found what look like two problems on AArch64.

reproducer

Linux iZbp1gsbpsgn4675kr8vfeZ 4.18.0-348.20.1.el7.aarch64 #1 SMP Wed Apr 13 20:57:50 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

openjdk version "1.8.0_352"
OpenJDK Runtime Environment (build 1.8.0_352-b08)
OpenJDK 64-Bit Server VM (build 25.352-b08, mixed mode)

public class Test {
    public static void main(String[] args) {
        long time = System.currentTimeMillis();
        while (true) {
            long n = wrapper();
            if (n - time > 1000) {
                System.out.println("tick");
                time = n;
            }
        }
    }

    private static long wrapper() {
        return test();
    }

    private static long test() {
        return System.currentTimeMillis();
    }
}

java -Xcomp Test

asprof -e cpu -f /root/profile.html -d 5 --cstack vm jps

The first problem is that call_stub and the native methods below main show up as unknown.

image

After I discussed this with D-D-H, he suggested the cause might be that a VM leaf call reserves 2 words on the stack.

# {method} {0x0000ffffaf1f0348} 'main' '([Ljava/lang/String;)V' in 'Test'
...
  0x0000ffffb051727c: mov	x8, #0xdec0                	// #57024
                                                ;   {runtime_call}
  0x0000ffffb0517280: movk	x8, #0xbdb6, lsl #16
  0x0000ffffb0517284: movk	x8, #0xffff, lsl #32
  0x0000ffffb0517288: stp	xzr, x9, [sp, #-16]! // reserve 2 words   <----
  0x0000ffffb051728c: blr	x8
  0x0000ffffb0517290: add	sp, sp, #0x10   ;*invokestatic currentTimeMillis
                                                ; - Test::test@0 (line 18)
                                                ; - Test::wrapper@0 (line 14)
                                                ; - Test::main@4 (line 5)

I slightly modified stackWalker.cpp.
When calculating the caller sp of an nmethod frame, it now checks whether there is an instruction like stp xzr, x9, [sp, #-16]! near the pc,
and if so, adjusts the sp accordingly. I verified that this indeed solves the problem.
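
As an illustration of that check, here is a minimal sketch and not the actual stackWalker.cpp change. It assumes the frame pc is the return address of the leaf call, so that the stp sits two instructions before it, as in the disassembly above. The helper name leafCallSpAdjustment and its parameters are hypothetical; the three constants are the A64 encodings of the instructions shown in the disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A64 encodings of the instructions from the disassembly above.
constexpr uint32_t STP_XZR_X9_PRE = 0xa9bf27ff;  // stp xzr, x9, [sp, #-16]!
constexpr uint32_t BLR_X8         = 0xd63f0100;  // blr x8
constexpr uint32_t ADD_SP_16      = 0x910043ff;  // add sp, sp, #0x10

// Hypothetical helper: returns how many extra bytes must be added to sp
// before applying the usual nmethod frame size.
//   insns   - instruction words of the compiled method
//   count   - number of words in insns
//   pcIndex - index of the instruction at the frame's pc (the return address)
inline int leafCallSpAdjustment(const uint32_t* insns, size_t count, size_t pcIndex) {
    assert(insns != nullptr);
    if (pcIndex < 2 || pcIndex >= count) {
        return 0;  // not enough context to look two instructions back
    }
    // Expected pattern: stp (reserve) ; blr (call) ; add sp (release) <- pc
    bool reserved = insns[pcIndex - 2] == STP_XZR_X9_PRE;
    bool released = insns[pcIndex] == ADD_SP_16;
    return (reserved && released) ? 16 : 0;
}
```

This only covers the case where the sample lands inside the callee. A pc that sits on the blr itself would need a separate check.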

image

However, the picture above still shows a small share of incorrectly unwound stacks (leftmost; details in the picture below).
This leads to the second problem.
image

I found that the cfa_off of gettimeofday@plt is 0, so the unwinder takes the AArch64 default_frame branch, which results in a wrong sp calculation.
While gettimeofday@plt executes, sp and fp are not modified, so they are still the same as the sp and fp of os::javaTimeMillis.
I modified the code slightly to calculate sp the same way as for os::javaTimeMillis, and found that this solves the problem.
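
A rough sketch of that idea, under the assumption that an AArch64 PLT stub leaves sp, fp and lr untouched, so the caller's pc can be taken from lr while sp and fp carry over unchanged. The Regs struct and unwindPltStub are hypothetical names for illustration, not async-profiler code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register snapshot taken at the sampled pc.
struct Regs {
    uintptr_t pc;
    uintptr_t sp;
    uintptr_t fp;
    uintptr_t lr;
};

// Step out of a PLT stub such as gettimeofday@plt.
// Assumption: the stub has not pushed anything, so the caller
// (os::javaTimeMillis in this issue) still owns sp and fp, and the
// return address is still in the link register.
inline Regs unwindPltStub(const Regs& stub) {
    assert(stub.lr != 0);
    Regs caller = stub;   // sp and fp are carried over unchanged
    caller.pc = stub.lr;  // resume unwinding at the call site's return address
    return caller;
}
```

The caller frame is then unwound with its own frame description, using the unchanged sp and fp.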

image

After solving the above two problems, the correct result is as follows:
image

Could you confirm this?

If it is indeed a problem, would you like me to fix it? It might take me some time, since there is a lot of background to understand.

@apangin
Collaborator

apangin commented Sep 20, 2023

Thank you for the report and the analysis.
I can confirm the issue. It's not related to cstack=vm - the same problem affects other stack walking modes too, including the stack walker in the JVM itself: both AsyncGetCallTrace and JFR fail to reconstruct stack traces ending with os::javaTimeMillis() calls.

MacroAssembler::call_VM_leaf_base saves two registers on the stack before the call,
thus breaking the invariant that sender_sp == current_sp + frame_size.
Furthermore, leaf VM calls do not save last_Java_sp and last_Java_fp in the thread, making it hard to recover the "right" stack pointer of a Java frame that calls into the VM runtime. So the best async-profiler can do is to apply some heuristics like peeking into the instruction stream as you suggested.
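
To make the broken invariant concrete, here is a small worked model. It is a sketch under simplifying assumptions: the stack is an array of 64-bit words indexed upward, the return address lives at sender_sp[-1] as in HotSpot's AArch64 compiled frames, and the frame size of 4 words used in the usage note is made up. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// sender_sp == current_sp + frame_size holds only when reservedWords == 0.
// After the leaf-call stp, two extra words sit below the Java frame.
inline size_t senderSp(size_t currentSp, size_t frameSizeWords, size_t reservedWords) {
    return currentSp + reservedWords + frameSizeWords;
}

// The caller's return address is stored one word below sender_sp.
inline uint64_t senderPc(const uint64_t* stack, size_t senderSpIndex) {
    assert(stack != nullptr && senderSpIndex >= 1);
    return stack[senderSpIndex - 1];
}
```

For example, with a 4-word frame whose return address is stored at word 7 and an observed sp of 2 (two words below the frame's real sp of 4), senderSp(2, 4, 0) yields 6 and reads the wrong slot, while senderSp(2, 4, 2) yields 8 and reads the saved return address.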

I'll take a closer look when I have time.

@apangin
Collaborator

apangin commented Jan 15, 2024

I've pushed the fix for both issues. Thanks again for the thorough analysis.

@apangin
Collaborator

apangin commented Jan 16, 2024

For reference, I submitted the JVM bug JDK-8323755.
