Add debug info support for performance dump #249

Merged
merged 1 commit into from
May 23, 2024

zherczeg
Collaborator

This patch adds debug info support for jitdump, which helps analyze performance bottlenecks. The source file used for debugging is auto-generated, since the JIT compiler has access only to the Walrus byte code. Depends on #244.

@zherczeg
Collaborator Author

Note: the code uses the WALRUS_PERF_DIR environment variable instead of a command line argument. This makes it easier to use from other projects such as lwnode.
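As a rough illustration of the environment variable approach, the lookup could be sketched as follows. The helper name `perfSourceFileName` and its signature are hypothetical and not taken from the Walrus sources; only the `WALRUS_PERF_DIR` variable and the file naming scheme come from this PR.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper, not the actual Walrus code: returns the path of the
// generated "source" file, or an empty string when WALRUS_PERF_DIR is unset.
std::string perfSourceFileName(long pid)
{
    assert(pid > 0);
    const char* path = std::getenv("WALRUS_PERF_DIR");
    if (path == nullptr || path[0] == '\0') {
        return std::string(); // perf dump stays disabled
    }
    // Same naming scheme as the line under review in this PR.
    return std::string(path) + "/jit-" + std::to_string(pid) + "-codedump.txt";
}
```

Because the setting is read from the environment, an embedder would not need to forward any extra command line option to enable the dump.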

return;
}

m_sourceFileName = std::string(path) + "/jit-" + std::to_string(m_pid) + "-codedump.txt";
Collaborator

@clover2123 clover2123 May 20, 2024

What is the purpose of m_sourceFileName?
Is this also related to the perf tool?
It also seems that this file is opened but never closed after execution.

Collaborator Author

True, I will fix that. The purpose is to generate a text dump of the Walrus byte code, and the debug line info refers to this "source" file. The Walrus byte code cannot be connected to the original source in any way, but this way we still get information about where most of the execution time is spent.

Example dump:

--- FUNCTION:5 [stackAlloc] ---
Prolog
[0] MoveI32: R0(r0)
[1] Const32: R0(imm)
[2] GlobalGet32: R0(r1)
[3] I32Sub: P0(r1), P1(r0), R0(r1)
[4] I32And: P0(r1), P1(imm), R0(r1)
[5] MoveI32: P0(r1), R0(r0)
[6] GlobalSet32: P0(r1), T0(r3)
[7] End: P0(r0)
Epilog
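A minimal sketch of how such a dump file could be produced while recording the line number of each byte code is shown below. `SourceDumpWriter`, `LineEntry`, and `dumpFunction` are hypothetical names chosen for illustration and are not the Walrus implementation. The file is held in a `std::ofstream` member, so it is closed automatically when the writer is destroyed, which is one possible way to address the unclosed file mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical writer: emits one text line at a time and counts the lines.
class SourceDumpWriter {
public:
    explicit SourceDumpWriter(const std::string& fileName)
        : m_file(fileName)
    {
    }

    // Writes one line and returns its 1-based line number in the file.
    unsigned writeLine(const std::string& text)
    {
        assert(m_file.is_open());
        m_file << text << '\n';
        return ++m_line;
    }

private:
    std::ofstream m_file; // closed automatically by the destructor
    unsigned m_line = 0;
};

// Maps a byte code index to the line that describes it in the dump file.
struct LineEntry {
    size_t byteCodeIndex;
    unsigned line;
};

// Dumps one function in the layout of the example above and returns the
// line number recorded for each byte code.
std::vector<LineEntry> dumpFunction(SourceDumpWriter& out, const std::string& header,
    const std::vector<std::string>& byteCodes)
{
    std::vector<LineEntry> entries;
    out.writeLine(header);
    out.writeLine("Prolog");
    for (size_t i = 0; i < byteCodes.size(); i++) {
        unsigned line = out.writeLine("[" + std::to_string(i) + "] " + byteCodes[i]);
        entries.push_back({ i, line });
    }
    out.writeLine("Epilog");
    return entries;
}
```

The returned line numbers are what a debug line table would later pair with machine code addresses.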

Example mixed code:

Percent│      [550] MoveF32: P0(r0), R0(o:152)
  0.16 │        movss    %xmm0,0x98(%r15)
       │      [551] F32Store: P0(r9), P1(r0), T0(r1), T1(r2)
       │        mov      %r13d,%esi
       │        mov      0x38(%rbx),%rax
       │        add      $0x84,%rsi
       │        cmp      $0xfffffc,%rsi
       │      → ja       3067
       │        movss    %xmm0,(%rax,%rsi,1)
       │      [552] F32Load: P0(r9), R0(o:144), T0(r1), T1(r2)
       │        mov      %r13d,%esi
       │        mov      0x38(%rbx),%rax
       │        add      $0x6c,%rsi
       │        cmp      $0xfffffc,%rsi
       │      → ja       3067
  0.32 │        movss    (%rax,%rsi,1),%xmm4
       │        movss    %xmm4,0x90(%r15)
       │      [553] MoveF32: P0(o:168), R0(o:288)
       │        movss    0xa8(%r15),%xmm4
       │        movss    %xmm4,0x120(%r15)
       │      [554] F32Sub: P0(o:232), P1(o:212), R0(o:296)
  0.48 │        movss    0xe8(%r15),%xmm4
       │        subss    0xd4(%r15),%xmm4
       │        movss    %xmm4,0x128(%r15)

Collaborator

So, by creating and using this 'source' file, we can analyze performance bottlenecks of the JIT code with the perf tool, am I right?

Collaborator Author

Yes. Without any source information, it is hard to figure out which machine code is generated from which byte code. This way at least we see the WebAssembly (Walrus to be more precise) byte code.
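For reference, the mapping from machine code to lines of the generated file is carried by a debug info record in the jitdump format. The sketch below serializes one such record as I understand the layout from the perf jitdump specification (record header, then `code_addr`, `nr_entry`, and one entry per address with `code_addr`, `line`, `discrim`, and a null-terminated file name). The function and type names are hypothetical, the fields are written in host byte order, and this is not the code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One row of the line table: machine code address -> line in the dump file.
struct DebugEntry {
    uint64_t codeAddr;
    uint32_t line;
};

// Appends the raw bytes of a value to the output buffer.
static void append(std::vector<uint8_t>& out, const void* data, size_t size)
{
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    out.insert(out.end(), bytes, bytes + size);
}

// Builds one debug info record for a function (assumed layout, see above).
std::vector<uint8_t> buildDebugInfoRecord(uint64_t timestamp, uint64_t funcAddr,
    const std::string& fileName, const std::vector<DebugEntry>& entries)
{
    const uint32_t recordId = 2; // JIT_CODE_DEBUG_INFO
    const uint32_t discriminator = 0;
    // Per entry: code_addr (8) + line (4) + discrim (4) + name with '\0'.
    const size_t entrySize = 8 + 4 + 4 + fileName.size() + 1;
    // Record header (16 bytes) + code_addr (8) + nr_entry (8) + entries.
    const size_t totalSize = 16 + 16 + entries.size() * entrySize;
    assert(totalSize <= UINT32_MAX);
    const uint32_t totalSize32 = static_cast<uint32_t>(totalSize);
    const uint64_t entryCount = entries.size();

    std::vector<uint8_t> out;
    out.reserve(totalSize);
    append(out, &recordId, sizeof(recordId));
    append(out, &totalSize32, sizeof(totalSize32));
    append(out, &timestamp, sizeof(timestamp));
    append(out, &funcAddr, sizeof(funcAddr));
    append(out, &entryCount, sizeof(entryCount));
    for (const DebugEntry& entry : entries) {
        append(out, &entry.codeAddr, sizeof(entry.codeAddr));
        append(out, &entry.line, sizeof(entry.line));
        append(out, &discriminator, sizeof(discriminator));
        append(out, fileName.c_str(), fileName.size() + 1); // includes '\0'
    }
    assert(out.size() == totalSize);
    return out;
}
```

With records like this, perf can interleave the lines of the generated file with the disassembly, as in the mixed code example earlier in the thread.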

@@ -314,16 +314,18 @@ void JITCompiler::insertStackInitList(InstructionListItem* prev, size_t variable
}
}

const char* JITCompiler::m_byteCodeNames[] = {
Collaborator

The m_byteCodeNames array is used only for dumping.
For such debugging tools, what about allocating this array dynamically, only when a dump operation is initialized?

Collaborator Author

I am not sure how I can do that. To build the array dynamically, I need some static information from which it can be generated. Shall we use an external file that is processed at runtime? Or some kind of compressed data structure that is decompressed at runtime?

Collaborator

@clover2123 clover2123 May 21, 2024

I see.
Instead of dynamic allocation, what about enabling this dump code only in debug mode?
(using the #ifndef NDEBUG macro check)

Collaborator Author

Yes. That means --jit-verbose will have no effect in release mode. I think that is acceptable.

Collaborator

Can we enable all dump operations only for debug builds?

Collaborator Author

Yes, good idea. Now the byte code dump can be disabled even if perf dump is enabled.
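The `#ifndef NDEBUG` idea discussed in this thread could look roughly like the sketch below. The `ByteCode` enum, the shortened name table, and the `byteCodeName` helper are hypothetical stand-ins; the real `m_byteCodeNames` array in Walrus is much longer and may be guarded differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, shortened opcode list used only for this sketch.
enum class ByteCode : size_t { MoveI32, Const32, GlobalGet32, End, Count };

#ifndef NDEBUG
// The name table is compiled in only for debug builds, so release
// binaries do not carry the strings.
static const char* const byteCodeNames[] = { "MoveI32", "Const32", "GlobalGet32", "End" };
static_assert(sizeof(byteCodeNames) / sizeof(byteCodeNames[0]) == static_cast<size_t>(ByteCode::Count),
    "name table must match the opcode list");
#endif

// Returns the printable name in debug builds and nullptr in release builds.
const char* byteCodeName(ByteCode code)
{
#ifndef NDEBUG
    assert(code != ByteCode::Count);
    return byteCodeNames[static_cast<size_t>(code)];
#else
    (void)code;
    return nullptr;
#endif
}
```

A caller that prints the dump would need to sit behind the same guard, or handle the `nullptr` result in release builds.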

@zherczeg zherczeg force-pushed the perf_debug branch 3 times, most recently from ee20b1c to b8e2c2f Compare May 23, 2024 07:39
Signed-off-by: Zoltan Herczeg zherczeg.u-szeged@partner.samsung.com
Collaborator

@clover2123 clover2123 left a comment

LGTM

@clover2123 clover2123 merged commit 29a1cd0 into Samsung:main May 23, 2024
12 checks passed
@zherczeg zherczeg deleted the perf_debug branch May 23, 2024 08:35