amdllpc does not preserve debug information #513

Closed
inequation opened this issue Mar 18, 2020 · 11 comments

@inequation

Hello there,

I'm not even sure if I'm posting this issue on the correct project in the suite, so apologies if that isn't the case.

I'm trying to compile a simple Vulkan fragment shader with amdllpc. According to the AMDGPU target user guide, the ELFs produced by it can contain a .debug section with DWARF data. However, I can't seem to be able to get it:

$ llvm-objdump --all-headers ellipse.elf

ellipse.elf:    file format ELF64-amdgpu

architecture: amdgcn
start address: 0x0000000000000000

Program Header:

Dynamic Section:
Sections:
Idx Name            Size     VMA              Type
  0                 00000000 0000000000000000
  1 .strtab         00000052 0000000000000000
  2 .text           00000250 0000000000000000 TEXT
  3 .note           000002a4 0000000000000000
  4 .AMDGPU.disasm  000017c8 0000000000000000
  5 .note.GNU-stack 00000000 0000000000000000
  6 .symtab         00000048 0000000000000000

SYMBOL TABLE:
0000000000000098         .text  00000000 BB0_1
0000000000000000 g     F .text  00000250 _amdgpu_ps_main

I've figured out that I need to set -trim-debug-info=false so that the SPIR-V debug info is not stripped, and I had a look at the SPIR-V lowering code, which seems to preserve debug info. I can also see in the LLVM bitcode emitted to stdout that some symbols are there. How do I get the DWARF info out?

Here's the command line I'm using:

$ amdllpc -gfxip=9.0.6 ../ellipse.frag -trim-debug-info=false -enable-outs

And attached is the stdout output, which contains the GLSL source and all the intermediate stages: outs.log
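
To make the "is there any debug section at all?" check repeatable, here is a minimal Python sketch that parses the "Sections:" table from llvm-objdump output like the listing above. The function names and the SAMPLE text are hypothetical, and the regular expression assumes the column layout shown in that listing (index, name, 8-digit hex size, 16-digit hex VMA); other llvm-objdump versions may print it differently.

```python
import re

# Matches one row of the "Sections:" table printed by llvm-objdump,
# e.g. "  2 .text           00000250 0000000000000000 TEXT".
# Assumption: 8 hex digits for the size and 16 for the VMA, as in the listing.
SECTION_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([0-9a-fA-F]{8})\s+[0-9a-fA-F]{16}"
)


def section_sizes(objdump_text):
    """Return {section name: size in bytes} for every named section row."""
    sizes = {}
    for line in objdump_text.splitlines():
        match = SECTION_ROW.match(line)
        if match:
            sizes[match.group(2)] = int(match.group(3), 16)
    return sizes


def debug_sections(objdump_text):
    """Return the sorted names of sections whose name starts with '.debug'."""
    return sorted(
        name for name in section_sizes(objdump_text) if name.startswith(".debug")
    )


# Hypothetical sample: a few rows copied from the listing above.
SAMPLE = """\
Idx Name            Size     VMA              Type
  0                 00000000 0000000000000000
  1 .strtab         00000052 0000000000000000
  2 .text           00000250 0000000000000000 TEXT
  6 .symtab         00000048 0000000000000000
"""
```

On the ELF from this report, debug_sections() would come back empty, which is exactly the problem described.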

@kuhar
Contributor

kuhar commented Mar 18, 2020

Hi @inequation,
From what I know, shader debugging is an area under active development across all fronts (dxc, SPIRV-Tools, llpc, driver), but is not there yet. @jaebaek is an expert on the DXC and SPIRV-Tools side of things.

What do you want to do with the debug info?

Although this is not directly related to your question, note that llvm-objdump doesn't currently produce a complete disassembly that would allow you to modify the dump and get a valid ELF after re-assembling with llvm-mc. IIRC the .note section is not being fully dumped.

@inequation
Author

Hi @kuhar,

I'm working on a shader programming educational tool. GCN/RDNA ISA can be intimidating initially for inexperienced programmers, especially long, unrolled loops, and I'm in need of a way to provide some context for the generated disassembly ("where does this massive blob of instructions come from?"). I would have linked a publication about the software I'm writing, but it's a team effort and it hasn't been published yet. :)

I'm sure you can see how having that debug info available would be useful in other contexts.

I'm fine with parsing the DWARF myself, I just don't know the LLVM/LLPC codebases nearly well enough to know where to start looking to ensure it is actually dumped.

@nhaehnle
Member

I agree with @kuhar: it's a known limitation because so far, nobody has done the work of just tracking the debug info through the compile pipeline to make sure it doesn't disappear. Work on this would of course be welcome, otherwise we'll surely get to it eventually, but with no date attached ;)

@inequation
Author

inequation commented Mar 25, 2020

I'm fine with working on this myself and contributing the work here, but some guidance would be necessary. For instance, I wasn't able to find my way through all the abstraction layers of LLVM to where bitcode gets lowered to GCN ISA, or where the .debug ELF section gets populated. All I need is a shortcut to these places, and I can begin further investigation on my own. Can you guys help with that?

@Flakebi
Member

Flakebi commented Mar 25, 2020

The part in LLPC which adds the necessary LLVM passes to convert IR to an ELF should be this line:

if (GetTargetMachine()->addPassesToEmitFile(passMgr, outStream, nullptr, codegen::getFileType()))

The ELF writing code in LLVM that is specific to the AMDGPU backend should be in the llvm/lib/Target/AMDGPU/MCTargetDesc directory of LLVM: https://github.com/llvm/llvm-project/tree/master/llvm/lib/Target/AMDGPU/MCTargetDesc
I think AMDGPUTargetStreamer.cpp and AMDGPUAsmPrinter.cpp are the main classes responsible for creating ELF files (please correct me if I’m mistaken).

Btw, if you start amdllpc with -debug you get loads of output of all the other stages that happen after IR (SelectionDAG and MachineIR). If symbols get lost somewhere this might be helpful.

@nhaehnle
Member

> Btw, if you start amdllpc with -debug you get loads of output of all the other stages that happen after IR (SelectionDAG and MachineIR). If symbols get lost somewhere this might be helpful.

-print-before-all / -print-after-all also applies to amdllpc and is extremely helpful for understanding the flow of compilation.
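
As a rough aid for reading such a dump, here is a minimal Python sketch that counts debug locations after each pass and reports the first pass whose dump has none. It assumes each dump is introduced by a banner of the form "*** IR Dump After <pass> ***", that LLVM IR attaches locations as "!dbg" and Machine IR prints them as "debug-location"; the function names and the pass names in SAMPLE are hypothetical. Dumps are often printed per function, so a function that never had debug info can trigger a false positive.

```python
import re

# Assumed banner format ahead of each dump: "*** IR Dump After <pass> ***".
BANNER = re.compile(r"\*\*\* IR Dump After (.+?) \*\*\*")
# "!dbg" marks a location in LLVM IR; "debug-location" marks one in Machine IR.
DEBUG_LOC = re.compile(r"!dbg\b|\bdebug-location\b")


def debug_locs_per_pass(dump_text):
    """Return [(pass name, number of debug locations)] in dump order."""
    results = []
    current = None
    count = 0
    for line in dump_text.splitlines():
        banner = BANNER.search(line)
        if banner:
            # A new dump starts: store the totals of the previous one.
            if current is not None:
                results.append((current, count))
            current, count = banner.group(1), 0
        elif current is not None:
            count += len(DEBUG_LOC.findall(line))
    if current is not None:
        results.append((current, count))
    return results


def first_pass_without_debug_locs(dump_text):
    """Return the first pass whose dump has no debug locations after an
    earlier dump had some, or None if that never happens."""
    seen_locations = False
    for name, count in debug_locs_per_pass(dump_text):
        if count > 0:
            seen_locations = True
        elif seen_locations:
            return name
    return None


# Hypothetical dump with made-up pass names, for illustration only.
SAMPLE = """\
*** IR Dump After PassA ***
  %x = add i32 %a, %b, !dbg !7
  ret void, !dbg !8
*** IR Dump After PassB ***
  %x = add i32 %a, %b
  ret void
"""
```

The pass it reports is only a starting point for investigation, not proof that this pass is at fault.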

nhaehnle pushed a commit to nhaehnle/llpc that referenced this issue Apr 7, 2020
…ure use'

The previous fix broke compatibility in some AMD internal builds. This
commit, in conjunction with the corresponding XGL commit, fixes that.

Change-Id: Iec0ee5e489b2a15b8eb30add8ddadddeb0f20fad
Pull-Request: GPUOpen-Drivers#513
Author: Tim Renouf <tim.renouf@amd.com>
git-pf-change: stg@2087392
@amdrexu amdrexu removed their assignment Jun 2, 2020
@brianwatling brianwatling self-assigned this Jun 16, 2020
@inequation
Author

Hello there! Long time, no see.

I finally got to explore this a bit, and with two hacks inside the LLVM codebase, I was able to generate an ELF with the following debug line info:

inequation@Spearhead:/mnt/d/projects/GPUOpen-Drivers/vulkandriver/drivers/xgl/builds/Debug64$ readelf --debug-dump=decodedline ellipse.elf
readelf: Error: Missing knowledge of 32-bit reloc types used in DWARF sections of machine number 224
readelf: Warning: unable to apply unsupported reloc type 3 to section .debug_line
Decoded dump of debug contents of section .debug_line:

CU: <stdin>:
File name                            Line number    Starting address
<stdin>                                       20                0x18

<stdin>                                       25                0x30
<stdin>                                       20                0x38
<stdin>                                       25                0x4c
<stdin>                                       21                0x58
<stdin>                                       20                0x60
<stdin>                                        0                0x68
<stdin>                                       12                0x98
<stdin>                                       15                0x9c
<stdin>                                       13                0xa0
<stdin>                                       15                0xa4
<stdin>                                       13                0xa8
<stdin>                                       15                0xac
<stdin>                                       36                0xc0
<stdin>                                       13                0xc8
<stdin>                                       15                0xcc
<stdin>                                       36                0xd0
<stdin>                                       15                0xd8
<stdin>                                       39                0xe0
<stdin>                                       13                0xe4
<stdin>                                       15                0xe8
<stdin>                                       36                0xf4
<stdin>                                       13                0xfc
<stdin>                                       39               0x100
<stdin>                                       15               0x108
<stdin>                                       36               0x114
<stdin>                                       39               0x11c
<stdin>                                       40               0x120
<stdin>                                       15               0x128
<stdin>                                       36               0x130
<stdin>                                       39               0x138
<stdin>                                       42               0x13c
<stdin>                                       13               0x140
<stdin>                                       40               0x144
<stdin>                                       15               0x14c
<stdin>                                       36               0x150
<stdin>                                       39               0x158
<stdin>                                       42               0x15c
<stdin>                                       40               0x168
<stdin>                                       13               0x170
<stdin>                                       15               0x174
<stdin>                                       39               0x17c
<stdin>                                       42               0x180
<stdin>                                       40               0x18c
<stdin>                                       15               0x194
<stdin>                                       36               0x198
<stdin>                                       42               0x1a0
<stdin>                                       40               0x1ac
<stdin>                                       15               0x1b4
<stdin>                                       39               0x1bc
<stdin>                                       13               0x1c0
<stdin>                                       42               0x1c4
<stdin>                                       15               0x1d0
<stdin>                                       36               0x1d4
<stdin>                                       40               0x1dc
<stdin>                                       42               0x1e4
<stdin>                                       15               0x1e8
<stdin>                                       39               0x1f0
<stdin>                                       42               0x1f4
<stdin>                                       36               0x1f8
<stdin>                                       40               0x200
<stdin>                                       42               0x208
<stdin>                                       39               0x210
<stdin>                                       40               0x214
<stdin>                                       42               0x218
<stdin>                                       34               0x21c
<stdin>                                       42               0x220
<stdin>                                       53               0x224
<stdin>                                       42               0x228
<stdin>                                       53               0x22c
<stdin>                                       45               0x230

And the sequence of line numbers makes sense, comparing it to the source GLSL (available in the attachment to the OP), so this is progress!
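
For the use case of annotating disassembly with its originating source lines, a minimal Python sketch of consuming this table could look like the following. It assumes the three-column layout printed above (file name, line number, starting address); other readelf versions may add columns, and the function names and SAMPLE text are hypothetical.

```python
import bisect
import re

# One table row, e.g. "<stdin>    20    0x18".
# Assumption: exactly three columns, as in the readelf output above.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(0x[0-9a-fA-F]+)$")


def parse_decoded_lines(readelf_text):
    """Return [(address, source line)] sorted by address."""
    rows = []
    for line in readelf_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((int(match.group(3), 16), int(match.group(2))))
    rows.sort()
    return rows


def source_line_for(rows, address):
    """Return the source line of the last row at or below `address`,
    or None if the address precedes every row."""
    addresses = [addr for addr, _ in rows]
    index = bisect.bisect_right(addresses, address) - 1
    return rows[index][1] if index >= 0 else None


# Hypothetical sample: the first rows of the table above.
SAMPLE = """\
CU: <stdin>:
File name                            Line number    Starting address
<stdin>                                       20                0x18

<stdin>                                       25                0x30
<stdin>                                       20                0x38
"""
```

A returned line number of 0 means, by DWARF convention, that the instruction cannot be attributed to any source line, so a tool would probably want to display it differently.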

There are, however, clear problems (an empty source file name replaced with <stdin>, for starters). They are explained to some extent by the hacks that were necessary to get this to work. I'm attaching them as a patch; in both cases, I'm disabling checks that would prevent emission of debug symbols. The problems, as far as my limited understanding of LLVM goes, amount to:

  • machine instructions not belonging to a subprogram, which made DwarfDebug::beginInstruction() bail out before recording any source lines,
  • compilation units containing none of the following: types, retained types, global variables, and macros, preventing the registration of the DWARF compile unit and hitting an assertion, once the above hurdle was removed.

I'm quite sure these are just symptoms of problems that happen somewhere earlier up the pipeline. I'll keep digging, but I'd appreciate any guidance I can get.

@inequation
Author

I synced to latest, and of course the patch no longer applied cleanly. Here it is attached, updated to match latest GPUOpen-Drivers/llvm-project, along with the new, much shorter output:

inequation@Spearhead:/mnt/d/projects/GPUOpen-Drivers/vulkandriver/drivers/xgl/builds/Debug64$ readelf --debug-dump=decodedline ellipse.elf
readelf: Error: Missing knowledge of 32-bit reloc types used in DWARF sections of machine number 224
readelf: Warning: unable to apply unsupported reloc type 3 to section .debug_line
Decoded dump of debug contents of section .debug_line:

CU: <stdin>:
File name                            Line number    Starting address
<stdin>                                       20                0x20

<stdin>                                       25                0x3c
<stdin>                                       20                0x44
<stdin>                                       25                0x58
<stdin>                                       20                0x64
<stdin>                                       21                0x6c
<stdin>                                        0                0x74
<stdin>                                       12                0xa0
<stdin>                                       15                0xa4
<stdin>                                       13                0xa8
<stdin>                                       15                0xac
<stdin>                                       13                0xb0
<stdin>                                       15                0xb4
<stdin>                                       36                0xc8
<stdin>                                       15                0xd8
<stdin>                                       39                0xe0
<stdin>                                       36                0xe4
<stdin>                                       39                0xec
<stdin>                                       40                0xf4
<stdin>                                       39                0xfc
<stdin>                                       42               0x100
<stdin>                                       40               0x104
<stdin>                                       34               0x108
<stdin>                                       42               0x10c
<stdin>                                       53               0x118
<stdin>                                       42               0x11c
<stdin>                                       45               0x124

hacks.txt

@inequation
Author

inequation commented Jun 19, 2020

I believe I have a proper fix for half of the issue. See the PR mentioned above (#772).

The other half is that the source file name gets lost somewhere along the way and becomes empty, which is later interpreted as <stdin>. #line directives are also lost, in terms of source string number.

@inequation
Author

#756 has the potential to resolve all my problems, and now includes my changes from #772. Needs testing.

@inequation
Author

As of current head, this almost works as I need it to! Fixing source file name requires a fix within glslang, which I'll be trying to get in via KhronosGroup/glslang#2321.

@jinjianrong jinjianrong added the enhancement New feature or request label Jul 14, 2020
@jinjianrong jinjianrong added fixed and removed enhancement New feature or request labels Mar 5, 2021