Understanding DWARF+WebAssembly offsets #9

kripken · 2019-12-19T22:39:02Z

Working on binaryen support for DWARF, I realized I don't know how to read the line info data. The main issues are:

The code addresses doc says offsets are "the offset of an instruction relative within the Code section of the WebAssembly file". Does "the Code section" include the entire code section, with the 0x0a byte that declares the code section and the LEB for the length? Or just the body, without those?
Can debug lines refer to code section offsets that are not code? (Like the function declarations.)
Can debug lines refer to inner parts of an instruction, and not the start?

In more detail, here is what I am trying: I started with @yurydelendik's fib2 sample,

__attribute__((used))
int fib(int n) {
  int i, t, a = 0, b = 1;
  for (i = 0; i < n; i++) {
    t = a;
    a = b;
    b += t;
  }
  return b;
}

and I build it with

clang fib2.c -O3 -g -o fib2.clang.wasm  -target wasm32-unknown-emscripten -nostdlib -Wl,--no-entry

LLVM's dwarfdump says this:

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000002      2      0      1   0             0  is_stmt
0x000000000000000b      4     17      1   0             0  is_stmt prologue_end
0x0000000000000010      4      3      1   0             0 
0x0000000000000012      0      3      1   0             0 
0x000000000000001e      7      7      1   0             0  is_stmt
0x0000000000000025      0      7      1   0             0 
0x0000000000000029      4     17      1   0             0  is_stmt
0x000000000000002e      4      3      1   0             0 
0x0000000000000034      9      3      1   0             0  is_stmt
0x0000000000000037      9      3      1   0             0  is_stmt end_sequence

The first line there says address 2. If the offset is in the code section body, then that's in the middle of the function declaration, and not executable code. Is that expected?
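
A minimal C sketch of how one might check whether an address such as 2 lands in a function's declaration bytes rather than on an instruction. It assumes the standard function body layout of the WebAssembly binary format (a size LEB, then a vector of local groups, then the instructions) and one-byte value types. `read_uleb32` and `first_instruction_offset` are hypothetical helper names, not part of any tool mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value at buf[*pos] and advance *pos.
   Returns 0 on success, -1 on truncated or over-long input. */
int read_uleb32(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out) {
  uint32_t result = 0;
  for (unsigned shift = 0; shift < 35; shift += 7) {
    if (*pos >= len) return -1;
    uint8_t byte = buf[(*pos)++];
    result |= (uint32_t)(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return 0;
    }
  }
  return -1;
}

/* sec is the code section body (starting at the function count LEB) and
   func_pos is the offset of one function's size LEB within it.
   Returns the offset of that function's first instruction, or -1. */
long first_instruction_offset(const uint8_t *sec, size_t len, size_t func_pos) {
  assert(sec != NULL);
  size_t pos = func_pos;
  uint32_t body_size, group_count;
  if (read_uleb32(sec, len, &pos, &body_size) != 0) return -1;
  if (read_uleb32(sec, len, &pos, &group_count) != 0) return -1;
  for (uint32_t i = 0; i < group_count; i++) {
    uint32_t locals_in_group;
    if (read_uleb32(sec, len, &pos, &locals_in_group) != 0) return -1;
    if (pos >= len) return -1;
    pos++; /* assumption: the value type is a single byte */
  }
  return (long)pos;
}
```

Under these assumptions, a line-table address that is at or after a function's size LEB but below the returned offset falls in the size or locals bytes, not on an instruction.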

The fifth line has address 0x1e. Looking in the binary, though, the code section's body starts at 0x2d, and adding the offset we get 0x4b. That is the second out of 2 bytes of an i32.const -1, which seems odd?
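
The arithmetic above, written out as a small C helper. It assumes the DWARF address is relative to the code section body and uses the body start of 0x2d measured in this particular binary. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Translate a DWARF line-table address (assumed relative to the code section
   body) into an absolute offset in the .wasm file. code_body_start is the
   file offset at which the code section body begins. */
uint64_t dwarf_addr_to_file_offset(uint64_t code_body_start, uint64_t dwarf_addr) {
  assert(dwarf_addr <= UINT64_MAX - code_body_start); /* guard against overflow */
  return code_body_start + dwarf_addr;
}
```

With a body start of 0x2d, address 0x1e maps to file offset 0x4b, and address 0x34 maps to 0x61.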

fib2.clang.wasm.zip


kripken · 2019-12-19T22:44:49Z

Also, when I load the wasm in the code explorer, it only shows 3 lines in the UI (2, 4, 7) while the debug line table also mentions line 9. Looking at that line 9 info, it starts at 0x34 which, relative to the start of the code section's body, is at 0x61 - which is past the end of the code section..?

cc @dschuff @yurydelendik

yurydelendik · 2019-12-19T23:21:16Z

the offset of an instruction relative within the Code section of the WebAssembly file

The Code section starts at its function count LEB. Several decisions led to this:

We may want to point to a function's locals bytes (see the related response below), so it was decided that offsets should start well before the first function's length LEB.
No valid DWARF offset shall be 0, and no range shall start at 0. We reserve 0 for dead symbols: when the linker cannot relocate an entry, it places 0 in the .debug_info or .debug_line table.
WASM files can potentially be manipulated to remove sections (and rewrite section headers), so the decision was made to make DWARF code offsets relative to the actual code section start.
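
A hedged C sketch of locating that base offset in a binary: it walks the section headers and returns the file offset of the code section's payload, which is where the function count LEB lives. It assumes the standard module layout (8-byte header, then sections encoded as an id byte, a size LEB, and a payload). `read_uleb32` and `find_code_section_body` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an unsigned LEB128 value at buf[*pos] and advance *pos.
   Returns 0 on success, -1 on truncated or over-long input. */
int read_uleb32(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out) {
  uint32_t result = 0;
  for (unsigned shift = 0; shift < 35; shift += 7) {
    if (*pos >= len) return -1;
    uint8_t byte = buf[(*pos)++];
    result |= (uint32_t)(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return 0;
    }
  }
  return -1;
}

/* Returns the file offset of the code section body (its function count LEB),
   or -1 if the module is malformed or has no code section. */
long find_code_section_body(const uint8_t *wasm, size_t len) {
  static const uint8_t header[8] = {0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00};
  assert(wasm != NULL);
  if (len < sizeof header || memcmp(wasm, header, sizeof header) != 0) return -1;
  size_t pos = sizeof header;
  while (pos < len) {
    uint8_t id = wasm[pos++];
    uint32_t size;
    if (read_uleb32(wasm, len, &pos, &size) != 0) return -1;
    if (size > len - pos) return -1; /* payload would run past the file */
    if (id == 0x0a) return (long)pos; /* section id 10 is the code section */
    pos += size;                      /* skip this section's payload */
  }
  return -1;
}
```

Because the returned offset comes after the section id and size LEB, rewriting the section header does not change the addresses measured from it.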

Can debug lines refer to code section offsets that are not code?

In theory, yes. .debug_info will have ranges that cover the entire function body. On the debugger side, a "PC" pointing at the locals bytes may signal entering a frame. This is not used at the moment, so we can change that requirement and use only offsets that point to instructions in the code section body.

Can debug lines refer to inner parts of an instruction, and not the start?

I'm not sure DWARF has a requirement to point only to the start of an instruction.

The relocation section is definitely capable of pointing to inner parts of an instruction.

kripken · 2019-12-19T23:31:37Z

Thanks @yurydelendik !

No valid DWARF offset shall be 0, and no range shall start at 0. We reserve 0 for dead symbols: when the linker cannot relocate an entry, it places 0 in the .debug_info or .debug_line table.

Interesting, why not just drop that line then, seems like it won't be usable later anyhow? Or is there some other use for the information?

I'm not sure DWARF has a requirement to point only to the start of an instruction.

It would require some additional logic in binaryen to support that. I was hoping not to need it...

yurydelendik · 2019-12-19T23:45:05Z

Interesting, why not just drop that line then

lld cannot parse, optimize, or rewrite DWARF data due to its complexity. @sbc100, is that correct?

seems like it won't be usable later anyhow?

It is not useful. Notice that .debug_line encodes only a few absolute offsets, and the rest are deltas. That means every delta based on a dead offset becomes invalid/dead as well.
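
A simplified C model of why a dead base poisons the rows after it. This is not a real DWARF line program decoder: it uses only three made-up operations (set an absolute address, advance by a delta, end the sequence), and all type and function names are hypothetical. It treats a base address of 0 as the linker's dead marker, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_SET_ADDRESS, OP_ADVANCE_PC, OP_END_SEQUENCE } OpKind;
typedef struct { OpKind kind; uint64_t value; } LineOp;
typedef struct { uint64_t address; bool dead; } Row;

/* Run the simplified program. Every set-address or advance emits one row.
   Rows derived from a base address of 0 are marked dead until the sequence
   ends. Returns the number of rows written (at most max_rows). */
size_t run_line_ops(const LineOp *ops, size_t n_ops, Row *rows, size_t max_rows) {
  assert(ops != NULL && rows != NULL);
  uint64_t address = 0;
  bool dead = false;
  size_t n_rows = 0;
  for (size_t i = 0; i < n_ops; i++) {
    switch (ops[i].kind) {
    case OP_SET_ADDRESS:
      address = ops[i].value;
      dead = (address == 0); /* 0 means the linker could not relocate it */
      break;
    case OP_ADVANCE_PC:
      address += ops[i].value; /* a delta inherits the base's dead state */
      break;
    case OP_END_SEQUENCE:
      address = 0;
      dead = false;
      continue; /* no row emitted in this simplified model */
    }
    if (n_rows < max_rows) {
      rows[n_rows].address = address;
      rows[n_rows].dead = dead;
      n_rows++;
    }
  }
  return n_rows;
}
```

In this model a consumer cannot keep the deltas and drop only the base row: the addresses computed from a dead base (9 and 14 for deltas 9 and 5) look plausible but refer to nothing.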

It would require some additional logic in binaryen to support that.

Agreed. We can recommend that for WebAssembly DWARF.

sbc100 · 2019-12-20T01:00:37Z

On Thu, Dec 19, 2019 at 3:45 PM Yury Delendik wrote:

Interesting, why not just drop that line then

lld cannot parse, optimize, or rewrite DWARF data due to its complexity. @sbc100, is that correct?

Correct, the linker doesn't do anything to DWARF info other than concatenate it. This is by design.


@yurydelendik

With this, we can update DWARF debug line info properly as we write a new binary. To do that we track binary locations as we write. Each instruction is mapped to the location it is written to. We must also adjust them as we move code around because of LEB optimization (we emit a function or a section with a 5-byte LEB placeholder, the maximal size; later we shrink it, which is almost always possible).

writeDWARFSections() now takes a second param, the new locations of instructions. It then maps debug line info from the original offsets in the binary to the new offsets in the binary being written.

The core logic for updating the debug line section is in wasm-debug.cpp. It basically tracks state machine logic both to read the existing debug lines and to emit the new ones. I couldn't find a way to reuse LLVM code for this, but reading LLVM's code was very useful here.

A final tricky thing we need to do is to update the DWARF section's internal size annotation. The LLVM YAML writing code doesn't do that for us. Luckily it's pretty easy: in fixEmittedSection we just update the first 4 bytes in place to have the section size, after we've emitted it and know the size.

* This ignores debug lines with a 0 in the line, col, or addr, see WebAssembly/debugging#9 (comment)
* This ignores debug line offsets into the middle of instructions, which LLVM sometimes emits for some reason, see WebAssembly/debugging#9 (comment). Handling that would likely at least double our memory usage, which is unfortunate - we are run in an LTO manner, where the entire app's DWARF is present, and it may be massive. I think we should see if such odd offsets are a bug in LLVM, and if we can fix or prevent that.
* This does not emit "special" opcodes for debug lines. Those are purely an optimization, which I wanted to leave for later. (Even without them we decrease the size quite a lot, btw, as many lines have 0s in them...)

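
The remapping step described above might look roughly like the following C sketch. It is not binaryen's code (the logic in wasm-debug.cpp is not shown in this thread); it assumes a table of instruction locations sorted by old offset, and the `Location` type and `remap_address` function are hypothetical. It mirrors the two stated limitations: address 0 is dropped, and an address that does not match the start of a tracked instruction is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One tracked instruction: where it started in the input binary and where it
   starts in the binary being written. */
typedef struct { uint32_t old_offset; uint32_t new_offset; } Location;

/* Look up old_addr in locs (sorted by old_offset) with a binary search.
   Returns true and stores the new offset when old_addr is exactly the start
   of a tracked instruction; returns false for 0 or for any other address. */
bool remap_address(const Location *locs, size_t n, uint32_t old_addr,
                   uint32_t *new_addr) {
  assert(locs != NULL || n == 0);
  assert(new_addr != NULL);
  if (old_addr == 0) return false; /* reserved for dead entries */
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (locs[mid].old_offset < old_addr) lo = mid + 1;
    else hi = mid;
  }
  if (lo == n || locs[lo].old_offset != old_addr) return false;
  *new_addr = locs[lo].new_offset;
  return true;
}
```

Supporting mid-instruction addresses would need the end of each instruction as well as its start, which is one way to read the memory concern raised above.
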
This adds some testing that shows we can load and save fib2.c and fannkuch.cpp properly. The latter includes more than one function and has nontrivial code.

To actually emit correct offsets a few minor fixes are done here:

* Fix the code section location tracking during reading - the correct offset we care about is the body of the code section, not including the section declaration and size.
* Fix wasm-stack debug line emitting. We need to update in BinaryInstWriter::visit(), that is, right before writing bytes for the instruction. That differs from BinaryenIRWriter::visit, which is a recursive function that also calls the children - so the offset there would be of the first child. For some reason that is correct with source maps, I don't understand why, but it's wrong for DWARF...
* Print code section offsets in hex, to match other tools.

Remove DWARFUpdate pass, which was useful for testing temporarily, but doesn't make sense now (it just updates without writing a binary).

cc @yurydelendik

turbolent · 2022-02-10T16:57:47Z

Thank you for the explanation @yurydelendik!

I'm trying to parse DWARF line info for https://github.com/turbolent/w2c2 and had the same questions after reading the spec, i.e. where the start of the code section is, and whether it is normal that line addresses sometimes point to the middle of instructions. Maybe it is worth documenting this better in the spec?

I'm still a bit confused about the last part, addresses pointing to the middle of instructions. Why not require alignment?

kripken mentioned this issue Dec 20, 2019

DWARF debug line updating WebAssembly/binaryen#2545

Merged

monperrus mentioned this issue Aug 18, 2020

add support for DWARF debugging symbols satabin/swam#94

Open
