Understanding DWARF+WebAssembly offsets #9
Also, when I load the wasm in the code explorer, it only shows 3 lines in the UI (2, 4, 7) while the debug line table also mentions line 9. Looking at that line 9 info, it starts at |
The Code section starts at its function count LEB. There are several decisions that led to this:
In theory, yes.
I'm not sure DWARF has a requirement to point only to the start of an instruction. The relocation section is definitely capable of pointing to inner parts of an instruction. |
Thanks @yurydelendik !
Interesting, why not just drop that line then, seems like it won't be usable later anyhow? Or is there some other use for the information?
It would require some additional logic in binaryen to support that. I was hoping not to need it... |
lld cannot parse, optimize, or rewrite DWARF data due to its complexity. @sbc100, is that correct?
It is not useful. Notice that .debug_line encodes only a few offsets, and the rest are deltas. That means the deltas that follow a dead offset become invalid/dead as well.
Agree. We can recommend that for WebAssembly DWARF. |
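To illustrate the point about deltas, here is a minimal sketch of how rows built from deltas depend on their base offset. It is a simplified model and not the real .debug_line opcode encoding: the `decode_rows` helper, the tuple layout, and the sample numbers are all assumptions made up for illustration.

```python
def decode_rows(base_address, base_line, deltas):
    """Expand (address_delta, line_delta) pairs into absolute (address, line) rows."""
    rows = []
    address, line = base_address, base_line
    for address_delta, line_delta in deltas:
        address += address_delta
        line += line_delta
        rows.append((address, line))
    return rows


# Hypothetical deltas shared by two sequences.
deltas = [(0, 0), (5, 2), (7, 3)]

# A sequence whose base address points at real code.
live_rows = decode_rows(0x1E, 2, deltas)

# The same deltas after the base address was zeroed out (dead code):
# every row now sits relative to 0 and no longer describes real code.
dead_rows = decode_rows(0, 2, deltas)
```

Because each row is only meaningful relative to the base, a consumer that drops the dead base would most likely want to drop the rows derived from it too.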
On Thu, Dec 19, 2019 at 3:45 PM Yury Delendik ***@***.***> wrote:
Interesting, why not just drop that line then
the lld cannot parse, optimize or re-write DWARF data due to its
complexity. @sbc100 <https://github.com/sbc100> , is it correct?
Correct, the linker doesn't do anything to DWARF info other than
concatenate it. This is by design.
… seems like it won't be usable later anyhow?
It is not useful. Notice that .debug_line encodes only few offsets, and
rest of them are deltas. That means delta becomes invalid/dead as well.
It would require some additional logic in binaryen to support that.
Agree. We can recommend that for WebAssembly DWARF.
|
With this, we can update DWARF debug line info properly as we write a new binary. To do that we track binary locations as we write. Each instruction is mapped to the location it is written to. We must also adjust them as we move code around because of LEB optimization (we emit a function or a section with a 5-byte LEB placeholder, the maximal size; later we shrink it which is almost always possible).

writeDWARFSections() now takes a second param, the new locations of instructions. It then maps debug line info from the original offsets in the binary to the new offsets in the binary being written.

The core logic for updating the debug line section is in wasm-debug.cpp. It basically tracks state machine logic both to read the existing debug lines and to emit the new ones. I couldn't find a way to reuse LLVM code for this, but reading LLVM's code was very useful here.

A final tricky thing we need to do is to update the DWARF section's internal size annotation. The LLVM YAML writing code doesn't do that for us. Luckily it's pretty easy, in fixEmittedSection we just update the first 4 bytes in place to have the section size, after we've emitted it and know the size.

This ignores debug lines with a 0 in the line, col, or addr, see WebAssembly/debugging#9 (comment)

This ignores debug line offsets into the middle of instructions, which LLVM sometimes emits for some reason, see WebAssembly/debugging#9 (comment) Handling that would likely at least double our memory usage, which is unfortunate - we are run in an LTO manner, where the entire app's DWARF is present, and it may be massive. I think we should see if such odd offsets are a bug in LLVM, and if we can fix or prevent that.

This does not emit "special" opcodes for debug lines. Those are purely an optimization, which I wanted to leave for later. (Even without them we decrease the size quite a lot, btw, as many lines have 0s in them...)
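The remapping and the two "ignore" rules described above can be sketched roughly as follows. This is not the code in wasm-debug.cpp: `remap_rows`, the row tuple layout, and the `old_to_new` dictionary are hypothetical names used only to illustrate the idea of mapping original offsets to new ones.

```python
def remap_rows(rows, old_to_new):
    """Map (address, line, col) rows from original to new instruction offsets.

    old_to_new is assumed to hold only offsets that are the start of an
    instruction in the original binary.
    """
    remapped = []
    for address, line, col in rows:
        # Skip rows with a 0 in the address, line, or column.
        if address == 0 or line == 0 or col == 0:
            continue
        new_address = old_to_new.get(address)
        if new_address is None:
            # Not the start of a known instruction, for example an offset
            # into the middle of an instruction.
            continue
        remapped.append((new_address, line, col))
    return remapped


# Hypothetical sample data.
rows = [(0x05, 2, 3), (0x00, 4, 1), (0x1E, 7, 10), (0x20, 9, 1)]
old_to_new = {0x05: 0x03, 0x20: 0x1A}
new_rows = remap_rows(rows, old_to_new)
```

In this sketch only instruction start offsets are kept in the map, which is why mid-instruction offsets fall out; tracking every byte instead is the memory cost the description wants to avoid.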
This adds some testing that shows we can load and save fib2.c and fannkuch.cpp properly. The latter includes more than one function and has nontrivial code.

To actually emit correct offsets a few minor fixes are done here:

* Fix the code section location tracking during reading - the correct offset we care about is the body of the code section, not including the section declaration and size.
* Fix wasm-stack debug line emitting. We need to update in BinaryInstWriter::visit(), that is, right before writing bytes for the instruction. That differs from BinaryenIRWriter::visit which is a recursive function that also calls the children - so the offset there would be of the first child. For some reason that is correct with source maps, I don't understand why, but it's wrong for DWARF...
* Print code section offsets in hex, to match other tools.

Remove DWARFUpdate pass, which was useful for testing temporarily, but doesn't make sense now (it just updates without writing a binary).

cc @yurydelendik
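The LEB placeholder shrinking mentioned in this description, and the location adjustment it forces, could look roughly like the sketch below. It is an illustration under assumptions, not binaryen's implementation: `encode_uleb128`, `shrink_size_field`, and the flat list of locations are hypothetical, and the size field is assumed to cover everything up to the end of the buffer.

```python
def encode_uleb128(value, pad_to=0):
    """Encode an unsigned LEB128, optionally padded to at least pad_to bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value != 0 or len(out) + 1 < pad_to:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def shrink_size_field(buf, size_pos, locations):
    """Replace a 5-byte size placeholder at size_pos with its minimal encoding.

    Returns the new buffer and the locations shifted to match it.
    """
    body_start = size_pos + 5
    minimal = encode_uleb128(len(buf) - body_start)
    saved = 5 - len(minimal)
    new_buf = buf[:size_pos] + minimal + buf[body_start:]
    # Everything written after the placeholder moves back by the bytes saved.
    new_locations = [loc - saved if loc >= body_start else loc for loc in locations]
    return new_buf, new_locations


# A section id byte, a 5-byte placeholder, then a 3-byte body.
buf = b"\x0a" + encode_uleb128(0, pad_to=5) + b"\x01\x02\x03"
new_buf, new_locations = shrink_size_field(buf, 1, [6, 8])
```

The placeholder is written first because the size is unknown until the body has been emitted; the tracked instruction locations then have to be moved by the same number of bytes that the shrink removes.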
Thank you for the explanation @yurydelendik! I'm trying to parse DWARF line info for https://github.com/turbolent/w2c2 and had the same questions after reading the spec, i.e. where the start of the code section is, and whether it is normal that line addresses sometimes point to the middle of instructions. Maybe it is worth documenting this better in the spec? I'm still a bit confused about the last part, addresses pointing to the middle of instructions. Why not require alignment? |
Working on binaryen support for DWARF, I realized I don't know how to read the line info data. The main issues are:
"the offset of an instruction relative within the Code section of the WebAssembly file". Does "the Code section" include the entire code section, with the 0x0a byte to declare the code section and the LEB for the length? Or just the body, without those?
In more detail here is what I am trying: I started with @yurydelendik's fib2 sample,
and I build it with
LLVM's dwarfdump says this:
The first line there says address 2. If the offset is in the code section body, then that's in the middle of the function declaration, and not executable code. Is that expected?
The fifth line has address 0x1e. Looking in the binary, though, the code section's body starts at 0x2d, and adding the offset we get 0x4b. That is the second out of 2 bytes of an i32.const -1, which seems odd?
fib2.clang.wasm.zip
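The arithmetic above (0x2d + 0x1e = 0x4b) can be reproduced with a small sketch that locates the Code section body and adds a DWARF address to it. It assumes, as answered in this thread, that addresses are relative to the section body, which starts at the function count LEB. The helper names are hypothetical and the module built at the end is a tiny hand-made one, not fib2.

```python
CODE_SECTION_ID = 10  # 0x0a


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def code_section_body_start(wasm):
    """Return the file offset of the Code section body (its function count LEB)."""
    if wasm[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(wasm):
        section_id = wasm[pos]
        size, body_start = read_uleb128(wasm, pos + 1)
        if section_id == CODE_SECTION_ID:
            return body_start
        pos = body_start + size
    raise ValueError("no Code section found")


def dwarf_address_to_file_offset(wasm, address):
    """Convert a Code-section-relative DWARF address to an absolute file offset."""
    return code_section_body_start(wasm) + address


# Hand-made module: type, function, and code sections for one empty function.
tiny = (
    b"\0asm\x01\x00\x00\x00"
    + b"\x01\x04\x01\x60\x00\x00"  # type section: one () -> () type
    + b"\x03\x02\x01\x00"          # function section: one function of type 0
    + b"\x0a\x04\x01\x02\x00\x0b"  # code section: one body with no locals, `end`
)
offset = dwarf_address_to_file_offset(tiny, 2)
```

For the binary discussed here, the same helper would be expected to return 0x2d for the body start, so address 0x1e lands at file offset 0x4b.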