
Commit 7a1d5ef

jryans and jmorse committed
[DebugInfo][NFC] Add instr-ref documentation, migration guide
This used to be D102158, but all the code it describes got rewritten, so I figured I'd take another shot at documenting the new instruction referencing variable locations, this time from a higher level. Happily, there's no longer any need to describe LiveDebugValues in any detail, seeing how it's all SSA-based now.

Probably the most important part is the explanation of what targets need to do to support instruction referencing. The list is small, mostly because there's nothing especially complicated that targets need to do: just instrument their target-specific optimisations and implement the stack spill/restore recognition target hooks.

This is a small amount of text (which is a virtue); I'm extremely happy to expand on anything.

Differential Revision: https://reviews.llvm.org/D113586
Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>
1 parent f0071d4 commit 7a1d5ef


3 files changed (+193, -2 lines)


llvm/docs/InstrRefDebugInfo.md

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc, and to instruction
movements.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable value resides in,
LLVM in instruction referencing mode refers to the machine instruction and
operand position where the value is defined. Consider the LLVM IR way of
referring to instruction values:

    %2 = add i32 %0, %1
    call void @llvm.dbg.value(metadata i32 %2,

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection. Consider the X86 MIR
below and its instruction referencing debug info, corresponding to the earlier
LLVM IR:

    %2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
    DBG_INSTR_REF 1, 0, !123, !456, debug-location !789

While the function remains in SSA form, virtual register %2 is sufficient to
identify the value computed by the instruction. However, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. A more consistent way of identifying the
instruction's value is to refer to the MachineOperand where the value is
defined, independently of which register that MachineOperand defines. In the
code above, the DBG_INSTR_REF instruction refers to instruction number one,
operand zero, while the ADD32rr has a debug-instr-number attribute attached
indicating that it is instruction number one.
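
The identification scheme can be illustrated with a small standalone C++17 model. This is a hypothetical sketch, not LLVM's data structures: `ModelInstr`, `InstrRef` and `resolveInstrRef` are invented names. Each instruction carries an optional debug instruction number and the registers its defs write, and a reference names only an (instruction number, operand index) pair. Because the reference never mentions a register, rewriting `%2` to a physical register leaves it valid.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a MachineInstr: its debug
// instruction number (0 means "unnumbered") and the registers written by
// its def operands, in operand order.
struct ModelInstr {
  unsigned DebugInstrNum = 0;
  std::vector<std::string> Defs;
};

// Hypothetical stand-in for the pair carried by DBG_INSTR_REF:
// (instruction number, operand index).
struct InstrRef {
  unsigned InstrNum;
  unsigned OpIdx;
};

// Return the register currently defined at the referenced operand, or
// std::nullopt if no instruction carries that number (the variable would
// then be shown as "optimised out").
std::optional<std::string> resolveInstrRef(const std::vector<ModelInstr> &Func,
                                           InstrRef Ref) {
  assert(Ref.InstrNum != 0 && "0 is reserved for unnumbered instructions");
  for (const ModelInstr &MI : Func)
    if (MI.DebugInstrNum == Ref.InstrNum && Ref.OpIdx < MI.Defs.size())
      return MI.Defs[Ref.OpIdx];
  return std::nullopt;
}
```

For example, once register allocation rewrites `Defs[0]` of instruction 1 from `%2` to `$eax`, resolving `{1, 0}` yields `$eax` instead of a stale virtual register.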

Decoupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but in exchange requires additional
instrumentation when instructions are optimised. Optimisations that replace
instructions with optimised versions computing the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand number pair
(see MachineFunction::substituteDebugValuesForInst). If debug info maintenance
is not performed, or an instruction is eliminated as dead code, the variable
location is safely dropped and marked "optimised out". The exception is
instructions that are mutated rather than replaced: these always need debug
info maintenance.

# Register allocator considerations

When the register allocator runs, debug instructions do not directly refer to
any virtual registers, so there is no need for expensive location maintenance
during regalloc (i.e., LiveDebugVariables). Debug instructions are unlinked
from the function, then linked back in after register allocation completes.
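
The unlink/relink step can be modelled in a few lines of standalone C++17. This is a hypothetical sketch, not LLVM's implementation: `ModelInstr`, `StashedDebugInstr`, `unlinkDebugInstrs` and `relinkDebugInstrs` are invented for illustration. Each stashed debug instruction is simply anchored to the non-debug instruction that followed it. Because the stashed instructions mention no registers, nothing about them needs updating while they are out of the function.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one instruction in a basic block.
struct ModelInstr {
  unsigned Id;      // stable identity that register allocation never changes
  bool IsDebug;     // true for debug instructions such as DBG_INSTR_REF
  std::string Text; // human-readable form, rewritten freely by "regalloc"
};

// Marker meaning "this debug instruction sat at the end of the block".
constexpr unsigned BlockEnd = ~0u;

// A debug instruction taken out of the block, remembered together with the
// Id of the first non-debug instruction that followed it.
struct StashedDebugInstr {
  unsigned AnchorId;
  ModelInstr MI;
};

// Remove every debug instruction from Block, returning them with anchors.
std::vector<StashedDebugInstr> unlinkDebugInstrs(std::vector<ModelInstr> &Block) {
  std::vector<StashedDebugInstr> Stash;
  std::vector<ModelInstr> Kept, Pending;
  for (const ModelInstr &MI : Block) {
    if (MI.IsDebug) {
      Pending.push_back(MI); // anchor unknown until the next real instruction
      continue;
    }
    for (const ModelInstr &D : Pending)
      Stash.push_back({MI.Id, D});
    Pending.clear();
    Kept.push_back(MI);
  }
  for (const ModelInstr &D : Pending)
    Stash.push_back({BlockEnd, D});
  Block = std::move(Kept);
  return Stash;
}

// Put stashed debug instructions back in front of their anchors. As a
// simplification, this sketch assumes every anchor survived regalloc.
void relinkDebugInstrs(std::vector<ModelInstr> &Block,
                       const std::vector<StashedDebugInstr> &Stash) {
  for (const StashedDebugInstr &S : Stash) {
    auto It = Block.end();
    if (S.AnchorId != BlockEnd)
      It = std::find_if(Block.begin(), Block.end(), [&](const ModelInstr &MI) {
        return !MI.IsDebug && MI.Id == S.AnchorId;
      });
    assert((S.AnchorId == BlockEnd || It != Block.end()) && "anchor was erased");
    Block.insert(It, S.MI);
  }
}
```

Anchoring by a stable identity rather than a position means that instructions added by regalloc, such as spill code, do not disturb where the debug instructions return to.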

The exception is PHI instructions: these become implicit definitions at control
flow merges once regalloc finishes, and any debug numbers attached to PHI
instructions are lost. To circumvent this, the debug numbers of PHIs are
recorded at the start of register allocation (phi-node-elimination), and
DBG_PHI instructions are inserted after regalloc finishes. This still requires
tracking which register each PHI value is located in during regalloc, but only
at single positions (block entry points) rather than over ranges of
instructions.

An example, before regalloc:

    bb.2:
      %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1

After:

    bb.2:
      DBG_PHI $rax, 1
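
The record-then-reinsert scheme can be sketched in standalone C++17. The names `RecordedPHI` and `emitDbgPhis` are hypothetical. The sketch also simplifies by taking the final virtual-to-physical register assignment as a map, rather than tracking locations during regalloc.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record taken at phi-node-elimination: the block holding the
// PHI, its debug instruction number, and the virtual register it defined.
struct RecordedPHI {
  std::string Block;
  unsigned DebugInstrNum;
  std::string VReg;
};

// After regalloc, build the DBG_PHI instructions to place at each block
// entry, using the physical register assigned to each PHI's virtual
// register. If a register has no entry in the map, this sketch simply
// emits nothing for it.
std::map<std::string, std::vector<std::string>>
emitDbgPhis(const std::vector<RecordedPHI> &Records,
            const std::map<std::string, std::string> &Assignment) {
  std::map<std::string, std::vector<std::string>> ByBlock;
  for (const RecordedPHI &R : Records) {
    assert(R.DebugInstrNum != 0 && "only numbered PHIs are recorded");
    auto It = Assignment.find(R.VReg);
    if (It == Assignment.end())
      continue;
    ByBlock[R.Block].push_back("DBG_PHI " + It->second + ", " +
                               std::to_string(R.DebugInstrNum));
  }
  return ByBlock;
}
```

Given the record `{"bb.2", 1, "%2"}` and an assignment of `%2` to `$rax`, the sketch produces the `DBG_PHI $rax, 1` shown above for `bb.2`.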

# LiveDebugValues

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [LiveDebugValues pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions (mappings):
* One that assigns values to variable names,
* One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place PHIs for each function, between
variable values and the values contained in machine locations, with value
propagation eliminating any unnecessary PHIs. The two functions can then be
joined up to map variables to values, then values to locations, for each
instruction in the function.
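
The final join can be illustrated with a hypothetical standalone C++17 sketch. LiveDebugValues computes both mappings for every instruction using SSA-style value propagation. Here, by contrast, they are given directly as two `std::map`s valid at a single program point. `ValueID`, `VarToValue`, `ValueToLoc` and `locateVariable` are invented names.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical value identifier, e.g. "instruction 1, operand 0".
using ValueID = std::string;

// The two independent mappings, each valid at one program point:
// variable name -> value, and value -> machine location.
using VarToValue = std::map<std::string, ValueID>;
using ValueToLoc = std::map<ValueID, std::string>;

// Join the two mappings to answer "where is this variable right now?".
std::optional<std::string> locateVariable(const VarToValue &Vars,
                                          const ValueToLoc &Locs,
                                          const std::string &Var) {
  auto V = Vars.find(Var);
  if (V == Vars.end())
    return std::nullopt; // the variable has no value at this point
  auto L = Locs.find(V->second);
  if (L == Locs.end())
    return std::nullopt; // the value is not resident in any location
  return L->second;
}
```

When a value is spilled, only the value-to-location mapping changes and the variable-to-value mapping is untouched. This is why the two can be computed independently.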

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
* Target hooks to be implemented to allow LiveDebugValues to follow values through the machine,
* Target-specific optimisations to be instrumented, to preserve instruction numbers.

## Target hooks

TargetInstrInfo::isCopyInstrImpl must be implemented to recognise any
instructions that are copy-like: LiveDebugValues uses this to identify when
values move between registers.
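
The following hypothetical C++17 sketch shows why copy recognition matters. `isCopyLike` plays the role that TargetInstrInfo::isCopyInstrImpl plays for a real target, but it is not that API. `locationsAfter` follows one value through a block and treats any unrecognised def as a clobber. Opcodes are plain strings, and only `MOV32rr` is treated as a copy.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified instruction: Op0 is the register it defines and
// Op1 is a register it reads.
struct ModelInstr {
  std::string Opcode, Op0, Op1;
};

struct DestSource {
  std::string Dest, Src;
};

// Stand-in for a target's copy recognition: report destination and source
// for copy-like instructions, std::nullopt for anything else.
std::optional<DestSource> isCopyLike(const ModelInstr &MI) {
  if (MI.Opcode == "MOV32rr")
    return DestSource{MI.Op0, MI.Op1};
  return std::nullopt;
}

// Registers holding the value that starts out in StartReg, after Block.
std::set<std::string> locationsAfter(const std::vector<ModelInstr> &Block,
                                     const std::string &StartReg) {
  std::set<std::string> Locs = {StartReg};
  for (const ModelInstr &MI : Block) {
    if (std::optional<DestSource> Copy = isCopyLike(MI)) {
      if (Locs.count(Copy->Src))
        Locs.insert(Copy->Dest); // the value now lives in Dest as well
      else
        Locs.erase(Copy->Dest);  // Dest is overwritten with another value
      continue;
    }
    Locs.erase(MI.Op0); // unrecognised def: assume it clobbers Op0
  }
  return Locs;
}
```

Suppose a `MOV32rr $ecx, $eax` is followed by an `ADD32rr` that redefines `$eax`. Without the `MOV32rr` case, the value would be considered lost at the add, even though `$ecx` still holds it. This is the kind of coverage loss the hook prevents.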

TargetInstrInfo::isLoadFromStackSlotPostFE and
TargetInstrInfo::isStoreToStackSlotPostFE are needed to identify restore and
spill instructions respectively: the former should return the destination
register of a restore, and the latter the source register of a spill.
LiveDebugValues will then track the movement of a value to and from the stack
slot. In addition, any instruction that writes to a stack spill slot should
have a MachineMemOperand attached, so that LiveDebugValues can recognise that
the slot has been clobbered.
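
A hypothetical C++17 sketch of how spill and restore recognition let one value be followed through memory is shown below. `isSpill` and `isRestore` are shaped loosely like the two target hooks: each returns a register and reports the slot through an `int &FrameIndex` out-parameter. They are not the real hooks. `trackValue`, `slotName` and the `ModelInstr` kinds are invented. `OtherSlotStore` stands for any other write to a slot that is visible only through its memory operand.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified instruction.
struct ModelInstr {
  enum KindTy { Spill, Restore, OtherSlotStore, RegDef };
  KindTy Kind;
  std::string Reg; // register operand ("" if none)
  int Slot = -1;   // stack slot index, or -1 if none
};

// Returns the spilled (source) register and sets FrameIndex, or "".
std::string isSpill(const ModelInstr &MI, int &FrameIndex) {
  if (MI.Kind != ModelInstr::Spill) return "";
  FrameIndex = MI.Slot;
  return MI.Reg;
}

// Returns the restored (destination) register and sets FrameIndex, or "".
std::string isRestore(const ModelInstr &MI, int &FrameIndex) {
  if (MI.Kind != ModelInstr::Restore) return "";
  FrameIndex = MI.Slot;
  return MI.Reg;
}

std::string slotName(int FI) { return "%stack." + std::to_string(FI); }

// Locations (registers and slots) holding the value that starts in StartReg.
std::set<std::string> trackValue(const std::vector<ModelInstr> &Block,
                                 const std::string &StartReg) {
  std::set<std::string> Locs = {StartReg};
  for (const ModelInstr &MI : Block) {
    int FI = -1;
    if (std::string Src = isSpill(MI, FI); !Src.empty()) {
      // The slot now holds whatever Src held.
      if (Locs.count(Src)) Locs.insert(slotName(FI));
      else Locs.erase(slotName(FI));
    } else if (std::string Dst = isRestore(MI, FI); !Dst.empty()) {
      // Dst now holds whatever the slot held.
      if (Locs.count(slotName(FI))) Locs.insert(Dst);
      else Locs.erase(Dst);
    } else if (MI.Kind == ModelInstr::OtherSlotStore) {
      Locs.erase(slotName(MI.Slot)); // slot clobbered via its memory operand
    } else {
      Locs.erase(MI.Reg); // an ordinary register def clobbers Reg
    }
  }
  return Locs;
}
```

In the first test scenario, `$eax` is spilled to `%stack.0`, reused, and then the value is reloaded into `$ecx`. The value is tracked the whole way. If an unrecognised store clobbers the slot instead, the value is correctly considered gone.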

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a MachineInstr to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented. The relevant question is whether any
register def in any operand will produce a different value as a result of the
mutation. If the answer is yes, then there is a risk that a DBG_INSTR_REF
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call MachineInstr::dropDebugNumber() on the
mutated instruction to erase its instruction number. Any DBG_INSTR_REF
referring to it will then produce an empty variable location instead, which
appears as "optimised out" in the debugger.
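
The mutation rule can be modelled in a few lines of C++17. This is a hypothetical sketch: the member function `dropDebugNumber` is named after MachineInstr::dropDebugNumber() but only imitates its effect. `ModelInstr`, `mutateAddToSub` and `refIsLive` are invented names, and the add-to-subtract mutation stands in for any change that alters a def's value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified instruction with a debug number (0 = none).
struct ModelInstr {
  unsigned DebugInstrNum;
  std::string Opcode;
  // Models the effect of MachineInstr::dropDebugNumber(): forget the number.
  void dropDebugNumber() { DebugInstrNum = 0; }
};

// Hypothetical in-place mutation that changes the value the def produces.
// Because operand 0 now computes something else, the old number is dropped.
void mutateAddToSub(ModelInstr &MI) {
  assert(MI.Opcode == "ADD32rr");
  MI.Opcode = "SUB32rr";
  MI.dropDebugNumber();
}

// True if some instruction still carries Num. If not, a DBG_INSTR_REF
// naming Num yields an empty location ("optimised out").
bool refIsLive(const std::vector<ModelInstr> &Func, unsigned Num) {
  for (const ModelInstr &MI : Func)
    if (Num != 0 && MI.DebugInstrNum == Num)
      return true;
  return false;
}
```

A mutation that leaves every def's value unchanged does not need to drop the number, since no DBG_INSTR_REF could then observe a different value.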

For the latter flavour of optimisation, you should record an instruction
number substitution to increase coverage: a mapping from the old instruction
number / operand pair to the new instruction number / operand pair. Consider
replacing a three-address add instruction with a two-address add:

    %2:gr32 = ADD32rr %0, %1, debug-instr-number 1

becomes

    %2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2

A substitution from "instruction number 1 operand 0" to "instruction number 2
operand 0" is then recorded in the MachineFunction. In LiveDebugValues,
DBG_INSTR_REFs are mapped through the substitution table to find the most
recent instruction number / operand number of the value they refer to.
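
Mapping a reference through the substitution table amounts to following a chain of (instruction, operand) pairs until no further substitution applies. The following C++17 sketch uses a hypothetical `SubstitutionTable`, which is not MachineFunction's real storage. It also assumes, as a simplification, that substitutions always point at a newer instruction number, so chains cannot cycle.

```cpp
#include <cassert>
#include <map>
#include <utility>

// (instruction number, operand index), as carried by DBG_INSTR_REF.
using InstrOperand = std::pair<unsigned, unsigned>;

// Hypothetical substitution table: old pair -> new pair. Replacing an
// instruction more than once produces a chain of substitutions.
struct SubstitutionTable {
  std::map<InstrOperand, InstrOperand> Subs;

  void record(InstrOperand From, InstrOperand To) {
    // Sketch assumption: substitutions point at newer instructions.
    assert(From.first < To.first && "substitutions must point forwards");
    Subs[From] = To;
  }

  // Follow substitutions until reaching a pair with no further mapping:
  // the most recent instruction / operand that defines the value.
  InstrOperand resolve(InstrOperand Ref) const {
    for (auto It = Subs.find(Ref); It != Subs.end(); It = Subs.find(Ref))
      Ref = It->second;
    return Ref;
  }
};
```

With the ADD32rr example recorded as `{1, 0} -> {2, 0}`, and a hypothetical later replacement `{2, 0} -> {5, 0}`, a DBG_INSTR_REF naming `{1, 0}` resolves to `{5, 0}`. Pairs with no substitution resolve to themselves.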

Use MachineFunction::substituteDebugValuesForInst to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This holds most of the time, including in the example
above.
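
The same-position assumption can be sketched as follows. `ModelInstr` and `substituteSamePositions` are hypothetical C++17 stand-ins, not MachineFunction::substituteDebugValuesForInst itself. Every def operand index in the old instruction is mapped to the same index in the new one. The new instruction only receives a fresh number (from a hypothetical counter `NextNum`) once a substitution is actually needed.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using InstrOperand = std::pair<unsigned, unsigned>;

// Hypothetical instruction: debug number (0 = none) and, per operand,
// whether that operand is a register def.
struct ModelInstr {
  unsigned DebugInstrNum = 0;
  std::vector<bool> IsDef;
};

// Map each def operand of Old to the same operand position in New.
void substituteSamePositions(std::map<InstrOperand, InstrOperand> &Subs,
                             const ModelInstr &Old, ModelInstr &New,
                             unsigned &NextNum) {
  if (Old.DebugInstrNum == 0)
    return; // nothing can refer to the old instruction
  for (unsigned Op = 0; Op < Old.IsDef.size(); ++Op) {
    if (!Old.IsDef[Op])
      continue;
    assert(Op < New.IsDef.size() && New.IsDef[Op] &&
           "same-position assumption violated");
    if (New.DebugInstrNum == 0)
      New.DebugInstrNum = NextNum++; // number New lazily
    Subs[InstrOperand(Old.DebugInstrNum, Op)] =
        InstrOperand(New.DebugInstrNum, Op);
  }
}
```

For the three-operand ADD32rr pair above, only operand 0 is a def. The sketch therefore records the single substitution `{1, 0} -> {2, 0}`.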

If operand numbers do not line up between the old and new instruction, use
MachineInstr::getDebugInstrNum to acquire the instruction number for the new
instruction, and MachineFunction::makeDebugValueSubstitution to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution for them: LiveDebugValues will safely drop
the now-unavailable variable value.
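
When positions differ, the caller supplies the correspondence itself. The sketch below is a hypothetical C++17 helper, `recordManualSubstitutions`, standing in for a sequence of MachineFunction::makeDebugValueSubstitution calls. It takes, for each old def operand, the new operand that computes the same value, or `std::nullopt` if the value is no longer computed. In the latter case, no substitution is recorded.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using InstrOperand = std::pair<unsigned, unsigned>;

// DefMap lists (old def operand, new operand or std::nullopt) pairs.
void recordManualSubstitutions(
    std::map<InstrOperand, InstrOperand> &Subs, unsigned OldNum,
    unsigned NewNum,
    const std::vector<std::pair<unsigned, std::optional<unsigned>>> &DefMap) {
  assert(OldNum != 0 && NewNum != 0 && "both instructions must be numbered");
  for (const auto &[OldOp, NewOp] : DefMap) {
    if (!NewOp)
      continue; // value no longer computed: record nothing, let it drop
    Subs[InstrOperand(OldNum, OldOp)] = InstrOperand(NewNum, *NewOp);
  }
}
```

For instance, suppose hypothetical instruction 7 defined values at operands 0 and 1, and its replacement, instruction 9, computes only the second value, at operand 0. Then only `{7, 1} -> {9, 0}` is recorded, and references to `{7, 0}` become "optimised out".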

If your target clones instructions, as the TailDuplicator optimisation pass
does, do not attempt to preserve the instruction numbers or record any
substitutions. MachineFunction::CloneMachineInstr should drop the instruction
number of any cloned instruction, to avoid duplicate numbers appearing to
LiveDebugValues. Dealing with duplicated instructions is a natural extension
to instruction referencing that is currently unimplemented.
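
The invariant behind this rule is that no non-zero debug instruction number may appear on two instructions. The C++17 sketch below is hypothetical: `cloneInstr` models the number-dropping behaviour described for MachineFunction::CloneMachineInstr, and `numbersAreUnique` checks the invariant.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical simplified instruction.
struct ModelInstr {
  unsigned DebugInstrNum = 0;
  int Opcode = 0;
};

// A clone keeps everything except the debug instruction number.
ModelInstr cloneInstr(const ModelInstr &Orig) {
  ModelInstr Copy = Orig;
  Copy.DebugInstrNum = 0;
  return Copy;
}

// True if no non-zero debug instruction number appears more than once.
bool numbersAreUnique(const std::vector<ModelInstr> &Func) {
  std::set<unsigned> Seen;
  for (const ModelInstr &MI : Func)
    if (MI.DebugInstrNum != 0 && !Seen.insert(MI.DebugInstrNum).second)
      return false;
  return true;
}
```

A plain copy that kept the number would break the invariant. The clone keeps it intact, at the cost of any DBG_INSTR_REF coverage for the duplicated value.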

[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations

llvm/docs/MIRLangRef.rst

Lines changed: 3 additions & 2 deletions
@@ -911,5 +911,6 @@ The additional operands to ``DBG_INSTR_REF`` are identical to ``DBG_VALUE``,
 and the ``DBG_INSTR_REF`` s position records where the variable takes on the
 designated value in the same way.
 
-More information about how these constructs are used will appear on the source
-level debugging page in due course, see also :doc:`SourceLevelDebugging` and :doc:`HowToUpdateDebugInfo`.
+More information about how these constructs are used is available in
+:doc:`InstrRefDebugInfo`. The related documents :doc:`SourceLevelDebugging` and
+:doc:`HowToUpdateDebugInfo` may be useful as well.

llvm/docs/UserGuides.rst

Lines changed: 10 additions & 0 deletions
@@ -38,6 +38,7 @@ intermediate LLVM representation.
    HowToCrossCompileBuiltinsOnArm
    HowToCrossCompileLLVM
    HowToUpdateDebugInfo
+   InstrRefDebugInfo
    LinkTimeOptimization
    LoopTerminology
    MarkdownQuickstartTemplate
@@ -160,6 +161,15 @@ Optimizations
    This document describes the design and philosophy behind the LLVM
    source-level debugger.
 
+:doc:`How to Update Debug Info <HowToUpdateDebugInfo>`
+   This document specifies how to correctly update debug info in various kinds
+   of code transformations.
+
+:doc:`InstrRefDebugInfo`
+   This document explains how LLVM uses value tracking, or instruction
+   referencing, to determine variable locations for debug info in the final
+   stages of compilation.
+
 Code Generation
 ---------------