[DebugInfo][NFC] Add instr-ref documentation, migration guide

jryans · jmorse · jryans · commit 7a1d5ef703f6 · 2022-05-20T14:13:46.000+01:00
This used to be D102158, but all the code it describes got re-written, so I figured I'd take another shot at documenting the new instruction referencing variable locations, this time from a higher level. Happily there's no longer any need to describe LiveDebugValues in any detail seeing how it's all SSA-based now.

Probably the most important part is the explanation of what targets need to do to support instruction referencing. The list is small, mostly because there's nothing especially complicated that targets need to do: just instrument their target-specific optimisations and implement the stack spill/restore recognition target hooks.

This is a small amount of text (which is a virtue), I'm extremely happy to expand on anything.

Differential Revision: https://reviews.llvm.org/D113586

Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>
diff --git a/llvm/docs/InstrRefDebugInfo.md b/llvm/docs/InstrRefDebugInfo.md
@@ -0,0 +1,180 @@
+# Instruction referencing for debug info
+
+This document explains how LLVM uses value tracking, or instruction
+referencing, to determine variable locations for debug info in the code
+generation stage of compilation. This content is aimed at those working on code
+generation targets and optimisation passes. It may also be of interest to anyone
+curious about low-level debug info handling.
+
+# Problem statement
+
+At the end of compilation, LLVM must produce a DWARF location list (or similar)
+describing what register or stack location a variable can be found in, for each
+instruction in that variable's lexical scope. We could track the virtual
+register that the variable resides in through compilation; however, this is
+vulnerable to register optimisations during regalloc, and to instruction
+movements.
+
+# Solution: instruction referencing
+
+Rather than identifying the virtual register that a variable value resides in,
+in instruction referencing mode LLVM refers to the machine instruction
+and operand position that the value is defined in. Consider the LLVM IR way of
+referring to instruction values:
+
+  %2 = add i32 %0, %1
+  call void @llvm.dbg.value(metadata i32 %2,
+
+In LLVM IR, the IR Value is synonymous with the instruction that computes the
+value, to the extent that in memory a Value is a pointer to the computing
+instruction. Instruction referencing implements this relationship in the
+codegen backend of LLVM, after instruction selection. Consider the X86 assembly
+below and instruction referencing debug info, corresponding to the earlier
+LLVM IR:
+
+  %2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
+  DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
+
+While the function remains in SSA form, virtual register %2 is sufficient to
+identify the value computed by the instruction -- however the function
+eventually leaves SSA form, and register optimisations will obscure which
+register the desired value is in. Instead, a more consistent way of identifying
+the instruction's value is to refer to the MachineOperand where the value is
+defined, independently of which register is defined by that MachineOperand. In
+the code above, the DBG_INSTR_REF instruction refers to instruction number one,
+operand zero, while the ADD32rr has a debug-instr-number attribute attached
+indicating that it is instruction number one.
+
+De-coupling variable locations from registers avoids difficulties involving
+register allocation and optimisation, but requires additional instrumentation
+when the instructions are optimised instead. Optimisations that replace
+instructions with optimised versions that compute the same value must either
+preserve the instruction number, or record a substitution from the old
+instruction / operand number pair to the new instruction / operand pair -- see
+MachineFunction::substituteDebugValuesForInst. If debug info maintenance is not
+performed, or an instruction is eliminated as dead code, the variable location
+is safely dropped and marked "optimised out". The exception is instructions
+that are mutated rather than replaced, which always need debug info
+maintenance.
+
+# Register allocator considerations
+
+When the register allocator runs, debugging instructions do not directly refer
+to any virtual registers, and thus there is no need for expensive location
+maintenance during regalloc (i.e., LiveDebugVariables). Debug instructions are
+unlinked from the function, then linked back in after register allocation
+completes.
+
+The exception is PHI instructions: these become implicit definitions at control
+flow merges once regalloc finishes, and any debug numbers attached to PHI
+instructions are lost. To circumvent this, debug numbers of PHIs are recorded
+at the start of register allocation (phi-node-elimination), then DBG_PHI
+instructions are inserted after regalloc finishes. This requires some
+maintenance of which register a variable is located in during regalloc, but at
+single positions (block entry points) rather than ranges of instructions.
+
+An example, before regalloc:
+
+  bb.2:
+    %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
+
+After:
+
+  bb.2:
+    DBG_PHI $rax, 1
+
+# LiveDebugValues
+
+After optimisations and code layout complete, information about variable
+values must be translated into variable locations, i.e. registers and stack
+slots. This is performed in the [LiveDebugValues pass][LiveDebugValues], where
+the debug instructions and machine code are separated out into two independent
+functions:
+ * One that assigns values to variable names,
+ * One that assigns values to machine registers and stack slots.
+
+LLVM's existing SSA tools are used to place PHIs for each function, between
+variable values and the values contained in machine locations, with value
+propagation eliminating any unnecessary PHIs. The two can then be joined up
+to map variables to values, then values to locations, for each instruction in
+the function.
+
+Key to this process is being able to identify the movement of values between
+registers and stack locations, so that the location of values can be preserved
+for the full time that they are resident in the machine.
+
+# Required target support and transition guide
+
+Instruction referencing will work on any target, but likely with poor coverage
+unless the target supports it. Supporting instruction referencing well requires:
+ * Target hooks to be implemented to allow LiveDebugValues to follow values through the machine,
+ * Target-specific optimisations to be instrumented, to preserve instruction numbers.
+
+## Target hooks
+
+TargetInstrInfo::isCopyInstrImpl must be implemented to recognise any
+instructions that are copy-like -- LiveDebugValues uses this to identify when
+values move between registers.
+
+TargetInstrInfo::isLoadFromStackSlotPostFE and
+TargetInstrInfo::isStoreToStackSlotPostFE are needed to identify spill and
+restore instructions. Each should return the destination or source register
+respectively. LiveDebugValues will track the movement of a value from / to
+the stack slot. In addition, any instruction that writes to a stack spill
+slot should have a MachineMemOperand attached, so that LiveDebugValues can
+recognise that a slot has been clobbered.
+
+## Target-specific optimisation instrumentation
+
+Optimisations come in two flavours: those that mutate a MachineInstr to make
+it do something different, and those that create a new instruction to replace
+the operation of the old.
+
+The former _must_ be instrumented -- the relevant question is whether any
+register def in any operand will produce a different value, as a result of the
+mutation. If the answer is yes, then there is a risk that a DBG_INSTR_REF
+instruction referring to that operand will end up assigning the different
+value to a variable, presenting the debugging developer with an unexpected
+variable value. In such scenarios, call MachineInstr::dropDebugNumber() on the
+mutated instruction to erase its instruction number. Any DBG_INSTR_REF
+referring to it will produce an empty variable location instead, that appears
+as "optimised out" in the debugger.
+
+For the latter flavour of optimisation, to increase coverage you should record
+an instruction number substitution: a mapping from the old instruction number /
+operand pair to the new instruction number / operand pair. Consider replacing
+a three-address add instruction with a two-address add:
+
+  %2:gr32 = ADD32rr %0, %1, debug-instr-number 1
+
+becomes
+
+  %2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
+
+with a substitution from "instruction number 1 operand 0" to "instruction number
+2 operand 0" recorded in the MachineFunction. In LiveDebugValues, DBG_INSTR_REFs
+will be mapped through the substitution table to find the most recent
+instruction number / operand number of the value they refer to.
+
+Use MachineFunction::substituteDebugValuesForInst to automatically produce
+substitutions between an old and new instruction. It assumes that any operand
+that is a def in the old instruction is a def in the new instruction at the
+same operand position. This works most of the time, including for the example
+above.
+
+If operand numbers do not line up between the old and new instruction, use
+MachineInstr::getDebugInstrNum to acquire the instruction number for the new
+instruction, and MachineFunction::makeDebugValueSubstitution to record the
+mapping between register definitions in the old and new instructions. If some
+values computed by the old instruction are no longer computed by the new
+instruction, record no substitution -- LiveDebugValues will safely drop the
+now unavailable variable value.
+
+Should your target clone instructions, much the same as the TailDuplicator
+optimisation pass, do not attempt to preserve the instruction numbers or
+record any substitutions. MachineFunction::CloneMachineInstr should drop the
+instruction number of any cloned instruction, to avoid duplicate numbers
+appearing to LiveDebugValues. Dealing with duplicated instructions is a
+natural extension to instruction referencing that's currently unimplemented.
+
+[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations
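
Note on the substitution mechanism described in InstrRefDebugInfo.md above: the mapping from an old instruction number / operand pair to a new one, and the "optimised out" result when a number has been dropped, can be illustrated with a small standalone model. This is only a sketch under stated assumptions. `ValueID`, `SubstitutionTable`, and `lookup` are hypothetical names made up for illustration; they are not LLVM classes or functions, and the real bookkeeping in MachineFunction and LiveDebugValues may differ in detail.

```cpp
#include <map>
#include <set>
#include <utility>

// Toy model only: these names are hypothetical and are not LLVM's classes.
// A value is identified by an (instruction number, operand index) pair.
using ValueID = std::pair<unsigned, unsigned>;

struct SubstitutionTable {
  std::map<ValueID, ValueID> Subs; // old value -> value that replaced it

  // Record that the value Old is now computed as New.
  void substitute(ValueID Old, ValueID New) { Subs[Old] = New; }

  // Follow substitutions until reaching a value that was never replaced.
  // Assumes the recorded substitutions contain no cycles.
  ValueID resolve(ValueID V) const {
    for (auto It = Subs.find(V); It != Subs.end(); It = Subs.find(V))
      V = It->second;
    return V;
  }
};

// Resolve a reference. Returns false, modelling "optimised out", when the
// resolved instruction number is not carried by any remaining instruction.
bool lookup(const SubstitutionTable &T, const std::set<unsigned> &Numbered,
            ValueID Ref, ValueID &Out) {
  ValueID V = T.resolve(Ref);
  if (Numbered.count(V.first) == 0)
    return false;
  Out = V;
  return true;
}
```

In terms of the ADD32rr example in the document, recording `substitute(ValueID(1u, 0u), ValueID(2u, 0u))` means a reference to instruction number 1, operand 0 resolves to instruction number 2, operand 0. If no remaining instruction carries the resolved number (for example after its number was dropped), `lookup` reports the value as unavailable rather than guessing a location.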
diff --git a/llvm/docs/MIRLangRef.rst b/llvm/docs/MIRLangRef.rst
@@ -911,5 +911,6 @@ The additional operands to ``DBG_INSTR_REF`` are identical to ``DBG_VALUE``,
 and the ``DBG_INSTR_REF`` s position records where the variable takes on the
 designated value in the same way.
 
-More information about how these constructs are used will appear on the source
-level debugging page in due course, see also :doc:`SourceLevelDebugging` and :doc:`HowToUpdateDebugInfo`.
+More information about how these constructs are used is available in
+:doc:`InstrRefDebugInfo`. The related documents :doc:`SourceLevelDebugging` and
+:doc:`HowToUpdateDebugInfo` may be useful as well.
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
@@ -38,6 +38,7 @@ intermediate LLVM representation.
    HowToCrossCompileBuiltinsOnArm
    HowToCrossCompileLLVM
    HowToUpdateDebugInfo
+   InstrRefDebugInfo
    LinkTimeOptimization
    LoopTerminology
    MarkdownQuickstartTemplate
@@ -160,6 +161,15 @@ Optimizations
    This document describes the design and philosophy behind the LLVM
    source-level debugger.
 
+:doc:`How to Update Debug Info <HowToUpdateDebugInfo>`
+   This document specifies how to correctly update debug info in various kinds
+   of code transformations.
+
+:doc:`InstrRefDebugInfo`
+   This document explains how LLVM uses value tracking, or instruction
+   referencing, to determine variable locations for debug info in the final
+   stages of compilation.
+
 Code Generation
 ---------------