# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing which register or stack location a variable can be found in, for
each instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation, but this is
vulnerable to register optimisations during register allocation, and to
instruction movements.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable's value resides
in, in instruction referencing mode LLVM refers to the machine instruction and
operand position where the value is defined. Consider the way LLVM IR refers to
instruction values:

    %2 = add i32 %0, %1
    call void @llvm.dbg.value(metadata i32 %2, metadata !123, metadata !456), !dbg !789

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection. Consider the X86 MIR below
and its instruction referencing debug info, corresponding to the earlier LLVM
IR:

    %2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
    DBG_INSTR_REF 1, 0, !123, !456, debug-location !789

While the function remains in SSA form, virtual register %2 is sufficient to
identify the value computed by the instruction. However, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. A more consistent way of identifying the
instruction's value is to refer to the MachineOperand where the value is
defined, independently of which register that MachineOperand defines. In the
code above, the DBG_INSTR_REF instruction refers to instruction number one,
operand zero, while the ADD32rr has a debug-instr-number attribute attached
indicating that it is instruction number one.
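
A small C++ model makes the idea concrete. It does not use LLVM's real data structures: `ToyOperand`, `ToyInstr` and `resolveInstrRef` are hypothetical names invented for this sketch. The model stores a reference as nothing more than an instruction number and an operand index. Because of that, the reference keeps resolving correctly after a later pass changes which register the operand names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for MachineOperand/MachineInstr.
struct ToyOperand {
  std::string Reg; // register named by this operand, e.g. "%2" or "$eax"
  bool IsDef;      // true if the operand defines Reg
};

struct ToyInstr {
  unsigned DebugInstrNum; // 0 means "no instruction number"
  std::vector<ToyOperand> Operands;
};

// Resolve (InstrNum, OpIdx), as carried by a DBG_INSTR_REF, to whichever
// register the referenced def operand names *now*. Returns std::nullopt if
// no instruction carries the number or the operand is not a def, which the
// debugger would show as "optimised out".
std::optional<std::string> resolveInstrRef(const std::vector<ToyInstr> &Func,
                                           unsigned InstrNum, unsigned OpIdx) {
  if (InstrNum == 0)
    return std::nullopt;
  for (const ToyInstr &MI : Func) {
    if (MI.DebugInstrNum != InstrNum)
      continue;
    if (OpIdx < MI.Operands.size() && MI.Operands[OpIdx].IsDef)
      return MI.Operands[OpIdx].Reg;
    return std::nullopt;
  }
  return std::nullopt;
}
```

Take the ADD32rr above as an example. Before register allocation, the pair (1, 0) resolves to `%2`. Suppose the def operand is later rewritten to `$eax`. The same, unchanged pair then resolves to `$eax`.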

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but in exchange requires additional
instrumentation when instructions are optimised. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand number pair
(see MachineFunction::substituteDebugValuesForInst). If this debug info
maintenance is not performed, or an instruction is eliminated as dead code, the
variable location is safely dropped and marked "optimised out". The exception
is instructions that are mutated rather than replaced. These always need debug
info maintenance, because their existing instruction number would otherwise
refer to a different value.

# Register allocator considerations

When the register allocator runs, debug instructions do not refer directly to
any virtual registers, so there is no need for expensive location maintenance
during regalloc (i.e., LiveDebugVariables). Debug instructions are unlinked from
the function, then linked back in after register allocation completes.

The exception is PHI instructions: these become implicit definitions at control
flow merges once regalloc finishes, and any debug numbers attached to PHI
instructions are lost. To circumvent this, the debug numbers of PHIs are
recorded at the start of register allocation (during phi-node-elimination),
then DBG_PHI instructions are inserted after regalloc finishes. This requires
some tracking of which register a value is located in during regalloc, but only
at single positions (block entry points) rather than over ranges of
instructions.

An example, before regalloc:

    bb.2:
      %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1

After:

    bb.2:
      DBG_PHI $rax, 1
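
A small, hypothetical C++ model can sketch the bookkeeping this implies. LLVM's actual implementation is spread across the register allocation pipeline and uses different types. `PHIRecord`, `DbgPHI` and `emitDbgPHIs` are invented names. The sketch also assumes the allocator's result is a plain map from virtual to physical registers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of a numbered PHI, captured before PHI elimination.
struct PHIRecord {
  std::string Block;      // block the PHI was at the start of, e.g. "bb.2"
  std::string VirtReg;    // virtual register the PHI defined, e.g. "%2"
  unsigned DebugInstrNum; // the PHI's debug-instr-number
};

// A DBG_PHI to insert at the start of Block once regalloc has finished.
struct DbgPHI {
  std::string Block;
  std::string PhysReg;
  unsigned DebugInstrNum;
};

// Turn recorded PHIs into DBG_PHIs using the final virtual -> physical
// register assignment. In this simplified model, a PHI whose register has
// no recorded assignment produces no DBG_PHI at all.
std::vector<DbgPHI>
emitDbgPHIs(const std::vector<PHIRecord> &Recorded,
            const std::map<std::string, std::string> &Assignment) {
  std::vector<DbgPHI> Out;
  for (const PHIRecord &R : Recorded) {
    auto It = Assignment.find(R.VirtReg);
    if (It == Assignment.end())
      continue;
    Out.push_back({R.Block, It->second, R.DebugInstrNum});
  }
  return Out;
}
```

Feed it one record for the `bb.2` PHI above (virtual register `%2`, number 1) and an assignment of `%2` to `$rax`. The sketch then yields a single `DBG_PHI $rax, 1` for `bb.2`.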

# LiveDebugValues

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [LiveDebugValues pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
 * One that assigns values to variable names,
 * One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place PHIs for each of these two
functions, that is, for variable values and for the values contained in machine
locations, with value propagation eliminating any unnecessary PHIs. The two can
then be joined up to map variables to values, then values to locations, for
each instruction in the function.
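
A toy C++ sketch can illustrate the final join step. It assumes the two analyses have already produced two maps for a single program point: one from variables to values, and one from machine locations to values. Values are identified by (instruction number, operand) pairs. All type and function names here are hypothetical, and the sketch leaves out the SSA-based PHI placement that produces these maps.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// A value is identified by the (instruction number, operand index) that
// defines it, as in a DBG_INSTR_REF.
using ValueID = std::pair<unsigned, unsigned>;

// For one program point: the value each variable should have, and the value
// each machine location (register or stack slot) currently holds.
using VarValues = std::map<std::string, ValueID>;
using MachineLocs = std::map<std::string, ValueID>;

// Join the two maps: find a location holding the variable's value (the first
// match in map order), or std::nullopt if the value is not resident anywhere.
std::optional<std::string> locateVariable(const std::string &Var,
                                          const VarValues &Vars,
                                          const MachineLocs &Locs) {
  auto VarIt = Vars.find(Var);
  if (VarIt == Vars.end())
    return std::nullopt;
  for (const auto &[Loc, Val] : Locs)
    if (Val == VarIt->second)
      return Loc;
  return std::nullopt;
}
```

For example, if `x` has value (1, 0) and `$rax` holds (1, 0), the sketch places `x` in `$rax`. A variable whose value is held by no location gets no location, which would show up as optimised out.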

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.
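
A hypothetical sketch shows why following these movements matters. The table below records which value each register or stack slot holds, updated instruction by instruction. A copy, spill or restore makes the destination hold the source's value, while an unrelated write clobbers it. `LocationTracker` and its methods are invented for illustration. They are far simpler than what LiveDebugValues actually does.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

using ValueID = std::pair<unsigned, unsigned>; // (instr number, operand)

// Hypothetical tracker of which value each register / stack slot holds.
class LocationTracker {
  std::map<std::string, ValueID> LocToValue;

public:
  // An instruction defines a new value in Loc (e.g. the ADD32rr def).
  void def(const std::string &Loc, ValueID V) { LocToValue[Loc] = V; }

  // A register copy, spill or restore: Dst now holds whatever Src held.
  void move(const std::string &Dst, const std::string &Src) {
    auto It = LocToValue.find(Src);
    if (It == LocToValue.end())
      LocToValue.erase(Dst); // Src held no known value, so neither does Dst
    else
      LocToValue[Dst] = It->second;
  }

  // Any other write to Loc: whatever value it held is gone.
  void clobber(const std::string &Loc) { LocToValue.erase(Loc); }

  // Some location currently holding V, if any.
  std::optional<std::string> find(ValueID V) const {
    for (const auto &[Loc, Val] : LocToValue)
      if (Val == V)
        return Loc;
    return std::nullopt;
  }
};
```

Suppose `$eax` is spilled to `stack.0` and then overwritten. The value is still findable in `stack.0`. Restoring it to `$ebx` moves it on again, so the variable stays locatable until its last copy is clobbered.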

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
 * Target hooks to be implemented to allow LiveDebugValues to follow values through the machine,
 * Target-specific optimisations to be instrumented, to preserve instruction numbers.

## Target hooks

TargetInstrInfo::isCopyInstrImpl must be implemented to recognise any
instructions that are copy-like -- LiveDebugValues uses this to identify when
values move between registers.
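
The C++ sketch below suggests the kind of logic such a hook contains, written for a hypothetical toy instruction set. It only mimics the role of isCopyInstrImpl: for each instruction that merely moves a value, it reports a destination/source pair. The opcodes, `ToyMI` and `ToyDestSourcePair` are invented. Take the real hook's signature from TargetInstrInfo.h for your LLVM version.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy instruction: opcode, destination, sources, immediate.
struct ToyMI {
  std::string Opcode; // "MOV", "ADDri", "ADDrr", ...
  std::string Dst;
  std::vector<std::string> Srcs;
  long Imm = 0;
};

struct ToyDestSourcePair {
  std::string Destination;
  std::string Source;
};

// Mimics the role of isCopyInstrImpl for the toy ISA: report (dst, src) for
// every instruction whose only effect is to move a value between registers.
std::optional<ToyDestSourcePair> isCopyInstrImpl(const ToyMI &MI) {
  if (MI.Opcode == "MOV" && MI.Srcs.size() == 1)
    return ToyDestSourcePair{MI.Dst, MI.Srcs[0]};
  // "dst = src + 0" also just moves a value, although it is not a MOV.
  if (MI.Opcode == "ADDri" && MI.Srcs.size() == 1 && MI.Imm == 0)
    return ToyDestSourcePair{MI.Dst, MI.Srcs[0]};
  return std::nullopt;
}
```

Note the `ADDri` case. Target-specific copy idioms that are not literally move instructions still need to be recognised, or LiveDebugValues cannot follow values through them.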

TargetInstrInfo::isLoadFromStackSlotPostFE and
TargetInstrInfo::isStoreToStackSlotPostFE are needed to identify spill and
restore instructions. Each should return the destination or source register
respectively, so that LiveDebugValues can track the movement of a value to and
from the stack slot. In addition, any instruction that writes to a stack spill
slot should have a MachineMemOperand attached, so that LiveDebugValues can
recognise that the slot has been clobbered.
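
The same hypothetical style extends to stack slots. The functions below mimic the roles of isStoreToStackSlotPostFE and isLoadFromStackSlotPostFE for a toy instruction set. Each returns the register stored or loaded and sets a frame index; an empty string means "not a spill or restore". The `MemSlot` field stands in for an attached memory operand. `clobbersSlot` shows why that operand matters: a store that is not a recognised spill can only be seen to overwrite a slot if it describes the memory it writes. All names are invented for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy instruction. MemSlot models an attached memory operand:
// the stack slot (frame index) the instruction accesses, if known.
struct ToyMI {
  std::string Opcode; // "SPILL", "RELOAD", "STORE", ...
  std::string Reg;    // register stored or loaded
  std::optional<int> MemSlot;
};

// Plays the role of isStoreToStackSlotPostFE: return the register being
// spilled and set FrameIndex, or return "" if MI is not a spill.
std::string isStoreToStackSlot(const ToyMI &MI, int &FrameIndex) {
  if (MI.Opcode != "SPILL" || !MI.MemSlot)
    return "";
  FrameIndex = *MI.MemSlot;
  return MI.Reg;
}

// Plays the role of isLoadFromStackSlotPostFE: return the destination
// register of a restore and set FrameIndex, or "" if MI is not a restore.
std::string isLoadFromStackSlot(const ToyMI &MI, int &FrameIndex) {
  if (MI.Opcode != "RELOAD" || !MI.MemSlot)
    return "";
  FrameIndex = *MI.MemSlot;
  return MI.Reg;
}

// Any other write to a slot is only visible through its memory operand;
// without one, a tracker cannot know the slot was overwritten.
bool clobbersSlot(const ToyMI &MI, int FrameIndex) {
  return MI.Opcode == "STORE" && MI.MemSlot && *MI.MemSlot == FrameIndex;
}
```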

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a MachineInstr to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented. The relevant question is whether, as a
result of the mutation, any register def in any operand will produce a
different value. If the answer is yes, then there is a risk that a
DBG_INSTR_REF instruction referring to that operand will end up assigning the
different value to a variable, presenting the debugging developer with an
unexpected variable value. In such scenarios, call
MachineInstr::dropDebugNumber() on the mutated instruction to erase its
instruction number. Any DBG_INSTR_REF referring to it will then produce an
empty variable location, which appears as "optimised out" in the debugger.
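
The rule can be modelled in a few lines of hypothetical C++. `ToyInstr`, `replaceRHS` and `isReferenceable` are invented names. `dropDebugNumber` mirrors what MachineInstr::dropDebugNumber() is described as doing above. The mutation conservatively treats any change of source register as changing the value the def computes.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy instruction: Def = Opcode LHS, RHS (0 = no number).
struct ToyInstr {
  std::string Opcode;
  std::string Def;
  std::string LHS, RHS;
  unsigned DebugInstrNum = 0;

  // Mirrors MachineInstr::dropDebugNumber(): forget the instruction number.
  void dropDebugNumber() { DebugInstrNum = 0; }
};

// Hypothetical in-place mutation: replace the RHS source register. The def
// register stays the same, but (conservatively assumed) now holds a
// different value, so the instruction number must be dropped.
void replaceRHS(ToyInstr &MI, const std::string &NewRHS) {
  if (MI.RHS == NewRHS)
    return; // nothing changes, so the number can stay
  MI.RHS = NewRHS;
  MI.dropDebugNumber();
}

// Can a DBG_INSTR_REF naming InstrNum still find this instruction?
bool isReferenceable(const ToyInstr &MI, unsigned InstrNum) {
  return InstrNum != 0 && MI.DebugInstrNum == InstrNum;
}
```

Rewriting the RHS to the register it already names leaves the number in place. Rewriting it to a different register drops the number, so a reference to instruction 1 would then produce an "optimised out" location.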

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to the new instruction number / operand pair. Consider replacing
a three-address add instruction with a two-address add:

    %2:gr32 = ADD32rr %0, %1, debug-instr-number 1

becomes

    %2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2

with a substitution from "instruction number 1 operand 0" to "instruction
number 2 operand 0" recorded in the MachineFunction. In LiveDebugValues,
DBG_INSTR_REFs are mapped through the substitution table to find the most
recent instruction number / operand number of the value they refer to.
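
Here is a minimal sketch of the substitution table. It assumes the table can be modelled as a map from (instruction, operand) pairs to (instruction, operand) pairs. The map is followed repeatedly, because a value's defining instruction may be replaced more than once. The type and function names are hypothetical. In LLVM, the table is owned by the MachineFunction and consulted by LiveDebugValues.

```cpp
#include <cassert>
#include <map>
#include <utility>

using InstrOperandPair = std::pair<unsigned, unsigned>; // (instr num, operand)

// Hypothetical substitution table: old (instr, operand) -> new one.
using SubstitutionTable = std::map<InstrOperandPair, InstrOperandPair>;

// Follow substitutions until reaching a pair that was not itself replaced.
// Assumes the table has no cycles, which holds if each replacement
// instruction receives a fresh number.
InstrOperandPair followSubstitutions(const SubstitutionTable &Subs,
                                     InstrOperandPair Ref) {
  for (auto It = Subs.find(Ref); It != Subs.end(); It = Subs.find(Ref))
    Ref = It->second;
  return Ref;
}
```

For the ADD32rr example, a single entry maps (1, 0) to (2, 0), so a reference to instruction 1, operand 0 reaches instruction 2. If instruction 2 were later replaced too, one more entry would extend the chain. A pair with no entry resolves to itself.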

Use MachineFunction::substituteDebugValuesForInst to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This works most of the time, including for the ADD32rr
example above.
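
The operand-alignment assumption can be pictured with a hypothetical re-implementation over toy types. It is not LLVM's code, and `ToyInstr`, `ToyFunction` and `substituteForInst` are invented names. For every def operand of a numbered old instruction, it records a substitution to the same operand index of the new instruction. The sketch also assumes that getDebugInstrNum allocates a number on first use, and models it that way.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using InstrOperandPair = std::pair<unsigned, unsigned>;

// Hypothetical toy types.
struct ToyInstr {
  std::vector<bool> OperandIsDef; // one entry per operand
  unsigned DebugInstrNum = 0;     // 0 = no number
};

struct ToyFunction {
  unsigned NextDebugInstrNum = 1;
  std::map<InstrOperandPair, InstrOperandPair> Substitutions;

  // Assumed getDebugInstrNum-like behaviour: allocate on first use.
  unsigned getDebugInstrNum(ToyInstr &MI) {
    if (MI.DebugInstrNum == 0)
      MI.DebugInstrNum = NextDebugInstrNum++;
    return MI.DebugInstrNum;
  }

  // The idea behind substituteDebugValuesForInst: map each def operand of
  // Old to the same operand index of New.
  void substituteForInst(const ToyInstr &Old, ToyInstr &New) {
    if (Old.DebugInstrNum == 0)
      return; // nothing can refer to Old, so nothing to substitute
    unsigned NewNum = getDebugInstrNum(New);
    for (unsigned I = 0; I < Old.OperandIsDef.size(); ++I) {
      if (!Old.OperandIsDef[I])
        continue;
      // The assumption: operand I is also a def in New.
      assert(I < New.OperandIsDef.size() && New.OperandIsDef[I]);
      Substitutions[{Old.DebugInstrNum, I}] = {NewNum, I};
    }
  }
};
```

The `assert` marks the assumption. When it does not hold, substitutions have to be recorded by hand.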

If operand numbers do not line up between the old and new instruction, use
MachineInstr::getDebugInstrNum to acquire the instruction number for the new
instruction, and MachineFunction::makeDebugValueSubstitution to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution -- LiveDebugValues will safely drop the
now unavailable variable value.
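
Here is a hypothetical sketch of recording substitutions by hand. Suppose an old instruction defined a sum at operand 0 and a carry flag at operand 1, and its replacement defines only the carry, at operand 0. Only the carry gets a substitution. The sum is no longer computed, so it deliberately gets none. `ToyFunction` and `recordCarryOnlyReplacement` are invented names, and the method is merely modelled on MachineFunction::makeDebugValueSubstitution.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>

using DebugInstrOperandPair = std::pair<unsigned, unsigned>;

// Hypothetical stand-in for the function-level substitution table.
struct ToyFunction {
  std::map<DebugInstrOperandPair, DebugInstrOperandPair> Substitutions;

  // Modelled on makeDebugValueSubstitution: references to A now read B.
  void makeDebugValueSubstitution(DebugInstrOperandPair A,
                                  DebugInstrOperandPair B) {
    Substitutions[A] = B;
  }

  // std::nullopt means no substitution was recorded for Ref.
  std::optional<DebugInstrOperandPair> lookup(DebugInstrOperandPair Ref) const {
    auto It = Substitutions.find(Ref);
    if (It == Substitutions.end())
      return std::nullopt;
    return It->second;
  }
};

// Hypothetical replacement: Old defined a sum at operand 0 and a carry at
// operand 1; New defines only the carry, at operand 0. Operand positions do
// not line up, so the mapping is recorded by hand, and the sum gets none.
void recordCarryOnlyReplacement(ToyFunction &F, unsigned OldNum,
                                unsigned NewNum) {
  F.makeDebugValueSubstitution({OldNum, 1}, {NewNum, 0});
}
```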

If your target clones instructions, as the TailDuplicator optimisation pass
does, do not attempt to preserve the instruction numbers or record any
substitutions. MachineFunction::CloneMachineInstr should drop the instruction
number of any cloned instruction, to avoid duplicate numbers appearing to
LiveDebugValues. Dealing with duplicated instructions is a natural extension
to instruction referencing that is currently unimplemented.
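
A hypothetical sketch shows why cloned instructions lose their numbers. If a clone kept its original's number, two instructions would claim the same identity, and a DBG_INSTR_REF could not say which one it means. `ToyInstr`, `cloneInstr` and `numbersAreUnique` are invented for illustration. `cloneInstr` mirrors the behaviour described above for MachineFunction::CloneMachineInstr.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct ToyInstr {
  std::string Opcode;
  unsigned DebugInstrNum = 0; // 0 = no number
};

// Hypothetical clone: copy the instruction, but not its debug number.
ToyInstr cloneInstr(const ToyInstr &MI) {
  ToyInstr Copy = MI;
  Copy.DebugInstrNum = 0;
  return Copy;
}

// Check the property the text asks for: no instruction number is shared.
bool numbersAreUnique(const std::vector<ToyInstr> &Func) {
  std::set<unsigned> Seen;
  for (const ToyInstr &MI : Func)
    if (MI.DebugInstrNum != 0 && !Seen.insert(MI.DebugInstrNum).second)
      return false;
  return true;
}
```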

[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations