You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.
We need to know the exact location of the EBB headers before we can emit binary machine code for branches to those EBB headers. Branches have limited ranges, and extending the range changes the size of the branch, so a non-trivial algorithm is needed to compute the final offsets of all the EBBs.
Branch relaxation
The range of branches varies a lot by ISA, and there are sometimes multiple branch encodings with different ranges:
ISA
Cond. branch
Uncond. branch
Intel
256 B - 4 GB
256 B - 4 GB
RISC-V
512 B - 8 KB
4 KB - 2 MB
ARM 32
512 B - 64 MB
4 KB - 64 MB
ARM 64
64 KB - 2 MB
256 MB
The range is the span of code reachable by a branch that sits approximately in the middle of the span.
Branch relaxation is the process of selecting the smallest possible branch encoding, and making sure that all branch targets are in range. Since some conditional branches have shorter ranges than unconditional branches on RISC architectures, it is sometimes necessary to replace a conditional branch with a negated branch that skips over a jump instruction to the original branch destination:
The result of the branch relaxation pass should be a table of EBB offsets and chosen encodings for all branch instructions such that the displacements can be encoded. The instruction compression in #72 does something similar for non-branch instructions. The encoding tables can probably be shared.
Optimistic algorithm
A simple iterative algorithm goes like this:
Start by selecting the smallest available encoding for all branch instructions. Compute the resulting EBB offsets using the optimistic branch encodings.
Visit all branch instructions and check that their target is in range. If not, repair by using a more relaxed encoding or by replacing with the branch-over-jump pattern.
Repeat until convergence.
This should arrive at an optimal result (given the fixed EBB order), but requires iteration. This is the algorithm LLVM uses, except for the branch-over-jump transformation which happens earlier in ARM-specific passes.
Pessimistic algorithm
Start by selecting branch encodings with the largest available range. Compute the resulting EBB offsets.
Visit all branch instructions and switch to a smaller encoding where possible. Also insert branch-over-jump patterns where necessary.
Repeat if any branches grew, i.e. branch-over-jump patterns were created.
This only needs iteration when branch-over-jump patterns are needed, which should not be very often. It is not guaranteed to find an optimal solution, though.
Cretonne algorithm
When implementing branch relaxation in Cretonne, we have a few special concerns:
We don't have a handy list of branches or basic blocks. Cretonne uses extended basic blocks which can contain multiple internal conditional branches. The data flow graph can provide lists of branch instructions in the form of EBB predecessor sets. An alternative is a linear scan of all instructions.
Even though we haven't emitted any binary machine code yet, we do have exact per-instruction code size information. The encoding recipes attached to all instructions each correspond to a unique instruction byte size.
We need to generate a list of EBB header offsets for use by the following binary emission pass.
This suggests the following variation of the optimistic algorithm:
Select the smallest-range encoding for branches during legalization. This relaxation algorithm will fix any out-of-range branches.
Assign a zero offset to all EBB headers. The offset is the distance in bytes from the function entry to the EBB header.
Scan through all instructions in the function in layout order. Keep track of the instruction offset by using the byte size of the attached encoding recipes.
When visiting a branch instruction where the target EBB is not in range, relax the encoding. Don't change the encoding if the target EBB offset hasn't been computed yet.
When visiting an EBB header, update its offset with the current instruction offset. If the EBB oftset changed, make a note to iterate again.
Iterate from step 3 until convergence.
This algorithm will complete in one pass for functions with a single EBB (and so only backwards branches). It will usually complete in three passes for the general case:
Compute approximate offsets for all EBBs and relax backward branches.
Relax forward branches,
Check that all branch targets are in range.
If no forward branches need relaxing, it will complete in two passes.
It would be possible to build an auxiliary table that summarizes the non-branch instructions as just their code size, but given typical branch density and how fast it is to look up code size from encoding recipes, it is probably not worth it.
The text was updated successfully, but these errors were encountered:
If it makes a difference, for riscv you will need to be able to relax the same branch more than once: c.jal (2 bytes, 4KiB) -> jal (4 bytes, 1MiB) -> auipc+jalr (8 bytes, 2GiB). Hopefully cretonne will never need to support functions larger than 2GiB but I cannot say the same thing about 1MiB functions; the Go regression test suite generates a function that large.
We need to know the exact location of the EBB headers before we can emit binary machine code for branches to those EBB headers. Branches have limited ranges, and extending the range changes the size of the branch, so a non-trivial algorithm is needed to compute the final offsets of all the EBBs.
Branch relaxation
The range of branches varies a lot by ISA, and there are sometimes multiple branch encodings with different ranges:
The range is the span of code reachable by a branch that sits approximately in the middle of the span.
Branch relaxation is the process of selecting the smallest possible branch encoding, and making sure that all branch targets are in range. Since some conditional branches have shorter ranges than unconditional branches on RISC architectures, it is sometimes necessary to replace a conditional branch with a negated branch that skips over a
jump
instruction to the original branch destination:The result of the branch relaxation pass should be a table of EBB offsets and chosen encodings for all branch instructions such that the displacements can be encoded. The instruction compression in #72 does something similar for non-branch instructions. The encoding tables can probably be shared.
Optimistic algorithm
A simple iterative algorithm goes like this:
This should arrive at an optimal result (given the fixed EBB order), but requires iteration. This is the algorithm LLVM uses, except for the branch-over-jump transformation which happens earlier in ARM-specific passes.
Pessimistic algorithm
This only needs iteration when branch-over-jump patterns are needed, which should not be very often. It is not guaranteed to find an optimal solution, though.
Cretonne algorithm
When implementing branch relaxation in Cretonne, we have a few special concerns:
This suggests the following variation of the optimistic algorithm:
This algorithm will complete in one pass for functions with a single EBB (and so only backwards branches). It will usually complete in three passes for the general case:
If no forward branches need relaxing, it will complete in two passes.
It would be possible to build an auxiliary table that summarizes the non-branch instructions as just their code size, but given typical branch density and how fast it is to look up code size from encoding recipes, it is probably not worth it.
The text was updated successfully, but these errors were encountered: