Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Branch resolution and relaxation #73

Closed
stoklund opened this issue Apr 4, 2017 · 2 comments
Closed

Branch resolution and relaxation #73

stoklund opened this issue Apr 4, 2017 · 2 comments

Comments

@stoklund
Copy link
Contributor

stoklund commented Apr 4, 2017

We need to know the exact location of the EBB headers before we can emit binary machine code for branches to those EBB headers. Branches have limited ranges, and extending the range changes the size of the branch, so a non-trivial algorithm is needed to compute the final offsets of all the EBBs.

Branch relaxation

The range of branches varies a lot by ISA, and there are sometimes multiple branch encodings with different ranges:

ISA Cond. branch Uncond. branch
Intel 256 B - 4 GB 256 B - 4 GB
RISC-V 512 B - 8 KB 4 KB - 2 MB
ARM 32 512 B - 64 MB 4 KB - 64 MB
ARM 64 64 KB - 2 MB 256 MB

The range is the span of code reachable by a branch that sits approximately in the middle of the span.

Branch relaxation is the process of selecting the smallest possible branch encoding, and making sure that all branch targets are in range. Since some conditional branches have shorter ranges than unconditional branches on RISC architectures, it is sometimes necessary to replace a conditional branch with a negated branch that skips over a jump instruction to the original branch destination:

    brz v1, ebb17

=>

    brnz v1, ebb23
    jump ebb17
ebb23:
    ...

The result of the branch relaxation pass should be a table of EBB offsets and chosen encodings for all branch instructions such that the displacements can be encoded. The instruction compression in #72 does something similar for non-branch instructions. The encoding tables can probably be shared.

Optimistic algorithm

A simple iterative algorithm goes like this:

  1. Start by selecting the smallest available encoding for all branch instructions. Compute the resulting EBB offsets using the optimistic branch encodings.
  2. Visit all branch instructions and check that their target is in range. If not, repair by using a more relaxed encoding or by replacing with the branch-over-jump pattern.
  3. Repeat until convergence.

This should arrive at an optimal result (given the fixed EBB order), but requires iteration. This is the algorithm LLVM uses, except for the branch-over-jump transformation which happens earlier in ARM-specific passes.

Pessimistic algorithm

  1. Start by selecting branch encodings with the largest available range. Compute the resulting EBB offsets.
  2. Visit all branch instructions and switch to a smaller encoding where possible. Also insert branch-over-jump patterns where necessary.
  3. Repeat if any branches grew, i.e. branch-over-jump patterns were created.

This only needs iteration when branch-over-jump patterns are needed, which should not be very often. It is not guaranteed to find an optimal solution, though.

Cretonne algorithm

When implementing branch relaxation in Cretonne, we have a few special concerns:

  • We don't have a handy list of branches or basic blocks. Cretonne uses extended basic blocks which can contain multiple internal conditional branches. The data flow graph can provide lists of branch instructions in the form of EBB predecessor sets. An alternative is a linear scan of all instructions.
  • Even though we haven't emitted any binary machine code yet, we do have exact per-instruction code size information. The encoding recipes attached to all instructions each correspond to a unique instruction byte size.
  • We need to generate a list of EBB header offsets for use by the following binary emission pass.

This suggests the following variation of the optimistic algorithm:

  1. Select the smallest-range encoding for branches during legalization. This relaxation algorithm will fix any out-of-range branches.
  2. Assign a zero offset to all EBB headers. The offset is the distance in bytes from the function entry to the EBB header.
  3. Scan through all instructions in the function in layout order. Keep track of the instruction offset by using the byte size of the attached encoding recipes.
  4. When visiting a branch instruction where the target EBB is not in range, relax the encoding. Don't change the encoding if the target EBB offset hasn't been computed yet.
  5. When visiting an EBB header, update its offset with the current instruction offset. If the EBB oftset changed, make a note to iterate again.
  6. Iterate from step 3 until convergence.

This algorithm will complete in one pass for functions with a single EBB (and so only backwards branches). It will usually complete in three passes for the general case:

  1. Compute approximate offsets for all EBBs and relax backward branches.
  2. Relax forward branches,
  3. Check that all branch targets are in range.

If no forward branches need relaxing, it will complete in two passes.

It would be possible to build an auxiliary table that summarizes the non-branch instructions as just their code size, but given typical branch density and how fast it is to look up code size from encoding recipes, it is probably not worth it.

@sorear
Copy link

sorear commented Apr 25, 2017

If it makes a difference, for riscv you will need to be able to relax the same branch more than once: c.jal (2 bytes, 4KiB) -> jal (4 bytes, 1MiB) -> auipc+jalr (8 bytes, 2GiB). Hopefully cretonne will never need to support functions larger than 2GiB but I cannot say the same thing about 1MiB functions; the Go regression test suite generates a function that large.

@Amanieu Amanieu mentioned this issue May 21, 2018
@sunfishcode
Copy link
Member

This is implemented.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants