Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Compressed instruction encodings #72

Closed
stoklund opened this issue Apr 4, 2017 · 5 comments
Closed

Compressed instruction encodings #72

stoklund opened this issue Apr 4, 2017 · 5 comments
Labels
goal:optimize-speed Focus area: the speed of the code produced by Cranelift.

Comments

@stoklund
Copy link
Contributor

stoklund commented Apr 4, 2017

It is common for instruction set architectures to provide alternative encodings for common instructions that use less code space. The compressed encodings have additional constraints on the range of immediate operands and sometimes tighter register operand constraints too.

  • RISC-V has the RVC extension which provides a 16-bit encoding for some instructions.
  • ARM's 32-bit Thumb-2 instruction set has 16-bit versions of some instructions.
  • Intel instructions often have 8-bit and 32-bit immediate operand variants.
  • 64-bit Intel instructions can sometimes be encoded without a REX prefix if the register operands are right.
  • Intel iadd can be encoded as an lea or as a smaller add instruction which has tied register operands.

Representation

Cretonne represents encoding variants of the same instruction using separate encoding recipes and bits. This also means that each recipe maps to an exact instruction size in bytes. All of the valid encodings for an instruction appear in the encoding tables used by TargetIsa::encode(). The legalizer will always pick the first encoding with a matching predicate, but there can be more encodings after that.

We should add a TargetIsa method that makes it possible to enumerate all of the valid encodings for an instruction. Probably by returning an iterator.

Selecting the best encoding

Sometimes, the legalizer is able to select a compact encoding up front. This is the case with Intel 8-bit immediates when the immediate operand is known, for example. Often, the legalizer does not have enough information to determine if a compact encoding would be valid:

  • The immediate operand may encode a stack frame offset, and the location of stack slots has not yet been determined.
  • The immediate operand my be a branch displacement, and we don't know the exact position of EBB headers yet.
  • The compact encoding would constrain the register allocator too much. This goes for both RVC and Thumb-2 encodings as well as REX prefix omission for x86-64. It is better to start with the unconstrained encoding and then decay to the compact encoding after register allocation.

Instruction compression pass

The legalizer selects a compact encoding when it has enough information to do so, and it won't constrain the register allocator.

The instruction compression pass is a pure optimization pass — it can be skipped without violating correctness. It runs late when we have more information:

  • Registers have been assigned so the tighter register operand constraints of compact encodings can be checked.
  • The stack frame layout is complete, so load/store offsets are known for stack accesses.

Compressing instructions obviously changes code size, so the final offsets of EBB headers are not known, and so branch displacements are not yet known.

stoklund pushed a commit that referenced this issue Apr 5, 2017
Compute exact EBB header offsets and check that branches are in range.

Not implemented yet: Relax branches that are not in range.

Invoke the relax_branches() pass from the 'test binemit' file tests so
they can verify the proper encoding of branch instructions too.
@angusholder
Copy link
Contributor

@stoklund I'd like to get back into the project, could I start on the compressed RISC-V encodings?

@stoklund
Copy link
Contributor Author

Sure, go for it!

@stoklund
Copy link
Contributor Author

@angusholder has just landed compressed instruction support for the binemit test driver. The exact same mechanism should form the basis for the instruction compression pass.

@sunfishcode sunfishcode added the goal:optimize-speed Focus area: the speed of the code produced by Cranelift. label Nov 8, 2017
@sunfishcode
Copy link
Member

Another possible approach here would be to change the concept of a recipe such that recipes have a fixed maximum size, with their actual size being computed, considering their immediates and register assignments. That would mean we wouldn't need separate REX and non-REX versions of many x86-64 recipes, for example, and we wouldn't need a separate pass to update all the encodings after register allocation.

Having a separate pass has the advantage of making it easy to disable, for debugging, or for minimizing compile time in the case where we don't care about optimizing the encodings. However, having it integrated would make it easier for the register allocator to ask an instruction encoding what registers it prefers. And it might have slightly faster compile time for the case where we do want to optimize the encodings. So it's worth considering.

@sunfishcode
Copy link
Member

1bb06bf implements the shrink-instructions pass.

I decided not to do the idea above, as it's simpler to have a separate pass for now, though we could revisit this in the future.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
goal:optimize-speed Focus area: the speed of the code produced by Cranelift.
Projects
None yet
Development

No branches or pull requests

3 participants