This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Parse encodings #24

Closed
stoklund opened this issue Nov 5, 2016 · 15 comments
Labels
E-easy Issues suitable for newcomers to investigate, including Rust newcomers!

Comments

@stoklund
Contributor

stoklund commented Nov 5, 2016

The write_instruction() function will print out the encoding of an instruction if it has been set:

[R#0c]        v5 = iadd v1, v2

The parser should decode these annotations and fill out the Function::encodings table.

Since the encoding recipe (R) is ISA-dependent, we probably can't support encodings in a test file with multiple ISAs.

See the DisplayEncoding struct for details of the format.

@stoklund stoklund added the E-easy Issues suitable for newcomers to investigate, including Rust newcomers! label Nov 5, 2016
@angusholder
Contributor

I've been reading the codebase and think I'd be able to tackle this.

@angusholder
Contributor

So the recipe system is currently only used with the RISC-V backend. Am I right in thinking all the other backends are very much incomplete at this point?

@angusholder
Contributor

This encoding would surely have to be variable length? One Cretonne opcode doesn't necessarily correspond to one ISA instruction, e.g. RISC-V doesn't have IaddCarry, so you need a branch and a second add.

@stoklund
Contributor Author

Yes, at the moment only RISC-V has any encodings. The other ISAs are very incomplete. RISC-V is also very incomplete, just a little bit less.

The encoding that gets printed out like [R#0c] is a representation of an Encoding as defined in isa/encoding.rs. It consists of a recipe and some bits:

pub struct Encoding {
    recipe: u16,
    bits: u16,
}

The recipe is printed out as a name, R in this case. The bits are printed in hexadecimal as the #0c part. The recipe names depend on the ISA, so they are represented as a u16 that indexes into the slice returned by TargetIsa::recipe_names(). (Also TargetIsa::recipe_constraints(), but that is only for the register allocator.) These are the current RISC-V recipe names (from the generated encoding-riscv.rs):

pub static RECIPE_NAMES: [&'static str; 4] = [
    "R",
    "I",
    "Rshamt",
    "Iret",
];
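As a rough illustration of how a parser could turn the text R#0c back into a recipe index plus bits, here is a minimal sketch. The Encoding struct is a simplified stand-in with public fields, and parse_encoding is a hypothetical helper, not a function from the Cretonne codebase. The recipe table is passed in as a slice, mirroring what TargetIsa::recipe_names() is described as returning.

```rust
/// Simplified stand-in for the `Encoding` struct quoted above
/// (fields made public here purely for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Encoding {
    pub recipe: u16,
    pub bits: u16,
}

/// Recipe names copied from the RISC-V table quoted above.
pub static RECIPE_NAMES: [&str; 4] = ["R", "I", "Rshamt", "Iret"];

/// Hypothetical helper: parse the text between the brackets, e.g. "R#0c".
/// The recipe name is looked up in `recipe_names`; its position becomes
/// the recipe index. The bits are read as hexadecimal and must fit in u16.
pub fn parse_encoding(text: &str, recipe_names: &[&str]) -> Result<Encoding, String> {
    let (name, hex) = text
        .split_once('#')
        .ok_or_else(|| format!("expected `recipe#bits`, got `{}`", text))?;
    let recipe = recipe_names
        .iter()
        .position(|&n| n == name)
        .ok_or_else(|| format!("unknown recipe `{}`", name))?;
    let bits = u16::from_str_radix(hex, 16)
        .map_err(|e| format!("bad encoding bits `{}`: {}", hex, e))?;
    Ok(Encoding { recipe: recipe as u16, bits })
}
```

With the table above, parse_encoding("R#0c", &RECIPE_NAMES) would yield recipe index 0 with bits 0x0c, while an unknown recipe name or bits that overflow a u16 produce an error.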

The Encoding values assigned to instructions don't correspond directly to the final machine code encoding. Instead, each recipe maps to a function that can emit the machine code for an instruction given the following:

  • The InstructionData object with the Opcode and values of immediate operands.
  • The bits from the Encoding.
  • The ValueLoc (usually a physical register) that was assigned to each input operand and the results.

It is the job of the legalizer to make sure that every Cretonne instruction in use maps to a single ISA opcode. For example, IaddCarry will be expanded on RISC-V using this transformation:

expand.legalize(
        (a, c) << iadd_carry(x, y, c_in),
        Rtl(
            (a1, c1) << iadd_cout(x, y),
            (a, c2) << iadd_cout(a1, c_in),
            c << bor(c1, c2)
        ))
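To see why this expansion computes the same result as a single add-with-carry, the three steps can be modelled on plain integers. This is only a sketch of the arithmetic, assuming 32-bit values and a boolean carry; the Rust function names are borrowed from the instruction names for readability and are not Cretonne APIs.

```rust
/// Models `iadd_cout`: add two values and report the carry out.
pub fn iadd_cout(x: u32, y: u32) -> (u32, bool) {
    x.overflowing_add(y)
}

/// Models the expansion of `iadd_carry(x, y, c_in)` shown above.
pub fn iadd_carry(x: u32, y: u32, c_in: bool) -> (u32, bool) {
    let (a1, c1) = iadd_cout(x, y);           // (a1, c1) << iadd_cout(x, y)
    let (a, c2) = iadd_cout(a1, c_in as u32); // (a, c2) << iadd_cout(a1, c_in)
    (a, c1 | c2)                              // c << bor(c1, c2)
}

/// Reference result computed in 64 bits, where the sum cannot overflow.
pub fn iadd_carry_wide(x: u32, y: u32, c_in: bool) -> (u32, bool) {
    let wide = x as u64 + y as u64 + c_in as u64;
    (wide as u32, wide > u32::MAX as u64)
}
```

At most one of c1 and c2 can be set: if the first add carries, a1 is at most u32::MAX - 1, so adding the incoming carry cannot overflow again. That is why a plain bor of the two carries is enough.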

@stoklund
Contributor Author

I am working on the register allocator right now, and I just pushed an extension to the encoding notation. It may now include value locations for the instruction's results:

[R#0c,%x2]              v0 = iadd vx0, vx1
[Iret#19]               return_reg v0

The first line means that the result value v0 is assigned to ValueLoc::Reg(%x2). Register names are always prefixed with %. Stack slots are just printed as is:

[S#14,ss7]              v0 = spill v1

The value locations are optional, but if they are present, there should be exactly one per result value. Unassigned values are represented with -, just like unencoded instructions:

[-,-]              v0 = spill v1
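The notation described so far can be parsed with a few string operations. The following is a minimal, standalone sketch: Loc, Annotation and parse_annotation are hypothetical names, the recipe is kept as a string rather than resolved against an ISA, and the real parser would work on lexer tokens instead of raw text.

```rust
/// Hypothetical, simplified value location (the real type is `ValueLoc`).
#[derive(Debug, PartialEq, Eq)]
pub enum Loc {
    Unassigned,  // printed as `-`
    Reg(String), // printed as `%name`, stored without the `%`
    Stack(u32),  // printed as `ssN`
}

/// Hypothetical container for one parsed `[...]` annotation.
#[derive(Debug, PartialEq, Eq)]
pub struct Annotation {
    /// `None` for an unencoded instruction (`-`), else (recipe name, bits).
    pub encoding: Option<(String, u16)>,
    pub locations: Vec<Loc>,
}

pub fn parse_annotation(text: &str) -> Result<Annotation, String> {
    let inner = text
        .trim()
        .strip_prefix('[')
        .and_then(|t| t.strip_suffix(']'))
        .ok_or_else(|| format!("expected `[...]`, got `{}`", text))?;
    let mut fields = inner.split(',').map(str::trim);

    // The first field is the encoding, or `-` if none was assigned.
    let first = fields.next().unwrap_or("");
    let encoding = if first == "-" {
        None
    } else {
        let (recipe, hex) = first
            .split_once('#')
            .ok_or_else(|| format!("expected `recipe#bits` or `-`, got `{}`", first))?;
        if recipe.is_empty() {
            return Err("missing recipe name".to_string());
        }
        let bits = u16::from_str_radix(hex, 16)
            .map_err(|e| format!("bad encoding bits `{}`: {}", hex, e))?;
        Some((recipe.to_string(), bits))
    };

    // Any remaining fields are value locations, one per result.
    let mut locations = Vec::new();
    for field in fields {
        let loc = if field == "-" {
            Loc::Unassigned
        } else if let Some(name) = field.strip_prefix('%') {
            Loc::Reg(name.to_string())
        } else if let Some(slot) = field.strip_prefix("ss") {
            let n = slot
                .parse::<u32>()
                .map_err(|e| format!("bad stack slot `{}`: {}", field, e))?;
            Loc::Stack(n)
        } else {
            return Err(format!("unknown value location `{}`", field));
        };
        locations.push(loc);
    }
    Ok(Annotation { encoding, locations })
}
```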

Lexical tokens

This introduces two new kinds of tokens that I would like to be able to use more generally:

  • %rrr register names. The lexer should just recognize this as a "percent-quoted" identifier, where the rrr part can be any sequence of alphanumeric characters and _. It would be different from a normal identifier because any word is OK: %v0, %function, %0 would all be valid. I am considering using this syntax for function names too. Right now, you can't have a function called v0, which is crazy.
  • #xxxx hexadecimal bits. The lexer should accept any sequence of hex digits following a #. Besides the encoding bits, I think this will be useful for encoding arbitrary data in the future. Things like 512-bit AVX vector constants, for example.
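A minimal sketch of how a lexer might recognize these two token kinds follows. The Token enum and scan_prefixed are illustrative names only, and the existing lexer is structured differently. The sketch leaves the hex digits unparsed as a string slice, so that whoever consumes the token can decide how many digits are acceptable.

```rust
/// Illustrative token type covering only the two new token kinds.
#[derive(Debug, PartialEq, Eq)]
pub enum Token<'a> {
    Name(&'a str),        // `%rrr`, stored without the `%`
    HexSequence(&'a str), // `#xxxx`, stored without the `#`
}

/// Try to scan a `%name` or `#hex` token at the start of `input`.
/// Returns the token and the remaining input, or `None` if the input
/// does not start with one of the two prefixes followed by at least
/// one valid character.
pub fn scan_prefixed(input: &str) -> Option<(Token<'_>, &str)> {
    let mut chars = input.chars();
    let prefix = chars.next()?;
    let rest = chars.as_str();
    let len = match prefix {
        // Any run of ASCII alphanumerics and `_` is a valid name body.
        '%' => rest
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(rest.len()),
        // Any run of hex digits is a valid hex sequence body.
        '#' => rest
            .find(|c: char| !c.is_ascii_hexdigit())
            .unwrap_or(rest.len()),
        _ => return None,
    };
    if len == 0 {
        return None;
    }
    let (body, tail) = rest.split_at(len);
    let token = if prefix == '%' {
        Token::Name(body)
    } else {
        Token::HexSequence(body)
    };
    Some((token, tail))
}
```

For example, scanning "%x2]" gives Name("x2") with "]" left over, and scanning "#0c,%x2" gives HexSequence("0c") with ",%x2" left over.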

I'd be happy to review and merge just a lexer patch. You don't have to do everything in one PR.

@angusholder
Contributor

For the hexadecimal bits, do you think I should follow what you did with scan_number, that is to leave the literal unparsed until it reaches whoever wants it so they can reject it if it's too big?

@stoklund
Contributor Author

Yes, that's a good idea. For example, a three-digit encoding is OK: [R#14d], but if it's a hexadecimal representation of a sequence of bytes (see #47), an odd number of digits should be rejected.
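The point of deferring the conversion is that the same digit string is valid in one context and invalid in another. A small sketch of two consumers, both hypothetical helpers rather than existing functions: one reads the digits as encoding bits, the other as a byte sequence (assumed here to be in text order, which the thread does not specify).

```rust
/// Interpret hex digits as encoding bits. Any digit count is fine
/// as long as the value fits in 16 bits.
pub fn hex_to_encoding_bits(digits: &str) -> Result<u16, String> {
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("not a hex sequence: `{}`", digits));
    }
    u16::from_str_radix(digits, 16)
        .map_err(|e| format!("encoding bits `{}` out of range: {}", digits, e))
}

/// Interpret hex digits as a sequence of bytes, two digits per byte.
/// An odd number of digits is rejected.
pub fn hex_to_bytes(digits: &str) -> Result<Vec<u8>, String> {
    // Checking for hex digits also guarantees ASCII, so slicing by
    // byte index below cannot split a character.
    if !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("not a hex sequence: `{}`", digits));
    }
    if digits.len() % 2 != 0 {
        return Err(format!("odd number of hex digits in `{}`", digits));
    }
    (0..digits.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&digits[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}
```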

@angusholder
Contributor

angusholder commented Feb 23, 2017

I'm naming them HexSequence and Name if that sounds alright. Given what you said in #47

With these changes, the parser should stop accepting unquoted identifiers as function names.

should we then aim to remove Identifier, and expect every alphanumeric sequence now to be a valid keyword?

@stoklund
Contributor Author

stoklund commented Feb 23, 2017 via email

@angusholder
Contributor

What do you think of adding the IsaSpec to the Parser or Context? parse_instruction() is going to need access to it to recognise the encoding strings. If I encounter an encoding at the start of an instruction and we've been given multiple IsaSpecs, I assume that should be a parse error?

@stoklund
Contributor Author

stoklund commented Feb 23, 2017 via email

@stoklund
Contributor Author

stoklund commented Mar 8, 2017

This is almost done, but we still need to parse the value locations following the encoding:

[R#0c,%x2]              v1 = iadd vx0, vx1
[S#14,ss7]              v2 = spill v1

The register value location %x2 means that v1 is assigned to a ValueLoc::Reg(%x2) location. The stack slot ss7 means that v2 is assigned to a ValueLoc::Stack(ss7) location.

The value locations are stored in the locations map in the Function. Register names like %x2 can be translated to RegUnits like this:

let reginfo = isa.register_info();
let regunit = reginfo.parse_regunit("x2").unwrap();
let loc = ValueLoc::Reg(regunit);
*ctx.function.locations.ensure(result) = loc;

@angusholder
Contributor

I'm working on this now. Should I make it a parse error if any value locations are specified when there isn't a unique ISA?

@stoklund
Contributor Author

stoklund commented Mar 9, 2017

No, I think both encodings and value locations can be ignored if there's no unique ISA. I can imagine cases where you want to cut-and-paste test cases with encodings.

It should be an error if the number of value locations doesn't match the number of results produced by the instruction (unless the number of value locations is 0, which is OK).
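The counting rule is small enough to state as code. This is a sketch only; check_location_count is a hypothetical helper name, and in the real parser the check would presumably sit next to the code that records the locations.

```rust
/// Value locations are optional, but if any are given there must be
/// exactly one per result value.
pub fn check_location_count(num_locations: usize, num_results: usize) -> Result<(), String> {
    if num_locations == 0 || num_locations == num_results {
        Ok(())
    } else {
        Err(format!(
            "instruction produces {} result value(s), but {} value location(s) were specified",
            num_results, num_locations
        ))
    }
}
```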

@stoklund
Contributor Author

stoklund commented Mar 9, 2017

This was fixed by @angusholder

@stoklund stoklund closed this as completed Mar 9, 2017