Parse encodings #24

stoklund · 2016-11-05T00:25:57Z

The write_instruction() function will print out the encoding of an instruction if it has been set:

[R#0c]        v5 = iadd v1, v2

The parser should decode these annotations and fill out the Function::encodings table.

Since the encoding recipe (R) is ISA-dependent, we probably can't support encodings in a test file with multiple ISAs.

See the DisplayEncoding struct for details of the format.


angusholder · 2017-02-22T22:17:17Z

I've been reading the codebase and think I'd be able to tackle this

angusholder · 2017-02-22T22:19:34Z

So the recipe system is currently only used with the RISC-V backend, am I right in thinking all the other backends are very much incomplete at this point?

angusholder · 2017-02-22T22:30:47Z

This encoding would surely have to be variable length? One ISA instruction doesn't necessarily correspond to one Cretonne opcode, eg RISC-V doesn't have IaddCarry so you need a branch and second add.

stoklund · 2017-02-22T23:11:03Z

Yes, at the moment only RISC-V has any encodings. The other ISAs are very incomplete. RISC-V is also very incomplete, just a little bit less.

The encoding that gets printed out like [R#0c] is a representation of an Encoding as defined in isa/encoding.rs. It consists of a recipe and some bits:

pub struct Encoding {
    recipe: u16,
    bits: u16,
}

The recipe is printed out as a name, R in this case. The bits are printed in hexadecimal as the #0c part. The recipe names depend on the ISA, so they are represented as a u16 that indexes into the slice returned by TargetIsa::recipe_names(). (Also TargetIsa::recipe_constraints(), but that is only for the register allocator.) These are the current RISC-V recipe names (from the generated encoding-riscv.rs):

pub static RECIPE_NAMES: [&'static str; 4] = [
    "R",
    "I",
    "Rshamt",
    "Iret",
];
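As a rough illustration of how a recipe index and bits turn into text like R#0c, here is a minimal sketch. The `Encoding` fields are made public for the example, and `ShowEncoding` and `lookup_recipe` are hypothetical helpers invented here; they are not the real DisplayEncoding API, and the two-digit zero padding is an assumption inferred from the printed examples.

```rust
use std::fmt;

/// Stand-in for the `Encoding` struct quoted above (fields public for the sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Encoding {
    pub recipe: u16,
    pub bits: u16,
}

/// Recipe names copied from the RISC-V table quoted above.
pub static RECIPE_NAMES: [&'static str; 4] = ["R", "I", "Rshamt", "Iret"];

/// Hypothetical helper: pairs an optional encoding with an ISA's recipe names.
pub struct ShowEncoding<'a> {
    pub encoding: Option<Encoding>,
    pub recipe_names: &'a [&'a str],
}

impl<'a> fmt::Display for ShowEncoding<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self.encoding {
            // Unencoded instructions are printed as "-".
            None => write!(f, "-"),
            Some(e) => match self.recipe_names.get(e.recipe as usize) {
                // Recipe name, '#', then the bits in lowercase hex (assumed 2-digit minimum).
                Some(name) => write!(f, "{}#{:02x}", name, e.bits),
                // Assumption: fall back to the raw index if the name table is too short.
                None => write!(f, "{}#{:02x}", e.recipe, e.bits),
            },
        }
    }
}

/// Hypothetical inverse used by a parser: map a recipe name back to its index.
pub fn lookup_recipe(names: &[&str], name: &str) -> Option<u16> {
    names.iter().position(|n| *n == name).map(|i| i as u16)
}
```

The parser needs the inverse lookup because the text only carries the recipe name, which is why a unique ISA has to be known before an annotation can be decoded.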

The Encoding values assigned to instructions don't correspond directly to the final machine code encoding. Instead, each recipe maps to a function that can emit the machine code for an instruction given the following:

The InstructionData object with the Opcode and values of immediate operands.
The bits from the Encoding.
The ValueLoc (usually a physical register) that was assigned to each input operand and the results.

It is the job of the legalizer to make sure that every Cretonne instruction in use maps to a single ISA opcode. For example, IaddCarry will be expanded on RISC-V using this transformation:

expand.legalize(
        (a, c) << iadd_carry(x, y, c_in),
        Rtl(
            (a1, c1) << iadd_cout(x, y),
            (a, c2) << iadd_cout(a1, c_in),
            c << bor(c1, c2)
        ))
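To see why the expanded form computes the same result, the pattern can be modelled on plain integers. This is only an arithmetic sketch: the 32-bit width and the use of `bool` for carry values are assumptions made for the example, and the function names simply mirror the opcodes in the pattern.

```rust
/// Models `iadd_cout`: wrapping add plus a carry-out flag.
pub fn iadd_cout(x: u32, y: u32) -> (u32, bool) {
    x.overflowing_add(y)
}

/// Models the expanded `iadd_carry`: two carry-out adds and an OR of the carries.
pub fn iadd_carry(x: u32, y: u32, c_in: bool) -> (u32, bool) {
    let (a1, c1) = iadd_cout(x, y);
    let (a, c2) = iadd_cout(a1, c_in as u32);
    // At most one of the two adds can overflow, so OR-ing the carries is enough.
    (a, c1 | c2)
}
```

For instance, `iadd_carry(u32::MAX, 0, true)` wraps to 0 with the carry set by the second add, while `iadd_carry(u32::MAX, 1, false)` sets it in the first.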

stoklund · 2017-02-22T23:38:23Z

I am working on the register allocator right now, and I just pushed an extension to the encoding notation. It may now include value locations for the instruction's results:

[R#0c,%x2]              v0 = iadd vx0, vx1
[Iret#19]               return_reg v0

The first line means that the result value v0 is assigned to ValueLoc::Reg(%x2). Register names are always prefixed with %. Stack slots are just printed as is:

[S#14,ss7]              v0 = spill v1

The value locations are optional, but if they are present, there should be exactly one per result value. Unassigned values are represented with -, just like unencoded instructions:

[-,-]              v0 = spill v1

Lexical tokens

This introduces two new kinds of tokens that I would like to be able to use more generally:

%rrr register names. The lexer should just recognize this as a "percent-quoted" identifier, where the rrr part can be any sequence of alphanumerical characters and _. It would be different from a normal identifier because any word is ok: %v0, %function, %0 would all be valid. I am considering using this syntax for function names too. Right now, you can't have a function called v0, which is crazy.
#xxxx hexadecimal bits. The lexer should accept any sequence of hex digits following a #. Besides the encoding bits, I think this will be useful for encoding arbitrary data in the future. Things like 512-bit AVX vector constants, for example.
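A minimal sketch of how a lexer might scan these two token kinds follows. The `Token` enum and `scan_sigil` function are hypothetical and not taken from the Cretonne lexer; rejecting an empty name or empty digit run is an assumption, since the description does not say what happens in that case.

```rust
/// Hypothetical token type carrying the unparsed text after the sigil.
#[derive(Debug, PartialEq, Eq)]
pub enum Token<'a> {
    /// Text following '%', e.g. "x2" for `%x2`.
    Name(&'a str),
    /// Hex digits following '#', e.g. "0c" for `#0c`.
    HexSequence(&'a str),
}

/// Length in bytes of the longest prefix of `s` whose chars all satisfy `accept`.
fn prefix_len(s: &str, accept: impl Fn(char) -> bool) -> usize {
    s.char_indices()
        .find(|&(_, c)| !accept(c))
        .map_or(s.len(), |(i, _)| i)
}

/// Scan one `%name` or `#hex` token at the start of `src`.
/// Returns the token and the remaining input, or None if no such token starts here.
pub fn scan_sigil(src: &str) -> Option<(Token<'_>, &str)> {
    let mut chars = src.chars();
    let sigil = chars.next()?;
    let body = chars.as_str();
    match sigil {
        '%' => {
            // Any word is allowed: alphanumerics and '_', even a leading digit.
            let len = prefix_len(body, |c| c.is_ascii_alphanumeric() || c == '_');
            if len == 0 {
                return None; // assumption: a bare '%' is not a token
            }
            Some((Token::Name(&body[..len]), &body[len..]))
        }
        '#' => {
            let len = prefix_len(body, |c| c.is_ascii_hexdigit());
            if len == 0 {
                return None; // assumption: a bare '#' is not a token
            }
            Some((Token::HexSequence(&body[..len]), &body[len..]))
        }
        _ => None,
    }
}
```

Keeping the token payload as a string slice leaves the interpretation (register name, function name, encoding bits, raw bytes) to whichever part of the parser consumes it.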

I'd be happy to review and merge just a lexer patch. You don't have to do everything in one PR.

angusholder · 2017-02-23T00:08:02Z

For the hexadecimal bits, do you think I should follow what you did with scan_number, that is to leave the literal unparsed until it reaches whoever wants it so they can reject it if it's too big?

stoklund · 2017-02-23T00:12:10Z

Yes, that's a good idea. For example, a three-digit encoding is OK: [R#14d], but if it's a hexadecimal representation of a sequence of bytes (see #47), an odd number of digits should be rejected.
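The point about deferring validation can be sketched with two consumers of the same unparsed digit string. Both function names are hypothetical, and the 16-bit limit for encoding bits is an assumption based on the `bits: u16` field shown earlier in the thread.

```rust
/// Interpret a hex sequence as encoding bits. Any digit count is fine
/// as long as the value fits in 16 bits.
pub fn hex_to_encoding_bits(digits: &str) -> Option<u16> {
    // from_str_radix would also accept a leading '+', so check the digits first.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    u16::from_str_radix(digits, 16).ok()
}

/// Interpret a hex sequence as raw bytes. An odd number of digits is rejected
/// because each byte needs exactly two digits.
pub fn hex_to_bytes(digits: &str) -> Option<Vec<u8>> {
    if digits.len() % 2 != 0 || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    digits
        .as_bytes()
        .chunks(2)
        .map(|pair| {
            let s = std::str::from_utf8(pair).ok()?;
            u8::from_str_radix(s, 16).ok()
        })
        .collect()
}
```

With this split, "14d" is a valid encoding value but not a valid byte sequence, which matches the behaviour described above.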

angusholder · 2017-02-23T00:23:46Z

I'm naming them HexSequence and Name if that sounds alright. Given what you said in #47

With these changes, the parser should stop accepting unquoted identifiers as function names.

should we then aim to remove Identifier, and expect every alphanumeric sequence now to be a valid keyword?

stoklund · 2017-02-23T00:29:34Z

angusholder wrote: "I'm naming them HexSequence and Name if that sounds alright."

Yep, sounds good.

angusholder wrote: "Given what you said in #47 ('With these changes, the parser should stop accepting unquoted identifiers as function names.'), should we then aim to remove Identifier, and expect alphanumeric sequence now to be a valid identifier?"

No, I think we should keep `Identifier` because it is used for a number of context-sensitive keywords at the moment (`stack_slot`, `function`, etc). It’s possible we can switch back to real keywords after #47 is fixed, but let’s wait with that for now.

angusholder · 2017-02-23T20:11:01Z

What do you think of adding the IsaSpec to the Parser or Context? parse_instruction() is going to need access to it to recognise the encoding strings. If I encounter an encoding at the start of an instruction and we've been given multiple IsaSpec's I assume that should be a parse error?

stoklund · 2017-02-23T20:34:05Z

angusholder wrote: "What do you think of adding the IsaSpec to the Parser or Context? parse_instruction() is going to need access to it to recognise the encoding strings. If I encounter an encoding at the start of an instruction and we've been given multiple IsaSpec's I assume that should be a parse error?"

It’s valid to have test files with no ISA spec, and it is valid to have multiple ISAs (see http://cretonne.readthedocs.io/en/latest/testing.html#file-tests). I think that encodings and register specs only make sense when a file has a single unique ISA. When there are none or multiple ISAs, just ignore the encodings. I don’t think we need to fail the parse. It would make sense to add the `IsaSpec` to the `Parser`, I think.
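The "single unique ISA" rule could be expressed as a small helper like the one below. This is a sketch only: `unique_isa` is a hypothetical name, and the generic `T` stands in for whatever ISA type the parser ends up holding.

```rust
/// Return the ISA only when exactly one was specified; otherwise None,
/// which signals that encodings and register specs should be ignored.
pub fn unique_isa<T>(isas: &[T]) -> Option<&T> {
    match isas {
        [only] => Some(only),
        _ => None,
    }
}
```

A caller would decode annotations when this returns `Some(isa)` and skip over them, without reporting an error, when it returns `None`.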

stoklund · 2017-03-08T21:17:15Z

This is almost done, but we still need to parse the value locations following the encoding:

[R#0c,%x2]              v1 = iadd vx0, vx1
[S#14,ss7]              v2 = spill v1

The register value location %x2 means that v1 is assigned to a ValueLoc::Reg(%x2) location. The stack slot ss7 means that v2 is assigned to a ValueLoc::Stack(ss7) location.

The value locations are stored in the locations map in the Function. Register names like %x2 can be translated to RegUnits like this:

let reginfo = isa.register_info();
let regunit = reginfo.parse_regunit("x2").unwrap();
let loc = ValueLoc::Reg(regunit);
*ctx.function.locations.ensure(result) = loc;

angusholder · 2017-03-09T00:31:09Z

I'm working on this now. Should I make it a parse error if any value locations are specified when there isn't a unique isa?

stoklund · 2017-03-09T00:51:51Z

No, I think both encodings and value locations can be ignored if there's no unique ISA. I can imagine cases where you want to cut-and-paste test cases with encodings.

It should be an error if the number of value locations doesn't match the number of results produced by the instruction. (Unless the number of value locations is 0, which is ok)
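Putting the notation and the count rule together, a string-level sketch of the annotation check might look like this. Everything here is illustrative: `Loc`, `Annotation`, and `parse_annotation` are hypothetical names, the sketch works on a raw string instead of lexer tokens, and it keeps recipe and register names as text instead of resolving them against an ISA.

```rust
/// Hypothetical, ISA-independent view of a value location.
#[derive(Debug, PartialEq, Eq)]
pub enum Loc {
    Unassigned,    // printed as "-"
    Reg(String),   // printed as "%name"
    Stack(String), // printed as is, e.g. "ss7"
}

/// Hypothetical parsed form of a `[...]` annotation.
#[derive(Debug, PartialEq, Eq)]
pub struct Annotation {
    /// Recipe name and bits, or None for an unencoded instruction ("-").
    pub encoding: Option<(String, u16)>,
    pub locations: Vec<Loc>,
}

/// Parse text such as "[R#0c,%x2]" and check the location count against
/// the number of results the instruction produces.
pub fn parse_annotation(text: &str, num_results: usize) -> Result<Annotation, String> {
    let inner = text
        .strip_prefix('[')
        .and_then(|t| t.strip_suffix(']'))
        .ok_or_else(|| format!("expected [...], got {}", text))?;
    let mut fields = inner.split(',').map(str::trim);

    // The first field is the encoding: "-" or "Recipe#hexbits".
    let enc_text = fields.next().unwrap_or("");
    let encoding = if enc_text == "-" {
        None
    } else {
        let (recipe, bits) = enc_text
            .split_once('#')
            .ok_or_else(|| format!("bad encoding: {}", enc_text))?;
        let bits = u16::from_str_radix(bits, 16).map_err(|e| e.to_string())?;
        Some((recipe.to_string(), bits))
    };

    // Remaining fields are value locations, one per result.
    let locations: Vec<Loc> = fields
        .map(|f| {
            if f == "-" {
                Loc::Unassigned
            } else if let Some(reg) = f.strip_prefix('%') {
                Loc::Reg(reg.to_string())
            } else {
                Loc::Stack(f.to_string())
            }
        })
        .collect();

    // Zero locations is always fine; otherwise the count must match the results.
    if !locations.is_empty() && locations.len() != num_results {
        return Err(format!(
            "{} value locations for {} results",
            locations.len(),
            num_results
        ));
    }
    Ok(Annotation { encoding, locations })
}
```

Under these assumptions, `parse_annotation("[R#0c]", 1)` succeeds with no locations, while `parse_annotation("[R#0c,%x2,%x3]", 1)` is rejected.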

stoklund · 2017-03-09T16:11:27Z

This was fixed by @angusholder

stoklund added the E-easy label (Issues suitable for newcomers to investigate, including Rust newcomers!) on Nov 5, 2016

stoklund mentioned this issue Feb 22, 2017

Remove Opcode::NotAnOpcode and replace with use of Option where applicable #45

Merged

stoklund mentioned this issue Feb 23, 2017

Binary function names #47

Closed

angusholder mentioned this issue Feb 23, 2017

Lexer can now scan names, hex sequences, brackets and minus signs. #48

Merged

stoklund mentioned this issue Feb 24, 2017

Write and parse ABI annotations on function signatures #49

Closed

angusholder mentioned this issue Mar 9, 2017

Parse ValueLoc for each SSA value result of instructions #55

Merged

stoklund closed this as completed Mar 9, 2017
