-
Notifications
You must be signed in to change notification settings - Fork 201
Added Intel x86-64 encodings for 64bit loads and store instructions #127
Added Intel x86-64 encodings for 64bit loads and store instructions #127
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
It generally looks good, you just need to add 32-bit tests and deal with the weirdness around the ABCD constraints.
@@ -271,7 +271,7 @@ | |||
'istore16', r""" | |||
Store the low 16 bits of ``x`` to memory at ``p + Offset``. | |||
|
|||
This is equivalent to ``ireduce.i16`` followed by ``store.i8``. | |||
This is equivalent to ``ireduce.i16`` followed by ``store.i16``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're welcome.
filetests/isa/intel/binary64.cton
Outdated
; asm: movzwl (%rcx), %edi | ||
[-,%rdi] v402 = uload16.i32 v1 ; bin: 40 0f b7 39 | ||
; asm: movzwl (%rsi), %edx | ||
[-,%rdx] v403 = uload16.i32 v2 ; bin: 40 0f b7 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the loads producing a 32-bit value should be in the %I32
function below.
Here, you want to test the loads producing 64-bit values: uload32.i64
and uload16.i64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oups, my mistake.
filetests/isa/intel/binary64.cton
Outdated
; asm: movq (%rcx), %rdi | ||
[-,%rdi] v460 = load.i64 v1 ; bin: 48 8b 39 | ||
; asm: movq (%rsi), %rdx | ||
[-,%rdx] v461 = load.i64 v2 ; bin: 48 8b 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to keep the value numbers monotonically increasing, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I'll do a final value renumbering pass when all modifications are done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It's fine to leave gaps, BTW.
filetests/isa/intel/binary64.cton
Outdated
; asm: movsbq 50(%rcx), %rdi | ||
[-,%rdi] v418 = sload8.i64 v1+50 ; bin: 48 0f be 79 32 | ||
; asm: movsbq -50(%rsi), %rdx | ||
[-,%rdx] v419 = sload8.i64 v2-50 ; bin: 48 0f be 56 ce |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, this is the right set of instructions to test here. You should do the same for the no-displacement ones above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
filetests/isa/intel/binary64.cton
Outdated
; asm: movsbq 50000(%rcx), %rdi | ||
[-,%rdi] v428 = sload8.i64 v1+50000 ; bin: 48 0f be b9 0000c350 | ||
; asm: movsbq -50000(%rsi), %rdx | ||
[-,%rdx] v429 = sload8.i64 v2-50000 ; bin: 48 0f be 96 ffff3cb0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, but please add similar tests for *.i32
instructions to the %I32
function below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
""" | ||
Add encodings for `inst.i32` to I32. | ||
Add encodings for `inst.i32` to I64 with and without REX. | ||
Add encodings for `inst.i64` to I64 with a REX.W prefix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea. Please add an explanation for the w_bit
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
I64.enc(base.istore32.i64.any, *r.st.rex(0x89)) | ||
I64.enc(base.istore32.i64.any, *r.stDisp8.rex(0x89)) | ||
I64.enc(base.istore32.i64.any, *r.stDisp32.rex(0x89)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice simplification. Very cool!
Do you think it makes sense to handle the three types of displacement that way too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand your question. Is there another way of handling the three types of displacement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I meant is that the code is very repetitive, so maybe a function call could reduce the redundancy. It's not necessary.
|
||
enc_i32_i64_ld_st(base.istore8, 0, r.st_abcd, 0x88) | ||
enc_i32_i64_ld_st(base.istore8, 0, r.stDisp8_abcd, 0x88) | ||
enc_i32_i64_ld_st(base.istore8, 0, r.stDisp32_abcd, 0x88) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, so this part is extra tricky because you're using 8-bit registers.
In 32-bit mode, and in 64-bit instructions without a REX prefix, the following 8-bit registers can be addressed: AL, CL, DL, BL, AH, CH, DH, BH. The last 4 are the "high" byte registers, corresponding to bits 8-15 of the ABCD GPRs.
In 64-bit mode with a REX prefix, the accessible 8-bit registers are: AL, CL, DL, BL, SPL, BPL, SIL, DIL, R8B-R15B. Note the difference in registers 4-7 — they are the low bytes of GPRs instead of the high bytes of ABCD.
In Cretonne, we don't model the high-byte registers. They are never used. This is the reason the I32
byte store instructions have an ABCD
register constraint instead of GPR
like the other instructions. When storing a byte, the value to store must be in one of the ABCD registers because without a REX prefix, we can't store the low byte of %esi
for example.
However, with a REX prefix, a byte store can access the low byte of all 16 registers, so we just want a normal GPR
register constraint which is much easier for the register allocator to satisfy.
The end result is:
- When accessing 8-bit registers without a REX prefix, use the
ABCD
register constraint. - When accessing 8-bit registers with a REX prefix, use the
GPR
register constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got it. I won't use enc_i32_i64_ld_st
for istore8
but rather declare the I32.enc
and I64.enc
directly with the correct recipes.
In the binary64.cton, several instructions appear to have REX prefixes that don't require them, like this:
Is this a known limitation? |
Removing redundant REX prefixes is treated as an optimization in Cretonne. Such an instruction will have two valid encodings: With and without a REX prefix. The encoding process goes like this:
This approach means that redundant REX prefixes are removed after register allocation. On ARM, Thumb compression works the same way. The test binemit mode could test either choice. Currently it picks the first valid encoding, which is the encoding that would be used before register allocation. You could argue that it shold pick the compressed encoding instead since it knows the register assignments. |
Ideally, I would like to be able to test both:
|
Fixed testing of 64bit intel encoding
Tell me if it's good before merging and I will renumber the values in |
Looks good! |
For 64bits instructions for which the REX prefix is optional (i.e. the REX.W bit is set to 0), should I generate only the encoding with REX prefix, only the encoding without or both ? |
You should generate both, the first one without REX, and the second one with REX. You will only be able to test the first encoding, per the comments above and #128. |
Ok, got it. |
Value renumbering in binary64.cton
14a0f1f
to
fb7ba9d
Compare
Encoding recipes factored with 32 bits instructions, corresponding tests added.