Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Added Intel x86-64 encodings for 64bit loads and store instructions #127

Merged
merged 3 commits into from
Jul 31, 2017

Conversation

denismerigoux
Copy link
Contributor

Encoding recipes factored with 32 bits instructions, corresponding tests added.

Copy link
Contributor

@stoklund stoklund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

It generally looks good, you just need to add 32-bit tests and deal with the weirdness around the ABCD constraints.

@@ -271,7 +271,7 @@
'istore16', r"""
Store the low 16 bits of ``x`` to memory at ``p + Offset``.

This is equivalent to ``ireduce.i16`` followed by ``store.i8``.
This is equivalent to ``ireduce.i16`` followed by ``store.i16``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome.

; asm: movzwl (%rcx), %edi
[-,%rdi] v402 = uload16.i32 v1 ; bin: 40 0f b7 39
; asm: movzwl (%rsi), %edx
[-,%rdx] v403 = uload16.i32 v2 ; bin: 40 0f b7 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the loads producing a 32-bit value should be in the %I32 function below.

Here, you want to test the loads producing 64-bit values: uload32.i64 and uload16.i64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups, my mistake.

; asm: movq (%rcx), %rdi
[-,%rdi] v460 = load.i64 v1 ; bin: 48 8b 39
; asm: movq (%rsi), %rdx
[-,%rdx] v461 = load.i64 v2 ; bin: 48 8b 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to keep the value numbers monotonically increasing, if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I'll do a final value renumbering pass when all modifications are done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's fine to leave gaps, BTW.

; asm: movsbq 50(%rcx), %rdi
[-,%rdi] v418 = sload8.i64 v1+50 ; bin: 48 0f be 79 32
; asm: movsbq -50(%rsi), %rdx
[-,%rdx] v419 = sload8.i64 v2-50 ; bin: 48 0f be 56 ce
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, this is the right set of instructions to test here. You should do the same for the no-displacement ones above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

; asm: movsbq 50000(%rcx), %rdi
[-,%rdi] v428 = sload8.i64 v1+50000 ; bin: 48 0f be b9 0000c350
; asm: movsbq -50000(%rsi), %rdx
[-,%rdx] v429 = sload8.i64 v2-50000 ; bin: 48 0f be 96 ffff3cb0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, but please add similar tests for *.i32 instructions to the %I32 function below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"""
Add encodings for `inst.i32` to I32.
Add encodings for `inst.i32` to I64 with and without REX.
Add encodings for `inst.i64` to I64 with a REX.W prefix.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. Please add an explanation for the w_bit argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


I64.enc(base.istore32.i64.any, *r.st.rex(0x89))
I64.enc(base.istore32.i64.any, *r.stDisp8.rex(0x89))
I64.enc(base.istore32.i64.any, *r.stDisp32.rex(0x89))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice simplification. Very cool!

Do you think it makes sense to handle the three types of displacement that way too?

Copy link
Contributor Author

@denismerigoux denismerigoux Jul 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand your question. Is there another way of handling the three types of displacement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I meant is that the code is very repetitive, so maybe a function call could reduce the redundancy. It's not necessary.


enc_i32_i64_ld_st(base.istore8, 0, r.st_abcd, 0x88)
enc_i32_i64_ld_st(base.istore8, 0, r.stDisp8_abcd, 0x88)
enc_i32_i64_ld_st(base.istore8, 0, r.stDisp32_abcd, 0x88)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, so this part is extra tricky because you're using 8-bit registers.

In 32-bit mode, and in 64-bit instructions without a REX prefix, the following 8-bit registers can be addressed: AL, CL, DL, BL, AH, CH, DH, BH. The last 4 are the "high" byte registers, corresponding to bits 8-15 of the ABCD GPRs.

In 64-bit mode with a REX prefix, the accessible 8-bit registers are: AL, CL, DL, BL, SPL, BPL, SIL, DIL, R8B-R15B. Note the difference in registers 4-7 — they are the low bytes of GPRs instead of the high bytes of ABCD.

In Cretonne, we don't model the high-byte registers. They are never used. This is the reason the I32 byte store instructions have an ABCD register constraint instead of GPR like the other instructions. When storing a byte, the value to store must be in one of the ABCD registers because without a REX prefix, we can't store the low byte of %esi for example.

However, with a REX prefix, a byte store can access the low byte of all 16 registers, so we just want a normal GPR register constraint which is much easier for the register allocator to satisfy.

The end result is:

  • When accessing 8-bit registers without a REX prefix, use the ABCD register constraint.
  • When accessing 8-bit registers with a REX prefix, use the GPR register constraint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it. I won't use enc_i32_i64_ld_st for istore8 but rather declare the I32.enc and I64.enc directly with the correct recipes.

@sunfishcode
Copy link
Member

In the binary64.cton, several instructions appear to have REX prefixes that don't require them, like this:

; asm: movl %ecx, (%rsi)
istore32 v1, v2                             ; bin: 40 89 0e

Is this a known limitation?

@stoklund
Copy link
Contributor

Removing redundant REX prefixes is treated as an optimization in Cretonne. Such an instruction will have two valid encodings: With and without a REX prefix. The encoding process goes like this:

  1. Before register allocation: Pick the first valid encoding for each instruction. The encodings should be arranged such that this constraints the register allocator as little as possible.
  2. After register allocation: Pick the smallest valid encoding for each instruction that has register constraints compatible with the register allocator's choices.

This approach means that redundant REX prefixes are removed after register allocation. On ARM, Thumb compression works the same way.

The test binemit mode could test either choice. Currently it picks the first valid encoding, which is the encoding that would be used before register allocation. You could argue that it shold pick the compressed encoding instead since it knows the register assignments.

@stoklund
Copy link
Contributor

Ideally, I would like to be able to test both:

; asm: movl %ecx, (%rsi)
istore32 v1, v2                             ; bin[early]: 40 89 0e
                                            ; bin[late]:  89 0e

@denismerigoux
Copy link
Contributor Author

Tell me if it's good before merging and I will renumber the values in binary64.cton.

@stoklund
Copy link
Contributor

Looks good!

@denismerigoux
Copy link
Contributor Author

For 64bits instructions for which the REX prefix is optional (i.e. the REX.W bit is set to 0), should I generate only the encoding with REX prefix, only the encoding without or both ?

@stoklund
Copy link
Contributor

You should generate both, the first one without REX, and the second one with REX.

You will only be able to test the first encoding, per the comments above and #128.

@denismerigoux
Copy link
Contributor Author

Ok, got it.

@stoklund stoklund merged commit b74723c into bytecodealliance:master Jul 31, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants