This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Prologue/epilogue doesn't adjust stack pointer #187

Closed
sunfishcode opened this issue Nov 8, 2017 · 7 comments
Labels
goal:native-ABI Focus area: Interoperate with native platform ABIs and calling conventions.

Comments

@sunfishcode
Member

Prologues and epilogues don't yet include code to adjust the stack pointer.

For example, running cton-util compile on this input:

function %bar(f32, f32, f32, f32) -> f32 native {
    sig0 = () -> f32 native
    fn0 = sig0 %foo

ebb0(v0: f32, v1: f32, v2: f32, v3: f32):
    v4 = call fn0()
    v5 = fadd v4, v0
    v6 = fadd v5, v1
    v7 = fadd v6, v2
    v8 = fadd v7, v3
    return v8
}

Produces this output:

function %bar(f32 [%xmm0], f32 [%xmm1], f32 [%xmm2], f32 [%xmm3]) -> f32 [%xmm0] native {
    ss0 = spill_slot 4, offset -4
    ss1 = spill_slot 4, offset -8
    ss2 = spill_slot 4, offset -12
    ss3 = spill_slot 4, offset -16
    sig0 = () -> f32 [%xmm0] native
    fn0 = sig0 %foo

                                ebb0(v9: f32 [%xmm0], v10: f32 [%xmm1], v11: f32 [%xmm2], v12: f32 [%xmm3]):
[RexMp2fspSib32#57e,ss0]            v0 = spill v9
[RexMp2fspSib32#57e,ss1]            v1 = spill v10
[RexMp2fspSib32#57e,ss2]            v2 = spill v11
[RexMp2fspSib32#57e,ss3]            v3 = spill v12
[Op1call_id#e8,%xmm0]               v4 = call fn0()
[RexMp2ffiSib32#56e,%xmm1]          v13 = fill v0
[RexMp2fa#658,%xmm0]                v5 = fadd v4, v13
[RexMp2ffiSib32#56e,%xmm1]          v14 = fill v1
[RexMp2fa#658,%xmm0]                v6 = fadd v5, v14
[RexMp2ffiSib32#56e,%xmm1]          v15 = fill v2
[RexMp2fa#658,%xmm0]                v7 = fadd v6, v15
[RexMp2ffiSib32#56e,%xmm1]          v16 = fill v3
[RexMp2fa#658,%xmm0]                v8 = fadd v7, v16
[Op1ret#c3]                         return v8
}

which is this machine code:

0000000000000000 <bar>:
   0:	66 40 0f 7e 84 24 0c 00 00 00 	rex movd %xmm0,0xc(%rsp)
   a:	66 40 0f 7e 8c 24 08 00 00 00 	rex movd %xmm1,0x8(%rsp)
  14:	66 40 0f 7e 94 24 04 00 00 00 	rex movd %xmm2,0x4(%rsp)
  1e:	66 40 0f 7e 9c 24 00 00 00 00 	rex movd %xmm3,0x0(%rsp)
  28:	e8 00 00 00 00       	callq  2d <bar+0x2d>
			29: R_X86_64_PLT32	foo-0x4
  2d:	66 40 0f 6e 8c 24 0c 00 00 00 	rex movd 0xc(%rsp),%xmm1
  37:	f3 40 0f 58 c1       	rex addss %xmm1,%xmm0
  3c:	66 40 0f 6e 8c 24 08 00 00 00 	rex movd 0x8(%rsp),%xmm1
  46:	f3 40 0f 58 c1       	rex addss %xmm1,%xmm0
  4b:	66 40 0f 6e 8c 24 04 00 00 00 	rex movd 0x4(%rsp),%xmm1
  55:	f3 40 0f 58 c1       	rex addss %xmm1,%xmm0
  5a:	66 40 0f 6e 8c 24 00 00 00 00 	rex movd 0x0(%rsp),%xmm1
  64:	f3 40 0f 58 c1       	rex addss %xmm1,%xmm0
  69:	c3                   	retq   

which has spills but no stack allocation.
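To make the mismatch concrete: the four spill slots need 16 bytes, and the disassembly addresses them as if %rsp had already been lowered by that amount. Since it hasn't, the store to 0x0(%rsp) lands on the return address and the other three land in the caller's frame. Below is a minimal sketch of the arithmetic, assuming slot offsets are relative to the stack pointer on entry and a frame size of 16. `sp_displacement` is a hypothetical helper for illustration, not a Cretonne function.

```rust
/// Convert a stack-slot offset (assumed relative to the stack pointer at
/// function entry) into a displacement from the stack pointer *after* a
/// prologue has lowered it by `frame_size` bytes.
/// Hypothetical helper for illustration only.
fn sp_displacement(slot_offset: i32, frame_size: u32) -> i32 {
    slot_offset + frame_size as i32
}

/// Displacements for the four spill slots ss0..ss3 from the output above.
fn example_displacements() -> Vec<i32> {
    [-4, -8, -12, -16]
        .iter()
        .map(|&off| sp_displacement(off, 16))
        .collect()
}
```

Under these assumptions the displacements are 0xc, 0x8, 0x4 and 0x0, matching the addresses in the disassembly.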

@stoklund
Contributor

stoklund commented Nov 8, 2017

Yep. We also need to set up a frame pointer and restore the old one in the epilogue. The spiderwasm functions depend on SpiderMonkey's prologue and epilogue code, but other calling conventions should insert their own.

We don't currently have instructions that can manipulate the stack and frame pointer registers. I don't think we should track those registers as SSA values since there are so many implicit uses of them. There is an ArgumentPurpose::FramePointer variant, but I think that if the frame pointer register is reserved, it may cause trouble to model the frame pointer explicitly with it.

I expect that prologues and epilogues will often contain ISA-specific instructions (like enter/leave), so the whole thing is delegated to TargetIsa::prologue_epilogue() where ISAs can override the default behavior.

@stoklund
Contributor

stoklund commented Nov 8, 2017

Oh, and functions with a native calling convention should save and restore callee-saved registers too.

@sunfishcode sunfishcode added the goal:native-ABI Focus area: Interoperate with native platform ABIs and calling conventions. label Nov 8, 2017
@stoklund
Contributor

Here's an overview of what's required to fix this issue and #189.

TargetIsa hook

Prologue and epilogue code is quite dependent on both the ISA and the function's calling convention, so the responsibility for generating this code is delegated to the TargetIsa::prologue_epilogue() trait method which every ISA must implement. This method must:

  • Add fixed stack slots for special areas of the stack frame that can't be used by normal stack slots.
  • Compute the stack frame layout by calling stack_layout::layout_stack().
  • Insert prologue code at the top of the entry EBB.
  • Insert epilogue code before each return instruction. (There can be more than one, although that doesn't happen currently.)
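The four steps can be sketched on a toy function representation. Everything here (`Inst`, `Function`, the `fixed_area` parameter) is a hypothetical stand-in rather than Cretonne's real data structures, and the "layout" step just sums slot sizes and ignores alignment.

```rust
/// Toy instruction set; `Prologue` and `Epilogue` stand in for whatever
/// instruction sequences a real ISA would emit. Hypothetical types.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Prologue,
    Epilogue,
    Body,
    Return,
}

/// Toy function: a list of EBBs plus stack-slot sizes in bytes.
struct Function {
    ebbs: Vec<Vec<Inst>>,
    slot_sizes: Vec<u32>,
    frame_size: u32,
}

fn prologue_epilogue(func: &mut Function, fixed_area: u32) {
    // 1. Add a fixed stack slot for the special area of the frame.
    func.slot_sizes.push(fixed_area);

    // 2. Compute the frame layout (here: just the total size, unaligned).
    func.frame_size = func.slot_sizes.iter().sum();

    // 3. Insert the prologue at the top of the entry EBB.
    if let Some(entry) = func.ebbs.first_mut() {
        entry.insert(0, Inst::Prologue);
    }

    // 4. Insert an epilogue before every return instruction.
    for ebb in &mut func.ebbs {
        let mut i = 0;
        while i < ebb.len() {
            if ebb[i] == Inst::Return {
                ebb.insert(i, Inst::Epilogue);
                i += 1; // step over the epilogue just inserted
            }
            i += 1;
        }
    }
}
```

The loop in step 4 deliberately handles more than one return, even though that case doesn't arise yet.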

Currently, the TargetIsa trait provides a default implementation of prologue_epilogue() that works correctly only for spiderwasm functions on x86-64, which rely on SpiderMonkey's prologue/epilogue code being wrapped around the code generated by Cretonne. The Intel ISA module should provide its own implementation and delegate to a function in isa/intel/abi.rs, as the related allocatable_registers() and legalize_signature() trait methods already do.

Prologues

The prologue code must:

  • Save any callee-saved registers to the stack so they are available for the register allocator to use in the function body. This includes the caller's frame pointer.
  • Set up a frame pointer for this function.
  • Adjust the stack pointer to make room for the function's stack frame.

For most ABIs, the frame pointer should point to the stack address where the previous frame pointer was saved so the frames form a linked list:

struct Frame {
    struct Frame *caller_fp;
    void *return_address;
};

See figure 3.3 in the x86-64 ABI specification.
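The linked-list property is what makes backtraces cheap. As a rough sketch, here is a walker over a simulated stack, assuming memory is modelled as a slice of 64-bit words, a frame pointer is an index into it, and a saved frame pointer of 0 marks the outermost frame. These are modelling choices for the example only; `backtrace` is a hypothetical name.

```rust
/// Walk a chain of frames in simulated stack memory and collect the return
/// addresses. At word index `fp`, `mem[fp]` is the caller's frame pointer
/// and `mem[fp + 1]` is the return address, mirroring `struct Frame`.
fn backtrace(mem: &[u64], mut fp: usize) -> Vec<u64> {
    let mut return_addresses = Vec::new();
    while fp != 0 && fp + 1 < mem.len() {
        return_addresses.push(mem[fp + 1]);
        let caller_fp = mem[fp] as usize;
        // The stack grows downward, so a caller's frame must sit at a
        // higher address; stop on a corrupt chain instead of looping.
        if caller_fp != 0 && caller_fp <= fp {
            break;
        }
        fp = caller_fp;
    }
    return_addresses
}
```

The walk needs no knowledge of any function's frame size, which is the main argument for keeping a frame pointer.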

Epilogues

The epilogues before each return instruction must:

  • Restore the stack pointer so the return instruction pops the right return address.
  • Restore the values of callee-saved registers, including the frame pointer.

Callee-saved registers

The x86-64 ABI specifies that %rbx, %rbp, and %r12 through %r15 are preserved across function calls, but the exact set depends on both the calling convention and OS. Windows has a different set, and spiderwasm functions don't save any registers.

For an initial implementation, we can save the full set of callee-saved registers. This will work correctly, but a common optimization is to only save those registers that are clobbered by the function body. This requires some collaboration with the register allocator.
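The "only save what is clobbered" optimization amounts to intersecting two sets. A minimal sketch, assuming the register allocator can report clobbered registers by name; `CallConv`, `callee_saved` and `csrs_to_save` are hypothetical names, and the Windows set is left out.

```rust
/// Calling conventions considered in this sketch (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq)]
enum CallConv {
    SystemV,
    SpiderWasm,
}

/// Integer callee-saved registers for each convention. spiderwasm
/// functions don't save any registers.
fn callee_saved(cc: CallConv) -> &'static [&'static str] {
    match cc {
        CallConv::SystemV => &["rbx", "rbp", "r12", "r13", "r14", "r15"],
        CallConv::SpiderWasm => &[],
    }
}

/// Registers the prologue must save: those that are both callee-saved under
/// `cc` and clobbered by the function body.
fn csrs_to_save(cc: CallConv, clobbered: &[&str]) -> Vec<&'static str> {
    callee_saved(cc)
        .iter()
        .copied()
        .filter(|reg| clobbered.iter().any(|c| *c == *reg))
        .collect()
}
```

Passing the full callee-saved set as `clobbered` reproduces the initial save-everything behavior.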

... more implementation notes to follow ...

@stoklund
Contributor

Example for x86-64

This is what I imagine the prologue and epilogue code would look like in a function that has 168 bytes of local variables and that uses the callee-saved %rbx and %r12 registers:

function %foo(f64 [%xmm0], i64 fp [%rbp], i64 csr [%rbx], i64 csr [%r12]) -> f64 [%xmm0], i64 fp [%rbp], i64 csr [%rbx], i64 csr [%r12] {
    ss0 = local 168, offset -200
    ss1 = incoming_arg 32, offset -32
    
ebb0(v0: f64 [%xmm0], v1: i64 [%rbp], v2: i64 [%rbx], v3: i64 [%r12]):
    x86_push v1
    copy_special %rsp -> %rbp
    x86_push v2
    x86_push v3
    adjust_sp_imm -168

    ; ... function body ...

    adjust_sp_imm 168
    [xx,%r12] v100 = x86_pop
    [xx,%rbx] v101 = x86_pop
    [xx,%rbp] v102 = x86_pop
    return v99, v102, v101, v100
}
  • The caller's frame pointer and callee-saved registers are added to the signature as incoming function arguments and as return values. This is done by the prologue_epilogue() hook and not by legalize_signature() because we want the freedom to omit a frame pointer or ignore callee-saved registers that aren't touched by the function (such as %r13 through %r15 in the example).
  • The ss0 stack slot represents the function's local variables. There will usually be many spill slots too. These stack slots are created before prologue_epilogue() is called, and prologue_epilogue() is responsible for fixing the offset of each by calling layout_stack().
  • The ss1 stack slot is created by prologue_epilogue() before calling layout_stack() in order to reserve stack space for the pushed return address and CSRs. Stack slots with type incoming_arg have offsets that are relative to the caller's stack pointer. (This is not technically an incoming argument on the stack; we reuse the stack slot type for fixed stack reservations.)
  • The new x86_push instruction is Intel-specific. Most ISAs have some kind of shortcut that can be used, but they are all different: Intel has push instructions, ARM AArch32 has store-multiple instructions, and AArch64 requires registers to be pushed in pairs to maintain a 128-bit stack pointer alignment at all times. We'll use ISA-specific instructions for these operations.
  • The new copy_special instruction is used to set up the frame pointer. It is a new ISA-independent opcode used for copying between special registers.
  • Finally, adjust_sp_imm adds an immediate constant to the stack pointer. This is also a new ISA-independent opcode. This function's frame size is 200 bytes, measured from the caller's stack pointer.
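As a sanity check on the 200-byte figure and on the push/pop ordering, the example can be run on a toy machine. This is a sketch only: `Machine` and `run_foo` are hypothetical, memory is a hash map of 8-byte cells, and the function body is replaced by two assignments that clobber %rbx and %r12.

```rust
use std::collections::HashMap;

/// Toy model of the registers touched by the example, plus stack memory.
struct Machine {
    rsp: u64,
    rbp: u64,
    rbx: u64,
    r12: u64,
    mem: HashMap<u64, u64>,
}

impl Machine {
    fn push(&mut self, value: u64) {
        self.rsp -= 8;
        self.mem.insert(self.rsp, value);
    }
    fn pop(&mut self) -> u64 {
        let value = self.mem[&self.rsp];
        self.rsp += 8;
        value
    }
    fn adjust_sp_imm(&mut self, imm: i64) {
        self.rsp = self.rsp.wrapping_add(imm as u64);
    }
}

/// Run the prologue, a stand-in body, and the epilogue of %foo.
/// Returns the lowest value %rsp reaches.
fn run_foo(m: &mut Machine) -> u64 {
    // Prologue: save FP, set up the new FP, save CSRs, allocate locals.
    let (fp, rbx, r12) = (m.rbp, m.rbx, m.r12);
    m.push(fp);
    m.rbp = m.rsp; // copy_special %rsp -> %rbp
    m.push(rbx);
    m.push(r12);
    m.adjust_sp_imm(-168);
    let lowest_sp = m.rsp;

    // Stand-in for the function body: clobber the callee-saved registers.
    m.rbx = 0xdead;
    m.r12 = 0xbeef;

    // Epilogue: free locals, then pop in reverse order of the pushes.
    m.adjust_sp_imm(168);
    m.r12 = m.pop();
    m.rbx = m.pop();
    m.rbp = m.pop();
    lowest_sp
}
```

With %rsp pointing at the return address on entry, the caller's stack pointer is 8 bytes higher, so the lowest %rsp is 8 + 3 × 8 + 168 = 200 bytes below it.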

New instructions

We need to define some new instructions for manipulating the stack pointer and the frame pointer. Some are ISA-independent instructions defined in meta/base/instructions.py:

  • copy_special. Very similar to regmove, but for manipulating reserved/special registers, not for normal SSA values. Encodings are going to be the same as for regmove.
  • adjust_sp_imm. Add an immediate value to the stack pointer. For ARM we may also need an adjust_sp version which adds an SSA value in a register, but Intel has sufficient immediate range in a single add or sub instruction.

New instructions specific to Intel ISAs are defined in meta/isa/intel/instructions.py:

  • x86_push. Pushes a single SSA value to the stack.
  • x86_pop. Pops a single SSA value from the stack.

All of these new instructions need to be marked as other_side_effects=True because they modify reserved registers like the stack pointer.
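To illustrate why a single instruction suffices on Intel, here is a sketch of how adjust_sp_imm could map to the standard x86-64 `add $imm, %rsp` encodings. The byte patterns are the architectural ones (REX.W 83 /0 ib and REX.W 81 /0 id); `encode_adjust_sp_imm` is a hypothetical name and not necessarily the encoding recipe Cretonne ends up with.

```rust
/// Encode `add $imm, %rsp` for x86-64, picking the short sign-extended
/// 8-bit form when the immediate fits and the 32-bit form otherwise.
/// Hypothetical helper for illustration.
fn encode_adjust_sp_imm(imm: i32) -> Vec<u8> {
    let mut bytes = vec![0x48]; // REX.W prefix: 64-bit operand size
    if (-128..=127).contains(&imm) {
        // Opcode 83 /0 ib, ModRM 0xC4 = mod 11, reg 000 (add), rm 100 (rsp).
        bytes.extend_from_slice(&[0x83, 0xC4, imm as i8 as u8]);
    } else {
        // Opcode 81 /0 id with a little-endian 32-bit immediate.
        bytes.extend_from_slice(&[0x81, 0xC4]);
        bytes.extend_from_slice(&imm.to_le_bytes());
    }
    bytes
}
```

For example, `adjust_sp_imm -168` does not fit in 8 bits and so takes the 7-byte form, while an adjustment of 16 takes the 4-byte form.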

Optimizations

We should start out focusing on creating correct code for prologues and epilogues. Later we can add some optimizations:

  • Initially we'll just save all CSRs in every function prologue. Later we can limit this to just the CSRs that are actually clobbered by the function body.
  • In some environments, frame pointer elimination may be desirable. This frees up the frame pointer register for the register allocator's use, but also makes it much harder to generate back-traces.
  • Leaf functions can sometimes get away with omitting parts of the prologue.

@tyler
Member

tyler commented Jan 18, 2018

Hey Jakob! I'm interested in starting to work on the optimizations you listed at the end here. If you have plans or ideas for how these should be implemented, I'm all ears.

@stoklund
Contributor

@tyler, I've commented over in #189. I'll flesh it out more later.

@stoklund
Contributor

stoklund commented Feb 1, 2018

Closing this since we have native prologues now. Leaving #189 open for the optimizations.
