This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

fix Issue 7954 - x86_64 Windows fibers do not save nonvolatile XMM registers #810

Merged 3 commits on May 27, 2014


Contributor

jblume commented May 26, 2014

https://issues.dlang.org/show_bug.cgi?id=7954

This makes context switches for fibers on Win64 a little more expensive, but there is no way around it without forcing fibers to not use SSE registers and not call any external code. The calling convention demands it: http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

I have also included unit tests to check all nonvolatile registers. The functions I have defined for that could probably be easily reused to make calling convention tests for other platforms.

movdqu [RSP + 48], XMM12;
movdqu [RSP + 32], XMM13;
movdqu [RSP + 16], XMM14;
movdqu [RSP], XMM15;

Member

If you take care of alignment you could use movdqa. On function entry the stack is 16-byte aligned.

Member

What about YMM registers for CPUs with AVX?

Contributor Author

The upper halves of the YMM registers (the part that is not aliased by the XMM registers) are, surprisingly, explicitly marked as volatile by Microsoft, so they don't need to be saved.

Regarding alignment: On Win64, the stack is guaranteed to be misaligned on function entry because of the return address on the stack. In normal functions, the initial "push rbp" balances it again. And herein lies the problem:

On entry, the context switch function expects a misaligned stack. When returning from a normal function, the caller would expect an aligned stack. But when calling a fiber the first time, we are not really returning, but jumping to the beginning of the fiber function, which - not being aware of that little trick - expects a misaligned stack.

This is only a problem the first time we switch to a fiber; from then on, the source and destination stacks both have the same alignment. I'm working on handling that special case.

…nstructions

Implemented a trampoline function to prevent differently-aligned stacks on first call to fiber entry point.
Contributor Author

jblume commented May 27, 2014

I have solved the problem of stack alignment for efficient access of XMM registers by implementing a tiny trampoline function which can be called with an aligned stack, which was not possible with a raw fiber_entryPoint.

// NOTE: When changing the layout of registers on the stack,
// make sure that the XMM registers are still aligned.
// On function entry, the stack is guaranteed to be
// misaligned by 8 bytes.

Member

The stack is 16-byte aligned at the call instruction; the 8 bytes are the return address. So "misaligned" is a little confusing here.

Contributor Author

What should I call it? "Not aligned to 16 bytes"?

Contributor Author

jblume commented May 27, 2014

Here is a small summary of the problem:

- When returning from a "call", the stack must be aligned to 16 bytes after the "return".
- When entering a function, the stack must not be aligned to 16 bytes (it is off by the 8-byte return address).
- The first time we "jmp" in the context switch, we jump to a function => must not be aligned.
- From then on, our "jmp" is a return from a "call" the fiber did => must be aligned.

That's why the first call must be handled specially and that's why the stacks had different alignments before.
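
The alignment cases in this summary can be checked with plain arithmetic on the stack pointer. The following is only a sketch in C that models RSP as an integer; the helper names (is_aligned16, after_call, after_ret, check_alignment_model) and the sample RSP value are hypothetical and are not part of druntime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: they only model RSP arithmetic, not real druntime code. */

/* True when the modelled stack pointer is a multiple of 16. */
static bool is_aligned16(uint64_t rsp) { return (rsp & 15u) == 0; }

/* "call" pushes an 8-byte return address, so RSP drops by 8. */
static uint64_t after_call(uint64_t rsp) { return rsp - 8; }

/* "ret" pops the return address, so RSP rises by 8. */
static uint64_t after_ret(uint64_t rsp) { return rsp + 8; }

/* Walks one call/return pair and asserts the alignment at each point. */
static void check_alignment_model(void)
{
    uint64_t caller_rsp = 0x7ff0;                 /* assumed sample value, 16-byte aligned */
    uint64_t entry_rsp  = after_call(caller_rsp); /* what the callee sees on entry */
    uint64_t back_rsp   = after_ret(entry_rsp);   /* what the caller sees after the return */

    assert(is_aligned16(caller_rsp));  /* at the call instruction: aligned */
    assert(!is_aligned16(entry_rsp));  /* on function entry: not aligned */
    assert(entry_rsp % 16 == 8);       /* off by exactly the return address */
    assert(is_aligned16(back_rsp));    /* after the return: aligned again */
}
```

A "jmp" does not change RSP in this model, which is why jumping straight to a function entry with an aligned stack leaves the callee with an alignment it does not expect.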

Contributor Author

jblume commented May 27, 2014

An alternative solution to this problem is to make fiber_entryPoint naked, so it can be called with an aligned stack. The trampoline would be unnecessary then, but this would probably need changes in all other context switch functions (unless that change is versioned to Win64). Having a conditional in the Win64 context switch is also an option, but would be a bad idea performance-wise.

{
naked;
sub RSP, 32; // Reserve shadow space for callee
call R12; // R12 is set to fiber_entryPoint

Member

You could call fiber_entryPoint directly here.

Contributor Author

Yeah, but it's a "call", not a "jmp". It pushes the address on the stack, bringing it out of alignment.

Member

I mean call fiber_entryPoint rather than indirect through R12.

Contributor Author

I wasn't sure about the side effects of doing that. I don't know much about how function references and closures are handled internally in D yet, so I was worried I might create a GC problem or something like that. If you say it's safe to do, I will change it.

Member

Yeah it's just a plain symbol reference that the linker will resolve.

@MartinNowak
Member

OK, got it. It appears that this should be a problem on other platforms too.

Contributor Author

jblume commented May 27, 2014

Yeah, but only if their calling convention demands something nasty like this. I don't think the XMM registers are nonvolatile on Posix.

@MartinNowak
Member

> An alternative solution to this problem is to make fiber_entryPoint naked, so it can be called with an aligned stack.

This is what you just did with the trampoline ;).

asm
{
naked;
sub RSP, 32; // Reserve shadow space for callee

Member

What's the shadow space for?

Contributor Author

It's part of the Win64 calling convention. It's normally there to let the callee spill the register arguments from RCX, RDX, R8 and R9 to the stack, but the callee may use it for anything. This space must be provided even if the function doesn't take any arguments. It was a bug that this was missing before.

You can see it here:
http://msdn.microsoft.com/en-us/library/ew5tede7.aspx
On that page it isn't named, but in other parts of the documentation it is called "shadow space".
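
As a rough illustration of that rule, the size of the area a caller reserves before a "call" can be computed as below. This is a sketch in C under two assumptions: every argument occupies one 8-byte slot, and the caller's RSP is 16-byte aligned before the reservation. The names win64_call_area_bytes and check_call_area are hypothetical, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT_BYTES = 8, REGISTER_ARG_SLOTS = 4 }; /* slots for RCX, RDX, R8, R9 */

/* Bytes reserved below an aligned RSP before a "call" (hypothetical helper):
   at least four slots (the shadow space), one slot per argument beyond that,
   rounded up to a multiple of 16 so RSP stays aligned at the call. */
static size_t win64_call_area_bytes(size_t arg_count)
{
    size_t slots = arg_count < REGISTER_ARG_SLOTS ? REGISTER_ARG_SLOTS : arg_count;
    size_t bytes = slots * SLOT_BYTES;
    return (bytes + 15u) & ~(size_t)15u;
}

/* Sanity checks for a few argument counts. */
static void check_call_area(void)
{
    assert(win64_call_area_bytes(0) == 32); /* shadow space is required even without arguments */
    assert(win64_call_area_bytes(4) == 32); /* four register arguments still need 32 bytes */
    assert(win64_call_area_bytes(5) == 48); /* 40 bytes rounded up to keep 16-byte alignment */
}
```

For a callee without arguments this gives 32, which matches the "sub RSP, 32" in the trampoline.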

@MartinNowak
Member

Auto-merge toggled on

MartinNowak added a commit that referenced this pull request May 27, 2014
fix Issue 7954 - x86_64 Windows fibers do not save nonvolatile XMM registers
MartinNowak merged commit 4189f4d into dlang:master May 27, 2014
jblume deleted the fibers-xmm branch May 28, 2014 15:26