fix Issue 7954 - x86_64 Windows fibers do not save nonvolatile XMM registers #810
movdqu [RSP + 48], XMM12;
movdqu [RSP + 32], XMM13;
movdqu [RSP + 16], XMM14;
movdqu [RSP], XMM15;
If you take care of alignment you could use movdqa. On function entry the stack is 16-byte aligned.
What about YMM registers for CPUs with AVX?
The upper part of the YMM registers (the bits not covered by the XMM registers) is, surprisingly, explicitly marked as volatile by Microsoft, so it doesn't need to be saved.
Regarding alignment: on Win64, the stack is guaranteed to be misaligned (RSP mod 16 == 8) on function entry because of the return address pushed by the call. In normal functions, the initial "push rbp" realigns it. And herein lies the problem:
On entry, the context switch function expects a misaligned stack. When it returns like a normal function, the caller expects an aligned stack. But when switching to a fiber for the first time, we are not really returning; we are jumping to the beginning of the fiber function, which, not being aware of that little trick, expects a misaligned stack.
This is only a problem the first time we switch to a fiber, from then on the source and destination stack both have the same alignment. I'm working on handling that special case.
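A toy model in Python makes the mismatch concrete. It is an assumption-laden sketch, not the druntime code: it tracks only RSP mod 16, assumes the Win64 rule that RSP is 16-byte aligned immediately before every `call`, and the helper names are hypothetical.

```python
# Toy model: only RSP mod 16 is tracked. Assumes the Win64 rule that RSP is
# 16-byte aligned immediately before every `call`. Names are hypothetical.
RET_ADDR = 8  # bytes pushed by `call` and popped by `ret`

def entry_alignment(rsp_before_call: int) -> int:
    """RSP mod 16 on a callee's first instruction."""
    return (rsp_before_call - RET_ADDR) % 16

def required_before_ret(target_expects: int) -> int:
    """RSP mod 16 the switch routine needs just before its final `ret`
    so that the code it lands in sees `target_expects`."""
    return (target_expects - RET_ADDR) % 16  # `ret` adds RET_ADDR to RSP

# A normal callee, e.g. the context switch itself, starts at 8 mod 16.
assert entry_alignment(0x7FF0) == 8

# Resuming a suspended fiber lands back in its caller, which expects 0 mod 16.
resume = required_before_ret(0)
# First switch: `ret` lands on fiber_entryPoint's first instruction,
# which expects 8 mod 16, like any freshly called function.
first = required_before_ret(8)
print(resume, first)  # prints "8 0": the two destination stacks differ by 8 bytes
```

The 8-byte difference is the special case described above: a freshly prepared fiber stack has to be laid out differently from a suspended one unless something evens it out.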
…nstructions
Implemented a trampoline function to prevent differently aligned stacks on the first call to the fiber entry point.
I have solved the stack alignment problem that stood in the way of efficient (aligned) XMM register access by implementing a tiny trampoline function that can be called with an aligned stack; that was not possible with a raw fiber_entryPoint.
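Continuing the same kind of toy model (Python, hypothetical names, only RSP arithmetic), the trampoline works because it is entered the same way as a resumed caller, with an aligned RSP, and then performs a real `call`, which produces exactly the entry alignment a normal function expects.

```python
# Toy model of the trampoline's effect on RSP. Assumes the trampoline is
# entered via `ret` exactly like a resumed caller (RSP 16-byte aligned).
SHADOW_SPACE = 32  # sub RSP, 32
RET_ADDR = 8       # pushed by `call`

def entry_point_rsp(rsp_in_trampoline: int) -> int:
    """RSP seen on fiber_entryPoint's first instruction."""
    rsp = rsp_in_trampoline - SHADOW_SPACE  # reserve shadow space for callee
    rsp -= RET_ADDR                         # call fiber_entryPoint
    return rsp

rsp = 0x10000                     # aligned, as after any normal `ret`
assert rsp % 16 == 0
print(entry_point_rsp(rsp) % 16)  # prints 8: what a normally called function expects
```

Because the trampoline looks like an ordinary return target, the context switch can assume one alignment for every destination stack.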
// NOTE: When changing the layout of registers on the stack,
// make sure that the XMM registers are still aligned.
// On function entry, the stack is guaranteed to be
// misaligned by 8 bytes.
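The layout constraint in that comment can be checked mechanically. The sketch below is a hypothetical Python generator, not part of the patch: it assumes XMM6 through XMM15 are stored at descending 16-byte slots as in the diff fragment (XMM15 at [RSP], XMM12 at [RSP + 48]), extends that pattern to XMM6..XMM11 by assumption, and only picks movdqa when every slot is known to be aligned, since movdqa faults on a misaligned memory operand.

```python
NONVOLATILE_XMM = range(6, 16)  # XMM6..XMM15 must be preserved on Win64
SLOT = 16                       # bytes per XMM register

def xmm_offsets() -> dict:
    """Offset of each saved register from RSP. Matches the diff fragment
    (XMM15 at [RSP], XMM12 at [RSP + 48]); the offsets for XMM6..XMM11 are
    an assumption that extends the same pattern."""
    return {n: (15 - n) * SLOT for n in NONVOLATILE_XMM}

def save_sequence(rsp_mod16: int) -> list:
    offsets = xmm_offsets()
    # movdqa faults on a misaligned address, so only use it when every slot
    # is guaranteed to land on a 16-byte boundary.
    aligned = rsp_mod16 == 0 and all(off % 16 == 0 for off in offsets.values())
    op = "movdqa" if aligned else "movdqu"
    lines = []
    for n in NONVOLATILE_XMM:
        addr = f"[RSP + {offsets[n]}]" if offsets[n] else "[RSP]"
        lines.append(f"{op} {addr}, XMM{n};")
    return lines

print("\n".join(save_sequence(0)))
print(len(NONVOLATILE_XMM) * SLOT, "bytes")  # prints "160 bytes"
```

The 160-byte save area is also the extra memory traffic each Win64 context switch now pays.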
The stack is 16-byte aligned at the call; the extra 8 bytes are the return address. So "misaligned" is a little confusing here.
What should I call it? "Not aligned to 16 bytes"?
Here is a small summary of the problem: when returning from a "call", the stack must be 16-byte aligned after the "ret". That's why the first call must be handled specially, and that's why the stacks had different alignments before.
An alternative solution to this problem is to make fiber_entryPoint naked, so it can be called with an aligned stack. The trampoline would then be unnecessary, but this would probably need changes in all other context switch functions (unless that change is versioned to Win64). Having a conditional in the Win64 context switch is also an option, but would be a bad idea performance-wise.
{
    naked;
    sub RSP, 32;    // Reserve shadow space for callee
    call R12;       // R12 is set to fiber_entryPoint
You could call fiber_entryPoint directly here.
Yeah, but it's a "call", not a "jmp". It pushes the return address onto the stack, bringing it out of alignment.
I mean call fiber_entryPoint rather than indirect through R12.
I wasn't sure about the side effects of doing that. I don't know much about how function references and closures are handled internally in D yet, so I was worried I might create a GC problem or something like that. If you say it's safe to do, I will change it.
Yeah it's just a plain symbol reference that the linker will resolve.
OK, got it. It appears that this should be a problem on other platforms too.
Yeah, but only if their calling convention demands something nasty like this. I don't think the XMM registers are nonvolatile on POSIX.
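For comparison, the two conventions' preserved-register sets can be diffed directly. This sketch covers general-purpose and XMM registers only (it ignores MXCSR and x87 control state) and is based on the Microsoft x64 calling convention and the System V AMD64 ABI, under which every XMM register is caller-saved.

```python
# Callee-saved ("nonvolatile") registers, GPRs and XMMs only.
WIN64_NONVOLATILE = {"RBX", "RBP", "RDI", "RSI", "RSP",
                     "R12", "R13", "R14", "R15"} | {f"XMM{n}" for n in range(6, 16)}
SYSV_CALLEE_SAVED = {"RBX", "RBP", "RSP", "R12", "R13", "R14", "R15"}

# What a Win64 context switch must save beyond a System V one.
extra = WIN64_NONVOLATILE - SYSV_CALLEE_SAVED
print(sorted(extra, key=lambda r: (r.startswith("XMM"), len(r), r)))
```

The result is RDI, RSI and XMM6..XMM15, which is why the XMM handling in this PR is specific to Win64.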
This is what you just did with the trampoline ;).
asm
{
    naked;
    sub RSP, 32;    // Reserve shadow space for callee
What's the shadow space for?
It's part of the Win64 calling convention. It's normally there to let the callee spill the arguments passed in RCX, RDX, R8 and R9, but the callee may use it for anything. The caller must provide this space even if the function doesn't take any arguments. It was a bug that this was missing before.
You can see it here:
http://msdn.microsoft.com/en-us/library/ew5tede7.aspx
On that site it isn't named, but in other parts of the documentation, it is called "shadow space".
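A small sketch of the shadow-space arithmetic, assuming the layout from the Microsoft x64 documentation: the caller reserves 8 bytes for each of the four register arguments, and at the callee's first instruction the return address sits at [RSP], so the home slots start at RSP + 8. The names are illustrative only.

```python
ARG_REGS = ("RCX", "RDX", "R8", "R9")  # first four integer/pointer arguments
SHADOW_SPACE = 8 * len(ARG_REGS)       # always reserved by the caller: 32 bytes

def home_slot(reg: str) -> int:
    """Offset from RSP, at the callee's first instruction, where the callee
    may spill `reg`. [RSP] itself holds the return address."""
    return 8 + 8 * ARG_REGS.index(reg)

for reg in ARG_REGS:
    print(f"{reg} -> [RSP + {home_slot(reg)}]")

# Reserving 32 bytes keeps a 16-byte aligned RSP aligned, so `sub RSP, 32`
# before `call` does not disturb the alignment rule.
assert SHADOW_SPACE % 16 == 0
```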
fix Issue 7954 - x86_64 Windows fibers do not save nonvolatile XMM registers
https://issues.dlang.org/show_bug.cgi?id=7954
This makes fiber context switches on Win64 a little more expensive, but there is no way around it short of forbidding fibers from using SSE registers or calling any external code. The calling convention demands it: http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
I have also included unit tests to check all nonvolatile registers. The functions I have defined for that could probably be easily reused to make calling convention tests for other platforms.
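The idea behind such tests can be sketched as a toy model (Python, hypothetical names; the real tests are D code running actual fibers): fill every register with a distinct sentinel, switch away and back, and report any nonvolatile register whose value changed. Here the "switch" is simulated by functions that clobber registers the way the old and the fixed context switch would allow.

```python
GPRS = ["RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "RBP", "RSP"] + [f"R{n}" for n in range(8, 16)]
XMMS = [f"XMM{n}" for n in range(16)]
NONVOLATILE = ["RBX", "RBP", "RDI", "RSI", "RSP", "R12", "R13", "R14", "R15"] + XMMS[6:]

def sentinel_state() -> dict:
    """Give every register a distinct, nonzero sentinel value."""
    return {name: 0xA5A50000 + i for i, name in enumerate(GPRS + XMMS)}

def clobbered_nonvolatiles(before: dict, after: dict) -> list:
    return [r for r in NONVOLATILE if before[r] != after[r]]

def switch_saving_gprs_only(regs: dict) -> dict:
    # Hypothetical stand-in for the old behaviour: nonvolatile GPRs are
    # restored, but the other fiber's use of XMM registers leaks through.
    after = dict(regs)
    for x in XMMS:
        after[x] = 0
    return after

def switch_saving_all(regs: dict) -> dict:
    # Hypothetical stand-in for the fixed behaviour: only the volatile
    # XMM0..XMM5 may change across the switch.
    after = dict(regs)
    for x in XMMS[:6]:
        after[x] = 0
    return after

before = sentinel_state()
print(clobbered_nonvolatiles(before, switch_saving_gprs_only(before)))
print(clobbered_nonvolatiles(before, switch_saving_all(before)))  # prints []
```

Keeping the register list in one table is what would make such a check reusable for other platforms' conventions.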