Clarify calling conventions for profiler Enter callback #19023

Open
noahfalk opened this Issue Jul 19, 2018 · 5 comments

Member

noahfalk commented Jul 19, 2018

@dotnet/jit-contrib @sywhang

While investigating #18977 I'm seeing a number of things that look inconsistent and probably need to be fixed or better documented. Jit folks, can you let me know what you think?

  1. The FunctionEnter3/FunctionLeave3/FunctionTailcall3 methods are publicly exposed and have a documented ABI. On Linux x64 we pass FunctionIDOrClientID in R14, but the MSDN documentation doesn't mention a custom calling convention, so developers would expect RDI. I believe we picked R14 for good reason, so I propose we change MSDN to match.
  2. The runtime sometimes provides the implementation of the ProfileEnter call as an intermediary between the jitted code and other forms of the profiler callback. On Linux x64 that gives us 4 non-agreeing definitions of the register preservation requirements:

I don't have a good sense of exactly what the JIT expects to be preserved across this call for the code to run correctly, but whatever it is I'd like to bring our own comments, implementation, and MSDN docs into alignment with it. I suspect there may be discrepancies for the register preservation requirements on other architectures, but I'm happy to start with Linux x64.
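To make point 1 concrete, here is a minimal sketch of why a profiler author would expect RDI. The types are stand-ins redeclared locally so the snippet is self-contained (the real declarations live in corprof.h), and `MyFunctionEnter3`, `last_seen_id`, and `enter_hook_demo` are hypothetical names used only for illustration. A hook written as an ordinary C function is compiled against the System V AMD64 convention, so it reads its single argument from RDI, not from R14.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the profiling API types, assumed to mirror corprof.h. */
typedef uintptr_t FunctionID;
typedef union {
    FunctionID functionID;
    uintptr_t  clientID;
} FunctionIDOrClientID;

/* Hypothetical bookkeeping so the effect of the hook can be observed. */
static uintptr_t last_seen_id;

/* A hook written as a plain C function. On Linux x64 the compiler
   generates code that takes `id` from RDI (the first integer argument
   register), which is what the MSDN text implicitly suggests. */
static void MyFunctionEnter3(FunctionIDOrClientID id)
{
    last_seen_id = id.clientID;
}

/* Calling the hook from C works because the caller also uses RDI.
   Jitted code that passes the value in R14 would not match this. */
static void enter_hook_demo(void)
{
    FunctionIDOrClientID id;
    id.clientID = 42;
    MyFunctionEnter3(id);
    assert(last_seen_id == 42);
}
```

If the value really arrives in R14, a profiler would presumably need a small assembly stub that moves R14 into RDI before calling a C function like the one above.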

Thanks!
-Noah

BruceForstall Aug 8, 2018

Contributor

Note that the document the JIT depends on most for ABI-related questions is the "CLR ABI". It has a section on the profiler hooks: https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#profiler-hooks. It could certainly be expanded to be clearer and to answer more questions like the ones you have here.

In the JIT, the most interesting parts of the implementation are genProfilingEnterCallback and genProfilingLeaveCallback.

Generally, documentation probably was originally written for x86 -- the first architecture -- and not updated very much to handle the other architectures (x64, Linux x64, arm32, arm64, Linux x86).

It looks to me that for Linux x64:

  1. for the enter hook we pass R14 = ProfilerMethodHnd (I guess this is FunctionIDOrClientID?), R15 = caller's SP. (For Windows x64, it's the normal first 2 argument registers, RCX/RDX). It looks like we don't document the 2nd argument? Or maybe that's what FunctionEnter3WithInfo (and friends) are, and the JIT just always generates the same code.

There is no documentation in the code or in the "CLR ABI" to explain why R14/R15 were picked. Presumably it is because, unlike on Windows x64, there is no caller-provided "home" space for the argument registers, so we don't want to trash the incoming registers. On Windows, we first home all the register arguments, and then we can trash them.

Regarding register preservation:

  • On Windows x64, I believe any volatile integer register can be trashed. Callee-saved must be preserved. This is the usual function calling convention.
  • On Linux x64, all argument and callee-saved registers must be preserved. (Note there are no callee-saved floating-point registers.)
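The two bullets above can be written down as a small lookup, which may be a useful starting point for the comments and docs. This is only a sketch of the rule as stated in this comment, not something verified against the JIT: it covers integer registers only, leaves out RSP, and the names `must_preserve`, `in_set`, and the register tables are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum platform { WINDOWS_X64, LINUX_X64 };

/* Callee-saved integer registers in the Windows x64 convention. */
static const char *const win_callee_saved[] =
    { "rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15", NULL };

/* Callee-saved integer registers in the System V AMD64 convention. */
static const char *const sysv_callee_saved[] =
    { "rbx", "rbp", "r12", "r13", "r14", "r15", NULL };

/* Integer argument registers in the System V AMD64 convention. */
static const char *const sysv_int_args[] =
    { "rdi", "rsi", "rdx", "rcx", "r8", "r9", NULL };

/* Returns 1 if `reg` appears in the NULL-terminated table `set`. */
static int in_set(const char *const *set, const char *reg)
{
    for (; *set != NULL; ++set) {
        if (strcmp(*set, reg) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if the enter hook must leave `reg` intact, following the
   rule proposed above: Windows x64 preserves callee-saved registers
   only; Linux x64 preserves callee-saved AND argument registers. */
static int must_preserve(enum platform p, const char *reg)
{
    assert(reg != NULL);
    if (p == WINDOWS_X64) {
        return in_set(win_callee_saved, reg);
    }
    return in_set(sysv_callee_saved, reg) || in_set(sysv_int_args, reg);
}
```

Under this reading, the only integer registers a Linux x64 enter hook could freely trash are RAX, R10, and R11, which is a much smaller set than on Windows.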

The asmhelper.S comment that says rax/rdx/xmm0/xmm1 need to be preserved should, I believe, only apply to the "leave" helper, which needs to preserve the function return value.

These statements should really be backed up by testing! And extended to other platforms.

noahfalk Aug 9, 2018

Member

Thanks for looking into this Bruce! I agree on the testing. My thinking here is we could write a trivial profiler that registers ELT callbacks in order to deliberately trash every register we believe we can. If we can have this profiler loaded and pass all the CoreCLR tests then it would be good evidence the analysis was accurate.

> for the enter hook we pass R14 = ProfilerMethodHnd (I guess this is FunctionIDOrClientID?), R15 = caller's SP. (For Windows x64, it's the normal first 2 argument registers, RCX/RDX). It looks like we don't document the 2nd argument?

That is intentional. The public contract covers only the 1st argument. The second argument is a private contract between the JIT and the runtime so that the runtime can implement FunctionEnter3WithInfo.

noahfalk Aug 20, 2018

Member

@BruceForstall - I've been looking at this a bit more and it raised a few (hopefully quick) additional questions:

  1. Are there any scenarios where the JIT needs the upper 64 bits of the XMM arguments preserved? As far as I know the largest floating point type that could be passed as an argument is 8 bytes, and the profiler is only designed to expose 8 byte arguments. I am guessing save/restore on the low 8 bytes is sufficient.
  2. All the callbacks currently preserve 16 bytes for XMM0/XMM1 return values. I wasn't planning to change this for the Leave/Tailcall functions, but I was curious whether you know if we use larger return values.

BruceForstall Aug 20, 2018

Contributor

The questions are specific to x64, I believe.

We don't support __vectorcall convention, so:

  1. only the low 64 bits of XMM arguments need be preserved.
  2. I believe we also only support 64-bit return values in XMM0. For Linux/x64, it's a little more complicated: XMM0 and XMM1 can return two members of a struct of two doubles. I can't recall what happens for a struct of 2 floats in this case.

Maybe @CarolEidt can comment to verify.

CarolEidt Aug 21, 2018

Member

@BruceForstall is right about the handling of the upper bits of XMM arguments, though for anything that's not classified as a call, we expect them to be preserved.

On Linux/x64, I believe it's the case that a struct of 2 floats would be returned in XMM0, but a struct of 2 doubles or 3 or 4 floats would be returned in XMM0 and XMM1.

There's no support for using more than 2 registers for returns.
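The Linux/x64 cases described above follow from how the System V AMD64 ABI splits a struct into 8-byte chunks ("eightbytes"). A rough sketch of that counting, for structs made up only of floats or only of doubles, might look like the following. The function name `xmm_return_registers` is hypothetical, and this models the ABI rule rather than the JIT's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* For a struct holding `count` fields of `elem_size` bytes each
   (4 = float, 8 = double), returns how many XMM registers carry the
   return value under the System V AMD64 rules: 1 means XMM0 only,
   2 means XMM0 and XMM1, and 0 means the struct is not returned in
   registers (it is larger than two eightbytes, or empty). */
static int xmm_return_registers(size_t count, size_t elem_size)
{
    assert(elem_size == 4 || elem_size == 8);

    size_t total_bytes = count * elem_size;
    size_t eightbytes  = (total_bytes + 7) / 8;  /* round up to 8-byte chunks */

    if (count == 0 || eightbytes > 2) {
        return 0;
    }
    return (int)eightbytes;
}
```

For example, `xmm_return_registers(2, 4)` gives 1 (two floats share XMM0), while `xmm_return_registers(2, 8)`, `xmm_return_registers(3, 4)`, and `xmm_return_registers(4, 4)` all give 2. Anything needing a third register falls out as 0, matching the note that no more than 2 registers are used for returns.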
