-
Notifications
You must be signed in to change notification settings - Fork 2.7k
i386: Fixed definition with declaration in eetoprofinterfaceimpl.cpp #18792
Conversation
src/vm/i386/unixstubs.cpp
Outdated
@@ -6,17 +6,17 @@ | |||
|
|||
extern "C" | |||
{ | |||
void ProfileEnterNaked(FunctionIDOrClientID functionIDOrClientID) | |||
void __stdcall ProfileEnterNaked(FunctionIDOrClientID functionIDOrClientID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the JIT on Linux x86 going to call these with __stdcall calling convention? I thought we have converted everything to cdecl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one appears documented as though it was __stdcall: https://github.com/dotnet/coreclr/blob/master/src/jit/codegencommon.cpp#L6723
and from my admitedly limited understanding of the JIT code that is how its currently implemented.
@dotnet/jit-contrib - does that sound right?
Assuming it is, LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have in the file src/vm/eetoprofinterfaceimpl.cc
// Declarations for asm wrappers of profiler callbacks
EXTERN_C void __stdcall ProfileEnterNaked(FunctionIDOrClientID functionIDOrClientID);
EXTERN_C void __stdcall ProfileLeaveNaked(FunctionIDOrClientID functionIDOrClientID);
EXTERN_C void __stdcall ProfileTailcallNaked(FunctionIDOrClientID functionIDOrClientID);
so, when we try to do some profiling work (for example memory usage etc.) in the optimized version of coreclr we get inconsistent stack after calling one of these stubs on x86. We don't need now assembler variants of these stubs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to fix this in the JIT to use cdecl, so that we are using cdecl across the board on Linux x86? Or is there a good reason why this should be an outlier on Linux x86?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas Do you mean that I need to add #ifdef x86 && linux
with cdecl
declarations in the file src/vm/eetoprofinterfaceimpl.cc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete the stdcall
there and let it use the default (ie stdcall on Windows and cdecl on Linux). It is what we was done in number of places during Linux x86 bring up.
The key place to get in sync is the JIT though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what the rationale was to use cdecl on Linux so I don't know what advantage we'd get by switching it (other than the knowledge we are consistent). From the standpoint of codesize, changing this to cdecl would require adding a pop instruction to the prologue of every managed method so probably a net performance loss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a long discussion and stream of fixes about this during the Linux x86 bring up. Here are a few related PRs:
The summary is:
- ESP is 16-byte aligned on Linux x86. It makes use of stdcall awkward and inefficient. For example, when we were still using stdcall, the JIT ended up generating code like this:
sub esp, 12 // necessary to maintain stack alignment
push <argument>
call method
add esp, 12
- cdecl is the native calling convention on Linux x86. We have found that stdcall is handled poorly throughout the system. For example, unwinders do not understand it, the C/C++ compiler does not generate good code for it - it calls the method as cdecl and then compensates for callee poped arguments like this:
call method_with_stdcall_convention
sub esp, <size of arguments>
It is true that stdcall code w/o the stack alignment requirement can be smaller. However, stdcall is one of the reasons why x86 is different from most other platforms out there. I guess that the Linux folks decided long time ago that it is not worth the on-going pain, it is better to take a small hit and just use cdecl for simplicity.
We have not optimized the JIT for cdecl. The focus of Linux x86 port has been functionality, not performance so far. The JIT does not generate as good code as it can on Linux x86. https://github.com/dotnet/coreclr/issues/10012 is the key part of that.
cc @noahfalk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the JIT need adjustment as well?
@jkotas We've checked this fix with our profiler on x86 device emulator for tizen. It works well now. |
It may work by accident. We should check that he JIT generates correct code with the right calling convention and stack alignment. Could you please share a Linux x86 disassembly for a simple managed method with these call backs? |
Thanks for all the info Jan!
Do you mean libunwind simply fails to unwind from a __stdcall callee to caller in all circumstances (and thus lldb/gdb fail to create stacktraces) or something less dire? If you aren't sure we can do some testing so no worries. The existing API on windows x86 expects the callee to preserve all registers, including the registers __stdcall/__cdecl would normally be allowed to trash. We expect that a profiler writer will need to hand author assembly code to achieve this. There are some older and simpler ELT variants for profiler writers that care more about ease-of-use than performance, so if the profiler author is using ELT3 callbacks they probably value not having an inefficient ABI. If we aren't ruining the ability to debug/profile inside these callbacks I remain interested to declare the ABI is callee popped register, non-16 byte aligned. On the other hand if we are ruining the ability to debug it then I'd agree with you: caller popped, 16 byte aligned stack.
Makes sense. This ABI is part of a public contract with profilers so once set I don't expect we'd ever change it. If we conclude we don't have time to implement the right long term solution I think the short term move would be to leave the feature disabled. The __stdcall-like API appears to be what we've got now in the JIT so implementation-wise its easy whereas switching to the __cdecl-like convention would take some work. |
src/vm/eetoprofinterfaceimpl.cpp
Outdated
EXTERN_C void __stdcall ProfileTailcallNaked(FunctionIDOrClientID functionIDOrClientID); | ||
EXTERN_C void ProfileEnterNaked(FunctionIDOrClientID functionIDOrClientID); | ||
EXTERN_C void ProfileLeaveNaked(FunctionIDOrClientID functionIDOrClientID); | ||
EXTERN_C void ProfileTailcallNaked(FunctionIDOrClientID functionIDOrClientID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did these just become __cdecl on windows x86 now? Sorry I don't recall what the windows compiler emits when you aren't specific. (on windows they must be __stdcall for correctness with existing profilers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change has no impact on Windows. We compile with stdcall as the default convention on Windows. Many places depend on this default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on adding a comment above these declarations that the default calling convention is used to dictate these? Something along the lines of:
// The calling convention should not be set explicitly for these callbacks. The calling convention is defined implicitly by the default set during compilation (i.e. Windows => stdcall).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to make sure we've reached an agreement on what the ABI should be before this goes ahead. If there is any short term need that I am holding up do let me know - I don't want to be needlessly slowing things down)
It depends on the shape of the callsite:
Note that this is more efficient only if the callbacks are 100% assembly. If these callbacks ever call into C/C++ code, they have to re-align the stack at 16-bytes. Re-aligning mistaligned stack is more severe efficiency hit than having the stack aligned by the ABI. |
@jkotas Sorry, I was wrong. It does not work, fails with SigSegv after executing the first call of there are some fragments of disassembled code
may be this is wrong
|
OK then I'm on board for __cdecl-style calling convention. Sounds like we'll need to work in the JIT, in the asm stubs, and in the documentation to get this working E2E. Thanks! |
The callbacks callsites in the JIT should look like this:
@sergign60 Could you please fix the JIT accordingly? |
@jkotas To be honest, I don't quite understand your code fragment. Could you please add some explanation? Should I delete this instruction before your code?
Thanks in advance |
I have added a few comments.
This instruction belongs to the previous call it should stay there. The code emitted for Linux x86 currently is not very efficient. It follows the right calling convention, but it has a lot of unnecessary |
@jkotas Thanks! I've found in codegenxarch.cpp |
Yes, it is the idea. The prolog/epilog that the profiler callbacks are part of are emitted directly as instructions. You may just emit |
@jkotas please review. Unfortunately it's still not working. I'm trying to understand, why |
The change looks good to me. I do not see anything obviously wrong in it. |
@jkotas it fails now in
because of |
Try adjusting |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@jkotas Just now I've found two places, where
Could you help me with this question:
Thanks in advance! |
Would it work to use |
@jkotas As I see
|
@jkotas @Dmitri-Botcharnikov @alpencolt clang5.0 emits the following error on x86
I guess that it's because we have in src/pal/prebuilt/inc/coreprof.h
|
You can change these |
Will close & reopen to pick up current CI definitions |
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test please |
@BruceForstall You're absolutely right. Very much for your help! |
This looks wrong. Why did you use |
@BruceForstall I saw it in
https://github.com/dotnet/coreclr/blob/master/src/jit/codegencommon.cpp#L2446 your method isn't working and gives just this assert (when I use With my fix tests run successfully |
@sergign60 Check out #19700. I tweaked your change to include my suggestion, and it seems to work for me now. (I can't see how to get GitHub to just show me the difference between your last change and mine.) Note that I found that the assert you were encountering was far too generous for Linux/x86 due to misunderstanding between count of bytes and count of ints, so I corrected that as well. It's possible that will lead to new asserts that would need to be investigated. |
@BruceForstall I've checked your variant #19700 with |
@BruceForstall I've added your changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if all the tests pass
src/inc/corprof.idl
Outdated
@@ -125,6 +125,12 @@ import "wtypes.idl"; | |||
import "unknwn.idl"; | |||
#endif | |||
|
|||
#ifdef PLATFORM_UNIX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas Does this section need to be under defined(_TARGET_X86_) && defined(PLATFORM_UNIX)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be just:
#define STDMETHODCALLTYPE
without any ifdefs to make MIDL compiler happy. Defines in .idl
files work in weird way.
@@ -438,50 +438,50 @@ typedef struct _COR_PRF_METHOD | |||
mdMethodDef methodId; | |||
} COR_PRF_METHOD; | |||
|
|||
typedef void __stdcall __stdcall FunctionEnter( | |||
typedef void STDMETHODCALLTYPE STDMETHODCALLTYPE FunctionEnter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MIDL compiler is duplicating this for some reason. There is nothing to fix.
385b8ff
to
5eea3c9
Compare
@BruceForstall @jkotas is it ok? |
Thanks |
cc: @janvorli @jkotas @Dmitri-Botcharnikov @alpencolt
Please review