JIT: Support storing Swift error register value to SwiftError struct #98586
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue Details: Adds JIT support for storing the error result of a Swift call to the provided SwiftError struct.

cc @jakobbotsch @kotlarmilos @jkoritzinsky
The Swift error handling tests are passing locally for me on win-x64. Here's a disassembly of the call to

@jakobbotsch is there a way to run the arm64_x64 JIT as an altjit, and get it to print its disasm on x64? I recall seeing you do something like this in a demo...
src/coreclr/jit/codegenarmarch.cpp (Outdated)

    @@ -441,6 +441,12 @@ void CodeGen::genCodeForTreeNode(GenTree* treeNode)
                break;
    #endif // TARGET_ARM64

    #ifdef SWIFT_SUPPORT
            case GT_SWIFT_ERROR:
                treeNode->SetRegNum(REG_SWIFT_ERROR);
This should be implemented similarly to genCodeForPhysReg (in particular, genProduceReg needs to be called).
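For reference, a genCodeForPhysReg-style handler for this node might look roughly like the following pseudocode sketch (illustrative only; helper names mirror the existing codegen utilities, and the final code may differ):

```cpp
case GT_SWIFT_ERROR:
{
    // Copy the error register's value into whatever register LSRA
    // assigned to this node; canSkip elides the move when they match.
    const regNumber targetReg = treeNode->GetRegNum();
    inst_Mov(treeNode->TypeGet(), targetReg, REG_SWIFT_ERROR, /* canSkip */ true);
    // genProduceReg must be called so consumers see the def.
    genProduceReg(treeNode);
    break;
}
```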
Not wholly relevant, but is there a reason why the platform-agnostic implementation of genCodeForPhysReg is in codegenxarch.cpp?
I don't think so; seems like it could be in codegencommon.cpp or codegenlinear.cpp instead.
I can move it in this PR if you think it makes sense to.

Edit: Sorry, never mind; my IDE's search function wasn't showing me the ARM-specific implementation. All the implementations seem to be in the right place.
Well, the two implementations are identical, so I still think they could be unified in a common file. But probably best to leave it alone since RISCV64/LA64 might break from such a refactoring.
src/coreclr/jit/codegenxarch.cpp (Outdated)

    @@ -2107,6 +2107,12 @@ void CodeGen::genCodeForTreeNode(GenTree* treeNode)
            case GT_NOP:
                break;

    #ifdef SWIFT_SUPPORT
            case GT_SWIFT_ERROR:
                treeNode->SetRegNum(REG_SWIFT_ERROR);
Ditto.
src/coreclr/jit/importercalls.cpp (Outdated)

    #ifdef SWIFT_SUPPORT
        // We are importing an unmanaged Swift call, which might require special parameter handling:
        // - SwiftError* is passed to capture the address of an error thrown during the Swift call.
        if (call->AsCall()->unmgdCallConv == CorInfoCallConvExtension::Swift)
        {
            // Check the signature of the Swift call for the special types.
            CORINFO_ARG_LIST_HANDLE sigArg = sig->args;

            for (unsigned short argIndex = 0; argIndex < sig->numArgs;
                 sigArg = info.compCompHnd->getArgNext(sigArg), argIndex++)
            {
                CORINFO_CLASS_HANDLE argClass;
                CorInfoType argType = strip(info.compCompHnd->getArgType(sig, sigArg, &argClass));
                bool argIsByrefOrPtr = false;

                if ((argType == CORINFO_TYPE_BYREF) || (argType == CORINFO_TYPE_PTR))
                {
                    argClass = info.compCompHnd->getArgClass(sig, sigArg);
                    argType = info.compCompHnd->getChildType(argClass, &argClass);
                    argIsByrefOrPtr = true;
                }

                if ((argType != CORINFO_TYPE_VALUECLASS) || !info.compCompHnd->isIntrinsicType(argClass))
                {
                    continue;
                }

                const char* namespaceName;
                const char* className = info.compCompHnd->getClassNameFromMetadata(argClass, &namespaceName);

                if ((strcmp(className, "SwiftError") == 0) &&
                    (strcmp(namespaceName, "System.Runtime.InteropServices.Swift") == 0))
                {
                    // For error handling purposes, we expect a pointer to a SwiftError to be passed
                    assert(argIsByrefOrPtr);
                    if (swiftErrorIndex != -1)
                    {
                        BADCODE("Duplicate SwiftError* parameter");
                    }

                    swiftErrorIndex = argIndex;
                    spillAllArgs = true;
                }
                // TODO: Handle SwiftSelf, SwiftAsync
            }
        }
    #endif // SWIFT_SUPPORT

        impPopArgsForUnmanagedCall(call->AsCall(), sig, spillAllArgs);
I think it would be better to move all of this handling into impPopArgsForUnmanagedCall.
I wanted to do that initially, but I'm not sure what the best way is to communicate which CallArgs correspond to special Swift types back to impImportCall. Are you ok with impPopArgsForUnmanagedCall having some out params that are only used in the Swift case?
Yeah, I think that sounds just fine.
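For illustration, the agreed-upon shape might look something like this pseudocode sketch (hypothetical; the out-parameter name is illustrative, not the final API):

```cpp
// Pseudocode sketch: the Swift-specific detection moves into
// impPopArgsForUnmanagedCall, which reports the special args back
// to impImportCall via out params that stay null for non-Swift calls.
void Compiler::impPopArgsForUnmanagedCall(GenTreeCall*      call,
                                          CORINFO_SIG_INFO* sig,
                                          /* OUT */ CallArg** swiftErrorArg);
```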
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
src/coreclr/jit/importercalls.cpp (Outdated)

    GenTree* argNodeCopy = gtNewLclvNode(argNode->AsLclVar()->GetLclNum(), argNode->TypeGet());
    GenTreeStoreInd* swiftErrorStore = gtNewStoreIndNode(argNodeCopy->TypeGet(), argNodeCopy, errorRegNode);
    impAppendTree(swiftErrorStore, CHECK_SPILL_ALL, impCurStmtDI, false);

Suggested change:

    - GenTree* argNodeCopy = gtNewLclvNode(argNode->AsLclVar()->GetLclNum(), argNode->TypeGet());
    - GenTreeStoreInd* swiftErrorStore = gtNewStoreIndNode(argNodeCopy->TypeGet(), argNodeCopy, errorRegNode);
    - impAppendTree(swiftErrorStore, CHECK_SPILL_ALL, impCurStmtDI, false);
    + GenTreeStoreInd* swiftErrorStore = gtNewStoreIndNode(argNode->TypeGet(), argNode, errorRegNode);
    + impAppendTree(swiftErrorStore, CHECK_SPILL_ALL, impCurStmtDI, false);
    + call->AsCall()->gtArgs.Remove(errorArg);
    // Indicate the error register will be checked after this call returns
    call->AsCall()->gtCallMoreFlags |= GTF_CALL_M_SWIFT_ERROR_HANDLING;
    }
    #endif // SWIFT_SUPPORT
Perhaps introduce a method for this just to avoid growing impImportCall even more.
    // the register can be used again only if there is a GT_SWIFT_ERROR node to consume it
    // (i.e. the register's value is saved to a SwiftError)
    addRefsForPhysRegMask(RBM_SWIFT_ERROR, currentLoc, RefTypeKill, true);
    BuildDef(tree, RBM_SWIFT_ERROR);
Using RBM_SWIFT_ERROR as the candidate set here is strictly speaking not necessary, but it might produce better code (no change needed).
Could you elaborate on how we could do this differently, and why this might produce better code, please? I got this from your async2 prototype.
The candidate set here is saying that the GT_SWIFT_ERROR node itself always needs r12 as its destination register. But in reality, any assignment picked by LSRA is fine for the GT_SWIFT_ERROR node's value -- it just means that we have to insert a copy from r12 into the register that LSRA picked when we generate the GT_SWIFT_ERROR node (that's exactly what genCodeForPhysReg is doing).

The codegen might be better because requiring LSRA to pick r12 means we can always skip that copy during codegen. Normally LSRA utilizes various preferencing heuristics to try to make this the case, even without specifying a "hard" requirement on the destination candidates, but nothing here is communicating this preference to LSRA. I think you could do that instead by:

    BuildDef(tree)->getInterval()->mergeRegisterPreferences(RBM_SWIFT_ERROR);

It would be slightly more faithful to the actual constraints we're dealing with here, but I don't think it necessitates a change (in particular, since we ensure that r12 is busy until the point of the def, we know it's going to be available to assign).
I see, thanks for the explanation! I'll update the comment to clarify this decision.
    killMask |= RBM_SWIFT_ERROR;
    }
    #endif // SWIFT_SUPPORT

    return killMask;
    }
We still need something that marks the Swift error register as busy-until-next-kill when we get to the appropriate ref position during allocation itself. For async2 I did that by adding a busyUntilNextKill bit on RefPosition.

If you set
you should see arm64 disassembly.

I'm still a bit worried that we're going to accidentally unmark
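The async2-style approach described above might be sketched like this pseudocode (the busyUntilNextKill field and the handling shown are hypothetical, modeled on the reviewer's description, not actual LSRA code):

```cpp
// Pseudocode sketch inside LinearScan::allocateRegisters():
// when the fixed ref for the Swift error register is processed,
// keep that physical register unavailable until its next kill.
if (currentRefPosition->busyUntilNextKill)
{
    RegRecord* physRegRecord = getRegisterRecord(currentRefPosition->assignedReg());
    physRegRecord->isBusyUntilKill = true; // hypothetical flag consulted when picking registers
}
```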
Diff results for #98586

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64
    Overall (-0.02% to +0.01%)
    MinOpts (-0.04% to +0.04%)
    FullOpts (-0.02% to +0.01%)

Throughput diffs for linux/x64 ran on linux/x64
    Overall (-0.00% to +0.02%)
    MinOpts (-0.02% to +0.03%)
    FullOpts (+0.00% to +0.02%)

Details here

Throughput diffs for linux/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for linux/x64 ran on windows/x64
    Overall (+0.03% to +0.04%)
    MinOpts (+0.03% to +0.04%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for osx/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.07%)
    FullOpts (+0.04% to +0.06%)

Throughput diffs for windows/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for windows/x64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.08%)
    FullOpts (+0.03% to +0.06%)

Details here
LGTM!
Diff results for #98586

Throughput diffs

Throughput diffs for linux/x64 ran on linux/x64
    Overall (-0.00% to +0.02%)
    MinOpts (-0.02% to +0.03%)
    FullOpts (+0.00% to +0.02%)

Details here

Throughput diffs for linux/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for linux/x64 ran on windows/x64
    Overall (+0.03% to +0.04%)
    MinOpts (+0.03% to +0.04%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for windows/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for windows/x64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.08%)
    FullOpts (+0.03% to +0.06%)

Details here
@jakobbotsch thanks for all the reviews! @kotlarmilos should we fix and enable the error handling tests in another PR?
LSRA changes LGTM, except we should use setDelayFree(), which tracks the pendingDelayFree bit.
src/coreclr/jit/lsrabuild.cpp (Outdated)

    RegRecord* swiftErrorRegRec = getRegisterRecord(REG_SWIFT_ERROR);
    RefPosition* pos = swiftErrorRegRec->lastRefPosition;
    assert(pos != nullptr);
    assert(pos->refType = RefTypeKill);

Suggested change:

    - assert(pos->refType = RefTypeKill);
    + assert(pos->refType == RefTypeKill);
src/coreclr/jit/lsraxarch.cpp (Outdated)

    // We could use RefTypeKill, but RefTypeFixedReg is used less commonly, so the check for delayRegFree
    // during register allocation should be cheaper in terms of TP.
    RefPosition* pos = newRefPosition(REG_SWIFT_ERROR, currentLoc, RefTypeFixedReg, call, RBM_SWIFT_ERROR);
    pos->delayRegFree = true;

Suggested change:

    - pos->delayRegFree = true;
    + setDelayFree(pos);
Likewise in lsraarmarch.cpp.
I would try to add a Swift function that releases the memory and a corresponding P/Invoke declaration. If the issue persists, feel free to proceed and we will fix it in a subsequent PR.
LGTM!
I added a Swift method for deallocating the string. @kotlarmilos do you know how I can get the mangled name of this method on Windows? I'm afraid I don't have access to a Mac right now.
Diff results for #98586

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64
    Overall (-0.02% to +0.01%)
    MinOpts (-0.04% to +0.04%)
    FullOpts (-0.02% to +0.01%)

Throughput diffs for linux/x64 ran on linux/x64
    Overall (-0.00% to +0.02%)
    MinOpts (-0.02% to +0.03%)
    FullOpts (+0.00% to +0.02%)

Details here

Throughput diffs for linux/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for linux/x64 ran on windows/x64
    Overall (+0.03% to +0.04%)
    MinOpts (+0.03% to +0.04%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for osx/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.08%)
    FullOpts (+0.04% to +0.06%)

Throughput diffs for windows/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for windows/x64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.08%)
    FullOpts (+0.03% to +0.06%)

Details here
Diff results for #98586

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for linux/x64 ran on windows/x64
    Overall (+0.03% to +0.04%)
    MinOpts (+0.03% to +0.04%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for osx/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.08%)
    FullOpts (+0.04% to +0.06%)

Throughput diffs for windows/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for windows/x64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.08%)
    FullOpts (+0.03% to +0.06%)

Details here
@kotlarmilos thanks for getting the mangled method name to me offline. I've fixed the error handling test to deallocate the string on the Swift side, and I think it's enabled now -- I removed it from the exclude list in
Diff results for #98586

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64
    Diffs are based on 2,554,585 contexts (1,019,526 MinOpts, 1,535,059 FullOpts). MISSED contexts: 172 (0.01%)
    Overall (+0 bytes)
    FullOpts (+0 bytes)

Assembly diffs for linux/x64 ran on windows/x64
    Diffs are based on 2,543,224 contexts (988,245 MinOpts, 1,554,979 FullOpts). MISSED contexts: 177 (0.01%)
    Overall (-3 bytes)
    FullOpts (-3 bytes)

Assembly diffs for osx/arm64 ran on windows/x64
    Diffs are based on 2,317,543 contexts (945,402 MinOpts, 1,372,141 FullOpts). MISSED contexts: 170 (0.01%)
    Overall (+12 bytes)
    FullOpts (+12 bytes)

Assembly diffs for windows/arm64 ran on windows/x64
    Diffs are based on 2,402,908 contexts (955,693 MinOpts, 1,447,215 FullOpts). MISSED contexts: 174 (0.01%)
    Overall (+0 bytes)
    FullOpts (+0 bytes)

Details here

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64
    Overall (+0.03% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for linux/x64 ran on windows/x64
    Overall (+0.03% to +0.04%)
    MinOpts (+0.03% to +0.04%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for osx/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.07%)
    FullOpts (+0.04% to +0.06%)

Throughput diffs for windows/arm64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for windows/x64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.08%)
    FullOpts (+0.03% to +0.06%)

Details here

Throughput diffs for linux/arm64 ran on linux/x64
    Overall (-0.02% to +0.01%)
    MinOpts (-0.04% to +0.04%)
    FullOpts (-0.02% to +0.01%)

Throughput diffs for linux/x64 ran on linux/x64
    Overall (-0.00% to +0.01%)
    MinOpts (-0.02% to +0.03%)
    FullOpts (+0.00% to +0.02%)

Details here
Diff results for #98586

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64
    Diffs are based on 2,554,585 contexts (1,019,526 MinOpts, 1,535,059 FullOpts). MISSED contexts: 172 (0.01%)
    Overall (+0 bytes)
    FullOpts (+0 bytes)

Assembly diffs for linux/x64 ran on windows/x64
    Diffs are based on 2,543,224 contexts (988,245 MinOpts, 1,554,979 FullOpts). MISSED contexts: 177 (0.01%)
    Overall (-3 bytes)
    FullOpts (-3 bytes)

Assembly diffs for windows/arm64 ran on windows/x64
    Diffs are based on 2,402,908 contexts (955,693 MinOpts, 1,447,215 FullOpts). MISSED contexts: 174 (0.01%)
    Overall (+0 bytes)
    FullOpts (+0 bytes)

Details here

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64
    Overall (+0.03% to +0.05%)
    MinOpts (+0.05% to +0.09%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for linux/x64 ran on windows/x64
    Overall (+0.03% to +0.04%)
    MinOpts (+0.03% to +0.04%)
    FullOpts (+0.03% to +0.05%)

Throughput diffs for windows/x64 ran on windows/x64
    Overall (+0.04% to +0.05%)
    MinOpts (+0.05% to +0.08%)
    FullOpts (+0.03% to +0.06%)

Details here

Throughput diffs for linux/arm64 ran on linux/x64
    Overall (-0.02% to +0.01%)
    MinOpts (-0.04% to +0.04%)
    FullOpts (-0.02% to +0.01%)

Throughput diffs for linux/x64 ran on linux/x64
    Overall (-0.00% to +0.01%)
    MinOpts (-0.02% to +0.03%)
    FullOpts (+0.00% to +0.02%)

Details here
Thanks! Please change the
I'm able to build and run the test with Native AOT, using
If I search for the error handling tests in the Tests dashboard, I can see they ran on OS X x64/arm64 and passed, so I think the tests are good to go. (Sorry for the ping, Milos.)
Adds JIT support for storing the error result of a Swift call to the provided SwiftError struct, assuming the caller passed a SwiftError* argument. The LSRA changes assume the presence of a SwiftError* argument in a Swift call indicates the error register may be trashed, and thus should be killed until consumed by GT_SWIFT_ERROR, a new GenTree node representing the value of the error register after a Swift call. Similarly, these changes assume the lack of a SwiftError* argument indicates the Swift call cannot throw and will not trash the error register, so the Swift call should not block the register's usage.

cc @jakobbotsch @kotlarmilos @jkoritzinsky