-
Notifications
You must be signed in to change notification settings - Fork 5k
Write barrier without any RWX pages #114982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write barrier without any RWX pages #114982
Conversation
…rrier_WithoutCodegen
…east, somewhat close.
Tagging subscribers to this area: @mangod9 |
… If it does... I'll likely do this logic for all architectures
…as well. If it does... I'll likely do this logic for all architectures" This reverts commit a882fbf.
…_WRITE_WATCH_FOR_GC_HEAP
@@ -179,6 +179,10 @@ GLOBAL_LABEL RhpCheckedAssignRefAVLocation | |||
LEAF_END RhpCheckedAssignRef\EXPORT_REG_NAME, _TEXT | |||
.endm | |||
|
|||
LEAF_ENTRY RhpWriteBarriers, _TEXT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this would not work anyway... we compile all code on Apple platforms as .subsections_via_symbols
and the Apple linker is free to reorder the functions (or discard unused ones if told to). Compiling without subsections_via_symbols
is not an option due to various bugs in different linker versions. The only other option to keep the desired order is to mark the successive symbols as .alt_entry
, which also happens to run into linker bugs if not done extremely carefully...
… as well. If it does... I'll likely do this logic for all architectures" This reverts commit f5a07a7.
.intel_syntax noprefix | ||
#include "AsmMacros_Shared.h" | ||
|
||
// TODO! This is implemented, but not tested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll test it since I already have the setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It passed some basic smoke tests. 👍🏿
#if defined(__APPLE__) | ||
// Currently the build is failing without this due to an issue if the first method in the assembly file has an alternate entry at the start of the file. | ||
// Fix, but adding an empty, unused method | ||
LEAF_ENTRY RhpWriteBarriersDoNotFailToBuild, _TEXT | ||
ret | ||
LEAF_END RhpWriteBarriersDoNotFailToBuild, _TEXT | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For posterity, this was reported as FB14743667 to Apple. It should be fixed in Xcode 16.3. We should still keep the workaround until we bump the minimum Xcode requirement past this version.
The general issue is that the new linker (ld-prime) didn't correctly handle combination of symbols at overlapping addresses and alt-entry symbols. There are some cases where section symbols are implicitly generated at the beginning of each section in object file so it's easy to run into this. Old linker (ld64) doesn’t handle alt-entry symbols at a start of a section so it's problematic there too.
/azp list |
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
…e barrier when using the old write barriers
src/coreclr/vm/excep.cpp
Outdated
#endif | ||
|
||
#ifdef TARGET_ARM | ||
if ((writeBarrierAVLocations[i] | THUMB_CODE) == (uControlPc | THUMB_CODE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are one of these missing the THUMB bit? I would assume that both would have it set.
Anyways, we have PCODEToPINSTR(addr) to strip the thumb bit if it is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bunch of unnecessary casts in this space. I'll get rid of them, and use PCODEToPINSTR
, and see if it all works. If not, the fix will be simple..
ASSERT(*(uint8_t*)writeBarrierAVLocations[i] != 0xE9); // jmp XXXXXXXX | ||
#endif | ||
|
||
if (writeBarrierAVLocations[i] == PCODEToPINSTR(uControlPc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still strange to me that the writeBarrierAVLocations[i] would not have the THUMB bit set when it is a pointer to thumb code. Are you sure it is really the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They really don't have it:
nm ./artifacts/obj/coreclr/linux.arm.Debug/vm/wks/CMakeFiles/cee_wks_core.dir/__/__/runtime/arm/WriteBarriers.S.o
00000000 t $t
00000001 T RhpAssignRef
00000004 T RhpAssignRefAVLocation
00000004 T RhpAssignRefAvLocationr1
00000001 T RhpAssignRefr1
00000111 T RhpByRefAssignRef
00000114 T RhpByRefAssignRefAVLocation1
00000116 T RhpByRefAssignRefAVLocation2
00000077 T RhpCheckedAssignRef
0000007a T RhpCheckedAssignRefAVLocation
0000007a T RhpCheckedAssignRefAvLocationr1
00000077 T RhpCheckedAssignRefr1
U __aeabi_unwind_cpp_pr0
U g_card_table
U g_ephemeral_high
U g_ephemeral_low
U g_highest_address
U g_lowest_address
U g_write_watch_table
I think it boils down to how ALTERNATE_ENTRY
is defined:
.macro PATCH_LABEL Name
.thumb_func
.global C_FUNC(\Name)
C_FUNC(\Name):
.endm
.macro ALTERNATE_ENTRY Name
.global C_FUNC(\Name)
.type \Name, %function
C_FUNC(\Name):
.endm
I think the assembler treats it as ARM function label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the macro doesn't have the .thumb_func
, that's why. I wonder if that's intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's somewhat intentional. The definition is used in NativeAOT and there we are comparing it to already stripped instruction pointer. I don't have any strong option on unifying it one way or the other but when I was working on the linux-arm NativeAOT port I thought it makes sense to represent CODE_LOCATION
as the actual memory location (ie. abstract out the Thumb weirdness early on)...
@@ -15,14 +24,14 @@ | |||
|
|||
LEAF_ENTRY RhpInterfaceDispatch\entries, _TEXT | |||
|
|||
// r11 currently contains the indirection cell address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess that this was intentionally done first so that the data dependency below gets satisfied first. Is this really needed with the workaround above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the value of having a specific memory ordering between loading the MethodTable pointer and loading the cache block. I think we can safely load those in any order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is functionally correct to load them in any order.
I meant that the existing order might have been chosen intentionally as a performance micro-optimization, so that [r11 + OFFSETOF__InterfaceDispatchCell__m_pCache]
indirection gets handled a bit sooner or be more likely to be parallelized with [rdi]
indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted the order and it still produced correct symbols and unwind info on Xcode 16.3 on ARM64, at least in the object file: https://gist.github.com/filipnavara/bdd6dc77727fa68782bdb314a117f6d3.
That's why I asked for specific Xcode version so I could get to the bottom of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see the issue now in the linked .dylib:
[593] funcOffset=0x0065504C, encoding[ 6]=0x02000000 (no frame, no saved registers ) _RhpInterfaceDispatch2
[594] funcOffset=0x00655050, encoding[ 8]=0x00000000 (no frame, no saved registers ) _RhpInterfaceDispatchAVLocation2
[595] funcOffset=0x00655080, encoding[ 6]=0x02000000 (no frame, no saved registers ) _RhpInterfaceDispatch4
[596] funcOffset=0x00655084, encoding[ 8]=0x00000000 (no frame, no saved registers ) _RhpInterfaceDispatchAVLocation4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weird. Both ld64 and ld-prime produce this. unwinddump
treats the has a bug where it's missing a case for the null opcode
as continuation of the previous entry (in this case it prints "no frame, no saved registers", but if I force some DWARF sequence it still shows the DWARF pointer for both entries)null opcode
on ARM64 (the bug is not present for other archs). llvm-libunwind doesn't seem to have any special treatment for the null opcode.
In summary, this really does look like a linker bug. I'll do a bit more poking and file Apple feedback if necessary. One possible workaround in this specific case is to start a new unwind info at the alt_entry location by issuing .cfi_endproc
in front of it and .cfi_startproc
behind it. Unfortunately that doesn't work for the general case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed with Apple as FB17568654
Repro code: Archive.zip
Let's see if they can provide some clarification or workarounds.
#include "AsmMacros_Shared.h" | ||
|
||
#if defined(__APPLE__) | ||
// Currently the build is failing without this due to an issue if the first method in the assembly file has an alternate entry at the start of the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not failing in NativeAOT build today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I look at it, this probably isn't necessary in the amd64 write barrier since it wasn't complaining before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leads me to wonder if maybe the issue on MacOS arm64 with having an alternate entry as the first symbol in the object file was that since our alternate entry macro doesn't set the thumb bit, the offset for the alternate entry was before that of the normal function symbol for the first symbol in the object file. Time for an experiment. Sigh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thumb bit is only relevant for arm32 so that should not matter here.
Also, the issue with first symbol in object file is known ld64 bug, see #114982 (comment).
@@ -261,6 +269,14 @@ LEAF_END RhpCheckedXchg, _TEXT | |||
LEAF_ENTRY RhpByRefAssignRef, _TEXT | |||
ALTERNATE_ENTRY RhpByRefAssignRefAVLocation1 | |||
mov rcx, [rsi] | |||
#ifdef TARGET_APPLE | |||
// Apple's linker has issues which break unwind info if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the manifestation of the bad unwind info?
If it is just when single stepping in lldb, do we care?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manifestation is that if we AV on this location, we are unable to unwind to the managed frame, so we failfast the process instead of producing a NullReferenceException which can be caught.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use the same manual uwnind routine in regular CoreCLR as we use in NAOT to avoid this problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another approach where we don't use libunwind for unwinding these entries, instead we use our own unwinder which is special cased to know what to do. (This is what NativeAot does.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Unlike NativeAOT, CoreCLR implements the Apple compact unwinding through a modified copy of the llvm-libunwind code; libunwind should not even come into play for this since it's covered by the compact codes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was doing fixes of the osx-arm64 NativeAOT port I found a bug in llvm-libunwind comparison of the boundary comparison for IP addresses and function start/end (cb6fb60#diff-fa7a2ccbf98ccd844bfc21e48f2bc700137c2903cf97d487fa011e16b1f9b3c0).
I looked at the custom unwinding code in CoreCLR and there are similar comparisons that look sketchy:
runtime/src/coreclr/pal/src/exception/remote-unwind.cpp
Lines 1034 to 1037 in c201ad1
if (ip < funcStart || ip > funcEnd) { | |
ERROR("ip %p not in regular second level\n", (void*)ip); | |
return false; | |
} |
runtime/src/coreclr/pal/src/exception/remote-unwind.cpp
Lines 1064 to 1067 in c201ad1
if (ip < funcStart || ip > funcEnd) { | |
ERROR("ip %p not in compressed second level\n", (void*)ip); | |
return false; | |
} |
(in both cases the condition should presumably be
ip >= funcEnd
, not ip > funcEnd
; shouldn't really cause an error for alt_entry though..)
… if the Apple linker bug can be fixed that way
…rrier_WithoutCodegen
I've been working in a similar area in #111636. Essentially I'm making Arm64 use writebarriers in the same way as AMD64. I've just been rebasing off the top of your work. AIUI, with this PR, when If so, then I think we're good and I don't need to worry about my PR :) |
Add a new config switch DOTNET_UseGCWriteBarrierCopy to control whether or not to use a copy of the WriteBarrier code instead of assembly code embedded in the coreclr binary. Typically we do this to improve performance, but this change enables a new path where we just have the barrier as assembly code in the coreclr binary that does not require any RWX behavior. Also, with this change, the 1 copy of the code which is used for this scenario is shared between NativeAOT and coreclr.
This change will change the write barrier used in some situations on CoreCLR to be the same as the write barrier used on NativeAOT.
In addition, we now have a copy of NativeAOT variant of the write barrier for Linux X86, although the other NativeAOT support is not present.
Finally, the Linux Arm32 version of the NativeAOT write barrier could now be enabled to use software write watch, like CoreCLR does. This is not enabled in this change, although the code has been written into the write barrier (as it was needed for CoreCLR).