From 19abd1e7a0dcdb5fa1fca532302b696135f9becc Mon Sep 17 00:00:00 2001 From: Daryl Maier Date: Mon, 28 Dec 2020 15:46:49 -0500 Subject: [PATCH] Contribute JIT write barrier documentation Describe the purpose and general operation of JIT write barriers for non-realtime garbage collection policies. Signed-off-by: Daryl Maier --- doc/compiler/JitWriteBarriers.md | 343 +++++++++++++++++++++++++++++++ 1 file changed, 343 insertions(+) create mode 100644 doc/compiler/JitWriteBarriers.md diff --git a/doc/compiler/JitWriteBarriers.md b/doc/compiler/JitWriteBarriers.md new file mode 100644 index 00000000000..403d7e1a76f --- /dev/null +++ b/doc/compiler/JitWriteBarriers.md @@ -0,0 +1,343 @@ + + +# JIT Write Barriers + +Storing object pointers into other objects and classes must be communicated to +the garbage collector (GC) so that these newly created reference chains can be +tracked by the various GC policies. This communication occurs at runtime through +a mechanism known as a *write barrier*. + +A good explanation of the mechanics and write barrier requirements of the +various GC policies in OpenJ9 can be found in this +[video](https://www.youtube.com/watch?v=Z2RUw8Clrec). + +A write barrier is required **after** the following situations: + +* The store of a reference to a field in an object (either static or an instance +field) +* The store of a reference into an array element via an `TR::ArrayStoreCHK` node +* An arraycopy of references via a 5-child `TR::arraycopy` node + +The JIT is required to insert a write barrier in the code it generates in these +circumstances for functional correctness. The simplest means of performing a +write barrier is to simply call out to the GC to perform the appropriate write +barrier action depending on the requirements of the GC policy in place. + +Runtime performance can be greatly improved if the JIT inlines part of the +operation of the write barrier. This document describes some of the checks that +can be inlined at runtime to improve performance. + +## Write Barrier Kinds + +The OpenJ9 VM supports a number of different write barriers depending on which +GC policy has been selected. All possible write barriers required by the GC are +defined in +[gc_include/j9modron.h](https://github.com/eclipse/openj9/blob/f0a4235ef0e2e6e1a8d476c4cf8dbf1ea1bc1cdf/runtime/gc_include/j9modron.h#L42-L51). + +In summary: + +| Write Barrier Kind | GC Policy and Description | +|:----|:----| +| j9gc_modron_wrtbar_none | `-Xgcpolicy:optthruput` : No write barrier is required | +| j9gc_modron_wrtbar_always | A write barrier is always required. The JIT always calls the GC write barrier helper for this case. | +| j9gc_modron_wrtbar_cardmark | `-Xgcpolicy:optavgpause` : For non-generational GC policies, a concurrent mark check is required. | +| j9gc_modron_wrtbar_cardmark_incremental | `-Xgcpolicy:balanced` : For region-based GCs, an inter-region barrier is required. | +| j9gc_modron_wrtbar_cardmark_and_oldcheck | `-Xgcpolicy:gencon` : For generational GCs with concurrent mark, a generational check and a concurrent mark check is required. | +| j9gc_modron_wrtbar_oldcheck | `-Xgcpolicy:gencon -Xconcurrentlevel0` : For generational GC policies, a generational check is required. Concurrent mark is not being used. | + +There are two additional barrier kinds required for realtime GC (i.e., +`-Xgcpolicy:metronome`) that are based on a "snapshot-at-the-beginning" or +(SATB) barrier that must precede the store. Metronome is only supported on x86 +Linux and Power AIX and the operation of those barriers is not described in this +document. + +## Write Barrier IL Nodes + +For the purposes of garbage collection, only stores of references into other +objects need to be tracked. These are represented with two IL opcodes: +`awrtbari` and `awrtbar`. + +`awrtbar` is for a direct store into another object (e.g., a store into a static +field) and has two children. The first child is the object being stored (or the +*source* object) and the second child is the object being stored into (or the +*destination* object). + +`awrtbari` is for an indirect store into another object (e.g., a store into an +instance field or an array) and has three children. The first child is the +address being written to, the second child is the object being stored (or the +*source* object) and the third child is the object being stored into (or the +*destination* object). + +## Inline Write Barrier Checks + +Most of the time, the operation performed by a write barrier to keep the GC +state consistent does not actually need to occur. Relatively inexpensive runtime +checks are required to determine that. Because the checks are performed inline +the performance of the write barrier is greatly improved over calling into the +GC each time to determine the correct course of action. The checks required +depend on the GC policy and other features of the GC that are enabled. + +### Nullness Check + +Consider a reference store `A.field = B`. A write barrier is not required if `B` +is provably null. This can be determined both at compile time via the +`isNonNull()` property on the source object node or at runtime with an explicit +nullness check. + +**Note:** the cost of performing a nullness check at runtime needs to be +considered per architecture as the overhead may be too high given the relative +infrequency of null stores. + +### Generational Check + +Generational collectors divide the heap into different spaces depending on the +age of the objects they contain: a nursery (or *new space*) where new objects +are allocated and live until they mature past a certain age, and a tenured space +(or *old space*) where long-lived objects reside. + +Based on the mechanics of the generational collector, if a new space object is +stored into an old space object then the old space object needs to be +"remembered" so that it can be scanned by the GC. + +In pseudocode: +``` + if (A is tenured) { + if (B is not tenured) { + remember(A) + } + } +``` + +There are bits in an object's header that can be queried to determine whether +the object resides in new space or old space. + +Typically, only the generational test is inlined and not the update to the +remembered set. This is acceptable in practice because the remembered set update +occurs relatively infrequently and can be done out of line by calling the +appropriate JIT write barrier helper. + +### Concurrent Mark Check + +Part of the marking phase for global garbage collects can be performed +concurrently with the application. In OpenJ9 this is accomplished with a special +concurrent mark thread that scans and marks live objects. However, if an object +`B` is stored into an object `A` that has already been scanned by the concurrent +mark thread then object `A` needs to be rescanned to ensure functional +correctness. + +To do this efficiently, the heap is partitioned into a number of *cards* of a +fixed size (e.g., 512 bytes) that are represented by a single byte in a *card +table* structure. The mapping of an object address to a particular card is done +with simple arithmetic and shifting on the object address. Rather than +remembering individual objects to rescan, the card representing the memory where +a destination object lives is "dirtied" at runtime (in practice, writing a `1` +into the card table). All objects that reside in dirtied cards are rescanned at +the end of the current concurrent mark cycle. + +This card dirtying only needs to occur if the concurrent mark thread is +currently active and marking objects. This can be tested efficiently at runtime +via flags from the `J9VMThread` structure. + +In the case of a generational collector, the check only applies if the +destination object `A` is not a nursery object. + +### Heap Object Check + +Unlike when interpreting a method, when a method is JIT compiled there is a +possibility that the escape analysis optimization will localize some object +allocations on the method's stack frame. Furthermore, there may also be some +situations where the object being stored to could be either a heap object or a +stack-allocated object that cannot be determined at compile-time. In these +circumstances, a runtime check for whether the object is on the heap must be +inserted because the write barrier checks do not apply to local objects. + +Checking whether an object is on the heap is done with an address comparison on +the heap base and size. In pseudocode: +``` + UDATA base = (UDATA)vmThread->omrVMThread->heapBaseForBarrierRange0; + UDATA objectDelta = (UDATA)object - base; + UDATA size = vmThread->omrVMThread->heapSizeForBarrierRange0; + if (objectDelta < size) { + // Heap object! + } +``` + +In practice, the heap base address and size may be constant at compile time. +Exploiting this could improve the performance of this check at runtime. These +properties can be queried on the `Options` class: +``` + isVariableHeapBaseForBarrierRange0() + isVariableHeapSizeForBarrierRange0() +``` +and if constant the values can be obtained via the `Options` queries: +``` + getHeapBaseForBarrierRange0() + getHeapSizeForBarrierRange0() +``` + +There are some additional properties that might be set on a write barrier node +when it can be definitively proven that an object is either a heap object or a +local object: + +* `isNonHeapObjectWrtBar()` : if `true` then provably a local object; `false` + implies that it cannot be proven to be a local object at compile time even + though it may be. +* `isHeapObjectWrtBar()` : if `true` then provably an object on the heap; + `false` implies that it cannot be proven to be a heap object at compile time + even though it may be. + +These properties should be consulted to determine at compile time if the heap +object checks can be eliminated or if the write barrier can be skipped entirely. + +## Write Barrier Pseudocode + +The ultimate reference for the operation of each write barrier kind is the +garbage collector code itself. These implementations can be found in +[gc_include/ObjectAccessBarrierAPI.hpp](https://github.com/eclipse/openj9/blob/master/runtime/gc_include/ObjectAccessBarrierAPI.hpp). + +The following pseudocode is derived largely from those implementations. + +### j9gc_modron_wrtbar_cardmark_and_oldcheck (gencon) +``` + // Check to see if object is old. If object is not old neither barrier is required + // if ((object - vmThread->omrVMThread->heapBaseForBarrierRange0) < vmThread->omrVMThread->heapSizeForBarrierRange0) then old + + UDATA base = (UDATA)vmThread->omrVMThread->heapBaseForBarrierRange0; + UDATA objectDelta = (UDATA)object - base; + UDATA size = vmThread->omrVMThread->heapSizeForBarrierRange0; + if (objectDelta < size) { + // object is old so check for concurrent barrier + if (0 != (vmThread->privateFlags & J9_PRIVATE_FLAGS_CONCURRENT_MARK_ACTIVE)) { + // concurrent barrier is enabled so dirty the card for object + + /* calculate the delta with in the card table */ + UDATA shiftedDelta = objectDelta >> CARD_SIZE_SHIFT; + + /* find the card and dirty it */ + U_8* card = (U_8*)vmThread->activeCardTableBase + shiftedDelta; + *card = (U_8)CARD_DIRTY; + } + + // generational barrier is required if object is old + // + rememberObject(vmThread, object); + } +``` + +### j9gc_modron_wrtbar_oldcheck (gencon -Xconcurrentlevel0) +``` + /* if value is NULL neither barrier is required */ + if (NULL != value) { + /* Check to see if object is old. + * if ((object - vmThread->omrVMThread->heapBaseForBarrierRange0) < vmThread->omrVMThread->heapSizeForBarrierRange0) then old + * + * Since card dirtying also requires this calculation remember these values + */ + UDATA base = (UDATA)vmThread->omrVMThread->heapBaseForBarrierRange0; + UDATA objectDelta = (UDATA)object - base; + UDATA size = vmThread->omrVMThread->heapSizeForBarrierRange0; + if (objectDelta < size) { + /* generational barrier is required if object is old and value is new */ + /* Check to see if value is new using the same optimization as above */ + UDATA valueDelta = (UDATA)value - base; + if (valueDelta >= size) { + /* value is in new space so do generational barrier */ + rememberObject(vmThread, object); + } + } + } +``` + +### j9gc_modron_wrtbar_cardmark_incremental (balanced) +``` + /* if value is NULL neither barrier is required */ + if (NULL != value) { + /* Check to see if object is within the barrier range. + * if ((object - vmThread->omrVMThread->heapBaseForBarrierRange0) < vmThread->omrVMThread->heapSizeForBarrierRange0) + * + * Since card dirtying also requires this calculation remember these values + */ + UDATA base = (UDATA)vmThread->omrVMThread->heapBaseForBarrierRange0; + UDATA objectDelta = (UDATA)object - base; + UDATA size = vmThread->omrVMThread->heapSizeForBarrierRange0; + if (objectDelta < size) { + /* Object is within the barrier range. Dirty the card */ + + /* calculate the delta with in the card table */ + UDATA shiftedDelta = objectDelta >> CARD_SIZE_SHIFT; + + /* find the card and dirty it */ + U_8* card = (U_8*)vmThread->activeCardTableBase + shiftedDelta; + *card = (U_8)CARD_DIRTY; + } + } +``` + +### j9gc_modron_wrtbar_cardmark (optavgpause) +``` + /* if value is NULL neither barrier is required */ + if (NULL != value) { + /* Check to see if object is old. + * if ((object - vmThread->omrVMThread->heapBaseForBarrierRange0) < vmThread->omrVMThread->heapSizeForBarrierRange0) then old + * + * Since card dirtying also requires this calculation remember these values + */ + UDATA base = (UDATA)vmThread->omrVMThread->heapBaseForBarrierRange0; + UDATA objectDelta = (UDATA)object - base; + UDATA size = vmThread->omrVMThread->heapSizeForBarrierRange0; + if (objectDelta < size) { + /* object is old so check for concurrent barrier */ + if (0 != (vmThread->privateFlags & J9_PRIVATE_FLAGS_CONCURRENT_MARK_ACTIVE)) { + /* concurrent barrier is enabled so dirty the card for object */ + + /* calculate the delta with in the card table */ + UDATA shiftedDelta = objectDelta >> CARD_SIZE_SHIFT; + + /* find the card and dirty it */ + U_8* card = (U_8*)vmThread->activeCardTableBase + shiftedDelta; + *card = (U_8)CARD_DIRTY; + } + } + } +``` + +## Additional Notes + +* Write barrier nodes may have the `skipWrtBar()` flag set which indicates that + the write barrier can be skipped. + +* Reference arraycopies use a batch write barrier at the end of the copy rather + than a barrier after each element copied. While there are separate "object + store" and "batch store" JIT write barrier helpers, in practice the inline + checks to perform are the same. + +* From an implementation perspective, there is obvious overlap between the + operations of the various write barrier kinds and hence sharing code between + kinds when it leads to a simpler, more maintainable design should be a goal to + strive for. + +* The `wrtbar` IL opcode usually exists as a tree top. However, an exception to + that are `ArrayStoreCHK` nodes where the `wrtbar` appears as a child node + representing the destination object store. This was done to improve the + synergy and code quality of the required checks for an array element store, + the element store itself, and any necessary write barrier.