
Extend simplified write barrier elimination to work inter-block (not just intra-block) #39164

Closed
mkustermann opened this issue Oct 29, 2019 · 7 comments

@mkustermann (Member)

Right now our write-barrier elimination is very simple and works only inside one block:

static void WriteBarrierElimination(FlowGraph* flow_graph) {
  for (BlockIterator block_it = flow_graph->reverse_postorder_iterator();
       !block_it.Done(); block_it.Advance()) {
    BlockEntryInstr* block = block_it.Current();
    // The most recent allocation known to produce a new-space (or remembered)
    // object. It is reset at every block boundary and at every instruction
    // that can trigger GC.
    Definition* last_allocated = nullptr;
    for (ForwardInstructionIterator it(block); !it.Done(); it.Advance()) {
      Instruction* current = it.Current();
      if (!current->CanTriggerGC()) {
        // A store into the most recently allocated object needs no barrier.
        if (StoreInstanceFieldInstr* instr = current->AsStoreInstanceField()) {
          if (instr->instance()->definition() == last_allocated) {
            instr->set_emit_store_barrier(kNoStoreBarrier);
          }
          continue;
        }
      }

      if (AllocationInstr* alloc = current->AsAllocation()) {
        if (alloc->WillAllocateNewOrRemembered()) {
          last_allocated = alloc;
          continue;
        }
      }

      // Anything that can trigger GC invalidates what we know.
      if (current->CanTriggerGC()) {
        last_allocated = nullptr;
      }
    }
  }
}

COMPILER_PASS(WriteBarrierElimination,
              { WriteBarrierElimination(flow_graph); });

To eliminate more barriers, we might want to extend this to work across basic blocks.
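
One way to extend this across blocks is a forward dataflow analysis: propagate the set of allocations known to be in new-space (or remembered) along control-flow edges, intersect the sets at joins, and clear them at instructions that can trigger GC. Below is a minimal, self-contained C++ sketch of that idea on a toy CFG; it is not the VM's pass, and Block, Instr and Kind are illustrative stand-ins for the real IL classes.

#include <cstdio>
#include <set>
#include <vector>

enum class Kind { kAlloc, kStore, kCall };

struct Instr {
  Kind kind;
  int def = -1;       // id defined (for kAlloc)
  int instance = -1;  // id being stored into (for kStore)
  bool no_barrier = false;
};

struct Block {
  std::vector<int> preds;    // predecessor block indices
  std::vector<Instr> instrs;
};

// Blocks are assumed to be numbered roughly in reverse postorder; the
// analysis iterates to a fixed point, so the order only affects how fast it
// converges.
void EliminateBarriers(std::vector<Block>& blocks) {
  // out[b] = allocations known to be in new-space (or remembered) at the end
  // of block b.
  std::vector<std::set<int>> out(blocks.size());
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t b = 0; b < blocks.size(); ++b) {
      // Meet over predecessors: only allocations that are known-new on
      // *every* incoming edge survive the join.
      std::set<int> state;
      bool first = true;
      for (int p : blocks[b].preds) {
        if (first) {
          state = out[p];
          first = false;
          continue;
        }
        std::set<int> merged;
        for (int id : state) {
          if (out[p].count(id) != 0) merged.insert(id);
        }
        state = merged;
      }
      // Transfer function: same logic as the intra-block pass, but over a
      // set instead of a single "last allocated" definition.
      for (Instr& instr : blocks[b].instrs) {
        switch (instr.kind) {
          case Kind::kStore:
            if (state.count(instr.instance) != 0) instr.no_barrier = true;
            break;
          case Kind::kAlloc:
            // Assumes the allocation leaves earlier known-new objects usable
            // (cf. WillAllocateNewOrRemembered in the pass above).
            state.insert(instr.def);
            break;
          case Kind::kCall:
            state.clear();  // may trigger GC, invalidating everything
            break;
        }
      }
      if (state != out[b]) {
        out[b] = state;
        changed = true;
      }
    }
  }
}

int main() {
  // B0: v0 <- Allocate; B0 branches to B1 and B2; both fall into B3, which
  // stores into v0. The intra-block pass cannot remove this barrier; the
  // inter-block analysis can.
  std::vector<Block> blocks(4);
  blocks[0].instrs = {{Kind::kAlloc, /*def=*/0}};
  blocks[1].preds = {0};
  blocks[2].preds = {0};
  blocks[3].preds = {1, 2};
  blocks[3].instrs = {{Kind::kStore, /*def=*/-1, /*instance=*/0}};
  EliminateBarriers(blocks);
  std::printf("barrier on the store in B3 eliminated: %s\n",
              blocks[3].instrs[0].no_barrier ? "yes" : "no");
  return 0;
}

Starting every block's out-set empty and iterating to a fixed point is conservative but safe: a barrier is only dropped when the stored-into object is known to be new (or remembered) along every path reaching the store.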

/cc @mraleph

@mkustermann added the type-performance and vm-aot-code-size labels on Oct 29, 2019
@devoncarew added the area-vm label on Oct 29, 2019
@mkustermann (Member, Author)

@sjindel-google The write barrier is a big code size contributor. This could be something to investigate. Please self-assign if you want to look into it.

@sjindel-google sjindel-google self-assigned this Jan 8, 2020
@sjindel-google (Contributor)

Besides looking across blocks, another impediment to write-barrier elimination is instructions that might trigger GC (calls, allocations, boxes, etc.): after a GC the object may have been promoted out of new-space, so stores into it need barriers again.

We've discussed two approaches to extending WBE across these kinds of instructions:

Reserved allocations

Several consecutive allocations are "batched": the first allocation (or an instruction at the beginning of the batch) ensures there is enough room in new-space for all of them. If there isn't, it can (for example) trigger a new-space collection to make room.

For example:

Reserve(sizeof(A) + sizeof(B))
v0 <- Allocate(A)
v1 <- Allocate(B)
StoreInstanceField(v0.f = v1, no write barrier)
StoreInstanceField(v1.f = v0, no write barrier)
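
The compiler-side work for this scheme is essentially to group runs of back-to-back allocations and compute the size a leading reserve would have to guarantee. A toy C++ sketch of that computation, with Alloc and ReserveSizeForBatch as illustrative names rather than real IL or runtime entities:

#include <cstddef>
#include <cstdio>
#include <vector>

// One fixed-size allocation in a batch (illustrative, not a VM IL class).
struct Alloc {
  const char* cls;
  size_t size;
};

// Bytes a leading Reserve would need so that every allocation in the batch is
// guaranteed to fit in new-space without triggering a GC in between.
size_t ReserveSizeForBatch(const std::vector<Alloc>& batch) {
  size_t total = 0;
  for (const Alloc& a : batch) total += a.size;
  return total;
}

int main() {
  // Corresponds to Reserve(sizeof(A) + sizeof(B)) in the IL above.
  std::vector<Alloc> batch = {{"A", 32}, {"B", 48}};
  std::printf("Reserve(%zu)\n", ReserveSizeForBatch(batch));
  return 0;
}

Variable-length allocations (arrays, strings) would presumably have to be excluded from a batch or bounded, since the reserve amount has to be known before the batch starts.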

Add live temporaries to store buffer

We assume that any temporary defined by an allocation will be either in new space or in the store buffer until the next call. So, the IL would simply look like:

v0 <- Allocate(A)
v1 <- Allocate(B)  // Not a call.
StoreInstanceField(v0.f = v1, no write barrier)
StoreInstanceField(v1.f = v0, no write barrier)

After any collection completes, we scan the stack a second time and add all live temporaries in frames immediately above an exit frame into the store buffer. This restores the original assumption.

This approach would allow us to remove the store barrier in more cases (for example, across Box instructions) and would not require code for the reserve operation. However, it could extend the lifetimes of some objects longer than necessary.
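
A rough, self-contained sketch of what the post-collection re-scan could look like; this is a toy model, not the VM's GC code. Frame, StoreBuffer and RescanStackAfterGC are illustrative stand-ins, and "sits immediately above an exit frame" is modeled as a per-frame flag:

#include <cstdint>
#include <unordered_set>
#include <vector>

using ObjectPtr = uintptr_t;

struct Frame {
  bool above_exit_frame = false;              // paused at a call into the runtime
  std::vector<ObjectPtr> live_object_slots;   // GC-visible temporaries
};

struct StoreBuffer {
  std::unordered_set<ObjectPtr> remembered;
  void Add(ObjectPtr obj) { remembered.insert(obj); }
};

// Runs once per mutator stack after a collection completes: every live
// temporary in a frame that sits immediately above an exit frame goes back
// into the store buffer, restoring the "new-space or remembered" assumption
// that the eliminated barriers rely on.
void RescanStackAfterGC(const std::vector<Frame>& frames, StoreBuffer* buffer) {
  for (const Frame& frame : frames) {
    if (!frame.above_exit_frame) continue;
    for (ObjectPtr obj : frame.live_object_slots) {
      buffer->Add(obj);
    }
  }
}

int main() {
  StoreBuffer buffer;
  Frame frame;
  frame.above_exit_frame = true;
  frame.live_object_slots = {0x1000, 0x2000};  // fake object addresses
  RescanStackAfterGC({frame}, &buffer);
  return 0;
}

The extra cost is more remembered-set entries to visit after each collection, which is the concern raised later in this thread.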

/cc @rmacnak-google @mraleph @alexmarkov What do you think?

@sjindel-google (Contributor) commented Jan 14, 2020

Inter-block WBE (without either of the extensions above) produces a 0.13% Flutter Gallery code-size reduction on ARM64 and 0.15% on ARM.

@sjindel-google (Contributor)

Inter-block WBE with the "Add live temporaries to store buffer" approach reduces FG code size by 0.88% on ARM64 and 0.98% on ARM.

@rmacnak-google (Contributor)

"Reserved allocations" is similar to an optimization in V8 where some allocations are fused and the GC sees one larger allocation that the caller then divides up.

Overall I like "Add live temporaries to store buffer". I'm a bit worried about creating too much remembered-set work, but if we exclude the allocation of arrays from the optimization, there shouldn't be too much extra to visit. Hopefully, from a code-size perspective, the interesting allocation sites will be for small, fixed-size objects. Also, if in addition to adding these values to the store buffer we add them to the deferred marking queue, we might no longer need the EnsureIsNewOrRemembered logic at the end of the allocation stubs.

@sjindel-google (Contributor)

Inter-block WBE with the "Reserved allocations" approach (V8-style) will yield at most 0.42% on ARM64 and 0.47% on ARM.

The actual gain will be noticeably less because I didn't account for the extra instructions needed to perform the reserve operation.

@sjindel-google (Contributor)

If I exclude arrays from "Add live temporaries to store buffer", the code size gain is unchanged.

dart-bot pushed a commit that referenced this issue Feb 12, 2020
This reverts commit e5a2271.

Reason for revert: We are seeing some crashes on the bot, see https://ci.chromium.org/p/dart/builders/ci.sandbox/vm-kernel-linux-debug-x64/8684

Original change's description:
> [vm] Aggressive write-barrier elimination.
> 
> Flutter Gallery code size total:
> -0.88% on ARM64
> -0.98% on ARM
> 
> Fixes #39164.
> 
> Change-Id: I8e65b1209ef1150bb73f3474506ebc97d7ab3685
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/132003
> Reviewed-by: Ryan Macnak <rmacnak@google.com>
> Commit-Queue: Samir Jindel <sjindel@google.com>

TBR=kustermann@google.com,rmacnak@google.com,sjindel@google.com

Change-Id: Ia93b43a423ee982aea550eb9fc24d34509db3dce
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/135421
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Siva Annamalai <asiva@google.com>