
Extend simplified write barrier elimination to work inter-block (not just intra-block) #39164

Closed
mkustermann opened this issue Oct 29, 2019 · 7 comments

@mkustermann (Member)

Right now our write-barrier elimination is very simple and works only inside one block:

static void WriteBarrierElimination(FlowGraph* flow_graph) {
  for (BlockIterator block_it = flow_graph->reverse_postorder_iterator();
       !block_it.Done(); block_it.Advance()) {
    BlockEntryInstr* block = block_it.Current();
    // The most recent allocation known to produce a new-space (or remembered)
    // object. It is reset at every block boundary and at every instruction
    // that can trigger GC.
    Definition* last_allocated = nullptr;
    for (ForwardInstructionIterator it(block); !it.Done(); it.Advance()) {
      Instruction* current = it.Current();
      if (!current->CanTriggerGC()) {
        // A store into the most recently allocated object needs no barrier.
        if (StoreInstanceFieldInstr* instr = current->AsStoreInstanceField()) {
          if (instr->instance()->definition() == last_allocated) {
            instr->set_emit_store_barrier(kNoStoreBarrier);
          }
          continue;
        }
      }

      if (AllocationInstr* alloc = current->AsAllocation()) {
        if (alloc->WillAllocateNewOrRemembered()) {
          last_allocated = alloc;
          continue;
        }
      }

      // Anything that can trigger GC invalidates what we know.
      if (current->CanTriggerGC()) {
        last_allocated = nullptr;
      }
    }
  }
}

COMPILER_PASS(WriteBarrierElimination,
              { WriteBarrierElimination(flow_graph); });

To eliminate more barriers, we might want to extend this to work across basic blocks.
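
One way to extend this across blocks is a forward dataflow analysis: propagate the set of allocations known to be in new-space (or remembered) along control-flow edges, intersect the sets at joins, and clear them at instructions that can trigger GC. Below is a minimal, self-contained C++ sketch of that idea on a toy CFG; it is not the VM's pass, and Block, Instr and Kind are illustrative stand-ins for the real IL classes.

#include <cstdio>
#include <set>
#include <vector>

enum class Kind { kAlloc, kStore, kCall };

struct Instr {
  Kind kind;
  int def = -1;       // id defined (for kAlloc)
  int instance = -1;  // id being stored into (for kStore)
  bool no_barrier = false;
};

struct Block {
  std::vector<int> preds;    // predecessor block indices
  std::vector<Instr> instrs;
};

// Blocks are assumed to be numbered roughly in reverse postorder; the
// analysis iterates to a fixed point, so the order only affects how fast it
// converges.
void EliminateBarriers(std::vector<Block>& blocks) {
  // out[b] = allocations known to be in new-space (or remembered) at the end
  // of block b.
  std::vector<std::set<int>> out(blocks.size());
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t b = 0; b < blocks.size(); ++b) {
      // Meet over predecessors: only allocations that are known-new on
      // *every* incoming edge survive the join.
      std::set<int> state;
      bool first = true;
      for (int p : blocks[b].preds) {
        if (first) {
          state = out[p];
          first = false;
          continue;
        }
        std::set<int> merged;
        for (int id : state) {
          if (out[p].count(id) != 0) merged.insert(id);
        }
        state = merged;
      }
      // Transfer function: same logic as the intra-block pass, but over a
      // set instead of a single "last allocated" definition.
      for (Instr& instr : blocks[b].instrs) {
        switch (instr.kind) {
          case Kind::kStore:
            if (state.count(instr.instance) != 0) instr.no_barrier = true;
            break;
          case Kind::kAlloc:
            // Assumes the allocation leaves earlier known-new objects usable
            // (cf. WillAllocateNewOrRemembered in the pass above).
            state.insert(instr.def);
            break;
          case Kind::kCall:
            state.clear();  // may trigger GC, invalidating everything
            break;
        }
      }
      if (state != out[b]) {
        out[b] = state;
        changed = true;
      }
    }
  }
}

int main() {
  // B0: v0 <- Allocate; B0 branches to B1 and B2; both fall into B3, which
  // stores into v0. The intra-block pass cannot remove this barrier; the
  // inter-block analysis can.
  std::vector<Block> blocks(4);
  blocks[0].instrs = {{Kind::kAlloc, /*def=*/0}};
  blocks[1].preds = {0};
  blocks[2].preds = {0};
  blocks[3].preds = {1, 2};
  blocks[3].instrs = {{Kind::kStore, /*def=*/-1, /*instance=*/0}};
  EliminateBarriers(blocks);
  std::printf("barrier on the store in B3 eliminated: %s\n",
              blocks[3].instrs[0].no_barrier ? "yes" : "no");
  return 0;
}

Starting every block's out-set empty and iterating to a fixed point is conservative but safe: a barrier is only dropped when the stored-into object is known to be new (or remembered) along every path reaching the store.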

/cc @mraleph

@mkustermann added the type-performance and vm-aot-code-size labels on Oct 29, 2019
@devoncarew added the area-vm label on Oct 29, 2019
@mkustermann (Member, Author)

@sjindel-google The write barrier is a big code size contributor. This could be something to investigate. Please self-assign if you want to look into it.

@sjindel-google sjindel-google self-assigned this Jan 8, 2020
@sjindel-google (Contributor)

Besides looking across blocks, another impediment to write-barrier elimination is instructions that might trigger GC (calls, allocations, boxes, etc.): after a GC the object may have been promoted out of new-space, so stores into it need barriers again.

We've discussed two approaches to extending WBE across these kinds of instructions:

Reserved allocations

Several consecutive allocations are "batched": the first allocation (or an instruction at the beginning of the batch) ensures there is enough room in new-space for all of them. If there isn't, it can (for example) trigger a new-space collection to make room.

For example:

Reserve(sizeof(A) + sizeof(B))
v0 <- Allocate(A)
v1 <- Allocate(B)
StoreInstanceField(v0.f = v1, no write barrier)
StoreInstanceField(v1.f = v0, no write barrier)
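
The compiler-side work for this scheme is essentially to group runs of back-to-back allocations and compute the size a leading reserve would have to guarantee. A toy C++ sketch of that computation, with Alloc and ReserveSizeForBatch as illustrative names rather than real IL or runtime entities:

#include <cstddef>
#include <cstdio>
#include <vector>

// One fixed-size allocation in a batch (illustrative, not a VM IL class).
struct Alloc {
  const char* cls;
  size_t size;
};

// Bytes a leading Reserve would need so that every allocation in the batch is
// guaranteed to fit in new-space without triggering a GC in between.
size_t ReserveSizeForBatch(const std::vector<Alloc>& batch) {
  size_t total = 0;
  for (const Alloc& a : batch) total += a.size;
  return total;
}

int main() {
  // Corresponds to Reserve(sizeof(A) + sizeof(B)) in the IL above.
  std::vector<Alloc> batch = {{"A", 32}, {"B", 48}};
  std::printf("Reserve(%zu)\n", ReserveSizeForBatch(batch));
  return 0;
}

Variable-length allocations (arrays, strings) would presumably have to be excluded from a batch or bounded, since the reserve amount has to be known before the batch starts.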

Add live temporaries to store buffer

We assume that any temporary defined by an allocation will be either in new space or in the store buffer until the next call. So, the IL would simply look like:

v0 <- Allocate(A)
v1 <- Allocate(B)  // Not a call.
StoreInstanceField(v0.f = v1, no write barrier)
StoreInstanceField(v1.f = v0, no write barrier)

After any collection completes, we scan the stack a second time and add all live temporaries in frames immediately above an exit frame into the store buffer. This restores the original assumption.

This approach would allow us to remove the store barrier in more cases (for example, across Box instructions) and would not require code for the reserve operation. However, it could extend the lifetimes of some objects longer than necessary.
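
A rough, self-contained sketch of what the post-collection re-scan could look like; this is a toy model, not the VM's GC code. Frame, StoreBuffer and RescanStackAfterGC are illustrative stand-ins, and "sits immediately above an exit frame" is modeled as a per-frame flag:

#include <cstdint>
#include <unordered_set>
#include <vector>

using ObjectPtr = uintptr_t;

struct Frame {
  bool above_exit_frame = false;              // paused at a call into the runtime
  std::vector<ObjectPtr> live_object_slots;   // GC-visible temporaries
};

struct StoreBuffer {
  std::unordered_set<ObjectPtr> remembered;
  void Add(ObjectPtr obj) { remembered.insert(obj); }
};

// Runs once per mutator stack after a collection completes: every live
// temporary in a frame that sits immediately above an exit frame goes back
// into the store buffer, restoring the "new-space or remembered" assumption
// that the eliminated barriers rely on.
void RescanStackAfterGC(const std::vector<Frame>& frames, StoreBuffer* buffer) {
  for (const Frame& frame : frames) {
    if (!frame.above_exit_frame) continue;
    for (ObjectPtr obj : frame.live_object_slots) {
      buffer->Add(obj);
    }
  }
}

int main() {
  StoreBuffer buffer;
  Frame frame;
  frame.above_exit_frame = true;
  frame.live_object_slots = {0x1000, 0x2000};  // fake object addresses
  RescanStackAfterGC({frame}, &buffer);
  return 0;
}

The extra cost is more remembered-set entries to visit after each collection, which is the concern raised later in this thread.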

/cc @rmacnak-google @mraleph @alexmarkov What do you think?

@sjindel-google (Contributor) commented Jan 14, 2020

Inter-block WBE (without either of the extensions above) produces a 0.13% Flutter Gallery code-size reduction on ARM64 and 0.15% on ARM.

@sjindel-google (Contributor)

Inter-block WBE with the "Add live temporaries to store buffer" approach reduces FG code size by 0.88% on ARM64 and 0.98% on ARM.

@rmacnak-google (Contributor)

"Reserved allocations" is similar to an optimization in V8 where some allocations are fused and the GC sees one larger allocation that the caller then divides up.

Overall I like "Add live temporaries to store buffer". I'm a bit worried about creating too much remembered-set work, but if we exclude the allocation of arrays from the optimization, there shouldn't be too much extra to visit. Hopefully, from a code-size perspective, the interesting allocation sites will be for small, fixed-size objects. Also, if in addition to adding these values to the store buffer we add them to the deferred marking queue, we might no longer need the EnsureIsNewOrRemembered logic at the end of the allocation stubs.

@sjindel-google (Contributor)

Inter-block WBE with the "Reserved allocations" approach (V8-style) will yield at most 0.42% on ARM64 and 0.47% on ARM.

The actual gain will be noticeably less because I didn't account for the extra instructions needed to perform the reserve operation.

@sjindel-google (Contributor)

If I exclude arrays from "Add live temporaries to store buffer", the code size gain is unchanged.

dart-bot pushed a commit that referenced this issue Feb 12, 2020
This reverts commit e5a2271.

Reason for revert: We are seeing some crashes on the bot, see https://ci.chromium.org/p/dart/builders/ci.sandbox/vm-kernel-linux-debug-x64/8684

Original change's description:
> [vm] Aggressive write-barrier elimination.
> 
> Flutter Gallery code size total:
> -0.88% on ARM64
> -0.98% on ARM
> 
> Fixes #39164.
> 
> Change-Id: I8e65b1209ef1150bb73f3474506ebc97d7ab3685
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/132003
> Reviewed-by: Ryan Macnak <rmacnak@google.com>
> Commit-Queue: Samir Jindel <sjindel@google.com>

TBR=kustermann@google.com,rmacnak@google.com,sjindel@google.com

Change-Id: Ia93b43a423ee982aea550eb9fc24d34509db3dce
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/135421
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Siva Annamalai <asiva@google.com>