Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flutter test regression caused by copying field guard information to the Slot #47722

Closed
aam opened this issue Nov 17, 2021 · 12 comments
Closed
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.

Comments

@aam
Copy link
Contributor

aam commented Nov 17, 2021

Change that put field guard information on a slot(https://dart-review.googlesource.com/c/sdk/+/218987) resulted in flutter test failures in g3 b/205961373.

@aam aam self-assigned this Nov 17, 2021
@aam aam added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Nov 17, 2021
copybara-service bot pushed a commit that referenced this issue Nov 17, 2021
…guard information that is kept on a slot."

This reverts commit 67baa58 as it
breaks internal test b/205961373.

Addresses #47722

TEST=ci

Change-Id: Idc4a08a0a8cca2faf287a42eab8ff6af43f4e6c6
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/220484
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Slava Egorov <vegorov@google.com>
@a-siva
Copy link
Contributor

a-siva commented Nov 17, 2021

I presume the cherry pick request for this #47710 needs to be withdrawn as the change has been reverted.

@aam
Copy link
Contributor Author

aam commented Nov 22, 2021

I think one issue with original change was that it relied on the fact that field's guard info can be captured on the Slot and used throughout compilation pipeline consistently.
However https://github.com/dart-lang/sdk/blob/main/runtime/vm/compiler/jit/jit_call_specializer.cc#L209 could disable field unboxing by updating (cloned) field guard info directly on the (clone) field during compilation of StoreInstanceField instruction.
That makes slot information outdated, results in optimized code incorrectly assuming that field is still unboxed.

Here is a flow graph for the scenario in question(with StoreInstanceField(v2 . _opacity@598160605 = v3 T{_Double}) being an instruction where _opacity field goes from unboxed to boxed)

Shell: Canonicalize
Shell: ==== package:flutter/src/rendering/proxy_box.dart_RenderOpacity_set_opacity (SetterFunction)
Shell: B0[graph]:0 {
Shell:       v0 <- Constant(#null) T{Null?}
Shell:       v1 <- Constant(#<optimized out>) T{_OneByteString}
Shell:       v10 <- Constant(#true) T{bool}
Shell:       v11 <- Constant(#29717) T{_Smi}
Shell:       v12 <- Constant(#29730) T{_Smi}
Shell:       v15 <- Constant(#0.0) T{_Double}
Shell:       v18 <- Constant(#false) T{bool}
Shell:       v23 <- Constant(#29744) T{_Smi}
Shell:       v24 <- Constant(#29772) T{_Smi}
Shell:       v35 <- Constant(#0) T{_Smi}
Shell:       v58 <- Constant(#1.0) T{_Double}
Shell: }
Shell: B1[function entry]:2 {
Shell:       v2 <- Parameter(0) T{RenderOpacity}
Shell:       v3 <- Parameter(1) T{double}
Shell: }
Shell:     CheckStackOverflow:8(stack=0, loop=0)
Shell:     v4 <- StrictCompare:14(!==, v3, v0) T{bool}
Shell:     v8 <- StaticCall:18( _evaluateAssertion@0150898<0> v4) T{*?}
Shell:     AssertBoolean:20(v8)
Shell:     Branch if StrictCompare:22(===, v8 T{bool}, v10) goto (3, 4)
Shell: B3[target]:26
Shell:     v16 <- RelationalOp:36(>=, v3, v15) T{bool}
Shell:     Branch if StrictCompare:40(===, v16 T{bool}, v10) goto (6, 7)
Shell: B6[target]:44
Shell:     v59 <- RelationalOp:46(<=, v3, v58) T{bool}
Shell:     goto:50 B5
Shell: B7[target]:52
Shell:     goto:54 B5
Shell: B5[join]:34 pred(B6, B7) {
Shell:       v19 <- phi(v59 T{bool}, v18) alive T{bool}
Shell: }
Shell:     v21 <- StaticCall:56( _evaluateAssertion@0150898<0> v19) T{*?}
Shell:     AssertBoolean:58(v21)
Shell:     Branch if StrictCompare:60(===, v21 T{bool}, v10) goto (8, 9)
Shell: B8[target]:64
Shell:     v27 <- LoadField(v2 . _opacity@598160605) T{_Double}
Shell:     v29 <- EqualityCompare:74(v27 == v3) T{bool}
Shell:     Branch if StrictCompare:78(===, v29 T{bool}, v10) goto (10, 11)
Shell: B10[target]:82
Shell:     Return:86(v0)
Shell: B11[target]:88
Shell:     v31 <- StaticCall:90( get:alwaysNeedsCompositing<0> v2) T{bool}
Shell:     v33 <- LoadField(v2 . _alpha@598160605) T{_Smi}
Shell:     v36 <- EqualityCompare:94(v33 != v35) T{bool}
Shell:     GuardFieldClass:98(_opacity@598160605 <not-nullable _Double@0150898>, v3)
Shell:     StoreInstanceField(v2 . _opacity@598160605 = v3 T{_Double})
Shell:     v40 <- LoadField(v2 . _opacity@598160605) T{_Double}
Shell:     v42 <- StaticCall:102( getAlphaFromOpacity<0> v40) T{int}
Shell:     GuardFieldClass:104(_alpha@598160605 <not-nullable _Smi@0150898>, v42)
Shell:     StoreInstanceField(v2 . _alpha@598160605 = v42 T{_Smi})
Shell:     v44 <- StaticCall:106( get:alwaysNeedsCompositing<0> v2) T{bool}
Shell:     v79 <- StrictCompare:10(===, v31, v44) T{bool}
Shell:     AssertBoolean:110(v79)
Shell:     Branch if StrictCompare:112(!==, v79 T{bool}, v10) goto (12, 13)
Shell: B12[target]:116
Shell:     StaticCall:118( markNeedsCompositingBitsUpdate<0> v2, using unchecked entrypoint)
Shell:     goto:124 B14
Shell: B13[target]:120
Shell:     goto:126 B14
Shell: B14[join]:122 pred(B12, B13)
Shell:     StaticCall:128( markNeedsPaint<0> v2, using unchecked entrypoint)
Shell:     v48 <- LoadField(v2 . _alpha@598160605) T{_Smi}
Shell:     CheckSmi:132(v48)
Shell:     CheckSmi:132(v35)
Shell:     v50 <- EqualityCompare:132(v48 == v35) T{bool}
Shell:     AssertBoolean:134(v50)
Shell:     v52 <- BooleanNegate(v50 T{bool}) T{bool}
Shell:     v69 <- StrictCompare:10(===, v36, v52) T{bool}
Shell:     AssertBoolean:138(v69)
Shell:     Branch if StrictCompare:140(!==, v69 T{bool}, v10) goto (15, 18)
Shell: B15[target]:152
Shell:     v56 <- StaticCall:144( get:alwaysIncludeSemantics<0> v2) T{bool}
Shell:     AssertBoolean:146(v56)
Shell:     Branch if StrictCompare:148(!==, v56 T{bool}, v10) goto (16, 19)
Shell: B16[target]:154
Shell:     StaticCall:156( markNeedsSemanticsUpdate<0> v2, using unchecked entrypoint)
Shell:     goto:170 B20
Shell: B19[target]:164
Shell:     goto:166 B17
Shell: B18[target]:160
Shell:     goto:162 B17
Shell: B17[join]:158 pred(B18, B19)
Shell:     goto:172 B20
Shell: B20[join]:168 pred(B16, B17)
Shell:     DebugStepCheck:174()
Shell:     Return:176(v0)
Shell: B9[target]:66
Shell:     v25 <- StaticCall:68( _throwNew@0150898<0> v23, v24, v0) T{*?}
Shell:     Throw:70(v25)
Shell: B4[target]:28
Shell:     v13 <- StaticCall:30( _throwNew@0150898<0> v11, v12, v0) T{*?}
Shell:     Throw:32(v13)
Shell: *** END CFG

Perhaps use of Slot's FieldGuardInfo need to be limited to unopt compilation pipeline.

cc @mkustermann

@mkustermann
Copy link
Member

@aam If the compiler executes Field::field.DisableFieldUnboxing() which will change the original field and the clone the Slot will stay the same. It does so only in optimizing compilations. When trying to install code the optimizing compiler should validate that the assumptions it used during compilation are still valid (compare field guard clone with original).

If the compiler now doesn't rely on information from the clone but exclusively on information from the Slot, it should validate it's assumptions it made in Slot by comparing it with the original Field. (similar to the existing Field::IsConsistentWith). (mentioned that on the cl/218987 - so it seems that the CL actually made the optimizing compiler depend on the Slot for unboxing bit, so it would need to address that comment already).

=> If we do this, the effect would be that after running the compiler, which disables field unboxing, the code installation check would discard the compilation due to relying on outdated information. It does seem a little safer than having some compiler passes see the unboxed bit and later optimizing passes suddendly see the boxed bit.

@aam
Copy link
Contributor Author

aam commented Nov 22, 2021

=> If we do this, the effect would be that after running the compiler, which disables field unboxing, the code installation check would discard the compilation due to relying on outdated information.

Code I pointed out above(https://github.com/dart-lang/sdk/blob/main/runtime/vm/compiler/jit/jit_call_specializer.cc#L209) updates both clone and original fields. That allows current optimizing compiler to install new optimized code it produces.
If we add Slot check to Field::IsConsistentWith to https://dart-review.googlesource.com/c/sdk/+/218987, that would result in dropping what optimized compiler is producing now.

@mkustermann
Copy link
Member

If we add Slot check to Field::IsConsistentWith to https://dart-review.googlesource.com/c/sdk/+/218987, that would result
in dropping what optimized compiler is producing now.

That is correct, it would drop the optimized compilation.
One could also eagerly bailout of the compilation - as we originally did (changed in cl/182669) - to minimize the wasted time.

About the suggestion on either using Slot* (cl/221025) or mutable in const Slot (cl/221200): The Slot should IMHO be immutable. So the compiler sees a consistent world throughout the compilation. Allowing modifying the Slot halfway through optimization - and therefore making it give different answers in later optimization passes seems iffy to me (even if it would be benign in this case). So I'd pick neither of those.

/cc @mraleph @alexmarkov opinions?

@alexmarkov
Copy link
Contributor

Dropping the optimized compilation when the heuristic for boxing fields triggers might hurt performance much more than just not having this heuristic.

To summarize the current state:

  • In JIT mode compiler has a heuristic which flips unboxed fields back to boxed. The change in boxing is performed early during optimization passes and both original Field and clone are updated.
  • This means that field guard state changes during compilation and it works fine.
  • Now we attached field guard state to a Slot which is const throughout the compilation.
  • As field guard state is a new addition to a Slot, and it needs to change during compilation to support the heuristic, so it needs to be mutable. We can express this with mutable, const_cast, or put FieldGuardState* to a Slot instead of FieldGuardState. We can also make Slot and FieldGuardState entirely separate and not referencing each other.

We have a trade-off between having this heuristic and better (simpler) compiler architecture. I think we should investigate if this heuristic for boxing fields bears its weight. Does it provide performance on golem benchmarks including dart2js and analyzer? Without this heuristic the overall architecture would be simpler (FieldGuardState will be immutable).

If we want to keep this heuristic but make field guard state immutable throughout compilation, then we can also consider moving the heuristic out of compilation somewhere (check it before compilation / before Slot is created for the first time / some place in runtime outside of compiler / etc).

@mraleph
Copy link
Member

mraleph commented Nov 25, 2021

I really don't like this heuristic - it is only there to improve performance on the code which mostly reads from double or simd fields and does not write to them as much - and does reading from unoptimized code. Unboxing such fields leads to excessive allocation. I highly doubt it improves performance of anything in our corpus but a bunch of benchmarks. Though there might be users out there which would see regressed performance if we disable this without implementing interprocedural unboxing in JIT mode.

I think the way to fix this is the following: we should move this heuristic to the point where the first canonical slot is created for the given field. We make decision - disable unboxing on the original field if necessary then create a slot out of it. This way slot itself is immutable and heuristic still exists.

For future safety we should also add a consistency check at the end of compilation (after making decision to commit) which would trigger if slots got out of sync with fields - they should not.

@aam
Copy link
Contributor Author

aam commented Nov 25, 2021

With heuristic turned off(so that in jit unboxed fields never go boxed) I don't see significant regression on perf benchmarks. Only thing that stands out is NBodySIMD improvement of over 30%

diff --git a/runtime/vm/compiler/jit/jit_call_specializer.cc b/runtime/vm/compiler/jit/jit_call_specializer.cc
index 191e6bb3e9e..414fd46fe15 100644
--- a/runtime/vm/compiler/jit/jit_call_specializer.cc
+++ b/runtime/vm/compiler/jit/jit_call_specializer.cc
@@ -165,51 +165,8 @@ void JitCallSpecializer::VisitInstanceCall(InstanceCallInstr* instr) {
 void JitCallSpecializer::VisitStoreInstanceField(
     StoreInstanceFieldInstr* instr) {
   if (instr->IsUnboxedDartFieldStore()) {
-    // Determine if this field should be unboxed based on the usage of getter
-    // and setter functions: The heuristic requires that the setter has a
-    // usage count of at least 1/kGetterSetterRatio of the getter usage count.
-    // This is to avoid unboxing fields where the setter is never or rarely
-    // executed.
     const Field& field = instr->slot().field();
-    const String& field_name = String::Handle(Z, field.name());
-    const Class& owner = Class::Handle(Z, field.Owner());
-    const Function& getter =
-        Function::Handle(Z, owner.LookupGetterFunction(field_name));
-    const Function& setter =
-        Function::Handle(Z, owner.LookupSetterFunction(field_name));
-    bool unboxed_field = false;
-    if (!getter.IsNull() && !setter.IsNull()) {
-      if (field.is_double_initialized()) {
-        unboxed_field = true;
-      } else if ((setter.usage_counter() > 0) &&
-                 ((FLAG_getter_setter_ratio * setter.usage_counter()) >=
-                  getter.usage_counter())) {
-        unboxed_field = true;
-      }
-    }
-    if (!unboxed_field) {
-      if (FLAG_trace_optimization || FLAG_trace_field_guards) {
-        THR_Print("Disabling unboxing of %s\n", field.ToCString());
-        if (!setter.IsNull()) {
-          THR_Print("  setter usage count: %" Pd "\n", setter.usage_counter());
-        }
-        if (!getter.IsNull()) {
-          THR_Print("  getter usage count: %" Pd "\n", getter.usage_counter());
-        }
-      }
-      // We determined it's not beneficial for performance to unbox the
-      // field, therefore we mark it as boxed here.
-      //
-      // Calling `DisableFieldUnboxing` will cause transition the field to
-      // boxed and deoptimize dependent code.
-      //
-      // NOTE: It will also, as a side-effect, change our field clone's
-      // `is_unboxing_candidate()` bit. So we assume the compiler has so far
-      // not relied on this bit.
-      field.DisableFieldUnboxing();
-    } else {
-      flow_graph()->parsed_function().AddToGuardedFields(&field);
-    }
+    flow_graph()->parsed_function().AddToGuardedFields(&field);
   }
 }

@mraleph
Copy link
Member

mraleph commented Nov 26, 2021

It can be that benchmarks that benefited from the heuristic were not ported from Dart 1 to Dart 2. (I am thinking it might have been Box2D).

@aam
Copy link
Contributor Author

aam commented Nov 29, 2021

It can be that benchmarks that benefited from the heuristic were not ported from Dart 1 to Dart 2. (I am thinking it might have been Box2D).

If I understand correctly Misc/Box2DOctane is what used to be Box2D and it doesn't show any change on removal of this heuristic on ia32 and x64, small regression on armv7(-0.1) and armv8(-0.3).

@mraleph
Copy link
Member

mraleph commented Nov 29, 2021

I'd say then maybe we should just rip this heuristic out. I can artificially construct the example where it causes a regression (574ms -> 699ms), but it is super contrived. Noticed that f has to be non-final (otherwise it just stays boxed anyway).

@pragma('vm:never-inline')
void foo(double x) {}

class X {
  double f;
  X(this.f);
}

@pragma('vm:never-inline')
void bar(X x) {
  X(1.9);
  foo(x.f);
}

void main() {
  for (var i = 0; i < 10; i++) {
    final sw = Stopwatch()..start();
    final x = X(0.1);
    for (int i = 0; i < 100000000; i++) bar(x);
    print(sw.elapsedMilliseconds);
  }
}

@aam
Copy link
Contributor Author

aam commented Nov 29, 2021

I'd say then maybe we should just rip this heuristic out.

https://dart-review.googlesource.com/c/sdk/+/221544

copybara-service bot pushed a commit that referenced this issue Nov 30, 2021
…k to boxed.

The heuristic was introduced in b870daf, but gets in the way of maintaining Slot(extended with field guard information) immutable during compilation.

Existing set of benchmarks doesn't seem to show any reasonable regression with this heuristic removed.

Bug: #47722
Change-Id: Ie94050d96852b7ffef2dc81248cda23e84d41f4b
TEST=ci
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/221544
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Alexander Aprelev <aam@google.com>
@aam aam closed this as completed Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.
Projects
None yet
Development

No branches or pull requests

5 participants