New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

microbenchmarks velocity_tracker_iteration regression #31071

Closed
Hixie opened this Issue Oct 11, 2017 · 25 comments

Comments

Projects
None yet
9 participants
@Hixie
Contributor

Hixie commented Oct 11, 2017

microbenchmarks velocity_tracker_iteration regressed:
screen shot 2017-10-11 at 9 45 23 am

The Flutter engine roll in flutter/flutter#12400 contained two engine commits, flutter/engine@6388e92 and flutter/engine@f7685dd. The second one there is a Dart roll from 3d21a46 to 4b64a83.

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 11, 2017

Contributor

It also looks like velocity_tracker_iteration 10% regression originated at dart roll back on Aug, 16th flutter/flutter@2e57189. That is when it went up from ~3000ms to ~3300ms. It has been hovering within 1% of 3300ms goal mark since then
fd027naeloq

Contributor

aam commented Oct 11, 2017

It also looks like velocity_tracker_iteration 10% regression originated at dart roll back on Aug, 16th flutter/flutter@2e57189. That is when it went up from ~3000ms to ~3300ms. It has been hovering within 1% of 3300ms goal mark since then
fd027naeloq

@Hixie

This comment has been minimized.

Show comment
Hide comment
@Hixie

Hixie Oct 13, 2017

Contributor

That was #30521, which we decided was acceptable. This new regression is a separate issue.

We should probably revert the roll if it's not possible to quickly figure out the fix because otherwise it'll make it really hard to notice additional regressions that we get while we're trying to pin down this one.

Contributor

Hixie commented Oct 13, 2017

That was #30521, which we decided was acceptable. This new regression is a separate issue.

We should probably revert the roll if it's not possible to quickly figure out the fix because otherwise it'll make it really hard to notice additional regressions that we get while we're trying to pin down this one.

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 16, 2017

Contributor

Running bisect seems to point to 5efa08b, cc @rmacnak-google

flutter tools engine dart rev  rev timestamp run 1  run 2    run 3 run 4  run 5   run 6    run 7
dfa13a14f 6388e928801dece14033b9ca99421f473e1d2fd0 0bb41de 9/27 2:45:15 1257.02 1242.5 1241.76 1245.174 1251.01 1262.09 1237.95
dfa13a14f 6388e928801dece14033b9ca99421f473e1d2fd0 5efa08b 9/27 2:50:35 1286.63 1282.25 1281.21 1303.34 1299.88 1269.73 1282.29
Contributor

aam commented Oct 16, 2017

Running bisect seems to point to 5efa08b, cc @rmacnak-google

flutter tools engine dart rev  rev timestamp run 1  run 2    run 3 run 4  run 5   run 6    run 7
dfa13a14f 6388e928801dece14033b9ca99421f473e1d2fd0 0bb41de 9/27 2:45:15 1257.02 1242.5 1241.76 1245.174 1251.01 1262.09 1237.95
dfa13a14f 6388e928801dece14033b9ca99421f473e1d2fd0 5efa08b 9/27 2:50:35 1286.63 1282.25 1281.21 1303.34 1299.88 1269.73 1282.29
@rmacnak-google

This comment has been minimized.

Show comment
Hide comment
@rmacnak-google

rmacnak-google Oct 16, 2017

Member

Hm, I don't see significant change on my Nexus 4

before 3736.0 3744.8 3692.6 3708.0 3718.1
after 3710.8 3724.6 3691.6 3694.7 3858.1
(-0.43%)

and the top functions are the same in the profiler before and after.

The change should only affect hot reload; even the additional member in Field doesn't change the allocation size because it doesn't cross the next double-word boundary.

Member

rmacnak-google commented Oct 16, 2017

Hm, I don't see significant change on my Nexus 4

before 3736.0 3744.8 3692.6 3708.0 3718.1
after 3710.8 3724.6 3691.6 3694.7 3858.1
(-0.43%)

and the top functions are the same in the profiler before and after.

The change should only affect hot reload; even the additional member in Field doesn't change the allocation size because it doesn't cross the next double-word boundary.

@rmacnak-google

This comment has been minimized.

Show comment
Hide comment
@rmacnak-google

rmacnak-google Oct 16, 2017

Member

To clarify, I mean before and after 4b64a83. I'll try to see if I get a different commit when bisecting.

Member

rmacnak-google commented Oct 16, 2017

To clarify, I mean before and after 4b64a83. I'll try to see if I get a different commit when bisecting.

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 17, 2017

Contributor

Sorry, why 4b64a83 ? Did you mean 5efa08b ?

Contributor

aam commented Oct 17, 2017

Sorry, why 4b64a83 ? Did you mean 5efa08b ?

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 18, 2017

Contributor

From what I can see single line from 5efa08b that adds end_token_pos_ to RawField drops performance of the benchmark by about nine tenth of a percent. Considering that we need that field for better hot reload diagnostic not sure if anything can be done about this.

avg run1 run2 run3 run4 run5
1268.62 1258.52 1259.45 1278.44 1278.28 1268.41
1280.52 1272.04 1276.02 1282.94 1291.59 1280.03
diff --git a/runtime/vm/raw_object.h b/runtime/vm/raw_object.h
index fc4150b3f5..402064f184 100644
--- a/runtime/vm/raw_object.h
+++ b/runtime/vm/raw_object.h
@@ -1000,6 +1000,7 @@ class RawField : public RawObject {
   }
 
   TokenPosition token_pos_;
+  TokenPosition end_token_pos_;
   classid_t guarded_cid_;
   classid_t is_nullable_;  // kNullCid if field can contain null value and
                            // any other value otherwise.
Contributor

aam commented Oct 18, 2017

From what I can see single line from 5efa08b that adds end_token_pos_ to RawField drops performance of the benchmark by about nine tenth of a percent. Considering that we need that field for better hot reload diagnostic not sure if anything can be done about this.

avg run1 run2 run3 run4 run5
1268.62 1258.52 1259.45 1278.44 1278.28 1268.41
1280.52 1272.04 1276.02 1282.94 1291.59 1280.03
diff --git a/runtime/vm/raw_object.h b/runtime/vm/raw_object.h
index fc4150b3f5..402064f184 100644
--- a/runtime/vm/raw_object.h
+++ b/runtime/vm/raw_object.h
@@ -1000,6 +1000,7 @@ class RawField : public RawObject {
   }
 
   TokenPosition token_pos_;
+  TokenPosition end_token_pos_;
   classid_t guarded_cid_;
   classid_t is_nullable_;  // kNullCid if field can contain null value and
                            // any other value otherwise.
@Hixie

This comment has been minimized.

Show comment
Hide comment
@Hixie

Hixie Oct 18, 2017

Contributor

Can we exclude this (and maybe others!) in release builds, if it has such a large impact?

Contributor

Hixie commented Oct 18, 2017

Can we exclude this (and maybe others!) in release builds, if it has such a large impact?

@mraleph

This comment has been minimized.

Show comment
Hide comment
@mraleph

mraleph Oct 18, 2017

Contributor

We need to understand how adding a field to a Field that is neither accessed from generated code on the hot path nor scanned by the GC impacts the performance.

Have you looked at the profiles?

What happens on Golem when you submit your patch? Does any benchmarks there show similar regression?

Contributor

mraleph commented Oct 18, 2017

We need to understand how adding a field to a Field that is neither accessed from generated code on the hot path nor scanned by the GC impacts the performance.

Have you looked at the profiles?

What happens on Golem when you submit your patch? Does any benchmarks there show similar regression?

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 18, 2017

Contributor

@mraleph , no I haven't.
Regarding Golem - since 5efa08b is already submitted how do I check impact of adding just one line from that patch? Do I need to submit and compare two patches - one with full revert of 5efa08b and another with full revert less that line?

Contributor

aam commented Oct 18, 2017

@mraleph , no I haven't.
Regarding Golem - since 5efa08b is already submitted how do I check impact of adding just one line from that patch? Do I need to submit and compare two patches - one with full revert of 5efa08b and another with full revert less that line?

@mraleph

This comment has been minimized.

Show comment
Hide comment
@mraleph

mraleph Oct 18, 2017

Contributor

@aam I think two patch approach is the best bet. Alternatively you can submit patch with adding a single field against revision prior to 5efa08b

Contributor

mraleph commented Oct 18, 2017

@aam I think two patch approach is the best bet. Alternatively you can submit patch with adding a single field against revision prior to 5efa08b

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 18, 2017

Contributor

The patches are running on the dart-vm golem under rev 64071.

The profiles look very similar the one on the left is on 0bb41de("velocity_tracker_iteration": 1243.1414), the one on the right is 0bb41de with TokenPosition end_token_pos_; ("velocity_tracker_iteration": 1358.8443). Faster run has VelocityTracker.getVelocity under top _Float64List.[]. Not sure what to make of this, whether this relevant or not.

screenshot from 2017-10-18 12 53 42

Contributor

aam commented Oct 18, 2017

The patches are running on the dart-vm golem under rev 64071.

The profiles look very similar the one on the left is on 0bb41de("velocity_tracker_iteration": 1243.1414), the one on the right is 0bb41de with TokenPosition end_token_pos_; ("velocity_tracker_iteration": 1358.8443). Faster run has VelocityTracker.getVelocity under top _Float64List.[]. Not sure what to make of this, whether this relevant or not.

screenshot from 2017-10-18 12 53 42

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 19, 2017

Contributor
Contributor

aam commented Oct 19, 2017

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 19, 2017

Contributor

@alexmarkov I attached dumps of microbenchmark runs with --verbose-gc vm option.
It seems that at least number of gc were the same between slow and fast runs.

out-fast.txt
out-slow.txt

Contributor

aam commented Oct 19, 2017

@alexmarkov I attached dumps of microbenchmark runs with --verbose-gc vm option.
It seems that at least number of gc were the same between slow and fast runs.

out-fast.txt
out-slow.txt

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 19, 2017

Contributor

when running this benchmark in release mode, performance difference between two versions disappears(within margin of error).

  release run1 release run2
original("fast") 1109.3568 1128.546
with new field("slow") 1062.4998 1144.4327
Contributor

aam commented Oct 19, 2017

when running this benchmark in release mode, performance difference between two versions disappears(within margin of error).

  release run1 release run2
original("fast") 1109.3568 1128.546
with new field("slow") 1062.4998 1144.4327
@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 20, 2017

Contributor

So far we know that addition of end_token_pos_ expanded the size of RawField from 48 bytes to 52 bytes, that led to increase of size of GC heap for the test by 2k. The test running in aot mode does large number of collections(due to large number of temp floats objects being used in the code) - 5652 collections in 13 seconds of app runtime, taking roughly 1 second doing the collections.

There seems to be no performance issues with release mode. So the issue seems to be specific to allocation profiling that is done as part of profile mode. @alexmarkov mentioned he has some thoughts how to speed allocation profiling up.

It's not clear though whether it makes to continue to run performance benchmarks on profile build, perhaps instead we really should switch to release mode. There is even a bug for that already flutter/flutter#11885.

Contributor

aam commented Oct 20, 2017

So far we know that addition of end_token_pos_ expanded the size of RawField from 48 bytes to 52 bytes, that led to increase of size of GC heap for the test by 2k. The test running in aot mode does large number of collections(due to large number of temp floats objects being used in the code) - 5652 collections in 13 seconds of app runtime, taking roughly 1 second doing the collections.

There seems to be no performance issues with release mode. So the issue seems to be specific to allocation profiling that is done as part of profile mode. @alexmarkov mentioned he has some thoughts how to speed allocation profiling up.

It's not clear though whether it makes to continue to run performance benchmarks on profile build, perhaps instead we really should switch to release mode. There is even a bug for that already flutter/flutter#11885.

@Hixie

This comment has been minimized.

Show comment
Hide comment
@Hixie

Hixie Oct 20, 2017

Contributor

The point of profile mode is to test and debug performance issues, it would be sad if profile mode couldn't be used for benchmarks...

Contributor

Hixie commented Oct 20, 2017

The point of profile mode is to test and debug performance issues, it would be sad if profile mode couldn't be used for benchmarks...

@Hixie

This comment has been minimized.

Show comment
Hide comment
@Hixie

Hixie Oct 20, 2017

Contributor

(Having said that, please do feel free to address this by just moving the benchmarks to release mode.)

Contributor

Hixie commented Oct 20, 2017

(Having said that, please do feel free to address this by just moving the benchmarks to release mode.)

@alexmarkov

This comment has been minimized.

Show comment
Hide comment
@alexmarkov

alexmarkov Oct 20, 2017

Contributor

The change 841b1ad may offset this regression a little bit.

Still, we don't know the root cause of the regression for sure...

Contributor

alexmarkov commented Oct 20, 2017

The change 841b1ad may offset this regression a little bit.

Still, we don't know the root cause of the regression for sure...

@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 20, 2017

Contributor

@yjbanov https://github.com/flutter/flutter/blob/master/dev/devicelab/lib/tasks/microbenchmarks.dart#L36 suggests that switching away from profile to release will break running benchmarks on ios . Is that still a concern(do we, for example, run benchmarks on ios)?

Contributor

aam commented Oct 20, 2017

@yjbanov https://github.com/flutter/flutter/blob/master/dev/devicelab/lib/tasks/microbenchmarks.dart#L36 suggests that switching away from profile to release will break running benchmarks on ios . Is that still a concern(do we, for example, run benchmarks on ios)?

@yjbanov

This comment has been minimized.

Show comment
Hide comment
@yjbanov

yjbanov Oct 20, 2017

The iOS benchmarks were disabled temporarily. We should reenable them. The issue around our ability to run in release mode might have been resolved since then.

yjbanov commented Oct 20, 2017

The iOS benchmarks were disabled temporarily. We should reenable them. The issue around our ability to run in release mode might have been resolved since then.

@kasperl kasperl added the area-vm label Oct 25, 2017

@kasperl

This comment has been minimized.

Show comment
Hide comment
@kasperl

kasperl Oct 25, 2017

Contributor

Was this fixed through this engine roll?

https://github.com/flutter/engine/commit/59f7f5038d4af72b0b2d66944c7b4784b5b0b646
Contributor

kasperl commented Oct 25, 2017

Was this fixed through this engine roll?

https://github.com/flutter/engine/commit/59f7f5038d4af72b0b2d66944c7b4784b5b0b646
@aam

This comment has been minimized.

Show comment
Hide comment
@aam

aam Oct 25, 2017

Contributor

Yes, the benchmark was brought down back to green by @alexmarkov's 841b1ad.
Ideally we still follow-up with

  • moving the microbenchmarks to release;
  • confirm how slightly larger heap (due to 8 bytes bigger vm Field) contributed to regression.
Contributor

aam commented Oct 25, 2017

Yes, the benchmark was brought down back to green by @alexmarkov's 841b1ad.
Ideally we still follow-up with

  • moving the microbenchmarks to release;
  • confirm how slightly larger heap (due to 8 bytes bigger vm Field) contributed to regression.
@a-siva

This comment has been minimized.

Show comment
Hide comment
@a-siva

a-siva Dec 18, 2017

Contributor

Any updates on follow-ups proposed above, If not we can close the bug?

Contributor

a-siva commented Dec 18, 2017

Any updates on follow-ups proposed above, If not we can close the bug?

@yjbanov

This comment has been minimized.

Show comment
Hide comment
@yjbanov

yjbanov Dec 19, 2017

This has improved and now hovering around 3100, which is even better than before the regression. So I'm closing.

yjbanov commented Dec 19, 2017

This has improved and now hovering around 3100, which is even better than before the regression. So I'm closing.

@yjbanov yjbanov closed this Dec 19, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment