Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dart::GCCompactor makes search on pub.dev slow (3.0.0 regression) #52513

Closed
jonasfj opened this issue May 25, 2023 · 9 comments
Closed

dart::GCCompactor makes search on pub.dev slow (3.0.0 regression) #52513

jonasfj opened this issue May 25, 2023 · 9 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P3 A lower priority bug or feature request triaged Issue has been triaged by sub team type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@jonasfj
Copy link
Member

jonasfj commented May 25, 2023

We recently observed that if we do a loop that makes a search request to pub.dev every 100ms, the requests are fast (~60ms for a simple query).
But if we do the request every 1s instead, the requests are slow (> 1s).

The issue appears to have started after we upgraded from 2.19.6 to 3.0.0.
And debugging it locally, we also find that 2.19.6 doesn't suffer from this behavior.

Credits @mkustermann for tracing this to dart::GCCompactor::VisitPointers

Context: Our search process has a sizable amount of data in-memory and lots of Dart objects / Map / List / Set / String / etc.. So it's not surprising that compacting the heap could be slow. And each search does allocate a bunch of objects that is released when the search request is done.

See dump from perf report --no-children
+   27.30%  DartWorker       dart              [.] non-virtual thunk to dart::GCCompactor::VisitPointers(dart::ObjectPtr*, dart::ObjectPtr*)
+   18.31%  DartWorker       dart              [.] 0x0000000002405f07
+    8.87%  DartWorker       dart              [.] dart::CompactorTask::SlideBlock
+    7.61%  DartWorker       dart              [.] 0x0000000002405f82
+    6.05%  DartWorker       dart              [.] dart::CompactorTask::PlanBlock
+    5.95%  DartWorker       dart              [.] 0x0000000002404172
+    4.29%  DartWorker       dart              [.] dart::UntaggedObject::VisitPointersPredefined
+    3.24%  DartWorker       dart              [.] 0x0000000002405ef9
+    1.91%  DartWorker       dart              [.] 0x0000000002234494
+    1.38%  DartWorker       libc.so.6         [.] pthread_mutex_lock@@GLIBC_2.2.5
+    1.31%  DartWorker       dart              [.] 0x0000000002405efc
+    1.29%  DartWorker       libc.so.6         [.] __GI___pthread_mutex_unlock_usercnt
+    1.04%  DartWorker       dart              [.] 0x0000000002405f0c
     0.41%  DartWorker       dart              [.] 0x0000000002405f05
     0.40%  DartWorker       dart              [.] 0x0000000002405ef4
     0.34%  DartWorker       libc.so.6         [.] _int_free
     0.33%  DartWorker       dart              [.] 0x0000000002404153
     0.31%  DartWorker       dart              [.] 0x000000000240413f
     0.29%  DartWorker       [JIT] tid 856820  [.] *dart.collection___Set&_HashVMBase&SetMixin&_HashBase&_OperatorEqualsAndHashCode&_LinkedHashSetMixin@3220832__getKeyO
     0.28%  DartWorker       [JIT] tid 856820  [.] *PackageNameIndex_searchWord
Recorded request latency

Using Dart 2.19.6

$ for i in {1..30}; do sleep 1; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=retry -o /dev/null; done
0.081092
0.082060
0.064795
0.079013
0.076508
0.087800
0.665303 # <-- looks like it happens here too, just less frequently
0.078545
0.072124
0.088390
0.073081
0.086967
0.088025
0.082290
0.077925
0.076373
0.093406
0.073708
0.087810
0.074978
0.076830
0.077917
0.076213
0.087249
0.076254
0.083859
0.077098
0.084421
0.070183
0.082409

Using Dart 3.0.0

# With sleep 1s
$ for i in {1..30}; do sleep 1; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=retry -o /dev/null; done
0.108693
1.156937
1.123688
1.130786
1.258260
1.195012
1.178660
1.098746
1.093142
1.131269
1.110011
1.154830
1.149059
1.059845
1.131827
1.143921
1.139439
1.125857
1.147304
1.104930
1.113691
1.140321
1.194772
1.129498
1.108069
1.096956
1.182429
1.179889
1.112675
1.109692

# With sleep 0.1s
$ for i in {1..30}; do sleep 0.1; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=retry -o /dev/null; done
0.067815
0.049691
0.069296
0.071330
0.068390
0.089099
0.065786
0.072415
0.063212
0.065470
0.065002
0.065758
0.069631
0.065540
0.056589
0.078862
0.066153
0.064349
0.067683
0.062685
0.065035
0.062776
0.066715
0.063428
0.067500
0.062801
0.065540
0.068821
0.068419
0.065610
@jonasfj jonasfj added the type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) label May 25, 2023
@mkustermann mkustermann added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label May 25, 2023
@mkustermann
Copy link
Member

Maybe the heuristics we use for determining when to run compactor are not ideal (idle time estimation, expected time it takes to compact heap, expected fragmentation of heap, ...).

This can cause pretty big issues for server apps that get low traffic (i.e. mostly idle) - as the compactor may take 1 second so any request that does come gets delayed by that much.

As a siden-note, we may want to allow customers to disable compaction entirely (e.g. --compactor-tasks=0 -- as --use-compactor seems to be off by default and mean something different)

@mkustermann
Copy link
Member

@rmacnak-google could you take a look?

@isoos
Copy link

isoos commented May 25, 2023

Note: the issue may have started with the 2.19.0 -> 2.19.6 upgrade, but the increase in latencies was not that high to look into it.

@rmacnak-google
Copy link
Contributor

Requests every 1s is just the right frequency to get the worst behavior from the standalone VM's idle policy because this is the default value of --idle_duration_micros. If requests are more frequent, the VM won't think it's idle; less frequent and the idle GC will (at least partially) finish before the next request.

Short term, in increasing order of severity

  • Idle compaction can indirectly be disabled while leaving idle scavenges on by setting --idle_duration_micros to a value that's only enough time for scavenges. Something like 30ms, though one could look at either the timeline or --verbose_gc to find something specific to this program.
  • The idle mechanism can be throttled by setting --idle_timeout_micros to a much larger value like 1 minute.
  • The idle mechanism can be completely disabled by setting --idle_timeout_micros=0.

Long term, the compactor will be made incremental.

@jonasfj Could you provide either a timeline from the slow run or instructions on how to reproduce it locally?

jonasfj added a commit to jonasfj/pub-dev that referenced this issue May 26, 2023
Makes it easy to reproduce:
dart-lang/sdk#52513

To start do:
```
cd app
dart pub get
DEBUG=* dart bin/fake_server.dart run --read-only
```

Wait for it to print: `LOADED DATA`

Compare results from requests with an without 1s sleep:
```
for i in {1..30}; do sleep 1; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=ante -o /dev/null; done
0.907602
1.801583
1.825277
1.718818
1.825318
1.745459
1.753224
1.744140
1.753854
1.760531

for i in {1..30}; do sleep 0; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=ante -o /dev/null; done
1.105817
0.925765
0.875859
0.881853
0.933122
0.937502
0.890751
0.894597
0.924083
0.916186
1.146219
```

Fair warning: This data is total degenerate garbage.
The issue is even more extreme in our production data,
where some requests takes 2-300ms, and other request takes 3-4s.

Reproducing our production data locally gives us something like
60ms vs 1s.

The garbage data in this setup only produces 900ms vs 1.8s,
but it clearly reproduces the issue.
@jonasfj
Copy link
Member Author

jonasfj commented May 26, 2023

To start do:

wget https://github.com/jonasfj/pub-dev/archive/refs/heads/reproduce-52513.zip
unzip reproduce-52513.zip 
cd pub-dev-reproduce-52513/
cd app/
dart pub get
DEBUG=* dart bin/fake_server.dart run --read-only

Wait for it to print: LOADED DATA

Compare results from requests with an without 1s sleep:

# Sleeping 1s between requests is causing them to be slow
for i in {1..30}; do sleep 1; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=ante -o /dev/null; done
0.907602
1.801583
1.825277
...

# Sleeping 0s between requests makes them less slow
for i in {1..30}; do sleep 0; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=ante -o /dev/null; done
1.105817
0.925765
0.875859
0.881853
...

Fair warning I hacked the code a lot to somewhat reproduce this without using data from production :D

The difference between sleep 1s and not, is not as big as we see it in production where we have 300ms vs 3-4s. But it's still very clear that sleeping 1s is a problem.

copybara-service bot pushed a commit that referenced this issue May 30, 2023
2b8ca6f increased the idle policy's willingness to perform longer GC. This CL increases the amount of evidence the policy requires to consider itself idle to decrease the probability of a mispredict / wakeup during an idle GC. 61s is chosen to decrease the likelihood of resonance with periodic tasks.

Eventually compaction should not have an O(heap) blocking time, allowing compaction to be coupled to the growth policy instead of the idle policy.

TEST=manually inspect timeline
Bug: #52513
Change-Id: I5f2b02834413089545612a7fce26da928597d611
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/306308
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
copybara-service bot pushed a commit that referenced this issue Jun 2, 2023
- Represent cards with one bit instead of one byte.
- Shrink the region tracked by one card.

TEST=ci
Bug: #52513 (benchmark)
Change-Id: Ieba8d77bc9ff0dd3b7747329d6e446a5b41969e7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/245908
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
copybara-service bot pushed a commit that referenced this issue Jun 16, 2023
TEST=ci
Bug: #52513 (benchmark)
Change-Id: I4c30e9f148e90255e616bc8ea23f0778c1117b81
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/308164
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
@a-siva
Copy link
Contributor

a-siva commented Nov 30, 2023

@rmacnak-google Can this issue be closed as fixed?

@isoos
Copy link

isoos commented Nov 30, 2023

@a-siva: was there any change that we need to be aware of? We are still running pub.dev with a timer that prevents the compactor.

@rmacnak-google
Copy link
Contributor

@a-siva No, disabling idle compaction with --idle_timeout_micros=0 is a work-around. The proper fix is incremental compaction.

@a-siva
Copy link
Contributor

a-siva commented Nov 30, 2023

@a-siva No, disabling idle compaction with --idle_timeout_micros=0 is a work-around. The proper fix is incremental compaction.

Can we create an issue for implementation of incremental compaction and close this as a duplicate of that issue.

@a-siva a-siva added triaged Issue has been triaged by sub team P3 A lower priority bug or feature request labels Dec 7, 2023
copybara-service bot pushed a commit that referenced this issue May 16, 2024
At the beginning of a major GC cycle, select some mostly-empty pages to be evacuated. Mark the pages and the objects on these pages. Apply a write barrier for stores creating old -> evacuation candidate pointers, and discover any such pointers that already exist during marking.

At the end of a major GC cycle, evacuate objects from these pages. Forward pointers of objects in the remembered set and new-space. Free the evacuated pages.

This compaction is incremental in the sense that creating the remembered set is interleaved with mutator execution. The evacuation step, however, is stop-the-world.

Write-barrier elimination for x.slot = x is removed. Write-barrier elimination for x.slot = constant is removed in the JIT, kept for AOT but snapshot pages are marked as never-evacuate.

TEST=ci
Bug: #52513
Change-Id: Icbc29ef7cb662ef8759b8c1d7a63b7af60766281
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/357760
Reviewed-by: Alexander Aprelev <aam@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
copybara-service bot pushed a commit that referenced this issue May 17, 2024
This reverts commit bc0f02e.

Reason for revert: #55754

Original change's description:
> [vm, gc] Incremental compaction.
>
> At the beginning of a major GC cycle, select some mostly-empty pages to be evacuated. Mark the pages and the objects on these pages. Apply a write barrier for stores creating old -> evacuation candidate pointers, and discover any such pointers that already exist during marking.
>
> At the end of a major GC cycle, evacuate objects from these pages. Forward pointers of objects in the remembered set and new-space. Free the evacuated pages.
>
> This compaction is incremental in the sense that creating the remembered set is interleaved with mutator execution. The evacuation step, however, is stop-the-world.
>
> Write-barrier elimination for x.slot = x is removed. Write-barrier elimination for x.slot = constant is removed in the JIT, kept for AOT but snapshot pages are marked as never-evacuate.
>
> TEST=ci
> Bug: #52513
> Change-Id: Icbc29ef7cb662ef8759b8c1d7a63b7af60766281
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/357760
> Reviewed-by: Alexander Aprelev <aam@google.com>
> Commit-Queue: Ryan Macnak <rmacnak@google.com>

Bug: #52513
Change-Id: I565ad6c0fca283d33f605c10f181bc0a59e7d2b2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/366965
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Commit-Queue: Alexander Aprelev <aam@google.com>
Auto-Submit: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
copybara-service bot pushed a commit that referenced this issue May 21, 2024
- Fix missing store buffer flush when --marker_tasks=0.
- Fix passing untagged pointer to store barrier check on ARM/ARM64 (6bc417d).
- Fix passing uninitialized header to store barrier check on ARM64/RISCV (1447193).

TEST=ci
Bug: #52513
Bug: #55754
Change-Id: Id2aa95b6d776b82d83464cde0d00e6f3b29b7b77
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367202
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
copybara-service bot pushed a commit that referenced this issue May 23, 2024
This reverts commit 9077bf9.

Reason for revert: CBuild and TGP crashes in random Dart code which
look a lot like arbitrary memory corruption.

Original change's description:
> [vm, gc] Incremental compaction, take 2.
>
> - Fix missing store buffer flush when --marker_tasks=0.
> - Fix passing untagged pointer to store barrier check on ARM/ARM64 (6bc417d).
> - Fix passing uninitialized header to store barrier check on ARM64/RISCV (1447193).
>
> TEST=ci
> Bug: #52513
> Bug: #55754
> Change-Id: Id2aa95b6d776b82d83464cde0d00e6f3b29b7b77
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367202
> Commit-Queue: Ryan Macnak <rmacnak@google.com>
> Reviewed-by: Alexander Aprelev <aam@google.com>

Bug: #52513
Bug: #55754
Change-Id: I1d70d33c65fe6bf7089b8c1422d59f9146ae7ebf
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367962
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Auto-Submit: Daco Harkes <dacoharkes@google.com>
Commit-Queue: Slava Egorov <vegorov@google.com>
copybara-service bot pushed a commit that referenced this issue May 29, 2024
This reverts commit 9077bf9.

Reason for revert: CBuild and TGP crashes in random Dart code which
look a lot like arbitrary memory corruption.

Original change's description:
> [vm, gc] Incremental compaction, take 2.
>
> - Fix missing store buffer flush when --marker_tasks=0.
> - Fix passing untagged pointer to store barrier check on ARM/ARM64 (6bc417d).
> - Fix passing uninitialized header to store barrier check on ARM64/RISCV (1447193).
>
> TEST=ci
> Bug: #52513
> Bug: #55754
> Change-Id: Id2aa95b6d776b82d83464cde0d00e6f3b29b7b77
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367202
> Commit-Queue: Ryan Macnak <rmacnak@google.com>
> Reviewed-by: Alexander Aprelev <aam@google.com>

Bug: #52513
Bug: #55754
Change-Id: Iac70de4a56e8ce0916eff7defec1e085733d52ff
Cherry-pick: https://dart-review.googlesource.com/c/sdk/+/367962
Cherry-pick-request: #55867
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/368620
Reviewed-by: Alexander Thomas <athom@google.com>
Commit-Queue: Martin Kustermann <kustermann@google.com>
copybara-service bot pushed a commit that referenced this issue May 30, 2024
 - Use atomics to mark remembered cards in the write barrier stub.

TEST=ci
Bug: #52513
Bug: #55754
Change-Id: I1f78c6b680a6ae9170613ba328a244335a6343e2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/368480
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
copybara-service bot pushed a commit that referenced this issue Jun 12, 2024
…ing idle GC.

TEST=ci
Bug: #52513
Change-Id: I85d861b2eb2ab4b461ae981bf094596af02d4df1
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/370702
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Siva Annamalai <asiva@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P3 A lower priority bug or feature request triaged Issue has been triaged by sub team type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
Development

No branches or pull requests

5 participants