dart::GCCompactor makes search on pub.dev slow (3.0.0 regression) #52513
Maybe the heuristics we use to decide when to run the compactor are not ideal (idle time estimation, expected time it takes to compact the heap, expected fragmentation of the heap, ...). This can cause significant latency for server apps that get little traffic (i.e. are mostly idle): the compactor may take 1 second, so any request that arrives in the meantime is delayed by that much. As a side note, we may want to allow customers to disable compaction entirely (e.g. …)
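To make the heuristic concrete, here is a minimal C++ sketch of an idle-time compaction decision that weighs the three inputs named above. The struct, its field names, and the 0.2 fragmentation threshold are hypothetical illustrations, not the Dart VM's actual policy or code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inputs to an idle-compaction decision (not Dart VM names).
struct IdleGcInputs {
  std::int64_t predicted_idle_us;     // how long the isolate is expected to stay idle
  std::int64_t estimated_compact_us;  // expected blocking time of a full compaction
  double fragmentation;               // fraction of old space that is free but fragmented, in [0, 1]
};

// Hypothetical threshold: below this, compaction is not worth its pause.
constexpr double kMinFragmentation = 0.2;

// Returns true if a blocking compaction should start during this idle period.
bool ShouldCompactWhileIdle(const IdleGcInputs& in) {
  assert(in.fragmentation >= 0.0 && in.fragmentation <= 1.0);
  if (in.fragmentation < kMinFragmentation) {
    return false;  // Heap is not fragmented enough to justify the pause.
  }
  // Compaction blocks the mutator, so only start it if it is expected to
  // finish before the next request. If predicted_idle_us is an overestimate,
  // the next request waits for the remainder of the compaction.
  return in.estimated_compact_us <= in.predicted_idle_us;
}
```

For a server that receives a request every second, any idle prediction longer than 1s is an overestimate, and a ~1s compaction then lands directly in the request path.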
@rmacnak-google could you take a look?
Note: the issue may have started with the 2.19.0 -> 2.19.6 upgrade, but the latency increase was not large enough to warrant looking into it.
Requests every 1s are just the right frequency to get the worst behavior from the standalone VM's idle policy, because 1s is the default value of …

Short term, in increasing order of severity: …

Long term, the compactor will be made incremental.

@jonasfj Could you provide either a timeline from the slow run or instructions on how to reproduce it locally?
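The resonance can be illustrated with a deliberately simplified model, written as a C++ sketch: assume the VM starts a blocking idle GC once no request has arrived for an idle timeout, and that the GC then runs to completion. The function name and the model itself are assumptions for illustration, not the standalone VM's actual idle policy.

```cpp
#include <cassert>

// Simplified model (an assumption, not the VM's real policy):
//  - requests arrive every `period_s` seconds;
//  - after `idle_timeout_s` seconds without a request, a blocking idle GC
//    starts and runs for `gc_duration_s` seconds.
// Returns the extra latency each request sees, in seconds.
double ExtraLatencySeconds(double period_s, double idle_timeout_s,
                           double gc_duration_s) {
  assert(period_s > 0.0 && idle_timeout_s >= 0.0 && gc_duration_s >= 0.0);
  // Both times are measured from the previous request.
  const double gc_start = idle_timeout_s;
  const double gc_end = idle_timeout_s + gc_duration_s;
  if (period_s < gc_start) return 0.0;  // Never idle long enough to start a GC.
  if (period_s >= gc_end) return 0.0;   // GC already finished when the request arrives.
  return gc_end - period_s;             // Request waits for the rest of the GC.
}
```

In this model a period at or just above the idle timeout is the worst case: with a 1s period and a 1s timeout, every request waits for nearly the whole GC (`ExtraLatencySeconds(1.0, 1.0, 0.9)` is about 0.9s), whereas a 0.1s period never lets the GC start. Raising the timeout to 61s, as a later commit in this thread does, makes a 1s period harmless in this model.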
Makes it easy to reproduce: dart-lang/sdk#52513

To start, run:

```
cd app
dart pub get
DEBUG=* dart bin/fake_server.dart run --read-only
```

Wait for it to print `LOADED DATA`, then compare the results of requests with and without a 1s sleep:

```
for i in {1..30}; do sleep 1; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=ante -o /dev/null; done
0.907602
1.801583
1.825277
1.718818
1.825318
1.745459
1.753224
1.744140
1.753854
1.760531

for i in {1..30}; do sleep 0; curl -w '%{time_total}\n' -Ls http://localhost:8082/search?q=ante -o /dev/null; done
1.105817
0.925765
0.875859
0.881853
0.933122
0.937502
0.890751
0.894597
0.924083
0.916186
1.146219
```

Fair warning: this data is total degenerate garbage. The issue is even more extreme with our production data, where some requests take 200-300ms and others take 3-4s. Reproducing our production data locally gives us something like 60ms vs 1s. The garbage data in this setup only produces 900ms vs 1.8s, but it clearly reproduces the issue.
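As a quick sanity check on the numbers above, this C++ sketch computes the median latency of each run from the printed samples. Each loop runs 30 requests, but fewer values are listed, so only those are used. The names `Median`, `kWithSleep1`, and `kWithSleep0` are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a sample; takes a copy so the caller's order is untouched.
double Median(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  if (n % 2 == 1) return samples[n / 2];
  return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

// curl time_total values (seconds) from the run with `sleep 1`.
const std::vector<double> kWithSleep1 = {
    0.907602, 1.801583, 1.825277, 1.718818, 1.825318,
    1.745459, 1.753224, 1.744140, 1.753854, 1.760531};

// curl time_total values (seconds) from the run with `sleep 0`.
const std::vector<double> kWithSleep0 = {
    1.105817, 0.925765, 0.875859, 0.881853, 0.933122, 0.937502,
    0.890751, 0.894597, 0.924083, 0.916186, 1.146219};
```

The medians come out to about 1.754s with the 1s sleep and 0.924s without it, roughly 0.83s of extra latency per request, in line with the "900ms vs 1.8s" summary.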
2b8ca6f increased the idle policy's willingness to perform longer GC. This CL increases the amount of evidence the policy requires to consider itself idle to decrease the probability of a mispredict / wakeup during an idle GC.

61s is chosen to decrease the likelihood of resonance with periodic tasks.

Eventually compaction should not have an O(heap) blocking time, allowing compaction to be coupled to the growth policy instead of the idle policy.

TEST=manually inspect timeline
Bug: #52513
Change-Id: I5f2b02834413089545612a7fce26da928597d611
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/306308
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
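Requiring more evidence of idleness can be sketched as a detector that only reports idle after a timeout with no activity. The class and function names below are hypothetical, and the sketch ignores everything else the real idle policy takes into account.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical idle detector: reports idle only after `timeout_s` seconds
// without any activity (a stand-in for "evidence" that the isolate is idle).
class IdleDetector {
 public:
  explicit IdleDetector(double timeout_s) : timeout_s_(timeout_s) {
    assert(timeout_s >= 0.0);
  }
  void OnActivity(double now_s) { last_activity_s_ = now_s; }
  bool IsIdle(double now_s) const {
    return now_s - last_activity_s_ >= timeout_s_;
  }

 private:
  double timeout_s_;
  double last_activity_s_ = 0.0;  // The simulation starts at t = 0.
};

// Counts requests that arrive after the detector has already declared the
// isolate idle, i.e. requests that could collide with an idle GC.
std::size_t CountIdleGcOpportunities(const std::vector<double>& request_times_s,
                                     double timeout_s) {
  IdleDetector detector(timeout_s);
  std::size_t opportunities = 0;
  for (double t : request_times_s) {
    if (detector.IsIdle(t)) ++opportunities;
    detector.OnActivity(t);
  }
  return opportunities;
}
```

With one request per second, a 1s timeout declares the isolate idle before every request, while a 61s timeout never does. A task that runs once a minute also stays below a 61s timeout, which is one plausible reading of "resonance with periodic tasks".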
- Represent cards with one bit instead of one byte.
- Shrink the region tracked by one card.

TEST=ci
Bug: #52513 (benchmark)
Change-Id: Ieba8d77bc9ff0dd3b7747329d6e446a5b41969e7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/245908
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
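The first bullet can be pictured as a toy remembered-card table that stores one bit per card in 64-bit words. The 512-byte card size and all names are hypothetical. The sketch only shows the bit-per-card layout and the 8x smaller table it yields compared with one byte per card.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy remembered-card table storing one bit per card (instead of one byte).
// The card size and all names are hypothetical.
class CardTable {
 public:
  static constexpr std::size_t kCardSizeBytes = 512;

  explicit CardTable(std::size_t covered_bytes)
      : covered_bytes_(covered_bytes),
        words_((NumCards(covered_bytes) + 63) / 64) {}  // zero-initialized

  // Marks the card containing byte `offset` of the covered region.
  void Remember(std::size_t offset) {
    assert(offset < covered_bytes_);
    const std::size_t card = offset / kCardSizeBytes;
    words_[card / 64] |= std::uint64_t{1} << (card % 64);
  }

  bool IsRemembered(std::size_t offset) const {
    assert(offset < covered_bytes_);
    const std::size_t card = offset / kCardSizeBytes;
    return ((words_[card / 64] >> (card % 64)) & 1) != 0;
  }

  // Size of the table itself: one bit per card, rounded up to whole words.
  std::size_t TableBytes() const {
    return words_.size() * sizeof(std::uint64_t);
  }

 private:
  static std::size_t NumCards(std::size_t bytes) {
    return (bytes + kCardSizeBytes - 1) / kCardSizeBytes;
  }

  std::size_t covered_bytes_;
  std::vector<std::uint64_t> words_;
};
```

Covering 512 KiB with 512-byte cards needs 1024 cards: 128 bytes of table at one bit per card, versus 1024 bytes at one byte per card. Smaller cards (the second bullet) mean less memory to rescan per dirty card, and the bit encoding keeps the table small despite the higher card count.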
TEST=ci
Bug: #52513 (benchmark)
Change-Id: I4c30e9f148e90255e616bc8ea23f0778c1117b81
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/308164
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
@rmacnak-google Can this issue be closed as fixed?
@a-siva: was there any change that we need to be aware of? We are still running …
@a-siva No, disabling idle compaction with …
Can we create an issue for implementing incremental compaction and close this one as a duplicate of it?
At the beginning of a major GC cycle, select some mostly-empty pages to be evacuated. Mark the pages and the objects on these pages. Apply a write barrier for stores creating old -> evacuation candidate pointers, and discover any such pointers that already exist during marking.

At the end of a major GC cycle, evacuate objects from these pages. Forward pointers of objects in the remembered set and new-space. Free the evacuated pages.

This compaction is incremental in the sense that creating the remembered set is interleaved with mutator execution. The evacuation step, however, is stop-the-world.

Write-barrier elimination for x.slot = x is removed. Write-barrier elimination for x.slot = constant is removed in the JIT, kept for AOT but snapshot pages are marked as never-evacuate.

TEST=ci
Bug: #52513
Change-Id: Icbc29ef7cb662ef8759b8c1d7a63b7af60766281
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/357760
Reviewed-by: Alexander Aprelev <aam@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
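The steps in this commit message can be sketched in C++ on a toy heap model. The page layout, the 25% "mostly empty" cutoff, and all names are hypothetical. Only the sequence follows the description: select candidates, apply a barrier on old -> candidate stores, then forward the remembered slots at evacuation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy page descriptor; field names are hypothetical.
struct Page {
  int id;
  std::size_t live_bytes;
  std::size_t capacity;
  bool never_evacuate;  // e.g. snapshot pages
  bool evacuation_candidate = false;
};

// Hypothetical cutoff for "mostly empty".
constexpr double kMaxCandidateOccupancy = 0.25;

// Step 1: at the start of a major GC, pick mostly-empty pages to evacuate.
std::vector<int> SelectEvacuationCandidates(std::vector<Page>& pages) {
  std::vector<int> chosen;
  for (Page& page : pages) {
    assert(page.capacity > 0);
    const double occupancy =
        static_cast<double>(page.live_bytes) / page.capacity;
    if (!page.never_evacuate && occupancy < kMaxCandidateOccupancy) {
      page.evacuation_candidate = true;
      chosen.push_back(page.id);
    }
  }
  return chosen;
}

// Step 2: the write barrier records stores that create an
// old -> evacuation-candidate pointer so the slot can be fixed up later.
bool NeedsRememberedSlot(bool source_is_old, const Page& target_page) {
  return source_is_old && target_page.evacuation_candidate;
}

// Step 3 (stop-the-world): after objects are copied off the candidate pages,
// rewrite every remembered slot using the old -> new address map.
void ForwardRememberedSlots(
    const std::vector<std::uintptr_t*>& remembered_slots,
    const std::unordered_map<std::uintptr_t, std::uintptr_t>& forwarding) {
  for (std::uintptr_t* slot : remembered_slots) {
    auto it = forwarding.find(*slot);
    if (it != forwarding.end()) *slot = it->second;
  }
}
```

In this sketch the stop-the-world work touches only the candidate pages and the recorded slots (the real change also forwards pointers in new-space), rather than every pointer in old space.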
This reverts commit bc0f02e.

Reason for revert: #55754

Original change's description:
> [vm, gc] Incremental compaction.
>
> At the beginning of a major GC cycle, select some mostly-empty pages to be evacuated. Mark the pages and the objects on these pages. Apply a write barrier for stores creating old -> evacuation candidate pointers, and discover any such pointers that already exist during marking.
>
> At the end of a major GC cycle, evacuate objects from these pages. Forward pointers of objects in the remembered set and new-space. Free the evacuated pages.
>
> This compaction is incremental in the sense that creating the remembered set is interleaved with mutator execution. The evacuation step, however, is stop-the-world.
>
> Write-barrier elimination for x.slot = x is removed. Write-barrier elimination for x.slot = constant is removed in the JIT, kept for AOT but snapshot pages are marked as never-evacuate.
>
> TEST=ci
> Bug: #52513
> Change-Id: Icbc29ef7cb662ef8759b8c1d7a63b7af60766281
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/357760
> Reviewed-by: Alexander Aprelev <aam@google.com>
> Commit-Queue: Ryan Macnak <rmacnak@google.com>

Bug: #52513
Change-Id: I565ad6c0fca283d33f605c10f181bc0a59e7d2b2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/366965
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Commit-Queue: Alexander Aprelev <aam@google.com>
Auto-Submit: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
- Fix missing store buffer flush when --marker_tasks=0.
- Fix passing untagged pointer to store barrier check on ARM/ARM64 (6bc417d).
- Fix passing uninitialized header to store barrier check on ARM64/RISCV (1447193).

TEST=ci
Bug: #52513
Bug: #55754
Change-Id: Id2aa95b6d776b82d83464cde0d00e6f3b29b7b77
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367202
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
This reverts commit 9077bf9.

Reason for revert: CBuild and TGP crashes in random Dart code which look a lot like arbitrary memory corruption.

Original change's description:
> [vm, gc] Incremental compaction, take 2.
>
> - Fix missing store buffer flush when --marker_tasks=0.
> - Fix passing untagged pointer to store barrier check on ARM/ARM64 (6bc417d).
> - Fix passing uninitialized header to store barrier check on ARM64/RISCV (1447193).
>
> TEST=ci
> Bug: #52513
> Bug: #55754
> Change-Id: Id2aa95b6d776b82d83464cde0d00e6f3b29b7b77
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367202
> Commit-Queue: Ryan Macnak <rmacnak@google.com>
> Reviewed-by: Alexander Aprelev <aam@google.com>

Bug: #52513
Bug: #55754
Change-Id: I1d70d33c65fe6bf7089b8c1422d59f9146ae7ebf
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367962
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Auto-Submit: Daco Harkes <dacoharkes@google.com>
Commit-Queue: Slava Egorov <vegorov@google.com>
This reverts commit 9077bf9.

Reason for revert: CBuild and TGP crashes in random Dart code which look a lot like arbitrary memory corruption.

Original change's description:
> [vm, gc] Incremental compaction, take 2.
>
> - Fix missing store buffer flush when --marker_tasks=0.
> - Fix passing untagged pointer to store barrier check on ARM/ARM64 (6bc417d).
> - Fix passing uninitialized header to store barrier check on ARM64/RISCV (1447193).
>
> TEST=ci
> Bug: #52513
> Bug: #55754
> Change-Id: Id2aa95b6d776b82d83464cde0d00e6f3b29b7b77
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/367202
> Commit-Queue: Ryan Macnak <rmacnak@google.com>
> Reviewed-by: Alexander Aprelev <aam@google.com>

Bug: #52513
Bug: #55754
Change-Id: Iac70de4a56e8ce0916eff7defec1e085733d52ff
Cherry-pick: https://dart-review.googlesource.com/c/sdk/+/367962
Cherry-pick-request: #55867
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/368620
Reviewed-by: Alexander Thomas <athom@google.com>
Commit-Queue: Martin Kustermann <kustermann@google.com>
- Use atomics to mark remembered cards in the write barrier stub.

TEST=ci
Bug: #52513
Bug: #55754
Change-Id: I1f78c6b680a6ae9170613ba328a244335a6343e2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/368480
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
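Atomics matter once cards are bits because several threads (for example mutators and concurrent marker tasks) may set different bits in the same word at the same time, and a plain load/OR/store can drop one thread's update. Below is a C++ sketch with hypothetical names using `std::atomic::fetch_or`. The relaxed memory ordering is an assumption of this sketch, not a statement about the VM's write barrier stub.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy card bitmap whose bits may be set by several threads concurrently.
class AtomicCardBits {
 public:
  explicit AtomicCardBits(std::size_t num_cards)
      : num_cards_(num_cards), words_((num_cards + 63) / 64) {}  // zeroed

  void Mark(std::size_t card) {
    assert(card < num_cards_);
    // fetch_or is a single atomic read-modify-write, so two threads setting
    // different bits of the same word cannot overwrite each other's bit,
    // which a plain `word |= bit` (separate load and store) could.
    words_[card / 64].fetch_or(std::uint64_t{1} << (card % 64),
                               std::memory_order_relaxed);
  }

  bool IsMarked(std::size_t card) const {
    assert(card < num_cards_);
    return ((words_[card / 64].load(std::memory_order_relaxed) >> (card % 64)) &
            1) != 0;
  }

 private:
  std::size_t num_cards_;
  std::vector<std::atomic<std::uint64_t>> words_;
};
```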
…ing idle GC.

TEST=ci
Bug: #52513
Change-Id: I85d861b2eb2ab4b461ae981bf094596af02d4df1
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/370702
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Siva Annamalai <asiva@google.com>
We recently observed that if we run a loop that sends a search request to pub.dev every 100ms, the requests are fast (~60ms for a simple query).
But if we send the request every 1s instead, the requests are slow (> 1s).
The issue appears to have started after we upgraded from 2.19.6 to 3.0.0.
Debugging it locally, we also found that 2.19.6 does not exhibit this behavior.
Credit to @mkustermann for tracing this to `dart::GCCompactor::VisitPointers`.
Context: Our search process holds a sizable amount of data in memory as lots of Dart objects (Map, List, Set, String, etc.), so it's not surprising that compacting the heap could be slow. Each search also allocates a bunch of objects that are released when the search request is done.
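To see why time is spent in a pointer-visiting pass, here is a textbook sliding (Lisp-2 style) compaction over a toy heap in C++. It is not the Dart VM's `GCCompactor`. It only shows that after computing forwarding addresses, every pointer in every live object must be visited and rewritten, so the blocking time grows with the size of the live heap.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy heap object: `refs` holds indices of other objects in the heap.
struct Object {
  bool live = false;  // Set by a prior marking phase.
  std::vector<std::size_t> refs;
};

// Textbook sliding compaction (not the Dart VM's implementation).
// Assumes marking is complete: live objects reference only live objects.
// Returns the number of objects left in the heap.
std::size_t SlidingCompact(std::vector<Object>& heap,
                           std::vector<std::size_t>& roots) {
  // Pass 1: assign each live object its new, compacted index.
  std::vector<std::size_t> forward(heap.size());
  std::size_t next = 0;
  for (std::size_t i = 0; i < heap.size(); ++i) {
    if (heap[i].live) forward[i] = next++;
  }
  // Pass 2: visit every pointer (roots and fields of live objects) and
  // rewrite it to the new location. This touches the whole live heap.
  for (std::size_t& root : roots) {
    assert(heap[root].live);
    root = forward[root];
  }
  for (Object& obj : heap) {
    if (!obj.live) continue;
    for (std::size_t& ref : obj.refs) {
      assert(heap[ref].live);
      ref = forward[ref];
    }
  }
  // Pass 3: slide live objects down, preserving their order.
  std::size_t dst = 0;
  for (std::size_t i = 0; i < heap.size(); ++i) {
    if (!heap[i].live) continue;
    if (dst != i) heap[dst] = std::move(heap[i]);
    ++dst;
  }
  heap.resize(dst);
  return dst;
}
```

Pass 2 presumably corresponds to the kind of work that `dart::GCCompactor::VisitPointers` shows up for in the profile: with a large, pointer-heavy heap, it dominates the pause regardless of how little garbage there is.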
See the dump from `perf report --no-children`.
Recorded request latency (charts): using Dart 2.19.6 vs. using Dart 3.0.0.