New approximation for merged peak stats #1112
Open
+7
−2
The code comment says that "peak scores do really not work across threads". That is true: with the current model it is not possible to track the global peak value precisely for threaded apps. Still, I think the accuracy can be improved so that the result matches expectations more often.
The attached file shows that the current approximation (`mi-before` folder) in some cases does not give even the expected order of magnitude (merged peak values monotonically increase). It also shows that with the proposed approximation the peak values look more reasonable and do not exceed, or at least do not dramatically exceed, the peak commit values.
The attached file contains 4 sets of measurements on interactive apps (games and replay tools) that are validated by the Vulkan validation layer (VVL). It is VVL, not the main app, that allocates memory with the mimalloc library. The `mi-before` folder holds the results from the latest `dev` branch, and `mi-after` shows how this PR changes the results.
before-and-after.zip
Example from cs2-capture.
Before:
After:
Notice that in the updated version, the peak total value (2.1 GiB) is comfortably below the peak committed value (2.3 GiB).
As for the solution itself, it's fairly intuitive. Suppose we have a merged `peak` of 100 MiB and a merged `current` value of 80 MiB. If, during the next merge, a thread's `peak` is 40 MiB, we can approximate the global `peak` by adding those 40 MiB to the merged `current` value, which gives 120 MiB as the new merged `peak`.
The other approximation mentioned in the code, using the maximum of the peak values, also doesn't seem reliable. A per-thread peak can be relatively small compared to the global peak (so max(global, thread) remains unchanged), but that doesn't mean a higher global peak wasn't actually reached (as in the previous example, with a local peak of 40 MiB and a global peak of 100 MiB).
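To make the two merge rules concrete, here is a minimal single-threaded sketch. The `stat_count` type and both function names are hypothetical and simplified for illustration; they are not mimalloc's actual definitions, and the sketch leaves out the atomics a real cross-thread merge would need.

```c
#include <stdint.h>

/* Hypothetical, simplified counter; not mimalloc's actual type. */
typedef struct {
  int64_t current;  /* bytes currently in use */
  int64_t peak;     /* highest value of `current` seen so far */
} stat_count;

static int64_t max_i64(int64_t a, int64_t b) { return a > b ? a : b; }

/* Rule described above: assume the thread's peak sat on top of everything
   merged so far, so the candidate peak is merged.current + thread.peak. */
static void merge_current_plus_peak(stat_count *merged, const stat_count *thread) {
  merged->peak = max_i64(merged->peak, merged->current + thread->peak);
  merged->current += thread->current;
}

/* Alternative rule: keep the larger of the two peaks. */
static void merge_max_of_peaks(stat_count *merged, const stat_count *thread) {
  merged->peak = max_i64(merged->peak, thread->peak);
  merged->current += thread->current;
}
```

With a merged counter of `current` = 80 MiB and `peak` = 100 MiB, and a thread whose `peak` is 40 MiB (its `current` is assumed to be 0 here, since the example does not specify it), `merge_current_plus_peak` raises the merged `peak` to 120 MiB, while `merge_max_of_peaks` leaves it at 100 MiB.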
It should be easy to construct a multithreaded scenario where this approach doesn't produce the expected results: the outcome depends on the merging strategy and on which threads allocate and deallocate. The assumption is that this approximation is more useful in practice and, on average, tends to work better.
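One such scenario can be sketched as follows. This is an illustration under stated assumptions, not code from the PR: the `event` trace, the `replay` helper, and `merged_peak_estimate` are hypothetical names, sizes are in MiB, and the threads are assumed to be merged in index order after all events have happened.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int64_t current, peak; } stat_count;  /* hypothetical counter */
typedef struct { int thread; int64_t delta; } event;   /* +alloc / -free, in MiB */

/* Replays the events in order, fills the per-thread counters,
   and returns the true global peak. */
static int64_t replay(const event *events, size_t n, stat_count *threads) {
  int64_t global_current = 0, global_peak = 0;
  for (size_t i = 0; i < n; i++) {
    stat_count *t = &threads[events[i].thread];
    t->current += events[i].delta;
    if (t->current > t->peak) t->peak = t->current;
    global_current += events[i].delta;
    if (global_current > global_peak) global_peak = global_current;
  }
  return global_peak;
}

/* Merges the per-thread counters in index order using the
   merged-current-plus-thread-peak rule and returns the estimated peak. */
static int64_t merged_peak_estimate(const stat_count *threads, size_t n) {
  stat_count merged = { 0, 0 };
  for (size_t i = 0; i < n; i++) {
    int64_t candidate = merged.current + threads[i].peak;
    if (candidate > merged.peak) merged.peak = candidate;
    merged.current += threads[i].current;
  }
  return merged.peak;
}
```

For the trace `{1, +40}, {1, -40}, {0, +40}` (thread 1 allocates and frees 40 MiB before thread 0 allocates 40 MiB and keeps it), `replay` reports a true global peak of 40 MiB, while `merged_peak_estimate` over the two threads reports 80 MiB, because thread 1's earlier peak is stacked on top of thread 0's live memory.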