New approximation for merged peak stats #1112

Open · wants to merge 1 commit into base: dev
Conversation

@artem-lunarg artem-lunarg commented Jul 10, 2025

A code comment mentions that "peak scores do really not work across threads". While that is true, and with the current model it is not possible to track the global peak value precisely for threaded apps, I think the accuracy can be improved so that the estimate more often matches expectations.

The attached file shows that the current approximation (the mi-before folder) in some cases is not even within the expected order of magnitude (merged peak values increase monotonically). It also shows that with the proposed approximation the peak values look more reasonable and do not exceed, or at least do not dramatically exceed, the peak commit values.

The attached file contains 4 sets of measurements on interactive apps (games and replay tools) that are validated by the Vulkan validation layer; it is VVL, not the main app, that allocates memory with the mimalloc library. The mi-before folder contains the results from the latest dev branch, and mi-after shows how this PR changes the results.

before-and-after.zip

Example from cs2-capture.
Before:

```
heap stats:     peak       total     current       block      total#
    binned:    23.2 Gi     55.4 Gi      2.8 Mi                           not all freed
      huge:   368.2 Mi     14.5 Gi      0                                ok
     total:    23.6 GiB    69.9 GiB     2.8 MiB
  reserved:     3.0 GiB     3.7 GiB     3.0 GiB
 committed:     2.3 GiB     9.8 GiB    13.6 MiB
```

After:

```
heap stats:     peak       total     current       block      total#
    binned:     2.1 Gi     55.4 Gi      2.8 Mi                           not all freed
      huge:    56.2 Mi     14.5 Gi      0                                ok
     total:     2.1 GiB    69.9 GiB     2.8 MiB
  reserved:     3.0 GiB     3.7 GiB     3.0 GiB
 committed:     2.3 GiB     9.9 GiB    13.6 MiB
```

Notice that in the updated version, the peak total value (2.1 GiB) is comfortably below the peak committed value (2.3 GiB).

As for the solution itself, it is fairly intuitive. Suppose we have a merged peak of 100 MiB and a merged current value of 80 MiB. If, during the next merge, a thread's peak is 40 MiB, we can approximate the global peak by adding those 40 MiB to the merged current value, which gives 120 MiB as the new merged peak.

Another approximation mentioned in the code, taking the maximum of the peak values, also doesn't seem reliable. A per-thread peak can be small relative to the global peak (so max(global, thread) remains unchanged), but that doesn't mean a higher global peak wasn't actually reached (as in the previous example, with a local peak of 40 MiB and a global peak of 100 MiB).

It should be easy to construct a multithreaded scenario where this approach does not produce the expected results; it depends on the merging strategy and on which threads allocate and deallocate. The assumption is that this approximation is more useful in practice and, on average, tends to work better.
