New approximation for merged peak stats #1112

artem-lunarg · 2025-07-10T11:50:28Z

A code comment notes that "peak scores do really not work across threads". That is true: with the current model it is not possible to track the global peak value precisely for threaded apps. Still, I think the accuracy can be improved so that the merged peak matches expectations more often.

The attached file shows that the current approximation (the mi-before folder) in some cases does not even give the expected order of magnitude (merged peak values increase monotonically). It also shows that with the proposed approximation the peak values look more reasonable and do not exceed, or at least do not dramatically exceed, the peak commit values.

The attached file contains 4 sets of measurements from interactive apps (games and replay tools) that are validated by the Vulkan validation layer (VVL). It is VVL, not the main app, that allocates memory with the mimalloc library. The mi-before folder holds the results from the latest dev branch, and the mi-after folder shows how this PR changes them.

before-and-after.zip

Example from cs2-capture.
Before:

heap stats:     peak       total     current       block      total#
    binned:    23.2 Gi     55.4 Gi      2.8 Mi                           not all freed
      huge:   368.2 Mi     14.5 Gi      0                                ok
     total:    23.6 GiB    69.9 GiB     2.8 MiB
  reserved:     3.0 GiB     3.7 GiB     3.0 GiB
 committed:     2.3 GiB     9.8 GiB    13.6 MiB

After:

heap stats:     peak       total     current       block      total#
    binned:     2.1 Gi     55.4 Gi      2.8 Mi                           not all freed
      huge:    56.2 Mi     14.5 Gi      0                                ok
     total:     2.1 GiB    69.9 GiB     2.8 MiB
  reserved:     3.0 GiB     3.7 GiB     3.0 GiB
 committed:     2.3 GiB     9.9 GiB    13.6 MiB

Notice that in the updated version, the peak total value (2.1 GiB) is comfortably below the peak committed value (2.3 GiB).

As for the solution itself, it is fairly intuitive. Suppose we have a merged peak of 100 MiB and a merged current value of 80 MiB. If, during the next merge, a thread's peak is 40 MiB, we can approximate the global peak by adding those 40 MiB to the merged current value, which gives 120 MiB as the new merged peak.
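A minimal sketch of that merge step, for illustration only. The `stat_count_t` type and `stat_merge_approx` function are hypothetical names, not mimalloc's actual types or the exact code in this PR. The sketch also assumes that the merged peak never decreases and that the thread's current value is added to the merged current value.

```c
#include <stdint.h>

// Hypothetical, simplified counter (values in MiB for the example).
typedef struct stat_count_s {
  int64_t peak;     // highest value seen so far
  int64_t current;  // value right now
} stat_count_t;

// Merge a thread-local counter into the merged (global) counter.
// Heuristic: assume the thread's peak happened on top of whatever the
// merged counter currently holds.
static inline void stat_merge_approx(stat_count_t* merged, const stat_count_t* thread) {
  int64_t candidate = merged->current + thread->peak;
  if (candidate > merged->peak) {
    merged->peak = candidate;        // assumption: the merged peak only grows
  }
  merged->current += thread->current; // assumption: currents are simply summed
}
```

With a merged counter of peak 100 and current 80, merging a thread whose peak is 40 and whose current is 0 raises the merged peak to 120 and leaves the merged current at 80, as in the example above.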

The other approximation mentioned in the code, taking the maximum of the peak values, does not seem reliable either. A per-thread peak can be small relative to the global peak, so max(global, thread) stays unchanged, but that does not mean a new global peak was not actually reached (as in the previous example, with a local peak of 40 MiB and a global peak of 100 MiB).
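To make the difference concrete, here is a small sketch that puts the two strategies side by side. The `counter_t` type and both function names are hypothetical and exist only for this comparison.

```c
#include <stdint.h>

// Hypothetical, simplified counter (values in MiB for the example).
typedef struct counter_s {
  int64_t peak;
  int64_t current;
} counter_t;

static inline int64_t max64(int64_t a, int64_t b) { return (a > b) ? a : b; }

// Strategy A: take the maximum of the merged peak and the thread peak.
static inline int64_t merged_peak_max(counter_t merged, counter_t thread) {
  return max64(merged.peak, thread.peak);
}

// Strategy B: assume the thread peak sat on top of the merged current value.
static inline int64_t merged_peak_current_plus_thread(counter_t merged, counter_t thread) {
  return max64(merged.peak, merged.current + thread.peak);
}
```

For a merged counter of peak 100 and current 80 and a thread peak of 40, strategy A keeps the merged peak at 100, while strategy B reports 120.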

It should be easy to construct a multithreaded scenario where this approach does not produce the expected result: the outcome depends on the merging strategy and on which threads allocate and deallocate. The assumption is that this approximation is more useful in practice and, on average, tends to work better.
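One such scenario can be sketched as a sequential simulation (no real threads). All names here are hypothetical, and the merge rule is the simplified "merged current plus thread peak" heuristic rather than the PR's exact code. Thread A allocates 40 MiB and frees it before thread B allocates 80 MiB, so the true global peak is 80 MiB, but the estimate depends on the merge order.

```c
#include <stdint.h>

// Hypothetical, simplified counter (values in MiB for the example).
typedef struct counter_s {
  int64_t peak;
  int64_t current;
} counter_t;

// Apply an allocation (positive delta) or a free (negative delta).
static inline void counter_update(counter_t* c, int64_t delta) {
  c->current += delta;
  if (c->current > c->peak) c->peak = c->current;
}

// Simplified merge heuristic: thread peak on top of the merged current value.
static inline void counter_merge(counter_t* merged, const counter_t* thread) {
  int64_t candidate = merged->current + thread->peak;
  if (candidate > merged->peak) merged->peak = candidate;
  merged->current += thread->current;
}

typedef struct scenario_result_s {
  int64_t true_peak;       // exact global peak, tracked on every update
  int64_t estimated_peak;  // merged peak produced by the heuristic
} scenario_result_t;

static inline scenario_result_t run_scenario(int merge_b_first) {
  counter_t truth = { 0, 0 }, a = { 0, 0 }, b = { 0, 0 }, merged = { 0, 0 };

  // Thread A allocates 40 MiB and frees it again.
  counter_update(&a, 40);  counter_update(&truth, 40);
  counter_update(&a, -40); counter_update(&truth, -40);
  // Only afterwards, thread B allocates 80 MiB and keeps it.
  counter_update(&b, 80);  counter_update(&truth, 80);

  if (merge_b_first) {
    counter_merge(&merged, &b);
    counter_merge(&merged, &a);  // A's old peak is stacked on B's live 80 MiB
  } else {
    counter_merge(&merged, &a);
    counter_merge(&merged, &b);
  }

  scenario_result_t result = { truth.peak, merged.peak };
  return result;
}
```

In this sketch, merging B before A gives an estimate of 120 MiB against a true peak of 80 MiB, while merging A before B gives exactly 80 MiB.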

new approximation for merged peak stats

1f3a03e
