Possible bias in allocation profiling #737

plokhotnyuk · 2023-03-28T11:36:27Z

Could it be that the direct comparison with VisualVM in this article is wrong?

If not, would it be possible to auto-correct the allocation stats?

apangin · 2023-03-28T23:24:30Z

I'm afraid this is not the right place to comment on random blog posts on the internet.

If your actual question is whether allocation profiling is biased - the answer is yes: any sampling technique is biased in some sense. For instance, perf-based CPU sampling is biased towards functions that run longer on CPU. But isn't it the whole point of profiling to find such functions?

If we compare a function that takes 1 ms and runs 1 million times (1,000 s in total) to a function that takes 20 ms and runs 100k times (2,000 s in total), the profile will be "biased" towards the latter, even though the former executes 10x more often. The same applies to allocation profiling: if a program allocates the same number of 80-byte and 800-byte objects, the latter will have roughly 10x more samples in the allocation profile, reflecting the fact that those allocations eat 10x more memory.
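The 80-byte versus 800-byte case can be illustrated with a small simulation. This is only a sketch under stated assumptions, not async-profiler's actual implementation: the class `AllocSamplingBias`, the method `simulate`, and the 512 KiB interval are hypothetical, and the sampler is modeled as "take one sample each time another fixed number of bytes has been allocated".

```java
import java.util.Random;

public class AllocSamplingBias {

    /**
     * Simulates a byte-interval sampler (a hypothetical model): a sample is
     * attributed to whichever allocation makes the running byte total cross
     * the next multiple of `interval`.
     * Returns {samples on small objects, samples on large objects}.
     */
    static long[] simulate(int smallSize, int largeSize, int countEach,
                           long interval, long seed) {
        Random rnd = new Random(seed);
        long[] samples = new long[2];
        long bytesUntilSample = interval;
        int smallLeft = countEach;
        int largeLeft = countEach;

        while (smallLeft > 0 || largeLeft > 0) {
            // Interleave the two allocation kinds in random order.
            boolean small = rnd.nextInt(smallLeft + largeLeft) < smallLeft;
            int size = small ? smallSize : largeSize;
            if (small) smallLeft--; else largeLeft--;

            bytesUntilSample -= size;
            while (bytesUntilSample <= 0) {
                samples[small ? 0 : 1]++;
                bytesUntilSample += interval;
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // Same number of 80-byte and 800-byte objects, as in the example above.
        long[] s = simulate(80, 800, 1_000_000, 512 * 1024, 42L);
        System.out.printf("small: %d samples, large: %d samples, ratio: %.1f%n",
                s[0], s[1], (double) s[1] / s[0]);
    }
}
```

Both object kinds are allocated equally often, yet the larger objects should collect roughly ten times as many samples, because each sample stands for a fixed amount of allocated memory rather than for one allocation.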

Async-profiler's allocation sampler does not show the number of allocated objects, just as it does not show the number of method calls. If required, it's possible to record every allocated object by turning off TLAB: -XX:-UseTLAB (may not work with all GCs, but works for G1). This can significantly impact performance, though.

Note that with JDK 11+ async-profiler uses a slightly different allocation sampling mechanism based on JEP 331. It allows fine-tuning of the sampling threshold regardless of TLAB size and also adds some randomness to account for repeated allocation patterns.
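To see why randomness helps with repeated allocation patterns, here is a minimal sketch. Everything in it is an assumption made for illustration: the class `SamplingIntervalDemo`, the object sizes, and the choice of exponentially distributed intervals. It does not reproduce what HotSpot or async-profiler actually do. A loop allocates a 256-byte object A followed by a 768-byte object B, so the pattern repeats every 1024 bytes, and the sampler fires either every 4096 bytes exactly or at random intervals with a 4096-byte mean.

```java
import java.util.Random;

public class SamplingIntervalDemo {

    /** Returns the fraction of samples attributed to object A. */
    static double shareOfA(int iterations, double meanInterval,
                           boolean randomize, long seed) {
        Random rnd = new Random(seed);
        final int sizeA = 256;   // hypothetical small object
        final int sizeB = 768;   // hypothetical large object
        long samplesA = 0;
        long samplesB = 0;
        double remaining = nextInterval(meanInterval, randomize, rnd);

        for (int i = 0; i < iterations; i++) {
            for (int k = 0; k < 2; k++) {
                boolean isA = (k == 0);
                remaining -= isA ? sizeA : sizeB;
                // Attribute every sampling point that falls inside this object.
                while (remaining <= 0) {
                    if (isA) samplesA++; else samplesB++;
                    remaining += nextInterval(meanInterval, randomize, rnd);
                }
            }
        }
        long total = samplesA + samplesB;
        return total == 0 ? 0.0 : (double) samplesA / total;
    }

    /** Fixed interval, or an exponentially distributed one with the same mean. */
    static double nextInterval(double mean, boolean randomize, Random rnd) {
        return randomize ? -mean * Math.log(1.0 - rnd.nextDouble()) : mean;
    }

    public static void main(String[] args) {
        System.out.println("fixed:      " + shareOfA(1_000_000, 4096, false, 1L));
        System.out.println("randomized: " + shareOfA(1_000_000, 4096, true, 1L));
    }
}
```

With the fixed interval, every sampling point lines up with the end of B, so A never appears in the profile even though it accounts for 25% of the allocated bytes. With randomized intervals, A's share of samples should come out close to 25%.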

plokhotnyuk · 2023-03-29T15:10:42Z

@apangin Thanks for the explanation!

I'm interested in bytes too, because with them I can more easily understand where the sources of memory/LLC bandwidth reduction are.

Is there any CPU event to track cycles spent waiting for memory access?

apangin · 2023-03-29T21:58:51Z

> Is there any CPU event to track cycles spent waiting for memory access?

Yes, but this is a complex topic. There are hundreds of hardware performance counters related to memory access, and they differ from one microarchitecture to the next. Here is an example article¹ that demonstrates why I can't name just a single counter.

¹ Daniel Molka, Robert Schöne, Daniel Hackenberg, Wolfgang E. Nagel. Detecting Memory-Boundedness with Hardware Performance Counters.

plokhotnyuk · 2023-03-30T04:42:11Z

@apangin Thanks a bunch for your responses!

apangin added the question label Mar 28, 2023

plokhotnyuk closed this as completed Mar 30, 2023
