Possible bias in allocation profiling #737
Comments
I'm afraid this is not the right place to comment on random blog posts on the internet.

If your actual question is whether allocation profiling is biased, the answer is yes: any sampling technique is biased in some sense. For instance, perf-based CPU sampling is biased towards functions that run longer on CPU. But isn't it the whole point of profiling to find such functions? If we compare a function that works for 1ms and runs 1 million times to a function that works for 20ms and runs 100k times, the profile will be "biased" towards the latter, even though the former executes 10x more often.

The same applies to allocation profiling: if a program allocates the same number of 80-byte and 800-byte objects, the latter will have roughly 10x more samples in the allocation profile, reflecting the fact that those allocations eat 10x more memory. Async-profiler's allocation sampler does not show the number of allocated objects, just as it does not show the number of method calls.

If required, it's possible to record every allocated object by turning off TLAB.

Note that with JDK 11+, async-profiler uses a slightly different allocation sampling mechanism based on JEP 331. It allows fine-tuning of the sampling threshold regardless of TLAB size and also adds some randomness to account for repeated allocation patterns.
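To make the 80-byte versus 800-byte example above concrete, here is a minimal simulation sketch. It is not async-profiler's actual sampler: it assumes a simplified model in which one sample is taken each time the running total of allocated bytes crosses a fixed threshold. The names `sample_by_bytes`, `estimate_counts`, and the `INTERVAL` value are hypothetical and chosen only for illustration.

```python
import random


def sample_by_bytes(sizes, interval):
    """Take one sample each time total allocated bytes crosses a multiple of `interval`.

    Returns {object_size: number_of_samples}. Assumes every size < interval.
    """
    samples = {}
    allocated = 0
    next_threshold = interval
    for size in sizes:
        allocated += size
        if allocated >= next_threshold:
            # The allocation that crosses the threshold is the one recorded.
            samples[size] = samples.get(size, 0) + 1
            next_threshold += interval
    return samples


def estimate_counts(samples, interval):
    """Rough object-count estimate: each sample stands for `interval` bytes."""
    return {size: n * interval / size for size, n in samples.items()}


# Same number of 80-byte and 800-byte objects, in random order.
rng = random.Random(42)
sizes = [80] * 100_000 + [800] * 100_000
rng.shuffle(sizes)

INTERVAL = 8192  # hypothetical sampling interval in bytes
samples = sample_by_bytes(sizes, INTERVAL)
ratio = samples[800] / samples[80]
estimates = estimate_counts(samples, INTERVAL)

print("samples per size:", samples)
print("800-byte / 80-byte sample ratio:", round(ratio, 2))
print("estimated object counts:", {k: round(v) for k, v in estimates.items()})
```

In this model the sample ratio comes out close to 10, matching the byte ratio rather than the (equal) object counts. Dividing the bytes represented by the samples by the object size gives back a rough count estimate, but only as a statistical approximation under the stated assumptions.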
@apangin Thanks for the explanation! I'm interested in bytes too, because with them I can more easily understand where the memory/LLC bandwidth reduction comes from. Is there a CPU event that tracks cycles spent waiting for memory access?
Yes, but this is a complex topic. There are hundreds of hardware performance counters related to memory access. Furthermore, they differ with every new micro-architecture. Here is an example article that demonstrates why I can't name just a single counter.
@apangin Thanks a bunch for your responses!
Could it be that the direct comparison with VisualVM in this article is wrong?
If not, would it be possible to auto-correct allocation stats?