Possible bias in allocation profiling #737

Closed
plokhotnyuk opened this issue Mar 28, 2023 · 4 comments


plokhotnyuk commented Mar 28, 2023

Could it be that the direct comparison with VisualVM in this article is wrong?

If not, would it be possible to auto-correct the allocation stats?


apangin commented Mar 28, 2023

I'm afraid this is not the right place to comment on random blog posts on the internet.

If your actual question is whether allocation profiling is biased - the answer is yes: any sampling technique is biased in some sense. For instance, perf-based CPU sampling is biased towards functions that run longer on CPU. But isn't it the whole point of profiling to find such functions?

If we compare a function that works for 1 ms and runs 1 million times to a function that works for 20 ms and runs 100k times, the profile will be "biased" towards the latter, even though the former executes 10x more often. The same goes for allocation profiling: if a program allocates the same number of 80-byte and 800-byte objects, the latter will get roughly 10x more samples in the allocation profile, reflecting the fact that those allocations consume 10x more memory.

Async-profiler's allocation sampler does not show the number of allocated objects, just as it does not show the number of method calls. If required, it is possible to record every allocated object by turning off TLABs with -XX:-UseTLAB (this may not work with all GCs, but it works with G1). Doing so can significantly impact performance, though.
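For illustration, a minimal launch sketch (the agent path and the MyApp main class are placeholders for your own setup; the agent uses its documented start,event=alloc,file= options):

  # record every allocation: disable TLABs and attach async-profiler in allocation mode
  java -XX:-UseTLAB \
       -agentpath:/path/to/libasyncProfiler.so=start,event=alloc,file=alloc.html \
       MyApp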

Note that with JDK 11+ async-profiler uses a slightly different allocation sampling mechanism based on JEP 331. It allows fine-tuning the sampling threshold regardless of the TLAB size and also adds some randomness to account for repeated allocation patterns.
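For example, a sketch assuming async-profiler 2.x and its --alloc interval option (the script path, the <pid> placeholder, and the 512k interval are all illustrative):

  # sample allocations roughly once per 512 KB allocated, for 30 seconds
  ./profiler.sh -e alloc --alloc 512k -d 30 -f alloc.html <pid>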

@plokhotnyuk (Author)

@apangin Thanks for the explanation!

I'm interested in bytes too, because with them it is easier to understand where the sources of memory/LLC bandwidth reduction are.

Is there any CPU event to track cycles spent waiting for memory access?


apangin commented Mar 29, 2023

Is there any CPU event to track cycles spent waiting for memory access?

Yes, but this is a complex topic. There are hundreds of hardware performance counters related to memory access, and they differ with every microarchitecture. Here is an example article¹ that demonstrates why I can't point you to just a single counter.
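As a rough starting point, a hedged sketch (the event names are only illustrative and vary by CPU; run perf list on your machine to see what is available, and <pid> is a placeholder):

  # generic perf events that most CPUs expose; vendor-specific stall counters give more detail
  perf stat -e cycles,instructions,cache-references,cache-misses -p <pid> -- sleep 30

  # async-profiler can also sample on a hardware counter such as cache misses
  ./profiler.sh -e cache-misses -d 30 -f misses.html <pid>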

Footnotes

  1. Daniel Molka, Robert Schöne, Daniel Hackenberg, Wolfgang E. Nagel. Detecting Memory-Boundedness with Hardware Performance Counters.

@plokhotnyuk (Author)

@apangin Thanks a bunch for your responses!
