Detection of vtable stubs #890
Comments
Async-profiler already shows vtable stubs, so I'm not completely sure what this request is about.
It took a while. Here is the benchmark:
Their scores are, respectively:
Yet for both of them, async-profiler yields almost identical profiles. I suspect the root cause lies somewhere around itable stubs and checkcasts. A few pointers that might be helpful:
Thank you for the update. I experimented with your benchmark on M1 and Graviton3 machines and observed a performance difference of around 10% on both. Unfortunately, it's not as big as in your case, so it's harder to draw a solid conclusion. However, I can assure you that vtable/itable stubs are not related to the issue. In both benchmarks, async-profiler correctly shows an interface call to

I attach sample profiles for reference: profiles.zip

The commands I used to generate the profiles:
A few hints on how to make the profiles more verbose:
Async-profiler generally paints a performance picture at the method level. In your benchmark, however, the performance variance is not explained by the shape of the call stacks; it is rather a result of C2's ability to optimize code within one compilation unit. To analyze the generated code of a microbenchmark, JMH with an assembly-level profiler such as dtraceasm is a better fit.
Thanks for the quick response!
I observe the same (~10%) difference on Java 21. Also, the difference on Java 17 disappears with
Thanks for the advice; it indeed makes the profile more helpful and hints at where to look.
Sure, it does seem to be very compilation-strategy-specific and does not generally affect the call stack.
We still suspect it can be. On Java 21, both methods are compiled to basically the same assembly, and the difference seems to lie in the compiled stubs, though it requires closer investigation. We'll keep pursuing it as it affects
OK, let me close the issue for now.
Setup
Consider the following benchmark, where two methods are supposedly identical:
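The benchmark source itself did not survive in this thread. As a minimal sketch of the setup described (two loops identical except for the static type of the list; all class and method names here are assumptions, and this is plain Java rather than the original JMH harness):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical reconstruction of the kind of benchmark described in this
// issue. The two methods differ only in the static type of the receiver:
// with ArrayList<Integer>, C2 can devirtualize iterator()/hasNext()/next();
// with List<Integer>, the calls are interface calls that may go through
// itable/vtable stubs unless profiling proves a single receiver type.
public class IterationBench {
    static long sumExact(ArrayList<Integer> list) {
        long sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            sum += it.next(); // exact receiver type known statically
        }
        return sum;
    }

    static long sumViaInterface(List<Integer> list) {
        long sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            sum += it.next(); // interface call from the static type's view
        }
        return sum;
    }

    public static void main(String[] args) {
        ArrayList<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) data.add(i);
        // Both methods compute the same result; only the generated code differs.
        System.out.println(sumExact(data));
        System.out.println(sumViaInterface(data));
    }
}
```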
On my machine, it yields the following result:
Surprisingly, their scores differ by 25%.
Yet, if profiled with async-profiler (even when run with -XX:+DebugNonSafepoints), the resulting flamegraphs are almost totally identical.

Flamegraphs
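For reference, one way to attach async-profiler with that VM flag enabled is to load it as an agent at JVM startup; the library path and output file name below are illustrative assumptions, not taken from this thread:

```shell
# Illustrative command only: adjust the agent path and jar name for your setup.
# -XX:+DebugNonSafepoints requires -XX:+UnlockDiagnosticVMOptions, and both
# should be set at startup so debug info exists before the hot methods compile.
java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints \
     -agentpath:/path/to/libasyncProfiler.dylib=start,event=cpu,file=profile.html \
     -jar benchmarks.jar
```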
Root cause
The two methods, in fact, differ in a few extra CHECKCAST instructions that upcast the list type and prevent HotSpot from devirtualizing Iterator methods.

According to dtraceasm, the extra 25% of time is spent in an unknown section that, from the look of it, is just a vtable stub.

Bytecode
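As an aside on where such CHECKCASTs come from: an erased generic cast makes javac emit a CHECKCAST at the call site, and the result's static type is whatever the type variable is inferred to be. A small sketch (the helper is hypothetical, not from the original benchmark):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CheckcastDemo {
    // Hypothetical helper: the unchecked cast is erased, so javac emits a
    // CHECKCAST at each call site, checking against the inferred type of T.
    @SuppressWarnings("unchecked")
    static <T> T widen(Object o) {
        return (T) o;
    }

    public static void main(String[] args) {
        ArrayList<String> concrete = new ArrayList<>(List.of("a", "b", "c"));

        // T is inferred as List<String>: the CHECKCAST widens the static
        // type, so iterator()/hasNext()/next() below are interface calls
        // that cannot be devirtualized from the static type alone.
        List<String> widened = widen(concrete);

        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = widened.iterator(); it.hasNext(); ) {
            sb.append(it.next());
        }
        System.out.println(sb);
    }
}
```

Running `javap -c CheckcastDemo` shows the `checkcast #... // class java/util/List` emitted for the assignment.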
It would be nice if vtable stubs were detected and somehow reported in the resulting profile along with the methods that are dispatched, so the root cause of such a difference would be obvious.
Also, while this benchmark is very specific, the problem itself is unfortunately quite widespread (at least for Kotlin codebases), and the benchmark is a boiled-down hot path from kotlinc.

Additional details
OS/CPU: macOS Sonoma 14.2.3, Apple M3
Async-profiler version: 3.0
JDK version: OpenJDK Runtime Environment Corretto-17.0.7.7.1