why async-profiler cpu event less than perf #801

Closed
zdyj3170101136 opened this issue Aug 16, 2023 · 10 comments

Comments

@zdyj3170101136

I started perf and async-profiler on the same process.
In the graph, the upper line is perf and the lower line is async-profiler:
[Screenshot 2023-08-16, 6:33:25 PM]
async-profiler always reports less than perf.

The perf result:
[Screenshot 2023-08-16, 6:32:13 PM]

The async-profiler result:
[Screenshot 2023-08-16, 6:32:41 PM]

@apangin
Collaborator

apangin commented Aug 16, 2023

The attached graphs are not helpful to me, sorry.
To provide context for your question, please specify what commands you ran, what results you expected, and what you got instead. Please also mention the versions of the software used (OS, JDK, async-profiler).

@zdyj3170101136
Author

zdyj3170101136 commented Aug 17, 2023

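Sorry, the attached graphs are not helpful to me. To provide context for your question, please specify the commands you ran, the results you expected, and the results you got. Please also mention the versions of the software used (OS, JDK, async-profiler).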

run command

async-profiler command:

/home/data/software/parca-agent/async-profiler-2.10-linux-x64/bin/asprof collect -d 59 -o jfr --cstack dwarf -f /tmp/25215.jfr -e cpu -i 10ms --alloc 524288 --wall 200ms --lock 10ms 25215

perf command

perf record -a -B -g -F 100  sleep 59s

expected behaviour

perf should be only a little higher than async-profiler.

actual behaviour

Sometimes the perf result is much higher than the async-profiler result.

The difference is mostly in the function unsafeParkHook.

version

[root@plat-sg03-data-testing-pts001 ~]# uname -a
Linux plat-sg03-data-testing-pts001 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@plat-sg03-data-testing-pts001 ~]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

@zdyj3170101136
Author

Here is the perf script output. The unsafe park seems to be triggered by libasyncProfiler.so.

[Screenshot 2023-08-17, 4:55:25 PM]

It seems that starting and stopping async-profiler itself costs a lot of CPU.

@zdyj3170101136
Author

zdyj3170101136 commented Aug 17, 2023

After removing the wall mode from the async-profiler command, the CPU cost returned to normal.

The process has 128 threads:

[root@plat-sg03-data-testing-pts001 parca-agent]# jstack 25215  | grep 'java.lang.Thread.State' | wc -l
128

The wall mode costs a lot of CPU even for a Java process that itself uses little CPU.

@franz1981

franz1981 commented Aug 17, 2023

What happens if you lower the interval for wall to the default value? --wall 200ms seems pretty high here.

@apangin
Collaborator

apangin commented Aug 19, 2023

expected behaviour
perf should be only a little higher than async-profiler.

What exactly do you compare and how do you do that?

@franz1981

franz1981 commented Aug 23, 2023

@zdyj3170101136 another important consideration:
async-profiler's -e cpu uses what perf record calls -e cpu-clock events. Flame graphs built from perf data "weight" each sample by the sample's computed "period" (which is 1/F in ns, given that cpu-clock is a SW event), while async-profiler, for the same type of event, doesn't.

More info on the timer configuration the kernel uses for it is at https://github.com/torvalds/linux/blob/93f5de5f648d2b1ce3540a4ac71756d4a852dc23/kernel/events/core.c#L11109C22-L11109C22.
For cycles, which is a HW event, the period can vary and is usually adjusted on the fly, see https://github.com/torvalds/linux/blob/93f5de5f648d2b1ce3540a4ac71756d4a852dc23/kernel/events/core.c#L4065C23-L4065C23.
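
To make the weighting concrete, here is a minimal Python sketch. It is not code from perf or async-profiler; it assumes a cpu-clock style SW event whose period is fixed at 1/F in nanoseconds, and the function names are hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000

def period_ns(frequency_hz: int) -> int:
    """Fixed sampling period in ns for a clock-based SW event sampled at F Hz."""
    return NSEC_PER_SEC // frequency_hz

def weighted_total_ns(sample_count: int, frequency_hz: int) -> int:
    """Time attributed to a frame when every sample is weighted by its period."""
    return sample_count * period_ns(frequency_hz)

# perf record -F 100 and asprof -i 10ms both correspond to a 10 ms period.
print(period_ns(100))               # 10000000
print(weighted_total_ns(250, 100))  # 2500000000
```

With an unweighted count the same frame would simply show 250 samples, so absolute numbers from the two tools are only comparable after converting to the same unit.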

@apangin
Collaborator

apangin commented Aug 26, 2023

@franz1981 When --total option is specified, async-profiler also records the counter value: nanoseconds/cycles/instructions/etc. - what you call "weight". But if we assume that perf interrupts are fair, it should not really matter whether we measure samples or counter values - the ratio will remain the same.
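
The ratio argument can be checked with a small Python sketch. The sample list and frame names are made up for illustration, and the period is assumed to be the same for every sample, which is the "fair interrupts" assumption above.

```python
from collections import Counter
from fractions import Fraction

def shares(weights):
    """Return each frame's exact share of the total weight."""
    total = sum(weights.values())
    return {frame: Fraction(value, total) for frame, value in weights.items()}

# Hypothetical top frames of eight samples.
samples = ["parse", "parse", "park", "parse", "gc", "park", "parse", "parse"]
PERIOD_NS = 10_000_000  # assumed fixed period (10 ms)

by_count = Counter(samples)                                        # samples per frame
by_time = {frame: n * PERIOD_NS for frame, n in by_count.items()}  # ns per frame

count_shares = shares(by_count)
time_shares = shares(by_time)
print(count_shares == time_shares)  # True
```

The shares only diverge if the period differs between samples, for example when the kernel adjusts it on the fly for a HW event.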

It's still not clear to me what OP compares, so I can't really comment on that.

As a side note: I often see people using perf -F option which I consider harmful, as it causes frequent PMU reconfiguration and thus extra overhead (hardware does not sample at a given frequency, only at a given period, therefore kernel needs to constantly adjust sampling period to keep the desired frequency).

perf -c provides just as good profiles, but with lower overhead. async-profiler's sampling mode is similar to perf -c.
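
As a rough illustration of switching from -F to -c, the Python sketch below builds a perf record command line with a fixed period. It assumes the cpu-clock event, whose counter is in nanoseconds, so that 100 Hz corresponds to a period of 10,000,000; for HW events such as cycles the period is an event count and has to be chosen differently. The helper name is hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000

def perf_record_fixed_period(frequency_hz, duration_s):
    """Build a 'perf record' argv that samples cpu-clock at a fixed period (-c)."""
    period = NSEC_PER_SEC // frequency_hz  # ns between samples for cpu-clock
    return [
        "perf", "record", "-a", "-g",
        "-e", "cpu-clock",
        "-c", str(period),
        "--", "sleep", str(duration_s),
    ]

command = perf_record_fixed_period(100, 59)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run on a machine where perf is installed.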

@franz1981

Thanks @apangin for the clarification. I didn't know that --total uses the configured period to make the absolute values match what SVG flame graphs produced from perf data show!

For

as it causes frequent PMU reconfiguration and thus extra overhead

Fully agree. Looking at the kernel code, HW events indeed require adjustments on each counter overflow (and during throttling too, it seems, which makes sense).
But for cpu-clock, which is a SW event, that doesn't appear to be the case: the period is fixed and computed only once, at https://github.com/torvalds/linux/blob/93f5de5f648d2b1ce3540a4ac71756d4a852dc23/kernel/events/core.c#L11049-L11052

perf -c provides just as good profiles

I tend to agree, although HW events that correlate with the processor frequency, on machines with widely varying tuning, could benefit from periodic re-evaluation of the period; otherwise the samples won't be uniformly distributed. (I usually use tuned with network latency profiles, fixed frequencies, etc. to avoid frequency fluctuations and weird processor idle states, but the world has plenty of unconfigured HW.)
Thanks for the detailed answer, and I hope this digression isn't too off-topic: the user has indeed not provided enough info to get a complete answer, just guesses...

@apangin
Collaborator

apangin commented Sep 5, 2023

@zdyj3170101136 Do you have anything to add regarding the question, or can I close the issue?

@apangin apangin closed this as completed Sep 12, 2023