
Does imprecise sample frequency affect correctness of flame graph analysis? #209

Open
open-richard opened this issue Jul 7, 2019 · 1 comment


Hi Brendan,

I am a huge fan of FlameGraph and all your blogs and talks, but I have a question about the correctness of flame graphs.
According to the source code, flame graph generation relies entirely on the sample counts of the stack traces. However, experiments show that when the system is relatively idle, the sampling frequency does not match what is specified on the command line (-F 99).
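(For reference, the folded-stack input that flamegraph.pl consumes is just a sample count per unique stack, something like the following, with illustrative numbers; frame widths in the SVG are proportional to these counts.)

```
swapper;start_kernel;cpu_idle 612
mysqld;main;mysqld_main;do_command 187
```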

I checked the perf tutorial and learned that -F means the kernel dynamically adjusts the sampling period to achieve the target average rate. But looking at the data, the actual sample frequency averages only around 60–70 Hz when I specify a 99 Hz sample rate. Each sample also covers a different length of period: some long-lasting samples have a period longer than 30 ms* even though I am asking for one sample every ~10 ms, while others are shorter than 1 ms.
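Here is roughly how I measured the achieved rate (a sketch; the exact `perf script` field names and output format may vary by version):

```
# Record at a requested 99 Hz for 10 s, then bucket the sample timestamps
# by whole second to see how many samples actually landed in each second.
perf record -F 99 -a -g -- sleep 10
perf script -F time | awk -F. '{ count[$1]++ } END { for (s in count) print s, count[s] }' | sort -n
```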

With this kind of variation in sample duration, how can I know whether counting samples correctly reflects how much time a function takes?

In practice, the resulting flame graph does show approximately the right proportions. But I am confused about why such an imprecise sample rate can still be reliable, and I am curious how you think about this potentially inaccurate approximation.

And what would be the best way to guarantee a precise sample rate with perf? Apparently, neither -F nor -c can ensure one sample per 10 ms. Is there a way?

Thank you very much!

best,
Richard

p.s.

  • * 30 ms: here I am using the CPU frequency to estimate on average how much time each cycle takes (e.g., 1/2.4 GHz), and multiplying by the cycle count recorded in the sample's period; see the sketch after this list. From the documentation, I know that "cycles" is not constant with respect to time, but I think I can use the nominal cycle time as a lower bound on how long the sample lasted (please correct me if I am wrong here).
  • The sample count was obtained by running `perf report -D | grep RECORD_SAMPLE | wc -l`.
  • The experiment was performed in a single-core virtual machine running Ubuntu 16.04 with kernel 4.13.0.
  • When the system is under heavy load, the sample frequency is correct, i.e., -F 99 yields 99 samples per second.
  • I wanted to send this as a personal email, but I think GitHub will make this question searchable for others who have the same question.
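To make the cycles-to-time conversion above concrete, here is a rough sketch of how I estimated per-sample durations (the `period` field selection and the 2.4 GHz figure are assumptions; adjust for your CPU):

```
# Print each sample's recorded period (cycle count) and convert it to an
# approximate duration at a nominal 2.4 GHz clock. For example,
# 72,000,000 cycles / 2.4e9 Hz = 0.030 s = 30 ms.
perf script -F period | awk '{ printf "%.2f ms\n", $1 / 2.4e9 * 1000 }' | sort -n | tail
```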

Knio commented Sep 21, 2020

This looks related to #165. Using cpu-clock instead of cycles should give you a more consistent sampling period.
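For example, something like this records on wall-clock time rather than CPU cycles (a sketch; the workload is a placeholder):

```
# Sample the software cpu-clock event at 99 Hz instead of hardware cycles;
# cpu-clock ticks in wall time, so the sampling period does not stretch
# when the CPU is idle or clocked down.
perf record -F 99 -e cpu-clock -a -g -- sleep 10
```

And if I remember correctly, the period for the software clock events is specified in nanoseconds, so `-e cpu-clock -c 10000000` should request exactly one sample every 10 ms.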
