Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report expected vs. actual samples count to highlight low CPU usage/availability to application #126

Open
ceeaspb opened this issue May 10, 2016 · 6 comments

Comments

@ceeaspb
Copy link

ceeaspb commented May 10, 2016

Low number of samples likely reflects low cpu usage or a short profiling duration, etc.

But by reporting the cpu cost as a percentage we could lose sight of this (low sample number).

This warning is presented in other sampling profilers (ie. typical / covers off a profiling risk).

How many samples is low? how long is a piece of string? TBC

@nitsanw
Copy link
Member

nitsanw commented May 10, 2016

There's 2 problems here:

  1. Missing samples: The timer fires an interrupt but the JVM is not actually on CPU, so is not interrupted. I'm not sure where this interrupts go. We currently have no expectation on the number of samples, but we could start by logging the intended sampling rate and compare to that.
  2. Failed samples: The JVM is running, a thread is interrupted and for any reason sampling fails. I feel this is covered by Add sample error rates to the UI #97 .

@ceeaspb
Copy link
Author

ceeaspb commented May 10, 2016

@nitsanw

  1. it looks like SIGPROF will only ever fire when a thread is on cpu. It counts down as user and system cpu time elapses. So if the process has no threads on cpu then the timer will not expire and a signal not sent when off cpu.
    So I don't think we can "miss" a SIGPROF interrupt so to speak (would be good to check). They may just never occur if the whole jvm is off cpu forever.

Other timers:
SIGALRM - counts down real time
SIGVTALRM - counts down user cpu time and not system cpu time. Which makes me think of something else but I think my ticket quota today is exceeded.

  1. agree this ticket isn't about failed samples.

probably needs more thought. the risk is you are led to optimise for cpu when you should be looking at off cpu time (in which case you need a different tool).

@nitsanw
Copy link
Member

nitsanw commented May 10, 2016

From what you say I take that the time will expire, but the SIGPROF will get swallowed. This matches my observations looking at HP logs. But I would need to experiment more to validate.
Discovering the number of actual samples vs. wanted samples is easy by logging the timer thread counter of signals emitted and comparing to number of log entries.

@nitsanw
Copy link
Member

nitsanw commented May 10, 2016

This implies profiler stop sequence should be terminate timer, drain logger, log number of timer emitted signals.

@nitsanw nitsanw changed the title low samples warning Report expected vs. actual samples count to highlights low CPU usage/availability to application May 11, 2016
@nitsanw
Copy link
Member

nitsanw commented May 11, 2016

Rephrasing title to reflect discussion

@nitsanw nitsanw changed the title Report expected vs. actual samples count to highlights low CPU usage/availability to application Report expected vs. actual samples count to highlight low CPU usage/availability to application Jun 22, 2016
@nitsanw
Copy link
Member

nitsanw commented Jun 22, 2016

@ceeaspb reading the code more closely it looks like I misunderstood how the notification timer hangs together. In particular we rely on:
http://man7.org/linux/man-pages/man2/setitimer.2.html

ITIMER_VIRTUAL decrements only when the process is executing, and
                  delivers SIGVTALRM upon expiration.
ITIMER_PROF    decrements both when the process executes and when the
                  system is executing on behalf of the process.  Coupled
                  with ITIMER_VIRTUAL, this timer is usually used to
                  profile the time spent by the application in user and
                  kernel space.  SIGPROF is delivered upon expiration.

So if no CPU is used (user or system) no signal will be delivered. This does not mean we cannot follow through on this feature. An estimate of intended sample rate should be sufficient as an indicator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants