Report expected vs. actual samples count to highlight low CPU usage/availability to application #126

ceeaspb · 2016-05-10T10:46:29Z

Low number of samples likely reflects low cpu usage or a short profiling duration, etc.

But by reporting the cpu cost as a percentage we could lose sight of this (low sample number).

This warning is presented in other sampling profilers (ie. typical / covers off a profiling risk).

How many samples is low? how long is a piece of string? TBC

nitsanw · 2016-05-10T10:58:23Z

There's 2 problems here:

Missing samples: The timer fires an interrupt but the JVM is not actually on CPU, so is not interrupted. I'm not sure where this interrupts go. We currently have no expectation on the number of samples, but we could start by logging the intended sampling rate and compare to that.
Failed samples: The JVM is running, a thread is interrupted and for any reason sampling fails. I feel this is covered by Add sample error rates to the UI #97 .

ceeaspb · 2016-05-10T13:33:15Z

@nitsanw

it looks like SIGPROF will only ever fire when a thread is on cpu. It counts down as user and system cpu time elapses. So if the process has no threads on cpu then the timer will not expire and a signal not sent when off cpu.
So I don't think we can "miss" a SIGPROF interrupt so to speak (would be good to check). They may just never occur if the whole jvm is off cpu forever.

Other timers:
SIGALRM - counts down real time
SIGVTALRM - counts down user cpu time and not system cpu time. Which makes me think of something else but I think my ticket quota today is exceeded.

agree this ticket isn't about failed samples.

probably needs more thought. the risk is you are led to optimise for cpu when you should be looking at off cpu time (in which case you need a different tool).

nitsanw · 2016-05-10T13:59:32Z

From what you say I take that the time will expire, but the SIGPROF will get swallowed. This matches my observations looking at HP logs. But I would need to experiment more to validate.
Discovering the number of actual samples vs. wanted samples is easy by logging the timer thread counter of signals emitted and comparing to number of log entries.

nitsanw · 2016-05-10T14:00:54Z

This implies profiler stop sequence should be terminate timer, drain logger, log number of timer emitted signals.

nitsanw · 2016-05-11T14:09:42Z

Rephrasing title to reflect discussion

nitsanw · 2016-06-22T08:37:05Z

@ceeaspb reading the code more closely it looks like I misunderstood how the notification timer hangs together. In particular we rely on:
http://man7.org/linux/man-pages/man2/setitimer.2.html

ITIMER_VIRTUAL decrements only when the process is executing, and
                  delivers SIGVTALRM upon expiration.
ITIMER_PROF    decrements both when the process executes and when the
                  system is executing on behalf of the process.  Coupled
                  with ITIMER_VIRTUAL, this timer is usually used to
                  profile the time spent by the application in user and
                  kernel space.  SIGPROF is delivered upon expiration.

So if no CPU is used (user or system) no signal will be delivered. This does not mean we cannot follow through on this feature. An estimate of intended sample rate should be sufficient as an indicator.

nitsanw changed the title ~~low samples warning~~ Report expected vs. actual samples count to highlights low CPU usage/availability to application May 11, 2016

ceeaspb mentioned this issue May 11, 2016

warning for short profiling duration (low number of samples) #130

Open

nitsanw changed the title ~~Report expected vs. actual samples count to highlights low CPU usage/availability to application~~ Report expected vs. actual samples count to highlight low CPU usage/availability to application Jun 22, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report expected vs. actual samples count to highlight low CPU usage/availability to application #126

Report expected vs. actual samples count to highlight low CPU usage/availability to application #126

ceeaspb commented May 10, 2016 •

edited

nitsanw commented May 10, 2016

ceeaspb commented May 10, 2016

nitsanw commented May 10, 2016

nitsanw commented May 10, 2016

nitsanw commented May 11, 2016

nitsanw commented Jun 22, 2016

Report expected vs. actual samples count to highlight low CPU usage/availability to application #126

Report expected vs. actual samples count to highlight low CPU usage/availability to application #126

Comments

ceeaspb commented May 10, 2016 • edited

nitsanw commented May 10, 2016

ceeaspb commented May 10, 2016

nitsanw commented May 10, 2016

nitsanw commented May 10, 2016

nitsanw commented May 11, 2016

nitsanw commented Jun 22, 2016

ceeaspb commented May 10, 2016 •

edited