Recording accurate job durations #1053

DanielHeath · 2023-08-29T00:15:49Z

Currently, good_job records the start and end time of a job.

That seems like it should be enough to calculate the job runtime, except... during NTP adjustments it can result in a negative duration.

Using Process.clock_gettime(Process::CLOCK_MONOTONIC) to calculate start/end times would also be sufficient if you didn't want the extra overhead of storing duration.
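For illustration, a minimal sketch of measuring a duration with the monotonic clock. The helper name `measure_duration` is hypothetical and is not part of good_job; only `Process.clock_gettime` and `Process::CLOCK_MONOTONIC` come from Ruby itself.

```ruby
# Hypothetical helper (not good_job API): time a block with the monotonic clock.
# CLOCK_MONOTONIC is not affected by NTP steps or manual changes to the wall
# clock, so the difference between two readings should never be negative.
def measure_duration
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  finished = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  [result, finished - started] # duration in seconds, as a Float
end

result, duration = measure_duration do
  sleep 0.01 # stand-in for the job body
  :done
end
```

The readings are only meaningful relative to each other on the same machine; they are not timestamps and cannot be stored as such.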

bensheldon · 2023-08-29T01:03:47Z

This is really interesting! And it totally makes sense that we should try to be as accurate as possible when calculating those durations. Some thoughts:

This makes a lot of sense when calculating execution duration, but queue latency would be harder to calculate because the job could be enqueued and executed on different machines. Hopefully that queue latency is brief, so there's less chance of an NTP adjustment 🤞🏻
I would like to keep storing timestamps. Would it be weird to say the "finish_time" is ntp_start_time + monotonic_duration? I think that would be fine 🤷🏻
For jobs that are executed inline or async (i.e. if the operation does take place on the same machine), I think it makes sense to calculate all the timestamps (that we can) using the monotonic clock (though there also shouldn't be any meaningful queue latency in that case)

If that all makes sense, I would accept a PR if you wanted to calculate the duration of a job using the monotonic clock and use that to calculate the finished_at/errored_at (and anywhere else you find; like batches too I think)
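A rough sketch of the "wall-clock start + monotonic duration" idea, assuming a plain Ruby method rather than good_job's actual internals. The method name `timed_execution` and the hash keys are illustrative only.

```ruby
# Hypothetical sketch (not good_job's implementation): keep a wall-clock
# start timestamp, measure elapsed time with the monotonic clock, and derive
# the finish timestamp from the two.
def timed_execution
  started_at = Time.now # wall clock; may be adjusted by NTP
  mono_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  error = nil
  begin
    yield
  rescue StandardError => e
    error = e # an errored_at value could be derived the same way
  end
  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - mono_start
  {
    started_at: started_at,
    duration: duration,                   # seconds, never negative
    finished_at: started_at + duration,   # derived, so always >= started_at
    error: error
  }
end

timing = timed_execution { sleep 0.01 }
```

With this approach the stored finish timestamp can differ slightly from what the wall clock reads at that moment, but the duration implied by the two timestamps stays consistent.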

DanielHeath · 2023-08-29T01:08:59Z

For latency, the problem is essentially not solvable; postgres lacks a "monotonic timestamp" function, and multi-machine clock consistency should be out of scope.

If all clock calls use CLOCK_MONOTONIC, then if they happen to occur on the same instance they will behave sensibly. Might make sense to record the host alongside each timestamp?
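To make the host idea concrete, here is a small sketch under stated assumptions: `ClockReading`, `take_reading` and `elapsed_between` are hypothetical names, and nothing like this exists in good_job's schema. Monotonic readings are compared only when both came from the same host; otherwise it falls back to wall-clock time.

```ruby
require "socket"

# Hypothetical value object: a monotonic reading tagged with the host that
# produced it, plus a wall-clock timestamp as a fallback.
ClockReading = Struct.new(:host, :monotonic, :wall)

def take_reading
  ClockReading.new(
    Socket.gethostname,
    Process.clock_gettime(Process::CLOCK_MONOTONIC),
    Time.now
  )
end

# Use the monotonic difference only when both readings share a host;
# across hosts the monotonic values are unrelated, so use wall-clock time.
def elapsed_between(earlier, later)
  if earlier.host == later.host
    later.monotonic - earlier.monotonic
  else
    later.wall - earlier.wall
  end
end

enqueued = take_reading
started = take_reading
latency = elapsed_between(enqueued, started)
```

One caveat: the monotonic clock's starting point is unspecified and typically resets on reboot, so a matching hostname alone does not strictly guarantee that two readings are comparable.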

bensheldon · 2023-08-29T16:48:07Z

> For latency, the problem is essentially not solvable; postgres lacks a "monotonic timestamp" function

That's a bummer, but it also makes sense: that would likely be very difficult to implement for a cluster.

> Might make sense to record the host alongside each timestamp?

I'm generally reluctant to make schema changes. I think we should make best effort to be accurate, but I imagine storing the host alongside timestamps would have very niche usage.

bensheldon added the help wanted Extra attention is needed label Nov 27, 2023

bensheldon mentioned this issue May 27, 2024

Add stats about job classes #1362

Closed

bensheldon mentioned this issue Jul 6, 2024

Add initial Performance panel to dashboard #1388

Merged
