Recording accurate job durations #1053

Open
DanielHeath opened this issue Aug 29, 2023 · 3 comments
Labels
help wanted Extra attention is needed

Comments

@DanielHeath
Contributor

Currently, good_job records the start and end time of a job.

That seems like it should be enough to calculate the job runtime, except... during NTP adjustments it can result in a negative duration.

Using Process.clock_gettime(Process::CLOCK_MONOTONIC) to calculate start/end times would also be sufficient if you didn't want the extra overhead of storing duration.
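
To illustrate the idea, here is a minimal sketch of timing a block with the monotonic clock. The `measure_duration` helper is hypothetical and not part of good_job; it only shows that the difference between two `CLOCK_MONOTONIC` readings cannot go negative when the wall clock is adjusted.

```ruby
# Hypothetical helper (not good_job API): returns the elapsed seconds
# for the given block, measured with the monotonic clock.
def measure_duration
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Stand-in for a job's perform call.
duration = measure_duration { sleep 0.01 }
puts "job took #{duration.round(4)}s"
```

Note that the raw monotonic readings are not wall-clock times, so they are only useful as a difference, not as stored timestamps.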

@bensheldon
Owner

This is really interesting! And it totally makes sense that we should try to be as accurate as possible when calculating those durations. Some thoughts:

  • This makes a lot of sense when calculating execution duration, but queue latency would be harder to calculate because the job could be enqueued and executed on different machines. Hopefully that queue latency is brief, so there's less chance of an NTP adjustment 🤞🏻
  • I would like to keep storing timestamps. Would it be weird to say the "finish_time" is ntp_start_time + monotonic_duration? I think that would be fine 🤷🏻
  • For jobs that are executed inline or async (i.e. if the operation does take place on the same machine) I think it makes sense to calculate all the timestamps (that we can) using the monotonic clock (though there also shouldn't be any meaningful queue latency in that case)
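
A rough sketch of the "ntp_start_time + monotonic_duration" idea from the second bullet, assuming a hypothetical `timed_execution` helper and illustrative hash keys (none of these names come from good_job):

```ruby
# Hypothetical sketch: take one wall-clock reading at the start, measure the
# run with the monotonic clock, and derive the finish timestamp from the two.
def timed_execution
  wall_start = Time.now
  mono_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - mono_start
  {
    performed_at: wall_start,           # wall-clock start, as stored today
    duration: duration,                 # monotonic, never negative
    finished_at: wall_start + duration  # derived instead of a second Time.now
  }
end

# Stand-in for a job's perform call.
result = timed_execution { sleep 0.01 }
puts result[:finished_at] - result[:performed_at]
```

With this approach the stored finish timestamp can never precede the start timestamp, though it may differ slightly from what `Time.now` would have reported at the end of the job.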

If that all makes sense, I would accept a PR if you wanted to calculate the duration of a job using the monotonic clock and use that to calculate the finished_at/errored_at (and anywhere else you find; like batches too I think)

@DanielHeath
Contributor Author

For latency, the problem is essentially not solvable; postgres lacks a "monotonic timestamp" function, and multi-machine clock consistency should be out of scope.

If all clock calls use CLOCK_MONOTONIC, then if they happen to occur on the same instance they will behave sensibly. Might make sense to record the host alongside each timestamp?
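
Something like the following is what I have in mind; a sketch only, where `Reading`, `take_reading`, and `elapsed_between` are hypothetical names and nothing here reflects the current schema. Two monotonic readings are compared only when they came from the same host.

```ruby
require "socket"

# Hypothetical value object: a monotonic reading tagged with its host.
Reading = Struct.new(:host, :monotonic)

def take_reading
  Reading.new(Socket.gethostname, Process.clock_gettime(Process::CLOCK_MONOTONIC))
end

# Returns elapsed seconds, or nil when the readings come from different
# hosts and the monotonic values are therefore not comparable.
def elapsed_between(earlier, later)
  return nil unless earlier.host == later.host

  later.monotonic - earlier.monotonic
end

enqueued = take_reading
performed = take_reading
p elapsed_between(enqueued, performed)
```

A hostname match alone would not catch a reboot between the two readings, so this is a best-effort check at most.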

@bensheldon
Owner

> For latency, the problem is essentially not solvable; postgres lacks a "monotonic timestamp" function

That's a bummer, but it also makes sense: that would likely be very difficult to implement for a cluster.

> Might make sense to record the host alongside each timestamp?

I'm generally reluctant to make schema changes. I think we should make best effort to be accurate, but I imagine storing the host alongside timestamps would have very niche usage.


2 participants