Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

display task duration on finished tasks #3579

Merged
merged 6 commits into from Apr 3, 2019

Conversation

@mockersf
Copy link
Contributor

mockersf commented Mar 23, 2019

for #1788

task duration is displayed on the right of finished tasks:
task duration

@marco-m

This comment has been minimized.

Copy link
Contributor

marco-m commented Mar 25, 2019

Yes! thanks @mockersf

Copy link
Member

vito left a comment

😍 This is excellent, thanks for the PR (and tests!)!

I think it could use some UI polish - we don't often use character notation ([]) and I'm not sure it'll be immediately obvious what the "10s" is without something saying 'duration' or 'took' or something.

@marco-m also brought up a good point that this is valuable information to show for get and put steps, too, but that space in the UI is currently taken up by the resource version for those steps. @Lindsayauchin when you have time, this could be a good 'design snack' (show duration for each step, not just tasks).

For now I'm happy to merge as-is if you want to leave the rest to us. 🙂 I'm fine with shipping this and iterating on the design later as it's definitely useful to have. I'll leave it open for a bit in case you want to play with the design some more, but 👍 from me.

@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Mar 25, 2019

I went with @marco-m proposition in #1788 for [1m 06s] after several tries:

  • as duration 1m 06s like resources version, but then it wouldn't be possible to add it also to resources
  • as duration 1m 06s next to the task name, which would be somewhat similar to how duration is displayed for the build but it feels to me like it is too... heavy?
  • as the way metadata are displayed for a resource, but it hides the duration somewhat and it may be an issue when adding the duration for a resource

I'm open to suggestion too 😃

For resource duration, I tried to see if it was the same kind of change but it seems events for resource ends (FinishGet and FinishPut) don't have a time at least in some cases

@vito

This comment has been minimized.

Copy link
Member

vito commented Mar 26, 2019

Ah, shame we don't include timestamps for those events. Would you be interested in adding that? I actually think it wouldn't be too hard; we'd just need to add the field to the FinishGet/FinishPut types and record the time here:

err := d.build.SaveEvent(event.FinishPut{
Origin: d.eventOrigin,
ExitStatus: int(exitStatus),
CreatedVersion: info.Version,
CreatedMetadata: info.Metadata,
})

...just like we do for tasks here:

func (d *taskDelegate) Finished(logger lager.Logger, exitStatus exec.ExitStatus) {
err := d.build.SaveEvent(event.FinishTask{
ExitStatus: int(exitStatus),
Time: time.Now().Unix(),
Origin: d.eventOrigin,
})
if err != nil {
logger.Error("failed-to-save-finish-event", err)
return
}
logger.Info("finished", lager.Data{"exit-status": exitStatus})
}

re: where to put the timestamp, it does seem like the UI is a bit cramped. The easiest way to layer it in would maybe be as a tooltip on the step's icon (the checkmark/X on the right). The disadvantage would be that you'd have to mouse-over to reveal the duration. Would that be OK? It would at least de-clutter the UI when you don't actually care, but it could also make it hard for slowness to 'jump out'.

@vito vito self-assigned this Mar 26, 2019
@marco-m

This comment has been minimized.

Copy link
Contributor

marco-m commented Mar 26, 2019

@vito FYI in #1788, @Lindsayauchin added a thumbs up to the proposal of the [1m 06s] :-)

@Lindsayauchin

This comment has been minimized.

Copy link
Contributor

Lindsayauchin commented Mar 26, 2019

Proposing a design similar to issue #1788 that fits nicely into the current UI.

Screen Shot 2019-03-26 at 2 35 42 PM

@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Mar 26, 2019

@Lindsayauchin that wouldn't work when also displaying durations for resources. With [] it gives:
resource duration

@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Mar 27, 2019

@vito I added duration for resources

  • I added time to FinishGet and FinishPut events, that can be empty for previous versions of those events
  • I added events InitializeGet, StartGet, InitializePut and StartPut similar to task events
  • I removed remaining references to an Initialize event that were never called
@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Mar 27, 2019

I tried the tooltip, which has the nice side effect of allowing more info to be displayed:
duration tooltip
initialization duration is selecting a worker, creating the container, ... and step is the duration spent in the actual step

Exist status code could be later added to this tooltip for example

I can commit that and remove the duration between [] if we agree on the tooltip

@vito

This comment has been minimized.

Copy link
Member

vito commented Mar 27, 2019

@mockersf Nifty, I hadn't thought of timing breakdowns but that seems useful, especially when a significant chunk goes just to initialization. I wonder if there's any other useful info to put there in the future. 🤔

@marco-m @Lindsayauchin Wdyt of the tooltip UI/UX?

@marco-m

This comment has been minimized.

Copy link
Contributor

marco-m commented Mar 27, 2019

@vito at the beginning I was unsure, but the more I think about it, the more I find that the tooltip works well. I also like the timing breakdown. Thanks again @mockersf !

@jchesterpivotal

This comment has been minimized.

Copy link
Contributor

jchesterpivotal commented Mar 28, 2019

Ah, shame we don't include timestamps for those events. Would you be interested in adding that?

I'm so excited for this change. In the data I scrape it's a noticeable hole.

Related wish: regardless of how it's presented in the UI, please give finer-grained timestamps in the events streamed from the API. To many things can happen in a 1-second window, especially if caching makes various events "instant".

I'd ask for at least milliseconds, but I'd be happiest with nanoseconds (Golang's internal granularity).

I hadn't thought of timing breakdowns but that seems useful, especially when a significant chunk goes just to initialization. I wonder if there's any other useful info to put there in the future. 🤔

Everything. Please. Everything.

@mockersf mockersf force-pushed the mockersf:task-duration branch from dca699b to d6ca4b9 Mar 29, 2019
@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Mar 29, 2019

Related wish: regardless of how it's presented in the UI, please give finer-grained timestamps in the events streamed from the API. To many things can happen in a 1-second window, especially if caching makes various events "instant".

I'd ask for at least milliseconds, but I'd be happiest with nanoseconds (Golang's internal granularity).

Adding a more precise time in the event shouldn't be complicated, but there's an issue with format and backward compatibility:

  • Changing the time field to be an int in nanoseconds would break existing clients and not be a standard
  • Changing the time field to be a float in nanoseconds with the integer part for the second may break clients, but the UI would still work and gain precision. It would require additional computing on the server side
  • Adding a new field could have one of the two format above, or use RFC3339 (Golang default) for a string format

what do you think ?

@jchesterpivotal

This comment has been minimized.

Copy link
Contributor

jchesterpivotal commented Mar 29, 2019

what do you think ?

I prefer clock time elapsed since epoch. RFC3339 timestamps are calendar time, not clock time: they require external information for correct interpretation (timezones, daylight savings, leap seconds etc). Clock time ignores these additional constructs. A thousand years from now the nanoseconds elapsed from epoch until the time a stamp was taken will have exactly the same value. But calendars change all the time, a record kept as a calendrical timestamp is subject to unexpected -- and hard-to-detect -- corruption due to factors outside the record.

For example, the dates 3rd to the 13th of September, 1752, never happened in the British Empire. They were skipped due to a change from the Julian to Gregorian calender. If I had recorded a timestamp at the time (say 4th Sept 1752), it would become meaningless without additional processing. But if I had an epoch stamp (supposing nanoseconds since 1700, say), I would still have a universally reliable concept of when the event happened.

Both ways of expressing points in time are affected by clock drift, clock accuracy, clock precision and relativistic effects.

As for the format, I'd prefer an int64 encoding nanoseconds since epoch (even if it's rounded off to whatever precision to OS allows). I don't think there is an compatibility guarantee for the API and clients can use /v1/info to see what version of Concourse they're talking to. Using an int64 for nanoseconds gives approximately 580 years of headroom before it overflows.

@mockersf mockersf force-pushed the mockersf:task-duration branch from 4facdf9 to 07672a2 Mar 31, 2019
@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Mar 31, 2019

I switched the time fields to be a time in nanoseconds, and the UI to display milliseconds details if it's below 1 second.

According to Golang's doc, it will start failing in 2263

Copy link
Member

vito left a comment

The nanosecond change kind of explodes the complexity/risk of this, but if you're willing to follow through with the changes necessary, then I'm cool with it. 👍

@jchesterpivotal Can you provide more details on why such high precision is important to you? I get that a lot can happen in a second, but transitioning to nanoseconds is nontrivial, and it's not like we're taking these measurements in a vacuum. There are a lot of moving parts behind a running build (including network calls to the database), and at the subsecond level it seems likely you'd get a whole lot of noise rather than a more precise signal. 🤔

@@ -102,7 +103,7 @@ func (d *drainer) drainBuild(logger lager.Logger, build db.Build, syslog *Syslog
payload := log.Payload
tag := build.TeamName() + "/" + build.PipelineName() + "/" + build.JobName() + "/" + build.Name() + "/" + string(log.Origin.ID)

err = syslog.Write(d.hostname, tag, time.Unix(log.Time, 0), payload)
err = syslog.Write(d.hostname, tag, time.Unix(log.Time / 1000000000, int64(math.Remainder(float64(log.Time), 1000000000)) ), payload)

This comment has been minimized.

Copy link
@vito

vito Apr 1, 2019

Member

I think this can be slightly cleaned up like so:

Suggested change
err = syslog.Write(d.hostname, tag, time.Unix(log.Time / 1000000000, int64(math.Remainder(float64(log.Time), 1000000000)) ), payload)
err = syslog.Write(d.hostname, tag, time.Unix(log.Time / time.Second, log.Time % time.Second), payload)
@@ -2101,6 +2104,94 @@ all =
>> Query.children []
>> Query.index -1
>> Query.has [ text "v3.1.4" ]
, test "finist task lists initialization duration in tooltip" <|

This comment has been minimized.

Copy link
@vito

vito Apr 1, 2019

Member
Suggested change
, test "finist task lists initialization duration in tooltip" <|
, test "finish task lists initialization duration in tooltip" <|
>> Query.findAll [ tag "tr" ]
>> Query.index 0
>> Query.has [ text "initialization", text "10s" ]
, test "finist task lists step duration in tooltip" <|

This comment has been minimized.

Copy link
@vito

vito Apr 1, 2019

Member
Suggested change
, test "finist task lists step duration in tooltip" <|
, test "finish task lists step duration in tooltip" <|
@@ -17,7 +17,7 @@ type FinishTask struct {
}

func (FinishTask) EventType() atc.EventType { return EventTypeFinishTask }
func (FinishTask) Version() atc.EventVersion { return "4.0" }
func (FinishTask) Version() atc.EventVersion { return "5.0" }

This comment has been minimized.

Copy link
@vito

vito Apr 1, 2019

Member

⚠️ Bumps to event versions require copying the old event definition to deprecated_events.go so that clients can still handle the old event type (e.g. when doing a fly watch on an older build). The old event should then be registered here.

fly will have to be changed to handle nanosecond-precision timestamps here. It should probably handle old and new event types and 'normalize' the old ones so nanoseconds or something.

fly will have to be updated to handle LogV51, just like it handles LogV50 now. Same for all the other bumped events.

Sorry this is such a pain. I'd like to get away from this event-based system in the future - possibly as part of #3521.

@jchesterpivotal

This comment has been minimized.

Copy link
Contributor

jchesterpivotal commented Apr 1, 2019

@jchesterpivotal Can you provide more details on why such high precision is important to you? I get that a lot can happen in a second, but transitioning to nanoseconds is nontrivial, and it's not like we're taking these measurements in a vacuum.

Partly because of the limits of PostgreSQL interval types, which behave badly in interval queries when they are "empty". Nanoseconds are the finest granularity I can ever expect to get in my lifetime and it's highly unlikely that two events will ever collide.

Right now millisecond resolution will probably avoid problems most of the time, but I will still need special cases because a lot of the time loading from the cache will be sub-millisecond due to file page caching. Microseconds are probably going to be enough for some time, but if you're going as far as microseconds ...

@marco-m

This comment has been minimized.

Copy link
Contributor

marco-m commented Apr 1, 2019

There are a lot of moving parts behind a running build (including network calls to the database), and at the subsecond level it seems likely you'd get a whole lot of noise rather than a more precise signal

+1

@jchesterpivotal

This comment has been minimized.

Copy link
Contributor

jchesterpivotal commented Apr 1, 2019

I accept noise that will reduce the accuracy of my figures, but precision is still useful for me, because I need sufficient precision to create a reliable ordering of events with a lifetime > 0.

@vito

This comment has been minimized.

Copy link
Member

vito commented Apr 1, 2019

I accept noise that will reduce the accuracy of my figures, but precision is still useful for me, because I need sufficient precision to create a reliable ordering of events with a lifetime > 0.

I see - so having the higher precision allows you to determine the order of events in close timing situations?

Could you use the event ID instead? That's ultimately what determines the order of events from Concourse's perspective.

@jchesterpivotal

This comment has been minimized.

Copy link
Contributor

jchesterpivotal commented Apr 1, 2019

Could you use the event ID instead? That's ultimately what determines the order of events from Concourse's perspective.

I miss out on the interval querying, which is super useful to me. If I have time intervals I can ask questions like "how many builds were active during [arbitrary window of time]?"

But PostgreSQL, at least for version 9.6, interprets [123,123) as being [123,). Then the [123,) doesn't show up properly in interval queries because it's not "between" anything. I can query occurs-before and occurs-after, but not occurs-during, occurs-overlapping etc.

@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Apr 1, 2019

The nanosecond change kind of explodes the complexity/risk of this, but if you're willing to follow through with the changes necessary, then I'm cool with it. 👍

As there are more changes than what I did, I think it would be better to split the PR and keep this one just to display duration of steps in the UI, and then I could continue on moving to nanoseconds in another PR.

if you agree I'll remove the change to nanoseconds from this one and start another

@jchesterpivotal

This comment has been minimized.

Copy link
Contributor

jchesterpivotal commented Apr 1, 2019

I think it would be better to split the PR

Yes, I think so. I somewhat derailed the original purpose. Sorry :|

@vito

This comment has been minimized.

Copy link
Member

vito commented Apr 1, 2019

@mockersf @jchesterpivotal Sounds good. Happy to discuss nanosecond precision stuff in another PR. 👍

@mockersf mockersf force-pushed the mockersf:task-duration branch from 07672a2 to 9f9689e Apr 1, 2019
@mockersf mockersf force-pushed the mockersf:task-duration branch from 9f9689e to b04aa81 Apr 1, 2019
@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Apr 1, 2019

@vito done, everything is back to seconds 🦊

@vito

This comment has been minimized.

Copy link
Member

vito commented Apr 3, 2019

@mockersf Sorry, looks like #3664 introduced some conflicts. Once this is rebased I'll keep an eye out for it and merge so you don't hit another conflict. 👍

@vito vito dismissed their stale review Apr 3, 2019

out of date

mockersf added 6 commits Mar 23, 2019
Signed-off-by: François Mockers <mockersf@gmail.com>
and time to events finish get / put

Signed-off-by: François Mockers <mockersf@gmail.com>
added Date to last event that didn't have it (StartTask)
keep dates of initialize, start and finish to get durations by step

Signed-off-by: François Mockers <mockersf@gmail.com>
Signed-off-by: François Mockers <mockersf@gmail.com>
Signed-off-by: François Mockers <mockersf@gmail.com>
Signed-off-by: François Mockers <mockersf@gmail.com>
@mockersf mockersf force-pushed the mockersf:task-duration branch from 8fcfaf9 to b8ff53b Apr 3, 2019
@mockersf

This comment has been minimized.

Copy link
Contributor Author

mockersf commented Apr 3, 2019

@vito it's rebased !

@vito
vito approved these changes Apr 3, 2019
Copy link
Member

vito left a comment

Works great! 🎊

@vito vito merged commit e9e1f5b into concourse:master Apr 3, 2019
5 checks passed
5 checks passed
DCO DCO
Details
WIP Ready for review
Details
concourse-ci/testflight Concourse CI build success
Details
concourse-ci/unit Concourse CI build success
Details
concourse-ci/watsjs Concourse CI build success
Details
@JohannesRudolph

This comment has been minimized.

Copy link
Contributor

JohannesRudolph commented Jun 4, 2019

I may be missing something... but has this made its way into concourse 5.2.0? Because I don't see no task durations on my dashboard :-(

@JohannesRudolph

This comment has been minimized.

Copy link
Contributor

JohannesRudolph commented Jun 4, 2019

Ah... it's on hover. Nevermind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.