-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds workflow_job_seconds metric #5
Conversation
* Intention to support instantiation of server for integration testing. Side effect is improving readability. * Extracts root handler into func. * Error logging handled by main.
* Intention is to test against package interface using server_test package. Couldn't get to work with main_test. * Side effect is improved readability due to less global state. * ghActionsExporter private to package as created in Server setup
* Intention was to improve coverage of http server implementation. * Probably makes more sense to test the handlers directly for most cases.
* Backfills tests before behaviour changes. * CheckRun test only checks for 200 due to async behaviour.
Assumes job queue time can be calculated from difference between job start time and start time of first step. Labels: * runner_group to identify queues for specific runners * state queued as durations can also be published for completed jobs * no workflow label as it is not possible to identify workflow without additional API calls
* Simplifies testing by moving observation of workflow job metrics under interface. * Leaves other metrics untouched for now, even though inconsistent.
* Use custom serve mux to allow for creation of multiple servers in integration tests.
looks great, i will run locally, let me know when you are ready and then set the PR out from draft! |
Awesome, thanks! The build failed - the lint stage passes locally for me. The error suggests something about updating the tool so I've bumped the golangci-lint version to latest:
|
This is possible as it seems step started_at times are rounded to the closest second whereas job started_at time has miliseconds included.
I think this is ready for review. I've tested the exporter locally and ran a couple of workflows. I spotted a few issues which I've fixed up and added tests for. |
a8eab39
to
5528cde
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge and if we found any issues we can fix that in a follow up
I need to refactor the release pipeline and decouple some things which I will do soon
thanks so much for this change it is super cool!
Fixes #4
This PR brings a number of improvements, mainly to support testing:
workflow_job_duration_seconds
workflow_job
eventsstate
which has two possible values:queued
- the amount of time the job spent queued waiting for a runnerin_progress
- the amount of time the job spent in progress whilst completing all stepsWorkflowJobObserver
interface created to allow for testing of observations without interacting with Prometheus lib.Server
struct to allow for start + stop of server with different configurations.The PR brings more changes than actually needed to add the new metric (i.e extracting behaviour to a
Server
struct in an additional package. I can try to unpick these changes into a smaller but I think the other changes are worth considering.The individual commits should largely be complete and documented.
Still TODO as part of this PR:
Remove check_run event handling- this doesn't make sense - we'd need to add the workflow event handler for that one and I'd rather do that as a separate PR.Recordworkflow_execution_time_seconds
using theworkflow_job
completed
eventStill TODO hopefully out of scope for this PR: