Adds metric for monitoring the service programming duration #32055
Force-pushed from e197036 to 1ac4e82.
Looks good. Thanks Ovidiu!
Force-pushed from 8311f7a to f5e7ff6.
@ovidiutirla Thank you for this PR! I think we should perhaps revisit the naming here, as service_programming is a little confusing, since we are measuring Cilium's k8s service event handling time.
/test
Force-pushed from 0a23e49 to ebb9caf.
/test
Looks like all tests passed. I'll squash the commits and rebase on main.
Force-pushed from fcf22cc to de3981d.
Nice! just one question
Force-pushed from de3981d to a829122.
The documentation says, "Duration in seconds to propagate the programming of a service [...] from the time the service was changed." The code as written ignores queue latency. Is that desired? I assume not. I'd suggest setting the "start" timestamp in Ideally we could determine the original enqueue time in the Resource queue itself (thoughts, @joamaki), but that's overkill for this PR.
Force-pushed from eaf2edb to fa03a55.
/test
Nit: the update to pkg/metrics/metrics.go could all go into the first commit, rather than having the description fixed in the second commit. But that's not blocking.
Looks good from my side, thanks!
Sorry, there was a bit of fat-fingering on the phone UI on my end for the auto-merge. The close/reopen was to re-trigger the image build.
@ovidiutirla Thank you for the updates!
Force-pushed from fa03a55 to 5fadfa3.
/test
The metric measures the execution time of the service event handler.
Signed-off-by: Ovidiu Tirla <otirla@google.com>
Force-pushed from 5fadfa3 to f7fc729.
/test
Adds a new metric (service_implementation_delay) for monitoring the service programming duration. The metric measures the execution time of the service event handler, reflecting the time it took to program the service: from the time the service or pod was changed to the time the change was propagated. The metric also takes into consideration the duration to program the in-cluster network (Network programming latency SLIs), which is useful for monitoring and understanding network bottlenecks in large clusters.
The metric should not impact the current performance of the handler, which allows us to have it enabled by default.
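As a rough illustration of why timing a handler is cheap, the sketch below wraps a handler with two clock reads and one bucket increment. It uses only the Go standard library; the histogram type, its bucket bounds, and timeHandler are assumptions made for this example and are not the metrics code from pkg/metrics.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// histogram is a minimal, hypothetical stand-in for a metrics-library
// histogram. counts[i] holds observations that fall at or below bounds[i]
// (and above the previous bound); the last slot collects everything larger.
type histogram struct {
	mu     sync.Mutex
	bounds []float64 // upper bounds in seconds, ascending
	counts []uint64
	sum    float64
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// observe records one duration, given in seconds.
func (h *histogram) observe(seconds float64) {
	i := sort.SearchFloat64s(h.bounds, seconds) // index of first bound >= seconds
	h.mu.Lock()
	h.counts[i]++
	h.sum += seconds
	h.mu.Unlock()
}

// timeHandler runs handle and records how long it took.
func timeHandler(h *histogram, handle func()) {
	start := time.Now()
	handle()
	h.observe(time.Since(start).Seconds())
}

func main() {
	h := newHistogram(0.001, 0.01, 0.1, 1)
	timeHandler(h, func() { time.Sleep(2 * time.Millisecond) })
	fmt.Println("bucket counts:", h.counts, "sum (s):", h.sum)
}
```

The per-event overhead in this sketch is independent of how long the handler itself runs, which is the property that makes an always-on metric plausible.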