Add a gRPC interceptor for function metrics #5006

Merged
negz merged 5 commits into crossplane:master from the-metric-system on Nov 21, 2023

@negz (Member) commented Nov 15, 2023

Description of your changes

Fixes #5001

This adds some basic RED metrics for functions:

  • RunFunctionRequests sent (count)
  • RunFunctionResponses received (count)
  • RunFunction duration (histogram)

All metrics are labelled by function name, package, and gRPC target. Responses are also labelled with gRPC status code and "severity". The gRPC status code indicates whether the function successfully returned a response at all. The severity indicates whether that response reported success: we take the severity of a response to be the severity of its most severe result, so a response containing a fatal result has a severity of fatal. Together, these labels let folks monitor how often a function returns a response successfully (gRPC code OK) but with a fatal result.
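The severity rule can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Severity` type ordered Normal < Warning < Fatal and a hypothetical `Result` struct; neither is taken from this PR, and treating a response with no results as Normal is also an assumption.

```go
package main

import "fmt"

// Severity is a hypothetical ordering of result severities, used only for
// illustration. Higher values are more severe.
type Severity int

const (
	Normal Severity = iota
	Warning
	Fatal
)

// String returns the label value that would be attached to a metric.
func (s Severity) String() string {
	switch s {
	case Fatal:
		return "Fatal"
	case Warning:
		return "Warning"
	default:
		return "Normal"
	}
}

// Result is a hypothetical stand-in for one result in a function response.
type Result struct {
	Severity Severity
	Message  string
}

// ResponseSeverity returns the severity of the most severe result.
// An empty response is assumed to be Normal.
func ResponseSeverity(results []Result) Severity {
	worst := Normal
	for _, r := range results {
		if r.Severity > worst {
			worst = r.Severity
		}
	}
	return worst
}

func main() {
	rsp := []Result{
		{Severity: Normal, Message: "rendered resources"},
		{Severity: Fatal, Message: "cannot render"},
	}
	fmt.Println(ResponseSeverity(rsp)) // prints "Fatal"
}
```

Under this rule a call can finish with gRPC code OK and still be counted with a Fatal severity label, which is the case the two labels are meant to separate.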

Example metrics from an E2E test run:

# HELP composition_run_function_request_total Total number of RunFunctionRequests sent.
# TYPE composition_run_function_request_total counter
composition_run_function_request_total{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_target="dns:///function-auto-ready.crossplane-system:9443"} 9
composition_run_function_request_total{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_target="dns:///function-dummy.crossplane-system:9443"} 14
# HELP composition_run_function_response_total Total number of RunFunctionResponses received.
# TYPE composition_run_function_response_total counter
composition_run_function_response_total{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal"} 9
composition_run_function_response_total{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal"} 14
# HELP composition_run_function_seconds Histogram of RunFunctionResponse latency (seconds).
# TYPE composition_run_function_seconds histogram
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="0.005"} 6
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="0.01"} 7
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="0.025"} 7
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="0.05"} 7
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="0.1"} 8
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="0.25"} 9
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="0.5"} 9
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="1"} 9
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="2.5"} 9
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="5"} 9
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="10"} 9
composition_run_function_seconds_bucket{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal",le="+Inf"} 9
composition_run_function_seconds_sum{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal"} 0.23078379300000001
composition_run_function_seconds_count{function_name="function-auto-ready",function_package="xpkg.upbound.io/crossplane-contrib/function-auto-ready:v0.1.2",grpc_code="OK",grpc_target="dns:///function-auto-ready.crossplane-system:9443",result_severity="Normal"} 9
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="0.005"} 9
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="0.01"} 9
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="0.025"} 10
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="0.05"} 10
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="0.1"} 13
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="0.25"} 13
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="0.5"} 13
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="1"} 13
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="2.5"} 14
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="5"} 14
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="10"} 14
composition_run_function_seconds_bucket{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal",le="+Inf"} 14
composition_run_function_seconds_sum{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal"} 1.5026575030000002
composition_run_function_seconds_count{function_name="function-dummy",function_package="xpkg.upbound.io/crossplane-contrib/function-dummy:v0.2.1",grpc_code="OK",grpc_target="dns:///function-dummy.crossplane-system:9443",result_severity="Normal"} 14
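
As a worked reading of the histogram above, the mean latency per function is the `_sum` series divided by the `_count` series. The sketch below does that arithmetic on the sample values; `MeanSeconds` is a hypothetical helper, not part of this PR.

```go
package main

import "fmt"

// MeanSeconds returns the average observation of a histogram given its
// _sum and _count values. It returns 0 when nothing has been observed.
func MeanSeconds(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	// Values copied from the E2E sample output above.
	fmt.Printf("function-auto-ready: %.3fs\n", MeanSeconds(0.230783793, 9)) // 0.026s
	fmt.Printf("function-dummy: %.3fs\n", MeanSeconds(1.502657503, 14))     // 0.107s
}
```

The buckets give a complementary view: in this sample, 6 of 9 function-auto-ready calls and 9 of 14 function-dummy calls completed within 5 ms.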


@negz negz marked this pull request as ready for review November 16, 2023 05:41
@negz negz requested a review from a team as a code owner November 16, 2023 05:41
@negz negz requested a review from bobh66 November 16, 2023 05:41
if err != nil {
	return errors.Wrap(err, "cannot load client TLS certificates")
}

m := xfn.NewMetrics()
metrics.Registry.MustRegister(m)
@sttts (Contributor) commented Nov 17, 2023

Metrics registration should go in an init func. Otherwise, you will never be able to create multiple instances of this (e.g. for testing) without a panic.

@negz (Member, Author) replied:

I don't think it's that cut-and-dried, and would generally prefer to set up things like metrics, loggers, etc. here in "main" [1] and plumb them down rather than relying on init functions hidden away in individual packages.

Footnotes

  1. This isn't literally func main but it's close enough in practice. I can't see us unit testing this plumbing.
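
The trade-off in this thread can be sketched with a toy registry. Everything below is hypothetical and uses only the standard library: `Registry` merely imitates the "registering the same collector twice panics" behaviour being discussed and is not the Prometheus client, and `Setup` stands in for the wiring code under review.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyRegistered is returned when a name is registered twice.
var ErrAlreadyRegistered = errors.New("collector already registered")

// Registry is a toy stand-in for a metrics registry. It only tracks names.
type Registry struct {
	names map[string]bool
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry { return &Registry{names: map[string]bool{}} }

// Register records a name, or fails if it was already recorded.
func (r *Registry) Register(name string) error {
	if r.names[name] {
		return ErrAlreadyRegistered
	}
	r.names[name] = true
	return nil
}

// MustRegister panics if registration fails.
func (r *Registry) MustRegister(name string) {
	if err := r.Register(name); err != nil {
		panic(err)
	}
}

// Setup is hypothetical wiring code that registers metrics with whatever
// registry the caller plumbs down.
func Setup(r *Registry) {
	r.MustRegister("composition_run_function_request_total")
}

// Panics reports whether fn panics.
func Panics(fn func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

func main() {
	shared := NewRegistry()
	fmt.Println(Panics(func() { Setup(shared) }))        // false: first registration
	fmt.Println(Panics(func() { Setup(shared) }))        // true: second registration on the same registry
	fmt.Println(Panics(func() { Setup(NewRegistry()) })) // false: a fresh registry per instance
}
```

Running setup twice against one shared registry is the panic the reviewer describes. Passing the registry in from the caller lets each instance, such as one created in a test, use its own.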

This adds some basic metrics for functions:

* RPCs started (count)
* RPCs handled (count)
* RPC completion time (histogram)

While we probably want metrics like this, I don't think this premade
interceptor will work for us. It's close, but we probably want function
name/target as a label and this doesn't seem to give us that.

Signed-off-by: Nic Cope <nicc@rk0n.org>
This allows us to add useful, function-specific metric labels including
what function is being called and the severity of the result.

Signed-off-by: Nic Cope <nicc@rk0n.org>
This is useful to be able to look at metrics after an E2E run.

Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
@negz negz merged commit 8413569 into crossplane:master Nov 21, 2023
17 of 18 checks passed
@negz negz deleted the the-metric-system branch November 21, 2023 00:11
Successfully merging this pull request may close these issues.

Have Crossplane emit prometheus metrics about function runs
3 participants