Add Prometheus gRPC metrics for hubble and hubble-relay #20376

Merged: 9 commits, Jul 11, 2022

@chancez chancez commented Jul 2, 2022

I wasn't sure whether to split this into multiple PRs (one for hubble, one for relay, and maybe more to separate the interceptors from the metrics middleware). I kept it as one because splitting would mean adding the same dependency in each PR and resolving conflicts later, and the code is quite similar for the two subsystems. Let me know if you would prefer that I split it up.

This PR:

  • Refactors the hubble server a bit to follow common Go conventions and to support gRPC interceptors.
  • Adds support for gRPC interceptors to hubble relay.
  • Adds a Prometheus metrics endpoint to hubble relay.
  • Adds Prometheus gRPC middleware to both hubble and hubble relay.

These metrics can help determine if errors are occurring with the hubble APIs, and diagnose potential performance impacts of making queries against hubble.

Additionally, making gRPC interceptors configurable lets us add further middleware (e.g. logging, tracing, auth) to our gRPC servers in the future.
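
As a rough illustration of the interceptor idea described above, here is a minimal, standard-library-only Go sketch. `Handler`, `Interceptor`, `Chain`, `Metrics`, and `call` are hypothetical names that only mirror the general shape of gRPC unary interceptors; they are not the grpc-go or Prometheus middleware APIs, and this is not the code from this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Handler and Interceptor are hypothetical stand-ins that mirror the general
// shape of gRPC unary handlers and interceptors; they are not grpc-go types.
type Handler func(ctx context.Context, req interface{}) (interface{}, error)

type Interceptor func(ctx context.Context, req interface{}, method string, next Handler) (interface{}, error)

// Chain composes interceptors so that the first one listed runs outermost.
func Chain(method string, final Handler, interceptors ...Interceptor) Handler {
	h := final
	for i := len(interceptors) - 1; i >= 0; i-- {
		ic, next := interceptors[i], h
		h = func(ctx context.Context, req interface{}) (interface{}, error) {
			return ic(ctx, req, method, next)
		}
	}
	return h
}

// Metrics counts started and handled calls, loosely modeled on the idea of
// "started" and "handled" counters. Handled is keyed by "method/outcome".
type Metrics struct {
	mu      sync.Mutex
	Started map[string]int
	Handled map[string]int
}

func NewMetrics() *Metrics {
	return &Metrics{Started: map[string]int{}, Handled: map[string]int{}}
}

// Interceptor returns an interceptor that records every call it wraps.
func (m *Metrics) Interceptor() Interceptor {
	return func(ctx context.Context, req interface{}, method string, next Handler) (interface{}, error) {
		m.mu.Lock()
		m.Started[method]++
		m.mu.Unlock()

		resp, err := next(ctx, req)

		outcome := "OK"
		if err != nil {
			outcome = "Error"
		}
		m.mu.Lock()
		m.Handled[method+"/"+outcome]++
		m.mu.Unlock()
		return resp, err
	}
}

// call invokes a dummy handler for method through the metrics interceptor.
// If fail is true, the dummy handler returns an error.
func call(m *Metrics, method string, fail bool) error {
	h := Chain(method, func(ctx context.Context, req interface{}) (interface{}, error) {
		if fail {
			return nil, errors.New("backend unavailable")
		}
		return "ok", nil
	}, m.Interceptor())
	_, err := h(context.Background(), nil)
	return err
}

func main() {
	m := NewMetrics()
	call(m, "GetNodes", false)
	call(m, "GetFlows", true)
	fmt.Println(m.Started, m.Handled)
}
```

Because the metrics logic lives in an interceptor rather than in each handler, other middleware could be added to the same chain without touching the handlers.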

Add Prometheus gRPC metrics for hubble and hubble-relay

@chancez chancez requested a review from a team as a code owner July 2, 2022 00:08
@chancez chancez requested review from a team July 2, 2022 00:08
@chancez chancez requested a review from a team as a code owner July 2, 2022 00:08
@chancez chancez requested review from rolinh and sayboras July 2, 2022 00:08
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 2, 2022
@chancez chancez added kind/feature This introduces new functionality. area/metrics Impacts statistics / metrics gathering, eg via Prometheus. sig/hubble Impacts hubble server or relay labels Jul 2, 2022
@sayboras sayboras added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jul 4, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 4, 2022
@kaworu kaworu self-requested a review July 4, 2022 08:54

@gandro gandro left a comment

Awesome work. I'm not familiar with interceptors or the Prometheus gRPC metrics package, so I mostly focused on the existing Hubble logic. There are a few minor things that I think need to be addressed.

pkg/hubble/observer/local_observer.go (outdated, resolved)
daemon/cmd/hubble.go (resolved)
pkg/hubble/server/server.go (outdated, resolved)
pkg/hubble/server/server.go (outdated, resolved)
pkg/hubble/server/server.go (resolved)
pkg/hubble/relay/server/server.go (outdated, resolved)
install/kubernetes/cilium/values.yaml.tmpl (outdated, resolved)
@chancez chancez force-pushed the pr/chancez/grpc_middleware branch from 67848d9 to 02baed5 Compare July 5, 2022 17:08
chancez commented Jul 5, 2022

@gandro Thanks for the review, I've addressed your comments. LMK what you think.

@chancez chancez force-pushed the pr/chancez/grpc_middleware branch from 02baed5 to 6ef7e6b Compare July 5, 2022 17:16
@chancez chancez requested a review from gandro July 5, 2022 17:21
@chancez chancez force-pushed the pr/chancez/grpc_middleware branch 2 times, most recently from 4c2619e to e84ad3d Compare July 5, 2022 22:19

@gandro gandro left a comment

Awesome, this looks great. Thanks a lot!

pkg/hubble/relay/server/server.go (outdated, resolved)
sayboras commented Jul 6, 2022

/test

Job 'Cilium-PR-K8s-GKE' failed:

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Timed out after 31.644s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create one.


@sayboras sayboras left a comment

Very comprehensive PR 💯. I have only two minor comments, below.

Sample metrics list for other reviewers:

# HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="Aborted",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
# HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
# TYPE grpc_server_msg_received_total counter
grpc_server_msg_received_total{grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_msg_received_total{grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_msg_received_total{grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_msg_received_total{grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_msg_received_total{grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_msg_received_total{grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_msg_received_total{grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_msg_received_total{grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
# HELP grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
# TYPE grpc_server_msg_sent_total counter
grpc_server_msg_sent_total{grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_msg_sent_total{grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_msg_sent_total{grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_msg_sent_total{grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_msg_sent_total{grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_msg_sent_total{grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_msg_sent_total{grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_msg_sent_total{grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_started_total{grpc_method="GetAgentEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_started_total{grpc_method="GetDebugEvents",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_started_total{grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 0
grpc_server_started_total{grpc_method="GetNodes",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_started_total{grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_started_total{grpc_method="ServerStatus",grpc_service="observer.Observer",grpc_type="unary"} 0
grpc_server_started_total{grpc_method="Watch",grpc_service="grpc.health.v1.Health",grpc_type="server_stream"} 0
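
To show how a list like the one above might be used to spot failing APIs, here is a small Go sketch that sums the non-`OK` `grpc_server_handled_total` samples per service and method from text in this format. `NonOKByMethod` is a hypothetical helper, the regular expressions only handle simple lines like those shown here (no escaped quotes in label values), and the non-zero values in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sampleRe matches one grpc_server_handled_total sample line and captures
// the label set and the value.
var sampleRe = regexp.MustCompile(`^grpc_server_handled_total\{(.*)\}\s+(\S+)$`)

// labelRe captures name="value" pairs inside the braces.
var labelRe = regexp.MustCompile(`(\w+)="([^"]*)"`)

// NonOKByMethod sums grpc_server_handled_total samples whose grpc_code is
// not "OK", grouped by "service/method".
func NonOKByMethod(exposition string) map[string]float64 {
	out := map[string]float64{}
	for _, line := range strings.Split(exposition, "\n") {
		m := sampleRe.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue // comments, blank lines, other metrics
		}
		labels := map[string]string{}
		for _, kv := range labelRe.FindAllStringSubmatch(m[1], -1) {
			labels[kv[1]] = kv[2]
		}
		v, err := strconv.ParseFloat(m[2], 64)
		if err != nil || labels["grpc_code"] == "OK" {
			continue
		}
		out[labels["grpc_service"]+"/"+labels["grpc_method"]] += v
	}
	return out
}

func main() {
	// Made-up sample values; the real list above is all zeros.
	sample := `# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="OK",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 40
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 2
grpc_server_handled_total{grpc_code="Internal",grpc_method="GetFlows",grpc_service="observer.Observer",grpc_type="server_stream"} 1`
	for method, n := range NonOKByMethod(sample) {
		fmt.Printf("%s: %v non-OK RPCs\n", method, n)
	}
}
```

In practice this kind of aggregation would more likely be done with a query in Prometheus itself; the sketch only makes the label structure concrete.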

@chancez chancez force-pushed the pr/chancez/grpc_middleware branch from e84ad3d to 9fc526b Compare July 6, 2022 15:12

@rolinh rolinh left a comment

Excellent 🚀
I left a non-blocking suggestion.

pkg/hubble/relay/server/option.go (outdated, resolved)
pkg/hubble/relay/server/option.go (outdated, resolved)
pkg/hubble/server/serveroption/option.go (outdated, resolved)
pkg/hubble/server/serveroption/option.go (outdated, resolved)
In Go, Serve() blocks, and it's the caller's responsibility to decide if
it should run in a goroutine.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
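
The convention described in the commit message above can be sketched as follows. `Server`, `ErrStopped`, and `runInBackground` are hypothetical stand-ins written for this illustration, not the Hubble types: `Serve` simply blocks until `Stop` is called, and the caller decides whether to wrap it in a goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrStopped is a hypothetical sentinel returned by Serve after Stop.
var ErrStopped = errors.New("server stopped")

// Server is a hypothetical stand-in for a network server.
type Server struct {
	done chan struct{}
	once sync.Once
}

func NewServer() *Server { return &Server{done: make(chan struct{})} }

// Serve blocks until Stop is called. It starts no goroutine of its own.
func (s *Server) Serve() error {
	<-s.done
	return ErrStopped
}

// Stop unblocks Serve. It is safe to call more than once.
func (s *Server) Stop() { s.once.Do(func() { close(s.done) }) }

// runInBackground shows the caller choosing to run Serve concurrently
// and collecting its result on a buffered channel.
func runInBackground(s *Server) <-chan error {
	errc := make(chan error, 1)
	go func() { errc <- s.Serve() }()
	return errc
}

func main() {
	s := NewServer()
	errc := runInBackground(s)
	s.Stop()
	fmt.Println("Serve returned:", <-errc)
}
```

Keeping the goroutine on the caller's side means the caller also owns the error returned by `Serve`, instead of it being lost inside the server.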
Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Too much initialization was occurring in the Serve() function which
should generally only deal with listening on the socket/port and
starting the server.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
The cancel() looks potentially unused in the local observer code here,
so I added a comment indicating how it's used and why the cancellation is
there at all.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
This is based on the same approach we take within
observer.NewLocalServer, which allows other packages to extend the
default options of Hubble relay the same way packages can extend the
options of the Hubble observer server.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
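
One common way to let other packages extend a server's defaults is a package-level slice of functional options that is applied before the caller's own options. The sketch below assumes that pattern; `Options`, `Option`, `DefaultOptions`, `WithListenAddress`, `WithInterceptor`, and `NewOptions` are hypothetical names for this illustration and are not taken from the Hubble code.

```go
package main

import (
	"errors"
	"fmt"
)

// Options is hypothetical; the field names are illustrative only.
type Options struct {
	ListenAddress string
	Interceptors  []string // names stand in for real interceptor values
}

// Option mutates Options and may reject invalid input.
type Option func(*Options) error

// DefaultOptions is applied before caller-supplied options. Other packages
// can append to it (for example from an init function) to extend the defaults.
var DefaultOptions []Option

func WithListenAddress(addr string) Option {
	return func(o *Options) error {
		if addr == "" {
			return errors.New("listen address must not be empty")
		}
		o.ListenAddress = addr
		return nil
	}
}

func WithInterceptor(name string) Option {
	return func(o *Options) error {
		o.Interceptors = append(o.Interceptors, name)
		return nil
	}
}

// NewOptions applies DefaultOptions first, then opts, so callers can
// override or add to whatever the defaults set.
func NewOptions(opts ...Option) (*Options, error) {
	o := &Options{ListenAddress: "localhost:0"} // placeholder default
	for _, group := range [][]Option{DefaultOptions, opts} {
		for _, opt := range group {
			if err := opt(o); err != nil {
				return nil, err
			}
		}
	}
	return o, nil
}

func main() {
	// Another package extending the defaults.
	DefaultOptions = append(DefaultOptions, WithInterceptor("metrics"))

	o, err := NewOptions(WithListenAddress("localhost:1234"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(o.ListenAddress, o.Interceptors)
}
```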
@chancez chancez force-pushed the pr/chancez/grpc_middleware branch from 9fc526b to f99d8d7 Compare July 7, 2022 16:00
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.12.0 Jul 7, 2022
@joestringer

Even though it's late in the 1.12 cycle, more metrics typically help with production operations and the risk here seems low. I didn't review the code in depth since it looked like others had already given it sufficient attention.

chancez commented Jul 7, 2022

/test

Job 'Cilium-PR-K8s-GKE' failed:

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Timed out after 61.397s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create one.

chancez commented Jul 8, 2022

/mlh new-flake Cilium-PR-K8s-GKE

👍 created #20445

chancez commented Jul 8, 2022

/test

chancez commented Jul 8, 2022

/ci-l4lb

chancez commented Jul 8, 2022

/ci-aks

chancez commented Jul 11, 2022

@joestringer I believe the L4LB, AKS and GKE tests were disabled due to flakes, so at this point I think this is ready.

@joestringer

Travis didn't kick off on this PR for some reason, but I checked out the PR locally & ran make integration-tests. All tests passed there. Otherwise the failed and not-run jobs are the unstable ones that we have now marked as not required.

@joestringer joestringer merged commit 2c6e045 into cilium:master Jul 11, 2022
@chancez chancez deleted the pr/chancez/grpc_middleware branch July 11, 2022 22:42
@aanm aanm added backport-pending/1.12 backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed needs-backport/1.12 labels Jul 14, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport done to v1.12 in 1.12.0 Jul 14, 2022
Labels
area/metrics Impacts statistics / metrics gathering, eg via Prometheus. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. kind/feature This introduces new functionality. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay
Projects
No open projects
1.12.0
Backport done to v1.12
6 participants