Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Hubble Relay Kubernetes Readiness/Liveness check #28765

Merged
merged 3 commits into from Nov 15, 2023

Conversation

glrf
Copy link
Contributor

@glrf glrf commented Oct 24, 2023

Please ensure your pull request adheres to the following guidelines:

  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.

This commit improves the exiting Hubble Relay Readiness and liveness checks by using the gRPC Health Checking Protocol.

To do that we add the grpc_health_probe binary to the hubble relay container image and use exec probes. We can't use the built-in gRPC health checking as it doesn't yet support TLS and because we still support k8s versions before v1.23.

We also update the the existing gRPC health server to check connectivity to the peer service and hubble backends. The health server is only serving, if the peer service and at least one hubble observe service is available.

This PR also contains a relatively unrelated fix for graceful shutdown of hubble relay. It simplified the tests, but I can also split it into two PRs if requested

closes #23542

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 24, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Oct 24, 2023
@glrf glrf force-pushed the hubble-relay/health-probe branch 3 times, most recently from 73c3fe1 to 5ebbaae Compare October 25, 2023 07:48
@glrf glrf marked this pull request as ready for review October 25, 2023 11:39
@glrf glrf requested review from a team as code owners October 25, 2023 11:39
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, glad to see this implemented! Two minor comments around the Kubernetes manifest

@gandro gandro added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay area/helm Impacts helm charts and user deployment experience and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 25, 2023
@glrf glrf force-pushed the hubble-relay/health-probe branch 2 times, most recently from 7f67338 to bc8f699 Compare October 26, 2023 12:05
@glrf glrf requested a review from gandro October 26, 2023 12:39
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Glad to see that the local server simplifies the probe. I've left some additional feedback

@glrf glrf requested a review from gandro October 27, 2023 08:52
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! I hope the gRPC probe will support accessing from localhost in the future, so we can switch it to listening on localhost only. But for now, this seems like the best approach

@dylandreimerink
Copy link
Member

/test

pkg/hubble/relay/pool/manager.go Show resolved Hide resolved
@glrf glrf force-pushed the hubble-relay/health-probe branch 2 times, most recently from a923464 to 6315553 Compare October 31, 2023 12:15
Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@tklauser
Copy link
Member

/test

@aanm
Copy link
Member

aanm commented Nov 3, 2023

/test

@rolinh
Copy link
Member

rolinh commented Nov 3, 2023

/test

1.1.1.1 was down, failing connectivity tests. Hence re-running.

@squeed
Copy link
Contributor

squeed commented Nov 8, 2023

This PR will be ready-to-merge once all conversations are resolved. @glrf, can you take care of that? Then I'll click the big green button.

@julianwiedmann julianwiedmann added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Nov 14, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 14, 2023
@julianwiedmann
Copy link
Member

@glrf could you please rebase? The PR picked up some conflicts :/.

@julianwiedmann julianwiedmann removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 14, 2023
Signed-off-by: Fabian Fischer <fabian.fischer@isovalent.com>
This commit updates the relay gRPC health server to check connectivity
to the peer service and hubble backends. The health server is only
serving, if the peer service and at least one hubble observe service is
available.

Signed-off-by: Fabian Fischer <fabian.fischer@isovalent.com>
Switch from simply checking if the gRPC tcp socket is open, to using the
gRPC Health Checking Protocol.

To do that we add the grpc_health_probe[1] binary to the hubble relay
container image and use `exec probes` We can't use the built-in gRPC
health checking as it doesn't yet support TLS and because we still
support k8s versions before v1.23.

[1] https://github.com/grpc-ecosystem/grpc-health-probe

Signed-off-by: Fabian Fischer <fabian.fischer@isovalent.com>
@julianwiedmann
Copy link
Member

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 15, 2023
@julianwiedmann julianwiedmann merged commit 4b3309c into cilium:main Nov 15, 2023
61 checks passed
@kaworu kaworu removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Nov 15, 2023
@joestringer
Copy link
Member

Hi @glrf, PR #29111 is hitting some CI issues with hubble-relay starting up (link), do you think that this could be somehow related to this change? I haven't done much investigation, but I just noticed that this change went in recently and the other PR is hitting issues with hubble-relay. I don't know if this PR is related or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm Impacts helm charts and user deployment experience kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Hubble Relay Kubernetes Readiness/Liveness check