Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cilium: Don't report health error when disabled #17146

Merged
merged 1 commit into from Oct 20, 2021

Conversation

joestringer
Copy link
Member

@joestringer joestringer commented Aug 12, 2021

When the user configures --enable-health-checking=false, 'cilium status'
would previously print:

  $ cilium status
  ...
  Cluster health:               Warning   cilium-health daemon unreachable

This would occur despite the user explicitly disabling the feature. To
better reflect whether the feature is enabled or not, check in the
status response whether there is a health endpoint. If there is one,
then the feature is enabled and we can query & print its status. If
there isn't one, then the feature is disabled and we won't get a
successful response anyway, so just report back that the feature is
disabled instead:

  $ cilium status
  ...
  Cluster health:               Probe disabled

@joestringer joestringer requested a review from a team as a code owner August 12, 2021 01:26
@joestringer joestringer added the release-note/misc This PR makes changes that have no direct user impact. label Aug 12, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 12, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 12, 2021
cilium/cmd/status.go Outdated Show resolved Hide resolved
cilium/cmd/status.go Show resolved Hide resolved
@joestringer
Copy link
Member Author

Thanks for the review, I addressed the feedback with this diff:

$ git dc
diff --git a/cilium/cmd/status.go b/cilium/cmd/status.go
index b58662c6446c..94cb9008d469 100644
--- a/cilium/cmd/status.go
+++ b/cilium/cmd/status.go
@@ -14,6 +14,7 @@ import (
        pkg "github.com/cilium/cilium/pkg/client"
        "github.com/cilium/cilium/pkg/command"
        healthPkg "github.com/cilium/cilium/pkg/health/client"
+       "github.com/cilium/cilium/pkg/health/defaults"
 
        "github.com/spf13/cobra"
 )
@@ -97,8 +98,9 @@ func statusDaemon() {
 
                healthEnabled := false
                for _, c := range sr.Controllers {
-                       if c.Name == "cilium-health-ep" {
+                       if c.Name == defaults.HealthEPName {
                                healthEnabled = true
+                               break
                        }
                }
                if healthEnabled {
diff --git a/daemon/cmd/health.go b/daemon/cmd/health.go
index c1dc93694e09..f62defb496b2 100644
--- a/daemon/cmd/health.go
+++ b/daemon/cmd/health.go
@@ -13,6 +13,7 @@ import (
        "github.com/cilium/cilium/pkg/cleanup"
        "github.com/cilium/cilium/pkg/controller"
        "github.com/cilium/cilium/pkg/endpoint"
+       "github.com/cilium/cilium/pkg/health/defaults"
        "github.com/cilium/cilium/pkg/k8s"
        "github.com/cilium/cilium/pkg/logging/logfields"
        "github.com/cilium/cilium/pkg/option"
@@ -50,7 +51,7 @@ func (d *Daemon) initHealth() {
        // Wait for the API, then launch the controller
        var client *health.Client
 
-       controller.NewManager().UpdateController("cilium-health-ep",
+       controller.NewManager().UpdateController(defaults.HealthEPName,
                controller.ControllerParams{
                        DoFunc: func(ctx context.Context) error {
                                var err error
diff --git a/pkg/health/defaults/defaults.go b/pkg/health/defaults/defaults.go
index d92a0ef21cd2..9e60da3842a3 100644
--- a/pkg/health/defaults/defaults.go
+++ b/pkg/health/defaults/defaults.go
@@ -25,4 +25,8 @@ const (
 
        // ServiceL7PathPort is used for probing service redirect path connectivity with L7
        ServiceL7PathPort = 4243
+
+       // EndpointName is the name used for the health endpoint, which is also
+       // used by the CLI client to detect when connectivity health is enabled
+       HealthEPName = "cilium-health-ep"
 )

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me, besides the nit that Tobias already mentioned.

@tklauser
Copy link
Member

tklauser commented Aug 19, 2021

test-me-please

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sPolicyTest Clusterwide policies Test clusterwide connectivity with policies

Failure Output

FAIL: Egress connectivity should be denied for pod "app2" in "2021081909562k8spolicytestclusterwidepoliciestestclusterwidecon" namespace

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create a new GitHub issue to track it.

@joestringer
Copy link
Member Author

Hitting #16938.

@joestringer
Copy link
Member Author

joestringer commented Sep 28, 2021

/test

Job 'Cilium-PR-K8s-1.20-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 192.168.36.11:80 from testclient-7llk4

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.20-kernel-4.19 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sServicesTest Checks service across nodes with L7 policy Tests NodePort with L7 Policy

Failure Output

FAIL: Request from testclient-cb5tc pod to service tftp://[fd04::12]:31489/hello failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create a new GitHub issue to track it.

@joestringer
Copy link
Member Author

test-1.20-4.19 run hit #16122.
test-1.21-4.9 run hit #13011.

@aanm aanm added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Oct 1, 2021
When the user configures --enable-health-checking=false, 'cilium status'
would previously print:

  $ cilium status
  ...
  Cluster health:               Warning   cilium-health daemon unreachable

This would occur despite the user explicitly disabling the feature. To
better reflect whether the feature is enabled or not, check in the
status response whether there is a health endpoint. If there is one,
then the feature is enabled and we can query & print its status. If
there isn't one, then the feature is disabled and we won't get a
successful response anyway, so just report back that the feature is
disabled instead:

  $ cilium status
  ...
  Cluster health:               Probe disabled

Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Oct 19, 2021
@joestringer
Copy link
Member Author

joestringer commented Oct 19, 2021

/test

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDatapathConfig Etcd Check connectivity

Failure Output

FAIL: Pods are not ready in time: timed out waiting for pods with filter  to be ready: 4m0s timeout expired

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

EDIT: CI was broken due to VirtualBox 6.1 upgrade. Need to re-run.

@joestringer
Copy link
Member Author

joestringer commented Oct 20, 2021

/mlh new-flake Cilium-PR-K8s-GKE

👍 created #17657

@joestringer
Copy link
Member Author

joestringer commented Oct 20, 2021

/test

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDatapathConfig Host firewall With native routing

Failure Output

FAIL: Failed to reach 192.168.36.12:80 from testclient-6cqgz

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create a new GitHub issue to track it.

@joestringer
Copy link
Member Author

Failure is known flake #16664.

@joestringer joestringer merged commit 5cfd761 into cilium:master Oct 20, 2021
@joestringer joestringer deleted the submit/disable-health-report branch October 20, 2021 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants