Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

health: Add flag to set HTTP port #16926

Merged
merged 4 commits into from Sep 29, 2021
Merged

Conversation

errordeveloper
Copy link
Contributor

@errordeveloper errordeveloper commented Jul 19, 2021

Towards: #15956

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 19, 2021
@errordeveloper errordeveloper changed the title [WIP] Add cluster-health-port health: Add flag to set HTTP port Jul 19, 2021
@errordeveloper errordeveloper removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 19, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 19, 2021
@errordeveloper errordeveloper added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jul 19, 2021
@errordeveloper errordeveloper force-pushed the health-port branch 4 times, most recently from 92e0679 to 52fc184 Compare July 19, 2021 15:58
@errordeveloper errordeveloper marked this pull request as ready for review July 19, 2021 15:59
@errordeveloper errordeveloper requested a review from a team July 19, 2021 15:59
@errordeveloper errordeveloper requested a review from a team as a code owner July 19, 2021 15:59
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the cleanup. I remember trying this two years ago, but never made it in.

There is a typo in a variable name (thus the code doesn't compile), but other than that only a couple nits.

pkg/health/server/prober.go Show resolved Hide resolved
pkg/health/server/server.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
@gandro gandro added the area/health Relates to the cilium-health component label Jul 19, 2021
@errordeveloper errordeveloper force-pushed the health-port branch 2 times, most recently from 8cdb54b to ced9ce8 Compare July 19, 2021 17:27
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor nits, but this is also a nice cleanup. Thanks.

daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
@errordeveloper
Copy link
Contributor Author

test-me-please

@errordeveloper
Copy link
Contributor Author

Looks like this PR is hitting cilium/cilium-cli#355 as well:

✨ [gke_***_us-west2-a_cilium-cilium-1217868111] Creating namespace for connectivity check...
✨ [gke_***_us-west2-a_cilium-cilium-1217868111] Deploying echo-same-node service...
✨ [gke_***_us-west2-a_cilium-cilium-1217868111] Deploying same-node deployment...
✨ [gke_***_us-west2-a_cilium-cilium-1217868111] Deploying client deployment...
✨ [gke_***_us-west2-a_cilium-cilium-1217868111] Deploying client2 deployment...
✨ [gke_***_us-west2-a_cilium-cilium-1217868111] Deploying echo-other-node service...
✨ [gke_***_us-west2-a_cilium-cilium-1217868111] Deploying other-node deployment...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for deployments [echo-other-node] to become ready...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for CiliumEndpoint for pod cilium-test/client-6488dcf5d4-2jx79 to appear...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for CiliumEndpoint for pod cilium-test/client2-65f446d77c-hlzzg to appear...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-697d5d69b7-b8z56 to appear...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-7967996674-t6rfl to appear...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for NodePort 10.168.0.96:32553 (cilium-test/echo-other-node) to become ready...
⌛ [gke_***_us-west2-a_cilium-cilium-1217868111] Waiting for NodePort 10.168.0.96:32666 (cilium-test/echo-same-node) to become ready...
Connectivity test failed: timeout reached waiting for NodePort 10.168.0.96:32666 (cilium-test/echo-same-node)

https://github.com/cilium/cilium/runs/3557961582?check_suite_focus=true#step:14:29

@errordeveloper
Copy link
Contributor Author

And now this also: #11560

@errordeveloper
Copy link
Contributor Author

And a new one also: #17363

@errordeveloper
Copy link
Contributor Author

test-gke

Re-running to see if #17363 re-occurs.

@errordeveloper

This comment has been minimized.

@errordeveloper

This comment has been minimized.

Default port constant already exists, but a literal
integrer was used in the flag definition logic here.

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
These constants had been defined, yet no longer in use anywhere.

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
The original design of this logic was aimed at handling multiple
health probe paths, however only one remains presently.

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
This change makes it possible for user to set a custom port
for Cilium heath-checks.

Default port remains unchanged.

Towards: cilium#15956

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
@errordeveloper
Copy link
Contributor Author

/test

@aanm
Copy link
Member

aanm commented Sep 29, 2021

all required jobs have passed, merging.

@aanm aanm merged commit e624868 into cilium:master Sep 29, 2021
@errordeveloper errordeveloper deleted the health-port branch September 30, 2021 15:13
gandro added a commit to gandro/cilium that referenced this pull request Nov 30, 2021
To determine cluster health, Cilium exposes a HTTP server both on each
node, as well as on the artificial health endpoint running on each node.
The port used for this HTTP server is the same and can be configured via
`cluster-health-port` (introduced in cilium#16926) and defaults to 4240.

This commit fixes a bug where the port specified by
`cluster-health-port` was not passed to the Cilium health endpoint
responder. Which meant that `cilium-health-responder` was always
listening on the default port instead of the one configured by the user,
while the probe tried to connect via `cluster-health-port`. This
resulted in the cluster being reported us unhealthy whenever
`cluster-health-port` was set to a non-default value (which is the case
our OpenShift OLM for v1.11):

```
Nodes:
  gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost):
    Host connectivity to 10.0.128.2:
      ICMP to stack:   OK, RTT=634.746µs
      HTTP to agent:   OK, RTT=228.066µs
    Endpoint connectivity to 10.128.11.73:
      ICMP to stack:   OK, RTT=666.83µs
      HTTP to agent:   Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused
```

Fixes: e624868 ("health: Add a flag to set HTTP port")

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
qmonnet pushed a commit that referenced this pull request Nov 30, 2021
To determine cluster health, Cilium exposes a HTTP server both on each
node, as well as on the artificial health endpoint running on each node.
The port used for this HTTP server is the same and can be configured via
`cluster-health-port` (introduced in #16926) and defaults to 4240.

This commit fixes a bug where the port specified by
`cluster-health-port` was not passed to the Cilium health endpoint
responder. Which meant that `cilium-health-responder` was always
listening on the default port instead of the one configured by the user,
while the probe tried to connect via `cluster-health-port`. This
resulted in the cluster being reported us unhealthy whenever
`cluster-health-port` was set to a non-default value (which is the case
our OpenShift OLM for v1.11):

```
Nodes:
  gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost):
    Host connectivity to 10.0.128.2:
      ICMP to stack:   OK, RTT=634.746µs
      HTTP to agent:   OK, RTT=228.066µs
    Endpoint connectivity to 10.128.11.73:
      ICMP to stack:   OK, RTT=666.83µs
      HTTP to agent:   Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused
```

Fixes: e624868 ("health: Add a flag to set HTTP port")

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
qmonnet pushed a commit to qmonnet/cilium that referenced this pull request Dec 2, 2021
[ upstream commit c640c71 ]

To determine cluster health, Cilium exposes a HTTP server both on each
node, as well as on the artificial health endpoint running on each node.
The port used for this HTTP server is the same and can be configured via
`cluster-health-port` (introduced in cilium#16926) and defaults to 4240.

This commit fixes a bug where the port specified by
`cluster-health-port` was not passed to the Cilium health endpoint
responder. Which meant that `cilium-health-responder` was always
listening on the default port instead of the one configured by the user,
while the probe tried to connect via `cluster-health-port`. This
resulted in the cluster being reported us unhealthy whenever
`cluster-health-port` was set to a non-default value (which is the case
our OpenShift OLM for v1.11):

```
Nodes:
  gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost):
    Host connectivity to 10.0.128.2:
      ICMP to stack:   OK, RTT=634.746µs
      HTTP to agent:   OK, RTT=228.066µs
    Endpoint connectivity to 10.128.11.73:
      ICMP to stack:   OK, RTT=666.83µs
      HTTP to agent:   Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused
```

Fixes: e624868 ("health: Add a flag to set HTTP port")

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
joestringer pushed a commit that referenced this pull request Dec 2, 2021
[ upstream commit c640c71 ]

To determine cluster health, Cilium exposes a HTTP server both on each
node, as well as on the artificial health endpoint running on each node.
The port used for this HTTP server is the same and can be configured via
`cluster-health-port` (introduced in #16926) and defaults to 4240.

This commit fixes a bug where the port specified by
`cluster-health-port` was not passed to the Cilium health endpoint
responder. Which meant that `cilium-health-responder` was always
listening on the default port instead of the one configured by the user,
while the probe tried to connect via `cluster-health-port`. This
resulted in the cluster being reported us unhealthy whenever
`cluster-health-port` was set to a non-default value (which is the case
our OpenShift OLM for v1.11):

```
Nodes:
  gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost):
    Host connectivity to 10.0.128.2:
      ICMP to stack:   OK, RTT=634.746µs
      HTTP to agent:   OK, RTT=228.066µs
    Endpoint connectivity to 10.128.11.73:
      ICMP to stack:   OK, RTT=666.83µs
      HTTP to agent:   Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused
```

Fixes: e624868 ("health: Add a flag to set HTTP port")

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/health Relates to the cilium-health component release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants