-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: Require cluster-wide connectivity before running tests #18153
ci: Require cluster-wide connectivity before running tests #18153
Conversation
/test |
94df4d3
to
3f12fc5
Compare
This comment has been minimized.
This comment has been minimized.
3f12fc5
to
169aae3
Compare
/test |
169aae3
to
ff97bfe
Compare
/test https://jenkins.cilium.io/job/Cilium-PR-K8s-1.22-kernel-4.19/75/ Edit: Seems like the readiness check is not sufficient, as it does not consider node count. Will try to modify |
ff97bfe
to
9fd19f0
Compare
/test Edit: Uncovered a bug in https://jenkins.cilium.io/job/Cilium-PR-K8s-1.23-kernel-4.9/77/ Reproducible on Vagrant via |
fef6ad5
to
e1a47c3
Compare
/test Runtime failed with a provisioning failure: https://jenkins.cilium.io/job/Cilium-PR-Runtime-net-next/810/ Edit: All Jenkins pipeline affected by these changes are green ✔️ Will re-run after pushing a few minor changes and mark ready for review. |
e1a47c3
to
a2d2e3d
Compare
/test |
This commit extends the cilium-health API to expose the status of secondary addresses of the cilium-health endpoint. To ensure backwards-compatibility of the cilium-health API and JSON output, the old field is deprecated and a new one is introduced. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit modifies the cilium-health server to populate the newly added `health-endpoint` field in node health status model. It now also includes the address and probe status of secondary endpoint paths. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit changes the way cilium health status is displayed in `cilium status` and `cilium-health status` with regards to secondary addresses (i.e. IPv6 host or health endpoint IP addresses in dual-stack setups). Firstly, in the brief output format (i.e. the output of `cilium status`), we do not consider a host or endpoint as "reachable" if a secondary path is unreachable. This is similar to how we already treat partial reachability for a health probe if ICMP succeeded but HTTP did not. Secondly, in `cilium-health status --verbose`, we now also print the status of secondary endpoint addresses (which we usually probe, but before this commit never displayed anywhere). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit changes the output of `cilium-health status` to always display the status of unhealthy secondary paths. This makes it easier for users to spot the reason why cilium health reports connectivity issues, as this information is usually hidden behind the `--verbose` flag. This approach is similar to how we already show unhealth nodes in the brief output. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit changes the semantics of the `unreachable_nodes` and `unreachable_health_endpoints` metrics to include nodes or endpoints where the secondary path (i.e. IPv6 in dual-stack) has no connectitivty. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This modifies the bugtool to collect the verbose `cilium-health status` output which includes information about secondary paths (i.e. IPv6 paths in dual-stack setups). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This changes the K8sHealthTest suite to use the newly introduced `health-endpoint` field insteaed of the deprecated `endpoint` field. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
To work around a bug where host firewall and endpoint routes drops very small packages on the return path. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
In our CI setup, we have a helper function called `ciliumHealthPreFlightCheck` which ensures that all nodes participating in CI have connectivity to each other. However, that check did not consider connectivity to remote endpoints or alternative paths. This commit extends that function to ensure that not only connectivity between the node has been established, but also that connectivity to the health endpoint both via ICMP and HTTP is healthy. This commit also introduces checks for secondary addresses (i.e. IPv6 addresses in dual-stack setups). If no secondary addresses are present, the JSONPath filter used here returns an empty string which will be ignored. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
6dc645e
to
8ccdf11
Compare
Addressed two nits, rebased and removed the commit which un-qurantines the test (and thus this PR needs no sign-off from team loadbalancer anymore). Main diff: diff --git a/pkg/health/client/client.go b/pkg/health/client/client.go
index ab2b2f088466..fe93c9834d55 100644
--- a/pkg/health/client/client.go
+++ b/pkg/health/client/client.go
@@ -274,7 +274,7 @@ func GetAllHostAddresses(node *models.NodeStatus) []*models.PathStatus {
}
// GetEndpointPrimaryAddress returns the PrimaryAddress for the health endpoint
-// within node. If node.Host is nil, returns nil.
+// within node. If node.HealthEndpoint is nil, returns nil.
func GetEndpointPrimaryAddress(node *models.NodeStatus) *models.PathStatus {
if node.HealthEndpoint == nil {
return nil
@@ -283,7 +283,8 @@ func GetEndpointPrimaryAddress(node *models.NodeStatus) *models.PathStatus {
return node.HealthEndpoint.PrimaryAddress
}
-// GetEndpointSecondaryAddresses returns the secondary host addresses (if any)
+// GetEndpointSecondaryAddresses returns the secondary health endpoint addresses
+// (if any)
func GetEndpointSecondaryAddresses(node *models.NodeStatus) []*models.PathStatus {
if node.HealthEndpoint == nil {
return nil |
/test Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment |
|
/test-gke |
/ci-multicluster |
This PR introduces more in-depth checks on
cilium-health
in CI.The first set of commits ensures that
cilium-health
properly reports connectivity failures in dual-stack setups. Before this PR,cilium-health
would probe the connectivity of secondary endpoint addresses, but not report any failures in the health API and thus also not display the failures incilium status
orcilium-health status
. To fix this, the API is extended to expose the status of the secondary endpoint addresses probes, and the formatting and output logic is improved to better display and report these kind of failures.The second set of commits modifies CI to make use of this new functionality, i.e.
ciliumHealthPreFlightCheck
is extended such that we wait for all paths to be healthy before we start a test suite. This likely fixes a a few flakes in e.g. direct routing mode, where we started the test suite even though the direct routes for IPv6 endpoint addresses have not yet been installed.Ref: #17895 (should fix that flake, but the test will not yet be unquarantined)