
[v1.13] health: only launch /hello after host datapath is ready #33253

Merged: 3 commits into cilium:v1.13 on Jul 2, 2024

Conversation

@ti-mo (Contributor) commented Jun 19, 2024

Backporting some host endpoint-related fixes to hopefully address HOST_EP_ID 65535 being rendered into bpf_lxc's ep_config.h.

Upstream PRs: #27392, #32521, #32439

maintainer-s-little-helper bot added the backport/1.13 (This PR represents a backport for Cilium 1.13.x of a PR that was merged to main.) and kind/backports (This PR provides functionality previously merged into master.) labels on Jun 19, 2024
@ti-mo (Contributor, Author) commented Jun 19, 2024

/test-backport-1.13

ti-mo marked this pull request as ready for review on June 20, 2024 08:36
ti-mo requested a review from a team as a code owner on June 20, 2024 08:36
@ti-mo (Contributor, Author) commented Jun 20, 2024

/test-1.16-4.19

Error authenticating to ghcr

@marseel (Contributor) commented Jun 21, 2024

/test-1.16-4.19

@ti-mo (Contributor, Author) commented Jun 24, 2024

/test-backport-1.13

Job 'Cilium-PR-K8s-1.21-kernel-4.19' hit: #30802 (90.90% similarity)

Job 'Cilium-PR-K8s-1.20-kernel-4.19' hit: #30802 (91.90% similarity)

@lmb (Contributor) commented Jun 25, 2024

/test-backport-1.13

Job 'Cilium-PR-K8s-1.19-kernel-4.19' failed:

Test Name

K8sAgentPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: connectivity-check pods are not ready after timeout

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-4.19/579/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@lmb (Contributor) commented Jun 25, 2024

Blocked on #33376 to fix a couple of workflows.

lmb added 3 commits July 1, 2024 09:58
[ upstream commit 8c97a21 ]

The code to restore endpoint state checks for the presence of
bpf_host.o to determine whether the host endpoint ID needs to
be restored. Use ep.IsHost instead to decouple the restore process
from what the loader is doing under the covers.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
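
As an illustration of the kind of check described in this commit message, here is a minimal Go sketch; the Endpoint type, restoreHostEndpointID helper, and hostEndpointID constant are hypothetical stand-ins, not Cilium's actual restore code:

```go
package main

import "fmt"

// Endpoint is a stand-in for Cilium's endpoint type; only the fields
// needed for this sketch are modelled.
type Endpoint struct {
	ID     uint16
	isHost bool
}

// IsHost reports whether this endpoint represents the host itself.
func (e *Endpoint) IsHost() bool { return e.isHost }

// hostEndpointID is an illustrative placeholder, not Cilium's real value.
const hostEndpointID uint16 = 1

// restoreHostEndpointID restores the reserved host endpoint ID based on
// the endpoint's own identity (IsHost) instead of checking whether
// bpf_host.o exists in the state directory, decoupling the restore
// process from the loader's on-disk artifacts.
func restoreHostEndpointID(e *Endpoint) {
	if e.IsHost() {
		e.ID = hostEndpointID
	}
}

func main() {
	host := &Endpoint{isHost: true}
	restoreHostEndpointID(host)
	fmt.Println("restored host endpoint ID:", host.ID)
}
```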
[ upstream commit 916a2ce ]

Delay starting the /hello endpoint until we've loaded the host
datapath at least once. This means that the presence of /health
can be used to infer not only that the cilium unix socket API
is up but also that the datapath can do basic packet processing.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
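
A minimal Go sketch of the gating idea described above, assuming a simple channel that signals the first successful host datapath load; the names (datapathReady, markHostDatapathLoaded, serveHello) and the port are illustrative, not Cilium's actual health server:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// datapathReady is closed exactly once, after the host datapath has been
// loaded successfully for the first time.
var (
	datapathReady = make(chan struct{})
	readyOnce     sync.Once
)

// markHostDatapathLoaded is called by the (simulated) loader after its
// first successful host datapath load.
func markHostDatapathLoaded() {
	readyOnce.Do(func() { close(datapathReady) })
}

// serveHello only starts serving /hello once the host datapath is ready,
// so the endpoint's mere presence implies basic packet processing works.
func serveHello(addr string) error {
	<-datapathReady // block until the first host datapath load succeeded

	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	return http.ListenAndServe(addr, mux)
}

func main() {
	go func() {
		// Simulate the loader finishing its first host datapath load.
		time.Sleep(2 * time.Second)
		markHostDatapathLoaded()
	}()
	if err := serveHello("127.0.0.1:9879"); err != nil {
		fmt.Println("serve error:", err)
	}
}
```

With a gate like this, a readiness probe hitting /hello cannot succeed before the datapath has been loaded at least once.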
[ upstream commit acf3141 ]
[ backporter's notes: only needed a single line from this patch, the rest
  was not compatible ]

BPF regeneration writes state into a new temporary directory. Once it
has succeeded we need to swap the old and new directory. This is currently
achieved by "backing up" the current state by renaming the directory.
This code has a number of corner cases around cleaning up old directories,
which are necessary because the swap isn't truly atomic.

Instead, use the RENAME_EXCHANGE flag to atomically exchange the two
existing directories. Also use hard links to retain existing state
so that killing the agent during a synchronization doesn't lead
to corruption.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Timo Beckers <timo@isovalent.com>
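
A minimal Go sketch of an atomic directory exchange using RENAME_EXCHANGE (via golang.org/x/sys/unix) together with hard-linking existing files; the directory names and helpers are illustrative, not the loader code touched by this patch:

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// linkState hard-links every regular file from cur into next before the
// swap, so that killing the process mid-synchronization still leaves a
// complete set of state files behind.
func linkState(cur, next string) error {
	entries, err := os.ReadDir(cur)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		src := filepath.Join(cur, e.Name())
		dst := filepath.Join(next, e.Name())
		if err := os.Link(src, dst); err != nil && !os.IsExist(err) {
			return err
		}
	}
	return nil
}

// exchangeDirs atomically swaps two existing directories using
// RENAME_EXCHANGE, so neither path is ever missing during the swap.
func exchangeDirs(a, b string) error {
	if err := unix.Renameat2(unix.AT_FDCWD, a, unix.AT_FDCWD, b, unix.RENAME_EXCHANGE); err != nil {
		return fmt.Errorf("exchanging %s and %s: %w", a, b, err)
	}
	return nil
}

func main() {
	// Illustrative paths; Cilium's real endpoint state directories differ.
	cur, next := "ep_state", "ep_state_next"
	if err := linkState(cur, next); err != nil {
		fmt.Println("link:", err)
		return
	}
	if err := exchangeDirs(cur, next); err != nil {
		fmt.Println("exchange:", err)
	}
}
```

Because RENAME_EXCHANGE swaps both names in a single syscall, there is no intermediate state in which either directory is absent.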
@lmb (Contributor) commented Jul 1, 2024

/test-backport-1.13

Job 'Cilium-PR-K8s-1.20-kernel-4.19' failed:

Test Name

K8sDatapathServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) Tests NodePort inside cluster (kube-proxy) with IPSec and externalTrafficPolicy=Local

Failure Output

FAIL: Request from k8s1 to service http://[fd04::11]:30863 failed

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.19/580/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.20-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.17-kernel-4.19' failed:

Test Name

K8sAgentHubbleTest Hubble Observe Test L3/L4 Flow

Failure Output

FAIL: hubble observe query timed out on "Exitcode: 0 \nStdout:\n \t {\"flow\":{\"time\":\"2024-07-01T10:46:23.135860186Z\",\"verdict\":\"FORWARDED\",\"ethernet\":{\"source\":\"f2:e8:03:71:e6:20\",\"destination\":\"9e:ae:67:dd:75:83\"},\"IP\":{\"source\":\"10.0.1.161\",\"destination\":\"10.0.1.192\",\"ipVersion\":\"IPv4\"},\"l4\":{\"TCP\":{\"source_port\":43796,\"destination_port\":80,\"flags\":{\"SYN\":true}}},\"source\":{\"ID\":1092,\"identity\":15918,\"namespace\":\"202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"labels\":[\"k8s:appSecond=true\",\"k8s:id=app2\",\"k8s:io.cilium.k8s.policy.cluster=default\",\"k8s:io.cilium.k8s.policy.serviceaccount=app2-account\",\"k8s:io.kubernetes.pod.namespace=202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"k8s:zgroup=testapp\"],\"pod_name\":\"app2-6dfff7bfbd-79cxb\",\"workloads\":[{\"name\":\"app2\",\"kind\":\"Deployment\"}]},\"destination\":{\"ID\":2744,\"identity\":6597,\"namespace\":\"202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"labels\":[\"k8s:id=app1\",\"k8s:io.cilium.k8s.policy.cluster=default\",\"k8s:io.cilium.k8s.policy.serviceaccount=app1-account\",\"k8s:io.kubernetes.pod.namespace=202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"k8s:zgroup=testapp\"],\"pod_name\":\"app1-755788dd65-795pd\",\"workloads\":[{\"name\":\"app1\",\"kind\":\"Deployment\"}]},\"Type\":\"L3_L4\",\"node_name\":\"k8s1\",\"event_type\":{\"type\":4},\"traffic_direction\":\"INGRESS\",\"trace_observation_point\":\"TO_ENDPOINT\",\"is_reply\":false,\"interface\":{\"index\":94},\"Summary\":\"TCP Flags: SYN\"},\"node_name\":\"k8s1\",\"time\":\"2024-07-01T10:46:23.135860186Z\"}\n\t {\"flow\":{\"time\":\"2024-07-01T10:46:23.135863261Z\",\"verdict\":\"FORWARDED\",\"ethernet\":{\"source\":\"f2:e8:03:71:e6:20\",\"destination\":\"9e:ae:67:dd:75:83\"},\"IP\":{\"source\":\"10.0.1.161\",\"destination\":\"10.0.1.192\",\"ipVersion\":\"IPv4\"},\"l4\":{\"TCP\":{\"source_port\":43796,\"destination_port\":80,\"flags\":{\"ACK\":true}}},\"source\":{\"ID\":1092,\"identity\":15918,\"namespace\":\"202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"labels\":[\"k8s:appSecond=true\",\"k8s:id=app2\",\"k8s:io.cilium.k8s.policy.cluster=default\",\"k8s:io.cilium.k8s.policy.serviceaccount=app2-account\",\"k8s:io.kubernetes.pod.namespace=202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"k8s:zgroup=testapp\"],\"pod_name\":\"app2-6dfff7bfbd-79cxb\",\"workloads\":[{\"name\":\"app2\",\"kind\":\"Deployment\"}]},\"destination\":{\"ID\":2744,\"identity\":6597,\"namespace\":\"202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"labels\":[\"k8s:id=app1\",\"k8s:io.cilium.k8s.policy.cluster=default\",\"k8s:io.cilium.k8s.policy.serviceaccount=app1-account\",\"k8s:io.kubernetes.pod.namespace=202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"k8s:zgroup=testapp\"],\"pod_name\":\"app1-755788dd65-795pd\",\"workloads\":[{\"name\":\"app1\",\"kind\":\"Deployment\"}]},\"Type\":\"L3_L4\",\"node_name\":\"k8s1\",\"event_type\":{\"type\":4},\"traffic_direction\":\"INGRESS\",\"trace_observation_point\":\"TO_ENDPOINT\",\"is_reply\":false,\"interface\":{\"index\":94},\"Summary\":\"TCP Flags: ACK\"},\"node_name\":\"k8s1\",\"time\":\"2024-07-01T10:46:23.135863261Z\"}\n\t 
{\"flow\":{\"time\":\"2024-07-01T10:46:23.136463470Z\",\"verdict\":\"FORWARDED\",\"ethernet\":{\"source\":\"f2:e8:03:71:e6:20\",\"destination\":\"9e:ae:67:dd:75:83\"},\"IP\":{\"source\":\"10.0.1.161\",\"destination\":\"10.0.1.192\",\"ipVersion\":\"IPv4\"},\"l4\":{\"TCP\":{\"source_port\":43796,\"destination_port\":80,\"flags\":{\"PSH\":true,\"ACK\":true}}},\"source\":{\"ID\":1092,\"identity\":15918,\"namespace\":\"202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"labels\":[\"k8s:appSecond=true\",\"k8s:id=app2\",\"k8s:io.cilium.k8s.policy.cluster=default\",\"k8s:io.cilium.k8s.policy.serviceaccount=app2-account\",\"k8s:io.kubernetes.pod.namespace=202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"k8s:zgroup=testapp\"],\"pod_name\":\"app2-6dfff7bfbd-79cxb\",\"workloads\":[{\"name\":\"app2\",\"kind\":\"Deployment\"}]},\"destination\":{\"ID\":2744,\"identity\":6597,\"namespace\":\"202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"labels\":[\"k8s:id=app1\",\"k8s:io.cilium.k8s.policy.cluster=default\",\"k8s:io.cilium.k8s.policy.serviceaccount=app1-account\",\"k8s:io.kubernetes.pod.namespace=202407011046k8sagenthubbletesthubbleobservetestl3l4flow\",\"k8s:zgroup=testapp\"],\"pod_name\":\"app1-755788dd65-795pd\",\"workloads\":[{\"name\":\"app1\",\"kind\":\"Deployment\"}]},\"Type\":\"L3_L4\",\"node_name\":\"k8s1\nStderr:\n \t \n"

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.17-kernel-4.19/591/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.17-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@lmb (Contributor) commented Jul 1, 2024

/test-1.17-4.19

@lmb (Contributor) commented Jul 1, 2024

/test-1.16-4.19

lmb enabled auto-merge (rebase) on July 1, 2024 10:52
@lmb (Contributor) commented Jul 1, 2024

/test-1.20-4.19

@lmb (Contributor) commented Jul 1, 2024

/test-1.17-4.19

3 similar comments
@lmb (Contributor) commented Jul 1, 2024

/test-1.17-4.19

@lmb (Contributor) commented Jul 1, 2024

/test-1.17-4.19

@lmb (Contributor) commented Jul 1, 2024

/test-1.17-4.19

maintainer-s-little-helper bot added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label on Jul 1, 2024
julianwiedmann requested a review from a team on July 2, 2024 05:55
lmb merged commit 53de6bb into cilium:v1.13 on Jul 2, 2024
62 checks passed