
CI: Conformance Ginkgo: K8sDatapathLRPTests Checks local redirect policy LRP connectivity: error="reporter 10-datapath-regenerate has been stopped" #31147

Closed
joestringer opened this issue Mar 4, 2024 · 4 comments · Fixed by #31262
Assignees
Labels
area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@joestringer

CI failure

PR failure: https://github.com/cilium/cilium/actions/runs/8147153559/job/22269541305
cilium-sysdumps (20).zip

K8sDatapathLRPTests Checks local redirect policy 
  LRP connectivity
  /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:515
22:30:47 STEP: Running BeforeAll block for EntireTestsuite K8sDatapathLRPTests
22:30:48 STEP: Ensuring the namespace kube-system exists
22:30:48 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs")
22:30:48 STEP: WaitforPods(namespace="kube-system", filter="-l k8s-app=cilium-test-logs") => <nil>
22:30:49 STEP: Running BeforeAll block for EntireTestsuite K8sDatapathLRPTests Checks local redirect policy
22:30:49 STEP: Installing Cilium
22:30:56 STEP: Waiting for Cilium to become ready
22:31:25 STEP: Validating if Kubernetes DNS is deployed
22:31:25 STEP: Checking if deployment is ready
22:31:26 STEP: Checking if kube-dns service is plumbed correctly
22:31:26 STEP: Checking if pods have identity
22:31:26 STEP: Checking if DNS can resolve
22:31:30 STEP: Kubernetes DNS is not ready: %!s(<nil>)
22:31:30 STEP: Restarting Kubernetes DNS (-l k8s-app=kube-dns)
22:31:31 STEP: Waiting for Kubernetes DNS to become operational
22:31:31 STEP: Checking if deployment is ready
22:31:32 STEP: Kubernetes DNS is not ready yet: only 0 of 2 replicas are available
22:31:32 STEP: Checking if deployment is ready
22:31:33 STEP: Kubernetes DNS is not ready yet: only 0 of 2 replicas are available
22:31:33 STEP: Checking if deployment is ready
22:31:34 STEP: Kubernetes DNS is not ready yet: only 0 of 2 replicas are available
22:31:34 STEP: Checking if deployment is ready
22:31:35 STEP: Kubernetes DNS is not ready yet: only 0 of 2 replicas are available
22:31:35 STEP: Checking if deployment is ready
22:31:36 STEP: Kubernetes DNS is not ready yet: only 0 of 2 replicas are available
22:31:36 STEP: Checking if deployment is ready
22:31:37 STEP: Kubernetes DNS is not ready yet: only 0 of 2 replicas are available
22:31:37 STEP: Checking if deployment is ready
22:31:38 STEP: Kubernetes DNS is not ready yet: only 0 of 2 replicas are available
22:31:38 STEP: Checking if deployment is ready
22:31:39 STEP: Kubernetes DNS is not ready yet: only 0 of 2 replicas are available
22:31:39 STEP: Checking if deployment is ready
22:31:40 STEP: Kubernetes DNS is not ready yet: only 1 of 2 replicas are available
22:31:40 STEP: Checking if deployment is ready
22:31:41 STEP: Checking if kube-dns service is plumbed correctly
22:31:41 STEP: Checking if pods have identity
22:31:41 STEP: Checking if DNS can resolve
22:31:42 STEP: Validating Cilium Installation
22:31:42 STEP: Performing Cilium controllers preflight check
22:31:42 STEP: Performing Cilium health check
22:31:42 STEP: Checking whether host EP regenerated
22:31:42 STEP: Performing Cilium status preflight check
22:31:44 STEP: Performing Cilium service preflight check
22:31:44 STEP: Performing K8s service preflight check
22:31:45 STEP: Waiting for cilium-operator to be ready
22:31:45 STEP: WaitforPods(namespace="kube-system", filter="-l name=cilium-operator")
22:31:45 STEP: WaitforPods(namespace="kube-system", filter="-l name=cilium-operator") => <nil>
22:31:46 STEP: WaitforPods(namespace="default", filter="-l role=frontend")
22:31:50 STEP: WaitforPods(namespace="default", filter="-l role=frontend") => <nil>
22:31:50 STEP: WaitforPods(namespace="default", filter="-l role=backend")
22:31:51 STEP: WaitforPods(namespace="default", filter="-l role=backend") => <nil>
22:31:51 STEP: WaitforPods(namespace="default", filter="-l role=lrpAddrBackend")
22:31:51 STEP: WaitforPods(namespace="default", filter="-l role=lrpAddrBackend") => <nil>
22:31:53 STEP: Checking traffic goes to local backend
Pod                           Ingress   Egress
grafana-6f4755f98c-vlfvn                
prometheus-67fdcf4796-z6d56             
k8s1-backend                            
k8s1-client                             
k8s2-backend                            
k8s2-client                             
coredns-76f75df574-8pvx8                
coredns-76f75df574-x8ls2                
Cilium agent 'cilium-64t8x': Status: Ok  Health: Ok Nodes "" ContainerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 29 Failed 0
Cilium agent 'cilium-kq54f': Status: Ok  Health: Ok Nodes "" ContainerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 51 Failed 0

</Checks>


• Failure in Spec Teardown (AfterEach) [136.897 seconds]
K8sDatapathLRPTests
/home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:461
  Checks local redirect policy
  /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:461
    LRP connectivity [AfterEach]
    /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:515

    Found 6 k8s-app=cilium logs matching list of errors that must be investigated:
    2024-03-04T22:31:37.194947916Z level=warning msg="Regeneration of endpoint failed" bpfCompilation=14.631117962s bpfLoadProg=515.994526ms bpfWaitForELF=14.631543195s bpfWriteELF=14.744104ms ciliumEndpointName=kube-system/coredns-76f75df574-85k62 containerID=24fea10bf3 containerInterface=eth0 datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3173 error="Error while configuring proxy redirects: context cancelled before waiting for proxy updates: context canceled" identity=686 ipv4=10.0.0.237 ipv6="fd02::d5" k8sPodName=kube-system/coredns-76f75df574-85k62 mapSync="121.004µs" policyCalculation=1.724529ms prepareBuild=10.524056ms proxyConfiguration="10.248µs" proxyPolicyCalculation="774.945µs" proxyWaitForAck="4.448µs" reason="syncing state to host" subsys=endpoint total=15.210727697s waitingForCTClean=552ns waitingForLock="45.164µs"
    2024-03-04T22:31:37.195980470Z level=error msg="endpoint regeneration failed" ciliumEndpointName=kube-system/coredns-76f75df574-85k62 containerID=24fea10bf3 containerInterface=eth0 datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3173 error="Error while configuring proxy redirects: context cancelled before waiting for proxy updates: context canceled" identity=686 ipv4=10.0.0.237 ipv6="fd02::d5" k8sPodName=kube-system/coredns-76f75df574-85k62 subsys=endpoint
    2024-03-04T22:31:37.196506781Z level=warning msg="could not set degraded status on subreporter \"9-datapath-regenerate\"" error="reporter 9-datapath-regenerate has been stopped" subsys=hive
    2024-03-04T22:31:37.381298791Z level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=704.95865ms bpfWaitForELF=14.621677359s bpfWriteELF=11.414126ms ciliumEndpointName=kube-system/coredns-76f75df574-qklfc containerID=907fa13179 containerInterface=eth0 datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1037 error="Error while configuring proxy redirects: context cancelled before waiting for proxy updates: context canceled" identity=686 ipv4=10.0.0.58 ipv6="fd02::f7" k8sPodName=kube-system/coredns-76f75df574-qklfc mapSync="91.58µs" policyCalculation=2.392209ms prepareBuild=21.725394ms proxyConfiguration="9.698µs" proxyPolicyCalculation="496.304µs" proxyWaitForAck="5.731µs" reason="syncing state to host" subsys=endpoint total=15.39751845s waitingForCTClean=461ns waitingForLock="1.642µs"
    2024-03-04T22:31:37.381371136Z level=error msg="endpoint regeneration failed" ciliumEndpointName=kube-system/coredns-76f75df574-qklfc containerID=907fa13179 containerInterface=eth0 datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1037 error="Error while configuring proxy redirects: context cancelled before waiting for proxy updates: context canceled" identity=686 ipv4=10.0.0.58 ipv6="fd02::f7" k8sPodName=kube-system/coredns-76f75df574-qklfc subsys=endpoint
    2024-03-04T22:31:37.381373971Z level=warning msg="could not set degraded status on subreporter \"10-datapath-regenerate\"" error="reporter 10-datapath-regenerate has been stopped" subsys=hive

    /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:413

Looks like there's a race condition where endpoint regeneration was canceled at some point during the test, and this generated the error/warning logs above.

@joestringer joestringer added area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! labels Mar 4, 2024
@joestringer joestringer added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Mar 4, 2024
@bimmlerd

bimmlerd commented Mar 6, 2024

cc @tommyp1ckles this feels a bit silly - we're failing tests because we're failing to set status OK on the EP health reporter 😅
Could be a quick win, not even sure this deserves a warning log as afaik humans can't do much with it?
hit in https://github.com/cilium/cilium/actions/runs/8173644856/job/22346595464#step:16:480

@tommyp1ckles tommyp1ckles self-assigned this Mar 8, 2024
tommyp1ckles added a commit that referenced this issue Mar 8, 2024
Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
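
The fix described in that commit boils down to treating a status update on an already-stopped reporter as a non-event. Below is a minimal sketch of the idea with hypothetical names (not Cilium's actual hive/cell health API): endpoints close their reporters on deletion, so an in-flight regeneration hitting a stopped reporter is expected and only worth a debug log.

```go
package main

import (
	"errors"
	"fmt"

	"github.com/sirupsen/logrus"
)

var errReporterStopped = errors.New("reporter has been stopped")

// subReporter is a hypothetical stand-in for a per-endpoint health reporter.
type subReporter struct {
	name    string
	stopped bool
}

// SetOK reports a healthy status, or an error if the reporter was already stopped.
func (r *subReporter) SetOK() error {
	if r.stopped {
		return fmt.Errorf("reporter %s: %w", r.name, errReporterStopped)
	}
	return nil
}

func reportOK(log logrus.FieldLogger, r *subReporter) {
	if err := r.SetOK(); err != nil {
		// The endpoint was torn down while a regeneration was in flight:
		// harmless, so log at debug rather than warning.
		log.WithError(err).Debug("could not set OK status on subreporter")
	}
}

func main() {
	reportOK(logrus.StandardLogger(), &subReporter{name: "10-datapath-regenerate", stopped: true})
}
```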
tommyp1ckles added a commit that referenced this issue Mar 13, 2024
Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@jrajahalme

The `endpoint regeneration failed` error logging should have been gated on the context-canceled condition since #19159, but maybe `!errors.Is(regenError, context.Canceled)` does not work on this error:

    2024-03-04T22:31:37.195980470Z level=error msg="endpoint regeneration failed" ciliumEndpointName=kube-system/coredns-76f75df574-85k62 containerID=24fea10bf3 containerInterface=eth0 datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3173 error="Error while configuring proxy redirects: context cancelled before waiting for proxy updates: context canceled" identity=686 ipv4=10.0.0.237 ipv6="fd02::d5" k8sPodName=kube-system/coredns-76f75df574-85k62 subsys=endpoint

Note how the context canceled is wrapped in `Error while configuring proxy redirects: context cancelled before waiting for proxy updates: context canceled`?
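
For illustration only (simplified error strings, not Cilium's code): whether the `errors.Is` gate fires depends entirely on whether the cause was wrapped with `%w` or flattened into text with `%s`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

func main() {
	cause := context.Canceled

	// Wrapping with %s only embeds the error text; the error chain is lost.
	flattened := fmt.Errorf("error while configuring proxy redirects: %s", cause)
	fmt.Println(errors.Is(flattened, context.Canceled)) // false

	// Wrapping with %w keeps the chain intact for errors.Is / errors.As.
	wrapped := fmt.Errorf("error while configuring proxy redirects: %w", cause)
	fmt.Println(errors.Is(wrapped, context.Canceled)) // true
}
```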

aditighag added a commit to aditighag/cilium that referenced this issue Mar 14, 2024
Code involving regeneration of endpoints checks for
context cancellations before logging errors for failed
scenarios. These errors are flagged in CI runs during
tear down phases leading to flakes.

Wrap these errors so that the call site can check for
valid non-error conditions.

Relates: cilium#31147

Signed-off-by: Aditi Ghag <aditi@cilium.io>
@aditighag

Hit on - https://github.com/cilium/cilium/actions/runs/8287906302/job/22681350615

  Found 2 k8s-app=cilium logs matching list of errors that must be investigated:
    2024-03-14T21:46:38.496655696Z level=warning msg="could not set OK status on subreporter \"11-datapath-regenerate\"" error="reporter 11-datapath-regenerate has been stopped" subsys=hive
    2024-03-14T21:46:38.521633026Z level=warning msg="could not set OK status on subreporter \"10-datapath-regenerate\"" error="reporter 10-datapath-regenerate has been stopped" subsys=hive

/cc @tommyp1ckles

jrajahalme added a commit that referenced this issue Mar 18, 2024
Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@gandro gandro mentioned this issue Mar 18, 2024
jrajahalme added a commit that referenced this issue Mar 18, 2024
Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Mar 18, 2024
Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme

Introducing proper error wrapping here: #31458

jrajahalme added a commit that referenced this issue Mar 18, 2024
Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Mar 18, 2024
Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
tommyp1ckles added a commit that referenced this issue Mar 18, 2024
Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
tommyp1ckles added a commit that referenced this issue Mar 18, 2024
Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
tommyp1ckles added a commit that referenced this issue Mar 18, 2024
When waiting for proxy endpoint completions during the bpf-regenerate step, if the endpoint regenerate context is cancelled we don't return a wrapped error.
Thus, when the error for the regenerate event is collected, context cancelled errors that should be dismissed for error logging and health updates are propagated, causing CI failures.

This just fixes the error wrapping for e.waitForProxyCompletions(..)'s ctx cancel case to fix this.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
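
A rough sketch of the consuming side under that assumption (hypothetical helper names, not the endpoint package's real functions): once the cancellation is wrapped with `%w`, the regeneration error handler can recognize a benign teardown and skip the error log.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
)

// waitForProxyCompletions is a stand-in for the real call; on cancellation it
// wraps ctx.Err() with %w so callers can detect it with errors.Is.
func waitForProxyCompletions(ctx context.Context) error {
	<-ctx.Done()
	return fmt.Errorf("context cancelled before waiting for proxy updates: %w", ctx.Err())
}

func regenerate(ctx context.Context) {
	if err := waitForProxyCompletions(ctx); err != nil {
		if errors.Is(err, context.Canceled) {
			// Benign: the endpoint is being torn down, no error log needed.
			return
		}
		log.Printf("endpoint regeneration failed: %v", err)
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	regenerate(ctx) // logs nothing: the cancellation is recognized as benign
}
```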
tommyp1ckles added a commit that referenced this issue Mar 18, 2024
Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
tommyp1ckles added a commit that referenced this issue Mar 18, 2024
When waiting for proxy endpoint completions during the bpf-regenerate step, if the endpoint regenerate context is cancelled we don't return a wrapped error.
Thus, when the error for the regenerate event is collected, context cancelled errors that should be dismissed for error logging and health updates are propagated, causing CI failures.

This just fixes the error wrapping for e.waitForProxyCompletions(..)'s ctx cancel case to fix this.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
tommyp1ckles added a commit that referenced this issue Mar 18, 2024
Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
tommyp1ckles added a commit that referenced this issue Mar 18, 2024
When waiting for proxy endpoint completions during the bpf-regenerate step, if the endpoint regenerate context is cancelled we don't return a wrapped error.
Thus, when the error for the regenerate event is collected, context cancelled errors that should be dismissed for error logging and health updates are propagated, causing CI failures.

This just fixes the error wrapping for e.waitForProxyCompletions(..)'s ctx cancel case to fix this.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
github-merge-queue bot pushed a commit that referenced this issue Mar 18, 2024
Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
github-merge-queue bot pushed a commit that referenced this issue Mar 18, 2024
Code involving regeneration of endpoints checks for
context cancellations before logging errors for failed
scenarios. These errors are flagged in CI runs during
tear down phases leading to flakes.

Wrap these errors so that the call site can check for
valid non-error conditions.

Relates: #31147

Signed-off-by: Aditi Ghag <aditi@cilium.io>
jrajahalme added a commit that referenced this issue Mar 19, 2024
Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
gandro pushed a commit that referenced this issue Mar 19, 2024
[ upstream commit 44e9005 ]

Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro pushed a commit that referenced this issue Mar 20, 2024
[ upstream commit 44e9005 ]

Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro pushed a commit that referenced this issue Mar 20, 2024
[ upstream commit 44e9005 ]

Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
jrajahalme pushed a commit that referenced this issue Mar 21, 2024
[ upstream commit 44e9005 ]

Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 3, 2024
Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
rzdebskiy pushed a commit to rzdebskiy/cilium that referenced this issue Apr 3, 2024
Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: cilium#31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
rzdebskiy pushed a commit to rzdebskiy/cilium that referenced this issue Apr 3, 2024
Code involving regeneration of endpoints checks for
context cancellations before logging errors for failed
scenarios. These errors are flagged in CI runs during
tear down phases leading to flakes.

Wrap these errors so that the call site can check for
valid non-error conditions.

Relates: cilium#31147

Signed-off-by: Aditi Ghag <aditi@cilium.io>
github-merge-queue bot pushed a commit that referenced this issue Apr 3, 2024
Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 4, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 5, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 5, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 5, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 16, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 16, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 16, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
aditighag pushed a commit that referenced this issue Apr 17, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
aditighag pushed a commit that referenced this issue Apr 17, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jrajahalme added a commit that referenced this issue Apr 18, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
ti-mo pushed a commit that referenced this issue Apr 23, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
ubergesundheit added a commit to giantswarm/cilium-upstream that referenced this issue Apr 25, 2024
* wireguard: Encrypt L7 proxy pkts to remote pods

[ upstream commit 26f83491643b8c9b2921544ad340d8de07e9138c ]

Marco reported that the following L7 proxy traffic is leaked (bypasses
the WireGuard encryption):

    1. WG: tunnel, L7 egress policy: forward traffic is leaked
    2. WG: tunnel, DNS: all DNS traffic is leaked
    3. WG: native routing, DNS: all DNS traffic is leaked

This was reported before the introduction of the --wireguard-encapsulate
[1].

The tunneling leak cases are obvious. The L7 proxy traffic got
encapsulated by Cilium's tunneling device, which made it bypass
the redirection to Cilium's WireGuard device. However, [1] fixed
this behavior. For Cilium v1.15 (upcoming) nothing needs to be
configured. Meanwhile, for v1.14.4 users need to set
--wireguard-encapsulate=true.

The native routing case is more tricky. The L7 proxy traffic got a src IP
of a host instead of a client pod. So, the redirection was bypassed.
To fix this, we extended the redirection check to identify L7 proxy
traffic.

[1]: https://github.com/cilium/cilium/pull/28917

Reported-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>

* wireguard: Improve L7 proxy traffic detection

[ upstream commit 96e01adeaa51c85a5671d34c4715c07de97e26e1 ]

Use marks set by the proxy instead of assuming that each pkt from
HOST_ID w/o MARK_MAGIC_HOST belongs to the proxy.

In addition, in the tunneling mode the mark might get reset before
entering wg_maybe_redirect_to_encrypt(), as the proxy packets are
instead routed to from_host@cilium_host. The latter calls
inherit_identity_from_host() which resets the mark. In this case, rely
on the TC index.

Suggested-by: Gray Lian <gray.liang@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>

* images: bump cni plugins to v1.4.1

The result of running

```
images/scripts/update-cni-version.sh 1.4.1
```

Signed-off-by: André Martins <andre@cilium.io>

* images: update cilium-{runtime,builder}

Signed-off-by: André Martins <andre@cilium.io>

* bgpv1: Disable PodCIDR Reconciler for unsupported IPAM modes

[ upstream commit 276499405242e03b01da54f2a1f35891db56783a ]

[ backporter's note: Discarded document changes. We'll backport it
  together with other recent document changes. ]

PodCIDR shouldn't take any effect for the unsupported IPAM modes. Modify
ExportPodCIDRReconciler's constructor to not provide ConfigReconciler
for unsupported IPAMs.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>

* Prepare for release v1.15.2

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

* install: Update image digests for v1.15.2

Generated from https://github.com/cilium/cilium/actions/runs/8266651120.

`quay.io/cilium/cilium:v1.15.2@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746`
`quay.io/cilium/cilium:stable@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746`

`quay.io/cilium/clustermesh-apiserver:v1.15.2@sha256:478c77371f34d6fe5251427ff90c3912567c69b2bdc87d72377e42a42054f1c2`
`quay.io/cilium/clustermesh-apiserver:stable@sha256:478c77371f34d6fe5251427ff90c3912567c69b2bdc87d72377e42a42054f1c2`

`quay.io/cilium/docker-plugin:v1.15.2@sha256:ba4df0d63b48ba6181b6f3df3b747e15f5dfba06ff9ee83f34dd0143c1a9a98c`
`quay.io/cilium/docker-plugin:stable@sha256:ba4df0d63b48ba6181b6f3df3b747e15f5dfba06ff9ee83f34dd0143c1a9a98c`

`quay.io/cilium/hubble-relay:v1.15.2@sha256:48480053930e884adaeb4141259ff1893a22eb59707906c6d38de2fe01916cb0`
`quay.io/cilium/hubble-relay:stable@sha256:48480053930e884adaeb4141259ff1893a22eb59707906c6d38de2fe01916cb0`

`quay.io/cilium/operator-alibabacloud:v1.15.2@sha256:e2dafa4c04ab05392a28561ab003c2894ec1fcc3214a4dfe2efd6b7d58a66650`
`quay.io/cilium/operator-alibabacloud:stable@sha256:e2dafa4c04ab05392a28561ab003c2894ec1fcc3214a4dfe2efd6b7d58a66650`

`quay.io/cilium/operator-aws:v1.15.2@sha256:3f459999b753bfd8626f8effdf66720a996b2c15c70f4e418011d00de33552eb`
`quay.io/cilium/operator-aws:stable@sha256:3f459999b753bfd8626f8effdf66720a996b2c15c70f4e418011d00de33552eb`

`quay.io/cilium/operator-azure:v1.15.2@sha256:568293cebc27c01a39a9341b1b2578ebf445228df437f8b318adbbb2c4db842a`
`quay.io/cilium/operator-azure:stable@sha256:568293cebc27c01a39a9341b1b2578ebf445228df437f8b318adbbb2c4db842a`

`quay.io/cilium/operator-generic:v1.15.2@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088`
`quay.io/cilium/operator-generic:stable@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088`

`quay.io/cilium/operator:v1.15.2@sha256:e592ceba377985eb4225b0da9121d0f8c68a564ea38e5732bd6d59005eb87c08`
`quay.io/cilium/operator:stable@sha256:e592ceba377985eb4225b0da9121d0f8c68a564ea38e5732bd6d59005eb87c08`

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

* gha: drop unused check_url environment variable

[ upstream commit e17cf21d9720493766c9f1d12c2d75c842f26e86 ]

This variable used to be used in combination with the Sibz/github-status-action
action, which we replaced with myrotvorets/set-commit-status-action when
reworking the workflows to be triggered by Ariane [1]. Given it is now
unused, let's get rid of the leftover environment variable, so that we
also stop copying it to new workflows.

[1]: 9949c5a1891a ("ci: rework workflows to be triggered by Ariane")

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* gha: centralize kind version and image definition in set-env-variables

[ upstream commit 394b3de26a4e2235ec25399861e12886b507f335 ]

Let's define kind-related variables (i.e., version, k8s image and k8s
version) inside the set-env-variables action. One all consumers will
have been migrated through the subsequent commit, this will ensure
consistency across workflows, simplify version bumps as well as the
introduction of new workflows depending on them. One extra byproduct
is that renovate updates will also stop requesting reviews from all
the different teams owning each specific workflow.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* gha: migrate workflows to use the global kind-related variables

[ upstream commit aabdfa73d3d83edda3935277bd08c4c4c0bf5b68 ]

Let's switch all the workflows over to using the globally defined
kind-related variables, and remove the workflow specific definitions.
This also addresses a few cases which didn't specify any version.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* gha: don't wait for kind clusters to become ready

[ upstream commit 39637d6d2385baab556078d844f431522d99f616 ]

They will never, because no CNI is present at that point. Hence, let's
just avoid wasting one minute waiting for the timeout to expire.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* gha: checkout target branch instead of the default one

[ upstream commit 6716a9c01b69d88da0c3316fe8b3640180bbafb1 ]

Currently, the GHA workflows running tests triggered on pull_request
and/or push events initially checkout the default branch to configure
the environment variables, before retrieving the PR head. However,
this is problematic on stable branches, as we then end up using the
variables from the default (i.e., main) branch (e.g., Kubernetes
version, Cilium CLI version), which may not be appropriate here.

Hence, let's change the initial checkout to retrieve the target (i.e.,
base) branch, falling back to the commit in case of push events.
This ensures that we retrieve the variables from the correct branch,
and matches the behavior of Ariane triggered workflows.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* gha: switch kind images back to kind

Kind has released stable versions for k8s 1.29 so we can use this
image instead of the cilium kindest for ginkgo tests. The same
version has already been configured for the rest of the workflows
in the previous commits.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* loader: fix cancelled context during compile logging errors.

[ upstream commit 70b405f32018af84ad8221e4bafb223a70c23736 ]

On Linux/Unix based implementations, exec/cmd.Run will return either context.Canceled or the error "signal: killed" depending on whether the cancellation occurred before or while the process was running.

There are several places where we check `errors.Is(err, context.Canceled)` to decide whether to emit high-level logs about failed program compilations. Because cmd.Run() on an already running process doesn't return an error that satisfies this check, this results in spurious error logs about failed compilation (i.e. "signal: killed")

This meant that in cases where a compilation is legitimately cancelled, we would still log an error such as

msg="BPF template object creation failed" ... error="...: compile bpf_lxc.o: signal: killed"

This can occur occasionally in CI, which enforces no error to pass, causing failures.

example:
```
	ctx, c := context.WithTimeout(context.Background(), time.Second)
	go func() {
		time.Sleep(time.Second)
		c()
	}()
	cmd := exec.CommandContext(ctx, "sleep", "2")
	fmt.Println(cmd.Run())

	ctx, c = context.WithTimeout(context.Background(), time.Second)
	c()
	cmd = exec.CommandContext(ctx, "sleep", "2")
	fmt.Println(cmd.Run())
```

To fix this, this will join in the ctx.Err() if it is:
* context.Cancelled
* The process has not exited itself.
* The process appeared to be SIGKILL'ed.

Addresses: #30991

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
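
A minimal, standalone sketch of that approach (standard library only; the "sleep" binary is assumed to exist, as on Linux): join ctx.Err() into the command error when the context was cancelled while the process was running, so errors.Is(err, context.Canceled) works for callers deciding whether to log a failed compilation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runWithCtxErr runs a command and, if the context was cancelled while the
// process was running (cmd.Run then reports "signal: killed"), joins ctx.Err()
// into the returned error so callers can match it with errors.Is.
func runWithCtxErr(ctx context.Context, name string, args ...string) error {
	err := exec.CommandContext(ctx, name, args...).Run()
	if err != nil && ctx.Err() != nil {
		return errors.Join(err, ctx.Err())
	}
	return err
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(100 * time.Millisecond)
		cancel()
	}()
	err := runWithCtxErr(ctx, "sleep", "2")
	fmt.Println(err)                              // joined: "signal: killed" + "context canceled"
	fmt.Println(errors.Is(err, context.Canceled)) // true
}
```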

* cni: use default logger with timestamps.

[ upstream commit a099bf1571f1a090ccfd6ccbba545828a6b3b63c ]

Unlike runtime agent/operator logs, CNI logs are just written to disk so we have no way to attach timestamps to them.
This makes it harder to debug CNI issues as we have no way to correlate when things happened between Agent logs and CNI events.

This switches CNI to use the same default logger, except with timestamps enabled.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* gateway-api: Bump to the latest version from upstream

[ upstream commit d50c2e4fb057f3ca041a7c346cac8c12e0f4a422 ]

This is mainly to pick up the new GRPC Conformance tests added recently.

Relates: kubernetes-sigs/gateway-api#2745
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* gateway: Remove manual GRPCRoute test

[ upstream commit f77f3c385e885eab1e0c58cf8d80c99244802637 ]

This is to remove our manual GRPCRoute test, in favour of more
comprehensive tests added recently in upstream.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* cilium-dbg: Don't fatal on XFRM rule deletion errors

[ upstream commit 927969b247ed8f8b499988274a23e8ca2da42346 ]

This commit slightly changes the behavior of the "encrypt flush"
command in case of errors when trying to delete XFRM rules. The tool
currently lists rules, filters them based on user-given arguments, and
then deletes them. If an XFRM rule is deleted by the agent or the user
while we're filtering, the deletion will fail.

The current behavior in that case is to fatal. On busy clusters, that
might mean that we always fatal because XFRM states and policies are
constantly added and removed.

This commit changes the behavior to proceed with subsequent deletions in
case one fails.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* cilium-dbg: Refactor confirmation message for encrypt flush

[ upstream commit 5c2a67fcd306329abb8f5be0a7bac753141bfea6 ]

This commit refactors the code a bit to simplify a later commit. No
functional changes.

This may be a bit excessive in commit splitting, but at least I can
claim my last commit is free of any refactoring :sweat_smile:

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* cilium-dbg: New --stale flag for encrypt flush

[ upstream commit 5eb27e25b38bc6d073e96835f674d0748176d49e ]

This new flag will allow users to clean stale XFRM states and policies
based on the node ID map contents. If XFRM states or policies are found
with a node ID that is not in the BPF map, then we probably have a leak
somewhere.

Such leaks can lead in extreme cases to performance degradation when the
number of XFRM states and policies grows large (and if using ENI or
Azure IPAM). Having a tool to cleanup these XFRM states and policies
until the leak is fixed can therefore be critical.

The new flag is incompatible with the --spi and --node-id filter flags.

We first dump the XFRM rules and then dump the map content. In that way,
if a new node ID is allocated while we're running the tool, we will
simply ignore the corresponding XFRM rules. If a node ID is removed
while running the tool, we will fail to remove the corresponding XFRM
rules and continue with the others.

Tested on a GKE cluster by adding fake XFRM states and policies that the
tool was able to remove.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* k8s_install.sh: specify the CNI version

[ upstream commit f92b528abc30a5c66ba80d5922d4e58c48cfe7e1 ]

The CNI version should be specify so that in case we have to fallback
the installation of k8s via binaries it doesn't fail with the error:

```
10:29:25      k8s1-1.25: gzip: stdin: not in gzip format
10:29:25      k8s1-1.25: tar: Child returned status 1
10:29:25      k8s1-1.25: tar: Error is not recoverable: exiting now
```

Fixes: ce69afdc3ad1 ("add support for k8s 1.25.0")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* statedb: Add read-after-write tests

[ upstream commit 070e4deddcdbd892bf35be1d6224237d533ef27b ]

The optimization to reuse the last used "indexReadTxn" was broken
when a WriteTxn was first used for a read and then for a write
followed by a read (the last read re-used an old transaction).

Add test to observe this bug.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* statedb: Fix read-after-write bug

[ upstream commit 52944a032150e8040869c20bead7ef05ae32dbe3 ]

[ backporter's note: minor conflicts due to different variables' names]

The optimization in *txn to hold the last used index read transaction
to avoid a hash map lookup was broken as the write txn used for reading
was cloned, leading to not being able to read any of the future writes
in the same transaction.

Benchmark show that this optimization does not actually help, so solve
the problem by removing the optimization.

Before:
BenchmarkDB_RandomLookup-8       1000000              1471 ns/op
After:
BenchmarkDB_RandomLookup-8       1000000              1485 ns/op

Fixes: d0d4d46992bf ("statedb: Store index unique info in the index tree entry")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* introduce ARM github workflows

[ upstream commit 7a301a48c55c427714261e0686699fc2f63d2d31 ]

This commit adds the GH workflow to run on arm machines. This
effectively means that we can remove our travis integration and only use
GH actions from now on.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* metrics: Disable prometheus metrics by default

[ upstream commit f31cbef727b3cb252d6117f48fd0bb24106d9876 ]

Prior to commit c49ef45399fc9, the prometheus metrics server was
disabled by default in Cilium. Typically we would expect users to
specify in helm both `prometheus.enabled` and `prometheus.port` to
determine how to configure the prometheus server to ensure that the
prometheus port doesn't also conflict with other services that the user
is running on their nodes in their clusters.

With the refactor in the aforementioned commit, the default was set to
`:9962`. This means that even if the user installed Cilium without
prometheus settings, or explicitly configured helm with
`prometheus.enabled: false`, the prometheus metrics server would be
enabled.

This patch reverts the default back to the pre-v1.14 default.

Fixes: c49ef45399fc ("metrics: Modularize daemon metrics registry")
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* ingress: Update docs with network policy example

[ upstream commit b68cf99c3bb50a6a06dbbb0fd3085e5430297eac ]

This is to extend the existing basic ingress docs with external lockdown
CCNP, while still allows in-cluster traffic to Ingress LB IP.

Relates: https://github.com/cilium/cilium/pull/28126
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* Downgrade L2 Neighbor Discovery failure log to Debug

[ upstream commit db7b3efeb4176b7a993816cb095da93ca184e968 ]

Suppress the "Unable to determine next hop address" logs. While it shows
the L2 neighbor resolution failure, it does not always indicate a
datapath connectivity issue.

For example, when "devices=eth+" is specified and the device
naming/purposing is not consistent across the nodes in the cluster, in
some nodes "eth1" is a device reachable to other nodes, but in some
nodes, it is not. As a result, L2 Discovery generates an "Unable to
determine next hop address".

Another example is ENI mode with automatic device detection. When
secondary interfaces are added, they are used for L2 Neighbor Discovery
as well. However, nodes can only be reached via the primary interface
through the default route in the main routing table. Thus, L2 Neighbor
Discovery generates "Unable to determine next hop address" for secondary
interfaces.

In both cases, it does not always mean the datapath has an issue for KPR
functionality. However, the logs appear repeatedly, are noisy, and the
message is error-ish, causing confusion.

This log started to appear for some users who did not see it before from
v1.14.3 (#28858) and v1.15.0 (in theory). For v1.14.3, it affects KPR +
Tunnel users because of f2dcc86. Before the commit, we did not perform
L2 Neighbor Discovery in tunnel mode, so even if users had an interface
unreachable to other nodes, the log did not appear.

For v1.15.0, it affects users who already had an unreachable
interface; 2058ed6a made it visible. Before that commit, some kinds of
errors like EHOSTUNREACH and ENETUNREACH were not caught because the
FIBMatch option wasn't specified. After v1.15.0, users started to see
the log.

Fixes: #28858

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* gha: additionally test host firewall + KPR disabled in E2E tests

[ upstream commit 77a0c6b0424b8fe89459ff74787b2e46461731aa ]

Let's additionally enable host firewall on a couple of existing matrix
entries associated with KPR disabled, so that we can additionally cover
this configuration and prevent regressions.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* bgpv1: Adjust ConnectionRetryTimeSeconds to 1 in component tests

[ upstream commit 777b58037491951212ed874165563aac17e61763 ]

Set ConnectionRetryTimeSeconds to 1s in the component tests unless it is
specified explicitly. Otherwise, when the initial connect fails, we need
to wait 120s for the next connection attempt by default, which may be
longer than the timeout of the test itself.

Fixes: #31217

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* bpf: Enable monitor aggregation for all events in bpf_network.c

[ upstream commit 81f14bbd4ebe898c48a918822ed30fe42ed5620d ]

This commit adjusts the usage of send_trace_notify in bpf_network.c to
enable monitor aggregation for all events emitted at this observation
point in the datapath. This change helps improve resource usage by
reducing the overall number of events that the datapath emits, while
still enabling packet observability with Hubble.

The events in bpf_network.c enable observability into the IPSec
processing of the datapath. Before this commit, multiple other efforts
have been made to increase the aggregation of events related to IPSec to
reduce resource usage, see #29616 and #27168. These efforts were related
to packets that were specifically marked as encrypted or decrypted by
IPSec and did not include events in bpf_network.c that were emitted when
either: (a) a plaintext packet has been received from the network, or
(b) a packet was decrypted and reinserted into the stack by XFRM. Both
of these events are candidates for aggregation because similar to-stack
events will be emitted down the line in the datapath anyways.
Additionally, these events are mainly useful for root-cause
analysis or debugging and are not necessarily helpful from an overall
observability standpoint.

Signed-off-by: Ryan Drew <ryan.drew@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* endpointmanager: Improve health reporter messages when stopped

[ upstream commit 02d04b34005c45d955a6850d3faf561d9eae8f78 ]

The health reporter messages could use a bit more clarity and another
was a copy-paste mistake.

Fixes: bb957f3821 ("endpointmanager: add modular health checks to epm componenets.")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

* bpf, tests: Force v6 address into 'rodata' section

[ upstream commit 39e65e99d9397b6a036a049d4830322261de6920 ]

This commit forces v6 addresses in the pkgten.h header into the .rodata
section of compiled ELF binaries. This prevents a possible mismatch
between ELF and BTF for the locations of these v6 symbols.

Signed-off-by: Ryan Drew <ryan.drew@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* ingress: don't emit an error message on namespace termination

[ upstream commit 3b8c517c95e6167a531e24e9d0fea5c1416478d4 ]

When a namespace containing an Ingress configured in dedicated mode gets
deleted, the controller may first receive the event corresponding to the
deletion of one of the dependent resources, and only afterwards of the
ingress itself. Hence, it will attempt to recreate these resources (e.g.,
the CiliumEnvoyConfig), but fail as rejected by the k8s APIServer with an
error along the lines of:

  failed to create or update CiliumEnvoyConfig: ciliumenvoyconfigs.cilium.io
    \"cilium-ingress-cilium-test-ingress-service\" is forbidden: unable to
    create new content in namespace cilium-test because it is being terminated

Let's replace this error with an informative message at lower severity,
given that the situation resolves itself, and there's no actionable item
for a user reading the error. This also prevents the error from being
flagged by the Cilium CLI no-errors-in-logs check.

Since we don't return an error to controller-runtime, the reconciliation will
not be retried automatically. However, we will be woken up again once the
ingress deletion event is received, so that we can proceed with the cleanup.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* ingress: add unit test for namespace termination error

[ upstream commit 00e45d7a19ff46dafcb7cd99a737c6d240e84e81 ]

[ backporter's notes: adapted the newIngressReconciler parameters as
  appropriate for v1.15. ]

This commit adds a unit test for the case where reconciliation shouldn't
omit an error if resource creation is failing due to Namespace termination.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* envoy: enable k8s secret watch even if only CEC is enabled

Currently, the K8s Secret watch (used by Envoy SecretSync (K8s TLS Secret -> Envoy SDS))
is only active if either Ingress Controller or Gateway API is enabled.

Hence Secrets aren't available via SDS in cases where only CiliumEnvoyConfig is
enabled (`--enable-envoy-config`).

This commit fixes this by enabling the K8s Secret watch also in cases where only
CiliumEnvoyConfig is enabled (without Ingress Controller and/or Gateway API
being enabled).

Fixes: #26005

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

* chore(deps): update dependency cilium/cilium-cli to v0.16.3

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* Update --nodes-without-cilium flag

The --nodes-without-cilium flag type changed from string slice to
boolean in cilium/cilium-cli#2427.

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>

* chore(deps): update gcr.io/distroless/static-debian11:nonroot docker digest to 55c6361

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* k8s/watchers: inline single-use updateEndpointLabels

[ upstream commit bba0ff569e10e902747ae8df0761f96374a10f5f ]

The function updateEndpointLabels is only used in one place. Inline it
to improve readability and simplify changes in successive commits.

Signed-off-by: Tobias Klauser <tobias@cilium.io>

* k8s/watchers: warn when endpoint label update fails on pod update

[ upstream commit 9a26446df2acad3c03c2413aca4d95f1e9bd7406 ]

Currently, failure to update endpoint labels based on pod labels on pod
update is silently ignored by the callers or only reflected in error
count metrics. Report a warning to clearly indicate that pod and
endpoint labels might be out of sync.

Signed-off-by: Tobias Klauser <tobias@cilium.io>

* k8s: move filterPodLabels to k8s/utils package for SanitizePodLabels

[ upstream commit 23098051f1994e5ac69d181680556de28c3e5540 ]

Currently GetPodMetadata is the only caller of SanitizePodLabels but
other callers will be introduced in successive changes. This change
ensures the io.cilium.k8s.* labels are filtered for these callers as
well.

Signed-off-by: Tobias Klauser <tobias@cilium.io>

* k8s/watchers: set unfiltered pod labels on CEP on pod update

[ upstream commit 5508746586091ffa2c72e24362ec62781a4ce2ad ]

The labels on the CEP are set to the unfiltered pod labels on CEP
creation, see [1]. On any label update where labels contain filtered
labels, e.g. io.cilium.k8s.* labels or labels filtered out by the user
by means of the --label and/or --label-prefix-file agent options the
current logic would wrongly remove the filtered labels from the CEP
labels. Fix this by always using the unfiltered pod labels.

[1] https://github.com/cilium/cilium/blob/b58125d885edbb278f11f84303c0e7c934ca7ea4/pkg/endpointmanager/endpointsynchronizer.go#L185-L187

Signed-off-by: Tobias Klauser <tobias@cilium.io>

* k8s/utils: filter out cilium-owned labels on pod update

[ upstream commit ed4e6505c059512446d45b40d5aba6ca5cf15d3c ]

Currently `io.cilium.k8s.*` pod labels are only filtered out on pod
creation. On pod update, they are currently not filtered which leads to
a situation where no pod label update is reflected in the endpoint
anymore in case of a `io.cilium.k8s.*` label set on the pod:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: foo
  namespace: default
  labels:
    app: foobar
    io.cilium.k8s.something: bazbar
spec:
  containers:
  - name: nginx
    image: nginx:1.25.4
    ports:
    - containerPort: 80
EOF
$ kubectl -n kube-system exec -it cilium-nnnn -- cilium-dbg endpoint list
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                              IPv6                  IPv4           STATUS
           ENFORCEMENT        ENFORCEMENT
252        Disabled           Disabled          50316      k8s:app=foobar                                                           fd00:10:244:1::8b69   10.244.1.78    ready
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
                                                           k8s:io.cilium.k8s.policy.cluster=kind-kind
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=default
$ kubectl label pods foo app=nothing --overwrite
$ kubectl describe pod foo
[...]
Labels:           app=nothing
                  io.cilium.k8s.something=bazbar
[...]
$ kubectl describe cep foo
[...]
Labels:       app=foobar
              io.cilium.k8s.something=bazbar
[...]
$ kubectl -n kube-system exec -it cilium-nnnn -- cilium-dbg endpoint list
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                              IPv6                  IPv4           STATUS
           ENFORCEMENT        ENFORCEMENT
252        Disabled           Disabled          50316      k8s:app=foobar                                                           fd00:10:244:1::8b69   10.244.1.78    ready
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
                                                           k8s:io.cilium.k8s.policy.cluster=kind-kind
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=default
1285       Disabled           Disabled          1          reserved:host                                                                                                 ready
1297       Disabled           Disabled          4          reserved:health                                                          fd00:10:244:1::ebfb   10.244.1.222   ready

Note that the `app` label didn't change from `foobar` to `nothing` in
the endpoint and the CiliumEndpoint CRD.

This is because the filtered labels are wrongly passed
to `(*Endpoint).ModifyIdentityLabels`, which in turn calls
`e.OpLabels.ModifyIdentityLabels`, which checks whether all of the
deleted labels (which contain the filtered label on pod update
in the example above) were present before, i.e. on pod creation. This
check fails, however, because the labels were filtered out on pod
creation.

Fix this issue by also filtering out the labels on pod update and thus
allowing the label update to successfully complete in the presence of
filtered labels.

After this change, the labels are correctly updated on the endpoint and
the CiliumEndpoint CRD:

$ kubectl label pods foo app=nothing --overwrite
$ kubectl describe pod foo
[...]
Labels:           app=nothing
                  io.cilium.k8s.something=bazbar
[...]
$ kubectl describe cep foo
[...]
Labels:       app=nothing
              io.cilium.k8s.something=bazbar
[...]
$ kubectl -n kube-system exec -it cilium-x2x5r -- cilium-dbg endpoint list
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                              IPv6                  IPv4           STATUS
           ENFORCEMENT        ENFORCEMENT
57         Disabled           Disabled          56486      k8s:app=nothing                                                          fd00:10:244:1::71b7   10.244.1.187   ready
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
                                                           k8s:io.cilium.k8s.policy.cluster=kind-kind
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=default
201        Disabled           Disabled          4          reserved:health                                                          fd00:10:244:1::c8de   10.244.1.221   ready
956        Disabled           Disabled          1          reserved:host                                                                                                 ready

Fixes: 599dde3b91b3 ("k8s: Filter out cilium owned from pod labels")

Signed-off-by: Tobias Klauser <tobias@cilium.io>

* k8s/utils: correctly filter out labels in StripPodSpecialLabels

[ upstream commit 280b69d9c12da757bfa764e3e77b87660f77c5bc ]

Filter the labels before iterating over them, otherwise they are added to
sanitizedLabels again. Also remove forbidden keys prefixed with
io.cilium.k8s, because they are already filtered out by filterPodLabels,
and inline the check for kubernetes namespace labels.

Fixes: ed4e6505c059 ("k8s/utils: filter out cilium-owned labels on pod update")
Signed-off-by: Tobias Klauser <tobias@cilium.io>

* chore(deps): update all github action dependencies

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* cni: Use batch endpoint deletion API in chaining plugin

[ upstream commit 31be7879e3a3a16936b2df71b14bb4cbdfc697a6 ]

This commit is to leverage the new batch endpoint deletion API instead of
singular endpoint deletion based on ID. The main reason is to provide
backward compatibility on the upgrade path.

The current CNI attachment ID requires a valid containerIfName attribute;
however, the endpoints created by old cilium versions (e.g. <1.15) do not
have such details. Any CNI DEL command for these endpoints will lead to an
invalid lookup (e.g. DeleteEndpointIDNotFoundCode), and prevent cleaning up
of related resources such as IP addresses.

The impact is limited to CNI chaining mode, as the batch endpoint
deletion API is already used in cilium-cni/cmd/cmd.go as part of #27351.

Old endpoint details without (or empty) ContainerIfName

```json
{
  "ID": 423,
  "ContainerName": "",
  "dockerID": "415beb119c4b0910f62634510e921a447893195ebedc30ca0e9cd5bf02569645",
  "DockerNetworkID": "",
  "DockerEndpointID": "",
  "IfName": "eni22524e9e591",
  "IfIndex": 13,
  "ContainerIfName": "",
  "DisableLegacyIdentifiers": false,
  ...
}
```

New endpoint details with valid ContainerIfName (e.g. eth0)

```json
{
  "ID": 3627,
  "ContainerName": "",
  "dockerID": "f89ccf654b878248442981d4c56fe3f50fa127f922b46ee6dccc94ae10e94b79",
  "DockerNetworkID": "",
  "DockerEndpointID": "",
  "IfName": "enia67a2d3c27d",
  "IfIndex": 45,
  "ContainerIfName": "eth0",
  "DisableLegacyIdentifiers": false,
  ...
}
```

Relates: #26894, #27351
Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>

* test: Remove duplicate Cilium deployments in some datapath config tests

When disabling IPv6 BPF masquerading in datapath configuration tests
using the host firewall, we accidentally duplicated the command to
deploy Cilium for some of the tests. Let's clean this up to avoid
slowing down the tests.

Fixes: 934e1f2df26c ("daemon: Forbid IPv6 BPF masquerading with the host firewall")
Signed-off-by: Quentin Monnet <qmo@qmon.net>

* chore(deps): update docker.io/library/golang:1.21.8 docker digest to 8560736

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

* controlplane: fix panic: send on closed channel

[ upstream commit 7df437a0d03e91ea137836b5fa3436fe60e3cdc8 ]

Rarely, the control plane test panics, due to a send on a closed
channel. This can occur in a narrow race window in the filteringWatcher:

1. Stop is called on the child watcher
2. Child watcher calls stop on parent watcher
3. Concurrently, an event is dequeued from the parent result chan, and
   we enter the filtering logic.
4. The parent result chan is closed, and we close the child event channel
5. The filter is matched, and we attempt to write on the closed channel,
   which causes the panic.

Instead of closing the channel in the Stop method, close the channel
from the writing goroutine (as is commonly considered best practice in
Go).

Fixes: fa89802ce7 (controlplane: Implement filtering of objects with field selectors)

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
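
A minimal sketch of the "close from the writer" pattern described above, with illustrative names rather than the actual filteringWatcher code:

```go
package main

import "fmt"

type event struct{ name string }

// filterEvents forwards matching events from parent to a derived channel.
// The child channel is closed by the same goroutine that sends on it, so a
// concurrent stop of the parent can never race into "send on closed channel".
func filterEvents(parent <-chan event, match func(event) bool) <-chan event {
	child := make(chan event)
	go func() {
		defer close(child) // the writer owns the channel lifetime
		for ev := range parent {
			if match(ev) {
				child <- ev
			}
		}
	}()
	return child
}

func main() {
	parent := make(chan event, 2)
	parent <- event{"pod-add"}
	parent <- event{"node-add"}
	close(parent)
	for ev := range filterEvents(parent, func(e event) bool { return e.name == "pod-add" }) {
		fmt.Println("filtered:", ev.name)
	}
}
```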

* controlplane: add mechanism to wait for watchers

[ upstream commit ba99d74c444ec3bdcd7d32bd4bbec1d76f85cdd2 ]
[ backporter notes: Whitespace conflict at the new `establishedWatchers`
  field in test/controlplane/suite/testcase.go ]

We've recently learned that the fake k8s client set's object tracker does
not respect the semantics of the real api-server when it comes to
'Watch': since the object tracker does not care for ResourceVersions, it
cannot respect the version from which it ought to replay events. As a
result, the default informer (more precisely, its reflector) is racy: it
uses a ListAndWatch approach, which relies on this resource version to
avoid a race window between the end of list and the beginning of watch.
Therefore, all informers used in cilium have a low chance of hitting
this race when used with a k8s fake object tracker.

This is somewhat known in the k8s community, see for example [1].
However, the upstream response is that one simply shouldn't be using the
fake infrastructure to test real informers. Unfortunately, this pattern
is used somewhat pervasively inside the cilium tests, specifically so in
the controlplane tests.

This patch introduces a mechanism which reduces the likelihood of
hitting the flake, under the assumption that we do not (often) establish
multiple watchers for the same resource. In the following patch, we'll
use the new infrastructure to reduce the flakiness of tests.

[1]: https://github.com/kubernetes/kubernetes/issues/95372

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
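
The general shape of such a mechanism, sketched with the fake clientset's reactor API (names are illustrative; per the backport note, the real field lives in test/controlplane/suite/testcase.go):

```go
package suite

import (
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes/fake"
	k8stesting "k8s.io/client-go/testing"
)

// recordWatchers installs a watch reactor that signals on 'established' once
// a watcher for a resource has actually been registered with the tracker, so
// tests can delay k8s mutations until the informer is watching.
func recordWatchers(cs *fake.Clientset, established chan<- string) {
	cs.PrependWatchReactor("*", func(action k8stesting.Action) (bool, watch.Interface, error) {
		w, err := cs.Tracker().Watch(action.GetResource(), action.GetNamespace())
		if err != nil {
			return false, nil, err
		}
		established <- action.GetResource().Resource
		return true, w, nil
	})
}
```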

* controlplane: wait for watcher establishment

[ upstream commit 955049a8c6339ae8983fa8f041062e850ddcd5a3 ]
[ backporter notes: Also updated status_updates_gc.go ]

Use the infrastructure introduced in the previous commit to deflake
control plane tests which update k8s state after starting the agent.

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* job: avoid a race condition in ExitOnCloseFnCtx

[ upstream commit 6b2d186f653d8856aefa05b771ae0c7acf16a5b2 ]

The test attempted to avoid closing a channel multiple times by setting
'started' to nil. However, since the outer scope will wait on 'started',
if started is set to nil before the outer scope waits, it will wait
indefinitely, resulting in a timeout of the test.

Fixes: daa85a0f4a (jobs,test: Fix TestTimer_ExitOnCloseFnCtx channel close panic)

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* controlplane: fix mechanism for ensuring watchers

[ upstream commit fe71a4aef3bdd2fdafa7a8cbdf3dc71d29f18cf7 ]
[ backporter notes: Minor conflicts in the fields of `ControlPlaneTest`
  (fakeDatapath package has been renamed) ]

I realized that the fix for controlplane tests isn't complete. There is
still a (small) race window: The current watch reaction records a
watcher as established without "handling" the watch itself, i.e. it lets
the default watch reaction actually call 'Watch' on the tracker. This is
racy, as things can happen in the window between recordng and actually
watching.

To fix this, add the recording unconditionally in the existing tracker
augmentation.

Fixes: ba99d74c44 (controlplane: add mechanism to wait for watchers)

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* Handle InvalidParameterValue as well for PD fallback

[ upstream commit 5a487b59a5b9cbc52dd221885655c6cb8abd1890 ]

cilium#30536 prematurely concluded that
AWS now uses InsufficientCidrBlocks to indicate the subnet is out of
prefixes. It looks like AWS still uses InvalidParameterValue and "There
aren't sufficient free Ipv4 addresses or prefixes" to indicate the subnet
is at capacity. In addition, InsufficientCidrBlocks is returned when the
subnet is at capacity, potentially due to fragmentation. In either case,
it's worth trying to fall back, since /32 IPs might still be available
compared to /28 prefixes. See the PR for details from the AWS support ticket.

Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
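
A hedged sketch of the error classification this implies (the helper name and the exact place it hooks into the AWS IPAM code are assumptions):

```go
package ec2

import "strings"

// isSubnetAtCapacityErr treats both AWS error shapes described above as
// "subnet out of /28 prefixes", so the caller can fall back to /32 IPs.
func isSubnetAtCapacityErr(code, message string) bool {
	switch code {
	case "InsufficientCidrBlocks":
		return true
	case "InvalidParameterValue":
		return strings.Contains(message, "sufficient free Ipv4 addresses or prefixes")
	default:
		return false
	}
}
```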

* Adding unit test for PD fallback

[ upstream commit 0007e35d0a5de7b570e4750cb12a7965b3301949 ]
[ backporter notes: Minor import conflict in pkg/aws/ec2/mock/mock.go ]

Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* slices: don't modify missed input slice in test

[ upstream commit 08d898d8011c68a85851629e131900deef91bab3 ]

The commit below correctly identified that modifying the test input
makes the test flaky due to being dependent on the non-deterministic
ordering. However, it missed adding the work-around for the first test,
TestUnique.

Fixes: 32543a40b1 (slices: don't modify input slices in test)

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* bgpv1: avoid object tracker vs informer race

[ upstream commit cfd17903fb69d1d6153c172f0af6ac200721e065 ]

The fake k8s testing infrastructure has an unfortunate interaction
between real informers and the object tracker used in fake clientsets:
the informer uses ListAndWatch to subscribe to the api server, but
ListAndWatch relies on the ResourceVersion field to bridge the gap
between the List and the Watch. The simple object tracker does not
provide the resource versioning, nor does its Watch implementation
attempt to replay existing state.

The race window, hence, allows for creation of objects _after_ the
initial list, but _before_ the establishment of Watch. These objects are
not observed by the informer and thus tests are likely to fail.

As a workaround, we ensure that the object tracker has registered a
watcher before we start the test in earnest. It is only important that
we do not create/update/delete objects in the window between starting
the hive (and hence running of the informer) and having ensured that the
watcher is in place. Creation prior to starting the hive is okay, as
well as after ensuring the watcher exists.

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* doc: Clarified GwAPI KPR prerequisites

[ upstream commit 1321e03093db0f7655488c60f91e22e273c65cb0 ]

Before the GwAPI doc listed KPR mode as a prerequisite. However,
it's actually only required to enable BPF nodePort support.

Signed-off-by: Philip Schmid <philip.schmid@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* helm: Add pod affinity for cilium-envoy

[ upstream commit 44aeb535b1d328734f8faac07669fbfa4683bc6f ]
[ backporter notes: Minor conflict in values.yaml due to different
  intendation of podAntiAffinity in v1.15 ]

This commit avoids cilium-envoy running on nodes without the cilium
agent. The two main changes are:

- nodeAffinity to make sure that cilium-envoy will not be scheduled on
  node without cilium agent
- podAffinity with requiredDuringSchedulingIgnoredDuringExecution to
  cilium agent

Relates: #25081, #30034
Fixes: #31149
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* bgpv1: fix Test_PodIPPoolAdvert flakiness

[ upstream commit 696a4fd9620d6efe8f1697a5d4d066a6ab5fbdf8 ]

Since the test updates multiple k8s resources in each test step,
and each step depends on the previous one, we should be careful
about how many k8s resources are we actually changing in each step,
to not trigger multiple reconciliations with different results
(advertisements) in a single step.

This change ensures we change only one value that affects
the advertisement in each test step.

Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* bpf, maps: Don't propagate nodeID to bpf map when allocation fails.

[ upstream commit 4f4b0a7460ec759dc025c91df92f29aa7b64cce1 ]

When we ran out of IDs to allocate for nodes, we were propagating a zero
ID to the bpf map.
Now we simply return an error and do not modify the bpf map.
Also clean up incorrectly mapped node IDs on startup in case that
happened.

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
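
A simplified model of the behaviour described above, using a plain map as a stand-in for the BPF node-ID map (names and types are illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

var errIDsExhausted = errors.New("no node ID available")

type allocator struct {
	next, max uint16
	nodeIDMap map[string]uint16 // stand-in for the real BPF map
}

// mapNodeID writes to the (stand-in) BPF map only when an ID was allocated;
// on exhaustion it returns an error instead of propagating ID 0.
func (a *allocator) mapNodeID(nodeIP string) (uint16, error) {
	if a.next > a.max {
		return 0, errIDsExhausted
	}
	id := a.next
	a.next++
	a.nodeIDMap[nodeIP] = id
	return id, nil
}

func main() {
	a := &allocator{next: 1, max: 2, nodeIDMap: map[string]uint16{}}
	for _, ip := range []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"} {
		if id, err := a.mapNodeID(ip); err != nil {
			fmt.Println(ip, "->", err)
		} else {
			fmt.Println(ip, "-> node ID", id)
		}
	}
}
```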

* policy: Fix missing labels from SelectorCache selectors

[ upstream commit 65bbf3fbb568bf3b8a76a2ec87ac826e3e036a0a ]

During the refactor of the below commit, it seems the labels were left
out inadvertently, breaking the `cilium policy selectors` command that
displays the labels/name of the policy from which the selectors
originate.

Fixes: 501944c35d ("policy/selectorcache: invert identitySelector interface")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* ci: Bump lvh-kind ssh-startup-wait-retries

[ upstream commit a5eafe025390d77fa370add4883e6b1a00da7365 ]

Recently, we have frequently seen CI failures with lvh-kind startup
failing with exit code 41. This indicates a timeout of the task
waiting for the SSH startup. Bump the timeout (retry) to 600 (10min) as
a workaround.

Fixes: #31336

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* datapath: Remove unnecessary IPsec code

[ upstream commit 4ba7e6a3111f1bb43c54253ac645ed7949709a2e ]

Commit 891fa78474 ("bpf: Delete obsolete do_netdev_encrypt_pools()")
removed the special code we had to rewrite the IPsec outer header. The
code removed in the present commit is therefore not required anymore.

Fixes: 891fa78474 ("bpf: Delete obsolete do_netdev_encrypt_pools()")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* operator: fix errors/warnings metric.

[ upstream commit f61651f3797432892588bfb8086e70991312313e ]

This was broken during the transition of pkg/metrics to integrate with Hive, where relevant operator metrics were never initialized.
This adds an init func specific to the operator and cleans up the "flush" logic used as a workaround for errors/warnings emitted prior to agent startup (in the case of the operator).

Addresses: #29525

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* loader: add message if error is ENOTSUP

[ upstream commit 64364477fc024e8026e30016257fa124fca4a901 ]

I was trying to install cilium on a machine without a vxlan module
available. Add a helpful message for the next time this happens.

Tested by running:
> PRIVILEGED_TESTS=1 go test  -test.v -test.run TestSetupTunnelDevice

on said machine, and got the following output:
...

```
        Error:          Received unexpected error: setting up vxlan device: creating vxlan device: creating device cilium_vxlan: operation not supported, maybe kernel module for vxlan is not available?
	Test:           TestSetupTunnelDevice/Vxlan
```

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* ipam: fix azure ipam test panics due to shared pointers.

[ upstream commit 1ca6141190f455cae85ebe6215f86a74bda9d213 ]

pkg/ipam/types.(*InstanceMap).DeepCopy(...) will iterate over all instances/interfaces in order to copy the data.
However, unlike what the name suggests, the underlying pkg/ipam/types.Interface pointers are copied and shared in the returned instance map.
In some cases, this can result in memory corruption issues, leading to confusing panics while running tests such as:

```
panic: runtime error: makeslice: cap out of range
goroutine 1366 [running]:
strconv.appendQuotedWith({0xc000576208, 0x0, 0x3f?}, {0x1000000, 0x100000001000000}, 0x22, 0x0, 0x0)
	/opt/hostedtoolcache/go/1.22.0/x64/src/strconv/quote.go:35 +0x85
strconv.AppendQuote(...)

...
```

Capturing such an event in a debugger, you would see an AzureInterface struct such as the one below, with the IP string memory being corrupt (likely due to an interleaved read/write) and being passed to logrus, causing a crash.

```
github.com/cilium/cilium/pkg/azure/types.AzureAddress {
	IP: "\x10\x01\x00\x00\x00\x00\x00\x00\x10\x01\x00\x00\x00\x00\x00\x007d�_\x02\b\b\x19\x00\x00\x00\x00\x00\x00\x00\x00�\x1f�\x03\x00\x00\x00\x00W�\b\x00\x00\x00\x00\x00\x00p \x03\x00\x00\x00\x00�qi\x03\x00\x00\x00\x00...+51559946186908214 more",
	Subnet: "subsys",
	State: "instanceID",}
```
This change ensures that the revision interface is correctly deep-copied such that the underlying resource is also safely copied.

Fixed: #31059

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
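
The bug class is easy to reproduce in isolation. A minimal sketch (illustrative types, not the actual pkg/ipam ones) showing why a pointer-copying "deep copy" still shares state:

```go
package main

import "fmt"

type iface struct{ IP string }

type instanceMap struct{ ifaces map[string]*iface }

// shallowCopy copies the map entries but shares the *iface values (the bug).
func (m *instanceMap) shallowCopy() *instanceMap {
	out := &instanceMap{ifaces: map[string]*iface{}}
	for k, v := range m.ifaces {
		out.ifaces[k] = v
	}
	return out
}

// deepCopy duplicates the pointed-to values as well (the fix).
func (m *instanceMap) deepCopy() *instanceMap {
	out := &instanceMap{ifaces: map[string]*iface{}}
	for k, v := range m.ifaces {
		cp := *v
		out.ifaces[k] = &cp
	}
	return out
}

func main() {
	orig := &instanceMap{ifaces: map[string]*iface{"eni-1": {IP: "10.0.0.1"}}}
	shallow, deep := orig.shallowCopy(), orig.deepCopy()
	orig.ifaces["eni-1"].IP = "mutated concurrently"
	fmt.Println(shallow.ifaces["eni-1"].IP) // still shared: "mutated concurrently"
	fmt.Println(deep.ifaces["eni-1"].IP)    // safely copied: "10.0.0.1"
}
```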

* gateway-api: Retrieve LB service from same namespace

[ upstream commit e8bed8d8343731422dc65b706c3617a40d0a8058 ]
[ backporter notes: conflicts in unit tests. Had to introduce
  the `NoError` check instead of expecting a specific error ]

This commit adds the same namespace while listing the generated LB
service. The main reason is to avoid a wrong status update for gateways
having the same name but belonging to different namespaces.

Testing was done locally as per below:

Before the fix:
```
$ kg gateway -A
NAMESPACE   NAME         CLASS    ADDRESS          PROGRAMMED   AGE
another     my-gateway   cilium   10.110.222.237   True         4s
default     my-gateway   cilium   10.110.222.237   True         56s
```

After the fix:

```
$ kg gateway -A
NAMESPACE   NAME         CLASS    ADDRESS          PROGRAMMED   AGE
another     my-gateway   cilium   10.102.170.180   True         14m
default     my-gateway   cilium   10.110.222.237   True         14m

$ kg services -A
NAMESPACE     NAME                           TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
another       cilium-gateway-my-gateway      LoadBalancer   10.102.170.180   10.102.170.180   80:31424/TCP                 15m
default       cilium-gateway-my-gateway      LoadBalancer   10.110.222.237   10.110.222.237   80:31889/TCP                 15m
...
```

Fixes: https://github.com/cilium/cilium/issues/31270
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* ci-e2e: Add matrix for bpf.tproxy

[ upstream commit 5884af16935bc7acec4dcbb836dffe7a5e460855 ]

This is to make sure that we have the coverage for bpf.tproxy enabled.
The first step is to test it with Ingress Controller enabled, we can add
more settings if required later.

Relates: #30331, #30404

Suggested-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* gha: disable fail-fast on integration tests

[ upstream commit 0fb203e8671638bad22438ec31cbe687462c4826 ]

So that the failure of one matrix entry (e.g., caused by a flake) doesn't
cancel the other ongoing tests, if any.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* hive/cell/health: don't warn when reporting on stopped reporter.

[ upstream commit 44e90054f29c410608e816b112df8fc615eb0131 ]

Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after the endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* docs: Warn on key rotations during upgrades

[ upstream commit b639eab46a1fd85f0bab362b3fdf4cf21df80ace ]

In general, it is not recommended to carry out several admin operations on
the cluster at the same time, as it can make troubleshooting in case of
issues a lot more complicated. Mixing operations is also less likely to
be covered in CI, so it is more likely to hit corner cases.

Performing IPsec key rotations during Cilium up/downgrades is one such
case. Let's document it explicitly to discourage users from doing that.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* chore(deps): update all github action dependencies

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* hubble: fix traffic direction and reply on encrypted trace notifications

[ upstream commit 9939fa2b0848ddd056e81f14a548f179f59027f3 ]

Before this patch, Hubble would wrongly report known traffic direction
and reply status when IPSec was enabled.

Signed-off-by: Alexandre Perrin <alex@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>

* AKS: avoid overlapping pod and service CIDRs

[ upstream commit fbe78c42f44f8e1c04d3886fc51a2b283b668770 ]

The default service CIDR of AKS clusters is 10.0.0.0/16 [1].
Unfortunately, we don't set a pod cidr for clusterpool IPAM, and hence
use cilium's default of 10.0.0.0/8, which overlaps. This can
lead to "fun" situations in which e.g. the kube-dns service ClusterIP is
the same as the hubble-relay pod IP, or similar shenanigans. This
usually breaks the cluster utterly.

The fix is relatively straight-forward: set a pod CIDR for cilium which
does not overlap with defaults of AKS. We chose 192.168.0.0/16 as this
is what is recommended in [2].

[1]: https://learn.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-with-system-assigned-managed-identities
[2]: https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#option-1-assign-ip-addresses-from-an-overlay-network

Fixes: fbf3d38a4b (ci: add AKS workflow)

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>

* hubble/relay: Fix certificate reloading in PeerManager

[ upstream commit 71628929e10005e3620df0f52732ecb57015cbef ]

This change fixes an issue with the PeerManager that can lead to Relay
being unable to reconnect to some or all peers when the client
certificate expires or the Certificate Authority is replaced.

Before this change, when the client certificate changed, we did not
redial or update the existing gRPC ClientConns. When the old certificate
becomes invalid (expired, changed CA, or revoked), the connection will
eventually fail with a certificate error.
However, the gRPC ClientConn is not closed, but treats the certificate
error as a transient failure and will retry connecting with the old
credentials indefinitely.
In most cases this will cause the relay health checks to fail. Relay
will restart and successfully reconnect to all peers. However, if a new
peer joins between the certificate being updated and the connections
failing, Relay may keep on running in a degraded state.

This issue was introduced by #28595. Before that change, Relay
aggressively closed and re-dialed ClientConns on any error, mitigating
this problem.

We fix this issue by wrapping the provided gRPC transport credentials
and updating the TLS configuration whenever a new TLS connection is
established. This means every TLS connection will use up-to-date
certificates and gRPC ClientConns will be able to recover when their
certificate changes.

Fixes: aca4d42ce80d ("hubble/relay: Fix connection leak when reconnecting to peer service")

Signed-off-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Alexandre Perrin <alex@isovalent.com>
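
A hedged sketch of the wrapping approach described above (structure and names are illustrative, not the hubble-relay implementation):

```go
package peer

import (
	"context"
	"crypto/tls"
	"net"

	"google.golang.org/grpc/credentials"
)

// reloadingCreds wraps gRPC transport credentials so every new TLS handshake
// uses the current certificates, letting a long-lived ClientConn recover
// after a certificate rotation without being re-dialed.
type reloadingCreds struct {
	credentials.TransportCredentials // delegate for Info, Clone, etc.

	// loadConfig returns a TLS config built from the latest certificates/CA.
	loadConfig func() *tls.Config
}

func (c *reloadingCreds) ClientHandshake(ctx context.Context, authority string, conn net.Conn) (net.Conn, credentials.AuthInfo, error) {
	// Build fresh credentials for this handshake instead of reusing the
	// configuration captured at dial time.
	return credentials.NewTLS(c.loadConfig()).ClientHandshake(ctx, authority, conn)
}
```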

* cilium-dbg: listing load-balancing configurations displays L7LB proxy port

[ upstream commit d3b19d65d3358c1dc08d55e715219a74567cd26d ]

Currently, listing the load-balancing configuration doesn't display the
L7LB Proxy Port for services of type `l7-load-balancer`.

```
cilium-dbg bpf lb list
SERVICE ADDRESS     BACKEND ADDRESS (REVNAT_ID) (SLOT)
...
10.96.193.7:443     0.0.0.0:0 (30) (0) [ClusterIP, non-routable, l7-load-balancer]
```

The only way of retrieving the L7LB proxy port is to list the frontends
(`cilium-dbg bpf lb list --frontends`) and manually convert the backend id
(union type) to the L7LB proxy port.

Therefore, this commit adds the L7LB proxy port to the output of `cilium-dbg bpf lb list`
if the service is of type L7 LoadBalancer. The `--frontends` subcommand still displays the
unmapped backend id.

```
cilium-dbg bpf lb list
SERVICE ADDRESS     BACKEND ADDRESS (REVNAT_ID) (SLOT)
10.96.0.1:443       172.18.0.3:6443 (1) (1)
                    0.0.0.0:0 (1) (0) [ClusterIP, non-routable]
10.96.252.10:443    172.18.0.2:4244 (22) (1)
                    0.0.0.0:0 (22) (0) [ClusterIP, InternalLocal, non-routable]
10.96.155.44:80     0.0.0.0:0 (14) (0) [ClusterIP, non-routable]
                    10.244.1.211:80 (14) (1)
172.18.0.2:32646    0.0.0.0:0 (33) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 15735)
10.96.193.7:443     0.0.0.0:0 (30) (0) [ClusterIP, non-routable, l7-load-balancer] (L7LB Proxy Port: 15735)
10.96.122.45:80     10.244.1.250:80 (26) (1)
                    0.0.0.0:0 (26) (0) [ClusterIP, non-routable]
10.96.102.137:80    0.0.0.0:0 (23) (0) [ClusterIP, non-routable]
                    10.244.1.126:4245 (23) (1)
10.96.108.180:443   0.0.0.0:0 (17) (0) [ClusterIP, non-routable, l7-load-balancer] (L7LB Proxy Port: 17731)
172.18.255.1:80     0.0.0.0:0 (25) (0) [LoadBalancer, l7-load-balancer] (L7LB Proxy Port: 17731)
0.0.0.0:32646       0.0.0.0:0 (34) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 15735)
0.0.0.0:31012       0.0.0.0:0 (21) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 17731)
```

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>

* chore: update sw + json-mock image

[ upstream commit f8fb8d1fb3c94776faa63b3c3062837b6c74e188 ]

Replaced docker.io with quay.io, pinned to the current latest, as the
Docker source is deprecated.

Signed-off-by: loomkoom <loomkoom@hotmail.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>

* bgpv1: BGP Control Plane metrics

[ upstream commit 59a01a89a231cab751fe88a028ce4c5c46c4d513 ]

Implement following initial metrics for BGP Control Plane.

1. cilium_bgp_control_plane_session_state

Gauge that shows session state per vrouter/neighbor. Established (1) or
Not Established (0).

2. cilium_bgp_control_plane_advertised_routes

Gauge that shows the number of advertised routes per
vrouter/neighbor/afi/safi.

3. cilium_bgp_control_plane_received_routes

Gauge that shows the number of received routes per
vrouter/neighbor/afi/safi.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
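
For illustration, the three gauges could be declared with the standard Prometheus client roughly as below; the exact label names and how they register with Cilium's metrics framework are assumptions:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	SessionState = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "cilium_bgp_control_plane_session_state",
		Help: "BGP session state, Established (1) or Not Established (0)",
	}, []string{"vrouter", "neighbor"})

	AdvertisedRoutes = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "cilium_bgp_control_plane_advertised_routes",
		Help: "Number of advertised routes per vrouter/neighbor/afi/safi",
	}, []string{"vrouter", "neighbor", "afi", "safi"})

	ReceivedRoutes = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "cilium_bgp_control_plane_received_routes",
		Help: "Number of received routes per vrouter/neighbor/afi/safi",
	}, []string{"vrouter", "neighbor", "afi", "safi"})
)
```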

* bpf: Remove `declare_tailcall_if`

[ upstream commit 46db41390be3a17872f7274c0ded62ecf7a07bf8 ]

Remove `declare_tailcall_if`, so we always emit the tailcall programs
into the ELF. The followup commit will implement pruning logic based
on the actual usage of the tail calls. This means that we will only
need the `invoke_tailcall_if` without the need to keep both the
declaration and invocation in sync.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Timo Beckers <timo@isovalent.com>

* bpf: Modify tail_call_static to emit better parseable assembly

[ upstream commit c4cbb38a1819adb08bd9b479b37ab4a7ec1b4e7a ]

Before this change the tail_call_static function would emit the
following instructions to perform a tailcall:

```
Mov R1, Rctx
Mov R2, Rmap_ptr
Mov R3, <slot>
Call TailCall
```

Since the second instruction is always a Register to Register move, we
would have to backtrack to find the actual map which is being used.

These changes makes it so the following instructions are emitted:

```
Mov R1, Rctx
Mov R2, 0 ll <calls_map>
Mov R3, <slot>
Call TailCall
```

By always using a double word immediate, with a relocation entry on the
Mov R2 instruction it is much easier to find the actual map which is
being used. As a side effect, we usually eliminate an extra instruction
clang was otherwise forced to emit.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

* pkg/bpf: Implement unreachable tail call pruning

[ upstream commit 16033b98b97573437df991efb2dd1da369a683ba ]

This commit implements unreachable tail call pruning. When loading a
collection we check if a tail call is reachable. If not, we remove the
tail call from the collection. This saves us from having to load the
tail call program into the kernel.

Previously, we would conditionally not include tail calls in the
collection with pre-processor directives. Now that we do it in the
loader, we can remove the pre-processor directives.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
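
At its core, the pruning is a reachability walk over the tail-call graph. A simplified sketch (program names and the edge representation are made up; the real code derives the edges from calls-map references in the ELF):

```go
package main

import "fmt"

// pruneUnreachable returns the set of programs reachable from the entry
// points by following tail-call edges; anything outside the set can be
// dropped from the collection before loading.
func pruneUnreachable(entrypoints []string, tailCalls map[string][]string) map[string]bool {
	reachable := map[string]bool{}
	var visit func(string)
	visit = func(prog string) {
		if reachable[prog] {
			return
		}
		reachable[prog] = true
		for _, callee := range tailCalls[prog] {
			visit(callee)
		}
	}
	for _, ep := range entrypoints {
		visit(ep)
	}
	return reachable
}

func main() {
	tailCalls := map[string][]string{
		"from_container": {"handle_ipv4"},
		"handle_ipv4":    {"handle_nat"},
		"handle_ipv6":    {"handle_icmp6"}, // hypothetical: IPv6 disabled in this build
	}
	reachable := pruneUnreachable([]string{"from_container"}, tailCalls)
	for _, p := range []string{"from_container", "handle_ipv4", "handle_nat", "handle_ipv6", "handle_icmp6"} {
		if !reachable[p] {
			fmt.Println("pruning unreachable tail call:", p)
		}
	}
}
```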

* pkg/bpf: Add test for removeUnreachableTailcalls

[ upstream commit 217426aae125fc3cc0257db611b7f8e9e519d3ab ]

[ backporter's notes: regenerated testdata ]

This commit adds a test to verify the behavior of the dead tail call
pruning. It consists of 5 tail calls, of which 2 are unreachable. The
test asserts that only the unreachable tail calls are removed from the
spec.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Timo Beckers <timo@isovalent.com>

* bpf: Fix missing tail calls

[ upstream commit e1afa06774191d65714576d55f2eadcbe29fc2e6 ]

The changes to the dead tail call elimination revealed 2 cases of
missing tail calls.

The first is to do with the NAT46x64 logic, where there
still existed a call path from the IPv4 logic which would attempt to
tail call into IPv6 to recirculate the packet, even when the IPv6 tail
call wasn't compiled in.

The second was that, when XDP-offloaded, the IPv6 logic would tail call
into an ICMP6 tail call which is only compiled in for TC programs.

This commit fixes both possible missing tail calls.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Timo Beckers <timo@isovalent.com>

* bpf: fix go testdata

[ upstream commit d6666ed76ddf5dc23f8fde07c3af0f28621f1907 ]

Currently, check-go-testdata.sh doesn't work as expected on CI.

It reports the following error and the GitHub action (Go Precheck)
succeeds without error.

```
contrib/scripts/check-go-testdata.sh
make[1]: Entering directory '/home/runner/work/cilium/cilium/src/github.com/cilium/cilium/pkg/bpf/testdata'
docker run -it --rm -v /home/runner/work/cilium/cilium/src/github.com/cilium/cilium:/cilium quay.io/cilium/cilium-builder:819a4f1e57eaacb6aabfc6a1a39d11d4fd794a88@sha256:24781dc80f2be2d8fd66b0ce1405e1f117a3a0ef388758b1ede7831778e3a4f7 clang -target bpf -Wall -O2 -g -I -I/usr/include -I/cilium/bpf -I/cilium/bpf/include -c /cilium/pkg/bpf/testdata/unreachable-tailcall.c -o /cilium/pkg/bpf/testdata/unreachable-tailcall.o
the input device is not a TTY
make[1]: *** [Makefile:9: build] Error 1
make[1]: Leaving directory '/home/runner/work/cilium/cilium/src/github.com/cilium/cilium/pkg/bpf/testdata'
```

This commit fixes the following issues:

- Don't execute docker interactive
- Use 'set -e' in check script

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

* ipsec: New IPsec secret bit to indicate per-node-pair keys

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

* ipsec: Move enableIPsecIPv{4,6} preconditions to caller

This small bit of refactoring will ease a subsequent commit.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

* ipsec: Compute per-node-pair IPsec keys

We need to ensure the (key used, sequence number) tuple for each
encrypted packet is always unique on the network. Today that's not the
case because the key is the same for all nodes and the sequence number
starts at 0 on node reboot.

To enable this, we will derive one key per node pair from …
sayboras pushed a commit that referenced this issue Jun 10, 2024
[ upstream commit 44e9005 ]

Currently in cases where reporters emit status on stopped reporters, a warning is logged.
This causes failures in CI related to endpoints, as those close their reporters after the endpoint is deleted.

This scenario is probably not a valuable log for users and generally such occurrences should be harmless.
This moves the log to debug level.

Fixes: #31147

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
(cherry picked from commit 612a151)
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
sayboras pushed a commit that referenced this issue Jun 10, 2024
[ upstream commit d20f15e ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
(cherry picked from commit 93e60a8)
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
ubergesundheit added a commit to giantswarm/cilium-upstream that referenced this issue Jun 17, 2024
* docs: Remove Hubble-OTel from roadmap

[ upstream commit 0976a1b6bd085f6c95a8f0ed4c24fac83e244bcd ]

The Hubble OTel repo is going to be archived, so it should be removed from the roadmap.

Signed-off-by: Bill Mulligan <billmulligan516@gmail.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

* bitlpm: Document and Fix Descendants Bug

[ upstream commit 9e89397d2d81fe915bfbd74d41bd72e5d0c6ad5b ]

Descendants and Ancestors cannot share the same
traversal method, because Descendants needs to be
able to select at least one in-trie key-prefix match
that may not be a full match for the argument key-prefix.
The old traversal method worked for the Descendants
method if there happened to be an exact match of the
argument key-prefix in the trie. These new tests ensure
that Descendants will still return a proper list of
Descendants even if there is not an exact match in the
trie.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

* Prepare for release v1.15.4

Signed-off-by: Andrew Sauber <2046750+asauber@users.noreply.github.com>

* install: Update image digests for v1.15.4

Generated from https://github.com/cilium/cilium/actions/runs/8654016733.

`quay.io/cilium/cilium:v1.15.4@sha256:b760a4831f5aab71c711f7537a107b751d0d0ce90dd32d8b358df3c5da385426`
`quay.io/cilium/cilium:stable@sha256:b760a4831f5aab71c711f7537a107b751d0d0ce90dd32d8b358df3c5da385426`

`quay.io/cilium/clustermesh-apiserver:v1.15.4@sha256:3fadf85d2aa0ecec09152e7e2d57648bda7e35bdc161b25ab54066dd4c3b299c`
`quay.io/cilium/clustermesh-apiserver:stable@sha256:3fadf85d2aa0ecec09152e7e2d57648bda7e35bdc161b25ab54066dd4c3b299c`

`quay.io/cilium/docker-plugin:v1.15.4@sha256:af22e26e927ec01633526b3d2fd5e15f2c7f3aab9d8c399081eeb746a4e0db47`
`quay.io/cilium/docker-plugin:stable@sha256:af22e26e927ec01633526b3d2fd5e15f2c7f3aab9d8c399081eeb746a4e0db47`

`quay.io/cilium/hubble-relay:v1.15.4@sha256:03ad857feaf52f1b4774c29614f42a50b370680eb7d0bfbc1ae065df84b1070a`
`quay.io/cilium/hubble-relay:stable@sha256:03ad857feaf52f1b4774c29614f42a50b370680eb7d0bfbc1ae065df84b1070a`

`quay.io/cilium/operator-alibabacloud:v1.15.4@sha256:7c0e5346483a517e18a8951f4d4399337fb47020f2d9225e2ceaa8c5d9a45a5f`
`quay.io/cilium/operator-alibabacloud:stable@sha256:7c0e5346483a517e18a8951f4d4399337fb47020f2d9225e2ceaa8c5d9a45a5f`

`quay.io/cilium/operator-aws:v1.15.4@sha256:8675486ce8938333390c37302af162ebd12aaebc08eeeaf383bfb73128143fa9`
`quay.io/cilium/operator-aws:stable@sha256:8675486ce8938333390c37302af162ebd12aaebc08eeeaf383bfb73128143fa9`

`quay.io/cilium/operator-azure:v1.15.4@sha256:4c1a31502931681fa18a41ead2a3904b97d47172a92b7a7b205026bd1e715207`
`quay.io/cilium/operator-azure:stable@sha256:4c1a31502931681fa18a41ead2a3904b97d47172a92b7a7b205026bd1e715207`

`quay.io/cilium/operator-generic:v1.15.4@sha256:404890a83cca3f28829eb7e54c1564bb6904708cdb7be04ebe69c2b60f164e9a`
`quay.io/cilium/operator-generic:stable@sha256:404890a83cca3f28829eb7e54c1564bb6904708cdb7be04ebe69c2b60f164e9a`

`quay.io/cilium/operator:v1.15.4@sha256:4e42b867d816808f10b38f555d6ae50065ebdc6ddc4549635f2fe50ed6dc8d7f`
`quay.io/cilium/operator:stable@sha256:4e42b867d816808f10b38f555d6ae50065ebdc6ddc4549635f2fe50ed6dc8d7f`

Signed-off-by: Andrew Sauber <2046750+asauber@users.noreply.github.com>

* cilium-dbg: remove section with unknown health status.

The "unknown" status simply refers to components that accept a health
reporter scope, but have not declared their state as being either "ok"
or degraded.

This is a bit confusing, as it does not necessarily indicate any problems
with Cilium.

In the future we may want to rework this state to distinguish between
unreported states and components that are "timing-out" reconciling a desired
state.

This PR simply removes displaying this information in `cilium-dbg
status`

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>

* chore(deps): update all github action dependencies

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* chore(deps): update azure/login action to v2.1.0

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* fix k8s versions tested in CI

- Remove older versions we do not officially support anymore on v1.15.
- Make K8s 1.29 the default version on all platforms.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>

* chore(deps): update docker.io/library/golang:1.21.9 docker digest to 81811f8

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* golangci: Enable errorlint

[ upstream commit d20f15ecab7c157f6246a07c857662bec491f6ee ]

Enable errorlint in golangci-lint to catch uses of improper formatters
for Go errors. This helps avoid unnecessary error/warning logs that
cause CI flakes, when benign error cases are not caught due to failing
error unwrapping when a string or value formatter has been used instead
of the dedicated `%w`.

Related: #31147
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

* errors: Precede with colon

[ upstream commit fe46958e319cd754696eb2dd71dbc0957fa13368 ]

Precede each `%w` formatter with a colon.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

* Docs: mark Tetragon as Stable

[ upstream commit c3108c9fa3e72b6821410e4534d835b7ccdeee25 ]

Signed-off-by: Natalia Reka Ivanko <natalia@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* Minor nit according to Liz's comment

[ upstream commit aea1ab82cca926d59f3434bfe5fe44ca9ab31e47 ]

Signed-off-by: Natalia Reka Ivanko <natalia@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* clustermesh: document global services limitations with KPR=false

[ upstream commit b8050e527aa916c97d51b43d809d8d9fb13b54ea ]

Global services do not work when KPR is disabled if accessed through
a NodePort, or from a host-netns pod, as kube-proxy doesn't know about
the remote backends. Let's explicit these limitations.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* alibabacloud/eni: avoid racing node mgr in test

[ upstream commit 56ca9238b3cece3d97beaaa6e8837e7340966ad7 ]

[ backporter's notes: minor adaptations as the InterfaceCandidates
  field was not scoped by IP family. ]

TestPrepareIPAllocation attempts to check that PrepareIPAllocation
produces the expected results.
it constructs, likely trying to avoid starting the background
maintenance jobs.

However, manager.Upsert _does_ asynchronously trigger pool maintenance,
which results in the automated creation of an ENI with different params
than the test expects, and hence assertion failures.

This patch avoids the race condition by explicitly setting the instance
API readiness to false, which causes background pool maintenance to be
delayed and hence guarantees that the PrepareIPAllocation call runs in
the environment expected.

The following was used to reproduce this flake:

  go test -c -race ./pkg/alibabacloud/eni && stress -p 50 ./eni.test

The likelihood of hitting this flake approximates 0.02%, hence
reproducing requires a reasonably large number of runs, as well as high
load on the system to increase the likelihood of the flake (since it
does depend on the test being somewhat starved for CPU).

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* ClusterMesh/helm: support multiple replicas

[ upstream commit df3c02f0044393b00a4fe2c3968a53466eb9032a ]

[ backporter's notes: dropped the session affinity changes, and
  backported only the introduction of the unique cluster id which,
  together with the interceptors backported as part of the next
  commit, prevents Cilium agents from incorrectly restarting an
  etcd watch against a different clustermesh-apiserver instance. ]

This commit makes changes to the helm templates for
clustermesh-apiserver to support deploying multiple replicas.

- Use a unique cluster id for etcd:

Each replica of the clustermesh-apiserver deploys its own discrete etcd
cluster. Utilize the K8s downward API to provide the Pod UUID to the
etcd cluster as an initial cluster token, so that each instance has a
unique cluster ID. This is necessary to distinguish connections to
multiple clustermesh-apiserver Pods using the same K8s Service.

- Use session affinity for the clustermesh-apiserver Service

Session affinity ensures that connections from a client are passed to
the same service backend each time. This will allow a Cilium Agent or
KVStoreMesh instance to maintain a connection to the same backend for
both long-living, streaming connections, such as watches on the kv
store, and short, single-response connections, such as checking the
status of a cluster. However, this can be unreliable if the l3/l4
loadbalancer used does not also implement sticky sessions to direct
connections from a particular client to the same cluster node.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* ClusterMesh: validate etcd cluster ID

[ upstream commit 174e721c0e693a4e0697e4abfd1434f9c796859b ]

[ backporter's notes: backported a stripped down version of the upstream
  commit including the introduction of the interceptors only, as fixing
  a bug occurring in a single clustermesh-apiserver configuration as
  well (during rollouts), by preventing Cilium agents from incorrectly
  restarting an etcd watch against a different clustermesh-apiserver
  instance. ]

In a configuration where there are multiple replicas of the
clustermesh-apiserver, each Pod runs its own etcd instance with a unique
cluster ID. This commit adds a `clusterLock` type, which is a wrapper
around a uint64 that can only be set once. `clusterLock` is used to
create gRPC unary and stream interceptors that are provided to the etcd
client to intercept and validate the cluster ID in the header of all
responses from the etcd server.

If the client receives a response from a different cluster, the
connection is terminated and restarted. This is designed to prevent
accepting responses from another cluster and potentially missing events
or retaining invalid data.

Since the addition of the interceptors allows quick detection of a
failover event, we no longer need to rely on endpoint status checks to
determine if the connection is healthy. Additionally, since service session
affinity can be unreliable, the status checks could trigger a false
failover event and cause a connection restart. To allow creating etcd
clients for ClusterMesh that do not perform endpoint status checks, the
option NoEndpointStatusChecks was added to ExtraOptions.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
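
A sketch of what such a set-once lock plus interceptor can look like (assuming etcd responses expose their origin via the standard response header; names are illustrative, not the actual Cilium code):

```go
package clustermesh

import (
	"context"
	"fmt"
	"sync/atomic"

	"go.etcd.io/etcd/api/v3/etcdserverpb"
	"google.golang.org/grpc"
)

// clusterLock pins the etcd cluster ID observed on the first response and
// rejects responses coming from any other cluster afterwards.
type clusterLock struct {
	id atomic.Uint64
}

func (c *clusterLock) validate(clusterID uint64) error {
	if c.id.CompareAndSwap(0, clusterID) || c.id.Load() == clusterID {
		return nil
	}
	return fmt.Errorf("etcd cluster ID changed: had %x, got %x", c.id.Load(), clusterID)
}

// unaryInterceptor checks the cluster ID carried in each response header and
// fails the call if it does not match the pinned one.
func (c *clusterLock) unaryInterceptor() grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		if err := invoker(ctx, method, req, reply, cc, opts...); err != nil {
			return err
		}
		if r, ok := reply.(interface{ GetHeader() *etcdserverpb.ResponseHeader }); ok {
			return c.validate(r.GetHeader().GetClusterId())
		}
		return nil
	}
}
```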

* bpf: Update checkpatch image

[ upstream commit 7a98d56c101579ceda2ae6f88c187525ba785a38 ]

Update checkpatch image to pull the latest changes we've added: namely,
remove the wrapping of individual patch results in GitHub's workflows
interface, as it's annoying to click many times to find the commits with
issues.

Signed-off-by: Quentin Monnet <qmo@qmon.net>

* images: Update bpftool image

[ upstream commit f3e65bccfaee2ae058aac7aabd4d6efe1cf6dd61 ]

We want bpftool to be able to dump netkit programs, let's update the
image with a version that supports it.

Signed-off-by: Quentin Monnet <qmo@qmon.net>

* images: update cilium-{runtime,builder}

Signed-off-by: Quentin Monnet <qmo@qmon.net>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* proxy: skip rule removal if address family is not supported

Fixes: #31944

Signed-off-by: Robin Gögge <r.goegge@isovalent.com>

* chore(deps): update all-dependencies

Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* chore(deps): update hubble cli to v0.13.3

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* chore(deps): update all github action dependencies

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* envoy: Bump envoy version to v1.27.5

This is mainly to address the below CVE

https://github.com/envoyproxy/envoy/security/advisories/GHSA-3mh5-6q8v-25wj

Related release: https://github.com/envoyproxy/envoy/releases/tag/v1.27.5

Signed-off-by: Tam Mach <tam.mach@cilium.io>

* tables: Sort node addresses also by public vs private IP

[ upstream commit 7da651454c788e6b240d253aa9cb94b3f28fc500 ]

The firstGlobalAddr in pkg/node tried to pick public IPs over private IPs
even after picking by scope. Include this logic in the address sorting
and add a test case to check the different sorting predicates.

For NodePort pick the first private address if any, otherwise pick
the first public address.

Fixes: 5342d0104f ("datapath/tables: Add Table[NodeAddress]")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>
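
The NodePort part of the predicate reduces to "private before public". A toy sketch of that ordering only (not the full scope-aware sort in the tables package):

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// sortForNodePort orders addresses so private addresses come first; a
// NodePort consumer then simply picks the first element, falling back to a
// public address only when no private one exists.
func sortForNodePort(addrs []netip.Addr) {
	sort.SliceStable(addrs, func(i, j int) bool {
		return addrs[i].IsPrivate() && !addrs[j].IsPrivate()
	})
}

func main() {
	addrs := []netip.Addr{
		netip.MustParseAddr("203.0.113.10"), // public
		netip.MustParseAddr("10.0.0.5"),     // private
	}
	sortForNodePort(addrs)
	fmt.Println("NodePort picks:", addrs[0]) // 10.0.0.5
}
```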

* gha: configure fully-qualified DNS names as external targets

[ upstream commit 100e6257952a1466b1b3fb253959efff16b74c92 ]

This prevents possible shenanigans caused by search domains possibly
configured on the runner, and propagated to the pods.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* docs: Document six-month feature release cadence

[ upstream commit 302604da6b82b808bf65e652698514b5fb631744 ]

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* docs: Fix github project link

[ upstream commit 6f0a0596a568a0f8040bda03f4ad361bb1424db9 ]

GitHub changed the URL for the classic projects that we are currently
using to track patch releases. Fix the link.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* fqdn: Fix goroutine leak in transparent-mode

[ upstream commit 804d5f0f22a82495c5d12d9a25d0fe7377615e6f ]

When the number of concurrent DNS requests was moderately high, there was a
chance that some of the goroutines would get stuck waiting for a response.
Contains fix from https://github.com/cilium/dns/pull/10

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>

* xds: Return nil error after context cancel

[ upstream commit c869e6c057be8eed6d0960a06fbf7d9706b2bc83 ]

Do not return an error from xds server when the context is cancelled, as
this is part of normal operation, and we test for this in
server_e2e_test.

This resolves a test flake:

panic: Fail in goroutine after  has completed

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* test: Wait for stream server checks to complete

[ upstream commit 3cde59c6e67b836ef1ea8c47b772d0431254a852 ]

The main test goroutine might be completed before checks on the server
goroutine are completed, hence cause the below panic issue. This commit
is to defer the streamDone channel close to make sure the error check on
the stream server is done before returning from the test. We keep the
time check on the wait in the end of each test to not stall the tests in
case the stream server fails to exit.

Panic error
```
panic: Fail in goroutine after Test/ServerSuite/TestRequestStaleNonce has completed
```

Testing was done as per below:
```
$ go test -count 500 -run Test/ServerSuite/TestRequestStaleNonce ./pkg/envoy/xds/...
ok      github.com/cilium/cilium/pkg/envoy/xds  250.866s
```

Fixes: #31855
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* stream_test: Return io.EOF upon channel close

[ upstream commit df6afbd306d160179fd07ef6b4910b42967b5fdd ]

[ backporter's note: moved changes from pkg/envoy/xds/stream_test.go to
pkg/envoy/xds/stream.go as v1.15 doesn't have the former file ]

Return io.EOF if test channel was closed, rather than returning a nil
request. This mimics the behavior of generated gRPC code, which never
returns nil request with a nil error.

This resolves a test flake with this error logs:

time="2024-04-16T08:46:23+02:00" level=error msg="received nil request from xDS stream; stopping xDS stream handling" subsys=xds xdsClientNode="node0~10.0.0.0~node0~bar" xdsStreamID=1

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
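
A reduced sketch of the behaviour being mimicked, with an illustrative channel-backed test stream (not the actual pkg/envoy/xds types):

```go
package main

import (
	"fmt"
	"io"
)

type request struct{ nonce string }

// testStream mimics a generated gRPC stream backed by a channel in tests.
type testStream struct {
	requests chan *request
}

// Recv returns io.EOF once the channel is closed, matching generated gRPC
// streams, which never return a nil request together with a nil error.
func (s *testStream) Recv() (*request, error) {
	req, ok := <-s.requests
	if !ok {
		return nil, io.EOF
	}
	return req, nil
}

func main() {
	s := &testStream{requests: make(chan *request, 1)}
	s.requests <- &request{nonce: "1"}
	close(s.requests)
	for {
		req, err := s.Recv()
		if err == io.EOF {
			fmt.Println("stream closed")
			return
		}
		fmt.Println("got request with nonce", req.nonce)
	}
}
```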

* test: Eliminate XDS CacheUpdateDelay

[ upstream commit d3c1fee6f7cd69ea7975193b25dd881c0194889d ]

Speed up tests by eliminating CacheUpdateDelay, as it is generally not
needed.

When needed, replace with IsCompletedInTimeChecker that waits for up to
MaxCompletionDuration before returning, in contrast with
IsCompletedChecker that only returns the current state without any wait.

This change makes the server_e2e_test tests run >1000x faster.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* test: Eliminate duplicate SendRequest

[ upstream commit 4efe9ddc35f918d870ff97d148c8d0e2255b4a73 ]

TestRequestStaleNonce test code was written with the assumption that no
response would be received for a request with a stale nonce, and a second
SendRequest was done right after with the correct nonce value. This
caused two responses to be returned, and the first one could have been
with the old version of the resources.

Remove this duplicate SendRequest.

This resolves test flakes like this:

        --- FAIL: Test/ServerSuite/TestRequestStaleNonce (0.00s)
            server_e2e_test.go:784:
                ... response *discoveryv3.DiscoveryResponse = &discoveryv3.DiscoveryResponse{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), VersionInfo:"3", Resources:[]*anypb.Any{(*anypb.Any)(0x40003a63c0), (*anypb.Any)(0x40003a6410)}, Canary:false, TypeUrl:"type.googleapis.com/envoy.config.v3.DummyConfiguration", Nonce:"3", ControlPlane:(*corev3.ControlPlane)(nil)} ("version_info:\"3\" resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource1\"}} resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource0\"}} type_url:\"type.googleapis.com/envoy.config.v3.DummyConfiguration\" nonce:\"3\"")
                ... VersionInfo string = "4"
                ... Resources []protoreflect.ProtoMessage = []protoreflect.ProtoMessage{(*routev3.RouteConfiguration)(0xe45380)}
                ... Canary bool = false
                ... TypeUrl string = "type.googleapis.com/envoy.config.v3.DummyConfiguration"

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* test: increase xds stream timeout to avoid test flakes

[ upstream commit 75d144c56579c8e0349b9b1446514c335a8d0909 ]

Stream timeout is a duration we use in tests to make sure the stream does
not stall for too long. In production we do not have such a timeout at
all, and in fact the requests are long-lived and responses are only sent
when there is something (new) to send.

Test stream timeout was 2 seconds, and it would occasionally cause a test
flake, especially if debug logging is enabled. This seems to happen due
to goroutine scheduling, and for this reason debug logging should not be
on for these tests.

Bump the test stream timeout to 4 seconds to further reduce the chance of
a test flake due to it.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* gha: configure max unavailable in clustermesh upgrade/downgrade

[ upstream commit 350e9d33ed8ae99d0d9bb832c0cf3814284c39ac ]

The goal is to slow down the rollout process, to better highlight
possible connection disruption occurring in the meantime. At the same
time, this also reduces the overall CPU load caused by datapath
recompilation, which is a possible additional cause of connection
disruption flakiness.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* gha: explicitly configure IPAM mode in clustermesh upgrade/downgrade

[ upstream commit 7d2505ee18a0015df500de8a4618f94452e1d034 ]

The default IPAM mode is cluster-pool, which the Cilium CLI automatically
overrides to kubernetes when running on kind.
However, the default helm value gets restored upon upgrade due to
--reset-values, causing confusion and possible issues. Hence, let's
explicitly configure it to kubernetes, to prevent changes.

Similarly, let's configure a single replica for the operator.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* gha: fix incorrectly named test in clustermesh upgrade/downgrade

[ upstream commit a0d7d3767f26276e3e70e1ec026ab2001ced3572 ]

So that it gets actually executed.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* gha: don't wait for hubble relay image in clustermesh upgrade/downgrade

[ upstream commit 1e28a102213700d65cf5a48f95257ca47a2afaf5 ]

Hubble relay is not deployed in this workflow, hence it doesn't make
sense to wait for the image availability.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* gha: enable hubble in clustermesh upgrade/downgrade

[ upstream commit 0c211e18814a1d22cbb6f9c6eadf7017e8d234f9 ]

As it simplifies troubleshooting possible connection disruptions.
However, let's configure monitor aggregation to medium (i.e., the
maximum, and default value) to avoid the performance penalty due
to the relatively high traffic load.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* labels: Add controller-uid into default ignore list

[ upstream commit aef6814f7e895d644234c679a956ef86c832a1b8 ]

[ backporter's note: minor conflicts in pkg/k8s/apis/cilium.io/const.go ]

Having the uid in security labels will significantly increase the number
of identities, not to mention the high cardinality in metrics. This
commit adds *controller-uid related labels to the default exclusion
list.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>

* gha: drop double installation of Cilium CLI in conformance-eks

[ upstream commit 9dc89f793de82743d88448aef4c23095c7fd1130 ]

Fixes: 5c06c8e2ea5c ("ci-eks: Add IPsec key rotation tests")
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* loader: sanitize bpffs directory strings for netdevs

[ upstream commit 9b35bc5e8b42e0a8c9a94a6fd59f49f2aa0f1ab9 ]

bpffs directory paths cannot contain the character ".", thus we must
sanitize device names that contain any "." characters. Our solution is
to replace "." with "-". This introduces a risk of naming collisions,
e.g. "eth.0" and "eth-0", but in practice the probability of this
happening should be very small.
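
A minimal Go sketch of the sanitization; the function name is illustrative:

```
package main

import (
	"fmt"
	"strings"
)

// sanitizeForBPFFS replaces "." with "-" so the device name can be used as
// a bpffs path component; collisions such as "eth.0" vs "eth-0" are
// possible but unlikely in practice.
func sanitizeForBPFFS(netdev string) string {
	return strings.ReplaceAll(netdev, ".", "-")
}

func main() {
	fmt.Println(sanitizeForBPFFS("eth.0")) // "eth-0"
	fmt.Println(sanitizeForBPFFS("eth0"))  // unchanged
}
```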

Fixes: #31813

Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>

* clustermesh-apiserver: use distroless/static-debian11 as the base image

[ upstream commit 1d5615771b3dab1d5eeb9fda18ae0d98a22de73d ]

[ backporter's notes: replaced the nonroot base image with the root one,
  to avoid requiring the Helm changes to configure the fsGroup, which
  could cause issues if users only updated the image version, without
  a full helm upgrade. ]

gops needs to write data (e.g., the PID file) to the file-system, which
turned out to be tricky when using scratch as base image, in case the
container is then run using a non-root UID.

Let's use the most basic version of a distroless image instead, which
contains:

- ca-certificates
- A /etc/passwd entry for a root, nonroot and nobody users
- A /tmp directory
- tzdata

This aligns the clustermesh-apiserver image with the Hubble Relay one,
and removes the need for manually importing the CA certificates. The
GOPS_CONFIG_DIR is explicitly configured to use a temporary directory,
to prevent permission issues depending on the UID configured to run
the entrypoint.

Finally, we explicitly configure the fsGroup as part of the
podSecurityContext, to ensure that mounted files are accessible
by the non-root user as well.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* helm: same securityContext for etcd-init and etcd cmapiserver containers

[ upstream commit 85218802f37ade30d2c14197e797b9ee6b8f832c ]

Configure the specified clustermesh-apiserver etcd container security
context for the etcd-init container as well, to make sure that they
always match, and prevent issues caused by the init container creating
files that cannot be read/written by the main instance later on.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* envoy: Update envoy 1.27.x to 1.28.3

This is part of regular maintenance, as Envoy v1.27 will be EOL in
this coming July.

Upstream release: https://github.com/envoyproxy/envoy/releases/tag/v1.28.3
Upstream release schedule: https://github.com/envoyproxy/envoy/blob/main/RELEASES.md#major-release-schedule

Signed-off-by: Tam Mach <tam.mach@cilium.io>

* ingress translation: extract http and https filterchains functions

[ upstream commit 215f5e17c3ca0b55f7534285a64652bb23614ca7 ]

This commit extracts the creation of envoy listener filterchains
for HTTP and HTTPS into dedicated functions `httpFilterChain` &
`httpsFilterChains`.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

* ingress translation: extract tls-passthrough filterchains function

[ upstream commit db72b8203af794081883cb0e843e0f7ebae9b27d ]

This commit extracts a function `tlsPassthroughFilterChains` that contains
the logic of building the Envoy listener filterchains for Ingress TLS passthrough
and Gateway API `TLSRoute`.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

* ingress translation: merge tlspassthrough and http envoy listener

[ upstream commit 9797fa1bd71c290e4cab1234d485fd2b31877a6f ]

Currently, the translation from Ingress Controller and Gateway API resources
into Envoy resources creates two different Envoy listeners for TLS Passthrough
(Ingress TLS Passthrough & Gateway API TLSRoute) and HTTP (non-TLS and TLS) handling.

This comes with some problems when combining multiple Ingresses of different "types"
into one `CiliumEnvoyConfig` when using the shared Ingress feature.

- The names of the listeners are the same (`listener`): only one listener gets applied
- Transparent proxying no longer works because TLS passthrough and HTTP with TLS use the
  frontend port `443` on the loadbalancer service. The current implementation of transparently
  forwarding the requests on the service to the corresponding Envoy listener can't distinguish
  between the two existing Envoy listeners.

This leads to situations where requests fail (listener not applied at all) or are forwarded to
the wrong backend (e.g. HTTP with TLS matching the existing TLS passthrough listener filter chain).

Therefore, this commit combines the two Envoy listeners into one - being responsible for HTTP, HTTPS
and TLS passthrough. The corresponding filterchains match the requests by their SNI (HTTPS and TLS passthrough).

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

* ingress passthrough: set hostname in TLSRoute hostnames

[ upstream commit b68d086f8432c305c200bea956ccbc9ffde0a3ec ]

Currently, ingestion of Ingress doesn't set the hostname of the Ingress route
in the internal model.

As a result, the resulting Envoy listener filterchain doesn't contain the hostname
in its server names. In combination with other filterchains (HTTP and HTTPS), this is
not restrictive enough.

Therefore, this commit adds the hostname to the hostnames within TLSRoute (internal
model).

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

* proxy/routes: Rename fromProxyRule to fromIngressProxyRule

[ upstream commit: 287dd6313b70de16c65ca235e3d6446687e52211 ]

Because we are introducing fromEgressProxyRule soon, it's better to make
clear that the fromProxyRule is for ingress proxy only.

This commit also changes its mark from MagicMarkIsProxy to
MagicMarkIngress. They hold the same value 0xA00 but have different
semantics.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>

* proxy/routes: Rename RulePriorityFromProxyIngress

[ upstream commit: 1dd21795ccbebd89f51db292f3e7a8f52e37eed1 ]

No logic changes, just rename it to "RulePriorityFromProxy" without
"Ingress" suffix, because egress rule is using the same priority.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>

* proxy/routes: Introduce fromEgressProxyRule

[ upstream commit: 7d278affe093889bbc245303628caf2677b93cac ]

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>

* proxy/routes: Remove fromEgressProxyRule for cilium downgrade

[ upstream commit: 53133ff80ef4408ae673e5aa030f645bcd449afa ]

Although we don't install fromEgressProxyRule for now, this commit
insists on removing it to make sure a further downgrade can go smoothly.

Soon we'll have another PR to install fromEgressProxyRule, and cilium
downgrade from that PR to the branch tip (patch downgrade, 1.X.Y ->
1.X.{Y-1}) will be broken if we don't handle the new ip rule carefully.

Without this patch, downgrading from a higher version would leave
fromEgressProxyRule behind on the lower-version cilium, leaving the
cluster in the broken state of having a stale ip rule without the other
necessary settings (iptables), breaking connectivity.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>

* chore(deps): update all-dependencies

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* route: dedicated net ns for each subtest of runListRules

[ upstream commit 6c6c121361e97521b1d2b70f24cfa33338dbf297 ]

Currently, there are cases where the test TestListRules/return_all_rules fails
with the following error:

```
--- FAIL: TestListRules (0.02s)
    --- FAIL: TestListRules/returns_all_rules#01 (0.00s)
        route_linux_test.go:490: expected len: 2, got: 3
              []netlink.Rule{
              {
            - Priority: -1,
            + Priority: 9,
              Family:   10,
            - Table:    255,
            + Table:    2004,
            - Mark:     -1,
            + Mark:     512,
            - Mask:     -1,
            + Mask:     3840,
              Tos:      0,
              TunID:    0,
              ... // 11 identical fields
              IPProto:  0,
              UIDRange: nil,
            - Protocol: 2,
            + Protocol: 0,
              },
            + s"ip rule 100: from all to all table 255",
              {Priority: 32766, Family: 10, Table: 254, Mark: -1, ...},
              }
```

It looks like there's a switch of the network namespace during the test
execution.

Therefore, this commit locks the OS thread for the execution of the test
that runs in a dedicated network namespace. In addition, each sub-test of
the table-driven test set runs in its own network namespace, as they run
in their own goroutines.
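
A sketch of the pattern using github.com/vishvananda/netns; the helper below is illustrative, requires root, and stands in for Cilium's own test helpers:

```
package main

import (
	"fmt"
	"runtime"

	"github.com/vishvananda/netns"
)

// runInFreshNetNS pins the goroutine to its OS thread, switches into a new
// network namespace, runs fn, and restores the original namespace, so a
// concurrently running subtest cannot observe rules from another namespace.
func runInFreshNetNS(fn func() error) error {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	orig, err := netns.Get()
	if err != nil {
		return err
	}
	defer orig.Close()

	newNS, err := netns.New() // creates and enters a fresh netns
	if err != nil {
		return err
	}
	defer newNS.Close()
	defer netns.Set(orig) // restore before unlocking the thread

	return fn()
}

func main() {
	err := runInFreshNetNS(func() error {
		fmt.Println("listing rules in a dedicated namespace")
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```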

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* CI: bump default FQDN datapath timeout from 100 to 250ms

[ upstream commit 34caeb233b0d47a0e4f9553fbd7ac532e0c1a5f8 ]

This timeout can be CPU sensitive, and the CI environments can be CPU
constrained.

Bumping this timeout ensures that performance regressions will still be
caught, as those tend to cause delays of 1+ seconds. This will, however,
cut down on CI flakes due to noise.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* ci: Set hubble.relay.retryTimeout=5s

[ upstream commit 2428b08ad7032db7c2b92744a149a1615cc82144 ]

I noticed CI seemed to flake on hubble-relay being ready.
Based on the logs it was due to it being unable to connect to its
peers.

Based on the sysdump the endpoints did eventually come up, so I think it
just needs to retry more often; the default is 30s, so lower it to 5s
in CI.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* Fix helm chart incompatible types for comparison

[ upstream commit 19f2268dfab23e37ae5a6d4f6e9379c4327007f6 ]

Fixes: #32024

Signed-off-by: lou-lan <loulan@loulan.me>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* Agent: add kubeconfigPath to initContainers

[ upstream commit 284ee43f82ad8230ca013f283bb9ad141f5531df ]

This commit adds the missing pass of
the Helm value `kubeConfigPath` to the
initContainer of the Cilium-agent.

Signed-off-by: darox <maderdario@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* Remove aks-preview from AKS workflows

[ upstream commit a758d21bbae09ab0c4a8cd671e05e043a8c1ea5a ]

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* vendor: Bump cilium/dns to fix bug where timeout was not respected

[ upstream commit c76677d81aa58c00698818bc1be55d1f5b4d0b0d ]

This pulls in cilium/dns#11 which fixes a bug where the `SharedClient`
logic did not respect the `c.Client.Timeout` field.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* dnsproxy: Fix bug where DNS request timed out too soon

[ upstream commit 931b8167ea29bfd3ae8e6f11f41a8a1c531c33c8 ]

This fixes a bug where DNS requests would timeout after 2 seconds,
instead of the intended 10 seconds. This resulted in a `Timeout waiting
for response to forwarded proxied DNS lookup` error message whenever the
response took longer than 2 seconds.

The `dns.Client` used by the proxy is [already configured][1] to use
`ProxyForwardTimeout` value of 10 seconds, which would apply also to the
`dns.Client.DialTimeout`, if it was not for the custom `net.Dialer` we
use in Cilium. The logic in [dns.Client.getTimeoutForRequest][2]
overwrites the request timeout with the timeout from the custom
`Dialer`. Therefore, the intended `ProxyForwardTimeout` 10 second
timeout value was overwritten with the much shorter `net.Dialer.Timeout`
value of two seconds. This commit fixes that issue by using
`ProxyForwardTimeout` for the `net.Dialer` too.
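
A simplified Go sketch of the effect; the constants are illustrative stand-ins for the values described above:

```
package main

import (
	"fmt"
	"net"
	"time"
)

const (
	dialerTimeout       = 2 * time.Second  // old net.Dialer timeout
	proxyForwardTimeout = 10 * time.Second // intended per-request timeout
)

func main() {
	// Before: the DNS client derives the per-request timeout from the custom
	// dialer when one is set, so requests effectively timed out after 2s.
	oldDialer := &net.Dialer{Timeout: dialerTimeout}

	// After: give the dialer the same timeout as ProxyForwardTimeout, so the
	// derived request timeout matches the intended 10s.
	newDialer := &net.Dialer{Timeout: proxyForwardTimeout}

	fmt.Println("old effective timeout:", oldDialer.Timeout)
	fmt.Println("new effective timeout:", newDialer.Timeout)
}
```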

Fixes: cf3cc16289b7 ("fqdn: dnsproxy: fix forwarding of the original security identity for TCP")

[1]: https://github.com/cilium/cilium/blob/50943dbc02496c42a4375947a988fc233417e163/pkg/fqdn/dnsproxy/proxy.go#L1042
[2]: https://github.com/cilium/cilium/blob/94f6553f5b79383b561e8630bdf40bd824769ede/vendor/github.com/cilium/dns/client.go#L405

Reported-by: Andrii Iuspin <andrii.iuspin@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* ipam: retry netlink.LinkList call when setting up ENI devices

[ upstream commit cf9bde54bd6eb6dbebe6c5f3e44500019b33b524 ]

LinkList is prone to interrupts which are surfaced by the netlink
library. This leads to stability issues when using the ENI datapath.
This change makes it part of the retry loop in waitForNetlinkDevices.
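
A sketch of the retry idea using github.com/vishvananda/netlink; the attempt count and delay are illustrative, not the values used by waitForNetlinkDevices:

```
package main

import (
	"fmt"
	"time"

	"github.com/vishvananda/netlink"
)

// listLinksWithRetry retries netlink.LinkList, which can fail with a
// transient interrupt surfaced by the netlink library.
func listLinksWithRetry(attempts int, delay time.Duration) ([]netlink.Link, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		links, err := netlink.LinkList()
		if err == nil {
			return links, nil
		}
		lastErr = err
		time.Sleep(delay)
	}
	return nil, fmt.Errorf("listing links failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	links, err := listLinksWithRetry(5, 200*time.Millisecond)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range links {
		fmt.Println(l.Attrs().Name)
	}
}
```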

Fixes: #31974
Signed-off-by: Jason Aliyetti <jaliyetti@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* bpf: nodeport: fix double-SNAT for LB'ed requests in from-overlay

[ upstream commit e984f657636c76ec9a8ac3c8b44a9b014831cff2 ]

The blamed commit attempted to remove all usage of MARK_MAGIC_SNAT_DONE
inside to-overlay, so that we can use MARK_MAGIC_OVERLAY instead.

But it accidentally also touched the LB's NAT-egress path (which is used
by from-overlay to forward LB'ed requests to the egress path, on their way
to a remote backend). Here we need to maintain the usage of
MARK_MAGIC_SNAT_DONE, so that the egress program (either to-overlay or
to-netdev) doesn't apply a second SNAT operation. Otherwise the LB node is
unable to properly RevNAT the reply traffic.

Fix this by restoring the code that sets the MARK_MAGIC_SNAT_DONE flag
for traffic in from-overlay. When such a request is forwarded to the
overlay interface, then to-overlay will consume the mark, override it with
MARK_MAGIC_OVERLAY and skip the additional SNAT step.

Fixes: 5b37dc9d6a4f ("bpf: nodeport: avoid setting SNAT-done mark for to-overlay")
Reported-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* workflows: Fix CI jobs for push events on private forks

[ upstream commit 715906adf2388ef238bf189830434324780c927c ]

Those workflows are failing to run on push events in private forks. They
fail in the "Deduce required tests from code changes" step, in which we
compute a diff of changes. To compute that diff, the dorny/paths-filter
GitHub action needs to be able to checkout older git references.
Unfortunately, we checkout only the latest reference and drop
credentials afterwards.

This commit fixes it by checking out the full repository. This will take
a few seconds longer so probably not a big issue.

Reported-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* docs: Fix prometheus port regex

[ upstream commit 49334a5b9b79b3804865a084e5b4b2e8909cef6b ]

Signed-off-by: James Bodkin <james.bodkin@amphora.net>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* endpoint: Skip build queue warning log if context is canceled

[ upstream commit 8f0b10613443ffe30bcdc958addcab91416cf316 ]

The warning log on failure to queue endpoint build is most likely not
meaningful when the context is canceled, as this typically happens when
the endpoint is deleted.

Skip the warning log if the error is context.Canceled. This fixes CI
flakes like this:

    Found 1 k8s-app=cilium logs matching list of errors that must be investigated:
    2024-04-22T07:48:47.779499679Z time="2024-04-22T07:48:47Z" level=warning msg="unable to queue endpoint build" ciliumEndpointName=kube-system/coredns-76f75df574-9k8sp containerID=3791acef13 containerInterface=eth0 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=637 error="context canceled" identity=25283 ipv4=10.0.0.151 ipv6="fd02::82" k8sPodName=kube-system/coredns-76f75df574-9k8sp subsys=endpoint
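
A minimal Go sketch of the check; queueBuild is a made-up stand-in for the endpoint build queueing:

```
package main

import (
	"context"
	"errors"
	"log"
)

// queueBuild stands in for queueing an endpoint build; illustrative only.
func queueBuild(ctx context.Context) error {
	return ctx.Err()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the endpoint being deleted mid-flight

	if err := queueBuild(ctx); err != nil {
		// Only warn on unexpected failures; a canceled context usually just
		// means the endpoint went away.
		if !errors.Is(err, context.Canceled) {
			log.Printf("warning: unable to queue endpoint build: %v", err)
		}
	}
}
```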

Fixes: #31827
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* build(deps): bump pydantic from 2.3.0 to 2.7.1 in /Documentation

[ upstream commit b971e46f02be77e02195ae7654fa3ad99018e00e ]

Bumps [pydantic](https://github.com/pydantic/pydantic) from 2.3.0 to 2.4.0.
- [Release notes](https://github.com/pydantic/pydantic/releases)
- [Changelog](https://github.com/pydantic/pydantic/blob/main/HISTORY.md)
- [Commits](https://github.com/pydantic/pydantic/compare/v2.3.0...v2.4.0)

[ Quentin: The pydantic update requires an update of pydantic_core, too.
    Bump both packages to their latest available version (pydantic 2.7.1
    and pydantic_core 2.18.2). ]

---
updated-dependencies:
- dependency-name: pydantic
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* ci: update docs-builder

[ upstream commit 6e53ad73238d244c0dac5dccabc375729225fdae ]

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* install/kubernetes: update nodeinit image to latest version

[ upstream commit a2069654ea618fa80d2175f9f453ff50c183e7bf ]

For some reason the renovate configuration added in commit ac804b6980aa
("install/kubernetes: use renovate to update
quay.io/cilium/startup-script") did not pick up the update. Bump the
image manually for now while we keep investigating.

Signed-off-by: Tobias Klauser <tobias@cilium.io>

* daemon: don't go ready until CNI configuration has been written

[ upstream commit 77b1e6cdaa6c16c9e17cc54c9e3548b43e4de262 ]

If the daemon is configured to write a CNI configuration file, we should
not go ready until that CNI configuration file has been written. This
prevents a race condition where the controller removes the taint from a
node too early, meaning pods may be created with a different CNI
provider.

In #29405, Cilium was configured in chaining mode, but the "primary" CNI
provider hadn't written its configuration yet. This caused the not-ready
taint to be removed from the node too early, and pods were created in a
bad state.

By hooking in the CNI cell's status in the daemon's Status type, we
prevent the daemon's healthz endpoint from returning a successful
response until the CNI cell has been successful.
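
A toy Go sketch of the shape of such a readiness gate; the types, flag, and port are made up for the example, not the daemon's actual Status implementation:

```
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync/atomic"
)

// cniStatus is a toy stand-in for the CNI cell's status; illustrative only.
type cniStatus struct{ configWritten atomic.Bool }

// healthz refuses to report ready until the CNI configuration file has been
// written, so the not-ready taint cannot be removed from the node too early.
func healthz(cni *cniStatus) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !cni.configWritten.Load() {
			http.Error(w, "CNI configuration not yet written", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	}
}

func main() {
	cni := &cniStatus{}
	http.HandleFunc("/healthz", healthz(cni))
	// Elsewhere, the CNI writer flips the flag once the file is on disk:
	// cni.configWritten.Store(true)
	log.Fatal(http.ListenAndServe("127.0.0.1:9876", nil))
}
```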

Fixes: #29405

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* enable kube cache mutation detector

[ upstream commit de075aaf9b5f77e530f56b74b42c9b78d0b4b1bd ]

This file was accidentally removed while refactoring the commits before
merging the PR.

Fixes: 8021e953e630 (".github/actions: enable k8s cache mutation detector in the CI")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* envoy: pass idle timeout configuration option to cilium configmap

[ upstream commit 889b688668ec29f18e98a66d2df597006cfa9266 ]

Currently, Envoy's idle timeout configuration option can be configured in
Helm `envoy.idleTimeoutDurationSeconds` and the agent/operator Go flag
`proxy-idle-timeout-seconds`.

Unfortunately, changing the value in the Helm values doesn't have an effect
when running in embedded mode, because the Helm value isn't passed to the
Cilium ConfigMap.

This commit fixes this by setting the value in the ConfigMap.

Fixes: #25214

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* ci: Increase timeout for images for l4lb test

[ upstream commit 8cea46d58996f248df2a1c8f706b89dcb43048d5 ]

Followup for #27706

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* ci: update docs-builder

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* l7 policy: add possibility to configure Envoy proxy xff-num-trusted-hops

[ upstream commit 1bc2c75da2d3430b752645854a5693ac05817c4d ]

Currently, when L7 policies (egress or ingress) are enforced for traffic
between Pods, Envoy might change x-forwarded-for related headers, because the
corresponding Envoy listeners don't trust downstream headers when `XffNumTrustedHops`
is set to `0`.

e.g. `x-forwarded-proto` header:

> Downstream x-forwarded-proto headers will only be trusted if xff_num_trusted_hops is non-zero. If xff_num_trusted_hops is zero, downstream x-forwarded-proto headers and :scheme headers will be set to http or https based on if the downstream connection is TLS or not.

https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/headers#x-forwarded-proto

This might be problematic if L7 policies are used for egress traffic for Pods from
a non-Cilium ingress controller (e.g. nginx). If the Ingress Controller is terminating TLS
traffic and forwards the protocol via `x-forwarded-proto=https`, Cilium Envoy Proxy changes
this header to `x-forwarded-proto=http` (if no tls termination itself is used in the policy configuration).

This breaks applications that depend on the forwarded protocol.

Therefore, this commit introduces two new config flags `proxy-xff-num-trusted-hops-ingress` and
`proxy-xff-num-trusted-hops-egress` that configure the property `XffNumTrustedHops` on the respective
L7 policy Envoy listeners.

For backwards compatibility and security reasons, the values still default to `0`.

Note: It's also possible to configure these values via Helm (`envoy.xffNumTrustedHopsL7PolicyIngress` &
`envoy.xffNumTrustedHopsL7PolicyEgress`).

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>

* chore(deps): update docker.io/library/golang:1.21.9 docker digest to d83472f

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* loader: implement seamless tcx to legacy tc downgrade

[ upstream commit d4a159a2e772f87afdf3ef6a2d3cb56799cc4f5a ]

Due to confusion around the semantics of the tcx API, we believed downgrading
tcx to legacy tc attachments couldn't be done seamlessly. This doesn't seem to
be the case. As documented in the code, install the legacy tc attachment first,
then remove any existing tcx attachment for the program.
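
Only the ordering matters here; a Go sketch with hypothetical stubs attachLegacyTC and removeTCX standing in for the loader's real attachment logic:

```
package main

import "fmt"

// attachLegacyTC and removeTCX are hypothetical stubs; they only exist to
// show the order of operations.
func attachLegacyTC(iface, prog string) error {
	fmt.Printf("attach %s to %s via legacy tc\n", prog, iface)
	return nil
}

func removeTCX(iface, prog string) error {
	fmt.Printf("remove tcx attachment of %s on %s\n", prog, iface)
	return nil
}

// downgradeToLegacyTC installs the legacy tc attachment first and removes
// the tcx attachment afterwards, so the program never disappears from the
// interface during the downgrade.
func downgradeToLegacyTC(iface, prog string) error {
	if err := attachLegacyTC(iface, prog); err != nil {
		return err
	}
	return removeTCX(iface, prog)
}

func main() {
	if err := downgradeToLegacyTC("eth0", "cil_from_netdev"); err != nil {
		fmt.Println("error:", err)
	}
}
```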

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

* contrib: bump etcd image version in cilium-etcd systemd service

[ upstream commit 26bb9f05677f7e47f6011baa92e90a054e541187 ]

As per Mahé's debugging in https://github.com/cilium/cilium/pull/31823#issuecomment-2051601958,
we need a more recent etcd image for compatibility with docker v26.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>

* chore(deps): update stable lvh-images

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* chore(deps): update dependency cilium/cilium-cli to v0.16.6

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* chore(deps): update all github action dependencies

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* envoy: Update to use source port in connection pool hash

[ upstream commit d5efd282b8bdbe9ee99cf82c02df07555e6fdb9c ]

Update Envoy image to a version that includes the source port in upstream
connection pool hash, so that each unique downstream connection gets a
dedicated upstream connection.

Fixes: #27762
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

* chore(deps): update docker.io/library/ubuntu:22.04 docker digest to a6d2b38

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* bpf: lb: un-break terminating backends for service without backend

[ upstream commit 0de6f0f230be10e084f30fb3128c215edde1611f ]

[ backporter's notes: add the ->count checks in slightly different
  locations, as we're missing a bunch of LB cleanup PRs. ]

Continue to forward traffic for established connections, even when a
service loses its last active backends.

This needs a small adjustment in a BPF test that was relying on this
behaviour.

Fixes: 183501124869 ("bpf: drop SVC traffic if no backend is available")
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>

* bpf: test: add LB test for terminating backend

[ upstream commit 7ece278d42cca541b2e8e862e717f2536935af11 ]

Once a LB connection has been established, we expect to continue using
its CT entry to obtain the backend. Even if the backend is in terminating
state, and the service has lost all of its backends.

Keeping this separate from the fix, in case we can't easily backport.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>

* chore(deps): update golangci/golangci-lint-action action to v6

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* bpf: host: simplify MARK_MAGIC_PROXY_EGRESS_EPID handling

[ upstream commit feaf6c7369d5bb3d0ca615c709979a4731a41454 ]

inherit_identity_from_host() already knows how to handle
MARK_MAGIC_PROXY_EGRESS_EPID and extract the endpoint ID from the mark. Use
it to condense the code path, similar to how to-container looks like.

Also fix up the drop notification, which currently uses the endpoint ID in
place of the source security identity.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* bpf: host: clean up duplicated send_trace_notify() code

[ upstream commit 446cf565f6b059196ce7661d8bb91421d8a3cc02 ]

Use a single send_trace_notify() statement, with parameters that can be
trivially optimized out in the from-netdev path.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* clustermesh: fix panic if the etcd client cannot be created

[ upstream commit 58b74f5cd9902363e11e6d78799be474090ba47c ]

The blamed commit anticipated the execution of the watchdog in charge of
restarting the etcd connection to a remote cluster in case of errors.
However, this can lead to a panic if the etcd client cannot be created
(e.g., due to an invalid config file), as in that case the returned
backend is nil, and the errors channel cannot be accessed.

Let's push this logic back below the error check, to make sure that
the backend is always valid at that point. Yet, let's still watch
for possible reconnections during the initial connection establishment
phase, so that we immediately restart it in case of issues. Otherwise,
this phase may hang due to the interceptor preventing the establishment
from succeeding, given that it would continue returning an error.
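
A toy Go sketch of the fix's shape; the backend type is made up for the example:

```
package main

import (
	"errors"
	"fmt"
)

// backend is a toy stand-in for the etcd client backend; illustrative only.
type backend struct{ errs chan error }

func newBackend(valid bool) (*backend, error) {
	if !valid {
		return nil, errors.New("invalid etcd config file")
	}
	return &backend{errs: make(chan error, 1)}, nil
}

func connect(valid bool) error {
	b, err := newBackend(valid)
	if err != nil {
		// Bail out before touching b: starting the watchdog here would
		// dereference a nil backend and panic.
		return fmt.Errorf("creating etcd client: %w", err)
	}
	// Safe to start the restart watchdog only after the error check.
	go func() {
		for range b.errs {
			// restart the connection on errors (elided)
		}
	}()
	return nil
}

func main() {
	fmt.Println(connect(false)) // error returned, no panic
	fmt.Println(connect(true))  // <nil>
}
```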

Fixes: 174e721c0e69 ("ClusterMesh: validate etcd cluster ID")
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* install/kubernetes: add AppArmor profile to Cilium Daemonset

[ upstream commit 2136418dab6c2cd799d14a1443e29ca1ab5b5238 ]

[ backporter's notes: values.schema.json has been introduced in #30631
and thus is not present in v1.15. Changes to that file have been
ignored. ]

Starting from k8s 1.30 together with Ubuntu 24.04, Cilium fails to
initialize with the error:

```
Error: applying apparmor profile to container 43ed6b4ba299559e8eac46a32f3246d9c54aca71a9b460576828b662147558fa: empty localhost AppArmor profile is forbidden
```

This commit adds "Unconfined" as the default AppArmor profile for all
the pods that have the "container.apparmor.security.beta.kubernetes.io"
annotations; users can override it with any of the AppArmor profiles
available in their environments.

Signed-off-by: André Martins <andre@cilium.io>

* docs: Add annotation for Ingress endpoint

[ upstream commit 2949b16f4a7a58f246622e3ae9d9dbc1cb09efcf ]

Relates: #19764
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* pkg: don't cache Host identity rule matches

[ upstream commit 8397e45ee46ba65ef803805e4317d09a508862fe ]

Unlike every other identity, the set of labels for the reserved:host
identity is mutable. That means that rules should not cache matches for
this identity.

So, clean up the code around determining matches.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* ipsec: Refactor temporary removal of XFRM state

[ upstream commit e7db879706e62a882a5e80696a5d0b5812b515ff ]

Context: During IPsec upgrades, we may have to temporarily remove some
XFRM states due to conflicts with the new states and because the Linux
API doesn't enable us to perform this atomically as we do for XFRM
policies.

This commit moves this removal logic to its own function. That logic
will grow in subsequent commits as I'll add debugging information to the
log message.

This commit doesn't make any functional changes.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* ipsec: Log duration of temporary XFRM state removal

[ upstream commit bba016e37ccc30f8b06b6837a72d184fa55eecd2 ]

Context: During IPsec upgrades, we may have to temporarily remove some
XFRM states due to conflicts with the new states and because the Linux
API doesn't enable us to perform this atomically as we do for XFRM
policies.

This temporary removal should be very short but can still cause drops
under heavy throughput. This commit logs the duration of the removal so
we can validate that it's actually always short and estimate the impact
on packet drops.

Note the log message will now be displayed only once the XFRM state is
re-added, instead of when it's removed like before.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* ipsec: Log XFRM errors during temporary state removal

[ upstream commit 76d66709b9a20677010ec3bfa3fef61b2c4665fe ]

Context: During IPsec upgrades, we may have to temporarily remove some
XFRM states due to conflicts with the new states and because the Linux
API doesn't enable us to perform this atomically as we do for XFRM
policies.

This temporary removal should be very short but can still cause drops
under heavy throughput. This commit logs how many such drops happened.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* ci: Filter supported versions of AKS

[ upstream commit dbcdd7dbe2e9a72a7c8c9f9231c4349ea7e94c0c ]

Whenever AKS stopped supporting a particular version, we had to manually
remove it from all stable branches. Now, instead of that, we dynamically
check whether the version is supported and only then run the test.

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* cni: Use correct route MTU for various cloud cidrs

[ upstream commit 29a340ee6c49f48e70d01ba715e016e9bb5a2d13 ]

This commit corrects the MTU that is used by the cilium-cni plugin when
creating routes for CIDRs received from ENI, Azure or Alibaba Cloud.

The cilium-agent daemon returns two MTUs to the cilium-cni plugin: a
"device" MTU, which is used to set the MTU on a Pod's interface in
its network namespace, and a "route" MTU, which is used to set the MTU
on the routes created inside the Pod's network namespace that handle
traffic leaving the Pod. The "route" MTU is adjusted based on the Cilium
configuration to account for any configured encapsulation protocols,
such as VXLAN or WireGuard. Before this commit, when ENI, Azure or Alibaba
Cloud IPAM was enabled, the routes created in a Pod's network namespace
were using the "device" MTU, rather than the "route" MTU, leading to
fragmentation issues.
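
A sketch of the distinction using github.com/vishvananda/netlink; the MTU values and function name are illustrative, not the CNI plugin's actual code:

```
package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

// configurePodMTU sets the "device" MTU on the Pod interface and the
// (possibly smaller) "route" MTU on the route for a cloud CIDR, so that
// encapsulation overhead is accounted for on traffic leaving the Pod.
func configurePodMTU(ifName string, deviceMTU, routeMTU int, cidr *net.IPNet) error {
	link, err := netlink.LinkByName(ifName)
	if err != nil {
		return err
	}
	if err := netlink.LinkSetMTU(link, deviceMTU); err != nil {
		return err
	}
	return netlink.RouteAdd(&netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       cidr,
		MTU:       routeMTU,
	})
}

func main() {
	_, cidr, _ := net.ParseCIDR("10.0.0.0/16")
	if err := configurePodMTU("eth0", 9001, 8901, cidr); err != nil {
		fmt.Println("error:", err)
	}
}
```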

Signed-off-by: Ryan Drew <ryan.drew@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* fqdn: Change error log to warning

[ upstream commit f1925b59247739ab3beb5ed3b395025d887c2825 ]

There is no reason why the log level of "Timed out waiting for datapath
updates of FQDN IP information" log message should be an error. Change it
to a warning instead.

Add a reference to --tofqdns-proxy-response-max-delay parameter to make
this warning actionable.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* bpf: lxc: don't take RevDNAT tailcall for service backend's ICMP messages

[ upstream commit b1fae235092158ef642f81a4150db3cb3801d2ac ]

When tail_nodeport_rev_dnat_ingress_ipv*() is called by from-container to
apply RevDNAT for a local backend's reply traffic, it drops all unhandled
packets. This would mostly be traffic where the pod-level CT entry was
flagged as .node_port = 1, but there is no corresponding nodeport-level
CT entry to provide a rev_nat_index for RevDNAT. Essentially an unexpected
error situation.

But we didn't consider that from-container might also see ICMP error
messages by a service backend, and the CT_RELATED entry for such traffic
*also* has the .node_port flag set. As we don't have RevDNAT support for
such traffic, there's no good reason to send it down the RevDNAT tailcall.
Let it continue in the normal from-container flow instead. This avoids
subsequent drops with DROP_NAT_NO_MAPPING.

Alternative solutions would be to tolerate ICMP traffic in
tail_nodeport_rev_dnat_ingress_ipv*(), or not propagate the .node_port flag
into CT_RELATED entries.

Fixes: 6936db59e3ee ("bpf: nodeport: drop reply by local backend if revDNAT is skipped")
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* docs: add link to sig-policy meeting

[ upstream commit d74bc1c90fc4edfb1f223d9460b31e0e52885e65 ]

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* gha: bump post-upgrade timeout in clustermesh upgrade/downgrade tests

[ upstream commit 01c3b8376046369571176fe6c58e7502bf3c4fd2 ]

[ backporter's notes: The workflow looks different in v1.15, the option
to bump the timeout has been applied to the wait for cluster mesh to be
ready after rolling out Cilium in cluster2. ]

The KPR matrix entry recently started failing quite frequently due to
cilium status --wait failing to complete within the timeout after the
upgrading Cilium to the tip of main. Part of the cause seems to be
cilium/test-connection-disruption@619be5ab79a6, which made the connection
disruption test more aggressive. In turn, causing more CPU contention
during endpoint regeneration, which now requires longer. Hence, let's
bump the timeout, to avoid failing too early due to the endpoint
regeneration not having terminated yet.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>

* fqdn: Fix Upgrade Issue Between PortProto Versions

[ upstream commit a682a621e4e555b766c0cc1bd66770fa9ffbecc3 ]

Users of this library need Cilium to check both restored and updated
DNS rules against the new PortProto version. Otherwise, upgrade
incompatibilities exist between Cilium and programs that utilize this
library.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* docs: Alias "kubectl exec" command in Host Firewall guide

[ upstream commit 0e07aa5f86d043af722e9c297d817d2889d72ae3 ]

The documentation for the Host Firewall contains a number of invocations
of the cilium-dbg command-line tool, to run on a Cilium pod. For
convenience, the full, standalone command starting with "kubectl -n
<namespace> exec <podname> -- cilium-dbg ..." is provided everytime. But
this makes for long commands that are hard to read, and that sometimes
require scrolling on the rendered HTML version of the guide.

Let's shorten them by adding an alias for the recurrent "kubectl exec"
portion of the commands.

Signed-off-by: Quentin Monnet <qmo@qmon.net>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* docs: Clean up Host Firewall documentation

[ upstream commit 21d5f0157e9fc94856b4fb18342401b5798a0d5f ]

This commit is a mix of various minor improvements for the host firewall
guide and the host policies documentation:

- Fix some commands in troubleshooting steps:
    - "cilium-dbg policy get" does not show host policies
    - "kubectl get nodes -o wide" does not show labels
    - Add the policy audit mode check

- (Attempt to) improve style

- Mark references with the ":ref:" role in a consistent way

- Improve cross-referencing (in particular, the text displayed for the
  K8s label selector references was erroneous)

- Strip trailing whitespaces

Signed-off-by: Quentin Monnet <qmo@qmon.net>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* docs: Document Host Policies known issues

[ upstream commit 642921dc760c731b4eac5131ac60faef672878ff ]

Make sure that users of the Host Firewall are aware of the current
limitations, or rather the known issues, for the feature.

Signed-off-by: Quentin Monnet <qmo@qmon.net>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* docs: Link Host Policies "troubleshooting"/"known issues" from HFW tuto

[ upstream commit 5dd2f14a732d325ba498045331ab8b8ebba76a6f ]

Help users reading the Host Firewall tutorial to find the sections
related to troubleshooting host policies or listing known issues for the
Host Firewall.

Signed-off-by: Quentin Monnet <qmo@qmon.net>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

* chore(deps): update go to v1.21.10

Signed-off-by: renovate[bot] <bot@renovateapp.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

* envoy: Bump go version to 1.22.3

This is for security fixes in go 1.22.

Related upstream: https://groups.google.com/g/golang-announce/c/wkkO4P9stm0/
Related build: https://github.com/cilium/proxy/actions/runs/8994624308/job/24708355529

Signed-off-by: Tam Mach <tam.mach@cilium.io>

* api: Introduce ipLocalReservedPorts field

[ upstream commit 23a9607e4e88dd9a8e139108bd2ba20b441ae0ff ]

This field will be used by the CNI plugin to set the
`ip_local_reserved_ports` sysctl option.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* wireguard: Remove unused constants

[ upstream commit 02fd4eeeefdde71ddabb6dbcf66e605a9eeb866c ]

These values were never used in any released version of Cilium. Let's
remove them.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* wireguard: Export ListenPort constant

[ upstream commit 117cb062266e61bbb57391e6750aaf88dd0a2328 ]

This allows other packages to obtain the standard WireGuard port.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* cilium-cni: Reserve ports that can conflict with transparent DNS proxy

[ upstream commit 11fe7cc144aaff1a44b6ed9c733a9025c674e683 ]
[ backporter notes:

   1. Fixes conflicts due to sysctl reconciler not being present
   2. Also reserve Geneve tunnel port in `auto` configuration

]

This commit adds the ability for the cilium-cni plugin to reserve IP
ports (via the `ip_local_reserved_ports` sysctl knob) during CNI ADD.
Reserving these ports prevents the container network namespace from using
these ports as ephemeral source ports, while still allowing the ports to
be explicitly allocated (see [1]).

This functionality is added as a workaround for cilium/cilium#31535
where transparent DNS proxy mode causes conflicts when an ephemeral
source port is being chosen that is already being used in the host
network namespace. By reserving ports used by Cilium itself (such as
WireGuard and VXLAN), we hopefully reduce the number of such conflicts.

The set of reserved ports can be configured via a newly introduced agent
flag. By default, it will reserve an auto-generated set of ports.  The
list of ports is configurable such that users running custom UDP
services on ports in the ephemeral port range can provide their own set
of ports. The flag may be set to an empty string to disable reservations
altogether.

[1] https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
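
A minimal Go sketch of the mechanism; the ports listed are illustrative examples of ports Cilium uses (WireGuard, VXLAN), and the write happens inside the container's network namespace during CNI ADD:

```
package main

import (
	"fmt"
	"os"
	"strings"
)

// reserveLocalPorts writes a comma-separated list to ip_local_reserved_ports,
// preventing the kernel from handing these ports out as ephemeral source
// ports while still allowing explicit binds.
func reserveLocalPorts(ports []string) error {
	const path = "/proc/sys/net/ipv4/ip_local_reserved_ports"
	return os.WriteFile(path, []byte(strings.Join(ports, ",")), 0o644)
}

func main() {
	if err := reserveLocalPorts([]string{"51871", "8472"}); err != nil {
		fmt.Println("error:", err)
	}
}
```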

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

* k8s/watcher: Remove outdated comments about K8sEventHandover

[ upstream commit 03d8e73068cb422af0aca14c42ee3443c913fbf6 ]

K8sEventHandover is no longer a flag so remove any remaining references
to it.

Signed-off-by: Chris Tarazi <chris@isovalent.com>

* daemon, endpoint: Consolidate K8s metadata into struct

[ upstream commit b17a9c64d3a16041f53870587331fc441a2a6c1b ]

Refactor the Kubernetes-fetched metadata into a consolidated struct type
for ease of adding a new type without increasing the function signatures
of the relevant functions.

Signed-off-by: Chris Tarazi <chris@isovalent.com>

* watchers: Detect Pod UID changes

[ upstream commit aa628ad60fb0af5056c500a111e8ddbf703c8c24 ]

This commit causes the Kubernetes Pod watcher to trigger endpoint
identity updates if the Pod UID changes. This covers the third scenario
pointed out in [1], see issue for more details.

To summarize, with StatefulSet restarts, it is possible for the
apiserver to coalesce Pod events where the delete and add event are
replaced with a single update event that includes, for example a label
change. If Cilium indexes based on namespace/name of the pod, then it
will miss this update…