
v1.13 backports 2022-12-21 #22822

Merged
merged 56 commits into v1.13 from pr/v1.13-backport-2022-12-21 on Dec 22, 2022

Conversation

joestringer
Member

v1.13 backports 2022-12-21

Skipped due to conflicts; it seems some datapath testing PRs are not tagged for backport:

Once this PR is merged, you can update the PR labels via:

$ for pr in 22627 21797 22477 22676 22553 22673 22252 21600 22726 22333 22418 22656 22721 22679 22654 22700 22619 22518 22772 22796 22821 22600 21822; do contrib/backporting/set-labels.py $pr done 1.13; done

squeed and others added 30 commits December 21, 2022 16:34
[ upstream commit bdc23f5 ]

These tests deploy the standalone agent in a funny way: using the
existing Helm chart, but not configuring it fully. This breaks the
config-resolver, which expects to be able to reach a functioning
in-cluster apiserver.

So, just pass the correct apiserver to the config resolver.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 0f2f3fe ]

Add some missing helper functions to implement the v3 backend map (a
rough sketch of their shape follows the list below):
1. A new AddrCluster constructor, AddrClusterFrom
2. A new AddrCluster method to extract the ClusterID
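
For illustration only, here is a minimal, self-contained Go sketch of the shape such helpers could take (this is not Cilium's actual code; the field layout and names are assumptions):

```go
package cmtypes

import "net/netip"

// AddrCluster pairs an IP address with the ID of the cluster it belongs to,
// so that the same IP in two different clusters maps to two distinct keys.
type AddrCluster struct {
	addr      netip.Addr
	clusterID uint32
}

// AddrClusterFrom constructs an AddrCluster from an address and a cluster ID.
func AddrClusterFrom(addr netip.Addr, clusterID uint32) AddrCluster {
	return AddrCluster{addr: addr, clusterID: clusterID}
}

// ClusterID extracts the cluster ID from the AddrCluster.
func (ac AddrCluster) ClusterID() uint32 {
	return ac.clusterID
}
```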

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit c6e53ea ]

Extend the current backend map values (struct lb4_backend and struct
lb6_backend) to contain a new cluster_id field. With this field, we can
distinguish two backends which have the same IP address but belong to
different clusters. Since we no longer have a spare padding bit in either
struct, we need to extend the structs and bump the map version as well.
This commit also contains migration code from the v2 backend map to the
v3 backend map.
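
As a rough illustration, a hypothetical Go-side mirror of such a value is shown below (the field names, types, and padding are assumptions; the authoritative layout is the C struct lb4_backend in the BPF datapath):

```go
package lbmap

// Backend4ValueV3 sketches an IPv4 backend map value for a v3 map that
// carries a cluster ID next to the existing address and port fields.
type Backend4ValueV3 struct {
	Address   [4]byte // backend IPv4 address
	Port      uint16  // backend L4 port
	Proto     uint8   // L4 protocol
	Flags     uint8   // backend flags
	ClusterID uint8   // new in v3: which cluster the backend belongs to
	Pad       [3]byte // explicit padding, since no spare padding bit was left
}
```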

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 4d7acac ]

Currently, when a cilium-agent in eni/alibabacloud mode initializes
router info, it might encounter the following fatal error:

```
level=fatal msg="Error while creating daemon" error="failed to create router info invalid ip: " subsys=daemon
```

The gateway IP of the routing info is derived from the CIDR of the subnet,
but eni.Subnet.CIDR in InstanceMap is set as empty after ENI creation.
In normal cases it will be filled by a later resyncTrigger after maintainIPPool.
But if another goroutine (a periodic resync or the pool maintainer of another
node) happens to sync the local InstanceMap cache to k8s, cilium-agent would be
informed of that ENI IP pool with an empty CIDR, and router IP allocation would
fail fatally due to the empty gateway IP.

This patch fixes this by filling the CIDR right after ENI creation.

Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit ced77b3 ]

In the L4LB mode, the PERM L2 entries are managed by users. So, in this
mode Cilium should not mess with them.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 857060b ]

Add a new type, CiliumClusterConfig, which represents a cluster
configuration. This will be serialized and stored in the kvstore at
clustermesh-apiserver startup time. Later on, the cilium-agent on each
node reads it when connecting to new clusters. A rough sketch of the
idea follows the list below.

The current use case for this is getting the ClusterID at connect time, but by
exposing the cluster configuration, we can also do some useful validation,
such as:

- Make sure the cluster ID does not conflict with existing clusters.
- Make sure the new cluster doesn't have any capability mismatch.
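
As a minimal sketch of what storing such a configuration could look like (the field set, key path, and helper names below are assumptions for illustration, not Cilium's actual API):

```go
package clustermesh

import "encoding/json"

// CiliumClusterConfig sketches a per-cluster configuration blob written to the
// kvstore by clustermesh-apiserver and read by cilium-agent at connect time.
type CiliumClusterConfig struct {
	ID uint32 `json:"id"`
}

// clusterConfigKey returns an assumed kvstore key for a cluster's config.
func clusterConfigKey(clusterName string) string {
	return "cilium/cluster-config/" + clusterName
}

// marshalClusterConfig serializes the config for storage in the kvstore.
func marshalClusterConfig(c CiliumClusterConfig) ([]byte, error) {
	return json.Marshal(c)
}
```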

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit b24973e ]

Add helper functions to set/get cluster information in the kvstore.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 5e5a26e ]

Implement basic connect-time validation using CiliumClusterConfig.
clustermesh-apiserver is modified to set the local CiliumClusterConfig at
startup, and cilium-agent is modified to get the CiliumClusterConfig
of remote clusters and validate it.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 398cf5e ]

For compatibility with an old Cilium that doesn't support the cluster
configuration feature, we should be able to connect to a remote
cluster even if the configuration is missing. Test it.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 303d2e7 ]

This is to make sure the HTTP retry was done as part of the test.

Fixes: #21710
Fixes: #21993

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 25fa14c ]

This allows us to stress / benchmark the main callback function of the
DNS proxy within Cilium. This callback is called on each DNS request and
response made by each endpoint managed by Cilium.

For this commit, we are only considering the DNS response path because
it is the path which contains the most work to be done. This includes
policy calculation, identity allocation, and ipcache upsert.

It is useful to establish baseline performance numbers so that we can
compare against changes to this code path. In the following commits, a
mutex will be added to this path to fix a race condition during parallel
DNS response handling.

Baseline:

```
$ go test -v -tags integration_tests ./daemon/cmd -check.b -check.bmem -check.f DaemonFQDNSuite.Benchmark_notifyOnDNSMsg -test.benchtime 100x
...
PASS: fqdn_test.go:180: DaemonFQDNSuite.Benchmark_notifyOnDNSMsg           20000             96078 ns/op        36567 B/op         482 allocs/op
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 604eb21 ]

It is possible for parallel DNS requests handled by separate goroutines
(controlled by --dnsproxy-concurrency-limit) to race if they are for the
same DNS name, i.e. both goroutines handling cilium.io.

Here is a diagram for this race:

```
             G1                                    G2

T0 --> NotifyOnDNSMsg()               NotifyOnDNSMsg()            <-- T0

T1 --> UpdateGenerateDNS()            UpdateGenerateDNS()         <-- T1

T2 ----> mutex.Lock()                 +---------------------------+
                                      |No identities need updating|
T3 ----> mutex.Unlock()               +---------------------------+

T4 --> UpsertGeneratedIdentities()    UpsertGeneratedIdentities() <-- T4

T5 ---->  Upsert()                    DNS released back to pod    <-- T5
                                                   |
T6 --> DNS released back to pod                    |
             |                                     |
             |                                     |
             v                                     v
      Traffic flows fine                   Leads to policy drop
```

Note how G2 releases the DNS msg back to the pod at T5 because
UpdateGenerateDNS() was a no-op. It's a no-op because G1 had executed
UpdateGenerateDNS() first at T1 and performed the necessary identity
allocation for the response IPs. Due to G1 performing all the work
first, G2 executes T4 also as a no-op and releases the msg back to the
pod at T5 before G1 would at T6.

The above results in a policy drop because it is assumed that when the
DNS msg is released back to the pod / endpoint, then the ipcache has
been updated with the proper IP <-> identity mapping. In other words,
because G2 releases the DNS msg back to the pod / endpoint, the app
attempts to use the IPs returned in the DNS response, but the ipcache
lookup fails in the datapath because the IP <-> identity mappings don't
exist. In essence, this is a race between DNS requests to the ipcache.

To fix, we create a critical section when handling the DNS response IPs.
Allocating identities and associating them to IPs in the ipcache is now
atomic because of the critical section. This is especially important
when, as we've already seen, multiple DNS requests are in-flight for the
same name (i.e. cilium.io).

The critical section is implemented using a fixed-size array of
mutexes. Why? Because the other options (listed below) all have flaws
which may result in drops.

Other options considered:

1) A map of mutexes keyed on DNS name
2) A map of mutexes keyed on set of DNS response IPs
3) A map of mutexes keyed on endpoint ID
4) A slice of mutexes with hash based on individual IPs from the set of
   DNS response IPs

(1) could cause a race between the set of DNS response IPs from `a.com` and
the set of DNS response IPs from `b.com`, especially if the sets are the
same.
(2) could race between a super / subset of DNS response IPs, i.e. only
an intersection of sets would have protection. Any IPs that are not in
the intersection are vulnerable to race.
(3) could cause a race between different endpoints querying the same
name, i.e. two endpoints querying cilium.io.
(4) could work, but this approach is not ideal as the slice scales
with the number of unique IPs seen by the DNS proxy, which raises memory
consumption concerns.

Therefore, we opt for a fixed-size array of mutexes where we hash the
individual IPs from the set of DNS response IPs. For example, say we
attempt to resolve cilium.io and the response IPs are A and B.

  hash(A) % mod = 8
  hash(B) % mod = 3

  where mod is --dnsproxy-lock-count, i.e. size of array

What's returned is a slice containing the above array indices that's
sorted in ascending order. This is the order in which the mutexes must
be locked and unlocked. To be specific, [3, 8] is the sorted result,
and it means we acquire & release the mutex at index 3 and then do the
same for the mutex at index 8.

What this does is essentially serialize parallel DNS requests which
involve the same IP <-> identity mappings in the ipcache.
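
The locking scheme can be sketched in a few lines of Go. The following is an illustrative, self-contained example (not Cilium's actual dnsproxy code; the FNV hash and the names are assumptions) that hashes a response's IPs onto a fixed-size mutex array and acquires the resulting indices in ascending order:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"net/netip"
	"sort"
	"sync"
)

const lockCount = 128 // analogous to the --dnsproxy-lock-count default

var locks [lockCount]sync.Mutex

// lockIndices hashes each response IP onto the mutex array and returns the
// de-duplicated indices in ascending order, the order in which to lock them.
func lockIndices(ips []netip.Addr) []int {
	seen := make(map[int]struct{})
	for _, ip := range ips {
		h := fnv.New32a()
		h.Write(ip.AsSlice())
		seen[int(h.Sum32())%lockCount] = struct{}{}
	}
	idx := make([]int, 0, len(seen))
	for i := range seen {
		idx = append(idx, i)
	}
	sort.Ints(idx)
	return idx
}

func main() {
	ips := []netip.Addr{
		netip.MustParseAddr("104.198.14.52"),
		netip.MustParseAddr("2606:4700::1"),
	}
	idx := lockIndices(ips)
	for _, i := range idx { // acquire in ascending index order
		locks[i].Lock()
	}
	fmt.Println("identity allocation + ipcache upsert happen here, indices:", idx)
	for _, i := range idx { // release
		locks[i].Unlock()
	}
}
```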

It's possible that the hashes of different IPs collide (map to the same
mutex), but this is OK because it's overprotecting, rather than
underprotecting, which the other options above suffer from.
Overprotecting results in a small performance hit because we are
potentially serializing completely different / unrelated DNS requests,
i.e. no intersection in the response IP sets.

For --dnsproxy-lock-count, a default of 128 was chosen as a reasonable
balance between memory usage and hash collisions for most deployments
of Cilium. An increased lock count will result in more memory usage, but
faster DNS processing, as there are fewer IP hash collisions and
therefore fewer mutexes to acquire. A decreased lock count will save
memory at the cost of slower DNS processing, as IP hash collisions are
more likely, and this tends towards behavior similar to a single,
shared mutex. Users who need to tune this value because they have many
parallel DNS requests (1000s) can make the trade-off of using more
memory to avoid hash collisions. It's worth mentioning that the memory
usage is for storing an instance of a mutex, which is 64 bits as of Go
1.19 [1].
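
As a quick sanity check on that memory figure (assuming Go 1.19's sync.Mutex layout of two 32-bit words), the default lock array costs about 1 KiB:

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

func main() {
	var locks [128]sync.Mutex // default --dnsproxy-lock-count
	// sync.Mutex is 8 bytes as of Go 1.19, so 128 locks take 128 * 8 = 1024 bytes.
	fmt.Println(unsafe.Sizeof(locks[0]), unsafe.Sizeof(locks))
}
```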

Benchmark run comparing the code before this commit (no-lock), the
fixed-size array reduced to a single element to simulate a single mutex
(single-lock), and this commit (multiple-locks):

```
$ go test -v -tags integration_tests ./daemon/cmd -check.b -check.bmem -check.f DaemonFQDNSuite.Benchmark_notifyOnDNSMsg -test.benchtime 100x -test.count 8
```

```
$ benchstat no-lock.txt single-lock.txt multiple-locks.txt
name \ time/op    no-lock.txt  single-lock.txt  multiple-locks.txt
_notifyOnDNSMsg   93.2µs ± 9%     179.6µs ± 3%        143.2µs ±28%

name \ alloc/op   no-lock.txt  single-lock.txt  multiple-locks.txt
_notifyOnDNSMsg   36.5kB ± 2%      35.3kB ± 0%         33.8kB ± 1%

name \ allocs/op  no-lock.txt  single-lock.txt  multiple-locks.txt
_notifyOnDNSMsg      485 ± 1%         474 ± 0%            453 ± 1%
```

Reproducing the bug and performance testing on a real cluster included
running the 3 different modes and comparing the average processing time
of the DNS proxy code. The cluster was 2 nodes with 32 replicas of the
following workload:

```
  # websites.txt contains different common URLs, some duplicated.
  while [[ true ]]; do
    curl --parallel --parallel-immediate --parallel-max 5 --config websites.txt
    sleep 0.25s
  done
```

No lock:        4.61ms, 18.3ms
Single lock:    7.42ms, 18.3ms
Multiple locks: 5.25ms, 17.9ms

[1]: https://cs.opensource.google/go/go/+/refs/tags/go1.19.3:src/sync/mutex.go;l=34

Co-authored-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 9eb0bfb ]

Optimize policy processing by precomputing redirect types for the policy
when it is ready, to avoid multiple scans of the policy filters later.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 38589e7 ]

Add a sticky parser type for CiliumEnvoyConfig CRDs. This will be used
for policy-based redirect to custom Envoy listeners.

While the CRD parser type is redirected to Envoy, it is generally
handled by a custom listener, which may not perform HTTP policy
enforcement. Thus this parser type is incompatible with HTTP rules.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 27861ae ]

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 7b01459 ]

Listener config can fail if we don't wait for clusters to be configured
first, so wait for clusters to be configured if both clusters and
listeners are configured at the same time.

While a failure like this was only seen once in local testing of the custom
listener support in a CNP, a similar failure could also happen for Cilium
Ingress and Gateway API.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 0036e24 ]

Support injecting the Cilium Network filter also into TcpProxy filter
chains. This enables SNI and other L3/L4 policy enforcement features
on custom Envoy listeners applied via Cilium Envoy Config resources whose
Envoy filter chains use the TcpProxy filter, in addition to the Http
Connection Manager filter.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 41fe8c8 ]

Tell envoy.ParseResources() if the resources are intended for an L7 load
balancer. If so, the Envoy filter chain is configured to not use the original
source address and to use the socket mark as the source endpoint ID. This was
the existing behavior for all CEC CRDs.

When the CEC CRD does not contain a frontend service that is redirected
to the defined listener, the new 'isL7LB' flag is passed as
'false'. In this case the Envoy filter chain is configured to allow use
of the original source address and to mark the upstream socket with the
numeric source security identity. This is also the way in which Cilium
uses Envoy listener filter chains when enforcing policy for HTTP, for
example. In this mode L4 LB is assumed to have taken place, if applicable,
before traffic is redirected to the Envoy listener.

Prior to this change, CEC CRDs were mostly usable only for L7 LB use cases,
such as Cilium Ingress. After this change, CEC CRDs without a Service
redirect can be used in place of Cilium policy enforcement filter chains,
if desired.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>

[ upstream commit a679c40 ]

Specify proxy port as egress (or not-ingress) in test cases as the
datapath currently supports ProxyTypeCRD only for L7 LB which is always
on the egress path.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit aff0655 ]

Add a new no-op CRDRedirect type to be used with Envoy listeners defined
in CEC CRDs. In this case the listeners already exist and new Envoy
Listener resources do not need to be created for them.

This is needed for the forthcoming policy feature where policy can refer
to a Listener defined in CEC CRD.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 06ed0a4 ]

Find the CRD proxy port by name instead of by type. This is needed to enable
CEC CRD defined listeners to be used in CNPs. Prior to this, CRD proxy
ports did not use this code path, which is only called from endpoint
policy updates, so there was no need to find CRD proxy ports by name.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit ede26fb ]

Move resourceQualifiedName() to policy/api and export it so that it can
be used in policy as well as in envoy package.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 9159609 ]

Add a `listener` field to CNP and CCNP that causes traffic to the
specified port(s) to be redirected to the named Envoy listener. If the
listener does not exist, the traffic is not allowed at all.

When `listener.envoyConfig.kind` is left out, it defaults to the namespaced
`CiliumEnvoyConfig` for rules in namespaced policies (CNP) or to the
cluster-scoped `CiliumClusterwideEnvoyConfig` for rules in cluster-scoped
policies (CCNP). Namespaced policies can also refer to cluster-scoped
listeners with an explicit `listener.envoyConfig.kind:
CiliumClusterwideEnvoyConfig`. Cluster-scoped policies cannot refer to
namespaced listeners.

Endpoint policies are regenerated whenever Envoy listeners change to
update potential listener redirections in the bpf policy maps.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 527f5fd ]

This makes it much easier to track down which code path the executor
logs are coming from, so that when debugging issues, we can focus on the
relevant code path.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit bf3fd5f ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit dab8723 ]

Previously, 11cb4d0 assumed that 137 was the exit code when a
process exits due to a SIGKILL. However, upon reading the Go source
code as of 1.20rc1, this is not the case: -1 is set as the exit code
for all processes terminated by signals [1].

Fixes: 11cb4d0 ("test: Keep trying exec if killed")
Fixes: #22570

[1]:
https://github.com/golang/go/blob/go1.20rc1/src/os/exec_posix.go#L128-L130

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit e49ab12 ]

When endpoint routes are enabled, we should enforce ingress policies at
the destination lxc interface, to which a BPF program will be attached.
Nevertheless, today, for packets coming from the overlay, we enforce
ingress policies twice: once at the overlay interface (e.g. cilium_vxlan)
and a second time at the lxc device.

This is happening for two reasons:
  1. bpf_overlay is not aware of the endpoint routes settings so it
     doesn't even know that it's not responsible for enforcing ingress
     policies.
  2. We have a flag to force the enforcement of ingress policies at the
     source in this case. This flag exists for historic reasons that are
     not valid anymore.

A separate patch will fix reason 2 above. This commit fixes reason 1
by telling bpf_overlay to *not* enforce ingress policies when endpoint
routes are enabled.

Note that we do not support the case where some endpoints have endpoint
routes enabled and others don't. If we did, additional logic would be
required.

Fixes: 3179a47 ("datapath: Support enable-endpoint-routes with encapsulation")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 3d2ceaf ]

The previous commit changed the packet handling on the overlay->lxc
path to fix a bug. More precisely, when endpoint routes are
enabled, we no longer enforce ingress policies on both the overlay and the
lxc devices but only on the latter.

However, as a consequence of that patch, we don't go through the
policy-only program in bpf_lxc and we therefore changed the way the
packet is transmitted between overlay and lxc devices in some cases. As
a summary of the changes made in the previous patch, consider the following
table for the path overlay -> lxc.

Before the previous patch:
| Endpoint routes | Enforcement     | Path                 |
|-----------------|-----------------|----------------------|
| Enable          | overlay AND lxc | bpf_redirect if KPR; |
|                 |                 | stack otherwise      |
| Disabled        | overlay         | bpf_redirect         |

Now:
| Endpoint routes | Enforcement | Path         |
|-----------------|-------------|--------------|
| Enable          | lxc         | bpf_redirect |
| Disabled        | overlay     | bpf_redirect |

The previous patch intended to fix the enforcement to avoid the double
policy enforcement, but it also changed the packet path in case endpoint
routes are enabled.

This patch fixes this by adding the same exception we have in
bpf_lxc to the logic in l3.h. Hence, with the current patch, the
table looks like:
| Endpoint routes | Enforcement | Path                 |
|-----------------|-------------|----------------------|
| Enable          | lxc         | bpf_redirect if KPR; |
|                 |             | stack otherwise      |
| Disabled        | overlay     | bpf_redirect         |

I've kept this in a separate commit from the previous one in an attempt to
split up the logic and more clearly show the deltas.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 843c072 ]

The encryption test details -
cilium/cilium-cli#1241.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 884ccc1 ]

This generates the CiliumNodeConfig type, a new way to set
configuration overrides on a set of nodes.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer
Member Author

I'm not sure how #22656 is going to interact with the v1.13 CI, but just in case I've also posted #22823. cc @squeed

@joestringer
Member Author

joestringer commented Dec 21, 2022

/test-backport-1.13

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed:

Test Name: K8sDatapathServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) with L4 policy Tests NodePort with L4 Policy

Failure Output: FAIL: Request from testclient-6xfrm pod to service tftp://[fd04::12]:31948/hello failed

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create one.

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Test Name: K8sKafkaPolicyTest Kafka Policy Tests KafkaPolicies

Failure Output: FAIL: Unable to restart unmanaged pods with 'kubectl -n kube-system delete pods coredns-8cfc78c54-5cgdk': Exitcode: 1

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

Member

@brb left a comment

My changes LGTM, thanks!

test/l4lb,nat46x64: Replace Kind/Helm with DinD #22653 (@brb)

I will do a manual backport of these.

Member

@gandro left a comment

My commit (and the Hubble changes) looks good, thanks!

Member

@sayboras left a comment

Looks good for my commit, thanks 💯

@joestringer
Member Author

CI triage:

@joestringer joestringer merged commit 698cb2a into v1.13 Dec 22, 2022
@joestringer joestringer deleted the pr/v1.13-backport-2022-12-21 branch December 22, 2022 02:25
Labels
backport/1.13: This PR represents a backport for Cilium 1.13.x of a PR that was merged to main.
kind/backports: This PR provides functionality previously merged into master.