v1.15 Backports 2024-03-19 #31490

Merged
merged 24 commits into from Mar 21, 2024
24 commits
0db73f1
controlplane: fix panic: send on closed channel
bimmlerd Feb 22, 2024
dc76693
controlplane: add mechanism to wait for watchers
bimmlerd Feb 22, 2024
a5fc7a6
controlplane: wait for watcher establishment
bimmlerd Feb 22, 2024
1ef26a1
job: avoid a race condition in ExitOnCloseFnCtx
bimmlerd Feb 23, 2024
3a71d40
controlplane: fix mechanism for ensuring watchers
bimmlerd Feb 28, 2024
42ef0c1
Handle InvalidParameterValue as well for PD fallback
hemanthmalla Feb 27, 2024
7fef7f5
Adding unit test for PD fallback
hemanthmalla Feb 28, 2024
171b64a
slices: don't modify missed input slice in test
bimmlerd Mar 4, 2024
30dbb6b
bgpv1: avoid object tracker vs informer race
bimmlerd Feb 27, 2024
08b7d91
doc: Clarified GwAPI KPR prerequisites
PhilipSchmid Mar 13, 2024
7733839
helm: Add pod affinity for cilium-envoy
sayboras Mar 5, 2024
5f194fa
bgpv1: fix Test_PodIPPoolAdvert flakiness
rastislavs Mar 13, 2024
a1e7b6b
bpf, maps: Don't propagate nodeID to bpf map when allocation fails.
marseel Mar 13, 2024
46d3c06
policy: Fix missing labels from SelectorCache selectors
christarazi Mar 12, 2024
b606421
ci: Bump lvh-kind ssh-startup-wait-retries
YutaroHayakawa Mar 14, 2024
d7d315e
datapath: Remove unnecessary IPsec code
pchaigno Mar 12, 2024
349478e
operator: fix errors/warnings metric.
tommyp1ckles Mar 7, 2024
3fa18da
loader: add message if error is ENOTSUP
kkourt Mar 15, 2024
9edeb88
ipam: fix azure ipam test panics due to shared pointers.
tommyp1ckles Jul 21, 2023
a95ca7c
gateway-api: Retrieve LB service from same namespace
sayboras Mar 10, 2024
6f827e2
ci-e2e: Add matrix for bpf.tproxy
sayboras Mar 10, 2024
b2f8e8d
gha: disable fail-fast on integration tests
giorio94 Mar 15, 2024
5738e81
hive/cell/health: don't warn when reporting on stopped reporter.
tommyp1ckles Mar 8, 2024
0ef35be
docs: Warn on key rotations during upgrades
pchaigno Mar 17, 2024
1 change: 1 addition & 0 deletions .github/actions/lvh-kind/action.yaml
@@ -28,6 +28,7 @@ runs:
mem: 12G
install-dependencies: 'true'
port-forward: '6443:6443'
ssh-startup-wait-retries: 600
cmd: |
git config --global --add safe.directory /host

11 changes: 11 additions & 0 deletions .github/workflows/conformance-e2e.yaml
@@ -234,6 +234,17 @@ jobs:
lb-acceleration: 'testing-only'


- name: '15'
# renovate: datasource=docker depName=quay.io/lvh-images/kind
kernel: 'bpf-next-20240307.011705'
kube-proxy: 'none'
kpr: 'true'
devices: '{eth0,eth1}'
secondary-network: 'true'
tunnel: 'disabled'
ingress-controller: 'true'
misc: 'bpf.tproxy=true'

timeout-minutes: 60
steps:
- name: Checkout context ref (trusted)
1 change: 1 addition & 0 deletions .github/workflows/integration-test.yaml
@@ -67,6 +67,7 @@ jobs:
integration-test:
name: Integration Test
strategy:
fail-fast: false
matrix:
arch: [ubuntu-22.04, ubuntu-22.04-arm64]
runs-on: ${{ matrix.arch }}
2 changes: 1 addition & 1 deletion Documentation/helm-values.rst

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

@@ -1,9 +1,10 @@
Prerequisites
#############

-* Cilium must be configured with ``kubeProxyReplacement=true``.
-Please refer to :ref:`kube-proxy replacement <kubeproxy-free>`
-for more details.
+* Cilium must be configured with NodePort enabled, using
+``nodePort.enabled=true`` or by enabling the kube-proxy replacement with
+``kubeProxyReplacement=true``. For more information, see :ref:`kube-proxy
+replacement <kubeproxy-free>`.
* Cilium must be configured with the L7 proxy enabled using the ``--enable-l7-proxy`` flag (enabled by default).
* The below CRDs from Gateway API v1.0.0 ``must`` be pre-installed.
Please refer to this `docs <https://gateway-api.sigs.k8s.io/guides/?h=crds#getting-started-with-gateway-api>`_
6 changes: 6 additions & 0 deletions Documentation/security/network/encryption-ipsec.rst
@@ -162,6 +162,12 @@ commands:
Key Rotation
============

.. attention::

Key rotations should not be performed during upgrades and downgrades. That
is, all nodes in the cluster (or clustermesh) should be on the same Cilium
version before rotating keys.

To replace cilium-ipsec-keys secret with a new key:

.. code-block:: shell-session
2 changes: 1 addition & 1 deletion install/kubernetes/cilium/README.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 14 additions & 1 deletion install/kubernetes/cilium/values.yaml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 14 additions & 1 deletion install/kubernetes/cilium/values.yaml.tmpl
@@ -2179,7 +2179,20 @@ envoy:
labelSelector:
matchLabels:
k8s-app: cilium-envoy

podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- topologyKey: kubernetes.io/hostname
labelSelector:
matchLabels:
k8s-app: cilium
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: cilium.io/no-schedule
operator: NotIn
values:
- "true"
# -- Node selector for cilium-envoy.
nodeSelector:
kubernetes.io/os: linux
4 changes: 4 additions & 0 deletions operator/metrics/metrics.go
@@ -95,5 +95,9 @@ func registerMetricsManager(p params) {
Registry.MustRegister(metric.(prometheus.Collector))
}

metrics.InitOperatorMetrics()
Registry.MustRegister(metrics.ErrorsWarnings)
metrics.FlushLoggingMetrics()

p.Lifecycle.Append(mm)
}
2 changes: 1 addition & 1 deletion operator/pkg/gateway-api/gateway_reconcile.go
@@ -310,7 +310,7 @@ func (r *gatewayReconciler) setAddressStatus(ctx context.Context, gw *gatewayv1.
svcList := &corev1.ServiceList{}
if err := r.Client.List(ctx, svcList, client.MatchingLabels{
owningGatewayLabel: gw.GetName(),
-}); err != nil {
+}, client.InNamespace(gw.GetNamespace())); err != nil {
return err
}
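
The effect of adding the namespace option to the list call can be illustrated with a small model. This is a minimal, stdlib-only sketch and not the controller-runtime code: the `service` type and the `listOwned` helper are hypothetical stand-ins, and the labels placed on the two Services are assumed for illustration. A label-only match picks up the Service of a same-named Gateway in another namespace; a label plus namespace match does not.

```go
package main

import "fmt"

// owningGatewayLabel is the label key that appears in the test expectations of this PR.
const owningGatewayLabel = "io.cilium.gateway/owning-gateway"

// service is a hypothetical, stripped-down stand-in for corev1.Service.
type service struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// listOwned returns the services labelled as owned by gwName.
// An empty namespace means "all namespaces" (the behaviour before the fix);
// a non-empty namespace restricts the match to that namespace.
func listOwned(all []service, gwName, namespace string) []service {
	var out []service
	for _, s := range all {
		if s.Labels[owningGatewayLabel] != gwName {
			continue
		}
		if namespace != "" && s.Namespace != namespace {
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	owned := map[string]string{owningGatewayLabel: "valid-gateway"}
	all := []service{
		{Namespace: "another-namespace", Name: "cilium-gateway-valid-gateway", Labels: owned},
		{Namespace: "default", Name: "cilium-gateway-valid-gateway", Labels: owned},
	}

	fmt.Println(len(listOwned(all, "valid-gateway", "")))        // 2: label only, ambiguous
	fmt.Println(len(listOwned(all, "valid-gateway", "default"))) // 1: label and namespace
}
```

With two candidates, which Service's load balancer address ends up in the Gateway status would depend on list order, which is presumably why the fixture in this PR adds a second Service in `another-namespace`.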

69 changes: 39 additions & 30 deletions operator/pkg/gateway-api/gateway_reconcile_test.go
@@ -39,7 +39,30 @@ var gwFixture = []client.Object{
Namespace: "default",
},
},

&corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: "cilium-gateway-valid-gateway",
Namespace: "another-namespace",
Annotations: map[string]string{
"pre-existing-annotation": "true",
},
},
Status: corev1.ServiceStatus{
LoadBalancer: corev1.LoadBalancerStatus{
Ingress: []corev1.LoadBalancerIngress{
{
IP: "10.10.10.11",
Ports: []corev1.PortStatus{
{
Port: 80,
Protocol: "TCP",
},
},
},
},
},
},
},
&corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: "cilium-gateway-valid-gateway",
@@ -48,6 +71,21 @@ var gwFixture = []client.Object{
"pre-existing-annotation": "true",
},
},
Status: corev1.ServiceStatus{
LoadBalancer: corev1.LoadBalancerStatus{
Ingress: []corev1.LoadBalancerIngress{
{
IP: "10.10.10.10",
Ports: []corev1.PortStatus{
{
Port: 80,
Protocol: "TCP",
},
},
},
},
},
},
},

// Service in another namespace
@@ -278,35 +316,6 @@ func Test_gatewayReconciler_Reconcile(t *testing.T) {
result, err := r.Reconcile(context.Background(), ctrl.Request{NamespacedName: key})

// First reconcile should wait for LB status
require.Error(t, err)
require.Equal(t, "load balancer status is not ready", err.Error())
require.Equal(t, ctrl.Result{}, result)

// Simulate LB service update
lb := &corev1.Service{}
err = c.Get(context.Background(), client.ObjectKey{Namespace: "default", Name: "cilium-gateway-valid-gateway"}, lb)
require.NoError(t, err)
require.Equal(t, corev1.ServiceTypeLoadBalancer, lb.Spec.Type)
require.Equal(t, "valid-gateway", lb.Labels["io.cilium.gateway/owning-gateway"])
require.Equal(t, "true", lb.Annotations["pre-existing-annotation"])

// Update LB status
lb.Status.LoadBalancer.Ingress = []corev1.LoadBalancerIngress{
{
IP: "10.10.10.10",
Ports: []corev1.PortStatus{
{
Port: 80,
Protocol: "TCP",
},
},
},
}
err = c.Status().Update(context.Background(), lb)
require.NoError(t, err)

// Perform second reconciliation
result, err = r.Reconcile(context.Background(), ctrl.Request{NamespacedName: key})
require.NoError(t, err)
require.Equal(t, ctrl.Result{}, result)

4 changes: 4 additions & 0 deletions pkg/alibabacloud/eni/types/types.go
@@ -126,6 +126,10 @@ type ENI struct {
Tags map[string]string `json:"tags,omitempty"`
}

func (e *ENI) DeepCopyInterface() types.Interface {
return e.DeepCopy()
}

// InterfaceID returns the identifier of the interface
func (e *ENI) InterfaceID() string {
return e.NetworkInterfaceID
12 changes: 12 additions & 0 deletions pkg/aws/ec2/ec2.go
@@ -27,6 +27,18 @@ import (
"github.com/cilium/cilium/pkg/spanstat"
)

const (
SubnetFullErrMsgStr = "There aren't sufficient free Ipv4 addresses or prefixes"

// InsufficientPrefixesInSubnetStr AWS error code for insufficient /28 prefixes in a subnet, possibly due to
// fragmentation
InsufficientPrefixesInSubnetStr = "InsufficientCidrBlocks"

// InvalidParameterValueStr sort of catch-all error code from AWS to indicate request params are invalid. Often,
// requires looking at the error message to get the actual reason. See SubnetFullErrMsgStr for example.
InvalidParameterValueStr = "InvalidParameterValue"
)

// Client represents an EC2 API client
type Client struct {
ec2Client *ec2.Client
7 changes: 6 additions & 1 deletion pkg/aws/ec2/mock/mock.go
@@ -9,6 +9,7 @@ import (
"net"

"github.com/cilium/cilium/pkg/api/helpers"
"github.com/cilium/cilium/pkg/aws/ec2"
eniTypes "github.com/cilium/cilium/pkg/aws/eni/types"
"github.com/cilium/cilium/pkg/aws/types"
"github.com/cilium/cilium/pkg/ip"
@@ -19,6 +20,7 @@ import (
"github.com/cilium/cilium/pkg/lock"
"github.com/cilium/cilium/pkg/time"

"github.com/aws/smithy-go"
"github.com/google/uuid"
log "github.com/sirupsen/logrus"
"golang.org/x/time/rate"
@@ -424,7 +426,10 @@ func assignPrefixToENI(e *API, eni *eniTypes.ENI, prefixes int32) error {
}

if int(prefixes)*option.ENIPDBlockSizeIPv4 > subnet.AvailableAddresses {
-return fmt.Errorf("subnet %s has not enough addresses available", eni.Subnet.ID)
+return &smithy.GenericAPIError{
+Code: ec2.InvalidParameterValueStr,
+Message: ec2.SubnetFullErrMsgStr,
+}
}

for i := int32(0); i < prefixes; i++ {
11 changes: 6 additions & 5 deletions pkg/aws/eni/node.go
@@ -19,6 +19,7 @@ import (
"github.com/sirupsen/logrus"

operatorOption "github.com/cilium/cilium/operator/option"
"github.com/cilium/cilium/pkg/aws/ec2"
"github.com/cilium/cilium/pkg/aws/eni/limits"
eniTypes "github.com/cilium/cilium/pkg/aws/eni/types"
"github.com/cilium/cilium/pkg/defaults"
@@ -39,9 +40,6 @@ const (

getMaximumAllocatableIPv4FailureWarningStr = "maximum allocatable ipv4 addresses will be 0 (unlimited)" +
" this could lead to ip allocation overflows if the max-allocate flag is not set"

-// insufficientPrefixesInSubnetStr AWS error code for insufficient /28 prefixes in a subnet
-insufficientPrefixesInSubnetStr = "InsufficientCidrBlocks"
)

type ipamNodeActions interface {
@@ -266,7 +264,9 @@ func (n *Node) PrepareIPAllocation(scopedLog *logrus.Entry) (a *ipam.AllocationA
func isSubnetAtPrefixCapacity(err error) bool {
var apiErr smithy.APIError
if errors.As(err, &apiErr) {
-return apiErr.ErrorCode() == insufficientPrefixesInSubnetStr
+return apiErr.ErrorCode() == ec2.InsufficientPrefixesInSubnetStr ||
+(apiErr.ErrorCode() == ec2.InvalidParameterValueStr &&
+strings.Contains(apiErr.ErrorMessage(), ec2.SubnetFullErrMsgStr))
}
return false
}
@@ -352,7 +352,8 @@ func (n *Node) errorInstanceNotRunning(err error) (notRunning bool) {
func isAttachmentIndexConflict(err error) bool {
var apiErr smithy.APIError
if errors.As(err, &apiErr) {
-return apiErr.ErrorCode() == "InvalidParameterValue" && strings.Contains(apiErr.ErrorMessage(), "interface attached at device")
+return apiErr.ErrorCode() == ec2.InvalidParameterValueStr &&
+strings.Contains(apiErr.ErrorMessage(), "interface attached at device")
}
return false
}
20 changes: 19 additions & 1 deletion pkg/aws/eni/node_manager_test.go
@@ -160,7 +160,8 @@ func (e *ENISuite) TestNodeManagerDefaultAllocation(c *check.C) {
func (e *ENISuite) TestNodeManagerPrefixDelegation(c *check.C) {
const instanceID = "i-testNodeManagerDefaultAllocation-0"

-ec2api := ec2mock.NewAPI([]*ipamTypes.Subnet{testSubnet}, []*ipamTypes.VirtualNetwork{testVpc}, testSecurityGroups)
+pdTestSubnet := *testSubnet
+ec2api := ec2mock.NewAPI([]*ipamTypes.Subnet{&pdTestSubnet}, []*ipamTypes.VirtualNetwork{testVpc}, testSecurityGroups)
instances := NewInstancesManager(ec2api)
c.Assert(instances, check.Not(check.IsNil))
eniID1, _, err := ec2api.CreateNetworkInterface(context.TODO(), 0, "s-1", "desc", []string{"sg1", "sg2"}, true)
@@ -198,6 +199,23 @@ func (e *ENISuite) TestNodeManagerPrefixDelegation(c *check.C) {
totalPrefixes += len(eni.Prefixes)
}
c.Assert(totalPrefixes, check.Equals, 2)

// Test fallback to /32 IPs when /28 blocks aren't available
//
// Set available IPs to a value insufficient to allocate a /28 block, but enough for /32 IPs to resolve
// pre-allocate deficit.
pdTestSubnet.AvailableAddresses = 15
ec2api.UpdateSubnets([]*ipamTypes.Subnet{&pdTestSubnet})

// Use 25 out of 32 IPs
mngr.Upsert(updateCiliumNode(cn, 32, 25))
c.Assert(testutils.WaitUntil(func() bool { return reachedAddressesNeeded(mngr, "node1", 0) }, 5*time.Second), check.IsNil)

node = mngr.Get("node1")
c.Assert(node, check.Not(check.IsNil))
// Should allocate only 1 additional IP after fallback, not an entire /28 prefix
c.Assert(node.Stats().AvailableIPs, check.Equals, 33)
c.Assert(node.Stats().UsedIPs, check.Equals, 25)
}
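
The numbers asserted in the fallback test can be reproduced with a short calculation. This is only a sketch of the arithmetic, not the allocator in pkg/ipam: the helper names `deficit` and `plan` are hypothetical, and the pre-allocate value of 8 is an assumption chosen because it yields the deficit of 1 that the test comment describes. The prefix check mirrors the condition in the mock (a request fails when the requested prefixes times the block size exceed the subnet's free addresses).

```go
package main

import "fmt"

// prefixBlockSize is the number of addresses in a /28 prefix.
const prefixBlockSize = 16

// deficit returns how many more IPs a node needs so that preAllocate
// addresses stay free on top of the used ones.
func deficit(available, used, preAllocate int) int {
	need := used + preAllocate - available
	if need < 0 {
		return 0
	}
	return need
}

// plan covers a deficit with whole /28 prefixes when the subnet has room for
// them, and otherwise falls back to individual /32 addresses.
func plan(need, subnetFree int) (prefixes, singleIPs int) {
	if need == 0 {
		return 0, 0
	}
	blocks := (need + prefixBlockSize - 1) / prefixBlockSize
	if blocks*prefixBlockSize <= subnetFree {
		return blocks, 0
	}
	if need > subnetFree {
		need = subnetFree
	}
	return 0, need
}

func main() {
	available, used := 32, 25 // two /28 prefixes already assigned, 25 IPs in use
	preAllocate := 8          // assumed value, not visible in the diff
	subnetFree := 15          // one address short of a /28 block

	need := deficit(available, used, preAllocate)
	prefixes, singles := plan(need, subnetFree)
	total := available + prefixes*prefixBlockSize + singles

	fmt.Println(need, prefixes, singles, total) // 1 0 1 33
}
```

Had the subnet held 16 free addresses instead of 15, the same deficit would have been covered by one more prefix and the node would report 48 available IPs rather than 33.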

// TestNodeManagerENIWithSGTags tests ENI allocation + association with a SG based on tags
4 changes: 4 additions & 0 deletions pkg/aws/eni/types/types.go
@@ -208,6 +208,10 @@ type ENI struct {
Tags map[string]string `json:"tags,omitempty"`
}

func (e *ENI) DeepCopyInterface() types.Interface {
return e.DeepCopy()
}

// InterfaceID returns the identifier of the interface
func (e *ENI) InterfaceID() string {
return e.ID
4 changes: 4 additions & 0 deletions pkg/azure/types/types.go
@@ -126,6 +126,10 @@ type AzureInterface struct {
resourceGroup string `json:"-"`
}

func (a *AzureInterface) DeepCopyInterface() types.Interface {
return a.DeepCopy()
}
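
The DeepCopyInterface methods added to the three provider types let callers copy an interface value without knowing its concrete type, which relates to the commit "ipam: fix azure ipam test panics due to shared pointers". The sketch below shows the idea with hypothetical stand-ins: `iface` models only part of what types.Interface might require, and the hand-written `DeepCopy` takes the place of the generated one. A copy obtained this way shares no map with the original, so mutating one side cannot affect the other.

```go
package main

import "fmt"

// iface is a hypothetical stand-in for the types.Interface seen in the diff.
type iface interface {
	InterfaceID() string
	DeepCopyInterface() iface
}

// eni is a simplified stand-in for a provider interface type.
type eni struct {
	ID   string
	Tags map[string]string
}

func (e *eni) InterfaceID() string { return e.ID }

// DeepCopy duplicates the struct and its map so the copy shares no state
// with the receiver. In the real code this method is generated.
func (e *eni) DeepCopy() *eni {
	out := &eni{ID: e.ID}
	if e.Tags != nil {
		out.Tags = make(map[string]string, len(e.Tags))
		for k, v := range e.Tags {
			out.Tags[k] = v
		}
	}
	return out
}

// DeepCopyInterface exposes DeepCopy through the interface type.
func (e *eni) DeepCopyInterface() iface { return e.DeepCopy() }

func main() {
	var orig iface = &eni{ID: "eni-1", Tags: map[string]string{"env": "test"}}

	// Copy through the interface, then mutate only the copy.
	cp := orig.DeepCopyInterface()
	cp.(*eni).Tags["env"] = "changed"

	fmt.Println(orig.(*eni).Tags["env"]) // test
	fmt.Println(cp.(*eni).Tags["env"])   // changed
}
```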

// SetID sets the Azure interface ID, as well as extracting other fields from
// the ID itself.
func (a *AzureInterface) SetID(id string) {