Prepare for release v1.14.11 #639

Closed
wants to merge 259 commits into from

Conversation

@aanm aanm (Owner) commented Apr 29, 2024

Summary of Changes

Bugfixes:

CI Changes:

Misc Changes:

Other Changes:

nathanjsweet and others added 30 commits February 21, 2024 16:58
[ upstream commit 27430d4 ]

This bitwise lpm trie is a non-thread-safe binary
trie that indexes arbitrarily long bit-based keys
with associated prefixes indexed from most
significant bit to least significant bit using
the longest prefix match algorithm.
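
The core idea is easy to see in miniature. Here is a small, self-contained Go sketch of longest-prefix-match lookup over a binary trie; it is purely illustrative, fixes keys at 32 bits for brevity, and is not the API of the library this commit adds:

```
package main

import "fmt"

type node struct {
	children [2]*node
	value    any
	hasValue bool
}

type trie struct{ root node }

// Upsert indexes value under the first prefixLen bits of key, from most
// significant to least significant bit.
func (t *trie) Upsert(key uint32, prefixLen int, value any) {
	n := &t.root
	for i := 0; i < prefixLen; i++ {
		bit := (key >> (31 - i)) & 1
		if n.children[bit] == nil {
			n.children[bit] = &node{}
		}
		n = n.children[bit]
	}
	n.value, n.hasValue = value, true
}

// Lookup walks the trie, remembering the last value seen: the longest
// matching prefix for key.
func (t *trie) Lookup(key uint32) (any, bool) {
	var best any
	found := false
	n := &t.root
	for i := 0; i < 32 && n != nil; i++ {
		if n.hasValue {
			best, found = n.value, true
		}
		n = n.children[(key>>(31-i))&1]
	}
	if n != nil && n.hasValue {
		best, found = n.value, true
	}
	return best, found
}

func main() {
	var t trie
	t.Upsert(0x0A000000, 8, "10.0.0.0/8")
	t.Upsert(0x0A0A0000, 16, "10.10.0.0/16")
	v, _ := t.Lookup(0x0A0A0101) // 10.10.1.1
	fmt.Println(v)               // the more specific 10.10.0.0/16 wins
}
```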

The behavior of the data structure is documented alongside the method
calls in the trie.go file.

The tests specifically exercise boundary cases for the various methods
and fuzz the RangeLookup method.

Updating CODEOWNERS to put sig-policy and ipcache
in charge of this library.

Fixes: cilium#29519

Co-authored-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit b19321e ]

This commit updates the Ariane configuration to include the GitHub organization team 'organization-members' in the list of allowed teams.
Consequently, only members of this specific team will have the authorization to initiate test runs via issue comments.

Signed-off-by: Birol Bilgin <birol@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 9916824 ]

Only record an old entry in ChangeState if it existed before this round
of changes. We do this by testing whether the entry is already in Adds.
If not, we record the old entry's key and value. If the Adds entry
exists, however, the entry may have only been added during this round of
changes, so we do not record the old value. This is safe because, when an
Adds entry is created, the Old value is stored before the Adds entry is
added, so for the first Adds entry the Old value does not yet exist and
will be recorded.

This removes extraneous Old entries that did not actually exist
originally. Previously, ChangeState.Revert would restore an entry that
should not exist, based on these extraneous Old entries.
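
A hedged sketch of that rule, with hypothetical stand-in types rather than the actual cilium policy map types:

```
package main

import "fmt"

type mapKey struct{ identity uint32 }
type mapValue struct{ proxyPort uint16 }

type changeState struct {
	adds map[mapKey]mapValue // entries added or updated this round
	old  map[mapKey]mapValue // original values, used by Revert
}

// upsert records an Old entry only if the key was not already added during
// this round of changes; a key already in adds either had its original
// value recorded on the first upsert or never existed before this round.
func (cs *changeState) upsert(k mapKey, v mapValue, realized map[mapKey]mapValue) {
	if _, addedThisRound := cs.adds[k]; !addedThisRound {
		if oldVal, existed := realized[k]; existed {
			cs.old[k] = oldVal
		}
	}
	cs.adds[k] = v
	realized[k] = v
}

func main() {
	realized := map[mapKey]mapValue{{identity: 1}: {proxyPort: 80}}
	cs := &changeState{adds: map[mapKey]mapValue{}, old: map[mapKey]mapValue{}}

	cs.upsert(mapKey{identity: 1}, mapValue{proxyPort: 8080}, realized) // pre-existing: old value recorded
	cs.upsert(mapKey{identity: 2}, mapValue{proxyPort: 9090}, realized) // new this round: no old entry
	cs.upsert(mapKey{identity: 2}, mapValue{proxyPort: 9091}, realized) // update, still no extraneous old entry

	fmt.Println(cs.old) // map[{1}:{80}], nothing recorded for identity 2
}
```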

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 2853c52 ]

[ backporter's notes: obtaining the pre-SNAT address is a bit more
  complicated in the v1.14 code base ...]

When applying SNAT to a packet, also report the original source address in
the subsequent trace event. This helps to associate the internal and
external view of a connection.

We use the `orig_addr` field in the trace event, which was originally
introduced back with b3aa583 ("bpf: Report original source IP in TRACE_TO_LXC").

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 1113d70 ]

This helps to clarify the exact origin of a TO_NETWORK trace event.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Now that a known interoperability issue between external kvstore and
WireGuard has been fixed [1], let's also switch the last conformance
clustermesh matrix entry to use the external kvstore, for symmetry
with v1.15 and main.

[1]: 2e7a1c3 ("node: Fix inconsistent EncryptKey index handling")

Fixes: a5de29e ("gha: extend conformance clustermesh to also cover external kvstores")
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
…21.111541

Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
Looks like I missed some parts when resolving conflicts in the backport for
1113d70.

Fixes: d086a71 ("bpf: nodeport: populate ifindex in NAT trace event")
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 32543a4 ]

In Go 1.22, slices.CompactFunc clears the slice elements that got
discarded. This makes TestSortedUniqueFunc fail if it runs after other
tests that modify the input slice.

Avoid this by not modifying the input slice in the test case, but
making a copy for the sake of the test.
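
As a minimal illustration of the pattern (not the actual test code), cloning before compacting keeps the input intact:

```
package main

import (
	"fmt"
	"slices"
)

func main() {
	input := []string{"a", "a", "b", "b", "c"}

	// Clone first: in Go 1.22, CompactFunc zeroes the discarded tail
	// elements of its argument, which would corrupt input for any test
	// running after this one.
	unique := slices.CompactFunc(slices.Clone(input), func(a, b string) bool { return a == b })

	fmt.Println(unique) // [a b c]
	fmt.Println(input)  // [a a b b c], still intact
}
```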

Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 3441800 ]

In Go 1.22, slices.Delete clears the slice elements that got discarded.
This causes the slice containing the existing ranges in
(*LBIPAM).handlePoolModified to be cleared while being looped over,
resulting in the following nil dereference in TestConflictResolution:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃   PANIC  package: github.com/cilium/cilium/operator/pkg/lbipam • TestConflictResolution   ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a8c814]

goroutine 22 [running]:
testing.tRunner.func1.2({0x1d5e400, 0x39e3fe0})
	/home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1631 +0x1c4
testing.tRunner.func1()
	/home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1634 +0x33c
panic({0x1d5e400?, 0x39e3fe0?})
	/home/travis/.gimme/versions/go1.22.0.linux.arm64/src/runtime/panic.go:770 +0x124
github.com/cilium/cilium/operator/pkg/lbipam.(*LBRange).EqualCIDR(0x400021d260?, {{0x24f5388?, 0x3fce4e0?}, 0x400012c018?}, {{0x1ea5e20?, 0x0?}, 0x400012c018?})
	/home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/range_store.go:151 +0x74
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolModified(0x400021d260, {0x24f5388, 0x3fce4e0}, 0x40000ed200)
	/home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:1392 +0xfa0
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).poolOnUpsert(0x400021d260, {0x24f5388, 0x3fce4e0}, {{0xffff88e06108?, 0x10?}, {0x4000088808?, 0x40003ea910?}}, 0x40000ed080?)
	/home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:279 +0xe0
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolEvent(0x400021d260, {0x24f5388?, 0x3fce4e0?}, {{0x214e78e, 0x6}, {{0x400034d1d8, 0x6}, {0x0, 0x0}}, 0x40000ed080, ...})
	/home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:233 +0x1d8
github.com/cilium/cilium/operator/pkg/lbipam.(*newFixture).UpsertPool(0x40008bfe18, 0x40002a4b60, 0x40000ed080)
	/home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam_fixture_test.go:177 +0x148
github.com/cilium/cilium/operator/pkg/lbipam.TestConflictResolution(0x40002a4b60)
	/home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam_test.go:56 +0x3fc
testing.tRunner(0x40002a4b60, 0x22a2558)
	/home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1689 +0xec
created by testing.(*T).Run in goroutine 1
	/home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1742 +0x318
FAIL	github.com/cilium/cilium/operator/pkg/lbipam	0.043s

Fix this by cloning the slice before iterating over it.
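
A minimal sketch of that pattern, with hypothetical types standing in for the LB-IPAM ranges:

```
package main

import (
	"fmt"
	"slices"
)

type lbRange struct{ cidr string }

func main() {
	existing := []*lbRange{{"10.0.0.0/24"}, {"10.0.1.0/24"}, {"10.0.2.0/24"}}
	removedFromPool := map[string]bool{"10.0.1.0/24": true}

	// Iterate over a clone: slices.Delete (Go 1.22) zeroes the vacated
	// tail slots of existing, so ranging over existing itself could
	// observe nil pointers mid-loop, as in the panic above.
	for _, r := range slices.Clone(existing) {
		if removedFromPool[r.cidr] {
			i := slices.Index(existing, r)
			existing = slices.Delete(existing, i, i+1)
		}
	}

	for _, r := range existing {
		fmt.Println(r.cidr) // 10.0.0.0/24, 10.0.2.0/24
	}
}
```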

Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit cb15333 ]

When an endpoint is created and `EndpointChangeRequest` contains labels,
endpoint regeneration might not be triggered, since regeneration is only
triggered when labels change. Unfortunately, no change is detected when
epTemplate.Labels is set to the same labels as `EndpointChangeRequest`.

This commit fixes the above issue by not setting epTemplate.Labels.
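
A toy model of the mechanism (hypothetical names, not the real endpoint API): regeneration fires only on a label diff, so pre-seeding the creation template with the request's labels suppresses it.

```
package main

import (
	"fmt"
	"maps"
)

type endpoint struct{ labels map[string]string }

// updateLabels reports whether the labels actually changed; in this toy
// model, regeneration is only triggered when it returns true.
func (e *endpoint) updateLabels(l map[string]string) bool {
	if maps.Equal(e.labels, l) {
		return false
	}
	e.labels = l
	return true
}

func main() {
	req := map[string]string{"app": "foo"}

	// Buggy flow: the template is pre-seeded with the request's labels,
	// so the later update looks like a no-op and regeneration never fires.
	seeded := &endpoint{labels: req}
	fmt.Println(seeded.updateLabels(req)) // false: no regeneration

	// Fixed flow: leave the template's labels unset.
	fresh := &endpoint{}
	fmt.Println(fresh.updateLabels(req)) // true: regeneration triggered
}
```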

Fixes: cilium#29776

Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz>
[ upstream commit 329fefb ]

[ backporter's note: Fix minor conflict due to the
  c.BGPMgr.ConfigurePeers fixture change. ]

The controller generates a log for every single reconciliation. This is
noisy and doesn't make much sense, since users don't care about the
reconciliation happening, but about its outcome.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 4c5f79d ]

[ backporter's note: Initialize LocalNodeStore on test init and
deinitialize on test deinit. ]

When users stop selecting the node with CiliumBGPPeeringPolicy, BGP
Control Plane removes all running virtual router instances. However, this
is only logged at Debug level. Upgrade it to Info level, since this is
important information that helps users investigate session disruption
caused by a configuration mistake.

Also, the log is generated and full reconciliation happens even if there
is no previous policy applied. This means that when there is no policy
applied and any relevant resource (e.g. Service) is updated, it generates
the log and performs a meaningless full withdrawal. Introduce a flag that
indicates whether a previous policy exists, and trigger the log and full
withdrawal only when it is set, as in the sketch below.
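
A hedged sketch of that guard, with hypothetical names rather than the actual BGP Control Plane types:

```
package main

import "log"

type manager struct {
	policyApplied bool // whether a policy was previously applied to this node
}

func (m *manager) reconcile()      { log.Println("reconcile virtual routers") }
func (m *manager) fullWithdrawal() { log.Println("withdraw all virtual routers") }

// onUpdate gates full withdrawal and its Info-level log behind the
// policyApplied flag, so unrelated resource updates with no previous
// policy are silent no-ops.
func (m *manager) onUpdate(selected bool) {
	switch {
	case selected:
		m.policyApplied = true
		m.reconcile()
	case m.policyApplied:
		log.Println("node no longer selected by any CiliumBGPPeeringPolicy")
		m.fullWithdrawal()
		m.policyApplied = false
	default:
		// no previous policy: nothing to withdraw, nothing to log
	}
}

func main() {
	m := &manager{}
	m.onUpdate(false) // e.g. Service update before any policy: silent no-op
	m.onUpdate(true)  // policy selects the node: reconcile
	m.onUpdate(false) // node deselected: Info log plus full withdrawal
}
```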

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 66e5de6 ]

[ backporter's note: neighbor.go is still under
pkg/bgpv1/manager/. Do the same change for
pkg/bgpv1/manager/reconcile.go. ]

Remove noisy logs generated for every single reconciliation.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit c00330c ]

[ backporter's note: neighbor.go is still under
pkg/bgpv1/manager/. Do the same change for
pkg/bgpv1/manager/reconcile.go. ]

We don't need to show create/update/delete counts, because we show logs
for all create/update/delete operations anyway.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 148f81f ]

Users can now easily check the current peering state with the `cilium bgp
peers` command. Thus, state transition logs have become relatively
unimportant for users. Downgrade these logs to debug level.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 29a7918 ]

On IPv6-only clusters, querying localhost for the health check could attempt to check 127.0.0.1, presumably depending on host DNS configuration.

As the health check does not listen on IPv4 when .Values.ipv4.enabled is false, this health check could fail.

This patch uses the same logic as the bootstrap-config.json file to ensure a valid IP is always used for the health check.

Fixes: cilium#30968
Fixes: 859d2a9 ("helm: use /ready from Envoy admin iface for healthprobes on daemonset")

Signed-off-by: Andrew Titmuss <iandrewt@icloud.com>
[ upstream commit 9620979 ]

[ backporter's note: Had a following conflict

++<<<<<<< HEAD
++=======
+       // If the endpoint is in an 'init' state we need to remove this label
+       // regardless of the "sourceFilter". Otherwise, we face risk of leaving the
+       // endpoint with the reserved:init state forever.
+       // We will perform the replacement only if:
+       // - there are new identity labels being added;
+       // - the sourceFilter is not any; If it is "any" then it was already
+       //   replaced by the previous replaceIdentityLabels call.
+       // - the new identity labels don't contain the reserved:init label
+       // - the endpoint is in this init state.
+       if len(identityLabels) != 0 &&
+               sourceFilter != labels.LabelSourceAny &&
+               !identityLabels.Has(labels.NewLabel(labels.IDNameInit, "", labels.LabelSourceReserved)) &&
+               e.IsInit() {
+
+               idLabls := e.OpLabels.IdentityLabels()
+               delete(idLabls, labels.IDNameInit)
+               rev = e.replaceIdentityLabels(labels.LabelSourceAny, idLabls)
+       }
+
++>>>>>>> 4ec84be6b1 (pkg/endpoint: remove reserved:init from endpoints)

Took 4ec84be6b1's version.

]

Previously, a bug introduced in e43b759 caused the 'reserved:init'
label to persist even after an endpoint received its security identity
labels. This resulted in endpoints being unable to send or receive any
network traffic. This fix ensures that the 'reserved:init' label is
properly removed once initialization is complete.

Fixes: e43b759 ("pkg/endpoint: keep endpoint labels for their original sources")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 6fee46f ]

[ backporter's note:
  - e2e upgrade test doesn't exist in this branch. Removed it.
  - Minor conflict in tests-clustermesh-upgrade.yaml

++<<<<<<< HEAD
 +          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
 +            SHA="${{ inputs.SHA }}"
 +          else
 +            SHA="${{ github.sha }}"
 +          fi
++=======
+           CILIUM_DOWNGRADE_VERSION=$(contrib/scripts/print-downgrade-version.sh stable)
+           echo "downgrade_version=${CILIUM_DOWNGRADE_VERSION}" >> $GITHUB_OUTPUT
++>>>>>>> 8c3b175f5d (ci/ipsec: Fix downgrade version retrieval)

]

Figuring out the right "previous patch release version number" to
downgrade to in print-downgrade-version.sh turns out to be more complex
than expected [0][1][2][3].

This commit is an attempt to 1) fix issues with the current script and
2) overall make the script clearer, so we can avoid repeating these
mistakes.

As for the fixes, there are two things that are not correct with the
current version. First, we try to validate the existence of the tag to
downgrade to, in case the script runs on top of a release preparation
commit for which the VERSION file has been updated to a value that does
not yet have a corresponding tag. This part of the script is actually OK,
but not the way we call it in the IPsec workflow: we use
"fetch-tags: true" with "fetch-depth: 1" (the default), and the two are
not compatible: a shallow clone results in no tags being fetched.

To address this, we retrieve the tag differently: instead of relying on
"fetch-tags" from the workflow, we call "git fetch" from the script
itself, provided the preconditions are met (we only run it from a Git
repository, and only if the "origin" remote is defined). If the tag
exists, either locally or remotely, then we can use it. Otherwise, the
script considers that it runs from a release preparation Pull Request,
and decrements the patch release number.

The second issue is that the script would return no value if the patch
release is zero. This was meant to avoid any attempt to find a previous
patch release when working on a development branch. However, this logic
is incorrect (it comes from a previous version of the script, which would
always decrement the patch number). After the first release of a new
minor version, it is fine to have a patch number of 0. What we should
check instead is whether the version ends with "-dev".

This commit brings additional changes for clarity: more comments, and a
better separation between the "get latest patch release" and "get
previous stable branch" cases, moving the relevant code to independent
functions, plus better argument handling. We also edit the IPsec
workflow to add some logs about the version retrieved. The logs should
also display the script's error messages, if any, that are printed to
stderr.

Sample output from the script:

    VERSION     Tag exists  Previous minor  Previous patch release

    1.14.3      Y           v1.13           v1.14.3
    1.14.1      Y           v1.13           v1.14.1
    1.14.0      Y           v1.13           v1.14.0
    1.14.1-dev  N           v1.13           <error>
    1.15.0-dev  N           v1.14           <error>
    1.13.90     N           v1.12           v1.13.89  <- decremented
    2.0.0       N           <error>         <error>
    2.0.1       N           <error>         v2.0.0    <- decremented
    2.1.1       N           v2.0            v2.1.0    <- decremented
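
The decision rules reduce to a short sketch; Go is used here purely for illustration (the real implementation is a shell script), and tagExists stands in for the git tag lookup:

```
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// prevPatchRelease mirrors the rules in the table above.
func prevPatchRelease(version string, tagExists func(string) bool) (string, error) {
	if strings.HasSuffix(version, "-dev") {
		return "", fmt.Errorf("%s: development version, no previous patch release", version)
	}
	if tag := "v" + version; tagExists(tag) {
		return tag, nil // tag exists: downgrade to it directly
	}
	// Tag missing: assume a release preparation commit and decrement the
	// patch number, unless it is already 0.
	parts := strings.Split(version, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("%s: unexpected version format", version)
	}
	patch, err := strconv.Atoi(parts[2])
	if err != nil || patch == 0 {
		return "", fmt.Errorf("%s: cannot decrement patch number", version)
	}
	parts[2] = strconv.Itoa(patch - 1)
	return "v" + strings.Join(parts, "."), nil
}

func main() {
	tags := map[string]bool{"v1.14.3": true}
	exists := func(t string) bool { return tags[t] }

	fmt.Println(prevPatchRelease("1.14.3", exists))     // v1.14.3
	fmt.Println(prevPatchRelease("1.13.90", exists))    // v1.13.89, decremented
	fmt.Println(prevPatchRelease("1.15.0-dev", exists)) // error
}
```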

[0] 56dfec2 ("contrib/scripts: Support patch releases in print-downgrade-version.sh")
[1] 4d7902f ("contrib/scripts: Remove special handling for patch release number 90")
[2] 5581963 ("ci/ipsec: Fix version retrieval for downgrades to closest patch release")
[3] 3803f53 ("ci/ipsec: Fix downgrade version for release preparation commits")

Fixes: 3803f53 ("ci/ipsec: Fix downgrade version for release preparation commits")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
[ upstream commit cfb1158 ]

The --cluster-name flag got removed in cilium/cilium-cli#2351.

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
renovate bot and others added 23 commits April 22, 2024 10:07
Signed-off-by: renovate[bot] <bot@renovateapp.com>
This is mainly to address the below CVE

GHSA-3mh5-6q8v-25wj

Related release: https://github.com/envoyproxy/envoy/releases/tag/v1.27.5

Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit 100e625 ]

This prevents possible shenanigans caused by search domains that may be
configured on the runner and propagated to the pods.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 804d5f0 ]

When the number of concurrent DNS requests was moderately high, there was
a chance that some of the goroutines would get stuck waiting for a
response. Contains the fix from cilium/dns#10.

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit c869e6c ]

Do not return an error from xds server when the context is cancelled, as
this is part of normal operation, and we test for this in
server_e2e_test.

This resolves a test flake:

panic: Fail in goroutine after  has completed

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 3cde59c ]

The main test goroutine might complete before the checks on the server
goroutine are completed, causing the panic below. This commit defers the
streamDone channel close to make sure the error check on the stream
server is done before returning from the test. We keep the timed wait at
the end of each test so the tests do not stall if the stream server fails
to exit.

Panic error
```
panic: Fail in goroutine after Test/ServerSuite/TestRequestStaleNonce has completed
```

Testing was done as per below:
```
$ go test -count 500 -run Test/ServerSuite/TestRequestStaleNonce ./pkg/envoy/xds/...
ok      github.com/cilium/cilium/pkg/envoy/xds  250.866s
```

Fixes: cilium#31855
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit df6afbd ]

[ backporter's note: moved changes from pkg/envoy/xds/stream_test.go to
pkg/envoy/xds/stream.go as v1.14 doesn't have the former file ]

Return io.EOF if the test channel was closed, rather than returning a nil
request. This mimics the behavior of generated gRPC code, which never
returns a nil request with a nil error.
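
A sketch of the change as described, with simplified local types standing in for the real pkg/envoy/xds stream and Envoy protobuf types:

```
package main

import (
	"fmt"
	"io"
)

type discoveryRequest struct{ versionInfo string }

type testStream struct{ requests chan *discoveryRequest }

// Recv mimics generated gRPC stream code: a closed channel yields io.EOF,
// never a nil request together with a nil error.
func (s *testStream) Recv() (*discoveryRequest, error) {
	req, ok := <-s.requests
	if !ok {
		return nil, io.EOF
	}
	return req, nil
}

func main() {
	s := &testStream{requests: make(chan *discoveryRequest, 1)}
	s.requests <- &discoveryRequest{versionInfo: "1"}
	close(s.requests)

	fmt.Println(s.Recv()) // &{1} <nil>
	fmt.Println(s.Recv()) // <nil> EOF, instead of the nil, nil that tripped the server
}
```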

This resolves a test flake with the following error log:

time="2024-04-16T08:46:23+02:00" level=error msg="received nil request from xDS stream; stopping xDS stream handling" subsys=xds xdsClientNode="node0~10.0.0.0~node0~bar" xdsStreamID=1

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit d3c1fee ]

Speed up tests by eliminating CacheUpdateDelay, as it is generally not
needed.

Where needed, replace it with IsCompletedInTimeChecker, which waits for
up to MaxCompletionDuration before returning, in contrast to
IsCompletedChecker, which only returns the current state without any wait.

This change makes the server_e2e_test tests run >1000x faster.
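
Roughly, the difference between the two checkers comes down to a select with and without a deadline; this is an illustrative sketch, not the actual interfaces:

```
package main

import (
	"fmt"
	"time"
)

const maxCompletionDuration = 250 * time.Millisecond // illustrative bound

// isCompleted returns the current state immediately, without waiting
// (the IsCompletedChecker behavior, roughly).
func isCompleted(done <-chan struct{}) bool {
	select {
	case <-done:
		return true
	default:
		return false
	}
}

// isCompletedInTime waits up to maxCompletionDuration for completion (the
// IsCompletedInTimeChecker behavior, roughly); the common case returns
// immediately, so no fixed delay is paid.
func isCompletedInTime(done <-chan struct{}) bool {
	select {
	case <-done:
		return true
	case <-time.After(maxCompletionDuration):
		return false
	}
}

func main() {
	done := make(chan struct{})
	go func() { time.Sleep(10 * time.Millisecond); close(done) }()

	fmt.Println(isCompleted(done))       // almost certainly false
	fmt.Println(isCompletedInTime(done)) // true, after roughly 10ms
}
```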

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 4efe9dd ]

The TestRequestStaleNonce test code was written with the assumption that
no response would be received for a request with a stale nonce, and a
second SendRequest was done right after with the correct nonce value.
This caused two responses to be returned, and the first one could have
carried the old version of the resources.

Remove this duplicate SendRequest.

This resolves test flakes like this:

        --- FAIL: Test/ServerSuite/TestRequestStaleNonce (0.00s)
            server_e2e_test.go:784:
                ... response *discoveryv3.DiscoveryResponse = &discoveryv3.DiscoveryResponse{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), VersionInfo:"3", Resources:[]*anypb.Any{(*anypb.Any)(0x40003a63c0), (*anypb.Any)(0x40003a6410)}, Canary:false, TypeUrl:"type.googleapis.com/envoy.config.v3.DummyConfiguration", Nonce:"3", ControlPlane:(*corev3.ControlPlane)(nil)} ("version_info:\"3\" resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource1\"}} resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource0\"}} type_url:\"type.googleapis.com/envoy.config.v3.DummyConfiguration\" nonce:\"3\"")
                ... VersionInfo string = "4"
                ... Resources []protoreflect.ProtoMessage = []protoreflect.ProtoMessage{(*routev3.RouteConfiguration)(0xe45380)}
                ... Canary bool = false
                ... TypeUrl string = "type.googleapis.com/envoy.config.v3.DummyConfiguration"

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 75d144c ]

Stream timeout is a duration we use in tests to make sure the stream does
not stall for too long. In production we have no such timeout at all; the
requests are long-lived, and responses are only sent when there is
something (new) to send.

The test stream timeout was 2 seconds, and it would occasionally cause a
test flake, especially if debug logging is enabled. This seems to happen
due to goroutine scheduling, and for this reason debug logging should not
be on for these tests.

Bump the test stream timeout to 4 seconds to further reduce the chance of
a test flake due to it.
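
Illustratively, such a test timeout typically guards each receive with a select; the constant and helper here are hypothetical, not the actual test code:

```
package xds_test

import (
	"testing"
	"time"
)

const testStreamTimeout = 4 * time.Second // bumped from 2s to absorb scheduling jitter

// recvWithTimeout fails the test if nothing arrives in time; in production
// there is no such deadline, since requests are long-lived and responses
// arrive only when there is something new to send.
func recvWithTimeout[T any](t *testing.T, ch <-chan T) T {
	t.Helper()
	select {
	case v := <-ch:
		return v
	case <-time.After(testStreamTimeout):
		t.Fatal("xDS stream stalled beyond test timeout")
		panic("unreachable")
	}
}
```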

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 350e9d3 ]

[ backporter's note: minor conflicts in
.github/workflows/tests-clustermesh-upgrade.yaml ]

The goal is to slow down the rollout process, to better highlight
possible connection disruptions occurring in the meantime. At the same
time, this also reduces the overall CPU load caused by datapath
recompilation, which is a possible additional cause of connection
disruption flakiness.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit 7d2505e ]

[ backporter's note: minor conflicts in
.github/workflows/tests-clustermesh-upgrade.yaml ]

The default IPAM mode is cluster-pool, which gets automatically
overwritten by the Cilium CLI to kubernetes when running on kind.
However, the default helm value gets restored upon upgrade due to
--reset-values, causing confusion and possible issues. Hence, let's
explicitly configure it to kubernetes, to prevent changes.

Similarly, let's configure a single replica for the operator.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit a0d7d37 ]

So that it actually gets executed.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 1e28a10 ]

[ backporter's note: minor conflicts in
.github/workflows/tests-clustermesh-upgrade.yaml ]

Hubble relay is not deployed in this workflow, hence it doesn't make
sense to wait for the image availability.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit 0c211e1 ]

[ backporter's note: minor conflicts in
.github/workflows/tests-clustermesh-upgrade.yaml ]

As it simplifies troubleshooting possible connection disruptions.
However, let's configure monitor aggregation to medium (i.e., the
maximum, and default value) to avoid the performance penalty due
to the relatively high traffic load.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit 9dc89f7 ]

Fixes: 5c06c8e ("ci-eks: Add IPsec key rotation tests")
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit f61651f ]

[ Backporter's notes: Resolved conflict by moving metric registration code to the old function ]

This was broken during the transition of pkg/metrics to integrate with Hive, where relevant operator metrics were never initialized.
This adds an init func specific to the operator and cleans up the "flush" logic that was used as a workaround for errors/warnings emitted prior to agent startup (in the case of the operator).

Addresses: cilium#29525

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
[ upstream commit: 7d278af ]

[ backporter's note: v1.14 uses bpf/init.sh to install proxy rules so we
have to do a customized backport. ]

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit: 53133ff ]

[ backporter's note: v1.14 uses bpf/init.sh to install proxy rules so we
have to do a customized backport. ]

Although we don't install fromEgressProxyRule for now, this commit
insists on removing it to make sure a further downgrade can go smoothly.

We'll soon have another PR to install fromEgressProxyRule, and cilium
downgrade from that PR to the branch tip (patch downgrade, 1.X.Y ->
1.X.{Y-1}) would be broken if we didn't handle the new ip rule carefully.

Without this patch, downgrading from a higher version would leave
fromEgressProxyRule behind on the lower-version cilium, putting the
cluster in the broken state of "having a stale ip rule + not having the
other necessary settings (iptables)", breaking connectivity.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
@aanm aanm temporarily deployed to release-base-images April 29, 2024 15:00 — with GitHub Actions Inactive
@aanm aanm closed this Apr 29, 2024
@aanm aanm deleted the pr/prepare-v1.14.11 branch April 29, 2024 15:10
@aanm aanm restored the pr/prepare-v1.14.11 branch April 29, 2024 15:11
@aanm aanm deleted the pr/prepare-v1.14.11 branch April 29, 2024 15:18