-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.15 Backports 2024-01-02 #30079
Merged
dylandreimerink
merged 25 commits into
cilium:v1.15
from
pippolo84:pr/v1.15-backport-2024-01-02
Jan 12, 2024
Merged
v1.15 Backports 2024-01-02 #30079
dylandreimerink
merged 25 commits into
cilium:v1.15
from
pippolo84:pr/v1.15-backport-2024-01-02
Jan 12, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ upstream commit 4838ca2 ] Currently, the test `TestGetIdentity` fails with the following error. ``` --- FAIL: TestGetIdentity (0.85s) --- FAIL: TestGetIdentity/Multiple_identities (0.26s) identity_test.go:226: Identity not found in the store ``` The reason is that the watch of the backend (started in its own goroutine) might not be ready when the test starts to create test identities. Therefore, this commit introduces that the test waits with its continuation until the watch is ready. Fixes: cilium#23856 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 8c2dfda ] This commit changes the log fields for the "stale identity observed" message to include the full context (i.e. trace observation point, source and destination addresses etc) for identifying which flow caused the message to be emitted. In addition, the "old" and "new" identity field names are replaced with the better fitting "userspace" and "datapath" terminology. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit caef2d3 ] This was reported by Paul in [1]. After investigating the sysdump, it is clear that Hubble should not complain about such packets. [1] cilium#15283 (comment) Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 4196935 ] This limits the amount of "stale identities observed" messages to one every 30 seconds. This is particularly important if monitor aggregation is disabled, as otherwise any unknown discrepancy fills the log buffers and thus makes it hard to e.g. debug CI flakes. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 0d35af0 ] To make it easy for callers, ipv4_handle_fragmentation() should always return a DROP_* reason on error. But for errors from l4_load_ports() we're currently just propagating those raw errors back. Return a drop reason instead. This also makes us consistent with the non-fragment path in ipv4_load_l4_ports(). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit cab1568 ] This changes BGP Controller's event handling from hive hooks and workerpool to hive jobs, mainly to avoid creating resource.Store in the Start() hive hook of the BGP controller. Since creating resource.Store() blocks until the store is initialized, we should avoid calling it in the Start() hive hook to not slow down the startup process and to allow progressing with the startup even when the CRD is not yet installed. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 08017c2 ] This changes event handling in BGP CP resource.Store wrappers (DiffStore, BGPCPResourceStore) from hive hooks and a goroutine to a hive job, mainly to avoid creating resource.Store in the Start() hive hook. This also makes them wrappers rather than a super set of the resource.Store, exposing only API that is actually used within the BGP CP, and allowing all methods to return an error if the store has not yet initialized. Since creating resource.Store() blocks until the store is initialized, we should avoid calling it in the Start() hive hook to not slow down the startup process and to allow progressing with the startup even when the CRD is not yet installed. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 1bf697f ] The IPsec upgrade test started timing out from time to time on main's CI. This commit bumps the timeout a bit to avoid such failures. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit a573bb4 ] Commit 10f04fd (endpoint: Resolve named ports for redirects) fixed redirect creation for L7 policies using a named port, but failed to use the resolved destination port also in proxy stats. This commit does that. Fixes: cilium#29023 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b825850 ] Modularize the agent stale CiliumEndpoint init procedure in an independent cell. The logic is functionally equivalent to the previous implementation. The tests have been refactored to use the hive and cell framework, thus implementing unit-style integration tests. As an additional benefit, this remove the direct dependency from the CEP and CES stores managed in the related k8s watchers. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit e5b1de4 ] GetIndexer and SetIndexer were meant to ease the testability of the previous non-modularized implementation of the agent endpoint GC. Since they are not needed anymore, it is possible to safely remove them. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 7f0e373 ] During external workloads initialization, the clustermesh-apiserver allocates a new identity for the external workload, which is then propagated through etcd to the agent running on the external workload; yet, the agent eventually overrides it with the default value (1). While this appears to be harmless from a functional point of view, as the clustermesh-apiserver configures the original identity as part of the associated the CiliumNode and CiliumEndpoint resources (which are then watched by the other agents), it also triggers an error, due to the unexpected mismatch: CEW: Invalid identity 1 in ... Let's fix this by avoiding to override the originally assigned identity from the external workload agent. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit a495abd ] This commit modifies calls to `send_trace_notify` in the datapath to drop trace events related to encrypted packets when monitor aggregation is enabled. More specifically, this commit ensures that whenever `send_trace_notify` is called with a `trace_reason` of `TRACE_REASON_ENCRYPTED`, the `monitor` argument is set to zero. A Coccinelle script is provided in this commit to add a build-time check for this requirement moving forward. This change helps to reduce the overall CPU usage of Cilium Agents when IPSec encryption is enabled, by reducing the number of trace events emitted by the datapath. Normally monitor aggregation can be used in order to reduce the number of trace events, however IPSec-related trace events are not able to be aggregated since they lack the necessary connection tracking information. See the function `emit_trace_notify` in `bpf/lib/trace.h` for more information. This same change was applied in `bpf/bpf_lxc.c` in commit 3e52822. Thank you to Lorenz who added a workaround for passing the verifier with Clang 10. See also: cilium#27168 Co-authored-by: Lorenz Bauer <lmb@isovalent.com> Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit e95f2e0 ] Add an initial bpf_network configuration. The defines are taken from a sysdump for a failing CI run that seems to exhibit verifier problems. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 3d3f69c ] Datapath equalness of the old and new entry must be evaluated before entries are merged, afterwards the entries will be the same and the key is not added in ChangeState.Adds, which may lead to a policy map entry not being updated during incremental updates. Fixes: cilium#26331 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit d24fa08 ] d9a4594 ("etcd: rework lease manager to additionally create sessions") appears to have caused a regression in the duration of the endpoint tests, due to a slightly different implementation of the isConnectedAndHasQuorum() etcd client method, which is in turn executed before starting a Watcher (through Connected()). The regression is the combination of multiple causes: * Incorrect endpoint tests tear down ordering, due to the interplay of TearDownTest and Cleanup; specifically, the identity allocator manager (which internally starts the watch operation) is stopped after closing the kvstore client; * The identity allocator does not propagate a context to the watch operation, but instead relies on a external channel to stop it; * The etcd Connected() function does no longer terminate immediately when the client is closed (i.e., when the session gets closed). The reason is that the session is no longer guaranteed to be unique. * The etcd client appears to retry multiple times certain operations if it is closing, but the given context is not canceled (as in this case, as the given context is never canceled). This is the reason for the extra delay, and comes with 100 repetitions of the warning: {"level":"warn","ts":"...","logger":"etcd-client","caller":"v3/retry_interceptor.go:62", "msg":"retrying of unary invoker failed","target":"...","attempt":0, "error":"rpc error: code = Canceled desc = grpc: the client connection is closing"} Let's adopt the easiest fix here, and make sure that the client is closed after closing the manager. This ensures correct tear down ordering and prevents the etcd client retry process. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit babba79 ] Historically, the Cilium helm chart allowed to override the routing mode leveraged in combination with {gke,aksbyocni}.enabled. This is no longer possible since aff16b2 ("Change routing-mode and tunnel interaction."). According to the Cilium documentation [1,2], this appears to be the correct behavior, as the routing mode must be respectively set to native and tunnel in these cases. Hence, let's validate that users didn't configure a different routing mode, to avoid falling back silently, which may be confusing. [1]: https://docs.cilium.io/en/stable/network/concepts/routing/#id6 [2]: https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/#install-cilium (AKS tab) Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b26d9be ] Previously, when determing a keyid before the rotation, the doc suggested to run "cut -c 1". This returns only the first digit (e.g., if keyid is "15", then "1" is returned). This breaks the rotation 15=>1. Fixes: 42ef7f3 ("docs: Update IPsec key rotation command") Reported-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 70ec61c ] With the newer, asynchronous APIs, the ipcache takes on the responsibility of updating the policy engine (i.e. the SelectorCache) whenever it allocates an identity. However, the legacy APIs don't do that. This commit changes that. By centralizing this in the ipcache, we can (in a subsequent commit) stop regenerating all endpoints whenever an identity is allocated. This is wasteful, since local identities can only ever be allocated by the ipcache now. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit afc3b5f ] AllocateIdentity() takes a parameter, `notifyOwner`, that when true, tells the allocator to propagate those updates to the policy subsystem. However, the local (i.e. CIDR) identity allocator ignores this field, and *always* propagates identity updates to the policy subsystem as well as **always regenerating all endpoints**. This is needless, since the ipcache always updates the policy system as well. This change is safe since the ipcache is the only caller that passes `notifyOwner = false`, and the ipcache updates the policy system itself. All other callers to `AllocateIdentity()` set `notifyOwner = true`. Fixes: cilium#29681 Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 7ebaf63 ] This should be fixed now. Fixes: cilium#29681 Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 9f845ab ] Commit bd5ec0b introduced pre-declaring metrics label value ranges for k8s watcher metrics. This also performs a check when the metric is updated to check whether the metric label value is within the (optional) declared range. However, k8s watchers would require declaring all possible CRD types in the scope. We could make this change, but this list would be hard to maintain, and more importantly, we probably don't want to initialize metrics for CRDs that are not necessarily going to be used (i.e. CES if disabled). This reverts this change, until a better solution for this type of metric is developed. Reported-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit c084bc0 ] When a metric declared with XWithLabels(...) is updated with a vector label value that is outside the expected range of values for that label then a error log is emitted (with the metric being still emitted). This improves this by: * Switch error log to warning, as these logs do not indicate a serious runtime error but are meant to help developers catch use of undeclared label values. * Add information about what metric/label/value caused the violation. As well, this DRY's up the label value checking code. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 0bf036e ] Hive uses pflag and viper to parse configuration flags from multiple sources. If a flag is set via command-line then the pflag parser is invoked to get to the destination type as defined in the FlagSet ("flags.StringSlice" [1]), however if the flag comes from environment or config-map, then the parsing was done by a mapstructure hook [2]. This is all well and good as long as these two ways of parsing into say []string are aligned with each other. And they were. Unfortunately these were not aligned with the pre-Hive way of parsing which used viper.GetStringSlice and similar methods. Specifically viper.GetStringSlice is implemented ([3]) via cast.ToStringSlice, which uses strings.Fields that splits by whitespace instead of by commas. So to summarize the different ways a StringSlice can be parsed: - [1]: flags.StringSlice: parses with csv.Reader (split by comma) - [2]: stringToSliceHookFunc: splits by comma - [3]: viper.GetStringSlice: splits by whitespaces So while arguably the first two are more consistent, we can't just flip from splitting by spaces to splitting by commas as that creates a huge foot-gun when fields are moved from option.Config to individual hive.Config structs. To solve this we allow splitting both ways by using two mapstructure hooks that process the values before they're pushed to the config struct: - mapstructure.StringToSliceHookFunc(",") splits first string by commas. This only impacts input coming from environment, configmap or flags.String and going to []string. - fixupStringSliceHookFunc takes []string coming from flags.StringSlice or from the StringToSliceHookFunc and resplits it by whitespace if it was of length 1. With this, we have unified the parsing of []string across all the config input methods: "foo,bar,baz" => []string{"foo", "bar", "baz"} "foo bar baz" => []string{"foo", "bar", "baz"} "foo bar,baz" => []string{"foo bar", "baz"} [1]: https://github.com/spf13/pflag/blob/master/string_slice.go#L27 [2]: https://github.com/mitchellh/mapstructure/blob/main/decode_hooks.go#L104 [3]: https://github.com/spf13/viper/blob/9154b900c34ad9d88897f7e5288ce43f457f698b/viper.go#L1067 Fixes: cilium#29210 Fixes: b407ffc ("hive: Reimplement on top of dig") Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 1571cce ] BGP Manager to unset running flag when Stop is called. This is to fix flaky tests where reconcile gets called after Stop, which recreates BGP servers at shutdown stage. Signed-off-by: harsimran pabla <hpabla@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84
added
kind/backports
This PR provides functionality previously merged into master.
backport/1.15
This PR represents a backport for Cilium 1.15.x of a PR that was merged to main.
labels
Jan 2, 2024
pippolo84
requested review from
jrajahalme,
squeed,
harsimran-pabla,
rastislavs,
learnitall and
joamaki
January 2, 2024 09:18
julianwiedmann
approved these changes
Jan 2, 2024
mhofstetter
approved these changes
Jan 2, 2024
/test-backport-1.15 |
pchaigno
approved these changes
Jan 2, 2024
rastislavs
approved these changes
Jan 3, 2024
harsimran-pabla
approved these changes
Jan 3, 2024
squeed
approved these changes
Jan 4, 2024
giorio94
approved these changes
Jan 8, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My commits look good, thanks!
joamaki
approved these changes
Jan 8, 2024
gandro
approved these changes
Jan 8, 2024
learnitall
approved these changes
Jan 9, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
jrajahalme
approved these changes
Jan 11, 2024
tommyp1ckles
approved these changes
Jan 11, 2024
julianwiedmann
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Jan 11, 2024
dylandreimerink
approved these changes
Jan 12, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.15
This PR represents a backport for Cilium 1.15.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
resource.Store
inStart()
hive hooks of BGP CP #29954 (@rastislavs)Once this PR is merged, a GitHub action will update the labels of these PRs: