Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pr/v1.11 backport 2023 05 11 #25382

Merged
merged 6 commits into from May 12, 2023

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented May 11, 2023

v1.11 backports 2023-05-11

Once this PR is merged, you can update the PR labels via:

$ for pr in 23254 24838 25212 25222 25257; do contrib/backporting/set-labels.py $pr done 1.11; done

[ upstream commit d39ca10 ]

While extending datapath coverage, Martynas found a new bug affecting
IPsec + VXLAN + endpoint routes. In that configuration, cross-node
pod connectivity seems to fail and we see the IPsec XfrmInNoPols error
counter increasing.

By tracing the connection, we can see that the packet disappears on the
receiving node, after decryption, between bpf_overlay (second traversal)
and the lxc device. On that path, given decryption already happened,
we should match the FWD XFRM policy:

    src 0.0.0.0/0 dst 10.244.0.0/24 uid 0
	dir fwd action allow index 106 priority 2975 share any flag  (0x00000000)
	lifetime config:
	  limit: soft (INF)(bytes), hard (INF)(bytes)
	  limit: soft (INF)(packets), hard (INF)(packets)
	  expire add: soft 0(sec), hard 0(sec)
	  expire use: soft 0(sec), hard 0(sec)
	lifetime current:
	  0(bytes), 0(packets)
	  add 2023-01-13 14:34:18 use 2023-01-13 14:34:22
	mark 0/0xf00

Clearly, given the non-zero XfrmInNoPols, the packet doesn't match the
policy. Note XfrmInNoPols is also reported for FWD; there is no
XfrmFwdNoPols.

Checking the source code [1], we can see that when endpoint routes are
enabled, we encode the source security identity into the packet mark.
We thus won't match the 0/0xf00 mark on the FWD XFRM policy.

We shouldn't need to match on any packet mark for the FWD XFRM policy;
we want to allow all packets through. This commit therefore removes the
packet mark match for the FWD direction.

1 - https://github.com/cilium/cilium/blob/v1.13.0-rc4/bpf/lib/l3.h#L151-L154
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme added kind/backports This PR provides functionality previously merged into master. backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. labels May 11, 2023
@jrajahalme jrajahalme requested a review from a team as a code owner May 11, 2023 11:59
[ upstream commit 54fa995 ]

[ Backporter's notes: Dropped call to 'helpers.RunsOnAKS()' which does not exist in v1.11 ]

The previous commit fixed a connectivity bug affecting the above
configuration. We can now extend tests to cover that configuration.

Note that these tests are soon going to be removed and replaced by the
new GitHub workflow. However, we may need to backport this pull request
to stable branches where the GitHub workflow doesn't exist. Therefore,
the corresponding extension of the workflow test will be done in a
separate pull request.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the pr/v1.11-backport-2023-05-11 branch from fd817a7 to 64c6c25 Compare May 11, 2023 12:13
@jrajahalme
Copy link
Member Author

jrajahalme commented May 11, 2023

@pchaigno Dropped call to helpers.RunsOnAKS() as it does not exist in v1.11.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PRs look good. Thanks!

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my commit lgtm

@alan-kut
Copy link
Contributor

My changes LGTM

alan-kut and others added 4 commits May 11, 2023 15:51
[ upstream commit 71af0a2 ]

[ Backporter's node: Added '!privileged_tests' to the test ]

There are condition possible in which CEP changes CES.
This leads to CEP being present in multiple CESs for some time.
In such cases the standard logic may not work as it always expect
to have a single CEP representation.

This commit changes to logic to handle multiple CEPs properly.

Signed-off-by: Alan Kutniewski <kutniewski@google.com>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 7335aa3 ]

Another option would be to quarantine the test and find an assignee to
make the test more robust, but I assert that we don't need test coverage
for monitor verbose output.

Fixes: cilium#25178

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 9115e05 ]

We correctly detect that we failed to allocate a new node ID (due to
exhaustion of the idpool), but then still go ahead and map it. This
leads to spurious errors which include "Failed to map node IP address to
allocated ID".

Instead, don't try to map NoID and return it directly.

Fixes: af88b42 (datapath: Introduce node IDs)

Suggested-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 2045d59 ]

We currently install the default-drop XFRM policy when we install the
XFRM policies and states for the local node. It is however possible for
us to start installing XFRM policies and states for remote nodes before
we handle the local one.

The default-drop XFRM policy is a safety measure for when we move XFRM
policies around. Because we don't always have a way to atomically update
XFRM policies, it's possible that we end up with a very short time where
no encryption XFRM OUT policy is matching a subset of traffic. The
default-drop policy ensures that we drop such traffic instead of letting
it leave the node as plain-text.

We therefore want this default-drop XFRM policy to be installed before
we update any other other XFRM policy. This commit therefore moves its
installation before any other XFRM update instead of before just
local-node XFRM updates.

Fixes: 7d44f37 ("ipsec: Catch-default default drop policy for encryption")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the pr/v1.11-backport-2023-05-11 branch from 64c6c25 to b5aefd3 Compare May 11, 2023 12:51
@jrajahalme
Copy link
Member Author

Added !privileged_tests to the added Go test.

@jrajahalme
Copy link
Member Author

jrajahalme commented May 11, 2023

/test-backport-1.11

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed:

Click to show.

Test Name

K8sKafkaPolicyTest Kafka Policy Tests KafkaPolicies

Failure Output

FAIL: Found 2 k8s-app=cilium logs matching list of errors that must be investigated:

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.21-kernel-4.9/2628/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@ciliumbot
Copy link

Build finished.

@jrajahalme
Copy link
Member Author

/test-1.17-4.9

@jrajahalme
Copy link
Member Author

/test-1.21-4.9

@jrajahalme
Copy link
Member Author

/test-1.21-5.4

@jrajahalme
Copy link
Member Author

/test-1.22-4.19

@jrajahalme
Copy link
Member Author

/test-1.23-4.9

@jrajahalme
Copy link
Member Author

/test-runtime

@jrajahalme
Copy link
Member Author

/test-upstream-k8s

@jrajahalme
Copy link
Member Author

/test-1.16-netnext

@jrajahalme
Copy link
Member Author

/test-1.18-4.9

@jrajahalme
Copy link
Member Author

/test-1.19-4.9

@jrajahalme
Copy link
Member Author

test-1.17-4.9 failed with known flake #24394

@jrajahalme
Copy link
Member Author

test-1.21-4.9 failed with known flake #24394

@jrajahalme
Copy link
Member Author

Travis run succeeded, but build status was still "in progress". Two instances of kernel 4.9 flake. Merging.

@jrajahalme jrajahalme merged commit 5cf233e into cilium:v1.11 May 12, 2023
49 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants