datapath: ICMP CT fixes #15275

brb · 2021-03-09T16:56:36Z

See commit msgs.

The PR has been previously reviewed by @kkourt and @borkmann .

Fix ICMP Echo ID placement in CT maps

brb · 2021-03-10T05:27:28Z

test-me-please

brb · 2021-03-10T15:34:42Z

4.19 is hitting complexity issues.

brb · 2021-03-10T16:16:05Z

test-me-please

brb · 2021-03-10T19:58:49Z

test-4.9

brb · 2021-03-11T09:30:38Z

Converted to draft until the complexity issue has been resolved.

brb · 2021-03-23T13:34:47Z

The complexity issue should be resolved by #15217.

pchaigno · 2021-03-26T11:45:22Z

test-1.19-4.19

The [1] changed the ICMP ECHO/ECHO_REPLY ID placement in CT entries in order to fix the problem where an egress NAT entry for ECHO_REPLY could not be found by a corresponding CT entry, which led to leaking NAT entries, as the CT GC could not find the NAT entries by the given CT entry.

The changed placement introduced an interesting problem described below.

What happens when a pod (10.154.0.89) sends ICMP EchoRequest to 8.8.8.8?

A CT entry with the following key is created:

        dst         src     dport sport    TUPLE_F_OUT
         |           |        |     |      |
    0a 9a 00 59 08 08 08 08 00 00 08 00 01 00

    <-- dst=pod because of the reverse before the second __ct_lookup.

("ICMP OUT 10.154.0.89:2048 -> 8.8.8.8:0 [...]" in the "cilium bpf ct list global" output).

What happens when 8.8.8.8 sends ICMP EchoRequest to the pod?

The lookup is performed for the reverse flow first with the following key:

        dst         src     dport sport    TUPLE_F_OUT
         |           |        |     |      |
    0a 9a 00 59 08 08 08 08 00 00 08 00 01 00

    <-- dir is TUPLE_F_OUT because we do the lookup in reverse order first.

The key matches the first __ct_lookup(), hence the return is CT_REPLY.

Previously, before the changed ID placement, the CT key for the 8.8.8.8 -> pod lookup was:

    0a 9a 00 59 08 08 08 08 08 00 00 00 01 00

This resulted in CT_NEW instead of CT_REPLY.

[1]: #12729

Signed-off-by: Martynas Pumputis <m@lambda.lt>
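The key comparison in the commit message above can be reproduced with a small sketch. It is not Cilium's code: `ct_key_build()` and the three `ct_key_*` builders are hypothetical helpers. The 14-byte layout (dst, src, dport, sport, then the two trailing bytes) is assumed from the diagram, and all byte values are copied from the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CT_KEY_LEN 14

/* Byte values copied from the example in the commit message. */
static const uint8_t POD_ADDR[4]     = { 0x0a, 0x9a, 0x00, 0x59 }; /* 10.154.0.89 */
static const uint8_t OUTSIDE_ADDR[4] = { 0x08, 0x08, 0x08, 0x08 }; /* 8.8.8.8 */
static const uint8_t ECHO_ID[2]      = { 0x08, 0x00 };
static const uint8_t NO_PORT[2]      = { 0x00, 0x00 };

/* Hypothetical helper: serialize a key as dst, src, dport, sport,
 * followed by the two trailing bytes "01 00" from the example
 * (assumed here to be the protocol number and the direction flag). */
void ct_key_build(uint8_t key[CT_KEY_LEN],
                  const uint8_t dst[4], const uint8_t src[4],
                  const uint8_t dport[2], const uint8_t sport[2])
{
    assert(key != NULL);
    memcpy(key, dst, 4);
    memcpy(key + 4, src, 4);
    memcpy(key + 8, dport, 2);
    memcpy(key + 10, sport, 2);
    key[12] = 0x01;
    key[13] = 0x00;
}

/* Key of the entry created when the pod pings 8.8.8.8. */
void ct_key_stored(uint8_t key[CT_KEY_LEN])
{
    ct_key_build(key, POD_ADDR, OUTSIDE_ADDR, NO_PORT, ECHO_ID);
}

/* Reverse-flow lookup key for 8.8.8.8 -> pod with the placement from #12729. */
void ct_key_lookup_after_12729(uint8_t key[CT_KEY_LEN])
{
    ct_key_build(key, POD_ADDR, OUTSIDE_ADDR, NO_PORT, ECHO_ID);
}

/* The same lookup with the earlier placement (ID in the other port field). */
void ct_key_lookup_before_12729(uint8_t key[CT_KEY_LEN])
{
    ct_key_build(key, POD_ADDR, OUTSIDE_ADDR, ECHO_ID, NO_PORT);
}

/* Returns 1 when both keys are byte-for-byte identical. */
int ct_key_equal(const uint8_t a[CT_KEY_LEN], const uint8_t b[CT_KEY_LEN])
{
    return memcmp(a, b, CT_KEY_LEN) == 0;
}
```

Comparing `ct_key_stored()` with `ct_key_lookup_after_12729()` gives identical bytes, which corresponds to the CT_REPLY result described above. The key from `ct_key_lookup_before_12729()` differs in bytes 8 to 11, which corresponds to CT_NEW.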

Let's say that we have a pod sending an ICMP ECHO request to the outside. The handling of the request creates the following CT and NAT entries:

    CT  | src        | dst       | dir |
        +------------+-----------+-----+
        | outside:ID | pod:0     | OUT |

    NAT | src        | dst       | dir |
        +------------+-----------+-----+
        | pod:ID     | outside:0 | OUT |
        +------------+-----------+-----+
        | outside:0  | host:ID   | IN  |

Now, let's say that the outside sends an ICMP ECHO request, with the same ID as above, to the host running the pod. The following NAT lookup is performed:

    outside:0 -> host:ID IN

The lookup will find the NAT entry from the pod->outside case. This will translate the request, causing it to be delivered to the pod instead of the host.

Fix this by making the ICMP ECHO ID placement in the NAT tuple depend on the ICMP type instead of the packet direction. After this change, the NAT entries will be the same as above, but the lookup for the outside->host case is changed to the following:

    outside:ID -> host:0 IN

(doesn't match any NAT entry above).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
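As a rough illustration of the fix described in this commit message, the sketch below contrasts direction-based and type-based placement of the echo ID. All names (`place_id_by_dir`, `place_id_by_type`, `nat_lookup`, the `endpoint` enum) are hypothetical and not taken from the Cilium sources. Only the tuples in the tables above and the ICMP type numbers (8 for echo request, 0 for echo reply) are given.

```c
#include <stddef.h>
#include <stdint.h>

enum icmp_type { ICMP_TYPE_ECHO_REPLY = 0, ICMP_TYPE_ECHO = 8 };
enum nat_dir   { NAT_DIR_OUT, NAT_DIR_IN };
enum endpoint  { EP_POD, EP_HOST, EP_OUTSIDE };

struct nat_ports { uint16_t sport, dport; };

struct nat_tuple {
    enum endpoint src; uint16_t sport;
    enum endpoint dst; uint16_t dport;
    enum nat_dir dir;
};

/* Previous behaviour: the packet direction decides where the ID goes. */
struct nat_ports place_id_by_dir(enum nat_dir dir, uint16_t id)
{
    if (dir == NAT_DIR_OUT)
        return (struct nat_ports){ .sport = id, .dport = 0 };
    return (struct nat_ports){ .sport = 0, .dport = id };
}

/* Fixed behaviour: the ICMP type decides where the ID goes. */
struct nat_ports place_id_by_type(enum icmp_type type, uint16_t id)
{
    if (type == ICMP_TYPE_ECHO)
        return (struct nat_ports){ .sport = id, .dport = 0 };
    return (struct nat_ports){ .sport = 0, .dport = id };
}

/* The two NAT entries created by the pod -> outside echo request. */
void nat_entries_for_pod_echo(struct nat_tuple table[2], uint16_t id)
{
    struct nat_ports req = place_id_by_type(ICMP_TYPE_ECHO, id);
    struct nat_ports rep = place_id_by_type(ICMP_TYPE_ECHO_REPLY, id);

    table[0] = (struct nat_tuple){ EP_POD, req.sport, EP_OUTSIDE, req.dport, NAT_DIR_OUT };
    table[1] = (struct nat_tuple){ EP_OUTSIDE, rep.sport, EP_HOST, rep.dport, NAT_DIR_IN };
}

/* Lookup key for an echo request arriving from outside at the host. */
struct nat_tuple inbound_echo_key_by_dir(uint16_t id)
{
    struct nat_ports p = place_id_by_dir(NAT_DIR_IN, id);
    return (struct nat_tuple){ EP_OUTSIDE, p.sport, EP_HOST, p.dport, NAT_DIR_IN };
}

struct nat_tuple inbound_echo_key_by_type(uint16_t id)
{
    struct nat_ports p = place_id_by_type(ICMP_TYPE_ECHO, id);
    return (struct nat_tuple){ EP_OUTSIDE, p.sport, EP_HOST, p.dport, NAT_DIR_IN };
}

/* Returns the index of the matching entry, or -1 if there is none. */
int nat_lookup(const struct nat_tuple *table, size_t n, const struct nat_tuple *key)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].src == key->src && table[i].sport == key->sport &&
            table[i].dst == key->dst && table[i].dport == key->dport &&
            table[i].dir == key->dir)
            return (int)i;
    }
    return -1;
}
```

With the entries from `nat_entries_for_pod_echo()`, the key from `inbound_echo_key_by_dir()` matches the IN entry, which is the unintended translation. The key from `inbound_echo_key_by_type()` matches nothing.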

Previously, when an ICMP ECHO was sent from outside to a host managed by Cilium, the handling of the reply to it (ICMP ECHO_REPLY) used to create the following entries:

    CT  | src        | dst        | dir |
        +------------+------------+-----+
        | outside:0  | host:ID    | OUT |

    NAT | src        | dst        | dir |
        +------------+------------+-----+
        | host:0     | outside:ID | OUT | <-- ICMP ECHO_REPLY
        +------------+------------+-----+
        | outside:ID | host:ID    | IN  | <-- ICMP ECHO

The NAT IN entry was useful only to prevent pod->outside traffic from being SNAT-ed with the same ID, but this is no longer needed after the "datapath: Fix unintended SNAT of ICMP ECHO" commit.

Also, this removes the problematic CT GC case in which the existing GC logic could not find the corresponding NAT OUT entry for such a CT entry.

Signed-off-by: Martynas Pumputis <m@lambda.lt>

jibi · 2021-04-16T12:14:30Z

test-me-please

jibi · 2021-04-16T19:09:27Z

Looks like net-next is hitting #15737, otherwise should be good for review

This commit reduces the complexity of the 2/7 section of the bpf_host program by introducing a couple of state pruning points with the relax_verifier() helper. These points have been determined by looking at the instructions that the verifier is spending the most passes on.

We first start by obtaining the verifier logs:

    tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj bpf_host.o sec to-host verb

With these logs we can count how many times an instruction is examined by the verifier, and look for groups of sequential instructions with the highest complexity.

With that information we can then disassemble the bpf_host program and use the debug symbols to approximately match the line of code that may require placing an additional state pruning point.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
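The counting step described in this commit message could look roughly like the sketch below. It is not the tooling used for the PR. It assumes that instruction lines in the verifier log start with `<index>: (<opcode>)`, for example `5: (15) if r0 == 0x0 goto pc+2`, and that every other line (register state, branch notes) can be ignored. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count one log line. Returns 1 if it looked like an instruction line
 * ("<index>: (<opcode>) ...") with an index below max_insns, else 0. */
int count_insn_line(const char *line, unsigned long *visits, size_t max_insns)
{
    unsigned int idx, opcode;

    if (sscanf(line, "%u: (%2x)", &idx, &opcode) != 2)
        return 0;
    if (idx >= max_insns)
        return 0;
    visits[idx]++;
    return 1;
}

/* Feed a whole log through count_insn_line(). Lines longer than the
 * buffer arrive in several chunks; only the first chunk is parsed. */
void count_insn_visits(FILE *log, unsigned long *visits, size_t max_insns)
{
    char line[1024];
    int at_line_start = 1;

    while (fgets(line, sizeof(line), log)) {
        size_t len = strlen(line);

        if (at_line_start)
            count_insn_line(line, visits, max_insns);
        at_line_start = len > 0 && line[len - 1] == '\n';
    }
}

/* Start index of the `width` consecutive instructions with the highest
 * total visit count (first such window on ties, 0 if width is invalid). */
size_t hottest_window(const unsigned long *visits, size_t n, size_t width)
{
    unsigned long sum = 0, best_sum = 0;
    size_t best = 0;

    if (width == 0 || width > n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        sum += visits[i];
        if (i >= width)
            sum -= visits[i - width];
        if (i + 1 >= width && sum > best_sum) {
            best_sum = sum;
            best = i + 1 - width;
        }
    }
    return best;
}
```

The index returned by `hottest_window()` is only a starting point. Mapping it back to a source line still requires disassembling the program with debug symbols, as the commit message notes.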

jibi · 2021-04-19T08:06:02Z

No need to rerun full CI, I just amended the latest commit message

brb · 2021-04-19T08:08:35Z

~~Marking it as ready-to-merge.~~ Let's wait for @pchaigno ACK.

borkmann · 2021-04-19T18:24:41Z

bpf/lib/conntrack.h

@@ -845,6 +846,7 @@ static __always_inline int ct_create4(const void *map_main,

 	entry.lb_loopback = ct_state->loopback;
 	entry.node_port = ct_state->node_port;
+	relax_verifier();


Btw, do we have a cheaper call for relax_verifier() internally with less overhead and which could potentially be inlined?

🤔 What do you have in mind? We also want something that has zero arguments to minimize impact on complexity.

brb added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Mar 9, 2021

brb requested a review from a team March 9, 2021 16:56

brb requested a review from a team as a code owner March 9, 2021 16:56

brb requested review from jrfastab and kkourt March 9, 2021 16:56

maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 9, 2021

maintainer-s-little-helper bot assigned jrfastab and kkourt Mar 9, 2021

maintainer-s-little-helper bot added this to In progress in 1.10.0 Mar 9, 2021

brb added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Mar 9, 2021

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 9, 2021

kkourt approved these changes Mar 10, 2021

brb force-pushed the pr/brb/fix-icmp-ports branch from 6ba3495 to 1cb41bd Compare March 10, 2021 15:43

brb marked this pull request as draft March 11, 2021 09:30

brb added the priority/release-blocker label Mar 19, 2021

pchaigno added release-blocker/1.10 and removed priority/release-blocker labels Mar 26, 2021

pchaigno self-requested a review March 26, 2021 11:18

maintainer-s-little-helper bot assigned pchaigno Mar 26, 2021

pchaigno force-pushed the pr/brb/fix-icmp-ports branch from 1cb41bd to 3edf2fa Compare March 26, 2021 11:44

jibi force-pushed the pr/brb/fix-icmp-ports branch 2 times, most recently from 7065466 to c8de6ba Compare April 16, 2021 07:52

brb added 3 commits April 16, 2021 14:13

jibi force-pushed the pr/brb/fix-icmp-ports branch from c8de6ba to b696de3 Compare April 16, 2021 12:14

aanm added this to the 1.10.0 milestone Apr 16, 2021

jibi marked this pull request as ready for review April 16, 2021 19:10

brb commented Apr 16, 2021

jibi force-pushed the pr/brb/fix-icmp-ports branch from b696de3 to 107080d Compare April 19, 2021 08:05

pchaigno approved these changes Apr 19, 2021

pchaigno unassigned kkourt and pchaigno Apr 19, 2021

pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 19, 2021

qmonnet merged commit c369861 into master Apr 19, 2021

1.10.0 automation moved this from In progress to Done Apr 19, 2021

qmonnet deleted the pr/brb/fix-icmp-ports branch April 19, 2021 15:00

borkmann reviewed Apr 19, 2021

This was referenced Apr 28, 2021

Prepare for release v1.10.0-rc1 #15896

Closed

Prepare for release v1.10.0-rc1 #15897

Merged
