Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datapath: ICMP CT fixes #15275

Merged
merged 4 commits into from
Apr 19, 2021
Merged

datapath: ICMP CT fixes #15275

merged 4 commits into from
Apr 19, 2021

Conversation

brb
Copy link
Member

@brb brb commented Mar 9, 2021

See commit msgs.

The PR has been previously reviewed by @kkourt and @borkmann .

Fix ICMP Echo ID placement in CT maps

@brb brb added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Mar 9, 2021
@brb brb requested a review from a team March 9, 2021 16:56
@brb brb requested a review from a team as a code owner March 9, 2021 16:56
@brb brb requested review from jrfastab and kkourt March 9, 2021 16:56
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 9, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Mar 9, 2021
@brb brb added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Mar 9, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 9, 2021
@brb
Copy link
Member Author

brb commented Mar 10, 2021

test-me-please

@brb
Copy link
Member Author

brb commented Mar 10, 2021

4.19 is hitting complexity issues.

@brb
Copy link
Member Author

brb commented Mar 10, 2021

test-me-please

@brb
Copy link
Member Author

brb commented Mar 10, 2021

test-4.9

@brb brb marked this pull request as draft March 11, 2021 09:30
@brb
Copy link
Member Author

brb commented Mar 11, 2021

Converted to draft until the complexity issue has been resolved.

@brb
Copy link
Member Author

brb commented Mar 23, 2021

The complexity issue should be resolved by #15217.

@pchaigno
Copy link
Member

test-1.19-4.19

@jibi jibi force-pushed the pr/brb/fix-icmp-ports branch 2 times, most recently from 7065466 to c8de6ba Compare April 16, 2021 07:52
brb added 3 commits April 16, 2021 14:13
The [1] changed the ICMP ECHO/ECHO_REPLY ID placement in CT entries in
order to fix the problem when an egress NAT entry for ECHO_REPLY cannot
be found by a corresponding CT entry which lead to leaking NAT entries,
as the CT GC could not find the NAT entries by the given CT entry.

The changed placement introduced an interesting problem described below.

What happens when a pod (10.154.0.89) sends ICMP EchoRequest to 8.8.8.8?

A CT entry with the following key is created:

dst         src          dport sport    TUPLE_F_OUT
|           |            |     |        |
0a 9a 00 59 08 08 08 08  00 00 08 00 01 00 <-- dst=pod because of
					       the reverse before the
                                               second __ct_lookup.
("ICMP OUT 10.154.0.89:2048 -> 8.8.8.8:0 [...]" in the
"cilium bpf ct list global" output).

What happens when 8.8.8.8 sends ICMP EchoRequest to the pod? The lookup
is performed for the reverse flow first with the following key:

dst         src          dport sport    TUPLE_F_OUT <-- dir is TUPLE_F_OUT
|           |            |     |        |               because we do the
0a 9a 00 59 08 08 08 08  00 00 08 00 01 00              lookup in reverse
							order first.

The key matches the first __ct_lookup(), hence the return is CT_REPLY.
Previously, before the changed ID placement, the CT key for 8.8.8.8 ->
the pod lookup was:

0a 9a 00 59 08 08 08 08  08 00 00 00 01 00

This resulted in CT_NEW instead of CT_REPLY.

[1]: #12729

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Let's say that we have a pod sending ICMP ECHO request to outside. The
handling of the request creates the following CT and NAT entries:

CT  |    src     |     dst   | dir |
    +------------+-----------+-----+
    | outside:ID |   pod:0   | OUT |

NAT |    src     |    dst    | dir |
    +------------+-----------+-----+
    | pod:ID     | outside:0 | OUT |
    +------------+-----------+-----+
    | outside:0  |   host:ID | IN  |

Now, let's say that we have the outside sending ICMP echo request to the host
running the pod with the same ID as above. The following NAT lookup is
performed:

outside:0 -> host:ID IN

The lookup will find the NAT entry from the pod->outside case. This will
translate the request making it to be delivered to the pod instead of
the host.

Fix this by making the ICMP ECHO ID placement in the NAT tuple to depend on the
ICMP type instead of the packet direction.

After this change, the NAT entries will be the same as above, but the lookup
for the outside->host case is changed to the following:

outside:ID -> host:0 IN (doesn't match any NAT entry above).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Previously, when ICMP ECHO was sent from outside to a host managed by
Cilium, the handling of the reply to it (ICMP ECHO_REPLY) used to create
the following entries:

CT  |    src     |     dst   | dir |
    +------------+-----------+-----+
    | outside:0  |  host:ID  | OUT |

NAT |    src     |    dst    | dir |
    +------------+-----------+-----+
    | host:0     | outside:ID| OUT | <-- ICMP ECHO_REPLY
    +------------+-----------+-----+
    | outside:ID |   host:ID | IN  | <-- ICMP ECHO

The NAT IN entry was useful only to avoid pod->outside to be SNAT-ed
with the same ID, but this is no longer the case after the
"datapath: Fix unintended SNAT of ICMP ECHO" commit.

Also, this removes the problematic CT GC case in which for such a CT
entry a corresponding NAT OUT entry with the existing GC logic could not
be found.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@jibi
Copy link
Member

jibi commented Apr 16, 2021

test-me-please

@aanm aanm added this to the 1.10.0 milestone Apr 16, 2021
@jibi
Copy link
Member

jibi commented Apr 16, 2021

Looks like net-next is hitting #15737, otherwise should be good for review

@jibi jibi marked this pull request as ready for review April 16, 2021 19:10
This commit reduces the complexity of the 2/7 section of the bpf_host
program by introducing a couple of state pruning points with the
relax_verifier() helper.

These points have have been determined by looking at the instructions
that the verifier is spending the most passes on.

We first start by obtaining the verifier logs:

    tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj bpf_host.o sec to-host verb

With these logs we can count how many times an instruction is examined
by the verifier, and look for groups of sequential instructions with the
highest complexity.

With that information we can then disassemble the bpf_host program and
use the debug symbols to approximately match the line of code that may
require placing an additional state pruning point.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
@jibi
Copy link
Member

jibi commented Apr 19, 2021

No need to rerun full CI, I just amended the latest commit message

@brb
Copy link
Member Author

brb commented Apr 19, 2021

Marking it as ready-to-merge. Let's wait for @pchaigno ACK.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 19, 2021
@qmonnet qmonnet merged commit c369861 into master Apr 19, 2021
1.10.0 automation moved this from In progress to Done Apr 19, 2021
@qmonnet qmonnet deleted the pr/brb/fix-icmp-ports branch April 19, 2021 15:00
@@ -845,6 +846,7 @@ static __always_inline int ct_create4(const void *map_main,

entry.lb_loopback = ct_state->loopback;
entry.node_port = ct_state->node_port;
relax_verifier();
Copy link
Member

@borkmann borkmann Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, do we have a cheaper call for relax_verifier() internally with less overhead and which could potentially be inlined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 What do you have in mind? We also want something that has zero arguments to minimize impact on complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants