
Adds an ACCEPT rule for untracked pkts in filter:CILIUM_OUTPUT #17585

Merged
merged 1 commit into cilium:master on Nov 11, 2021

Conversation

Weil0ng
Contributor

@Weil0ng Weil0ng commented Oct 12, 2021

Currently, when no-track-port is specified for a pod (the only use
case for now is nodelocaldns), we insert several iptables rules to skip
conntrack for packets to and from the pod, to achieve parity with OSS
node-local-dns.

However, we need to add a specific ACCEPT rule in the filter:CILIUM_OUTPUT
chain to accept such packets. Otherwise, a DNS query pkt originating from
the hostns will skip conntrack and get dropped in the filter OUTPUT
chain. This rule is NOT needed for standard OSS node-local-dns, however,
because that relies on the loopback rule installed by the OS to allowlist
this traffic pattern. With Cilium, we DNAT such packets so that their
dst is the pod IP of the local node-cache pod; they will NOT hit the
loopback dev, hence we need to punch a specific hole to allowlist them.
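As a sketch, the pair of rules looks roughly like this (the port and the matches are illustrative; the exact rules Cilium installs differ):

# pre-existing: skip conntrack for hostns-originated queries to the no-track port
iptables -t raw -A CILIUM_OUTPUT -p udp --dport 53 -j NOTRACK
# added by this PR: explicitly accept those untracked pkts, which otherwise
# fail the stateful ACCEPT rule in the filter OUTPUT chain
iptables -t filter -A CILIUM_OUTPUT -p udp --dport 53 -j ACCEPT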

Fixes: #16694

Signed-off-by: Weilong Cui <cuiwl@google.com>

@Weil0ng Weil0ng added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Oct 12, 2021
@Weil0ng Weil0ng requested review from aditighag and a team October 12, 2021 19:54
@Weil0ng Weil0ng requested a review from jrfastab October 12, 2021 19:54
@aditighag
Member

Thanks for tracking this down. Any idea why we need these additional rules only for packets originating from hostns pods destined to the node-local dns pod? Traffic from regular pods destined to port 53 should also hit the NO-TRACK rules, no?

I was under the impression that intra-node pod-to-pod traffic is redirected by BPF code to the destination device only when endpoint routes are disabled, in which case we wouldn't hit the issue you described. OTOH, when endpoint routes are enabled (the default on GKE), traffic is handed over to the network stack for routing.

@Weil0ng
Contributor Author

Weil0ng commented Oct 13, 2021

I was under the impression that intra-node pod-to-pod traffic is redirected by BPF code to the destination device only when endpoint routes are disabled, in which case we wouldn't hit the issue you described. OTOH, when endpoint routes are enabled (the default on GKE), traffic is handed over to the network stack for routing.

Exactly. For non-hostns pods, the pkt never hits raw:OUTPUT because it takes the PREROUTING -> FORWARD -> POSTROUTING path, so it never hits the NOTRACK rule on the "egress" path (from the originating pod's perspective).
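For reference, the standard netfilter traversal for the two cases is roughly:

# locally generated (hostns) pkt:
#   raw:OUTPUT -> mangle:OUTPUT -> nat:OUTPUT -> filter:OUTPUT -> POSTROUTING
# forwarded pkt (e.g. arriving from a pod veth):
#   raw:PREROUTING -> mangle:PREROUTING -> nat:PREROUTING -> filter:FORWARD -> POSTROUTING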

@pchaigno
Member

Otherwise, a DNS query pkt originating from the hostns will skip conntrack and get dropped in the filter OUTPUT chain.

Why does it get dropped if conntrack is skipped? Is there a specific rule that drops it?

The change is trivial but the explanation is a bit hard to follow right now. Any chance you could illustrate with iptables rules?

@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.5 Oct 13, 2021
@Weil0ng
Contributor Author

Weil0ng commented Oct 13, 2021

Why does it get dropped if conntrack is skipped? Is there a specific rule that drops it?

The change is trivial but the explanation is a bit hard to follow right now. Any chance you could illustrate with iptables rules?

The detailed analysis is here: #16694 (comment) :) There I listed the iptables rules and why the pkt was dropped.

@aditighag aditighag added the area/lrp Impacts Local Redirect Policy. label Oct 13, 2021
@aditighag
Member

aditighag commented Oct 13, 2021

I was under the impression that intra-node pod-to-pod traffic is redirected by BPF code to the destination device only when endpoint routes are disabled, in which case we wouldn't hit the issue you described. OTOH, when endpoint routes are enabled (the default on GKE), traffic is handed over to the network stack for routing.

Exactly. For non-hostns pods, the pkt never hits raw:OUTPUT because it takes the PREROUTING -> FORWARD -> POSTROUTING path, so it never hits the NOTRACK rule on the "egress" path (from the originating pod's perspective).

I'm confused by this. The reason the NO_TRACK rule was added for traffic going to node-local dns pods is that we wanted to skip tracking DNS traffic, no?
Here is your comment from the original issue - #13686 (comment).

These rules will be hit when we are running in endpoint routing mode, where the pkts will be handed to the stack after exiting the pod namespace.

Am I missing something?

@joestringer joestringer added this to Needs backport from master in 1.10.6 Oct 13, 2021
@joestringer joestringer removed this from Needs backport from master in 1.10.5 Oct 13, 2021
@Weil0ng
Contributor Author

Weil0ng commented Oct 14, 2021

I'm confused by this. The reason the NO_TRACK rule was added for traffic going to node-local dns pods is that we wanted to skip tracking DNS traffic, no?
Here is your comment from the original issue - #13686 (comment).

These rules will be hit when we are running in endpoint routing mode, where the pkts will be handed to the stack after exiting the pod namespace.

There are 3 separate sets of NOTRACK rules.

The issue here is that for the first 2 sets we added ACCEPT rules to allowlist them, but for the last one we are missing that ACCEPT rule.
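A sketch inferred from the surrounding discussion (not the literal rule set):

# 1) raw PREROUTING: NOTRACK for pkts to the pod's no-track port   -> paired ACCEPT exists
# 2) raw PREROUTING: NOTRACK for reply pkts from that port         -> paired ACCEPT exists
# 3) raw CILIUM_OUTPUT: NOTRACK for hostns-originated pkts         -> ACCEPT missing; this PR adds it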

@Weil0ng
Contributor Author

Weil0ng commented Oct 15, 2021

Friendly ping :)

@aditighag
Member

aditighag commented Oct 15, 2021

Thanks for the context.

For non-hostns pods, the pkt never hits raw:OUTPUT because it takes the PREROUTING -> FORWARD -> POSTROUTING path, so it never hits the NOTRACK rule on the "egress" path (from the originating pod's perspective).

I was confused by your earlier statement. From my reading of the code, traffic from regular pods does hit the NOTRACK rule, but we've also added an ACCEPT rule to allow such traffic. If that's the case, the fix looks good to me.

@Weil0ng
Contributor Author

Weil0ng commented Oct 15, 2021

I was confused by your earlier statement. From my reading of the code, traffic from regular pods does hit the NOTRACK rule, but we've also added an ACCEPT rule to allow such traffic. If that's the case, the fix looks good to me.

Yes, sorry I wasn't being clear in my initial comment. When I said "it never hits the NOTRACK rule", I meant that pkts from regular pods do NOT hit the rule in the raw:CILIUM_OUTPUT chain; they DO hit the NOTRACK rule in the PREROUTING chain and, as you said, we have an ACCEPT rule to allow such pkts through.

@Weil0ng
Contributor Author

Weil0ng commented Oct 19, 2021

Friendly ping @pchaigno :)

@pchaigno
Member

pchaigno commented Nov 2, 2021

However, we need to add a specific ACCEPT rule in the filter:CILIUM_OUTPUT chain to accept such packets. Otherwise, a DNS query pkt originating from the hostns will skip conntrack and get dropped in the filter OUTPUT chain.

I might be missing something obvious, but what's the relation between skipping conntrack and not matching any rule in filter:output? You seem to suggest the first causes the second.

Issue here is that for the first 2 rules, we added ACCEPT rules to allowlist them, but for the last, we are missing that ACCEPT rule.

Why wasn't the ACCEPT rule added before? Was it simply an oversight?

@Weil0ng
Contributor Author

Weil0ng commented Nov 2, 2021

I might be missing something obvious, but what's the relation between skipping conntrack and not matching any rule in filter:output? You seem to suggest the first causes the second.

Yes, if you look at the filter:OUTPUT rules here #16694 (comment), specifically

3     300K  136M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW,RELATED,ESTABLISHED
4        0     0 ACCEPT     all  --  *      lo      0.0.0.0/0            0.0.0.0/0

rule 3 only allows traffic that is conntracked, and rule 4 only allows traffic going out via the lo device.
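Untracked pkts carry conntrack state UNTRACKED, which matches neither rule. A generic allowlist would look like the line below (illustrative only; this PR instead adds an ACCEPT in filter:CILIUM_OUTPUT targeted at the no-track traffic):

iptables -A OUTPUT -m conntrack --ctstate UNTRACKED -j ACCEPT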

Why wasn't the ACCEPT rule added before? Was it simply an oversight?

When we added the rules, we followed the OSS node-local-dns implementation, which does not require such a rule because the same traffic flow would hit the lo dev in their deployment mode. So yes, we overlooked this subtlety.

@pchaigno
Member

pchaigno commented Nov 3, 2021

rule 3 only allows traffic that is conntracked, and rule 4 only allows traffic going out via the lo device.

Aah 🤦 Not sure how I missed that rule; that's exactly what I was looking for.

When we added the rules, we followed the OSS node-local-dns implementation, which does not require such a rule because the same traffic flow would hit the lo dev in their deployment mode.

Ah, makes sense 👍

@Weil0ng
Contributor Author

Weil0ng commented Nov 3, 2021

/test

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:


Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with direct routing Tests NodePort with sessionAffinity

Failure Output

FAIL: Expected

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

@Weil0ng
Contributor Author

Weil0ng commented Nov 3, 2021

This change should not affect any of the failed tests, they are most likely flakes.

@pchaigno
Member

pchaigno commented Nov 3, 2021

This change should not affect any of the failed tests, they are most likely flakes.

Can we ensure we have issues tracking those flakes?

@Weil0ng
Contributor Author

Weil0ng commented Nov 10, 2021

/test

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 11, 2021
@aanm aanm merged commit 2215c02 into cilium:master Nov 11, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport done to v1.10 in 1.10.6 Nov 23, 2021
Development

Successfully merging this pull request may close these issues.

DNS request from hostNetwork pods cannot be delivered to local backend with LRP