Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: fix missing ipv6 ct entry for SNATed pod traffic #28813

Merged
merged 1 commit into from
Nov 24, 2023

Conversation

oblazek
Copy link
Contributor

@oblazek oblazek commented Oct 26, 2023

When combining HostFW with kube-proxy based masquerading
host traffic will have HOST_ID for both - packet mark based
src_id and ip cache based src_id.
While SNATed pod traffic will only have HOST_ID from the
ipcache (as it doesn't pass through the iptables rules that
set the mark).
Therefore for SNATed traffic we get HOST_ID only from
ipcache and not from packet mark it means we are handling
SNATed traffic from pod that needs to have a conntrack entry
so that return traffic doesn't go through host firewall and
is skipped.

This commit fixes the issue and adds a missing ct entry for
SNATed traffic.

cc @julianwiedmann

Fixes: #26886

related discussion: https://cilium.slack.com/archives/CDKG8NNHK/p1698155364279479

bpf: Fix drop of IPv6 reply traffic when 1) pod-originating connection is SNATed by iptables, and 2) Host Firewall is enabled.

@oblazek oblazek requested a review from a team as a code owner October 26, 2023 11:33
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 26, 2023
@oblazek oblazek force-pushed the ob-ipv6-hostfw-fix-no-masq branch 2 times, most recently from 0660eb3 to 5b609bb Compare October 26, 2023 11:51
@oblazek oblazek marked this pull request as draft October 26, 2023 11:56
@oblazek oblazek marked this pull request as ready for review October 26, 2023 12:04
@oblazek
Copy link
Contributor Author

oblazek commented Oct 26, 2023

/test

@julianwiedmann
Copy link
Member

julianwiedmann commented Oct 27, 2023

Hi @oblazek , ty for looking into this! Will have a closer look ASAP.

But just to set some context for review, as the description doesn't capture my current understanding.
With ENABLE_HOST_FIREWALL set, we need to handle two cases in the egress path:

  • actual host traffic. Here we want to apply egress policy.
  • SNATed pod traffic (when not using BPF-masquerading). Here we don't apply egress policy. But we still create a CT entry, so that replies can naturally bypass ingress policy.

We differentiate between the two cases by comparing the mark-derived identity with the ipcache-derived identity. Host traffic will have HOST_ID for both, while SNATed pod traffic will only have HOST_ID from the ipcache (as it doesn't pass through the iptables rule that sets the mark).

Throwing --install-iptables-rules=false into this picture means that legitimate host traffic will not be marked either, and thus is mis-identified as "SNATed pod traffic". And so while we create a CT entry for it (and the reply traffic flow works), we don't actually apply egress policy to it when we should.

Do you agree with that assessment, or is there still something missing from the picture?

@oblazek
Copy link
Contributor Author

oblazek commented Oct 27, 2023

Throwing --install-iptables-rules=false into this picture means that legitimate host traffic will not be marked either, and thus is mis-identified as "SNATed pod traffic". And so while we create a CT entry for it (and the reply traffic flow works), we don't actually apply egress policy to it when we should.

Do you agree with that assessment, or is there still something missing from the picture?

You are totally right, if iptables are not present, there is no mark and return __ipv6_host_policy_egress(ctx, src_id == HOST_ID,... can't be relied on. So at least there should be a logical OR:
src_id == HOST_ID || ipcache_srcid == HOST_ID so that egress policy is correctly enforced.

I believe the same should be done for ipv4

@julianwiedmann julianwiedmann added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/host-firewall Impacts the host firewall or the host endpoint. needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Oct 27, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 27, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.4 Oct 27, 2023
@julianwiedmann julianwiedmann added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. feature/ipv6 Relates to IPv6 protocol support labels Oct 27, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 27, 2023
@oblazek oblazek force-pushed the ob-ipv6-hostfw-fix-no-masq branch 3 times, most recently from c4a487e to 53fdf39 Compare October 27, 2023 09:17
@oblazek
Copy link
Contributor Author

oblazek commented Oct 27, 2023

/test

@julianwiedmann
Copy link
Member

Throwing --install-iptables-rules=false into this picture means that legitimate host traffic will not be marked either, and thus is mis-identified as "SNATed pod traffic". And so while we create a CT entry for it (and the reply traffic flow works), we don't actually apply egress policy to it when we should.
Do you agree with that assessment, or is there still something missing from the picture?

You are totally right, if iptables are not present, there is no mark and return __ipv6_host_policy_egress(ctx, src_id == HOST_ID,... can't be relied on. So at least there should be a logical OR: src_id == HOST_ID || ipcache_srcid == HOST_ID so that egress policy is correctly enforced.

I believe the same should be done for ipv4

agree, but this needs to be a conditional change (only when --install-iptables-rules=false is set).

@oblazek
Copy link
Contributor Author

oblazek commented Nov 15, 2023

updated, now the PR should only contain ipv6 whitelisting related changes.. will do the rest in the followup PR

@oblazek
Copy link
Contributor Author

oblazek commented Nov 15, 2023

/test

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ty @oblazek ! There's one part where we need to check if the mark-based identity and ipcache-based identity match up (see below). But it's missing the ipcache-based identity ... could you have a look into passing that through, along what IPv4 does?

bpf/bpf_host.c Outdated Show resolved Hide resolved
bpf/lib/host_firewall.h Outdated Show resolved Hide resolved
bpf/bpf_host.c Show resolved Hide resolved
@oblazek
Copy link
Contributor Author

oblazek commented Nov 15, 2023

Hmm now the CB_IPCACHE_SRC_LABEL part is com...

yeah I must have missed that when working on different branches :(

@oblazek
Copy link
Contributor Author

oblazek commented Nov 15, 2023

/test

@oblazek
Copy link
Contributor Author

oblazek commented Nov 16, 2023

the 2 pipeline fails are not related to this fix

Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made an initial look. The logic to set CB_IPCACHE_SRC_LABEL is still missing, but otherwise, the logic looks correct to me. I found some small naming inconsistency and one potential bug.

bpf/lib/host_firewall.h Show resolved Hide resolved
bpf/bpf_host.c Outdated Show resolved Hide resolved
bpf/bpf_host.c Outdated Show resolved Hide resolved
When combining HostFW with kube-proxy based masquerading
host traffic will have HOST_ID for both - packet mark based
src_id and ip cache based src_id.
While SNATed pod traffic will only have HOST_ID from the
ipcache (as it doesn't pass through the iptables rules that
set the mark).
Therefore for SNATed traffic we get HOST_ID only from
ipcache and not from packet mark it means we are handling
SNATed traffic from pod that needs to have a conntrack entry
so that return traffic doesn't go through host firewall and
is skipped.

This commit fixes the issue and adds a missing ct entry for
SNATed traffic and it also unifies the code between ipv4 and
ipv6 versions

Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz>
Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now LGTM 👍

@julianwiedmann
Copy link
Member

/test

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ty!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 23, 2023
@julianwiedmann julianwiedmann added this pull request to the merge queue Nov 24, 2023
Merged via the queue into cilium:main with commit 9c1031e Nov 24, 2023
60 of 61 checks passed
@joamaki joamaki mentioned this pull request Nov 29, 2023
14 tasks
@joamaki joamaki added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Nov 29, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.5 Nov 29, 2023
@github-actions github-actions bot added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Dec 1, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport pending to v1.14 in 1.14.5 Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/host-firewall Impacts the host firewall or the host endpoint. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. feature/ipv6 Relates to IPv6 protocol support kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.14.4
Needs backport from main
Status: Released
Development

Successfully merging this pull request may close these issues.

IPv6 packets falsely dropped; conntrack issue
4 participants