Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iptables: add support for NOTRACK rules for pod to pod traffic #15264

Merged
merged 1 commit into from
Mar 24, 2021

Conversation

jibi
Copy link
Member

@jibi jibi commented Mar 9, 2021

When running in KPR mode, all pod-to-pod traffic goes through connection
tracking twice: the first time traffic is tracked by netfilter and the
second one by the BPF datapath.

This is suboptimal as doing conntrack at the netfilter layer is not
strictly required for the operation of the CNI, but it has some
important performance implications.

This commit introduces a new agent option,
--install-no-conntrack-iptables-rules (enabled by default) which
installs -j NOTRACK Iptables rules that match all traffic from and to
the native-routing-cidr CIDR, to prevent netfilter from tracking all
pod-to-pod traffic.

This feature can be only enabled when Cilium is:

  • running in direct routing
  • running in full KPR mode
  • not running its CNI chained on top of other CNI plugins
  • not running in a managed Kubernetes environment such as AKS, EKS or GKE

There are a few reasons for these specific requirements:

  • when running in tunneling mode (and with full KPR), the pod-to-pod
    traffic is encapsulated and decapsulated in the context of the BPF
    datapath program and so it is already skipping the netfilter raw table
    where connection tracking happens.
  • when running in kube-proxy mode conntrack is required to perform NAT
  • we cannot make assumptions on how a generic CNI is going to use
    netfilter conntrack

Signed-off-by: Gilberto Bertin gilberto@isovalent.com

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 9, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Mar 9, 2021
@jibi jibi force-pushed the pr/jibi/disable-ct branch 7 times, most recently from 7bb5d6a to 071f8e8 Compare March 15, 2021 11:24
@jibi jibi added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Mar 15, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 15, 2021
@jibi jibi added area/daemon Impacts operation of the Cilium daemon. area/kube-proxy-free labels Mar 15, 2021
@jibi jibi force-pushed the pr/jibi/disable-ct branch 6 times, most recently from 4d59825 to 3e3b03e Compare March 18, 2021 10:48
@jibi jibi marked this pull request as ready for review March 18, 2021 13:38
@jibi jibi requested review from a team March 18, 2021 13:38
@jibi jibi requested review from a team as code owners March 18, 2021 13:38
@jibi jibi requested review from a team and nathanjsweet March 18, 2021 13:38
Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits.

Documentation/operations/upgrade.rst Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
daemon/cmd/kube_proxy_replacement.go Outdated Show resolved Hide resolved
Copy link
Member

@tgraf tgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the requirements on not running in managed k8s.

EKS: Native ENI (direct routing) + KPR in probe. So we should probably enable if KPR gets enabled?
GKE: Native routing + k8s IPAM + KPR
AKS: Azure IPAM (direct routing) + KPR

So assuming that the kernel image is recent enough for KPR, we should be able to enable it on all of them?

pkg/defaults/defaults.go Outdated Show resolved Hide resolved
@jibi jibi force-pushed the pr/jibi/disable-ct branch 2 times, most recently from 8a798c6 to b64afe4 Compare March 23, 2021 16:05
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
When running in KPR mode, all pod-to-pod traffic goes through connection
tracking twice: the first time traffic is tracked by netfilter and the
second one by the BPF datapath.

This is suboptimal as doing conntrack at the netfilter layer is not
strictly required for the operation of the CNI, but it has some
important performance implications.

This commit introduces a new agent option,
--install-no-conntrack-iptables-rules (disabled by default) which
installs `-j NOTRACK` Iptables rules that match all traffic from and to
the native-routing-cidr CIDR, to prevent netfilter from tracking all
pod-to-pod traffic.

This feature can be only enabled when Cilium is:

* running in direct routing
* running in full KPR mode
* not running its CNI chained on top of other CNI plugins
* not running in a managed Kubernetes environment such as AKS, EKS or GKE

There are a few reasons for these specific requirements:

* when running in tunneling mode (and with full KPR), the pod-to-pod
  traffic is encapsulated and decapsulated in the context of the BPF
  datapath program and so it is already skipping the netfilter raw table
  where connection tracking happens.
* when running in kube-proxy mode conntrack is required to perform NAT
* we cannot make assumptions on how a generic CNI is going to use
  netfilter conntrack

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
@jibi
Copy link
Member Author

jibi commented Mar 23, 2021

test-me-please

@jibi
Copy link
Member Author

jibi commented Mar 24, 2021

retest-runtime

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 24, 2021
@brb brb merged commit 496ce42 into master Mar 24, 2021
1.10.0 automation moved this from In progress to Done Mar 24, 2021
@brb brb deleted the pr/jibi/disable-ct branch March 24, 2021 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

9 participants