Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: Add option conntrackGCMaxInterval #27870

Merged

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Sep 1, 2023

When ToFQDN policies are in use (CIDR) identities are created for each IP in a DNS response that matches a ToFQDN policy. These identities are garbage collected when

  1. all endpoints with ToFQDN policies are removed
  2. the maximum number of IPs per host is reached (tofqdn-max-ips-per-host)
  3. when the identity has been unused and not refreshed by conntrack map GC

Problems arise when he conntrack GC interval becomes very long. If it's backed by a LRU BPF map, the maximum is set to 12 hours (defaults.ConntrackGCMaxLRUInterval), meaning it will take 12 hours before unused identities are marked dead and collected (unless tofqdn-max-ips-per-host limit is reached for the FQDN entry).

To allow user to have more control over this add the conntrackGCMaxInterval option that will allow limiting the maximum interval to something less than the 12 hours.

Add option conntrackGCMaxInterval to allow limiting the maximum connection tracking GC interval. By default the automatic interval calculation may increase the interval up to 12 hours, which may incur an unreasonable delay to releasing of CIDR identities created from ToFQDN policies. Setting this option will limit the interval and ensure such identities are marked unused earlier and removed.

@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Sep 1, 2023
@joamaki joamaki force-pushed the pr/joamaki/conntrack-gc-interval-max branch 2 times, most recently from a267177 to 731e9f8 Compare September 1, 2023 10:33
@joamaki joamaki marked this pull request as ready for review September 4, 2023 07:35
@joamaki joamaki requested review from a team as code owners September 4, 2023 07:35
@joamaki joamaki force-pushed the pr/joamaki/conntrack-gc-interval-max branch from 731e9f8 to 7af1eed Compare September 4, 2023 07:45
@joamaki joamaki requested a review from a team as a code owner September 4, 2023 07:45
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@joamaki joamaki force-pushed the pr/joamaki/conntrack-gc-interval-max branch 2 times, most recently from e0d81f1 to 045ccb9 Compare September 5, 2023 13:11
@joamaki
Copy link
Contributor Author

joamaki commented Sep 5, 2023

/test

@joamaki joamaki added needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Sep 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.2 Sep 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.7 Sep 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.14 Sep 5, 2023
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small grammar nit, but otherwise LGTM.

One suggestion I do have, which isn't necessary but would be nice, is somehow changing the value for the conntrackGCMaxInterval to the actual default that Cilium will use if it isn't set. Having a default value for 0s is a bit confusing.

install/kubernetes/cilium/values.yaml.tmpl Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
@joamaki joamaki force-pushed the pr/joamaki/conntrack-gc-interval-max branch from f6efccd to a5f95cd Compare September 12, 2023 06:58
@joamaki
Copy link
Contributor Author

joamaki commented Sep 12, 2023

/test

When ToFQDN policies are in use (CIDR) identities are created for each IP
in a DNS response that matches a ToFQDN policy. These identities are garbage
collected when

  1) all endpoints with ToFQDN policies are removed
  2) the maximum number of IPs per host is reached (tofqdn-max-ips-per-host)
  3) when the identity has been unused and not refreshed by conntrack map GC

Problems arise when he conntrack GC interval becomes very long. If it's backed
by a LRU BPF map, the maximum is set to 12 hours (defaults.ConntrackGCMaxLRUInterval),
meaning it will take 12 hours before unused identities are marked dead and collected
(unless tofqdn-max-ips-per-host limit is reached for the FQDN entry).

To allow user to have more control over this add the conntrackGCMaxInterval option
that will allow limiting the maximum interval to something less than the 12 hours.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/conntrack-gc-interval-max branch from a5f95cd to 5ad0e56 Compare September 12, 2023 07:23
@joamaki
Copy link
Contributor Author

joamaki commented Sep 12, 2023

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 12, 2023
@julianwiedmann julianwiedmann merged commit 3ea5e8c into cilium:main Sep 12, 2023
59 of 60 checks passed
@joamaki joamaki mentioned this pull request Sep 15, 2023
6 tasks
@doniacld doniacld mentioned this pull request Sep 22, 2023
10 tasks
@doniacld doniacld added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Sep 22, 2023
@joamaki joamaki added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed needs-backport/1.12 labels Sep 26, 2023
@giorio94 giorio94 mentioned this pull request Sep 26, 2023
22 tasks
@giorio94 giorio94 added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Sep 26, 2023
@aanm aanm added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Sep 29, 2023
@jrajahalme jrajahalme moved this from Needs backport from main to Backport done to v1.13 in 1.13.8 Oct 17, 2023
@jrajahalme jrajahalme removed this from Needs backport from main in 1.12.15 Oct 17, 2023
@jrajahalme jrajahalme removed this from Needs backport from main in 1.14.3 Oct 17, 2023
squeed added a commit to squeed/cilium that referenced this pull request Feb 23, 2024
The FQDN proxy will GC IP addresses that are both:
- past their DNS TTLs, and
- no longer in use by active connections

However, many applications do not immediately re-resolve names between
connections, even if the TTL has expired. This can cause traffic to be
dropped unexpectedly.

Previously, this was not an issue, as FQDN GC happened very rarely. This
has been improved, however, by cilium#27572 and cilium#27870. Now, end-users
occasionally being surprised by this.

This sets the default grace period to 60 seconds, in an attempt to find
a good balance between application needs and security.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. area/fqdn Affects the FQDN policies feature backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. feature/conntrack ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.13.8
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

None yet