Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datapath: remove fill-up notification for CT maps #26597

Closed
wants to merge 2 commits into from

Conversation

julianwiedmann
Copy link
Member

@julianwiedmann julianwiedmann commented Jul 3, 2023

With #24378 merged, CT always uses LRU maps. So we no longer have to deal with fill-up of the CT maps.

fc5a3bd ("bpf: signal agent on CT map update error that CT map is full")
added the GC trigger to help on 4.9 kernels (where the CT map is not LRU).

With 4.9 support gone, we always have a LRU map. So this signal would only
get raised for unrelated errors. Remove it.

But keep the SIGNAL_CT_FILL_UP enum around a bit longer for compatibility.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
The datapath is no longer sending this signal.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann julianwiedmann added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. feature/conntrack labels Jul 3, 2023
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann changed the title 1.15 ct fill up datapath: remove fill-up notification for CT maps Jul 3, 2023
@julianwiedmann julianwiedmann marked this pull request as ready for review July 3, 2023 13:56
@julianwiedmann julianwiedmann requested review from a team as code owners July 3, 2023 13:56
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few, non-blocking nits below from my side

@@ -30,7 +30,7 @@ type SignalType uint32
const (
// SignalNatFillUp denotes potential congestion on the NAT table
SignalNatFillUp SignalType = iota
// SignalCTFillUp denotes potential congestion on the CT table
// SignalCTFillUp denotes potential congestion on the CT table. Unused.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd maybe expand here, “Unused, given that the datapath no longer sends this signal”.

pkg/signal/signal_test.go Show resolved Hide resolved
Copy link
Contributor

@ldelossa ldelossa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 5, 2023
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LRU map updates can get contended leading to this signal, we saw it in one production environment before. I can't share the details publicly but feel free to reach out to me for a private run-down.

I discussed this publicly a bit during LPC last year as well:
https://lpc.events/event/16/contributions/1368/

@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 5, 2023
@joestringer
Copy link
Member

joestringer commented Jul 5, 2023

Julian and I discussed this a bit out-of-band. My perspective on this is that the signal itself is high value with no noise when triggered with a LRU-backed connection tracking map. For example, under high load scenarios it triggers relatively rarely with LRU, but if you see this signal then it immediately indicates that there is very high contention. Users can react to this metric my looking for ways to better distribute traffic flows across their cluster.

At the same time, there are other considerations like how userspace reacts to this event. With a CT map backed by a hashtable, this may occur more frequently and reacting to this signal by triggering CT garbage collection likely makes sense. On the other hand, for the LRU case, it's far less likely to trigger and when it does trigger it's likely that the load is very high on the system. Hence in that scenario it may not make sense to trigger GC any more.

@julianwiedmann julianwiedmann marked this pull request as draft July 6, 2023 07:36
@julianwiedmann
Copy link
Member Author

Julian and I discussed this a bit out-of-band. My perspective on this is that the signal itself is high value with no noise when triggered with a LRU-backed connection tracking map. For example, under high load scenarios it triggers relatively rarely with LRU, but if you see this signal then it immediately indicates that there is very high contention. Users can react to this metric my looking for ways to better distribute traffic flows across their cluster.

At the same time, there are other considerations like how userspace reacts to this event. With a CT map backed by a hashtable, this may occur more frequently and reacting to this signal by triggering CT garbage collection likely makes sense. On the other hand, for the LRU case, it's far less likely to trigger and when it does trigger it's likely that the load is very high on the system. Hence in that scenario it may not make sense to trigger GC any more.

thanks Joe for summarizing! Marking as draft for now - seems like we want to keep the signal around, but maybe still adjust whether it should (unconditionally) trigger GC.

@github-actions
Copy link

github-actions bot commented Aug 6, 2023

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Aug 6, 2023
@github-actions
Copy link

This pull request has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/conntrack release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants