-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datapath: remove fill-up notification for CT maps #26597
Conversation
fc5a3bd ("bpf: signal agent on CT map update error that CT map is full") added the GC trigger to help on 4.9 kernels (where the CT map is not LRU). With 4.9 support gone, we always have a LRU map. So this signal would only get raised for unrelated errors. Remove it. But keep the SIGNAL_CT_FILL_UP enum around a bit longer for compatibility. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
The datapath is no longer sending this signal. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a few, non-blocking nits below from my side
@@ -30,7 +30,7 @@ type SignalType uint32 | |||
const ( | |||
// SignalNatFillUp denotes potential congestion on the NAT table | |||
SignalNatFillUp SignalType = iota | |||
// SignalCTFillUp denotes potential congestion on the CT table | |||
// SignalCTFillUp denotes potential congestion on the CT table. Unused. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd maybe expand here, “Unused, given that the datapath no longer sends this signal”.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LRU map updates can get contended leading to this signal, we saw it in one production environment before. I can't share the details publicly but feel free to reach out to me for a private run-down.
I discussed this publicly a bit during LPC last year as well:
https://lpc.events/event/16/contributions/1368/
Julian and I discussed this a bit out-of-band. My perspective on this is that the signal itself is high value with no noise when triggered with a LRU-backed connection tracking map. For example, under high load scenarios it triggers relatively rarely with LRU, but if you see this signal then it immediately indicates that there is very high contention. Users can react to this metric my looking for ways to better distribute traffic flows across their cluster. At the same time, there are other considerations like how userspace reacts to this event. With a CT map backed by a hashtable, this may occur more frequently and reacting to this signal by triggering CT garbage collection likely makes sense. On the other hand, for the LRU case, it's far less likely to trigger and when it does trigger it's likely that the load is very high on the system. Hence in that scenario it may not make sense to trigger GC any more. |
thanks Joe for summarizing! Marking as draft for now - seems like we want to keep the signal around, but maybe still adjust whether it should (unconditionally) trigger GC. |
This pull request has been automatically marked as stale because it |
This pull request has not seen any activity since it was marked stale. |
With #24378 merged, CT always uses LRU maps. So we no longer have to deal with fill-up of the CT maps.