Fix FQDN memory leak #17432
The public function ForceExpiredByNames is not executed from anywhere so this function can be safely removed. Signed-off-by: André Martins <andre@cilium.io>
In the FQDN architecture there is a DNS Cache per endpoint, which tracks the domain names each endpoint makes DNS requests for, and a global DNS Cache, whose main purpose is to help track which api.FQDNSelector present in the policy applies to locally running endpoints. Unlike the per-endpoint caches, the global DNS Cache had no cleanup mechanism for the map that tracks which entries should be garbage collected, so the global DNS Cache kept growing. This commit prevents those entries from being tracked for garbage collection in the global DNS Cache. Signed-off-by: André Martins <andre@cilium.io>
test-me-please
Just to double check that I understood this correctly:
- The memory leak was due to the "cleanup" map growing without bounds due to the default DNS cache not running a GC on the cache
- The default DNS cache only contains a few entries, on the order of the number of local endpoints
- These entries are cleaned up when the endpoints change, but we were left with the "cleanup" map entries?
If I understood correctly, then LGTM.
Good spotting! I have some reservations about the temporary fix put in place in this PR. /cc @jrajahalme who might have more context.
Edit: Also, is this a regression in recent releases? I'm surprised that the issue wasn't reported until now.
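> The memory leak was due to the "cleanup" map growing without bounds due to the default DNS cache not running a GC on the cache
Yes.
> The default DNS cache only contains a few entries which is on the order of number of local endpoints
Not quite. From my experiments, it contains as many entries as the number of DNS requests performed by each endpoint.
> These entries are cleaned up when the endpoints change, but we were left with the "cleanup" map entries?
Correct, that is my understanding as well.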
@joamaki replied inline. Let me know if this doesn't answer your questions.
@aditighag I don't believe it's a regression, since this code has existed since the feature was introduced.
I think the change (depending on my understanding) LGTM. One minor nit on the naming.
Similar to Aditi's comments though, why is this particular change safe? AFAIU, we are setting the map to nil so that we never insert entries into it, effectively disabling the DNS cache for expired FQDNs altogether. Is that safe? I think this is probably what @aditighag was trying to get at.
I assume that it's safe, but would be good to document the "why" in the commit msg.
@@ -160,6 +160,12 @@ func NewDNSCacheWithLimit(minTTL int, limit int) *DNSCache {
 	return c
 }

+func (c *DNSCache) DisableCleanupTrack() {
-func (c *DNSCache) DisableCleanupTrack() {
+func (c *DNSCache) DisableCleanupTracking() {
@christarazi AFAICT we are not disabling the DNS cache for expired FQDNs. We are only skipping the tracking of which entries should be cleaned up by the GC, because these entries are already cleaned up when the endpoints change.
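A rough sketch of that idea, assuming the approach mentioned earlier in the review of setting the tracking map to nil. Only the method name `DisableCleanupTracking` comes from this thread; `trackedCache`, its fields, `Update`, and `Remove` are hypothetical and do not reflect the actual Cilium code. Because writing to a nil map panics in Go, the insert path has to check for nil explicitly.

```go
package main

import "fmt"

// trackedCache is a hypothetical, simplified cache used only to illustrate
// the "nil map means no tracking" pattern.
type trackedCache struct {
	entries map[string]int64   // name -> expiry (unix seconds)
	cleanup map[int64][]string // expiry -> names; nil means tracking is disabled
}

func newTrackedCache() *trackedCache {
	return &trackedCache{
		entries: make(map[string]int64),
		cleanup: make(map[int64][]string),
	}
}

// DisableCleanupTracking stops the cache from recording entries for GC.
// The body is an assumption about how such a method could work.
func (c *trackedCache) DisableCleanupTracking() {
	c.cleanup = nil
}

// Update always stores the live entry, but only records GC bookkeeping
// when tracking is enabled.
func (c *trackedCache) Update(name string, expiry int64) {
	c.entries[name] = expiry
	if c.cleanup == nil {
		return // assigning into a nil map would panic
	}
	c.cleanup[expiry] = append(c.cleanup[expiry], name)
}

// Remove models the external cleanup that happens when endpoints change.
func (c *trackedCache) Remove(name string) {
	delete(c.entries, name)
}

func main() {
	c := newTrackedCache()
	c.DisableCleanupTracking()
	for i := int64(0); i < 1000; i++ {
		c.Update("example.com", i)
	}
	fmt.Println("live entries:", len(c.entries), "tracked:", len(c.cleanup))
	c.Remove("example.com")
	fmt.Println("live entries after removal:", len(c.entries))
}
```

In this sketch lookups against `entries` keep working after tracking is disabled; what changes is that nothing accumulates in `cleanup`, so correctness depends on some other path (here `Remove`) deleting stale entries.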
Fixes #16300