fqdn: use map to dedup to reduce memory usage of dns gc job #25142

Merged
christarazi merged 1 commit into cilium:main on May 9, 2023

@odinuge
Member

odinuge commented Apr 26, 2023

On some hosts we have a lot of DNS lookups, and this job allocates a lot of memory and burns a lot of CPU. We often see log lines like this: "FQDN garbage collector work deleted 16211 name entries"

This change deduplicates names using a map directly instead of deduplicating by merging slices. The code doesn't seem to care about the order, so I think this should be fine. It would be nice to benchmark this, but that is a bit difficult given the way the job is set up. Happy to iterate on that once we can test this.
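A minimal sketch of the two approaches described above, under stated assumptions: the function names (`uniqueNames`, `dedupByMergingSlices`, `dedupWithMap`, `sortedKeys`) and the per-batch input shape are hypothetical and chosen for illustration only; this is not the Cilium code touched by the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// uniqueNames returns names with duplicates removed. It allocates a fresh
// map and a fresh slice on every call. Hypothetical helper, not Cilium's.
func uniqueNames(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		if _, dup := seen[n]; dup {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

// dedupByMergingSlices appends each batch to the accumulated slice and then
// re-deduplicates the whole result, so every batch pays for a new map and
// slice sized to everything collected so far.
func dedupByMergingSlices(batches [][]string) []string {
	var acc []string
	for _, b := range batches {
		acc = uniqueNames(append(acc, b...))
	}
	return acc
}

// dedupWithMap inserts every name into a single set. Duplicates cost one
// map lookup and no additional allocation. Insertion order is not kept.
func dedupWithMap(batches [][]string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, b := range batches {
		for _, n := range b {
			set[n] = struct{}{}
		}
	}
	return set
}

// sortedKeys returns the set members in sorted order, for stable printing.
func sortedKeys(set map[string]struct{}) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	batches := [][]string{
		{"a.example.com", "b.example.com"},
		{"b.example.com", "c.example.com"},
		{"a.example.com"},
	}
	fmt.Println(dedupByMergingSlices(batches))
	fmt.Println(sortedKeys(dedupWithMap(batches)))
}
```

Both functions produce the same set of names. The map variant gives up insertion order, which matches the observation above that the calling code does not appear to depend on it.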

Peek output from pprof, where this comes out on top (running Cilium v1.11.13):

Type: alloc_space
Showing nodes accounting for 24333619.24MB, 100% of 24333619.24MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                      4686095.18MB 95.24% |   github.com/cilium/cilium/daemon/cmd.(*Daemon).bootstrapFQDN.func1 /go/src/github.com/cilium/cilium/daemon/cmd/fqdn.go:236
                                       230562.07MB  4.69% |   github.com/cilium/cilium/daemon/cmd.(*Daemon).bootstrapFQDN.func1 /go/src/github.com/cilium/cilium/daemon/cmd/fqdn.go:246
                                         2609.33MB 0.053% |   github.com/cilium/cilium/pkg/fqdn.(*DNSCache).cleanupExpiredEntries /go/src/github.com/cilium/cilium/pkg/fqdn/cache.go:256
                                          997.05MB  0.02% |   github.com/cilium/cilium/daemon/cmd.(*Daemon).bootstrapFQDN.func1 /go/src/github.com/cilium/cilium/daemon/cmd/fqdn.go:250
                                          175.82MB 0.0036% |   github.com/cilium/cilium/pkg/fqdn.(*DNSCache).GC /go/src/github.com/cilium/cilium/pkg/fqdn/cache.go:339
4920439.45MB 20.22% 20.22% 4920439.45MB 20.22%                | github.com/cilium/cilium/pkg/fqdn.KeepUniqueNames /go/src/github.com/cilium/cilium/pkg/fqdn/helpers.go:127
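The `alloc_space` profile above counts cumulative bytes allocated, so one rough way to compare the two deduplication strategies outside the GC job is to read the runtime's `TotalAlloc` counter before and after each run. The sketch below is an assumption-laden stand-in, not a benchmark of the real job: `mergeDedup`, `mapDedup`, `allocatedBytes`, `syntheticBatches`, and the synthetic workload sizes are all hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// mergeDedup mimics deduplicating by merging slices: after each batch, the
// accumulated slice is rebuilt through a freshly allocated map and slice.
func mergeDedup(batches [][]string) []string {
	var acc []string
	for _, b := range batches {
		merged := append(acc, b...)
		seen := make(map[string]struct{}, len(merged))
		uniq := make([]string, 0, len(merged))
		for _, n := range merged {
			if _, dup := seen[n]; !dup {
				seen[n] = struct{}{}
				uniq = append(uniq, n)
			}
		}
		acc = uniq
	}
	return acc
}

// mapDedup collects every name into one set that is reused across batches.
func mapDedup(batches [][]string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, b := range batches {
		for _, n := range b {
			set[n] = struct{}{}
		}
	}
	return set
}

// allocatedBytes reports how many heap bytes were allocated while f ran,
// based on the cumulative runtime.MemStats.TotalAlloc counter.
func allocatedBytes(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

// syntheticBatches builds `batches` lists of `perBatch` names each, cycling
// through `unique` distinct names so that batches overlap heavily.
func syntheticBatches(batches, perBatch, unique int) [][]string {
	out := make([][]string, batches)
	for i := range out {
		out[i] = make([]string, perBatch)
		for j := range out[i] {
			out[i][j] = fmt.Sprintf("host-%d.example.com", (i*perBatch+j)%unique)
		}
	}
	return out
}

func main() {
	input := syntheticBatches(200, 50, 500)
	var mergedLen, setLen int
	mergeBytes := allocatedBytes(func() { mergedLen = len(mergeDedup(input)) })
	mapBytes := allocatedBytes(func() { setLen = len(mapDedup(input)) })
	fmt.Printf("merge: %d names, %d bytes allocated\n", mergedLen, mergeBytes)
	fmt.Printf("map:   %d names, %d bytes allocated\n", setLen, mapBytes)
}
```

The absolute numbers depend on the Go version and the workload shape, so only the relative difference between the two lines is meaningful.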

fqdn: use map to dedup to reduce memory usage of dns gc job

@odinuge odinuge requested a review from a team as a code owner April 26, 2023 13:23
@odinuge odinuge requested a review from thorn3r April 26, 2023 13:23
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 26, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Apr 26, 2023
@odinuge
Member Author

odinuge commented Apr 26, 2023

cc @christarazi, mind triggering the tests to verify that the logic is sound?

@christarazi
Member

christarazi commented Apr 26, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-4.19' failed:

Test Name

K8sAgentFQDNTest Validate that FQDN policy continues to work after being updated

Failure Output

FAIL: Cannot install fqdn proxy policy

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-4.19/169/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@thorn3r thorn3r added the release-note/misc This PR makes changes that have no direct user impact. label May 3, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 3, 2023
@thorn3r
Contributor

thorn3r commented May 3, 2023

@odinuge thanks for the PR! This seems like a pretty straightforward improvement.
Would you mind squashing your commits and rebasing/fixing conflicts?

@odinuge
Member Author

odinuge commented May 4, 2023

Hi! Yes, I'll rebase and squash tomorrow morning. Thanks

On some hosts we have a lot of DNS lookups, and this job allocates a lot
of memory and burns a lot of CPU. We often see log lines like this:
"FQDN garbage collector work deleted 16211 name entries"

This now deduplicates using a map directly instead of deduplicating by
merging slices.

Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
@odinuge
Member Author

odinuge commented May 9, 2023

Should be good to go now, thorn3r.

@thorn3r
Contributor

thorn3r commented May 9, 2023

/test

Contributor

@thorn3r thorn3r left a comment

Looks great, thank you!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 9, 2023
@christarazi christarazi merged commit b5994af into cilium:main May 9, 2023
57 checks passed
@christarazi christarazi added area/daemon Impacts operation of the Cilium daemon. kind/performance There is a performance impact of this. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels May 9, 2023
@tklauser tklauser added affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch affects/v1.13 This issue affects v1.13 branch labels May 10, 2023
Labels
  • affects/v1.11: This issue affects v1.11 branch
  • affects/v1.12: This issue affects v1.12 branch
  • affects/v1.13: This issue affects v1.13 branch
  • area/daemon: Impacts operation of the Cilium daemon.
  • kind/community-contribution: This was a contribution made by a community member.
  • kind/performance: There is a performance impact of this.
  • ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
  • release-note/misc: This PR makes changes that have no direct user impact.
  • sig/policy: Impacts whether traffic is allowed or denied based on user-defined policies.
4 participants