Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

endpointmanager: fix bpf policy pressure getting stuck. #28185

Merged
merged 1 commit into from
Nov 1, 2023

Conversation

tommyp1ckles
Copy link
Contributor

@tommyp1ckles tommyp1ckles commented Sep 16, 2023

Currently the policy map pressure metric only updates the map pressure metric when a new pressure value that is higher than the current one is set. This means that the metric can only ever go up, so when maps are shrunk (ex. such as after doing an cilium fqdn cache clean) the metric never goes down.

This changes the behaviour of the metric to maintain a map of map pressure values. When the trigger is invoked, it iterates all values and finds the max - updating the map_pressure gauge for policymaps to the max value. Endpoints that are shut down have their values removed.

CC: @christarazi

Note: This is part of a two part fix for endpoint policy map pressure, for backport purposes the other change was split into a seperate PR: #28184

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 16, 2023
@tommyp1ckles tommyp1ckles marked this pull request as ready for review September 16, 2023 04:07
@tommyp1ckles tommyp1ckles requested a review from a team as a code owner September 16, 2023 04:07
@tommyp1ckles tommyp1ckles added release-note/bug This PR fixes an issue in a previous release of Cilium. area/metrics Impacts statistics / metrics gathering, eg via Prometheus. labels Sep 16, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 16, 2023
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Just one comment.

pkg/endpoint/endpoint.go Outdated Show resolved Hide resolved
@aditighag aditighag removed their request for review September 18, 2023 22:03
@tommyp1ckles tommyp1ckles force-pushed the pr/tp/fix-policy-map-pressure-max branch from 82aa9c9 to 05aa798 Compare September 20, 2023 03:43
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/fix-policy-map-pressure-max branch from 05aa798 to 3f73d45 Compare September 25, 2023 14:04
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/fix-policy-map-pressure-max branch from 3f73d45 to efd638c Compare September 26, 2023 03:42
@tommyp1ckles
Copy link
Contributor Author

/test

pkg/endpointmanager/policymap_pressure.go Outdated Show resolved Hide resolved
@christarazi christarazi added the area/daemon Impacts operation of the Cilium daemon. label Sep 28, 2023
@github-actions
Copy link

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Oct 28, 2023
@christarazi
Copy link
Member

Let's get this PR over the finish line @tommyp1ckles, it's basically there. Just some Go linter errors to fix up.

@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Oct 31, 2023
@tommyp1ckles
Copy link
Contributor Author

@christarazi woops yeah forgot about this one, let me get it ready

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/fix-policy-map-pressure-max branch from efd638c to 9e3890a Compare October 31, 2023 04:39
@tommyp1ckles
Copy link
Contributor Author

/test

Currently the policy map pressure metric only updates the map pressure
metric when a new pressure value that is higher than the current one is
set. This means that the metric can only ever go up, so when maps are
shrunk (ex. such as after doing an cilium fqdn cache clean) the metric
never goes down.

This changes the behavior of the metric to maintain a map of map
pressure values. When the trigger is invoked, it iterates all values and
finds the max - updating the map_pressure gauge for policymaps to the
max value. Endpoints that are shut down have their values removed.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@tommyp1ckles tommyp1ckles force-pushed the pr/tp/fix-policy-map-pressure-max branch from 9e3890a to 5bf6820 Compare October 31, 2023 06:08
@tommyp1ckles
Copy link
Contributor Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 31, 2023
@aanm aanm merged commit 28ce005 into cilium:main Nov 1, 2023
60 of 62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. area/metrics Impacts statistics / metrics gathering, eg via Prometheus. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants