Delayed identity cleanup on operator restart #28339

Closed
2 tasks done
skmatti opened this issue Sep 29, 2023 · 6 comments
Labels
affects/v1.12 This issue affects v1.12 branch kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/agent Cilium agent related. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments

@skmatti
Contributor

skmatti commented Sep 29, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

On operator startup, all identities are marked as alive, regardless of whether an identity already carries the delete annotation:

igc.heartbeatStore.markAlive(event.Object.Name, time.Now())

The delete annotation is added to an identity here:

identity.Annotations[identitybackend.HeartBeatAnnotation] = timeNow.Format(time.RFC3339Nano)

Because the default heartbeat timeout is 30 minutes, an identity that was already marked for deletion with the annotation is marked alive again on operator startup, which delays its cleanup by 30 minutes.

I think an identity with the deletion annotation should not be marked alive again. Is there an issue with this approach?

Cilium Version

1.12

Kernel Version

not applicable

Kubernetes Version

1.27

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@skmatti skmatti added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Sep 29, 2023
@jonohart

The larger problem this causes is that every time the operator restarts, identity GC is delayed again by at least the ID timeout value (30 minutes by default). The operator restarts when it loses leader election because k8s API calls fail. If this happens often enough, say once per hour, the operator GC logic keeps pushing the GC of identities further and further into the future, which lets identities accumulate until the policy maps fill up.

@ti-mo ti-mo added affects/v1.12 This issue affects v1.12 branch sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. sig/agent Cilium agent related. and removed needs/triage This issue requires triaging to establish severity and next steps. labels Oct 4, 2023

github-actions bot commented Dec 4, 2023

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Dec 4, 2023

This issue has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 18, 2023
@skmatti
Contributor Author

skmatti commented Dec 18, 2023

@dlapcevic @aojea Is this addressed by #27752? If so, can you reopen this and reassign it to yourself?

If this needs to be handled separately, can you reopen it so others can take a look? Thanks!

@dlapcevic
Contributor

Yes, this will be addressed with #27752.

The plan is to immediately clean up unused Cilium Identities. They will be cleaned up anyway when the operator starts.

@dlapcevic
Contributor

/assign @dlapcevic
