Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipcache: Plumb daemon context through IPCache #21676

Merged

Conversation

joestringer
Copy link
Member

@joestringer joestringer commented Oct 12, 2022

Plumb the primary daemon context into the label injector logic and the
async prefix releaser in order to reduce the likelihood of issues where
this logic continues to execute beyond the lifetime of the agent.

Implement a Shutdown() function for the IPCache which shuts down the
goroutines associated with the IPcache and ensures they are completed
before returning. This way, other components should not continue to
cause the IPCache to asynchronously continue to execute work. This
should hopefully fix issues in unit tests where the IPCache defers some
work that interacts with the identityAllocator after it has been cleaned
up.

Found during unit testing, where the identity allocator would be closed
while the ipcache's garbage collection of identities was still running.

Suggested-by: Jussi Maki jussi@isovalent.com
Suggested-by: Aditi Ghag aditi@cilium.io

@joestringer joestringer added the release-note/misc This PR makes changes that have no direct user impact. label Oct 12, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 12, 2022
@joestringer joestringer force-pushed the submit/fix-ipcache-gc-on-cleanup branch from b07f94f to 762c1d1 Compare October 12, 2022 01:26
@joestringer
Copy link
Member Author

/test

@joamaki
Copy link
Contributor

joamaki commented Oct 12, 2022

The arm64 test failure in Travis CI is fixed in #21681, so can be ignored here.

Copy link
Contributor

@joamaki joamaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@joestringer joestringer marked this pull request as ready for review October 12, 2022 21:34
@joestringer joestringer requested review from a team as code owners October 12, 2022 21:34
Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look reasonable, but I have given an alternate suggestion on triggering the ipcache shutdown.

pkg/ipcache/gc.go Outdated Show resolved Hide resolved
pkg/ipcache/gc.go Outdated Show resolved Hide resolved
pkg/ipcache/gc.go Outdated Show resolved Hide resolved
pkg/ipcache/gc.go Show resolved Hide resolved
@joestringer joestringer force-pushed the submit/fix-ipcache-gc-on-cleanup branch from 762c1d1 to 97c3bb2 Compare October 14, 2022 00:58
@joestringer joestringer marked this pull request as draft October 14, 2022 01:00
@joestringer joestringer force-pushed the submit/fix-ipcache-gc-on-cleanup branch 2 times, most recently from fd470ee to b9eb3d4 Compare October 14, 2022 05:44
@joestringer
Copy link
Member Author

/test

Plumb the primary daemon context into the label injector logic and the
async prefix releaser in order to reduce the likelihood of issues where
this logic continues to execute beyond the lifetime of the agent.

Found during unit testing, where the identity allocator would be closed
while the ipcache's garbage collection of identities was still running.

Suggested-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer force-pushed the submit/fix-ipcache-gc-on-cleanup branch from b9eb3d4 to 64977c4 Compare October 14, 2022 22:02
Implement a Shutdown() function for the IPCache which shuts down the
goroutines associated with the IPcache and ensures they are completed
before returning. This way, other components should not continue to
cause the IPCache to asynchronously continue to execute work. This
should hopefully fix issues in unit tests where the IPCache defers some
work that interacts with the identityAllocator after it has been cleaned
up.

Suggested-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer force-pushed the submit/fix-ipcache-gc-on-cleanup branch from 64977c4 to ba78940 Compare October 14, 2022 22:12
@joestringer joestringer marked this pull request as ready for review October 14, 2022 22:13
@joestringer
Copy link
Member Author

Travis job hit #21730 . Running the full suite now.

@joestringer
Copy link
Member Author

/test

@joestringer
Copy link
Member Author

EKS cluster creation failed, causing ConformanceEKS and Conformance AWS-CNI workflows to fail.

@joestringer
Copy link
Member Author

/ci-eks

@joestringer
Copy link
Member Author

/ci-awscni

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shutdown logic looks much more deterministic now. Thanks for the revisions.
I've posted a question about the plumbed context, but the fix looks good to me.

Edit: The travis failure is in a clustermesh test that's probably unrelated to the changes. Mainly, the ipcache/gc* unit test has passed. Do we need to trigger the Travis test couple of times?

Comment on lines +128 to +129
if c != nil && c.Context != nil {
ctx = c.Context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the context nil check is mainly added for unit tests? Can we not pass the config.Context in unit tests as well for consistency? This will make it clear that the ipcache module expects a context to be passed, and that the internal logic checks if this context is done (I haven't checked this part though).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit, I was avoiding this mainly because I knew there was a bit of tech debt there. But as it turns out, we can do some clean additional separation of packages / components by following this to the logical conclusion.

I'll open a follow up PR to review the solution to this since it spreads across a whole bunch of packages and the issue being fixed in this PR is currently affecting the master branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #21774 for the follow up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants