ipcache: Plumb daemon context through IPCache #21676
/test
The arm64 test failure in Travis CI is fixed in #21681, so can be ignored here.
LGTM!
The changes look reasonable, but I have given an alternate suggestion on triggering the ipcache shutdown.
/test
Plumb the primary daemon context into the label injector logic and the async prefix releaser in order to reduce the likelihood of issues where this logic continues to execute beyond the lifetime of the agent.
Found during unit testing, where the identity allocator would be closed while the ipcache's garbage collection of identities was still running.
Suggested-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
Implement a Shutdown() function for the IPCache which shuts down the goroutines associated with the IPCache and ensures they have completed before returning. This way, other components should no longer cause the IPCache to keep executing work asynchronously. This should hopefully fix issues in unit tests where the IPCache defers some work that interacts with the identityAllocator after it has been cleaned up.
Suggested-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
Travis job hit #21730. Running the full suite now.
/test
EKS cluster creation failed, causing ConformanceEKS and Conformance AWS-CNI workflows to fail.
/ci-eks
/ci-awscni
The shutdown logic looks much more deterministic now. Thanks for the revisions.
I've posted a question about the plumbed context, but the fix looks good to me.
Edit: The Travis failure is in a clustermesh test that's probably unrelated to the changes. More importantly, the ipcache/gc* unit test has passed. Do we need to trigger the Travis test a couple of times?
if c != nil && c.Context != nil {
	ctx = c.Context
I suppose the context nil check is mainly added for unit tests? Can we not pass the config.Context in unit tests as well for consistency? This will make it clear that the ipcache module expects a context to be passed, and that the internal logic checks if this context is done (I haven't checked this part though).
I must admit, I was avoiding this mainly because I knew there was a bit of tech debt there. But as it turns out, we can get some additional clean separation of packages / components by following this to its logical conclusion.
I'll open a follow-up PR to review the solution to this, since it spreads across a whole bunch of packages and the issue being fixed in this PR is currently affecting the master branch.
See #21774 for the follow-up.
Plumb the primary daemon context into the label injector logic and the
async prefix releaser in order to reduce the likelihood of issues where
this logic continues to execute beyond the lifetime of the agent.
Implement a Shutdown() function for the IPCache which shuts down the
goroutines associated with the IPCache and ensures they have completed
before returning. This way, other components should no longer cause the
IPCache to keep executing work asynchronously. This should hopefully fix
issues in unit tests where the IPCache defers some work that interacts
with the identityAllocator after it has been cleaned up.
Found during unit testing, where the identity allocator would be closed
while the ipcache's garbage collection of identities was still running.
Suggested-by: Jussi Maki <jussi@isovalent.com>
Suggested-by: Aditi Ghag <aditi@cilium.io>