Fix triggerCNPUpdates panic #2768
When a ClusterGroup is created, updated, or deleted, `triggerCNPUpdates` is called to update the ClusterNetworkPolicies that reference it. However, a ClusterNetworkPolicy found in the informer cache may not have been processed by `addCNP` yet, which means its internal NetworkPolicy doesn't exist yet. `triggerCNPUpdates` would panic because it assumed the internal NetworkPolicy always exists and used the returned value directly. This patch fixes it by checking existence before using the returned internal NetworkPolicy.
Signed-off-by: Quan Tian <qtian@vmware.com>
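The existence check described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `internalPolicy`, `policyStore`, and `triggerUpdates` are hypothetical stand-ins for the controller's internal NetworkPolicy type, its store, and `triggerCNPUpdates`.

```go
package main

import "fmt"

// internalPolicy is a hypothetical stand-in for the internal NetworkPolicy type.
type internalPolicy struct {
	Name string
}

// policyStore is a hypothetical stand-in for the internal NetworkPolicy store.
type policyStore struct {
	items map[string]*internalPolicy
}

// Get returns the stored policy for key and whether it exists.
func (s *policyStore) Get(key string) (*internalPolicy, bool) {
	p, ok := s.items[key]
	return p, ok
}

// triggerUpdates visits every referenced policy key and returns the names of
// the policies it would reprocess. Keys with no internal policy yet are
// skipped instead of being dereferenced.
func triggerUpdates(store *policyStore, keys []string) []string {
	var reprocessed []string
	for _, key := range keys {
		p, exists := store.Get(key)
		if !exists {
			// Not processed by the add handler yet; the add handler is
			// expected to create the internal policy later.
			continue
		}
		reprocessed = append(reprocessed, p.Name)
	}
	return reprocessed
}

func main() {
	store := &policyStore{items: map[string]*internalPolicy{
		"cnp-a": {Name: "cnp-a"},
	}}
	// "cnp-b" is in the informer cache but has no internal policy yet.
	fmt.Println(triggerUpdates(store, []string{"cnp-a", "cnp-b"}))
}
```

Without the `exists` check, the lookup for `cnp-b` would return a nil pointer, and reading `p.Name` would panic.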
/test-all
Codecov Report
@@ Coverage Diff @@
## main #2768 +/- ##
==========================================
- Coverage 59.37% 53.29% -6.08%
==========================================
Files 285 285
Lines 23033 23028 -5
==========================================
- Hits 13675 12273 -1402
- Misses 7907 9408 +1501
+ Partials 1451 1347 -104
/test-e2e
// The internal NetworkPolicy may haven't been created yet. It's fine to skip processing this CNP as addCNP will
// create it eventually.
Would this be possible:
addCNP has already processed the CNP to an internalNP, just hasn't added this internalNP to the internalNetworkPolicyStore. In this case, reprocessCNP will skip processing this CNP and addCNP will just add the "old" internalNP to internalNetworkPolicyStore.
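The interleaving raised here can be replayed deterministically in a small sketch. Everything in it is a simplified assumption for illustration: `controller`, `process`, `onGroupUpdate`, and `replayRace` are hypothetical names, and the real handlers run concurrently rather than in a fixed order.

```go
package main

import "fmt"

// internalNP is a hypothetical internal policy holding the group members
// that were resolved when the policy was processed.
type internalNP struct {
	Name    string
	Members []string
}

// controller is a hypothetical, simplified stand-in for the real controller.
type controller struct {
	groupMembers map[string][]string    // current ClusterGroup state
	store        map[string]*internalNP // internal NetworkPolicy store
}

// process builds an internal policy from a snapshot of the group state.
func (c *controller) process(cnp, group string) *internalNP {
	members := append([]string(nil), c.groupMembers[group]...)
	return &internalNP{Name: cnp, Members: members}
}

// onGroupUpdate mimics a group-triggered update with the existence check:
// it reprocesses the policy only if it is already stored.
func (c *controller) onGroupUpdate(cnp, group string, members []string) bool {
	c.groupMembers[group] = members
	if _, exists := c.store[cnp]; !exists {
		return false // skipped: the add handler is expected to create it
	}
	c.store[cnp] = c.process(cnp, group)
	return true
}

// replayRace runs the problematic ordering step by step and returns the
// members held by the stored policy and the current group members.
func replayRace() (stored, current []string) {
	c := &controller{
		groupMembers: map[string][]string{"cg": {"pod-a"}},
		store:        map[string]*internalNP{},
	}
	np := c.process("cnp", "cg")                             // add handler: policy computed, not stored yet
	c.onGroupUpdate("cnp", "cg", []string{"pod-a", "pod-b"}) // group update: policy not found, skipped
	c.store["cnp"] = np                                      // add handler: stores the stale policy
	return c.store["cnp"].Members, c.groupMembers["cg"]
}

func main() {
	stored, current := replayRace()
	fmt.Println("stored policy members:", stored)
	fmt.Println("current group members:", current)
}
```

In this replay the stored policy keeps the member list from before the group update, and nothing reprocesses it afterwards.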
Good catch! The issue exists regardless of this patch, and not only between triggerCNPUpdates and the CNP event handlers, but also between the Namespace event handlers and the CNP event handlers. I tried to fix it in this PR but couldn't find a simple way: a long stretch of code would need to be placed in a critical section, and the code would become quite redundant and hard to maintain. I was wondering if we should move internal NetworkPolicy creation and update to workers and have all event handlers just trigger them, but I know there was some reason we didn't do that before, perhaps the references to AddressGroups and AppliedToGroups.
Maybe we should have a separate issue to discuss the race condition. @abhiraut @GraysonWu
Opened issue #2794 to track this.
thanks @abhiraut
Fixes #2769