Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix triggerCNPUpdates panic #2768

Merged
merged 1 commit into from Sep 17, 2021
Merged

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Sep 13, 2021

When a ClusterGroup is created, updated, or deleted, triggerCNPUpdates
will be called to update ClusterNetworkPolicies that reference to it.
However, a ClusterNetworkPolicy may have not been processed by addCNP
when they are found in the informer cache, which means its internal
NetworkPolicy doesn't exist yet. triggerCNPUpdates would panic if it
assumes the internal NetworkPolicy always exist and uses the returned
value directly.

This patch fixes it by checking existence first before using the
returned internal NetworkPolicy.

Signed-off-by: Quan Tian qtian@vmware.com

Fixes #2769

When a ClusterGroup is created, updated, or deleted, `triggerCNPUpdates`
will be called to update ClusterNetworkPolicies that reference to it.
However, a ClusterNetworkPolicy may have not been processed by `addCNP`
when they are found in the informer cache, which means its internal
NetworkPolicy doesn't exist yet. `triggerCNPUpdates` would panic if it
assumes the internal NetworkPolicy always exist and uses the returned
value directly.

This patch fixes it by checking existence first before using the
returned internal NetworkPolicy.

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn tnqn requested a review from abhiraut September 13, 2021 16:39
@tnqn
Copy link
Member Author

tnqn commented Sep 13, 2021

/test-all

@codecov-commenter
Copy link

Codecov Report

Merging #2768 (f656ab2) into main (0867867) will decrease coverage by 6.07%.
The diff coverage is 46.51%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2768      +/-   ##
==========================================
- Coverage   59.37%   53.29%   -6.08%     
==========================================
  Files         285      285              
  Lines       23033    23028       -5     
==========================================
- Hits        13675    12273    -1402     
- Misses       7907     9408    +1501     
+ Partials     1451     1347     -104     
Flag Coverage Δ
kind-e2e-tests 32.39% <0.00%> (-14.14%) ⬇️
unit-tests 41.55% <46.51%> (+0.55%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...g/controller/networkpolicy/clusternetworkpolicy.go 68.84% <40.00%> (+3.72%) ⬆️
...ntroller/networkpolicy/networkpolicy_controller.go 71.60% <45.94%> (-9.31%) ⬇️
pkg/controller/networkpolicy/clustergroup.go 75.62% <100.00%> (+8.02%) ⬆️
pkg/controller/types/networkpolicy.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/controller/networkpolicy/packetin.go 4.50% <0.00%> (-67.50%) ⬇️
pkg/apis/controlplane/v1beta2/helper.go 33.33% <0.00%> (-66.67%) ⬇️
...formers/externalversions/security/v1alpha1/tier.go 0.00% <0.00%> (-64.29%) ⬇️
...ers/externalversions/core/v1alpha2/clustergroup.go 0.00% <0.00%> (-64.29%) ⬇️
...s/externalversions/core/v1alpha2/externalentity.go 0.00% <0.00%> (-64.29%) ⬇️
...xternalversions/security/v1alpha1/networkpolicy.go 0.00% <0.00%> (-64.29%) ⬇️
... and 82 more

@tnqn
Copy link
Member Author

tnqn commented Sep 14, 2021

/test-e2e

Comment on lines +135 to +136
// The internal NetworkPolicy may haven't been created yet. It's fine to skip processing this CNP as addCNP will
// create it eventually.
Copy link
Contributor

@GraysonWu GraysonWu Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be possible:
addCNP has already processed the CNP to an internalNP, just hasn't added this internalNP to the internalNetworkPolicyStore. In this case, reprocessCNP will skip processing this CNP and addCNP will just add the "old" internalNP to internalNetworkPolicyStore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! The issue exists regardless of the patch, and not only between triggerCNPUpdate and CNP eventHandlers, but also Namespace eventHandlers and CNP eventHandlers. I tried to fix it in this PR but didn't figure out a simple way: long code would need to be scoped into critical area and code would become quite redundant and hard to maintain. I was wondering if we should move internal NetworkPolicy creation and update to workers and all event handlers should just trigger them, but I know we didn't do that before was because of some reason, perhaps for the reference of AddressGroups and AppliedToGroups.

Maybe we should have a separate issue to discuss the race condition. @abhiraut @GraysonWu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened issue to track this #2794

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @abhiraut

@tnqn tnqn mentioned this pull request Sep 16, 2021
@tnqn tnqn merged commit 2572021 into antrea-io:main Sep 17, 2021
@tnqn tnqn deleted the triggerCNPUpdates-panic branch September 17, 2021 03:12
@tnqn tnqn added the action/backport Indicates a PR that requires backports. label Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/backport Indicates a PR that requires backports.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

antrea-controller panics in triggerCNPUpdates
4 participants