Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the restriction that an ACNP referred ClusterGroup must be created before the ACNP #2478

Merged
merged 4 commits into from Aug 26, 2021

Conversation

Dyanngg
Copy link
Contributor

@Dyanngg Dyanngg commented Jul 27, 2021

This is the second of two PRs that relaxes admission validations for ClusterGroup references. This PR removes the restriction that a ClusterGroup must exist before it can be referred to in Antrea ClusterNetworkPolicies. It also allows a ClusterGroup to be deleted even if it is still referred to by an ACNP.

This PR also refactors the way test resources are created in testSteps of antreapolicy_test.go. Since resource creation/deletion dependencies are no longer enforced, e2e testcases can simply use an ordered list of test resources specification to test different scenarios (i.e. ACNP is created before its referred CG and vice versa).

@codecov-commenter
Copy link

codecov-commenter commented Jul 27, 2021

Codecov Report

Merging #2478 (9a44855) into main (1d148ad) will decrease coverage by 1.23%.
The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2478      +/-   ##
==========================================
- Coverage   60.58%   59.34%   -1.24%     
==========================================
  Files         286      287       +1     
  Lines       23096    23107      +11     
==========================================
- Hits        13993    13714     -279     
- Misses       7644     7956     +312     
+ Partials     1459     1437      -22     
Flag Coverage Δ
e2e-tests ∅ <ø> (?)
kind-e2e-tests 45.37% <0.00%> (-2.26%) ⬇️
unit-tests 41.51% <33.33%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...g/controller/networkpolicy/clusternetworkpolicy.go 65.11% <ø> (-0.02%) ⬇️
pkg/controller/networkpolicy/validate.go 21.51% <ø> (+0.95%) ⬆️
pkg/controller/networkpolicy/clustergroup.go 67.59% <33.33%> (+1.85%) ⬆️
...g/agent/apiserver/handlers/featuregates/handler.go 0.00% <0.00%> (-82.36%) ⬇️
...kg/apiserver/registry/system/supportbundle/rest.go 19.54% <0.00%> (-55.18%) ⬇️
pkg/support/dump_others.go 0.00% <0.00%> (-51.73%) ⬇️
pkg/support/dump.go 8.13% <0.00%> (-49.60%) ⬇️
...g/agent/apiserver/handlers/addressgroup/handler.go 0.00% <0.00%> (-40.00%) ⬇️
...agent/apiserver/handlers/appliedtogroup/handler.go 0.00% <0.00%> (-40.00%) ⬇️
pkg/apiserver/handlers/loglevel/handler.go 0.00% <0.00%> (-38.47%) ⬇️
... and 48 more

@Dyanngg Dyanngg force-pushed the remove-acnp-cg-ref-restriction branch from 9320e68 to c1733da Compare July 27, 2021 22:34
@Dyanngg Dyanngg changed the title [WIP] Remove the restriction that an ACNP referred ClusterGroup must be created before the ACNP Remove the restriction that an ACNP referred ClusterGroup must be created before the ACNP Jul 27, 2021
@Dyanngg Dyanngg force-pushed the remove-acnp-cg-ref-restriction branch from c1733da to f1de0be Compare July 27, 2021 22:40
@Dyanngg Dyanngg requested a review from abhiraut July 27, 2021 22:40
@Dyanngg Dyanngg added this to the Antrea v1.3 release milestone Jul 27, 2021
@Dyanngg Dyanngg added the area/network-policy/lifecycle Issues or PRs related to the network policy lifecycle. label Jul 27, 2021
@Dyanngg Dyanngg force-pushed the remove-acnp-cg-ref-restriction branch from f1de0be to fd5faea Compare July 27, 2021 23:52
pkg/controller/networkpolicy/validate.go Show resolved Hide resolved
if err != nil {
klog.Errorf("Unable to delete internal Group %s from store: %v", key, err)
}
n.triggerParentGroupSync(grp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have been triggering CNP updates from workers.. is it possible to do so in syncInternalGroup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be very tricky to do so in syncInternalGroup, in case of deletion. If a group key is processed by syncInternalGroup, the controller would expect the internal group still exist in the internal group store, otherwise it won't do anything. So in this case, if we want to rely on syncInternalGroup for parent group syncs, we would need to delete the internal group from the internal store after syncInternalGroup. However, at the time of parent group sync we also want to ensure that the childGroup is deleted from internal store, so that we don't include outdated members. Since this process is async, it's hard for us to ensure that syncInternalGroup of the parent happens after the internal group deletion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm im worried this comes in the path of the synchronous Delete handlers and may cause perf issue. maybe add a TODO comment here to explain why its done here now and perhaps we should look in to moving this to the workers. let me also dig deeper if we can have an alternative here..

Copy link
Contributor

@abhiraut abhiraut Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you plan to add a TODO here? more important than ParentGroup my comment is on the triggerCNPUpdates; as that's the one i presume would be time consuming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder. Added TODO.

@Dyanngg Dyanngg force-pushed the remove-acnp-cg-ref-restriction branch from fd5faea to 6106fad Compare July 30, 2021 18:47
@Dyanngg Dyanngg requested a review from abhiraut August 4, 2021 01:11
@Dyanngg Dyanngg force-pushed the remove-acnp-cg-ref-restriction branch from 6106fad to 021c563 Compare August 4, 2021 01:19
@abhiraut abhiraut requested a review from tnqn August 6, 2021 20:26
@Dyanngg
Copy link
Contributor Author

Dyanngg commented Aug 6, 2021

/test-all

@Dyanngg Dyanngg force-pushed the remove-acnp-cg-ref-restriction branch from 021c563 to 40a106f Compare August 10, 2021 20:40
@Dyanngg
Copy link
Contributor Author

Dyanngg commented Aug 10, 2021

/test-all

abhiraut
abhiraut previously approved these changes Aug 10, 2021
@Dyanngg
Copy link
Contributor Author

Dyanngg commented Aug 11, 2021

rebased /test-all

@Dyanngg Dyanngg requested a review from abhiraut August 11, 2021 23:41
@Dyanngg Dyanngg force-pushed the remove-acnp-cg-ref-restriction branch from b83346c to 9fea7ce Compare August 12, 2021 00:48
@Dyanngg
Copy link
Contributor Author

Dyanngg commented Aug 12, 2021

/test-all

abhiraut
abhiraut previously approved these changes Aug 12, 2021
@antoninbas
Copy link
Contributor

Are we still planning to merge this for v1.3? If yes, @abhiraut could you rebase and @tnqn do you want to take a look?

@abhiraut
Copy link
Contributor

yup. i will take care of this PR .. let us first merge the static ipblock PR #2577 and then I will rebase all conflicts together

…ated before the ACNP

as well as the restriction on ClusterGroup cannot be deleted if still referred to by an ACNP.
Signed-off-by: Yang Ding <dingyang@vmware.com>
Co-authored-by: abhiraut <rauta@vmware.com>

Signed-off-by: abhiraut <rauta@vmware.com>
@abhiraut
Copy link
Contributor

Are we still planning to merge this for v1.3? If yes, @abhiraut could you rebase and @tnqn do you want to take a look?

done

@abhiraut
Copy link
Contributor

/test-all

// members. Since this process is async, there's no guarantee which happens first. The drawback is
// that all ACNPs that refers to this CG directly will be synchronously processed once as part of
// deletion handler, which could cause perf issue in scale.
c.triggerCNPUpdates(og)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call is unnecessary.

If we want to rely on syncInternalGroup for ACNP syncs, the internal group needs to be deleted from the internal store after syncInternalGroup.

I don't think we need to defer deleting internal group to syncInternalGroup. triggerCNPUpdates actually just requires the group name. The association is stored in CNP store, instead of group store.

However, at the time of ACNP sync, we also want to ensure that the group is deleted from internal store already, so that we don't include outdated members. Since this process is async, there's no guarantee which happens first.

It doesn't matter whether the group is deleted from internal store already when processing ACNP. The only thing we need to ensure is triggering CNP update after syncInternalGroup, so that its data will be correct eventually.

Of course, we need to ensure c.triggerParentGroupSync() and c.triggerCNPUpdates() are called in syncInternalGroup regardless of the existence of the internal group. deferred calls can work. I don't think they really need to return error.

func (c *NetworkPolicyController) syncInternalGroup(key string) error {
	defer c.triggerCNPUpdates(key)
	defer c.triggerParentGroupSync(key)
	
	// Retrieve the internal Group corresponding to this key.
        ...
}

if err != nil {
klog.Errorf("Unable to delete internal Group %s from store: %v", key, err)
}
c.triggerParentGroupSync(grp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as triggerCNPUpdates. Calling it in syncInternalGroup should be good enough. The method doesn't really require the whole group but just the group name.

@abhiraut
Copy link
Contributor

thanks for the review Quan! let me evaluate and push a change based on that

Signed-off-by: abhiraut <rauta@vmware.com>
tnqn
tnqn previously approved these changes Aug 26, 2021
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one nit

parentGroupObjs, err := c.internalGroupStore.GetByIndex(store.ChildGroupIndex, grp)
if err != nil {
klog.Errorf("Error retrieving parents of ClusterGroup %s: %v", grp, err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Not introduced by this PR, but it should return earlier upon error, just like triggerCNPUpdates, though the following processing is safe to run in this particular case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: abhiraut <rauta@vmware.com>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Aug 26, 2021

/test-all

@tnqn tnqn merged commit d87dee1 into antrea-io:main Aug 26, 2021
antoninbas pushed a commit that referenced this pull request Oct 20, 2022
…ated before the ACNP (#2478)

as well as the restriction on ClusterGroup cannot be deleted if still referred to by an ACNP.

Co-authored-by: abhiraut <rauta@vmware.com>
Signed-off-by: Yang Ding <dingyang@vmware.com>
Signed-off-by: abhiraut <rauta@vmware.com>
(cherry picked from commit d87dee1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/network-policy/lifecycle Issues or PRs related to the network policy lifecycle.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants