-
Notifications
You must be signed in to change notification settings - Fork 260
[NPM] support network policy controller and its unit tests #849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NPM] support network policy controller and its unit tests #849
Conversation
Codecov Report
@@ Coverage Diff @@
## master #849 +/- ##
==========================================
+ Coverage 42.10% 42.51% +0.40%
==========================================
Files 158 158
Lines 15025 15138 +113
==========================================
+ Hits 6327 6436 +109
+ Misses 7926 7913 -13
- Partials 772 789 +17 |
…l and uninstall default Azure NPM chain.
…more test cases (e.g., update and adding multiple network policies)
…work policies. Update unit tests to test more network policies events
npm/networkPolicyController.go
Outdated
| clientset: clientset, | ||
| netPolLister: npInformer.Lister(), | ||
| netPolListerSynced: npInformer.Informer().HasSynced, | ||
| workqueue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "NetPols"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expand netpols to NetworkPolicy while naming the WQ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
npm/networkPolicyController.go
Outdated
|
|
||
| npInformer.Informer().AddEventHandler( | ||
| cache.ResourceEventHandlerFuncs{ | ||
| AddFunc: netPolController.addNetPol, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets try to have function named too as AddNetworkPolicy or UpdateNetworkPOlicy. Netpol is our lingo and may confuse some folks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack. For consistency, I also changed deleteNetPol to deleteNetworkPolicy. deleteNetworkPolicy -> cleanUpNetworkPolicy
npm/networkPolicyController.go
Outdated
|
|
||
| // (TODO): Reduce scope of lock later | ||
| c.npMgr.Lock() | ||
| defer c.npMgr.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can optimize this lock from defer to explicitly mentioning below RawNpMap exists check. thta way we are not locking when Add is being performed. Probably we can expand this to other controllers later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Another important thing is we need to decouple all shared maps (e.g., RawNpMap, etc) from npm.go to each controller and then carefully walk through them again. Even we may encapsulate each map with lock as a struct. Let's understand each maps in details and start this process as another PR.
npm/networkPolicyController_test.go
Outdated
| f.t.Errorf("ipSetSave failed @ ipsMgr.Save") | ||
| } | ||
| } | ||
| func (f *netPolFixture) ipSetRestore(ipsetConfigFile string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can reuse ipsetsave and restore from other tests right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. we can. I would suggest we left them separately now and see them after reviewing and thinking overall structure of our codes since as we discussed, we may decouple each controller into each package later.
npm/networkPolicyController_test.go
Outdated
| f.kubeInformer.Start(stopCh) | ||
| } | ||
|
|
||
| func (f *netPolFixture) ipSetSave(ipsetConfigFile string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this, we will need IpTableSave and IpTableRestore, since each test will be manipulating iptable rules, we should clean them up once done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added two methods for iptables management. Now the methods are not used in networkPolicyController_test.go due to npm_test.go. We will use them later.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…policy cache in a right place. Correct prometheus metric code.
vakalapa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
🚀
…hod parameter in cleanUpNetworkPolicy
npm/networkPolicyController.go
Outdated
| // #1 No item in RawNpMap, which means network policy with the cachedNetPolKey is not applied. Thus, do not need to clean up process. | ||
| // #2 item in RawNpMap exists, which means network policy with the cachedNetPolKey is applied . Thus, Need to clean up process. | ||
| cachedNetPolKey := util.GetNSNameWithPrefix(key) | ||
| _, cachedNetPolObjExists := c.npMgr.RawNpMap[cachedNetPolKey] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to check if the cahedObj exists here ? Because the first check we do in cleanUpnetworkPolicy is if the cachedObj exists, i think we can safely remove this check. Any reason for leaving this check here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Removed the redundant check.
npm/npm.go
Outdated
| }, | ||
| ) | ||
| // create network policy controller | ||
| npMgr.netPolController = NewNetworkPolicyController(informerFactory.Networking().V1().NetworkPolicies(), clientset, npMgr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not replace "informerFactory.Networking().V1().NetworkPolicies()" with npInformer (L213) as they are same ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to remove npInformer from NetworkPolicyManager struct in npm.go. Let me first check whether I can remove npInformer member. If not, I can use npInfomer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npInformer member in NetworkPolicyManager struct is also used in Start method while it may be not necessary to keep the member. I changed it to "npInformer".
| testCases := []expectedNetPolValues{ | ||
| {1, 2, 0, true, true}, | ||
| } | ||
| checkNetPolTestResult("TestAddMultipleNetPols", f, testCases) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the old nwpolicy_test, we were checking if the counters in Prometheus were correct, we are missing that check here, can we add back checks like below
newGaugeVal, err3 := promutil.GetValue(metrics.NumPolicies)
newCountVal, err4 := promutil.GetCountValue(metrics.AddPolicyExecTime)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Will add them.
npm/networkPolicyController.go
Outdated
| } | ||
|
|
||
| // needSync checks whether the obj is valid network policy object. | ||
| func (c *networkPolicyController) needSync(obj interface{}) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This also returns the Key...so shall we rename it to GetNetworkPolicyKey (as the only validation it does is the networkpolicy object and not really if this is a valid update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. applied.
| // if network policy does not have different states against lastly applied states stored in cachedNetPolObj, | ||
| // netPolController does not need to reconcile this update. | ||
| // in this updateNetworkPolicy event, newNetPol was updated with states which netPolController does not need to reconcile. | ||
| if isSameNetworkPolicy(cachedNetPolObj, newNetPol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be called as needSync, but I'm fine with isSameNetworkPolicy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I would like to stick to isSameNetworkPolicy name since it is related to isSamePolicy function in parsePolicy.go and ProcessedNpMap data structure. So, when revisiting this issue, I will consider suggestion.
npm/networkPolicyController.go
Outdated
| // Cache network object first before applying ipsets and iptables. | ||
| // If error happens while applying ipsets and iptables, | ||
| // the key is re-queued in workqueue and process this function again, which eventually meets desired states of network policy | ||
| c.npMgr.RawNpMap[cachedNetPolKey] = netPolObj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need a deepcopy of this here ! we are not removing references from the sharedinformer cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think current code is better in terms of memory and computations. If we use deepcopy network policyhere, we will always have two network policy objects and increase computations for deepcopy operation. But, if we use reference of network policy from sharedinformer cache, it will be either one or two objects eventually and no computation overhead for deepcopy.
| netPolCachedKey := util.GetNSNameWithPrefix(key) | ||
|
|
||
| // (TODO): need to decouple this lock from npMgr if possible | ||
| c.npMgr.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldnt we take a ReadLock? Same for above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. We can use readlock here, but current NPM code base is using global Mutex. When we decouple global Mutex from our code, we can take your suggestion. It will be different PR.
| // TODO : may consider using "c.queue.AddAfter(key, *requeueAfter)" according to error type later | ||
| if err := c.syncNetPol(key); err != nil { | ||
| // Put the item back on the workqueue to handle any transient errors. | ||
| c.workqueue.AddRateLimited(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.workqueue.AddRateLimited(key) [](start = 3, length = 31)
Why do we need to do that? Simply dont call c.WorkQueue.Done in linie#196 (You should call that only if err!=nil)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to put ratelimit when reconcile is failed not to try it again so fast. https://pkg.go.dev/k8s.io/client-go/util/workqueue#RateLimitingInterface
| } | ||
| // Finally, if no error occurs we Forget this item so it does not | ||
| // get queued again until another change happens. | ||
| c.workqueue.Forget(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How it is different from c.Workqueue.Done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// We call Done here so the workqueue knows we have finished processing this item. We also must remember to call Forget if we do not want this work item being re-queued. For example, we do not call Forget if a transient error occurs, instead the item is put back on the workqueue and attempted again after a back-off period. - Refer to
https://github.com/kubernetes/sample-controller/blob/master/controller.go#L196
| } | ||
|
|
||
| err := func(obj interface{}) error { | ||
| defer c.workqueue.Done(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer c.workqueue.Done(obj) [](start = 2, length = 27)
c.workqueue.Done(obj)
if err != nil {
c.workqueue.AddRateLimited(key)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we agree to stick to current code.
…etrics. Applied better naming and removed redundancy codes.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
* first version of network policy controller and its unit tests * update reconcile and deleteNetworkPolicy function to correctly install and uninstall default Azure NPM chain. * To explicitly manage default Azure NPM chain in deleteNetworkPolicy function * correct comments and delete unused variable * fix missed returing errors in codes * Correct to check DeletionTimestamp and DeletionGracePeriodSeconds variables * removed placeholder functions in network policy controoler and added more test cases (e.g., update and adding multiple network policies) * - applied comments (use explict names, locating lock in a better place) * add two methods to save and restore iptables in unit test * comment out unused function * early filter in updateNetworkPolicy function if they are the same network policies. Update unit tests to test more network policies events * - start using klog package instead of log package * remove unneeded defer for lock * Locate of adding and deleting network policy object from our network policy cache in a right place. Correct prometheus metric code. * use cached network policy key instead of network policy object as method parameter in cleanUpNetworkPolicy * remove redundant check * Remove ns- prefix as key in RawNpMap. Update UT to check prometheus metrics. Applied better naming and removed redundancy codes. * minor update for varialbe names * remove dependency between UT by re-initializing metrics. Correct message.
Reason for Change:
This PR is mainly to add network policy controller to make dealing with network policy event reliable.
Add unit tests for network policy controller to test various network policy events.
Cleans up previous codes (e.g., not used logics).
Put comments of existing and new codes as many as possible.
Issue Fixed:
Requirements:
Notes: