Skip to content

Conversation

@JungukCho
Copy link
Contributor

@JungukCho JungukCho commented Apr 6, 2021

Reason for Change:

This PR is mainly to add network policy controller to make dealing with network policy event reliable.
Add unit tests for network policy controller to test various network policy events.
Cleans up previous codes (e.g., not used logics).
Put comments of existing and new codes as many as possible.

Issue Fixed:

  1. Retry reconcile loop if there is failed
  2. Fix incorrect return error.

Requirements:

Notes:

@codecov
Copy link

codecov bot commented Apr 6, 2021

Codecov Report

Merging #849 (b7eb15c) into master (cf7d8a5) will increase coverage by 0.40%.
The diff coverage is 46.57%.

@@            Coverage Diff             @@
##           master     #849      +/-   ##
==========================================
+ Coverage   42.10%   42.51%   +0.40%     
==========================================
  Files         158      158              
  Lines       15025    15138     +113     
==========================================
+ Hits         6327     6436     +109     
+ Misses       7926     7913      -13     
- Partials      772      789      +17     

clientset: clientset,
netPolLister: npInformer.Lister(),
netPolListerSynced: npInformer.Informer().HasSynced,
workqueue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "NetPols"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expand netpols to NetworkPolicy while naming the WQ.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack


npInformer.Informer().AddEventHandler(
cache.ResourceEventHandlerFuncs{
AddFunc: netPolController.addNetPol,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets try to have function named too as AddNetworkPolicy or UpdateNetworkPOlicy. Netpol is our lingo and may confuse some folks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. For consistency, I also changed deleteNetPol to deleteNetworkPolicy. deleteNetworkPolicy -> cleanUpNetworkPolicy


// (TODO): Reduce scope of lock later
c.npMgr.Lock()
defer c.npMgr.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optimize this lock from defer to explicitly mentioning below RawNpMap exists check. thta way we are not locking when Add is being performed. Probably we can expand this to other controllers later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Another important thing is we need to decouple all shared maps (e.g., RawNpMap, etc) from npm.go to each controller and then carefully walk through them again. Even we may encapsulate each map with lock as a struct. Let's understand each maps in details and start this process as another PR.

f.t.Errorf("ipSetSave failed @ ipsMgr.Save")
}
}
func (f *netPolFixture) ipSetRestore(ipsetConfigFile string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reuse ipsetsave and restore from other tests right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. we can. I would suggest we left them separately now and see them after reviewing and thinking overall structure of our codes since as we discussed, we may decouple each controller into each package later.

f.kubeInformer.Start(stopCh)
}

func (f *netPolFixture) ipSetSave(ipsetConfigFile string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this, we will need IpTableSave and IpTableRestore, since each test will be manipulating iptable rules, we should clean them up once done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added two methods for iptables management. Now the methods are not used in networkPolicyController_test.go due to npm_test.go. We will use them later.

@matmerr
Copy link
Member

matmerr commented Apr 8, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

…policy cache in a right place. Correct prometheus metric code.
vakalapa
vakalapa previously approved these changes Apr 9, 2021
Copy link
Contributor

@vakalapa vakalapa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm :shipit: 🚀

// #1 No item in RawNpMap, which means network policy with the cachedNetPolKey is not applied. Thus, do not need to clean up process.
// #2 item in RawNpMap exists, which means network policy with the cachedNetPolKey is applied . Thus, Need to clean up process.
cachedNetPolKey := util.GetNSNameWithPrefix(key)
_, cachedNetPolObjExists := c.npMgr.RawNpMap[cachedNetPolKey]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to check if the cahedObj exists here ? Because the first check we do in cleanUpnetworkPolicy is if the cachedObj exists, i think we can safely remove this check. Any reason for leaving this check here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Removed the redundant check.

npm/npm.go Outdated
},
)
// create network policy controller
npMgr.netPolController = NewNetworkPolicyController(informerFactory.Networking().V1().NetworkPolicies(), clientset, npMgr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not replace "informerFactory.Networking().V1().NetworkPolicies()" with npInformer (L213) as they are same ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to remove npInformer from NetworkPolicyManager struct in npm.go. Let me first check whether I can remove npInformer member. If not, I can use npInfomer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

npInformer member in NetworkPolicyManager struct is also used in Start method while it may be not necessary to keep the member. I changed it to "npInformer".

testCases := []expectedNetPolValues{
{1, 2, 0, true, true},
}
checkNetPolTestResult("TestAddMultipleNetPols", f, testCases)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the old nwpolicy_test, we were checking if the counters in Prometheus were correct, we are missing that check here, can we add back checks like below

newGaugeVal, err3 := promutil.GetValue(metrics.NumPolicies)
newCountVal, err4 := promutil.GetCountValue(metrics.AddPolicyExecTime)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Will add them.

}

// needSync checks whether the obj is valid network policy object.
func (c *networkPolicyController) needSync(obj interface{}) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This also returns the Key...so shall we rename it to GetNetworkPolicyKey (as the only validation it does is the networkpolicy object and not really if this is a valid update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. applied.

// if network policy does not have different states against lastly applied states stored in cachedNetPolObj,
// netPolController does not need to reconcile this update.
// in this updateNetworkPolicy event, newNetPol was updated with states which netPolController does not need to reconcile.
if isSameNetworkPolicy(cachedNetPolObj, newNetPol) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be called as needSync, but I'm fine with isSameNetworkPolicy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I would like to stick to isSameNetworkPolicy name since it is related to isSamePolicy function in parsePolicy.go and ProcessedNpMap data structure. So, when revisiting this issue, I will consider suggestion.

// Cache network object first before applying ipsets and iptables.
// If error happens while applying ipsets and iptables,
// the key is re-queued in workqueue and process this function again, which eventually meets desired states of network policy
c.npMgr.RawNpMap[cachedNetPolKey] = netPolObj
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need a deepcopy of this here ! we are not removing references from the sharedinformer cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think current code is better in terms of memory and computations. If we use deepcopy network policyhere, we will always have two network policy objects and increase computations for deepcopy operation. But, if we use reference of network policy from sharedinformer cache, it will be either one or two objects eventually and no computation overhead for deepcopy.

netPolCachedKey := util.GetNSNameWithPrefix(key)

// (TODO): need to decouple this lock from npMgr if possible
c.npMgr.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt we take a ReadLock? Same for above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We can use readlock here, but current NPM code base is using global Mutex. When we decouple global Mutex from our code, we can take your suggestion. It will be different PR.

// TODO : may consider using "c.queue.AddAfter(key, *requeueAfter)" according to error type later
if err := c.syncNetPol(key); err != nil {
// Put the item back on the workqueue to handle any transient errors.
c.workqueue.AddRateLimited(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.workqueue.AddRateLimited(key) [](start = 3, length = 31)

Why do we need to do that? Simply dont call c.WorkQueue.Done in linie#196 (You should call that only if err!=nil)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to put ratelimit when reconcile is failed not to try it again so fast. https://pkg.go.dev/k8s.io/client-go/util/workqueue#RateLimitingInterface

}
// Finally, if no error occurs we Forget this item so it does not
// get queued again until another change happens.
c.workqueue.Forget(obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How it is different from c.Workqueue.Done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// We call Done here so the workqueue knows we have finished processing this item. We also must remember to call Forget if we do not want this work item being re-queued. For example, we do not call Forget if a transient error occurs, instead the item is put back on the workqueue and attempted again after a back-off period. - Refer to
https://github.com/kubernetes/sample-controller/blob/master/controller.go#L196

}

err := func(obj interface{}) error {
defer c.workqueue.Done(obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer c.workqueue.Done(obj) [](start = 2, length = 27)

               c.workqueue.Done(obj)
               if err != nil {
                  c.workqueue.AddRateLimited(key)
              }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we agree to stick to current code.

@vakalapa
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JungukCho JungukCho merged commit d816931 into Azure:master Apr 14, 2021
matmerr pushed a commit to matmerr/azure-container-networking that referenced this pull request Apr 21, 2021
* first version of network policy controller and its unit tests

* update reconcile and deleteNetworkPolicy function to correctly install and uninstall default Azure NPM chain.

* To explicitly manage default Azure NPM chain in deleteNetworkPolicy function

* correct comments and delete unused variable

* fix missed returing errors in codes

* Correct to check DeletionTimestamp and DeletionGracePeriodSeconds variables

* removed placeholder functions in network policy controoler and added more test cases (e.g., update and adding multiple network policies)

* - applied comments (use explict names, locating lock in a better place)

* add two methods to save and restore iptables in unit test

* comment out unused function

* early filter in updateNetworkPolicy function if they are the same network policies. Update unit tests to test more network policies events

* - start using klog package instead of log package

* remove unneeded defer for lock

* Locate of adding and deleting network policy object from our network policy cache in a right place. Correct prometheus metric code.

* use cached network policy key instead of network policy object as method parameter in cleanUpNetworkPolicy

* remove redundant check

* Remove ns- prefix as key in RawNpMap. Update UT to check prometheus metrics. Applied better naming and removed redundancy codes.

* minor update for varialbe names

* remove dependency between UT by re-initializing metrics. Correct message.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement npm Related to NPM.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants