Skip to content

Conversation

@vakalapa
Copy link
Contributor

@vakalapa vakalapa commented Apr 2, 2021

Today, during a resource event handling, NPM will return immediately if it encounters an error and local NPM cache will not be updated. This results in leaked ipset/iptables state in kernel which will never be cleaned up since the local NPM cache is not aware of it.

Solution:
Build cache after each successful operation. Return on first error.

This PR, incorporates the above logic, NPM now keeps updating the local cache after every successful operation. NPM also ignores the notexists errors in Delete set operations.
Logging is updated to use klog instead of using acn/log.
if the Cached Pod IP changes, now NPM will delete the existing cached pod and adds a new pod.
CleanUpDeletedPod will now only require the key of the pod which needs to be deleted.

@vakalapa vakalapa requested review from JungukCho and matmerr April 2, 2021 23:02
@codecov
Copy link

codecov bot commented Apr 2, 2021

Codecov Report

Merging #848 (ced4ab8) into master (85cd950) will decrease coverage by 0.16%.
The diff coverage is 63.90%.

@@            Coverage Diff             @@
##           master     #848      +/-   ##
==========================================
- Coverage   42.22%   42.05%   -0.17%     
==========================================
  Files         158      158              
  Lines       15025    15136     +111     
==========================================
+ Hits         6344     6366      +22     
- Misses       7904     7995      +91     
+ Partials      777      775       -2     

metrics.SendErrorLogAndMetric(util.NSID, "[UpdateNamespace] Error: failed to add namespace %s to ipset list %s with err: %v", newNsName, labelKey, err)
return err
}
// {IMPORTANT} Same as above order is assumed to be key and then key+val. NPM should only append to existing labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. Can we resolve this dependency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, keeping this as it is for now.

log.Logf("Adding pod %s to ipset %s", newPodObj.Status.PodIP, addIPSetName)
if err = ipsMgr.AddToSet(addIPSetName, newPodObj.Status.PodIP, util.IpsetNetHashFlag, podKey); err != nil {
return fmt.Errorf("[syncAddAndUpdatePod] Error: failed to add pod to label ipset with err: %v", err)
// {IMPORTANT} The order of compared list will be key and then key+val. NPM should only append after both key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same as namespace comment) Can we resolve this issue in this PR? As I understood this issue and we discussed this before, but if possible, it is better to resolve this issue to avoid future possible issue.
Current GetLabelKVFromSet returns key and value. So better to returns (string, string) in the GetLabelKVFromSet
function and use "removedKey" and "removedValue" in this function

if err = ipsMgr.AddToSet(addIPSetName, newPodObj.Status.PodIP, util.IpsetNetHashFlag, podKey); err != nil {
return fmt.Errorf("[syncAddAndUpdatePod] Error: failed to add pod to label ipset with err: %v", err)
}
// {IMPORTANT} Same as above order is assumed to be key and then key+val. NPM should only append to existing labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we resolve this issue in this PR?

c.npMgr.PodMap[podKey].appendLabels(map[string]string{addedLabel[0]: addedLabel[1]}, AppendToExitingLabels)
}
}
// This will ensure after all labels are worked on to overwrite. This way will reduce any bugs introduced above
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are possible bugs? It would be nice to update bugs case as comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment. Due to the ordering issues if we not delete and add a label, this appedlabels will help override it.

// the IP addresses of the cached npmPod and newPodObj is the same
// If no change in labels, then GetIPSetListCompareLabels will return empty list.
// Otherwise it returns list of deleted PodIP from cached pod's labels and list of added PodIp from new pod's labels
addToIPSets, deleteFromIPSets := util.GetIPSetListCompareLabels(cachedNpmPodObj.Labels, newPodObj.Labels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From here, do we deal with the first case? If so, can we keep using cachedNpmPodObj and update the cachedNpmPodObj if needed? Are there specific reasons to create newNpmPod() and update cache, etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason we are creating new obj is because we have fields like Phase, PodIPs, Rv etc which needs to be updated to reflect the latest newPodobj ... we can add them individually too if it makes sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvement, adding this change

portList := getContainerPortList(podObj)
// To remove any references to objects sent to events by sharedinformer,
// NPM will need to deepcopy data from the provided objects
copiedPortList := make([]corev1.ContainerPort, len(portList))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getContainerPortList returns new slices with copy. Why do we need the same operation again?

// Delete the pod from its namespace's ipset.
if err = ipsMgr.DeleteFromSet(cachedNpmPodObj.Namespace, cachedNpmPodObj.PodIP, podKey); err != nil {
return fmt.Errorf("[syncAddAndUpdatePod] Error: failed to delete pod from namespace ipset with err: %v", err)
metrics.SendErrorLogAndMetric(util.PodID, "[syncAddAndUpdatePod] Info: Unexpected state. Pod (Namespace:%s, Name:%s, newUid:%s) , has cachedPodIp:%s which is different from PodIp:%s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we do not need this message since it can happen.

Copy link
Contributor Author

@vakalapa vakalapa Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line was very useful in Genesys debugging, we would want to see why the delete and add happened for any new folks.

podObj, ok := obj.(*corev1.Pod)
if !ok {
metrics.SendErrorLogAndMetric(util.NpmID, "ADD Pod: Received unexpected object type: %v", obj)
return key, needSync
Copy link
Contributor

@JungukCho JungukCho Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About metrics.SEndErrorLogAndMetric().
I think we do not need send this information since it can be ignorable. If we need to keep it, may need to change "util.NpmID" -> util.PodID". Same opinion for metrics.SEndErrorLogAndMetric() in this function.

It would be nice to check this in the rest of code (e.g., deletePod(), it uses "util.NpmID" and is better to use ""util.PodID" ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch changed this in 3 locations.

Copy link
Contributor

@JungukCho JungukCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for applying comments! Looks good to me.

@vakalapa vakalapa added the npm Related to NPM. label Apr 14, 2021
@vakalapa vakalapa merged commit 53e45c8 into master Apr 14, 2021
@vakalapa vakalapa deleted the vakr/npmerrorcontroller branch April 14, 2021 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

npm Related to NPM.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants