Skip to content

Conversation

@huntergregory
Copy link
Contributor

@huntergregory huntergregory commented Aug 1, 2022

Sometimes at high scale, the Linux ipset utility will leak reference counts for ipsets e.g. there will be no iptables rules or lists referencing an ipset, yet it will have a positive reference count. As a result, some lingering ipsets can't be deleted when NPM boots up.

In NPM "v1", we ignore these in-use-by-kernel errors. Now we do the same in "v2" to prevent a CrashLoopBackOff when this kernel leak happens.

@huntergregory huntergregory added the npm Related to NPM. label Aug 1, 2022
@huntergregory huntergregory changed the title scale: [NPM] flush and destroy sets separately & cleanup restorer scale: [NPM] ignore leaked ipset references Aug 3, 2022
@huntergregory huntergregory marked this pull request as ready for review August 3, 2022 01:06
@huntergregory huntergregory requested a review from a team as a code owner August 3, 2022 01:06
@huntergregory huntergregory requested review from vakalapa and removed request for a team August 3, 2022 01:06
grepCommand := iMgr.ioShim.Exec.Command(ioutil.Grep, azureNPMPrefix)
azureIPSets, haveAzureIPSets, commandError := ioutil.PipeCommandToGrep(listCommand, grepCommand)
klog.Infof("running this command while resetting ipsets: [%s %s %s | %s %s]", ipsetCommand, ipsetListFlag, ipsetNameFlag, ioutil.Grep, azureNPMRegex)
azureIPSets, haveAzureIPSets, commandError := ioutil.PipeCommandToGrep(listNamesCommand, grepCommand)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an extra check here ? if the total number of IPSets == IPSets with azure-npm- prefix, then wouldn't it be easy to just call in generic ipset flush and destroy ?

i am afraid there might some cases i am missing here, or have some weird scenario in linux which we are not thinking of. Wdyt ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ipset list --names | grep -v azurenpmregex == 0

ipset flush && ipset destroy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

grepCommand := iMgr.ioShim.Exec.Command(ioutil.Grep, ioutil.GrepQuietFlag, ioutil.GrepAntiMatchFlag, azureNPMPrefix)
commandString := fmt.Sprintf(" [%s %s %s | %s %s %s %s]", ipsetCommand, ipsetListFlag, ipsetNameFlag, ioutil.Grep, ioutil.GrepQuietFlag, ioutil.GrepAntiMatchFlag, azureNPMPrefix)
klog.Infof("running this command while resetting ipsets: [%s]", commandString)
_, haveNonAzureIPSets, commandError := ioutil.PipeCommandToGrep(listNamesCommand, grepCommand)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: haveNonAzureIPSets -> haveNonAzureNPMIPSets

@vakalapa vakalapa merged commit 2460582 into master Aug 9, 2022
@vakalapa vakalapa deleted the npm-restore-at-scale branch August 9, 2022 18:39
Copy link
Member

@matmerr matmerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marking for posterity that would like to steer away from all the grep/piping/exit code redirection, opt for something more built in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

npm Related to NPM.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants