Skip to content

Conversation

@huntergregory
Copy link
Contributor

@huntergregory huntergregory commented Nov 16, 2021

Logic for resetting IPSets in Linux:

  • get current npm IPSets:
    • run ipset list --name (which returns just the set names per line) and grep for "azure-npm-"
  • create a restore file which flushes all these sets and then deletes all of them
  • see the code for detailed description of the error handling logic and Prometheus metrics logic

Update for Prometheus ResetIPSetEntries:

  • removes all sets from the Prometheus metric for IPSet entries

@huntergregory huntergregory added the npm Related to NPM. label Nov 16, 2021
metrics.ResetIPSetEntries()
return nil
}
creator, originalNumSets, destroyFailureCount := iMgr.fileCreatorForReset(searchResults)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creator naming will be specific to context of this function (ipset), for example, ipSetRestorer.
But, it is fine as is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we can rename all the functions and variables with (file)creator in them, but this should be a separate cleanup PR

hashedSetName := string(line)
names = append(names, hashedSetName)
// error handlers specific to resetting ipsets
errorHandlers := []*ioutil.LineErrorHandler{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work to classify all errors and corresponding actions.
I understood the high-level ideas about LineErrorHandler which holds necessary error handling information to the corresponding executed line when there is error.

While I am fine as is, but just curious.
Do we consolidate all error handlings code without this errorHandlers here - https://github.com/Azure/azure-container-networking/blob/master/npm/pkg/dataplane/ioutil/restore_linux.go#L199?
For maintenance, it would be nice to consolidate error handling in one places if possible.
I thought Callback has only error message, but it changed destroyFailureCount in some places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some callbacks modify destroyFailureCount because the error is a kind of failure to delete (as opposed to the set not existing, which is technically an error but not a failure in our eyes). The line you reference is code which recognizes which line an error happened on and is somewhat unrelated to the error handlers. Did this answer your question?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I understood the intention of codes.

My fundamental comments are whether we can consolidate all error definitions and handlers in one place per ipset and iptables instead of sparsely declaring them in each function and using Callback.
It will be easier to understand and maintain codes, but I may miss somethings and it will need big changes.
wdyt?

BTW, it is fine as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still might not fully understand your question. To try to help consolidate, for creatorForApply, I group all of the error handling code at the bottom of the file. I can't move the error handling code out of creatorForReset because of the dependency on destroyFailureCount.

We could consolidate the definition and method (e.g. ContinueAndAbortSection) but the callback will always be unique and will eventually do more than logging for most (e.g. mark a set as needing reconcile).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for instance though, we could make this error handler a function of the set name and destroyFailureCount pointer though

// create restore file that flushes all sets, then deletes all sets
// technically don't need to flush a hashset
// this needs to be a separate function because we need to check creator contents in UTs
func (iMgr *IPSetManager) fileCreatorForReset(ipsetListOutput []byte) (creator *ioutil.FileCreator, originalNumAzureSets int, destroyFailureCount *int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed offline avoid usage of named returns, and potentially just return creator, originalNumAzureSets and destroyFailureCounts are purely used for for metrics tracking and don't particularly related to the usage of creator in clients

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fail the lint check in favor of unnamed returns? Also, I'm not sure how we could decouple the numbers for metrics from the function output because we set metrics based on the success/failure of creator.RunCommand

@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

zero := 0
destroyFailureCount = &zero
creator = ioutil.NewFileCreator(iMgr.ioShim, maxTryCount, ipsetRestoreLineFailurePattern)
func (iMgr *IPSetManager) fileCreatorForReset(ipsetListOutput []byte) (*ioutil.FileCreator, int, *int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we do not return pointer.

(*ioutil.FileCreator, int, *int) -> (*ioutil.FileCreator, int, int)

and can use non-pointer operation in caller as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need the pointer because the destroyFailureCount is updated after we get the creator and run creator.Run(). I'll add a clarifying comment next PR

Copy link
Contributor

@JungukCho JungukCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! You can resolve my last comment in next PR if it makes sense.
However, please check @matmerr comments.

@huntergregory huntergregory merged commit da4bd0d into master Nov 18, 2021
@rbtr rbtr deleted the reset-ipsets branch November 30, 2021 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

npm Related to NPM.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants