-
Notifications
You must be signed in to change notification settings - Fork 260
fix: [NPM-LINUX] resiliency for several non-retriable errors #1566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…azure-container-networking into npm-exponential-try-count
| if currentLineIndex == lineNum-1 { | ||
| lineIndex = currentLineIndex | ||
| if currentLineNum == lineNum { | ||
| lineIndex = i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bug fix here that impacts the 2nd+ retry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before, we may map a line error on line 5 to the original line 5 (not the new line 5 after skipping lines on retry). This could lead to using the wrong line error handling for the second retry
|
Example Logs: |
| if err != nil { | ||
| metrics.SendErrorLogAndMetric(util.IpsmID, "error: failed to apply ipsets: %s", err.Error()) | ||
| // exponentially increase maxRestoreTryCount | ||
| iMgr.maxRestoreTryCount *= 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to exponentially increase here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this just results in 10 tries, we can just remove this and change the largestretry to 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline. setting a constant limit of 5
let's reconsider the perf impacts of dumping all ipset lines
) * adaptively modify linux max restore try count to prevent perpetual errors * remove debug print * log restore file and send ipsetmanager_linux errors * send other appropriate errors * fix handleLineError function * fix printing restore lines and enhance a log * fix lints and wrap chainLineNumber errors * fix one off error for logging the try count * revert exponential increase to try limit * update try count to 5 and update UTs * do not log lines for every restore call until perf is understood
) * adaptively modify linux max restore try count to prevent perpetual errors * remove debug print * log restore file and send ipsetmanager_linux errors * send other appropriate errors * fix handleLineError function * fix printing restore lines and enhance a log * fix lints and wrap chainLineNumber errors * fix one off error for logging the try count * revert exponential increase to try limit * update try count to 5 and update UTs * do not log lines for every restore call until perf is understood
) * adaptively modify linux max restore try count to prevent perpetual errors * remove debug print * log restore file and send ipsetmanager_linux errors * send other appropriate errors * fix handleLineError function * fix printing restore lines and enhance a log * fix lints and wrap chainLineNumber errors * fix one off error for logging the try count * revert exponential increase to try limit * update try count to 5 and update UTs * do not log lines for every restore call until perf is understood
) * adaptively modify linux max restore try count to prevent perpetual errors * remove debug print * log restore file and send ipsetmanager_linux errors * send other appropriate errors * fix handleLineError function * fix printing restore lines and enhance a log * fix lints and wrap chainLineNumber errors * fix one off error for logging the try count * revert exponential increase to try limit * update try count to 5 and update UTs * do not log lines for every restore call until perf is understood
Currently, if a cluster had 2+ persistent errors in dirty IPSets, then new changes may not get applied.
This PR also enhances logging:
print each line in the restore file(commented out until perf is understood)Fixes: