feat: [NPM] ipset save before restoring and fixing grep UTs #1085
Conversation
…ipsets to skip previously run lines. Also update some logging for iptables chain management
…hat we dont revert on dataplane UTs
…pdate ApplyIPSets test calls for dataplane UTs
// error handling principle:
// - if the contract with ipset save (or grep) is breaking, salvage what we can, take a snapshot without grep, and log the failure
// - have a background process for sending/removing snapshots intermittently
func (iMgr *IPSetManager) updateDirtyKernelSets(saveFile []byte, creator *ioutil.FileCreator) {
Does this function walk through all the ipset information stored in the kernel?
If so, is that necessary?
If a large number of ipsets are programmed in the kernel, won't this cost be high?
Could we leverage the -exist option for add and del to keep it simple?
# from the ipset man page
add SETNAME ADD-ENTRY [ ADD-OPTIONS ]
Add a given entry to the set. If the -exist option is specified, ipset ignores the error if the entry is already added to the set.
del SETNAME DEL-ENTRY [ DEL-OPTIONS ]
Delete an entry from a set. If the -exist option is specified and the entry is not in the set (maybe already expired), then the command is ignored.
We grep for all NPM sets, so the work is linear in the number of sets NPM manages. For non-dirty sets, we skip reading their add lines in the save file, so this won't be too intensive. We wouldn't know which members to delete without walking through the ipset save file, so del -exist couldn't work on its own. And since we're already determining the members to delete for dirty sets, determining the exact members to add is equal work, and we end up writing a smaller restore file than if we looped through all members and ran add -exist.
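The skip-non-dirty-sets idea above can be sketched as follows. This is a minimal illustration, not code from the PR; the function and variable names (parseSaveFile, dirtySets) are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// parseSaveFile sketches the approach described above: walk the `ipset save`
// output once, record members only for sets marked dirty, and skip the "add"
// lines of every other set.
func parseSaveFile(saveFile string, dirtySets map[string]bool) map[string][]string {
	kernelMembers := make(map[string][]string)
	for _, line := range strings.Split(saveFile, "\n") {
		fields := strings.Fields(line)
		// member lines look like: add <setname> <member> [options]
		if len(fields) < 3 || fields[0] != "add" {
			continue // "create" lines and blanks need no member bookkeeping
		}
		if !dirtySets[fields[1]] {
			continue // non-dirty set: skip its member lines entirely
		}
		kernelMembers[fields[1]] = append(kernelMembers[fields[1]], fields[2])
	}
	return kernelMembers
}

func main() {
	save := "create azure-npm-123 hash:ip\n" +
		"add azure-npm-123 10.0.0.1\n" +
		"create azure-npm-456 hash:ip\n" +
		"add azure-npm-456 10.0.0.2\n"
	// only the dirty set's members are collected
	fmt.Println(parseSaveFile(save, map[string]bool{"azure-npm-123": true}))
}
```

Note that this still scans every line once (the string matching cost raised below); what it avoids is per-member bookkeeping for non-dirty sets.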
I may misunderstand the code, but it also does string matching, so the work is not linear in the number of sets; it is linear in the number of sets times the length of each set's lines.
My fundamental question is why we need to walk through all ipsets (read from the kernel into the saveFile variable) on every applyIPSets call and derive actions (adds or deletions) from that, even though we maintain a lot of data structures in ipsetmanager_linux.go. Can we leverage that information and apply only what we need?
So I guess the decision is between these options:
- read in the save file for dirty sets
- maintain a copy of the ipset objects from before they became dirty
We're doing the first here. The second would require extra memory (worst case, doubling the memory for the ipset cache before applying) and a redesign of the whole ipset manager.
Thoughts @vakalapa ?
This operation is definitely very intensive with respect to string matching; even if we skip over the sections for some ipsets, we will still be doing first-of-line matching.
We chose not to save deleted members of ipsets in order to save memory, because we are already very constrained on memory in Linux, and the data structures we maintain in the ipset manager are just the source of truth. This is modeled on the Kubernetes philosophy: we have a source of truth (our internal ipset cache), we read the state on the machine, and we try to reconcile to that state.
Also, this approach is robust in the sense that kernel state will hardly deviate from the expected state, since NPM takes a snapshot of the sets to be updated and then applies changes based on that. The current IPSM's drawback is that as errors occur and NPM runs for a longer duration, kernel state deviates heavily from what is expected, and this results in a cascading effect of issues.
Hopefully in the future, when we figure out all the errors and have a low-probability-for-error controller, we can tone this approach down. But for initial purposes I feel that, even with this compute-intensive update, we are VERY fast compared to the previous generation.
Wdygt?
If you guys think this is the way, I am OK with this approach.
I just wrote my two cents.
Based on my understanding, the high-level goal of these ipset enhancements is:
- Compared to v1 NPM, v2 does a kind of batch processing (one shot per controller event), unlike v1 NPM (which sequentially called exec.Run for each ipset operation), to improve performance.
- More fine-grained ipset management with reference counts to avoid known issues (e.g., a set still used in kernel)
Am I missing something?
I may be wrong and missing some details, but to me it looks like we took a kind of reverse-engineering approach, which is hard and complex, even though we have enough information to derive everything, since the data plane is based on the information in the control plane:
- We know when and which operations we need to program (e.g., add, delete, update) based on the controller event
- We know what information is programmed from the ipset manager data structures
- We have all the reference counts for ipsets
So I think we can achieve the same functionality with that information in a much simpler way.
About performance, with a lot of ipsets, I am not sure how much we gain, since we do string matching that is O(number of ipsets × number of characters per ipset) on every event that needs applyIPSets.
As we discussed offline, this current design is more secure than the alternatives, because even if some malicious actor changes the ipsets, NPM will try to reconcile to the desired state and overwrite manual entries. Granted, NPM will only do this for "dirty" sets. A backlog item has been added for a background thread that "at leisure" tries to reconcile all ipsets to the desired state; this design will make NPM more secure in the long run.
Also, a perf evaluation is planned on this current design to understand, at scale, what latency this compute-intensive task will induce in rule programming.
deleteErrCode, deleteErr := pMgr.runIPTablesCommand(util.IptablesDeletionFlag, jumpFromForwardToAzureChainArgs...)
hadDeleteError := deleteErr != nil && deleteErrCode != couldntLoadTargetErrorCode
if hadDeleteError {
deleteErrCode, deleteErr := pMgr.runIPTablesCommandAndIgnoreErrorCode(couldntLoadTargetErrorCode, util.IptablesDeletionFlag, jumpFromForwardToAzureChainArgs...)
For the deletion operation, is it OK not to check error codes like doesNotExistErrorCode or couldntLoadTargetErrorCode?
Anyway, the goal is to clean up the tables.
If the AZURE-NPM chain doesn't exist, we'll get couldntLoadTargetErrorCode, and if the chain exists but the rule doesn't, we'll get doesNotExistErrorCode. So this should actually ignore both of these codes, since it's called before initializing the dataplane.
If there's an error besides those two, something's wrong, but it could be several things, so yeah, maybe we just log and continue.
For instance, we get couldntLoadTargetErrorCode for "unknown option" and for a malformed IP address.
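The pattern being discussed, extracting an iptables exit code and treating one specific code as success, can be sketched like this. The constants and helper names are illustrative assumptions, not the PR's actual values or code:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// Hypothetical exit codes for illustration only; the real values come from
// iptables' behavior, not from this sketch.
const (
	doesNotExistErrorCode      = 1 // rule not present in the chain
	couldntLoadTargetErrorCode = 2 // jump target missing; also "unknown option", malformed IP
)

// shouldIgnore reports whether an exit code should be treated as success
// for this operation (0, or the one code the caller asked to ignore).
func shouldIgnore(exitCode, errCodeToIgnore int) bool {
	return exitCode == 0 || exitCode == errCodeToIgnore
}

// runAndIgnore runs an iptables command, extracts the process exit code via
// exec.ExitError, and suppresses the error when the code is expected.
func runAndIgnore(errCodeToIgnore int, args ...string) (int, error) {
	if err := exec.Command("iptables", args...).Run(); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			code := exitErr.ExitCode()
			if shouldIgnore(code, errCodeToIgnore) {
				return code, nil // expected condition, e.g. chain/rule already gone
			}
			return code, fmt.Errorf("iptables %v failed with code %d: %w", args, code, err)
		}
		return -1, err // couldn't even start the command
	}
	return 0, nil
}

func main() {
	// pre-init cleanup would ignore "target chain missing":
	fmt.Println(shouldIgnore(couldntLoadTargetErrorCode, couldntLoadTargetErrorCode)) // true
	fmt.Println(shouldIgnore(doesNotExistErrorCode, couldntLoadTargetErrorCode))      // false
}
```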
these changes will be moved to the next PR which updates policy/chain management
allArgsString := strings.Join(allArgs, " ")
msgStr := strings.TrimSuffix(string(output), "\n")
if errCode > 0 && operationFlag != util.IptablesCheckFlag {
if errCode > 0 && errCode != errCodeToIgnore {
Can you check the removeNPMChains case with errCodeToIgnore?
In case the chain does not exist, it is fine to proceed.
Sorry, could you clarify this?
Seems you resolved it.
these changes will be moved to the next PR which updates policy/chain management
return pMgr.runIPTablesCommandAndIgnoreErrorCode(-1, operationFlag, args...)
}

func (pMgr *PolicyManager) runIPTablesCommandAndIgnoreErrorCode(errCodeToIgnore int, operationFlag string, args ...string) (int, error) {
It is a little hard to follow which operations are OK to ignore which errors.
Could we delegate error handling to the caller side?
We would just have one run function, and the caller would take the right actions based on the return values.
That would make the code easier to follow.
The main consideration for ignoring an error code in this function is that this function logs when there's an error, but I think it shouldn't log an error that we intend to ignore later. We previously had some of that logic baked into run() for check operations, but I think we should also do it for deleting a rule.
these changes will be moved to the next PR which updates policy/chain management
One comment: naming (file-creator.go and its contents) is misleading if I correctly understand the code. ipset and iptables use restore files.
Open to suggestions. How about restore-file-creator.go?
OK, I'll call it restore_linux.go. Are you OK with the name FileCreator for the struct?
If it makes sense to you, I am fine with it.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
…ge in pipeline" This reverts commit 31148c3.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
if len(iMgr.toAddOrUpdateCache) > 0 {
	saveFile, saveError = iMgr.ipsetSave()
	if saveError != nil {
		return fmt.Errorf("%w", saveError)
Can we wrap this with context so we know the save error came from this stack?
JungukCho
left a comment
LGTM! Thank you for applying the comments.
We can talk about translation and the Linux dataplane before the integration test.
Changes for Applying IPSets
Design Decision
Instead of calculating which members should be removed from ipsets, the ipset manager will read the current kernel state via ipset save and apply the cache as the expected state.
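As a rough sketch of this design decision (names hypothetical, not the PR's code): for one dirty set, diff the members read from `ipset save` (kernel state) against the cache (expected state) to produce the adds and deletes for an `ipset restore` file:

```go
package main

import (
	"fmt"
	"sort"
)

// computeDelta diffs one set's kernel members against its cached (expected)
// members: anything expected but missing becomes an add, anything present
// but unexpected becomes a delete.
func computeDelta(kernel, cache map[string]bool) (adds, dels []string) {
	for m := range cache {
		if !kernel[m] {
			adds = append(adds, m) // expected but missing in the kernel
		}
	}
	for m := range kernel {
		if !cache[m] {
			dels = append(dels, m) // in the kernel but no longer expected
		}
	}
	sort.Strings(adds) // deterministic order for the restore file
	sort.Strings(dels)
	return adds, dels
}

func main() {
	kernel := map[string]bool{"10.0.0.1": true, "10.0.0.9": true}
	cache := map[string]bool{"10.0.0.1": true, "10.0.0.2": true}
	adds, dels := computeDelta(kernel, cache)
	fmt.Println(adds, dels) // [10.0.0.2] [10.0.0.9]
}
```

Reconciling against a fresh kernel snapshot like this is what makes the approach self-correcting when kernel state drifts from the cache.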
Pros
Cons
Looking Forward
We will eventually have an infrequent reconcile routine which will reconcile the whole cache with the kernel state.
We will assess the extra latency of this design choice later.
Other changes
TODO: