feat: [NPM] ipset save before restoring and fixing grep UTs #1085
Conversation
…ipsets to skip previously run lines. Also update some logging for iptables chain management
…hat we dont revert on dataplane UTs
…pdate ApplyIPSets test calls for dataplane UTs
// error handling principle:
// - if the contract with ipset save (or grep) is breaking, salvage what we can, take a snapshot without grep, and log the failure
// - have a background process for sending/removing snapshots intermittently
func (iMgr *IPSetManager) updateDirtyKernelSets(saveFile []byte, creator *ioutil.FileCreator) {
Does this function walk through all the ipset information stored in the kernel?
If so, is that necessary?
If a large number of ipsets are programmed in the kernel, won't this cost be high?
Could we leverage the -exist option for add and del to keep it simple?
# from the ipset man page
add SETNAME ADD-ENTRY [ ADD-OPTIONS ]
Add a given entry to the set. If the -exist option is specified, ipset ignores the error if the entry is already added to the set.
del SETNAME DEL-ENTRY [ DEL-OPTIONS ]
Delete an entry from a set. If the -exist option is specified and the entry is not in the set (maybe already expired), then the command is ignored.
We grep for all NPM sets, so the work is linear in the number of sets NPM manages. For non-dirty sets, we skip reading their add lines in the save file, so this won't be too intensive. We wouldn't know which members to delete without walking through the ipset save file, so del -exist couldn't work on its own. And since we're already determining the members to delete for dirty sets, determining the exact members to add is equal work, and we end up writing a smaller restore file than if we looped through all members and ran add -exist.
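The skip-non-dirty-sets idea above can be sketched as follows. This is a minimal illustration, not code from the PR; the function and variable names (parseSaveFile, dirtySets) are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// parseSaveFile sketches the approach described above: walk the `ipset save`
// output once, record members only for sets marked dirty, and skip the "add"
// lines of every other set.
func parseSaveFile(saveFile string, dirtySets map[string]bool) map[string][]string {
	kernelMembers := make(map[string][]string)
	for _, line := range strings.Split(saveFile, "\n") {
		fields := strings.Fields(line)
		// member lines look like: add <setname> <member> [options]
		if len(fields) < 3 || fields[0] != "add" {
			continue // "create" lines and blanks need no member bookkeeping
		}
		if !dirtySets[fields[1]] {
			continue // non-dirty set: skip its member lines entirely
		}
		kernelMembers[fields[1]] = append(kernelMembers[fields[1]], fields[2])
	}
	return kernelMembers
}

func main() {
	save := "create azure-npm-123 hash:ip\n" +
		"add azure-npm-123 10.0.0.1\n" +
		"create azure-npm-456 hash:ip\n" +
		"add azure-npm-456 10.0.0.2\n"
	// only the dirty set's members are collected
	fmt.Println(parseSaveFile(save, map[string]bool{"azure-npm-123": true}))
}
```

Note that this still scans every line once (the string matching cost raised below); what it avoids is per-member bookkeeping for non-dirty sets.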
I may misunderstand the code, but it also does string matching, so the work is not linear in the number of sets; it is linear in the number of sets times the length of each set's lines.
My fundamental question is why we need to walk through all ipsets (read from the kernel into the saveFile variable) on every applyIPSets call and derive actions (adds or deletions) from that, even though we maintain a lot of data structures in ipsetmanager_linux.go. Can we leverage that information and apply only what we need?
So I guess the decision is between these options:
- read in the save file for dirty sets
- maintain a copy of the ipset objects from before they became dirty
We're doing the first here. The second would require extra memory (worst case, doubling the memory for the ipset cache before applying) and a redesign of the whole ipset manager.
Thoughts @vakalapa ?
This operation is definitely very intensive with respect to string matching; even if we skip over the sections for some ipsets, we will still be doing first-of-line matching.
We chose not to save deleted members of ipsets in order to save memory, because we are already very constrained on memory in Linux, and the data structures we maintain in the ipset manager are just the source of truth. This is modeled on the Kubernetes philosophy: we have a source of truth (our internal ipset cache), we read the state on the machine, and we try to reconcile to that state.
Also, this approach is robust in the sense that kernel state will hardly deviate from the expected state, since NPM takes a snapshot of the sets to be updated and then applies changes based on that. The current IPSM's drawback is that as errors occur and NPM runs for a longer duration, kernel state deviates heavily from what is expected, and this results in a cascading effect of issues.
Hopefully in the future, when we figure out all the errors and have a low-probability-for-error controller, we can tone this approach down. But for initial purposes I feel that, even with this compute-intensive update, we are VERY fast compared to the previous generation.
Wdygt?
If you guys think this is the way, I am OK with this approach.
I just wrote my two cents.
Based on my understanding, the high-level goal of these ipset enhancements is:
- Compared to v1 NPM, v2 does a kind of batch processing (one shot per controller event), unlike v1 NPM (which sequentially called exec.Run for each ipset operation), to improve performance.
- More fine-grained ipset management with reference counts to avoid known issues (e.g., a set still used in kernel)
Am I missing something?
I may be wrong and missing some details, but to me it looks like we took a kind of reverse-engineering approach, which is hard and complex, even though we have enough information to derive everything, since the data plane is based on the information in the control plane:
- We know when and which operations we need to program (e.g., add, delete, update) based on the controller event
- We know what information is programmed from the ipset manager data structures
- We have all the reference counts for ipsets
So I think we can achieve the same functionality with that information in a much simpler way.
About performance, with a lot of ipsets, I am not sure how much we gain, since we do string matching that is O(number of ipsets × number of characters per ipset) on every event that needs applyIPSets.
As we discussed offline, this current design is more secure than the alternatives, because even if some malicious actor changes the ipsets, NPM will try to reconcile to the desired state and overwrite manual entries. Granted, NPM will only do this for "dirty" sets. A backlog item has been added for a background thread that "at leisure" tries to reconcile all ipsets to the desired state; this design will make NPM more secure in the long run.
Also, a perf evaluation is planned on this current design to understand, at scale, what latency this compute-intensive task will induce in rule programming.
deleteErrCode, deleteErr := pMgr.runIPTablesCommand(util.IptablesDeletionFlag, jumpFromForwardToAzureChainArgs...)
hadDeleteError := deleteErr != nil && deleteErrCode != couldntLoadTargetErrorCode
if hadDeleteError {
deleteErrCode, deleteErr := pMgr.runIPTablesCommandAndIgnoreErrorCode(couldntLoadTargetErrorCode, util.IptablesDeletionFlag, jumpFromForwardToAzureChainArgs...)
For the deletion operation, is it OK not to check error codes like doesNotExistErrorCode or couldntLoadTargetErrorCode?
Anyway, the goal is to clean up the tables.
If the AZURE-NPM chain doesn't exist, we'll get couldntLoadTargetErrorCode, and if the chain exists but the rule doesn't, we'll get doesNotExistErrorCode. So this should actually ignore both of these codes, since it's called before initializing the dataplane.
If there's an error besides those two, something's wrong, but it could be several things, so yeah, maybe we just log and continue.
For instance, we get couldntLoadTargetErrorCode for "unknown option" and for a malformed IP address.
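The pattern being discussed, extracting an iptables exit code and treating one specific code as success, can be sketched like this. The constants and helper names are illustrative assumptions, not the PR's actual values or code:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// Hypothetical exit codes for illustration only; the real values come from
// iptables' behavior, not from this sketch.
const (
	doesNotExistErrorCode      = 1 // rule not present in the chain
	couldntLoadTargetErrorCode = 2 // jump target missing; also "unknown option", malformed IP
)

// shouldIgnore reports whether an exit code should be treated as success
// for this operation (0, or the one code the caller asked to ignore).
func shouldIgnore(exitCode, errCodeToIgnore int) bool {
	return exitCode == 0 || exitCode == errCodeToIgnore
}

// runAndIgnore runs an iptables command, extracts the process exit code via
// exec.ExitError, and suppresses the error when the code is expected.
func runAndIgnore(errCodeToIgnore int, args ...string) (int, error) {
	if err := exec.Command("iptables", args...).Run(); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			code := exitErr.ExitCode()
			if shouldIgnore(code, errCodeToIgnore) {
				return code, nil // expected condition, e.g. chain/rule already gone
			}
			return code, fmt.Errorf("iptables %v failed with code %d: %w", args, code, err)
		}
		return -1, err // couldn't even start the command
	}
	return 0, nil
}

func main() {
	// pre-init cleanup would ignore "target chain missing":
	fmt.Println(shouldIgnore(couldntLoadTargetErrorCode, couldntLoadTargetErrorCode)) // true
	fmt.Println(shouldIgnore(doesNotExistErrorCode, couldntLoadTargetErrorCode))      // false
}
```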
these changes will be moved to the next PR which updates policy/chain management
allArgsString := strings.Join(allArgs, " ")
msgStr := strings.TrimSuffix(string(output), "\n")
if errCode > 0 && operationFlag != util.IptablesCheckFlag {
if errCode > 0 && errCode != errCodeToIgnore {
Can you check the removeNPMChains case with errCodeToIgnore?
In case the chain does not exist, it is fine to proceed.
Sorry, could you clarify this?
Seems you resolved it.
these changes will be moved to the next PR which updates policy/chain management
return pMgr.runIPTablesCommandAndIgnoreErrorCode(-1, operationFlag, args...)
}

func (pMgr *PolicyManager) runIPTablesCommandAndIgnoreErrorCode(errCodeToIgnore int, operationFlag string, args ...string) (int, error) {
It is a little hard to follow which operations are OK to ignore which errors.
Could we delegate error handling to the caller side?
We would just have one run function, and the caller would take the right actions based on the return values.
That would make the code easier to follow.
The main consideration for ignoring an error code in this function is that this function logs when there's an error, but I think it shouldn't log an error that we intend to ignore later. We previously had some of that logic baked into run() for check operations, but I think we should also do it for deleting a rule.
these changes will be moved to the next PR which updates policy/chain management
One comment: naming (file-creator.go and its contents) is misleading if I correctly understand the code. ipset and iptables use restore files.
Open to suggestions. How about restore-file-creator.go?
OK, I'll call it restore_linux.go. Are you OK with the name FileCreator for the struct?
If it makes sense to you, I am fine with it.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
…ge in pipeline" This reverts commit 31148c3.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
if len(iMgr.toAddOrUpdateCache) > 0 {
	saveFile, saveError = iMgr.ipsetSave()
	if saveError != nil {
		return fmt.Errorf("%w", saveError)
Can we wrap this with context so we know the save error came from this stack?
JungukCho
left a comment
LGTM! Thank you for applying the comments.
We can talk about translation and the Linux dataplane before the integration test.
Changes for Applying IPSets
Design Decision
Instead of calculating which members should be removed from ipsets, the ipset manager will read the current kernel state via ipset save and apply the cache as the expected state.
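As a rough sketch of this design decision (names hypothetical, not the PR's code): for one dirty set, diff the members read from `ipset save` (kernel state) against the cache (expected state) to produce the adds and deletes for an `ipset restore` file:

```go
package main

import (
	"fmt"
	"sort"
)

// computeDelta diffs one set's kernel members against its cached (expected)
// members: anything expected but missing becomes an add, anything present
// but unexpected becomes a delete.
func computeDelta(kernel, cache map[string]bool) (adds, dels []string) {
	for m := range cache {
		if !kernel[m] {
			adds = append(adds, m) // expected but missing in the kernel
		}
	}
	for m := range kernel {
		if !cache[m] {
			dels = append(dels, m) // in the kernel but no longer expected
		}
	}
	sort.Strings(adds) // deterministic order for the restore file
	sort.Strings(dels)
	return adds, dels
}

func main() {
	kernel := map[string]bool{"10.0.0.1": true, "10.0.0.9": true}
	cache := map[string]bool{"10.0.0.1": true, "10.0.0.2": true}
	adds, dels := computeDelta(kernel, cache)
	fmt.Println(adds, dels) // [10.0.0.2] [10.0.0.9]
}
```

Reconciling against a fresh kernel snapshot like this is what makes the approach self-correcting when kernel state drifts from the cache.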
Pros
Cons
Looking Forward
We will eventually have an infrequent reconcile routine which will reconcile the whole cache with the kernel state.
We will assess the extra latency of this design choice later.
Other changes
TODO: