Skip to content

Conversation

@huntergregory
Copy link
Contributor

@huntergregory huntergregory commented Nov 18, 2021

Linux DP wasn't working for integration tests before this PR. Now it does.

It will fail conformance tests though due to the first problem in the TODOs below.

TODO in followup PR:

  • fix problem: accept rules for iptables ingress chains jump to EGRESS instead of jumping to ALLOW-INGRESS-MARK chain
  • uncomment reboot code for Linux Policy Manager (reset and initialize Azure chains when we have 0 network policies)
  • call reconcile for policy manager within DP
  • getting the chain line number will return the first digit of the line number, not the whole number. So we might think position 15 is ok when we need position 1.

Notes on Changes

Linux Policy Manager fixes (in chain-management_linux.go):

  1. make the jump from FORWARD chain to AZURE-NPM chain the first rule in the FORWARD chain
  2. Don't throw an error if we fail to delete this jump rule when it doesn't exist
  3. fix how we reset chains by only flushing/delete current chains (as opposed to creating/flushing/deleting all possible chains before)
  4. Also, don't make an iptables-restore call when there are no azure chains currently

Fix DP:

  1. uncomment code that initializes policy manager
  2. same with resetting

Unit Testing:

  1. verify all exec calls now
  2. fix UTs for grep
  3. consolidate tests in chain-management_linux_test.go into tests with subtests for better readability and maintenance

Integration Testing:

  1. Better testing for network policies (had to update some test network policies which impacted tests in policymanager_linux_test.go)

@huntergregory huntergregory added the npm Related to NPM. label Nov 18, 2021
@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Copy link
Contributor

@JungukCho JungukCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you apply comments based on previous comments in the function (may need to slightly change for v2) when creating NewDataPlane - https://github.com/Azure/azure-container-networking/blob/master/npm/pkg/dataplane/dataplane.go#L70 ?
I am not sure how windows works, but for linux it is good to comment them.

deleteErrCode, deleteErr := pMgr.runIPTablesCommand(util.IptablesDeletionFlag, jumpFromForwardToAzureChainArgs...)
hadDeleteError := deleteErr != nil && deleteErrCode != couldntLoadTargetErrorCode
// TODO check rule doesn't exist error code instead. The first call of dp.Reset() we will have exit code 2 (couldn't load target) since AZURE-NPM won't exist
hadDeleteError := deleteErr != nil && deleteErrCode != couldntLoadTargetErrorCode // couldntLoadTargetErrorCode happens when AZURE-NPM chain doesn't exist (and hence the jump rule doesn't exist too)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: slightly better to put comment above code?

// couldntLoadTargetErrorCode happens when AZURE-NPM chain doesn't exist (and hence the jump rule doesn't exist too)
hadDeleteError := deleteErr != nil && deleteErrCode != couldntLoadTargetErrorCode 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is so tough to handle iptables and ipset errors.
Can we update comments which includes couldntLoadTargetErrorCode error can be ignorable?
But, what if the error is not couldntLoadTargetErrorCode? In the code, now the code sends log message.
So, I guess it is ok to proceed. It would be helpful to add comments about it.

It is not scope of this PR, but just bring it up.
I just realized couldntLoadTargetErrorCode is not easy to know this is global variable.
I looked for local variables. We had this approach of the codes in several places, but good to come up with some ideas to easily recognize it.
somehow

chainsErr.NoTargetExist

wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update comments. I agree that we can do something like chainsErr.NoTargetExist later

)

const reconcileChainTimeInMinutes = 5
const reconcileTimeInMinutes = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for better management (e.g., naming or etc) of codes, we may sync how windows dataplane supports common functions. Now some of them are TODO in policymanager_windows.go, but would be helpful to put comments whether these are supported or not.

@vakalapa it would be nice if you walk through them and fille up them when you have time since I think you are the most knowledgeable in windows dataplane.

When we understand what reconcile in both side, we may change the name to better understand if it makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atm, there are no plans to introduce a reconcile loop in windows DP. We do not have periodic tasks like chain management in windows and the error signatures in windows are virtually unusable so we cannot build any reconcile failure loop on it.

@huntergregory
Copy link
Contributor Author

Can you apply comments based on previous comments in the function (may need to slightly change for v2) when creating NewDataPlane - https://github.com/Azure/azure-container-networking/blob/master/npm/pkg/dataplane/dataplane.go#L70 ? I am not sure how windows works, but for linux it is good to comment them.

So I see we should be resetting policy manager first, then ipset manager (I'll update this). Was there anything else the comment refers to?

JungukCho
JungukCho previously approved these changes Nov 18, 2021
Copy link
Contributor

@JungukCho JungukCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

azureChainGrepPattern string = "Chain AZURE-NPM"
minAzureChainStringLength int = len(azureChainGrepPattern)
azureChainStartIndex int = 6
minLineNumberStringLength int = 3 // TODO transferred from iptm.go and not sure why this length is important, but will update the function its used in later anyways
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: curious, is it idiomatic to put comment just next to code?
If not, if the comment is long, it would be nice to put above the line since I think it is more readable.

But we can update it since we need to resolve the question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just habit. I'll put comments on top now, but ya this variable will be deleted next PR anyways

@JungukCho
Copy link
Contributor

JungukCho commented Nov 18, 2021

Can you apply comments based on previous comments in the function (may need to slightly change for v2) when creating NewDataPlane - https://github.com/Azure/azure-container-networking/blob/master/npm/pkg/dataplane/dataplane.go#L70 ? I am not sure how windows works, but for linux it is good to comment them.

So I see we should be resetting policy manager first, then ipset manager (I'll update this). Was there anything else the comment refers to?

There is dependency. We have to delete iptables first and then delete ipset. If not, kernel complains ipset is being used. So, deleting iptables is failed. So, the previous comments (1 and 2) explain this hard dependency when starting NPM (basically reinitialization). Good to have comments for dependency for maintenance purpose.

	// It is important to keep order to clean-up iptables and ipset.
	// IPtables should be cleaned first to avoid failures to clean-up iptables due to "ipset is using in kernel" error
	// 1. clean-up NPM-related iptables information and then running periodic processes to keep iptables correct
	if err := c.iptMgr.UninitNpmChains(); err != nil {
		utilruntime.HandleError(fmt.Errorf("Failed to UninitNpmChains with err: %w", err))
	}

	// 2. then clean-up all NPM ipsets states
	if err := c.ipsMgr.DestroyNpmIpsets(); err != nil {
		utilruntime.HandleError(fmt.Errorf("Failed to DestroyNpmIpsets with err: %w", err))
	}

@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@huntergregory huntergregory merged commit 5d2da73 into master Nov 19, 2021
@rbtr rbtr deleted the place-azure-first branch November 30, 2021 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

npm Related to NPM.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants