Skip to content

Conversation

@vakalapa
Copy link
Contributor

@vakalapa vakalapa commented May 26, 2021

If there are multiple policies applied on a cluster with TO rules in one and PORT rules in another selecting the same namespace and pod selector, then some TO traffic is being dropped. The reason for that is in PORT chains we are adding TO/FROM chain rules and DROP rules, event though TO/FORM chain MARKs the packet and returns it, since there are no RETURNs in PORT after the TO/FROM is called, we are sending the packet to DROP chains and dropping the traffic.

This fix adds a RETURN rule at the top of both DROP chains to return, in future we should check if we can reduce the TO/FROM rules and DROP rules in PORT chain

Along with that, with this PR, we will now be only adding a single JUMP to either TO/FROM chain in PORT chain for all traffic. We will not be adding any DROP rules in PORT or TO/FROM chains ! This will reduce the number jump rules drastically, keeps a simple logic on where the MARK rules are hit and where the DROP rules are hit.

Copy link
Contributor

@JungukCho JungukCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a detailed explanation to describe what was an issue and how "early on Mark" resolves the issue with one example yaml? Since I only understood what code does (e.g., control order of rules (e.g., append or insert) based on type of chain), I cannot quite map what it solved. It will help me to understand this PR. Since this PR is linked with #870 , it would be nice to use the example. Will be very appreciate to show iptables changes as well.

Also are there valid unit test for this?

@vakalapa
Copy link
Contributor Author

vakalapa commented Jun 1, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Copy link
Contributor

@JungukCho JungukCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to point out exact codes, but not sure where they are.
However, after playing with this PR, in high-level, this PR removes "AZURE-NPM-EGRESS(or INGRESS)-DROPS" chains from "AZURE-NPM-EGRESS (or INGRESS)-PORT" chain to resolve packet drop issues.

One thing that I am not sure is to put "RETURN" in each "AZURE-NPM-EGRESS (or INGRESS)-DROPS" chains is necessary.
IPtables output from example yaml which enables reproducing packet drop issues. Italic parts seems redundant.

Or are there something which I missed?

Chain AZURE-NPM-EGRESS (1 references)
pkts bytes target prot opt in out source destination
0 0 AZURE-NPM-EGRESS-PORT all -- * * 0.0.0.0/0 0.0.0.0/0
0 0 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 mark match 0x3000 / RETURN-on-EGRESS-and-INGRESS-mark-0x3000 /
0 0 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 mark match 0x1000 /
RETURN-on-EGRESS-mark-0x1000 */
*

0 0 AZURE-NPM-EGRESS-DROPS all -- * * 0.0.0.0/0 0.0.0.0/0

Chain AZURE-NPM-EGRESS-DROPS (1 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 mark match 0x3000 / _RETURN-on-EGRESS-and-INGRESS-mark-0x3000 /
0 0 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 mark match 0x1000 /
RETURN-on-EGRESS-mark-0x1000 /

0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 match-set azure-npm-784554818 src /* DROP-ALL-FROM-ns-default /_
0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 match-set azure-npm-784554818 src /
DROP-ALL-FROM-ns-default */

npm/iptm/iptm.go Outdated
}

func isDropsChain(chainName string) bool {
for _, chain := range IptablesAzureDropsChainList {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to using "if condition" instead of "for loop with IptableAzureDropsChainList" since there are only two elements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let me fix it

getAzureNPMAcceptChainRules,
getAzureNPMIngressChainRules,
getAzureNPMIngressPortChainRules,
getAzureNPMIngressFromChainRules,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need these chains no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we will need the chains, this is just removing the default RETURN rules added in these chains

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we change the function name then? GetAllChainsAndRules is misleading

@vakalapa
Copy link
Contributor Author

vakalapa commented Jun 2, 2021

I would like to point out exact codes, but not sure where they are.
However, after playing with this PR, in high-level, this PR removes "AZURE-NPM-EGRESS(or INGRESS)-DROPS" chains from "AZURE-NPM-EGRESS (or INGRESS)-PORT" chain to resolve packet drop issues.

One thing that I am not sure is to put "RETURN" in each "AZURE-NPM-EGRESS (or INGRESS)-DROPS" chains is necessary.
IPtables output from example yaml which enables reproducing packet drop issues. Italic parts seems redundant.

The redundant RETURN rules in DROP chains are a form contingency for future chain ordering when we call DROP chains directly unlike today. These rules ensure any future changes will insure us from dropping relevant MARK'ed packets (whcih has caused this issue in the first place)

npm/iptm/iptm.go Outdated
}

// (TODO) need a review of all chains to make sure we are not adding duplicate rules.
// NPM should not add multiple TO/FROM rules along with DROPs in PORT chains,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need these comments? Is it resolved in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we will not need this comment as we are taking care of these duplicate rules in this PR.

@neaggarwMS
Copy link
Member

// getAzureNPMEgressPortChainRules returns rules for AZURE-NPM-INGRESS-PORT

nit: typo


Refers to: npm/iptm/helper.go:244 in c41f87a. [](commit_id = c41f87a, deletion_comment = False)

Copy link
Contributor

@JungukCho JungukCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern in this PR is I am not 100% sure that whether it does allow invalid packets or not allow valid packets when various multiple network policies are applied in AZURE-NPM-INGRESS(or EGRESS)-DROPS chain while I think this PR does not change those behavior from previous code base and getDefaultDropEntries() guarantees that one.

Do you have any thoughts?

npm/iptm/iptm.go Outdated
if entry.IsJumpEntry {
// Since there is a RETURN statement added to each DROP chain, we need to make sure
// any new DROP rule added to ingress or egress DROPS chain is added at the BOTTOM
if entry.IsJumpEntry || isDropsChain(entry.Chain) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NPM does not need entry.IsJumpEntry anymore with this PR.
So, please confirm in NPM code. If we confirm this, we can remove this if condition.
Also, review and update translatePolicy_test.go with this change accordingly. If other codes also impacted, please check their UTs as well.

As I put comment about this before, RETURNs in "AZURE-NPM-EGRESS (or INGRESS)-DROPS" chains are redundant.

If we confirm this, it seems we only need below block in iptm.go with this PR.

iptMgr.OperationFlag = util.IptablesInsertionFlag.

Let me know if I miss something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we do not need to append when the jumpentry is true, since we are now removing all jump entries and are relying on the default rules to help with that jump transitions to next chains.

As far as the redundant RETURN rule in DROP chains are concerned, i feel strongly to keep those rules in place as a contingency for any later changes. With these redundant rules, we are ensuring that no matter what we will NEVER DROP a MARK'ed packet and this assurance helps are from hitting any other bugs in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern in this PR is I am not 100% sure that whether it does allow invalid packets or not allow valid packets when various multiple network policies are applied in AZURE-NPM-INGRESS(or EGRESS)-DROPS chain while I think this PR does not change those behavior from previous code base and getDefaultDropEntries() guarantees that one.

Do you have any thoughts?

I think overall this is a better design because, keeping the translation logic aside, while evaluating rule we are making sure that a MARKED packet will never be dropped and a unmarked packet will be dropped on certain conditions. This simplifies the logic from before where we have multiple avenue to DROPs chain before we evaluated all MARK rules in a given direction (which was wrong).

@JungukCho
Copy link
Contributor

Thank you for applying comments!

One more suggestion is to remove "IsJumpEntry" member from "IptEntry" struct (https://github.com/Azure/azure-container-networking/blob/master/npm/iptm/iptm.go#L50) with this PR since the "IsJumpEntry" member is not needed anymore.
Are there reasons to keep it which I missed?

Except for above suggestion, it looks good to me.

Copy link
Contributor

@JungukCho JungukCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I think this PR has functional improvement, but I assume it also has data plane latency improvement. So, while it is not urgent now, it would be good to have some measurements to see data plane improvements with this PR later.

@vakalapa
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@vakalapa
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@vakalapa vakalapa merged commit c2b2db1 into master Jun 15, 2021
@vakalapa vakalapa deleted the vakr/fixtofromruledrop branch June 15, 2021 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug npm Related to NPM.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[NPM] TO/FROM rule and port rule on same PodSelector in multiple policies is having issues.

4 participants