Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change L2 forwarding flows for traffic to gateway and tunnel #1594

Merged
merged 1 commit into from
Dec 4, 2020

Conversation

jianjuns
Copy link
Contributor

Let the packets to the tunnel port go through L2ForwardingCalcTable
as well, to be consistent with other types of traffic.
Redirect packets to gateway and tunnel, as well as BUM packets to
conntrackCommitTable and bypass ingress NetworkPolicy enforcement.

@antrea-bot
Copy link
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-all-features-conformance: to trigger conformance tests with all alpha features enabled.
  • /skip-all-features-conformance: to skip conformance tests with all alpha features enabled.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@codecov-io
Copy link

codecov-io commented Nov 24, 2020

Codecov Report

Merging #1594 (22ffe91) into master (9d3d10b) will decrease coverage by 2.78%.
The diff coverage is 37.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1594      +/-   ##
==========================================
- Coverage   63.31%   60.53%   -2.79%     
==========================================
  Files         170      181      +11     
  Lines       14250    15562    +1312     
==========================================
+ Hits         9023     9421     +398     
- Misses       4292     5155     +863     
- Partials      935      986      +51     
Flag Coverage Δ
e2e-tests 47.88% <35.63%> (?)
kind-e2e-tests 45.78% <30.93%> (-9.61%) ⬇️
unit-tests 40.46% <14.85%> (-0.81%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <0.00%> (ø)
cmd/antrea-agent/options.go 21.69% <0.00%> (+0.97%) ⬆️
...gent/controller/noderoute/node_route_controller.go 61.04% <ø> (+14.58%) ⬆️
pkg/agent/stats/collector.go 97.72% <ø> (ø)
pkg/apiserver/apiserver.go 73.95% <0.00%> (-12.07%) ⬇️
pkg/apiserver/certificate/cacert_controller.go 38.73% <0.00%> (-17.65%) ⬇️
pkg/apiserver/handlers/webhook/mutation_crd.go 0.00% <0.00%> (ø)
pkg/controller/networkpolicy/mutate.go 0.00% <0.00%> (ø)
pkg/controller/networkpolicy/tier.go 0.00% <ø> (-90.00%) ⬇️
pkg/controller/networkpolicy/validate.go 0.00% <0.00%> (-44.83%) ⬇️
... and 99 more

@jianjuns
Copy link
Contributor Author

@wenyingd and @tnqn : let me know what you think about the change. Want to make flows more consistent for all types of traffic.

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put L2ForwardCalculation logic in one table looks good to me. Just a few comments about the pipeline and the global variables.

l3DecTTLTable: bridge.CreateTable(l3DecTTLTable, conntrackCommitTable, binding.TableMissActionNext),
l2ForwardingCalcTable: bridge.CreateTable(l2ForwardingCalcTable, IngressEntryTable, binding.TableMissActionNext),
l3DecTTLTable: bridge.CreateTable(l3DecTTLTable, l2ForwardingCalcTable, binding.TableMissActionNext),
l2ForwardingCalcTable: bridge.CreateTable(l2ForwardingCalcTable, conntrackCommitTable, binding.TableMissActionNext),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the next table of l2ForwardingCalcTable is changed to conntrackCommitTable? I feel from the pipeline's perspective its next table is still ingressEntryTable, just some particular traffic can skip it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. But I hope to change the default flow (which is for BUM traffic) to goto conntrackCommitTable, and be dropped there.
What you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, but would prefer we don't re-declare that the next table is conntrackCommitTable in l2ForwardCalcFlow.

@@ -87,6 +87,8 @@ const (
)

var (
ingressEntryTable, egressEntryTable binding.TableIDType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we still make ingressEntryTable as the next table of l2ForwardCalcTable, the global variable won't be needed.

@jianjuns jianjuns force-pushed the l2fwd-flow branch 2 times, most recently from 90fc5c7 to d74d739 Compare November 24, 2020 22:28
@@ -785,8 +794,7 @@ func (c *client) l3FwdFlowToPod(localGatewayMAC net.HardwareAddr, podInterfaceIP
flows = append(flows, flowBuilder.MatchDstIP(ip).
Action().SetSrcMAC(localGatewayMAC).
Action().SetDstMAC(podInterfaceMAC).
Action().DecTTL().
Action().GotoTable(l3FwdTable.GetNext()).
Action().GotoTable(l3DecTTLTable).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenyingd : I made this change, because later with SNAT policy, we can have packets from gw0 which will hit this flow as well after un-SNAT'd (such packets' dst MAC will be rewritten to vMAC in conntrackStateTable), but we want not to decrement TTL for them.
Also we can keep all routed packets (handled by l3Fwd funcs) consistent.

Let me know what you think.

Copy link
Contributor

@wenyingd wenyingd Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note that you have changed flows in L3DecTTLTable: 1) not dec TTL if the packet is from gw0; 2) dec TTL for other cases.And I guess you want to use L3DecTTLTable in both source and destination Node (originally I only want to use it on source Node). I am not sure if a packet is sending from local Pod to gateway port would also go into L3DecTTLTable (e.g. in NoEncap mode), if yes, then the logics in L3DecTTLTable might be not as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I hope to use L3DecTTLTable on both source and destination port, for the reasons I mentioned above.

if a packet is sending from local Pod to gateway port would also go into L3DecTTLTable (e.g. in NoEncap mode)
I think in that case the host stack should decrement the TTL? So, in l3FwdFlowRouteToGW() I skip L3DecTTLTable().

func (c *client) l3FwdFlowRouteToGW(gwMAC net.HardwareAddr, category cookie.Category) []binding.Flow {
l3FwdTable := c.pipeline[l3ForwardingTable]
var flows []binding.Flow
for _, ipProto := range c.ipProtocols {
flows = append(flows, l3FwdTable.BuildFlow(priorityLow).MatchProtocol(ipProto).
Action().SetDstMAC(gwMAC).
Action().DecTTL().
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenyingd : do you think this case should bypass DecTTL()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the packet should be forwarded to the gateway interface, and the host should dec TTL, so we don't need TTL decrement in OpenFlow. @tnqn what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so.

Done())
flows = append(flows,
// Skip packets from the gateway interface.
decTTLTable.BuildFlow(priorityHigh).MatchPriority(priorityNormal).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenyingd : I assume kube-proxy packets (even they come from tunnel/remote Node) should bypass DecTTL(). What you think?

@@ -1096,7 +1094,6 @@ func (c *client) allowRulesMetricFlows(conjunctionID uint32, ingress bool) []bin
metricFlow := func(isCTNew bool, protocol binding.Protocol) binding.Flow {
return c.pipeline[metricTableID].BuildFlow(priorityNormal).
MatchProtocol(protocol).
MatchPriority(priorityNormal).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenyingd : I assume MatchPriority() is not needed?

Copy link
Contributor

@wenyingd wenyingd Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could remove the call for MatchPriority() in the functions in this file (pipeline.go), but please note that we should keep it in file (network_policy.go), for CNP might call this API to reset flow priority. An original discussion is #803 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Copy link
Contributor

@wenyingd wenyingd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you also need to change the svg file if changing the pipeline.

if err := c.ofEntryOperations.Add(flow); err != nil {
flows := []binding.Flow{
c.tunnelClassifierFlow(config.DefaultTunOFPort, cookie.Default),
c.l2ForwardCalcFlow(globalVirtualMAC, config.DefaultTunOFPort, true, cookie.Default),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globalVirtualMAC is also used in some other cases like Windows SNAT as the mark to replace the dst MAC. Do you think we could unify the usage, e.g., using this MAC for tunnel traffic, and using macRewriteMark to rewrite dstMAC in any Node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That (using macRewriteMark to indicate MAC rewrite) makes sense to me. But I am not familiar with all other cases. Would you make the change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I would make that change.

@jianjuns
Copy link
Contributor Author

Sure. I will update ovs-pipeline.md after you agree on the design.

@jianjuns
Copy link
Contributor Author

If you are fine, I hope to change ovs-pipeline.md after this change - need @antoninbas to review the SVG change who will come back next week. @wenyingd

@wenyingd
Copy link
Contributor

If you are fine, I hope to change ovs-pipeline.md after this change - need @antoninbas to review the SVG change who will come back next week. @wenyingd

I am fine with it.

l2FwdCalcTable := c.pipeline[l2ForwardingCalcTable]
nextTable := ingressEntryTable
if skipIngressRules {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking is it possible to leverage the "ofport" value to decide next table? e.g., if the value is tunnel port or gateway port (I assume we have no ingress rules applied on the gw0), then skipIngressRules, otherwise go to ingressEntryTable.
Then we could not use the parameter "skipIngressRule"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is possible too, but do you think better to put the logic in l2FwdCalcTable and pipeline.go or the callers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personnally, I would prefer to put it inside the l2FwdCalcTable's logic, that would make the call simple, and easier to understand. If in the callers, reader should learn the pipeline case by case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Changed as your suggested.

@jianjuns
Copy link
Contributor Author

jianjuns commented Dec 1, 2020

@wenyingd @tnqn : do you have further comments on this one?

func (c *client) l3FwdFlowRouteToGW(gwMAC net.HardwareAddr, category cookie.Category) []binding.Flow {
l3FwdTable := c.pipeline[l3ForwardingTable]
var flows []binding.Flow
for _, ipProto := range c.ipProtocols {
flows = append(flows, l3FwdTable.BuildFlow(priorityLow).MatchProtocol(ipProto).
Action().SetDstMAC(gwMAC).
Action().DecTTL().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so.

pkg/agent/openflow/pipeline.go Show resolved Hide resolved
l3DecTTLTable: bridge.CreateTable(l3DecTTLTable, conntrackCommitTable, binding.TableMissActionNext),
l2ForwardingCalcTable: bridge.CreateTable(l2ForwardingCalcTable, IngressEntryTable, binding.TableMissActionNext),
l3DecTTLTable: bridge.CreateTable(l3DecTTLTable, l2ForwardingCalcTable, binding.TableMissActionNext),
l2ForwardingCalcTable: bridge.CreateTable(l2ForwardingCalcTable, conntrackCommitTable, binding.TableMissActionNext),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, but would prefer we don't re-declare that the next table is conntrackCommitTable in l2ForwardCalcFlow.

tnqn
tnqn previously approved these changes Dec 3, 2020
pkg/agent/openflow/pipeline.go Show resolved Hide resolved
Let the packets to the tunnel port go through L2ForwardingCalcTable
as well, to be consistent with other types of traffic.
Redirect packets to gateway and tunnel, as well as BUM packets to
conntrackCommitTable and bypass ingress NetworkPolicy enforcement.
Let l3DecTTLTable decrement TTL for packets from tunnel to local Pods.
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for addressing my comments!

@jianjuns
Copy link
Contributor Author

jianjuns commented Dec 4, 2020

/test-all

@srikartati
Copy link
Member

srikartati commented Dec 7, 2020

Hi @jianjuns,
e2e tests on kind cluster with noEncap mode seems to be failing after this PR is merged, in particular TestNetworkPolicyStats.
I checked a few PRs with this commit, they have the same failure. Could you confirm if this is true and is there any issue?

One of the current PRs (#1617): https://github.com/vmware-tanzu/antrea/pull/1617/checks?check_run_id=1501453349
I see failure on my PR too.. #1549

@jianjuns
Copy link
Contributor Author

jianjuns commented Dec 7, 2020

@srikartati Sure. Let me check if it is relevant to my PR. Add @tnqn for his notice too.

@jianjuns
Copy link
Contributor Author

jianjuns commented Dec 7, 2020

@tnqn : from the test log, seems egress NP stats cannot be retrieved. However, I did not see how my PR is relevant to that. The only change seems relevant is that I removed "MatchPriority" in allow/dropRulesMetricFlows, as it seems not needed.
Do you have any idea?

@antoninbas
Copy link
Contributor

@jianjuns looks like it's ingress NP stats, not egress

could it be because of this: https://github.com/vmware-tanzu/antrea/blob/da021fa75c7eceb1082fbef8345826c28fbe39a3/pkg/agent/openflow/pipeline.go#L697-L699

I'm not 100% sure, but is it possible that in noEncap mode we do not create DefaultTunOFPort, and thus that port number can be used for a regular Pod (therefore causing ingress NP enforcement to be bypassed for this Pod). Not sure why this would be the only test to fail. But I notice that we typically always use networkConfig.TrafficEncapMode.SupportsEncap() in the rest of the code before comparing to DefaultTunOFPort.

@jianjuns
Copy link
Contributor Author

jianjuns commented Dec 7, 2020

@antoninbas I thought ingress NP stats check passed but egress stats was not found. But you are right it might be ingress stats was not correctly updated.

And your theory about DefaultTunOFPort makes sense! At least we should not decide the next table based on the ofPort value. Let me change it.

@jianjuns
Copy link
Contributor Author

jianjuns commented Dec 7, 2020

Seems the failure is gone after changing the code: #1626.
@antoninbas @srikartati @tnqn

antoninbas pushed a commit that referenced this pull request Dec 23, 2020
Let the packets to the tunnel port go through L2ForwardingCalcTable
as well, to be consistent with other types of traffic.
Redirect packets to gateway and tunnel, as well as BUM packets to
conntrackCommitTable and bypass ingress NetworkPolicy enforcement.
Let l3DecTTLTable decrement TTL for packets from tunnel to local Pods.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants