Use DSCP instead of tun_metadata0 in Traceflow #1466

Merged (1 commit) on Nov 6, 2020

@gran-vmv gran-vmv commented Oct 30, 2020

This patch changes Traceflow dataplaneTag from tun_metadata0 to DSCP, and
removes the tunnel/encap mode restriction for inter-Node Traceflow.

This PR closes #1357

@gran-vmv gran-vmv added the status/WIP Work in progress label Oct 30, 2020
@antrea-bot
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@codecov-io

codecov-io commented Oct 30, 2020

Codecov Report

Merging #1466 into master will increase coverage by 0.23%.
The diff coverage is 87.93%.

@@            Coverage Diff             @@
##           master    #1466      +/-   ##
==========================================
+ Coverage   68.24%   68.47%   +0.23%     
==========================================
  Files         165      165              
  Lines       13107    13118      +11     
==========================================
+ Hits         8945     8983      +38     
+ Misses       3226     3195      -31     
- Partials      936      940       +4     
Flag               Coverage Δ
integration-tests  45.64% <16.27%> (-0.17%) ⬇️
kind-e2e-tests     55.00% <87.93%> (+0.13%) ⬆️
unit-tests         42.38% <4.65%> (-0.07%) ⬇️

Impacted Files                                         Coverage Δ
pkg/agent/agent.go                                     50.14% <ø> (+0.28%) ⬆️
pkg/controller/traceflow/controller.go                 67.74% <ø> (+2.58%) ⬆️
pkg/agent/controller/traceflow/packetin.go             64.17% <57.14%> (+15.36%) ⬆️
pkg/agent/openflow/client.go                           68.10% <83.33%> (+0.24%) ⬆️
...agent/controller/traceflow/traceflow_controller.go  82.60% <100.00%> (+5.97%) ⬆️
pkg/agent/openflow/pipeline.go                         82.92% <100.00%> (+0.32%) ⬆️
...ver/registry/controlplane/nodestatssummary/rest.go  50.00% <0.00%> (-50.00%) ⬇️
pkg/agent/stats/collector.go                           91.95% <0.00%> (-5.75%) ⬇️
...kg/controller/networkpolicy/store/networkpolicy.go  77.58% <0.00%> (-3.45%) ⬇️
pkg/ovs/openflow/ofctrl_bridge.go                      71.14% <0.00%> (-0.80%) ⬇️
... and 5 more

Member

@tnqn tnqn left a comment

LGTM overall, some minor comments

pkg/agent/openflow/client.go (outdated, resolved)
pkg/controller/traceflow/controller.go (outdated, resolved)
pkg/agent/controller/traceflow/packetin.go (resolved)
pkg/agent/controller/traceflow/packetin.go (outdated, resolved)
pkg/agent/openflow/pipeline.go (outdated, resolved)
Comment on lines 231 to 233
// OfTraceflowMarkRange stores dataplaneTag at range 2-7 in DSCP field of IP header.
// IPv4/v6 DSCP (Bits 2-7) Field supports exact match only.
OfTraceflowMarkRange = binding.Range{2, 7}
Contributor

I find this a bit confusing. The DSCP field is bits 2 through 7 of the IP ToS field. I don't think "range 2-7 in DSCP field" makes sense.

We are only using this constant in the code below:

packetOutBuilder = packetOutBuilder.AddLoadAction(binding.NxmFieldIPTos, uint64(dataplaneTag), OfTraceflowMarkRange)

it may make more sense to define the constant locally where it is used, rather than here in the middle of the "register ranges"

BTW, isn't there an OVS field for DSCP we can use directly? OXM_OF_IP_DSCP? @wenyingd

Contributor Author

Renamed to traceflowMarkToSRange, moved to a separate section, and updated the comments.
I prefer to store all ranges in the same place, because that makes them easy to look up.

Yes, OXM_OF_IP_DSCP is what we want, will test with this

Contributor Author

Tested OXM_OF_IP_DSCP, still requires range{0, 5}. We don't need to move to this.

pkg/controller/traceflow/controller.go (outdated, resolved)
@jianjuns
Contributor

jianjuns commented Nov 2, 2020

FYI: I think the OVS folks are evaluating the tun_metadata fix and promised to give us an update today. It might still be possible to have the fix for Antrea 0.11.

Member

@tnqn tnqn left a comment

Code LGTM, but I would let the team decide whether we go this way or wait for the OVS fix.

@vicky-liu

@jianjuns @antoninbas, the fix made by the OVS team requires the latest kernel version, which I think is a strict limitation for users who want to enable Antrea Traceflow. I still prefer to leverage DSCP to support Traceflow in both encap and no-encap modes. Please share any concerns about this solution.

@antoninbas
Contributor

@vicky-liu that was my original concern, that the fix would require a patch to the OVS kernel datapath

yes, we should move forward with the DSCP workaround

@jianjuns
Contributor

jianjuns commented Nov 4, 2020

Sure. Let us go with this PR for 0.11. We can decide whether we want to change to DSCP + INT in future releases.

Contributor

@jianjuns jianjuns left a comment

Do you plan to support TF for NoEncap and NetworkPolicyOnly mode? If so, we should update the AgentOptions.validate(), and also the antrea-agent.conf.

pkg/controller/traceflow/controller.go (outdated, resolved)
pkg/ovs/openflow/interfaces.go (outdated, resolved)
pkg/agent/controller/traceflow/packetin.go (outdated, resolved)
pkg/agent/agent.go (outdated, resolved)
return nil, nil, err
}
}
if outputPort == config.DefaultTunOFPort || outputPort == config.HostGatewayOFPort {
Contributor

Question: if the destination IP of the TF packet is indeed the gw0 IP, what is the expected behavior here? Should the action be forwarded or delivered?

Contributor Author

Currently we require the destination to be a Pod. If the destination is any other IP, we cannot tell whether the packet was properly delivered.

Contributor

In TraceflowSpec the destination could be an IP? And if it is the Pod IP, I think we still track "delivered"?

In theory we can also handle the case where the destination IP is the gw0 IP. If you do not want to support that in this PR, please add a comment in InstallTraceflowFlows() to explain it.

Even when the IP is an external one, I think antrea-controller should handle it (as we know it is forwarded through gw0) and not report a timeout failure.

Contributor Author

We can handle this case by checking whether the dstIP is in the Pod cache to decide whether the action is Delivered or Forwarded, but I think we should submit another patch after this patch is merged.

pkg/agent/openflow/pipeline.go (outdated, resolved)
pkg/agent/openflow/pipeline.go (resolved)
pkg/agent/openflow/pipeline.go (resolved)
@gran-vmv
Contributor Author

gran-vmv commented Nov 4, 2020

/test-all

@gran-vmv gran-vmv added this to the Antrea v0.11.0 release milestone Nov 5, 2020
Contributor

@antoninbas antoninbas left a comment

minor comments only, lgtm

pkg/agent/agent.go (outdated, resolved)
pkg/agent/controller/traceflow/packetin.go (outdated, resolved)
pkg/agent/openflow/pipeline.go (outdated, resolved)
pkg/agent/openflow/pipeline.go (outdated, resolved)
func (c *client) traceflowL2ForwardOutputFlows(dataplaneTag uint8, category cookie.Category) []binding.Flow {
	flows := []binding.Flow{}
	// Output and SendToController if output port is tunnel or gateway port.
	// The gw0 IP as Traceflow destination is not supported.
Contributor

we should open an issue for this, in order to document the limitation

Contributor Author

Submitted #1500 for this enhancement.

pkg/controller/traceflow/controller.go (outdated, resolved)
@gran-vmv gran-vmv force-pushed the tf-dscp branch 2 times, most recently from c9e6634 to 7a67b5f Compare November 6, 2020 04:08
This patch changes Traceflow dataplaneTag from tun_metadata0 to DSCP, and
remove the tunnel/encap mode restriction for inter-Node Traceflow.
@gran-vmv
Contributor Author

gran-vmv commented Nov 6, 2020

/test-all

@gran-vmv
Contributor Author

gran-vmv commented Nov 6, 2020

/test-networkpolicy

@gran-vmv gran-vmv merged commit 619b678 into antrea-io:master Nov 6, 2020
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Nov 10, 2020
This patch changes Traceflow dataplaneTag from tun_metadata0 to DSCP, and
remove the tunnel/encap mode restriction for inter-Node Traceflow.
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Nov 10, 2020
This patch changes Traceflow dataplaneTag from tun_metadata0 to DSCP, and
remove the tunnel/encap mode restriction for inter-Node Traceflow.
@jianjuns
Contributor

> Do you plan to support TF for NoEncap and NetworkPolicyOnly mode? If so, we should update the AgentOptions.validate(), and also the antrea-agent.conf.

Seems I forgot to confirm this one. Could you answer? @gran-vmv

@gran-vmv
Contributor Author

> Do you plan to support TF for NoEncap and NetworkPolicyOnly mode? If so, we should update the AgentOptions.validate(), and also the antrea-agent.conf.
>
> Seems I forgot to confirm this one. Could you answer? @gran-vmv

I checked Traceflow in noEncap mode, and it works. I think AgentOptions.validate() doesn't validate Traceflow options, and the validation in agent/traceflow/traceflow_controller is removed by this PR.

@jianjuns
Contributor

Then at least change the yaml and documentation.

@gran-vmv
Contributor Author

> Then at least change the yaml and documentation.

Currently no changes are needed for the yaml and doc.
The current Traceflow doc does not mention low-level details such as tun_metadata/DSCP, nor does it mention the inter-Node restriction. Once this PR is merged, the inter-Node restriction is removed, so we don't need to change the doc unless we want to add such details to it.

@jianjuns
Contributor

Ok. You are right. I thought we commented in the yaml that TF requires encap and Geneve.

antoninbas pushed a commit that referenced this pull request Nov 11, 2020
This patch changes Traceflow dataplaneTag from tun_metadata0 to DSCP, and
remove the tunnel/encap mode restriction for inter-Node Traceflow.
@gran-vmv gran-vmv deleted the tf-dscp branch January 21, 2021 09:09
Development

Successfully merging this pull request may close these issues.

Traceflow fails due to destination Node did not receive the trace packet
8 participants