Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows] Bugfix: SNAT is not performed on the +est packets. #2702

Merged
merged 1 commit into from Sep 8, 2021

Conversation

wenyingd
Copy link
Contributor

@wenyingd wenyingd commented Sep 2, 2021

Load the SNAT mark in snatTable on the established connection packets which is sent out from local Pods.
Fixes #2701

Signed-off-by: wenyingd wenyingd@vmware.com

@wenyingd wenyingd changed the base branch from main to release-1.2 September 2, 2021 03:33
@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 2, 2021

/test-all

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 2, 2021

/test-windows-all
/test-conformance
/test-windows-e2e

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 2, 2021

/test-windows-conformance
/test-windows-networkpolicy
/test-ipv6-all
/test-ipv6-only-all

l3FwdTable := c.pipeline[l3ForwardingTable]
nextTable := l3FwdTable.GetNext()
snatTable := c.pipeline[snatTable]
nextTable := snatTable.GetNext()
Copy link
Contributor Author

@wenyingd wenyingd Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns @tnqn I am thinking shall we resubmit the packet to the l3DecTTLTable after SNAT on Windows? In current implementation (<= Antrea 1.2), we output the SNATed packet to the uplink interface, and we didn't decrement TTL because we define l2ForwardingCalcTable is the one next to snatTable. I think it is possibly because we output the packet to antrea-gw0 on Linux, and TTL is decremented on the Linux host.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me.

Copy link
Contributor

@shettyg shettyg Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenyingd @jianjuns It looked to me that you are outputting SNATted packet to antrea-gw0 interface (as destination IP is in a different subnet) and not to the uplink interface directly on Windows too. Once it goes to antrea-gw0, it comes back to OVS pipeline via br-int (as that is the one that has the nodeIP). And then you o/p it via uplink interface (e.g Ethernet).

You probably meant the same thing. But, I wanted to make sure that I understand the converstion.

Copy link
Contributor

@shettyg shettyg Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are submitting it directly to uplink interface, how would you know the mac address of this out of k8s cluster IP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shettyg Thanks for reminding me. You are correct, we actually output the packet th antrea-gw0 even if we perform SNAT inside OVS. The current code for TTL is correct, and please ignore my previous suggestion. I would add comments here for the next table so that we could avoid similar confusion in future.

@@ -95,7 +95,7 @@ data:
# supporting Pod traffic across IP subnets.
# hybrid: noEncap if source and destination Nodes are on the same subnet, otherwise encap.
#
#trafficEncapMode: encap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will revert the change before merge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this change is to trigger CI test only.

l3FwdTable := c.pipeline[l3ForwardingTable]
nextTable := l3FwdTable.GetNext()
snatTable := c.pipeline[snatTable]
nextTable := snatTable.GetNext()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me.

@codecov-commenter
Copy link

codecov-commenter commented Sep 3, 2021

Codecov Report

Merging #2702 (930d41a) into release-1.2 (191c7b7) will increase coverage by 4.85%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           release-1.2    #2702      +/-   ##
===============================================
+ Coverage        60.19%   65.04%   +4.85%     
===============================================
  Files              284      284              
  Lines            22385    25672    +3287     
===============================================
+ Hits             13474    16699    +3225     
+ Misses            7474     7413      -61     
- Partials          1437     1560     +123     
Flag Coverage Δ
e2e-tests 55.94% <ø> (?)
kind-e2e-tests 47.18% <ø> (+0.02%) ⬆️
unit-tests 42.23% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/controller/egress/ipallocator/allocator.go 67.82% <0.00%> (-15.16%) ⬇️
pkg/controller/networkpolicy/endpoint_querier.go 77.64% <0.00%> (-13.79%) ⬇️
pkg/apis/controlplane/v1beta1/conversion.go 72.44% <0.00%> (-11.89%) ⬇️
pkg/legacyapis/core/v1alpha2/register.go 69.23% <0.00%> (-10.77%) ⬇️
pkg/controller/egress/controller.go 76.76% <0.00%> (-10.44%) ⬇️
pkg/apis/stats/register.go 71.42% <0.00%> (-10.39%) ⬇️
pkg/legacyapis/stats/register.go 71.42% <0.00%> (-10.39%) ⬇️
pkg/ovs/openflow/ofctrl_meter.go 33.84% <0.00%> (-10.16%) ⬇️
pkg/legacyapis/security/v1alpha1/register.go 73.33% <0.00%> (-10.00%) ⬇️
pkg/agent/config/traffic_encap_mode.go 91.30% <0.00%> (-8.70%) ⬇️
... and 268 more

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 3, 2021

/test-all
/test-ipv6-all
/test-ipv6-only-all

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 6, 2021

/test-all-features-conformance
/test-conformance
/test-windows-conformance
/test-windows-e2e
/test-windows-networkpolicy

@lzhecheng
Copy link
Contributor

/test-e2e

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 6, 2021

/test-all
/test-ipv6-all
/test-ipv6-only-all

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 6, 2021

/test-ipv6-conformance
/test-ipv6-e2e
/test-windows-networkpolicy

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 6, 2021

/test-e2e
/test-ipv6-e2e
/test-ipv6-conformance
/test-ipv6-networkpolicy
/test-windows-networkpolicy

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 7, 2021

/skip-e2e
The failed test cases in e2e include: TestIPSecTunnelConnectivity and TestIPSecDeleteStaleTunnelPorts. These two cases are not for noEncap mode.

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 7, 2021

/test-windows-networkpolicy

@luolanzone luolanzone mentioned this pull request Sep 7, 2021
@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 7, 2021

/skip-windows-networkpolicy
All windows-networkpolicy tests are passed on my local setup. I failed to trigger CI tests on windows-networkpolicy on Antrea v1.2 in noEncap mode.

// Do not decrement TTL on the SNATed packet on Windows. This is because the SNATed packet is output to antrea-gw0
// first to learn the destination MAC using the host ARP cache. And antrea-gw0 will decrement the TTL when it
// forwards the packet to the bridge interface on which the host IP address is configured.
nextTable := c.pipeline[l3DecTTLTable].GetNext()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snatTable's default next table is l2ForwardingCalcTable, why we don't use its next table directly here but uses a non-related table's?
I think the meaning of default table is packets will go to it by default unless we explicitly change it for some reason. So I don't quite understand what's the indication here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to explicitly skip "l3DecTTLTable" for the reason I commented here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the comment but what's the point of the default next table then? I think this would confuse readers that snatTable's next table is l3DecTTLTable so we explicitly skip it. Overriding nextTable only makes sense when we use a non-default value. If the normal path should go snatTable -> l3DecTTLTable, it would make sense to skip it like this. When the pipeline is designed as snatTable-> l2ForwardingCalcTable, doesn't it imply we only delTTL for particular traffic explicitly? I'm fine with the comment but want to avoid this confusing nextTable declaration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated.

Load the SNAT mark in snatTable on the established connection packets which is sent out from local Pods.

Signed-off-by: wenyingd <wenyingd@vmware.com>
@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 7, 2021

/test-all
/skip-ipv6-all
/skip-ipv6-only-all

I have passed Windows conformance test, networkpolicy test, and windows-e2e-test in noEncap mode on my local setup. Since the change is only on Windows Node and IPv6 cluster is not impacted, I skip IPv6 related tests.

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 8, 2021

/test-networkpolicy

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 8, 2021

/test-all

@tnqn tnqn merged commit 6d7478f into antrea-io:release-1.2 Sep 8, 2021
@antoninbas
Copy link
Contributor

antoninbas commented Sep 8, 2021

@wenyingd could you take care of back-porting to release-1.1?

@wenyingd
Copy link
Contributor Author

wenyingd commented Sep 9, 2021

@wenyingd could you take care of back-porting to release-1.1?

Yes, I would check if it is needed in release-1.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants