Ensure full functionality of AntreaProxy with proxyAll enabled when kube-proxy presents #6308

Merged
merged 1 commit into from
Jul 11, 2024

Contributor

@hongliangl hongliangl commented May 8, 2024

Depends on #6381

Resolve #6232

When proxyAll is enabled, the following changes ensure that AntreaProxy is fully
functional, except for handling ClusterIP traffic from Nodes, even when kube-proxy is
present in iptables mode:

The jump rules for the Antrea-managed chains ANTREA-PREROUTING and ANTREA-OUTPUT in the
nat table are now inserted rather than appended. They are therefore evaluated before the
jump to KUBE-SERVICES, the kube-proxy chain that performs Service DNAT. Antrea ensures
that its jump rules take precedence over those managed by kube-proxy.
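
As a rough illustration of why inserting matters, the sketch below models a built-in chain as an ordered list of jump targets. The `chain` type and the `appendRule` / `insertRule` helpers are hypothetical names used only for this example; they are not Antrea code and only mimic the ordering effect of `iptables -A` versus `iptables -I`.

```go
package main

import "fmt"

// chain models a built-in iptables chain (for example nat/PREROUTING) as an
// ordered list of jump targets. Illustrative model only, not Antrea code.
type chain []string

// appendRule mimics `iptables -A`: the new rule goes to the end of the chain.
func appendRule(c chain, target string) chain {
	out := make(chain, 0, len(c)+1)
	out = append(out, c...)
	return append(out, target)
}

// insertRule mimics `iptables -I` without a rule number: the new rule goes to
// the head of the chain and is evaluated first.
func insertRule(c chain, target string) chain {
	out := make(chain, 0, len(c)+1)
	out = append(out, target)
	return append(out, c...)
}

func main() {
	// Assume kube-proxy has already installed its jump to KUBE-SERVICES.
	prerouting := chain{"KUBE-SERVICES"}

	fmt.Println("appended:", appendRule(prerouting, "ANTREA-PREROUTING"))
	fmt.Println("inserted:", insertRule(prerouting, "ANTREA-PREROUTING"))
}
```

With appending, the jump to KUBE-SERVICES is evaluated first and can DNAT the packet before Antrea's chain is reached; with inserting, Antrea's chain is consulted first.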

The iptables rules in the nat table chain ANTREA-PREROUTING look like the following; the
rules in chain ANTREA-OUTPUT are similar.

-A ANTREA-PREROUTING -m comment --comment "Antrea: DNAT external to NodePort packets" -m set --match-set ANTREA-NODEPORT-IP dst,dst -j DNAT --to-destination 169.254.0.252

This rule DNATs NodePort traffic to a virtual IP, bypassing chain KUBE-SERVICES.

The iptables rules in the raw table chains ANTREA-PREROUTING / ANTREA-OUTPUT look like
the following:

1. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track incoming encapsulation packets" -m udp -p udp --dport 6081 -m addrtype --dst-type LOCAL -j NOTRACK
2. -A ANTREA-PREROUTING -m comment --comment "Antrea: drop Pod multicast traffic forwarded via underlay network" -m set --match-set CLUSTER-NODE-IP src -d 224.0.0.0/4 -j DROP
3. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track request packets destined to external IPs" -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
4. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track reply packets sourced from external IPs" -m set --match-set ANTREA-EXTERNAL-IP src -j NOTRACK
5. -A ANTREA-OUTPUT -m comment --comment "Antrea: do not track request packets destined to external IPs" -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
  • Rules 1-2 are not new.
  • Rule 3 bypasses conntrack for packets that come from outside the Node and are
    destined to externalIPs. Because the nat table relies on conntrack, these packets
    also bypass the nat table chains managed by Antrea Proxy and kube-proxy.
  • Rule 4 bypasses conntrack for reply packets sourced from externalIPs, which
    likewise bypass the nat table chains managed by Antrea Proxy and kube-proxy.
  • Rule 5 bypasses conntrack for packets that are generated on the local Node and
    destined to externalIPs, which likewise bypass the nat table chains managed by
    Antrea Proxy and kube-proxy.

The following are benchmark results for a LoadBalancer Service configured with DSR mode.
The TCP_STREAM and TCP_RR results (single TCP connection) are almost the same as before.
The TCP_CRR result (multiple TCP connections) is better than before; one likely reason is
that conntrack is skipped for LoadBalancer Services.

Test           v2.0 proxyAll     Dev proxyAll    Delta
TCP_STREAM     4933.97           4918.35         -0.32%
TCP_RR         8095.49           8032.4          -0.78%
TCP_CRR        1645.66           1888.93         +14.79%

TODO:

  • In the iptables nat table, ensure that the jumps to the Antrea chains ANTREA-PREROUTING and ANTREA-OUTPUT are the first rules of the corresponding default chains, so that the kube-proxy chains are skipped.
  • Add unit tests
  • Add e2e tests? Decide whether to add a new Kind test [proxyAll=true, LoadBalancerMode=DSR, NodeIPAM=true, kube-proxy present]
  • Update the related documentation.
  • Update manifests
  • PR / commit description
  • Benchmark of LoadBalancer

@hongliangl hongliangl requested review from tnqn and antoninbas May 8, 2024 09:22
@hongliangl hongliangl added area/proxy Issues or PRs related to proxy functions in Antrea action/release-note Indicates a PR that should be included in release notes. area/OS/linux Issues or PRs related to the Linux operating system. labels May 8, 2024
@hongliangl hongliangl added this to the Antrea v2.1 release milestone May 8, 2024
@hongliangl hongliangl force-pushed the 20240429-bypass-kube-proxy branch 2 times, most recently from ece78aa to e5a6949 Compare May 9, 2024 01:38
@hongliangl hongliangl requested review from tnqn and antoninbas May 9, 2024 09:43
@hongliangl hongliangl marked this pull request as ready for review May 9, 2024 09:44
@hongliangl hongliangl marked this pull request as draft May 17, 2024 05:48
@hongliangl hongliangl force-pushed the 20240429-bypass-kube-proxy branch 2 times, most recently from e4b5ccb to e8b24b1 Compare May 20, 2024 11:23
@hongliangl hongliangl marked this pull request as ready for review May 20, 2024 11:27
@hongliangl hongliangl force-pushed the 20240429-bypass-kube-proxy branch 2 times, most recently from 7a890bc to 1b705a5 Compare May 21, 2024 08:50
@hongliangl
Contributor Author

/test-all

@hongliangl hongliangl force-pushed the 20240429-bypass-kube-proxy branch 3 times, most recently from 2254f1d to 716b05a Compare May 30, 2024 03:20
@hongliangl hongliangl force-pushed the 20240429-bypass-kube-proxy branch 3 times, most recently from 99916ca to cad061b Compare June 5, 2024 08:37
@hongliangl hongliangl requested a review from tnqn July 3, 2024 15:27
MatchCIDRDst(c.kubeServiceHost).
MatchTransProtocol(iptables.ProtocolTCP).
MatchPortDst(&c.kubeServicePort, nil).
SetTarget(kubeProxyServiceChain).
Member

This makes a strong assumption that kube-proxy runs in iptables mode, but it could be in ipvs or nftables mode. I'm not sure whether the current implementation can work with the other two modes, but there seems to be no reason why it should jump to the service chain directly here. Could it just return from this chain?

Another possible approach is to handle only traffic to external IPs and NodePorts in this patch, given that there seems to be no obvious benefit to redirecting ClusterIP traffic to OVS at the moment, while it takes some code to specially handle the Kubernetes Service's ClusterIP. In theory, there should be no "external to ClusterIP" traffic.

Contributor Author

Another possible approach is to handle only traffic to external IPs and NodePorts in this patch, given that there seems to be no obvious benefit to redirecting ClusterIP traffic to OVS at the moment, while it takes some code to specially handle the Kubernetes Service's ClusterIP. In theory, there should be no "external to ClusterIP" traffic.

I'm fine with this, and the code will be simpler. As for ClusterIP, maybe we could handle that in the future if there are users who need external-to-ClusterIP traffic.

@hongliangl hongliangl force-pushed the 20240429-bypass-kube-proxy branch 2 times, most recently from 6f666b7 to bcd4544 Compare July 8, 2024 06:54
@hongliangl hongliangl requested review from tnqn and jianjuns July 8, 2024 06:57
Member

@tnqn tnqn left a comment

Thanks for making the changes, this is now easier to review

@@ -829,6 +929,12 @@ func (c *Client) restoreIptablesData(podCIDR *net.IPNet,
"-j", iptables.DNATTarget,
"--to-destination", nodePortDNATVirtualIP.String(),
}...)
writeLine(iptablesData, []string{
"-A", antreaPreRoutingChain,
"-m", "comment", "--comment", `"Antrea: accept external to external IP packets"`,
Member

This is the nat table; why accept packets here?

Contributor Author

I have removed this. According to the documentation:

NAT
This is the realm of the `nat' table, which is fed packets from two netfilter hooks: for non-local packets, the NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING hooks are perfect for destination and source alterations respectively. If CONFIG_IP_NF_NAT_LOCAL is defined, the hooks NF_IP_LOCAL_OUT and NF_IP_LOCAL_IN are used for altering the destination of local packets.

This table is slightly different from the `filter' table, in that only the first packet of a new connection will traverse the table: the result of this traversal is then applied to all future packets in the same connection.

The IPs matched by the ipset are already handled with target NOTRACK, so the corresponding packets will not be processed in the nat table.
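
The behaviour described in the quoted documentation, where only the first packet of a connection traverses the nat table and the result is reused afterwards, can be sketched as follows. This is a toy model and not kernel or Antrea code: `flow`, `natTable` and `conntrack` are hypothetical types, and the addresses are illustrative.

```go
package main

import "fmt"

// flow identifies a connection in this toy model.
type flow struct{ src, dst string }

// natTable holds DNAT mappings and counts how often its rules are evaluated.
type natTable struct {
	lookups int
	dnat    map[string]string // original destination -> new destination
}

func (n *natTable) translate(dst string) string {
	n.lookups++
	if to, ok := n.dnat[dst]; ok {
		return to
	}
	return dst
}

// conntrack remembers the NAT decision made for the first packet of a flow.
type conntrack struct {
	nat     *natTable
	entries map[flow]string
}

// process returns the destination a packet of the given flow is sent to.
func (c *conntrack) process(f flow) string {
	if to, ok := c.entries[f]; ok {
		return to // known connection: the nat table is not traversed again
	}
	to := c.nat.translate(f.dst)
	c.entries[f] = to
	return to
}

func main() {
	nat := &natTable{dnat: map[string]string{"192.0.2.10": "169.254.0.252"}}
	ct := &conntrack{nat: nat, entries: map[flow]string{}}

	f := flow{src: "198.51.100.7", dst: "192.0.2.10"}
	for i := 0; i < 3; i++ {
		fmt.Println(ct.process(f))
	}
	fmt.Println("nat table lookups:", nat.lookups)
}
```

A packet marked NOTRACK never gets a conntrack entry in the first place, which is why no nat table rule is needed for it.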

@@ -744,6 +744,7 @@ func testProxyHairpin(t *testing.T, isIPv6 bool) {
agnhostHost := fmt.Sprintf("agnhost-host-%v", isIPv6)
createAgnhostPod(t, data, agnhostHost, node, true)
t.Run("HostNetwork Endpoints", func(t *testing.T) {
skipIfKubeProxyEnabled(t, data)
Member

Why is this needed?

Contributor Author

Previously, this test was run with kube-proxy removed and Antrea Proxy handling ClusterIP traffic. Now that kube-proxy is present and handles ClusterIP traffic, the test won't produce the expected result.

handle all Service traffic.

Starting with Antrea v2.1, when only `proxyAll` is enabled, AntreaProxy is capable
of handing all kinds of Service traffic except for ClusterIP, even if kube-proxy is
Contributor

except for -> ", not just ClusterIP Service traffic"?

Contributor Author

It should be "except": AntreaProxy handles LoadBalancer, NodePort and externalIP traffic, but not ClusterIP traffic.

@hongliangl hongliangl force-pushed the 20240429-bypass-kube-proxy branch 3 times, most recently from 1041cd6 to 5d5faf7 Compare July 9, 2024 07:03
@hongliangl hongliangl requested a review from tnqn July 9, 2024 08:10
Member

@tnqn tnqn left a comment

LGTM overall, some nits

@@ -745,6 +745,9 @@ func testProxyHairpin(t *testing.T, isIPv6 bool) {
createAgnhostPod(t, data, agnhostHost, node, true)
t.Run("HostNetwork Endpoints", func(t *testing.T) {
skipIfProxyAllDisabled(t, data)
// AntreProxy with proxyAll enabled doesn't handle ClusterIP traffic sourced from local Node when kube-proxy
Member

Suggested change
// AntreProxy with proxyAll enabled doesn't handle ClusterIP traffic sourced from local Node when kube-proxy
// AntreaProxy with proxyAll enabled doesn't handle ClusterIP traffic sourced from local Node when kube-proxy

Member

To confirm: when proxyAll is enabled and kube-proxy is running, this hairpin case still works; it's just that the client IP is not the virtual IP, right? If so, I think the test should just change the expected IP instead of skipping validation of this case.

Contributor Author

Yes, it still works, but the expected IP is not the virtual IP. I will update the test.

Comment on lines 53 to 56
Starting with Antrea v2.1, when `proxyAll` is enabled, AntreaProxy is capable
of handling all kinds of Service traffic except ClusterIP, even if kube-proxy in
iptables mode is present. This is accomplished by prioritizing the rules installed
by Antrea Proxy redirecting Service traffic to OVS over those installed by kube-proxy.
Member

@tnqn tnqn Jul 10, 2024

Consider this:

Starting with Antrea v2.1, when `proxyAll` is enabled, AntreaProxy will handle
Service traffic destined to NodePort, LoadBalancerIP and ExternalIP, even if
kube-proxy is present. This benefits users who want to take advantage of
AntreaProxy's advanced features, such as Direct Server Return (DSR) mode, but
lack control over kube-proxy's installation. This is accomplished by
prioritizing the rules installed by AntreaProxy over those installed by
kube-proxy, thus it works only with kube-proxy iptables mode. Support for other
kube-proxy modes may be added in the future.

cc @jianjuns

Contributor

LGTM.

Contributor Author

Updated the document.

@hongliangl hongliangl force-pushed the 20240429-bypass-kube-proxy branch 2 times, most recently from 1f3f1ce to 8f6410e Compare July 11, 2024 02:36
…ube-proxy presents

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
Member

@tnqn tnqn left a comment

LGTM

@tnqn
Member

tnqn commented Jul 11, 2024

/test-all
/test-ipv6-all
/test-ipv6-only-all

Contributor

@luolanzone luolanzone left a comment

Thanks a lot for the efforts!

one nit: since you updated all AntreaProxy to Antrea Proxy in another PR, better to update here in antrea-proxy.md as well to avoid any dependency and another round of updates.

@hongliangl
Contributor Author

Thanks a lot for the efforts!

one nit: since you updated all AntreaProxy to Antrea Proxy in another PR, better to update here in antrea-proxy.md as well to avoid any dependency and another round of updates.

I can update document antrea-proxy.md in that PR.

@tnqn
Member

tnqn commented Jul 11, 2024

/test-kind-ipv6-only-all
/test-kind-ipv6-all

@tnqn tnqn merged commit 5cee770 into antrea-io:main Jul 11, 2024
57 of 68 checks passed
@hongliangl hongliangl deleted the 20240429-bypass-kube-proxy branch July 11, 2024 14:36
Development

Successfully merging this pull request may close these issues.

Make proxyAll/LoadBalancerModeDSR work with kube-proxy present
5 participants