Start AntreaProxy document #3679

Merged 2 commits on Apr 26, 2022
219 changes: 219 additions & 0 deletions docs/antrea-proxy.md
@@ -0,0 +1,219 @@
# AntreaProxy

## Table of Contents

<!-- toc -->
- [Introduction](#introduction)
- [AntreaProxy with proxyAll](#antreaproxy-with-proxyall)
  - [Removing kube-proxy](#removing-kube-proxy)
    - [Windows Nodes](#windows-nodes)
- [Special use cases](#special-use-cases)
  - [When you are using NodeLocal DNSCache](#when-you-are-using-nodelocal-dnscache)
  - [When you want your external LoadBalancer to handle Pod traffic](#when-you-want-your-external-loadbalancer-to-handle-pod-traffic)
- [Known issues or limitations](#known-issues-or-limitations)
<!-- /toc -->

## Introduction

AntreaProxy was first introduced in Antrea v0.8 and has been enabled by default
on all platforms since v0.11. AntreaProxy enables some or all of the cluster's
Service traffic to be load-balanced as part of the OVS pipeline, instead of
depending on kube-proxy. We typically observe latency improvements for Service
traffic when AntreaProxy is used.

While AntreaProxy can be disabled on Linux Nodes by setting the `AntreaProxy`
Feature Gate to `false`, it should remain enabled on all Windows Nodes, as it is
needed for correct NetworkPolicy implementation for Pod-to-Service traffic.
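For reference, disabling AntreaProxy on Linux means setting the feature gate in
`antrea-agent.conf`; a minimal sketch, following the ConfigMap layout used
throughout this document, would look like this:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: antrea-config
  namespace: kube-system
data:
  antrea-agent.conf: |
    featureGates:
      AntreaProxy: false
```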

By default, AntreaProxy will only handle Service traffic originating from Pods
in the cluster, with no support for NodePort. However, starting with Antrea
v1.4, a new operating mode was introduced in which AntreaProxy can handle all
Service traffic, including NodePort. See the following
[section](#antreaproxy-with-proxyall) for more information.

## AntreaProxy with proxyAll

The `proxyAll` configuration parameter can be enabled in the Antrea
configuration if you want AntreaProxy to handle all Service traffic, with the
possibility to remove kube-proxy altogether and have one less DaemonSet running
in the cluster. This is particularly interesting on Windows Nodes, since until
the introduction of `proxyAll`, Antrea relied on userspace kube-proxy, which is
no longer actively maintained by the K8s community and is slower than other
kube-proxy backends.

Note that on Linux, even when `proxyAll` is enabled, kube-proxy will usually
take priority and will keep handling NodePort Service traffic (unless the source
is a Pod, which is pretty unusual as Pods typically access Services by
ClusterIP). This is because kube-proxy rules typically come before the rules
installed by AntreaProxy to redirect traffic to OVS. When kube-proxy is not
deployed or is removed from the cluster, AntreaProxy will then handle all
Service traffic.
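If you want to check this on a Linux Node, you can inspect the order of the NAT
rules. This sketch assumes kube-proxy runs in iptables mode; treat the exact
chain names as assumptions to verify on your own cluster:

```bash
# List the nat PREROUTING rules with their positions: the jump to kube-proxy's
# KUBE-SERVICES chain typically appears before the jump to Antrea's
# ANTREA-PREROUTING chain, so kube-proxy matches NodePort traffic first.
iptables -t nat -L PREROUTING --line-numbers
```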

### Removing kube-proxy

In this section, we will provide steps to run a K8s cluster without kube-proxy,
with Antrea being responsible for all Service traffic.

You can create a K8s cluster without kube-proxy with kubeadm as follows:

```bash
kubeadm init --skip-phases=addon/kube-proxy
```

To remove kube-proxy from an existing cluster, you can use the following steps:

```bash
# Delete the kube-proxy DaemonSet
kubectl -n kube-system delete ds/kube-proxy
# Delete the kube-proxy ConfigMap to prevent kube-proxy from being re-deployed
# by kubeadm during "upgrade apply". This workaround will not take effect for
# kubeadm versions older than v1.19 as the following patch is required:
# https://github.com/kubernetes/kubernetes/pull/89593
kubectl -n kube-system delete cm/kube-proxy
# Delete existing kube-proxy rules; there are several options for doing that
# Option 1 (if using kube-proxy in iptables mode), run the following on each Node:
iptables-save | grep -v KUBE | iptables-restore
# Option 2 (any mode), restart all Nodes
# Option 3 (any mode), run the following on each Node:
kube-proxy --cleanup
# You can create a DaemonSet to easily run the above command on all Nodes, using
# the kube-proxy container image (see the sketch below)
```
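As an illustration of that last option, here is a hypothetical DaemonSet which
runs the cleanup once on every Linux Node. The names, labels and image tag are
placeholders; match the image to your cluster's K8s version:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-proxy-cleanup
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kube-proxy-cleanup
  template:
    metadata:
      labels:
        app: kube-proxy-cleanup
    spec:
      hostNetwork: true
      initContainers:
      - name: cleanup
        # Placeholder tag: use the kube-proxy image matching your cluster version.
        image: k8s.gcr.io/kube-proxy:v1.23.0
        command: ["kube-proxy", "--cleanup"]
        securityContext:
          privileged: true
      containers:
      # The pause container keeps the Pod alive once the cleanup has run.
      - name: pause
        image: k8s.gcr.io/pause:3.6
```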

You will then need to deploy [Antrea](getting-started.md), after making the
necessary changes to the `antrea-config` ConfigMap:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: antrea-config
  namespace: kube-system
data:
  antrea-agent.conf: |
    kubeAPIServerOverride: "<kube-apiserver URL>"
    antreaProxy:
      proxyAll: true
```

The `kubeAPIServerOverride` option enables the Antrea Agent to connect to the
K8s apiserver. This is required because kube-proxy is no longer running, so the
Antrea Agent can no longer use the ClusterIP of the `kubernetes` Service during
initialization. If you are unsure about which value to use, take a look at your
Kubeconfig file, and look for a line like this one:

```yaml
...
server: https://192.168.77.100:6443
...
```

Then use this value as is (e.g., `"https://192.168.77.100:6443"`) for
`kubeAPIServerOverride`.
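Alternatively, assuming `kubectl` points at the same cluster, you can extract
the value directly:

```bash
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
```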

And that's it! All you have to do now is make sure that the `antrea-agent` Pods
come up correctly, and perhaps validate that NodePort Services can be accessed
correctly.
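For example (the label selector below matches the standard Antrea manifest; the
Node IP and port are placeholders):

```bash
# Check that all antrea-agent Pods are Running
kubectl -n kube-system get pods -l component=antrea-agent
# From outside the cluster, access one of your NodePort Services
curl http://<node IP>:<node port>
```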

#### Windows Nodes

Assuming you are following the steps we [documented](windows.md) to add Windows
Nodes to your K8s cluster with Antrea, you will simply need to skip running
kube-proxy:

* Do not install or start the `kube-proxy` service [when using containerd as
the container runtime](windows.md#installation-as-a-service-containerd-based-runtimes)
* Do not create the `kube-proxy-windows` DaemonSet [when using Docker as the
container runtime](windows.md#installation-via-wins-docker-based-runtimes)

## Special use cases

### When you are using NodeLocal DNSCache

[NodeLocal DNSCache](https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/)
improves performance of DNS queries in a K8s cluster by running a DNS cache on
each Node. DNS queries are intercepted by a local instance of CoreDNS, which
forwards the requests to CoreDNS (cluster local queries) or the upstream DNS
server in case of a cache miss.

The way it normally works is by assigning the kube-dns ClusterIP to a
local "dummy" interface, and installing iptables rules to disable connection
tracking for the queries and bypass kube-proxy. The local CoreDNS instance is
configured to bind to that address and can therefore intercept queries. In case
of a cache miss, queries can be sent to the cluster CoreDNS Pods thanks to a
"shadow" Service which will expose CoreDNS Pods via a new ClusterIP.

When AntreaProxy is enabled (default), Pod DNS queries to the kube-dns ClusterIP
will be load-balanced directly by AntreaProxy to a CoreDNS Pod endpoint. This
means that NodeLocal DNSCache is completely bypassed, which is probably not
acceptable for users who want to leverage this feature to improve DNS
performance in their clusters. While these users can update the Pod
configuration to use the local IP assigned by NodeLocal DNSCache to the "dummy"
interface, this is not always ideal in the context of CaaS, as it can require
everyone running Pods in the cluster to be aware of the situation.

This is the reason why we initially introduced the `skipServices` configuration
option for AntreaProxy in Antrea v1.4. By adding the kube-dns Service (which
exposes CoreDNS) to the list, you can ensure that AntreaProxy will "ignore" Pod
DNS queries, and that they will be forwarded to NodeLocal DNSCache. You can edit
the `antrea-config` ConfigMap as follows:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: antrea-config
  namespace: kube-system
data:
  antrea-agent.conf: |
    antreaProxy:
      skipServices: ["kube-system/kube-dns"]
```
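Note that the antrea-agent Pods must be restarted to pick up the new
configuration. You can then check that Pod DNS queries are being served by the
node-local cache, for instance via the cache metrics exposed by the
`node-local-dns` Pods; port 9253 is the default metrics port in the NodeLocal
DNSCache manifest, so treat these details as assumptions:

```bash
# Run a DNS query from a test Pod (it will target the kube-dns ClusterIP)
kubectl run -it --rm dns-test --image=busybox:1.35 --restart=Never -- nslookup kubernetes.default
# Look at the cache metrics of the node-local-dns instance on that Node
curl -s http://<node IP>:9253/metrics | grep coredns_cache
```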

### When you want your external LoadBalancer to handle Pod traffic

In some cases, the external LoadBalancer for a cluster provides additional
capabilities (e.g., TLS termination) and it is desirable for Pods to access
in-cluster Services through the external LoadBalancer. By default, this is not
the case as both kube-proxy and AntreaProxy will install rules to load-balance
this traffic directly at the source Node (even when the destination IP is set to
the external `loadBalancerIP`). To circumvent this behavior, we introduced the
`proxyLoadBalancerIPs` configuration option for AntreaProxy in Antrea v1.5. This
option defaults to `true`, but when setting it to `false`, AntreaProxy will no
longer load-balance traffic destined to external `loadBalancerIP`s, hence
ensuring that this traffic can go to the external LoadBalancer. You can set it
to `false` by editing the `antrea-config` ConfigMap as follows:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: antrea-config
  namespace: kube-system
data:
  antrea-agent.conf: |
    antreaProxy:
      proxyLoadBalancerIPs: false
```

There are two important prerequisites for this feature:

* You must enable `proxyAll` and [remove kube-proxy](#removing-kube-proxy) from
the cluster, otherwise kube-proxy will still load-balance the traffic and you
will not achieve the desired behavior.
* Your external LoadBalancer must SNAT the traffic, in order for the reply
traffic to go back through the external LoadBalancer.

## Known issues or limitations

* Due to some restrictions on the implementation of Services in Antrea, the
  maximum number of Endpoints that Antrea can support at the moment is 800. If
  the number of Endpoints for a given Service exceeds 800, the extra Endpoints
  will be dropped, with each Antrea Agent dropping non-local Endpoints first.
  This will be fixed eventually.
* Due to some restrictions on the implementation of Services in Antrea, the
  maximum timeout value supported for ClientIP-based Service SessionAffinity is
  65535 seconds (the K8s Service spec allows values up to 86400 seconds). Values
  greater than 65535 seconds will be truncated and the Antrea Agent will log a
  warning; see the example after this list. [We do not intend to address this
  limitation](https://github.com/antrea-io/antrea/issues/1578).
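For illustration, here is a sketch of a Service manifest (names are
placeholders) whose affinity timeout would be truncated to 65535 seconds by the
Antrea Agent:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      # 86400 is the maximum allowed by the K8s API; Antrea truncates it to 65535.
      timeoutSeconds: 86400
```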
3 changes: 3 additions & 0 deletions docs/feature-gates.md
@@ -67,6 +67,9 @@ manifest provided as part of releases enables this feature by default. If you
edit the manifest, make sure you do not disable it, as it is needed for correct
NetworkPolicy implementation for Pod-to-Service traffic.

Please refer to this [document](antrea-proxy.md) for extra information on
AntreaProxy and how it can be configured.

### EndpointSlice

`EndpointSlice` enables Service EndpointSlice support in AntreaProxy. The