Support running Antrea in clusters created with Kind #14

Closed
antoninbas opened this issue Nov 4, 2019 · 7 comments · Fixed by #137

@antoninbas
Contributor

Depends on #13.

This could be very convenient for CI and enable us to run e2e tests as part of a public CI service.

@antoninbas antoninbas self-assigned this Nov 4, 2019
@antoninbas
Contributor Author

Tentative support in #32

@antoninbas
Contributor Author

With the current support, the Antrea components (agent, controller) can come up. However, there is no connectivity between Nodes. At this time, I believe that this is because using VXLAN tunnels in OVS userspace mode requires some special configuration: http://docs.openvswitch.org/en/latest/howto/userspace-tunneling/. I will work on this.
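As a rough illustration of the special configuration the linked how-to describes, the sketch below prints the commands for a separate userspace bridge that carries the tunnel endpoint address. It is an assumption-laden sketch, not what Antrea ships: the bridge name `br-phy`, the uplink `eth0`, and the address are placeholders, and the helper `br_phy_commands` is hypothetical. Flushing the uplink address may also remove routes that then need to be re-added.

```sh
# Sketch: print the commands that set up a userspace (netdev) bridge holding
# the tunnel endpoint address. Bridge name, uplink and address are placeholders.
# Usage: br_phy_commands <uplink-interface> <ip/prefix>
br_phy_commands() {
    uplink=$1
    addr=$2
    cat <<EOF
ovs-vsctl --may-exist add-br br-phy -- set Bridge br-phy datapath_type=netdev
ovs-vsctl --may-exist add-port br-phy $uplink
ip addr add $addr dev br-phy
ip link set br-phy up
ip addr flush dev $uplink
EOF
}

# Only prints the commands; review them before running anything as root.
br_phy_commands eth0 172.18.0.2/16
```

Printing the commands instead of executing them keeps the sketch safe to try on a machine without OVS installed.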

@antoninbas antoninbas added the p0 label Nov 21, 2019
@antoninbas
Contributor Author

antoninbas commented Nov 22, 2019

It seems that using OVS in userspace mode also requires explicitly disabling TX offloading on each Pod's eth0 interface. Otherwise, all TCP traffic going through the gateway is dropped (and probably Pod-to-Pod traffic as well). I observed this while working on support for Kind: all the packets going through the gateway were dropped, and Pods could not reach the K8s API server.

Found this link, which is not part of OVS documentation: https://arthurchiao.github.io/blog/ovs-deep-dive-5-datapath-tx-offloading/
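For manual experimentation, the same effect can be approximated from the host with `ethtool -K <iface> tx off`, run inside the Pod's network namespace. The sketch below only builds that command. It assumes root access, `nsenter` (util-linux) and `ethtool` on the host, and that the PID of a process in the Pod is known; the helper name `tx_off_command` and the PID are hypothetical.

```sh
# Sketch: build the command that turns off TX checksum offload on a
# container interface, entering the container's network namespace through
# the PID of one of its processes.
# Usage: tx_off_command <container-pid> [interface]
tx_off_command() {
    pid=$1
    iface=${2:-eth0}
    printf 'nsenter -t %s -n ethtool -K %s tx off\n' "$pid" "$iface"
}

# Prints the command for a made-up PID; run the output as root to apply it.
tx_off_command 12345
```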

@antoninbas
Contributor Author

Found a good reference for the checksum issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685616

The issue here is that OVS netdev datapath doesn't
support TX checksum offloading (this is not easy task with arguable profit).
i.e. if packet arrives with bad/no checksum it will be sent to the output port
with same bad/no checksum. Everything works in case of kernel datapath because
the packet doesn't leave the kernel space. In case of netdev datapath some
information (like CHECKSUM_VALID skb flags) is lost while receiving via
socket in userspace and subsequently kernel expects valid checksum while
receiving the packet from userspace because TX offloading is not enabled.

This kind of issue is usually mitigated by disabling TX offloading on the
"right*" interfaces, or by setting iptables to fill the checksums like this:

iptables -A POSTROUTING -t mangle -p udp -m udp -j CHECKSUM --checksum-fill

Some related OpenStack bug: https://bugs.launchpad.net/neutron/+bug/1244589

Also, note that this happens only for virtual interfaces like veth/tap because
kernel always tries to delay checksum calculation/validation as much as possible.
Correct packets received from the wire will always have correct checksums.
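To check whether an interface is one of the "right" ones still offloading TX checksums, the `tx-checksumming` line of `ethtool -k <iface>` can be inspected. The helper below is a small sketch (the name `tx_checksum_state` is made up for illustration) that reads that output on stdin and prints the state.

```sh
# Sketch: report the tx-checksumming state from `ethtool -k <iface>` output
# read on stdin. Prints "on", "off", or "unknown" if the line is missing.
tx_checksum_state() {
    awk -F': *' '
        $1 ~ /^tx-checksumming$/ { split($2, f, " "); state = f[1] }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Example with captured output; on a real host use:
#   ethtool -k eth0 | tx_checksum_state
printf 'Features for eth0:\nrx-checksumming: on\ntx-checksumming: off\n' | tx_checksum_state
```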

antoninbas added a commit to antoninbas/antrea that referenced this issue Nov 25, 2019
The following changes were required:
 * Disable TX HW checksum offload in containers. This is done in the
   Antrea CNI server when setting-up Pod networking, using an ioctl
   ethtool system call.
 * Disable TX HW checksum offload in the Linux host for the veth
   interface of each Kind Node. This must be done by invoking an
   additional script (hack/kind_linux.sh) after creating the Kind
   cluster.
 * Create a secondary br-phy bridge on each Node, as required by OVS
   userspace tunneling.

Refer to antrea-io#14 for the rationale for all the above bullet points.

A new test "provider" was added to the e2e test framework so that all
the e2e tests can be run on Kind clusters. As part of this, some
changes to the framework had to be performed. For example it is
impractical to run SSH commands on Kind Nodes - as they do not have an
SSH server - so instead we use "docker exec".

Fixes antrea-io#14
Fixes antrea-io#13
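The host-side step in the second bullet can be sketched as follows. This is not the content of hack/kind_linux.sh, only an illustration of one way to do it on a Linux host, under these assumptions: each Kind Node is a Docker container whose eth0 is one end of a veth pair, the `iflink` value read inside the Node equals the `ifindex` of the host-side peer, and `kind`, `docker` and `ethtool` are installed. The function names are hypothetical.

```sh
# Sketch: find a host interface by ifindex. A veth's
# /sys/class/net/<if>/iflink holds the ifindex of its peer, so the host
# interface whose ifindex equals that value is the other end of the pair.
# Usage: iface_by_index <ifindex> [sysfs-net-dir]
iface_by_index() {
    want=$1
    dir=${2:-/sys/class/net}
    for path in "$dir"/*; do
        [ -r "$path/ifindex" ] || continue
        if [ "$(cat "$path/ifindex")" = "$want" ]; then
            basename "$path"
            return 0
        fi
    done
    return 1
}

# Disable TX checksum offload on the host veth of every Node of a Kind
# cluster. Kind Nodes have no SSH server, so "docker exec" is used instead.
# Usage (as root): kind_nodes_tx_off [cluster-name]
kind_nodes_tx_off() {
    cluster=${1:-kind}
    for node in $(kind get nodes --name "$cluster"); do
        peer=$(docker exec "$node" cat /sys/class/net/eth0/iflink) || return 1
        veth=$(iface_by_index "$peer") || return 1
        ethtool -K "$veth" tx off
    done
}
```

Nothing runs until `kind_nodes_tx_off` is called explicitly, for example right after `kind create cluster`.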
antoninbas added a commit to antoninbas/antrea that referenced this issue Nov 27, 2019
The following changes were required:
 * Disable TX HW checksum offload in containers. This is done in the
   Antrea CNI server when setting-up Pod networking, using an ioctl
   ethtool system call.
 * Disable TX HW checksum offload in the Linux host for the veth
   interface of each Kind Node. This must be done by invoking an
   additional script (hack/kind_linux.sh) after creating the Kind
   cluster.
 * Create a secondary br-phy bridge on each Node, as required by OVS
   userspace tunneling.
 * Use a new version of start_ovs (start_ovs_netdev) which modifies the
   ovs-ctl script in-place to avoid loading the kernel module.

Refer to antrea-io#14 for the rationale for all the above bullet points.

A new test "provider" was added to the e2e test framework so that all
the e2e tests can be run on Kind clusters. As part of this, some
changes to the framework had to be performed. For example it is
impractical to run SSH commands on Kind Nodes - as they do not have an
SSH server - so instead we use "docker exec".

Fixes antrea-io#14
Fixes antrea-io#13
antoninbas added a commit that referenced this issue Nov 27, 2019
The following changes were required:
 * Disable TX HW checksum offload in containers. This is done in the
   Antrea CNI server when setting-up Pod networking, using an ioctl
   ethtool system call.
 * Disable TX HW checksum offload in the Linux host for the veth
   interface of each Kind Node. This must be done by invoking an
   additional script (hack/kind_linux.sh) after creating the Kind
   cluster.
 * Create a secondary br-phy bridge on each Node, as required by OVS
   userspace tunneling.
 * Use a new version of start_ovs (start_ovs_netdev) which modifies the
   ovs-ctl script in-place to avoid loading the kernel module.

Refer to #14 for the rationale for all the above bullet points.

A new test "provider" was added to the e2e test framework so that all
the e2e tests can be run on Kind clusters. As part of this, some
changes to the framework had to be performed. For example it is
impractical to run SSH commands on Kind Nodes - as they do not have an
SSH server - so instead we use "docker exec".

Fixes #14
Fixes #13
trozet pushed a commit to trozet/ovn-kubernetes that referenced this issue Mar 11, 2020
This patch enables using netdev mode for OVS. This allows multiple
docker containers hosting different OVS instances to function without
the potential collisions that would occur while sharing the same kernel
data path.

Note, netdev without DPDK is considered "unsupported" officially, and is
something we only want to use for KIND deployments. Therefore the config
option to enable it is hidden, using an environment variable that is not
exposed in the ovn-kubernetes config.

Netdev mode does not support TX checksum offload, therefore it needs to
be disabled on pod veth interfaces as well as veth interfaces attached
from OVS to the host. See:
antrea-io/antrea#14

Co-Authored-by: Andrew Sun <asun@redhat.com>

Signed-off-by: Tim Rozet <trozet@redhat.com>
@trozet

trozet commented Mar 20, 2020

@antoninbas Hi, I've been working on adding similar support to OVN. I wanted to ask specifically why multiple OVS instances in separate containers cannot use the same kernel datapath. If each OVS instance is in its own namespace with its own unique DPID, will there be conflicts in the kernel datapath? Thanks.

@williamtu
Contributor

I think it doesn't work, but in reality I do see people running multiple ovs-vswitchd instances in multiple containers sharing one OVS kernel datapath without any problem. I guess it depends on the use case.

There is a 2015 talk about this that mentions a couple of issues:
https://www.openvswitch.org/support/ovscon2015/17/1555-benc.pdf

@antoninbas
Contributor Author

antoninbas commented Mar 25, 2020

@trozet I believe you can make it work, but I also think that wasn't the best option for the Antrea case:

  • the OVS bridge for each Kind Node needs to have a different name (and I believe so does the host gateway interface), which means that we would have to tinker with some Antrea configuration files to make this happen.
  • I don't know how easy it would be to make it work on macOS (or whether it is even possible with reasonable effort); I doubt that the OVS kernel module is available out-of-the-box in HyperKit.

And as William pointed out, there may be some other issues on top of that. Of course, using the userspace datapath also comes with its own issues :)
