Support running Antrea in clusters created with Kind #14
Comments
Tentative support in #32.
With the current support, the Antrea components (agent, controller) can come up. However, there is no connectivity between Nodes. At this time, I believe that this is because using VXLAN tunnels in OVS userspace mode requires some special configuration: http://docs.openvswitch.org/en/latest/howto/userspace-tunneling/. I will work on this.
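For reference, the userspace tunneling how-to linked above comes down to a handful of `ovs-vsctl` and `ip` commands. The Go sketch below only prints them (a dry run) and is not Antrea's actual setup code. The uplink name `eth0`, the address `172.18.0.2/16`, and the helper names are assumptions made for illustration; `br-phy` is the bridge name used in the OVS how-to.

```go
package main

import (
	"fmt"
	"strings"
)

// command is a program name plus its arguments, in the shape os/exec expects.
type command struct {
	name string
	args []string
}

// String renders the command as a single shell-like line for display.
func (c command) String() string {
	return c.name + " " + strings.Join(c.args, " ")
}

// userspaceTunnelCommands (hypothetical helper) lists the steps that create a
// userspace (netdev) bridge, attach the uplink to it, and move the uplink's
// IP address onto the bridge so that it can act as the tunnel endpoint.
func userspaceTunnelCommands(bridge, uplink, cidr string) []command {
	return []command{
		// Create the bridge with the userspace datapath.
		{"ovs-vsctl", []string{"--may-exist", "add-br", bridge, "--",
			"set", "Bridge", bridge, "datapath_type=netdev"}},
		// Attach the physical (here: veth) uplink to the bridge.
		{"ovs-vsctl", []string{"--may-exist", "add-port", bridge, uplink}},
		// Move the IP address from the uplink to the bridge. Routes that
		// depended on the uplink address typically need to be re-added.
		{"ip", []string{"addr", "flush", "dev", uplink}},
		{"ip", []string{"addr", "add", cidr, "dev", bridge}},
		{"ip", []string{"link", "set", bridge, "up"}},
	}
}

func main() {
	// Dry run: print the commands instead of executing them.
	for _, c := range userspaceTunnelCommands("br-phy", "eth0", "172.18.0.2/16") {
		fmt.Println(c)
	}
}
```

Printing rather than executing keeps the sketch safe to run anywhere; each entry could be passed to `exec.Command(c.name, c.args...)` inside a Node once the values have been checked against the real environment.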
It seems that using OVS in userspace mode also requires explicitly disabling TX offloading on each Pod's eth0 interface. Otherwise all TCP traffic going through the gateway is dropped (and probably Pod-to-Pod traffic as well). I observed this while working on support for Kind: all the packets going through the gateway were dropped and Pods couldn't reach the K8s API server. I found this link, which is not part of the OVS documentation: https://arthurchiao.github.io/blog/ovs-deep-dive-5-datapath-tx-offloading/
Found a good reference for the checksum issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685616
The following changes were required:
* Disable TX HW checksum offload in containers. This is done in the Antrea CNI server when setting-up Pod networking, using an ioctl ethtool system call.
* Disable TX HW checksum offload in the Linux host for the veth interface of each Kind Node. This must be done by invoking an additional script (hack/kind_linux.sh) after creating the Kind cluster.
* Create a secondary br-phy bridge on each Node, as required by OVS userspace tunneling.

Refer to antrea-io#14 for the rationale for all the above bullet points.

A new test "provider" was added to the e2e test framework so that all the e2e tests can be run on Kind clusters. As part of this, some changes to the framework had to be performed. For example it is impractical to run SSH commands on Kind Nodes - as they do not have an SSH server - so instead we use "docker exec".

Fixes antrea-io#14
Fixes antrea-io#13
The following changes were required:
* Disable TX HW checksum offload in containers. This is done in the Antrea CNI server when setting-up Pod networking, using an ioctl ethtool system call.
* Disable TX HW checksum offload in the Linux host for the veth interface of each Kind Node. This must be done by invoking an additional script (hack/kind_linux.sh) after creating the Kind cluster.
* Create a secondary br-phy bridge on each Node, as required by OVS userspace tunneling.
* Use a new version of start_ovs (start_ovs_netdev) which modifies the ovs-ctl script in-place to avoid loading the kernel module.

Refer to antrea-io#14 for the rationale for all the above bullet points.

A new test "provider" was added to the e2e test framework so that all the e2e tests can be run on Kind clusters. As part of this, some changes to the framework had to be performed. For example it is impractical to run SSH commands on Kind Nodes - as they do not have an SSH server - so instead we use "docker exec".

Fixes antrea-io#14
Fixes antrea-io#13
This patch enables using netdev mode for OVS. This allows multiple docker containers hosting different OVS instances to function without the potential collisions that would occur while sharing the same kernel data path.

Note, netdev without DPDK is considered "unsupported" officially, and is something we only want to use for KIND deployments. Therefore the config option to enable it is hidden, using an environment variable that is not exposed in the ovn-kubernetes config.

Netdev mode does not support TX checksum offload, therefore it needs to be disabled on pod veth interfaces as well as veth interfaces attached from OVS to the host. See: antrea-io/antrea#14

Co-Authored-by: Andrew Sun <asun@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
@antoninbas Hi, I've been working on adding similar support into OVN. I wanted to ask specifically why multiple OVS instances in separate containers cannot use the same kernel datapath. If each OVS is in its own namespace with its own unique DPID, will there be conflicts in the kernel datapath? Thanks.
I think it doesn't work, but in reality, I do see people running multiple ovs-vswitchd in multiple … There is a talk about this from 2015 that mentions a couple of issues.
@trozet I believe you can make it work, but I also think that wasn't the best option for the Antrea case:
And as William pointed out, there may be some other issues on top of that. Of course, using the userspace datapath also comes with its own issues :)
Depends on #13.
This could be very convenient for CI and enable us to run e2e tests as part of a public CI service.