Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Kind support (Linux hosts only) #137

Merged
merged 1 commit into from
Nov 27, 2019

Conversation

antoninbas
Copy link
Contributor

@antoninbas antoninbas commented Nov 25, 2019

The following changes were required:

  • Disable TX HW checksum offload in containers. This is done in the
    Antrea CNI server when setting-up Pod networking, using an ioctl
    ethtool system call.
  • Disable TX HW checksum offload in the Linux host for the veth
    interface of each Kind Node. This must be done by invoking an
    additional script (hack/kind_linux.sh) after creating the Kind
    cluster.
  • Create a secondary br-phy bridge on each Node, as required by OVS
    userspace tunneling.
  • Use a new version of start_ovs (start_ovs_netdev) which modifies the
    ovs-ctl script in-place to avoid loading the kernel module.

Refer to #14 for the rationale for all the above bullet points.

A new test "provider" was added to the e2e test framework so that all
the e2e tests can be run on Kind clusters. As part of this, some
changes to the framework had to be performed. For example it is
impractical to run SSH commands on Kind Nodes - as they do not have an
SSH server - so instead we use "docker exec".

Fixes #14
Fixes #13

@antrea-bot
Copy link
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests. This command can only be run by members of the vmware-tanzu organization
  • /skip-e2e: to skip e2e tests. This command can only be run by members of the vmware-tanzu organization

@@ -66,6 +66,11 @@ func setupInterface(podName string, podNamespace string, ifname string, netns ns
containerIface.Sandbox = netns.Path()
hostIface.Name = hostVeth.Name
hostIface.Mac = hostVeth.HardwareAddr.String()
// TODO: should be conditional on the OVS datapath type
// Waiting for https://github.com/vmware-tanzu/antrea/pull/125 to be merged
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waiting for #125 to be merged and I will update this PR

@antoninbas
Copy link
Contributor Author

/test-e2e

@antoninbas
Copy link
Contributor Author

I'm working on an additional commit for this that will enable a CI job for a Kind cluster.

@antoninbas antoninbas force-pushed the fix-kind-support branch 7 times, most recently from e9e11b4 to eb106dc Compare November 26, 2019 01:41
@antoninbas
Copy link
Contributor Author

The CI job was configured and works properly!

@antoninbas
Copy link
Contributor Author

/test-e2e

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - how could user switch from netdev to kernel datapath or the other way? Do they need some manual cleanup?

build/yamls/patches/kind/startOvs.yml Show resolved Hide resolved
build/images/scripts/start_ovs_netdev Show resolved Hide resolved
build/images/scripts/start_ovs_netdev Show resolved Hide resolved
pkg/agent/util/ethtool.go Outdated Show resolved Hide resolved
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor comments

pkg/agent/util/ethtool.go Outdated Show resolved Hide resolved
pkg/agent/util/ethtool.go Outdated Show resolved Hide resolved
pkg/agent/util/ethtool.go Outdated Show resolved Hide resolved
@antoninbas
Copy link
Contributor Author

I have updated the PR now that #125 has been merged and I have squashed all my commits.

@antoninbas
Copy link
Contributor Author

/test-e2e

jianjuns
jianjuns previously approved these changes Nov 27, 2019
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Could you answer my question - what will happen if we switch between netdev and kernel datapath (not in a Kind env but a normal K8s cluster)? Do we need any manual cleanup before changing the datapath?

@antoninbas
Copy link
Contributor Author

@jianjuns That's a good question. We definitely need to think about what should be cleaned up automatically when the agent Pod exits. At the moment we do not delete the br-int bridge. That means that you cannot change the datapath type without manually deleting the bridge. But if we delete the bridge on exit in start_ovs, then we won't be able to preserve connectivity on restarts / updates (which BTW is a problem we already have since OVS deletes flows from the datapath on exit).

I think ideally we want all of this:

  1. deleting Antrea from a cluster (kubectl delete -f antrea.yml) should clean-up everything created by Antrea on the Nodes
  2. agent Pod restarts (at least planned restarts for upgrades) should not delete OVS flows
  3. on restart, if agent detects that the datapath type has changed, the existing bridge should be deleted and re-created

I can take care of 3 in a follow-up PR, but do you have any good ideas on handling 1 & 2 (assuming OVS can be patched so that flows are not deleted on daemon restart).

@jianjuns
Copy link
Contributor

  1. deleting Antrea from a cluster (kubectl delete -f antrea.yml) should clean-up everything created by Antrea on the Nodes
    This could not be done by kubectl delete -f antrea.yml. We probably need a cleanup DS, or maybe CRD/CLI/ConfigMap to trigger cleanup.

  2. agent Pod restarts (at least planned restarts for upgrades) should not delete OVS flows
    This is the OVS change Ben is helping out?

  3. on restart, if agent detects that the datapath type has changed, the existing bridge should be deleted and re-created
    Actually I asked you this because I did not see you do any thing special for br-int and OVS port (except offload), and so wonder does it mean we do not need to recreate bridge and ports if the datapath type changes?

@antoninbas
Copy link
Contributor Author

  1. No, we need to. But this issue pre-dates this PR and is independent of Kind, so my plan is to do it in a follow-up PR.

@antoninbas
Copy link
Contributor Author

/test-e2e

The following changes were required:
 * Disable TX HW checksum offload in containers. This is done in the
   Antrea CNI server when setting-up Pod networking, using an ioctl
   ethtool system call.
 * Disable TX HW checksum offload in the Linux host for the veth
   interface of each Kind Node. This must be done by invoking an
   additional script (hack/kind_linux.sh) after creating the Kind
   cluster.
 * Create a secondary br-phy bridge on each Node, as required by OVS
   userspace tunneling.
 * Use a new version of start_ovs (start_ovs_netdev) which modifies the
   ovs-ctl script in-place to avoid loading the kernel module.

Refer to antrea-io#14 for the rationale for all the above bullet points.

A new test "provider" was added to the e2e test framework so that all
the e2e tests can be run on Kind clusters. As part of this, some
changes to the framework had to be performed. For example it is
impractical to run SSH commands on Kind Nodes - as they do not have an
SSH server - so instead we use "docker exec".

Fixes antrea-io#14
Fixes antrea-io#13
@antoninbas
Copy link
Contributor Author

/test-e2e

@antoninbas
Copy link
Contributor Author

@jianjuns I need a new approve for this, I renamed kind_linux.sh to kind-linux.sh to match other scripts.

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you go.

@antoninbas antoninbas merged commit 089b2d0 into antrea-io:master Nov 27, 2019
@antoninbas antoninbas deleted the fix-kind-support branch November 27, 2019 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support running Antrea in clusters created with Kind Support running OVS in userspace mode
5 participants