antrea-agent failed to start on windows #2013

Closed
anfernee opened this issue Mar 30, 2021 · 15 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.


@anfernee
Contributor

Describe the bug
antrea-agent failed to start on Windows with the following error:

F0329 23:35:27.090421    3936 main.go:58] Error running agent: error initializing agent: failed during hnsCallRawResponse: hnsCall failed in Win32: The parameter is incorrect. (0x57)

One difference from the Windows guide is that I ran antrea-agent and kube-proxy as Windows services directly in the root namespace, not as DaemonSet Pods.

To Reproduce
I am trying to add Antrea support to the Kubernetes kube-up script, which is how I found this problem. The work is still in progress (https://github.com/anfernee/kubernetes/tree/antrea-kubeup), but it shows how I ran into the issue.

Expected behavior
antrea-agent starts without crashing.

Actual behavior
antrea-agent crashes at startup with the error above.

Versions:
  • Antrea version (Docker image tag): v0.13.1
  • Kubernetes version (kubectl version): v1.21.0 (unreleased, built from master)
  • Container runtime: Docker
  • Windows OS build on the Kubernetes Nodes: 10.0.17763.1697 and 10.0.17763.1757


@anfernee anfernee added the kind/bug Categorizes issue or PR as related to a bug. label Mar 30, 2021
@antoninbas
Contributor

@anfernee, are you aware of this PR, which updates the documentation for running Antrea as a Windows service: https://github.com/vmware-tanzu/antrea/pull/1874/files? It may help.

@lzhecheng could you help triage the issue?

@anfernee
Contributor Author

@antoninbas Thanks for the doc. I was not aware of the change, but I did essentially the same thing as described in the doc.

@antoninbas
Contributor

@anfernee Thanks. I am hoping that @lzhecheng can reproduce the issue.

@lzhecheng
Contributor

I will look into it.

@lzhecheng
Contributor

lzhecheng commented Mar 31, 2021

@anfernee I searched for this error, and it appears to be a known issue:
moby/moby#40621
moby/moby#40998

Newer OS builds may introduce this issue. Could you first try downgrading the OS or uninstalling the latest patch on the current VM? See:
moby/moby#40998 (comment)
moby/moby#40998 (comment)

My Windows version is 17763.1457, slightly older than yours.

@anfernee
Contributor Author

It is concerning that a newer OS image could introduce this issue. In other issues I've seen, Microsoft engineers recommended installing the latest patches to fix them, so this is a moving target.

I think we should provide a support matrix for Windows, or a recommended Windows version plus a list of patches to install (or avoid), so users know in advance what to choose and what to expect. Right now, it's a coin toss.

@anfernee
Contributor Author

This seems very OS-version dependent. Can you help address this issue on newer Windows builds?

@lzhecheng
Contributor

@anfernee yes, I will try.

@anfernee
Contributor Author

anfernee commented Apr 1, 2021

Thanks, @lzhecheng. kubernetes/kubernetes#100736 should help reproduce the issue.

@lzhecheng
Contributor

@anfernee The cause is a Docker network named "l2bridge" on the Node, created by the Google Cloud init script. Because that network already exists, the request to create the Antrea HNS network is rejected. If the Docker network is deleted, antrea-agent should start successfully.
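The root cause above suggests a pre-flight check that a node setup script could run before creating its own HNS network, so that the opaque 0x57 error becomes an actionable message. The sketch below is a minimal illustration under stated assumptions: existing networks are given as records with "Name" and "Type" keys (loosely modeled on what PowerShell's Get-HnsNetwork prints), the name "antrea-hnsnetwork" is hypothetical, and the rule "any other L2Bridge network conflicts" is an assumption drawn from this thread, not Antrea's actual logic.

```python
from typing import Dict, List

# Hypothetical name for the agent's own HNS network; the real name may differ.
OWN_NETWORK_NAME = "antrea-hnsnetwork"


def conflicting_networks(networks: List[Dict[str, str]],
                         own_name: str = OWN_NETWORK_NAME) -> List[str]:
    """Return the names of pre-existing L2Bridge networks other than our own.

    Each record is assumed to carry "Name" and "Type" keys, loosely modeled
    on the fields PowerShell's Get-HnsNetwork prints.
    """
    return [
        net["Name"]
        for net in networks
        if net.get("Type", "").lower() == "l2bridge" and net.get("Name") != own_name
    ]


def preflight(networks: List[Dict[str, str]]) -> None:
    """Fail early with an actionable message instead of the opaque HNS error."""
    conflicts = conflicting_networks(networks)
    if conflicts:
        raise RuntimeError(
            "found existing L2Bridge network(s) %s; HNS may reject a new network "
            "with 'The parameter is incorrect. (0x57)', so remove them first"
            % ", ".join(conflicts)
        )


if __name__ == "__main__":
    existing = [
        {"Name": "nat", "Type": "NAT"},
        {"Name": "l2bridge", "Type": "L2Bridge"},  # e.g. left by a cloud init script
    ]
    print(conflicting_networks(existing))  # ['l2bridge']
```

Whether a given pre-existing network really conflicts depends on the host configuration, so the L2Bridge rule here should be treated as a heuristic rather than a definitive check.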

@antoninbas
Contributor

Is this issue resolved? It seems that the only action item for the Antrea community is to improve the error log: #2213

@lzhecheng
Contributor

@tnqn is also reviewing the GKE initialization script to avoid creating that HNS network before the Antrea HNS network. @tnqn, what's the progress?

@tnqn
Member

tnqn commented Jun 3, 2021

I think we can resolve this one and track the error log improvement in #2213.

@lzhecheng
Contributor

Closing this issue; #2213 will track the error log improvement.

@anfernee
Contributor Author

anfernee commented Jun 3, 2021

Makes sense. This can be closed.

4 participants