Closed
Description
Is this a request for help?:
No, issue is under investigation already
Is this an ISSUE or FEATURE REQUEST? (choose one):
Issue
Which release version?:
1.0.14
Which component (CNI/IPAM/CNM/CNS):
CNI
Which Operating System (Linux/Windows):
Windows Server 2019
Which Orchestrator and version (e.g. Kubernetes, Docker):
Kubernetes - reported on v1.13.1
What happened:
This was originally reported in Azure/aks-engine#168.
After a server reboots, there is a race condition: pods may try to start while the Windows Host Networking Service (HNS) is still missing the virtual network. The network needs to be recreated before the first call to create a network adapter (endpoint) is made.
This causes an error in Kubernetes such as:
2018/12/11 15:28:52 [net] Creating endpoint &{Id:8f996d65-eth0 ContainerID:8f996d650e31f21195c99d9affac4efb489adb5cf124485ffed4c41f4c96c535 NetNsPath:none IfName:eth0 SandboxKey: IfIndex:0 MacAddress: DNS:{Suffix:ns1.svc.cluster.local Servers:[10.0.0.10 168.63.129.16]} IPAddresses:[{IP:10.240.1.154 Mask:fff00000}] InfraVnetIP:{IP:<nil> Mask:<nil>} Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} Gw:10.240.0.1 DevName:}] Policies:[{Type:EndpointPolicy Data:[123 34 69 120 99 101 112 116 105 111 110 76 105 115 116 34 58 91 34 49 48 46 50 52 48 46 48 46 48 47 49 50 34 44 34 49 48 46 50 52 48 46 48 46 48 47 49 50 34 93 44 34 84 121 112 101 34 58 34 79 117 116 66 111 117 110 100 78 65 84 34 125]} {Type:EndpointPolicy Data:[123 34 68 101 115 116 105 110 97 116 105 111 110 80 114 101 102 105 120 34 58 34 49 48 46 48 46 48 46 48 47 49 54 34 44 34 78 101 101 100 69 110 99 97 112 34 58 116 114 117 101 44 34 84 121 112 101 34 58 34 82 79 85 84 69 34 125]}] Gateways:[] EnableSnatOnHost:false EnableInfraVnet:false EnableMultiTenancy:false PODName:win-busybox PODNameSpace:ns1 Data:map[] InfraVnetAddressSpace:} in network azure.
2018/12/11 15:28:52 [net] HNSEndpointRequest POST request:{"Name":"8f996d65-eth0","VirtualNetwork":"DE928E0D-279B-48C7-93F1-3F737AB3CE1C","Policies":[{"Type":"OutBoundNAT","ExceptionList":["10.240.0.0/12","10.240.0.0/12"]},{"DestinationPrefix":"10.0.0.0/16","NeedEncap":true,"Type":"ROUTE"}],"IPAddress":"10.240.1.154","DNSSuffix":"ns1.svc.cluster.local","DNSServerList":"10.0.0.10,168.63.129.16","PrefixLength":12}
2018/12/11 15:28:52 [net] HNSEndpointRequest POST response:<nil> err:hnsCall failed in Win32: Element not found. (0x490).
2018/12/11 15:28:52 [net] Failed to create endpoint 8f996d65-eth0, err:hnsCall failed in Win32: Element not found. (0x490).
2018/12/11 15:28:52 [azure-vnet] Failed to create endpoint: hnsCall failed in Win32: Element not found. (0x490).
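The `Policies` entries in the first log line are printed as raw byte slices, which makes them hard to compare with the JSON in the `HNSEndpointRequest POST request` line. The values are plain ASCII, so they can be turned back into text. Below is a minimal sketch of that; `decodeLoggedBytes` is a hypothetical helper written for this issue and is not part of the CNI plugin.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeLoggedBytes converts a byte slice as printed by Go's %v/%+v verbs,
// e.g. "[123 34 125]", back into the string it encodes.
// Hypothetical helper for reading the log above; not part of the plugin.
func decodeLoggedBytes(logged string) (string, error) {
	fields := strings.Fields(strings.Trim(logged, "[]"))
	out := make([]byte, 0, len(fields))
	for _, f := range fields {
		n, err := strconv.ParseUint(f, 10, 8) // each field must fit in one byte
		if err != nil {
			return "", fmt.Errorf("invalid byte %q: %w", f, err)
		}
		out = append(out, byte(n))
	}
	return string(out), nil
}

func main() {
	// Data of the second EndpointPolicy, copied from the log line above.
	const routePolicy = "[123 34 68 101 115 116 105 110 97 116 105 111 110 80 114 101 102 105 120 34 58 34 49 48 46 48 46 48 46 48 47 49 54 34 44 34 78 101 101 100 69 110 99 97 112 34 58 116 114 117 101 44 34 84 121 112 101 34 58 34 82 79 85 84 69 34 125]"

	decoded, err := decodeLoggedBytes(routePolicy)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(decoded)
}
```

The decoded text is the same ROUTE policy object that appears in the POST request, so the request was built as intended and the failure comes from HNS not finding the referenced network.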
What you expected to happen:
No failures
How to reproduce it (as minimally and precisely as possible):
- Set up a Windows cluster with the latest AKS-Engine release (https://aka.ms/windowscontainers/kubernetes has a step-by-step guide)
- Run a pod (also covered in the guide above)
- Reboot the Windows nodes
- Check whether the pod is restarted or gets stuck in Creating state after the reboot
- Reboot again if needed
Anything else we need to know:
cc @ashvindeodhar