Public ip is pending for a long time #2956

gball14 · 2020-03-24T09:40:07Z

Describe the bug

Created a new AKS cluster (Windows VMs) with 20 VMs. More than 15 hours later, the public IP is still pending.

Steps To Reproduce

We deploy the cluster using SDM, which performs a JSON merge. I don't have the final merged JSON, but I do have a repro cluster.

The public IP is visible in portal.azure.com, but kubectl still shows the public IP as pending. Could this be related to the rate limiting that is happening? See the kubectl events in the Additional context section.
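To confirm which services are still waiting for an address, the output of `kubectl get svc --all-namespaces -o json` can be scanned for LoadBalancer services whose `status.loadBalancer.ingress` is empty. The following is a minimal sketch rather than a tool used in this issue: the function name `pending_load_balancers` is hypothetical, and the sample document is a hand-written stand-in that reuses only the `fdv2` service name from the events.

```python
import json


def pending_load_balancers(service_list: dict) -> list:
    """Return "namespace/name" for LoadBalancer services with no ingress yet.

    `service_list` is the parsed JSON of `kubectl get svc -o json`.
    """
    pending = []
    for svc in service_list.get("items", []):
        if svc.get("spec", {}).get("type") != "LoadBalancer":
            continue  # only LoadBalancer services receive a public IP
        ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress")
        if not ingress:  # missing or empty list means the IP is still pending
            meta = svc.get("metadata", {})
            pending.append(f'{meta.get("namespace", "default")}/{meta.get("name")}')
    return pending


# Hand-written sample (an assumption, not output captured from the cluster).
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "fdv2", "namespace": "default"},
     "spec": {"type": "LoadBalancer"},
     "status": {"loadBalancer": {}}},
    {"metadata": {"name": "kubernetes", "namespace": "default"},
     "spec": {"type": "ClusterIP"},
     "status": {"loadBalancer": {}}}
  ]
}
""")

print(pending_load_balancers(sample))  # ['default/fdv2']
```

In practice the JSON would come from piping the kubectl output into the script instead of the inline sample.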

Expected behavior
Public IP is created as soon as the cluster is created.

AKS Engine version
v0.39.0

Kubernetes version
1.15.1

Additional context

See the relevant events below, which show rate limiting.

λ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
12s Normal BackOff pod/dms-k6g6z Back-off pulling image "eafdprodregistry.azurecr.io/dms:eafd-5b22721d-v2"
19m Warning Failed pod/dms-k6g6z Failed to pull image "eafdprodregistry.azurecr.io/dms:eafd-5b22721d-v2": rpc error: code = Unknown desc = Error response from daemon: Get https://eafdprodregistry.azurecr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
3m30s Normal EnsuringLoadBalancer service/fdv2 Ensuring load balancer
24m Warning SyncLoadBalancerFailed service/fdv2 Error syncing load balancer: failed to ensure load balancer: [EnsureHostInPool(default/fdv2): backendPoolID(/subscriptions/a58c9773-189b-4f9c-9f38-6c80aa9169de/resourceGroups/eafd-prod-ln02-rg/providers/Microsoft.Network/loadBalancers/eafd-prod-ln02-eafd/backendAddressPools/eafd-prod-ln02-eafd) - failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGet", EnsureHostInPool(default/fdv2): backendPoolID(/subscriptions/a58c9773-189b-4f9c-9f38-6c80aa9169de/resourceGroups/eafd-prod-ln02-rg/providers/Microsoft.Network/loadBalancers/eafd-prod-ln02-eafd/backendAddressPools/eafd-prod-ln02-eafd) - failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGetInstanceView"]
9m1s Warning UpdateLoadBalancerFailed service/fdv2 Error updating load balancer with new hosts map[1353k8s00000000:{} 1353k8s00000001:{} 1353k8s00000002:{} 1353k8s00000003:{} 1353k8s00000004:{} 1353k8s00000005:{} 1353k8s00000006:{} 1353k8s00000007:{} 1353k8s00000008:{} 1353k8s00000009:{} 1353k8s0000000a:{} 1353k8s0000000b:{} 1353k8s0000000c:{} 1353k8s0000000d:{} 1353k8s0000000e:{} 1353k8s0000000f:{} 1353k8s0000000g:{} 1353k8s0000000h:{} 1353k8s0000000i:{} 1353k8s0000000j:{}]: [EnsureHostInPool(default/fdv2): backendPoolID(/subscriptions/a58c9773-189b-4f9c-9f38-6c80aa9169de/resourceGroups/eafd-prod-ln02-rg/providers/Microsoft.Network/loadBalancers/eafd-prod-ln02-eafd/backendAddressPools/eafd-prod-ln02-eafd) - failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGet", EnsureHostInPool(default/fdv2): backendPoolID(/subscriptions/a58c9773-189b-4f9c-9f38-6c80aa9169de/resourceGroups/eafd-prod-ln02-rg/providers/Microsoft.Network/loadBalancers/eafd-prod-ln02-eafd/backendAddressPools/eafd-prod-ln02-eafd) - failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGetInstanceView"]
3m27s Warning ListLoadBalancers service/fdv2 azure - cloud provider rate limited(read) for operation:LBList
97s Warning Unhealthy pod/ln02-fdv2-d8dd889cf-4c6tx Readiness probe failed: Get http://10.240.2.94:80/fdv2/diagnostics.aspx: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
56m Warning Unhealthy pod/ln02-fdv2-d8dd889cf-4c6tx Liveness probe failed: HTTP probe failed with statuscode: 400
6m40s Warning BackOff pod/ln02-fdv2-d8dd889cf-4c6tx Back-off restarting failed container
10m Warning Unhealthy pod/ln02-fdv2-d8dd889cf-b54hr Readiness probe failed: Get http://10.240.2.72:80/fdv2/diagnostics.aspx: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
30m Warning Unhealthy pod/ln02-fdv2-d8dd889cf-b54hr (combined from similar events): Readiness probe failed: Get http://10.240.2.72:80/fdv2/diagnostics.aspx: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
5m32s Warning BackOff pod/ln02-fdv2-d8dd889cf-b54hr Back-off restarting failed container
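The rate-limited operations in the events above can be tallied with a short script, which makes it easier to see which Azure API calls are being throttled. This is a sketch under assumptions: the function name `rate_limited_operations` is hypothetical, the regular expression is based only on the message wording shown in these events, and the sample messages are shortened excerpts of the lines above.

```python
import re
from collections import Counter

# Matches fragments such as: rate limited(read) for operation:VMSSGet
RATE_LIMIT_RE = re.compile(r"rate limited\((\w+)\) for operation:(\w+)")


def rate_limited_operations(messages) -> Counter:
    """Count (kind, operation) pairs found in event message strings."""
    counts = Counter()
    for message in messages:
        for kind, operation in RATE_LIMIT_RE.findall(message):
            counts[(kind, operation)] += 1
    return counts


# Shortened excerpts of the event messages listed above.
messages = [
    'failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGet"',
    'failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGetInstanceView"',
    "azure - cloud provider rate limited(read) for operation:LBList",
]

for (kind, operation), count in sorted(rate_limited_operations(messages).items()):
    print(f"{operation} ({kind}): {count}")
```

All three operations seen here (`VMSSGet`, `VMSSGetInstanceView`, `LBList`) are read calls.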

welcome · 2020-03-24T09:40:09Z

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

CecileRobertMichon · 2020-03-25T19:39:36Z

This looks like an upstream Kubernetes cloud provider issue in older k8s versions (including 1.15.1) that was fixed in kubernetes/kubernetes#88094.

Can you please try to use the latest 1.15 k8s version and let us know if that fixes it? 1.15.11 has that fix and most of the cache fixes in general are available in 1.15.8+.
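When checking a cluster against these version thresholds, note that 1.15.11 is newer than 1.15.8 even though a plain string comparison says otherwise. A minimal sketch of a numeric comparison follows; the helper names `parse_version` and `at_least` are hypothetical, and it assumes simple `major.minor.patch` strings with an optional leading `v`.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.15.11' or 'v1.15.11' into (1, 15, 11) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def at_least(version: str, minimum: str) -> bool:
    """True if `version` is the same as or newer than `minimum`."""
    return parse_version(version) >= parse_version(minimum)


print(at_least("1.15.1", "1.15.8"))    # False: predates most of the cache fixes
print(at_least("1.15.11", "1.15.8"))   # True
print(at_least("1.15.11", "1.15.11"))  # True: includes the fix mentioned above
print("1.15.11" >= "1.15.8")           # False: string comparison gets this wrong
```

The thresholds apply only to the 1.15 line discussed here; other minor versions would need their own minimum patch releases.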

gball14 · 2020-04-07T20:10:48Z

We moved to the latest AKS Engine and Kubernetes 1.15.11. The issue no longer reproduces.

gball14 added the bug label Mar 24, 2020

gball14 closed this as completed Apr 7, 2020
