This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

Public ip is pending for a long time #2956

Closed
gball14 opened this issue Mar 24, 2020 · 3 comments
Labels
bug Something isn't working

Comments

@gball14

gball14 commented Mar 24, 2020

Describe the bug

Created a new AKS Engine cluster with 20 Windows VMs. More than 15 hours later, the public IP is still pending.

Steps To Reproduce

We deploy the cluster using SDM, which performs a JSON merge. I don't have the final merged JSON, but I do have a repro cluster.

The public IP is visible in portal.azure.com, but kubectl still shows the public IP as pending. Could this be caused by the rate limiting that is happening? See the kubectl events in the Additional context section.
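
For readers who want to confirm the "pending" state programmatically, here is a minimal sketch rather than anything taken from the cluster in this report. It assumes the Service has been exported with `kubectl get service <name> -o json`. kubectl prints `<pending>` for a `LoadBalancer` Service while `status.loadBalancer.ingress` is empty. The helper names and the sample IP are hypothetical.

```python
import json
from typing import List


def external_addresses(service: dict) -> List[str]:
    """Return the IPs or hostnames recorded under status.loadBalancer.ingress."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    addresses = []
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            addresses.append(address)
    return addresses


def is_pending(service: dict) -> bool:
    """True when a LoadBalancer Service has no address assigned yet."""
    is_lb = service.get("spec", {}).get("type") == "LoadBalancer"
    return is_lb and not external_addresses(service)


# Hypothetical Service objects, trimmed to the fields the check reads.
pending_svc = json.loads(
    '{"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}'
)
ready_svc = json.loads(
    '{"spec": {"type": "LoadBalancer"},'
    ' "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}'
)

print(is_pending(pending_svc))  # True
print(is_pending(ready_svc))    # False
```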

Expected behavior
The public IP is created as soon as the cluster is created.

AKS Engine version
v0.39.0

Kubernetes version
1.15.1

Additional context

See the relevant events caused by rate limiting.

λ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
12s Normal BackOff pod/dms-k6g6z Back-off pulling image "eafdprodregistry.azurecr.io/dms:eafd-5b22721d-v2"
19m Warning Failed pod/dms-k6g6z Failed to pull image "eafdprodregistry.azurecr.io/dms:eafd-5b22721d-v2": rpc error: code = Unknown desc = Error response from daemon: Get https://eafdprodregistry.azurecr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
3m30s Normal EnsuringLoadBalancer service/fdv2 Ensuring load balancer
24m Warning SyncLoadBalancerFailed service/fdv2 Error syncing load balancer: failed to ensure load balancer: [EnsureHostInPool(default/fdv2): backendPoolID(/subscriptions/a58c9773-189b-4f9c-9f38-6c80aa9169de/resourceGroups/eafd-prod-ln02-rg/providers/Microsoft.Network/loadBalancers/eafd-prod-ln02-eafd/backendAddressPools/eafd-prod-ln02-eafd) - failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGet", EnsureHostInPool(default/fdv2): backendPoolID(/subscriptions/a58c9773-189b-4f9c-9f38-6c80aa9169de/resourceGroups/eafd-prod-ln02-rg/providers/Microsoft.Network/loadBalancers/eafd-prod-ln02-eafd/backendAddressPools/eafd-prod-ln02-eafd) - failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGetInstanceView"]
9m1s Warning UpdateLoadBalancerFailed service/fdv2 Error updating load balancer with new hosts map[1353k8s00000000:{} 1353k8s00000001:{} 1353k8s00000002:{} 1353k8s00000003:{} 1353k8s00000004:{} 1353k8s00000005:{} 1353k8s00000006:{} 1353k8s00000007:{} 1353k8s00000008:{} 1353k8s00000009:{} 1353k8s0000000a:{} 1353k8s0000000b:{} 1353k8s0000000c:{} 1353k8s0000000d:{} 1353k8s0000000e:{} 1353k8s0000000f:{} 1353k8s0000000g:{} 1353k8s0000000h:{} 1353k8s0000000i:{} 1353k8s0000000j:{}]: [EnsureHostInPool(default/fdv2): backendPoolID(/subscriptions/a58c9773-189b-4f9c-9f38-6c80aa9169de/resourceGroups/eafd-prod-ln02-rg/providers/Microsoft.Network/loadBalancers/eafd-prod-ln02-eafd/backendAddressPools/eafd-prod-ln02-eafd) - failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGet", EnsureHostInPool(default/fdv2): backendPoolID(/subscriptions/a58c9773-189b-4f9c-9f38-6c80aa9169de/resourceGroups/eafd-prod-ln02-rg/providers/Microsoft.Network/loadBalancers/eafd-prod-ln02-eafd/backendAddressPools/eafd-prod-ln02-eafd) - failed to ensure host in pool: "azure - cloud provider rate limited(read) for operation:VMSSGetInstanceView"]
3m27s Warning ListLoadBalancers service/fdv2 azure - cloud provider rate limited(read) for operation:LBList
97s Warning Unhealthy pod/ln02-fdv2-d8dd889cf-4c6tx Readiness probe failed: Get http://10.240.2.94:80/fdv2/diagnostics.aspx: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
56m Warning Unhealthy pod/ln02-fdv2-d8dd889cf-4c6tx Liveness probe failed: HTTP probe failed with statuscode: 400
6m40s Warning BackOff pod/ln02-fdv2-d8dd889cf-4c6tx Back-off restarting failed container
10m Warning Unhealthy pod/ln02-fdv2-d8dd889cf-b54hr Readiness probe failed: Get http://10.240.2.72:80/fdv2/diagnostics.aspx: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
30m Warning Unhealthy pod/ln02-fdv2-d8dd889cf-b54hr (combined from similar events): Readiness probe failed: Get http://10.240.2.72:80/fdv2/diagnostics.aspx: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
5m32s Warning BackOff pod/ln02-fdv2-d8dd889cf-b54hr Back-off restarting failed container
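
To see at a glance which Azure operations are being throttled, the event text can be scanned for the `rate limited(...) for operation:` pattern that appears in the messages above. This is only a rough sketch: the function name is hypothetical, the `write` alternative in the pattern is an assumption (only `read` appears in this output), and the sample string is shortened from the events shown here.

```python
import re
from collections import Counter

# Matches e.g. 'rate limited(read) for operation:VMSSGet'.
# The "write" alternative is an assumption; only "read" appears in this issue.
RATE_LIMIT_RE = re.compile(r"rate limited\((read|write)\) for operation:(\w+)")


def throttled_operations(event_text: str) -> Counter:
    """Count how often each throttled operation name occurs in the text."""
    return Counter(op for _kind, op in RATE_LIMIT_RE.findall(event_text))


# Shortened excerpts of the event messages above.
sample = (
    'failed to ensure host in pool: "azure - cloud provider rate limited(read) '
    'for operation:VMSSGet"\n'
    'failed to ensure host in pool: "azure - cloud provider rate limited(read) '
    'for operation:VMSSGetInstanceView"\n'
    "azure - cloud provider rate limited(read) for operation:LBList\n"
)

counts = throttled_operations(sample)
for operation, count in sorted(counts.items()):
    print(operation, count)
```

In practice the text would come from saving the `kubectl get events` output to a file and reading it in.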

@gball14 gball14 added the bug Something isn't working label Mar 24, 2020
@welcome

welcome bot commented Mar 24, 2020

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

@CecileRobertMichon
Contributor

This looks like an upstream Kubernetes cloud provider issue in older k8s versions (including 1.15.1) that was fixed in kubernetes/kubernetes#88094.

Can you please try the latest 1.15 k8s version and let us know if that fixes it? 1.15.11 has that fix, and most of the cache fixes in general are available in 1.15.8+.
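
The version guidance above can be turned into a quick check. This is a sketch only, with hypothetical helper names: it encodes just the two thresholds mentioned in this comment (1.15.8 and 1.15.11) and makes no claim about other release lines.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.15.11' or 'v1.15.11' into (1, 15, 11) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


# Thresholds taken from the comment above.
FIX_VERSION = parse_version("1.15.11")
CACHE_FIXES_FROM = parse_version("1.15.8")


def describe(version: str) -> str:
    """Classify a 1.15.x version against the thresholds in the comment."""
    parsed = parse_version(version)
    if parsed[:2] != (1, 15):
        return "outside the 1.15 line, not covered by this comment"
    if parsed >= FIX_VERSION:
        return "has the fix"
    if parsed >= CACHE_FIXES_FROM:
        return "has most of the cache fixes"
    return "predates the fixes"


print(describe("1.15.1"))   # predates the fixes
print(describe("1.15.8"))   # has most of the cache fixes
print(describe("1.15.11"))  # has the fix
```

Comparing integer tuples rather than strings matters here, since a plain string comparison would sort "1.15.11" before "1.15.8".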

@gball14
Author

gball14 commented Apr 7, 2020

We moved to the latest AKS Engine and Kubernetes 1.15.11. The issue no longer reproduces.

@gball14 gball14 closed this as completed Apr 7, 2020