setting internal loadBalancerIP does not work #422

Closed
theMichaelB opened this issue Jun 11, 2018 · 28 comments

@theMichaelB

Creating the Service below with loadBalancerIP set fails with:

Error creating load balancer (will retry): failed to ensure load balancer for service default/mongodb-service: timed out waiting for the condition

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
  name: mongodb-service
  labels:
    name: mongo
spec:
  type: LoadBalancer
  loadBalancerIP: 10.16.0.250
  ports:
  - port: 27017
  selector:
    role: mongo

With the loadBalancerIP line removed, the Service is created as expected.

@sauryadas
Contributor

When was this cluster created? Which region? Basic or advanced networking?

@theMichaelB
Author

a few hours ago, centralus, advanced networking

@sauryadas
Contributor

@JunSun17 I believe there was a fix for this issue. Can you confirm?

@JunSun17

@theMichaelB Can you remove the loadBalancerIP field from your template file? I think the IP will be assigned from the subnet you provided, but I'm not sure you can specify it.

@theMichaelB
Author

@JunSun17 Yes, if I remove it everything works, but being able to specify the IP means we can map it via DNS etc. (I know I can script it, but laziness!)

It is also documented at https://docs.microsoft.com/en-my/azure/aks/internal-lb

Is it possible to specify which subnet the LB is deployed to?

@JunSun17

@theMichaelB Currently AKS only takes one subnet, so you do not need to provide it. Actually, I do not know whether you can specify the IP address when creating an ILB. If so, you will need to make sure that:

  1. The IP is from the subnet.
  2. The IP is not already assigned to a node or pod. You can try a high IP from the subnet address range, since IPs are allocated from the low end of the range. But in general I feel it is still risky, since there might be IP conflicts (depending on the Azure CNI implementation, which I do not know much about).
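
The first check in the list above can be done locally before applying the Service. What follows is only a minimal sketch in plain shell, not anything AKS provides: the helper names ip_to_int and ip_in_cidr are made up for illustration, the 10.16.0.0/16 range in the usage comment is an assumed subnet (the thread only gives the IP 10.16.0.250), and there is no input validation. It does not cover the second check, since whether an address is already taken can only be answered by Azure.

```shell
# ip_to_int (hypothetical helper): convert a dotted-quad IPv4 address
# to its 32-bit integer value.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr IP CIDR (hypothetical helper): exit 0 if IP lies inside CIDR.
ip_in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")   # network part, before the slash
    bits=${2#*/}                 # prefix length, after the slash
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Usage (the subnet range is an assumption):
#   ip_in_cidr 10.16.0.250 10.16.0.0/16 && echo "inside subnet"
```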

@theMichaelB
Author

@JunSun17 What would it take to get a definitive answer for this? Is the code for this going to be in ACS-Engine, or is it a custom AKS thing?

If it isn't possible then it isn't possible (and I'll go and update the above doc).

@sauryadas
Contributor

sauryadas commented Jun 12, 2018 via email

@theMichaelB
Author

theMichaelB commented Jun 12, 2018

@sauryadas The specified IP does reside in the same subnet, and isn't already assigned to a resource.

I have also granted the service principal access to the VNet that the subnet is in. (It is in a different resource group. I wonder if that is the issue?)
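
For a VNet that lives in a different resource group, the role assignment can be scoped to the VNet's resource ID rather than to a resource group. A rough sketch under stated assumptions: vnet_scope is a hypothetical helper, and the subscription ID, resource group and VNet name are placeholders that do not appear in this thread. The ID layout is the standard Azure resource ID form, and `--assignee`, `--role` and `--scope` are existing `az role assignment create` options.

```shell
# vnet_scope SUBSCRIPTION_ID RESOURCE_GROUP VNET_NAME (hypothetical helper):
# print the Azure resource ID of a virtual network, for use as a role scope.
vnet_scope() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s\n' \
        "$1" "$2" "$3"
}

# Usage (all values are placeholders; requires the az CLI and a logged-in session):
#   az role assignment create \
#       --assignee "$CLIENT_ID" \
#       --role "Network Contributor" \
#       --scope "$(vnet_scope "$SUBSCRIPTION_ID" my-network-rg my-vnet)"
```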

@VincentSurelle

VincentSurelle commented Jun 13, 2018

Hi, I can relate to this issue.

I built an AKS Cluster with basic networking.

When specifying a loadBalancerIP:

  • First time: worked well, got an "External Endpoint" in the Kubernetes Dashboard
  • Every other time: Error creating load balancer (will retry): failed to ensure load balancer for service default/mongodb-service: timed out waiting for the condition

Region: West Europe

EDIT: Built an hour ago

@VincentSurelle

More information:

I can create as many load balancers as I want, but they all need to have different IPs.
Same IP with the same port obviously doesn't work.
Same IP with a different port doesn't work either.

@sauryadas
Contributor

I think this is an Azure Load Balancer limitation. @aanandr Can you please confirm the below?
I can create as many load balancers as I want, but they all need to have different IPs.
Same IP with the same port obviously doesn't work.
Same IP with a different port doesn't work either.

@JunSun17 The below should work. Can you please take a look?

the specified IP does reside in the same subnet, and isn't already assigned to a resource.

@vyta

vyta commented Jun 20, 2018

@sauryadas @JunSun17 I am also experiencing this. Any updates?

@marrobi
Contributor

marrobi commented Jul 5, 2018

@sauryadas @aanandr I'm getting a similar issue on AKS with Kubernetes v1.10.3. This is not an internal IP: I'm specifying a public IP for the load balancer, and the IP is in a different RG. This did work earlier with the same k8s version, but this was a fresh cluster. I could do with a more detailed error about "the condition".

  Type     Reason                      Age              From                Message
  ----     ------                      ----             ----                -------
  Normal   EnsuringLoadBalancer        3m (x7 over 9m)  service-controller  Ensuring load balancer
  Warning  CreatingLoadBalancerFailed  3m (x7 over 9m)  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service default/my-service: timed out waiting for the condition

I've tried deleting and recreating the Service.

@marrobi
Contributor

marrobi commented Jul 5, 2018

Just tried with a brand new cluster and a new IP address: same issue.

If I create a service with no IP specified the LoadBalancer gets created.

If I add the IP back, I get:

  Type    Reason                Age              From                Message
  ----    ------                ----             ----                -------
  Normal  EnsuredLoadBalancer   4m (x2 over 4m)  service-controller  Ensured load balancer
  Normal  EnsuringLoadBalancer  4m (x3 over 7m)  service-controller  Ensuring load balancer
  Normal  LoadbalancerIP        4m               service-controller  -> 13.73.148.26
  Normal  EnsuringLoadBalancer  3m               service-controller  Ensuring load balancer
  Normal  EnsuringLoadBalancer  2m               service-controller  Ensuring load balancer
  Normal  EnsuringLoadBalancer  1m               service-controller  Ensuring load balancer
  Normal  EnsuringLoadBalancer  38s              service-controller  Ensuring load balancer

Any way I can get more logs?
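
One option that needs no extra access is to query the event list directly and keep only warnings for the service, instead of reading them out of describe. This is a sketch, not a guaranteed source of more detail: the events may say no more than describe already shows. The helper name warning_events_selector is made up for illustration; `involvedObject.name` and `type` are field selectors that events support.

```shell
# warning_events_selector NAME (hypothetical helper): print a field selector
# that matches Warning events recorded against the object called NAME.
warning_events_selector() {
    printf 'involvedObject.name=%s,type=Warning\n' "$1"
}

# Usage against a live cluster (requires kubectl; service name from this thread):
#   kubectl get events --namespace default \
#       --field-selector "$(warning_events_selector my-service)" \
#       --sort-by=.lastTimestamp
```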

@JunSun17

JunSun17 commented Jul 5, 2018

@marrobi @vyta I will take a look at this reported issue and get back to you. Thanks!

@JunSun17

JunSun17 commented Jul 6, 2018

@marrobi @vyta I checked and cannot reproduce this issue. With or without an IP specified in the LB YAML, both work for me without issues.

Going just by the timeout error message: is your backend pod running fine? Can you also try deleting the ILB service and recreating it to see if that works?

@marrobi
Contributor

marrobi commented Jul 6, 2018

I've tried to reproduce this morning a number of different ways - everything fresh - and all works as expected. Will update if I see it again, but could do with some guidance on getting additional logs should it occur.

@marrobi
Contributor

marrobi commented Jul 6, 2018

Something isn't right. I have no Load Balancer in the resource group. When I deploy the exact same YAML, the LoadBalancer service is now stuck in pending:

$ kubectl get service
NAME         TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP      10.0.0.1     <none>        443/TCP        22m
my-service   LoadBalancer   10.0.14.27   <pending>     80:31209/TCP   12m

And there are no events if I do a describe:

$ kubectl describe service my-service
Name:                     my-service
Namespace:                default
Labels:                   <none>
Annotations:              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"service.beta.kubernetes.io/azure-load-balancer-resource-group":"tmpAKS2"},"name":"my-se...
                          service.beta.kubernetes.io/azure-load-balancer-resource-group=tmpAKS2
Selector:                 app=my-app
Type:                     LoadBalancer
IP:                       10.0.14.27
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  31209/TCP
Endpoints:                10.244.0.5:80,10.244.1.5:80,10.244.2.6:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

It seems that if I deploy a service (the first one on a cluster), delete it, then recreate it, the LB doesn't get created. This is similar to what @VincentSurelle describes. I will try to verify further.
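
A quick way to spot services stuck like the one above is to filter the `kubectl get service` table for LoadBalancer rows whose EXTERNAL-IP is still `<pending>`. This is a small sketch only: pending_lb_services is a hypothetical helper, and it assumes the default column layout shown above (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP) for a single namespace.

```shell
# pending_lb_services (hypothetical helper): read `kubectl get service` output
# on stdin and print the name of every LoadBalancer service whose EXTERNAL-IP
# column is still <pending>. The header row is skipped.
pending_lb_services() {
    awk 'NR > 1 && $2 == "LoadBalancer" && $4 == "<pending>" { print $1 }'
}

# Usage against a live cluster (requires kubectl):
#   kubectl get service | pending_lb_services
```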

Happy for you to ping me offline to walk through it. My GitHub name is my MS alias. Thanks.

@marrobi
Contributor

marrobi commented Jul 6, 2018

I've narrowed down the situation in which I get the issue and can reproduce it. It's something specific around RBAC and SPs, so it might not be the cause in all the situations above.

If the SP initially doesn't have the correct rights on the VNets/RGs, LB creation fails and you get a semi-useful error in events. This is as expected.

If I then correct the SP assignment, delete and recreate the service, I get the "timed out waiting for the condition" error. I can't seem to recover from this. If I create a new cluster and assign the correct SP rights prior to deploying the service, all works fine.

Here's an example, using an IP in a different RG, but I expect it could be the same for other SP-related issues.

YAML:


kind: Service
apiVersion: v1
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-resource-group: $RG
spec:
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  loadBalancerIP: $PUBLIC_IP
  type: LoadBalancer

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

Steps:

export RG="MyRG"

az group create -n $RG -l westeurope

az aks create -g $RG -n tmpAKS --kubernetes-version 1.10.3 --node-count 1

CLIENT_ID=$(az aks show -n tmpAKS -g $RG --query servicePrincipalProfile.clientId -o tsv)

# ASSIGN INSUFFICIENT RIGHTS - Reader
az role assignment create --role "Reader" --assignee $CLIENT_ID --resource-group $RG

az network public-ip create -g $RG -n service-ip --dns-name demo-service-ip-2  --allocation-method Static
export PUBLIC_IP=$(az network public-ip show -n service-ip -g $RG --query ipAddress -o tsv)

az aks get-credentials -g $RG -n tmpAKS

envsubst < service-ip-rg.yaml | kubectl apply -f -

kubectl describe service my-service

# THIS DOESN'T WORK, SO I DELETE 

envsubst < service-ip-rg.yaml | kubectl delete -f -

# ASSIGN SUFFICIENT RIGHTS - Network Contributor

az role assignment create --role "Network Contributor" --assignee $CLIENT_ID --resource-group $RG

# REDEPLOY

envsubst < service-ip-rg.yaml | kubectl apply -f -

kubectl describe service my-service

Still doesn't work. I guess a session needs to be recreated somewhere, given there is a new SP assignment? I might be completely wrong...

@JunSun17

JunSun17 commented Jul 6, 2018

@marrobi Thanks for the detailed update!

Why do you run:
az role assignment create --role "Reader" --assignee $CLIENT_ID --resource-group $RG

I think by default the SP should have the Owner role on the RG. Do you have to change role assignments here?

@marrobi
Contributor

marrobi commented Jul 6, 2018 via email

@lfshr

lfshr commented Sep 26, 2018

I'm hitting this issue as well. The load balancer is created fine when loadBalancerIP is not specified, but when it is specified, the EXTERNAL-IP is always "pending".

@feiskyer
Member

feiskyer commented Sep 26, 2018

@lfshr Please check the kube-controller-manager logs to find out what's wrong. Most likely the IP is not in the same resource group as the Kubernetes nodes.

Refer to https://docs.microsoft.com/en-us/azure/aks/view-master-logs for a guide on how to do this.

@lfshr

lfshr commented Sep 26, 2018

Yep, I was being stupid. Thanks @feiskyer

@millergd

@lfshr It's a long shot, but do you remember what the issue was? I'm in the same position: pending with a specific external IP, but created fine without it. Ideally, I'd like to point to the hostname of a specific ELB. Unfortunately I am using AWS, which hasn't implemented master logs yet, so I can't see the error...

@jnoller jnoller added the triage label Apr 3, 2019
@jnoller
Contributor

jnoller commented Apr 3, 2019

Closing issue as stale.

@jnoller jnoller closed this as completed Apr 3, 2019
@pankajagrawal16

I am also facing the same issue.
I also generated a private load balancer IP, and that was successful.

Then I deleted my Helm release and tried installing it again, and since then it's stuck with the "timed out waiting" error.

@Azure Azure locked as resolved and limited conversation to collaborators Aug 7, 2020