Permission issue using AGIC with an MSI enabled cluster #842

Open
erikschlegel opened this issue May 5, 2020 · 5 comments

erikschlegel commented May 5, 2020

Describe the bug
Now that AKS MSI is GA, I'm trying to configure AGIC with an MSI-enabled AKS cluster and am running into the behavior below. I'm using the AGIC chart 1.2.0-rc1.

Error

Possible reasons: AKS Service Principal requires 'Managed Identity Operator' access on Controller Identity; 'identityResourceID' and/or 'identityClientID' are incorrect in the Helm config; AGIC Identity requires 'Contributor' access on Application Gateway and 'Reader' access on Application Gateway's Resource Group;

The AGIC identity resides in the AKS node resource group and has the following role assignments:
  • Reader, scoped to the App Gateway resource group
  • Contributor, scoped to the App Gateway resource

Are there additional role assignments that I'm missing? I suspect it's related to a missing Managed Identity Operator assignment, but it's unclear which identity that permission should be scoped to for AKS MSI-enabled clusters (i.e. the agentpool identity, the App Gateway identity, etc.).
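
The role assignments described above can be inspected with the Azure CLI. The following is a minimal sketch under stated assumptions, not guidance from the maintainers: the function names `appgw_scope` and `list_identity_roles` are hypothetical helpers, and the arguments are placeholders for your own subscription, resource group, gateway name, and identity principal ID. It only builds the scope string and lists assignments; it does not answer which identity needs 'Managed Identity Operator'.

```shell
# Hypothetical helpers; names and arguments are illustrative assumptions.

# Build the ARM resource ID of an Application Gateway, in the same form
# that appears as the "scope" in an AuthorizationFailed message.
appgw_scope() {
  sub="$1"; rg="$2"; name="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/applicationGateways/%s\n' \
    "$sub" "$rg" "$name"
}

# List every role assignment held by an identity, at any scope.
# Requires a logged-in Azure CLI; nothing is modified.
list_identity_roles() {
  principal_id="$1"
  az role assignment list --assignee "$principal_id" --all --output table
}
```

Comparing the output of `list_identity_roles <principal-id>` against the string printed by `appgw_scope` shows whether the Reader and Contributor assignments actually cover the scope named in the 403 response.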

ERROR: logging before flag.Parse: I0505 21:32:39.663544       1 utils.go:105] Using verbosity level 5 from environment variable APPGW_VERBOSITY_LEVEL
I0505 21:32:39.728294       1 environment.go:210] KUBERNETES_WATCHNAMESPACE is not set. Watching all available namespaces.
I0505 21:32:39.728484       1 main.go:124] App Gateway Details: Subscription: xxxxxxx, Resource Group: erisch-aks-r-risgzihc-aks-rg, Name: erisch-aks-risgzihc-sdms-r2-appgw
I0505 21:32:39.728521       1 auth.go:46] Creating authorizer from Azure Managed Service Identity
I0505 21:32:39.728567       1 httpserver.go:57] Starting API Server on :8123
E0505 21:32:41.064022       1 client.go:132] Possible reasons: AKS Service Principal requires 'Managed Identity Operator' access on Controller Identity; 'identityResourceID' and/or 'identityClientID' are incorrect in the Helm config; AGIC Identity requires 'Contributor' access on Application Gateway and 'Reader' access on Application Gateway's Resource Group;
E0505 21:32:41.064046       1 client.go:145] Unexpected ARM status code on GET existing App Gateway config: 403
E0505 21:32:41.064055       1 client.go:148] Failed fetching config for App Gateway instance. Will retry in 10s. Error: network.ApplicationGatewaysClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'dd0527d8-bd7-xxxxx' with object id 'dd0527d8-286e-4b69-xxxxx' does not have authorization to perform action 'Microsoft.Network/applicationGateways/read' over scope '/subscriptions/xxxxxxxx/resourceGroups/erisch-aks-r-risgzihc-aks-rg/providers/Microsoft.Network/applicationGateways/erisch-aks-risgzihc-sdms-r2-appgw' or the scope is invalid. If access was recently granted, please refresh your credentials."
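
The 403 message above names the exact object ID, action, and scope that ARM rejected, which is useful for checking that the identity AGIC authenticates as is the one that holds the role assignments. A small sketch for pulling those three fields out of the log follows; `parse_authz_failure` is a hypothetical helper name, and the pattern assumes the message wording shown in the log above.

```shell
# Hypothetical helper: read log lines on stdin and, for each ARM
# AuthorizationFailed message, print "<object id> <action> <scope>".
# Lines that do not match the expected wording are ignored.
parse_authz_failure() {
  sed -n "s/.*object id '\([^']*\)' does not have authorization to perform action '\([^']*\)' over scope '\([^']*\)'.*/\1 \2 \3/p"
}
```

For example, `kubectl logs <ingress controller> | parse_authz_failure` prints one line per rejected request. The object ID can then be compared with the principal ID of the identity that was granted Reader and Contributor.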

Ingress Controller details

  • Output of kubectl describe pod <ingress controller> .
LAPTOP-HUT1O52T:/mnt/d/fabtest/charts/sdms# kubectl describe pod ingress-azure-6d874bfd5d
Name:           ingress-azure-6d874bfd5d-jb25v
Namespace:      default
Priority:       0
Node:           aks-default-59078705-vmss000002/10.10.1.66
Start Time:     Tue, 05 May 2020 16:32:20 -0500
Labels:         aadpodidbinding=ingress-azure
                app=ingress-azure
                pod-template-hash=6d874bfd5d
                release=ingress-azure
Annotations:    prometheus.io/port: 8123
                prometheus.io/scrape: true
Status:         Running
IP:             10.10.1.69
IPs:            <none>
Controlled By:  ReplicaSet/ingress-azure-6d874bfd5d
Containers:
  ingress-azure:
    Container ID:   docker://c8c50732d8fda4e7186a8d26689043854e910bb2d2404708fa125dbf169328ae
    Image:          mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.2.0-rc1
    Image ID:       docker-pullable://mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:dd95b2feaf24e7ba6773452fb842d0eba5a6ea8a5d19bf22035fdcad78b18941
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Tue, 05 May 2020 16:35:39 -0500
      Finished:     Tue, 05 May 2020 16:36:00 -0500
    Ready:          False
    Restart Count:  4
    Liveness:       http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      ingress-azure  ConfigMap  Optional: false
    Environment:
      AZURE_CLOUD_PROVIDER_LOCATION:  /etc/appgw/azure.json
      AGIC_POD_NAME:                  ingress-azure-6d874bfd5d-jb25v (v1:metadata.name)
      AGIC_POD_NAMESPACE:             default (v1:metadata.namespace)
    Mounts:
      /etc/appgw/azure.json from azure (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ingress-azure-token-rdl7r (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  azure:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/azure.json
    HostPathType:  File
  ingress-azure-token-rdl7r:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-azure-token-rdl7r
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                                      Message
  ----     ------     ----                   ----                                      -------
  Normal   Scheduled  4m32s                  default-scheduler                         Successfully assigned default/ingress-azure-6d874bfd5d-jb25v to aks-default-59078705-vmss000002
  Warning  Unhealthy  2m56s (x4 over 4m6s)   kubelet, aks-default-59078705-vmss000002  Readiness probe failed: Get http://10.10.1.69:8123/health/ready: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    2m32s (x5 over 3m29s)  kubelet, aks-default-59078705-vmss000002  Back-off restarting failed container
  Normal   Pulling    2m19s (x4 over 4m31s)  kubelet, aks-default-59078705-vmss000002  Pulling image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.2.0-rc1"
  Normal   Pulled     2m19s (x4 over 4m18s)  kubelet, aks-default-59078705-vmss000002  Successfully pulled image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.2.0-rc1"
  Normal   Created    2m19s (x4 over 4m13s)  kubelet, aks-default-59078705-vmss000002  Created container ingress-azure
  Normal   Started    2m18s (x4 over 4m13s)  kubelet, aks-default-59078705-vmss000002  Started container ingress-azure

root@LAPTOP-HUT1O52T:/mnt/d/fabtest/charts/sdms# kubectl describe pod ingress-azure-6d874bfd5d-jb25v
Name:           ingress-azure-6d874bfd5d-jb25v
Namespace:      default
Priority:       0
Node:           aks-default-59078705-vmss000002/10.10.1.66
Start Time:     Tue, 05 May 2020 16:32:20 -0500
Labels:         aadpodidbinding=ingress-azure
                app=ingress-azure
                pod-template-hash=6d874bfd5d
                release=ingress-azure
Annotations:    prometheus.io/port: 8123
                prometheus.io/scrape: true
Status:         Running
IP:             10.10.1.69
IPs:            <none>
Controlled By:  ReplicaSet/ingress-azure-6d874bfd5d
Containers:
  ingress-azure:
    Container ID:   docker://06c4f9f85b0ba7b8ec43219da3d3387ded6edccf7d0ff9db60c4c55f3255f50f
    Image:          mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.2.0-rc1
    Image ID:       docker-pullable://mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:dd95b2feaf24e7ba6773452fb842d0eba5a6ea8a5d19bf22035fdcad78b18941
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 05 May 2020 16:37:29 -0500
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Tue, 05 May 2020 16:35:39 -0500
      Finished:     Tue, 05 May 2020 16:36:00 -0500
    Ready:          False
    Restart Count:  5
    Liveness:       http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      ingress-azure  ConfigMap  Optional: false
    Environment:
      AZURE_CLOUD_PROVIDER_LOCATION:  /etc/appgw/azure.json
      AGIC_POD_NAME:                  ingress-azure-6d874bfd5d-jb25v (v1:metadata.name)
      AGIC_POD_NAMESPACE:             default (v1:metadata.namespace)
    Mounts:
      /etc/appgw/azure.json from azure (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ingress-azure-token-rdl7r (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  azure:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/azure.json
    HostPathType:  File
  ingress-azure-token-rdl7r:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-azure-token-rdl7r
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                                      Message
  ----     ------     ----                   ----                                      -------
  Normal   Scheduled  5m25s                  default-scheduler                         Successfully assigned default/ingress-azure-6d874bfd5d-jb25v to aks-default-59078705-vmss000002
  Warning  Unhealthy  3m49s (x4 over 4m59s)  kubelet, aks-default-59078705-vmss000002  Readiness probe failed: Get http://10.10.1.69:8123/health/ready: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    3m25s (x5 over 4m22s)  kubelet, aks-default-59078705-vmss000002  Back-off restarting failed container
  Normal   Pulled     3m12s (x4 over 5m11s)  kubelet, aks-default-59078705-vmss000002  Successfully pulled image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.2.0-rc1"
  Normal   Created    3m12s (x4 over 5m6s)   kubelet, aks-default-59078705-vmss000002  Created container ingress-azure
  Normal   Started    3m11s (x4 over 5m6s)   kubelet, aks-default-59078705-vmss000002  Started container ingress-azure
  Normal   Pulling    16s (x6 over 5m24s)    kubelet, aks-default-59078705-vmss000002  Pulling image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.2.0-rc1"
  • Output of `kubectl logs <ingress controller>`.
LAPTOP-HUT1O52T:/mnt/d/fabtest/charts/sdms# kubectl logs ingress-azure-6d874bfd5d-jb25v
ERROR: logging before flag.Parse: I0505 21:37:29.808530       1 utils.go:105] Using verbosity level 5 from environment variable APPGW_VERBOSITY_LEVEL
I0505 21:37:29.866221       1 environment.go:210] KUBERNETES_WATCHNAMESPACE is not set. Watching all available namespaces.
I0505 21:37:29.866452       1 main.go:124] App Gateway Details: Subscription: xxxxxxxxxxx, Resource Group: erisch-aks-r-risgzihc-aks-rg, Name: xxxxxxx
I0505 21:37:29.866473       1 auth.go:46] Creating authorizer from Azure Managed Service Identity
I0505 21:37:29.866608       1 httpserver.go:57] Starting API Server on :8123
E0505 21:37:29.988885       1 client.go:132] Possible reasons: AKS Service Principal requires 'Managed Identity Operator' access on Controller Identity; 'identityResourceID' and/or 'identityClientID' are incorrect in the Helm config; AGIC Identity requires 'Contributor' access on Application Gateway and 'Reader' access on Application Gateway's Resource Group;
E0505 21:37:29.988908       1 client.go:145] Unexpected ARM status code on GET existing App Gateway config: 403

erikschlegel changed the title from "Permission Issue using AGIC with MSI enabled Cluster" to "Permission Issue using AGIC with an MSI enabled cluster" on May 5, 2020
erikschlegel changed the title from "Permission Issue using AGIC with an MSI enabled cluster" to "Permission issue using AGIC with an MSI enabled cluster" on May 5, 2020
akshaysngupta self-assigned this on May 6, 2020


tslavik commented May 6, 2020

I have exactly the same issue. It seems to be the same as #828.

akshaysngupta (Member) commented

@erikschlegel As @tslavik mentioned, this might be related to #820.
To check whether you are affected, check whether the image used by the MIC or NMI pod is version v1.6. If so, please use the workaround mentioned in #828.
#828 (comment)
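
One way to run the version check suggested above is sketched below. This is an assumption-laden example rather than the maintainers' procedure: the helper names are hypothetical, and the filter assumes the MIC and NMI image paths end in `/mic` or `/nmi` followed by a tag such as `1.6.0` or `v1.6`.

```shell
# Hypothetical helpers; names are illustrative assumptions.

# Print "<pod name><TAB><container images>" for every pod in the cluster.
# Requires kubectl access to the cluster.
list_pod_images() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Keep only lines whose image is MIC or NMI with a 1.6 tag
# (for example mic:1.6.0 or nmi:v1.6). Reads stdin.
filter_pod_identity_v16() {
  grep -E '/(mic|nmi):v?1\.6(\.|[[:space:]]|$)'
}
```

Running `list_pod_images | filter_pod_identity_v16` prints the affected pods, if any; empty output means no MIC or NMI image with a 1.6 tag matched the assumed naming.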

akshaysngupta (Member) commented

@erikschlegel Did the above suggestion fix the issue?

rlevchenko commented


joymon commented Dec 6, 2022

5 participants