Azure AD latency when attaching ACR - day later replication not completed #695

Closed
mloskot opened this issue May 23, 2023 · 2 comments

Comments


mloskot commented May 23, 2023

I followed https://learn.microsoft.com/en-us/azure/aks/cluster-container-registry-integration to give my AKS cluster access to my private ACR. Everything seemed to work fine:

image

except that the Azure replication process has apparently not completed. I'm still seeing "Identity not found" for the two AKS cluster identities to which I assigned roles on my ACR:

image

https://learn.microsoft.com/en-us/azure/aks/cluster-container-registry-integration says:

There is a latency issue with Azure Active Directory groups when attaching ACR (...) there may be a delay before the RBAC group takes effect.

I understand it, but it has been more than 12h since creating the role assignments.

Question: Is it typical to wait that long?


I attempted to troubleshoot the problem following https://learn.microsoft.com/en-us/azure/role-based-access-control/troubleshooting#symptom---role-assignments-with-identity-not-found

$ az role assignment list --scope /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-acr-contoso/providers/Microsoft.ContainerRegistry/registries/my-contoso
[
  {
    "condition": null,
    "conditionVersion": null,
    "createdBy": "236564d1-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "createdOn": "2023-05-22T20:58:53.717018+00:00",
    "delegatedManagedIdentityResourceId": null,
    "description": "",
    "id": "/subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-acr-contoso/providers/Microsoft.ContainerRegistry/registries/my-contoso/providers/Microsoft.Authorization/roleAssignments/fa76709d-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "name": "fa76709d-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "principalId": "2f850a88-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "principalName": "",
    "principalType": "ServicePrincipal",
    "resourceGroup": "my-acr-contoso",
    "roleDefinitionId": "/subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.Authorization/roleDefinitions/7f951dda-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "roleDefinitionName": "AcrPull",
    "scope": "/subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-acr-contoso/providers/Microsoft.ContainerRegistry/registries/my-contoso",
    "type": "Microsoft.Authorization/roleAssignments",
    "updatedBy": "236564d1-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "updatedOn": "2023-05-22T20:58:53.717018+00:00"
  },
  {
    "condition": null,
    "conditionVersion": null,
    "createdBy": "236564d1-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "createdOn": "2023-05-22T20:58:53.770669+00:00",
    "delegatedManagedIdentityResourceId": null,
    "description": "",
    "id": "/subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-acr-contoso/providers/Microsoft.ContainerRegistry/registries/my-contoso/providers/Microsoft.Authorization/roleAssignments/f188bf19-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "name": "f188bf19-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "principalId": "7e49d45b-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "principalName": "",
    "principalType": "ServicePrincipal",
    "resourceGroup": "my-acr-contoso",
    "roleDefinitionId": "/subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.Authorization/roleDefinitions/7f951dda-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "roleDefinitionName": "AcrPull",
    "scope": "/subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-acr-contoso/providers/Microsoft.ContainerRegistry/registries/my-contoso",
    "type": "Microsoft.Authorization/roleAssignments",
    "updatedBy": "236564d1-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "updatedOn": "2023-05-22T20:58:53.770669+00:00"
  }
]

Question: Does this empty principalName indicate I should keep waiting?


I also followed the troubleshooting steps in https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/cannot-pull-image-from-acr-to-aks-cluster

$ az aks check-acr --resource-group my-aks-contoso-dev --name aks-contoso-uks-dev-aks --acr my-contoso.azurecr.io
Merged "aks-contoso-uks-dev-aks" as current context in C:\Users\mateuszl\AppData\Local\Temp\tmpgrz7ae2s
[2023-05-23T11:15:33Z] Checking host name resolution (my-contoso.azurecr.io): SUCCEEDED
[2023-05-23T11:15:33Z] Canonical name for ACR (my-contoso.azurecr.io): r0509uks.uksouth.cloudapp.azure.com.
[2023-05-23T11:15:33Z] ACR location: uksouth
[2023-05-23T11:15:33Z] Checking managed identity...
[2023-05-23T11:15:33Z] Kubelet managed identity client ID: 7e49d45b-xxxx-xxxx-xxxx-xxxxxxxxxxxx
[2023-05-23T11:15:33Z] Validating managed identity existance: SUCCEEDED
[2023-05-23T11:15:35Z] Validating image pull permission: FAILED
[2023-05-23T11:15:35Z] ACR my-contoso.azurecr.io rejected token exchange: ACR token exchange endpoint returned error status: 401. body:

$ az role assignment list --scope /subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-acr-contoso/providers/Microsoft.ContainerRegistry/registries/my-contoso --output table
Principal   Role    Scope
----------- ------- ---------------------------------------------------------------------------------------------------------------------------------------------
            AcrPull /subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-acr-contoso/providers/Microsoft.ContainerRegistry/registries/my-contoso
            AcrPull /subscriptions/4629e9b5-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/my-acr-contoso/providers/Microsoft.ContainerRegistry/registries/my-contoso

Question: Does this empty Principal also indicate I should keep waiting?
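
To narrow the listing down to just the entries with an empty principal, a JMESPath filter on the same command might help. This is a minimal sketch: the function name `unresolved_principals` is hypothetical, and it assumes the CLI returns `principalName` as an empty string, as in the JSON output above.

```shell
# Hypothetical helper; the function name is illustrative, not part of the Azure CLI.
# Prints the principalId of every role assignment at the given scope
# whose principalName is an empty string.
# $1: full resource ID of the scope (for example, the ACR resource ID).
unresolved_principals() {
  az role assignment list --scope "$1" \
    --query "[?principalName==''].principalId" --output tsv
}

# Usage (scope elided here; pass the same --scope value as in the commands above):
#   unresolved_principals "/subscriptions/.../registries/my-contoso"
```

Each ID it prints can then be checked against the identities that were supposed to receive the role.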


mloskot commented May 23, 2023

Interestingly, when I go to portal.azure.com > my container registry > Access control (IAM) > Check access > Find > Managed identity > User-assigned managed identity and select the aks-*-agentpool identity that corresponds to "principalId": "2f850a88-xxxx-xxxx-xxxx-xxxxxxxxxxxx" above, I get a very different result from the az role assignment list output above:

image


mloskot commented May 23, 2023

Solved!

I had assigned the roles using the Client ID of the aks-*-agentpool managed identity of my AKS clusters instead of its Object (principal) ID:

resource "azurerm_role_assignment" "aks_acr_pull_allowed" {
  # Mistake: I put the Client ID of the AKS managed identity here
  # instead of its Object (principal) ID.
  principal_id         = ...
  role_definition_name = "AcrPull"
  ...
}

As soon as I corrected my Terraform code and applied it, my ACR showed the expected identities and my AKS clusters could pull images from my ACR.
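
For anyone landing here with the same symptom, here is a minimal sketch of how the two IDs could be looked up and compared with the Azure CLI. The function names are hypothetical, and the sketch assumes the cluster exposes its kubelet managed identity under `identityProfile.kubeletidentity` in the `az aks show` output.

```shell
# Hypothetical helpers; the function names are illustrative, not part of the Azure CLI.
# Both take: $1 = AKS resource group, $2 = AKS cluster name.

# Print the Object (principal) ID of the cluster's kubelet identity.
# This is the value a role assignment's principal ID is expected to hold.
kubelet_object_id() {
  az aks show --resource-group "$1" --name "$2" \
    --query identityProfile.kubeletidentity.objectId --output tsv
}

# Print the Client ID of the same identity, for comparison.
# This is the value that was mistakenly used in this issue.
kubelet_client_id() {
  az aks show --resource-group "$1" --name "$2" \
    --query identityProfile.kubeletidentity.clientId --output tsv
}

# Usage, with the resource names from this issue:
#   kubelet_object_id my-aks-contoso-dev aks-contoso-uks-dev-aks
#   kubelet_client_id my-aks-contoso-dev aks-contoso-uks-dev-aks
```

If a principalId reported by az role assignment list matches the Client ID rather than the Object ID, the assignment was created with the wrong value. That is what happened here: the 7e49d45b-... principalId in the listing is the same value az aks check-acr reported as the kubelet managed identity client ID.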

Apologies for the false issue report.

OTOH, this could be added to the catalogue of issues in the troubleshooting guide :)

I owe huge thanks to @alexeldeib for his great help via #provider-azure channel on Kubernetes Slack.

mloskot closed this as completed May 23, 2023