Can't deploy an application from ACR #76

Closed
RohanNagar opened this issue Dec 13, 2017 · 32 comments
Labels
azure/acr Azure Container Registry question

Comments

@RohanNagar

I'm following this tutorial: https://docs.microsoft.com/en-in/azure/aks/tutorial-kubernetes-deploy-application

I can't seem to get azure-vote-front to deploy in AKS in part 4 of the tutorial. Both the ACR and the AKS are in the same resource group, but looking at the Kubernetes logs shows that there was an authentication failure, where it is failing to pull the image from ACR:

Containers:
  azure-vote-front:
    Container ID:
    Image:          ronagaraksregistry.azurecr.io/azure-vote-front:redis-v1
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:  500m
    Requests:
      cpu:  250m
    Environment:
      REDIS:  azure-vote-back
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f09nh (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-f09nh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-f09nh
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age               From                               Message
  ----     ------                 ----              ----                               -------
  Normal   Scheduled              3m                default-scheduler                  Successfully assigned azure-vote-front-1287338239-9lqvz to aks-nodepool1-15440196-0
  Normal   SuccessfulMountVolume  3m                kubelet, aks-nodepool1-15440196-0  MountVolume.SetUp succeeded for volume "default-token-f09nh"
  Normal   Pulling                59s (x5 over 3m)  kubelet, aks-nodepool1-15440196-0  pulling image "ronagaraksregistry.azurecr.io/azure-vote-front:redis-v1"
  Warning  Failed                 59s (x5 over 3m)  kubelet, aks-nodepool1-15440196-0  Failed to pull image "ronagaraksregistry.azurecr.io/azure-vote-front:redis-v1": rpc error: code = 2 desc = Error response from daemon: {"message":"Get https://ronagaraksregistry.azurecr.io/v2/azure-vote-front/manifests/redis-v1: unauthorized: authentication required"}
  Warning  FailedSync             3s (x18 over 3m)  kubelet, aks-nodepool1-15440196-0  Error syncing pod
  Normal   BackOff                3s (x13 over 3m)  kubelet, aks-nodepool1-15440196-0  Back-off pulling image "ronagaraksregistry.azurecr.io/azure-vote-front:redis-v1"

I tried deleting the cluster and deploying again a day later, and the same issue occurred, so it wasn't a one-off. I've been following the tutorial.

Does anyone have any ideas?
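
Editor's sketch: the kubelet event above already says why the pull failed ("unauthorized: authentication required"), so a first step is to separate authentication failures from other pull errors. The function name is hypothetical, and only the "unauthorized" pattern comes from this thread; the other patterns are assumed examples of common registry error text.

```shell
# Classify an image pull failure from the kubelet event message.
# Only the "unauthorized" text is taken from the event in this issue;
# the remaining patterns are assumptions about common registry errors.
classify_pull_error() {
  case "$1" in
    *unauthorized*|*"authentication required"*) echo "auth" ;;
    *"manifest unknown"*|*"not found"*)          echo "missing-image" ;;
    *"no such host"*|*"i/o timeout"*)            echo "network" ;;
    *)                                           echo "unknown" ;;
  esac
}
```

Feeding it the Failed event message from kubectl describe pod would report "auth" for the error shown in this issue, which points at registry permissions rather than at the image name or the network.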

@RohanNagar
Author

Any ideas?

@slack slack added bug azure/acr Azure Container Registry labels Dec 20, 2017
@slack
Contributor

slack commented Dec 20, 2017

We may have a regression in the built-in ACR integration. When you created the service principal (SPN) for the cluster, what scopes did you use? Or did you use the SPN created by az aks create ...?

As a workaround, you should be able to provide an imagePullSecret in the pod spec or attached to the pod's service account.
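
Editor's sketch of the second option (attaching the secret to the pod's service account), assuming a pull secret already exists in the same namespace. The function names and the KUBECTL override variable are hypothetical; the kubectl patch serviceaccount form follows the Kubernetes documentation.

```shell
# KUBECTL can be overridden (for example with "echo") for a dry run.
: "${KUBECTL:=kubectl}"

# Build the JSON patch that adds one pull secret to a service account.
pull_secret_patch() {
  printf '{"imagePullSecrets":[{"name":"%s"}]}' "$1"
}

# Attach the secret. Arguments: secret name, service account, namespace.
attach_pull_secret() {
  "$KUBECTL" patch serviceaccount "$2" --namespace "$3" \
    -p "$(pull_secret_patch "$1")"
}
```

For example, attach_pull_secret acr-auth default default would let pods that use the default service account pull with the acr-auth secret without editing each pod spec. Pods created before the patch are not updated.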

@marrobi
Contributor

marrobi commented Dec 20, 2017

I witnessed this last week. I resolved the issue by manually adding the SPN to IAM as a Contributor for the ACR resource. The SPN was created using az aks create ...

@pjbgf

pjbgf commented Dec 20, 2017

@RohanNagar as your container registry is private and the kubelet is failing to pull the image, most probably your Service Principal does not have access to the ACR.

Overall, that authentication can be done either through the service principal context - the one generated/provided at cluster creation - or by providing a docker secret that contains a username/password. To do the latter, you need to create a secret in your cluster and reference it in your YAML file. Here's a walkthrough.
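
Editor's sketch of the secret-based route, assuming a username/password pair that the registry accepts (for example a service principal's client ID and password). The function name and the KUBECTL override are hypothetical; the kubectl create secret docker-registry flags are standard. Some older kubectl versions may also expect --docker-email.

```shell
# KUBECTL can be overridden (for example with "echo") for a dry run.
: "${KUBECTL:=kubectl}"

# Arguments: secret name, registry login server, username, password.
create_acr_pull_secret() {
  "$KUBECTL" create secret docker-registry "$1" \
    --docker-server="$2" \
    --docker-username="$3" \
    --docker-password="$4"
}
```

The pod spec then references the secret by name under imagePullSecrets. The secret must live in the same namespace as the pods that use it.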

@slack
Contributor

slack commented Dec 20, 2017

@RohanNagar just re-tested the ACR integration with AKS, and everything is working as expected.

If you are not explicitly providing a username/password, check that the service principal used by your AKS cluster has sufficient scope to ACR:

$ az aks show -g jahanse-prod-eastus --name monday1 -o json | jq -r ".servicePrincipalProfile.clientId"
FAKEE95E-BC11-4621-AD43-986F2DA78423
$ az role assignment list --assignee FAKEE95E-BC11-4621-AD43-986F2DA78423
Principal                                                                Role         Scope
-----------------------------------------------------------------------  -----------  ---------------------------------------------------
http://21250d.eas-57ac26.eastus.cloudapp.azure.com  Contributor  /subscriptions/0D334CA7-9C73-4852-A0EB-SUBBASUB8CAD0
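
Editor's sketch: Azure role assignments are inherited downward, so an assignment at subscription or resource-group scope also applies to a registry inside it. The helpers below (hypothetical names) check whether any listed scope covers a given registry resource ID. They assume resource IDs can be compared case-insensitively.

```shell
# Succeeds if a role assignment at scope $1 applies to resource ID $2.
scope_covers() {
  sc_scope=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  sc_target=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  sc_scope=${sc_scope%/}   # drop a trailing slash so "/" means "everything"
  case "$sc_target" in
    "$sc_scope"|"$sc_scope"/*) true ;;
    *) false ;;
  esac
}

# Reads one scope per line on stdin; succeeds if any covers resource ID $1.
any_scope_covers() {
  while IFS= read -r sc_line; do
    if scope_covers "$sc_line" "$1"; then return 0; fi
  done
  return 1
}

# Possible use (SPN_ID and ACR_ID are placeholders you would set yourself):
#   az role assignment list --assignee "$SPN_ID" --all --query "[].scope" -o tsv \
#     | any_scope_covers "$ACR_ID" && echo "SPN has some role on the registry"
```

This only tells you that some role is assigned at a covering scope, not whether that role is sufficient to pull images.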

@slack slack added question and removed bug labels Dec 20, 2017
@slack
Contributor

slack commented Dec 20, 2017

@marrobi do you happen to remember what the scope of your cluster SPN was before and after adding access to ACR?

@RohanNagar
Author

@slack Thank you for the help. It looks like the service principal used doesn't have sufficient scope:

$ az aks show -g aksGroup --name K8sCluster -o json | jq -r ".servicePrincipalProfile.clientId"
9a943293-7c4c-4c8d-8c4b-bbb0a33dbf54
$ az role assignment list --assignee 9a943293-7c4c-4c8d-8c4b-bbb0a33dbf54
[]
$ 

But I'm actually a bit confused, because when I look at the portal, I see in the IAM for the cluster and the ACR, there is a "App Service or Function App" reader that is the same for both: cln77ea1ddf-8850-415e-9446-57a960b79a3f

Do you know where that is coming from, and why the UUID is different from the one displayed on the command line?

@slack
Contributor

slack commented Dec 20, 2017

@RohanNagar ah, I didn't realize that role assignment list doesn't include non-subscription resources, by default. Try az role assignment list --assignee 9a943293-7c4c-4c8d-8c4b-bbb0a33dbf54 --all and you will see a scope for the node resource group.

I now realize that our previous instructions for ACS and AKS walked folks through creating a subscription-scoped SPN, which let the Kubernetes ACR integration work magically.

My test cluster was sub-scoped as well. doh!

Add a {Reader,Contributor,Owner} role assignment to the cluster SPN, scoped to the ACR instance:

$ az role assignment create --assignee FAKEE95E-BC11-4621-AD43-986F2DA78423 --role=Reader \
--scope=/subscriptions/0D334CA7-9C73-4852-A0EB-SUBBASUB8CAD0/resourceGroups/jahanse-prod-westus2/providers/Microsoft.ContainerRegistry/registries/jahanseacrwestus2
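
Editor's sketch of the same fix without typing the scope by hand: look up the registry's resource ID first, then create the assignment. The function name and the AZ override variable are hypothetical; az acr show and az role assignment create are the commands already used in this thread. Which role is sufficient is debated further down, so the role is left as a parameter.

```shell
# AZ can be overridden with a stub for a dry run.
: "${AZ:=az}"

# Arguments: service principal client ID, registry name, role name.
grant_acr_role() {
  # Look up the registry's full resource ID to use as the scope.
  acr_id=$("$AZ" acr show --name "$2" --query id --output tsv) || return 1
  "$AZ" role assignment create --assignee "$1" --role "$3" --scope "$acr_id"
}
```

For example, grant_acr_role "$CLIENT_ID" myregistry Reader, where CLIENT_ID is the value printed by the az aks show query earlier in the thread and myregistry is a placeholder registry name.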

@RohanNagar
Author

@slack Looks to have solved my problem, the pod is able to pull the image and serve the front end now. Thanks for all your help!

@marrobi
Contributor

marrobi commented Dec 21, 2017

@slack Not 100% sure. As I say, it was created automatically. I believe it had scope to the agent pool RG, but not the primary AKS resource RG, as that's where ACR was also deployed. But don't hold me to that!

I actually saw two different people with the exact same issue that day - we were running a hack.

@KSLHacks

KSLHacks commented Mar 6, 2018

@slack Hey all, this thread helped me out a ton - thanks!

We ran into the same issue with the cluster SPN not having contributor access to the ACR resource.
Note: Reader access is not enough - we were running into the same unauthorized error with Reader scope. Contributor is required.

Also, the workaround suggested in this thread and in the Kubernetes docs, using a secret and imagePullSecret, did not work. Even though we created a secret and added it to the pod yaml file, it did not work until the SPN had Contributor access.

@tolu

tolu commented Mar 8, 2018

Unfortunately none of the ideas in this thread worked for me.
I had to create a Kubernetes secret using yaml to get up and running.

For context, my ACR and AKS are in separate Azure subscriptions (I'm not sure whether that has an impact).

So the solution I ended up with is this:

  1. create a temporary kubernetes secret as described in the AKS docs (using a Reader role)
  2. get the base64 values from the secret like so: kubectl get secrets/acr-auth-tmp -o yaml
  3. Create a new secret using the base64 data with yaml as described under Bypassing kubectl create secrets in the kubernetes images docs like so:
apiVersion: v1
kind: Secret
metadata:
  name: acr-auth
data:
  .dockerconfigjson: <base64-string>
type: kubernetes.io/dockerconfigjson
  4. Use this secret as imagePullSecrets

Why the original kubernetes secret did not work is just a mystery to me...
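
Editor's sketch of steps 2 and 3 done in one pass: build the Docker config JSON, base64-encode it, and print a Secret manifest like the one above. The function names are hypothetical. The sketch assumes the server, username, and password contain no double quotes or backslashes, since it does no JSON escaping.

```shell
# Print the Docker config JSON for one registry.
# Arguments: registry server, username, password.
dockerconfigjson() {
  dcj_auth=$(printf '%s:%s' "$2" "$3" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}' \
    "$1" "$2" "$3" "$dcj_auth"
}

# Print a Secret manifest of type kubernetes.io/dockerconfigjson.
# Arguments: secret name, namespace, registry server, username, password.
secret_manifest() {
  sm_name=$1
  sm_namespace=$2
  shift 2
  sm_encoded=$(dockerconfigjson "$@" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $sm_name
  namespace: $sm_namespace
data:
  .dockerconfigjson: $sm_encoded
type: kubernetes.io/dockerconfigjson
EOF
}
```

The output can be piped to kubectl apply -f -. Setting the namespace in the manifest keeps the secret next to the pods that reference it.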

@mleneveut

@tolu Thanks, you saved my day! I had to do the same thing:

  • kubectl create secret docker-registry regcred --docker-server=...
  • kubectl get secrets/regcred -o yaml
  • create secret with kubectl and your yaml and name acr-auth
  • spec:
      containers:
      - name: myName
        image: xxx.azurecr.io/yyy:1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 80
      imagePullSecrets:
      - name: acr-auth

Before trying this, I added the Service Principal of my k8s cluster as Reader and Contributor on the ACR, so I don't know whether that is required.

@benbuckland

benbuckland commented Mar 15, 2018

@mleneveut were you deploying to a custom namespace or the default?

I am deploying to a custom namespace and have configured the Azure Container Registry to write the secret to the container, but it appears to be writing it to the default namespace and not the custom namespace specified.

@slack any ideas here?

@mleneveut

@benbuckland I was deploying to default namespace.

@tolu

tolu commented Mar 16, 2018

@benbuckland I'm also using custom namespaces.

From: https://kubernetes.io/docs/concepts/configuration/secret/

Secret API objects reside in a namespace. They can only be referenced by pods in that same namespace.

So make sure your kubectl context is set up with the correct namespace before creating the secret.

Like so:
kubectl config set-context $(kubectl config current-context) --namespace=<insert-namespace-name-here>
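
Editor's sketch: instead of switching the kubectl context, a quick check can confirm that the pull secret actually exists in the namespace the pods run in. The function name and the KUBECTL override are hypothetical; the kubectl get secret call with --namespace and -o jsonpath is standard.

```shell
# KUBECTL can be overridden with a stub for testing.
: "${KUBECTL:=kubectl}"

# Succeeds if secret $1 exists in namespace $2 as a Docker registry secret.
pull_secret_in_namespace() {
  ps_type=$("$KUBECTL" get secret "$1" --namespace "$2" \
    -o jsonpath='{.type}' 2>/dev/null) || return 1
  [ "$ps_type" = "kubernetes.io/dockerconfigjson" ]
}
```

For example, pull_secret_in_namespace acr-auth my-namespace || echo "secret missing in my-namespace" would flag the case where the secret was written to default instead of the custom namespace.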

@benbuckland

benbuckland commented Mar 16, 2018

Hey @tolu, I was trying to use the 'Deploy to Kubernetes' task in VSTS and expecting the task to create the ACR secret. The VSTS guys have come back and confirmed the fault I was seeing here: microsoft/azure-pipelines-tasks#6695

Thanks again.

@ThorstenHans

@slack can you provide a bit more information on how to solve this issue properly when using an SP and following the best practice of keeping ACR in a dedicated resource group?

So my AKS is in resource group a and ACR in b

The Service Principal had initially only Reader role assigned for the ACR scope.
Try to pull an image -> ImagePullBackOff

I added Contributor role to the SP
Try to pull an image -> ImagePullBackOff

I added Owner role (which is pretty ugly)
Try to pull an image -> ImagePullBackOff

If I query for the role assignments using az role assignment list --all --assignee $SP_ID I can see the roles being assigned correctly.

@ThorstenHans

@slack quick update. Seems like k8s caches the service principal for a couple of minutes. After a couple of minutes I was able to pull the image from ACR. But it still feels a bit wrong to assign the Owner role to the Service Principal.
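
Editor's sketch: since a new role assignment appeared to take a few minutes to have any effect here (the thread does not establish whether that is caching or propagation), it may help to retry a readiness check for a while before trying a broader role. The retry helper is hypothetical and generic; it runs any command until it succeeds or the attempts run out.

```shell
# Run a command until it succeeds.
# Arguments: max attempts, delay in seconds, then the command and its args.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while ! "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      return 1   # give up after the last allowed attempt
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
  return 0
}
```

One possible use is retry 10 30 kubectl rollout status deployment/azure-vote-front --timeout=30s, which polls the deployment for several minutes.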

@philippneugebauer

I cannot get it working either, with neither Reader nor Contributor. Since I'm not willing to use Owner, I haven't tried that.

@rikkigouda

@slack @RohanNagar slack's solution worked for me too. Thank you both for sharing.

I found I didn't even need to modify the secrets in the pod yml file. Everything stays the same, and the client service principal gets the new assignments. However, I've noticed that when you create an AKS resource group, the resource group that actually contains the running nodes is separate (I think to simplify the view for the Azure console end user?), and the default Contributor role, although correctly assigned to the principal ID, is scoped to that node resource group. So your ACR must be located within the same resource group that your nodes are scoped to. (Perhaps, as an action item, there should be a ticket to add a note about this to the guidelines/Azure docs?)

Took me a good 3 late nights to understand what's going on... hope this saves someone else's time.

@necevil

necevil commented May 24, 2018

@philippneugebauer I had the same issues.
Even after granting the properly scoped (as far as I could tell) service principal Contributor (and Reader) roles I was not able to connect.

Using the alternate 'Image Pull Secret' methodology described here let me get rolling:
https://docs.microsoft.com/en-us/azure/container-registry/container-registry-auth-aks#access-with-kubernetes-secret

@andrew-vdb

To anyone facing an issue where AKS is unable to pull an image from ACR:
at the time of writing, AKS only supports Linux containers.

So check ACR first and see whether the pulls succeed.
If you see a lot of successful pull requests but AKS still fails to pull the image, you might be trying to run a Windows container app in AKS,
which means your issue has nothing to do with what people are facing in this thread.

Create an ASP.NET Core app and dockerize it with Linux as the target host.
Then it should work...

@ohdihe

ohdihe commented Jul 12, 2018

@andrew-vandenbrink Does this include using aci-connectors? I thought that with aci-connectors, you can run either Linux or Windows containers.

@andrew-vdb

@ohdihe

ohdihe commented Jul 13, 2018

@andrew-vandenbrink I was able to run a Windows container on AKS using aci-connectors today. Aci-connectors acts like a virtual node in the AKS cluster. Also, I created a kubectl secret to grant the AKS cluster permission to my private ACR in order to pull the Windows image and run it.
I created the kubectl secret by using an SPN that has the Owner role scoped at the registry resource instead of the resource group. For some reason, the Reader or Contributor role does not grant AKS enough permission to the images.
Link below:
https://github.com/virtual-kubelet/virtual-kubelet/blob/master/providers/azure/README.md

@andrew-vdb

@ohdihe yes, what you described is also in the FAQ...

@dev-tim

dev-tim commented Oct 14, 2018

@Goudarzi wow, it sounds like an actual bug. @slack I just reproduced the same thing on my AKS deployment.

@ferantivero

@slack's workaround worked for me...

@daanl

daanl commented Nov 7, 2018

I have the same problem as @tolu: none of the options worked for me. Any idea when this is getting properly fixed, @slack?

@r3ziel

r3ziel commented Feb 19, 2019

I'm stuck on the same issue, I think:

The push refers to repository [docker.io/prova01/azure-vote-front]
6a3740f27370: Preparing
d1a6bb5dc38f: Preparing
f1dd186bf5eb: Preparing
c592491fcab1: Preparing
1ca7cd51ceeb: Preparing
643259079b32: Waiting
0f1e35d00fa7: Waiting
172a5377ee25: Waiting
db87404f5d3f: Waiting
1b655c784a8d: Waiting
d168ac8a1095: Waiting
4c13c63751a1: Waiting
e09c150c6aa8: Waiting
5c1068d126aa: Waiting
44ff51fbafa7: Waiting
c706028b53c1: Waiting
a67398d274c8: Waiting
3b520191885f: Waiting
08092d60523f: Waiting
d9499c02e387: Waiting
d47b056b8c3a: Waiting
3aecb0863872: Waiting
72e1f0f78475: Waiting
56a89222b908: Waiting
a89464ad2a8f: Waiting
76dfa41f0a1d: Waiting
c240c542ed55: Waiting
badfbcebf7f8: Waiting
denied: requested access to the resource is denied

any ideas?
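
Editor's sketch: the push output above targets docker.io/prova01/azure-vote-front, so this push went to Docker Hub rather than to an ACR registry, which is one possible reason for the "denied" error. Assuming the intent was to push to ACR, the image needs to be tagged with the registry's login server first. The function names and the DOCKER override are hypothetical, and the name.azurecr.io form assumes the public Azure cloud.

```shell
# Build a fully qualified image reference for an ACR registry.
# Arguments: registry name, repository, tag.
acr_image_ref() {
  printf '%s.azurecr.io/%s:%s\n' "$1" "$2" "$3"
}

# DOCKER can be overridden (for example with "echo") for a dry run.
: "${DOCKER:=docker}"

# Retag a local image for the registry and push it.
# Arguments: local image, registry name, repository, tag.
push_to_acr() {
  pa_target=$(acr_image_ref "$2" "$3" "$4")
  "$DOCKER" tag "$1" "$pa_target" && "$DOCKER" push "$pa_target"
}
```

Logging in to the registry first (for example with az acr login --name followed by the registry name) is still required before the push can succeed.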

@Sudharma

@slack this still does not work for us; the behaviour is inconsistent. It works sometimes, then fails after a few deployments, and works again if we restart the deployment. No idea what's wrong here.

The original SP has the Contributor role, and the same SPN also has the AcrPull role. If I list the role assignments with --all, I can see the role.

@Azure Azure locked as resolved and limited conversation to collaborators Aug 12, 2020