This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

400 error when mounting pvc for calls to retrieve instance id. #1146

Closed
yvan opened this issue Apr 25, 2019 · 18 comments

@yvan

yvan commented Apr 25, 2019

I'm having some trouble mounting my PVCs. It seems like an API call to get or refresh a token is failing. Does anyone have thoughts, ideas, or suggestions on what might cause the error below?

Pod/container stuck initializing:

kubectl get pod -n res-jhub
NAME                     READY   STATUS              RESTARTS   AGE
hub-6bdd67b469-jltrl     0/1     ContainerCreating   0          28m

Events output from my pod:

Events:
  Type     Reason              Age                  From                               Message
  ----     ------              ----                 ----                               -------
  Normal   Scheduled           28m                  default-scheduler                  Successfully assigned res-jhub/hub-6bdd67b469-jltrl to aks-agentpool-57634498-3
  Warning  FailedAttachVolume  28m                  attachdetach-controller            AttachVolume.Attach failed for volume "pvc-0d7740b9-3a43-11e9-93d5-dee1946e6ce9" : failed to get azure instance id for node "aks-agentpool-57634498-3" (azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Compute/virtualMachines/aks-agentpool-57634498-3?%24expand=instanceView&api-version=2018-04-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier 'MYSTERY_ID_HERE' was not found in the directory 'XXX'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant.\r\nTrace ID: e45c002b-0788-4447-9c77-59d0ed048500\r\nCorrelation ID: c0a4cf91-9c05-468b-b0c7-e36df71fab31\r\nTimestamp: 2019-04-25 12:20:19Z","error_codes":[700016],"timestamp":"2019-04-25 12:20:19Z","trace_id":"e45c002b-0788-4447-9c77-59d0ed048500","correlation_id":"c0a4cf91-9c05-468b-b0c7-e36df71fab31","error_uri":"https://login.microsoftonline.com/error?code=700016"})

MYSTERY_ID_HERE is an Azure Active Directory app ID that I do not recognize, and it does not appear when I list all my Active Directory apps (including recently deleted ones).

The meat from the error above:

AttachVolume.Attach failed for volume "pvc-0d7740b9-3a43-11e9-93d5-dee1946e6ce9" : failed to get azure instance id for node "aks-agentpool-57634498-3" (azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Compute/virtualMachines/aks-agentpool-57634498-3?%24expand=instanceView&api-version=2018-04-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier 'MYSTERY_ID_HERE' was not found in the directory 'XXX'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant
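When a message like this is buried in pod events, the AADSTS code and the rejected client ID can be pulled out with ordinary text tools. The following is a minimal POSIX shell sketch; the function names are hypothetical, and the patterns assume the exact AADSTS700016 wording quoted above.

```shell
# Hypothetical helpers (not part of any Azure tooling).
# Print the first AADSTS error code found in the text on stdin.
aad_error_code() {
  grep -o 'AADSTS[0-9][0-9]*' | head -n 1
}

# Print the client ID from "Application with identifier '<id>'" on stdin.
# Assumes the AADSTS700016 message wording seen in this issue.
aad_rejected_app_id() {
  sed -n "s/.*Application with identifier '\([^']*\)'.*/\1/p" | head -n 1
}

# Example (assumes kubectl access to the res-jhub namespace):
#   kubectl get events -n res-jhub --field-selector reason=FailedAttachVolume \
#     -o jsonpath='{.items[*].message}' | aad_rejected_app_id
```

The extracted ID can then be checked with `az ad sp show --id <id>`, which fails if no service principal with that app ID exists in the signed-in tenant.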

CC @andyzhangx

@andyzhangx
Contributor

I think you have the wrong tenant ID or service principal. Can you open /etc/kubernetes/azure.json and try to log in with the SP and tenant ID from that file?

az login --service-principal -u <aadClientId> -p <aadClientSecret> -t <tenantId>
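A minimal sketch of that check in POSIX shell, assuming it runs on a node (or against a copy of the file) and that azure.json stores `tenantId`, `aadClientId` and `aadClientSecret` as plain string values. The helper names are hypothetical, and the sed extraction is a rough stand-in for a real JSON parser such as jq.

```shell
# Hypothetical helper: print the string value of key $2 from JSON file $1.
# Assumes the key appears once and its value has no escaped quotes.
azure_json_field() {
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# Hypothetical helper: try to log in with the SP stored in azure.json,
# mirroring the az login command above.
sp_login_from_azure_json() {
  f=${1:-/etc/kubernetes/azure.json}
  az login --service-principal \
    -u "$(azure_json_field "$f" aadClientId)" \
    -p "$(azure_json_field "$f" aadClientSecret)" \
    -t "$(azure_json_field "$f" tenantId)"
}
```

Passing the secret with `-p` puts it in the process list and shell history, so this suits a one-off diagnostic better than automation.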

@yvan
Author

yvan commented Apr 25, 2019

I was thinking the same thing but am a little stumped. On which machine should I look for /etc/kubernetes/azure.json?

@andyzhangx
Contributor

/etc/kubernetes/azure.json is on the master node or on any agent node.

@yvan
Author

yvan commented Apr 25, 2019

Hey Andy -- what's the easiest way to get to that file? Should I try to SSH into a node? I tried this as part of an earlier issue and it was quite difficult.

@andyzhangx
Contributor

@yvan You should SSH to the master node; otherwise, how do you use the k8s cluster?
You could also check the service principal in your aks-engine config file.

@yvan
Author

yvan commented Apr 25, 2019

I am using AKS. I don't know where my aks-engine config file is.

OK, I just used 'Run Command' to check the file on all of the nodes. I tried to run the az login command and I get the same error:

az login --service-principal -u MYSTERY_ID_HERE -p XXX -t TENANT_ID
Cloud Shell is automatically authenticated under the initial account signed-in with. Run 'az login' only if you need to use a different account
Get Token request returned http error: 400 and server response: {"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier 'MYSTERY_ID_HERE' was not found in the directory 'TENANT_ID'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant.\r\nTrace ID: 57759e1b-269e-475f-83ea-bda492588800\r\nCorrelation ID: 37f97ea7-f75e-460f-aed9-f721c140bbfa\r\nTimestamp: 2019-04-25 13:45:25Z","error_codes":[700016],"timestamp":"2019-04-25 13:45:25Z","trace_id":"57759e1b-269e-475f-83ea-bda492588800","correlation_id":"37f97ea7-f75e-460f-aed9-f721c140bbfa","error_uri":"https://login.microsoftonline.com/error?code=700016"}
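The 'Run Command' check can also be done from the Azure CLI without SSH. A sketch, assuming availability-set agent VMs (which the aks-agentpool-57634498-N node names in this thread suggest; a scale-set node would need `az vmss run-command invoke` instead). The helper names are hypothetical, and the resource group, cluster and VM names are placeholders. Only the two ID lines are printed, so the client secret is not echoed back.

```shell
# Hypothetical helper: print the tenantId and aadClientId lines of
# /etc/kubernetes/azure.json on agent VM $2 in node resource group $1.
node_sp_ids() {
  az vm run-command invoke \
    --resource-group "$1" \
    --name "$2" \
    --command-id RunShellScript \
    --scripts "grep -E '\"(tenantId|aadClientId)\"' /etc/kubernetes/azure.json" \
    --query 'value[0].message' -o tsv
}

# Hypothetical helper: look up the node resource group (MC_...) of
# AKS cluster $2 in resource group $1.
node_rg() {
  az aks show --resource-group "$1" --name "$2" --query nodeResourceGroup -o tsv
}
```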

@yvan
Author

yvan commented Apr 25, 2019

Could this be happening because the cluster's service principal (SP) application is missing from Azure Active Directory? If so, how can I remedy that and get a new one in Azure Active Directory?

@andyzhangx
Contributor

@yvan
Author

yvan commented Apr 25, 2019

Thanks for your help, giving it a go... how long should it take to update? It's been running for a few minutes now.
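One documented way to point an existing AKS cluster at a new service principal is `az aks update-credentials --reset-service-principal`. The sketch below wraps it in a hypothetical helper with a DRY_RUN switch so the command can be reviewed before it runs. The resource group, cluster name, app ID and secret are placeholders; the new SP itself would come from something like `az ad sp create-for-rbac`.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: reset the SP of AKS cluster $2 in resource group $1
# to app ID $3 with client secret $4.
reset_aks_sp() {
  run az aks update-credentials \
    --resource-group "$1" \
    --name "$2" \
    --reset-service-principal \
    --service-principal "$3" \
    --client-secret "$4"
}
```

Note that a dry run prints the secret as well, so it should not be pasted into a shared log.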

@yvan
Author

yvan commented Apr 25, 2019

OK, I updated the SP and the old error is gone. Now I'm getting a new one:

2019-04-25 14:28:03+00:00 [Warning] AttachVolume.Attach failed for volume "pvc-7b7976d7-3a46-11e9-93d5-dee1946e6ce9" : Attach volume "kubernetes-dynamic-pvc-7b7976d7-3a46-11e9-93d5-dee1946e6ce9" to instance "/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Compute/virtualMachines/aks-agentpool-57634498-0" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status= Code="ConflictingUserInput" Message="Disk '/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-7b7976d7-3a46-11e9-93d5-dee1946e6ce9' cannot be attached as the disk is already owned by VM '/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Compute/virtualMachines/aks-agentpool-57634498-1'."

It seems like the disk behind my PVC is still attached to a different node and cannot be attached again.
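To see which VM Azure itself considers the owner of the disk, the managed disk's `managedBy` property can be queried. A sketch with hypothetical helper names; the resource group is assumed to be the cluster's node resource group, and the disk name is the kubernetes-dynamic-pvc-... name from the error.

```shell
# Hypothetical helper: print the resource ID of the VM that owns managed
# disk $2 in resource group $1 (empty output means it is not attached).
disk_owner_id() {
  az disk show --resource-group "$1" --name "$2" --query managedBy -o tsv
}

# Reduce a full resource ID such as
# /subscriptions/.../virtualMachines/aks-agentpool-57634498-1
# to the last path segment (the VM name).
vm_name_from_id() {
  printf '%s\n' "${1##*/}"
}
```

Comparing that VM name with the node the pod was scheduled on (aks-agentpool-57634498-0 in the error above) shows whether the attachment is left over from an earlier attempt.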

@andyzhangx
Contributor

A disk-backed PVC can only be attached to one VM node at a time.

@yvan
Author

yvan commented Apr 25, 2019

I know that a disk-backed PVC can only be attached to one node. I'm trying to figure out why I am getting this message given that the PVC in question is not currently being used by any pod. If the PVC is unused but I'm still getting the above message, that's a confusing situation. I'm looking for context or ideas to better understand what's happening.
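One way to double-check that a claim is really unused is to compare which pods reference it with what the node object reports as attached. A sketch assuming kubectl access; the helper names are hypothetical, and the namespace, claim and node names are placeholders. For in-tree Azure disks, the attach/detach controller should record attachments in the node's `status.volumesAttached`, so a mismatch between that list, the pods, and the Azure-side owner may point at a stale attachment.

```shell
# Hypothetical helper: print the names of pods in namespace $1 whose
# volumes reference PVC $2. kubectl's jsonpath output gives one line per
# pod: "<pod> <claim> <claim> ...".
pods_using_pvc() {
  kubectl get pods -n "$1" -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' |
    awk -v claim="$2" '{ for (i = 2; i <= NF; i++) if ($i == claim) { print $1; break } }'
}

# Hypothetical helper: list the volumes node $1 reports as attached.
node_attached_volumes() {
  kubectl get node "$1" -o jsonpath='{range .status.volumesAttached[*]}{.name}{"\n"}{end}'
}
```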

@andyzhangx
Contributor

@yvan I noticed that your service principal has been invalid for a while; I'm not sure whether that's related.

@yvan
Author

yvan commented Apr 25, 2019

@andyzhangx I made a new service principal and put it on the cluster. That part seems fine now. I'm not sure why it would be invalid for a while and then suddenly start causing issues today.

The attach volume issue is not affecting other PVCs on our cluster, just the ones I was touching or trying to mount while debugging the original issue (AADSTS700016) above. My hypothesis is that when those PVCs originally failed to attach, they somehow became locked, and the cluster won't detach them now.

Any thoughts on how to detach a PVC that the cluster won't release on its own?

This is not dissimilar to Azure/AKS#884. Last time I just waited a day, but obviously I need something a little better here.

@andyzhangx
Contributor

You can just detach that PVC's disk in the Azure portal. By the way, what's your k8s version?
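The portal detach also has an Azure CLI counterpart. A sketch, assuming availability-set nodes (scale-set nodes use `az vmss disk detach` with `--instance-id` and `--lun` instead); the helper name and DRY_RUN switch are hypothetical, and the resource group, VM and disk names are placeholders. It should only be run once no pod is using the claim.

```shell
# Hypothetical helper: detach managed disk $3 from VM $2 in resource group $1.
# VM is the owner reported in the ConflictingUserInput error.
# With DRY_RUN=1 the command is printed instead of executed.
detach_disk() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "az vm disk detach --resource-group $1 --vm-name $2 --name $3"
  else
    az vm disk detach --resource-group "$1" --vm-name "$2" --name "$3"
  fi
}
```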

@yvan
Author

yvan commented Apr 25, 2019

k8s version 1.12.5

@yvan
Author

yvan commented Apr 25, 2019

I manually detached my disks, roughly following this guide (https://docs.microsoft.com/en-us/azure/lab-services/devtest-lab-attach-detach-data-disk), and now all issues are resolved. Outstanding issues I recommend investigating:

1- Failing to attach a disk, possibly multiple times, because of API errors like the ones above can leave the PVC stuck in an attached state.

2- It should be documented more clearly that the XXX_SP_XXX apps that Kubernetes clusters created from the portal auto-generate should not be deleted.

3- The XXX_SP_XXX apps should be identified or marked more clearly when in use. I had many of them and could not tell which ones my cluster was actually using for RBAC and cluster operations.
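For the third point, the client ID a cluster actually uses can be read from the cluster and matched against the service principals you own. A sketch with hypothetical helper names; the resource group and cluster name are placeholders, and the ID list on stdin could come from `az ad sp list --show-mine --query "[].appId" -o tsv`. An ID marked "other" is only "not this cluster's SP"; another cluster may still depend on it, so that alone is no reason to delete it.

```shell
# Hypothetical helper: print the SP client ID that AKS cluster $2 in
# resource group $1 is configured with.
cluster_sp_id() {
  az aks show --resource-group "$1" --name "$2" \
    --query servicePrincipalProfile.clientId -o tsv
}

# Hypothetical helper: read app IDs on stdin (one per line) and tag each
# as "this-cluster" if it equals $1, otherwise "other". Blank lines skipped.
mark_sp_in_use() {
  in_use=$1
  while IFS= read -r id || [ -n "$id" ]; do
    [ -z "$id" ] && continue
    if [ "$id" = "$in_use" ]; then
      printf '%s this-cluster\n' "$id"
    else
      printf '%s other\n' "$id"
    fi
  done
}
```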

also: as usual thank you for your help @andyzhangx !

yvan closed this as completed Apr 25, 2019