title | description | ms.date | editor | ms.reviewer | ms.service | ms.custom |
---|---|---|---|---|---|---|
Troubleshoot UpgradeFailed errors due to eviction failures caused by PDBs | Learn how to troubleshoot UpgradeFailed errors due to eviction failures caused by Pod Disruption Budgets when you try to upgrade an Azure Kubernetes Service cluster. | 12/21/2023 | v-jsitser | chiragpa, v-leedennis, v-weizhu | azure-kubernetes-service | sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool) |
This article discusses how to identify and resolve UpgradeFailed errors due to eviction failures caused by Pod Disruption Budgets (PDBs) that occur when you try to upgrade an Azure Kubernetes Service (AKS) cluster.
This article requires Azure CLI version 2.0.65 or a later version. To find the version number, run `az --version`. If you have to install or upgrade Azure CLI, see How to install the Azure CLI.
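If you want to script the version check, the following sketch compares an installed version against the 2.0.65 minimum. It assumes GNU coreutils `sort -V` is available and that `az version` reports the CLI version under the `azure-cli` key; the function names `version_at_least` and `check_az_cli` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) when version $1 is at least version $2.
# Relies on GNU coreutils `sort -V` for version-aware ordering.
version_at_least() {
  local have=$1 need=$2
  # If the required version sorts first (or both are equal), $have is new enough.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Hypothetical check against the minimum that this article requires.
# Assumption: `az version` reports the CLI version under the "azure-cli" key.
check_az_cli() {
  local installed
  installed=$(az version --query '"azure-cli"' -o tsv)
  if version_at_least "$installed" "2.0.65"; then
    echo "Azure CLI $installed meets the 2.0.65 minimum."
  else
    echo "Azure CLI $installed is older than 2.0.65; upgrade it." >&2
    return 1
  fi
}
```

Run `check_az_cli` in a shell where Azure CLI is signed in and on the `PATH`.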
For more detailed information about the upgrade process, see the "Upgrade an AKS cluster" section in Upgrade an Azure Kubernetes Service (AKS) cluster.
An AKS cluster upgrade operation fails with the following error message:
Code: UpgradeFailed
Message: Drain node <node-name> failed when evicting pod <pod-name>. Eviction failed with Too many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See http://aka.ms/aks/debugdrainfailures. Original error: API call to Kubernetes API Server failed.
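This error might occur if a pod is protected by a Pod Disruption Budget (PDB) policy. In this situation, the Kubernetes API server rejects the eviction request with a 429 (Too Many Requests) response, so the pod can't be evicted and the node can't be drained.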
To check for this situation, run `kubectl get pdb -A`, and then check the Allowed Disruptions value (the `ALLOWED DISRUPTIONS` column) for each PDB. The value should be 1 or greater. For more information, see Plan for availability using pod disruption budgets.
If the Allowed Disruptions value is 0, the node drain will fail during the upgrade process.
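To list only the PDBs that would block a drain, you can filter the `kubectl get pdb -A` output. The sketch below assumes the default column layout (NAMESPACE, NAME, MIN AVAILABLE, MAX UNAVAILABLE, ALLOWED DISRUPTIONS, AGE), so the fifth whitespace-separated field is the Allowed Disruptions value; the function name `find_blocking_pdbs` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pdb -A` output on stdin and prints
# <namespace>/<name> for every PDB whose ALLOWED DISRUPTIONS value is 0.
find_blocking_pdbs() {
  # Skip the header row; field 5 is ALLOWED DISRUPTIONS in the default layout.
  awk 'NR > 1 && $5 == 0 { print $1 "/" $2 }'
}
```

Usage: `kubectl get pdb -A | find_blocking_pdbs`. Parsing table output depends on the column layout; filtering on the PDB's `.status.disruptionsAllowed` field through JSON output is a sturdier alternative.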
To resolve this issue, use one of the following solutions.
- Adjust the PDB to enable pod draining. Generally, the Allowed Disruptions value is controlled by the PDB's Min Available (`minAvailable`) or Max Unavailable (`maxUnavailable`) setting and by the number of running pods (replicas). Either modify the Min Available or Max Unavailable value in the PDB, or increase the number of replicas, so that the Allowed Disruptions value becomes 1 or greater. Then, try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation.
- Back up the PDB by running `kubectl get pdb <pdb-name> -n <pdb-namespace> -o yaml > pdb_backup.yaml`, and then delete it by running `kubectl delete pdb <pdb-name> -n <pdb-namespace>`. Then, try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation. After the upgrade is finished, redeploy the PDB by running `kubectl apply -f pdb_backup.yaml`.
- Delete the pods that can't be drained.

  [!NOTE] If the pods were created by a Deployment, they're controlled by a ReplicaSet; if they were created by a StatefulSet, they're controlled by the StatefulSet. In either case, the controller re-creates deleted pods, so you might have to delete the Deployment or StatefulSet instead. Before you do that, we recommend that you make a backup by running `kubectl get <kubernetes-object> <name> -n <namespace> -o yaml > backup.yaml`.

  Then, try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation.
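To see how the settings in the first solution combine, the following sketch estimates the Allowed Disruptions value from integer Min Available or Max Unavailable settings. It's a simplified model, not the Kubernetes implementation: with Min Available, that many pods must stay healthy; with Max Unavailable, the expected pod count minus that value must stay healthy; healthy pods beyond that number can be disrupted. Percentage values and their rounding are ignored, and `allowed_disruptions` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimates a PDB's Allowed Disruptions value.
# Arguments: healthy pods, expected pods (replicas), "min" or "max", integer value.
# Simplified model: ignores percentage values and their rounding.
allowed_disruptions() {
  local healthy=$1 expected=$2 mode=$3 value=$4 desired allowed
  if [ "$mode" = "min" ]; then
    # minAvailable: this many pods must stay healthy.
    desired=$value
  else
    # maxUnavailable: at most this many of the expected pods may be down.
    desired=$(( expected - value ))
  fi
  allowed=$(( healthy - desired ))
  # The value never goes below 0.
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}
```

For example, three healthy replicas with Min Available set to 3 (`allowed_disruptions 3 3 min 3`) give 0, so the drain fails. Lowering Min Available to 2 (`allowed_disruptions 3 3 min 2`) or scaling out to four replicas (`allowed_disruptions 4 4 min 3`) gives 1.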
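The back-up, delete, and redeploy steps of the PDB solution can be wrapped in two small functions so that the PDB is deleted only after a non-empty backup file exists. This is a sketch: the function names `backup_and_delete_pdb` and `restore_pdb` are hypothetical, and the `kubectl` commands are the ones that solution uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: backs up a PDB to a file, then deletes it.
# Arguments: PDB name, namespace, backup file (defaults to pdb_backup.yaml).
backup_and_delete_pdb() {
  local name=$1 namespace=$2 backup=${3:-pdb_backup.yaml}
  # Save the current PDB definition; stop if the read fails or returns nothing.
  if ! kubectl get pdb "$name" -n "$namespace" -o yaml > "$backup" || [ ! -s "$backup" ]; then
    echo "Could not back up PDB $namespace/$name; not deleting it." >&2
    return 1
  fi
  # The backup exists, so remove the PDB that blocks the drain.
  kubectl delete pdb "$name" -n "$namespace"
}

# Hypothetical helper: redeploys the PDB from the backup after the upgrade finishes.
restore_pdb() {
  local backup=${1:-pdb_backup.yaml}
  kubectl apply -f "$backup"
}
```

Run `backup_and_delete_pdb <pdb-name> <pdb-namespace>` before you retry the upgrade, and `restore_pdb` after the upgrade is finished.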
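Before you delete pods that can't be drained, it helps to know which controller owns them, because that controller is what the note recommends backing up. The following sketch reads a pod's `ownerReferences` through a `kubectl` JSONPath query and, when the owner is a ReplicaSet, follows it one level up to the owning Deployment. It only looks at the first owner reference, and `pod_controller` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the top-level controller of a pod as <Kind>/<name>.
# A Deployment's pods are owned by a ReplicaSet, so that case is followed one level up.
pod_controller() {
  local pod=$1 namespace=$2 owner rs_owner
  local path='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
  owner=$(kubectl get pod "$pod" -n "$namespace" -o jsonpath="$path")
  if [ "${owner%%/*}" = "ReplicaSet" ]; then
    # Look up who owns the ReplicaSet (normally a Deployment).
    rs_owner=$(kubectl get replicaset "${owner#*/}" -n "$namespace" -o jsonpath="$path")
    # Keep the ReplicaSet itself if it has no owner of its own.
    if [ -n "${rs_owner%%/*}" ]; then
      owner=$rs_owner
    fi
  fi
  echo "$owner"
}
```

For a Deployment-managed pod, the output has the form `Deployment/<name>`, which tells you which object to back up with `kubectl get <kubernetes-object> <name> -n <namespace> -o yaml > backup.yaml`.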
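Each solution ends with retrying the upgrade to the same target version. With Azure CLI, that retry can be sketched as a thin wrapper around `az aks upgrade`; the function name `retry_aks_upgrade` is hypothetical, and `--yes` only skips the confirmation prompt.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: retries the upgrade to the version that previously failed,
# which triggers a reconciliation of the cluster.
# Arguments: resource group, cluster name, target Kubernetes version.
retry_aks_upgrade() {
  local resource_group=$1 cluster=$2 version=$3
  az aks upgrade \
    --resource-group "$resource_group" \
    --name "$cluster" \
    --kubernetes-version "$version" \
    --yes
}
```

For example: `retry_aks_upgrade <resource-group> <cluster-name> <kubernetes-version>`.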
[!INCLUDE Azure Help Support]