NicReservedForAnotherVM Error leaves nics in MC Resource Group #92
Labels
area/networking
Issues or PRs related to networking
kind/bug
Categorizes issue or PR as related to a bug.
triage/accepted
Indicates an issue or PR is ready to be actively worked on.
Version
v0.2.0
Expected Behavior
When Karpenter sends a VM POST request to create a VM and that request fails, Karpenter should always clean up the resources left over from the failed attempt.
Actual Behavior
When Karpenter sends a VM POST request to create a VM and that request fails (due to quota or various other issues), there is a race condition between deletion of the ARM representation of the VM and deletion of the NIC.
When we attempt to issue a delete call for the network interface, we get the error `NicReservedForAnotherVM`. This often occurs because the ARM representation of the VM we tried to create isn't yet deleted when we issue the network interface deletion call. As a result, the delete call fails, saying that the NIC is reserved for another VM.
One can attempt to fix this by
Steps to Reproduce the Problem
Run `make az-perftest-300`. Essentially, you just need to scale up to a high volume.
Resource Specs and Logs
{"level":"ERROR","time":"2024-01-10T23:27:06.884Z","logger":"controller.nodeclaim.lifecycle","message":"Creating virtual machine "aks-default-bzzjc" failed: PUT https://management.azure.com/subscriptions//resourceGroups/MC_blah/providers/Microsoft.Compute/virtualMachines/aks-default-bzzjc\n--------------------------------------------------------------------------------\nRESPONSE 409: 409 Conflict\nERROR CODE: OperationNotAllowed\n--------------------------------------------------------------------------------\n{\n "error": {\n "code": "OperationNotAllowed",\n "message": "Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: uksouth, Current Limit: 100, Current Usage: 60, Additional Required: 48, (Minimum) New Limit Required: 108. Submit a request for Quota increase at by specifying parameters listed in the ‘Details’ section for deployment to succeed. Please read more about quota limits at https://docs.microsoft.com/en-us/azure/azure-supportability/regional-quota-requests\"\n }\n}\n--------------------------------------------------------------------------------\n","commit":"832597b-dirty","nodeclaim":"default-bzzjc","nodepool":"default"}