Application stuck in rolling upgrade state #1279
Comments
PreUpgradeSafetyCheck - https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade-troubleshooting
EnsureAvailability: https://docs.microsoft.com/en-us/rest/api/servicefabric/sfclient-model-ensureavailabilitysafetycheck
More on that safetycheck kind: https://docs.microsoft.com/en-us/previous-versions/azure/reference/mt280061(v=azure.100)
I'm not aware of a way to get past this other than manually bringing down the process that holds that replica. If you are OK with losing availability, you could try this API: https://docs.microsoft.com/en-us/rest/api/servicefabric/sfclient-api-restartdeployedcodepackage (in other words, kill that process).
Let me know if this helps.
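For anyone who wants to try the restart call above, here is a minimal sketch of how the request could be assembled. It only builds the URL and JSON body and does not send anything. The endpoint, node name, and package names are hypothetical placeholders, and the path and body field names are written from memory of the linked REST reference, so check them against that page before use.

```python
from urllib.parse import quote

API_VERSION = "6.0"  # assumed API version; adjust to your cluster


def application_id(application_name: str) -> str:
    """Turn 'fabric:/myapp/app1' into the REST identifier 'myapp~app1'."""
    prefix = "fabric:/"
    if application_name.startswith(prefix):
        application_name = application_name[len(prefix):]
    return application_name.replace("/", "~")


def build_restart_request(endpoint, node_name, application_name,
                          service_manifest_name, code_package_name,
                          code_package_instance_id="0"):
    """Return (url, body) for a Restart Deployed Code Package call.

    The instance id defaults to "0", which (as far as I recall from the
    reference) targets whatever instance is currently running.
    """
    path = "/Nodes/{}/$/GetApplications/{}/$/GetCodePackages/$/Restart".format(
        quote(node_name, safe=""),
        quote(application_id(application_name), safe="~"),
    )
    url = "{}{}?api-version={}".format(endpoint.rstrip("/"), path, API_VERSION)
    # Field names assumed from the RestartDeployedCodePackageDescription model.
    body = {
        "ServiceManifestName": service_manifest_name,
        "CodePackageName": code_package_name,
        "CodePackageInstanceId": code_package_instance_id,
    }
    return url, body


if __name__ == "__main__":
    # Hypothetical values for a local development cluster.
    url, body = build_restart_request(
        "http://localhost:19080", "_Node_0", "fabric:/MyApp",
        "MyServicePkg", "Code",
    )
    print("POST", url)
    print(body)
```

The resulting URL and body can be sent with any HTTP client. On a secured cluster the call also needs the cluster's client certificate, which is left out here.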
@meanin, regarding the unhealthy evaluations on the naming service: they say that the service deletion is taking more than 30 minutes. There is nothing wrong with the naming service; it's just the entity this issue surfaces on. So you start an application upgrade, it fails on safety checks, and while the upgrade is pending you issue a delete of the app, which then gets stuck (needs a node reboot). @motanv, can you look into this further? FYI, deleting the app during an upgrade is probably not the best mitigation. You can try a rollback, or you can change UpgradeReplicaSetCheckTimeout; since you deleted the app, I understand that you are OK with losing state.
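As a rough illustration of the rollback option, the sketch below builds the URL for a rollback request against the cluster's REST endpoint. It does not send anything. The endpoint and application name are hypothetical, and the path is my recollection of the Rollback Application Upgrade operation, so treat it as an assumption to verify against the REST reference. Changing UpgradeReplicaSetCheckTimeout is not sketched here.

```python
from urllib.parse import quote


def build_rollback_url(endpoint: str, application_name: str,
                       api_version: str = "6.0") -> str:
    """Return the URL to POST to in order to roll back a pending upgrade.

    'fabric:/shop/cart' becomes the REST identifier 'shop~cart'.
    """
    prefix = "fabric:/"
    if application_name.startswith(prefix):
        application_name = application_name[len(prefix):]
    app_id = quote(application_name.replace("/", "~"), safe="~")
    # Path assumed from the Rollback Application Upgrade REST operation.
    return "{}/Applications/{}/$/RollbackUpgrade?api-version={}".format(
        endpoint.rstrip("/"), app_id, api_version
    )


if __name__ == "__main__":
    # Hypothetical local development cluster and application name.
    print("POST", build_rollback_url("http://localhost:19080", "fabric:/MyApp"))
```

From PowerShell, which the original post uses, the rollback is done with the Start-ServiceFabricApplicationRollback cmdlet.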
@mikkelhegn I will try this way next time (hope there won't be a next time :) ). I was also digging into the deployment parameters, but we were in a hurry. @oanapl It happened on a development cluster, so a node reboot is not a big deal. Furthermore, we do not have stateful services yet. Again, we were in a hurry, just before a business demo, so I took any opportunity to get the cluster into a valid state. Next time I will do this properly, with a rollback first and then a redeployment of the application.
Hi all,
after a new application version deployment failed, the application got stuck in a rolling upgrade state. To be able to work on this application, I decided to remove it from the cluster through Service Fabric Explorer. Service Fabric is unable to delete this application for some reason.
30 minutes later, NamingService shows unhealthy evaluations:
It seems that AODeleteService is not able to start.
I also tried to investigate this from PowerShell.
Application state:
Update-ServiceFabricApplicationUpgrade also times out.
Basically, I am not able to remove the broken application without a node restart.