Tasks stuck in cancelling if connection is lost to the server #3388

droyad · 2017-04-06T11:03:15Z

If the server cannot connect to the database when updating a task's status to Finished, the following message is logged: `Unable to mark task ServerTasks-206 as complete`

When the server regains access to the database, the task appears to be still running, but the underlying task thread has finished.

When the Cancel button is clicked, the task goes into the `Cancelling` state, which prompts each server node to cancel the task thread. Since no node can find a thread with that task id, the task never gets updated to `Cancelled`.

The trick is that each server node does not know whether the thread was running on itself or on another node (in a HA config).

We should detect whether the node owned the task, and if so, move it to cancelled.

Also we should see if we can try a bit harder to set the final status.
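A minimal sketch of the ownership check proposed above, in Python for illustration only. It assumes the task record stores the name of the node that started the task; the `owner_node` field, the `Node` class and the in-memory `store` are hypothetical stand-ins, not the server's actual types.

```python
from dataclasses import dataclass
from typing import Dict, Optional
import threading


@dataclass
class TaskRecord:
    task_id: str
    state: str
    owner_node: Optional[str]  # hypothetical: name of the node that started the task


class Node:
    def __init__(self, name: str, store: Dict[str, TaskRecord]):
        self.name = name
        self.store = store  # stands in for the shared database
        self.running: Dict[str, threading.Event] = {}  # task id -> cancel signal

    def handle_cancelling(self, task_id: str) -> str:
        record = self.store[task_id]
        if record.state != "Cancelling":
            return "ignored"
        cancel_signal = self.running.get(task_id)
        if cancel_signal is not None:
            # The thread is still alive on this node: ask it to stop.
            cancel_signal.set()
            return "signalled"
        if record.owner_node == self.name:
            # This node owned the task but the thread is gone, so nothing
            # else will ever move the record out of Cancelling.
            record.state = "Cancelled"
            return "cancelled"
        # Another node owns the task; leave the record alone.
        return "not-owner"
```

With this shape, a non-owning node in an HA setup leaves the record untouched, while the owning node can finish the cancellation even though its thread has already exited.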


droyad · 2017-04-06T11:42:16Z

Should error 976 (`SELECT * FROM sys.messages WHERE message_id = 976`):

The target database, '%.*ls', is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group.  For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.

be added to the SqlDatabaseTransientErrorDetectionStrategy? https://github.com/OctopusDeploy/Nevermore/blob/8bc8ff97a13d4f7081df7478a3082b40ebca6b6c/source/Nevermore/Transient/SqlDatabaseTransientErrorDetectionStrategy.cs#L14
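To illustrate what treating 976 as transient would mean, here is a rough Python sketch of a detection check paired with a retry loop. It does not reproduce Nevermore's implementation: `SqlError`, `is_transient` and `execute_with_retry` are hypothetical names, and the set of error numbers is illustrative (1205 is SQL Server's deadlock-victim error, 976 is the error quoted above).

```python
import time


class SqlError(Exception):
    """Hypothetical stand-in for a database exception carrying an error number."""

    def __init__(self, number, message=""):
        super().__init__(message or f"SQL error {number}")
        self.number = number


# Illustrative set only, not the list used by the real strategy class.
TRANSIENT_ERROR_NUMBERS = {1205, 976}


def is_transient(exc):
    return isinstance(exc, SqlError) and exc.number in TRANSIENT_ERROR_NUMBERS


def execute_with_retry(operation, attempts=3, delay_seconds=0.0):
    """Run operation, retrying only when the failure is classified as transient."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise
            time.sleep(delay_seconds)
```

If 976 is not in the set, the first failure propagates immediately, which matches the behaviour described in the issue: the status update is abandoned as soon as the availability group becomes unreachable.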

droyad · 2017-04-07T05:34:52Z

Previously if the task completion failed, the task would be left in the running state and removed from the running tasks dictionary.

Now the task is not removed from the dictionary until its state has been successfully updated to a complete status. If there is an error during completion, the task is marked as such and completion is retried during the task cancellation/cleanup process.

Also the error now shows up in the logs if SQL is unavailable.
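The change described above could be sketched roughly as follows. This is an illustrative Python model, not the actual server code: `TaskRunner`, `TaskEntry` and the `save_state` callback are hypothetical names, and `save_state` stands in for the database update that can fail.

```python
class TaskEntry:
    def __init__(self, task_id):
        self.task_id = task_id
        self.final_state = None
        self.completion_error = None  # set when persisting the final state failed


class TaskRunner:
    def __init__(self, save_state):
        self.save_state = save_state  # callable(task_id, state); may raise
        self.running = {}  # task id -> TaskEntry

    def start(self, task_id):
        self.running[task_id] = TaskEntry(task_id)

    def complete(self, task_id, final_state):
        entry = self.running[task_id]
        entry.final_state = final_state
        self._try_persist(entry)

    def cleanup(self):
        # Called from the cancellation/cleanup pass: retry failed completions.
        for entry in list(self.running.values()):
            if entry.completion_error is not None:
                self._try_persist(entry)

    def _try_persist(self, entry):
        try:
            self.save_state(entry.task_id, entry.final_state)
        except Exception as exc:
            # Keep the entry in the dictionary so a later pass can retry.
            entry.completion_error = exc
            return False
        # Only forget the task once its final state is safely stored.
        del self.running[entry.task_id]
        return True
```

The key design point is the ordering: the entry leaves the dictionary only after the save succeeds, so a node that lost its database connection still knows which tasks it needs to finish off.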

Example of a task that failed to complete, but then managed to complete later:
[image]

octoreleasebot · 2017-04-10T02:18:25Z

Release Note: Tasks no longer get stuck in the running or cancelling state if there is an intermittent database connection problem

lock · 2018-11-24T23:13:18Z

This thread has been automatically locked since there has not been any recent activity after it was closed. If you think you've found a related issue, please contact our support team so we can triage your issue, and make sure it's handled appropriately.

droyad added the area/execution label Apr 6, 2017

droyad closed this as completed Apr 10, 2017

octoreleasebot assigned droyad Apr 10, 2017

octoreleasebot added this to the 3.12.2 milestone Apr 10, 2017

lock bot locked as resolved and limited conversation to collaborators Nov 24, 2018
