Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recurrent task with expiration > 7 days fails but continually retries without any indication to the user #1006

Open
papadeltasierra opened this issue Nov 9, 2023 · 1 comment

Comments

@papadeltasierra
Copy link

We wrote a recurring orchestrator, recurring every 15 minutes, and wanted it to run for a long time. We set the expiration to 15days and this seemed to work but... We later discovered that the first iteration of the orchestrator happened but subsequent iterations did not and there was no indication of an error back to our code and no obvious logs. We were eventually able to find that indicated that the DTF failing to run the orchestration and then retrying, and DTF continued to do this every 10 minutes, forever!

Having debugged this (against tagged release durabletask.azurestorage-v1.13.6) we have traced the issue to this Exception.

System.ArgumentOutOfRangeException: The argument 'initialVisibilityDelay' is larger than maximum of '7.00:00:00' (Parameter 'initialVisibilityDelay')
   at Microsoft.WindowsAzure.Storage.Core.Util.CommonUtility.AssertInBounds[T](String paramName, T val, T min, T max)
   at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.AddMessageImpl(CloudQueueMessage message, Nullable`1 timeToLive, Nullable`1 initialVisibilityDelay, QueueRequestOptions options)
   at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.AddMessageAsync(CloudQueueMessage message, Nullable`1 timeToLive, Nullable`1 initialVisibilityDelay, QueueRequestOptions options, OperationContext operationContext, CancellationToken cancellationToken)
   at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.AddMessageAsync(CloudQueueMessage message, Nullable`1 timeToLive, Nullable`1 initialVisibilityDelay, QueueRequestOptions options, OperationContext operationContext)
   at DurableTask.AzureStorage.Storage.Queue.<>c__DisplayClass12_0.<AddMessageAsync>b__0(OperationContext context, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/Storage/Queue.cs:line 55
   at DurableTask.AzureStorage.Storage.AzureStorageClient.WrapFunctionWithReturnType(Func`3 storageRequest, OperationContext context, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/Storage/AzureStorageClient.cs:line 159
   at DurableTask.AzureStorage.TimeoutHandler.ExecuteWithTimeout[T](String operationName, String account, AzureStorageOrchestrationServiceSettings settings, Func`3 operation, AzureStorageOrchestrationServiceStats stats, String clientRequestId) in /_/src/DurableTask.AzureStorage/TimeoutHandler.cs:line 133
   at DurableTask.AzureStorage.Storage.AzureStorageClient.MakeStorageRequest[T](Func`3 storageRequest, String accountName, String operationName, String clientRequestId, Boolean force) in /_/src/DurableTask.AzureStorage/Storage/AzureStorageClient.cs:line 150
   at DurableTask.AzureStorage.Storage.Queue.AddMessageAsync(QueueMessage queueMessage, Nullable`1 visibilityDelay, Nullable`1 clientRequestId) in /_/src/DurableTask.AzureStorage/Storage/Queue.cs:line 54
   at DurableTask.AzureStorage.Messaging.TaskHubQueue.AddMessageAsync(TaskMessage taskMessage, OrchestrationInstance sourceInstance, SessionBase session) in /_/src/DurableTask.AzureStorage/Messaging/TaskHubQueue.cs:line 129
   at DurableTask.AzureStorage.Utils.InvokeThrottledAction[T](T item, Func`2 action, SemaphoreSlim semaphore) in /_/src/DurableTask.AzureStorage/Utils.cs:line 81
   at DurableTask.AzureStorage.Utils.ParallelForEachAsync[T](IList`1 items, Int32 maxConcurrency, Func`2 action) in /_/src/DurableTask.AzureStorage/Utils.cs:line 72
   at DurableTask.AzureStorage.AzureStorageOrchestrationService.CommitOutboundQueueMessages(OrchestrationSession session, IList`1 outboundMessages, IList`1 orchestratorMessages, IList`1 timerMessages, TaskMessage continuedAsNewMessage) in /_/src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs:line 1251
   at DurableTask.AzureStorage.AzureStorageOrchestrationService.CompleteTaskOrchestrationWorkItemAsync(TaskOrchestrationWorkItem workItem, OrchestrationRuntimeState newOrchestrationRuntimeState, IList`1 outboundMessages, IList`1 orchestratorMessages, IList`1 timerMessages, TaskMessage continuedAsNewMessage, OrchestrationState orchestrationState) in /_/src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs:line 1037
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 578
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemSessionAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 194
   at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 373 | formatter=TaskOrchestrationDispatcher-da755bb068694a5d8c860100a5800fdd-0: Unhandled exception with work item 'f810cf8a-5e75-404d-95db-f84a07881dc3:DummyTaskOrchestration:V1': System.ArgumentOutOfRangeException: The argument 'initialVisibilityDelay' is larger than maximum of '7.00:00:00' (Parameter 'initialVisibilityDelay')
   at Microsoft.WindowsAzure.Storage.Core.Util.CommonUtility.AssertInBounds[T](String paramName, T val, T min, T max)
   at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.AddMessageImpl(CloudQueueMessage message, Nullable`1 timeToLive, Nullable`1 initialVisibilityDelay, QueueRequestOptions options)
   at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.AddMessageAsync(CloudQueueMessage message, Nullable`1 timeToLive, Nullable`1 initialVisibilityDelay, QueueRequestOptions options, OperationContext operationContext, CancellationToken cancellationToken)
   at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.AddMessageAsync(CloudQueueMessage message, Nullable`1 timeToLive, Nullable`1 initialVisibilityDelay, QueueRequestOptions options, OperationContext operationContext)
   at DurableTask.AzureStorage.Storage.Queue.<>c__DisplayClass12_0.<AddMessageAsync>b__0(OperationContext context, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/Storage/Queue.cs:line 55
   at DurableTask.AzureStorage.Storage.AzureStorageClient.WrapFunctionWithReturnType(Func`3 storageRequest, OperationContext context, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/Storage/AzureStorageClient.cs:line 159
   at DurableTask.AzureStorage.TimeoutHandler.ExecuteWithTimeout[T](String operationName, String account, AzureStorageOrchestrationServiceSettings settings, Func`3 operation, AzureStorageOrchestrationServiceStats stats, String clientRequestId) in /_/src/DurableTask.AzureStorage/TimeoutHandler.cs:line 133
   at DurableTask.AzureStorage.Storage.AzureStorageClient.MakeStorageRequest[T](Func`3 storageRequest, String accountName, String operationName, String clientRequestId, Boolean force) in /_/src/DurableTask.AzureStorage/Storage/AzureStorageClient.cs:line 150
   at DurableTask.AzureStorage.Storage.Queue.AddMessageAsync(QueueMessage queueMessage, Nullable`1 visibilityDelay, Nullable`1 clientRequestId) in /_/src/DurableTask.AzureStorage/Storage/Queue.cs:line 54
   at DurableTask.AzureStorage.Messaging.TaskHubQueue.AddMessageAsync(TaskMessage taskMessage, OrchestrationInstance sourceInstance, SessionBase session) in /_/src/DurableTask.AzureStorage/Messaging/TaskHubQueue.cs:line 129
   at DurableTask.AzureStorage.Utils.InvokeThrottledAction[T](T item, Func`2 action, SemaphoreSlim semaphore) in /_/src/DurableTask.AzureStorage/Utils.cs:line 81
   at DurableTask.AzureStorage.Utils.ParallelForEachAsync[T](IList`1 items, Int32 maxConcurrency, Func`2 action) in /_/src/DurableTask.AzureStorage/Utils.cs:line 72
   at DurableTask.AzureStorage.AzureStorageOrchestrationService.CommitOutboundQueueMessages(OrchestrationSession session, IList`1 outboundMessages, IList`1 orchestratorMessages, IList`1 timerMessages, TaskMessage continuedAsNewMessage) in /_/src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs:line 1251
   at DurableTask.AzureStorage.AzureStorageOrchestrationService.CompleteTaskOrchestrationWorkItemAsync(TaskOrchestrationWorkItem workItem, OrchestrationRuntimeState newOrchestrationRuntimeState, IList`1 outboundMessages, IList`1 orchestratorMessages, IList`1 timerMessages, TaskMessage continuedAsNewMessage, OrchestrationState orchestrationState) in /_/src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs:line 1037
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 578
   at DurableTask.Core.TaskOrchestrationDispatcher.OnProcessWorkItemSessionAsync(TaskOrchestrationWorkItem workItem) in /_/src/DurableTask.Core/TaskOrchestrationDispatcher.cs:line 194
   at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 373

The call to CloudQueue.AddMessageAsync Method, https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.storage.queue.cloudqueue.addmessageasync?view=azure-dotnet-legacy#definition, seems to have an undocumented limit on the initialVisibilityDelay of 7 days. Setting <=7 days works, setting anything more fails with the error above.

@papadeltasierra
Copy link
Author

We traced this back through the DTF code (current code seems similar to the tagged release that we were using) and can suggest at least three possible solutions.

  1. In src\DurableTask.AzureStorage\Storage\AzureStorageClient.cs, the code already catches StorageException but does no special processing for ArgumentOutOfRangeException. It is assumed that the DurableTaskStorageException would flow back to the user so perhaps the catch statement could be extended. However it would also be useful if the error could map from the Azure storage information (which the user might know little about) to the actual DTF field that the user has set incorrectly and needs to change.
  2. Allow the ArgumentOutOfRangeException to flow back up the stack but now consider such exceptions as FATAL (ifFatal in src\DurableTask.Core\Common\Utils.cs) so that src\DurableTask.Core\WorkItemDispatcher.cs, line 405, will not just keep retrying the orchestration/task. The issue with this is that the ArgumentOutOfRangeException returned to the user will again indicate a field that the user knows nothing about (initialVisibilityDelay).
  3. Implement logic that handles long expiration times of >7 days by submitting expirations of 7 days and then repeating as the 7-day expiration occurs until the required long expiration has completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant