Allow timers to last longer than 7 days #14

Open
cgillum opened this issue Jul 15, 2017 · 21 comments

@cgillum cgillum (Collaborator) commented Jul 15, 2017

Issue

The Azure Storage provider uses invisible queue messages to implement durable timers. Queue messages have a maximum lifetime of 7 days, so durable timer messages cannot live longer than this.

Proposal

Allow durable timers to have an arbitrary lifetime, which could go as high as weeks or months.

Motivation

Real world scenarios often involve extended timeouts. For example, a late fee might be applied to a bill if it is not paid 30 days after it is due. A durable timer is a natural way to track this kind of expiration, and a 7 day limitation reduces the usefulness of the feature.

Design Notes

The fix would be specific to the Azure Storage provider extension in the Durable Task Framework (the Service Bus provider has no such limit). Azure Storage cannot keep a message in a queue for longer than 7 days. To work around this, if an orchestrator creates a timer for 30 days, the internal message should be tagged with metadata indicating the intended receive time (e.g. 30 days from now) but should be enqueued for only 72 hours (using a value smaller than 7 days is preferred to avoid potential data loss if the message cannot be dequeued immediately for some reason).

Once a message is received, the provider checks whether a ScheduledReceiveTimeUtc metadata property exists and, if it does, compares that time to the current time. If the ScheduledReceiveTimeUtc is within 5 minutes of the current time, the message can be processed. Otherwise, a new copy of the message should be enqueued for up to another 72 hours (or less, if the scheduled receive time is less than 72 hours away).
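
A rough sketch of what that dequeue-side check might look like (illustrative only; TimerMessage, ProcessTimerAsync, and EnqueueCopyAsync are hypothetical names, not actual provider APIs):

    // Hypothetical sketch of the provider-side handling; names are illustrative.
    private static readonly TimeSpan MaxVisibilityDelay = TimeSpan.FromHours(72);

    private static async Task HandleTimerMessageAsync(TimerMessage message)
    {
        DateTime now = DateTime.UtcNow;

        // If the intended receive time is within 5 minutes, fire the timer now.
        if (message.ScheduledReceiveTimeUtc <= now.AddMinutes(5))
        {
            await ProcessTimerAsync(message);
            return;
        }

        // Otherwise re-enqueue a copy, invisible for up to another 72 hours
        // (or less if the scheduled receive time is closer than that).
        TimeSpan remaining = message.ScheduledReceiveTimeUtc - now;
        TimeSpan delay = remaining > MaxVisibilityDelay ? MaxVisibilityDelay : remaining;
        await EnqueueCopyAsync(message, delay);
    }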

Drawbacks

  • If an orchestrator function is waiting on a timer for extended periods of time, then it will be periodically woken up (consuming CPU and memory) without having any real work to do.
  • A single timer event can result in multiple messages being enqueued and received in Azure Storage, which means the message ID cannot be used for end-to-end correlation. The extension already defines its own correlation ID for messages and end-to-end tracing relies on this, so it's not a significant problem, but it is something to be aware of in case someone attempts to change this correlation behavior or tries to do correlation purely using Storage logs.
@cgillum cgillum added the enhancement label Jul 15, 2017
@cgillum cgillum added this to the Beta milestone Jul 15, 2017
@cgillum cgillum modified the milestone: Beta Sep 7, 2017
@tohling tohling self-assigned this Sep 26, 2017
@cgillum cgillum added the dtfx label Nov 11, 2017

@SimonLuckenuik SimonLuckenuik commented Dec 21, 2017

@cgillum is there any target date for this issue? Is there a durable task framework related issue I can monitor? I guess this is low priority considering that I can reschedule the message myself.

@cgillum cgillum (Collaborator, Author) commented Dec 21, 2017

No specific date - we just need to get around to doing it. We'll definitely include it for GA, which is targeting Spring 2018, but ideally this fix will come much sooner than that.

Right now I'm keeping the durable task framework bugs logged here rather than in the DTFx repo since I'm the only consumer of the Azure Storage provider at this point.

@SimonLuckenuik SimonLuckenuik commented Jan 9, 2018

In the meantime, I am simply creating a timer with a maximum value below the limit and rescheduling until the target time is reached.
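
A minimal sketch of that pattern (illustrative only; assumes it runs inside an orchestrator function whose DurableOrchestrationContext parameter is named context, and uses a 6-day chunk size):

    // Illustrative only: wait in chunks of at most 6 days until the target time.
    DateTime fireAt = context.CurrentUtcDateTime.AddDays(30);
    TimeSpan maxChunk = TimeSpan.FromDays(6);

    while (context.CurrentUtcDateTime < fireAt)
    {
        TimeSpan remaining = fireAt - context.CurrentUtcDateTime;
        DateTime nextWake = context.CurrentUtcDateTime + (remaining > maxChunk ? maxChunk : remaining);
        await context.CreateTimer(nextWake, CancellationToken.None);
    }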

@vhe1 vhe1 commented Mar 10, 2018

Why not just use service bus queues instead of storage queues?

@cgillum cgillum (Collaborator, Author) commented Mar 10, 2018

We considered Service Bus at the beginning of this project since it's natively supported by the Durable Task Framework. The main problem was that the features we needed require a Standard tier subscription ($10 USD/month). Azure Storage, on the other hand, is purely consumption-based billing, and is already part of every function app. We are considering allowing different storage providers in the future, but for now we're sticking with Azure Storage, in spite of its limitations.

@gled4er gled4er (Collaborator) commented Apr 5, 2018

Hello @cgillum ,

Do you think the information here can help us with this feature?

In the new version of the Azure Storage .NET SDK we are able to set an arbitrary expiration period, as shown below:

    // WindowsAzure.Storage 8.7
    queue.AddMessage(message); // TTL is 7 days
    queue.AddMessage(message, TimeSpan.MaxValue); // Throws: The argument 'timeToLive' is larger than maximum of '7.00:00:00'

    // WindowsAzure.Storage 9.0
    queue.AddMessage(message); // TTL is 7 days
    queue.AddMessage(message, TimeSpan.MaxValue); // Expiration time is now Dec 31, 9999

Currently we are on a lower version of the WindowsAzure.Storage dependency.

Thank you!

@cgillum cgillum removed this from the General Availability (GA) milestone Apr 18, 2018

@SimonLuckenuik SimonLuckenuik commented May 10, 2018

@gled4er according to the release notes, they only support infinite lifetime, which is probably the only reason why they changed the API (Queues: Added support for infinite TTL on queue messages.).

@cgillum I see this has been removed from the GA milestone; any chance it will be added soon? Even though my workaround works, it's a pain to do and not obvious to developers who are getting started with Durable Functions.

@gled4er gled4er (Collaborator) commented May 10, 2018

Hello @SimonLuckenuik,

Thank you for the comment.

I haven't tried the new TTL options yet, but based on this documentation page I think you can now set any positive integer value, as well as -1. In my opinion, infinite lifetime is just one possible option, not the only one allowed.

Actually, the possible values for messagettl are described very well on the previously linked page:

Optional. Specifies the time-to-live interval for the message, in seconds. Prior to version 2017-07-29, the maximum time-to-live allowed is 7 days. For version 2017-07-29 or later, the maximum time-to-live can be any positive number, as well as -1 indicating that the message does not expire. If this parameter is omitted, the default time-to-live is 7 days.

So I still believe this will be the best way to implement timers for periods longer than 7 days.

Thank you!

@cgillum cgillum (Collaborator, Author) commented May 10, 2018

@SimonLuckenuik We had to punt this out of GA for schedule/resourcing reasons, unfortunately. We're still motivated to get this done soon, though - lots of people are asking for it.

I'll take a look at the documented storage updates. Arbitrary lifetime seems like a pretty big deal! The only problem is that Azure Functions is fixed at a particular Storage SDK version, so I'm not sure if we'll be able to take advantage of it (yet)...

@cgillum cgillum added this to the Functions V2 GA milestone May 10, 2018

@gled4er gled4er (Collaborator) commented Jun 11, 2018

Hello @cgillum,

Did you have time to check whether we will be able to use the latest Azure Storage NuGet package, or whether we will need to come up with custom logic to support this feature?

Thank you!

@cgillum cgillum (Collaborator, Author) commented Jun 11, 2018

Yes, I confirmed that Azure Functions v2 will be upgraded to a much more recent version of the Azure Storage SDK before it goes GA, so we will be able to take advantage of this feature once that happens.

@gled4er gled4er (Collaborator) commented Jun 11, 2018

Great!

@markheath markheath (Contributor) commented Jul 7, 2018

Here's a quick hack: an extension method that could be used to extend the timer to an arbitrarily long duration. Basically we create a timer for the maximum period (72 hours, as suggested above), and when we wake up, we check whether we need to keep waiting by starting another CreateTimer:

    public static Task CreateTimerExtended(this DurableOrchestrationContext ctx, CancellationToken token, TimeSpan timeout)
    {
        var completeTimeout = ctx.CurrentUtcDateTime + timeout;
        var tcs = new TaskCompletionSource<int>();
        void WaitUntil()
        {
            // Wait in chunks of at most 72 hours, or less if the target time is closer.
            var maxTimerDuration = TimeSpan.FromHours(72);
            var remaining = completeTimeout - ctx.CurrentUtcDateTime;
            var timeoutAt = ctx.CurrentUtcDateTime + (remaining > maxTimerDuration ? maxTimerDuration : remaining);

            var timeoutTask = ctx.CreateTimer(timeoutAt, token);
            timeoutTask.ContinueWith(t =>
            {
                if (t.IsCanceled)
                    tcs.TrySetCanceled();
                else if (t.IsFaulted)
                    tcs.TrySetException(t.Exception);
                else if (ctx.CurrentUtcDateTime >= completeTimeout)
                    tcs.TrySetResult(0);
                else
                    WaitUntil();
            }, TaskContinuationOptions.ExecuteSynchronously);
        }
        WaitUntil();

        return tcs.Task;
    }

I've done some very simple tests (setting maxTimerDuration to 1 minute is a nice easy way to test this) and it seems to work OK. I haven't tested cancellation support yet.

The downside of this approach is that the orchestrator keeps waking up, so for a very long wait (e.g. > 1 month), you'd start to build up a substantial event sourcing history.

One possible way to work around that is to create this extended timer as a sub-orchestration. That way, it's just the sub-orchestration that has a growing event-sourcing history, and it doesn't pollute the history of the main orchestration. Here's how a sub-orchestration could implement a long wait:

    [FunctionName("LongWait")]
    public static async Task LongWait(
        [OrchestrationTrigger] DurableOrchestrationContext ctx,
        TraceWriter log)
    {
        var timeout = ctx.GetInput<TimeSpan>();
        var completeTimeout = ctx.CurrentUtcDateTime + timeout;
        var maxTimerDuration = TimeSpan.FromHours(72);

        using (var cts = new CancellationTokenSource())
        {
            while (ctx.CurrentUtcDateTime < completeTimeout)
            {
                // Wait at most 72 hours per timer; use the remaining time for the final chunk.
                var remaining = completeTimeout - ctx.CurrentUtcDateTime;
                var timeoutAt = ctx.CurrentUtcDateTime + (remaining > maxTimerDuration ? maxTimerDuration : remaining);
                await ctx.CreateTimer(timeoutAt, cts.Token);
            }
        }
    }

You might even be able to use ContinueAsNew in this sub-orchestrator to eliminate the need for a growing event sourcing history altogether.
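
Here's an untested sketch of what that ContinueAsNew variant might look like; it passes the absolute completion time as the input so the countdown survives the restart:

    [FunctionName("LongWaitContinueAsNew")]
    public static async Task LongWaitContinueAsNew(
        [OrchestrationTrigger] DurableOrchestrationContext ctx)
    {
        // Input is the absolute UTC completion time, so restarting the
        // orchestration via ContinueAsNew doesn't reset the countdown.
        var completeTimeout = ctx.GetInput<DateTime>();
        var maxTimerDuration = TimeSpan.FromHours(72);

        if (ctx.CurrentUtcDateTime >= completeTimeout)
        {
            return; // target time reached
        }

        var remaining = completeTimeout - ctx.CurrentUtcDateTime;
        var timeoutAt = ctx.CurrentUtcDateTime + (remaining > maxTimerDuration ? maxTimerDuration : remaining);
        await ctx.CreateTimer(timeoutAt, CancellationToken.None);

        // Restart with a fresh history instead of accumulating timer events.
        ctx.ContinueAsNew(completeTimeout);
    }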

The downside of the sub-orchestrator approach is that there doesn't seem to be a way to pass a cancellation token to a sub-orchestration. But if there is a way to do that, I wonder if this is a quicker way to get this feature without needing to put metadata on the queue messages.

@kashimiz kashimiz (Member) commented Nov 15, 2018

This limitation also applies to the overloads of WaitForExternalEvent that take a timeout parameter (1) (2).
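
One possible way to approximate a long external-event timeout today is to race the event against an extended timer, such as Mark's CreateTimerExtended helper above. The sketch below is illustrative only; the "Approval" event name is hypothetical:

    // Illustrative only: race an external event against an extended timer.
    using (var cts = new CancellationTokenSource())
    {
        Task<bool> approvalTask = ctx.WaitForExternalEvent<bool>("Approval");
        Task timeoutTask = ctx.CreateTimerExtended(cts.Token, TimeSpan.FromDays(30));

        Task winner = await Task.WhenAny(approvalTask, timeoutTask);
        if (winner == approvalTask)
        {
            cts.Cancel(); // cancel the outstanding timer
            // ... handle the approval result ...
        }
        else
        {
            // ... handle the timeout ...
        }
    }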

@Expecho Expecho commented Jun 13, 2019

Any estimation on when this issue will be resolved?

@ghost ghost commented Aug 2, 2019

Any update on this issue?

@cgillum cgillum (Collaborator, Author) commented Aug 2, 2019

We're tracking this for the final release of v2.0, which we hope to have released later this year. Until then, I suggest looking into the workarounds mentioned by Mark above.

@underscoreHao underscoreHao commented Nov 4, 2019

Any update on this issue?

@cgillum cgillum (Collaborator, Author) commented Nov 4, 2019

Unfortunately we had to punt this out past the initial v2.0 release due to time constraints. We're still looking into getting this done soon, however, since it's a common ask.

@cgillum cgillum added this to the v2.1.0 Release milestone Nov 4, 2019
ConnorMcMahon added a commit that referenced this issue Dec 11, 2019
In Functions V2, we now use version 9.x of the Azure Storage SDK, which
removed the maximum visibility delay on queues. We can now support
indefinite timers.

Resolves #14
@cgillum cgillum removed this from the v2.1.0 Release milestone Dec 11, 2019

@ConnorMcMahon ConnorMcMahon (Contributor) commented Dec 13, 2019

I was hoping to address this with this PR, but unfortunately the Azure Storage SDK changes from 8.x to 9.x only lifted the TTL limit, not the visibility delay limit.

I think the best approach for this work item is for us to handle timers internally in a way similar to @markheath's workaround. Unfortunately, because of that, this will no longer make it into 2.1.

@jonasgranlund2 jonasgranlund2 commented Jan 26, 2020

Hi Chris and Connor, I would like to create long-running scheduled Durable Entity events with the ctx.SignalEntity overload that takes a DateTime for scheduling. Since the same 7-day maximum applies to those, I was wondering if you have any recommended pattern? It would be great to create something generic, but since SignalEntity is fire-and-forget, I haven't been able to come up with a good solution. I have tried Mark Heath's extension-method solution and it works great, but in this scenario I would like something more lightweight than creating a new orchestration instance per incoming event.
