Per Function Queue message Visibility Timeout configuration #1040

fabiomaulo · 2017-02-28T17:23:27Z

The visibility timeout of a message could be configured via a specific attribute or a new property of the QueueTrigger attribute. I suspect that currently the timeout is not the default and there is no way to define a custom timeout. I mean something like EstimatedTimeToProcessMessage (an int with minutes or milliseconds).

Just a feature request.

mathewc · 2017-03-01T17:58:07Z

What problem are you having? What visibility timeout are you referring to exactly?

fabiomaulo · 2017-03-01T19:18:04Z

Hi Math.
I have messages that can be processed in ~30 seconds and can fail at most 2 times. Those messages have to be processed ASAP or fail ASAP. The problem comes when a message fails: it seems that it will be "re-processed" ~10 minutes after the first failure.
I would like something that allows me to define an EstimatedTimeToProcessMessage, or something that the SDK can learn by itself, to establish a more accurate visibility timeout.
When the EstimatedTimeToProcessMessage is defined, the SDK can use a function of it (for example, EstimatedTimeToProcessMessage * 2) to define the default timeout for a specific queue.

brunoklein99 · 2017-03-07T23:16:54Z

I'm also looking into Azure Web Jobs for a new project and just checked this comment by @mathewc which states that the default queue visibility timeout is set at 10 minutes. This is a value highly dependent on project context and should be configurable.

mathewc · 2017-03-07T23:29:55Z

Note that this 10 minute timeout will only occur in rare cases, say if the host dies, etc. During regular processing, if an invocation fails, there is a different configurable timeout that is used. See the code here. You can configure that via JobHostConfiguration.Queues.VisibilityTimeout. I believe that is what you are looking for. In regular processing while the host remains up and running, there is no 10 minute delay.

We could also make that initial 10 minute timeout configurable if we wanted - do you require that?
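
For readers following along, here is a minimal sketch of the host-level setting described above, assuming WebJobs SDK 2.x. The 30-second timeout and the dequeue count of 2 are illustrative values taken from the scenario earlier in the thread, not recommended defaults.

```csharp
using System;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // Delay before a message whose function invocation failed becomes
        // visible again. This applies to every queue trigger in this host.
        config.Queues.VisibilityTimeout = TimeSpan.FromSeconds(30);

        // Number of dequeue attempts before the message is treated as poison
        // (illustrative value).
        config.Queues.MaxDequeueCount = 2;

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
```

Note that this is a single knob for the whole host; it does not distinguish between queues.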

brunoklein99 · 2017-03-07T23:44:19Z

Thank you, Mathew, for the rapid response.

The configuration you provided is enough for me. Although I don't NEED it, for my specific project, in case of the host dying, a lower timeout would be desirable. I think it's a valid feature for the SDK.

Thank you.

mathewc · 2017-03-07T23:51:36Z

@fabiomaulo can you confirm that this existing knob also meets your needs? Feel free to log a feature request for the other timeout config if it turns out you need that. But in all the years of this project, I haven't heard people having problems with that timeout.

Note that the new configuration knob I mentioned is new in 2.0.0 which we released last week. So upgrade if you need to.

fabiomaulo · 2017-03-08T15:24:18Z

Sorry for the delay...
Math, it doesn't.
I know the configurable timeout (for all queues managed inside the same worker) and even the possibility to use the IQueueStorageProcessorFactory to have specific configuration per Queue.
In fact I could use a specific implementation of IQueueStorageProcessorFactory to configure the specific QueueProcessor but... why implement n classes, where one (the factory) has to check by string comparison (queue name), when queueName, storageAccountConnectionString and estimatedTime can be specified on the same line, exactly where the message will be consumed?
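
The factory workaround described above might look roughly like the sketch below. It assumes the SDK 2.x extension point is named `IQueueProcessorFactory` (the comment calls it IQueueStorageProcessorFactory) and that the context properties are settable; the queue name `fast-orders` and the class name are hypothetical.

```csharp
using System;
using Microsoft.Azure.WebJobs.Host.Queues;

// Hypothetical factory that picks settings by queue name. This is the
// string-comparison approach the comment above objects to.
public class PerQueueProcessorFactory : IQueueProcessorFactory
{
    public QueueProcessor Create(QueueProcessorFactoryContext context)
    {
        // "fast-orders" is an illustrative queue name, not from the thread.
        if (string.Equals(context.Queue.Name, "fast-orders", StringComparison.Ordinal))
        {
            // Override the host-wide values for this queue only.
            context.VisibilityTimeout = TimeSpan.FromSeconds(30);
            context.MaxDequeueCount = 2;
        }

        // All other queues keep the host-level configuration.
        return new QueueProcessor(context);
    }
}

// Registration, assuming an existing JobHostConfiguration named config:
//   config.Queues.QueueProcessorFactory = new PerQueueProcessorFactory();
```

The sketch shows why the request exists: the queue name has to be repeated in the factory, away from the function that consumes the message.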

mathewc · 2017-03-09T02:25:58Z

Reopening. @fabiomaulo I want to be sure I understand exactly what you're asking for. You're saying that the initial timeout we use with the 10 minute delay is causing you issues? Again you should only see that delay in play if the host died unexpectedly, which should happen rarely. That timeout is here in the code. How specifically is this causing you issues in practice - are you really seeing 10 minute delays often?

fabiomaulo · 2017-03-09T11:14:24Z

@mathewc what happens when the job fails? What is the time between the first failure and the second dequeue?
Processing a message may fail more than once (that is why we have the maxdequeuecount).

mathewc · 2017-03-10T20:50:17Z

When the job function fails, the aforementioned JobHostConfiguration.Queues.VisibilityTimeout governs, as I mentioned above. I think this is all you are looking for - it's already there.

fabiomaulo · 2017-03-11T13:58:58Z

That is right, but... JobHostConfiguration.Queues.VisibilityTimeout applies to all queues managed in a WebJob (queueS).
Perhaps it is a matter of philosophy; let me hypothesize to understand better:

a WebJob (app) runs inside a WebApp
a WebApp runs on the hardware defined by the AppPlan and is invoiced by its AppPlan
having x WebApps, each with y WebJobs, where each WebJob has z queue triggers, all running in the same AppPlan, has no impact on the cost.
So...
we can have a WebJob with a unique configuration per unique queue trigger.

If this is the philosophy, a single VisibilityTimeout for all queues managed in a WebJob is acceptable, even if it should be made clear to everybody.

If the WebJobs SDK lets us work and group queue triggers the way we need (as it has so far) without creating a WebJob project per queue, we should have more fine-grained configuration per queue without implementing a "custom" QueueProcessor just to configure each one.

That is my opinion.

christopheranderson · 2017-04-24T18:01:55Z

This makes sense, but it's pretty big. We'll need to see some more folks suggest this before we can justify tackling it over other features.

fabiomaulo · 2017-04-24T18:35:44Z

Ok, no problem.
Btw the code to implement it is already there...
https://github.com/Azure/azure-webjobs-sdk/blob/61aa42461696de855f0780aafa52ca386027f62e/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs

In the ctor, the QueueProcessor copies the configuration into its own state, so each QueueProcessor can work independently from the others.
Even the QueueProcessorFactoryContext has all the needed properties.
The point is to read all the queue-specific configuration in the same place where the name of the queue is... ;)

suhu · 2017-05-11T16:30:59Z

@mathewc I am having a similar problem. I think there are 2 settings.

My webjob runs for 10min. So I don't want the message to reappear in the queue after 5 min
If the web job function threw an exception, I would like the message to reappear in the queue quite soon.

I set config.Queues.VisibilityTimeout = new TimeSpan(0, 0, 15, 0);

But now when there is an exception, it takes 15min for the message to appear in the queue again.

How do I solve this problem?

gorillapower · 2017-11-08T14:02:06Z

I am also experiencing repeatable behaviour whereby the code that is supposed to be 'renewing' the visibility timeout seems not to get executed. One outcome is that the queue message is processed twice. How? The original message is still in memory waiting to be processed, and the same queue message becomes visible again on the queue.

This only seems to happen when I stress test my application and there is a backlog of thousands of queue messages. I'm assuming the competition for resources is causing the 'renew' task to fail or not get executed, but I can't be 100% sure. Maybe the application is running out of threads?

When I increase the visibility timeout of the message to 6 hours (using a locally built version of the SDK, at https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/Listeners/QueueListener.cs#L79 ) this behaviour stops.

I think I saw something similar mentioned in a different thread. Is there a known workaround to increase the default visibility timeout without implementing a custom code fix? Or perhaps another solution to this problem, like increasing the processing power?

gunzip · 2017-12-25T23:04:26Z

Why not just let users configure the visibilityTimeout through output bindings?

i.e.

    await ReleaseMessageAsync(message, result, message.VisibilityTimeout ? message.VisibilityTimeout : VisibilityTimeout, cancellationToken);

in azure-webjobs-sdk/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs (line 123 in bd5891d), which currently reads:

    await ReleaseMessageAsync(message, result, VisibilityTimeout, cancellationToken);

This would let the user plug in a custom delay strategy (e.g. exponential backoff).

see also Azure/azure-functions-host#1465
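
As a rough illustration of the custom delay strategy mentioned above, the sketch below derives from QueueProcessor and replaces the fixed timeout with an exponential backoff based on the dequeue count. It assumes that SDK 2.x exposes `ReleaseMessageAsync` as a protected virtual method with the parameters shown; the class name, base delay and cap are hypothetical, and this is not the change proposed in the comment itself.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs.Host.Executors;
using Microsoft.Azure.WebJobs.Host.Queues;
using Microsoft.WindowsAzure.Storage.Queue;

// Hypothetical processor: failed messages wait longer on each retry.
public class BackoffQueueProcessor : QueueProcessor
{
    // Illustrative values, not SDK defaults.
    private static readonly TimeSpan BaseDelay = TimeSpan.FromSeconds(5);
    private static readonly TimeSpan MaxDelay = TimeSpan.FromMinutes(5);

    public BackoffQueueProcessor(QueueProcessorFactoryContext context)
        : base(context)
    {
    }

    // Assumed signature; check it against the SDK version in use.
    protected override Task ReleaseMessageAsync(
        CloudQueueMessage message,
        FunctionResult result,
        TimeSpan visibilityTimeout,
        CancellationToken cancellationToken)
    {
        // DequeueCount is 1 for the first attempt, so the first retry
        // waits BaseDelay, the second 2x BaseDelay, and so on.
        int attempt = Math.Max(1, message.DequeueCount);
        double seconds = BaseDelay.TotalSeconds * Math.Pow(2, attempt - 1);

        // Cap the delay so a message never waits longer than MaxDelay.
        TimeSpan delay = TimeSpan.FromSeconds(Math.Min(seconds, MaxDelay.TotalSeconds));

        // Ignore the host-wide timeout and release with the computed delay.
        return base.ReleaseMessageAsync(message, result, delay, cancellationToken);
    }
}
```

Such a processor would still have to be returned from a queue processor factory, so it does not remove the per-queue wiring discussed earlier in the thread.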

nixa333 · 2018-06-05T22:36:46Z

@gorillapower I have the same problem. When we "attack" the queue with thousands of messages in a short period of time, we start receiving exceptions because the storage itself cannot handle the load. Once we resolved that, we started seeing function executions not ending, or waiting (while marked as "Never finished"), and messages being processed multiple times.
@mathewc @christopheranderson if the function hangs and the visibility timeout expires, thus returning the message to the queue, does the dequeue count change? How can we resolve this issue? It happens rarely, under heavy load, but it still happens. The function ends up idling in that weird state, not throwing exceptions and not succeeding, so the configurable VisibilityTimeout setting never kicks in. As @gorillapower said, the problem would probably be resolved if we were able to set the initial timeout to a higher value.

mathewc · 2018-06-05T23:54:24Z

@gorillapower's comments above on host instances under extreme load resulting in background visibility renewal threads not being able to run are correct. We've seen this come up in other situations as well (e.g. Singleton locks which rely on background renewals of blob leases). If you're running into issues like this (e.g. you're maxing out CPU/memory, etc.) then you need to either scale up/out, or throttle your instance concurrency down using the queue config settings (BatchSize/NewBatchThreshold).

@nixa333 Yes, when messages fail processing due to visibility timeout expiry, Azure Storage will increment the dequeue count the next time that message is fetched. You CAN set the initial visibility timeout to a higher value via JobHostQueuesConfiguration.VisibilityTimeout.

Anyhow, the issues that are being discussed now are not the same as the original issue that this item remains open for - the request to allow the visibility timeout to be declaratively configured per function, as opposed to the current host level knob that applies to all functions.
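
The throttling settings named above (BatchSize/NewBatchThreshold) could be lowered along the lines of this sketch, assuming WebJobs SDK 2.x. The specific numbers are illustrative assumptions; suitable values depend on how heavy each invocation is.

```csharp
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // Fetch fewer messages per poll so fewer invocations run in parallel
        // on this instance (illustrative value).
        config.Queues.BatchSize = 4;

        // Only fetch the next batch once the number of in-flight messages
        // drops to this threshold (illustrative value).
        config.Queues.NewBatchThreshold = 2;

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
```

Lower concurrency leaves more CPU and threads available for the background renewal work, at the cost of throughput per instance.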

nixa333 · 2018-06-06T07:39:07Z

@mathewc I meant the initial 10 minute visibility delay, and this cannot be altered with the property you mentioned, as it only has effect on failed calls. I would like to alter this property and set it for example to 2-3 hours.

christopheranderson assigned mathewc Mar 7, 2017

christopheranderson added the Discuss label Mar 7, 2017

mathewc closed this as completed Mar 7, 2017

mathewc reopened this Mar 9, 2017

christopheranderson added the improvement label Apr 24, 2017

christopheranderson added this to the Backlog milestone Apr 24, 2017

mathewc changed the title ~~Queue message Visibility Timeout~~ Per Function Queue message Visibility Timeout configuration Jun 5, 2018

asalvo mentioned this issue Jun 19, 2018

Azure Functions not dequeing messages under load Azure/azure-functions-host#3022

Open
