Per Function Queue message Visibility Timeout configuration #1040

Open
fabiomaulo opened this issue Feb 28, 2017 · 19 comments

@fabiomaulo

The visibility timeout of a message could be configured via a specific attribute or a new property of the QueueTrigger attribute. I suspect that currently the timeout is not the default and there is no way to define a custom timeout. I mean something like EstimatedTimeToProcessMessage (an int with minutes or milliseconds).

Just a feature request.

@mathewc
Member

mathewc commented Mar 1, 2017

What problem are you having? What visibility timeout are you referring to exactly?

@fabiomaulo
Author

Hi Math.
I have messages that can be processed in ~30 seconds and can fail at most 2 times. Those messages have to be processed ASAP or fail ASAP. The problem comes when a message fails: it seems that it is "re-processed" ~10 minutes after the first failure.
I would like something that lets me define an EstimatedTimeToProcessMessage, or something that the SDK can learn by itself, to establish a more accurate visibility timeout.
When EstimatedTimeToProcessMessage is defined, the SDK can use a function of it (for example, EstimatedTimeToProcessMessage * 2) to define the default timeout for a specific queue.

@brunoklein99

I'm also looking into Azure Web Jobs for a new project and just checked this comment by @mathewc which states that the default queue visibility timeout is set at 10 minutes. This is a value highly dependent on project context and should be configurable.

@mathewc
Member

mathewc commented Mar 7, 2017

Note that this 10 minute timeout will only occur in rare cases, say if the host dies, etc. During regular processing, if an invocation fails, there is a different configurable timeout that is used. See the code here. You can configure that via JobHostConfiguration.Queues.VisibilityTimeout. I believe that is what you are looking for. In regular processing while the host remains up and running, there is no 10 minute delay.

We could also make that initial 10 minute timeout configurable if we wanted - do you require that?
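A minimal sketch of the host-level setting described above, assuming WebJobs SDK 2.x and its `JobHostConfiguration` API. The 30-second and 2-attempt values are illustrative only (they echo the numbers given earlier in the thread) and are not recommendations.

```csharp
using System;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // Delay before a message whose function invocation FAILED becomes
        // visible again. This is host-wide: it applies to every queue
        // trigger in this WebJob, not to a single function.
        config.Queues.VisibilityTimeout = TimeSpan.FromSeconds(30);

        // Number of attempts before the message goes to the poison queue.
        config.Queues.MaxDequeueCount = 2;

        new JobHost(config).RunAndBlock();
    }
}
```

Note that this does not change the initial 10 minute timeout applied when a message is first dequeued; per the comment above, that one only matters if the host dies mid-processing.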

@brunoklein99

Thank you, Mathew, for the rapid response.

The configuration you provided is enough for me. Although I don't NEED it, for my specific project, in case of the host dying, a lower timeout would be desirable. I think it's a valid feature for the SDK.

Thank you.

@mathewc
Member

mathewc commented Mar 7, 2017

@fabiomaulo can you confirm that this existing knob also meets your needs? Feel free to log a feature request for the other timeout config if it turns out you need that. But in all the years of this project, I haven't heard people having problems with that timeout.

Note that the new configuration knob I mentioned is new in 2.0.0 which we released last week. So upgrade if you need to.

@mathewc mathewc closed this as completed Mar 7, 2017
@fabiomaulo
Author

Sorry for the delay...
Math, it doesn't.
I know about the configurable timeout (for all queues managed inside the same worker) and even the possibility of using the IQueueStorageProcessorFactory to have a specific configuration per queue.
In fact I could use a specific implementation of IQueueStorageProcessorFactory to configure the specific QueueProcessor but... why implement n classes, where one (the factory) has to check by string comparison (queue name), when queueName, storageAccountConnectionString and estimatedTime can be specified on the same line, exactly where the message will be consumed?
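For context, the factory-based workaround being described might look roughly like the sketch below. It assumes the SDK 2.x types `IQueueProcessorFactory`, `QueueProcessorFactoryContext` (with a settable `VisibilityTimeout`) and `QueueProcessor` in `Microsoft.Azure.WebJobs.Host.Queues`; the class name `PerQueueTimeoutFactory` and the queue names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Azure.WebJobs.Host.Queues;

// Hypothetical factory: class name, queue names and timeouts are illustrative.
public class PerQueueTimeoutFactory : IQueueProcessorFactory
{
    // The string-keyed lookup the comment objects to: configuration lives
    // here, far from the function that actually consumes the queue.
    private static readonly Dictionary<string, TimeSpan> Timeouts =
        new Dictionary<string, TimeSpan>(StringComparer.OrdinalIgnoreCase)
        {
            { "orders", TimeSpan.FromSeconds(30) },
            { "reports", TimeSpan.FromMinutes(5) },
        };

    public QueueProcessor Create(QueueProcessorFactoryContext context)
    {
        TimeSpan timeout;
        if (Timeouts.TryGetValue(context.Queue.Name, out timeout))
        {
            // Override the host-wide value for this queue only.
            context.VisibilityTimeout = timeout;
        }

        // Queues not listed keep the host-level settings.
        return new QueueProcessor(context);
    }
}
```

Under the same assumptions, it would be registered with `config.Queues.QueueProcessorFactory = new PerQueueTimeoutFactory();`.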

@mathewc
Member

mathewc commented Mar 9, 2017

Reopening. @fabiomaulo I want to be sure I understand exactly what you're asking for. You're saying that the initial timeout we use with the 10 minute delay is causing you issues? Again you should only see that delay in play if the host died unexpectedly, which should happen rarely. That timeout is here in the code. How specifically is this causing you issues in practice - are you really seeing 10 minute delays often?

@mathewc mathewc reopened this Mar 9, 2017
@fabiomaulo
Author

@mathewc what happens when the job fails? What is the time between the first failure and the second dequeue?
Message processing may fail more than once (that is why we have the maxdequeuecount).

@mathewc
Member

mathewc commented Mar 10, 2017

When the job function fails, the aforementioned JobHostConfiguration.Queues.VisibilityTimeout governs, as I mentioned above. I think this is all you are looking for - it's already there.

@fabiomaulo
Author

That is right but... JobHostConfiguration.Queues.VisibilityTimeout is for all queues managed in a WebJob (queueS).
Perhaps it is a matter of philosophy; let me hypothesize to understand better:

  • a WebJob (app) runs inside a WebApp
  • a WebApp runs on the hardware defined by the AppPlan and is invoiced by its AppPlan
  • having x WebApps, each with y WebJobs, where each WebJob has z queue triggers, all running in the same AppPlan, has no impact on the cost.
    So...
    we can have a WebJob with a unique configuration per unique queue trigger.

If this is the philosophy, a single VisibilityTimeout for all queues managed in a WebJob is acceptable, although it should be made clear to everybody.

If the WebJobs SDK lets us work and group queue triggers the way we need (as it has so far), without creating a WebJob project for each queue, we should have more fine-grained configuration per queue without implementing a "custom" QueueProcessor just to configure each one.

That is my opinion.

@christopheranderson
Contributor

This makes sense, but it's pretty big. We'll need to see some more folks suggest this before we can justify tackling it over other features.

@fabiomaulo
Author

Ok, no problem.
Btw, the code to implement it is already there...
https://github.com/Azure/azure-webjobs-sdk/blob/61aa42461696de855f0780aafa52ca386027f62e/src/Microsoft.Azure.WebJobs.Host/Queues/QueueProcessor.cs

In its ctor, the QueueProcessor copies the configuration into its own state, so each QueueProcessor can work independently of the others.
Even the QueueProcessorFactoryContext has all the needed properties.
The point is to read all the queue-specific configuration in the same place where the name of the queue is... ;)

@suhu

suhu commented May 11, 2017

@mathewc I am having a similar problem. I think there are 2 settings.

  1. My webjob runs for 10min. So I don't want the message to reappear in the queue after 5 min
  2. If the web job function threw an exception, I would like the message to reappear in the queue quite soon.

I set config.Queues.VisibilityTimeout = new TimeSpan(0, 0, 15, 0);

But now when there is an exception, it takes 15min for the message to appear in the queue again.

How do I solve this problem?

@gorillapower

I am also experiencing repeatable behaviour whereby the code that is supposed to be 'renewing' the visibility timeout seems not to get executed. One outcome is that the queue message is processed twice. How? The original message is still in memory waiting to be processed, and the same queue message becomes visible again on the queue.

This only seems to happen when I stress test my application and there is a backlog of thousands of queue messages. I'm assuming the competition for resources is causing the 'renew' task to fail or not get executed, but I can't be 100% sure. Maybe the application is running out of threads?

When I increase the visibility timeout of the message to 6 hours (using a locally built version of the SDK on https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Queues/Listeners/QueueListener.cs#L79 ) this behaviour stops.

I think I saw something similar mentioned in a different thread. Is there a known workaround to increase the default visibility timeout without implementing a custom code fix? Or perhaps another solution to this problem, like increasing the processing power?

@gunzip

gunzip commented Dec 25, 2017

Why not just let the visibilityTimeout be configured through output bindings?

i.e.

await ReleaseMessageAsync(message, result, message.VisibilityTimeout ?? VisibilityTimeout, cancellationToken);

instead of

await ReleaseMessageAsync(message, result, VisibilityTimeout, cancellationToken);

This would let the user plug in a custom delay strategy (e.g. exponential backoff).
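As an illustration of the kind of delay strategy meant here, the following is a minimal, self-contained sketch of exponential backoff keyed on a message's dequeue count. `RetryDelay` and `ForAttempt` are hypothetical names and not part of the WebJobs SDK; how the resulting value would be handed to the SDK is exactly what this issue leaves open.

```csharp
using System;

// Hypothetical helper, not part of the WebJobs SDK.
public static class RetryDelay
{
    // Returns baseDelay for the first attempt and doubles it for each
    // further dequeue, never exceeding maxDelay.
    public static TimeSpan ForAttempt(int dequeueCount, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (dequeueCount < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(dequeueCount));
        }

        TimeSpan delay = baseDelay;

        // Stop doubling as soon as the cap is reached, which also keeps
        // the tick count from growing without bound.
        for (int attempt = 1; attempt < dequeueCount && delay < maxDelay; attempt++)
        {
            delay = TimeSpan.FromTicks(delay.Ticks * 2);
        }

        return delay < maxDelay ? delay : maxDelay;
    }
}
```

With a 15 second base and a 10 minute cap, the first three attempts would wait 15, 30 and 60 seconds, and later attempts settle at the cap.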

see also Azure/azure-functions-host#1465

@nixa333

nixa333 commented Jun 5, 2018

@gorillapower I have the same problem. When we "attack" the queue with thousands of messages in a short period of time, we start receiving exceptions because the storage itself cannot handle the load. Once we resolved that, we started seeing function executions not ending, or waiting (while marked as "Never finished"), and messages being processed multiple times.
@mathewc @christopheranderson if the function hangs and the visibility timeout expires, thus returning the message to the queue, does the dequeue count change? How can we resolve this issue? It happens rarely, under heavy load, but it still happens. The function ends up idling in that weird state, neither throwing exceptions nor succeeding, so the configurable VisibilityTimeout setting never kicks in. As @gorillapower said, the problem would probably be resolved if we were able to set the initial timeout to a higher value.

@mathewc
Member

mathewc commented Jun 5, 2018

@gorillapower's comments above, on host instances under extreme load resulting in background visibility renewal threads not being able to run, are correct. We've seen this come up in other situations as well (e.g. Singleton locks which rely on background renewals of blob leases). If you're running into issues like this (e.g. you're maxing out CPU/memory, etc.) then you need to either scale up/out, or throttle your instance concurrency down using the queue config settings (BatchSize/NewBatchThreshold).
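A sketch of the throttling option, assuming the SDK 2.x `JobHostConfiguration.Queues` settings. The specific numbers are illustrative and would need tuning against the actual load.

```csharp
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // Messages fetched per poll (the SDK default is 16).
        config.Queues.BatchSize = 4;

        // A new batch is fetched once the number of in-flight messages
        // drops to this value (the default is half of BatchSize).
        config.Queues.NewBatchThreshold = 2;

        // Per-instance concurrency is therefore bounded by roughly
        // BatchSize + NewBatchThreshold = 6 invocations per queue.
        new JobHost(config).RunAndBlock();
    }
}
```

Lower values leave more CPU and threads for the background renewal work, at the cost of throughput per instance.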

@nixa333 Yes, when messages fail processing due to visibility timeout expiry, Azure Storage will increment the dequeue count the next time that message is fetched. You CAN set the initial visibility timeout to a higher value via JobHostQueuesConfiguration.VisibilityTimeout.

Anyhow, the issues that are being discussed now are not the same as the original issue that this item remains open for - the request to allow the visibility timeout to be declaratively configured per function, as opposed to the current host level knob that applies to all functions.

@mathewc mathewc changed the title from "Queue message Visibility Timeout" to "Per Function Queue message Visibility Timeout configuration" Jun 5, 2018
@nixa333

nixa333 commented Jun 6, 2018

@mathewc I meant the initial 10 minute visibility delay, and this cannot be altered with the property you mentioned, as it only has effect on failed calls. I would like to alter this property and set it for example to 2-3 hours.
