Continuous Web Job frozen and preventing further QueueTriggers #590
Comments
|
A bit more requested info: Web Jobs SDK: using Web Jobs SDK 1.0.6 JobHost setup: Processing Code: |
|
What is "logger" and how does it log? That's a bit of unknown code: it might be that there was no error, and your logger didn't write out the message. Where does it log to? Also, how do you guarantee timeouts occur in worker.DoWork? I strongly suspect that somewhere AFTER we invoke your job function it is hanging/never returning. The SDK currently makes no assumptions about how long your function may need to run, so it doesn't enforce any timeout. So if your code hangs, the job hangs indefinitely. I'm considering adding a TimeoutAttribute (e.g. [Timeout("1:00:00")] to time out after 1 hour) that allows you to opt in to this behavior. We'd also have a global knob on JobHostConfiguration that you can set. |
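Since the SDK enforces no timeout at this point in the thread, a function can bound its own run time. The following is a minimal sketch using only the base class library, not anything from the WebJobs SDK; `TimeoutSketch` and `RunWithTimeoutAsync` are hypothetical names chosen for illustration. It races the work against a timer and cancels the work's token if the timer wins.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical helper: runs `work` and throws TimeoutException if it has
    // not completed within `timeout`. The token handed to `work` is cancelled
    // at that point so cooperative code can stop.
    public static async Task RunWithTimeoutAsync(
        Func<CancellationToken, Task> work, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task workTask = work(cts.Token);
            Task winner = await Task.WhenAny(workTask, Task.Delay(timeout, cts.Token));

            // Either stops the timer (work finished first) or signals the
            // overdue work to stop (timer finished first).
            cts.Cancel();

            if (winner != workTask)
            {
                throw new TimeoutException("Work did not finish within " + timeout + ".");
            }

            // Re-await so that any exception thrown by the work propagates.
            await workTask;
        }
    }
}
```

A call such as `await TimeoutSketch.RunWithTimeoutAsync(token => worker.DoWorkAsync(token), TimeSpan.FromMinutes(5))` would then fail instead of hanging, assuming a `DoWorkAsync` overload that accepts a token (also hypothetical). This only helps if the work returns a task promptly; code that blocks synchronously before returning its task would still hang the caller.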
|
@mathewc - "logger" is a DI instance of NLog, it outputs to console as well as an integration with Raygun (an online error tracking system). The odd thing is that not even the initial log of "logger.Info("Processing message:" + queuedMessage);" was in the logs, which indicates to me that perhaps there was an error before the function could even fire? Inside DoWork, any async calls are with RestSharp, which has a default 30 second timeout. Having the TimeoutAttribute sounds like a good addition. |
|
@rustd @ThreeScreenStudios Ok, I've implemented TimeoutAttribute. Here's an example function that would hang for a day if it weren't for the timeout:
[Timeout("00:00:10")]
public static async Task ProcessMessage(
    [QueueTrigger("samples-input")] string message,
    TextWriter log,
    CancellationToken cancellationToken)
{
    log.WriteLine("Begin ProcessMessage");
    await Task.Delay(TimeSpan.FromDays(1), cancellationToken);
    log.WriteLine("ProcessMessage complete");
}
Notes:
@ThreeScreenStudios I'll also point out that the reason your function hung and wouldn't process any more messages is because you have |
|
I had the same problem several times in the last few days: triggered functions get stuck forever in the code below. I had a BatchSize of 32 and all of them got stuck after a while. The Timeout attribute is a great solution for that, exactly what I am looking for.
public static async Task FtpToBlob(
[QueueTrigger("ftp-download-file")] FtpToAzureBlobArgs ftpToAzureBlobArgs,
string Filename,
string FtpFolder,
string SomeId,
string CloudDir,
[Blob("mycontainer/{CloudDir}/{SomeId}/{FileName}")] ICloudBlob output,
TextWriter log)
{
try
{
var uri = new Uri($"ftp://ftp.example.com/Foo/Bar/{FtpFolder}/{Filename}");
FtpWebRequest request = (FtpWebRequest)WebRequest.Create(uri);
request.Method = WebRequestMethods.Ftp.DownloadFile;
request.Credentials = new NetworkCredential(FtpUser, FtpPass);
FtpWebResponse response = (FtpWebResponse)request.GetResponse();
await output.UploadFromStreamAsync(response.GetResponseStream());
await log.WriteLineAsync("Downloaded: " + uri.ToString());
}
catch (Exception ex)
{
await log.WriteLineAsync(ex.StackTrace);
}
await log.WriteLineAsync("Finished");
} |
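One possible contributor to a hang in code like this is the blocking `request.GetResponse()` call: as far as I recall from the .NET documentation, `FtpWebRequest.Timeout` defaults to infinite. The sketch below sets explicit timeouts on the request. `FtpRequestSketch` and `CreateDownloadRequest` are hypothetical names, and this is an illustration under that assumption rather than a confirmed diagnosis of the problem above. Note also that `FtpWebRequest` is marked obsolete in newer .NET versions, so this may compile with a warning there.

```csharp
using System;
using System.Net;

public static class FtpRequestSketch
{
    // Hypothetical helper: builds an FTP download request whose connect/response
    // wait and stream read/write operations are both bounded by `timeout`.
    public static FtpWebRequest CreateDownloadRequest(
        Uri uri, NetworkCredential credentials, TimeSpan timeout)
    {
        var request = (FtpWebRequest)WebRequest.Create(uri);
        request.Method = WebRequestMethods.Ftp.DownloadFile;
        request.Credentials = credentials;

        // Both properties take milliseconds.
        request.Timeout = (int)timeout.TotalMilliseconds;          // waiting for the response
        request.ReadWriteTimeout = (int)timeout.TotalMilliseconds; // reading/writing the data stream
        return request;
    }
}
```

These settings are documented for the synchronous calls; for asynchronous operations it is safer to also pass a CancellationToken where the API accepts one.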
|
@mathewc - ah, thanks for pointing out the batch size issue - is there any guidance on how to choose an optimal batch size? Also thanks for putting in the TimeoutAttribute, I think that will be quite helpful for many folks. |
|
@agnauck If all of your functions are getting stuck after a while, that indicates a problem in your code. To use the new TimeoutAttribute, you'll update your method signature to take the CancellationToken, and should then pass that to other async operations you initiate. No, there isn't a build out yet - I'll get one out today (on our myget feed) and let you guys know. @ThreeScreenStudios Well, the defaults are designed to be optimal (default is 16, max is 32). I was wondering why you dialed it back to 1. |
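As a rough sketch of that advice applied to the FtpToBlob function posted earlier: the signature takes a CancellationToken, the token is passed to the blob upload, and because `GetResponseAsync` accepts no token, cancellation aborts the request instead. The five-minute timeout, the `FTP_USER`/`FTP_PASS` environment variables, and the shape of `FtpToAzureBlobArgs` are assumptions made for illustration; the original post does not show them.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage.Blob;

// Assumed shape of the queue message; not shown in the original post.
public class FtpToAzureBlobArgs
{
    public string Filename { get; set; }
    public string FtpFolder { get; set; }
    public string SomeId { get; set; }
    public string CloudDir { get; set; }
}

public static class Functions
{
    [Timeout("00:05:00")] // illustrative value
    public static async Task FtpToBlob(
        [QueueTrigger("ftp-download-file")] FtpToAzureBlobArgs args,
        [Blob("mycontainer/{CloudDir}/{SomeId}/{Filename}")] ICloudBlob output,
        TextWriter log,
        CancellationToken cancellationToken)
    {
        var uri = new Uri($"ftp://ftp.example.com/Foo/Bar/{args.FtpFolder}/{args.Filename}");
        var request = (FtpWebRequest)WebRequest.Create(uri);
        request.Method = WebRequestMethods.Ftp.DownloadFile;
        // Placeholder credential source (assumption).
        request.Credentials = new NetworkCredential(
            Environment.GetEnvironmentVariable("FTP_USER"),
            Environment.GetEnvironmentVariable("FTP_PASS"));

        // GetResponseAsync has no token overload, so abort the request on cancellation.
        using (cancellationToken.Register(request.Abort))
        using (var response = (FtpWebResponse)await request.GetResponseAsync())
        using (Stream stream = response.GetResponseStream())
        {
            await output.UploadFromStreamAsync(stream, cancellationToken);
        }

        await log.WriteLineAsync("Downloaded: " + uri);
    }
}
```

Unlike the original, this version lets exceptions propagate rather than catching them, so a timed-out or failed message goes back through the normal retry and poison-queue handling.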
|
@mathewc the code posted above is all the code I have in this WebJob. I will add the CancellationToken as suggested. |
|
Regarding batchSize: this limit applies separately to each function that has a QueueTrigger attribute. If you don't want parallel execution for messages received on one queue, set the batch size to 1. |
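For reference, a minimal sketch of where the batch size is set in a WebJobs SDK 1.x host. The `Program` class layout is an assumption; the value 1 reflects the no-parallelism case described above, with the trade-off noted earlier in the thread that a single hung function then blocks that queue.

```csharp
using Microsoft.Azure.WebJobs;

public static class Program
{
    public static void Main()
    {
        var config = new JobHostConfiguration();

        // Number of queue messages fetched and processed in parallel,
        // per QueueTrigger function. 1 means strictly one at a time.
        config.Queues.BatchSize = 1;

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
```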
|
Ok, the TimeoutAttribute feature is in. Please see the release notes for details, and for a link to a sample. @agnauck @ThreeScreenStudios Can you guys please give this a try and verify that it meets your needs? Thanks. You can pull the latest bits from our myget feed (instructions here). Version 1.1.0-beta1-10149 includes the changes. |
|
Works perfectly. Thanks, this is a great new feature and very helpful for us. |
|
@agnauck @ThreeScreenStudios @mathewc Hi guys. I have a similar situation where the web job gets hung on a single process for hours, and even with extensive logging, I couldn't log anything. No exceptions or errors either. It's as if the thread never reaches the code itself and hangs indefinitely. I am running a single-instance continuous web job with a restart time of 2 seconds. I also have similar continuous web jobs that are running fine. I have tried to restart it, rename it, delete it, and redeploy it, but nothing fixes the issue. I have rechecked the code multiple times, and it runs fine locally without any issues. What are all the possible reasons for this to happen? Can anyone help with this? |
|
Hi, |
|
Got the same issue two days ago: the WebJob got stuck processing messages from the queue. It just sat there with the message: Never Finished. The underlying code makes database calls and other API calls, but it had been unchanged for a few months, and the bad thing is that this happened in production without any notifications, warnings, or failures. BatchSize = 16, MaxDequeueCount = 2, MaxPollingInterval = 3 seconds. |
|
My concern is that the "hung" part of the job might be a single unit that won't throw an exception when the How can we stop the WebJob process execution after a specific timeout, in the same way we can from the Azure portal? Is there a way to kill the processing of a specific queue message without the usage of |
Hi,
I have a continuous Web Job that executes with a QueueTrigger. Normally if there is an exception or any problem, the job will fail, the queued message will go back into the queue, and the job will try to reprocess it (until it finishes or fails 3 times and goes into the poison queue).
However, I noticed on 10/26/2015 that no messages had processed in the past day or so. I investigated on the Azure Portal, and saw that the webjob still had a "running" status. I clicked into the web job, and discovered that the current execution was still going, and had been executing for the past 2 days. For some reason, the job did not time out or quit, and there were no further QueueTriggers even though there were multiple messages backed up in the Queue.
There were also no logs or exceptions/errors thrown (I have a decent amount of logging and exception handling in the method).
I aborted the current job execution via the Azure Portal, and once that happened, all of the backed up queue messages began processing immediately.
I can provide account details via email if needed (woot@threescreenstudios.com).