
Azure webjob not appearing to respect MaxDequeueCount property #1045

Closed
mflps opened this issue Mar 7, 2017 · 17 comments
@mflps

mflps commented Mar 7, 2017

Repro steps

We are having the same issue as described here:

http://stackoverflow.com/questions/42260068/azure-webjob-not-appearing-to-respect-maxdequeuecount-property

We have a function triggered by a queue, but the webjob is not respecting the MaxDequeueCount property of 5.

It's been running for days now.

Expected behavior

The webjob function should respect the MaxDequeueCount property.

Actual behavior

Webjob is not respecting the MaxDequeueCount property of 5. In the screenshot you can see the message has been dequeued 142 times.


Known workarounds

None known.

Related information


  • Package version

Microsoft.Azure.WebJobs v2.0.0

  • Links to source:

http://stackoverflow.com/questions/42260068/azure-webjob-not-appearing-to-respect-maxdequeuecount-property

@christopheranderson
Contributor

@soninaren - Assigning to Naren to investigate

@mathewc
Member

mathewc commented Mar 7, 2017

Here is a pointer to the code containing our logic for handling max dequeue count.

As you can see, we first copy the queue message, then we delete it. Perhaps the delete is failing for some reason. That would explain things.

Do you see the message in the poison queue - did it get copied there?
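The copy-then-delete sequence described above can be sketched as follows. This is an illustrative in-memory model, not the actual WebJobs SDK source; the queue and message shapes are assumptions:

```python
# Illustrative model of the poison-message handling described above:
# copy the message to the poison queue first, then delete it from the
# main queue. If step 2 silently fails, the message becomes visible
# again and its dequeue count keeps climbing -- the reported symptom.

def poison_if_exceeded(message, main_queue, poison_queue, max_dequeue_count=5):
    """Move a repeatedly failing message to the poison queue."""
    if message["dequeue_count"] <= max_dequeue_count:
        return False  # under the limit; let the normal retry happen
    poison_queue.append(dict(message))  # step 1: copy to poison queue
    main_queue.remove(message)          # step 2: delete from main queue
    return True
```

If step 2 ever fails without being noticed, the copy in the poison queue exists but the original stays in the main queue, which matches what the reporter is seeing.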

@mflps
Author

mflps commented Mar 7, 2017 via email

@mathewc
Copy link
Member

mathewc commented Mar 7, 2017

Yes, good question. As you can see in our delete code here, we have some specific handling for various exception types.

What would really help diagnose this would be if you could write a quick app using the storage SDK that tries to delete this message and share the results. That would pinpoint the issue and we could get a fix in for it. I haven't been able to repro this, but since you have a repro already that would help.
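The kind of delete-with-exception-handling being described can be sketched like this. It is a hedged stand-in, not the SDK's code: a MessageNotFound-style storage error is modeled with `ValueError`, and the queue is a plain list rather than Azure Storage:

```python
# Sketch of deleting a queue message where "already deleted / not found"
# is treated as benign, while any other failure would propagate. If a
# real delete failed without surfacing an error, the message would keep
# reappearing in the queue -- consistent with this issue's symptom.

def try_delete(queue, message):
    """Return True if deleted, False if the message was already gone."""
    try:
        queue.remove(message)   # stand-in for the storage delete call
    except ValueError:          # stand-in for a MessageNotFound response
        return False            # someone else already deleted it; fine
    return True
```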

@mflps
Author

mflps commented Mar 8, 2017 via email

@Bio2hazard

Symptoms that @mflps describes are identical to #985, so it's probably related to Azure Storage version 8.0.1.

@mathewc
Member

mathewc commented Mar 8, 2017

Good point @Bio2hazard. @mflps did you by chance move to 8.0.1?

@fabiomaulo

fabiomaulo commented Mar 9, 2017

I have the same problem (message in the poison queue and message still in the main queue forever) and another very similar one. The first is probably the most difficult to reproduce because, IMO, you need a multi-instance process consuming the queue (4 or 5 instances, for example).
The second, very similar, problem is when you have an OutOfMemory or never-finishing job.
The solution could be very easy to implement (for us users, or better, in the SDK itself): the dequeue count should be checked even before the message is passed to the job (before or in the default BeginProcessingMessageAsync implementation).

Note: I'm using the 8.1.1 Storage SDK and the 2.0.0 WebJobs SDK.
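The pre-check being suggested — inspect the dequeue count before the job ever runs, so an uncatchable failure like OutOfMemory cannot block poisoning — could be sketched as follows. This is illustrative only; it is not the SDK's BeginProcessingMessageAsync, and the queue/message shapes are assumptions:

```python
# Sketch of checking the dequeue count *before* invoking the job, so a
# crash during the job can never prevent the message from being poisoned
# on a later delivery.

def process_message(message, job, main_queue, poison_queue, max_dequeue_count=5):
    if message["dequeue_count"] > max_dequeue_count:
        poison_queue.append(dict(message))  # poison without re-running the job
        main_queue.remove(message)
        return "poisoned"
    try:
        job(message)
        main_queue.remove(message)          # success: delete normally
        return "completed"
    except Exception:
        # Leave the message; its dequeue count rises when it reappears.
        return "retry"
```

The key property is that the poisoning branch runs before `job(message)`, so even a job that kills the process only delays poisoning until the next delivery.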

@mathewc
Member

mathewc commented Mar 10, 2017

Note that storage 8.x is not really supported. We only claim support for the version we ship with, currently 7.2.1; that's all we're testing against. If you upgrade to a new major version you may have issues. That said, we're planning on doing a test run with 8.x so that we can unblock this, but in general, any time you force an upgrade to a later version, you might hit issues that we can't anticipate.

@fabiomaulo

Mat,
there is no problem with issues; the more people move forward, the more opportunities there are to make the SDK better and more stable.
The matter here is the risk of checking the DequeueCount only after the job runs. If the job causes a situation that prevents the exception from being caught (for example OutOfMemory), you can't check the DequeueCount; if that situation happens again and again on each run, you will never move the message to the poison queue.

@tomeastham

I had this issue and reverted to storage 7.2.1, and it all works fine now.

@mflps
Author

mflps commented Mar 17, 2017 via email

@chadwackerman

@christopheranderson If there are issues like this, please, please mark your NuGet packages as requiring WindowsAzure.Storage < 8.0 until things get fixed. You could put a company out of business with a bug like this. Also given the obvious Sev 1 nature of this problem I'm surprised to see a lack of followup. I'd prefer not to discover critical bugs by reading issue one-thousand-something on a Sunday morning.

Open source is great, but I'm noticing that the primitive GitHub issue management system doesn't scale well to projects with an audience as large as Microsoft technologies. It seems to be flooding Program Managers with so many little issues that big ones like this keep getting ignored.

@christopheranderson
Contributor

Sorry for the confusion here, folks. We're working on updating all our package versions in another major version release, which will likely start to have some pre-release bits this summer; that will unblock us to run on .NET Standard 2.0 / .NET Core, etc.

Unfortunately, since some folks can use 8 (if they aren't using Storage bindings), we can't add the version cap now, since that would technically be a breaking change. This is our bad for not finding this before we released 2.0. After discussing it, we think the least impactful thing is to leave the dependency version the same and document the issue with Azure Storage 8.0.

I've opened up a separate issue to track adding 8.0 support - #1091

This issue will remain open until we address that one and/or add a proper version cap to our dependency version.

@chadwackerman

A package manager refusing to install or upgrade something because there's a known compatibility issue is a feature. It's not something to hide from.

The goal is maximal transparency -- not keeping version numbers low and issues buried. It's just numbers. Unless a new version is imminent (days away), I'd release 3.0 and fix the NuGet dependency.

Here's an example of what not to do. The CoreFX team broke the entire .NET HTTP stack with 4.0. People screamed but nobody took it seriously. They released System.Net.Http 4.1, 4.2 and even 4.3 without fixing the bug. Six months later they had to roll back features and remove API. Did a beta as 4.4. Then released the final version as... 4.3.1.

Breaking changes, API removed, and they tacked on 0.0.1. Yikes.

I think some of this is driven by ego and embarrassment because it makes little technical sense. I really don't know what's going on with versioning but I'd encourage you to champion SemVer internally because the random versioning is really causing problems for devs on the outside.

http://semver.org/

Sorry for the lecture but somebody has to wave this flag.

@brettsam
Member

Addressed with #1141.

@claudio-yuri

Hi @mflps, is this fix included in the latest version of the SDK?
