Topic Consumers suddenly stopped receiving messages from topic #366

Closed
yoniadir opened this issue Jun 28, 2020 · 16 comments

Comments

@yoniadir

I have a system that looks like this:

image

When the topic sender sends a message to the topic, both SubscriptionClients get the message, process it and release it (using "ReceiveMode = PeekLock" and "Task CompleteAsync(string lockToken)")

After some time, both SubscriptionClients stopped receiving messages from the topic.
No errors occurred on either the TopicSender or the SubscriptionClients.
My guess is that it has something to do with the SubscriptionClients' connection to their subscriptions.
So I tried to reproduce the problem by going to the topic properties in the Azure portal and changing the topic state to Disabled and then back to Active.

image

The system then behaved just as it did during the original problem: the topic sender kept sending messages without any errors, and the SubscriptionClients did not receive any new messages (again with no errors).
The SubscriptionClient property IsClosedOrClosing stayed false for the whole duration of the problem.
Restarting the SubscriptionClients solved the problem.

  1. Is there any connection timeout?
  2. Is there any indication of an inactive connection?
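
Since the thread reports no exception and no change in IsClosedOrClosing, one possible application-side workaround is to track when the last message arrived and restart the receiver after a long silence. The following is a minimal, library-agnostic sketch in Python, not part of any Service Bus SDK: `ReceiveWatchdog`, `check_and_restart`, and the `restart` callback are hypothetical names. It assumes the sender emits messages (or heartbeats) regularly, because a quiet topic is otherwise indistinguishable from a stalled receiver.

```python
import time
from typing import Callable


class ReceiveWatchdog:
    """Hypothetical helper: flags a receiver as stalled after a long silence."""

    def __init__(self, max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._last_seen = clock()

    def record_message(self) -> None:
        # Call this from the message handler each time a message arrives.
        self._last_seen = self._clock()

    def idle_seconds(self) -> float:
        return self._clock() - self._last_seen

    def is_stalled(self) -> bool:
        return self.idle_seconds() > self._max_idle


def check_and_restart(watchdog: ReceiveWatchdog,
                      restart: Callable[[], None]) -> bool:
    """Run periodically; restarts the receiver if it looks stalled.

    `restart` is an assumed callback that closes and recreates the
    subscription client. Returns True if a restart was triggered.
    """
    if watchdog.is_stalled():
        restart()
        watchdog.record_message()  # reset the idle timer after the restart
        return True
    return False
```

The threshold should be well above the longest expected gap between messages, otherwise the watchdog restarts a healthy receiver.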

@johneadan

Same problem here.
At some point my clients just stop getting messages until I manually restart them (while the sender continues sending).
No error messages at any point, of course.
Any ideas?

@avishih6

avishih6 commented Aug 2, 2020

Known issue; I've encountered it many times. As far as I know, there's nothing to do but restart.

@shankarsama

@yoniadir @avishih6 @johneadan Can you please share the repro steps if you are able to repro it? Or open a support ticket from the Azure portal if the issue occurred recently, so that we can debug the issue using logs.

@yoniadir
Author

yoniadir commented Aug 6, 2020

@shankarsama
I can't really reproduce it...
The problem can suddenly happen after some time, and it is practically invisible (no exceptions and no indication that something is wrong with the connection).

I managed to recreate the scenario with these steps:

  1. Run a sender on your system that sends messages to topic X
  2. Run a subscriber that listens to topic X
  3. Send a message from the sender to topic X
  4. The subscriber receives the message from topic X
  5. Open the Azure portal and change the state of topic X to Disabled and then back to Active
  6. Send another message from the sender to topic X
  7. The subscriber does not receive the message

@yvesgoeleven

I can confirm that I'm seeing the same problem as well. After a while (can be months), receiving stops without exception and a restart is required.

@alexcampana

I was experiencing the same problem and it turned out to be this issue: Azure/azure-sdk-for-net#6450.
After adding AbandonAsync on failed messages the problem went away.
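
The pattern described here (complete on success, explicitly abandon on failure rather than leaving the lock to expire) can be sketched as follows. This is an illustrative, library-agnostic Python sketch and not the .NET client's API: `Settler`, `handle_message`, and `process` are hypothetical names, with `complete` and `abandon` standing in for the CompleteAsync and AbandonAsync calls mentioned in this thread.

```python
from typing import Callable, Protocol


class Settler(Protocol):
    """Hypothetical stand-in for the client's settlement calls."""

    def complete(self, lock_token: str) -> None: ...

    def abandon(self, lock_token: str) -> None: ...


def handle_message(settler: Settler, lock_token: str, body: bytes,
                   process: Callable[[bytes], None]) -> bool:
    """Process one peek-locked message and always settle it explicitly.

    Returns True if the message was completed, False if it was abandoned.
    """
    try:
        process(body)
    except Exception:
        # Release the lock right away so the broker can redeliver the
        # message, instead of waiting for the lock to time out.
        settler.abandon(lock_token)
        return False
    settler.complete(lock_token)
    return True
```

The point of the design is that no code path leaves a received message unsettled, which is the situation the linked issue associates with receivers going quiet.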

@mronnlun

mronnlun commented Oct 18, 2020

I have the exact same issue. We have an Azure Function with a service bus trigger. We only run CompleteAsync when a message is successfully handled. If there is an error, the exception is swallowed and we just let the function finish without settling the message. We want this because the message is then retried after the lock timeout.

If we see a lot of failures in the function, e.g. because some external resource is down, the function stops working after a while and we need to restart it. There are no errors recorded anywhere.

The alternative, which we are using for queues, is to complete the message and add a new scheduled message to the queue. This approach cannot, however, be used with a topic that could have several subscribers.

@EldertGrootenboer
Contributor

If this issue still occurs, please let us know and we can re-open the issue.

@SeanFeldman
Contributor

I've personally seen several customers having this kind of issue. Unfortunately, each time it happens in production, they cannot allow days/weeks/months of investigation and end up deleting and recreating the problematic entity.

This issue is not resolved, but reopening it and handling it the same way as before won't help either.

@rmandvikar

rmandvikar commented Oct 5, 2022

I have hit this issue too, twice so far in PRD: on 9/8/2021 and today (10/5/2022). Enqueues are OK, but dequeues stop working. The app runs fine for days to months without issues. I don't see any exceptions in the ASB logs leading up to the incident. A restart fixes the issue. 🤯

I see another issue that's similar: #549

cc: @shankarsama, @EldertGrootenboer

@EldertGrootenboer
Contributor

EldertGrootenboer commented Oct 5, 2022

Reopening this for further investigation. @rmandvikar can you please log a support request for this, and share details about your ticket and environment on servicebuspm@microsoft.com? I will open an investigation item in our backlog for this.

@shankarsama

We fixed a bug related to this recently. Production rollout will be completed by end of March 2023. Please reopen this issue if it reproduces in future.

@rmandvikar

@shankarsama Release notes, please?

@jperegri

jperegri commented Apr 4, 2023

We hit this issue today, 04/04/2023, and twice in the past in PRD, on 9/8/2021 and 10/5/2022. Enqueues are OK, but dequeues stop working. We are still checking for more details.

cc: @shankarsama, @EldertGrootenboer, @rmandvikar

@EldertGrootenboer
Contributor

@jperegri Can you please open a support request, providing all the details you have, such as SDK version, timelines, etc.? We will investigate whether this is related to the fix we deployed or is a different scenario.

@yvgopal
Member

yvgopal commented Apr 21, 2023

@jperegri It is not the same issue. Let's not reopen this issue unless we are sure it is the same issue. Others may think it is not yet fixed.
