ServiceBusExceptions thrown corresponding to missing messages #67

Closed
lyonsmg opened this issue Mar 2, 2017 · 9 comments

@lyonsmg

lyonsmg commented Mar 2, 2017

Actual Behavior

  1. com.microsoft.azure.servicebus.ServiceBusException exceptions are thrown at unpredictable times, and each occurrence coincides with clients reading from the event hub skipping over messages

Examples:
com.microsoft.azure.servicebus.ServiceBusException: The message container is being closed (823). TrackingId:1ade7936-4120-4e5c-a658-d2c3bb941b92_B23, SystemTracker:NoSystemTracker, Timestamp:3/2/2017 6:29:12 PM, errorContext[NS: XXX, PATH: XXX/Partitions/4, REFERENCE_ID: b9b87c_00cbc20143e1403a860cc926e2006001_G19, LAST_OFFSET: 2240952, PREFETCH_COUNT: 999, LINK_CREDIT: 992, PREFETCH_Q_LEN: 0, R_TYPE: EPOCH]
at com.microsoft.azure.servicebus.ExceptionUtil.toException(ExceptionUtil.java:93)
at com.microsoft.azure.servicebus.MessageReceiver.onError(MessageReceiver.java:393)
at com.microsoft.azure.servicebus.MessageReceiver.onClose(MessageReceiver.java:646)
at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:83)
at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.onLinkRemoteClose(BaseLinkHandler.java:52)
at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:176)
at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:276)
at com.microsoft.azure.servicebus.MessagingFactory$RunReactor.run(MessagingFactory.java:340)
at java.lang.Thread.run(Thread.java:745)

com.microsoft.azure.servicebus.ServiceBusException: com.microsoft.azure.servicebus.amqp.AmqpException: The link 'G25:32590172:65c6ee_4544660c07c8411e92e1c8a43f7a9ef3_G25' is force detached by the broker due to errors occurred in consumer(link221668). Detach origin: InnerMessageReceiver was closed. TrackingId:290bcb3500020337000361e458b82c36_G25_B23, SystemTracker:XXX~19662|XXX, Timestamp:3/2/2017 3:49:21 PM, errorContext[NS: XXX, PATH: XXX/Partitions/11, REFERENCE_ID: 65c6ee_4544660c07c8411e92e1c8a43f7a9ef3_G25, LAST_OFFSET: 279304, PREFETCH_COUNT: 999, LINK_CREDIT: 988, PREFETCH_Q_LEN: 0, R_TYPE: EPOCH]
at com.microsoft.azure.servicebus.ExceptionUtil.toException(ExceptionUtil.java:86)
at com.microsoft.azure.servicebus.MessageReceiver.onError(MessageReceiver.java:393)
at com.microsoft.azure.servicebus.MessageReceiver.onClose(MessageReceiver.java:646)
at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:83)
at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.onLinkRemoteClose(BaseLinkHandler.java:52)
at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:176)
at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:276)
at com.microsoft.azure.servicebus.MessagingFactory$RunReactor.run(MessagingFactory.java:340)
at java.lang.Thread.run(Thread.java:745)

Expected Behavior

  1. No exceptions are thrown, or
  2. Some way to recognize when the exception occurs, so that instead of skipping over messages we can restart our process or do some other cleanup task

Versions

  • OS platform and version: Windows Server 2012 R2
  • Maven package version or commit ID: 0.9.0
@JamesBirdsall
Contributor

This looks a lot like issue 132 in the dotnet version (Azure/azure-event-hubs-dotnet#132). We are in the process of porting the fix to Java EPH.

@JamesBirdsall JamesBirdsall self-assigned this Mar 4, 2017
@JamesBirdsall
Contributor

Unfortunately this is not the same as the dotnet issue 132. Continuing to investigate.

@JamesBirdsall
Contributor

Some questions:

  1. The exceptions shown above were reported to the onError method of the customer’s IEventProcessor implementation, right? If not, where were they caught?
  2. Are you using checkpoints? If so, when do you create checkpoints? If not, what initial offset provider do you set up in EventProcessorOptions? Exceptions like these will cause the existing IEventProcessor instance for that partition to shut down, and then it will be restarted, possibly on another EventProcessorHost instance if there are multiple. There’s a potential to skip messages during such a shutdown and restart if the starting point for the new receiver is not carefully chosen.

Thanks!

@sjkwak sjkwak added this to the 0.13.0 milestone Mar 10, 2017
@lyonsmg
Author

lyonsmg commented Mar 10, 2017 via email

@nirmalpshah

@JamesBirdsall answers to your questions:

  1. You're correct, these exceptions were reported to the onError method of our IEventProcessor - I had missed them in the noise of the log.
  2. We are using checkpoints. We call context.checkpoint() in the last line of our onEvents method. We are also, as one of the first lines of our onEvents method, calling context.setOffsetAndSequenceNumber() with details of the latest event provided in onEvents.

I see now that setOffsetAndSequenceNumber is deprecated. What is the recommended way to checkpoint to avoid the data loss you describe?

@JamesBirdsall
Contributor

Sreeram has identified the origin of this issue and his fix is pull request 78, linked above. My initial suspicion of a checkpointing problem was not correct.

The checkpointing recommendation is pretty straightforward: don't checkpoint until you're absolutely sure that you have either processed all events up to and including the checkpointed offset, or that you don't care if some haven't been. Depending on the nature of the processing, implementing that recommendation correctly can be simple or complicated. It gets complicated if the processing is async or, worse yet, if it appears to be synchronous but is actually doing async work under the covers, like a lazy or batched write/commit.
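
To illustrate the async case, here is a minimal sketch that uses only the JDK. `AsyncSink` and `Checkpointer` are hypothetical stand-ins invented for this example and are not part of the Event Hubs library. The idea is to hold the checkpoint back until every pending write for the batch has completed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class CheckpointAfterAsyncWork {
    // Hypothetical stand-in for whatever persists the checkpoint.
    interface Checkpointer {
        void checkpoint(long offset);
    }

    // Hypothetical stand-in for an asynchronous sink, e.g. a batched writer.
    interface AsyncSink {
        CompletableFuture<Void> write(long offset);
    }

    // Starts one write per offset, waits for all of them, then checkpoints
    // the last offset. If any write fails, join() throws and the checkpoint
    // call is never reached.
    static void processBatch(List<Long> offsets, AsyncSink sink, Checkpointer checkpointer) {
        if (offsets.isEmpty()) {
            return;
        }
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (long offset : offsets) {
            pending.add(sink.write(offset));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        checkpointer.checkpoint(offsets.get(offsets.size() - 1));
    }

    // Runs one batch against an in-memory sink and returns what happened.
    static List<String> demo() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        AsyncSink sink = offset -> CompletableFuture.runAsync(() -> log.add("wrote " + offset));
        processBatch(Arrays.asList(10L, 20L, 30L), sink, offset -> log.add("checkpoint " + offset));
        return log;
    }

    public static void main(String[] args) {
        // The three writes may be logged in any order; the checkpoint is always last.
        System.out.println(demo());
    }
}
```

Checkpointing inside the loop, or right after starting the writes, would record an offset whose data might never have been written.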

As far as checkpointing with a specific offset, the recommended way is PartitionContext.checkpoint(EventData), which checkpoints the offset and sequence number of the event given as the argument.
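
A rough sketch of that pattern follows. It is written against hypothetical stand-in types (`Event`, `Context`, `Handler`) so that it runs without the library; in real code the stand-ins would correspond to `EventData` and `PartitionContext`, and the exact signatures should be checked against the library version in use. It checkpoints the last event that was actually processed, so a failure partway through a batch does not advance the checkpoint past unprocessed events.

```java
import java.util.Arrays;
import java.util.List;

class CheckpointLastProcessedEvent {
    // Hypothetical stand-in for EventData: only the fields a checkpoint records.
    static final class Event {
        final String offset;
        final long sequenceNumber;

        Event(String offset, long sequenceNumber) {
            this.offset = offset;
            this.sequenceNumber = sequenceNumber;
        }
    }

    // Hypothetical stand-in for PartitionContext; remembers the last checkpoint.
    static final class Context {
        Event lastCheckpoint;

        void checkpoint(Event event) {
            lastCheckpoint = event;
        }
    }

    // Hypothetical per-event processing step supplied by the application.
    interface Handler {
        void handle(Event event) throws Exception;
    }

    // Processes events in order and checkpoints the last one that succeeded,
    // even when a later event in the same batch throws.
    static void onEvents(Context context, List<Event> events, Handler handler) throws Exception {
        Event lastProcessed = null;
        try {
            for (Event event : events) {
                handler.handle(event);
                lastProcessed = event;
            }
        } finally {
            if (lastProcessed != null) {
                context.checkpoint(lastProcessed);
            }
        }
    }

    // Simulates a batch of three events where the third one fails.
    static long demo() {
        Context context = new Context();
        List<Event> batch = Arrays.asList(
                new Event("100", 1), new Event("200", 2), new Event("300", 3));
        try {
            onEvents(context, batch, event -> {
                if (event.sequenceNumber == 3) {
                    throw new IllegalStateException("simulated processing failure");
                }
            });
        } catch (Exception expected) {
            // A real processor would log this and let the host restart the partition.
        }
        return context.lastCheckpoint.sequenceNumber;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```

After a restart, a receiver that resumes from this checkpoint would see the failed event again rather than skipping it.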

@jtaubensee jtaubensee modified the milestones: 0.12.0, 0.13.0 Mar 20, 2017
@SreeramGarlapati
Contributor

@lyonsmg - this bug was fixed as part of our release http://mvnrepository.com/artifact/com.microsoft.azure/azure-eventhubs/0.12.0
We truly appreciate you filing the issue and the patience with which you led this bug to a resolution. Thanks a lot!
Sreeram

@madhunaidu2468

Hi @SreeramGarlapati ,

I am still facing this issue in azure-eventhubs (version 0.13.0). Could you please let me know if I need to make any changes in my code? I have 4 partitions, and 2 of them stopped suddenly, so I was not receiving events sent from the device.

My code is the same as the tutorial in this link

Error message I receive is:
2017-08-09 23:46:03 - IN: CH[1] : Detach{handle=0, closed=true, error=Error{condition=com.microsoft:container-close, description='The message container is being closed (20074). TrackingId:30a8f397-7843-41a5-aae3-ac2bbe25a0d6_B5, SystemTracker:NoSystemTracker, Timestamp:8/9/2017 11:46:03 PM', info=null}}
2017-08-09 23:46:03 - linkName[76e5b7_e0fb_G5_1502316395138], ErrorCondition[com.microsoft:container-close, The message container is being closed (20074). TrackingId:30a8f397-7843-41a5-aae3-ac2bbe25a0d6_B5, SystemTracker:NoSystemTracker, Timestamp:8/9/2017 11:46:03 PM]
2017-08-09 23:46:03 - linkName[76e5b7_e0fb_G5_1502316395138]


7 participants