Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving messages with MSMQ to the error queue with TimeToBeReceived causes the endpoint to shutdown #3378

Merged
merged 1 commit into from Jan 28, 2016

Conversation

danielmarbach
Copy link
Contributor

Who's affected

  • All users with MSMQ and having messages with TimeToBeReceived set

Symptoms

When a message arrives on an endpoint with a TimeToBeReceived set and that messages fails to be processed (i.e. because of SerializationException) the message cannot be moved to the error queue. The following exception is raised:

InvalidOperationException: Sending messages with a custom TimeToBeReceived is not supported on transactional MSMQ

Since this is a fatal error which triggers the CircuitBreaker the endpoint will shutdown after a while. Restarts won't help since the message will be tried to process again and the end resullt will be another endpoint shutdwon

Description

Originally raised here Particular/ServiceControl#624

When a message can't be properly deserialized we try to send it to the error queue. But if that message contains a TimeToBeReceived the following excpetion is raised internally:

2016-01-27 10:53:55.0534|26|Fatal|NServiceBus|Fault manager failed to process the failed message with id cfe47b9b-99e0-4533-94a3-cb1731c97a69\6724376
System.InvalidOperationException: Could not forward failed message to error queue. ---> System.Exception: Failed to send message to address: Particular.ServiceControl.Errors@ENLIGHTEMENT. Sending messages with a custom TimeToBeReceived is not supported on transactional MSMQ.
   at NServiceBus.Transports.Msmq.MsmqMessageSender.Send(TransportMessage message, SendOptions sendOptions) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Transports\Msmq\MsmqMessageSender.cs:line 41
   at NServiceBus.Faults.Forwarder.FaultManager.HandleProcessingAlwaysFailsForMessage(TransportMessage message, Exception e, Int32 numberOfRetries) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Faults\Forwarder\FaultManager.cs:line 76
   at NServiceBus.Faults.Forwarder.FaultManager.TryHandleFailure(Action failureHandlingAction) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Faults\Forwarder\FaultManager.cs:line 112
   --- End of inner exception stack trace ---
   at NServiceBus.Faults.Forwarder.FaultManager.TryHandleFailure(Action failureHandlingAction) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Faults\Forwarder\FaultManager.cs:line 124
   at NServiceBus.Unicast.Transport.FirstLevelRetries.TryInvokeFaultManager(TransportMessage message, Exception exception, Int32 numberOfAttempts) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Unicast\Transport\FirstLevelRetries.cs:line 66

https://github.com/Particular/NServiceBus/blob/master/src/NServiceBus.Core/Transports/Msmq/MsmqMessageSender.cs#L41

This then ultimately leads to a ServiceControl shutdown at some point.

It seems that in V5 we don't remove the TimeToBeReceived when sending to the error queue.

https://github.com/Particular/NServiceBus/blob/master/src/NServiceBus.Core/Faults/Forwarder/FaultManager.cs#L57

When fixing the bug we assumed passing in a new SendOption is enough. But the FaultManager uses the TimeToBeReceived set on the TransportMessage. So everytime we hit this the endpoint will trigger critical error and shutdown the endpoint but at least we don't loose a message

/ cc @janovesk

@danielmarbach
Copy link
Contributor Author

@Particular/nservicebus-maintainers can someone quickly glance over it. I had to backport manually the tests and the implementation from 5.1.9 to 5.0.11. The nasty thing is I had to use dynamic for the builder fake :(

sender.Send(message, new SendOptions(ErrorQueue));
return;
}

message.TimeToBeReceived = TimeSpan.MaxValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we move that to line 61 and delete the duplicate at 64?

@danielmarbach
Copy link
Contributor Author

@timbussmann fixed. The duplication was introduced by the way I tested and evolved the code. Good catch

danielmarbach added a commit that referenced this pull request Jan 28, 2016
Reset TTBR before error queueMoving messages with MSMQ to the error queue with `TimeToBeReceived` causes the endpoint to shutdown
@danielmarbach danielmarbach merged commit cc3272d into hotfix-5.0.11 Jan 28, 2016
@danielmarbach danielmarbach deleted the ResetTTBRBeforeErrorQueue branch January 28, 2016 15:18
@danielmarbach danielmarbach changed the title tiReset ttbr before error queueMoving messages with MSMQ to the error queue with TimeToBeReceived causes the endpoint to shutdown Reset ttbr before error queueMoving messages with MSMQ to the error queue with TimeToBeReceived causes the endpoint to shutdown Jan 28, 2016
@timbussmann
Copy link
Contributor

@danielmarbach how about renaming the issue title to something readable like "Reset TimeToBeReceived before moving message to ErrorQueue"?

@danielmarbach danielmarbach changed the title Reset ttbr before error queueMoving messages with MSMQ to the error queue with TimeToBeReceived causes the endpoint to shutdown Moving messages with MSMQ to the error queue with TimeToBeReceived causes the endpoint to shutdown Jan 28, 2016
@danielmarbach
Copy link
Contributor Author

@timbussmann that slipped in. Changed

@SeanFeldman
Copy link
Contributor

Removed workaround section as it's not helping customers.

Original section was:

  • Remove the TimeToBeReceived from the message definition
  • For messages in the queue, move the messages out of the queue into the error queue and remove the TimeToBeReceived

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants