New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If message consistently fails in staging phase of retries, ServiceControl is unable to retry any messages #1508

Merged
merged 5 commits into from Dec 5, 2018

Conversation

Projects
None yet
4 participants
@SzymonPobiega
Copy link
Member

SzymonPobiega commented Dec 3, 2018

The ServiceControl stage process does not have the concept of poison message handling so if there is a single message that consistently fails staging (i.e. it is too big for the transport or otherwise unacceptable by the transport Dispatcher), the staging phase can't complete resulting in all messages scheduled for retry to be stuck in the staging phase.

Messages that exceed the failure limit are removed from the staging batch and a notification entry is created informing the user about the problem.

SzymonPobiega added some commits Dec 3, 2018

Fix

@SzymonPobiega SzymonPobiega self-assigned this Dec 3, 2018

@SzymonPobiega SzymonPobiega requested a review from mikeminutillo Dec 3, 2018

await store.AsyncDatabaseCommands.PatchAsync(message.Id,
new[]
{
new PatchRequest

This comment has been minimized.

@SzymonPobiega

SzymonPobiega Dec 3, 2018

Member

Existing document would not have this property set but this way of setting/incrementing it should be fine

@mikeminutillo
Copy link
Member

mikeminutillo left a comment

Maybe out of the scope of this issue but I wonder if we shouldn't extract all of the staging logic into a RetryStageManager or something. That way the processor simplifies down to "If there is a forwarding batch then ask the forwarder to forward it. Otherwise, if there is a staging batch, ask the stage manager to stage it."

@SzymonPobiega

This comment has been minimized.

Copy link
Member

SzymonPobiega commented Dec 4, 2018

@WilliamBZA @mikeminutillo ready for another round

Fix error message
Fix Off-by-one error in staging retries
@SzymonPobiega

This comment has been minimized.

Copy link
Member

SzymonPobiega commented Dec 5, 2018

@mikeminutillo @WilliamBZA how do you feel about bundling this with the maintenance release to save time & effort related to deploying SC?

@WilliamBZA

This comment has been minimized.

Copy link
Member

WilliamBZA commented Dec 5, 2018

👍 from my side

@SzymonPobiega SzymonPobiega changed the base branch from master to maintenance-release-03-12 Dec 5, 2018

@SzymonPobiega SzymonPobiega added this to the 3.4.0 milestone Dec 5, 2018

@SzymonPobiega SzymonPobiega changed the title Handle poison messages in staging If message consistently fails in staging phase of retries, ServiceControl is unable to retry any messages Dec 5, 2018

@SzymonPobiega SzymonPobiega merged commit bbe3c63 into maintenance-release-03-12 Dec 5, 2018

1 check passed

continuous-integration/teamcity Finished TeamCity Build ServiceControl :: 1. Build : Tests passed: 195, ignored: 8
Details

@SzymonPobiega SzymonPobiega deleted the handle-poison-messages-in-staging branch Dec 5, 2018

@boblangley boblangley added the Bug label Dec 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment