Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Service Bus] Port fix to use new rhea connection object for connection recovery #8580

Merged
merged 42 commits into from
May 1, 2020

Conversation

HarshaNalluru
Copy link
Member

@HarshaNalluru HarshaNalluru commented Apr 28, 2020

More info - #8447 and #8401
This PR attempts to port the changes from

chradek and others added 19 commits April 27, 2020 12:52
…andler (Azure#8401)

* initial fix

* [service-bus] surface disconnect errors to streaming receiver error handler

* remove unnecessary flag from clientEntityContext.onDetached

* address feedback

* improve logs when calling onDetached concurrently

* make logs less noisy

* remove log

* Apply suggestions from code review

Co-Authored-By: Ramya Rao <ramya.rao.a@outlook.com>

* reduce 3 conditionals to 2 that exit early

Co-authored-by: Ramya Rao <ramya.rao.a@outlook.com>
* [service-bus] recreate Connection object on disconnect

* update amqp-common

* fix onDetached to be ran in parallel instead of serial, adds tests

* update changelog

* add tests for management client/link

* address feedback

* Update sdk/servicebus/service-bus/CHANGELOG.md

Co-Authored-By: Ramya Rao <ramya.rao.a@outlook.com>

Co-authored-by: Ramya Rao <ramya.rao.a@outlook.com>
await serviceBusClient.test.afterEach();
});

it("can receive and settle messages after a disconnect", async function(): Promise<void> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test times out. Not able to receive the second message.
Worked fine in track 1. Yet to find the root cause.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

Upon simulating the disconnect, the error thrown is being perceived as non-retryable by the core-amqp. Which is why the receiver never came back up for receiving the second message.
image

This is not the case with amqp-common, this error is treated as a generic-error and is wrapped as a MessagingError(and by default, the retryable flag is set to true).
image

This is only the difference b/w track 1 and track 2 and is the cause behind the test failure.

Hence, in track 2, when the error is caused by a disconnect and if the error is not translated as a messaging-error, we don't care about the retryable flag on the error and re-open anyway because we don't want to miss out on errors caused by disconnect.

@HarshaNalluru
Copy link
Member Author

/azp run js - servicebus - tests

@azure-pipelines
Copy link

Pull request contains merge conflicts.

@HarshaNalluru
Copy link
Member Author

/azp run js - servicebus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@HarshaNalluru
Copy link
Member Author

/azp run js - servicebus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

if (isMessagingError(translatedError)) {
shouldReopen = translatedError.retryable;
} else {
shouldReopen = !translatedError || causedByDisconnect || false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What case are we trying to handle with false in the end here?
Non messaging error not caused by disconnect? Why would they be not retryable?

Maybe this should distill down to

const shouldReopen = isMessagingError(translatedError) ? translatedError.retryable : true;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good simplification...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a false since typescript was complaining here about the "undefined" being assigned to the boolean value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What case are we trying to handle with false in the end here?

Adding || false won't affect anything. No additional case is being handled other than fixing the typescript complaint.

Non messaging error not caused by disconnect? Why would they be not retryable?

They are not retryable because the translate() method did not consider them retryable.

const shouldReopen = isMessagingError(translatedError) ? translatedError.retryable : true;

Are you suggesting that we should allow all non-messaging errors to be retried?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding || false won't affect anything. No additional case is being handled other than fixing the typescript complaint.

Thats not true. There are errors for which translate() will return as is and not change it to a MessagingError. For these, you will be setting shouldReopen to false, but in Track 1 these will be true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I said it "won't affect anything", I meant that this code is in the same state as the code from your previous suggestion(from the other PR).

There are errors for which translate() will return as is and not change it to a MessagingError. For these, you will be setting shouldReopen to false, but in Track 1 these will be true

I thought that's what we agreed upon and mentioned in the comments too.

So finally, Going with re-open if

  • No error
  • Messaging-error that is retryable
  • Non-messaging-error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Contributor

@chradek chradek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! :shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants