Skip to content

fix(iot-dev): Fix bug where AMQP layer would block a while on closing if the proton reactor thread crashed#1383

Merged
timtay-microsoft merged 4 commits intomainfrom
timtay/reactorCrashHandler
Oct 8, 2021
Merged

fix(iot-dev): Fix bug where AMQP layer would block a while on closing if the proton reactor thread crashed#1383
timtay-microsoft merged 4 commits intomainfrom
timtay/reactorCrashHandler

Conversation

@timtay-microsoft
Copy link
Copy Markdown
Member

#1379

We had a basic assumption that onReactorFinal would always be called when proton-j's thread was finished, but if proton-j's thread encountered a runtime exception or some handler exception, the thread would exit prematurely and would never execute onReactorFinal. Because of that, we need to release some state latches when this happens so that the user's close thread doesn't block for 20 seconds as discussed in the issue above.

… if the proton reactor thread crashed

still need to do manual testing
}
}

private static class ReactorRunner implements Callable<Object>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is a bit hard to follow, but basically I moved this class out to its own file so that this file didn't have other classes defined in it. I then moved all the logic from the IotHubReactor class into this class since IotHubReactor was an unnecessary layer to have here.

}
this.reactor.stop();
this.reactor.process();
this.reactor.free();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one function was the sole reason we had this class abstracting the Reactor. I moved this logic into the ReactorRunner class to cut down on the number of classes we have.


public void free()
{
this.reactor.free();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A perfect illustration of why this layer of abstraction was pretty pointless


// Notify the AMQP connection layer so it can tear down the reactor's resources that would usually
// get cleaned up in a graceful close of the reactor
this.reactorRunnerStateCallback.onReactorClosedUnexpectedly();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new stuff that actually fixes the bug. Just another state callback so that we can release the latches like the comment mentions.

@timtay-microsoft
Copy link
Copy Markdown
Member Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 4 pipeline(s).

@timtay-microsoft timtay-microsoft merged commit 887099b into main Oct 8, 2021
@timtay-microsoft timtay-microsoft deleted the timtay/reactorCrashHandler branch October 8, 2021 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants