This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Handle gRPC failure #40

Closed
jwulf opened this issue Jun 19, 2019 · 2 comments · Fixed by #41

Comments

@jwulf

jwulf commented Jun 19, 2019

If the broker is not up when the workers start, the first call to a ZBClient method throws an exception. Unless the application handles it, that exception can crash the entire process.

Here is an example of handling it at the application layer, when waiting - for example - for a broker container to start:

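const { ZBClient } = require("zeebe-node");
const promiseRetry = require("promise-retry");

// config, deployBpmn, startZeebeWorkers and startRESTServer are defined elsewhere in the application.
async function main() {
	const zbc = new ZBClient(config.ZEEBE_GATEWAY);
	// Keep retrying the first broker call until it succeeds.
	await promiseRetry((retry, number) => {
		if (number > 1) {
			console.log("gRPC connection is in failed state...");
		}
		return deployBpmn(zbc).catch(retry);
	});
	console.log("gRPC connection to broker established.");
	startZeebeWorkers(zbc);
	startRESTServer(zbc);
}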

This should be wrapped into the ZBClient class, to remove the responsibility for handling this from the application.
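
To illustrate what moving that responsibility into the client could look like, here is a minimal sketch. It is not the actual ZBClient implementation: `FakeGateway` and `RetryingClient` are hypothetical names, the gateway is simulated so the example runs without a broker, and the loop retries on every error with a fixed delay, which is the naive version of the idea.

```javascript
// Hypothetical stand-in for a gateway that rejects the first few calls,
// as a broker container would while it is still starting.
class FakeGateway {
	constructor(failuresBeforeReady) {
		this.remainingFailures = failuresBeforeReady;
	}

	async deployWorkflow(name) {
		if (this.remainingFailures > 0) {
			this.remainingFailures -= 1;
			throw new Error("14 UNAVAILABLE: failed to connect to all addresses");
		}
		return { deployed: name };
	}
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: the retry loop lives inside the client.
class RetryingClient {
	constructor(gateway, { maxAttempts = 5, delayMs = 10 } = {}) {
		this.gateway = gateway;
		this.maxAttempts = maxAttempts;
		this.delayMs = delayMs;
	}

	// Public method: callers see a single call that either resolves or rejects.
	deployWorkflow(name) {
		return this.withRetry(() => this.gateway.deployWorkflow(name));
	}

	// Retries on any error with a fixed delay; rethrows the last error
	// once the attempts are used up.
	async withRetry(operation) {
		for (let attempt = 1; ; attempt++) {
			try {
				return await operation();
			} catch (err) {
				if (attempt >= this.maxAttempts) throw err;
				await sleep(this.delayMs);
			}
		}
	}
}

// Application code no longer contains a retry loop.
async function main() {
	const client = new RetryingClient(new FakeGateway(2));
	console.log(await client.deployWorkflow("order-process.bpmn"));
}

main().catch(console.error);
```

With this shape the application only awaits `deployWorkflow` and never sees the transient failures.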

@jwulf

jwulf commented Jun 19, 2019

It's not that simple...

When running the integration tests against the proposed change, I noticed these cases, which you do not want to retry:

  • cancelWorkflowInstance - if the workflow instance doesn't exist, you don't want to keep retrying.

  • setVariables - probably the same thing.

  • deployWorkflow - what if the workflow fails to deploy because it has an error in the BPMN? You don't want to retry that.

Proposed behaviour

  • Operations retry, but only if the error message starts with '14', the gRPC status code for UNAVAILABLE, which indicates a transient network failure. This can be caused by passing in an unresolvable gateway address (DNS Resolution failed), or by the gateway not being ready yet (UNAVAILABLE: failed to connect to all addresses).
  • Retry is enabled by default, and can be disabled by passing { retry: false } to the client constructor.
  • maxRetries and maxRetryTimeout are also configurable through the constructor options. By default, if not supplied, the values are:
new ZBClient(gatewayAddress, {
    retry: true,
    maxRetries: 50,
    maxRetryTimeout: 5000
})
  • Retry is provided by promise-retry, and the back-off strategy is simple exponential back-off (^2).
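
The proposed behaviour can be sketched roughly as follows. This is a hand-rolled loop, not promise-retry and not the code from the PR. `isTransientGrpcError`, `backoffDelay`, `retryOperation` and `baseDelayMs` are hypothetical names, and treating `maxRetryTimeout` as the cap on the delay between attempts is an assumption.

```javascript
// Defaults mirror the values proposed above.
// baseDelayMs is a hypothetical extra knob for the first delay.
const DEFAULT_OPTIONS = {
	retry: true,
	maxRetries: 50,
	maxRetryTimeout: 5000,
	baseDelayMs: 100,
};

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// The proposal keys off the message prefix: 14 is the gRPC status code for UNAVAILABLE.
function isTransientGrpcError(err) {
	return err instanceof Error && err.message.startsWith("14");
}

// Delay doubles with each attempt and is capped at maxRetryTimeout
// (assumed meaning of that option).
function backoffDelay(attempt, maxRetryTimeout, baseDelayMs) {
	return Math.min(baseDelayMs * 2 ** (attempt - 1), maxRetryTimeout);
}

// Runs operation, retrying only transient errors, up to maxRetries retries.
async function retryOperation(operation, options = {}) {
	const { retry, maxRetries, maxRetryTimeout, baseDelayMs } = {
		...DEFAULT_OPTIONS,
		...options,
	};
	for (let attempt = 1; ; attempt++) {
		try {
			return await operation();
		} catch (err) {
			const retriesUsed = attempt - 1;
			// Business errors (missing instance, invalid BPMN) are rethrown immediately.
			if (!retry || !isTransientGrpcError(err) || retriesUsed >= maxRetries) {
				throw err;
			}
			await sleep(backoffDelay(attempt, maxRetryTimeout, baseDelayMs));
		}
	}
}
```

A client method would then wrap its gRPC call as `retryOperation(() => call(), clientOptions)`, so errors such as a failed cancelWorkflowInstance on a missing instance surface on the first attempt.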

@jwulf

jwulf commented Jun 20, 2019

With 2.3.0 of the zeebe-node client, my worker container would blow up if it started before the broker (which it always did, because Node starts faster than Java), and I had to use restart: always to get it to sync up with the broker.

This was an issue for me, even with restart: always, because it meant that the logs from the restarted container were not displayed by docker-compose up.

Here is what it looks like now, using zeebe-node-next with PR #41:

14 UNAVAILABLE: failed to connect to all addresses
gRPC connection is in failed state. Attempt 2. Retrying in 5s...
14 UNAVAILABLE: failed to connect to all addresses
gRPC connection is in failed state. Attempt 3. Retrying in 5s...
14 UNAVAILABLE: failed to connect to all addresses
gRPC connection is in failed state. Attempt 4. Retrying in 5s...
gRPC connection to broker established.
