HBASE-24361 Make RESTApiClusterManager more resilient (#1701)
* sometimes API calls return with null/empty response bodies. thus,
  wrap all API calls in a retry loop.
* calls that submit work in the form of "commands" now retrieve the
  commandId from successful command submission, and track completion
  of that command before returning control to calling context.
* model CM's process state and use that model to guide state
  transitions more intelligently. this guards against, for example,
  the start command failing with an error message like "Role must be
  stopped".
* improvements to logging levels, avoid spamming logs with the
  side-effects of retries at this and higher contexts.
* include references to API documentation, such as it is.

Signed-off-by: stack <stack@apache.org>
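
The state-model and command-tracking bullets above can be illustrated with a small sketch. It is not the code from this commit: the class `RoleStateSketch`, the simplified `RoleState` and `Action` enums, and the `awaitCommand` helper are all hypothetical names chosen for illustration. The real manager takes its states from the Cloudera Manager API. The idea is to consult the modeled state before submitting a start command, so that a role that is not yet stopped is never asked to start, and to poll a submitted command by its id until it is no longer active.

```java
import java.util.function.LongPredicate;

/** Hypothetical illustration only; not the RESTApiClusterManager implementation. */
final class RoleStateSketch {
  /** Simplified, assumed set of process states. */
  enum RoleState { STOPPED, STARTING, STARTED, STOPPING, UNKNOWN }

  /** What a caller should do when asked to start a role. */
  enum Action { SUBMIT_START, WAIT_AND_RECHECK, NOTHING }

  /** Decide how to handle a "start" request given the current modeled state. */
  static Action planStart(RoleState current) {
    switch (current) {
      case STOPPED:
        return Action.SUBMIT_START;      // safe to submit the start command
      case STARTED:
        return Action.NOTHING;           // already in the desired state
      default:
        return Action.WAIT_AND_RECHECK;  // in transition or unknown: do not submit yet
    }
  }

  /**
   * Poll until the command with the given id is no longer active.
   * The isActive predicate stands in for a status query against the API (assumed).
   * Returns true if the command finished within maxPolls checks, false otherwise.
   */
  static boolean awaitCommand(long commandId, LongPredicate isActive, int maxPolls, long sleepMs) {
    for (int poll = 0; poll < maxPolls; poll++) {
      if (!isActive.test(commandId)) {
        return true;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```

In this sketch a caller would submit the start command only when `planStart` returns `SUBMIT_START`, then pass the returned command id to `awaitCommand` before handing control back.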
ndimiduk committed May 19, 2020
1 parent 61e2225 commit cf9e337
Showing 2 changed files with 346 additions and 74 deletions.
@@ -73,11 +73,11 @@ protected enum Signal {
       "timeout %6$s /usr/bin/ssh %1$s %2$s%3$s%4$s \"sudo %5$s\"";
   private String tunnelSudoCmd;
 
-  private static final String RETRY_ATTEMPTS_KEY = "hbase.it.clustermanager.retry.attempts";
-  private static final int DEFAULT_RETRY_ATTEMPTS = 5;
+  static final String RETRY_ATTEMPTS_KEY = "hbase.it.clustermanager.retry.attempts";
+  static final int DEFAULT_RETRY_ATTEMPTS = 5;
 
-  private static final String RETRY_SLEEP_INTERVAL_KEY = "hbase.it.clustermanager.retry.sleep.interval";
-  private static final int DEFAULT_RETRY_SLEEP_INTERVAL = 1000;
+  static final String RETRY_SLEEP_INTERVAL_KEY = "hbase.it.clustermanager.retry.sleep.interval";
+  static final int DEFAULT_RETRY_SLEEP_INTERVAL = 1000;
 
   protected RetryCounterFactory retryCounterFactory;

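
For context, the retry keys in this hunk configure how many times an operation is attempted and how long to sleep between attempts. The sketch below shows one way a retry loop of that shape could wrap a call that sometimes returns a null or empty body, as the commit message describes. It does not use HBase's `RetryCounterFactory`: the class `RetrySketch` and the method `callWithRetry` are hypothetical, and only the default values (5 attempts, 1000 ms) are taken from the diff.

```java
import java.util.function.Supplier;

/** Hypothetical illustration only; the commit itself relies on RetryCounterFactory. */
final class RetrySketch {
  // Defaults copied from the diff; the real values are read from configuration.
  static final int DEFAULT_RETRY_ATTEMPTS = 5;
  static final long DEFAULT_RETRY_SLEEP_INTERVAL_MS = 1000L;

  /**
   * Invoke the call until it yields a non-null, non-empty result or the attempts run out.
   * A null/empty result and a RuntimeException are both treated as retryable failures.
   */
  static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long sleepMs) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        T result = call.get();
        if (result != null && !result.toString().isEmpty()) {
          return result;
        }
        last = new IllegalStateException("null or empty response on attempt " + attempt);
      } catch (RuntimeException e) {
        last = e;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting to retry", e);
        }
      }
    }
    throw last; // non-null here because at least one attempt was made and failed
  }
}
```

A caller might write `RetrySketch.callWithRetry(() -> fetchBody(), RetrySketch.DEFAULT_RETRY_ATTEMPTS, RetrySketch.DEFAULT_RETRY_SLEEP_INTERVAL_MS)`, where `fetchBody` is a placeholder for whatever performs the HTTP request.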