Handle task executor gateway connection failures gracefully #442

Merged 1 commit on May 16, 2023

sundargates
Collaborator

Context

^^^

Checklist

  • ./gradlew build compiles code correctly
  • Added new tests where applicable
  • ./gradlew test passes all tests
  • Extended README or added javadocs where applicable

@github-actions

Test Results

127 files  ±0  127 suites  ±0   6m 25s ⏱️ +9s
537 tests ±0  528 ✔️ ±0  8 💤 ±0  1 ❌ ±0
538 runs  ±0  529 ✔️ ±0  8 💤 ±0  1 ❌ ±0

For more details on these failures, see this check.

Results for commit dabf29f. ± Comparison against base commit 5034b7b.

"Failed to establish connection with the task executor {}; Resubmitting the request",
event.getTaskExecutorID(), e);
connectionFailures.increment();
self().tell(event.getScheduleRequestEvent().onFailure(e), self());
Collaborator

nit: is any backoff/delay needed here to avoid short-circuiting the scheduler?

Collaborator Author

I'm sending a retry message here; its handler re-queues the original request only after a delay:

log.error("Failed to submit the request {}; Retrying in {} because of ", event.getScheduleRequestEvent(), intervalBetweenRetries, event.getThrowable());
getTimers().startSingleTimer(
    getSchedulingQueueKeyFor(event.getScheduleRequestEvent().getRequest().getWorkerId()),
    event.onRetry(),
    intervalBetweenRetries);
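
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of "count the failure, then retry after a fixed delay, keyed by worker". It is not the code from this PR: it swaps the actor timers for a plain `ScheduledExecutorService`, and `DelayedRetrySketch`, `submit`, `failureCount`, and `close` are hypothetical names chosen for illustration. Only `intervalBetweenRetries` and `connectionFailures` echo names from the snippets above.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a delayed, keyed retry; not the Mantis or Akka API. */
final class DelayedRetrySketch {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    // At most one tracked retry per key, loosely mirroring a keyed single-shot timer.
    private final Map<String, ScheduledFuture<?>> pendingRetries = new ConcurrentHashMap<>();
    private final AtomicLong connectionFailures = new AtomicLong();
    private final Duration intervalBetweenRetries;

    DelayedRetrySketch(Duration intervalBetweenRetries) {
        this.intervalBetweenRetries = intervalBetweenRetries;
    }

    /** Runs the attempt; on failure, records it and schedules the same attempt again later. */
    void submit(String workerId, Runnable attempt) {
        try {
            attempt.run();
        } catch (RuntimeException e) {
            connectionFailures.incrementAndGet();
            // Retry after a delay instead of immediately, so a dead endpoint
            // cannot spin the submitter in a tight loop.
            ScheduledFuture<?> retry = scheduler.schedule(
                () -> submit(workerId, attempt),
                intervalBetweenRetries.toMillis(),
                TimeUnit.MILLISECONDS);
            // A newer retry for the same worker replaces the older tracked one.
            ScheduledFuture<?> previous = pendingRetries.put(workerId, retry);
            if (previous != null) {
                previous.cancel(false);
            }
        }
    }

    long failureCount() {
        return connectionFailures.get();
    }

    /** Stops accepting new retries and waits for already-scheduled ones to run. */
    void close() {
        // With the default policy, delayed tasks scheduled before shutdown still execute.
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In an actor, the timer would deliver a message to the actor's own mailbox rather than run a callback on a scheduler thread, so the retry is processed in order with other messages; the sketch ignores that distinction and only shows the delay and the per-worker key.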

@sundargates sundargates merged commit e7424ed into Netflix:master May 16, 2023
3 of 5 checks passed