
Improve Offer Loop Usage of Request Locks #2239

Merged: 7 commits into master from offer-check-lag on Nov 16, 2021

Conversation

@WH77 (Contributor) commented Oct 28, 2021

This PR attempts to reduce the consequences (i.e., scheduling lag) of attempting to schedule a task whose request lock is held elsewhere for a long period of time (e.g., by history persisters).

Changes:

  • Actually make offer scoring parallel while waiting on request locks. The offer scoring thread pool uses an unbounded queue, so it would never grow past its core pool size of 1 and was effectively still single-threaded (see the sketch after this list).
  • Limit the amount of time that the offer loop will wait for a request lock. This is less relevant now that the thread pool can have more than one thread, but it would still help if the pool were saturated with locked requests.
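
For context on the thread pool point above: this is standard java.util.concurrent.ThreadPoolExecutor behavior, not Singularity-specific code. With an unbounded work queue, the executor only starts threads beyond corePoolSize when the queue rejects a task, which an unbounded queue never does, so the maximum pool size is effectively ignored. A minimal sketch (class name, field names, and sizes are illustrative only):

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class OfferScoringPoolSketch {
  // core=1, max=10, unbounded queue: extra threads are never created because
  // the LinkedBlockingQueue never fills, so this stays single-threaded.
  static final ThreadPoolExecutor EFFECTIVELY_SINGLE_THREADED = new ThreadPoolExecutor(
    1, 10, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>()
  );

  // Raising the core pool size is one way to let offer scoring proceed in
  // parallel while some scoring tasks sit blocked on request locks.
  static final ThreadPoolExecutor ACTUALLY_PARALLEL = new ThreadPoolExecutor(
    10, 10, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>()
  );
}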

Took a very quick/lazy approach to tryRunWithLock in the offer loop, so please send feedback: should it throw an error, log whether it was able to run, etc.?
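
A rough sketch of the bounded-wait helper shape described above; the class name, signature, and use of ReentrantLock are assumptions for illustration, not the actual change in this PR:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class RequestLockSketch {
  private final ReentrantLock lock = new ReentrantLock();

  // Runs the action only if the request lock can be acquired within the
  // timeout; otherwise skips it so the offer loop can move on to other
  // requests. Returns whether the action actually ran.
  boolean tryRunWithLock(Runnable action, long timeout, TimeUnit unit) {
    boolean acquired = false;
    try {
      acquired = lock.tryLock(timeout, unit);
      if (!acquired) {
        return false;
      }
      action.run();
      return true;
    } catch (InterruptedException e) {
      // Preserve the interrupt and report that the action did not run.
      Thread.currentThread().interrupt();
      return false;
    } finally {
      if (acquired) {
        lock.unlock();
      }
    }
  }
}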

Considered attempting a larger-scale refactor (poll from a single priority queue of pending tasks, remove request locks, etc.), but that seemed very likely to introduce new and exciting bugs. The fact that there are separate offer-scheduler and SingularitySchedulerPoller threads that schedule tasks slightly differently (drainPendingQueue) is annoying, but it doesn't seem to be the main source of problems.

cc - @ssalinas, @pschoenfelder

      return start;
    } else {
      LOG.trace(
        "{} - Failed to acquire lock on {} ({})",

Member:

Can this be a debug or even an info? I feel like we want to be more aware of when locks are timing out.

        return -1;
      }
    } catch (InterruptedException e) {
      throw new RuntimeException(e);

Member:

Quick note that any uncaught exception that bubbles up through the offer scheduling loop will cause Singularity to abort/restart. An InterruptedException here seems like we'd already be in the process of shutting down, but it's worth a quick trace to see where it would be caught.


@WH77 (Contributor, Author):

There was a change made to prevent uncaught exceptions for tasks from killing the entire scheduler (#2233), but this method should probably return -1 in this case as well, since it was unable to acquire the lock due to the interrupt.
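
A minimal sketch of the behavior being suggested, assuming a method that returns the acquisition start time on success and -1 on failure; the surrounding class, logger, and lock type are illustrative, not the actual PR code:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LockAcquisitionSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LockAcquisitionSketch.class);

  // Returns the start time on success, or -1 if the lock could not be taken.
  // The caller is responsible for unlocking after a successful acquisition.
  long lock(ReentrantLock requestLock, String requestId, long timeoutMillis) {
    final long start = System.currentTimeMillis();
    try {
      if (requestLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
        return start;
      } else {
        LOG.debug("Failed to acquire lock on {} within {}ms", requestId, timeoutMillis);
        return -1;
      }
    } catch (InterruptedException e) {
      // Treat an interrupt like a failed acquisition instead of letting a
      // RuntimeException bubble up and abort the scheduler; keep the flag set.
      Thread.currentThread().interrupt();
      return -1;
    }
  }
}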

Comment on lines +267 to +271
deployManager.createDeployIfNotExists(
  updatedRequest,
  deployMarker,
  validatedDeploy
);

Member:

Why do this before the deploy-already-in-progress check? We would already save this data right afterwards in saveDeploy anyway.


Member:

Never mind, I see what this is doing now; without it, the pending deploy gets created slightly before the actual deploy object.

@ssalinas (Member):

🚢

@WH77 merged commit df03e6c into master on Nov 16, 2021.
@WH77 deleted the offer-check-lag branch on November 16, 2021 at 16:18.
@ssalinas added this to the 1.5.0 milestone on May 4, 2022.