Singularity performance improvements #1702
…ests to be synchronous
Additional improvements:
Test updates:
@@ -421,13 +421,14 @@ private void deleteScheduledTasks(final Collection<SingularityPendingTask> sched
  }

  private List<SingularityTaskId> getMatchingTaskIds(SingularityRequest request, SingularityDeployKey deployKey) {
    List<SingularityTaskId> activeTaskIdsFroRequest = leaderCache.getActiveTaskIdsForRequest(deployKey.getRequestId());
baconmania
Feb 7, 2018
Contributor
Minor typo
    }

    final long start = schedulerLock.lock(maybeTaskId.get().getRequestId(), "statusUpdate");
    try {
baconmania
Feb 7, 2018
Contributor
Does it make sense to use callWithRequestLock() here as well, and then make SingularitySchedulerLock#(lock|unlock) private?
ssalinas
Feb 7, 2018
Author
Member
yes, I think I had a different exception in here at one point and didn't refactor back to callWithRequestLock, thanks for finding that
   }

-  public long lock(String name) {
+  public long lock(String requestId, String name) {
baconmania
Feb 7, 2018
Contributor
Could it be a good idea to accept the calling class here instead of a name, and then log out the class name in the trace logs below? Just to ensure that the names we pass when locking aren't arbitrary and are useful for tracing the flow of the scheduler.
ssalinas
Feb 7, 2018
Author
Member
maybe even going as far as class#methodName could be useful for that logging
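The class#methodName idea from this thread could be sketched as follows. This is only an illustration: `LockNameSketch` and `lockName` are hypothetical names that do not appear in the PR, and the real change might build the label differently.

```java
// Hypothetical helper; the class and method names here are illustrative only.
class LockNameSketch {
  // Builds a "Class#method" label from the calling class and a method name,
  // so lock trace logs point back at a real call site instead of a free-form string.
  public static String lockName(Class<?> caller, String methodName) {
    return caller.getSimpleName() + "#" + methodName;
  }
}
```

A caller would then pass something like `LockNameSketch.lockName(getClass(), "statusUpdate")` as the name argument when taking a lock.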
     return System.currentTimeMillis();
   }

-  public void unlock(String name, long start) {
-    LOG.info("{} - Unlocking ({})", name, JavaUtils.duration(start));
+  public void unlock(String requestId, String name, long start) {
baconmania
Feb 7, 2018
Contributor
Should be safe to make lock() and unlock() private now, and only allow the rest of the scheduler to interact via runWithRequestLock().
This is a first pass at updates to the scheduler internals to improve the speed of offer processing, status updates, and other processes that contend for the scheduler lock. The main theme here is parallelizing things that were previously sequential.
The biggest change in this PR, which still needs additional testing, is the removal of the global scheduler lock in favor of many smaller locks. There are now three classes of locks: one lock for the scheduler state, one for accessing/processing offers, and one ConcurrentHashMap of request locks, so that only one process can update the state or tasks for a given request at a time. This change lets offers, status updates, deploy checks, pending request processing, and more run in parallel.
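The request-lock part of this can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `RequestLockSketch` and `knownRequestCount` are hypothetical, the method signatures are assumed (the review comments mention `callWithRequestLock` and `runWithRequestLock` only by name), and `System.out` stands in for the real logger.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of per-request locking; not the code from this PR.
class RequestLockSketch {
  // One lock per request id, created lazily the first time that id is locked.
  private final ConcurrentHashMap<String, ReentrantLock> requestLocks = new ConcurrentHashMap<>();

  // Runs the action while holding only the lock for requestId and returns its result.
  // Work on other request ids is not blocked by this call.
  public <T> T callWithRequestLock(Supplier<T> action, String requestId, String name) {
    ReentrantLock lock = requestLocks.computeIfAbsent(requestId, id -> new ReentrantLock());
    long start = System.currentTimeMillis();
    lock.lock();
    try {
      return action.get();
    } finally {
      lock.unlock();
      // Assumption: the real code logs through its own logger rather than stdout.
      System.out.printf("%s - unlocked %s after %dms%n", name, requestId, System.currentTimeMillis() - start);
    }
  }

  // Convenience wrapper for actions that return nothing.
  public void runWithRequestLock(Runnable action, String requestId, String name) {
    callWithRequestLock(() -> {
      action.run();
      return null;
    }, requestId, name);
  }

  // Number of distinct request ids that have been locked so far.
  public int knownRequestCount() {
    return requestLocks.size();
  }
}
```

Because callers only ever go through the two wrapper methods, the raw lock and unlock steps never need to be exposed. Note that this sketch never removes entries from the map, so a real implementation would have to decide whether and when to clean up locks for requests that no longer exist.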
Other changes include:
The request-level locking, along with our proxy-to-leader code, opens us up to some better UI experiences in the future as well. Instead of waiting for pollers, we can more easily execute many actions, like bounces or task kills, at request time and immediately return results to the UI.
Further TODOs include finishing the test fixes, running additional benchmarks of the new code, and cleaning up some rough edges (like hard-coded values that should instead be configurable).
@darcatron @baconmania @pschoenfelder