Fix tracking of finished jobs within JobMaster
After some investigation, we found that the JobMaster was not properly tracking finished jobs because the `compareTo` and `equals` methods of `JobInfo` were inconsistent with each other. As a result, jobs could be dropped from the ConcurrentSkipListSet without the `remove` function ever being called on them. In practice, the master would hit its maximum job capacity very frequently, because un-flushable jobs accumulated in the `mIdToCoordinators` map.

This change fixes the original issue by replacing the underlying structure used to track finished jobs and by tightening the guarantees on how a job's status may change.

First, a JobInfo may no longer change state once it has been marked with a state for which `JobInfo#isFinished` returns true. All job states should therefore follow a DAG (RUNNING is optional, since a job may move from CREATED directly to any of the finished states):

```
CREATED -> (RUNNING) -> [FAILED|CANCELLED|COMPLETED]
```

Second, instead of a ConcurrentSkipListSet, I've opted for a simpler concurrent structure: a LinkedBlockingQueue. Because finished jobs can no longer change state, a FIFO queue with O(1) offer/poll operations is sufficient. The trade-off is that jobs marked as finished concurrently may be enqueued in a different order depending on how the queue's internal locks are acquired, so the queue no longer strictly guarantees ordering by lastStatusChangeMs. The time difference between concurrently finishing jobs should be small enough that this does not significantly reduce the number of jobs that can be evicted once the retention time has elapsed.

This change introduces the JobTracker class, which is now responsible for adding and evicting jobs from the job master. It encapsulates what was the mIdToCoordinator map, and also houses the FIFO queue that replaces the mFinishedJobs set.

Closes #9874
pr-link: #9934
change-id: cid-c4495dfd0a353e90a154b29c640f15b66f3a4208
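The skip-list failure mode described above can be reproduced in isolation. This sketch assumes, hypothetically, that `JobInfo#equals` compared job IDs while `compareTo` ordered only by `lastStatusChangeMs`; `FakeJobInfo` and its fields are illustrative names, not the real class. Because `ConcurrentSkipListSet` decides membership with `compareTo`, two distinct jobs that finish in the same millisecond look like duplicates.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for JobInfo: equality is by job id, ordering is by timestamp only.
final class FakeJobInfo implements Comparable<FakeJobInfo> {
  final long mId;
  final long mLastStatusChangeMs;

  FakeJobInfo(long id, long lastStatusChangeMs) {
    mId = id;
    mLastStatusChangeMs = lastStatusChangeMs;
  }

  // Inconsistent with equals: distinct jobs with the same timestamp compare as 0.
  @Override
  public int compareTo(FakeJobInfo other) {
    return Long.compare(mLastStatusChangeMs, other.mLastStatusChangeMs);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof FakeJobInfo && ((FakeJobInfo) o).mId == mId;
  }

  @Override
  public int hashCode() {
    return Long.hashCode(mId);
  }
}

class SkipListPitfallDemo {
  public static void main(String[] args) {
    // ConcurrentSkipListSet uses compareTo, never equals, to find elements.
    ConcurrentSkipListSet<FakeJobInfo> finished = new ConcurrentSkipListSet<>();
    FakeJobInfo job1 = new FakeJobInfo(1, 1000L);
    FakeJobInfo job2 = new FakeJobInfo(2, 1000L);

    System.out.println(finished.add(job1));      // true
    System.out.println(finished.add(job2));      // false: job2 "collides" with job1
    System.out.println(finished.size());         // 1, so job2 is never tracked
    System.out.println(finished.remove(job2));   // true, but it removes job1
    System.out.println(finished.contains(job1)); // false
  }
}
```

A sorted set only behaves like a set when its ordering is consistent with `equals`; mutating a field that `compareTo` reads while the element is in the set produces similar symptoms.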
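One way to enforce the CREATED -> (RUNNING) -> [FAILED|CANCELLED|COMPLETED] DAG is to encode the allowed edges on the status enum and reject everything else. This is a minimal sketch: `Status`, `TrackedJob`, `canTransitionTo`, and the boolean return of `setStatus` are hypothetical names, and the commit does not say whether the real `JobInfo` ignores, logs, or throws on a rejected transition.

```java
// Hypothetical status enum mirroring CREATED -> (RUNNING) -> [FAILED|CANCELLED|COMPLETED].
enum Status {
  CREATED, RUNNING, FAILED, CANCELLED, COMPLETED;

  boolean isFinished() {
    return this == FAILED || this == CANCELLED || this == COMPLETED;
  }

  // Only forward edges of the DAG are allowed; finished states are terminal.
  boolean canTransitionTo(Status next) {
    switch (this) {
      case CREATED:
        return next == RUNNING || next.isFinished();
      case RUNNING:
        return next.isFinished();
      default:
        return false;
    }
  }
}

// Hypothetical job record that guards its own status changes.
final class TrackedJob {
  private final long mId;
  private Status mStatus = Status.CREATED;
  private long mLastStatusChangeMs;

  TrackedJob(long id, long createdMs) {
    mId = id;
    mLastStatusChangeMs = createdMs;
  }

  // Returns false and leaves the job untouched if the edge is not in the DAG.
  synchronized boolean setStatus(Status next, long nowMs) {
    if (!mStatus.canTransitionTo(next)) {
      return false;
    }
    mStatus = next;
    mLastStatusChangeMs = nowMs;
    return true;
  }

  synchronized Status getStatus() {
    return mStatus;
  }

  synchronized long getLastStatusChangeMs() {
    return mLastStatusChangeMs;
  }

  long getId() {
    return mId;
  }
}

class JobStateDemo {
  public static void main(String[] args) {
    TrackedJob job = new TrackedJob(1, 0L);
    System.out.println(job.setStatus(Status.RUNNING, 10L));   // true
    System.out.println(job.setStatus(Status.COMPLETED, 20L)); // true
    System.out.println(job.setStatus(Status.RUNNING, 30L));   // false: COMPLETED is terminal
    System.out.println(job.getStatus() + " @ " + job.getLastStatusChangeMs()); // COMPLETED @ 20
  }
}
```

Freezing the timestamp once a job is finished is what makes a plain FIFO queue safe: an element's position can never become stale after it is enqueued.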
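The admit/evict cycle of a tracker like JobTracker can be sketched with a `ConcurrentHashMap` for the id-to-job map and a `LinkedBlockingQueue` of finished jobs. The class and method names (`SketchJobTracker`, `addJob`, `markFinished`, `evictExpired`) and the capacity/retention parameters are assumptions for illustration, not the actual JobTracker API. Because a finished job is enqueued exactly once and never changes state afterwards, eviction only has to inspect the head of the queue.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical job handle; only the finish time matters for eviction.
final class SketchJob {
  final long mId;
  // -1 until the job finishes; set at most once, mirroring the terminal-state rule.
  final AtomicLong mFinishedMs = new AtomicLong(-1L);

  SketchJob(long id) {
    mId = id;
  }
}

// Hypothetical tracker: id -> job map plus a FIFO queue of finished jobs.
final class SketchJobTracker {
  private final int mCapacity;
  private final long mRetentionMs;
  private final Map<Long, SketchJob> mIdToJob = new ConcurrentHashMap<>();
  private final Queue<SketchJob> mFinished = new LinkedBlockingQueue<>();

  SketchJobTracker(int capacity, long retentionMs) {
    mCapacity = capacity;
    mRetentionMs = retentionMs;
  }

  // Admits a job, evicting expired finished jobs first if the tracker is full.
  synchronized boolean addJob(long id, long nowMs) {
    if (mIdToJob.size() >= mCapacity) {
      evictExpired(nowMs);
      if (mIdToJob.size() >= mCapacity) {
        return false;
      }
    }
    mIdToJob.put(id, new SketchJob(id));
    return true;
  }

  // May be called from any thread; the CAS ensures a job is queued only once.
  void markFinished(long id, long nowMs) {
    SketchJob job = mIdToJob.get(id);
    if (job != null && job.mFinishedMs.compareAndSet(-1L, nowMs)) {
      mFinished.offer(job); // O(1); queue order only approximates finish order
    }
  }

  // Single consumer (synchronized), so peek() and poll() always see the same head.
  synchronized int evictExpired(long nowMs) {
    int evicted = 0;
    SketchJob head;
    while ((head = mFinished.peek()) != null
        && nowMs - head.mFinishedMs.get() >= mRetentionMs) {
      mFinished.poll();
      mIdToJob.remove(head.mId);
      evicted++;
    }
    return evicted;
  }

  int size() {
    return mIdToJob.size();
  }
}

class TrackerDemo {
  public static void main(String[] args) {
    SketchJobTracker tracker = new SketchJobTracker(2, 100L);
    tracker.addJob(1, 0L);
    tracker.addJob(2, 0L);
    tracker.markFinished(1, 50L);
    System.out.println(tracker.addJob(3, 120L)); // false: job 1 finished only 70 ms ago
    System.out.println(tracker.addJob(3, 150L)); // true: job 1 is evicted first
    System.out.println(tracker.size());          // 2 (jobs 2 and 3)
  }
}
```

Eviction stops at the first head that has not yet outlived the retention time. Since concurrent `markFinished` calls can enqueue jobs slightly out of finish-time order, an already-expired job sitting behind a younger head waits for a later pass; that is the trade-off accepted here in exchange for O(1) offer/poll.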
commit 7681087 (1 parent: 6f40083)
Showing 10 changed files with 559 additions and 126 deletions.