This repository has been archived by the owner on Apr 2, 2023. It is now read-only.

Improve performance for Probabilistic priority queueing #344

Merged
merged 41 commits into from
Oct 27, 2021

lenhattan86
Collaborator

Description:

For each scheduling cycle, we need the set of priorities of the pending tasks.
If we compute it on every scheduling cycle, it does not scale, because the latency grows with the number of tasks in task_store.

Instead of doing this on every scheduling cycle, we can pull the pending tasks every probabilistic_priority_assigner_task_fetch_interval. We don't need exact numbers because:

  • We are doing probabilistic assignment, so even with accurate data a task may still not be scheduled.
  • The priority set is often much smaller than the set of pending tasks, because jobs often have more than one task and multiple jobs may share the same priority.
  • We never starve the tasks being assigned in this cycle, because we take their priority into account.
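
To illustrate the second and third points, here is a minimal sketch of how a cached priority set might be derived. `PendingTask`, `PrioritySetSketch`, and `withCurrentTask` are hypothetical names used only for illustration; the actual change works on `IScheduledTask` objects, and how it merges the current task's priority is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a pending task; the real code uses IScheduledTask.
final class PendingTask {
  final String jobKey;
  final int priority;

  PendingTask(String jobKey, int priority) {
    this.jobKey = jobKey;
    this.priority = priority;
  }
}

final class PrioritySetSketch {
  // Collapses the pending tasks into the distinct priorities they carry.
  static Set<Integer> prioritySet(Iterable<PendingTask> pendingTasks) {
    Set<Integer> priorities = new TreeSet<>();
    for (PendingTask task : pendingTasks) {
      priorities.add(task.priority);
    }
    return priorities;
  }

  // Assumption: the priority of the task being assigned in this cycle is
  // added to a copy of the cached set, so a task newer than the snapshot
  // is still represented.
  static Set<Integer> withCurrentTask(Set<Integer> cached, int currentPriority) {
    Set<Integer> result = new TreeSet<>(cached);
    result.add(currentPriority);
    return result;
  }

  public static void main(String[] args) {
    List<PendingTask> pending = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      // 1000 tasks spread over 10 jobs and only 3 priorities.
      pending.add(new PendingTask("job-" + (i % 10), i % 3));
    }
    Set<Integer> cached = prioritySet(pending);
    System.out.println(cached.size());               // prints 3
    System.out.println(withCurrentTask(cached, 7));  // prints [0, 1, 2, 7]
  }
}
```

With 1000 pending tasks the cached set holds only three entries, which is why a slightly stale snapshot costs little.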

Testing Done:

unit test.
integration test.
performance test.

Contributor

@ridv ridv left a comment

Overall, looks good. Maybe one place for potential weirdness.

@lenhattan86 lenhattan86 requested a review from ridv October 27, 2021 00:00
Contributor

@ridv ridv left a comment

LGTM :shipit:

@lenhattan86 lenhattan86 merged commit d0174dd into aurora-scheduler:master Oct 27, 2021
@@ -54,7 +55,8 @@
private static final Logger LOG = LoggerFactory.
getLogger(ProbabilisticPriorityAssigner.class);

private final Storage storage;
private static Iterable<IScheduledTask> pendindTasks = new LinkedList<>();
Collaborator

pendingTasks

Collaborator

You should use an ArrayList here for better performance. This is on a fast path, and the list can grow large, putting pressure on memory pages while iterating.
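
As a rough sketch of that suggestion: an ArrayList keeps its element references in one backing array, while a LinkedList allocates a separate node per element, so copying the fetched tasks into an ArrayList gives a more compact structure to iterate. `SnapshotSketch` and `snapshot` are hypothetical names, and integers stand in for tasks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

final class SnapshotSketch {
  // Copies any Iterable into an array-backed, read-only list.
  static <T> List<T> snapshot(Iterable<T> source) {
    List<T> copy = new ArrayList<>();
    for (T item : source) {
      copy.add(item);
    }
    return Collections.unmodifiableList(copy);
  }

  public static void main(String[] args) {
    List<Integer> linked = new LinkedList<>();
    for (int i = 0; i < 5; i++) {
      linked.add(i);
    }
    // The snapshot is independent of the source list and cannot be modified.
    List<Integer> pending = snapshot(linked);
    long sum = 0;
    for (int p : pending) {
      sum += p;
    }
    System.out.println(sum); // prints 10
  }
}
```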

Collaborator

Also what makes you use a static variable here?

@@ -131,8 +132,9 @@ boolean isScheduled(Set<Integer> prioritySet, int priority) {
}

@VisibleForTesting
Iterable<IScheduledTask> getPendingTasks() {
return Storage.Util.fetchTasks(storage, Query.unscoped().byStatus(ScheduleStatus.PENDING));
public static synchronized void fetchPendingTasks(Storage storage) {
Collaborator

@VisibleForTesting but I don't see any tests calling this?

Collaborator

I don't see the need for this method in this class at all. Your PendingTasksFetcher should call Storage.Util.fetchTasks(storage, Query.unscoped().byStatus(ScheduleStatus.PENDING)); directly from its run() method, and the responsibility for fetching pending tasks should lie with that class, not this one. The objects here should only use methods from that class. This class is responsible for assigning priorities, not for exposing task fetching as a public method.
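
One way this split could look, as a sketch only: the fetcher owns the cached snapshot in an instance field (not a static one) and the assigner only reads from it. The `Supplier` stands in for the `Storage.Util.fetchTasks(...)` call, and `AssignerSketch` with its `pendingCount` method is a hypothetical consumer, not the real ProbabilisticPriorityAssigner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical fetcher: owns the cached snapshot of pending tasks.
final class PendingTasksFetcher<T> implements Runnable {
  // Stands in for the storage query; an assumption for this sketch.
  private final Supplier<? extends Iterable<T>> query;
  // Instance field; volatile so readers on other threads see the latest snapshot.
  private volatile List<T> pendingTasks = Collections.emptyList();

  PendingTasksFetcher(Supplier<? extends Iterable<T>> query) {
    this.query = query;
  }

  @Override
  public void run() {
    List<T> fresh = new ArrayList<>();
    for (T task : query.get()) {
      fresh.add(task);
    }
    // Publish the new snapshot with a single volatile write.
    pendingTasks = Collections.unmodifiableList(fresh);
  }

  List<T> getPendingTasks() {
    return pendingTasks;
  }
}

// Hypothetical consumer: reads the snapshot and never queries storage itself.
final class AssignerSketch {
  private final PendingTasksFetcher<String> fetcher;

  AssignerSketch(PendingTasksFetcher<String> fetcher) {
    this.fetcher = fetcher;
  }

  int pendingCount() {
    return fetcher.getPendingTasks().size();
  }

  public static void main(String[] args) {
    PendingTasksFetcher<String> fetcher =
        new PendingTasksFetcher<String>(() -> Arrays.asList("task-1", "task-2"));
    AssignerSketch assigner = new AssignerSketch(fetcher);
    System.out.println(assigner.pendingCount()); // prints 0 (nothing fetched yet)
    fetcher.run();
    System.out.println(assigner.pendingCount()); // prints 2
  }
}
```

Because each run replaces the whole list reference, readers never observe a half-built snapshot and no synchronized static method is needed.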


import org.apache.aurora.scheduler.storage.Storage;

public class TaskFetcher implements Runnable {
Collaborator

This seems to fetch only pending tasks, why is it named TaskFetcher?

Collaborator

You seem to have multiple classes doing similar things. You should use a single executor to fetch these tasks with whatever filters are needed and use them from different classes. Having multiple executors for the same thing is not a good idea.
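
For illustration, a single `ScheduledExecutorService` can drive several periodic fetch jobs. In this sketch the two jobs only increment counters; in the real code they would be the fetchers with their respective filters. `SharedExecutorSketch`, `runBoth`, and the interval values are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedExecutorSketch {
  // Schedules two periodic jobs on one scheduler thread, waits until each has
  // run at least `rounds` times (or a timeout expires), and returns the counts.
  static int[] runBoth(long intervalMs, int rounds) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    AtomicInteger pendingFetches = new AtomicInteger();
    AtomicInteger otherFetches = new AtomicInteger();
    CountDownLatch pendingDone = new CountDownLatch(rounds);
    CountDownLatch otherDone = new CountDownLatch(rounds);
    try {
      // Placeholder for "fetch pending tasks".
      executor.scheduleAtFixedRate(() -> {
        pendingFetches.incrementAndGet();
        pendingDone.countDown();
      }, 0, intervalMs, TimeUnit.MILLISECONDS);
      // Placeholder for a second fetch with a different filter.
      executor.scheduleAtFixedRate(() -> {
        otherFetches.incrementAndGet();
        otherDone.countDown();
      }, 0, intervalMs, TimeUnit.MILLISECONDS);
      pendingDone.await(5, TimeUnit.SECONDS);
      otherDone.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
    } finally {
      executor.shutdownNow();
    }
    return new int[] {pendingFetches.get(), otherFetches.get()};
  }

  public static void main(String[] args) {
    int[] counts = runBoth(10, 3);
    System.out.println("pending fetches: " + counts[0] + ", other fetches: " + counts[1]);
  }
}
```

Both jobs share one thread here, so a slow fetch delays the other; whether that trade-off is acceptable depends on how long each query takes.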


@Override
protected void startUp() {
executor.scheduleAtFixedRate(taskFetcher, 0, taskFetchIntervalMs, TimeUnit.MILLISECONDS);
Collaborator

What is the taskFetchIntervalMs here?

@lenhattan86 lenhattan86 added this to the 0.26.0 milestone Oct 27, 2021
3 participants