Avoid copying nodes where possible #2621
Conversation
Codecov Report

Patch coverage:

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #2621      +/-   ##
==========================================
+ Coverage   58.70%   58.76%   +0.05%
==========================================
  Files         238      238
  Lines       30462    30477      +15
==========================================
+ Hits        17883    17909      +26
+ Misses      11230    11219      -11
  Partials     1349     1349
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
```go
if skipNode {
	continue
}
if node, err = nodedb.BindJobsToNode(q.schedulingConfig.Preemption.PriorityClasses, jobs, node); err != nil {
	return nil, err
}
```
This behavior is now the same as in the new scheduler.
```go
if _, ok := node.AllocatableByPriorityAndResource[evictedPriority]; !ok {
```
We can get rid of this check once we stop storing `Node` values in the node database.
```go
jobs[j] = jobs[j].WithNewRun("executor-01", node.Name)
	}
}
rand.Shuffle(len(jobs), func(i, j int) { jobs[i], jobs[j] = jobs[j], jobs[i] })
```
I'm shuffling this so that the benchmark can't implicitly rely on jobs arriving grouped by node, since grouping jobs by node is work we didn't have to do before; I haven't actually checked whether it makes a big difference.
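As a minimal sketch of the shuffling idea (the `job` type and names here are hypothetical stand-ins, not the scheduler's real types):

```go
package main

import (
	"fmt"
	"math/rand"
)

// job is a hypothetical stand-in for the scheduler's job type.
type job struct {
	id   string
	node string
}

// shuffleJobs randomizes job order in place, so downstream code (and
// benchmarks) cannot accidentally rely on jobs arriving grouped by node.
func shuffleJobs(jobs []job) {
	rand.Shuffle(len(jobs), func(i, j int) { jobs[i], jobs[j] = jobs[j], jobs[i] })
}

func main() {
	// Jobs initially arrive grouped by node.
	jobs := []job{
		{"job-1", "node-a"}, {"job-2", "node-a"},
		{"job-3", "node-b"}, {"job-4", "node-b"},
	}
	shuffleJobs(jobs)
	fmt.Println(len(jobs)) // still 4 jobs, just in a random order
}
```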
```go
// - Within AllocatableByPriorityAndResource, the resources allocated to these jobs are moved from
//   the jobs' priorities to evictedPriority; they are not subtracted from AllocatedByJobId and
//   AllocatedByQueue.
func EvictJobsFromNode(
```
We could avoid some copies here by passing in a slice to which the evicted nodes are appended.
That's a good idea; will follow up.
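The suggested pattern can be sketched as follows; the function name and element type are illustrative only, not the actual nodedb API. The caller owns the backing array, so repeated calls append into it instead of allocating a fresh slice each time:

```go
package main

import "fmt"

// appendEvicted demonstrates the append-to-caller-slice pattern: results
// are appended to a slice the caller provides, avoiding a per-call
// allocation. Names here are illustrative.
func appendEvicted(out []string, jobIDs []string) []string {
	return append(out, jobIDs...)
}

func main() {
	// Caller pre-allocates once, then reuses the same backing array.
	evicted := make([]string, 0, 8)
	evicted = appendEvicted(evicted, []string{"j1", "j2"})
	evicted = appendEvicted(evicted, []string{"j3"})
	fmt.Println(len(evicted)) // 3 evicted jobs, at most one allocation
}
```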
internal/scheduler/nodedb/nodedb.go (Outdated)
```go
	pMin = p
	ok = true
}
if jobFilter == nil {
```
I think it makes more sense to interpret `nil` as evict everything, i.e., always true.
Thanks for catching this; I didn't actually mean to change the semantics. I was looking at this from the perspective of "we never pass in nil here, so this is just a defensive check", but I agree that nil can actually make sense here.
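A minimal sketch of the agreed semantics, with an illustrative signature rather than the actual nodedb one: a nil filter means every job matches.

```go
package main

import "fmt"

type job struct{ id string }

// shouldEvict treats a nil filter as "evict everything", i.e., always
// true; otherwise it defers to the filter. Signature is illustrative.
func shouldEvict(jobFilter func(job) bool, j job) bool {
	if jobFilter == nil {
		return true
	}
	return jobFilter(j)
}

func main() {
	fmt.Println(shouldEvict(nil, job{"j1"}))                                // true: nil evicts everything
	fmt.Println(shouldEvict(func(j job) bool { return false }, job{"j1"})) // false: filter rejects
}
```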
Concretely, there are two situations in which we can avoid intermediate copies of nodes:

1. When constructing a `NodeDb`, we were making a separate call to `BindJobToNode` for each already-running job (which meant that we had to copy a `Node` for each already-running job); instead, we now first group the slice of already-running jobs according to the nodes that they're running on, and then only copy each node at most once. I added a benchmark for this case, which is much faster now (after b3ac7c4, it runs in a few seconds on my machine; I don't know how long it took before that because I haven't had the patience to let it run for more than 15 minutes).
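The grouping step can be sketched as follows; the `job` type and field names are assumptions for illustration, not the real scheduler types:

```go
package main

import "fmt"

// job is a hypothetical stand-in for an already-running job; node is the
// name of the node it runs on.
type job struct {
	id   string
	node string
}

// groupByNode buckets jobs by the node they run on, so that each node
// only needs to be copied once when all of its jobs are bound to it.
func groupByNode(jobs []job) map[string][]job {
	groups := make(map[string][]job)
	for _, j := range jobs {
		groups[j.node] = append(groups[j.node], j)
	}
	return groups
}

func main() {
	jobs := []job{{"j1", "node-a"}, {"j2", "node-a"}, {"j3", "node-b"}}
	groups := groupByNode(jobs)
	fmt.Println(len(groups))           // 2 distinct nodes to copy
	fmt.Println(len(groups["node-a"])) // 2 jobs bound to node-a in one go
}
```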
2. When evicting jobs at the start of the scheduling round, we were making a separate call to `EvictJobFromNode` for each preemptible job (which, again, meant that we had to copy a `Node` for each preemptible job); instead, we now evict all preemptible jobs on a given node in one go. I already attempted this change in "Avoid repeatedly copying node when evicting jobs" #2590, but this PR comes with a cleaner API (no in-place node operations are exported anymore). As in the previous PR, this change makes a noticeable difference when running:

```shell
go test ./internal/scheduler -bench='^BenchmarkPreemptingQueueScheduler$' -count=10 -run='^$'
```
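The batched-eviction idea can be sketched like this; the `node` type, fields, and function signature below are assumptions for illustration, not the real nodedb types:

```go
package main

import "fmt"

// node is a hypothetical stand-in for the node database's node type,
// tracking which of its jobs have been evicted.
type node struct {
	name string
	jobs map[string]bool // job ID -> evicted?
}

// evictJobsFromNode copies the node once and then marks every job in the
// batch as evicted, instead of copying the node once per job.
func evictJobsFromNode(n *node, jobIDs []string) *node {
	// Single copy of the node for the whole batch.
	cp := &node{name: n.name, jobs: make(map[string]bool, len(n.jobs))}
	for id, evicted := range n.jobs {
		cp.jobs[id] = evicted
	}
	for _, id := range jobIDs {
		cp.jobs[id] = true
	}
	return cp
}

func main() {
	n := &node{name: "node-a", jobs: map[string]bool{"j1": false, "j2": false}}
	cp := evictJobsFromNode(n, []string{"j1", "j2"})
	fmt.Println(cp.jobs["j1"], cp.jobs["j2"]) // true true: both evicted via one copy
	fmt.Println(n.jobs["j1"])                 // false: original node untouched
}
```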