Avoid copying nodes where possible #2621
Conversation
Codecov Report

Patch coverage:

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #2621      +/-   ##
==========================================
+ Coverage   58.70%   58.76%   +0.05%
==========================================
  Files         238      238
  Lines       30462    30477      +15
==========================================
+ Hits        17883    17909      +26
+ Misses      11230    11219      -11
  Partials     1349     1349
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
```go
if skipNode {
	continue
}
if node, err = nodedb.BindJobsToNode(q.schedulingConfig.Preemption.PriorityClasses, jobs, node); err != nil {
	return nil, err
}
```
This behavior is now the same as in the new scheduler.
```go
if _, ok := node.AllocatableByPriorityAndResource[evictedPriority]; !ok {
```
We can get rid of this check once we stop storing `Node` values in the node database.
```go
jobs[j] = jobs[j].WithNewRun("executor-01", node.Name)
	}
}
rand.Shuffle(len(jobs), func(i, j int) { jobs[i], jobs[j] = jobs[j], jobs[i] })
```
I'm shuffling this so that the benchmark can't implicitly rely on jobs arriving grouped by node, since grouping jobs by node is work we didn't have to do before; I haven't actually checked whether it makes a big difference.
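As a minimal sketch of the shuffling idea (the `job` type and names here are hypothetical stand-ins, not the scheduler's real types):

```go
package main

import (
	"fmt"
	"math/rand"
)

// job is a hypothetical stand-in for the scheduler's job type.
type job struct {
	id   string
	node string
}

// shuffleJobs randomizes job order in place, so downstream code (and
// benchmarks) cannot accidentally rely on jobs arriving grouped by node.
func shuffleJobs(jobs []job) {
	rand.Shuffle(len(jobs), func(i, j int) { jobs[i], jobs[j] = jobs[j], jobs[i] })
}

func main() {
	// Jobs initially arrive grouped by node.
	jobs := []job{
		{"job-1", "node-a"}, {"job-2", "node-a"},
		{"job-3", "node-b"}, {"job-4", "node-b"},
	}
	shuffleJobs(jobs)
	fmt.Println(len(jobs)) // still 4 jobs, just in a random order
}
```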
```go
// - Within AllocatableByPriorityAndResource, the resources allocated to these jobs are moved from
//   the jobs' priorities to evictedPriority; they are not subtracted from AllocatedByJobId and
//   AllocatedByQueue.
func EvictJobsFromNode(
```
We could avoid some copies here by passing in a slice to which the evicted nodes are appended.
That's a good idea; will follow up.
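The suggested pattern can be sketched as follows; the function name and element type are illustrative only, not the actual nodedb API. The caller owns the backing array, so repeated calls append into it instead of allocating a fresh slice each time:

```go
package main

import "fmt"

// appendEvicted demonstrates the append-to-caller-slice pattern: results
// are appended to a slice the caller provides, avoiding a per-call
// allocation. Names here are illustrative.
func appendEvicted(out []string, jobIDs []string) []string {
	return append(out, jobIDs...)
}

func main() {
	// Caller pre-allocates once, then reuses the same backing array.
	evicted := make([]string, 0, 8)
	evicted = appendEvicted(evicted, []string{"j1", "j2"})
	evicted = appendEvicted(evicted, []string{"j3"})
	fmt.Println(len(evicted)) // 3 evicted jobs, at most one allocation
}
```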
internal/scheduler/nodedb/nodedb.go (Outdated)
```go
	pMin = p
	ok = true
}
if jobFilter == nil {
```
I think it makes more sense to interpret `nil` as evict everything, i.e., always true.
Thanks for catching this; I didn't actually mean to change the semantics. I was looking at this from the perspective of "we never pass in nil here, so this is just a defensive check", but I agree that nil can actually make sense here.
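A minimal sketch of the agreed semantics, with an illustrative signature rather than the actual nodedb one: a nil filter means every job matches.

```go
package main

import "fmt"

type job struct{ id string }

// shouldEvict treats a nil filter as "evict everything", i.e., always
// true; otherwise it defers to the filter. Signature is illustrative.
func shouldEvict(jobFilter func(job) bool, j job) bool {
	if jobFilter == nil {
		return true
	}
	return jobFilter(j)
}

func main() {
	fmt.Println(shouldEvict(nil, job{"j1"}))                                // true: nil evicts everything
	fmt.Println(shouldEvict(func(j job) bool { return false }, job{"j1"})) // false: filter rejects
}
```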
Concretely, there are two situations in which we can avoid intermediate copies of nodes:

1. When constructing a `NodeDb`, we were making a separate call to `BindJobToNode` for each already-running job (which meant that we had to copy a `Node` for each already-running job); instead, we now first group the slice of already-running jobs according to the nodes that they're running on, and then only copy each node at most once. I added a benchmark for this case, which is much faster now (after b3ac7c4, it runs in a few seconds on my machine; I don't know how long it took before that because I haven't had the patience to let it run for more than 15 minutes).
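The grouping step can be sketched as follows; the `job` type and field names are assumptions for illustration, not the real scheduler types:

```go
package main

import "fmt"

// job is a hypothetical stand-in for an already-running job; node is the
// name of the node it runs on.
type job struct {
	id   string
	node string
}

// groupByNode buckets jobs by the node they run on, so that each node
// only needs to be copied once when all of its jobs are bound to it.
func groupByNode(jobs []job) map[string][]job {
	groups := make(map[string][]job)
	for _, j := range jobs {
		groups[j.node] = append(groups[j.node], j)
	}
	return groups
}

func main() {
	jobs := []job{{"j1", "node-a"}, {"j2", "node-a"}, {"j3", "node-b"}}
	groups := groupByNode(jobs)
	fmt.Println(len(groups))           // 2 distinct nodes to copy
	fmt.Println(len(groups["node-a"])) // 2 jobs bound to node-a in one go
}
```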
2. When evicting jobs at the start of the scheduling round, we were making a separate call to `EvictJobFromNode` for each preemptible job (which, again, meant that we had to copy a `Node` for each preemptible job); instead, we now evict all preemptible jobs on a given node in one go. I already attempted this change in "Avoid repeatedly copying node when evicting jobs" #2590, but this PR comes with a cleaner API (no in-place node operations are exported anymore). As in the previous PR, this change makes a noticeable difference when running:

```shell
go test ./internal/scheduler -bench='^BenchmarkPreemptingQueueScheduler$' -count=10 -run='^$'
```
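The batched-eviction idea can be sketched like this; the `node` type, fields, and function signature below are assumptions for illustration, not the real nodedb types:

```go
package main

import "fmt"

// node is a hypothetical stand-in for the node database's node type,
// tracking which of its jobs have been evicted.
type node struct {
	name string
	jobs map[string]bool // job ID -> evicted?
}

// evictJobsFromNode copies the node once and then marks every job in the
// batch as evicted, instead of copying the node once per job.
func evictJobsFromNode(n *node, jobIDs []string) *node {
	// Single copy of the node for the whole batch.
	cp := &node{name: n.name, jobs: make(map[string]bool, len(n.jobs))}
	for id, evicted := range n.jobs {
		cp.jobs[id] = evicted
	}
	for _, id := range jobIDs {
		cp.jobs[id] = true
	}
	return cp
}

func main() {
	n := &node{name: "node-a", jobs: map[string]bool{"j1": false, "j2": false}}
	cp := evictJobsFromNode(n, []string{"j1", "j2"})
	fmt.Println(cp.jobs["j1"], cp.jobs["j2"]) // true true: both evicted via one copy
	fmt.Println(n.jobs["j1"])                 // false: original node untouched
}
```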