Avoid copying nodes where possible #2621

Merged: 9 commits from zuqq/avoid-copying-nodes into master on Jun 29, 2023

Conversation

@zuqq (Collaborator) commented Jun 28, 2023

Concretely, there are two situations in which we can avoid intermediate copies of nodes:

  1. When constructing a NodeDb, we were making a separate call to BindJobToNode for each already-running job, which meant copying a Node once per already-running job; instead, we now first group the slice of already-running jobs by the node that they're running on, and then copy each node at most once (a minimal sketch of this follows the benchmark results below).

    I added a benchmark for this case, which is much faster now (after b3ac7c4, it runs in a few seconds on my machine; I don't know how long it took before that because I haven't had the patience to let it run for more than 15 minutes).

  2. When evicting jobs at the start of the scheduling round, we were making a separate call to EvictJobFromNode for each preemptible job (which, again, meant that we had to copy a Node for each preemptible job); instead, we now evict all preemptible jobs on a given node in one go.

    I already attempted this change in Avoid repeatedly copying node when evicting jobs #2590, but this PR comes with a cleaner API: no in-place node operations are exported anymore. As in that PR, this change makes a noticeable difference when running `go test ./internal/scheduler -bench='^BenchmarkPreemptingQueueScheduler$' -count=10 -run='^$'`:

    goos: darwin
    goarch: amd64
    pkg: github.com/armadaproject/armada/internal/scheduler
    cpu: Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
                                                                │   old.txt    │                new.txt                │
                                                                │    sec/op    │    sec/op      vs base                │
    PreemptingQueueScheduler/100_nodes_1_queue_32000_jobs-4       541.0m ±  8%   414.8m ± 137%  -23.33% (p=0.023 n=10)
    PreemptingQueueScheduler/100_nodes_10_queues_32000_jobs-4     802.0m ± 17%   529.3m ±   9%  -34.00% (p=0.000 n=10)
    PreemptingQueueScheduler/1000_nodes_1_queue_320000_jobs-4      7.651 ± 35%    5.123 ±  12%  -33.05% (p=0.001 n=10)
    PreemptingQueueScheduler/1000_nodes_10_queues_320000_jobs-4    5.572 ± 47%    4.132 ±  25%  -25.85% (p=0.001 n=10)
    PreemptingQueueScheduler/1_node_1_queue_320_jobs-4            4.762m ±  6%   3.525m ±   4%  -25.99% (p=0.000 n=10)
    PreemptingQueueScheduler/1_node_10_queues_320_jobs-4          5.792m ±  6%   4.256m ±   5%  -26.51% (p=0.000 n=10)
    PreemptingQueueScheduler/10_nodes_1_queue_3200_jobs-4         49.45m ±  2%   38.04m ±   3%  -23.08% (p=0.000 n=10)
    PreemptingQueueScheduler/10_nodes_10_queues_3200_jobs-4       62.23m ±  6%   46.07m ±   2%  -25.97% (p=0.000 n=10)
    geomean                                                       188.1m         136.7m         -27.32%
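
To make point 1 concrete, here is a minimal, self-contained Go sketch of the grouping idea. The `job` and `node` types, their fields, and `bindJobsToNode` are placeholders invented for illustration; the real implementation lives in `internal/scheduler/nodedb` and goes through `nodedb.BindJobsToNode`, as shown in the review comments below.

```go
package main

import "fmt"

// Placeholder types standing in for the scheduler's job and node types;
// the real types live in internal/scheduler/nodedb and look different.
type job struct {
	id     string
	nodeId string
}

type node struct {
	id   string
	jobs []string // ids of the jobs bound to this node
}

// bindJobsToNode copies the node once and binds all of the given jobs to
// that copy, mirroring the batched nodedb.BindJobsToNode used in this PR.
func bindJobsToNode(jobs []*job, n *node) *node {
	boundNode := *n
	boundNode.jobs = append([]string(nil), n.jobs...) // don't alias the original slice
	for _, j := range jobs {
		boundNode.jobs = append(boundNode.jobs, j.id)
	}
	return &boundNode
}

func main() {
	nodes := map[string]*node{
		"node-1": {id: "node-1"},
		"node-2": {id: "node-2"},
	}
	runningJobs := []*job{
		{id: "job-a", nodeId: "node-1"},
		{id: "job-b", nodeId: "node-1"},
		{id: "job-c", nodeId: "node-2"},
	}

	// Group the already-running jobs by the node they are running on.
	jobsByNodeId := make(map[string][]*job)
	for _, j := range runningJobs {
		jobsByNodeId[j.nodeId] = append(jobsByNodeId[j.nodeId], j)
	}

	// Bind each group in a single call, so that each node is copied at most
	// once instead of once per already-running job.
	for nodeId, js := range jobsByNodeId {
		nodes[nodeId] = bindJobsToNode(js, nodes[nodeId])
	}

	for _, n := range nodes {
		fmt.Println(n.id, n.jobs)
	}
}
```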
    

codecov bot commented Jun 28, 2023

Codecov Report

Patch coverage: 74.11% and project coverage change: +0.05% 🎉

Comparison: base (295390f) at 58.70% vs. head (c868156) at 58.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2621      +/-   ##
==========================================
+ Coverage   58.70%   58.76%   +0.05%     
==========================================
  Files         238      238              
  Lines       30462    30477      +15     
==========================================
+ Hits        17883    17909      +26     
+ Misses      11230    11219      -11     
  Partials     1349     1349              
| Flag | Coverage Δ |
|---|---|
| armada-server | 58.76% <74.11%> (+0.05%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| internal/armada/server/lease.go | 7.47% <0.00%> (+0.12%) ⬆️ |
| internal/scheduler/context/context.go | 30.71% <0.00%> (ø) |
| internal/scheduler/nodedb/nodedb.go | 62.33% <72.00%> (+1.17%) ⬆️ |
| internal/scheduler/scheduling_algo.go | 73.36% <81.81%> (+0.14%) ⬆️ |
| internal/scheduler/preempting_queue_scheduler.go | 64.95% <85.71%> (+0.16%) ⬆️ |


if skipNode {
    continue
}
if node, err = nodedb.BindJobsToNode(q.schedulingConfig.Preemption.PriorityClasses, jobs, node); err != nil {
    return nil, err
}

zuqq (Collaborator, Author) commented:

This behavior is now the same as in the new scheduler.


if _, ok := node.AllocatableByPriorityAndResource[evictedPriority]; !ok {

zuqq (Collaborator, Author) commented:

We can get rid of this check once we stop storing Node values in the node database.

jobs[j] = jobs[j].WithNewRun("executor-01", node.Name)
}
}
rand.Shuffle(len(jobs), func(i, j int) { jobs[i], jobs[j] = jobs[j], jobs[i] })

zuqq (Collaborator, Author) commented:

I'm shuffling this so that grouping jobs by node (which we didn't have to do before) is slightly harder; I haven't actually checked if it makes a big difference.

// - Within AllocatableByPriorityAndResource, the resources allocated to these jobs are moved from
// the jobs' priorities to evictedPriority; they are not subtracted from AllocatedByJobId and
// AllocatedByQueue.
func EvictJobsFromNode(

Contributor commented:

We could avoid some copies here by passing in a slice to which the evicted nodes are appended.

zuqq (Collaborator, Author) replied:

That's a good idea; will follow up.
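
A hedged sketch of that follow-up idea, reusing the placeholder `node` type from the sketch in the PR description; `evictNodeInto` is a hypothetical name, not part of this PR:

```go
// Appending to a caller-provided slice avoids allocating a fresh result
// slice on every call; the caller can preallocate it with the right
// capacity (e.g. evicted := make([]*node, 0, numNodes)) and reuse it.
func evictNodeInto(n *node, evicted []*node) []*node {
	evictedNode := *n // the node itself is still copied exactly once
	// ... move the evicted jobs' resources from their priorities to
	// evictedPriority on evictedNode (elided; see EvictJobsFromNode) ...
	return append(evicted, &evictedNode)
}
```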

pMin = p
ok = true
}
if jobFilter == nil {

Contributor commented:

I think it makes more sense to interpret nil as evict everything, i.e., always true.

zuqq (Collaborator, Author) replied:

Thanks for catching this; I didn't actually mean to change the semantics. I was looking at this from the perspective of "we never pass in nil here, so this is just a defensive check", but I agree that nil can actually make sense here.
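
For reference, a small sketch of the agreed semantics, again with a placeholder filter signature that may not match the real jobFilter type:

```go
// A nil jobFilter is interpreted as "evict everything", i.e. always true.
func shouldEvict(jobFilter func(*job) bool, j *job) bool {
	return jobFilter == nil || jobFilter(j)
}
```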

severinson previously approved these changes Jun 29, 2023

@zuqq merged commit aa84bf0 into master Jun 29, 2023
@zuqq deleted the zuqq/avoid-copying-nodes branch June 29, 2023 10:17