Minor changes to increase the performance of nodeDb ScheduleMany #2318

d80tb7 · 2023-04-02T20:18:08Z

severinson JamesMurkin

ScheduleMany can be quite slow in the case where there are a large number of nodes and jobs cannot be assigned to any node. Here's a very simple benchmark:

func TestScheduleManyPerf(t *testing.T) {
	nodes := testNCpuNode(1000, testPriorities)
	nodeDb, err := createNodeDb(nodes)
	require.NoError(t, err)

	reqs := testNLargeCpuJob("foo", 0, 4000)
	for _, r := range reqs {
		_, _, err = nodeDb.ScheduleMany([]*schedulerobjects.PodRequirements{r})
		require.NoError(t, err)
	}
}

This benchmark create a 1000 node cluster and then tries to schedule 4000 jobs on each node. The first 1K jobs will fill up the cluster, the remaining 3K jobs will all be unable to schedule due to no free resource. On my laptop the current master code runs this in 13.5 seconds.

The changes in this PR, reduce this to 9 Seconds with no external changes to behaviour. Specifically:

node.AvailableQuantityByPriorityAndResource now uses r.AsApproximateFloat64() rather than armadaresource.QuantityAsFloat64 the former is significantly faster and allocates far less. This should be safe because armadaresource.QuantityAsFloat64 exists for the cases where we overflow the float value, which should not occur on a single node.
InsufficientResources.String() uses a string concat rather than fmt.Sprintf as the former is much faster and this code this ends up getting called in a tight loop

Note that there is much more that we can optimise about this use case but these are the changes that are completely non-invasive. I'll raise a discussion for further optimisations that will require larger and/or functional changes

Update: #2319 for discussion on bigger changes we can make to improve this.

┆Issue is synchronized with this Jira Task by Unito

codecov · 2023-04-02T20:26:56Z

Codecov Report

Patch coverage has no change and project coverage change: +0.02 🎉

Comparison is base (ec82b5c) 58.97% compared to head (b812cf3) 59.00%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2318      +/-   ##
==========================================
+ Coverage   58.97%   59.00%   +0.02%     
==========================================
  Files         225      225              
  Lines       28468    28470       +2     
==========================================
+ Hits        16790    16799       +9     
+ Misses      10386    10379       -7     
  Partials     1292     1292

Flag	Coverage Δ
armada-server	`59.00% <0.00%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
internal/scheduler/schedulerobjects/node.go	`0.00% <0.00%> (ø)`
...nternal/scheduler/schedulerobjects/nodematching.go	`84.16% <0.00%> (-0.71%)`	⬇️

... and 2 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Chris Martin added 2 commits April 2, 2023 21:06

optimisations

2409d30

linting

c0e8531

d80tb7 and others added 3 commits April 3, 2023 21:54

Merge branch 'master' into f/chrisma/perf

12ae6a0

Merge branch 'master' into f/chrisma/perf

97de8cb

Merge branch 'master' into f/chrisma/perf

b812cf3

JamesMurkin approved these changes Apr 13, 2023

View reviewed changes

JamesMurkin merged commit c9456e7 into master Apr 13, 2023

JamesMurkin deleted the f/chrisma/perf branch April 13, 2023 15:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minor changes to increase the performance of nodeDb ScheduleMany #2318

Minor changes to increase the performance of nodeDb ScheduleMany #2318

d80tb7 commented Apr 2, 2023 •

edited by sync-by-unito bot

Loading

codecov bot commented Apr 2, 2023 •

edited

Loading

Minor changes to increase the performance of nodeDb ScheduleMany #2318

Minor changes to increase the performance of nodeDb ScheduleMany #2318

Conversation

d80tb7 commented Apr 2, 2023 • edited by sync-by-unito bot Loading

codecov bot commented Apr 2, 2023 • edited Loading

Codecov Report

d80tb7 commented Apr 2, 2023 •

edited by sync-by-unito bot

Loading

codecov bot commented Apr 2, 2023 •

edited

Loading