Minor changes to increase the performance of nodeDb ScheduleMany #2318
severinson JamesMurkin
`ScheduleMany` can be quite slow when there are a large number of nodes and jobs cannot be assigned to any node. A very simple benchmark shows this: it creates a 1000-node cluster and then tries to schedule 4000 jobs on it. The first 1K jobs fill up the cluster; the remaining 3K jobs all fail to schedule because no resources are free. On my laptop the current master code runs this in 13.5 seconds.

The changes in this PR reduce this to 9 seconds with no external changes to behaviour. Specifically:

- `node.AvailableQuantityByPriorityAndResource` now uses `r.AsApproximateFloat64()` rather than `armadaresource.QuantityAsFloat64`. The former is significantly faster and allocates far less. This should be safe because `armadaresource.QuantityAsFloat64` exists for the cases where we overflow the float value, which should not occur on a single node.
- `InsufficientResources.String()` uses string concatenation rather than `fmt.Sprintf`, as the former is much faster and this code ends up getting called in a tight loop.

Note that there is much more we can optimise for this use case, but these are the changes that are completely non-invasive. I'll raise a discussion for further optimisations that will require larger and/or functional changes.
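For readers who want to check the `fmt.Sprintf` versus concatenation point for themselves, here is a minimal, self-contained sketch. The `insufficientResources` type, its fields, and the message format are hypothetical stand-ins and are not Armada's actual `InsufficientResources` implementation; the sketch only compares allocation counts for two ways of building the same string.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// insufficientResources is a hypothetical stand-in for illustration only;
// it is not the type used in Armada.
type insufficientResources struct {
	Resource  string
	Required  int64
	Available int64
}

// stringSprintf builds the message with fmt.Sprintf, which boxes its
// arguments into interfaces and parses the format string on every call.
func (r insufficientResources) stringSprintf() string {
	return fmt.Sprintf("not enough free %s: required %d, available %d",
		r.Resource, r.Required, r.Available)
}

// stringConcat builds the same message with plain concatenation and
// strconv, avoiding the formatting machinery.
func (r insufficientResources) stringConcat() string {
	return "not enough free " + r.Resource +
		": required " + strconv.FormatInt(r.Required, 10) +
		", available " + strconv.FormatInt(r.Available, 10)
}

// sink keeps the compiler from discarding the results.
var sink string

func main() {
	r := insufficientResources{Resource: "cpu", Required: 8, Available: 2}

	// Both methods should produce identical text.
	fmt.Println(r.stringSprintf())
	fmt.Println(r.stringConcat())

	// testing.AllocsPerRun reports the average number of heap allocations per call.
	sprintfAllocs := testing.AllocsPerRun(1000, func() { sink = r.stringSprintf() })
	concatAllocs := testing.AllocsPerRun(1000, func() { sink = r.stringConcat() })
	fmt.Printf("allocs per call: Sprintf=%.0f concat=%.0f\n", sprintfAllocs, concatAllocs)
}
```

The exact figures depend on the Go version and the values being formatted, so treat this as a way to measure rather than as a claim about the size of the saving in this PR.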
Update: see #2319 for discussion of the bigger changes we can make to improve this.