Add dominant resource fairness #2614

Merged: 12 commits, Jun 28, 2023

severinson (Contributor) commented Jun 27, 2023

Do not merge yet. This should be merged after #2611.

codecov bot commented Jun 27, 2023

Codecov Report

Patch coverage: 17.80% and project coverage change: -0.01 ⚠️

Comparison is base (dac1cf3) 58.71% compared to head (4254943) 58.70%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2614      +/-   ##
==========================================
- Coverage   58.71%   58.70%   -0.01%     
==========================================
  Files         238      238              
  Lines       30397    30462      +65     
==========================================
+ Hits        17847    17883      +36     
- Misses      11195    11230      +35     
+ Partials     1355     1349       -6     
Flag            Coverage Δ
armada-server   58.70% <17.80%> (-0.01%) ⬇️

Impacted Files                                     Coverage Δ
internal/armada/server/lease.go                    7.34% <0.00%> (-0.20%) ⬇️
internal/scheduler/context/context.go              30.71% <2.56%> (-2.29%) ⬇️
internal/scheduler/scheduling_algo.go              73.22% <66.66%> (-0.17%) ⬇️
internal/scheduler/preempting_queue_scheduler.go   64.78% <100.00%> (+0.22%) ⬆️
internal/scheduler/queue_scheduler.go              65.14% <100.00%> (ø)

... and 1 file with indirect coverage changes


internal/armada/configuration/types.go (outdated, resolved)
internal/scheduler/context/context.go (outdated, resolved)
internal/scheduler/context/context.go (outdated, resolved)
@@ -113,7 +113,11 @@ type SchedulingConfig struct {
DefaultJobTolerationsByResourceRequest map[string][]v1.Toleration
// Maximum number of times a job is retried before considered failed.
MaxRetries uint
- // Weights used when computing fair share.
+ // Controls how fairness is calculated. Can be either AssetFairness or DominantResourceFairness.
Collaborator commented:

This list of possible values is bound to become out of date.

severinson (Contributor, Author) commented:

Once DRF is used everywhere, I want to remove asset fairness and leave DRF as the only option.
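
For readers unfamiliar with the second option named in the config comment, here is a minimal Go sketch of the quantity dominant resource fairness is usually built on: a queue's dominant share, i.e. the largest fraction of any considered resource that the queue has been allocated. The `ResourceList` type and the `dominantShare` function are hypothetical names chosen for illustration; they are not taken from this PR, and the merged implementation may differ (for example in how queue weights are applied).

```go
package main

import "fmt"

// ResourceList maps a resource name (e.g. "cpu") to a quantity.
// Hypothetical type for illustration only; not the type used in the PR.
type ResourceList map[string]float64

// dominantShare returns the largest allocated/total fraction over the
// listed resources. Resources with no capacity are skipped.
func dominantShare(allocated, total ResourceList, resources []string) float64 {
	share := 0.0
	for _, name := range resources {
		capacity := total[name]
		if capacity <= 0 {
			continue
		}
		if fraction := allocated[name] / capacity; fraction > share {
			share = fraction
		}
	}
	return share
}

func main() {
	total := ResourceList{"cpu": 100, "memory": 400, "gpu": 8}
	allocated := ResourceList{"cpu": 10, "memory": 20, "gpu": 4}
	// Here gpu is the dominant resource, since 4/8 exceeds 10/100 and 20/400.
	fmt.Println(dominantShare(allocated, total, []string{"cpu", "memory", "gpu"}))
}
```

Under this sketch, a queue that uses little CPU but half of the GPUs is charged for the GPUs, which is the property that distinguishes the approach from a weighted sum over all resources.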

internal/armada/configuration/types.go (outdated, resolved)
Comment on lines 40 to 42
// Used to convert one resource into another when computing fair share.
// Only applies to DominantResourceFairness.
FairnessResourceMappingBySourceResource map[string]configuration.ResourceMapping
Collaborator commented:

This doesn't seem to be used yet.

Also, it's not clear to me why resource mapping would only apply to DominantResourceFairness; the fractional GPU example you gave above seems like it also makes sense in the case of asset fairness.
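
To make the idea of converting one resource into another concrete, the following is a rough Go sketch. The fields of `ResourceMapping` are not shown in this thread, so the struct below (`TargetResource`, `Multiplier`) is an assumption, as are the `applyMappings` helper and the `gpu-slice` resource name.

```go
package main

import "fmt"

// ResourceMapping is a hypothetical stand-in for configuration.ResourceMapping.
// The real type's fields are not visible in this thread; these are assumptions.
type ResourceMapping struct {
	TargetResource string  // resource the source is converted into
	Multiplier     float64 // units of target per unit of source
}

// applyMappings returns a new map in which every mapped source resource has
// been converted into its target resource. Unmapped resources are copied as-is.
func applyMappings(requests map[string]float64, mappings map[string]ResourceMapping) map[string]float64 {
	out := make(map[string]float64, len(requests))
	for name, quantity := range requests {
		if m, ok := mappings[name]; ok {
			out[m.TargetResource] += quantity * m.Multiplier
		} else {
			out[name] += quantity
		}
	}
	return out
}

func main() {
	// Assumed example: one "gpu-slice" counts as a quarter of a "gpu".
	mappings := map[string]ResourceMapping{
		"gpu-slice": {TargetResource: "gpu", Multiplier: 0.25},
	}
	requests := map[string]float64{"cpu": 4, "gpu": 1, "gpu-slice": 2}
	fmt.Println(applyMappings(requests, mappings))
}
```

In a sketch like this the conversion happens before any cost is computed, so the converted map could be fed to either fairness model.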

internal/scheduler/context/context.go (outdated, resolved)
internal/scheduler/context/context.go (outdated, resolved)
@@ -348,7 +348,7 @@ func (it *CandidateGangIterator) fractionOfFairShareWithGctx(gctx *schedulercont
it.buffer.Zero()
it.buffer.Add(qctx.Allocated)
it.buffer.Add(gctx.TotalResourceRequests)
- return qctx.FractionOfFairShareWithAllocation(it.buffer)
+ return qctx.TotalCostForQueueWithAllocation(it.buffer)
Collaborator commented:

The name of this method is now out of date, but this is obviously not a blocker.
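
The hunk above follows a reusable-buffer pattern: zero a scratch buffer, add the queue's current allocation, add the gang's requests, then cost the sum. A self-contained Go sketch of that pattern follows. All names here (`ResourceList`, `queueCostWithGang`) are hypothetical, and the cost function is a simple dominant-share stand-in rather than the PR's `TotalCostForQueueWithAllocation`.

```go
package main

import "fmt"

// ResourceList is a hypothetical resource vector used only for this sketch.
type ResourceList map[string]float64

// Zero resets every entry so the buffer can be reused without reallocating.
func (rl ResourceList) Zero() {
	for name := range rl {
		rl[name] = 0
	}
}

// Add accumulates other into rl.
func (rl ResourceList) Add(other ResourceList) {
	for name, quantity := range other {
		rl[name] += quantity
	}
}

// queueCostWithGang returns the cost the queue would have if the gang were
// scheduled on top of its current allocation. The cost used here is the
// largest allocated/total fraction, an assumed stand-in for the real method.
func queueCostWithGang(buffer, allocated, gangRequests, total ResourceList) float64 {
	buffer.Zero()
	buffer.Add(allocated)
	buffer.Add(gangRequests)
	cost := 0.0
	for name, quantity := range buffer {
		if capacity := total[name]; capacity > 0 && quantity/capacity > cost {
			cost = quantity / capacity
		}
	}
	return cost
}

func main() {
	buffer := ResourceList{}
	total := ResourceList{"cpu": 100, "gpu": 8}
	allocated := ResourceList{"cpu": 10, "gpu": 2}
	gang := ResourceList{"cpu": 10, "gpu": 2}
	fmt.Println(queueCostWithGang(buffer, allocated, gang, total))
}
```

Since the returned value is a cost rather than a fraction of fair share, a name along the lines of `queueCostWithGang` describes it more accurately, which is the point the reviewer raises.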

@severinson severinson merged commit 6cc7e56 into master Jun 28, 2023
27 checks passed
@severinson severinson deleted the severinson/dominant-resource-fairness branch June 28, 2023 13:19
2 participants