Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preemption fixes and improvements #2264

Merged
merged 21 commits into from
Mar 14, 2023
Merged

Conversation

severinson
Copy link
Contributor

@severinson severinson commented Mar 14, 2023

This pr changes how resources are accounted for in lease.go. We currently send resources allocated per queue separately. With this pr, we change it so that the server aggregates the resource usage per queue based on the jobs the executor has reported exists in the cluster. This is necessary to ensure resource accounting is correct now that we send all jobs to the executor with the same priority class; without this change, all resource usage is reported at the armada-default priority class (the class all jobs are marked as when sent to the executor now).

Removes the job deletion grace period. Jobs are now deleted immediately when calling delete instead of setting an expiry on the key. This is necessary to ensure preempted jobs don't factor into the resources used by a queue.

@@ -189,59 +188,6 @@ func TestDeleteQueuedJob(t *testing.T) {
})
}

func TestDeleteJobShouldSetJobObjectToExpire(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we rely on being able to return leases for deleted jobs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not conceptually but possibly inadvertently - I can think of edge cases where it could happen - however as long as the server treats it as a noop it should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@codecov
Copy link

codecov bot commented Mar 14, 2023

Codecov Report

Patch coverage: 19.82% and project coverage change: +0.02 🎉

Comparison is base (ab48d1f) 57.41% compared to head (3ed7c2b) 57.44%.

❗ Current head 3ed7c2b differs from pull request most recent head 33ad923. Consider uploading reports for the commit 33ad923 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2264      +/-   ##
==========================================
+ Coverage   57.41%   57.44%   +0.02%     
==========================================
  Files         223      223              
  Lines       27913    27946      +33     
==========================================
+ Hits        16027    16053      +26     
- Misses      10642    10649       +7     
  Partials     1244     1244              
Flag Coverage Δ
armada-server 57.44% <19.82%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
internal/armada/server/lease.go 7.53% <0.00%> (-0.35%) ⬇️
internal/common/eventutil/eventutil.go 38.19% <0.00%> (ø)
...ernal/armada/repository/apimessages/conversions.go 78.76% <24.24%> (-2.24%) ⬇️
internal/armada/repository/job.go 65.60% <64.28%> (+0.35%) ⬆️
internal/scheduler/legacyscheduler.go 70.44% <66.66%> (+0.06%) ⬆️
internal/scheduler/scheduler.go 78.21% <100.00%> (+0.14%) ⬆️
...nternal/scheduler/schedulerobjects/resourcelist.go 61.57% <100.00%> (+16.89%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -259,24 +205,6 @@ func TestDeleteWithSomeMissingJobs(t *testing.T) {
})
}

func TestReturnLeaseForDeletedJobShouldKeepJobDeleted(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this Test still pass with your changes?

Most likely this was added as previously I imagine that wasn't the case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since making returning lease for deleted jobs a no-op, this passes.

}
}
}
if jobs, err := q.jobRepository.GetExistingJobsByIds(jobIdsToDelete); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this section actually needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I've removed it.

JamesMurkin
JamesMurkin previously approved these changes Mar 14, 2023
@severinson severinson merged commit 6fb52d2 into master Mar 14, 2023
@severinson severinson deleted the severinson/preemption-fixes branch March 14, 2023 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants