
[scheduler] Add more config to control scheduling frequency/duration #2481

Merged
merged 7 commits into master from scheduler_loop_config on May 19, 2023

Conversation

@JamesMurkin (Contributor) commented May 18, 2023

Issue is synchronized with this Jira Task by Unito

@JamesMurkin marked this pull request as ready for review May 18, 2023 14:21
codecov bot commented May 18, 2023

Codecov Report

Patch coverage: 87.09% and project coverage change: -0.03% ⚠️

Comparison: base (517d946) at 58.51% vs. head (322de95) at 58.48%.

❗ Current head 322de95 differs from pull request most recent head 4f4d09e. Consider uploading reports for the commit 4f4d09e to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2481      +/-   ##
==========================================
- Coverage   58.51%   58.48%   -0.03%     
==========================================
  Files         235      235              
  Lines       29526    29566      +40     
==========================================
+ Hits        17277    17293      +16     
- Misses      10930    10952      +22     
- Partials     1319     1321       +2     
Flag            Coverage Δ
armada-server   58.48% <87.09%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                          Coverage Δ
internal/scheduler/schedulerapp.go      0.00% <0.00%> (ø)
internal/scheduler/scheduler.go         77.79% <89.65%> (+0.06%) ⬆️
internal/scheduler/scheduling_algo.go   67.91% <90.32%> (+1.98%) ⬆️

... and 3 files with indirect coverage changes


// How often job scheduling should run.
// This is expected to be larger than CyclePeriod, as we don't need to schedule on every cycle.
// Keeping the cycle short keeps the system responsive, since other operations - such as state changes - happen every cycle.
SchedulePeriod time.Duration `validate:"required"`
Collaborator:
not suggesting you add this now, but I think the validation framework is sophisticated enough to enforce the constraint that SchedulePeriod > CyclePeriod
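
(Not part of the PR - just a minimal sketch of what that constraint might look like if the framework in use is go-playground/validator; the package, struct and method names here are placeholders.)

// Illustrative sketch, assuming github.com/go-playground/validator is the
// validation framework: the gtfield cross-field tag requires SchedulePeriod
// to be strictly greater than CyclePeriod.
package config

import (
	"time"

	validator "github.com/go-playground/validator/v10"
)

type SchedulingConfig struct {
	// How often the main scheduler cycle runs.
	CyclePeriod time.Duration `validate:"required"`
	// Must be strictly greater than CyclePeriod; gtfield enforces this.
	SchedulePeriod time.Duration `validate:"required,gtfield=CyclePeriod"`
}

func (c SchedulingConfig) Validate() error {
	return validator.New().Struct(c)
}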

cyclePeriod time.Duration
// Minimum duration between Schedule() calls - calls that actually schedule new jobs.
schedulingPeriod time.Duration
Collaborator:

should we rename this to schedulePeriod to match the config value?

if err != nil {
	return err
}
if s.clock.Now().Sub(s.previousSchedulingRoundEnd) > s.schedulingPeriod {
Collaborator:

let's log (maybe at debug level) whether we decided to schedule or not, and why. It'll come in useful if we ever end up not scheduling and can't work out why.
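
Something along these lines could work (a sketch only, not the PR's code; the method name and log wording are illustrative, and the field names mirror the snippet above):

// Illustrative sketch: pull the decision into one place so it can be logged
// either way (and later exported as a metric if useful).
func (s *Scheduler) shouldSchedule(now time.Time) bool {
	elapsed := now.Sub(s.previousSchedulingRoundEnd)
	if elapsed <= s.schedulingPeriod {
		log.Debugf("skipping scheduling: only %s since last scheduling round (schedulingPeriod %s)", elapsed, s.schedulingPeriod)
		return false
	}
	log.Debugf("scheduling new jobs: %s since last scheduling round (schedulingPeriod %s)", elapsed, s.schedulingPeriod)
	return true
}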

if executor.Id <= l.previousScheduleClusterId {
	continue
}

log.Infof("scheduling on %s", executor.Id)
schedulerResult, sctx, err := l.scheduleOnExecutor(
Collaborator:

we seem to have ended up with one variable called sctx and another called schedCtx. Is it possible to rename at least one of these to distinguish between them?


allExecutorsConsidered := false
for i, executor := range executorsToSchedule {
	if schedCtx.Err() != nil {
Collaborator:

do we record why we exited - e.g. either because we ran out of time or because we scheduled on all executors? If not, how difficult would it be to add? Possibly this is something we can do when we add Prometheus metrics.
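
(A rough sketch of one option - the reason strings and the final log line are illustrative, not part of the PR, and the loop body is simplified from the snippet above.)

// Illustrative sketch: track why the per-executor loop stopped so a timeout
// can be distinguished from a clean pass over every executor.
exitReason := "considered all executors"
for _, executor := range executorsToSchedule {
	if schedCtx.Err() != nil {
		exitReason = "scheduling deadline exceeded"
		break
	}
	log.Infof("scheduling on %s", executor.Id)
	// ... scheduleOnExecutor(...) as in the snippet above ...
}
log.Infof("scheduling round finished: %s", exitReason)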

	break
}

if i+1 == len(executorsToSchedule) {
Collaborator:

this doesn't work quite how I imagined it. I'd assumed we'd always try to schedule on at least len(executorsToSchedule) executors. I.e. assume we have 4 executors; I would assume:

Scheduling round 1: schedules on 1, 2, 3 (times out)
Scheduling round 2: schedules on 4, 1, 2, 3 (finishes after scheduling on n executors)

Whereas I think this code would do:

Scheduling round 1: schedules on 1, 2, 3 (times out)
Scheduling round 2: schedules on 4 (finishes after we reach the end of the slice)
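
One way to get the behaviour described above: rotate the executor slice so each round starts just after the cluster we last scheduled on, then iterate over the whole rotated slice, stopping early only on the deadline. A rough sketch, not the PR's code (the Executor type and helper name are placeholders):

// Illustrative sketch: start after previousScheduleClusterId and wrap around,
// so every executor is considered each round unless the deadline is hit.
func rotateExecutors(executors []*Executor, previousScheduleClusterId string) []*Executor {
	start := 0
	for i, e := range executors {
		if e.Id > previousScheduleClusterId {
			start = i
			break
		}
	}
	rotated := make([]*Executor, 0, len(executors))
	rotated = append(rotated, executors[start:]...)
	rotated = append(rotated, executors[:start]...)
	return rotated
}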

@JamesMurkin enabled auto-merge (squash) May 19, 2023 09:09
@JamesMurkin merged commit 6732b4c into master May 19, 2023
21 of 22 checks passed
@JamesMurkin deleted the scheduler_loop_config branch May 19, 2023 10:46