-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[scheduler] Add more config to control scheduling frequency/duration #2481
Conversation
…/armada into scheduler_loop_config
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #2481 +/- ##
==========================================
- Coverage 58.51% 58.48% -0.03%
==========================================
Files 235 235
Lines 29526 29566 +40
==========================================
+ Hits 17277 17293 +16
- Misses 10930 10952 +22
- Partials 1319 1321 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
// How often the job scheduling should run | ||
// This is expected to be a greater value than CyclePeriod as we don't need to schedule every cycle | ||
// This keeps the system more responsive as other operations happen in each cycle - such as state changes | ||
SchedulePeriod time.Duration `validate:"required"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not suggesting you add this now, but I think the validation framework is sophisticated to enforce the constraint that SchedulePeriod
> CyclePeriod
cyclePeriod time.Duration | ||
// Minimum duration between Schedule() calls - calls that actually schedule new jobs. | ||
schedulingPeriod time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we rename this to schedulePeriod
to match the config value?
if err != nil { | ||
return err | ||
} | ||
if s.clock.Now().Sub(s.previousSchedulingRoundEnd) > s.schedulingPeriod { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's log (maybe at debug) about whether we decided to schedule or not and why. It'll come in useful if we ever end up not scheduling and we can't work out why.
if executor.Id <= l.previousScheduleClusterId { | ||
continue | ||
} | ||
|
||
log.Infof("scheduling on %s", executor.Id) | ||
schedulerResult, sctx, err := l.scheduleOnExecutor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we seem to have ended up with one variable called sctx
and another called schedCtx
. Is it possible to rename at least one of these to distinguish between them!
|
||
allExecutorsConsidered := false | ||
for i, executor := range executorsToSchedule { | ||
if schedCtx.Err() != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we record why we exited- e.g. either because we ran out of time or because we scheduled all executors? If not, how difficult would it be to add? Possibly this is something we can do when we do prom metrics
break | ||
} | ||
|
||
if i+1 == len(executorsToSchedule) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't work quite how I imagined it. I'd assumed we'd always try and schedule at least len(executorsToSchedule)
. I.e. assume we have 4 executors, I would asssume:
Scheduling round 1: schedules on 1,2,3 (times out)
Scheduling round 2: schedules on 4,1,2,3 (finishes after scheduling on n executors)
Whereas I think this code would do:
Scheduling round 1: schedules on 1,2,3 (times out)
Scheduling round 2: schedules on 4 (finishes after we reach the end of the slice)
┆Issue is synchronized with this Jira Task by Unito