
[scheduler] Add more config to control scheduling frequency/duration #2481

Merged
merged 7 commits into master from scheduler_loop_config on May 19, 2023

Conversation

@JamesMurkin (Contributor) commented May 18, 2023

Issue is synchronized with this Jira Task by Unito

@JamesMurkin marked this pull request as ready for review May 18, 2023 14:21
codecov bot commented May 18, 2023

Codecov Report

Patch coverage: 87.09% and project coverage change: -0.03% ⚠️

Comparison: base (517d946) at 58.51% vs. head (322de95) at 58.48%.

❗ Current head 322de95 differs from pull request most recent head 4f4d09e. Consider uploading reports for the commit 4f4d09e to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2481      +/-   ##
==========================================
- Coverage   58.51%   58.48%   -0.03%     
==========================================
  Files         235      235              
  Lines       29526    29566      +40     
==========================================
+ Hits        17277    17293      +16     
- Misses      10930    10952      +22     
- Partials     1319     1321       +2     
Flag            Coverage Δ
armada-server   58.48% <87.09%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                          Coverage Δ
internal/scheduler/schedulerapp.go      0.00% <0.00%> (ø)
internal/scheduler/scheduler.go         77.79% <89.65%> (+0.06%) ⬆️
internal/scheduler/scheduling_algo.go   67.91% <90.32%> (+1.98%) ⬆️

... and 3 files with indirect coverage changes


// How often job scheduling should run.
// This is expected to be larger than CyclePeriod, as we don't need to schedule on every cycle.
// Keeping the cycle short keeps the system responsive, since other operations - such as state changes - happen every cycle.
SchedulePeriod time.Duration `validate:"required"`
Collaborator:
not suggesting you add this now, but I think the validation framework is sophisticated enough to enforce the constraint that SchedulePeriod > CyclePeriod
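
(Not part of the PR - just a minimal sketch of what that constraint might look like if the framework in use is go-playground/validator; the package, struct and method names here are placeholders.)

// Illustrative sketch, assuming github.com/go-playground/validator is the
// validation framework: the gtfield cross-field tag requires SchedulePeriod
// to be strictly greater than CyclePeriod.
package config

import (
	"time"

	validator "github.com/go-playground/validator/v10"
)

type SchedulingConfig struct {
	// How often the main scheduler cycle runs.
	CyclePeriod time.Duration `validate:"required"`
	// Must be strictly greater than CyclePeriod; gtfield enforces this.
	SchedulePeriod time.Duration `validate:"required,gtfield=CyclePeriod"`
}

func (c SchedulingConfig) Validate() error {
	return validator.New().Struct(c)
}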

cyclePeriod time.Duration
// Minimum duration between Schedule() calls - calls that actually schedule new jobs.
schedulingPeriod time.Duration
Collaborator:

should we rename this to schedulePeriod to match the config value?

if err != nil {
	return err
}
if s.clock.Now().Sub(s.previousSchedulingRoundEnd) > s.schedulingPeriod {
Collaborator:

let's log (maybe at debug level) whether we decided to schedule or not, and why. It'll come in useful if we ever end up not scheduling and can't work out why.
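
Something along these lines could work (a sketch only, not the PR's code; the method name and log wording are illustrative, and the field names mirror the snippet above):

// Illustrative sketch: pull the decision into one place so it can be logged
// either way (and later exported as a metric if useful).
func (s *Scheduler) shouldSchedule(now time.Time) bool {
	elapsed := now.Sub(s.previousSchedulingRoundEnd)
	if elapsed <= s.schedulingPeriod {
		log.Debugf("skipping scheduling: only %s since last scheduling round (schedulingPeriod %s)", elapsed, s.schedulingPeriod)
		return false
	}
	log.Debugf("scheduling new jobs: %s since last scheduling round (schedulingPeriod %s)", elapsed, s.schedulingPeriod)
	return true
}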

if executor.Id <= l.previousScheduleClusterId {
	continue
}

log.Infof("scheduling on %s", executor.Id)
schedulerResult, sctx, err := l.scheduleOnExecutor(
Collaborator:

we seem to have ended up with one variable called sctx and another called schedCtx. Is it possible to rename at least one of these to distinguish between them?


allExecutorsConsidered := false
for i, executor := range executorsToSchedule {
	if schedCtx.Err() != nil {
Collaborator:

do we record why we exited - e.g. either because we ran out of time or because we scheduled on all executors? If not, how difficult would it be to add? Possibly this is something we can do when we add Prometheus metrics.
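
(A rough sketch of one option - the reason strings and the final log line are illustrative, not part of the PR, and the loop body is simplified from the snippet above.)

// Illustrative sketch: track why the per-executor loop stopped so a timeout
// can be distinguished from a clean pass over every executor.
exitReason := "considered all executors"
for _, executor := range executorsToSchedule {
	if schedCtx.Err() != nil {
		exitReason = "scheduling deadline exceeded"
		break
	}
	log.Infof("scheduling on %s", executor.Id)
	// ... scheduleOnExecutor(...) as in the snippet above ...
}
log.Infof("scheduling round finished: %s", exitReason)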

	break
}

if i+1 == len(executorsToSchedule) {
Collaborator:

this doesn't work quite how I imagined it. I'd assumed we'd always try to schedule on at least len(executorsToSchedule) executors. I.e. assume we have 4 executors; I would assume:

Scheduling round 1: schedules on 1, 2, 3 (times out)
Scheduling round 2: schedules on 4, 1, 2, 3 (finishes after scheduling on n executors)

Whereas I think this code would do:

Scheduling round 1: schedules on 1, 2, 3 (times out)
Scheduling round 2: schedules on 4 (finishes after we reach the end of the slice)
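
One way to get the behaviour described above: rotate the executor slice so each round starts just after the cluster we last scheduled on, then iterate over the whole rotated slice, stopping early only on the deadline. A rough sketch, not the PR's code (the Executor type and helper name are placeholders):

// Illustrative sketch: start after previousScheduleClusterId and wrap around,
// so every executor is considered each round unless the deadline is hit.
func rotateExecutors(executors []*Executor, previousScheduleClusterId string) []*Executor {
	start := 0
	for i, e := range executors {
		if e.Id > previousScheduleClusterId {
			start = i
			break
		}
	}
	rotated := make([]*Executor, 0, len(executors))
	rotated = append(rotated, executors[start:]...)
	rotated = append(rotated, executors[:start]...)
	return rotated
}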

@JamesMurkin enabled auto-merge (squash) May 19, 2023 09:09
@JamesMurkin merged commit 6732b4c into master May 19, 2023
21 of 22 checks passed
@JamesMurkin deleted the scheduler_loop_config branch May 19, 2023 10:46