Swarm Jobs CLI #2226

dperny · 2019-12-13T19:03:06Z

Swarm jobs are coming, and the CLI for them needs to be designed.

Behavior

Currently, internally and from the API, Jobs are presented as a different Service mode. This is because they follow mostly the same rules as Services, with the key distinction that the Tasks spawned by a Job are designed to enter a terminal state of Complete and stop executing, instead of being restarted.

There are two job modes, ReplicatedJob and GlobalJob, which are analogous to Replicated and Global services.

A ReplicatedJob has two parameters: the maximum number of concurrent Tasks (MaxConcurrent) to execute (analogous to the Replicas parameter of a replicated Service) and the total number of desired completed Tasks (TotalCompletions). A running ReplicatedJob executes up to MaxConcurrent Tasks at the same time, and starts a new Task every time one is completed, until the sum of the Running and Completed tasks is equal to TotalCompletions.
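The scheduling rule described above can be sketched as a small simulation. This is only an illustration, not the Swarm orchestrator: the function name and the assumption that exactly one running Task completes per step are made up for the example.

```python
def simulate_replicated_job(max_concurrent, total_completions):
    """Return the (running, completed) counts observed at each step.

    Hypothetical model: on every step the job is topped up with new Tasks,
    then exactly one running Task completes.
    """
    if max_concurrent < 1 or total_completions < 0:
        raise ValueError("max_concurrent must be >= 1 and total_completions >= 0")

    running = 0
    completed = 0
    history = []
    while completed < total_completions:
        # Start Tasks while a slot is free and the sum of Running and
        # Completed Tasks is still below TotalCompletions.
        while running < max_concurrent and running + completed < total_completions:
            running += 1
        history.append((running, completed))
        # Assumption for the sketch: one running Task reaches Completed.
        running -= 1
        completed += 1
    history.append((running, completed))
    return history


if __name__ == "__main__":
    for running, completed in simulate_replicated_job(2, 5):
        print(f"running={running} completed={completed}")
```

With MaxConcurrent set to 2 and TotalCompletions set to 5, the job never runs more than two Tasks at once, and it stops starting new Tasks once the fifth one has been launched.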

A GlobalJob, like a Global service, has no parameters and is configured primarily through its placement constraints. It executes one Task on each Node (that matches the constraints), starting new Tasks only if a running Task fails (enters a terminal state other than Completed).
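As a rough sketch of the GlobalJob rule, the snippet below picks the Nodes that match the constraints and decides where a new Task is needed. The state names, the label-equality form of the constraints, and both function names are simplifying assumptions for illustration, not the real Swarm data model.

```python
# Simplified Task states, assumed for this sketch only.
COMPLETED = "completed"
RUNNING = "running"
FAILED = "failed"  # stands in for any terminal state other than Completed


def matching_nodes(nodes, constraints):
    """Return the names of Nodes whose labels satisfy every constraint."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in constraints.items())
    )


def nodes_needing_task(nodes, constraints, latest_state):
    """Return the Nodes on which a new Task should be started.

    latest_state maps a Node name to the state of its most recent Task in
    the current run; a missing entry means no Task was started there yet.
    """
    needs = []
    for name in matching_nodes(nodes, constraints):
        state = latest_state.get(name)
        if state is None or state == FAILED:
            needs.append(name)
    return needs
```

A Node whose Task is Running or Completed is left alone, so a finished GlobalJob starts nothing new.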

Jobs cannot be updated while in progress, only interrupted and restarted. Any change to the ServiceSpec while a job is in progress will result in stopping all of that job's Tasks, and starting a whole new set of Tasks with the new spec. This behavior can be changed in the future, but is present for now. Because jobs cannot be updated, flags specifying update parameters have no effect.

To re-execute a job that has already been executed, the user can pass --force, which will cause the job Spec to change and result in a re-execution of all tasks. Each execution of a Job, whether fresh or interrupting, increments a JobIteration version number. This number is present on Tasks belonging to a Job, and is used to differentiate which Tasks belong to the current run of the job, and which Tasks belong to previous runs.
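The role of JobIteration can be illustrated with a minimal sketch. The `Job` and `Task` classes, their field names, and the starting value of 0 are assumptions made for the example; only the idea of a per-run version number stamped on each Task comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    job_iteration: int  # copied from the Job when the Task is created


@dataclass
class Job:
    job_iteration: int = 0  # assumed starting value

    def execute(self):
        """Begin a fresh or interrupting run, for example after --force."""
        self.job_iteration += 1
        return self.job_iteration


def tasks_for_current_run(job, tasks):
    """Keep only the Tasks that belong to the job's current run."""
    return [t for t in tasks if t.job_iteration == job.job_iteration]
```

Tasks left over from an earlier run carry a lower iteration number, so they drop out of the filter as soon as the job is executed again.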

For ReplicatedJobs, the --replicas flag can be repurposed to mean MaxConcurrent. Additionally, if no other options are specified, the value of TotalCompletions would be set equal to the value of --replicas, meaning all Tasks desired for the job would execute simultaneously. ReplicatedJobs would require an additional flag, perhaps --total or --total-completions, which could specify the desired TotalCompletions independent of the value of --replicas.
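The proposed defaulting could look roughly like the following. It uses Python's argparse purely as a stand-in for the real CLI flag parsing; `--total-completions` is one of the two candidate names mentioned above, and the default of 1 for `--replicas` is an assumption of the sketch.

```python
import argparse


def parse_replicated_job_flags(argv):
    """Return (MaxConcurrent, TotalCompletions) derived from the flags."""
    parser = argparse.ArgumentParser(prog="job-flags-sketch")
    # --replicas is repurposed to mean MaxConcurrent for a ReplicatedJob.
    parser.add_argument("--replicas", type=int, default=1)
    # Candidate flag name from the proposal; not a settled interface.
    parser.add_argument("--total-completions", type=int, default=None)
    args = parser.parse_args(argv)

    max_concurrent = args.replicas
    if args.total_completions is None:
        # No explicit total: every desired Task runs simultaneously.
        total_completions = max_concurrent
    else:
        total_completions = args.total_completions
    return max_concurrent, total_completions
```

Passing only `--replicas 5` yields five concurrent Tasks and five total completions, while adding `--total-completions 10` keeps the concurrency at five and doubles the total.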

When doing an ls of jobs, instead of displaying two numbers, Running over Desired Tasks, four numbers are needed to express the state of the job: the number of Running Tasks, the number of tasks desired to be Running, the number of Tasks Completed, and the number of Tasks desired to be completed. The resulting list table would look roughly like this:

ID            NAME     MODE            REPLICAS             IMAGE
c8wgl7q4ndfd  mybatch  replicated-job  5/5 (5/10 Complete)  somejobimage
dmu1ept4cxcf  redis    replicated      3/3                  redis:3.0.6
iwe3278osahj  mongo    global          7/7                  mongo:3.3
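A minimal sketch of how the REPLICAS cell in the table above might be rendered from the four numbers. The function name and its signature are hypothetical; only the output format follows the table.

```python
def replicas_column(running, desired, completed=None, total=None):
    """Format the REPLICAS cell for a service or a job row.

    Services pass only running and desired; jobs also pass the number of
    completed Tasks and the number of Tasks desired to be completed.
    """
    cell = f"{running}/{desired}"
    if completed is None or total is None:
        return cell
    return f"{cell} ({completed}/{total} Complete)"
```

For the mybatch row this would be called as `replicas_column(5, 5, 5, 10)`, and for the redis row as `replicas_column(3, 3)`.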

From the CLI side, there is a smallish UI decision to make with how to represent jobs.
There are, as far as I can tell, three approaches:

First Option: a `--job` flag

A flag, --job, could be added, which specifies that the service being executed is a Job-type service. This flag could be used in conjunction with the existing --mode flag to specify any of the four possible modes. So, to create a job, the full command would be something like docker service create --job --mode replicated myimage

Second Option: Separate Modes

Instead of adding a --job flag, we express jobs as they are: different Service modes. This would simply add replicated-job and global-job as possible values to --mode. The command here would be docker service create --mode replicated-job myimage
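To make the difference between the first two options concrete, here is a sketch of how each could resolve to one of the four modes. Both function names are hypothetical, and the default of "replicated" is an assumption of the sketch.

```python
SERVICE_MODES = ("replicated", "global")
ALL_MODES = SERVICE_MODES + ("replicated-job", "global-job")


def resolve_with_job_flag(mode="replicated", job=False):
    """First option: --mode picks replicated/global, --job turns it into a job."""
    if mode not in SERVICE_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{mode}-job" if job else mode


def resolve_with_separate_modes(mode="replicated"):
    """Second option: --mode accepts the job modes directly."""
    if mode not in ALL_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return mode
```

Either way the same four modes come out; the options differ only in how the user spells them on the command line.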

Third Option: A `docker job` Sub-command

This is perhaps the most radical option, but has the opportunity to express the best UI. We would create a separate sub-command, importing most of the important bits from the existing docker service sub-command, and present jobs as a separate user interface object. docker service ls would only list Replicated and Global services, while docker job ls would only list ReplicatedJobs and GlobalJobs. Any of the flags that are not applicable to jobs would not be present on the docker job sub-command, and the total-completions flag would not be present on the docker service sub-command. The command here would be docker job create --mode replicated myimage. In this case, the ls table would look slightly different:

ID            NAME       MODE        EXECUTING  PROGRESS  IMAGE
c8wgl7q4ndfd  mybatch    replicated  5/5        5/10      somejobimage
dmu1ept4cxcf  systask    global      0/0        6/6       someuser/updater:3.0.6
iwe3278osahj  backup     global      3/3        3/6       someuser/backup:3.3

Absent any dissenting opinion, I will go with Option 1 and add a --job flag.

thaJeztah · 2019-12-19T15:38:41Z

Need to give this some thought 🤗

> docker service ls would only list Replicated and Global service, docker job ls would only list ReplicatedJobs and GlobalJobs.

One thing that could make this problematic (assuming that behind the scenes they're all "services") is that it would not reflect "reality": docker job create foo would create a job. That job won't be visible in docker service ls, but docker service create foo would fail because a service with that name already exists.
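The concern can be reproduced with a toy model in which jobs and services share one name space but are listed separately. The `ServiceStore` class and its methods are invented for the illustration and do not correspond to any real Docker API.

```python
class ServiceStore:
    """Toy store where jobs and services are all 'services' underneath."""

    def __init__(self):
        self._mode_by_name = {}

    def create(self, name, mode):
        # Names are unique across every mode, job or not.
        if name in self._mode_by_name:
            raise ValueError(f"name {name!r} is already in use")
        self._mode_by_name[name] = mode

    def list_names(self, jobs):
        """List only job-mode entries (jobs=True) or only service-mode ones."""
        return sorted(
            name
            for name, mode in self._mode_by_name.items()
            if mode.endswith("-job") == jobs
        )


if __name__ == "__main__":
    store = ServiceStore()
    store.create("foo", "replicated-job")   # like: docker job create foo
    print(store.list_names(jobs=False))     # the service listing omits foo
    try:
        store.create("foo", "replicated")   # like: docker service create foo
    except ValueError as err:
        print(err)
```

The service listing shows nothing named foo, yet creating a service with that name is rejected, which is the mismatch described above.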

thaJeztah · 2019-12-19T15:39:32Z

So, ideally things would really be namespaced, but that's of course a much bigger change.

thaJeztah · 2019-12-19T15:40:27Z

/cc @docker/core-cli-maintainers for visibility

dovahcrow · 2020-08-15T06:43:30Z

Hi, any progress on the jobs cli?

silvin-lubecki · 2020-08-17T10:03:05Z

Hello @dovahcrow, this issue should be closed, as the related PR #2262 has been merged.

thaJeztah · 2020-08-17T11:25:10Z

Yes, looks like this can be closed

rdehouss · 2020-09-16T14:03:49Z

Do we know which release of docker includes this new feature?

thaJeztah · 2020-09-16T14:31:21Z

Will be in the upcoming 20.xx release

GordonTheTurtle added the area/swarm label Dec 13, 2019

thaJeztah added the status/1-design-review label Dec 19, 2019

thaJeztah closed this as completed Aug 17, 2020
