[Jobs] Merge Jobs Support into Master #2925

Merged
dperny merged 21 commits into master from feature-jobs on Jan 13, 2020

dperny (Collaborator) commented Jan 10, 2020

Over time, development of the Swarm Jobs feature has taken place on a separate feature-jobs branch, to avoid disrupting the downstream engine until the feature was complete. I'm comfortable, now, saying that Jobs is complete enough to merge into the swarmkit master branch.

Jobs integration with the engine is tracked in moby/moby#40307.

Adds protocol buffers for implementing Jobs in swarmkit.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Adds orchestrators for replicated and global jobs, and the basic tests.
This commit exists mainly to keep Ginkgo mostly in its own commit.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Adds service reconciliation logic for the replicated jobs orchestrator.
This code does not function in production, and is not actually called
except from the tests.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Expands the skeleton structure of the global jobs orchestrator.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Refactors the replicated job orchestrator to make testing simpler, and
then adds initialization logic to it.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Signed-off-by: Drew Erny <drew.erny@docker.com>
Refactors the global job orchestrator along the same lines as the
replicated job orchestrator, in order to better decouple the
event-driven orchestrator logic from the reconciliation logic.

Signed-off-by: Drew Erny <drew.erny@docker.com>
It became evident in the process of writing the Global Jobs orchestrator
that the Orchestrators required by both Replicated and Global jobs are
essentially identical. This commit merges them into one combined
orchestrator, which dispatches to the appropriate Reconcilers to do the
actual work.

Unlike existing services, these orchestrators can be combined because
the requirements of jobs are much simpler than those of services.

Signed-off-by: Drew Erny <drew.erny@docker.com>
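
As an illustration only (this is not code from the PR), the combined orchestrator described in the commit above could be sketched roughly as follows. The names `JobMode`, `Reconciler`, `Orchestrator`, and `HandleServiceEvent` are hypothetical stand-ins chosen for this sketch, not swarmkit's actual types. The idea shown is a single event handler that picks a reconciler by job mode.

```go
package main

import "fmt"

// JobMode is a hypothetical stand-in for the two job service modes.
type JobMode int

const (
	ReplicatedJob JobMode = iota
	GlobalJob
)

// Reconciler is a hypothetical interface: reconcile one service by ID.
type Reconciler interface {
	ReconcileService(serviceID string) error
}

// recordingReconciler only records which services it was asked to reconcile.
type recordingReconciler struct {
	seen []string
}

func (r *recordingReconciler) ReconcileService(id string) error {
	r.seen = append(r.seen, id)
	return nil
}

// Orchestrator handles events for both job modes and delegates the real work.
type Orchestrator struct {
	reconcilers map[JobMode]Reconciler
}

// HandleServiceEvent dispatches to the reconciler matching the service's mode.
func (o *Orchestrator) HandleServiceEvent(id string, mode JobMode) error {
	r, ok := o.reconcilers[mode]
	if !ok {
		return fmt.Errorf("no reconciler for job mode %d", mode)
	}
	return r.ReconcileService(id)
}

func main() {
	replicated := &recordingReconciler{}
	global := &recordingReconciler{}
	o := &Orchestrator{reconcilers: map[JobMode]Reconciler{
		ReplicatedJob: replicated,
		GlobalJob:     global,
	}}
	_ = o.HandleServiceEvent("svc-a", ReplicatedJob)
	_ = o.HandleServiceEvent("svc-b", GlobalJob)
	fmt.Println(replicated.seen, global.seen)
}
```

Keeping the event loop in one place and the per-mode logic behind an interface is one way to get the decoupling the commit message describes; the real code may be organized differently.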
Adds support to the controlapi for creating and updating job-mode
services. This still does not include correct plumbing to execute
job-type services.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Adds the jobs orchestrator to the swarmkit manager. Jobs orchestrator
will now start and run with the manager.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Adds the beginnings of the integration tests between the controlapi and
the jobs orchestrator. These tests don't actually check much more than
creation right now, as update handling for the jobs orchestrator is
pending, but further tests will be able to leverage the groundwork here
to a high degree.

Signed-off-by: Drew Erny <drew.erny@docker.com>
In order to make the jobs reconcilers work correctly with the restart
supervisor, they have been altered to never replace failed tasks
directly. Replacing failed tasks is the purview of the restart
supervisor. The jobs reconcilers will only create new tasks when needed.

Additionally, this alters the behavior of the replicated job reconciler
with regard to slots -- each new task will get a new slot, and when the
job is completed, there will be a Completed task in each slot from 0 to
TotalCompletions-1.

Then, makes the tweaks necessary for the Restart Supervisor to support
Jobs, which are different from other services in that they deliberately
have a desired state of Completed.

Finally, wires up the replicated and global orchestrators to call the
restart supervisor to restart tasks that have failed.

Signed-off-by: Drew Erny <drew.erny@docker.com>
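
A minimal sketch of the slot behavior described in the commit above, under stated assumptions: `Task`, `TaskState`, and `newSlots` are hypothetical names for this example, and the `maxConcurrent` limit is an assumption of the sketch rather than something the commit message mentions. The function never hands out a slot that already has a task, so a failed task is left for the restart supervisor, and new tasks only ever go into unused slots below `totalCompletions`.

```go
package main

import "fmt"

// TaskState is a simplified, hypothetical task state.
type TaskState int

const (
	Running TaskState = iota
	Failed
	Completed
)

// Task is a simplified, hypothetical task record.
type Task struct {
	Slot  uint64
	State TaskState
}

// newSlots returns the slot numbers that should receive brand-new tasks.
// Slots that already hold a task (even a failed one) are never reused here.
func newSlots(tasks []Task, totalCompletions, maxConcurrent uint64) []uint64 {
	used := map[uint64]bool{}
	done := map[uint64]bool{}
	for _, t := range tasks {
		used[t.Slot] = true
		if t.State == Completed {
			done[t.Slot] = true
		}
	}
	// Slots that are occupied but not yet completed still count as active.
	active := uint64(len(used) - len(done))

	var slots []uint64
	for s := uint64(0); s < totalCompletions && active+uint64(len(slots)) < maxConcurrent; s++ {
		if !used[s] {
			slots = append(slots, s)
		}
	}
	return slots
}

func main() {
	tasks := []Task{
		{Slot: 0, State: Completed},
		{Slot: 1, State: Failed}, // left alone: restarting is not this function's job
		{Slot: 2, State: Running},
	}
	fmt.Println(newSlots(tasks, 5, 3)) // prints [3]
}
```

With five total completions and a concurrency limit of three, slots 1 and 2 are still active, so only one new task (slot 3) is created.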
Updates manager components to support jobs, which have a desired state
of Completed.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Updates the ListServiceStatuses RPC to work with jobs. Includes adding a
new field to the responses showing the number of completed Tasks in a
job.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Adds support for updating global jobs by adding code to shut down tasks
belonging to previous job iterations.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Adds support for updating replicated jobs by adding code to shut down
tasks belonging to previous job iterations.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Signed-off-by: Drew Erny <drew.erny@docker.com>
This reverts the commits from #2899, which shut down tasks of old
iterations. This is the wrong approach, and the right approach will be
implemented in a later commit.

Signed-off-by: Drew Erny <drew.erny@docker.com>
Addresses a couple of TODOs in the jobs orchestrator.

Most notably, updates the reconcilers to set the DesiredState of Tasks
belonging to older job iterations to Remove, which will cause them to be
cleaned up and deleted.

Signed-off-by: Drew Erny <drew.erny@docker.com>
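
The cleanup of older job iterations described in the commit above might look something like the sketch below. It is illustrative only: `Task`, `TaskState`, and `markStale` are hypothetical names, and the job iteration is modeled here as a plain counter, which is an assumption of this example.

```go
package main

import "fmt"

// TaskState is a simplified, hypothetical desired state.
type TaskState int

const (
	Running TaskState = iota
	Completed
	Remove
)

// Task is a simplified, hypothetical task record.
type Task struct {
	ID           string
	JobIteration uint64
	DesiredState TaskState
}

// markStale sets DesiredState to Remove on every task that belongs to an
// iteration older than current, and returns how many tasks it changed.
func markStale(tasks []Task, current uint64) int {
	changed := 0
	for i := range tasks {
		if tasks[i].JobIteration < current && tasks[i].DesiredState != Remove {
			tasks[i].DesiredState = Remove
			changed++
		}
	}
	return changed
}

func main() {
	tasks := []Task{
		{ID: "t1", JobIteration: 1, DesiredState: Completed},
		{ID: "t2", JobIteration: 2, DesiredState: Completed},
	}
	n := markStale(tasks, 2)
	fmt.Println(n, tasks[0].DesiredState == Remove, tasks[1].DesiredState == Completed)
}
```

Marking tasks for removal, rather than shutting them down directly, leaves the actual deletion to whatever component already cleans up tasks whose desired state is Remove.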
Signed-off-by: Drew Erny <drew.erny@docker.com>
@GordonTheTurtle

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "feature-jobs" git@github.com:docker/swarmkit.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354456720
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

codecov bot commented Jan 10, 2020

Codecov Report

Merging #2925 into master will increase coverage by 0.23%.
The diff coverage is 75.45%.

@@            Coverage Diff             @@
##           master    #2925      +/-   ##
==========================================
+ Coverage   61.58%   61.82%   +0.23%     
==========================================
  Files         139      142       +3     
  Lines       22616    22986     +370     
==========================================
+ Hits        13928    14210     +282     
- Misses       7207     7275      +68     
- Partials     1481     1501      +20

Removes code from the global job reconciler that prevented global jobs
from executing on newly created nodes. This behavior probably would not
have worked well in real-world use, and it closed off a few use cases
(like using a global job to perform some initialization on newly-created
nodes).

Signed-off-by: Drew Erny <derny@mirantis.com>
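
To illustrate the behavior after the change described in the commit above, here is a rough sketch in which a global job reconciler treats every node the same way, whether it joined before or after the job was created. `Task` and `nodesNeedingTasks` are hypothetical names for this example, and the iteration is again modeled as a plain counter by assumption.

```go
package main

import "fmt"

// Task is a simplified, hypothetical task record for a global job.
type Task struct {
	NodeID       string
	JobIteration uint64
}

// nodesNeedingTasks returns, in input order, every node that has no task for
// the given iteration. There is no check on when a node was created, so a
// node that joins later is picked up on the next reconciliation pass.
func nodesNeedingTasks(nodeIDs []string, tasks []Task, iteration uint64) []string {
	covered := make(map[string]bool)
	for _, t := range tasks {
		if t.JobIteration == iteration {
			covered[t.NodeID] = true
		}
	}
	var missing []string
	for _, id := range nodeIDs {
		if !covered[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	tasks := []Task{{NodeID: "node-1", JobIteration: 3}}
	// node-2 joined after the job started; it still gets a task.
	fmt.Println(nodesNeedingTasks([]string{"node-1", "node-2"}, tasks, 3)) // prints [node-2]
}
```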

dperny (Collaborator, Author) commented Jan 13, 2020

Alright, I'm merging it. Here we go.

dperny merged commit ef128ab into master on Jan 13, 2020
dperny deleted the feature-jobs branch on May 18, 2020 15:39