manager: Always run the watch server #2323

Merged
merged 1 commit into from
Jul 20, 2017

aaronlehmann
Collaborator

The watch server was wrongly changed to only run on the leader node. It needs to run on all managers because this is one of the RPC services that is not proxied to the leader (since all nodes receive events through Raft).

This is a regression introduced by #2310 (sorry). It never made it into the moby tree. It is covered by tests in that tree.

cc @cyli @aluzzardi

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@codecov

codecov bot commented Jul 20, 2017

Codecov Report

Merging #2323 into master will decrease coverage by 0.03%.
The diff coverage is 25%.

@@            Coverage Diff             @@
##           master    #2323      +/-   ##
==========================================
- Coverage   60.27%   60.23%   -0.04%     
==========================================
  Files         128      128              
  Lines       25972    25972              
==========================================
- Hits        15654    15645       -9     
- Misses       8918     8946      +28     
+ Partials     1400     1381      -19

@@ -491,6 +491,10 @@ func (m *Manager) Run(parent context.Context) error {
healthServer.SetServingStatus("Raft", api.HealthCheckResponse_NOT_SERVING)
localHealthServer.SetServingStatus("ControlAPI", api.HealthCheckResponse_NOT_SERVING)

if err := m.watchServer.Start(ctx); err != nil {
Contributor

Non-blocking: probably doesn't matter much, but would it make sense to start this after the raft node has started up and joined (when the other watches are set)? Otherwise if something starts watching right away, would they get the events from the raft node starting up and loading all of its state from disk?

Collaborator Author

Hmm, where in the code would you suggest? I don't think the end of this startup process is signaled to higher-level code, but I may be forgetting something.

Contributor

You're right, there's no guarantee or anything. But we start the other watches at https://github.com/docker/swarmkit/pull/2323/files#diff-8077df928eb040c7c69eea83f15e3c9dL542, after we've finished loading raft state from disk and we've observed a leader and cluster state - possibly starting the watch server can also happen at that time?

Collaborator Author

I like the suggestion, but I'm worried it would trigger the error here, which is not recoverable:

https://github.com/moby/moby/blob/8d703b98b5c403743bf17e22395e32a7271b8d3c/daemon/cluster/noderunner.go#L212-L215

I'd prefer to just fix the regression in this PR, and a future change that's coordinated with some added retry/backoff in moby/moby could delay starting the watch server.

Contributor

Ah ok, sounds good.

@cyli
Contributor

cyli commented Jul 20, 2017

LGTM!

@aaronlehmann aaronlehmann merged commit 51c1f1f into moby:master Jul 20, 2017
@aaronlehmann aaronlehmann deleted the watch-server-availability branch July 20, 2017 18:36
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Feb 3, 2020
- moby/swarmkit#2323 (fix for watch server being run only on leader)

Signed-off-by: Ying <ying.li@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Feb 3, 2020
- moby/swarmkit#2309 (updating the service spec version when rolling back)
- moby/swarmkit#2310 (fix for slow swarm shutdown)
- moby/swarmkit#2323 (run watchapi server on all managers)

Signed-off-by: Ying <ying.li@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Feb 3, 2020
- moby/swarmkit#2309 (updating the service spec version when rolling back)
- moby/swarmkit#2310 (fix for slow swarm shutdown)
- moby/swarmkit#2323 (run watchapi server on all managers)

Signed-off-by: Ying <ying.li@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Feb 3, 2020
- moby/swarmkit#2323 (fix for watch server being run only on leader)

Signed-off-by: Ying <ying.li@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 10, 2020
- moby/swarmkit#2309 (updating the service spec version when rolling back)
- moby/swarmkit#2310 (fix for slow swarm shutdown)
- moby/swarmkit#2323 (run watchapi server on all managers)

Signed-off-by: Ying <ying.li@docker.com>
glours pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 11, 2020
- moby/swarmkit#2323 (fix for watch server being run only on leader)

Signed-off-by: Ying <ying.li@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 17, 2020
- moby/swarmkit#2323 (fix for watch server being run only on leader)

Signed-off-by: Ying <ying.li@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 23, 2020
- moby/swarmkit#2323 (fix for watch server being run only on leader)

Signed-off-by: Ying <ying.li@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 23, 2020
- moby/swarmkit#2309 (updating the service spec version when rolling back)
- moby/swarmkit#2310 (fix for slow swarm shutdown)
- moby/swarmkit#2323 (run watchapi server on all managers)

Signed-off-by: Ying <ying.li@docker.com>