Add connectionbroker package #1850

Merged
merged 1 commit into from Jan 9, 2017

3 participants
@aaronlehmann
Collaborator

aaronlehmann commented Jan 9, 2017

This package is a small abstraction on top of Remotes that provides a
gRPC connection to a manager. If running on a manager, it uses the unix
socket, otherwise it will pick a remote manager using Remotes. This
will allow things like agent and certificate renewal to work even on a
single node with no TCP port bound.
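
The selection rule described above can be sketched roughly as follows. This is a simplified model, not the package's actual code: the `picker` interface, the `broker` struct, the `target` method, the socket path, and the address are all hypothetical names chosen for illustration, and no real gRPC connection is made.

```go
package main

import (
	"errors"
	"fmt"
)

// picker is a hypothetical stand-in for the part of Remotes that
// chooses a remote manager address.
type picker interface {
	Select() (string, error)
}

// staticPicker always returns its first address; real weighting is omitted.
type staticPicker []string

func (p staticPicker) Select() (string, error) {
	if len(p) == 0 {
		return "", errors.New("no remote managers known")
	}
	return p[0], nil
}

// broker is a simplified model of the connection broker.
// localSocket is non-empty only when this node is itself a manager.
type broker struct {
	localSocket string
	remotes     picker
}

// target returns the address to dial and whether it is the local socket.
func (b *broker) target() (addr string, local bool, err error) {
	if b.localSocket != "" {
		// On a manager, prefer the unix socket: no TCP port is needed.
		return "unix://" + b.localSocket, true, nil
	}
	// Otherwise fall back to a remote manager chosen by the picker.
	addr, err = b.remotes.Select()
	return addr, false, err
}

func main() {
	worker := &broker{remotes: staticPicker{"10.0.0.2:2377"}}
	manager := &broker{localSocket: "/run/example/control.sock", remotes: staticPicker{}}
	for _, b := range []*broker{worker, manager} {
		addr, local, err := b.target()
		fmt.Println(addr, local, err)
	}
}
```

Note that the manager in this sketch has an empty remote list and still gets a usable target, which is the single-node case the description mentions.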

This is a piece of #1826. The next PR will convert the CA and agent to
use connectionbroker.

@aaronlehmann aaronlehmann referenced this pull request Jan 9, 2017

Merged

Allow managers not to expose a remote API port #1826

6 of 6 tasks complete
@dperny
  • Would writing unit tests for this package be worthwhile?
  • Are you wanting to merge this PR before the work to integrate it is done, or is the PR just opened to get review? I see the original PR now.
// Remotes returns the remotes interface used by the broker, so the caller
// can make observations or see weights directly.
func (b *Broker) Remotes() remotes.Remotes {

@dperny

dperny Jan 9, 2017

Member

What's the use case for this method? It seems to me that this breaks separation of concerns.

@aaronlehmann

aaronlehmann Jan 9, 2017

Collaborator

The agent needs access to Remotes to update the set of managers based on the list it gets from the dispatcher.

https://github.com/docker/swarmkit/pull/1826/files#diff-6ea70e0a4de93ebcb83000a028edc163L344

The alternative to exposing Remotes through this method is to implement Observe, Weights, and Remove as connectionbroker methods, and that feels like it would pollute the connectionbroker method set, instead of keeping that functionality specific to Remotes.

Or we could pass the underlying Remotes into the agent separately, but I don't like that much either.
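
A rough sketch of the first option (the accessor) is shown below. It assumes a much-reduced model of Remotes: the `remoteSet` type, its method signatures, and the `syncManagers` helper are hypothetical and only mirror the method names mentioned in this thread (Observe, Weights, Remove).

```go
package main

import "fmt"

// remoteSet is a hypothetical, reduced model of Remotes:
// one integer weight per manager address.
type remoteSet struct {
	weights map[string]int
}

func newRemoteSet() *remoteSet { return &remoteSet{weights: map[string]int{}} }

// Observe adjusts an address's weight, adding the address if it is new.
func (r *remoteSet) Observe(addr string, delta int) { r.weights[addr] += delta }

// Remove forgets an address.
func (r *remoteSet) Remove(addr string) { delete(r.weights, addr) }

// Weights returns a copy of the current weights.
func (r *remoteSet) Weights() map[string]int {
	out := make(map[string]int, len(r.weights))
	for k, v := range r.weights {
		out[k] = v
	}
	return out
}

// broker exposes its remotes instead of re-implementing their methods.
type broker struct{ remotes *remoteSet }

func (b *broker) Remotes() *remoteSet { return b.remotes }

// syncManagers makes the known remotes match the manager list that the
// agent received (in the real system, from the dispatcher).
func syncManagers(b *broker, managers []string) {
	current := map[string]bool{}
	for _, m := range managers {
		current[m] = true
		b.Remotes().Observe(m, 1)
	}
	// Weights returns a copy, so removing while ranging over it is safe.
	for addr := range b.Remotes().Weights() {
		if !current[addr] {
			b.Remotes().Remove(addr)
		}
	}
}

func main() {
	b := &broker{remotes: newRemoteSet()}
	syncManagers(b, []string{"a:2377", "b:2377"})
	syncManagers(b, []string{"b:2377", "c:2377"})
	fmt.Println(b.Remotes().Weights())
}
```

With this shape the broker's own method set stays small, and everything specific to weighting stays on the remotes type.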

@dperny

dperny Jan 9, 2017

Member

Sounds good to me.

@dperny

dperny Jan 9, 2017

Member

Looks good to me.

@dperny

dperny approved these changes Jan 9, 2017

// records a positive experience with the remote peer if success is true,
// otherwise it records a negative experience. If a local connection is in use,
// Close is a noop.
func (c *Conn) Close(success bool) error {

@dongluochen

dongluochen Jan 9, 2017

Contributor

It's a little counter-intuitive to close an "unsuccessful" connection. Would it be better to move the "if success" logic to SelectRemote?

@aaronlehmann

aaronlehmann Jan 9, 2017

Collaborator

Close won't be called when SelectRemote returns an error. The success parameter is determined later on. For example, if something tries an RPC and that RPC fails, success will be set to false when closing the connection so that a negative observation is made and future attempts will prefer a different remote.

I think Dial generally doesn't fail immediately (since Dial is an async call), but it's probably a good idea to automatically perform a negative observation if Dial returns an error. I'll add that.
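
The flow described in this reply might look roughly like the sketch below. All types and helpers here (`observer`, `conn`, `dialFunc`, `selectRemote`, `callRPC`, `okDial`, `badDial`) are hypothetical stand-ins rather than the package's real API; only the idea of `Close(success bool)` and of a negative observation on a failed dial comes from the thread. The local-connection case, where Close is a no-op, is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// observer records positive or negative experiences per address.
type observer struct{ weights map[string]int }

func (o *observer) observe(addr string, success bool) {
	if success {
		o.weights[addr]++
	} else {
		o.weights[addr]--
	}
}

// conn models a brokered connection to one remote manager.
type conn struct {
	addr string
	obs  *observer
}

// Close records the caller's experience with the peer.
func (c *conn) Close(success bool) error {
	c.obs.observe(c.addr, success)
	return nil
}

// dialFunc stands in for the real dialer.
type dialFunc func(addr string) error

var (
	errDial = errors.New("dial failed")
	errRPC  = errors.New("rpc failed")
)

func okDial(string) error  { return nil }
func badDial(string) error { return errDial }

// selectRemote dials addr. If dialing fails it records a negative
// observation itself, because the caller never gets a conn to Close.
func selectRemote(obs *observer, addr string, dial dialFunc) (*conn, error) {
	if err := dial(addr); err != nil {
		obs.observe(addr, false)
		return nil, err
	}
	return &conn{addr: addr, obs: obs}, nil
}

// callRPC runs rpc over a brokered connection and reports the outcome
// through Close, so success is decided only after the RPC returns.
func callRPC(obs *observer, addr string, dial dialFunc, rpc func() error) error {
	c, err := selectRemote(obs, addr, dial)
	if err != nil {
		return err
	}
	err = rpc()
	c.Close(err == nil)
	return err
}

func main() {
	obs := &observer{weights: map[string]int{}}
	callRPC(obs, "m1", okDial, func() error { return nil })
	callRPC(obs, "m2", okDial, func() error { return errRPC })
	callRPC(obs, "m3", badDial, func() error { return nil })
	fmt.Println(obs.weights)
}
```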

@aaronlehmann

aaronlehmann Jan 9, 2017

Collaborator

Added an observation if Dial fails.

@dongluochen

dongluochen Jan 9, 2017

Contributor

Can success be internal to Conn, like derived from operation failure on Conn?

@aaronlehmann

aaronlehmann Jan 9, 2017

Collaborator

Unfortunately not, because Conn doesn't know whether RPCs that use it fail or not.

We could split it out into a separate Observe call on Conn. But I think it's nice to have it built into Close. That way, the use of a connection has to be either successful or unsuccessful. You can't forget to call Observe (we've had bugs like this before).

If it's weird for Close to take a parameter, we could rename this to something else like CloseAndObserve.
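
One way a caller could lean on this design is to defer the Close and derive success from a named error result, so that every return path, including early ones, ends in exactly one observation. The sketch below illustrates that pattern only; `conn`, `renew`, and `errStep` are hypothetical names, and the counters exist just to make the behavior visible.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical stand-in that counts the observations
// it receives through Close.
type conn struct {
	closed   bool
	positive int
	negative int
}

// Close releases the connection and records one observation.
func (c *conn) Close(success bool) error {
	if c.closed {
		return errors.New("already closed")
	}
	c.closed = true
	if success {
		c.positive++
	} else {
		c.negative++
	}
	return nil
}

var errStep = errors.New("step failed")

// renew runs steps in order and returns at the first failure.
// The deferred Close sees the final value of err, so an early return
// cannot skip the observation without also skipping the Close.
func renew(c *conn, steps ...func() error) (err error) {
	defer func() { c.Close(err == nil) }()
	for _, step := range steps {
		if err = step(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ok := func() error { return nil }
	bad := func() error { return errStep }

	c1 := &conn{}
	fmt.Println(renew(c1, ok, ok), c1.positive, c1.negative)

	c2 := &conn{}
	fmt.Println(renew(c2, ok, bad, ok), c2.positive, c2.negative)
}
```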

@dongluochen

dongluochen Jan 9, 2017

Contributor

because Conn doesn't know whether RPCs that use it fail or not.

Could this be moved to where failure is observed? If Conn cannot observe failures, maybe weight handling shouldn't be done here. How about doing it at RPC calls? When RPC fails, decrease the node's weight. On RPC success, adjust weight according to previous state (no need to increase weight on continuous success operations).

@aaronlehmann

aaronlehmann Jan 9, 2017

Collaborator

Could this be moved to where failure is observed?

We can do that, but as I mentioned above, I think it's better this way. We had several bugs before where certain code paths returned early and skipped adjusting the weight. If weight adjustment is part of closing the connection, it can't be skipped accidentally without leaking a connection.

@dongluochen

dongluochen Jan 9, 2017

Contributor

Makes sense. We have a similar problem in class Swarm, where an operation failure should adjust node preference. It's hard to do that on all the calls.

Add connectionbroker package
This package is a small abstraction on top of Remotes that provides a
gRPC connection to a manager. If running on a manager, it uses the unix
socket, otherwise it will pick a remote manager using Remotes. This
will allow things like agent and certificate renewal to work even on a
single node with no TCP port bound.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@dongluochen

dongluochen Jan 9, 2017

Contributor

LGTM

@aaronlehmann aaronlehmann merged commit 387bc93 into docker:master Jan 9, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
codecov/project 54.39% (target 0.00%)
dco-signed All commits are signed

@aaronlehmann aaronlehmann deleted the aaronlehmann:connectionbroker branch Jan 9, 2017
