multi: Add Balancer for round-robin redundancy #1078
Conversation
Thanks for the PR! This should be quite nice for some services that use rpcclient to connect to multiple backends.
This is just a first pass where I focused on style-related issues such as consistency and things that don't follow the Effective Go guidelines.
I'll spend more time digging into the logic and testing it in subsequent reviews, but I wanted to go ahead and get most of these more minor things out of the way first.
Just an FYI that I will be doing some thorough testing and review on this after the upcoming release. While we always try to make sure everything is well tested and functioning before it goes to master, this is doubly the case when a release is looming, especially for things that have huge implications, such as the RPC client upon which wallets, stakepools, and mining software rely.
Thanks @davecgh, will wait for the feedback.
Hi @davecgh, any luck? :) ...
Thanks @davecgh, working on it...
Great. Thanks! Please either leave a comment on the PR or ping me via one of the chat apps once you've updated, as GitHub isn't the best about notifying on updates to PRs otherwise.
Hi @davecgh, incorporating davecgh's feedback.
Force-pushed from ce6983d to 2e7f3a7.
- Updated comments and changed method names per the guidelines. Fixed one bug in the NextConn() function where it was doing multiple lookups.
- …ed the Check and setting flag to indicate need for client restart to balancer.close(). Added balancer.CloseAll(). Other formatting changes.
- Balancer remembers the conn used for a notification once used; hence, no round-robin for these notification methods. Moved the Balancer interface to balancer.go. Fixed other comments from davecgh.
Ran rpctest locally and it passes.
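The pinning in that last commit (remember the conn used for a notification; no round-robin for those methods) can be pictured with a small sketch. This is a hypothetical reconstruction, not the PR's code; all names are illustrative:

```go
package rpcbalance

import "sync"

// wsConn stands in for a single websocket connection to one backend.
type wsConn struct{ host string }

// roundRobinBalancer sketches the described behavior: notification
// registrations are pinned to the connection that first served them,
// while everything else rotates through the pool.
type roundRobinBalancer struct {
	mu        sync.Mutex
	conns     []*wsConn          // pool of established connections
	next      int                // index of the next conn to hand out
	ntfnConns map[string]*wsConn // notification method -> pinned conn
}

// pick returns the connection to use for a request. A notification
// method keeps hitting the server that holds its registration state;
// all other requests round-robin.
func (b *roundRobinBalancer) pick(method string, isNtfn bool) *wsConn {
	b.mu.Lock()
	defer b.mu.Unlock()

	if isNtfn {
		if c, ok := b.ntfnConns[method]; ok {
			return c
		}
	}
	c := b.conns[b.next]
	b.next = (b.next + 1) % len(b.conns)
	if isNtfn {
		b.ntfnConns[method] = c // remember it for subsequent calls
	}
	return c
}
```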
This change strikes me as being backwards. I would expect a load balancer to be created from many clients, and not a single client load balancing over multiple servers. It should be possible to create a load balancing client in another package using only the exported features of the rpcclient package. Creating a new type to handle load balanced requests also means we can choose to not introduce any of the RPC methods that keep state and can't be load balanced. Is there some reason I'm missing why it wasn't designed this way?
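For contrast, a rough sketch of the design @jrick describes: a balancer in a separate package, composed from several fully constructed clients, using only rpcclient's exported API. rpcclient.New, Shutdown, and GetBlockCount are the existing exported API; the Balancer type and import path are assumptions for illustration:

```go
package balancer

import (
	"sync/atomic"

	"github.com/decred/dcrd/rpcclient"
)

// Balancer round-robins stateless requests over independent clients,
// one per backend, instead of teaching one client about many hosts.
type Balancer struct {
	clients []*rpcclient.Client
	next    uint32
}

func New(clients []*rpcclient.Client) *Balancer {
	return &Balancer{clients: clients}
}

// client returns the next client in rotation.
func (b *Balancer) client() *rpcclient.Client {
	n := atomic.AddUint32(&b.next, 1) - 1
	return b.clients[n%uint32(len(b.clients))]
}

// GetBlockCount forwards a stateless RPC to the next backend. Stateful
// calls such as notification registrations are deliberately not
// wrapped, since they cannot be load balanced.
func (b *Balancer) GetBlockCount() (int64, error) {
	return b.client().GetBlockCount()
}
```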
Hi @jrick, this was the base requirement for issue #884.
Hi @dnldd, I have resolved all previous comments by @davecgh.
Reconnection:
Noted, I plan on reviewing soon. Will keep you updated.
Here is the config I tested with:

```go
connCfg := &rpcclient.ConnConfig{
	User:                 "user",
	Pass:                 "pass",
	DisableAutoReconnect: false,
	Certificates:         certs,
	HostAddresses: []rpcclient.HostAddress{
		rpcclient.HostAddress{
			Host:     "127.0.0.1:19110",
			Endpoint: "ws",
		},
		rpcclient.HostAddress{
			Host:     "127.0.0.1:19111",
			Endpoint: "ws",
		},
		rpcclient.HostAddress{
			Host:     "127.0.0.1:19109",
			Endpoint: "ws",
		},
	},
}
```
I initially started with:

```
2018-10-23 20:53:16.248 [INF] RPCC: Connection state update for host 127.0.0.1:19110: CONNECTING
2018-10-23 20:53:16.250 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:16.250 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:1
2018-10-23 20:53:16.250 [INF] RPCC: Connection state update for host 127.0.0.1:19111: CONNECTING
2018-10-23 20:53:16.250 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:16.250 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:1
2018-10-23 20:53:16.250 [INF] RPCC: Connection state update for host 127.0.0.1:19109: CONNECTING
2018-10-23 20:53:16.283 [INF] RPCC: Connection state update for host 127.0.0.1:19109: READY
2018-10-23 20:53:16.283 [INF] RPCC: Balancer: Established connection to RPC server 127.0.0.1:19109
2018-10-23 20:53:16.283 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:16.284 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:16.284 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:2
2018-10-23 20:53:16.284 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:16.284 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:2
2018-10-23 20:53:16.284 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:16.284 [INF] RPCC: Balancer: Will be using 127.0.0.1:19109 for notification: notifyblocks
2018-10-23 20:53:16.287 [INF] RPCC: NotifyBlocks: Registration Complete
2018-10-23 20:53:21.293 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:21.293 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:3
2018-10-23 20:53:21.293 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:21.293 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:3
2018-10-23 20:53:21.293 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:21.294 [INF] RPCC: Block count: 55353
2018-10-23 20:53:26.289 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:26.289 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:4
2018-10-23 20:53:26.289 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:26.289 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:4
2018-10-23 20:53:26.289 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:26.290 [INF] RPCC: Block count: 55353
2018-10-23 20:53:31.292 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:31.292 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:5
2018-10-23 20:53:31.292 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:31.292 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:5
2018-10-23 20:53:31.293 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:31.293 [INF] RPCC: Block count: 55353
2018-10-23 20:53:36.290 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:36.290 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:6
2018-10-23 20:53:36.291 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:36.291 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:6
2018-10-23 20:53:36.291 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:36.291 [INF] RPCC: Block count: 55353
2018-10-23 20:53:41.294 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:41.294 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:7
2018-10-23 20:53:41.294 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:41.294 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:7
2018-10-23 20:53:41.295 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:41.295 [INF] RPCC: Block count: 55353
2018-10-23 20:53:46.289 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:46.289 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:8
2018-10-23 20:53:46.289 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:46.289 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:8
2018-10-23 20:53:46.289 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:46.290 [INF] RPCC: Block count: 55353
2018-10-23 20:53:50.696 [ERR] RPCC: Websocket receive error from 127.0.0.1:19109: websocket: close 1006 (abnormal closure): unexpected EOF
2018-10-23 20:53:50.697 [INF] RPCC: Connection state update for host 127.0.0.1:19109: DISCONNECTED
2018-10-23 20:53:50.698 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:53:50.698 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 10s
2018-10-23 20:54:00.701 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:54:00.701 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 15s
2018-10-23 20:54:15.702 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:54:15.702 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 20s
2018-10-23 20:54:35.708 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:54:35.708 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 25s
2018-10-23 20:55:00.713 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:55:00.713 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 30s
2018-10-23 20:55:30.719 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:55:30.719 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 35s
2018-10-23 20:56:05.725 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:56:05.725 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 40s
2018-10-23 20:56:45.728 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:56:45.728 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 45s
2018-10-23 20:57:30.734 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:57:30.734 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 50s
...
```

From the logs, it's continuously retrying. I'm yet to peruse your updates on the comments; expect a review on that soon.
Hi @dnldd, do you want me to change the reconnect handler to work on these as well?
Yeah, let's do that. On a disconnection, retry all host addresses.
…connections in the HostAddress list that are not in connected state.
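In other words, the disconnect handler would walk the whole HostAddress list rather than only the dropped host. A hypothetical sketch of that loop (all names illustrative, not the PR's actual identifiers):

```go
package rpcbalance

// hostAddress mirrors the PR's per-host config entry; illustrative only.
type hostAddress struct{ Host, Endpoint string }

type hostConn struct {
	addr      hostAddress
	connected bool
}

type balancer struct {
	hosts []hostAddress
	conns map[string]*hostConn
}

// handleDisconnect retries every configured host that is not currently
// connected, not only the one whose connection just dropped.
func (b *balancer) handleDisconnect(reconnect func(hostAddress)) {
	for _, h := range b.hosts {
		if c, ok := b.conns[h.Host]; !ok || !c.connected {
			go reconnect(h) // reuse the normal backoff-based reconnect
		}
	}
}
```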
Hi @dnldd, I have updated the code accordingly.
The issues highlighted here are not exhaustive; use them to correct similar comment-related issues throughout the PR. I noticed variable names were still being used in comments. I also do not think it's necessary to refer to a connection as a `ws connection`, particularly because of the abbreviation `ws`; just `connection` should be fine.
Will be testing the updates made shortly.
When all hosts are unavailable, the initial retry cycle does not back off; the 50 retries for each host are exhausted in a matter of seconds as a result. Is there any reason why you're not backing off here, as compared to a dropped connection? Here is an example:
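For reference, the dropped-connection path in the earlier logs backs off in 5-second steps (10s, 15s, 20s, ...). Applying the same schedule to initial connections might look roughly like this; the 5s step and 1-minute cap are read off the log output, so treat the constants as assumptions:

```go
package rpcbalance

import "time"

// retryInterval is the per-retry backoff step seen in the log output.
const retryInterval = 5 * time.Second

// backoff grows linearly with the retry count and is capped so a host
// that stays down is still re-checked regularly.
func backoff(retries int) time.Duration {
	d := time.Duration(retries) * retryInterval
	if d > time.Minute {
		d = time.Minute
	}
	return d
}
```

The initial-connection loop would then sleep backoff(attempt) between attempts instead of burning through all 50 retries in a few seconds.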
Hi @dnldd, I assumed those which never got connected need to be handled differently.
Apologies for the delay. It's due to office load; I will get back to this by the third week of December.
- Retry on initial connection now follows the same backoff as during reconnect. Fixed a few bugs for getting the next connection while reconnects are in progress.
- …during initial connection.
- …causing tests to time out as code tries to reconnect.
Hi @dnldd, I have incorporated your suggestions. Please try with the latest.
Hi @dnldd, please let me know if you want me to make any more changes...
Have you built and run this with `go build -race`?
FYI - I have no plans on pulling this change into dcrwallet due to my aforementioned concerns about its design. Having multiple RPC servers to sync the wallet with is useful, but the syncer should have direct knowledge of each of these servers (similar to how the SPV syncer directly handles all connected peers). I was already planning on removing the rpcclient dependency from dcrwallet entirely for other reasons (it's already more complex to work with than it should be), but this would just accelerate that process.
Hi @dajohi, I had run with -race, but it did not detect the race in the code you pointed out in the picker.
This never really got any traction, since, as it turns out, all primary consumers wanted to control the connection management themselves. One of them in particular, the wallet, decided to move away from rpcclient altogether. Consequently, while I really am not happy about closing this without merging after all this effort and time, none of the consumers want it, and it would just end up being code that has to be maintained for no benefit. Closing.
Added a RoundRobinBalancer to do a round robin over a list of hosts using a picker. The list of hosts is set in the connection config that's used to create a new Client.

Changes in file roundrobin.go (sketched below):
- Picker
- RoundRobinBalancer
- state for a host
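A rough sketch of how those pieces might fit together; this is hedged, and the PR's actual definitions in roundrobin.go and balancer.go may differ:

```go
package rpcbalance

import "errors"

// ConnState is the per-host state the balancer tracks.
type ConnState struct {
	Host      string
	Connected bool
	Retries   int
}

// Picker selects the next connection from the pool.
type Picker interface {
	Pick() (*ConnState, error)
}

// RoundRobinBalancer implements Picker by cycling over the hosts.
type RoundRobinBalancer struct {
	conns []*ConnState
	next  int
}

// Pick returns the next connected host, skipping any that are down.
func (b *RoundRobinBalancer) Pick() (*ConnState, error) {
	for i := 0; i < len(b.conns); i++ {
		c := b.conns[b.next]
		b.next = (b.next + 1) % len(b.conns)
		if c.Connected {
			return c, nil
		}
	}
	return nil, errors.New("no connected hosts")
}
```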
Changes in file infrastructure.go:
- …messages
- …message
- …and attempts to reconnect.
- If all were disconnected, it would flag to call start().

This fixes rpcclient: Add round-robin redundancy (#884).