multi: Add Balancer for round-robin redundancy #1078

Closed
wants to merge 27 commits

Conversation

maicalal (Contributor)

Added a RoundRobinBalancer that does a round robin over a list of
hosts using a picker. The list of hosts is set in the connection config
that is used to create a new Client. (A rough sketch of the picking idea
appears after the change list below.)
Changes in file roundrobin.go:

  • Picker

    • responsible for picking the next usable connection
  • RoundRobinBalancer

    • holds a picker instance to get the next connection
    • maintains a map storing the current connection
      state for each host
    • maintains a map of socket connections by host

Changes in file infrastructure.go:

  • Client now has a reference to the RoundRobinBalancer
  • Changes to the following workflows:
    • Creating a new Client
      • Does a rrrb.Get() to have one ready connection
    • wsInHandler
      • iterates over all active connections to check for
        messages
    • wsOutHandler
      • picks a connection using the balancer to send out a
        message
    • wsReconnectHandler
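
For illustration, here is a minimal sketch of the round-robin picking idea described above; all names (picker, hosts, up) are hypothetical stand-ins, not the PR's actual API:

    package main

    import (
        "errors"
        "fmt"
        "sync"
    )

    // picker cycles over a fixed host list, skipping hosts marked down.
    type picker struct {
        mu    sync.Mutex
        hosts []string
        up    map[string]bool
        next  int
    }

    // pick returns the next usable host, or an error if none are up.
    func (p *picker) pick() (string, error) {
        p.mu.Lock()
        defer p.mu.Unlock()
        for i := 0; i < len(p.hosts); i++ {
            h := p.hosts[p.next]
            p.next = (p.next + 1) % len(p.hosts)
            if p.up[h] {
                return h, nil
            }
        }
        return "", errors.New("no usable connection")
    }

    func main() {
        p := &picker{
            hosts: []string{"127.0.0.1:19109", "127.0.0.1:19110"},
            up:    map[string]bool{"127.0.0.1:19109": true},
        }
        host, err := p.pick()
        fmt.Println(host, err) // 127.0.0.1:19109 <nil>
    }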

@davecgh (Member) left a comment

Thanks for the PR! This should be quite nice for some services that use rpcclient to connect to multiple backends.

This is just a first pass where I focused on style-related issues such as consistency and things that don't follow the Effective Go Guidelines.

I'll spend more time digging into the logic and testing it in subsequent reviews, but I wanted to go ahead and get most of these more minor things out of the way first.

[Inline review threads on rpcclient/infrastructure.go and rpcclient/roundrobin.go]
@davecgh (Member) commented Mar 5, 2018

Just an FYI that I will be doing some thorough testing and review on this after the upcoming release. While we always try to make sure everything is well tested and functioning before it goes to master, this is doubly the case when a release is looming, especially for things that have huge implications such as the RPC client upon which wallets, stakepools, and mining software rely.

@maicalal (Contributor, Author) commented Mar 6, 2018

Thanks @davecgh, will wait for the feedback.

@maicalal (Contributor, Author)
Hi @davecgh, any luck? :) ...

@davecgh added this to the 1.3.0 milestone Apr 30, 2018
[Inline review threads on rpcclient/infrastructure.go and rpcclient/roundrobin.go]
@maicalal (Contributor, Author) commented May 8, 2018

Thanks @davecgh, working on it...

@davecgh (Member) commented May 9, 2018

Great, thanks! Please either leave a comment on the PR or ping me via one of the chat apps once you've updated, as GitHub isn't the best about notifying on updates to PRs otherwise.

@maicalal (Contributor, Author) commented May 13, 2018

Hi @davecgh,
Comments got screwed up in my last check-in. Here is the same:

Incorporating davecgh's feedback:

  • WsInHandler to be per wsConn
  • Moved the check, and the setting of the flag that indicates the need
    for a client restart, to balancer.close()
  • Added balancer.CloseAll() (a rough sketch follows this list)
  • Other formatting changes
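
For illustration, a minimal sketch of what a CloseAll along these lines might look like; the conn interface and field names are hypothetical stand-ins for the PR's actual types:

    package rpcclient

    import "sync"

    // conn is a stand-in for the real websocket connection type.
    type conn interface {
        Close() error
    }

    type roundRobinBalancer struct {
        mu    sync.Mutex
        conns map[string]conn // socket connections by host
    }

    // CloseAll tears down every tracked connection.
    func (b *roundRobinBalancer) CloseAll() {
        b.mu.Lock()
        defer b.mu.Unlock()
        for host, c := range b.conns {
            c.Close() // per-connection close errors are ignored on teardown
            delete(b.conns, host)
        }
    }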

[Inline review threads on rpcclient/roundrobin.go and rpcclient/infrastructure.go]
@maicalal force-pushed the master branch 4 times, most recently from ce6983d to 2e7f3a7 on June 7, 2018
Updated comments and changed method names as
per guidelines.
Made one bugfix in NextConn() function where
it was doing multiple lookups.
…ed the Check and setting flag to indicate need for client restart to balancer.close() - Added balancer.CloseAll() - Other formatting changes
- Balancer remembers conn used for a notification once
used. Hence, no round-robin for these notification
methods.
- Moved Balancer interface to balancer.go
- Fixed other comments from davecgh
@jrick (Member) commented Jun 15, 2018

This change strikes me as being backwards. I would expect a load balancer to be created from many clients, and not a single client load balancing over multiple servers. It should be possible to create a load balancing client in another package using only the exported features of the rpcclient package. Creating a new type to handle load balanced requests also means we can choose to not introduce any of the RPC methods that keep state and can't be load balanced. Is there some reason I'm missing why it wasn't designed this way?
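
For comparison, a minimal sketch of the alternative jrick describes: a balancer built outside the package on top of multiple exported rpcclient.Client values. The balancedClient type is hypothetical, and the import path is assumed:

    package loadbalance

    import (
        "sync"

        "github.com/decred/dcrd/rpcclient"
    )

    // balancedClient round-robins stateless RPCs across several clients.
    type balancedClient struct {
        mu      sync.Mutex
        clients []*rpcclient.Client
        next    int
    }

    func (b *balancedClient) pick() *rpcclient.Client {
        b.mu.Lock()
        defer b.mu.Unlock()
        c := b.clients[b.next]
        b.next = (b.next + 1) % len(b.clients)
        return c
    }

    // Only stateless methods are wrapped; stateful RPCs that cannot be
    // load balanced (e.g. notification registrations) are simply omitted.
    func (b *balancedClient) GetBlockCount() (int64, error) {
        return b.pick().GetBlockCount()
    }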

@maicalal (Contributor, Author)
Hi @jrick, this was the base requirement for issue #884.
It made sense to have a pure client-side balancer, where the client has multiple servers it can connect to, so I went ahead with the changes.
Moreover, a load balancer over multiple clients would change its meaning and would be more like a message broker in a pub/sub paradigm.

@davecgh removed this from the 1.3.0 milestone Aug 1, 2018
@maicalal (Contributor, Author) commented Oct 10, 2018

Hi @dnldd, I have resolved all previous comments by @davecgh and have
also fixed the issues you faced.
The flow is as below (a sketch of the initial-connection loop follows this list).
Say con1 and con2 are specified to be used by the balancer.

Initial connection (i.e. first successful connection):

  • the code makes 50 attempts for each connection, then marks it as shut down.

Reconnection:

  • after a successful initial connection, if either con1 or con2 disconnects,
    reconnection attempts are made as per the settings.
  • during this phase, if there are no active connections, the code will not
    execute any sendCmd (this is the same as earlier; the only difference is
    that earlier it would have used the state of the "disconnect" channel,
    whereas now we rely on the available active connections).
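
For illustration, a minimal sketch of the initial-connection behavior described above; the 50-attempt limit mirrors the description, while dialHost and the other names are hypothetical stand-ins:

    package rpcclient

    import "fmt"

    const maxInitialAttempts = 50

    // dialHost is a stand-in for the real websocket dial.
    var dialHost = func(host string) error {
        return fmt.Errorf("dial tcp %s: connect: connection refused", host)
    }

    // tryInitialConnect makes up to maxInitialAttempts dial attempts per
    // host; hosts that never connect are reported as shut down.
    func tryInitialConnect(hosts []string) (shutdown []string) {
        for _, h := range hosts {
            connected := false
            for attempt := 1; attempt <= maxInitialAttempts; attempt++ {
                if dialHost(h) == nil {
                    connected = true
                    break
                }
            }
            if !connected {
                shutdown = append(shutdown, h)
            }
        }
        return shutdown
    }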

@dnldd (Member) commented Oct 10, 2018

Noted, I plan on reviewing soon. Will keep you updated.

@dnldd (Member) commented Oct 23, 2018

Here is the config I tested with:

	connCfg := &rpcclient.ConnConfig{
		User:                 "user",
		Pass:                 "pass",
		DisableAutoReconnect: false,
		Certificates:         certs,
		HostAddresses: []rpcclient.HostAddress{
			rpcclient.HostAddress{
				Host:     "127.0.0.1:19110",
				Endpoint: "ws",
			},
			rpcclient.HostAddress{
				Host:     "127.0.0.1:19111",
				Endpoint: "ws",
			},
			rpcclient.HostAddress{
				Host:     "127.0.0.1:19109",
				Endpoint: "ws",
			},
		},
	}

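Presumably the client is then created from this config in the usual way; a sketch, assuming the HostAddresses field added by this PR and omitting notification handlers:

	// rpcclient.New is the package's existing constructor.
	client, err := rpcclient.New(connCfg, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Shutdown()
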
I initially started with :19109 (live), :19110 (shutdown), and :19111 (shutdown). I then killed :19109 and started :19110, resulting in :19109 (shutdown), :19110 (live), and :19111 (shutdown). Here is the output for that:

2018-10-23 20:53:16.248 [INF] RPCC: Connection state update for host 127.0.0.1:19110: CONNECTING
2018-10-23 20:53:16.250 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:16.250 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:1
2018-10-23 20:53:16.250 [INF] RPCC: Connection state update for host 127.0.0.1:19111: CONNECTING
2018-10-23 20:53:16.250 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:16.250 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:1
2018-10-23 20:53:16.250 [INF] RPCC: Connection state update for host 127.0.0.1:19109: CONNECTING
2018-10-23 20:53:16.283 [INF] RPCC: Connection state update for host 127.0.0.1:19109: READY
2018-10-23 20:53:16.283 [INF] RPCC: Balancer: Established connection to RPC server 127.0.0.1:19109
2018-10-23 20:53:16.283 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:16.284 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:16.284 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:2
2018-10-23 20:53:16.284 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:16.284 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:2
2018-10-23 20:53:16.284 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:16.284 [INF] RPCC: Balancer: Will be using 127.0.0.1:19109 for notification: notifyblocks
2018-10-23 20:53:16.287 [INF] RPCC: NotifyBlocks: Registration Complete
2018-10-23 20:53:21.293 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:21.293 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:3
2018-10-23 20:53:21.293 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:21.293 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:3
2018-10-23 20:53:21.293 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:21.294 [INF] RPCC: Block count: 55353
2018-10-23 20:53:26.289 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:26.289 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:4
2018-10-23 20:53:26.289 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:26.289 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:4
2018-10-23 20:53:26.289 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:26.290 [INF] RPCC: Block count: 55353
2018-10-23 20:53:31.292 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:31.292 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:5
2018-10-23 20:53:31.292 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:31.292 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:5
2018-10-23 20:53:31.293 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:31.293 [INF] RPCC: Block count: 55353
2018-10-23 20:53:36.290 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:36.290 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:6
2018-10-23 20:53:36.291 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:36.291 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:6
2018-10-23 20:53:36.291 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:36.291 [INF] RPCC: Block count: 55353
2018-10-23 20:53:41.294 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:41.294 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:7
2018-10-23 20:53:41.294 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:41.294 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:7
2018-10-23 20:53:41.295 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:41.295 [INF] RPCC: Block count: 55353
2018-10-23 20:53:46.289 [INF] RPCC: Connection state update for host 127.0.0.1:19110: IDLE
2018-10-23 20:53:46.289 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused, attempt:8
2018-10-23 20:53:46.289 [INF] RPCC: Connection state update for host 127.0.0.1:19111: IDLE
2018-10-23 20:53:46.289 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused, attempt:8
2018-10-23 20:53:46.289 [INF] RPCC: Balancer: Connection pick for RPC 127.0.0.1:19109
2018-10-23 20:53:46.290 [INF] RPCC: Block count: 55353
2018-10-23 20:53:50.696 [ERR] RPCC: Websocket receive error from 127.0.0.1:19109: websocket: close 1006 (abnormal closure): unexpected EOF
2018-10-23 20:53:50.697 [INF] RPCC: Connection state update for host 127.0.0.1:19109: DISCONNECTED
2018-10-23 20:53:50.698 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:53:50.698 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 10s
2018-10-23 20:54:00.701 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:54:00.701 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 15s
2018-10-23 20:54:15.702 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:54:15.702 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 20s
2018-10-23 20:54:35.708 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:54:35.708 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 25s
2018-10-23 20:55:00.713 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:55:00.713 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 30s
2018-10-23 20:55:30.719 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:55:30.719 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 35s
2018-10-23 20:56:05.725 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:56:05.725 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 40s
2018-10-23 20:56:45.728 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:56:45.728 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 45s
2018-10-23 20:57:30.734 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-10-23 20:57:30.734 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 50s
...

From the logs, it's continuously retrying :19109, when at this point it should be trying to connect to :19109, :19110, and :19111 like it did on startup. On a disconnect, cycle through all host addresses for a connection; if none is found, back off the retry cycle, e.g. "Retrying all host addresses in 10s, 20s, 30s, 40s ...".
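
A minimal sketch of the suggested behavior, assuming a hypothetical dial function and illustrative delay values:

    package rpcclient

    import (
        "errors"
        "time"
    )

    // dial is a stand-in for the real websocket dial.
    var dial = func(host string) error {
        return errors.New("connect: connection refused")
    }

    // retryAllHosts cycles through every configured host on a disconnect,
    // growing the delay after each full failed pass.
    func retryAllHosts(hosts []string) string {
        delay := 10 * time.Second
        const maxDelay = 60 * time.Second
        for {
            for _, h := range hosts {
                if dial(h) == nil {
                    return h
                }
            }
            time.Sleep(delay) // back off before the next full pass
            if delay < maxDelay {
                delay += 10 * time.Second
            }
        }
    }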

I've yet to go through your updates on the comments; expect a review on those soon.

@maicalal (Contributor, Author)
Hi @dnldd,
The reconnect handler kicks in once a connection gets disconnected, as we can see in the logs:
Retrying connection to 127.0.0.1:19109 in ...s
The connections for :19110 and :19111 never got connected, so they do not enter the reconnect workflow.

Do you want me to change the reconnect handler to work on these as well?
This would be a small change. I didn't do it earlier because I thought they did not qualify, as they never got connected.

@dnldd (Member) commented Oct 26, 2018

Yeah, let's do that. On a disconnection, retry all host addresses.

connections in the HostAddress list that are not in connected
state.
@maicalal (Contributor, Author) commented Nov 6, 2018

Hi @dnldd, I have updated the code accordingly.
Please try with the latest.

@dnldd (Member) left a comment

The issues highlighted here are not exhaustive; use them to correct similar comment-related issues throughout the PR. I noticed variable names were still being used in comments. I also do not think it's necessary to refer to a connection as a ws connection, particularly because of the abbreviation ws; just connection should be fine.

Will be testing the updates made shortly.

[Inline review threads on rpcclient/balancer.go]
@dnldd (Member) commented Nov 13, 2018

When all hosts are unavailable, the initial retry cycle does not back off.

2018-11-13 02:21:26.137 [INF] RPCC: Connection state update for host 127.0.0.1:9110: IDLE
2018-11-13 02:21:26.137 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9110: dial tcp 127.0.0.1:9110: connect: connection refused, attempt:1
2018-11-13 02:21:26.137 [INF] RPCC: Connection state update for host 127.0.0.1:9111: CONNECTING
2018-11-13 02:21:26.138 [INF] RPCC: Connection state update for host 127.0.0.1:9111: IDLE
2018-11-13 02:21:26.138 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9111: dial tcp 127.0.0.1:9111: connect: connection refused, attempt:1
2018-11-13 02:21:26.138 [INF] RPCC: Connection state update for host 127.0.0.1:9109: CONNECTING
2018-11-13 02:21:26.139 [INF] RPCC: Connection state update for host 127.0.0.1:9109: IDLE
2018-11-13 02:21:26.139 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9109: dial tcp 127.0.0.1:9109: connect: connection refused, attempt:1
2018-11-13 02:21:26.139 [INF] RPCC: Connection state update for host 127.0.0.1:9110: IDLE
2018-11-13 02:21:26.149 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9110: dial tcp 127.0.0.1:9110: connect: connection refused, attempt:2
2018-11-13 02:21:26.151 [INF] RPCC: Connection state update for host 127.0.0.1:9111: IDLE
2018-11-13 02:21:26.151 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9111: dial tcp 127.0.0.1:9111: connect: connection refused, attempt:2
...
2018-11-13 02:21:27.052 [INF] RPCC: Connection state update for host 127.0.0.1:9110: IDLE
2018-11-13 02:21:27.052 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9110: dial tcp 127.0.0.1:9110: connect: connection refused, attempt:47
2018-11-13 02:21:27.054 [INF] RPCC: Connection state update for host 127.0.0.1:9111: IDLE
2018-11-13 02:21:27.089 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9111: dial tcp 127.0.0.1:9111: connect: connection refused, attempt:47
2018-11-13 02:21:27.090 [INF] RPCC: Connection state update for host 127.0.0.1:9109: IDLE
2018-11-13 02:21:27.090 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9109: dial tcp 127.0.0.1:9109: connect: connection refused, attempt:47
2018-11-13 02:21:27.091 [INF] RPCC: Connection state update for host 127.0.0.1:9110: IDLE
2018-11-13 02:21:27.091 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9110: dial tcp 127.0.0.1:9110: connect: connection refused, attempt:48
2018-11-13 02:21:27.091 [INF] RPCC: Connection state update for host 127.0.0.1:9111: IDLE
2018-11-13 02:21:27.091 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9111: dial tcp 127.0.0.1:9111: connect: connection refused, attempt:48
2018-11-13 02:21:27.092 [INF] RPCC: Connection state update for host 127.0.0.1:9109: IDLE
2018-11-13 02:21:27.113 [INF] RPCC: Balancer: Failed to make initial connection to 127.0.0.1:9109: dial tcp 127.0.0.1:9109: connect: connection refused, attempt:48
...

As a result, the 50 retries for each host are exhausted in a matter of seconds. Is there any reason why you're not backing off here, as you do for a dropped connection? Here is an example:

...
2018-11-13 02:26:24.092 [INF] RPCC: Retrying connection to 127.0.0.1:19110 in 10s
2018-11-13 02:26:34.095 [INF] RPCC: Failed to connect to 127.0.0.1:19111: dial tcp 127.0.0.1:19111: connect: connection refused
2018-11-13 02:26:34.095 [INF] RPCC: Retrying connection to 127.0.0.1:19111 in 10s
2018-11-13 02:26:44.101 [INF] RPCC: Failed to connect to 127.0.0.1:19109: dial tcp 127.0.0.1:19109: connect: connection refused
2018-11-13 02:26:44.101 [INF] RPCC: Retrying connection to 127.0.0.1:19109 in 10s
2018-11-13 02:26:54.104 [INF] RPCC: Failed to connect to 127.0.0.1:19110: dial tcp 127.0.0.1:19110: connect: connection refused
2018-11-13 02:26:54.104 [INF] RPCC: Retrying connection to 127.0.0.1:19110 in 15s
...

@davecgh modified the milestones: 1.4.0, 1.5.0 Nov 14, 2018
@maicalal (Contributor, Author)
Hi @dnldd, I assumed those which never got connected needed to be handled differently;
looks like I was wrong here.
I will fix the code to match the backoff strategy used for dropped connections.
Working on it.

@maicalal (Contributor, Author) commented Dec 8, 2018

Apologies for the delay; it's due to office workload. I will get back to this by the third week of December.

Retry on initial connection to follow
backoff as during reconnect.
Fixed few bugs for getting next connection
during reconnects in progress.
causing tests to timeout as code tries to reconnect.
@maicalal (Contributor, Author)
Hi @dnldd, I have incorporated your suggestions. Please try with the latest.

@maicalal (Contributor, Author)
Hi @dnldd, please let me know if you want me to make any more changes...

[Inline review thread on rpcclient/roundrobin.go]
@dajohi (Member) left a comment

Have you built and run this with go build -race?

@jrick (Member) commented Jan 18, 2019

FYI - I have no plans on pulling this change into dcrwallet due to my aforementioned concerns about its design. Having multiple RPC servers to sync the wallet with is useful, but the syncer should have direct knowledge of each of these servers (similar to how the SPV syncer directly handles all connected peers).

I was already planning on removing the rpcclient dependency from dcrwallet entirely for other reasons (it's already more complex to work with than it should be), but this would just accelerate that process.

@maicalal (Contributor, Author) commented Feb 1, 2019

Have you built and run this with go build -race?

Hi @dajohi, I had run with -race, but it did not flag the code you pointed out in the picker,
maybe because it goes into that if block only once and then errors out saying no connection is available.
I will check in the changes in some time.

@davecgh (Member) commented Oct 26, 2019

This never really got any traction, since, as it turns out, all primary consumers wanted to control the connection management themselves. One of them in particular, the wallet, decided to move away from rpcclient altogether. Consequently, while I really am not happy about closing this without merging after all this effort and time, none of the consumers want it, and it would just end up being code that has to be maintained for no benefit. Closing.

@davecgh closed this Oct 26, 2019
Linked issue: rpcclient: Add round-robin redundancy.